An architectural overview for WebRTC — A protocol for implementing video conferencing

It’s no secret that remote work has become a lot more popular since the beginning of the COVID era. Even though vaccines are already here, many companies and teams have fully embraced working online and are not planning to go back. As a result, the demand for online collaboration tools, especially video conferencing solutions, has been increasing. For reference, Zoom’s stock price jumped from 66.64 USD in January 2020 to 559.00 USD in October 2020, an increase of ~739% in about 10 months:

Source: Google
Source: Wowza

WebRTC overview

WebRTC is a protocol designed to enable direct communication between browsers. It comes with a standardized set of browser APIs (classes and methods) for the process, and it has been available since Chrome 23:

See: RTCPeerConnection, the most fundamental WebRTC class

Peer connections

WebRTC is based on a p2p (peer-to-peer) architecture: the participants of the call are responsible for transferring data from one end to another, without relying on a middleman (for the most part; more on that later). If one participant disconnects for whatever reason, the others keep streaming data to each other. This is unlike traditional client-server communication, where data stops flowing once the connection to the server is lost. In addition, data travels directly between peers instead of detouring through a central server, so it often has a shorter distance to travel.

A system of 4 peers
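In a full mesh like this one, every participant holds a direct connection to every other participant, so n peers need n(n-1)/2 connections in total and each peer maintains n-1 of them. A minimal sketch that enumerates those pairs; meshPairs and connectionsPerPeer are hypothetical helpers for illustration, not part of the WebRTC API:

```typescript
// Hypothetical helper: lists every pair of peers that needs its own
// peer connection in a full-mesh call (n peers -> n * (n - 1) / 2 pairs).
function meshPairs(peerIds: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i < peerIds.length; i++) {
    for (let j = i + 1; j < peerIds.length; j++) {
      pairs.push([peerIds[i], peerIds[j]]);
    }
  }
  return pairs;
}

// Hypothetical helper: each peer connects to everyone except itself.
function connectionsPerPeer(peerCount: number): number {
  return Math.max(peerCount - 1, 0);
}

const fourPeerMesh = meshPairs(["A", "B", "C", "D"]);
console.log(fourPeerMesh.length); // 6
console.log(connectionsPerPeer(4)); // 3
```

Because the total grows quadratically, each new participant adds one more outgoing stream for every peer already in the call.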

Signaling server

As the call goes on, I’ll have to keep track of people who join or leave the conversation, and create or dispose of connections accordingly. To keep track of these events, we need a signaling server.

When I join a conversation, I announce it through the signaling server so everyone knows about it
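WebRTC does not define how signaling messages are transported; that is left to the application (a WebSocket server is a common choice, but that is an assumption here). The sketch below leaves the transport out entirely: a hypothetical in-memory SignalingRoom that notifies existing participants when someone joins or leaves, and can relay an opaque message from one peer to another, which is how peers can exchange the SDP and ICE information described next.

```typescript
// Events a participant can receive from the (hypothetical) signaling room.
type SignalEvent =
  | { kind: "peer-joined"; peerId: string }
  | { kind: "peer-left"; peerId: string }
  | { kind: "relay"; from: string; payload: unknown };

type SignalHandler = (event: SignalEvent) => void;

// Hypothetical in-memory stand-in for a signaling server. A real deployment
// would push these events over a network transport; here each peer just
// registers a callback.
class SignalingRoom {
  private peers = new Map<string, SignalHandler>();

  // Tell everyone already in the room about the newcomer, then register it,
  // so existing peers know they should create a connection to it.
  join(peerId: string, onEvent: SignalHandler): void {
    this.peers.forEach((handler) => handler({ kind: "peer-joined", peerId }));
    this.peers.set(peerId, onEvent);
  }

  // Remove a peer and tell the rest, so they can dispose of its connection.
  leave(peerId: string): void {
    if (!this.peers.delete(peerId)) return;
    this.peers.forEach((handler) => handler({ kind: "peer-left", peerId }));
  }

  // Forward an opaque payload (for example an SDP) to a single peer.
  // Returns false if the recipient is not in the room.
  relay(from: string, to: string, payload: unknown): boolean {
    const handler = this.peers.get(to);
    if (!handler) return false;
    handler({ kind: "relay", from, payload });
    return true;
  }

  get size(): number {
    return this.peers.size;
  }
}

// Usage: B learns that C joined, and later that C left.
const seenByB: SignalEvent[] = [];
const demoRoom = new SignalingRoom();
demoRoom.join("A", () => {});
demoRoom.join("B", (e) => seenByB.push(e));
demoRoom.join("C", () => {});
demoRoom.leave("C");
console.log(seenByB.map((e) => e.kind)); // [ 'peer-joined', 'peer-left' ]
```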

SDP

Once we know that someone has joined the conversation, we need to exchange information about each other’s systems in order to establish a connection. This information is described with a protocol called SDP (Session Description Protocol), and it includes details about the peer it belongs to, e.g. what type of media it would like to exchange, which codecs it supports, and the network and security parameters needed to connect to it. One peer sends its SDP as an offer, and the other replies with its own SDP as an answer. The SDP itself is plain text made of simple key-value lines, each of the form type=value:

Source: researchgate.net
Connecting 2 peers with their SDPs
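Since every SDP line has the form type=value, turning a description into key-value pairs takes only a few lines of code. The sample below is a trimmed, hand-written offer (an assumption for illustration; a real browser offer is much longer), and parseSdp and mediaKinds are hypothetical helpers, not WebRTC APIs:

```typescript
// One SDP line: a single-character type and everything after the "=".
interface SdpField {
  type: string;
  value: string;
}

// Hypothetical helper: splits an SDP blob into its type=value lines.
// SDP uses CRLF line endings, but bare LF is accepted too.
function parseSdp(sdp: string): SdpField[] {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => {
      if (line.indexOf("=") !== 1) {
        throw new Error(`Malformed SDP line: ${line}`);
      }
      return { type: line[0], value: line.slice(2) };
    });
}

// Hypothetical helper: the media types ("m=" lines) a peer wants to exchange.
function mediaKinds(fields: SdpField[]): string[] {
  return fields
    .filter((f) => f.type === "m")
    .map((f) => f.value.split(" ")[0]);
}

// A trimmed, illustrative offer (not captured from a real browser).
const sampleOffer = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const offerFields = parseSdp(sampleOffer);
console.log(mediaKinds(offerFields)); // [ 'audio', 'video' ]
```

The result is a list rather than a map because types such as "a" repeat, and their order matters: attributes that follow an "m=" line belong to that media section.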

ICE candidates

A peer might have many communication transports, not just one. Someone might have multiple private IPs/ports, and/or multiple public IPs/ports, and/or various protocols, and/or one or more relay servers, etc. As soon as we create an SDP offer and set it as our local description, WebRTC tries to find every possible way for the other peer to reach us. Each of these is known as an ICE candidate (ICE stands for Interactive Connectivity Establishment):

An actual RTCIceCandidate instance
As soon as we create an SDP, WebRTC starts looking for ICE candidates
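The candidate string of an RTCIceCandidate follows a fixed grammar: foundation, component, transport, priority, address, port, then "typ" and the candidate type (host for a local interface, srflx for a public address learned through STUN, prflx for one discovered from a peer, or relay for a TURN address). A hypothetical parser, run on a hand-written sample line whose addresses come from documentation-only ranges rather than a real capture:

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  foundation: string;
  component: number;
  protocol: string;
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
  relatedAddress?: string; // "raddr", e.g. the private IP behind a srflx candidate
  relatedPort?: number; // "rport"
}

const KNOWN_CANDIDATE_TYPES: string[] = ["host", "srflx", "prflx", "relay"];

// Hypothetical helper for the grammar
// "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
function parseCandidate(line: string): ParsedCandidate {
  const body = line.replace(/^a=/, "").replace(/^candidate:/, "");
  const parts = body.trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") {
    throw new Error(`Not an ICE candidate: ${line}`);
  }
  if (KNOWN_CANDIDATE_TYPES.indexOf(parts[7]) === -1) {
    throw new Error(`Unknown candidate type: ${parts[7]}`);
  }
  const result: ParsedCandidate = {
    foundation: parts[0],
    component: Number(parts[1]),
    protocol: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7] as CandidateType,
  };
  // Optional name/value extensions follow the type, e.g. "raddr ... rport ...".
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === "raddr") result.relatedAddress = parts[i + 1];
    if (parts[i] === "rport") result.relatedPort = Number(parts[i + 1]);
  }
  return result;
}

const sampleCandidate =
  "candidate:842163049 1 udp 1694498815 203.0.113.7 46154 typ srflx " +
  "raddr 192.168.1.5 rport 50000 generation 0";

const parsedCandidate = parseCandidate(sampleCandidate);
console.log(parsedCandidate.type, parsedCandidate.address, parsedCandidate.port);
// srflx 203.0.113.7 46154
```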

NAT

Today, most machines aren’t connected directly to the public internet; their traffic most likely goes through a NAT (Network Address Translation) layer. Your machine’s private IP/port is literally translated to a different public IP/port when the traffic passes through the router.

Illustration: a peer (the destination IP/port) tries to establish a connection with us by sending a request to one of our router’s public IPs/ports, which is then translated to our machine’s private IP/port.
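A router does this with a translation table. The sketch below is a deliberately simplified, hypothetical model (NatTable, a single public IP, public ports handed out sequentially from an arbitrary starting point), not how any particular router works. Its mapping depends only on the private endpoint, and it lets in anyone who targets a mapped public port, which is roughly how the full cone NAT mentioned later behaves:

```typescript
interface Endpoint {
  ip: string;
  port: number;
}

const endpointKey = (e: Endpoint): string => `${e.ip}:${e.port}`;

// Hypothetical, simplified NAT: one public IP, sequential public ports.
class NatTable {
  private readonly publicIp: string;
  private nextPort: number;
  private privateToPublic = new Map<string, Endpoint>();
  private publicToPrivate = new Map<string, Endpoint>();

  constructor(publicIp: string, firstPort = 40000) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
  }

  // Outbound traffic: reuse the existing mapping, or allocate a new public port.
  translateOutbound(source: Endpoint): Endpoint {
    const existing = this.privateToPublic.get(endpointKey(source));
    if (existing) return existing;
    const mapped: Endpoint = { ip: this.publicIp, port: this.nextPort++ };
    this.privateToPublic.set(endpointKey(source), mapped);
    this.publicToPrivate.set(endpointKey(mapped), source);
    return mapped;
  }

  // Inbound traffic: map a public endpoint back to the private one.
  // undefined means there is no mapping, so the router drops the packet.
  translateInbound(destination: Endpoint): Endpoint | undefined {
    return this.publicToPrivate.get(endpointKey(destination));
  }
}

const homeRouter = new NatTable("203.0.113.7");
const laptop: Endpoint = { ip: "192.168.1.5", port: 50000 };
const publicSide = homeRouter.translateOutbound(laptop);
console.log(publicSide); // { ip: '203.0.113.7', port: 40000 }
console.log(homeRouter.translateInbound(publicSide)); // { ip: '192.168.1.5', port: 50000 }
```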

STUN

If our machine sits behind a NAT, we need to know our public IP/port to create ICE candidates that other peers can actually reach. For that, WebRTC lets us specify a STUN server URL (Session Traversal Utilities for NAT) when initializing a WebRTC connection.
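In the browser, the STUN server goes into the iceServers list of the configuration passed to the RTCPeerConnection constructor. To keep the sketch runnable outside a browser, it only builds that configuration object, using locally declared types that mirror the fields used here from the standard RTCIceServer dictionary. The Google STUN URL is an assumption (a commonly used public server), and the TURN entry, with its URL and credentials, is a hypothetical placeholder; TURN servers, covered later, go into the same list:

```typescript
// Local mirror of the RTCIceServer fields this sketch uses.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConnectionConfig {
  iceServers: IceServerEntry[];
}

// Hypothetical helper: always includes a STUN server, optionally a TURN server.
function buildPeerConnectionConfig(turn?: IceServerEntry): PeerConnectionConfig {
  const iceServers: IceServerEntry[] = [
    // Assumption: a commonly used public STUN server.
    { urls: "stun:stun.l.google.com:19302" },
  ];
  if (turn) iceServers.push(turn);
  return { iceServers };
}

const stunOnly = buildPeerConnectionConfig();
// In a browser: const pc = new RTCPeerConnection(stunOnly);

const withTurn = buildPeerConnectionConfig({
  urls: "turn:turn.example.com:3478", // hypothetical TURN server
  username: "demo-user", // hypothetical credentials
  credential: "demo-secret",
});
console.log(withTurn.iceServers.length); // 2
```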

STUN is a standardized set of methods, including a network protocol, for traversal of network address translator (NAT) gateways in applications of real-time voice, video, messaging, and other interactive communications. — Wikipedia

Practically speaking, all it really does is return the public IP/port. So this is what happens when we try to establish a connection between 2 peers:

  • Peer A will get information about its public IP/port using the STUN server.
  • Peer A will send that information to peer B using the signaling server.
  • Peer B will get that information and will try to establish a connection with peer A.
  • Same goes the other way around.
Source: MDN
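Under the hood, a STUN exchange is a small binary protocol (RFC 5389): the client sends a Binding Request, and the server answers with the address it saw, encoded in an XOR-MAPPED-ADDRESS attribute. The sketch below builds the 20-byte request header and encodes/decodes an IPv4 XOR-MAPPED-ADDRESS value; there is no networking, the transaction ID is fixed for reproducibility, and all helper names are hypothetical:

```typescript
// STUN constants from RFC 5389.
const STUN_MAGIC_COOKIE = 0x2112a442;
const BINDING_REQUEST = 0x0001;

// 20-byte Binding Request header with no attributes:
// message type, message length (0), magic cookie, 12-byte transaction ID.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("The transaction ID must be 12 bytes");
  }
  const message = new Uint8Array(20);
  const view = new DataView(message.buffer);
  view.setUint16(0, BINDING_REQUEST); // DataView is big-endian (network order) by default
  view.setUint16(2, 0);
  view.setUint32(4, STUN_MAGIC_COOKIE);
  message.set(transactionId, 8);
  return message;
}

function ipv4ToNumber(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function numberToIpv4(value: number): string {
  return [24, 16, 8, 0].map((shift) => (value >>> shift) & 0xff).join(".");
}

// Server side (demo only): XOR the observed address into the attribute value.
function encodeXorMappedIpv4(ip: string, port: number): Uint8Array {
  const value = new Uint8Array(8);
  const view = new DataView(value.buffer);
  view.setUint8(0, 0x00); // reserved
  view.setUint8(1, 0x01); // address family: IPv4
  view.setUint16(2, port ^ (STUN_MAGIC_COOKIE >>> 16));
  view.setUint32(4, (ipv4ToNumber(ip) ^ STUN_MAGIC_COOKIE) >>> 0);
  return value;
}

// Client side: undo the XOR to recover our public IP/port.
function decodeXorMappedIpv4(value: Uint8Array): { ip: string; port: number } {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== 0x01) {
    throw new Error("Only IPv4 is handled in this sketch");
  }
  const port = view.getUint16(2) ^ (STUN_MAGIC_COOKIE >>> 16);
  const ip = numberToIpv4((view.getUint32(4) ^ STUN_MAGIC_COOKIE) >>> 0);
  return { ip, port };
}

// Fixed transaction ID for reproducibility; real clients use random bytes.
const bindingRequest = buildBindingRequest(new Uint8Array(12).fill(7));
const mappedAttr = encodeXorMappedIpv4("203.0.113.7", 46154);
console.log(bindingRequest.length, decodeXorMappedIpv4(mappedAttr));
// 20 { ip: '203.0.113.7', port: 46154 }
```

The XOR step exists so that middleboxes which rewrite IP addresses inside packet payloads do not tamper with the address the server reports.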

Hole punching

The other two restricted cone NATs (restricted cone and port-restricted cone) are similar to a full cone NAT in the way they connect with peers, but they impose one limitation. They still need to be aware of their public IPs/ports, and they only accept an incoming request if the sender’s IP (or IP/port, for the port-restricted variant) already exists in the NAT table. Unlike a full cone NAT, where the router basically trusts everyone, a restricted cone NAT only trusts peers it has already tried to initiate a connection with. The workaround, known as hole punching, goes like this:

  • Peer A will get information about its public IP/port using the STUN server.
  • Peer A will send that information to peer B using the signaling server.
  • Peer B will get that information and will try to establish a connection with peer A.
  • Peer B will fail to establish a connection, because peer A has never contacted peer B, but peer B will now store peer A’s public information in its NAT table.
  • Peer A will try to establish a connection with peer B. Since peer A already exists in peer B’s NAT table, the connection is accepted.
  • Peer A will store peer B’s public information in its own NAT table.
  • Peer B can now establish a connection with peer A.
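The steps above can be replayed against a toy model. RestrictedConeNat and deliver are hypothetical names, the model tracks remote IPs only (the address-restricted variant; a port-restricted NAT would also compare ports), and the IPs come from documentation-only ranges:

```typescript
// Hypothetical toy model of an (address-)restricted cone NAT:
// inbound packets are accepted only from IPs this NAT has already sent to.
class RestrictedConeNat {
  readonly publicIp: string;
  private contacted = new Set<string>();

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // Outbound packet: remember the destination, which "punches the hole".
  sendTo(remoteIp: string): void {
    this.contacted.add(remoteIp);
  }

  // Inbound packet: allowed only if we contacted that IP before.
  acceptsFrom(remoteIp: string): boolean {
    return this.contacted.has(remoteIp);
  }
}

// Hypothetical helper: one packet from one NAT to another.
// Returns whether the receiving NAT lets it through.
function deliver(from: RestrictedConeNat, to: RestrictedConeNat): boolean {
  from.sendTo(to.publicIp);
  return to.acceptsFrom(from.publicIp);
}

const natA = new RestrictedConeNat("198.51.100.10");
const natB = new RestrictedConeNat("203.0.113.20");

const firstTry = deliver(natB, natA); // B -> A: dropped, but B's NAT now trusts A
const secondTry = deliver(natA, natB); // A -> B: accepted, and A's NAT now trusts B
const thirdTry = deliver(natB, natA); // B -> A: accepted
console.log(firstTry, secondTry, thirdTry); // false true true
```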

TURN

When we’re dealing with a symmetric NAT, we can pretty much throw p2p and direct browser communication in the trash. Let’s observe the connection process first:

  • Peer A will get information about its public IP/port using the STUN server.
  • Peer A will send that information to peer B using the signaling server.
  • Peer B will get that information and will try to establish a connection with peer A.
  • Peer B will fail to establish a connection, but it will store peer A’s public information in its NAT table.
  • Peer A will try to establish a connection with peer B. However, peer B will reject peer A: a symmetric NAT uses a different public IP/port for every destination, so the address peer A’s packets actually come from differs from the one peer A learned via STUN, which is what peer B stored in its NAT table.
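A toy model makes that mismatch concrete. SymmetricNat is a hypothetical class that allocates public ports sequentially, one per (source, destination) pair; the addresses are arbitrary, and 3478 is simply the standard STUN port:

```typescript
interface Addr {
  ip: string;
  port: number;
}

// Hypothetical toy model of a symmetric NAT: every (private source,
// destination) pair gets its own public port, allocated sequentially.
class SymmetricNat {
  readonly publicIp: string;
  private nextPort: number;
  private mappings = new Map<string, number>();

  constructor(publicIp: string, firstPort: number) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
  }

  mapOutbound(source: Addr, destination: Addr): Addr {
    const mappingKey =
      `${source.ip}:${source.port}->${destination.ip}:${destination.port}`;
    let port = this.mappings.get(mappingKey);
    if (port === undefined) {
      port = this.nextPort++;
      this.mappings.set(mappingKey, port);
    }
    return { ip: this.publicIp, port };
  }
}

const natOfA = new SymmetricNat("198.51.100.10", 50000);
const peerAPrivate: Addr = { ip: "10.0.0.2", port: 5000 };
const stunServer: Addr = { ip: "192.0.2.1", port: 3478 };
const peerBPublic: Addr = { ip: "203.0.113.20", port: 40000 };

// What the STUN server sees (and reports back to peer A).
const seenByStun = natOfA.mapOutbound(peerAPrivate, stunServer);
// What peer B actually sees when peer A sends to it.
const seenByPeerB = natOfA.mapOutbound(peerAPrivate, peerBPublic);

console.log(seenByStun.port, seenByPeerB.port); // 50000 50001
```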

Traversal Using Relays around NAT (TURN) is a protocol that assists in traversal of network address translators (NAT) or firewalls for multimedia applications. — Wikipedia

TURN literally takes the long way around to avoid direct communication. It relays all traffic through a server: the peer gets a relayed IP/port on the TURN server that stays constant no matter who it talks to, so a connection can be established. Not only does this make the connection slower and less efficient, it also makes it a lot more expensive, since the server has to carry all of the media. This is why relay (TURN) ICE candidates always get the lowest priority.

Source: MDN
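The ICE specification (RFC 8445) makes this ordering concrete: a candidate’s priority combines a type preference, a local preference and the component ID, and the recommended type preferences are 126 for host, 110 for peer-reflexive, 100 for server-reflexive and 0 for relayed candidates. A sketch of that formula; candidatePriority is a hypothetical name, and the defaults (local preference 65535 for a single interface, component 1) are assumptions:

```typescript
type IceCandidateKind = "host" | "prflx" | "srflx" | "relay";

// Recommended type preferences from RFC 8445.
const TYPE_PREFERENCE: Record<IceCandidateKind, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
// Assumption: localPreference 65535 (single interface), componentId 1.
function candidatePriority(
  kind: IceCandidateKind,
  localPreference = 65535,
  componentId = 1
): number {
  return (
    TYPE_PREFERENCE[kind] * 2 ** 24 +
    localPreference * 2 ** 8 +
    (256 - componentId)
  );
}

const candidateKinds: IceCandidateKind[] = ["relay", "srflx", "prflx", "host"];
const ranked = candidateKinds
  .map((kind) => ({ kind, priority: candidatePriority(kind) }))
  .sort((a, b) => b.priority - a.priority);

console.log(ranked.map((c) => `${c.kind}=${c.priority}`).join(", "));
// host=2130706431, prflx=1862270975, srflx=1694498815, relay=16777215
```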
  • Data Channels: how to send JSON data between peers using WebRTC.
  • Reducing latency and costs of TURN transactions with multi-regional deployment.

Eytan is a JavaScript artist who comes from the land of the Promise(). His hobbies are eating, sleeping; and open-source… He loves open-source.