The network protocols behind server push, online gaming, and email


We have covered network protocols at length before. Now we will look at the key protocols and the roles they play in different applications, focusing on how they shape the way we communicate and interact on the Internet. We will cover the following areas:

WebSockets

In the previous discussion, we looked at HTTP and its role in typical request-response interactions between clients and servers. HTTP performs well in most cases, especially when the response is immediate. However, it is less efficient when the server needs to push updates to the client, especially when those updates depend on events the client cannot predict (such as actions by other users). This is because HTTP is fundamentally a pull-based protocol: the client must initiate every request. So how can the server push data to the client without the client having to anticipate and request every update? There are generally four ways to handle this kind of push communication, as shown in the figure below.

1. Short polling

This is the most basic method. In this approach, the client, usually a web application running in our browser, continuously sends HTTP requests to the server. Imagine this scenario: We log into a web application and are asked to scan a QR code with our smartphone. This QR code is usually used for a specific action, such as authentication or starting a process. The web application doesn't know when we will scan the QR code. Therefore, it will send a request to the server every 1-2 seconds to check the status of the QR code. Once we scan the QR code with our smartphone, the server recognizes the scan and then sends back the updated status on the next check request to the web application. In this way, we will get a response within the next 1-2 seconds after scanning the QR code. Because of this frequent checking, we call this method "short polling".

There are two problems with this approach:

  • It sends lots of HTTP requests, hogging bandwidth and increasing server load.
  • In the worst case, we may have to wait up to 2 seconds for a response, causing a noticeable delay.
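The polling loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a real client: `check_status` stands in for one HTTP request to a hypothetical status endpoint, and `make_fake_server` is an invented helper that simulates a server reporting the scan on the Nth request.

```python
import time
from typing import Callable, Tuple


def short_poll(check_status: Callable[[], str],
               interval_s: float = 1.0,
               max_attempts: int = 30) -> Tuple[str, int]:
    """Call check_status() repeatedly until it stops returning "pending".

    Returns the final status and the number of requests it took.
    """
    for attempt in range(1, max_attempts + 1):
        status = check_status()      # one full request/response round trip
        if status != "pending":
            return status, attempt
        time.sleep(interval_s)       # wait before asking again
    return "timeout", max_attempts


def make_fake_server(scan_on_request: int) -> Callable[[], str]:
    """Simulated server: reports "scanned" starting with the Nth request."""
    calls = 0

    def check_status() -> str:
        nonlocal calls
        calls += 1
        return "scanned" if calls >= scan_on_request else "pending"

    return check_status
```

With `make_fake_server(4)`, the client needs four requests before it learns about the scan. The three requests before it carried no new information, which is exactly the overhead the first bullet describes.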

2. Long polling

Long polling addresses the problems of short polling by giving the HTTP request a longer timeout and having the server hold the request open until it has something to report. Suppose we set the timeout to 30 seconds: if we scan the QR code within that window, the server responds immediately; if not, the request times out and the client sends a new one. This approach significantly reduces the number of HTTP requests.

However, long polling is not without challenges. Although it reduces the number of requests, each pending request still holds a connection open on the server. With many clients, this can put a strain on server resources.
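On the server side, holding a request open can be sketched with a blocking wait. The example below is an assumption-laden illustration: `QrSession`, `mark_scanned`, and `handle_long_poll` are hypothetical names, and a real server would tie this to its HTTP framework rather than call the method directly.

```python
import threading


class QrSession:
    """Hypothetical server-side state for one QR code."""

    def __init__(self) -> None:
        self._scanned = threading.Event()

    def mark_scanned(self) -> None:
        # Called when the phone reports the scan.
        self._scanned.set()

    def handle_long_poll(self, timeout_s: float = 30.0) -> str:
        # Block the request until the scan happens or the timeout expires.
        # Event.wait returns True if the event was set, False on timeout.
        if self._scanned.wait(timeout=timeout_s):
            return "scanned"
        # On timeout the client is expected to send a new long-poll request.
        return "pending"
```

Note that each waiting request occupies a thread (or at least a connection) for up to `timeout_s` seconds, which is the resource cost mentioned above.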

3. WebSocket

Short polling and long polling are suitable for simple tasks, such as scanning QR codes. But tasks that are complex, data-intensive, and require real-time interaction, such as online games, need a more efficient solution: WebSocket.

TCP inherently allows bidirectional data flow, so the client and server can send data to each other simultaneously. However, HTTP/1.1 over TCP does not take full advantage of this capability. In HTTP/1.1, data transfer is usually sequential: one party sends a request, and the other party responds. This design is sufficient for ordinary web interaction, but not for applications such as online games that require real-time interaction. WebSocket is another TCP-based protocol that fills this gap by allowing full-duplex communication over a single connection. We'll go into more detail later.

4. SSE (Server-Sent Events)

SSE, or Server-Sent Events, is suitable for specific use cases. When a client establishes an SSE connection, the server keeps the connection open and continuously sends updates over it. This setup is ideal when the server needs to push data to the client regularly and the client only needs to receive data, not send information back. A classic example is real-time stock market data. With SSE, the server can push data to the client whenever there is an update, without the client having to send a request each time. Note that unlike WebSocket, SSE does not support two-way communication, so it is not suitable for use cases that require two-way interaction.
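To make the one-way stream more concrete, here is a minimal sketch of how a server could format messages for an SSE response (served with the `text/event-stream` content type). The function name `format_sse` and the sample values are assumptions for illustration; the field names `id`, `event`, and `data` and the blank-line terminator come from the event stream format.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as one "data:" line per line of text.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"
```

The server writes one such block to the open connection for every update; the client never sends anything back on that stream.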

How to establish a WebSocket connection

To establish a WebSocket connection, the client includes specific fields in the HTTP request header that ask the server to switch to the WebSocket protocol. Along with them, it sends a randomly generated, Base64-encoded key (Sec-WebSocket-Key) to the server.

Request header:

Connection: Upgrade 
Upgrade: WebSocket
Sec-WebSocket-Key: T2a6wZlAwhgQNqruZ2YUyg==

Server response headers:

HTTP/1.1 101 Switching Protocols
Sec-WebSocket-Accept: iBJKv/ALIW2DobfoA4dmr3JHBCY=
Upgrade: WebSocket
Connection: Upgrade

Status code 101 indicates that the protocol is switching. After this additional handshake, the WebSocket connection is established, as shown in the figure below:

[Figure: WebSocket handshake]
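The server derives Sec-WebSocket-Accept from the client's key: it appends a fixed GUID defined in RFC 6455, takes the SHA-1 hash, and Base64-encodes the result. The sketch below shows that calculation; the function name `compute_accept` is our own.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from Sec-WebSocket-Key."""
    digest = hashlib.sha1(
        (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    ).digest()
    return base64.b64encode(digest).decode("ascii")
```

For the sample key used in RFC 6455, `compute_accept("dGhlIHNhbXBsZSBub25jZQ==")` returns `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. The client performs the same calculation and checks it against the server's response, which confirms the server actually understood the WebSocket upgrade.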

WebSocket message

Once HTTP is upgraded to WebSocket, the client and server will exchange data in frames. Let’s take a look at what the data looks like:

The opcode is a 4-bit field indicating the type of frame.

  • "1" represents a text frame.
  • "2" represents a binary frame.
  • "8" indicates the signal to close the connection.

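The payload length starts as a 7-bit field. Values from 0 to 125 are the payload length itself; 126 means the next 16 bits hold the length, and 127 means the next 64 bits hold it. With the 64-bit extended field, a single frame can in principle describe a payload of up to 2^63 - 1 bytes, far more than any practical message.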
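The frame layout can be illustrated with a small parser. This is a simplified sketch, assuming the whole frame is already in memory; it ignores the reserved bits, fragmentation, and control-frame rules, and the names `Frame` and `parse_frame` are ours.

```python
import struct
from typing import NamedTuple


class Frame(NamedTuple):
    fin: bool        # True if this is the final fragment of a message
    opcode: int      # 1 = text, 2 = binary, 8 = close
    payload: bytes


def parse_frame(buf: bytes) -> Frame:
    """Parse a single, complete WebSocket frame from buf."""
    first, second = buf[0], buf[1]
    fin = bool(first & 0x80)
    opcode = first & 0x0F            # low 4 bits of the first byte
    masked = bool(second & 0x80)     # client-to-server frames are masked
    length = second & 0x7F           # 7-bit payload length
    offset = 2
    if length == 126:                # 16-bit extended length follows
        (length,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    elif length == 127:              # 64-bit extended length follows
        (length,) = struct.unpack_from("!Q", buf, offset)
        offset += 8
    key = b""
    if masked:
        key = buf[offset:offset + 4]  # 4-byte masking key
        offset += 4
    payload = buf[offset:offset + length]
    if masked:
        # Unmask by XOR-ing each byte with the key, cycling every 4 bytes.
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, opcode, payload)
```

For example, the seven bytes `81 05 48 65 6c 6c 6f` parse as a final text frame (opcode 1) carrying `Hello`.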

WebSocket is suitable for scenarios that require frequent interaction between the client and the server, such as online games, chat rooms, and collaborative editing applications.

RPC

RPC (Remote Procedure Call) allows a program to execute a function on a different service. From the point of view of the calling program, it appears to be executing the function locally. The following diagram shows the difference between a local procedure call and a remote procedure call. We can deploy modules such as order management and payment in the same process or on different servers. When they are deployed in the same process, a call between them is a local function call. When they are deployed on different servers, it is a remote procedure call.

Why do we need RPC? Can't we use HTTP to communicate between services? Let us compare RPC and HTTP in the table below.

The main advantage of RPC over plain HTTP is its lightweight message format and superior performance. gRPC is one example: it runs on HTTP/2 and uses binary encoding, which gives it better performance.

Let's walk through how a gRPC call works step by step:

  • Step 1: The client initiates a REST call. The request body is usually in JSON format.
  • Steps 2 to 4: After receiving the REST call, the order service (acting as a gRPC client) converts the request into the appropriate format and initiates an RPC call to the payment service. The client stub encodes the request into a binary format and passes it to the underlying transport layer.
  • Step 5: gRPC sends the packet over the network via HTTP/2. Thanks to binary encoding and network optimizations, gRPC can be up to five times faster than JSON.
  • Steps 6 to 8: After the payment service (acting as a gRPC server) receives the packet, it decodes it and calls the server application.
  • Steps 9 to 11: The result returned by the server application is encoded and passed back to the transport layer.
  • Steps 12 to 14: After the order service receives the packet, it decodes it and sends the result to the client application.
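The flow above can be mimicked in a single process to show where encoding and decoding happen. This sketch does not use the gRPC library or Protocol Buffers: the `struct` formats stand in for the binary encoding, a direct function call stands in for the HTTP/2 transport, and every name (`charge`, `client_stub`, `server_stub`, `handle_rest_call`) is hypothetical.

```python
import json
import struct

# Hypothetical binary layouts standing in for Protocol Buffers messages.
REQUEST_FORMAT = "!IQ"    # order_id (uint32), amount_cents (uint64)
RESPONSE_FORMAT = "!I?"   # order_id (uint32), success flag (bool)


def charge(order_id: int, amount_cents: int) -> bool:
    """Server application in the payment service (made-up rule)."""
    return amount_cents > 0


def server_stub(packet: bytes) -> bytes:
    """Steps 6 to 11: decode the request, call the application, encode the result."""
    order_id, amount_cents = struct.unpack(REQUEST_FORMAT, packet)
    ok = charge(order_id, amount_cents)
    return struct.pack(RESPONSE_FORMAT, order_id, ok)


def client_stub(order_id: int, amount_cents: int, transport=server_stub) -> bool:
    """Steps 2 to 5 and 12 to 14: encode, send, then decode the reply."""
    packet = struct.pack(REQUEST_FORMAT, order_id, amount_cents)
    reply = transport(packet)   # a real system would send this over HTTP/2
    _, ok = struct.unpack(RESPONSE_FORMAT, reply)
    return ok


def handle_rest_call(body: str) -> str:
    """Step 1 and the final reply: JSON in, JSON out at the order service."""
    req = json.loads(body)
    ok = client_stub(req["order_id"], req["amount_cents"])
    return json.dumps({"order_id": req["order_id"], "paid": ok})
```

From the order service's point of view, `client_stub(42, 1999)` looks like a local function call, even though the work is done by the payment service. The binary request here is 12 bytes, which hints at why a compact encoding is lighter on the wire than the equivalent JSON text.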