A brief introduction to the WebSocket protocol - RFC 6455

2023.12.30




Labs Guide

Before the advent of WebSocket, the common ways for the client and server of a web application (instant chat, multi-person collaboration) to exchange data in both directions were short polling, long polling, and SSE (Server-Sent Events). These methods have many problems in terms of efficiency and network bandwidth utilization. The WebSocket protocol came into being to provide simple bidirectional data transmission.

Part 01, Introduction

WebSocket is a network communication protocol that provides full-duplex communication over a single TCP connection. It was born in 2009, and in 2011 the IETF (Internet Engineering Task Force) standardized it and published RFC 6455 as an Internet Standards Track document. RFC 7936, released in 2016, supplements it. The WebSocket API is also standardized by the W3C.


The WebSocket protocol was originally designed to replace bidirectional communication techniques built on HTTP; as RFC 6202 notes, HTTP was not originally intended for bidirectional data communication. WebSocket does not abandon HTTP entirely: it achieves bidirectional communication within the existing environment by building on basic HTTP infrastructure. As RFC 6455 mentions, WebSocket's design philosophy is a framework with minimal constraints; the only constraints are that the protocol is frame-based rather than stream-based, and that it supports both Unicode text frames and binary frames.

Part 02, Handshake

The WebSocket protocol is divided into three parts: the opening handshake, data transfer, and the closing handshake, as shown in the following figure.

Image

2.1 Opening Handshake - Client

To stay compatible with HTTP server-side applications and proxies, the client's opening handshake (including connections made through a proxy or a TLS-encrypted tunnel) is a valid HTTP Upgrade request as defined in RFC 2616. The request header fields are shown in the following figure. Once the client has sent its opening handshake, it must wait for the server's response before sending any further data.

Image

- Request URI

Format: ws-URI = "ws:" "//" host [ ":" port ] path [ "?" query ] or wss-URI = "wss:" "//" host [ ":" port ] path [ "?" query ]. An invalid URI causes the connection to fail

- Request line

The method must be GET and the HTTP version must be at least 1.1

- Upgrade

The value must be "websocket", compared as an ASCII case-insensitive string

- Connection

The value must contain "Upgrade", compared as an ASCII case-insensitive string

- Sec-WebSocket-Key

A base64-encoded string of 16 bytes randomly generated by the client

- Origin

The origin of the page making the request; required for browser clients, optional for non-browser clients

- Sec-WebSocket-Protocol

One or more comma-separated subprotocols supported by the client, in order of priority

- Sec-WebSocket-Version

The protocol version the client intends to use, which must be 13. The values 9, 10, 11, and 12, which correspond to earlier drafts, are not valid

- Sec-WebSocket-Extensions

The protocol extensions the client intends to use. The HyBi Working Group has worked on multiplexing and compression extensions: the multiplexing extension lets connections share one underlying TCP connection, and the compression extension adds compression to the WebSocket protocol, for example x-webkit-deflate-frame
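The header fields above can be assembled into a request along the following lines. This is a minimal sketch rather than a complete client: the function name `build_client_handshake`, the host `server.example.com`, the path `/chat`, and the subprotocol names are illustrative assumptions, and the sketch only builds the request text without opening a socket.

```python
import base64
import os
from typing import Optional, Sequence, Tuple


def build_client_handshake(host: str, path: str = "/",
                           origin: Optional[str] = None,
                           subprotocols: Sequence[str] = ()) -> Tuple[str, str]:
    """Return (request_text, sec_websocket_key) for an opening handshake.

    Hypothetical helper for illustration; it is not part of any library.
    """
    # Sec-WebSocket-Key: 16 random bytes, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",        # GET method, HTTP version 1.1
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",   # the only valid version value
    ]
    if origin is not None:             # required for browser clients only
        lines.append(f"Origin: {origin}")
    if subprotocols:                   # listed in order of preference
        lines.append("Sec-WebSocket-Protocol: " + ", ".join(subprotocols))
    # An empty line terminates the HTTP header section.
    return "\r\n".join(lines) + "\r\n\r\n", key


request, key = build_client_handshake(
    "server.example.com", "/chat",
    origin="http://example.com",
    subprotocols=("chat", "superchat"),
)
print(request)
```

The key is returned alongside the request because the client needs it later to check the server's Sec-WebSocket-Accept value.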

2.2 Opening Handshake - Server

When a client asks to establish a WebSocket connection, the server must reply to the client's opening handshake, as shown in the following figure.

Image

- Status line

HTTP/1.1 101 Switching Protocols, which tells the client that the connection is accepted. If the server wants to stop processing the client handshake, it can return an HTTP response with an error status code such as 401

- Upgrade

The value must be "websocket"

- Connection

The value must contain "Upgrade"

- Sec-WebSocket-Accept

This value is generated if the server accepts the client connection. Take the Sec-WebSocket-Key value from the client request header, concatenate it with the globally unique identifier 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 (a GUID in the format defined by RFC 4122), compute the SHA-1 hash of the result, and base64-encode the hash to obtain the value

- Sec-WebSocket-Protocol

The subprotocol the server will use, selected from the Sec-WebSocket-Protocol values sent by the client. If the server supports none of them, it does not select a value and omits this header

- Sec-WebSocket-Extensions

The protocol extensions the server intends to use
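The Sec-WebSocket-Accept derivation described above can be sketched in a few lines. The function name `compute_accept` is an assumption made for this example; the sample key and the expected result are the ones given as an example in RFC 6455.

```python
import base64
import hashlib

# Fixed GUID that is appended to the client's key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    # Concatenate key and GUID, SHA-1 hash the result, then base64-encode.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the header it receives, which confirms that the peer actually understood the WebSocket handshake.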

2.3 Closing Handshake

Either the client or the server can start the closing handshake by sending a control frame that contains the specified control sequence (a Close control frame). When a party receives a Close control frame, it simply sends a Close frame in response and then closes the connection. The TCP closing handshake (FIN/ACK) is not always reliable end to end, especially in the presence of intercepting proxies, so the closing handshake described above is intended to complement it.

Part 03, Data Transfer

Once the opening handshake between the client and the server succeeds, the two parties can start transferring data. This is a two-way communication channel: following the concept of a "message" in RFC 6455, each party can send data independently and at will. A message consists of one or more frames (and does not necessarily correspond to a message at the network layer). The WebSocket frame format is shown in the following figure.

Image

3.1 Frame Structure

- FIN

1 bit, indicates whether this is the final fragment of a message.

- RSV1, RSV2, RSV3

1 bit each; the value is 0 when no extension that uses them is in effect.

- Opcode

4 bits, which define how the "Payload data" is interpreted.

  • 0 (decimal): Continuation frame
  • 1: Text frame
  • 2: Binary frame
  • 3-7: Reserved for further non-control frames
  • 8: Connection close frame
  • 9: Ping frame (heartbeat)
  • 10: Pong frame (heartbeat)
  • 11-15: Reserved for further control frames

- MASK

1 bit, indicates whether the "Payload data" is masked: 1 yes, 0 no.

- Payload length

7 bits, 7 + 16 bits, or 7 + 64 bits; indicates the length of the Payload data. If the 7-bit value is 0 to 125, it is the payload length itself. If it equals 126, the payload length is given by the 16 bits that follow. If it equals 127, the payload length is given by the 64 bits that follow.

- Masking-key

32 bits, which hold the masking key sent by the client. To prevent proxy cache poisoning attacks, RFC 6455 requires that the masking key come from a strong source of entropy and be unpredictable. The masking algorithm walks through the payload data one byte at a time: for the i-th byte of the payload data, compute j = i mod 4; the masked value of the i-th byte is the bitwise XOR of the original i-th byte and the j-th byte of the Masking-key.

- Payload data

Payload data consists of two parts: extension data, whose use is negotiated during the opening handshake, and application data, which follows the extension data.
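The frame fields above, including the three payload length encodings and the masking rule, can be sketched as a small encoder and decoder. This is an illustrative sketch, not a full implementation: the names `apply_mask`, `encode_frame`, and `decode_frame` are assumptions, no extension is negotiated (so RSV1-3 stay 0 and there is no extension data), and `decode_frame` expects exactly one complete frame in its input.

```python
import os
import struct


def apply_mask(data: bytes, masking_key: bytes) -> bytes:
    """XOR byte i of data with byte (i mod 4) of the 4-byte masking key."""
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(data))


def encode_frame(opcode: int, payload: bytes,
                 fin: bool = True, mask: bool = True) -> bytes:
    """Serialize one frame (hypothetical helper; RSV bits are left at 0)."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)   # FIN + opcode
    mask_bit = 0x80 if mask else 0x00
    n = len(payload)
    if n <= 125:                     # length fits in the 7-bit field
        header = struct.pack("!BB", first, mask_bit | n)
    elif n <= 0xFFFF:                # 126 marker + 16-bit length
        header = struct.pack("!BBH", first, mask_bit | 126, n)
    else:                            # 127 marker + 64-bit length
        header = struct.pack("!BBQ", first, mask_bit | 127, n)
    if not mask:
        return header + payload
    masking_key = os.urandom(4)      # unpredictable 32-bit key
    return header + masking_key + apply_mask(payload, masking_key)


def decode_frame(frame: bytes):
    """Parse one complete frame and return (fin, opcode, payload)."""
    first, second = frame[0], frame[1]
    fin, opcode = bool(first & 0x80), first & 0x0F
    masked, n = bool(second & 0x80), second & 0x7F
    pos = 2
    if n == 126:
        (n,) = struct.unpack_from("!H", frame, pos)
        pos += 2
    elif n == 127:
        (n,) = struct.unpack_from("!Q", frame, pos)
        pos += 8
    masking_key = b""
    if masked:
        masking_key = frame[pos:pos + 4]
        pos += 4
    payload = frame[pos:pos + n]
    if masked:                       # unmasking is the same XOR operation
        payload = apply_mask(payload, masking_key)
    return fin, opcode, payload


print(encode_frame(0x1, b"Hello", mask=False).hex())
# 810548656c6c6f
print(decode_frame(bytes.fromhex("818537fa213d7f9f4d5158")))
# (True, 1, b'Hello')
```

Because XOR is its own inverse, the same `apply_mask` function serves for both masking and unmasking.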

3.2 Control Frames

Control frames are identified by the Opcode value; the control frame opcodes currently defined by the protocol are 0x8 (Close), 0x9 (Ping), and 0xA (Pong). A control frame must have a payload length of 125 bytes or less. For the Close control frame, the first 2 bytes of the payload represent the status code and the remaining bytes represent the reason for closing, as shown in the following figure.

Image
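The Close frame payload layout described above can be sketched as follows. The helper names `build_close_payload` and `parse_close_payload` are assumptions made for this example; the sketch assumes the reason is UTF-8 text and uses status code 1000, which RFC 6455 defines as normal closure.

```python
import struct


def build_close_payload(code: int, reason: str = "") -> bytes:
    """Pack a 2-byte status code followed by the UTF-8 encoded reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:  # limit that applies to every control frame
        raise ValueError("control frame payload must not exceed 125 bytes")
    return payload


def parse_close_payload(payload: bytes):
    """Return (status_code, reason); status_code is None for an empty payload."""
    if not payload:
        return None, ""
    if len(payload) == 1 or len(payload) > 125:
        raise ValueError("invalid Close frame payload length")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


payload = build_close_payload(1000, "bye")
print(payload)                       # b'\x03\xe8bye'
print(parse_close_payload(payload))  # (1000, 'bye')
```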

3.3 Message Fragmentation

Message fragmentation means sending one "message" as multiple data frames. It allows a sender to transmit a message of unknown size without buffering the entire message. In addition, fragmentation combined with the multiplexing extension can split messages into smaller pieces so that they share the output channel.

In the protocol, the first frame of a fragmented message has FIN set to 0 and a non-zero opcode; each middle frame has FIN set to 0 and opcode 0; and the final frame has FIN set to 1 and opcode 0. The protocol requires that the fragments of a message be delivered to the other end in the order in which they were sent.
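The FIN and opcode rules for fragments can be illustrated with a short sketch that works on (fin, opcode, chunk) tuples instead of serialized frames. The names `fragment` and `reassemble` and the chunk size are assumptions made for this example, and the sketch ignores control frames, which a real receiver may see interleaved between fragments.

```python
from typing import Iterable, Iterator, Tuple

Frame = Tuple[bool, int, bytes]  # (fin, opcode, payload chunk)


def fragment(opcode: int, payload: bytes, chunk_size: int) -> Iterator[Frame]:
    """Split one message into frames that follow the fragmentation rules."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    for index, chunk in enumerate(chunks):
        fin = index == len(chunks) - 1           # only the last frame sets FIN
        frame_opcode = opcode if index == 0 else 0x0  # later frames: continuation
        yield fin, frame_opcode, chunk


def reassemble(frames: Iterable[Frame]) -> Tuple[int, bytes]:
    """Rebuild (opcode, payload) from frames received in order."""
    opcode = None
    parts = []
    for fin, frame_opcode, chunk in frames:
        if opcode is None:
            if frame_opcode == 0x0:
                raise ValueError("continuation frame without a start frame")
            opcode = frame_opcode                # first frame carries the type
        elif frame_opcode != 0x0:
            raise ValueError("expected a continuation frame")
        parts.append(chunk)
        if fin:
            return opcode, b"".join(parts)
    raise ValueError("message ended without a final frame")


frames = list(fragment(0x1, b"Hello, WebSocket", 6))
print(frames)
# [(False, 1, b'Hello,'), (False, 0, b' WebSo'), (True, 0, b'cket')]
print(reassemble(frames))
# (1, b'Hello, WebSocket')
```

An unfragmented message is simply the special case of a single frame with FIN set to 1 and a non-zero opcode.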

Part 04, Summary

WebSocket is designed on top of the TCP layer, and because it is frame-based, applications do not need to work out data lengths and message boundaries in the byte stream themselves. It can also be combined with HTTP/2 multiplexing through extensions to make fuller use of bandwidth. Developers only need to process message fragments in order in the server-side and client-side code.