One Minute to Understand TCP Sticky Packets and Unpacking

TCP is a stream-oriented protocol: it carries a continuous sequence of bytes with no message boundaries. During transmission, TCP splits or merges data according to network conditions, so if the application does not define an explicit boundary rule, sticky packets and unpacking show up at the application layer.

In network programming we often run into this situation: the client sends a series of messages, but the server receives them merged together or split apart, which makes the messages hard to interpret correctly.

For example, one day you really want milk tea. You browse the delivery app and the milk tea from "A Little" looks good (A Little, hurry up and pay me for the ad, doge), so you send a message to the group chat to find a few people to order with:

A Little milk tea, does anyone want some?

To your surprise, a colleague in the group replies:

Isn't it already three o'clock?

You are baffled, so you look at the colleague's phone. He received your message as two separate lines:

A Little

milk tea, does anyone want some?

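Haha, that was a bad joke. (The pun relies on the original Chinese, where the first fragment can also be read as "one o'clock".) In technical terms, this phenomenon is "unpacking", and the rest of the article looks at it in detail.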

The TCP sticky packet and unpacking phenomenon

Sticky packet and unpacking problems are usually noticed at the application layer, although splitting and merging can happen at the data link layer, the network layer, and the transport layer. Most everyday network application development works directly with the transport layer, so this article focuses on sticky packets and unpacking at the transport layer.

The transport layer has two protocols we are all familiar with: UDP and TCP. UDP preserves message boundaries, so it has no sticky packet or unpacking problem; the problem only occurs with TCP.

The following simple example explains what sticky packets and unpacking are.

Suppose the client sends two packets to the server in succession, packet1 and packet2. The server may then receive the data in one of four ways:

(1) In the first case, the server receives the two packets separately and in order; there is no sticky packet and no unpacking.

(2) In the second case, the server receives only one packet. Because TCP guarantees delivery, this packet contains the data of both packets sent by the client. This is a sticky packet. Unless the client's packets follow an explicit rule, the server cannot tell where one packet ends and the next begins, so the data is hard to process.

(3) In the third case, the server receives three packets because packet1 was split into two pieces, packet1.1 and packet1.2. This is unpacking; the reasons for it are discussed below. The server receives the fragments and again has difficulty processing them.

(4) In the fourth case, a large packet is split into smaller pieces and some of those pieces are glued to other packets, which combines the sticky packet and unpacking cases above.
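
The four cases can be reproduced without a network by treating the connection as one byte stream and varying where the receiver's reads happen to fall. This is only a minimal sketch, not real socket code: `deliver`, the payloads, and the read sizes are hypothetical and chosen purely for illustration.

```python
# Simulate how a receiver may see two packets that were sent back to back.
# `deliver` and the read sizes are illustrative assumptions; real TCP decides
# read boundaries from buffers and network conditions, not from a list.

def deliver(stream: bytes, read_sizes: list[int]) -> list[bytes]:
    """Cut one continuous byte stream into the chunks each read returns."""
    chunks, pos = [], 0
    for size in read_sizes:
        chunks.append(stream[pos:pos + size])
        pos += size
    if pos < len(stream):            # whatever is left arrives in a final read
        chunks.append(stream[pos:])
    return chunks

packet1, packet2 = b"HELLO", b"WORLD"
stream = packet1 + packet2           # TCP only sees these ten bytes

case1 = deliver(stream, [5, 5])      # (1) two reads, one packet each
case2 = deliver(stream, [10])        # (2) sticky packet: both in one read
case3 = deliver(stream, [2, 3, 5])   # (3) unpacking: packet1 arrives in two pieces
case4 = deliver(stream, [3, 7])      # (4) mixed: tail of packet1 glued to packet2

for name, chunks in [("case1", case1), ("case2", case2),
                     ("case3", case3), ("case4", case4)]:
    print(name, chunks)
```

In every case the bytes and their order are identical; only the chunk boundaries differ, which is why the receiver cannot rely on them.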

The cause of TCP sticky packets and unpacking

TCP is a stream-oriented protocol: it carries a continuous sequence of bytes with no message boundaries. As a transport-layer protocol, TCP does not understand the meaning of the upper-layer business data; it divides the data according to the actual state of the TCP buffer. What the business treats as one complete packet may therefore be split by TCP into several packets for sending, and several small packets may be combined into one large packet. This is what produces sticky packets and unpacking.

For example, suppose the TCP buffer is 1024 bytes. If the application's requests are small and do not fill the buffer, TCP may merge several requests into one transmission; from the business point of view, this is a "sticky packet".

If an application request is large and exceeds the buffer size, TCP splits it across several transmissions. This is "unpacking": one large packet is split into several packets for sending.
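
The buffer behaviour described above can be illustrated with a toy model that reuses the 1024-byte figure from the example. `toy_segments` is a hypothetical helper, and the model is a deliberate simplification: a real TCP stack weighs several other factors when deciding how much to send at once.

```python
# Toy model (an assumption for illustration, not how a real stack works):
# pending application writes are queued in one send buffer and transmitted
# in chunks of at most BUFFER_SIZE bytes.
BUFFER_SIZE = 1024  # the buffer size used in the example above

def toy_segments(writes: list[bytes], buffer_size: int = BUFFER_SIZE) -> list[bytes]:
    pending = b"".join(writes)       # message boundaries are lost here
    return [pending[i:i + buffer_size]
            for i in range(0, len(pending), buffer_size)]

small = toy_segments([b"a" * 100, b"b" * 100])   # two small requests
large = toy_segments([b"c" * 2500])              # one large request

print([len(s) for s in small])   # both small requests leave in a single send
print([len(s) for s in large])   # the large request is spread over three sends
```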

Solutions to TCP sticky packets and unpacking

TCP is stream-oriented, so sticky packets and unpacking will occur. How, then, can an application split or reassemble meaningful messages from this continuous stream of data? There are a few common methods:

(1) The sender adds a header to each packet. The header should contain at least the length of the packet, so that the receiver can determine where each packet ends by reading the length field in the header.

Figure: each packet is preceded by a header holding its actual length.

(2) The sender encodes every packet at a fixed length (shorter packets are padded with 0), so that the receiver obtains one packet each time it reads that fixed number of bytes from the receive buffer.

Figure: every packet has a fixed length of 4, so the receiver can tell them apart easily.

(3) A boundary can be placed between packets, for example a special symbol, so that the receiver can separate packets at this boundary.

Figure: the special character / is appended after each packet.
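
The three methods above can be sketched as small receiver-side functions. Each one takes the bytes buffered so far and returns the complete packets plus any leftover bytes that must wait for more data. This is a minimal sketch under stated assumptions: the function names are hypothetical, the 4-byte big-endian length header is an arbitrary choice, and the fixed length of 4 and the / delimiter are taken from the examples above.

```python
import struct

def split_length_prefixed(buf: bytes) -> tuple[list[bytes], bytes]:
    """Method 1: an assumed 4-byte big-endian length header precedes each packet."""
    packets = []
    while len(buf) >= 4:
        (length,) = struct.unpack_from(">I", buf)
        if len(buf) < 4 + length:
            break                          # body not fully received yet
        packets.append(buf[4:4 + length])
        buf = buf[4 + length:]
    return packets, buf

def split_fixed(buf: bytes, size: int = 4) -> tuple[list[bytes], bytes]:
    """Method 2: every packet occupies exactly `size` bytes, padded with zero bytes."""
    packets = []
    while len(buf) >= size:
        packets.append(buf[:size].rstrip(b"\x00"))   # drop the padding
        buf = buf[size:]
    return packets, buf

def split_delimited(buf: bytes, delimiter: bytes = b"/") -> tuple[list[bytes], bytes]:
    """Method 3: a delimiter marks the end of each packet."""
    *packets, rest = buf.split(delimiter)  # the last piece has no delimiter yet
    return packets, rest

def with_header(payload: bytes) -> bytes:
    """Sender side of method 1."""
    return struct.pack(">I", len(payload)) + payload

# In each demo the receiver holds two full packets and part of a third.
by_length = split_length_prefixed(with_header(b"ABC") + with_header(b"DE") + b"\x00\x00")
by_size = split_fixed(b"AB\x00\x00CDEFG\x00")
by_delimiter = split_delimited(b"ABC/DE/F")

print(by_length)
print(by_size)
print(by_delimiter)
```

Each approach has a trade-off: zero padding cannot represent a payload that itself ends in a zero byte, and a delimiter must never appear inside a payload unless it is escaped. A length header avoids both issues at the cost of a few extra bytes per packet.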

How the Netty framework solves sticky packets and unpacking

As a high-performance Java network programming framework, Netty not only wraps Java NIO in depth but also handles data transfer between client and server effectively.

As mentioned earlier, TCP transmission is subject to sticky packets and unpacking. Netty ships with a variety of built-in stream codecs for this purpose, and the client and server can solve the problem by transmitting data according to agreed rules.

Netty offers several codecs out of the box:

(1) FixedLengthFrameDecoder: fixed-length decoder

(2) DelimiterBasedFrameDecoder: delimiter-based decoder

(3) LengthFieldBasedFrameDecoder: decoder based on a packet length field

(4) And others, which I will not enumerate here.

Brief summary

TCP is a stream-oriented protocol: it carries a continuous sequence of bytes with no message boundaries. During transmission, TCP splits or merges data according to network conditions, so if the application does not define an explicit boundary rule, sticky packets and unpacking show up at the application layer.

The common solutions to TCP sticky packets and unpacking are:

(1) The sender adds a header to each packet.

(2) The sender encodes every packet at a fixed length.

(3) A boundary is placed between packets.

Netty also provides a number of out-of-the-box codecs for sticky packets and unpacking, which makes such problems much easier to solve in network programming.