The love-hate relationship between TCP and UDP

2022.09.07

Recently, the epidemic in Qiaoxi District, Shijiazhuang has gotten more serious. Last week I was locked in at the company for three days, and washing and eating became a problem. It has now been a week since the company got approval for us to go home. As for vegetables, it really is a matter of grabbing rather than buying: Carrefour's online stock is gone within two minutes. I don't know how these people manage to grab the food. Do they all have inside connections? There are also endless nucleic acid tests every day, which leave people helpless and give them headaches, and the double-digit daily increase in positive cases leaves people numb and hopeless. I just received a notice today that we can leave our building unit and stretch our legs...

On to something more pleasant.

I have written a lot of articles about TCP and UDP, but I haven't talked about the difference between these two protocols. Let's talk about this issue in this article.

Regarding TCP and UDP, everyone must have seen such a picture.

picture

There is a little girl drinking water slowly from the mouth of the bottle, and it says reliable transmission below. The girl's clothes are not soaked by water. This picture is called TCP.

Then another little girl was holding a water bottle and pouring water down at a very fast speed. The girl's hair was messy, her face was red, and her clothes were soaked in water. This picture is called UDP.

I think any programmer can roughly summarize the differences between the two transport protocols from these two pictures (after all, the captions spell it out). Some readers even get the wrong idea about UDP: couldn't the artist have just drawn a normal picture, without the red face and the soaked clothes?

Well, let's get down to business. The difference between TCP and UDP has always been a staple of interviews, and the two protocols are often compared with each other.

Differences in establishing connections

Establishing a TCP connection requires a three-way handshake, and tearing one down requires a four-way wave, which is what makes TCP a connection-oriented protocol. This connection does not bind the two communicating parties together with a network cable or a pipe; it is a virtual communication channel.

The TCP three-way handshake process (the client sends a connection establishment request to the server):

picture

  • The server process prepares to accept TCP connections from outside, generally by calling socket, bind, and listen. This is considered a passive open. The server process then sits in the LISTEN state, waiting for client connection requests.
  • The client performs an active open through connect and sends a connection request to the server. The synchronization bit in the request is SYN = 1, and the client chooses an initial sequence number, written seq = x. The SYN segment is not allowed to carry data, but it still consumes one sequence number. At this point, the client enters the SYN-SENT state.
  • After the server receives the client's request, it must acknowledge the client's segment. In the acknowledgment segment, both the SYN and ACK bits are set to 1. The acknowledgment number is ack = x + 1, and the server also chooses an initial sequence number seq = y for itself. This segment cannot carry data either, but it also consumes one sequence number. At this point, the TCP server enters the SYN-RECEIVED state.
  • After receiving the response from the server, the client must acknowledge it in turn. In this acknowledgment segment, ACK is set to 1, the sequence number is seq = x + 1, and the acknowledgment number is ack = y + 1. TCP allows this segment to carry data or not. If it carries no data, the next data segment still uses seq = x + 1. At this point, the client enters the ESTABLISHED (connected) state.
  • After the server receives the acknowledgment from the client, it also enters the ESTABLISHED state.
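
The sequence and acknowledgment numbers in the steps above can be traced with a small simulation. This is only a sketch: the `Segment` class and the `three_way_handshake` function are hypothetical names made up for illustration, no real network traffic is involved, and sequence number wraparound at 2^32 is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    sender: str          # "client" or "server"
    syn: int             # SYN flag
    ack_flag: int        # ACK flag
    seq: int             # sequence number
    ack: Optional[int]   # acknowledgment number, None when ACK flag is 0


def three_way_handshake(x: int, y: int) -> List[Segment]:
    """Return the three segments exchanged when the client picks
    initial sequence number x and the server picks y."""
    syn = Segment("client", syn=1, ack_flag=0, seq=x, ack=None)
    # The client's SYN consumes one sequence number, so the server acks x + 1.
    syn_ack = Segment("server", syn=1, ack_flag=1, seq=y, ack=syn.seq + 1)
    # The server's SYN consumes one sequence number too, so the client acks y + 1.
    final_ack = Segment("client", syn=0, ack_flag=1, seq=x + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, final_ack]


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

With x = 100 and y = 300, the second segment carries ack = 101 and the third carries seq = 101, ack = 301, matching the x + 1 and y + 1 rules in the list.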

UDP is a datagram-oriented protocol, so UDP does not have the concept of connection at all, and there will be no three-way handshake to establish a connection.
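
To make the contrast concrete, here is a minimal sketch of sending one UDP datagram over the loopback interface with Python's standard `socket` module. The helper name `udp_round_trip` is made up for this example, and the timeout value is an arbitrary choice. Note that the sender never calls connect or performs any handshake.

```python
import socket


def udp_round_trip(payload: bytes) -> bytes:
    """Send one datagram to a local receiver and return what it received."""
    # Receiver: bind to the loopback address; port 0 lets the OS pick a free port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so do not block forever
    address = receiver.getsockname()

    # Sender: no listen/accept/connect; sendto() simply emits one datagram.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, address)
        data, _peer = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_round_trip(b"hello over UDP"))
```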

Once data transmission is over, either party can release the connection. At that point both the client host and the server host are still in the ESTABLISHED state, and they then begin the process of releasing the connection.

(The client host actively closes the connection)

picture

The process of TCP disconnection is as follows:

  • The client application asks to close the connection and stops sending data. The client host sends a connection release segment in which the FIN bit is 1, which carries no data, and whose sequence number is seq = u. The client host then enters the FIN-WAIT-1 (termination wait 1) state.
  • After the server host receives this segment, it sends an acknowledgment with ACK = 1, its own sequence number seq = v, and ack = u + 1, and then enters the CLOSE-WAIT (close wait) state. The client host -> server host direction of the connection is now released, and the client host has no more data to send. The connection is half-closed: the server host can still send data.
  • After the client host receives the acknowledgment from the server host, it enters the FIN-WAIT-2 (termination wait 2) state and waits for the server host to send its own connection release segment.
  • When the server host has no more data to send, its application process tells TCP to release the connection. The server host then sends a connection release segment with FIN = 1, ACK = 1, and sequence number seq = w. Because the server may have sent more data in the meantime, w is not necessarily equal to v + 1. The acknowledgment number is still ack = u + 1. After sending this segment, the server host enters the LAST-ACK (last acknowledgment) state.
  • After the client receives the connection release segment from the server, it must acknowledge it. In that acknowledgment, ACK = 1, the sequence number is seq = u + 1 (the client has sent no data since its FIN), and ack = w + 1. The client then enters the TIME-WAIT (time wait) state. Note that the TCP connection has not been released yet: the client enters the CLOSED state only after the time-wait timer, set to 2MSL, expires. MSL stands for Maximum Segment Lifetime.
  • After the server host receives the acknowledgment from the client, it enters the CLOSED state, so the server finishes with the TCP connection earlier than the client. Because the whole teardown takes four segments, the process of releasing the connection is also called the four-way wave.
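
The teardown above can be replayed as a small script that records the four segments and the states each side passes through. This is a sketch only: `four_way_wave` is a hypothetical function name, the segments are plain dictionaries rather than real packets, and the 2MSL timer is represented by simply appending the final state.

```python
def four_way_wave(u: int, v: int, w: int):
    """Replay a client-initiated close.

    u: client's sequence number for its FIN
    v: server's sequence number for the first ACK
    w: server's sequence number for its FIN (w >= v if it sent more data)
    Returns (segments, client_states, server_states).
    """
    client_states = ["ESTABLISHED"]
    server_states = ["ESTABLISHED"]
    segments = []

    # 1. Client sends FIN and waits for it to be acknowledged.
    segments.append({"from": "client", "FIN": 1, "seq": u})
    client_states.append("FIN-WAIT-1")

    # 2. Server acknowledges; the client -> server direction is now closed.
    segments.append({"from": "server", "ACK": 1, "seq": v, "ack": u + 1})
    server_states.append("CLOSE-WAIT")
    client_states.append("FIN-WAIT-2")

    # 3. Server has nothing more to send and issues its own FIN.
    segments.append({"from": "server", "FIN": 1, "ACK": 1, "seq": w, "ack": u + 1})
    server_states.append("LAST-ACK")

    # 4. Client acknowledges the server's FIN and starts the 2MSL timer.
    segments.append({"from": "client", "ACK": 1, "seq": u + 1, "ack": w + 1})
    client_states.append("TIME-WAIT")
    server_states.append("CLOSED")   # on receiving the final ACK
    client_states.append("CLOSED")   # only after 2MSL has elapsed

    return segments, client_states, server_states


segments, client_states, server_states = four_way_wave(u=500, v=900, w=950)
print(" -> ".join(client_states))
print(" -> ".join(server_states))
```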

UDP has no such connection, so it needs no four-way wave.

To summarize: TCP is connection-oriented. A virtual connection must be established before data transmission, data is carried over that connection, and the connection must be torn down afterwards. UDP is not connection-oriented: it establishes no connection before sending data and does not care about the state of the receiving end.

Differences in reliability

One of the main comparisons between TCP and UDP is reliability. TCP is a reliable transport layer protocol, and UDP is an unreliable transport layer protocol. This reliability of TCP is mainly guaranteed by the following characteristics:

Reliability through sequence numbers and acknowledgment numbers

Communication between hosts on a computer network is very similar to a phone call between two people. A conversation usually takes the form of question and answer. If you say something and get no response, you usually say it again to make sure the other party hears you. If the other party replies, it means they heard what you said. That is a complete call flow (setting aside connection establishment; here we focus on what happens after the connection is established).

In computer networks, "the other party's reply" is called an acknowledgment (ACK). TCP uses ACKs to achieve reliable data transmission: after sending a request, the sender waits for the target host's response, and if no response arrives, it retransmits the request after a period of time. Therefore, even if packets are lost in transit, TCP can still achieve reliability through retransmission.

picture

The situation described above is loss of the sender's request. The other case is loss of the response: the request reaches the target host, the target host sends an ACK back to the requester, and that ACK is lost on the way. The requester then receives no ACK within the expected time and, exactly as before, retransmits the request.

picture
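
Both loss cases can be sketched with a scripted channel. In the hypothetical function below, `outcomes` is a made-up list that states what happens to each transmission attempt ("data_lost", "ack_lost", or "ok"); a real TCP stack uses timers rather than a script, so treat this purely as an illustration of the retransmission logic.

```python
def send_with_retransmit(outcomes, max_tries=5):
    """Simulate sending one segment until an ACK comes back.

    Returns (attempts, deliveries): how many times the sender transmitted,
    and how many copies actually reached the receiver.
    """
    deliveries = 0
    for attempt, outcome in enumerate(outcomes, start=1):
        if attempt > max_tries:
            break
        if outcome == "data_lost":
            # The receiver never saw the segment; the sender times out and retries.
            continue
        deliveries += 1  # this copy reached the receiver
        if outcome == "ack_lost":
            # The sender cannot tell this apart from data loss, so it retries too.
            continue
        return attempt, deliveries  # ACK received, done
    raise TimeoutError("no ACK received after retries")


attempts, deliveries = send_with_retransmit(["data_lost", "ack_lost", "ok"])
print(attempts, deliveries)
```

In this run the sender transmits three times and the receiver ends up with two copies of the same segment, which is why the receiver needs a way to recognize duplicates.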

Besides loss, segments can also arrive late. After the sender sends a segment, network jitter or congestion may delay the segment itself, or delay the target host's ACK, so that no acknowledgment reaches the sender in time. "In time" is judged by the retransmission timeout: once it expires, the sender retransmits the segment. It is quite possible that the segment sent the first time arrives just after the retransmitted copy does. This raises a problem: the target host receives two identical segments. One of them must be discarded, but how does the target host recognize the duplicate?

This is solved by the sequence number (seq), which numbers each byte of the transmitted data in order. From the sequence number in the TCP header and the length of the data, the receiving end works out the sequence number it expects next and returns it as the acknowledgment number. With sequence numbers and acknowledgment numbers, TCP can tell whether data has already been received and whether it still needs to be received, which is how it achieves reliable transmission.

picture

As shown in the figure above, suppose segments are sent in order starting at seq = 1. The first segment carries bytes 1 through n. Once the target host acknowledges it (ack = n + 1), the sender sends the segment with seq = n + 1, carrying bytes up to m. After that is acknowledged, the sender continues with seq = m + 1, and so on, which ensures that sequence numbers are never repeated.

UDP has no sequence numbers or acknowledgment numbers, so it does not acknowledge data and does not retransmit lost data. This is why UDP is an unreliable protocol.

If you use developers as a metaphor for TCP and UDP: TCP is the kind of engineer who has to design everything properly, won't start coding without a design, and considers every factor before beginning, so they are very reliable. UDP is the kind who starts working the moment the requirements arrive, with no design and no technology selection. This kind of developer is rather unreliable, but suits rapid iterative development because they can get started right away!
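
A minimal receiver sketch shows how sequence numbers answer the duplicate question. The `Receiver` class is hypothetical and deliberately simplified: it accepts only the segment it expects next, drops anything else, and always replies with the next byte it wants.

```python
class Receiver:
    """Toy receiver: in-order acceptance with duplicate detection."""

    def __init__(self, initial_seq: int = 1):
        self.next_expected = initial_seq  # sequence number of the next byte wanted
        self.data = bytearray()

    def on_segment(self, seq: int, payload: bytes) -> int:
        """Process one segment and return the acknowledgment number."""
        if seq == self.next_expected:
            self.data += payload
            self.next_expected += len(payload)
        # seq < next_expected: a duplicate (late original or retransmission), dropped.
        # seq > next_expected: a gap; this sketch drops it instead of buffering.
        return self.next_expected


receiver = Receiver()
print(receiver.on_segment(1, b"abcd"))  # new data
print(receiver.on_segment(1, b"abcd"))  # duplicate of the same segment
print(receiver.on_segment(5, b"ef"))    # next segment
print(bytes(receiver.data))
```

The duplicate is acknowledged again but its data is not stored twice, so the application still sees each byte exactly once.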

Differences in ordering

As mentioned above, TCP sends data as separate segments, and the target host acknowledges the data carried in each one. After acknowledging the segments, the target host reassembles their data. Because the segments are numbered by seq, TCP reassembles the data in order, whereas UDP offers no such ordering guarantee.
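
In-order reassembly can be sketched as follows. The `reassemble` function is a hypothetical helper that assumes segments never partially overlap; it buffers segments that arrive early and releases data only when the next expected byte is available.

```python
def reassemble(segments, initial_seq=1):
    """Rebuild the byte stream from (seq, payload) pairs in arrival order."""
    pending = {}              # seq -> payload for segments that arrived early
    next_seq = initial_seq    # sequence number of the next byte to deliver
    stream = bytearray()
    for seq, payload in segments:
        if seq >= next_seq:
            pending.setdefault(seq, payload)  # keep the first copy only
        # Deliver everything that is now contiguous.
        while next_seq in pending:
            chunk = pending.pop(next_seq)
            stream += chunk
            next_seq += len(chunk)
    return bytes(stream)


# Segments arrive out of order, and one of them arrives twice.
arrivals = [(6, b"world"), (1, b"hello"), (6, b"world")]
print(reassemble(arrivals))
```

A UDP receiver would instead hand the application `world` first and `hello` second, exactly as they arrived.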

Differences in segments

Both TCP and UDP are transport layer protocols, and the units of data a transport layer protocol transmits are collectively called segments. The main differences between TCP and UDP segments are as follows.

UDP segment structure

picture

  • Source Port: This field occupies the first 16 bits of the UDP header and usually contains the UDP port used by the application that sends the datagram. The receiving application uses the value of this field as the destination port when sending a response. This field is optional. When no source port is set, the field is 0, which is typical for communication that needs no reply.
  • Destination Port: Indicates the receiving port. The field is 16 bits long.
  • Length: This 16-bit field gives the length of the UDP datagram, covering both the UDP header and the UDP data. Because the UDP header is 8 bytes long, the minimum value is 8; the maximum value a 16-bit field can hold is 2^16 - 1 = 65535 bytes.
  • Checksum: UDP uses the checksum for error detection rather than security: it lets the receiver check whether the segment's data was altered on its way from the source to the target host.
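
The four 16-bit fields map directly onto Python's `struct` module. The sketch below uses hypothetical helper names (`build_udp_datagram`, `parse_udp_datagram`, `internet_checksum`). The checksum function is the generic 16-bit one's-complement sum; a real UDP checksum is also computed over a pseudo-header taken from the IP layer, which this example leaves out.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    length = UDP_HEADER.size + len(payload)  # header (8 bytes) + data
    if length > 0xFFFF:
        raise ValueError("datagram exceeds the 16-bit length field")
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram):
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }


def internet_checksum(data):
    """16-bit one's-complement checksum (pseudo-header not included)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


datagram = build_udp_datagram(5353, 53, b"hi")
print(parse_udp_datagram(datagram))
print(hex(internet_checksum(b"\x00\x01\xf2\x03\xf4\xf5\xf6\xf7")))
```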

TCP segment structure

picture

Compared with the UDP segment structure, the TCP segment structure has a lot more content. The first two 16-bit fields are the same, though: the source port number and the destination port number. Like UDP, TCP also contains a checksum field. In addition, the header of a TCP segment has the following fields:

  • 32-bit sequence number field and 32-bit acknowledgment number field. These fields are used by TCP senders and receivers for reliable data transfer.
  • The 4-bit header length field indicates the length of the TCP header in 32-bit words. The length of the TCP header is variable, but the options field is usually empty, in which case the TCP header is 20 bytes long.
  • 16-bit receive window field. This field is used for flow control: it indicates the number of bytes the receiver is able or willing to accept.
  • Variable-length options field. The sender and receiver use this field, for example, to negotiate the maximum segment size (MSS).
  • The flag field (6 bits in the original specification, 8 bits once CWR and ECE are counted). The ACK flag indicates that the value in the acknowledgment field is valid, that is, the segment acknowledges data that was received successfully. The RST, SYN, and FIN flags are used for connection establishment and teardown. CWR and ECE are used for congestion control. The PSH flag indicates that the data should be handed to the upper layer immediately. The URG flag indicates that the segment contains urgent data for the upper layer; the last byte of the urgent data is indicated by the 16-bit urgent data pointer field. In practice, PSH and URG are rarely used.
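
As a sketch of how these fields sit in the fixed 20-byte part of the header, the hypothetical `parse_tcp_header` below unpacks them with `struct`. Options are not decoded, and the sample bytes are built by hand for illustration rather than captured from a real connection.

```python
import struct

# Fixed 20-byte part of the TCP header, network byte order:
# src port, dst port, seq, ack, (data offset + flags), window, checksum, urgent pointer.
TCP_FIXED = struct.Struct("!HHIIHHHH")

FLAG_BITS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}


def parse_tcp_header(segment):
    (src, dst, seq, ack, offset_flags,
     window, checksum, urgent) = TCP_FIXED.unpack_from(segment)
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # The top 4 bits count 32-bit words, so multiply by 4 to get bytes.
        "header_len": (offset_flags >> 12) * 4,
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }


# Hand-built SYN+ACK header: data offset 5 words (20 bytes), flags SYN | ACK.
sample = TCP_FIXED.pack(80, 50000, 300, 101, (5 << 12) | 0x12, 65535, 0, 0)
print(parse_tcp_header(sample))
```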

Therefore, comparing the segment structures shows that TCP has flags, sequence numbers, and acknowledgment numbers that UDP lacks, all of which belong to TCP's connection management and reliability. It also has the receive window, which belongs to flow control, and the CWR and ECE flags, which relate to congestion control. TCP's header overhead is larger than UDP's: the TCP header is at least 20 bytes, while the UDP header is fixed at 8 bytes. Both TCP and UDP provide checksums.

Differences in efficiency

TCP segments were originally sent in a "one question, one answer" fashion: each segment had to be acknowledged by the target host before the next one was sent, which is very slow. To solve this problem, TCP introduced the concept of a window, which limits the loss of network performance even when round-trip times are long.

Before, each request carried a single segment. With a window, several segments can be sent in one go. The window size is the maximum amount of data that can be sent without waiting for an acknowledgment.

The window mechanism relies on sizable buffers and on the ability to acknowledge several segments at once.

As shown in the figure below, the highlighted part of the sending side is the window. Segments inside the window can be sent even though no acknowledgment has been received yet. The sender must still retransmit any segment that is lost before it is acknowledged, so it keeps a buffer of the segments that may need retransmission until their acknowledgments arrive.

picture

The part outside the sliding window consists of segments that have not been sent yet and segments that have already been acknowledged. Once a segment has been acknowledged, it no longer needs to be retransmitted and can be cleared from the buffer.

When an acknowledgment arrives, the window slides forward to the position of the acknowledgment number in that acknowledgment, as shown in the figure above. This lets several segments be in flight at once, which improves communication performance. This kind of window is called a sliding window.

Segments sent by UDP need no acknowledgment, so UDP has no concept of a window, and its transmission efficiency is relatively high.
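
The sliding behavior can be traced with a small event log. This is a sketch under simplifying assumptions: `sliding_window_send` is a hypothetical function, the window is counted in segments rather than bytes, nothing is lost, and acknowledgments arrive one at a time in order.

```python
def sliding_window_send(num_segments, window_size):
    """Return the ordered list of ("send", i) and ("ack", i) events."""
    log = []
    base = 0         # oldest unacknowledged segment (left edge of the window)
    next_to_send = 0
    while base < num_segments:
        # Send everything the window currently allows without waiting.
        while next_to_send < base + window_size and next_to_send < num_segments:
            log.append(("send", next_to_send))
            next_to_send += 1
        # An acknowledgment for the oldest segment slides the window by one.
        log.append(("ack", base))
        base += 1
    return log


for event in sliding_window_send(num_segments=5, window_size=3):
    print(event)
```

With a window of 3, segments 0 to 2 go out back to back before any acknowledgment arrives; with a window of 1 the same function degenerates to the "one question, one answer" pattern.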

Differences in usage scenarios

TCP and UDP differ in efficiency, segment structure, flow control, and connection management, and these differences lead to different choices of application scenario. Because every TCP segment must be acknowledged, TCP is not well suited to scenarios that need high-speed, low-latency transmission, and UDP is the better fit there. UDP is also enough for operations such as a DNS lookup, which only needs a simple request and reply and has no need to establish a connection. (Ping, often mentioned alongside, is likewise a simple connectionless request and reply, although it runs over ICMP rather than UDP.) The HTTP protocol, by contrast, must consider the reliability of requests and responses, so TCP is the right choice there. HTTP/3 is the exception: at the level of features there was not much left to optimize, but to push network performance to the limit it uses UDP as the underlying transport (via QUIC) and rebuilds reliability on top of UDP.