The client sends a SYN (synchronize) packet to the server to request a connection. The SYN packet carries a randomly generated initial sequence number (ISN), which is the starting point for numbering the bytes the client sends later. After sending it, the client enters the SYN_SENT state and waits for the server's response.
(2) Second handshake
After receiving the client's SYN packet, the server replies with a SYN+ACK packet. The SYN flag indicates that the server agrees to establish the connection, and the ACK flag acknowledges the client's SYN. The acknowledgment number is the client's initial sequence number plus 1. The server also generates its own initial sequence number and carries it in the same SYN+ACK packet. The server then enters the SYN_RCVD state.
(3) Third handshake
After the client receives the SYN+ACK packet, it sends an ACK packet to acknowledge the server's SYN. The acknowledgment number of this packet is the server's initial sequence number plus 1. The client enters the ESTABLISHED state once it has sent the ACK, and the server enters the ESTABLISHED state once it receives it. The connection is then established, and both parties can start transmitting data.
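The sequence and acknowledgment numbers in the three steps above can be traced with a small simulation. This is only a sketch of the arithmetic, not a real TCP stack; the `Segment` class and `three_way_handshake` function are names made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class Segment:
    flags: str                 # "SYN", "SYN+ACK" or "ACK"
    seq: int                   # sequence number of this segment
    ack: Optional[int] = None  # acknowledgment number, if the ACK flag is set


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments of a handshake for the given ISNs (illustrative only)."""
    # First handshake: the client sends SYN and enters SYN_SENT.
    syn = Segment("SYN", seq=client_isn)
    # Second handshake: the server acknowledges client_isn + 1 and sends its own ISN.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # Third handshake: the client acknowledges server_isn + 1.
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Note that the SYN itself consumes one sequence number, which is why each acknowledgment number is the peer's ISN plus 1 even though no data has been sent yet.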
2. Why is a three-way handshake required instead of two or four?
(1) Synchronize the initial sequence numbers of both parties
Both parties of a TCP connection maintain their own sequence number, which is key to reliable transmission. Sequence numbers serve the following functions:
The receiver can eliminate duplicate data to ensure data accuracy.
The receiver can receive data packets in the order of sequence numbers to ensure data integrity.
Sequence numbers can identify data packets that have been received by the other party to achieve reliable data transmission.
Therefore, when establishing a TCP connection, the client sends a SYN packet with an initial sequence number, and the server needs to reply with an ACK packet, indicating that the client's SYN packet has been successfully received. Then, the server sends a SYN packet with an initial sequence number to the client and waits for the client's response. This back-and-forth process ensures that the initial sequence numbers of both parties can be reliably synchronized.
Although a four-way handshake could also reliably synchronize the initial sequence numbers of both parties, the server's ACK and its own SYN (the second and third steps) can be carried in a single segment, so the exchange collapses into a three-way handshake. A two-way handshake can only guarantee that one party's initial sequence number has been received by the other; it cannot confirm that both have. Three is therefore the minimum number of steps that reliably synchronizes both sequence numbers.
(2) Prevent old repeated connection initialization from causing confusion
In network communications, packets may be delayed by congestion, routing problems, or other causes. These delayed packets are called "historical packets" or "old packets". If TCP used only a two-way handshake, the server could not tell whether a SYN packet it receives is a new connection request or an old one that was delayed.
Assume the following scenario:
The client sends a SYN packet (connection request), but due to network delays, this SYN packet does not reach the server in time.
The client sends a new SYN packet (new connection request) because it did not receive a response from the server within the timeout period.
With a two-way handshake, the server replies to the new SYN packet with a SYN+ACK packet and considers the connection established.
Later, the old SYN packet finally arrives at the server. Because the server performs only two handshakes, it mistakes the old SYN packet for a new connection request, replies with a SYN+ACK packet, and allocates resources for a connection that does not exist.
How does the three-way handshake avoid this problem? In the three-way handshake:
When the server receives a SYN packet, it replies with a SYN+ACK packet (the second handshake) but does not yet treat the connection as established.
Only when the client receives the SYN+ACK packet and replies with an ACK packet (the third handshake) does the server confirm that the connection is valid.
If an old SYN packet arrives late, the server still replies with a SYN+ACK packet, but its acknowledgment number does not match the connection request the client is currently waiting on. The client therefore does not send the final ACK; it replies with an RST packet instead, and the server abandons the invalid connection and releases its resources.
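The check the client applies to an incoming SYN+ACK can be sketched as follows. This is a simplified model, assuming the usual TCP behavior of answering an unexpected SYN+ACK with RST; the `Client` class and `server_reply_to_syn` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


def server_reply_to_syn(syn_seq: int) -> int:
    """The server acknowledges whichever SYN it saw: ack = seq + 1."""
    return syn_seq + 1


@dataclass
class Client:
    current_isn: int  # ISN of the SYN the client is actually waiting on

    def on_syn_ack(self, ack_number: int) -> str:
        """Return the flag of the reply: ACK for the live request, RST otherwise."""
        if ack_number == self.current_isn + 1:
            return "ACK"  # third handshake, the connection is valid
        return "RST"      # the SYN+ACK answers a stale SYN, so reject it


client = Client(current_isn=900)                     # the new SYN used ISN 900
print(client.on_syn_ack(server_reply_to_syn(100)))   # reply to the delayed old SYN (ISN 100)
print(client.on_syn_ack(server_reply_to_syn(900)))   # reply to the current SYN
```

With only two handshakes the server would never get this feedback, which is why it cannot tell a stale SYN from a fresh one.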
(3) Confirm the sending and receiving capabilities of both parties
The three-way handshake also lets both parties confirm that they can send and receive normally. The first handshake shows the server that the client can send; the second shows the client that the server can both receive and send; the third shows the server that the client can receive.
2. Interviewer asked: What will happen if the client's ACK does not reach the server during the third handshake? What if the connection has been established but the client fails?
During the TCP three-way handshake, if the ACK packet of the third handshake does not reach the server, the following will happen:
1. Server behavior
After sending the SYN+ACK packet, the server enters the SYN_RCVD state and waits for the client's ACK.
If the server does not receive the client's ACK packet, it assumes that its SYN+ACK packet may have been lost and resends it according to TCP's timeout retransmission mechanism.
Normally, the server retransmits the SYN+ACK packet up to 5 times (the exact number depends on the implementation; on Linux it is controlled by `net.ipv4.tcp_synack_retries`), and the interval grows after each retransmission, typically doubling (for example, 3 seconds, 6 seconds, 12 seconds, and so on).
If the server still has not received an ACK after these retransmissions, it gives up establishing the connection, enters the CLOSED state, and releases the related resources.
After the server enters the CLOSED state, if the client sends data to the server, the server responds with an RST packet.
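The retransmission schedule described above can be worked out with a few lines of arithmetic. The sketch below assumes pure exponential backoff (each wait doubles) and uses the example values from the text; real stacks differ in the initial timeout and in how they cap it, and the function name is made up for illustration.

```python
from typing import List, Tuple


def synack_retransmit_schedule(initial_timeout: float = 3.0,
                               retries: int = 5) -> Tuple[List[float], float]:
    """Return (wait before each retransmission, total time before giving up)."""
    # Wait before retransmission i: the timeout doubles each time.
    waits = [initial_timeout * (2 ** i) for i in range(retries)]
    # After the last retransmission the server waits one more doubled timeout.
    give_up_after = sum(waits) + initial_timeout * (2 ** retries)
    return waits, give_up_after


waits, total = synack_retransmit_schedule()
print(waits, total)
```

With a 3-second initial timeout and 5 retries, the waits are 3, 6, 12, 24 and 48 seconds, and the half-open connection is dropped after 189 seconds. With a 1-second initial timeout the same calculation gives 63 seconds.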
2. Client behavior
After sending the ACK packet, the client enters the ESTABLISHED state and believes that the connection has been established.
If the client then starts sending data while the server is still in the SYN_RCVD state, the server checks the ACK flag and acknowledgment number of the received data segment. If they validly acknowledge its SYN+ACK, the server treats the segment as the missing third handshake and completes connection establishment.
3. What if the connection has been established but the client fails?
In TCP, if the connection has been established but the client then fails, the server detects the dead connection through the keepalive mechanism and releases its resources. The following is a detailed analysis:
(1) TCP connection failure handling mechanism
Keepalive mechanism:
Trigger condition: When keepalive is enabled and no data has been exchanged for a long time (2 hours by default), the server starts keepalive probing.
Probe process: The server sends a probe packet (a segment with no data) every 75 seconds. If 9 consecutive probes (the Linux default, about 11 minutes in total) go unanswered, the connection is considered dead.
Result processing: The server actively closes the connection to release the occupied port and memory resources to avoid resource leakage.
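To see how long a dead client can go unnoticed, the timing above can be added up. The sketch assumes Linux defaults (7200 seconds idle, 75 seconds between probes, and a probe count of 9, which is what yields roughly 11 minutes of probing); the function name is hypothetical.

```python
def keepalive_detection_seconds(idle: int = 7200,
                                interval: int = 75,
                                probes: int = 9) -> int:
    """Worst-case time from the last data exchange until the peer is declared dead."""
    return idle + interval * probes


total = keepalive_detection_seconds()
print(f"{total} s = {total // 3600} h {total % 3600 // 60} min {total % 60} s")
```

The probing phase alone is 675 seconds, but it only begins after the 2-hour idle period, so the overall detection time is a little over 2 hours and 11 minutes.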
(2) Potential problems and solutions
Keepalive mechanism defects:
Detection delay: With default settings, a dead peer is detected only after the 2-hour idle period plus about 11 minutes of probing, which is far too slow for applications with real-time requirements.
Optimization suggestions: Implement a heartbeat at the application layer (for example, on top of HTTP long polling or WebSocket connections), sending heartbeat packets at regular intervals through a custom protocol to shorten the failure detection time.
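An application-layer heartbeat usually comes down to remembering when each peer was last heard from and expiring the ones that stay silent too long. The following is a minimal sketch of that bookkeeping only; the `HeartbeatMonitor` class, its method names, and the 15-second timeout are assumptions for illustration, and the transport that carries the heartbeat messages is left out.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per peer and reports peers that timed out."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock                      # injectable so it can be tested
        self.last_seen: Dict[str, float] = {}

    def beat(self, peer_id: str) -> None:
        """Record that a heartbeat message arrived from peer_id."""
        self.last_seen[peer_id] = self.clock()

    def dead_peers(self) -> List[str]:
        """Return peers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return [p for p, seen in self.last_seen.items() if now - seen > self.timeout]


monitor = HeartbeatMonitor(timeout=15.0)
monitor.beat("client-1")
print(monitor.dead_peers())  # nothing has timed out yet
```

A server would call `beat` whenever a heartbeat message arrives and periodically close the connections returned by `dead_peers`, which brings detection time down from hours to seconds.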
Risk of resource competition:
Scenario: If the client crashes without closing the connection, the server keeps the dead connection open. When many such connections accumulate, the file descriptors and memory they hold can be exhausted, and the server may be unable to accept new connections.
Optimization suggestions:
Adjust kernel parameters (such as `net.ipv4.tcp_keepalive_time`) to shorten the keepalive interval.
Use connection pool technology to reuse connections and reduce the overhead of frequent establishment/release.
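Besides the system-wide kernel parameters, keepalive can also be tuned per socket. The sketch below uses Python's standard `socket` module; `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are available on Linux but not on every platform, so they are guarded, and the values 60/10/3 are arbitrary examples rather than recommendations.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, probes: int = 3) -> None:
    """Turn on TCP keepalive for one socket and shorten its timers where supported."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The options below are platform-specific; skip any that this platform lacks.
    if hasattr(socket, "TCP_KEEPIDLE"):    # idle seconds before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):   # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):     # unanswered probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
sock.close()
```

With these example values a dead peer would be detected after roughly 60 + 10 × 3 = 90 seconds of silence instead of more than two hours.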
3. Interviewer asked: What is the process of TCP four waves? Why can't the ACK and FIN sent by the server be combined to become three waves?
TCP closes a connection through four waves. Either party can initiate the close, and once the connection is closed, the resources it occupied on the host are released.
TCP connections are full-duplex, so each direction must be closed separately. The party that closes first performs an active close, while the other party performs a passive close.
The whole process usually involves four steps, each completed by exchanging a TCP segment. Along the way the two parties pass through the states FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, LAST_ACK and TIME_WAIT.
1. Detailed steps of four waves
(1) First wave: The active closing party sends a FIN message
Process: The active closing party (usually the client) sends a FIN (Finish) segment, indicating that it has no data to send and hopes to close the connection, but the active closing party can still receive data. At this time, the active closing party enters the FIN_WAIT_1 state and waits for confirmation from the passive closing party.
Segment content: The FIN flag bit is set to 1, and the segment carries a sequence number (seq) that identifies its position in the data stream.
(2) Second wave: The passive closing party responds with an ACK message
Process: After receiving the FIN segment, the passive closing party (usually the server) sends an ACK (Acknowledgment) segment in response, indicating that it has received the close request. The passive closing party then enters the CLOSE_WAIT state: it knows the peer has finished sending but may still have data of its own to process and send. After receiving the ACK segment, the active closing party enters the FIN_WAIT_2 state. During this stage, the passive closing party can still send any remaining data to the active closing party.
Segment content: The ACK flag bit is set to 1, and the acknowledgment number (ack) is the sequence number of the received FIN segment plus 1, which confirms the FIN.
(3) Third wave: The passive closing party sends a FIN message
Process: After processing the remaining data, the passive closing party sends a FIN segment, indicating that it has no more data to transmit and requests to close the connection. At this time, the passive closing party enters the LAST_ACK state and waits for confirmation from the active closing party.
Segment content: The FIN flag bit is set to 1, and the segment carries a sequence number (seq) that identifies its position in the data stream.
(4) Fourth wave: The active closing party responds with an ACK message and enters the TIME_WAIT state
Process: After the active closing party receives the FIN segment from the passive closing party, it sends an ACK segment as a response, indicating that it has received the closing request. At this time, the active closing party enters the TIME_WAIT state and waits for a period of time (usually 2MSL, which is twice the maximum segment lifetime) to ensure that the passive closing party receives the final ACK segment. If the passive closing party does not receive the ACK segment, it will resend the FIN segment, and the active closing party can send the ACK segment again. After the waiting time is over, the active closing party enters the CLOSED state, and the connection is officially closed. After the passive closing party receives the ACK segment, it also immediately enters the CLOSED state.
Segment content: The ACK flag bit is set to 1, and the acknowledgment number (ack) is the sequence number of the received FIN segment plus 1, which confirms the FIN.
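The state changes on both sides during the four waves can be summarized as two small transition tables. This is an illustrative sketch of the normal close path only (no simultaneous close, no retransmissions); the table and function names are made up for the example.

```python
from typing import Dict, List, Tuple

Transitions = Dict[Tuple[str, str], str]  # (state, event) -> next state

ACTIVE_CLOSER: Transitions = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",            # first wave
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",             # second wave
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",    # third and fourth waves
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}

PASSIVE_CLOSER: Transitions = {
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",  # first and second waves
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",               # third wave
    ("LAST_ACK", "recv ACK"): "CLOSED",                   # fourth wave
}


def walk(table: Transitions, events: List[str], state: str = "ESTABLISHED") -> List[str]:
    """Apply the events in order and return every state visited."""
    path = [state]
    for event in events:
        state = table[(state, event)]
        path.append(state)
    return path


print(walk(ACTIVE_CLOSER, ["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timeout"]))
print(walk(PASSIVE_CLOSER, ["recv FIN, send ACK", "send FIN", "recv ACK"]))
```

The asymmetry is visible in the tables: only the active closer passes through TIME_WAIT, while the passive closer goes straight from LAST_ACK to CLOSED.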
2. Why can't ACK and FIN be combined into three waves?
To understand why four waves are needed, let's review how each party sends its FIN.
When the client sends a FIN to the server, it only means that the client will send no more data; the client can still receive data.
When the server receives the client's FIN, it first replies with an ACK. However, the server may still have data to process and send, so it waits until it has nothing more to send before sending its own FIN to the client, indicating that it now agrees to close the connection.
As this process shows, the server usually needs time to finish processing and sending its data, so its ACK and FIN are usually sent separately, which is why closing takes one more step than the three-way handshake. If the server has nothing left to send when the client's FIN arrives, its TCP stack may carry the ACK and FIN in the same segment, and the close then completes in three steps.
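The half-closed state that makes the separate ACK and FIN necessary can be observed with ordinary sockets. The sketch below runs both ends in one process over the loopback interface; `shutdown(SHUT_WR)` is what causes the client's FIN to be sent, and the helper names `read_until_eof` and `half_close_demo` are made up for illustration.

```python
import socket
from typing import Tuple


def read_until_eof(sock: socket.socket) -> bytes:
    """Read until recv() returns b'', which means the peer's FIN has arrived."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def half_close_demo() -> Tuple[bytes, bytes]:
    listener = socket.create_server(("127.0.0.1", 0))   # port 0: let the OS pick one
    client = socket.create_connection(listener.getsockname())
    server, _addr = listener.accept()
    try:
        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)        # client FIN: "I have nothing more to send"
        server_got = read_until_eof(server)    # server sees the data, then end-of-stream
        server.sendall(b"late response")       # server can still send after the client's FIN
        server.close()                         # only now is the server's own FIN sent
        client_got = read_until_eof(client)    # client can still receive after its FIN
        return server_got, client_got
    finally:
        server.close()
        client.close()
        listener.close()


print(half_close_demo())
```

Between the client's `shutdown` and the server's `close`, the connection is open in one direction only, which is the window in which the server finishes sending its remaining data.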
4. Interviewer asked: Why does the active closing party enter the TIME_WAIT state after the last wave instead of releasing resources directly? Why does the TIME_WAIT state last for 2MSL?
The TIME_WAIT state exists to ensure reliable closure of the network connection. Only the party that actively initiates the closure of the connection (i.e. the active closing party) will have the TIME_WAIT state.
The client entering the TIME_WAIT state is an important manifestation of the reliability and robustness design of the TCP protocol. Its core significance is reflected in the following key aspects:
1. Ensure reliable termination of the connection
Avoid data loss: During the TCP four-wave process, the client enters the TIME_WAIT state after sending the last ACK confirmation message, waiting for 2 times the maximum segment lifetime (2MSL). If the ACK is lost, the server will resend the FIN packet. The client can resend the ACK during the TIME_WAIT period to ensure that the server has properly closed the connection.
Handling delayed or retransmitted packets: There may be delayed or retransmitted packets in the network. The TIME_WAIT state ensures that these packets are discarded before the connection is completely terminated to avoid interfering with subsequent new connections.
2. Preventing connection reuse conflicts
Avoid port and address confusion: TCP connections are uniquely identified by a 4-tuple (source IP, source port, destination IP, destination port). The TIME_WAIT state ensures that the same 4-tuple is not reused by a new connection before the old connection is completely terminated. If a new connection reuses the address and port of an old connection, delayed packets from the old connection may be mistaken for data from the new connection, resulting in data confusion.
Maintain connection uniqueness: By waiting for 2MSL, the TIME_WAIT state ensures that the packets of the old connection disappear naturally in the network, thereby ensuring the independence and correctness of the new connection.
Why does the TIME_WAIT state last for 2MSL?
MSL is the maximum time a packet can survive in the network. After this time, the packet will be discarded. Different operating systems or network devices may have different definitions of MSL (for example, RFC 793 recommends that MSL be 2 minutes, but in practice it may be shorter).
As mentioned above, TCP packets may linger in the network because of delays, rerouting, or retransmission, for up to one MSL. If the active closing party does not wait long enough in the TIME_WAIT state, a newly established connection may reuse the same 4-tuple (source IP, source port, destination IP, destination port), and a lingering segment from the old connection may be mistaken for data of the new one, causing data confusion or protocol errors.
The wait is two MSLs rather than one because the final ACK may take up to one MSL to reach the passive closing party, and if it is lost, the retransmitted FIN may take up to another MSL to travel back. Waiting for 2MSL therefore covers one full round trip and ensures that all packets of the old connection have expired before the 4-tuple can be reused. For example, if the MSL is 30 seconds, 2MSL is 60 seconds.
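The 2MSL figure is simply the worst case for one round trip of the final exchange: up to one MSL for the last ACK to reach the peer, plus up to one MSL for a retransmitted FIN to come back. A tiny worked calculation, with a hypothetical function name:

```python
def time_wait_seconds(msl_seconds: float) -> float:
    """TIME_WAIT duration for a given MSL, split into its two worst-case legs."""
    last_ack_to_peer = msl_seconds      # the final ACK may live this long in the network
    retransmitted_fin = msl_seconds     # a resent FIN may take this long to come back
    return last_ack_to_peer + retransmitted_fin


for msl in (30, 60, 120):
    print(f"MSL = {msl:>3} s  ->  TIME_WAIT = {time_wait_seconds(msl):.0f} s")
```

With the 2-minute MSL recommended by RFC 793 this gives 4 minutes, and with a 30-second MSL it gives 60 seconds.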
Although the TIME_WAIT state occupies port and system resources, the 2MSL waiting time is a necessary price for protocol reliability. In high-concurrency scenarios, resource utilization can be optimized by adjusting system parameters (such as shortening the TIME_WAIT time or reusing TIME_WAIT connections), but reliability and performance must be carefully balanced.
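One common, relatively safe adjustment on the server side is to set `SO_REUSEADDR` on the listening socket, so that a restarted server can bind its port again while old connections on that port are still in TIME_WAIT. The sketch below uses Python's standard `socket` module and assumes Unix-like semantics for the option (it behaves differently on Windows); `make_listener` is a made-up helper name.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a listening TCP socket that tolerates lingering TIME_WAIT connections."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); otherwise bind() can fail with "Address already in use".
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))   # port 0 lets the OS choose a free port
    sock.listen()
    return sock


listener = make_listener()
print("listening on", listener.getsockname())
listener.close()
```

This does not shorten TIME_WAIT or remove its protection for existing 4-tuples; it only allows the listening port itself to be rebound.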
5. Interviewer's follow-up question: What is the difference between TCP and UDP protocols? What are their applicable scenarios?
TCP is a connection-oriented, reliable, byte stream-based transport layer communication protocol.
Connection-oriented: TCP communication is one-to-one, that is, point-to-point communication between two endpoints. Unlike UDP, which can send a message to multiple hosts at the same time, TCP cannot do one-to-many communication.
Reliable: Whatever happens on the network path (loss, duplication, reordering), TCP ensures that the data reaches the receiver intact and in order, which also makes the TCP segment format more complex than UDP's.
Based on byte stream: TCP treats application data as a continuous stream of bytes with no message boundaries, so it can carry messages of any size and preserves their order. If earlier bytes have not yet arrived, TCP does not deliver later bytes to the application layer even if they have already been received. Duplicate segments are discarded automatically.
UDP (User Datagram Protocol) is a connectionless communication protocol. Compared with TCP, it provides no complex control mechanisms: applications can send datagrams encapsulated in IP packets without first establishing a connection. When developers choose UDP instead of TCP, the application works almost directly on top of IP.
1. Comparison of core features
2. Analysis of key differences
(1) Reliability vs. efficiency
TCP: Ensures reliable data transmission through technologies such as acknowledgment mechanism (ACK), retransmission, and sequence number, but increases latency and overhead.
UDP: Only provides "best effort" transmission, does not guarantee data integrity, but because there is no need to wait for confirmation, it can achieve lower latency real-time communication.
(2) Connection establishment vs. send and go
TCP: A three-way handshake is required to establish a connection (SYN→SYN+ACK→ACK), similar to dialing a number before making a phone call; four waves are required to close the connection (FIN→ACK→FIN→ACK), similar to a polite farewell before hanging up the phone.
UDP: Send data packets directly without any connection establishment process, similar to writing a letter without confirming whether the recipient is online.
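The "send and go" nature of UDP shows up directly in the socket API: there is no `connect`/`accept` step before the first datagram. Below is a minimal loopback sketch using Python's standard `socket` module; the function name and the 2-second timeout are arbitrary choices for the example.

```python
import socket


def udp_round_trip(payload: bytes) -> bytes:
    """Send one datagram to a local receiver without any connection setup."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    receiver.settimeout(2.0)             # UDP gives no delivery guarantee, so do not block forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake: the very first packet on the wire already carries the data.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_round_trip(b"ping"))
```

Over a real network the datagram could be lost, duplicated, or reordered, and the sender would not be told; any acknowledgment or retry logic has to be added by the application.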
(3) Performance overhead comparison
TCP: The header contains fields such as the sequence number, acknowledgment number, and window size. The fixed part is 20 bytes, and options can extend it to a maximum of 60 bytes.
UDP: The header is only 8 bytes, made up of four 2-byte fields: source port, destination port, length, and checksum. The overhead is extremely low.
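The header sizes quoted above can be checked by laying out the fixed fields with Python's `struct` module. This is only a size check under the standard field layout, not a packet builder: the checksum is left at 0 and the helper name is hypothetical.

```python
import struct

# UDP: source port, destination port, length, checksum (four 16-bit fields)
UDP_HEADER = struct.Struct("!HHHH")

# TCP fixed part: source port, destination port, sequence number, acknowledgment number,
# data offset + reserved + flags (16 bits together), window, checksum, urgent pointer
TCP_FIXED_HEADER = struct.Struct("!HHIIHHHH")

# The 4-bit data offset field counts 32-bit words, so the header is at most 15 * 4 bytes.
TCP_MAX_HEADER = 15 * 4


def build_udp_header(src_port: int, dst_port: int, payload_len: int) -> bytes:
    """Pack a UDP header; the length field covers header plus payload."""
    return UDP_HEADER.pack(src_port, dst_port, UDP_HEADER.size + payload_len, 0)


print(UDP_HEADER.size, TCP_FIXED_HEADER.size, TCP_MAX_HEADER)
```

The 40 bytes between the fixed 20-byte header and the 60-byte maximum are what is available for TCP options.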
3. Typical application scenarios
(1) TCP application scenarios
Applications that require high reliability: such as HTTP/HTTPS (web pages), FTP (file transfer), SMTP (mail), and SSH (remote login).
Scenarios that are sensitive to data integrity: such as banking transactions and database synchronization.
(2) UDP application scenarios
Applications with high real-time requirements: such as live video broadcasting (allowing a small amount of packet loss) and online games (low latency is more important than packet loss).
Broadcast/multicast communications: such as video conferencing and IoT device group control.
Simple request-response mode: such as DNS query (usually a single packet can be completed).
Customized protocol design: such as the QUIC protocol (developed by Google based on UDP and integrating TCP reliability).
4. Summary of advantages and disadvantages
Summary: The choice between TCP and UDP is essentially a trade-off between "reliability" and "efficiency". TCP is suitable for scenarios with strict data integrity requirements, while UDP is more suitable for scenarios with real-time priority or simple communication needs. In actual development, you can choose a protocol based on business characteristics, or use a hybrid solution to take into account the advantages of both.