Understanding the Essence of Sockets in the Linux Kernel

This article starts from a beginner's perspective and explains what a socket is, how it works, and how its core is implemented.

1. The concept of Socket
A socket works like the plug-and-socket connection we use in daily life. In network programming, a socket is the interface through which programs communicate over the network. Once a plug is inserted into a wall socket, current can flow and energy is transferred. Likewise, once a program uses a socket to establish a "connection" with another machine, data can flow between the two.


For example, in an online chat the sender's program sends messages through its socket, and the receiver's program receives them through the corresponding socket. Similarly, when downloading a file, the download program connects through a socket to the server that provides the file and then fetches the file data.
2. Socket usage scenarios
Suppose we want to send data from a process on computer A to a process on computer B. If the data must reach the other side, choose TCP, which is reliable; if losing some data is acceptable, choose UDP, which is unreliable. Beginners generally start with TCP.
To do this, you program against the socket API. First, create a TCP socket by calling socket(AF_INET, SOCK_STREAM, 0).
The call returns sock_fd, the handle (file descriptor) of the socket file.
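A minimal sketch of this step in C is shown here. The helper name create_tcp_socket is made up for illustration; socket(), AF_INET, and SOCK_STREAM are the standard POSIX names.

```c
#include <sys/socket.h>  /* socket(), AF_INET, SOCK_STREAM */
#include <unistd.h>      /* close(), which callers use to release the descriptor */

/* Hypothetical helper: create an IPv4 TCP socket.
 * Returns sock_fd (a small non-negative integer), or -1 on error. */
int create_tcp_socket(void)
{
    /* AF_INET     = IPv4 addresses
     * SOCK_STREAM = reliable byte stream, which means TCP for AF_INET
     * 0           = use the default protocol for this combination */
    return socket(AF_INET, SOCK_STREAM, 0);
}
```

A caller would typically write `int sock_fd = create_tcp_socket();`, check for -1, and call close(sock_fd) when finished. Passing SOCK_DGRAM instead of SOCK_STREAM would request a UDP socket.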

On the server, after obtaining sock_fd, call bind(), listen(), and accept() in sequence to wait for a client's connection request. On the client, after obtaining sock_fd, call connect() to send a connection request to the server; this is when the TCP three-way handshake takes place.
Once the connection is established, the client can call send() to send messages and the server can call recv() to receive them, and vice versa.
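The whole call sequence can be tried out in a single process over the loopback interface. The following is only a sketch under some assumptions: the helper name tcp_loopback_demo is invented, both ends live in one process for brevity, and the message is assumed to be short enough to arrive in one recv() call.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset(), strlen() */
#include <sys/socket.h>
#include <unistd.h>      /* close() */

/* Hypothetical helper: sends msg from a client socket to a server socket
 * inside one process. Returns the number of bytes the server side
 * received, or -1 on error. */
ssize_t tcp_loopback_demo(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int listen_fd = -1, client_fd = -1, conn_fd = -1;
    ssize_t n = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
    addr.sin_port = 0;                  /* 0 = let the kernel pick a port */

    /* Server: socket() -> bind() -> listen() */
    listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) goto out;
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) < 0) goto out;
    if (listen(listen_fd, 1) < 0) goto out;
    /* Learn which port the kernel chose so the client can connect to it. */
    if (getsockname(listen_fd, (struct sockaddr *)&addr, &alen) < 0) goto out;

    /* Client: socket() -> connect(); the kernel performs the handshake. */
    client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (client_fd < 0) goto out;
    if (connect(client_fd, (struct sockaddr *)&addr, sizeof addr) < 0) goto out;

    /* Server: accept() returns a new descriptor for this connection. */
    conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd < 0) goto out;

    /* Client sends, server receives. */
    if (send(client_fd, msg, strlen(msg), 0) < 0) goto out;
    n = recv(conn_fd, buf, buflen, 0);
out:
    if (conn_fd >= 0) close(conn_fd);
    if (client_fd >= 0) close(client_fd);
    if (listen_fd >= 0) close(listen_fd);
    return n;
}
```

In a real deployment the server and the client are separate programs, usually on different machines, and the server binds to a well-known port rather than a kernel-chosen one.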

3. Socket Design
Now let's set the socket aside and design a kernel network transmission feature from scratch. We want to send data from a process on computer A to a process on computer B. From an operational point of view, this means sending data to the remote end and receiving data from it, that is, writing data and reading data.
But this raises two problems:

There may be more than one receiver and sender, so an IP address and a port are needed to tell them apart. The IP address identifies the computer, and the port identifies the process on that computer.
The sender and receiver may use very different transmission methods, such as the reliable TCP protocol, the unreliable UDP protocol, or even ICMP, which the ping command is based on.

To support all of this, a data structure called sock is defined, with IP and port fields added to it. Although the protocols differ, they share some functionality. Each protocol can be treated as a different object class (or structure), and the common parts can be extracted and reused through "inheritance".

Therefore, some data structures are defined:

sock is the most basic structure. It maintains the send and receive buffers that any protocol may need.
In the Linux 2.6 kernel sources it is defined as struct sock in include/net/sock.h.

inet_sock is a sock that uses the network transmission function. On top of sock, it adds fields related to network transmission, such as TTL, port, and IP address. Not every sock needs them: a Unix domain socket, for example, is used for communication between local processes and does not go through the network protocol stack.

In the kernel, struct inet_sock embeds struct sock as its first member.
inet_connection_sock is a connection-oriented sock. On top of inet_sock, it adds fields related to connection-oriented protocols, such as the accept queue, the segment size, and the number of retries when the handshake fails. Although TCP is the connection-oriented protocol being discussed here, Linux needs to leave room for other connection-oriented protocols to be added.

In the kernel, struct inet_connection_sock embeds struct inet_sock as its first member.

tcp_sock is the sock structure dedicated to the TCP protocol. On top of inet_connection_sock, it adds TCP-specific state for the sliding window and congestion avoidance. Similarly, UDP has its own dedicated structure called udp_sock.
In the kernel, struct tcp_sock embeds struct inet_connection_sock as its first member.
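The layering described above can be modelled in a few lines of user-space C. This is a heavily simplified sketch, not the kernel's actual definitions: the structure names mirror the kernel's, but every field name, type, and size here is an illustrative assumption.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Simplified model: state that every protocol needs. */
struct sock {
    unsigned char recv_buf[256];  /* stand-in for the receive buffer */
    unsigned char send_buf[256];  /* stand-in for the send buffer */
};

/* Adds what network (IP-based) sockets need. */
struct inet_sock {
    struct sock sk;               /* "parent" placed first */
    uint32_t saddr, daddr;        /* source and destination IP address */
    uint16_t sport, dport;        /* source and destination port */
    uint8_t  ttl;
};

/* Adds what connection-oriented protocols need. */
struct inet_connection_sock {
    struct inet_sock inet;        /* "parent" placed first */
    int accept_queue_len;         /* stand-in for the accept queue */
    int syn_retries;              /* handshake retry count */
};

/* Adds TCP-specific state. */
struct tcp_sock {
    struct inet_connection_sock icsk;  /* "parent" placed first */
    uint32_t snd_wnd;             /* sliding (send) window */
    uint32_t snd_cwnd;            /* congestion window */
};

/* UDP is not connection-oriented, so it builds on inet_sock directly. */
struct udp_sock {
    struct inet_sock inet;        /* "parent" placed first */
    int pending;                  /* illustrative UDP-only field */
};

/* Each "parent" sits at offset 0 of its "child". */
_Static_assert(offsetof(struct inet_sock, sk) == 0, "sock must be first");
_Static_assert(offsetof(struct inet_connection_sock, inet) == 0,
               "inet_sock must be first");
_Static_assert(offsetof(struct tcp_sock, icsk) == 0,
               "inet_connection_sock must be first");
```

Because every parent is the first member, the address of a tcp_sock is also the address of the sock buried three levels inside it.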

With these data structures in place, the kernel ties them to the network card hardware to implement network transmission.

4. Provide Socket layer
Because this code is complex and also drives the network card hardware, it requires elevated operating system privileges. For performance and security reasons, it lives in the operating system kernel.
To let user-space applications use this functionality, it is abstracted into a simple interface, and the kernel sock is wrapped in a file. When a sock is created, a file is created as well. The file has a file descriptor, fd, which uniquely identifies the sock within the process. The fd is exposed to the user, who can then operate on the sock as if it were an ordinary file handle.

When a socket is created, a file structure is created as well. Its private_data field points to the kernel's socket structure, through which the sock is reached.
Given the sock_fd handle, the kernel provides interfaces such as send(), recv(), bind(), listen(), and connect(). These are the interfaces the socket layer offers.

So a socket is really a code library or interface layer that sits between the kernel and the application and provides a set of interfaces for using kernel functionality. In essence, it is a set of highly encapsulated interfaces.

Although the application code we write uses sockets to send and receive data packets, it is not the application but the Linux kernel that actually carries out the network communication.
In kernel space, the structure that implements network transmission is sock. Depending on the protocol and the use case, it is specialized into various xx_sock types, which work together with the hardware to implement network transmission. To expose this functionality to user-space applications, the socket layer was introduced and the sock was embedded in the file system framework. The sock becomes a special file, and the user operates the kernel sock's network transmission capability through a file handle in user space, namely socket_fd.

5. How does Socket implement network communication
Taking TCP, the most commonly used protocol, as an example, network transmission happens in two stages: establishing a connection and transferring data.

1. Establishing connection

On the client side, connect(sockfd, "ip:port") is called (shorthand here; the real call takes a socket address structure holding the IP and port). The kernel uses the sockfd handle to find the corresponding file, follows the information in the file to the kernel's sock structure, and starts the three-way handshake through that sock.
On the server side, a connection that has not yet completed the three-way handshake is called a half-open connection, and one that has completed it is called a full connection. They are stored in the half-open connection queue and the full connection queue, respectively. Both queues are created when listen() is called. When the server calls accept(), one full connection is taken off the full connection queue.

Although both are called queues, the half-open connection queue is actually a hash table, while the full connection queue is actually a linked list.
In the Linux 2.6 sources, this code lives in the network subsystem. Establishing a connection, for example, involves functions such as tcp_connect().
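The effect of the full connection queue can be observed from user space: clients can finish connecting before the server has called accept() even once. The following sketch assumes a loopback setup in a single process; the helper name accept_queue_demo and the MAX_CLIENTS limit are invented for the example.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <unistd.h>      /* close() */

#define MAX_CLIENTS 8    /* illustrative limit, also used as the backlog */

/* Hypothetical helper: connects nclients clients first, and only then
 * calls accept(). Returns how many completed connections accept()
 * handed back, or -1 on a setup error. */
int accept_queue_demo(int nclients)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int client_fd[MAX_CLIENTS];
    int listen_fd, i, connected = 0, accepted = 0;

    if (nclients < 1 || nclients > MAX_CLIENTS)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* kernel picks a free port */

    listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0)
        return -1;
    /* listen() is the call that sets up the queues; the second argument
     * bounds how many completed connections may wait for accept(). */
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listen_fd, MAX_CLIENTS) < 0 ||
        getsockname(listen_fd, (struct sockaddr *)&addr, &alen) < 0) {
        close(listen_fd);
        return -1;
    }

    /* The kernel completes each handshake by itself; accept() has not
     * been called yet, so the connections wait in the queue. */
    for (i = 0; i < nclients; i++) {
        client_fd[connected] = socket(AF_INET, SOCK_STREAM, 0);
        if (client_fd[connected] < 0)
            break;
        if (connect(client_fd[connected], (struct sockaddr *)&addr,
                    sizeof addr) < 0) {
            close(client_fd[connected]);
            break;
        }
        connected++;
    }

    /* Now drain the queue, one full connection per accept() call. */
    for (i = 0; i < connected; i++) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0)
            break;
        accepted++;
        close(conn_fd);
    }

    for (i = 0; i < connected; i++)
        close(client_fd[i]);
    close(listen_fd);
    return accepted;
}
```

Calling accept_queue_demo(3) connects three clients and then accepts them one by one, which shows that accept() does not perform the handshake; it only picks up connections the kernel has already completed.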

2. Data transmission
To send and receive data, the sock structure has a send buffer and a receive buffer. Each is actually a linked list that holds the data waiting to be sent or received.
When an application calls send(), the kernel finds the corresponding file through the sock_fd handle, follows it to the sock structure, locates the send buffer in that structure, places the data in the send buffer, and returns. The kernel then decides on its own when to actually transmit the data.

Receiving data works in a similar way. When data arrives at the Linux kernel, it is first placed in the receive buffer, where it waits for the application to call recv() to fetch it.
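This buffering can be made visible with a small experiment. The sketch below rests on assumptions: the helper names buffered_after_send and buffer_sizes are invented, and a local socketpair() is used in place of a TCP connection, so it only illustrates that data sits in a kernel buffer between send() and recv(), not TCP's transmission timing.

```c
#include <string.h>      /* strlen() */
#include <sys/socket.h>  /* socketpair(), send(), recv(), getsockopt() */
#include <unistd.h>      /* close() */

/* Hypothetical helper: returns how many bytes are waiting in the peer's
 * receive buffer right after send() has returned, or -1 on error. */
ssize_t buffered_after_send(const char *msg)
{
    int sv[2];
    char peek[128];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    /* send() returns once the kernel has queued the data; nobody has
     * called recv() on the other end yet. */
    if (send(sv[0], msg, strlen(msg), 0) >= 0) {
        /* MSG_PEEK inspects the queued data without removing it;
         * MSG_DONTWAIT makes the call fail rather than sleep if the
         * buffer turns out to be empty. */
        n = recv(sv[1], peek, sizeof peek, MSG_PEEK | MSG_DONTWAIT);
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}

/* Hypothetical helper: reads the sizes of a socket's send and receive
 * buffers. Returns 0 on success, -1 on error. */
int buffer_sizes(int fd, int *sndbuf, int *rcvbuf)
{
    socklen_t len = sizeof *sndbuf;

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, sndbuf, &len) < 0)
        return -1;
    len = sizeof *rcvbuf;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, &len) < 0)
        return -1;
    return 0;
}
```

SO_SNDBUF and SO_RCVBUF are the standard socket options for the two buffer sizes; the values reported vary from system to system.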
When the application calls recv() in blocking mode to fetch data from the receive buffer, it simply takes the data if there is any. If there is none, the process registers itself in the wait queue of this sock and goes to sleep. When data later arrives from the remote end and enters the receive buffer, the kernel takes the process off the sock's wait queue and wakes it up so that it can fetch the data.
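Both behaviours, failing immediately on an empty buffer in non-blocking mode and sleeping until data arrives in blocking mode, can be reproduced in user space. The following is a sketch under assumptions: the helper name blocking_recv_demo is invented, a forked child plays the remote end, and the 100 ms delay is arbitrary.

```c
#include <errno.h>       /* errno, EAGAIN, EWOULDBLOCK */
#include <string.h>      /* strlen() */
#include <sys/socket.h>  /* socketpair(), send(), recv() */
#include <sys/wait.h>    /* waitpid() */
#include <time.h>        /* nanosleep() */
#include <unistd.h>      /* fork(), close(), _exit() */

/* Hypothetical helper: the parent sleeps in recv() until a child process
 * sends msg. Returns the number of bytes received, -1 on a setup error,
 * or -2 if the empty-buffer probe did not behave as expected. */
ssize_t blocking_recv_demo(const char *msg, char *buf, size_t buflen)
{
    struct timespec delay = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    int sv[2];
    pid_t pid;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Non-blocking probe: the receive buffer is empty, so recv() fails
     * at once instead of putting the process to sleep. */
    n = recv(sv[1], buf, buflen, MSG_DONTWAIT);
    if (!(n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the "remote end". Wait a little, then send. */
        close(sv[1]);
        nanosleep(&delay, NULL);
        if (send(sv[0], msg, strlen(msg), 0) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: the buffer is still empty, so this blocking recv() sleeps
     * until the child's data arrives and the kernel wakes the process. */
    close(sv[0]);
    n = recv(sv[1], buf, buflen, 0);
    close(sv[1]);
    waitpid(pid, NULL, 0);
    return n;
}
```

The wait queue itself is not visible from user space; what the program observes is only its effect, namely that the blocking recv() returns as soon as the data shows up.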
When multiple processes listen on the same socket_fd after a fork, they all share one sock in the kernel. When these processes call accept(), each registers its process information in the wait queue of the kernel sock behind this socket_fd. Before Linux 2.6, every process in the wait queue was woken up when a connection arrived, yet only one of them ended up handling the connection request while the others went back to sleep, which wasted resources. This is known as the thundering herd problem. Since Linux 2.6, only one process in the wait queue is woken up, so the problem is fixed.
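The shared-listener setup can be sketched as follows. This is an illustration under assumptions, not a measurement of the wake-up behaviour: the helper name shared_listen_demo, the worker count, and the pauses are all made up, and the program can only show that one connection is handled by exactly one worker, not how many processes the kernel woke internally.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <signal.h>      /* kill(), SIGKILL */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <sys/wait.h>    /* waitpid() */
#include <time.h>        /* nanosleep() */
#include <unistd.h>      /* fork(), pipe(), read(), write(), close() */

#define NWORKERS 3       /* illustrative number of worker processes */

/* Hypothetical helper: forks NWORKERS workers that all block in accept()
 * on one inherited listening socket, then makes a single connection.
 * Returns how many workers reported handling it, or -1 on setup error. */
int shared_listen_demo(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timespec pause = { 0, 200 * 1000 * 1000 };  /* 200 ms */
    pid_t pids[NWORKERS];
    int report[2];
    int listen_fd, client_fd, i, spawned = 0, handled = 0;
    char c;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* kernel picks a free port */

    listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0)
        return -1;
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listen_fd, 8) < 0 ||
        getsockname(listen_fd, (struct sockaddr *)&addr, &alen) < 0 ||
        pipe(report) < 0) {
        close(listen_fd);
        return -1;
    }

    for (i = 0; i < NWORKERS; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Worker: same listening socket, hence the same kernel sock. */
            int conn_fd = accept(listen_fd, NULL, NULL);
            if (conn_fd >= 0 && write(report[1], "x", 1) != 1)
                _exit(1);
            _exit(0);
        }
        pids[spawned++] = pid;
    }
    close(report[1]);          /* only the workers keep the write end */
    nanosleep(&pause, NULL);   /* let the workers go to sleep in accept() */

    /* One incoming connection; wait for the first worker to report it. */
    client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (client_fd >= 0 &&
        connect(client_fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        read(report[0], &c, 1) == 1) {
        handled = 1;
        nanosleep(&pause, NULL);  /* leave time for any further report */
    }

    /* Stop the workers that are still asleep in accept(). */
    for (i = 0; i < spawned; i++) {
        kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
    }
    /* Count remaining reports; read() hits end-of-file once all workers
     * are gone. */
    while (read(report[0], &c, 1) == 1)
        handled++;

    if (client_fd >= 0)
        close(client_fd);
    close(report[0]);
    close(listen_fd);
    return handled;
}
```

With one connection and three sleeping workers, the function is expected to return 1: the other two workers never leave accept() and are terminated at the end.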
When the server is listening, how does it tell clients apart when so much data arrives at a single socket? Taking TCP as an example, after the server calls listen(), it waits for clients to send data. Every packet a client sends carries the source IP address and port as well as the destination IP address and port. These four elements form a four-tuple that uniquely identifies a client connection. For each new connection, the kernel creates a new sock, generates a hash key from the four-tuple, and stores the sock in a hash table. When the next packet arrives, the kernel computes the hash key from the packet's four-tuple and looks up the corresponding sock in this hash table.
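The lookup idea can be sketched with a small chained hash table. Everything here is a simplified assumption for illustration: the type names, the table size, and especially the hash function are made up and are not what the kernel uses.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>

#define TABLE_SIZE 64            /* illustrative bucket count */

/* The four elements that identify one connection. */
struct four_tuple {
    uint32_t saddr;              /* source IP address */
    uint16_t sport;              /* source port */
    uint32_t daddr;              /* destination IP address */
    uint16_t dport;              /* destination port */
};

/* Stand-in for a per-connection kernel sock. */
struct conn {
    struct four_tuple key;
    int sock_id;                 /* illustrative payload */
    struct conn *next;           /* chain for colliding entries */
};

struct conn_table {
    struct conn *bucket[TABLE_SIZE];
};

/* Toy hash: mixes the four fields into a bucket index. */
static unsigned tuple_hash(const struct four_tuple *t)
{
    uint32_t h = t->saddr;

    h = h * 31u + t->daddr;
    h = h * 31u + t->sport;
    h = h * 31u + t->dport;
    return h % TABLE_SIZE;
}

static int tuple_equal(const struct four_tuple *a, const struct four_tuple *b)
{
    return a->saddr == b->saddr && a->sport == b->sport &&
           a->daddr == b->daddr && a->dport == b->dport;
}

/* Called when a new connection is created. */
void conn_insert(struct conn_table *tbl, struct conn *c)
{
    unsigned i = tuple_hash(&c->key);

    c->next = tbl->bucket[i];
    tbl->bucket[i] = c;
}

/* Called for each incoming packet: find the sock for its four-tuple.
 * Returns NULL if no connection matches. */
struct conn *conn_lookup(const struct conn_table *tbl,
                         const struct four_tuple *key)
{
    struct conn *c = tbl->bucket[tuple_hash(key)];

    while (c != NULL && !tuple_equal(&c->key, key))
        c = c->next;
    return c;
}
```

Two clients on the same machine talking to the same server port differ only in their source port, which is enough to give them different four-tuples and therefore different entries in the table.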
6. How does Socket achieve "inheritance"?
The Linux kernel is written in C, which has no classes and no inheritance. It achieves the effect of "inheritance" by relying on the fact that the members of a structure are laid out contiguously in memory. The "parent" structure is placed as the first member of the "child" structure, so a pointer to the child can be cast to a pointer to the parent: the cast simply treats the first sizeof(parent) bytes as the parent structure. A pointer to the parent can likewise be cast back to the child, which gives an effect similar to "inheritance".
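A minimal sketch of the casting technique follows. The demo_ structures and the helper functions are hypothetical and much smaller than anything in the kernel; the downcast is similar in spirit to how kernel code converts a sock pointer into a tcp_sock pointer.

```c
#include <stdint.h>

/* "Parent": what generic code knows about. */
struct demo_sock {
    int protocol;
    int rcvbuf;
};

/* "Child": the parent MUST be the first member for the casts to work. */
struct demo_tcp_sock {
    struct demo_sock sk;
    uint32_t snd_cwnd;
};

/* Upcast: the first member has the same address as the whole structure. */
static struct demo_sock *to_sock(struct demo_tcp_sock *tp)
{
    return &tp->sk;
}

/* Downcast: valid only if sk really is embedded in a demo_tcp_sock. */
static struct demo_tcp_sock *to_tcp(struct demo_sock *sk)
{
    return (struct demo_tcp_sock *)sk;
}

/* Generic code: works on any "subclass" through the parent pointer. */
static void set_rcvbuf(struct demo_sock *sk, int bytes)
{
    sk->rcvbuf = bytes;
}

/* Protocol code: receives the parent pointer and recovers the child. */
static void grow_cwnd(struct demo_sock *sk)
{
    to_tcp(sk)->snd_cwnd += 1;
}
```

C guarantees that a pointer to a structure, suitably converted, points to its first member and vice versa, which is what makes both casts well defined. The downcast carries no runtime check, so the caller must already know which concrete type is behind the parent pointer.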


7. Summary
The Chinese term for socket can be understood as a set of numbers used for making a connection.

sock lives in the kernel, socket_fd lives in user space, and the socket layer sits between the kernel and user space.
In kernel space, the structure that implements network transmission is sock. Depending on the protocol and the use case, it is specialized into various xx_sock types, which work together with the hardware to implement network transmission. To expose this functionality to user-space applications, the socket layer was introduced and the sock was embedded in the file system framework. The sock becomes a special file, and the user operates the kernel sock's network transmission capability through a file handle in user space, namely socket_fd.
The server distinguishes multiple clients by their four-tuples.

The kernel achieves an inheritance-like effect by relying on the C property that "the members of a structure are contiguous in memory".