Linux network packet sending process

2023.09.08




Hello everyone, I am Xianyu.

Previously, Xianyu explained how Linux receives data packets from the network in the article "Linux Network Packet Receiving Process".

A brief review:

  • After data reaches the network card, the card writes it through DMA into a ring buffer allocated in memory and then raises a hard interrupt.
  • The CPU handles the hard interrupt briefly (allocating an sk_buff) and then raises a soft interrupt.
  • The soft interrupt thread ksoftirqd performs a series of operations (such as taking the data frame off the ring buffer) and then passes the data to the layer 3 (network layer) protocol stack.
  • Layer 3 processes the data further and passes it to the layer 4 (transport layer) protocol stack.
  • Layer 4 copies the data from the kernel to user space for the application to read.
  • Finally, the application reads the data at the application layer.

When Linux wants to send a data packet, how does the packet go from the application to the Linux kernel and finally be sent out by the network card?

So today Xianyu will introduce to you how Linux implements sending data packets over the network.

Packet sending process

Assume that our network card is already up (its RingBuffer allocated and initialized) and that the server and client have established a socket connection.

Note that the transmit RingBuffer the network card driver allocates during startup is really made up of two arrays:

  • igb_tx_buffer array: used by the kernel to store the description information of each data packet to be sent. It is allocated with vzalloc.
  • e1000_adv_tx_desc array: the descriptors read by the network card hardware. Each descriptor records where the data of a packet to be sent sits in memory. The array is allocated with dma_alloc_coherent, so the hardware can access it directly through DMA.

Each element of the igb_tx_buffer array holds a pointer to its e1000_adv_tx_desc entry.


In this way, the kernel can fill the e1000_adv_tx_desc array with descriptions of the data to be sent.


The network card hardware then reads those descriptors, fetches the actual data through DMA, and sends it to the network.
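The relationship between the two arrays can be sketched in a few lines of Python. This is a toy model, not driver code: the class names, the fake addresses, and the `memory` dictionary standing in for DMA-visible memory are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TxDesc:
    """Loosely models one e1000_adv_tx_desc entry (what the NIC reads)."""
    buffer_addr: int = 0   # stand-in for the DMA address of the packet data
    length: int = 0


@dataclass
class TxBuffer:
    """Loosely models one igb_tx_buffer entry (kernel-side bookkeeping)."""
    skb: Optional[bytes] = None
    desc: Optional[TxDesc] = None   # pointer to the matching descriptor


class TxRing:
    def __init__(self, size: int) -> None:
        self.size = size
        self.descs = [TxDesc() for _ in range(size)]
        self.buffers = [TxBuffer() for _ in range(size)]
        self.next_to_use = 0
        self.memory = {}   # hypothetical DMA-visible memory: address -> data

    def queue(self, skb: bytes) -> int:
        """Kernel side: record the skb and fill in its descriptor."""
        i = self.next_to_use
        if self.buffers[i].skb is not None:
            raise BufferError("ring full")
        addr = 0x1000 + i * 0x800          # made-up address for the sketch
        self.memory[addr] = skb
        desc = self.descs[i]
        desc.buffer_addr, desc.length = addr, len(skb)
        self.buffers[i] = TxBuffer(skb=skb, desc=desc)
        self.next_to_use = (i + 1) % self.size
        return i

    def nic_read(self, i: int) -> bytes:
        """Hardware side: follow the descriptor to the actual data."""
        desc = self.descs[i]
        return self.memory[desc.buffer_addr][:desc.length]
```

The point of the split is that the kernel-side entry keeps what the hardware does not need (the skb itself), while the descriptor holds only what the hardware must read.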

Copy to kernel

The socket system call copies data to the kernel

The application first enters the kernel through the system call interface that the socket provides. The send and sendto functions we use in user mode are both implemented by the sendto system call; send is simply a wrapper that is more convenient to call.
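From user space the two calls look like this. The sketch below uses Python's standard socket module over UDP on the loopback address; the helper name `send_both_ways` is made up for the example, and it assumes the loopback interface is available.

```python
import socket


def send_both_ways(payload):
    """Send the same payload once with sendto() and once with send()."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))    # port 0: let the kernel pick one
        receiver.settimeout(2.0)
        addr = receiver.getsockname()

        sender.sendto(payload, addr)       # destination given explicitly

        sender.connect(addr)               # remember the destination
        sender.send(payload)               # destination taken from the socket

        return [receiver.recv(2048), receiver.recv(2048)]
    finally:
        sender.close()
        receiver.close()
```

Both calls deliver the same datagram; the only difference visible to the caller is where the destination address comes from.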

Inside the sendto system call, the sockfd_lookup_light function first looks up the socket associated with the given file descriptor (fd).

It then calls the sock_sendmsg function (sock_sendmsg ==> __sock_sendmsg ==> __sock_sendmsg_nosec).

At the end of this chain, the sock->ops->sendmsg call actually runs the protocol stack function inet_sendmsg.

At this point, the kernel looks up the protocol-specific sending function on the socket.

Taking TCP as an example, the protocol-specific sending function is tcp_sendmsg.

tcp_sendmsg allocates an skb (sk_buff) in kernel memory and appends it to the send queue (a linked list of skbs).

It then copies the user's data into the skb, and the [Send] operation is triggered once the copy is done. Sending here means handing the data from the socket layer down to the transport layer in the current context.

Note that actual sending does not necessarily start at this point, because some conditions are checked first (for example, whether the data in the send queue has exceeded half of the window size).

The data is sent only when the conditions are met. If they are not, the system call may return directly.
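As a rough illustration of "queue first, send only if a condition holds", here is a minimal Python sketch. It models only the one condition named above (queued data exceeding half of the window); the class, its fields, and the byte counts are assumptions for the example, and the real tcp_sendmsg checks more than this.

```python
from collections import deque


class SendQueue:
    """Toy send queue: data is always queued, but only sometimes pushed."""

    def __init__(self, max_window):
        self.max_window = max_window
        self.queue = deque()     # stands in for the linked list of skbs
        self.write_seq = 0       # bytes copied into the queue so far
        self.pushed_seq = 0      # value of write_seq at the last push

    def sendmsg(self, data):
        """Queue the data; return True if a send should start now."""
        self.queue.append(data)
        self.write_seq += len(data)
        # The condition from the text: more than half a window is waiting.
        if self.write_seq - self.pushed_seq > self.max_window // 2:
            self.pushed_seq = self.write_seq
            return True
        return False             # the system call simply returns
```

With a window of 1000 bytes, a first 300-byte write is only queued, while a second 300-byte write brings the backlog to 600 bytes and triggers a send.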

Network protocol stack processing

Transport layer processing

The data then reaches the transport layer, where the main function to look at is tcp_write_xmit. It handles the transport layer's congestion control and sliding window work: based on factors such as the send window and the maximum segment size, it calculates how much data to send this time, then encapsulates that data into TCP segments and sends them out. If the window requirements are met, it sets the TCP header and passes the data down to the network layer for processing.
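The size calculation can be sketched as follows. This is a simplified model under stated assumptions: the function name and parameters are invented for the example, the congestion window is counted in segments and the send window in bytes, and none of the kernel's other checks are modeled.

```python
def plan_segments(pending, mss, cwnd_segments, in_flight_segments, send_window):
    """Return the sizes of the segments that may be sent right now.

    pending            -- bytes waiting in the send queue
    mss                -- maximum segment size in bytes
    cwnd_segments      -- congestion window, counted in segments
    in_flight_segments -- segments sent but not yet acknowledged
    send_window        -- bytes the receiver is still willing to accept
    """
    segments = []
    cwnd_left = cwnd_segments - in_flight_segments
    window_left = send_window
    while pending > 0 and cwnd_left > 0 and window_left > 0:
        size = min(pending, mss, window_left)
        segments.append(size)
        pending -= size
        window_left -= size
        cwnd_left -= 1
    return segments
```

For instance, with 5000 bytes pending, an MSS of 1460, and only two free slots in the congestion window, just two full segments go out and the rest stays queued.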

In the transport layer, the kernel mainly does two things:

(1) Make a copy of the data (skb)

Why make a copy? Because the skb is freed once the network card has sent it, but TCP supports retransmission of lost data, so an skb must be kept for retransmission until the other side's ACK arrives.

In fact, what is sent is a copy of the skb from the very start. The kernel frees the original skb only after receiving the ACK from the other side.
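The idea of "send a copy, keep the original until the ACK" can be modeled like this. It is a sketch only: the classes and the `users` counter are assumptions made for illustration and do not mirror kernel structures field by field.

```python
class SharedData:
    """Payload plus a count of the skbs that still refer to it."""

    def __init__(self, payload):
        self.payload = payload
        self.users = 1


class Skb:
    def __init__(self, data):
        self.data = data

    def clone(self):
        """Make a second skb that refers to the same payload."""
        self.data.users += 1
        return Skb(self.data)

    def free(self):
        """Drop this skb; return True when the payload can be released."""
        self.data.users -= 1
        return self.data.users == 0


class RetransmitQueue:
    def __init__(self):
        self.unacked = {}            # sequence number -> original skb

    def transmit(self, seq, skb):
        self.unacked[seq] = skb      # keep the original for retransmission
        return skb.clone()           # the copy travels down the stack

    def ack(self, seq):
        return self.unacked.pop(seq).free()
```

Freeing the copy after the network card has sent it leaves the original in place; only the ACK removes the last reference.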

(2) Add the TCP header

The kernel adds a TCP header, filled in according to the current state of the connection, turning the data into a TCP segment.

What you need to know here is that each skb has room for all the protocol headers, such as the MAC header, the IP header, and the TCP/UDP header. When setting these headers, the kernel fills in the corresponding fields by adjusting the position of a pointer instead of repeatedly allocating and copying memory.

For example, to set the TCP header, the kernel just points the pointer at the appropriate location in the skb. To set the IP header later, it just moves the pointer again.

This method takes advantage of the pointer-based layout of the skb data structure to avoid the overhead of memory allocation and data copying, which improves the efficiency of data transmission.
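A small Python model makes the pointer trick concrete. The class name, the 64-byte headroom, and the text placeholders used as "headers" are assumptions chosen for the example; the point is that the payload is written once and each header is placed in front of it by moving an offset.

```python
class PacketBuffer:
    """One buffer with spare room in front of the payload (headroom)."""

    def __init__(self, payload, headroom=64):
        self.buf = bytearray(headroom) + bytearray(payload)
        self.data = headroom             # offset of the first valid byte

    def push(self, header):
        """Prepend a header by moving the data offset backwards."""
        if len(header) > self.data:
            raise BufferError("not enough headroom")
        self.data -= len(header)
        self.buf[self.data:self.data + len(header)] = header

    def contents(self):
        return bytes(self.buf[self.data:])
```

Pushing a TCP placeholder, then an IP placeholder, then an Ethernet placeholder yields the headers in wire order in front of the payload, and the underlying buffer never changes size.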

Network layer processing

After the data leaves the transport layer, it comes to the network layer.

The network layer mainly does the following things:

(1) Routing entry lookup:

Look up the routing table by destination IP address to determine the next hop of the data packet (this happens in the ip_queue_xmit function).
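The lookup rule itself, longest prefix wins, can be shown with Python's ipaddress module. This is a sketch of the rule only: the linear scan, the function name, and the example routes are assumptions, and the kernel uses a far more efficient data structure.

```python
import ipaddress


def lookup_route(routes, dst):
    """Return (next_hop, device) for dst using longest-prefix match.

    routes is a list of (prefix, next_hop, device) tuples; a next_hop of
    None means the destination is directly reachable on that device.
    """
    dst_ip = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop, device in routes:
        net = ipaddress.ip_network(prefix)
        if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop, device)
    if best is None:
        raise LookupError("network unreachable")
    return best[1], best[2]


# Hypothetical routing table used only for this example.
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1", "eth0"),   # default route
    ("10.0.0.0/8", "10.0.0.1", "eth1"),
    ("10.1.0.0/16", None, "eth2"),          # directly connected
]
```

A destination of 10.1.2.3 matches all three entries, and the /16 wins because it is the most specific.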

(2) IP header settings:

Based on the results of the routing table lookup, set the source and destination IP addresses, TTL (time to live), IP protocol and other fields in the IP header.
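To make the fields concrete, here is a sketch that packs a minimal 20-byte IPv4 header with Python's struct module and fills in the header checksum. The function names, the default TTL of 64, and the choice to set the "don't fragment" flag are assumptions for the example, not what the kernel necessarily does for a given packet.

```python
import socket
import struct


def checksum(data):
    """16-bit one's-complement checksum used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src, dst, payload_len, ttl=64,
                      proto=socket.IPPROTO_TCP, ident=0):
    version_ihl = (4 << 4) | 5          # IPv4, header of 5 32-bit words
    total_len = 20 + payload_len
    flags_frag = 0x4000                 # "don't fragment", offset 0
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_len, ident, flags_frag,
        ttl, proto, 0,                  # checksum field is zero for now
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    csum = checksum(header)
    return header[:10] + struct.pack("!H", csum) + header[12:]
```

A handy property for checking the result: running `checksum` over a finished header, checksum field included, gives 0.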

(3) netfilter filtering:

Netfilter is a framework in the Linux kernel that is used to filter and modify data packets.

At the network layer, netfilter can be used to filter packets, perform NAT (Network Address Translation), and so on.

(4) skb fragmentation:

If a data packet is larger than the MTU (Maximum Transmission Unit), it must be split into multiple fragments so that it fits the network, and each fragment is carried in its own skb.
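The arithmetic behind fragmentation is short enough to write out. The sketch below assumes a 20-byte IP header without options; the function name is made up. In IPv4 the fragment offset is expressed in units of 8 bytes, so every fragment except the last must carry a multiple of 8 bytes.

```python
def fragment(payload_len, mtu, ip_header_len=20):
    """Split an IP payload into fragments that fit the MTU.

    Returns a list of (offset_in_8_byte_units, size, more_fragments).
    """
    max_data = (mtu - ip_header_len) // 8 * 8   # round down to 8 bytes
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments
```

With an MTU of 1500, a 4000-byte payload becomes three fragments of 1480, 1480, and 1040 bytes at offsets 0, 185, and 370.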

Data link layer processing

When the data reaches the data link layer, two subsystems will work together to ensure that the data can be correctly encapsulated, parsed, and transmitted during the sending and receiving process of the data packet.

(1) Neighbor subsystem

The neighbor subsystem manages and maintains the neighbor relationships between this host or router and other devices. It sends ARP requests to find neighbors and stores what it learns, including the MAC address of the target host, in the neighbor cache table.

When a data packet needs to be sent to a target host, the data link layer will first query the neighbor cache table to obtain the MAC address of the target host, thereby correctly encapsulating the data packet (encapsulating the MAC header).
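The cache-then-ask behaviour can be sketched as a small class. Everything here is an illustrative assumption: the class and method names, the callback used to "send" an ARP request, and the choice to hold packets in a pending list until the reply arrives.

```python
class NeighborCache:
    """Toy neighbor table: next-hop IP -> MAC, with a queue for misses."""

    def __init__(self, send_arp_request):
        self.table = {}       # next-hop IP -> MAC address
        self.pending = {}     # next-hop IP -> packets waiting for a MAC
        self.send_arp_request = send_arp_request

    def output(self, next_hop, packet):
        """Return the (mac, packet) frames that can be sent right away."""
        mac = self.table.get(next_hop)
        if mac is not None:
            return [(mac, packet)]
        first_miss = next_hop not in self.pending
        self.pending.setdefault(next_hop, []).append(packet)
        if first_miss:
            self.send_arp_request(next_hop)   # ask only once per neighbor
        return []

    def arp_reply(self, ip, mac):
        """Record the answer and release any packets that were waiting."""
        self.table[ip] = mac
        return [(mac, packet) for packet in self.pending.pop(ip, [])]
```

The first packet to an unknown neighbor triggers one request and waits; once the reply is recorded, later packets are encapsulated immediately.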

(2) Network device subsystem

The network device subsystem is responsible for handling operations related to the physical network interface, including the encapsulation and sending of data packets, as well as receiving and parsing data packets from the physical interface.

The network device subsystem not only handles the format conversion of data packets, such as adding Ethernet frame headers and trailers and extracting data from frames; it is also responsible for hardware-related operations, such as clock synchronization for sending and receiving packets and physical-layer error detection.

(3) Arrive at the network card sending queue

The network device subsystem then selects a suitable send queue on the network card and adds the skb to it (bypassing the soft interrupt handler). The kernel then calls dev_hard_start_xmit, which invokes the network card driver's transmit entry function to start sending the packet.

In some cases, the network device subsystem instead adds the skb to the soft interrupt queue (softnet_data) and raises a soft interrupt (NET_TX_SOFTIRQ), handing the skb to the soft interrupt handler, which then does the actual sending. This is one of the reasons why NET_RX is generally much larger than NET_TX when you look at /proc/softirqs on a server.

That is, receiving packets always goes through the NET_RX soft interrupt, whereas for sending packets the NET_TX soft interrupt is triggered only under certain circumstances.
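To compare the two counters on your own machine, the per-CPU columns of /proc/softirqs can be summed with a few lines of Python. The function name is made up, and the numbers in SAMPLE are invented purely to show the file's layout.

```python
def softirq_totals(text):
    """Sum the per-CPU counters on each line of /proc/softirqs."""
    totals = {}
    for line in text.splitlines()[1:]:      # the first line lists the CPUs
        name, _, counts = line.partition(":")
        if counts.strip():
            totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


# Invented sample in the same layout as /proc/softirqs.
SAMPLE = """\
                    CPU0       CPU1
          HI:          0          1
      NET_TX:         12          8
      NET_RX:     150000     210000
"""

# On a Linux machine:
#   with open("/proc/softirqs") as f:
#       totals = softirq_totals(f.read())
#   print(totals["NET_TX"], totals["NET_RX"])
```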

Data sending

The driver takes the description information of the skb from the send queue and attaches it to the RingBuffer (the igb_tx_buffer array mentioned above).

It then maps the skb into a DMA memory area that the network card can access and records the mapping in the e1000_adv_tx_desc array mentioned above.

Following that description information, the network card reads the actual data through DMA and sends it to the network. This completes the sending of the data packet.

Finishing touches

When the data has been sent, the network card raises a hard interrupt, usually called the "transmission completion interrupt" or "transmission queue cleaning interrupt". The soft interrupt it leads to is NET_RX_SOFTIRQ.

The main job of this interrupt is to clean up after sending, which includes releasing the memory allocated for the data packet earlier, that is, the skb memory and the RingBuffer entries.
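The cleanup step can be pictured as walking the ring from the oldest unfinished slot and freeing each entry the hardware has finished with. The sketch below is a toy model: the class name, the `done_count` argument standing in for what the hardware reports, and the index names are assumptions for illustration.

```python
class TxCompletionRing:
    """Toy transmit ring that tracks which slots still hold an skb."""

    def __init__(self, size):
        self.size = size
        self.buffers = [None] * size   # one skb (or None) per slot
        self.next_to_use = 0           # where the next skb will be placed
        self.next_to_clean = 0         # oldest slot not yet cleaned

    def queue(self, skb):
        i = self.next_to_use
        if self.buffers[i] is not None:
            raise BufferError("ring full")
        self.buffers[i] = skb
        self.next_to_use = (i + 1) % self.size

    def clean(self, done_count):
        """Free up to done_count completed slots; return the freed skbs."""
        freed = []
        for _ in range(done_count):
            i = self.next_to_clean
            if self.buffers[i] is None:    # nothing left to clean
                break
            freed.append(self.buffers[i])
            self.buffers[i] = None         # slot can be reused
            self.next_to_clean = (i + 1) % self.size
        return freed
```

Once a slot is cleared it becomes available to `queue` again, which is what lets a fixed-size ring carry an unbounded stream of packets.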

Finally, when the ACK for this TCP segment is received, the transport layer frees the original skb (as mentioned earlier, what is actually sent is a copy of the skb).

As you can see, when the data has been sent, the driver is notified through a hard interrupt, and the soft interrupt raised is of type NET_RX_SOFTIRQ.

We mentioned earlier that when the network card receives a network packet, NET_RX_SOFTIRQ is raised to tell the CPU that there is data to process. In other words, whether the network card has received a packet or finished sending one, the soft interrupt raised is NET_RX_SOFTIRQ.

Summary

Finally, let’s summarize the process of sending network data packets in Linux systems:


(1) The application makes a system call through the interface provided by the socket, and the data is copied from user space into the socket buffer in kernel space.

(2) The network protocol stack takes the data from the socket buffer and processes it layer by layer, from top to bottom, following the TCP/IP protocol stack.

  • Transport layer processing: taking TCP as an example, the transport layer makes a copy of the data (for retransmission on loss) and then adds the TCP header.
  • Network layer processing: selecting a route (confirming the IP of the next hop), filling in the IP header, netfilter filtering, fragmenting packets that exceed the MTU, and so on.
  • Neighbor subsystem and network device subsystem processing: here the data is further processed and encapsulated, and then added to the send queue of the network card.

(3) The driver takes the description information of the skb from the send queue and attaches it to the RingBuffer, and then maps the skb into a DMA memory area that the network card can access.

(4) The network card sends the data to the network.

(5) When the data has been sent, a hard interrupt is triggered and the skb memory and RingBuffer entries are released.