Practice: A project of adding hundreds of thousands of APs,
The case shared in this issue concerns a wireless network problem.
1. Background of the problem
A company works in smart agricultural technology. It recently took on a network expansion project for a government-owned enterprise. The planting sheds are being expanded, and "automatic sprinklers", "data acquisition devices" and other 2.4 GHz wireless sensor equipment need to be added, so the wireless network has to grow as well. Wireless APs were therefore added in the expansion area, but of a different brand: the original APs are H3C, while the new APs come from a certain brand J.
1. The simplified topology is as follows:
2. After the new AP is installed, a problem is found:
Field sensors are upgraded by downloading a file from the internal system over HTTP. The HTTP server is fixed at 172.16.1.208. However, the sensors in the newly added area cannot be upgraded concurrently (several units at once), while the sensors in the original area can. As follows:
These are the same model of sensor in the same project, so why do they fail behind the newly added J AP but work normally behind the original H3C AP? Some friends said that the J AP is rubbish. Well, before drawing a conclusion, let's diagnose it first.
2. Troubleshooting ideas
Connect a mobile phone or laptop to the J AP's 2.4 GHz wireless network and run a long ping to the server to check for packet loss and delay;
Connect several mobile phones or laptops to the J AP's 2.4 GHz wireless network and download files from the server at the same time to see whether it works normally;
Confirm whether the channel, bandwidth, wireless mode, encryption method, etc. of the newly added AP are consistent with the original H3C AP;
Check the on-site interference, and whether poor anti-interference ability of the newly added AP makes sessions drop frequently while they are being established;
Capture packets to confirm the interaction of data flows and see if there is any available information.
3. Basic analysis
Step 1: Confirm the connectivity of the wireless network
A mobile phone or laptop was connected to the J AP's 2.4 GHz wireless network and a long ping was run against the server. There was basically no delay or packet loss, so in general the wireless and wired links should be fine.
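The long ping itself is just the system `ping` left running. Where ICMP is inconvenient, a rough substitute is to time repeated TCP connections to the server and summarize loss and latency. The sketch below is only an illustration: the function names are made up here, and a TCP connect is not the same measurement as an ICMP echo.

```python
import socket
import statistics
import time


def tcp_probe(host, port, timeout=1.0):
    """Return the TCP connect time in milliseconds, or None if it failed."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None  # counted as a lost probe


def summarize(samples):
    """Compute loss percentage and latency statistics from probe results."""
    ok = [s for s in samples if s is not None]
    loss = 100.0 * (len(samples) - len(ok)) / len(samples) if samples else 0.0
    return {
        "sent": len(samples),
        "loss_pct": loss,
        "avg_ms": statistics.mean(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


def long_probe(host, port, count=100, interval=1.0):
    """Probe the server `count` times, pausing `interval` seconds between probes."""
    samples = []
    for _ in range(count):
        samples.append(tcp_probe(host, port))
        time.sleep(interval)
    return summarize(samples)
```

Calling `long_probe("172.16.1.208", 80)` from a laptop on the J AP would give a loss percentage and average latency to compare with the same run on the H3C AP. Port 80 is an assumption, since the article does not state the server port.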
Step 2: Compare simultaneous HTTP downloads from multiple phones and laptops
Downloading a file over HTTP is simple: open a browser on the computer and enter the server URL. Because the sensors have trouble downloading, we tried downloading from several laptops at the same time for comparison:
The result shows that only one PC can download successfully, and the other PCs cannot establish a connection and download the file. However, there is no such problem when connected to the original H3C AP wireless network. The next step is to compare the wireless settings of the two APs.
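A browser hides most of what the server actually answers. A small script run on each laptop at the same moment records the outcome of every attempt more precisely. This is a minimal sketch under assumptions: the file path in the URL is hypothetical (the article only gives the server address), and the helper names are invented for illustration.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical file path; the article only gives the server address.
DEFAULT_URL = "http://172.16.1.208/upgrade.bin"


def download_once(url=DEFAULT_URL, timeout=15.0):
    """Attempt one download; return (outcome, bytes_received)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        return exc.code, 0
    except (OSError, http.client.HTTPException) as exc:
        # No usable HTTP answer at all (refused, reset, timed out, ...).
        return f"error: {exc.__class__.__name__}", 0


def describe(outcome, size):
    """Turn a download outcome into a one-line, human-readable result."""
    if outcome == 200:
        return f"OK, {size} bytes downloaded"
    if isinstance(outcome, int):
        return f"server answered with HTTP {outcome}"
    return f"no HTTP response ({outcome})"
```

Running `print(describe(*download_once()))` on every laptop at once shows whether a failed client was refused by the server with a status code or never got an HTTP answer at all, which helps separate link problems from server behavior.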
Step 3: Check the differences in the settings of the newly added AP and H3C AP
Wireless configuration is nothing more than a few items: SSID, encryption method, wireless mode, channel, bandwidth, security protection (isolation, bandwidth control, etc.), roaming, etc., so the comparison is as follows:
SSID: All names are the same
Encryption method: All are WPA2-PSK
Wireless mode: Both are 802.11b/g/n/ax (the APs default to Wi-Fi 6), but the sensors do not support Wi-Fi 6 and negotiate b/g/n, which is also the same on both
Channel: All work on channel 1
Bandwidth: All are configured with 20 MHz bandwidth
Security protection: Not configured
Roaming settings: On-site sensors are fixed and do not involve roaming
There is no difference. The next step is to check the on-site interference, and whether poor anti-interference ability of the newly added AP makes session establishment fail and connections drop frequently.
Step 4: Confirm wireless interference
Since this is 2.4 GHz wireless access, co-channel contention was the first suspect. On site we could clearly see strong co-channel interference: many SSIDs sit on channel 1, where both the newly added AP and the H3C AP operate, so multi-party contention seemed likely:
However, the problem persisted after changing the channel. The next step is to capture the abnormal traffic and look for clues.
Step 5: Capture the abnormal traffic to look for useful information
Because the problem occurs in the newly added area, we configured a monitoring (mirror) port on the "secondary router" of that area and captured the traffic between an abnormal sensor and the server:
Based on the data packet analysis, we found that:
In the abnormal flow, the TCP handshake succeeds, but when the terminal requests the download it receives an HTTP 429 (Too Many Requests) response from the server and then actively closes the connection with FIN. There is no packet loss, so the wireless and wired links can be judged to be working normally. For comparison, in the normal flow the server never replies with HTTP 429:
OK, it is almost certain that the server's HTTP 429 response is what makes the download fail. Since HTTP is plain text, let's look at its content.
Step 6: Confirm the relevant content of the HTTP 429 packet
The content of the HTTP 429 packet is as follows:
Translated: "The number of simultaneous downloads on this server is limited, and the limit has been reached. Try again later." In other words, the session limit is imposed by the server, not by the network.
So here comes the problem:
The H3C APs also serve multiple terminals, so why does the server accept those but reject the terminals behind the newly added APs? A full on-site survey showed that the H3C APs and the newly added APs sit at different positions in the network! The newly added APs are behind a secondary router and are NATed, while the H3C APs are not! As follows:
So it is reasonable to suspect that the HTTP server identifies a terminal's session by source IP only. After NAT, the server sees a single source IP for all the sensors, so only one download is admitted at a time. The following comparison test was therefore conducted:
Server-Layer 2 network-H3C AP ))(( Multiple sensors: concurrent download of upgrade files is normal
Server-Layer 2 network-J AP ))(( Multiple sensors: concurrent download of upgrade files is normal
Server-Layer 2 network-(WAN) router (LAN)-H3C AP ))(( Multiple sensors: concurrent download fails, only one sensor can download
Server-Layer 2 network-(WAN) router (LAN)-J AP ))(( Multiple sensors: concurrent download fails, only one sensor can download
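The four results can be reproduced with a toy model. The server's real implementation is unknown; the sketch below simply assumes a limit of one concurrent download per source IP, which is consistent with the test results, and models source NAT as a function that rewrites every LAN address to the router's WAN address. All class names, function names, and IP addresses other than the idea of a per-IP limit are hypothetical.

```python
from collections import Counter

# Assumption: one concurrent download per source IP. The article only
# shows that a limit exists, not its exact value or implementation.
LIMIT_PER_IP = 1


class DownloadServer:
    """Toy HTTP server that caps concurrent downloads per source IP."""

    def __init__(self, limit_per_ip=LIMIT_PER_IP):
        self.limit_per_ip = limit_per_ip
        self.active = Counter()  # source IP -> downloads in progress

    def request(self, src_ip):
        """Return 200 and start a download, or 429 if the IP is at its limit."""
        if self.active[src_ip] >= self.limit_per_ip:
            return 429
        self.active[src_ip] += 1
        return 200


def no_nat(ip):
    """Layer 2 path: the server sees each sensor's own address."""
    return ip


def source_nat(wan_ip):
    """Routed path: every LAN address is rewritten to the router's WAN IP."""
    return lambda lan_ip: wan_ip


def upgrade_all(sensor_ips, translate):
    """All sensors request the file at once; return the status each one gets."""
    server = DownloadServer()
    return [server.request(translate(ip)) for ip in sensor_ips]


# Hypothetical addresses, for illustration only.
layer2_sensors = ["172.16.1.11", "172.16.1.12", "172.16.1.13"]
nat_sensors = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]

print("Layer 2:   ", upgrade_all(layer2_sensors, no_nat))
print("Behind NAT:", upgrade_all(nat_sensors, source_nat("172.16.1.50")))
```

In the Layer 2 case every sensor gets 200. Behind NAT only the first request gets 200 and the rest get 429, regardless of which AP brand carries the traffic.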
4. Problem Summary and Solution
Problem Summary:
The root cause is that the HTTP server limits sessions per source IP. The newly added J APs sit behind NAT and the H3C APs do not, which is why the two behaved differently. Had we looked at the whole topology from the start, it would have been a very simple problem;
Solution:
Advise the user to use Layer 2 networking, so that the terminals communicate directly with the server without passing through NAT;
Or turn off the session limit on the HTTP server, so that multiple sessions from the same source IP can be established and distinguished by port.