Exploring the impact of network latency on database transactions
1. Background overview
Recently I ran a data synchronization test that used DTS to synchronize data from Kafka into a database. Synchronizing about 4 GB of data took roughly 4 hours, which did not seem reasonable. CPU and I/O utilization on the database host were low, so neither was the bottleneck. The investigation finally showed that Kafka, DTS, and the database were no longer in the same data center, and the resulting high network latency was slowing synchronization down.
After Kafka, DTS, and the database were deployed in the same data center, synchronization became dramatically faster: the same job finished in 15 minutes, about 16 times faster than the 4 hours it took before.
2. Reproducing the problem
This test uses sysbench to load data and run a stress test under different network latencies, in order to compare how network latency affects database transactions.
2.1 Check the current network delay
$ ping 192.168.137.162
PING 192.168.137.162 (192.168.137.162) 56(84) bytes of data.
64 bytes from 192.168.137.162: icmp_seq=1 ttl=64 time=0.299 ms
64 bytes from 192.168.137.162: icmp_seq=2 ttl=64 time=0.180 ms
64 bytes from 192.168.137.162: icmp_seq=3 ttl=64 time=0.297 ms
64 bytes from 192.168.137.162: icmp_seq=4 ttl=64 time=0.329 ms
64 bytes from 192.168.137.162: icmp_seq=5 ttl=64 time=0.263 ms
64 bytes from 192.168.137.162: icmp_seq=6 ttl=64 time=0.367 ms
64 bytes from 192.168.137.162: icmp_seq=7 ttl=64 time=0.237 ms
64 bytes from 192.168.137.162: icmp_seq=8 ttl=64 time=0.160 ms
64 bytes from 192.168.137.162: icmp_seq=9 ttl=64 time=0.180 ms
64 bytes from 192.168.137.162: icmp_seq=10 ttl=64 time=0.257 ms
The two hosts are currently in the same data center, and the round-trip latency is about 0.3 ms (the ten samples above average about 0.26 ms).
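ping normally ends with an "rtt min/avg/max/mdev" summary line, but that line is missing from the truncated output above. As a minimal sketch, the helper below averages the time= values from whatever ping output it receives. The function name avg_rtt is hypothetical, and it assumes the iputils ping output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: average the "time=<value> ms" fields of ping output read on stdin.
# Assumes the iputils ping format shown above; lines without "time=" are ignored.
avg_rtt() {
  awk -F'time=' '/time=/ {
      split($2, parts, " ")   # parts[1] is the numeric RTT in ms
      sum += parts[1]
      n++
    }
    END {
      if (n > 0) printf "%.3f\n", sum / n
    }'
}

# Example (requires network access to the database host):
#   ping -c 10 192.168.137.162 | avg_rtt
```

For the ten samples listed above it prints 0.257.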
2.2 Writing data with sysbench (normal latency)
2.2.1 Create a table and write 5 million rows
$ time sysbench lua/oltp_read_write.lua --mysql-db=sysbench --mysql-host=192.168.137.162 --mysql-port=3307 --mysql-user=root --mysql-password=greatdb --tables=1 --table_size=5000000 --report-interval=2 --threads=10 --time=600 --mysql-ignore-errors=all prepare
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
Initializing worker threads...
Creating table 'sbtest1'...
Inserting 5000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
real    1m56.459s
user    0m7.187s
sys     0m0.400s
Writing 5 million rows takes 1m56s.
2.2.2 sysbench stress test for 5 minutes
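The run command for this stage is not shown. The statistics below are consistent with 50 client threads (122241 events / 2444.82 events per thread = 50) and a 300-second run. Assuming the same connection options as the prepare step, a plausible reconstruction is sketched here. The wrapper name run_oltp and the SYSBENCH override variable are hypothetical. The --threads, --time, and --report-interval options and the run command are standard sysbench usage.

```shell
#!/bin/sh
# Hypothetical wrapper around the oltp_read_write workload.
# Thread count and duration are inferred from the statistics, not taken from the source.
# SYSBENCH can be overridden (for example with "echo") to print the command instead of running it.
SYSBENCH="${SYSBENCH:-sysbench}"

run_oltp() {
  threads="${1:-50}"    # concurrent client threads
  seconds="${2:-300}"   # run duration in seconds
  "$SYSBENCH" lua/oltp_read_write.lua \
    --mysql-db=sysbench --mysql-host=192.168.137.162 --mysql-port=3307 \
    --mysql-user=root --mysql-password=greatdb \
    --tables=1 --table_size=5000000 \
    --report-interval=2 --threads="$threads" --time="$seconds" \
    --mysql-ignore-errors=all run
}

# Example: run_oltp 50 300
```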
SQL statistics:
queries performed:
read: 1711374
write: 488964
other: 244482
total: 2444820
transactions: 122241 (407.37 per sec.)
queries: 2444820 (8147.45 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
Throughput:
events/s (eps): 407.3725
time elapsed: 300.0718s
total number of events: 122241
Latency (ms):
min: 10.68
avg: 122.72
max: 1267.88
95th percentile: 502.20
sum: 15000894.94
Threads fairness:
events (avg/stddev): 2444.8200/14.99
execution time (avg/stddev): 300.0179/0.02
Result: TPS is 407.37 and QPS is 8147.45.
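These figures are internally consistent. Each oltp_read_write transaction here issues 20 statements (2444820 / 122241 = 20: 14 reads, 4 writes, and 2 others, the transaction's BEGIN and COMMIT). In a closed-loop benchmark, each thread starts a new transaction as soon as the previous one finishes. Little's law therefore gives throughput of roughly threads divided by average latency. The helper below is a hypothetical sketch of that estimate. It assumes 50 client threads, as implied by the thread fairness figures.

```shell
#!/bin/sh
# Hypothetical helper: estimate TPS with Little's law, TPS ~= concurrency / average latency.
# Arguments: number of client threads, average transaction latency in milliseconds.
estimate_tps() {
  awk -v threads="$1" -v avg_ms="$2" \
    'BEGIN { printf "%.1f\n", threads / (avg_ms / 1000) }'
}

# Example with the figures above: 50 threads, 122.72 ms average latency
#   estimate_tps 50 122.72    # prints 407.4, close to the measured 407.37
```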
2.3 Simulate network latency with the tc command
tc is the Linux traffic control tool. It configures how packets are queued and sent on a network interface, and can be used to limit bandwidth, add delay or packet loss (through the netem queueing discipline), and implement QoS (Quality of Service).
# Add a 10 ms delay to outgoing traffic on the ens3 interface
tc qdisc add dev ens3 root netem delay 10ms
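A netem qdisc stays in place until it is removed. Because it is attached to the root of ens3, it delays all outgoing traffic on that interface, including SSH sessions. netem delays only packets leaving the host it is configured on. If it is applied on just one of the two hosts, ping therefore reports roughly 10 ms more round-trip time, not 20 ms.

A minimal sketch of helpers for adding, inspecting, changing, and removing the delay follows. The function names and the TC override variable are hypothetical. The tc qdisc add/change/show/del subcommands are standard iproute2 usage, and running them for real requires root.

```shell
#!/bin/sh
# Hypothetical helpers around tc/netem (iproute2). Running them for real requires root.
# TC can be overridden (for example with "echo") to print commands instead of executing them.
TC="${TC:-tc}"

add_delay()    { "$TC" qdisc add dev "$1" root netem delay "$2"; }     # e.g. add_delay ens3 10ms
change_delay() { "$TC" qdisc change dev "$1" root netem delay "$2"; }  # adjust an existing netem qdisc
show_qdisc()   { "$TC" qdisc show dev "$1"; }                          # confirm netem is attached
clear_delay()  { "$TC" qdisc del dev "$1" root; }                      # restore the default qdisc

# Typical sequence for this test:
#   add_delay ens3 10ms
#   show_qdisc ens3
#   ... run sysbench ...
#   clear_delay ens3
```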
If the tc command fails with the following error, install the extra kernel modules package and reboot the host.
# Error
$ tc qdisc add dev ens3 root netem delay 10ms
Error: Specified qdisc not found.
# Install the extra kernel modules
$ yum install kernel-modules-extra*
# Reboot the host
$ reboot
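The error means that the netem queueing discipline (kernel module sch_netem) cannot be found. On RHEL/CentOS-family systems, that module is typically shipped in kernel-modules-extra.

A hedged sketch for checking availability before and after the install follows. The function name netem_available and the MODINFO override variable are hypothetical. modinfo looks up modules for the running kernel and exits non-zero when the module is not found.

```shell
#!/bin/sh
# Hypothetical check: is the netem qdisc module (sch_netem) installed for the running kernel?
# MODINFO can be overridden for testing; modinfo exits non-zero if the module is not found.
MODINFO="${MODINFO:-modinfo}"

netem_available() {
  "$MODINFO" sch_netem >/dev/null 2>&1
}

# Example:
#   netem_available && echo "sch_netem found" || echo "install kernel-modules-extra, then reboot"
```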
2.4 Check the network delay again
$ ping 192.168.137.162
PING 192.168.137.162 (192.168.137.162) 56(84) bytes of data.
64 bytes from 192.168.137.162: icmp_seq=1 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=2 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=3 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=4 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=5 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=6 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=7 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=8 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=9 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=10 ttl=64 time=10.2 ms
The round-trip latency has risen to about 10.4 ms, roughly 10 ms more than before.
2.5 Writing data with sysbench (10 ms delay)
2.5.1 Create a table and write 5 million rows
$ time sysbench lua/oltp_read_write.lua --mysql-db=sysbench --mysql-host=192.168.137.162 --mysql-port=3307 --mysql-user=root --mysql-password=greatdb --tables=1 --table_size=5000000 --report-interval=2 --threads=10 --time=600 --mysql-ignore-errors=all prepare
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
Initializing worker threads...
Creating table 'sbtest1'...
Inserting 5000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
real    2m11.656s
user    0m7.314s
sys     0m0.470s
Writing 5 million rows takes 2m11s.
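Compared with the 1m56s baseline, the extra 10 ms of delay makes the load only about 13% slower:

```latex
\frac{t_{10\,\text{ms}}}{t_{\text{normal}}} = \frac{131.656\ \text{s}}{116.459\ \text{s}} \approx 1.13
```

A plausible explanation, not verified here, is that sysbench prepare inserts rows with large multi-row INSERT statements. The load then needs relatively few round trips, so the per-round-trip delay adds little in total.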
2.5.2 sysbench stress test for 5 minutes
SQL statistics:
queries performed:
read: 788214
write: 225204
other: 112602
total: 1126020
transactions: 56301 (187.41 per sec.)
queries: 1126020 (3748.16 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
Throughput:
events/s (eps): 187.4079
time elapsed: 300.4196s
total number of events: 56301
Latency (ms):
min: 210.14
avg: 266.68
max: 493.91
95th percentile: 419.45
sum: 15014235.80
Threads fairness:
events (avg/stddev): 1126.0200/1.16
execution time (avg/stddev): 300.2847/0.16
Result: TPS is 187.41 and QPS is 3748.16.
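Round trips can account for almost all of this drop. Each transaction issues its statements one after another (1126020 queries / 56301 transactions = 20 per transaction), and each statement waits for about one extra 10 ms round trip. The minimum transaction latency should therefore be around 200 ms, and the measured minimum is 210.14 ms. With 50 concurrent threads (inferred from the thread fairness figures), Little's law then predicts the observed throughput:

```latex
\begin{aligned}
L_{\min} &\approx 20 \times 10\ \text{ms} = 200\ \text{ms} \quad (\text{measured: } 210.14\ \text{ms}) \\
\text{TPS} &\approx \frac{N_{\text{threads}}}{L_{\text{avg}}} = \frac{50}{0.26668\ \text{s}} \approx 187.5 \quad (\text{measured: } 187.41)
\end{aligned}
```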
3. Summary
The tests show that high network latency has a large impact on both data loading and the number of transactions executed per second. With an extra 10 ms of delay, TPS dropped from 407.37 to 187.41 (about 54% lower), and loading 5 million rows took about 13% longer. For performance testing or data synchronization, deploy the stress-testing or synchronization tool in the same data center as the database wherever possible, so that network latency does not distort the results.