Failed Cutover Causes 3G/4G/5G Network Outage

2021.10.18

Today's mobile network is infrastructure on the same level as water and electricity. Especially in the 5G era, when it is applied to the industrial Internet, coal mines, hospitals and the like, nothing about the network is a small matter.

 


 

An equipment cutover at the Japanese operator NTT DoCoMo caused a large-scale failure across the country's entire network. It drew strong dissatisfaction from a large number of users, and even Japan's Minister of Internal Affairs and Communications had to step in to respond and explain.

 

Reportedly, this was meant to be a simple upgrade and replacement. The equipment being replaced was a server that stores user and location information for IoT terminal devices. While the location information of about 200,000 IoT terminals was being migrated from the old equipment to the new, something went wrong.


So the operator started a rollback to the old equipment. This rollback turned out to be the crux of the problem: it triggered a large number of IoT terminals to re-initiate location registration with the old server. The resulting "signaling storm" quickly congested the network and left the 3G/4G/5G core network "paralyzed".
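To see why a synchronized re-registration is so damaging, the minimal sketch below compares the busiest second when every terminal registers at once with the busiest second when each terminal first waits a random delay. It is only an illustration. The function name, the 600-second spreading window and the uniform random delay are assumptions, not a description of DoCoMo's network or of any standardized procedure. The figure of 200,000 terminals is borrowed from the count mentioned above.

```python
import random
from collections import Counter


def peak_registrations_per_second(n_terminals: int, spread_seconds: int, seed: int = 0) -> int:
    """Return the load in the busiest one-second bucket.

    Hypothetical model: each terminal sends exactly one registration request
    after a uniformly random delay in [0, spread_seconds).
    spread_seconds == 0 models every terminal re-registering immediately.
    """
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(n_terminals):
        delay = rng.randrange(spread_seconds) if spread_seconds > 0 else 0
        buckets[delay] += 1
    return max(buckets.values())


# All terminals at once versus spread over an assumed 10-minute window.
burst_peak = peak_registrations_per_second(200_000, 0)
spread_peak = peak_registrations_per_second(200_000, 600)
print(burst_peak, spread_peak)
```

With no spreading, the peak equals the full terminal count. With spreading, the peak drops to roughly the terminal count divided by the window length, which is why staggered or rate-limited re-registration is a common way to soften this kind of storm.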



What is hard to understand is that this "upgrade, cutover, rollback" sequence took place on a weekday afternoon, just as the evening peak was starting around the end of the working day. (Seriously, don't cutovers in the island country have to be done at night?)


Starting around 5 pm on October 14, 2021, a network failure made DoCoMo voice calls and data communication services difficult to use.

 

At 7:57 pm on October 14, 2021, the operator carried out emergency network measures and service began to recover gradually. However, due to network congestion, some customers were still unable to connect to the network.


At 5:05 am on October 15, 2021, the 5G and 4G networks returned to normal, but 3G remained difficult to use in some areas, and the operator said it was working hard to restore it. It also advised users who had subscribed to a 4G plan but whose phones showed a 3G signal to restart the phone in order to reconnect to the 4G network and regain normal service.

 

On the afternoon of October 15, 2021, the vice president of NTT DoCoMo stated at a press conference that the company "cannot give a clear time" for restoring the 3G network and explained that the outlook was uncertain.

 

The management of NTT DoCoMo publicly apologized for the inconvenience the accident caused to customers and many others, and said they would work hard to prevent it from happening again.


Well, in the island country, there is no problem that one bow can't solve. If there is, three people will bow together!


After the accident, the Japanese Minister of Internal Affairs and Communications stated at a press conference after the cabinet meeting:


As an important infrastructure related to people's daily life, it is regrettable that the mobile network has experienced a large-scale failure. The Ministry of Internal Affairs and Communications attaches great importance to this matter and has requested NTT DoCoMo to investigate and report the cause and extent of the accident in a timely manner in order to give a full explanation to the majority of users. I hope NTT DoCoMo can fulfill its social responsibilities and take all possible measures to prevent similar accidents from happening again.


Three cups of wine as a penalty, and the matter is closed!




Lessons:

Although this happened in the island country across the sea, we still need to learn from it. Mobile networks are now infrastructure on a par with water and electricity, and with 5G serving the industrial Internet, coal mines, hospitals and the like, nothing about the network is a small matter.

 

1. Never perform an upgrade cutover during busy hours.

In our country a daytime cutover is almost unthinkable: the work is done in the middle of the night, and this has been an iron rule of the telecom industry for the past 20 years. Thanks to our telecom "night owls" for their hard work.
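The "night only" rule can be enforced mechanically before a change is allowed to start. The sketch below is a minimal example of such a guard. The function name and the default 00:00 to 05:00 window are assumptions chosen for illustration, not values taken from any operator's procedure.

```python
from datetime import datetime, time


def in_maintenance_window(now: datetime,
                          start: time = time(0, 0),
                          end: time = time(5, 0)) -> bool:
    """Return True if `now` falls inside the allowed change window.

    The default window (00:00 to 05:00) is a hypothetical low-traffic period.
    Windows that wrap past midnight (for example 23:00 to 05:00) are supported.
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    # Window crosses midnight: valid late in the evening or early in the morning.
    return t >= start or t < end


# The outage began around 5 pm on a weekday, well outside this assumed window.
print(in_maintenance_window(datetime(2021, 10, 14, 17, 0)))
```

A change tool could refuse to proceed, or demand explicit sign-off, whenever this check fails.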

 

2. Build full redundancy and backup mechanisms into the network.

The state of a network is never fully predictable. The most reliable way to keep it from failing is redundancy and backup. From A/B sides to cluster pools, redundancy should be fully provisioned in the core network, the transmission network and the access network. This inevitably increases investment, but it is necessary for a quality network.
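As a rough illustration of pooled redundancy, the sketch below picks the first healthy node from an ordered pool, so that traffic moves to a backup when the preferred node is down. The node names and the health map are hypothetical. Real core network pools also handle load balancing, state synchronization and capacity headroom, which this sketch does not attempt.

```python
from typing import Mapping, Sequence


def select_node(pool: Sequence[str], healthy: Mapping[str, bool]) -> str:
    """Return the first node in `pool` that is marked healthy.

    `pool` is ordered by preference; nodes missing from `healthy`
    are treated as unhealthy.
    """
    for node in pool:
        if healthy.get(node, False):
            return node
    raise RuntimeError("no healthy node in pool")


# Hypothetical pool: the preferred node is down, so the backup is selected.
pool = ["node-a", "node-b", "node-c"]
status = {"node-a": False, "node-b": True, "node-c": True}
chosen = select_node(pool, status)
print(chosen)
```

Failover only helps if the surviving nodes can absorb the redirected load, which is one reason redundancy increases investment.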

 

3. The core network is the top priority.

Other failures generally affect only a local area, while a core network failure affects the whole network. In addition to redundancy and backup, the network architecture should be upgraded as soon as possible. The service-based architecture (SBA) of the 5G SA core network can help keep the network running safely while limiting investment as far as possible.