Cutover Failed, Causing 3/4/5G Network Communication Failure
Today's mobile network is infrastructure on par with water and electricity.
In the 5G era especially, as it is applied to the industrial Internet, coal
mines, hospitals, and so on, a network failure is no small matter.
An equipment replacement cutover at the Japanese operator NTT DoCoMo caused a
large-scale failure across the country's entire network. It provoked strong
dissatisfaction among a large number of users, and even Japan's Minister of
Internal Affairs and Communications had to step in to address and explain the
incident.
It is reported that this was originally a simple upgrade and replacement. The
equipment being replaced was a server that stores user and location
information for IoT terminal devices. While the location information of about
200,000 IoT terminals was being migrated from the old equipment to the new,
something went wrong.
So the operator started a rollback to the old equipment. This rollback turned
out to be the crux of the problem: it triggered a large number of IoT
terminals to re-send location registration requests to the old server. The
resulting "signaling storm" quickly congested the network and directly left
the 3/4/5G core network "paralyzed".
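To make the "signaling storm" concrete, here is a minimal sketch comparing the
peak per-second load when every terminal re-registers at the same moment with
the load when each terminal waits a random delay first. The terminal count
comes from the report above; the 10-minute spreading window and all function
names are assumptions made for illustration, and nothing here describes how
DoCoMo's terminals or network actually schedule registrations.

```python
import random
from collections import Counter

def peak_load(attempt_times):
    """Return the largest number of attempts falling in any one-second bucket."""
    buckets = Counter(int(t) for t in attempt_times)
    return max(buckets.values())

def simultaneous(n_terminals):
    """Every terminal re-registers the moment the rollback completes (t = 0)."""
    return [0.0] * n_terminals

def jittered(n_terminals, window_s, seed=0):
    """Each terminal waits a uniformly random delay within window_s seconds."""
    rng = random.Random(seed)
    return [rng.uniform(0, window_s) for _ in range(n_terminals)]

N = 200_000      # terminal count taken from the report
WINDOW_S = 600   # hypothetical 10-minute spreading window (assumption)

storm_peak = peak_load(simultaneous(N))
spread_peak = peak_load(jittered(N, WINDOW_S))
print("peak without jitter:", storm_peak)
print("peak with jitter:   ", spread_peak)
```

Without jitter the server sees all 200,000 requests in the same second; with
the delay spread over ten minutes, the peak drops to a few hundred per second.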
What is hard to understand is that this "upgrade-cutover-rollback" operation
took place on a working day in the late afternoon, just as the evening peak
began and people were about to leave work. (Seriously, does the island country
not require cutovers to be done at night?)
Starting around 5 pm on October 14, 2021,
a network accident occurred that made DoCoMo voice calls and data communication
services difficult to use.
At 7:57 pm on October 14, 2021, the operator applied emergency network
measures and service began to recover gradually. However, due to network
congestion, some customers were still unable to connect to the network.
At 5:05 am on October 15, 2021, the 5G and 4G networks returned to normal, but
3G networks in some areas remained difficult to use. The operator said it was
working hard on recovery and advised users who subscribe to 4G plans but see a
3G signal to restart their phones in order to reconnect to the 4G network and
regain normal service.
On the afternoon of October 15, 2021, the deputy president of NTT DoCoMo
stated at a press conference that the company "cannot give a clear time" for
3G network restoration, explaining that the outlook was uncertain.
The management of NTT DoCoMo publicly and deeply apologized for the
inconvenience the accident caused to customers and many others, and said that
they would work hard to prevent it from happening again.
Well, in an island country, there is no problem that can't be
solved with one bow. If there is, three people will bow together!
After the accident, the Japanese Minister of Internal Affairs
and Communications stated at a press conference after the cabinet meeting:
As an important infrastructure related to people's daily
life, it is regrettable that the mobile network has experienced a large-scale
failure. The Ministry of Internal Affairs and Communications attaches great
importance to this matter and has requested NTT DoCoMo to investigate and
report the cause and extent of the accident in a timely manner in order to give
a full explanation to the majority of users. I hope NTT DoCoMo can fulfill its
social responsibilities and take all possible measures to prevent similar
accidents from happening again.
Three glasses of fine wine, processing is over!
Lessons:
Although this happened in the island country across the sea, we still need to
learn from it. Today's mobile network is infrastructure on par with water and
electricity. In the 5G era especially, as it is applied to the industrial
Internet, coal mines, hospitals, and so on, a network failure is no small
matter.
1. Never perform an upgrade cutover during busy hours.
A busy-hour cutover is almost unthinkable in our country, where such work is
done in the middle of the night. This has been the iron law of the
communications industry for the past 20 years. Thanks to our "Communication
Nightcrawlers" for their hard work.
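The "never during busy hours" rule can also be enforced in tooling rather than
left to habit. The following is a minimal sketch of such a guard. The
midnight-to-5-am window and the function names are hypothetical; a real
window would be derived from the operator's own traffic data.

```python
from datetime import datetime, time

# Hypothetical low-traffic window (assumption, not from the report).
WINDOW_START = time(0, 0)
WINDOW_END = time(5, 0)

def in_maintenance_window(now: datetime) -> bool:
    """True if `now` falls inside the allowed low-traffic window."""
    return WINDOW_START <= now.time() < WINDOW_END

def start_cutover(now: datetime) -> str:
    """Refuse to start a cutover outside the maintenance window."""
    if not in_maintenance_window(now):
        raise RuntimeError(
            f"cutover refused at {now:%H:%M}: outside maintenance window"
        )
    return "cutover started"

# 5 pm on a working day, as in the incident: the guard would refuse.
print(in_maintenance_window(datetime(2021, 10, 14, 17, 0)))
# 2 am: allowed.
print(start_cutover(datetime(2021, 10, 15, 2, 0)))
```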
2. Full redundancy and backup mechanisms for the network.
The state of the network is never fully predictable. The most reliable way to
keep the network from failing is redundancy and backup. From A/B-side pairs to
cluster pools, redundancy mechanisms in the core network, transmission network
and access network need to be fully guaranteed. This is bound to increase
investment, but it is necessary for a quality network.
3. The core network is the top priority.
Other failures generally have only local impact, while a core network failure
affects the entire network. In addition to redundancy and backup, the network
architecture should be upgraded as soon as possible. The service-based
architecture (SBA) of the 5G SA core network can help ensure safe network
operation while keeping investment as low as possible.