Analysis of MySQL master-slave AUTO_INCREMENT inconsistency problem


1. Problem description

1.1 Problem phenomenon

In MySQL version 5.7, when the REPLACE INTO operation has an auto-increment primary key in the table, the master-slave inconsistency in the auto_increment value of the table may occur. If a master-slave failover occurs during this period, when the original slave node becomes the new The master node, because the auto_increment value of the table is smaller than the original master database, when the business continues to write, you will receive an error message indicating a primary key conflict.

The relevant error information is as follows:

! Error message

ERROR 1062 (23000): Duplicate entry 'XXX' for key 'PRIMARY'

1.2 Impact Assessment

Replace into, or INSERT...ON DUPLICATE KEY UPDATE is used in business logic.

Once the master-slave inconsistency of the auto_increment value of the table occurs, after the MySQL master-slave failover occurs, the normal writing of the business will report a primary key conflict error. When the auto_increment is similar, the error may be skipped when the business is retried. However, when the auto_increment differs greatly, it will exceed the number of business retries, which will have a greater impact.

2. Recurrence of the problem

2.1 Environment setup

Here in the test environment, MySQL Community Edition 5.7 is built, with a master-slave architecture.

【OS】:CentOS Linux release 7.3

[MySQL]: Community version 5.7

[Master-slave architecture]: one master and one slave

[Library table information]: Library name: test2023

Table name: test_autoincrement

The table structure is as follows:

CREATE TABLE `test_autoincrement` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增主键',
  `name` varchar(100) NOT NULL DEFAULT 'test' COMMENT '测试名字',
  `uid` int(11) NOT NULL COMMENT '测试表唯一键',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uid` (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.

2.2 Prepare test data

MySQL [test2023]> insert into test_autoincrement(name,uid) select '张三',1001;
Query OK, 1 row affected (0.08 sec)
Records: 1  Duplicates: 0  Warnings: 0
 
MySQL [test2023]> insert into test_autoincrement(name,uid) select '李四',1002;
Query OK, 1 row affected (0.06 sec)
Records: 1  Duplicates: 0  Warnings: 0
 
MySQL [test2023]>
MySQL [test2023]> insert into test_autoincrement(name,uid) select '王五',1003;
Query OK, 1 row affected (0.08 sec)
Records: 1  Duplicates: 0  Warnings: 0
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.

Under normal circumstances, when inserting a row of data, the number of rows affected is 1.

At this time, check the autoincrement value of the master-slave node table. You can see that the AUTO_INCREMENT of the master-slave node is consistent at this time, both are 4, that is, the value of the next application for the auto-increment primary key is 4.

2.3 Problem recurrence simulation

2.3.1 Simulate REPLACE INTO operation

MySQL [test2023]> REPLACE INTO test_autoincrement (name,uid) values('张三丰',1001);
Query OK, 2 rows affected (0.01 sec)
  • 1.
  • 2.

Here, judging by the REPLACE INTO operation, if there is a record with a unique ID of 1001, then change the value of the name field to "Zhang Sanfeng", and you can find that the number of rows affected at this time is 2. Now we look at the autoincrement value of the master-slave node table again.

At this time, the AUTO_INCREMENT inconsistency in the master-slave node table appears.

2.3.2 Simulate master-slave switching

Since this is a test environment, the master-slave relationship is changed directly here.

(1) Stop the replication thread of the current slave node

MySQL [test2023]> stop slave;
Query OK, 0 rows affected (0.08 sec)
  • 1.
  • 2.

(2) View the Executed_Gtid_Set value of the current slave node

MySQL [test2023]> show master status\G
*************************** 1. row ***************************
File: binlog.000002
Position: 4317
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 9cc90407-ff89-11ed-8b7a-fa163e2d11e1:1-82,
a0c1d6ff-5764-11ee-94ea-fa163e2d11e1:1-11
1 row in set (0.01 sec)
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.

(3) Redo the master-slave relationship

MySQL [test2023]> CHANGE MASTER TO MASTER_HOST = '原slave节点的IP地址', MASTER_USER = '复制账户', MASTER_PASSWORD = '密码', MASTER_PORT = 端口, MASTER_AUTO_POSITION = 1 ;
Query OK, 0 rows affected, 2 warnings (0.21 sec)
MySQL [test2023]> start slave;
Query OK, 0 rows affected (0.05 sec)
MySQL [test2023]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: XXX
Master_User: XXX
Master_Port: XXX
Connect_Retry: 60
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 4317
Relay_Log_File: relay.000004
Relay_Log_Pos: 445
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 4317
Relay_Log_Space: 726
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 461470011
Master_UUID: a0c1d6ff-5764-11ee-94ea-fa163e2d11e1
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: a0c1d6ff-5764-11ee-94ea-fa163e2d11e1:11
Executed_Gtid_Set: 9cc90407-ff89-11ed-8b7a-fa163e2d11e1:1-82,
a0c1d6ff-5764-11ee-94ea-fa163e2d11e1:1-11
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.

2.3.3 Simulate normal writing of business

MySQL [test2023]> insert into test_autoincrement(name,uid) select '赵六',1004; ERROR 1062 (23000): Duplicate entry '4' for key 'PRIMARY'

At this point we have seen the expected error reporting phenomenon. If it is a normal business system, the AUTO_INCREMENT of the master and slave node tables here may be very different, and normal insertion of the business will continue to report errors.

It means that the real operation is to perform the delete operation first and then insert.

3. Cause analysis

3.1 Why does the autoincrement of the slave node not change?

at 10790
#230927 16:23:45 server id 46147000  end_log_pos 10863 CRC32 0x85c60fb7         Update_rows: table id 122 flags: STMT_END_F
BINLOG '
keYTZRO4JcACRQAAACYqAAAAAHoAAAAAAAEACHRlc3QyMDIzABJ0ZXN0X2F1dG9pbmNyZW1lbnQA
AwMPAwKQAQCCO6qB
keYTZR+4JcACSQAAAG8qAAAAAHoAAAAAAAEAAgAD///4AQAAAAYA5byg5LiJ6QMAAPgEAAAACQDl
vKDkuInkuLDpAwAAtw/GhQ==
'/!/;
UPDATE test2023.test_autoincrement
WHERE
@1=1 /* INT meta=0 nullable=0 is_null=0 */
@2='张三' /* VARSTRING(400) meta=400 nullable=0 is_null=0 */
@3=1001 /* INT meta=0 nullable=0 is_null=0 */
SET
@1=4 /* INT meta=0 nullable=0 is_null=0 */
@2='张三丰' /* VARSTRING(400) meta=400 nullable=0 is_null=0 */
@3=1001 /* INT meta=0 nullable=0 is_null=0 */
at 10863
#230927 16:23:45 server id 46147000  end_log_pos 10894 CRC32 0xe204d99b         Xid = 331
COMMIT/!/;
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.

Here you can see that the binlog log record corresponding to the REPLACE INTO operation is actually an update operation. When the slave node applies the update operation and finds that the data is hit, the corresponding autoincrement does not change.

3.2 What is the official definition of the REPLACE INTO operation?

The official definition of REPLACE INTO is as follows:

Excerpted from

https://dev.mysql.com/doc/refman/5.7/en/replace.html

REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted. See Section 13.2.5, “INSERT Statement”.

REPLACE is a MySQL extension to the SQL standard. It either inserts, or deletes and inserts. For another MySQL extension to standard SQL—that either inserts or updates—see Section 13.2.5.2, “INSERT ... ON DUPLICATE KEY UPDATE Statement”.

Here you can see that when a table contains a primary key or a unique key, the replace operation will determine whether the original data row exists. If it exists, the old data will be deleted first, and then the insert operation will be performed. If it does not exist, it will The same as the insert operation.

The second paragraph also mentions INSERT ... ON DUPLICATE KEY UPDATE Statement. In fact, this operation will also cause the above master-slave autoincrement inconsistency, so we will not discuss it here.

! Note

REPLACE makes sense only if a table has a PRIMARY KEY or UNIQUE index. Otherwise, it becomes equivalent to INSERT, because there is no index to be used to determine whether a new row duplicates another.

3.3 Why is the REPLACE INTO operation recorded in the binlog log as an update operation?

Here we analyze through the source code files sql_insert.cc and log_event.cc.

sql_insert.cc:
...
/* Check if there is more uniq keys after field */
static int last_uniq_key(TABLE table,uint keynr)
{
/
The information about the last key conflict will be used to
do a replace of the new row on the conflicting row, rather
than doing a delete (of old row) + insert (of new row).

Hence check for this flag and disable replacing the last row
by returning 0 always. Returning 0 will result in doing
a delete + insert always.*/
if (table->file->ha_table_flags() & HA_DUPLICATE_KEY_NOT_IN_ORDER){
return 0;
}
while (++keynr < table->s->keys){
if (table->key_info[keynr].flags & HA_NOSAME){
return 0;
}
}
return 1;
}
...
/*
  The manual defines the REPLACE semantics that it is either
  an INSERT or DELETE(s) + INSERT; FOREIGN KEY checks in
  InnoDB do not function in the defined way if we allow MySQL
  to convert the latter operation internally to an UPDATE.
      We also should not perform this conversion if we have
      timestamp field with ON UPDATE which is different from DEFAULT.
      Another case when conversion should not be performed is when
      we have ON DELETE trigger on table so user may notice that
      we cheat here. Note that it is ok to do such conversion for
      tables which have ON UPDATE but have no ON DELETE triggers,
      we just should not expose this fact to users by invoking
      ON UPDATE triggers.
*/
if (last_uniq_key(table,key_nr) &&
    !table->file->referenced_by_foreign_key() &&
        (!table->triggers || !table->triggers->has_delete_triggers()))
    {
      if ((error=table->file->ha_update_row(table->record[1],
                        table->record[0])) &&
          error != HA_ERR_RECORD_IS_THE_SAME)
        goto err;
      if (error != HA_ERR_RECORD_IS_THE_SAME)
        info->stats.deleted++;
      else
        error= 0;
      thd->record_first_successful_insert_id_in_cur_stmt(table->file->insert_id_for_cur_row);
      /*
        Since we pretend that we have done insert we should call
        its after triggers.
      */
      goto after_trg_n_copied_inc;
    }
    else
    {...
}
...
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.

You can see from the above source code that the replace operation in the main library is actually insert or delete + insert

The manual defines the REPLACE semantics that it is either an INSERT or DELETE(s) + INSERT;
  • 1.

In the master-slave synchronization binlog log of MySQL, the conditions for converting the replace operation into the update operation are: when the conflicting key is the last unique key, there are no foreign key constraints, and there are no triggers, because in our test table There are no foreign key constraints and no triggers, so the condition converted into update from the binlog received from the library is the last unique key.

Here, we test it again (removing the unique index uid in the table).

(1) Create a new table

CREATE TABLE `test_autoincrement_2` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增主键',
  `name` varchar(100) NOT NULL DEFAULT 'test' COMMENT '测试名字',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.

(2) Insert test data

insert into test_autoincrement_2(name) select '孙七';
insert into test_autoincrement_2(name) select '周八';
insert into test_autoincrement_2(name) select '吴九';
#此时主从表结构是一致的,如下:
CREATE TABLE test_autoincrement_2 (
id int(11) NOT NULL AUTO_INCREMENT COMMENT '自增主键',
name varchar(100) NOT NULL DEFAULT 'test' COMMENT '测试名字',
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.

(3) The replace into operation verifies the AUTO_INCREMENT of the main library and the slave library

MySQL [test2023]> REPLACE INTO test_autoincrement_2 (id,name) values(3,'郑十');
Query OK, 2 rows affected (0.08 sec)
  • 1.
  • 2.

Here we change the name corresponding to the row of data with id=3 to 'Zheng Shi', and we can find that the number of rows affected by the above is 2.

Verify the AUTO_INCREMENT of the main library and slave library again and find that there is no change, it is still 4.

CREATE TABLE `test_autoincrement_2` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增主键',
  `name` varchar(100) NOT NULL DEFAULT 'test' COMMENT '测试名字',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.

(4) Analyze binlog log files

at 8089
#230928 15:52:08 server id 461470011  end_log_pos 8151 CRC32 0xc2ff85bb         Update_rows: table id 481 flags: STMT_END_F
BINLOG '
qDAVZRM7eYEbRgAAAJkfAAAAAOEBAAAAAAEACHRlc3QyMDIzABR0ZXN0X2F1dG9pbmNyZW1lbnRf
MgACAw8CkAEAFSqQxg==
qDAVZR87eYEbPgAAANcfAAAAAOEBAAAAAAEAAgAC///8AwAAAAYA5ZC05Lmd/AMAAAAGAOmDkeWN
gbuF/8I=
'/!/;
UPDATE test2023.test_autoincrement_2
WHERE
@1=3 /* INT meta=0 nullable=0 is_null=0 */
@2='吴九' /* VARSTRING(400) meta=400 nullable=0 is_null=0 */
SET
@1=3 /* INT meta=0 nullable=0 is_null=0 */
@2='郑十' /* VARSTRING(400) meta=400 nullable=0 is_null=0 */
at 8151
#230928 15:52:08 server id 461470011  end_log_pos 8182 CRC32 0xaa39d2a4         Xid = 699
COMMIT/!/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog / /!*/;
DELIMITER ;
End of log file
/!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE/;
/!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0/;
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.

Summary: It can be found that the binlog log records are also update operations. Only when there are no additional unique keys in the table besides the primary key, the replace into operation will not trigger the auto_increment exception of the slave library. For example, in the above case REPLACE INTO test_autoincrement_2 (id,name) values(3,'Zheng Shi');, only the name field is changed here, from 'Wu Jiu' to 'Zheng Shi'. However, the primary key id has not changed. Of course, there is no need to use auto_increment again. Here you can also see that the auto_increment of the main database has not changed. (When the table contains additional unique keys in addition to the primary key, the application for auto_increment will be triggered. ), the binlog still receives the update operation, so the auto_increment of the slave library has not changed, so that the problem of inconsistency between auto_increment and the main library cannot be caused.

4. Solution

At this point, we understand that replace into will cause the master-slave auto_increment to be inconsistent, but how to solve it?

4.1 Upgrade to MySQL 8.0 version

In MySQL version 8.0, the AUTO_INCREMENT value has been persisted, and during update operations, the auto-increment column on the table will be updated to a value larger than auto_increment, and the auto_increment value will also be updated.

4.2 Modify AUTO_INCREMENT value

There may already be many such situations in the online environment. Without triggering a business error, it is generally difficult to find this hidden danger. How to find these problems during daily inspections is the key.

Inspection logic 1: Here you can find out the table information that already has problems by inspecting and judging the max(id) of the slave database >= AUTO_INCREMENT. Then modify it through the SQL statement: ALTER TABLE table_name AUTO_INCREMENT = new_value;.

For inspection steps, please refer to:

(1) Only detect a slave node and a table containing the auto_increment attribute. The filtering SQL is as follows:

select TABLE_SCHEMA,TABLE_NAME,AUTO_INCREMENT from information_schema.tables where table_schema not in ('information_schema','mysql','performance_schema','sys') AUTO_INCREMENTis not null \G
  • 1.

(2) Read the table information after locking, the statement is as follows:

① Lock the table

lock tables table_name write;
  • 1.

②Read the data and compare it with the table auto_increment value

MAXID=select max(id) from table_name;
AUTO_INCREMENT=select AUTO_INCREMENT from information_schema.tables where TABLE_NAME='t1' ;
  • 1.
  • 2.

③ Judgment conditions

If MAXID >= AUTO_INCREMENT, it is judged as an exception

Inspection logic two: You can add AUTO_INCREMENT value judgment during high availability switching. If the AUTO_INCREMENT value is inconsistent, no switching will occur. However, the value of AUTO_INCREMENT of the slave node here may lag behind the master node due to delays and other issues. Normal inspection is still difficult, and when the MySQL master-slave switchover is triggered, if the original master database is down, there will be problems if the switchover is not triggered, so this hidden danger still needs to be eliminated as soon as possible in advance.

4.3 Disable replace into operation

On the business side, replace into or insert ... on duplicate key update is disabled, and the implementation can be implemented through code logic.

4.4 No other unique index will be added to the table operated by replace into

In fact, it is still difficult to implement here. Auto-incrementing IDs are uncontrollable. Businesses generally do not use the auto-incrementing IDs that come with the database.

5. Problem summary

  1. REPLACE INTO operation, when the table has an auto-incrementing primary key and contains a unique index, when a data conflict occurs, it will trigger the inconsistency of AUTO_INCREMENT in the master-slave node. Once the master-slave switch occurs, it will cause a primary key conflict in the business write report. mistake. Solution: Change the business implementation method, avoid using replace into, or use MySQL8.0 and above to solve this problem.
  2. This problem is an official BUG, ​​but it has not been fixed in MySQL 5.7.

https://bugs.mysql.com/bug.php?id=83030