What is leftmost prefix matching? Why should we follow it?

2024.05.27

In MySQL, the leftmost prefix match refers to the use of the leftmost part of the index to match when querying. When you execute a query, if the query condition involves the first few columns of the composite index, MySQL can use the composite index to match.

A composite index is a joint index composed of multiple fields, such as idx_col1_col2_col3 (col1,col2,col3).

Suppose we create a composite index (col1, col2, col3). If the query condition is for col1, (col1, col2) or (col1, col2, col3), MySQL can use the composite index to perform a leftmost prefix match.

However, if the query condition involves only col2, only col3, or only col2 and col3, that is, does not include col1, then usually (excluding other optimizations such as index skip scan), the index cannot be used for leftmost prefix matching.

It is worth noting that the leftmost prefix match has nothing to do with the order of the query conditions. Whether you write where col1 = "Paidaxing" and col2 = "666" or where col2 = "666" and col1 = "Paidaxing", it has no effect on the result, and the hit result is still the same.

In addition, it is important to note that many people may mistakenly believe that when creating a composite index (col1, col2, col3), the database will create three indexes (col1), (col1, col2) and (col1, col2, col3). This understanding is incorrect. In fact, the database will only create one B+ tree, but in this tree, it will first sort by col1, then sort by col2 if col1 is the same, and then sort by col3 if col2 is the same.

In addition, if there is no joint index involved, the index of a single field also needs to comply with the leftmost prefix principle. That is, when the value of a field is "abc", when we use like for fuzzy matching, like "ab%" can use the index, but "%bc" cannot, because the latter does not comply with the leftmost prefix matching principle.

Why follow the leftmost prefix matching

We all know that in MySQL's InnoDB engine, indexes are implemented through B+ trees. Whether it is a common index or a joint index, a B+ tree index structure must be constructed.

For ordinary indexes, the storage structure is to record the index value on each non-leaf node of the B+ tree, while on the leaf nodes of the B+ tree, the index value and the value of the clustered index (primary key index) are recorded.

like:

picturepicture

In a joint index, such as the joint index (age, name), a B+ tree is also constructed. In this B+ tree, the values ​​of the name and age fields are recorded in the non-leaf nodes, while the values ​​of the name, age and primary key id are recorded in the leaf nodes.

picturepicture

In the stored procedure, as described above, when age is different, it is sorted by age; when age is the same, it is sorted by name.

Therefore, after understanding the storage structure of the index, it is easy for us to understand the leftmost prefix match: since the underlying index is a B+ tree, if it is a joint index, when constructing the B+ tree, it will first be sorted according to the keys on the left. When the keys on the left are the same, they will be sorted in turn according to the keys on the right.

Therefore, when searching through indexes, you also need to follow the principle of leftmost prefix matching, that is, you need to start matching from the leftmost of the joint index. This requires that the WHERE condition of the query statement contains the leftmost index value.

Does MySQL index always follow leftmost prefix matching?

Because the underlying index is a B+ tree, if it is a joint index, in the process of constructing the B+ tree, it will first sort by the left key. When the left keys are the same, they will be sorted in turn by the right keys.

Therefore, when searching through indexes, you also need to follow the principle of leftmost prefix matching, that is, you need to start matching from the leftmost of the joint index. This requires that the WHERE condition of the query statement contains the leftmost index value. This is the concept of leftmost prefix matching.

In previous versions of MySQL, the principle of leftmost prefix matching has always been followed. This statement was correct in the past and there was no problem. However, in MySQL 8.0, the situation is different because the index skip scan feature was introduced in 8.0.13.

Additional knowledge

Index Skip Scan

MySQL 8.0.13 introduced the Index Skip Scan optimization, which supports range queries. Even if the leftmost prefix principle of the composite index is not met, SQL can still use the composite index to reduce unnecessary scans.

Let's explain this with an example. First, we have the following table (refer to the example on the MySQL official website, but with some modifications and optimizations):

CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL);
CREATE INDEX idx_t on t1(f1,f2);
INSERT INTO t1 VALUES
  (1,1), (1,2), (1,3), (1,4), (1,5),
  (2,1), (2,2), (2,3), (2,4), (2,5);
INSERT INTO t1 SELECT f1, f2 + 5 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 10 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 20 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 40 FROM t1;
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.

Through the following SQL statements, we first create a table named t1 and set fields f1 and f2 as joint indexes. Then we insert some records into it.

Next, execute EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 = 40; on MySQL 5.7.9 and MySQL 8.0.30 respectively.

picturepicture

As you can see, there are mainly the following differences:

MySQL 5.7 中,type = index,rows=160,extra=Using where;Using index

MySQL 8.0 中,type = range,rows=16,extra=Using where;Using index for skip scan

The type field indicates the scanning method, where range indicates range scan and index indicates index tree scan. Generally, range scan is much faster than index tree scan.

This can also be observed through the rows field. A total of 160 rows were scanned using the index tree scan method, while only 16 rows were scanned using the range scan method.

Then, the key is why the scanning method in MySQL 8.0 is faster? This is mainly due to the use of the "Using index for skip scan" technology.

In other words, although our SQL does not follow the leftmost prefix principle and only uses f2 as the query condition, after the optimization of MySQL 8.0, the index is still utilized through index skip scanning.

Optimization principle

So how does it optimize? In MySQL 8.0.13 and later versions, the process of executing SELECT f1, f2 FROM t1 WHERE f2 = 40; is as follows:

  1. Get the first unique value of the f1 field, that is, f1=1.
  2. Construct the conditions f1=1 and f2=40 to perform a range query.
  3. Get the second unique value of the f1 field, that is, f1=2.
  4. Construct the conditions f1=2 and f2=40 to perform a range query.
  5. Repeat the above steps until all unique values ​​of the f1 field are scanned.
  6. Finally, the results are combined and returned.

In other words, the SQL statement that is ultimately executed is similar to the following:

SELECT f1, f2 FROM t1 WHERE f1 =1 and f2 = 40
UNION
SELECT f1, f2 FROM t1 WHERE f1 =2 and f2 = 40;
  • 1.
  • 2.
  • 3.

That is, the MySQL optimizer helps us use the f1 field in the joint index as the query condition for the query.

limitation factor

After understanding the execution process of index skip scan, some smart readers may realize that this query optimization is more suitable for fields with a smaller value range and low discrimination (such as gender), and when the discrimination of the field is particularly high (such as date of birth), this query may become slower.

Therefore, whether to use index skip scan actually depends on the decision made by the MySQL optimizer after cost estimation.

Generally, this optimization technique is suitable for the case where the first field in the joint index has a low degree of discrimination. However, it should be noted that this is not always the case. Although we generally do not place fields with low discrimination on the left side of the joint index, MySQL provides such an optimization solution, which shows that there is indeed such a demand.

However, we should not over-rely on this optimization. When creating an index, we should still give priority to placing fields with high discrimination and frequent queries on the left side of the joint index.

In addition, the MySQL official website also mentions some other restrictions on index jump scanning:

  • Table T must have at least one composite index, but for a composite index (A,B,C,D), A and D can be empty, but B and C must be non-empty.
  • The query can only rely on a single table and cannot connect multiple tables.
  • You cannot use GROUP BY or DISTINCT clauses in the query.
  • The fields to be queried must be columns in the index.