I was recently tasked with optimizing a table for a rather tricky query. A typical query went something like this:


SELECT  `ID` FROM `ExistingUsers`
WHERE  `EntryType` =  'NEW'
AND  `Version` =  'long.version.string'
AND 
(
	(
		`IDTypeOne` =  '4e46552376ef6ab7'
		AND  `IDTypeOne` IS NOT NULL 
	)
	OR 
	(
		`IDTypeTwo` =  '860525021806844'
		AND  `IDTypeTwo` IS NOT NULL 
	)
	OR 
	(
		`IDTypeThree` =  'g2r376ef6ab7'
		AND  `IDTypeThree` IS NOT NULL 
	)
);

The query’s purpose is to look into the `ExistingUsers` table to find if we already have a matching entry for this user. A qualifying matching entry would satisfy both the `EntryType` and `Version` columns, and at least one of the three possible `IDTypeXXX` columns.

For this particular problem, optimizing the query was not an option, and cleaning the data in the table wasn’t either. So I was left with optimizing the table itself.

Here’s what the CREATE TABLE showed:


CREATE TABLE `ExistingUsers` (
  `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `EntryType` enum('NEW','EXISTING') DEFAULT NULL,
  `Version` varchar(64) DEFAULT NULL,
  `IDTypeOne` varchar(128) DEFAULT NULL,
  `IDTypeTwo` varchar(64) DEFAULT NULL,
  `IDTypeThree` varchar(64) DEFAULT NULL,
  `Time` datetime DEFAULT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `Uniq_idx` (`EntryType`,`IDTypeOne`,`IDTypeTwo`,`IDTypeThree`,`Version`),
  KEY `Time` (`Time`),
) ENGINE=InnoDB AUTO_INCREMENT=3816978 DEFAULT CHARSET=utf8

A relatively small table of less than 4 million rows (actually fewer since not all the auto increments resulted in a new row). Each time this query ran it took between 2.5 and 5 seconds.

Here’s what EXPLAIN EXTENDED showed:


id  select_type  table          type  possible_keys  key        key_len  ref
1   SIMPLE       ExistingUsers  ref   Uniq_idx       Uniq_idx   2        const

rows    filtered  Extra
1894887 100       Using where; Using index

The important number here is rows. MySQL is expecting to examine about half of all rows in the table. That’s telling me the index is not doing enough of the heavy lifting.

The first thing I noticed right away is that our query has two direct columns it is always going to match against and then three more columns that are all “OR”s. The order of the unique index might be better suited to have those two hard matches first, followed by the three OR columns. From this:

`Uniq_idx` (`EntryType`,`IDTypeOne`,`IDTypeTwo`,`IDTypeThree`,`Version`)

to this:

`Uniq_idx` (`EntryType`,`Version`,`IDTypeOne`,`IDTypeTwo`,`IDTypeThree`)

Here is EXPLAIN EXTENDED after this change:


id  select_type  table          type  possible_keys  key        key_len  ref
1   SIMPLE       ExistingUsers  ref   Uniq_idx       Uniq_idx   197      const,const

rows    filtered  Extra
78550   100.00    Using where; Using index

This change actually helped the performance quite a bit. We can see the number of rows expecting to be examined is about 5% of what it was before. Also note the ref column shows const,const telling us there are two constraints being used in the index.

After this change, if there was a matching row to return, the query took around 0.03 secs. Definitely acceptable. However, if there was no match to return, there was no improvement in run time at all. I would expect that roughly 10% of the time this query runs, it would not have a row to match against.

So that leaves one final optimization to try. Can we somehow create one or more indices that work for the other three columns that are examined in the “OR” clauses of the query? The answer here lies in a MySQL optimization called Index Merge. What we need to do to take advantage of this optimization is to index each of the columns to be examined by the “OR” clauses separately.

This command should do it:


ALTER TABLE `ExistingUsers`
ADD KEY `IDTypeOne_idx` (`IDTypeOne`),
ADD KEY `IDTypeOne_idx` (`IDTypeTwo`),
ADD KEY `IDTypeThree_idx` (`IDTypeThree`);

Now let’s look at an explain of our query:


id  select_type  table          type         possible_keys  
1   SIMPLE       ExistingUsers  index_merge  Uniq_idx,IDTypeOne_idx,IDTypeOne_idx,IDTypeThree_idx

key                                          key_len      ref
IDTypeOne_idx,IDTypeOne_idx,IDTypeThree_idx  387,195,195  null

rows  filtered  Extra
3     100.00    Using union(IDTypeOne_idx,IDTypeOne_idx,IDTypeThree_idx); Using where

EXPLAIN is telling us it expects to only examine three rows. That’s looking much better. We also see type showing index_merge.

MySQL documentation states:

The Index Merge method is used to retrieve rows with several range scans and to merge their results into one.

In possible_keys we see the index that was being used before, along with the three new indexes. In key we see that the optimizer is planning on using the three new indexes together and not even using the previous index.

Also note the Extra column is telling us it is planning on using the union flavor of the index merge optimization.

Again from MySQL documentation:

The [Union] algorithm can be employed when the table’s WHERE clause was converted to several range conditions on different keys combined with OR

There are some special cases and restrictions on MySQL’s use of the index merge optimization. I won’t go into any more details here, but if you’re having a hard time getting indexes to work on a query containing multiple or nested AND/OR clauses, the index Merge optimization might be what you’re looking for.

Leave a reply