Indexes in MySQL

Database Indexes

What is the index in a table?

Imagine you want to find a title in a book. How do you find that title without using the book’s index? You have to turn the pages and read each title until reaching the title you are searching. However, by using the book’s index, it is an easier and faster way to find that because of the page number of each title is determined in the book’s index. Finding and retrieving data from a table of the database is also the same as this example. When you run a select query on a table with N records and without any index, the DBMS must read each record from number 1 to number N and compare each one with your query statement. This means that the number of operations in the worst case scenario is O(N).

Table indexes are data structures that aid to find results faster at the cost of additional storage and writing data to maintain index data structures. An index could be applied to one or more columns of the table that could be searched efficiently. These indexes are a copy of selected indexes columns of data that are linked with low-level disk address to its original row. This low-level disk address is the same as the page number in the book’s index.

Table indexes must be defined correctly to gain maximum efficiency. Defining indexes on little tables even can have a negative effect on performance, but on tables, with millions or even billions of records, it is essential to define indexes correctly.

Which columns of the table must be indexed?

It depends on your queries. Based on queries that you run on your DBMS you must create indexes. Generally, the more searching columns must be indexed. There are many columns in your tables that many of them are just storing data to retrieve and show. You must detect your most searching columns and build indexes base on them. Building indexes on inappropriate columns can increase your database size on storage without any gaining on performance.

MySQL Index Types

MySQL has many types of indexes that each one is designed to perform different porpuses. These indexes are implemented in storage engines layer and they slightly work differently in each storage engine and even not all index types are supported by some storage engines. I don’t want to explain MySQL storage engines in this article. However, if you are interested to know more about storage engines, you can take a look at these sources:

 

1. B-Tree indexes

B-Tree is default index type and in most cases when we define a new index without talking about its type, we mean B-Tree. B-Tree is supported by all storage engines except Archive engine.

The b-tree structure has a root node and other nodes store behind it. Root node keeps key-1 to key-N values. All nodes store to the left of key-1 has smaller values than key-1 and all nodes store to the right site of key-1 has greater values than key-1. This image illustrates the idea of B-Tree:

B-tree

B-tree indexes are suitable to find by full key value, a key range, or a key prefix. Also, they can use for ORDER BY queries. In the b-tree indexes, we can index multiple columns and the order of columns that are defined in the index is extremely important.

This is an example of how to define multiple columns b-tree index:

CREATE TABLE user (
    Last_name	varchar(50)	not null,
    first_name 	varchar(50)	not null,
    birth_date	date		not null,
    gender 	    enum('m', 'f')   not null,
    key(last_name, first_name, birth_date)
);

This is a schematic image that illustrates how the b-tree index data structure looks like:

B-tree

2. Hash Indexes

Another type of index in MySQL is hash index. Hashing is a one-way algorithm that converts a specific string to a specific number. We call it a one-way algorithm because it is impossible to reconvert hashed number to its original string. If two string (S1 and S2) are equal (S1 = S2), then their hashes also have to be equal. (hash(S1) = hash(S2)).

The idea behind the hash index is simple but powerful. There is a hash table for the hash index. The storage engine computes the hash value of the indexed column for each row and stores it in the hash table. Also, each hashed value has a pointer to its row in the table. This is important to know that the hash table is ordered by hash values. I will explain it with an example. Suppose we created a user table with index hash like this:

CREATE TABLE user (
    first_name	varchar(50)	not null,
    last_name 	varchar(50)	not null,
    key USING HASH (first_name)
) ENGINE = MEMORY;

 We set the storage engine to MEMORY because only this storage engine supports hash index explicitly. Now suppose we have this data in the user table:

first_name

last_name

Navid

Hosseini

Liam

Pit

Anjelina

Jefferson

Peter

Bander

Storage engine calculates hashed values of each first_name columns, For example, suppose these values are like these:

hash(‘Navid’) = 2323
hash(‘Liam’) = 7437
hash(Angelina’) = 8784
hash(‘Peter’) = 2458

These values are stored in the hash table like this:

Slot

Value

2323

Pointer to row 1

2458

Pointer to row 4

7437

Pointer to row 2

8784

Pointer to row 3

 As you can see, the hash table is ordered by hash values (Slot column). Now, for example when you run a query like this:

mysql> SELECT * FROM user WHERE first_name='Peter';

At the first step, MySQL will calculate the hash value for ‘Peter’ (hash(‘Peter’) = 2458). Next, it will retrieve the row pointer of 2458 row in the hash table. With retrieving the row pointer of 2458 slot, now we can retrieve its row from the user table.

This is important to know that the hash index can only be used to find full key value, not key range or key prefix.

3. Full-text Indexes

Full-text indexes are for full-text search. This is different from queries with WHERE clauses. WHERE clauses compare values for equality, filter out ranges and so on but, full-text searches are implemented to perform keyword searches. It is like searching in search engines. Envisage we have a movie table that stores information about all movies made until now. This table holds comprehensive information about movies such as title, director, awards, country, synopsis and etc. We want to find all movies that contain “Rock Star” keywords in their information. We can do this with simple WHERE clauses but it is very hard and inefficient. Instead of using simple WHERE clause we use it with MATCH() … AGAINST clause. Here is an example:

CREATE TABLE movie (
    title		varchar(255)	not null,
    director 	varchar(255)	not null,
    synopsis	text,
    …
    FULLTEXT (title,synopsis)
) ENGINE = InnoDB;

As you can see we have defined FULLTEXT index on title and synopsis fields, so we can perform full-text search whit this query:

SELECT * FROM movie
WHERE MATCH (title, synopsis) AGAINST ('rock star');

A full-text index can be defined in MyISAM and InnoDB (from version 5.6 and above) storage engines. Personally, I prefer alternative solutions like Lucene, Elasticsearch, Solr and etc for full-text search because I think full-text indexes in MySQL are more complex, buggy and harder to implement.

These are the most common use index types in MySQL. There are also more index types that are implemented by third-party storage engines and they have their special data structure and usage. If you are interested in deeper knowledge of storage engines and indexes, I recommend these books:

  • High Performance MySQL, by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko (O'REILLY)

  • Relational Database Index Design and the Optimizers, by Tapio Lah-denmaki and Mike Leach (Wiley)

Add new comment

Restricted HTML

  • Allowed HTML tags: <a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id>
  • Lines and paragraphs break automatically.
  • Web page addresses and email addresses turn into links automatically.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
1 + 19 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.