Photo by Rubaitul Azad on Unsplash
Understanding MySQL Indexes: A Comprehensive Guide to Database Efficiency
Welcome to the first post in our series on MySQL indexes. This article aims to provide a comprehensive understanding of the pivotal role of indexes in MySQL, the logic behind using B+Tree as the primary data structure, and how different MySQL storage engines manage indexing.
What is an Index?
In the context of databases, an index serves a role analogous to an index in a book. A book index allows you to swiftly locate specific content without reading through the entire book, similarly, a database index facilitates rapid data retrieval without scanning the entire database.
An index in MySQL is a data structure that enhances the speed of data retrieval operations on a database table at the expense of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. They can be created using one or more columns, providing the basis for both rapid random lookups and efficient access to ordered records.
Index Data Structures
The architecture of an index is based on data structures. Common data structures used for indexing include Binary Trees, Red-Black Trees, Hash Maps, and B-Trees. Each of these structures has its pros and cons and is suited for particular types of operations. (Please follow this post if you want to know more details)
MySQL, however, primarily uses the B+Tree data structure for its indexes. B+Tree is a variation of B-Tree that can handle equality and range queries more efficiently. This is achieved by keeping all keys in the leaf nodes along with the record pointers, and only the duplicate keys in the internal nodes just to guide the search. The B+Tree structure, with its ability to maintain sorted data and perform efficient range queries, is more efficient for read-heavy workloads typically seen in a database system.
MyISAM Engine and Indexing
The MyISAM storage engine is one of the engines provided by MySQL. In MyISAM, the table data file (.MYD) and the index file (.MYI) are separate. This means that MyISAM can perform certain types of queries more efficiently, as it can access the index file without needing to access the actual data file, making the read operations faster.
Indexing in the InnoDB Engine
Contrary to MyISAM, InnoDB employs a different strategy when it comes to storing table data and indexes:
Integrated Data and Indexes: In InnoDB, each table is structured as a B+Tree, with the leaf nodes of the B+Tree containing the actual row data. This allows InnoDB to perform fewer disk I/O operations during data retrieval, improving its efficiency.
Primary Key Requirement: Every InnoDB table should have a primary key. This is because InnoDB stores rows in primary key order, and having a primary key allows InnoDB to better optimize queries and DML statements. Auto-incrementing IDs are recommended as the primary key because it avoids the random insertions in the B+Tree structure, preventing unnecessary page splitting and fragmentation.
Secondary Indexes: InnoDB's secondary indexes contain the primary key columns for the row, not the actual data. This means that a query using a secondary index can find the primary key value for a row, but must then search the primary key index to retrieve the actual row. This design helps to save space and maintain consistency.
Multiple-Column Indexes
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
Multiple-column indexes, also known as composite indexes, are formed by using more than one column. They are useful when you have queries that filter or sort by a set of columns. Here are some key aspects of multiple-column indexes:
Structure: A composite index's structure in MySQL is built on a combination of the indexed columns' values. It starts from the leftmost column and proceeds right.
Leftmost Prefix: MySQL can use the leftmost prefix of an index to speed up the query. This means that the order of columns in the index definition is critical. It can use the left part of the index, but not the right part if the left part isn’t used in the query.
By gaining a more profound understanding of how indexes function in MySQL, we can better optimize our databases to ensure efficient and effective performance. In the subsequent posts in this series, we will delve further into topics such as the B+Tree data structure and why it's the preferred choice for MySQL indexes, the details of how MySQL optimizes queries using indexes, and more. Stay tuned!