Notes: Anatomy of an index and slow indexes
Here are some of my notes from reading "What every developer should know about SQL performance".
Anatomy of an Index
- An index is pure redundancy. Just like the index in a book where everything in the book is listed and refers to the page on which you find it.
- A node consist of the value of the indexed column and the row id of the record.
- An index entry points to the location of the entry of the in the table, in which rows are completely unsorted, by means of row id.
- Row is actually the physical location of the row inside that table (which is simply a file or several). It is the fastest way to locate that row (even faster than primary key, since primary key is nothing more than an index itself.)
- An index is a B-Tree (not always, there are also other indexes) with the leaf nodes connected as a doubly linked list.
- The B-Tree can be searched by tree traversal and then the doubly linked list must be walked down by visiting one element after another.
- The keys of the leaf node elements are the values of the colums that is being indexed.
Slow Indexes
- Index traversal has 3 steps:
- 1 tree traversal
- 2 following leaf node chain
- 3 accessing table data
- First ingredient to slow index is the leaf node chain, where the database must not only traverse the B-tree but then also must walk through all the leaf nodes in the linked list to see if there are more matching entries.
- Second ingredient is accessing the table. The actual corresponding table data to a leaf node is usually scattered across many table blocks. That means, for every hit, there is a table access.
Three ways of accessing and index:
- Index Unique Scan: Performs tree traversal only. Used if there is a unique constraint in the query making sure that only one entry will match
- Index Range Scan: Tree traversal and then follows the leaf node chain to find all matching entries. Used if multiple entries could match the query.
- Can potentially read large parts of an index. Makes the query slow if many entries will have to be accessed with table access afterwards.
- Table Access by Index RowId: Retrieves the actual row from table. This is usually done for every matched entry after an index scan (the ones above) operation.