Jan 17, 2018 1 min read databases

Notes: Anatomy of an index and slow indexes

Anatomy of an Index

An index is pure redundancy. Just like the index in a book where everything in the book is listed and refers to the page on which you find it.
A node consist of the value of the indexed column and the row id of the record.
An index entry points to the location of the entry of the in the table, in which rows are completely unsorted, by means of row id.
- Row is actually the physical location of the row inside that table (which is simply a file or several). It is the fastest way to locate that row (even faster than primary key, since primary key is nothing more than an index itself.)
An index is a B-Tree (not always, there are also other indexes) with the leaf nodes connected as a doubly linked list.
- The B-Tree can be searched by tree traversal and then the doubly linked list must be walked down by visiting one element after another.
The keys of the leaf node elements are the values of the colums that is being indexed.

Index traversal has 3 steps:
- 1 tree traversal
- 2 following leaf node chain
- 3 accessing table data
First ingredient to slow index is the leaf node chain, where the database must not only traverse the B-tree but then also must walk through all the leaf nodes in the linked list to see if there are more matching entries.
Second ingredient is accessing the table. The actual corresponding table data to a leaf node is usually scattered across many table blocks. That means, for every hit, there is a table access.

Three ways of accessing and index:

Index Unique Scan: Performs tree traversal only. Used if there is a unique constraint in the query making sure that only one entry will match
Index Range Scan: Tree traversal and then follows the leaf node chain to find all matching entries. Used if multiple entries could match the query.
- Can potentially read large parts of an index. Makes the query slow if many entries will have to be accessed with table access afterwards.
Table Access by Index RowId: Retrieves the actual row from table. This is usually done for every matched entry after an index scan (the ones above) operation.