Celsius' Notes
Hire me as a freelancer for your mobile (iOS native, cross-platform with React Native), web and backend software needs.
Go to portfolio/resumé
1 min read

Notes: Anatomy of an index and slow indexes

Here are some of my notes from reading "What every developer should know about SQL performance".

Anatomy of an Index

  • An index is pure redundancy. Just like the index in a book where everything in the book is listed and refers to the page on which you find it.
  • A node consist of the value of the indexed column and the row id of the record.
  • An index entry points to the location of the entry of the in the table, in which rows are completely unsorted, by means of row id.
    • Row is actually the physical location of the row inside that table (which is simply a file or several). It is the fastest way to locate that row (even faster than primary key, since primary key is nothing more than an index itself.)
  • An index is a B-Tree (not always, there are also other indexes) with the leaf nodes connected as a doubly linked list.
    • The B-Tree can be searched by tree traversal and then the doubly linked list must be walked down by visiting one element after another.
  • The keys of the leaf node elements are the values of the colums that is being indexed.

Slow Indexes

  • Index traversal has 3 steps:
    • 1 tree traversal
    • 2 following leaf node chain
    • 3 accessing table data
  • First ingredient to slow index is the leaf node chain, where the database must not only traverse the B-tree but then also must walk through all the leaf nodes in the linked list to see if there are more matching entries.
  • Second ingredient is accessing the table. The actual corresponding table data to a leaf node is usually scattered across many table blocks. That means, for every hit, there is a table access.

Three ways of accessing and index:

  • Index Unique Scan: Performs tree traversal only. Used if there is a unique constraint in the query making sure that only one entry will match
  • Index Range Scan: Tree traversal and then follows the leaf node chain to find all matching entries. Used if multiple entries could match the query.
    • Can potentially read large parts of an index. Makes the query slow if many entries will have to be accessed with table access afterwards.
  • Table Access by Index RowId: Retrieves the actual row from table. This is usually done for every matched entry after an index scan (the ones above) operation.