How databases work: Part 1


I recently started reading a very interesting tutorial series explaining how databases work internally.
As developers, we often work with technologies without fully understanding or appreciating their internals. Since this series is not only about explaining databases theoretically but also about implementing a toy SQLite clone, I decided to follow along and write my own implementation in Swift. Because why not?


The first article in the series gives a high-level overview of the architecture of a database.


A database consists of a front-end and a back-end.
The front-end is made of the following components:

Tokenizer (input: SQL query, output: individual tokens)

Parser (input: tokens, output: parse tree or abstract syntax tree)

Code Generator (input: tree representation, output: VM byte code)
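To make the first stage of the front-end more concrete, here is a minimal sketch of a tokenizer in Swift. It is not how SQLite or the original series tokenizes SQL: the Token type and the tokenize function are hypothetical names made up for this illustration, and the sketch assumes simple input made of words, integers, and single-character punctuation (no string literals or multi-character operators).

```swift
// Hypothetical token type for illustration; not taken from SQLite or the original series.
enum Token: Equatable {
    case word(String)       // keywords and identifiers, e.g. select, users
    case number(Int)        // integer literals
    case symbol(Character)  // single-character punctuation such as * , ; ( )
}

// Splits a query string into tokens.
// Assumption: words and numbers consist of letters, digits, and underscores;
// every other non-whitespace character is a one-character symbol.
func tokenize(_ query: String) -> [Token] {
    var tokens: [Token] = []
    var current = ""

    // Emits the word or number collected so far, if any.
    func flush() {
        guard !current.isEmpty else { return }
        if let value = Int(current) {
            tokens.append(.number(value))
        } else {
            tokens.append(.word(current))
        }
        current = ""
    }

    for ch in query {
        if ch.isLetter || ch.isNumber || ch == "_" {
            current.append(ch)
        } else {
            flush()
            if !ch.isWhitespace {
                tokens.append(.symbol(ch))
            }
        }
    }
    flush()
    return tokens
}
```

With this sketch, tokenize("select * from users;") yields the words select, from, and users plus the symbols * and ;, in the order they appear. A parser would then consume this token list to build the tree representation.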


The back-end consists of the following components:

Virtual Machine (input: byte code, output: B-Tree instructions)

B-Tree (input: B-Tree instructions, output: pager commands)

Pager (input: pager commands, output: pages)


B-trees are used to store database tables and indexes. Each node in a B-tree is one page in length. The B-tree layer retrieves pages from disk and writes them back by issuing commands to the pager. Apart from disk I/O, the pager also caches recently accessed pages. Beneath the pager sits the OS interface: the underlying operating system and the facilities it provides for tasks such as file I/O.
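The pager's two jobs, page I/O and caching, can be sketched roughly as follows. This is only an illustration under stated assumptions: the Pager class, its method names, and the 4096-byte page size are hypothetical, a dictionary stands in for the database file, and the cache never evicts anything, which a real pager would have to do.

```swift
// Hypothetical pager sketch; names and page size are assumptions for illustration.
final class Pager {
    static let pageSize = 4096

    private var disk: [Int: [UInt8]] = [:]   // stands in for the database file
    private var cache: [Int: [UInt8]] = [:]  // recently accessed pages (unbounded here)
    private(set) var diskReads = 0           // counts simulated disk reads

    // Returns the page with the given number.
    // The simulated disk is only touched on a cache miss.
    func getPage(_ number: Int) -> [UInt8] {
        if let cached = cache[number] {
            return cached
        }
        diskReads += 1
        // A page that was never written reads back as all zeros.
        let page = disk[number] ?? [UInt8](repeating: 0, count: Pager.pageSize)
        cache[number] = page
        return page
    }

    // Writes a page through the cache to the simulated disk.
    func writePage(_ number: Int, _ data: [UInt8]) {
        cache[number] = data
        disk[number] = data
    }
}
```

In this sketch, asking for the same page twice increments diskReads only once, because the second request is served from the cache. The B-tree layer would sit on top of such an interface and treat each page as one node.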


In this first part we'll get started with writing a very simple REPL for our database.
When you start sqlite from the command line, you get a prompt where you can enter commands. The REPL reads each line and takes an action depending on the command that was given.
We start with a simple REPL that only knows the .exit command.

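import Foundation

// Prints the prompt shown before each line of input.
func printPrompt() {
    print("db >")
}

// Reads one line from standard input; returns an empty string at end of input.
func readInput() -> String {
    if let line = readLine() {
        return line
    } else {
        return ""
    }
}

// Exit codes passed to exit().
enum EXIT: Int32 {
    case EXIT_SUCCESS = 0
}

// The REPL: prompt, read a line, act on it, repeat.
while true {
    printPrompt()
    let input = readInput()
    if input == ".exit" {
        exit(EXIT.EXIT_SUCCESS.rawValue)
    } else {
        print("Unrecognized command \(input). \n")
    }
}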


The while loop at the bottom is an infinite loop that prints the prompt "db >" and waits for user input to process.
The printPrompt() and readInput() functions and the EXIT enum type are fairly self-explanatory.
If the received input is .exit, the program terminates with a success exit code. Otherwise it tells the user that the input is unrecognized, since we can't yet handle any commands other than .exit.
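One possible variation, not taken from the original series, is to separate the decision about a command from the reading and printing, which makes the logic easy to check without a terminal. The CommandResult type and the handle and runREPL functions below are hypothetical names for this sketch. It also assumes that end of input (for example Ctrl-D) should be treated like .exit, so the loop cannot spin forever on an empty stream.

```swift
import Foundation

// Hypothetical result type; not part of the original series.
enum CommandResult: Equatable {
    case exit
    case unrecognized(String)
}

// Decides what to do with one line of input.
// A nil line means end of input, which this sketch treats like .exit.
func handle(_ line: String?) -> CommandResult {
    guard let line = line else { return .exit }
    return line == ".exit" ? .exit : .unrecognized(line)
}

// The loop itself only does I/O and delegates the decision to handle(_:).
func runREPL() {
    while true {
        print("db >")
        switch handle(readLine()) {
        case .exit:
            exit(EXIT_SUCCESS)
        case .unrecognized(let input):
            print("Unrecognized command \(input).")
        }
    }
}
```

Calling runREPL() at the end of main.swift starts the loop. Because handle(_:) is a pure function, new commands can later be added as further cases of CommandResult and checked in isolation.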

That's it for the first part of my copycat How databases work series.
Make sure to check out the original article series for an implementation in C and more in-depth explanations.