Crate sanakirja

source ·
Expand description

Transactional, on-disk datastructures with concurrent readers and writers (writers exclude each other).

This crate is based on the no-std crate sanakirja-core, whose goal is to implement different datastructures.

Here’s an example of how to use it (starting with 64 pages, 2 versions, see below for details about what that means). The file grows automatically, as needed.

use sanakirja::*;
let dir = tempfile::tempdir().unwrap();
let path = dir.path().join("db");
let env = Env::new(&path, 1 << 20, 2).unwrap();
let mut txn = Env::mut_txn_begin(&env).unwrap();
let mut db = btree::create_db::<_, u64, u64>(&mut txn).unwrap();
for i in 0..100_000u64 {
    btree::put(&mut txn, &mut db, &i, &(i*i)).unwrap();
}
let root_db = 0;
txn.set_root(root_db, db.db);
txn.commit().unwrap();
let txn = Env::txn_begin(&env).unwrap();
let db: btree::Db<u64, u64> = txn.root_db(root_db).unwrap();
assert_eq!(btree::get(&txn, &db, &50_000, None).unwrap(), Some((&50_000, &(50_000 * 50_000))));
for entry in btree::iter(&txn, &db, None).unwrap() {
  let (k, v) = entry.unwrap();
  assert_eq!(*k * *k, *v)
}

The binary format of a Sanakirja database is the following:

  • There is a fixed number of “current versions”, set at file initialisation. If a file has n versions, then for all k between 0 and n-1 (included), the k^th page (i.e. the byte positions between k * 4096 and (k+1) * 4096, also written as k << 12 and (k+1) << 12) stores the data relative to that version, and is called the “root page” of that version.

    This is a way to handle concurrent access: indeed, mutable transactions do not exclude readers, but readers that started before the commit of a mutable transaction will keep reading the database as it was before the commit. However, this means that older versions of the database have to be kept “alive”, and the “number of current versions” here is the limit on the number of versions that can be kept “alive” at the same time.

    When a reader starts, it takes a shared file lock on the file representing the youngest committed version. When a writer starts, it takes an exclusive file lock on the file representing the oldest committed version. This implies that if readers are still reading that version, the writer will wait for the exclusive lock.

    After taking a lock, the writer (also called “mutable transaction” or MutTxn) copies the entire root page of the youngest committed version onto the root page of the oldest committed version, hence erasing the root page of the oldest version.

  • Root pages have the following format: a 32-bytes header (described below), followed by 4064 bytes, usable in a more or less free format. The current implementation defines two methods on MutTxn, MutTxn::set_root and MutTxn::remove_root, treating that space as an array of type [u64; 510]. A reasonable use for these is to point to different datastructures allocated in the file, such as the offsets in the file to the root pages of B trees.

    Now, about the header, there’s a version identifier on the first 16 bytes, followed by two bytes: root is the version used by the current mutable transaction (if there is current mutable transaction), or by the next mutable transaction (else). The n_roots field is the total number of versions.

    #[repr(C)]
    pub struct GlobalHeader {
        /// Version of Sanakirja
        pub version: u16,
        /// Which page is currently the root page? (only valid for page 0).
        pub root: u8,
        /// Total number of versions (or "root pages")
        pub n_roots: u8,
        /// CRC of this page.
        pub crc: u32,
        /// First free page at the end of the file (only valid for page 0).
        pub length: u64,
        /// Offset of the free list.
        pub free_db: u64,
        /// Offset of the RC database.
        pub rc_db: u64,
    }

Modules§

  • An implementation of B trees. The core operations on B trees (lookup, iterate, put and del) are generic in the actual implementation of nodes, via the BTreePage and BTreeMutPage. This allows for a simpler code for the high-level functions, as well as specialised, high-performance implementations for the nodes.

Macros§

  • A macro to implement Storable on “plain” types, i.e. fixed-sized types that are repr(C) and don’t hold references.

Structs§

  • A CRC check failed
  • Representation of a mutable or shared page. This is an owned page (like Vec in Rust’s std), but we do not know whether we can mutate it or not.
  • An environment, which may be either a memory-mapped file, or memory allocated with std::alloc.
  • A 64-bit unsigned integer in little-endian ordering.
  • An owned page on which we can write. This is just a wrapper around CowPage to avoid checking the “dirty” bit at runtime.
  • A mutable transaction.
  • Representation of a borrowed, or immutable page, like a slice in Rust.
  • An immutable transaction.

Enums§

  • Errors that can occur while transacting.

Traits§

  • Trait for allocating and freeing pages.
  • Transactions that can be committed. This trait is an abstraction over mutable transactions and their subtransactions.
  • Trait for loading a page.
  • The trait, implemented by Txn and MutTxn, for treating the 4064 bytes after the header of root pages as pointers to B trees (well, actually Option of pointers to databases, where None is encoded by 0).
  • Access the root page of a transaction.
  • Types that can be stored on disk. This trait may be used in conjunction with Sized in order to determine the on-disk size, or with UnsizedStorable when special arrangements are needed.
  • Types that can be stored on disk.

Unions§