# Bitalosdb

# Problem & Solution

  • Problem: Standard LSM-Tree exists read&write amplification problem. Data size is larger, and resource consumption caused by read-write amplification is greater. How to carry more write & reads with lower resource cost on large data scale?

  • Solution: Based on Bitalostree to solve read amplification, based on Bitash to realize KV separation to solve write amplification, based on hot and cold data separation to further save memory and hard disk consumption.

# Read & Write Amplification Analysis

  • Definition

Write amplification is the ratio of the amount of data written to the storage device versus the amount of data written to the database.

Read amplification is the number of disk reads per query.

  • Parameter
N=Amount of all items (bytes)
L=Average length of one item (bytes)
B=Block size, the number of bytes read from the disk at one time
F=Single file size at the lowest level for lsm-tree; file size of single updatetable in bitalostree for bitalosdb
k=For lsm-tree,(single file size at Ln)/(single file size at Ln-1); for bitalosdb, (single file size of succinctable)/(single file size of updatetable)
j=Bitalostree split coefficient for bitalosdb, split 1 succinctable to j succinctable(s)
  • Conclusion
Write Amp Read Amp
b+tree O(B/L) O(logB/L(N/B))
lsm-leveled O(k*logk(N/F)) O((log(N/F)*log(N/B))/log(k))
lsm-size tiered O(logk(N/F)) O(k*log(N/F)*log(N/B)/log(k))
bitalosdb (k+1)/2+1/(j-1) O(log(kF/B))

# Write Amplification Comparison

  • Engine Configuration
bitalosdb pebble(lsm-tree) bboltdb(b+tree)
DisableWAL: true
UseBithash: true
UseMapIndex: true
DisableWAL: true
L0FileSize: 256MB
Compression: Snappy
BlockSize: 32KB
L0CompactionThreshold: 8
L0StopWritesThreshold: 32
LBaseMaxBytes: 4GB
NoSync: true
NoGrowSync: true
NoFreelistSync: false
PageSize: 4KB
  • Test Results
Cumulative Write 10G 20G 100G
Key: 20B
Value: 128B
Count: 74051160 Count: 148102320 Count: 740511600
bitalosdb 37GB 82GB 331GB
pebble 72GB 493GB 7172GB
bboltdb
(10000 items per commit)
551GB 1189GB 6045GB

# Key Technology

  • Bithash (KV separation technology), time complexity of retrieval is O(1). GC can be completed independently, and value and index are decoupled.

  • Bitalostree (high-performance compression index technology), optimized B+ Tree based on ultra-large Page, using a creative index compression technology, eliminates write amplification based on traditional B+ Tree, and improves the read performance significantly.

  • Bitalostable (cold and hot data separation technology), stores cold data, improves data compression rate, reduces index memory consumption, and achieves more rational resource utilization. (Open source stable edition has basic features, enterprise edition supports more comprehensive hot and cold separation).

# IO Architecture

bitalosdb-io

# KV Separation

# Solution Analysis

  • Solution A

kv-separation-1

  • Solution B

kv-separation-2

  • Solution C

kv-separation-3

  • Analysis

kv-separation-analysis

  • Conclusion
    • Solution A&B needs to query & update the index during vlog-GC, which consumes additional CPU and IO resource.

    • Solution C will trigger multiple random reads when reading vLog, and read performance can be further optimized.

    • Bitalosdb not only ensures high-performance reading of vLog, but also realizes GC inside vLog without querying & updating index.

# Implementation details

  • File structure

bithash-4

  • Data writing

bithash-2

  • Data reading

bithash-1

  • Index write

bithash-3

# High performance compressed index

bitalostree

  • Note: The open source version supports 90% of the compression formats in the figure, and the enterprise version supports all index compression formats.
ZUOYEBANG