# Bitalosdb
# Problem & Solution
Problem: Standard LSM-Tree exists read&write amplification problem. Data size is larger, and resource consumption caused by read-write amplification is greater. How to carry more write & reads with lower resource cost on large data scale?
Solution: Based on Bitalostree to solve read amplification, based on Bitash to realize KV separation to solve write amplification, based on hot and cold data separation to further save memory and hard disk consumption.
# Read & Write Amplification Analysis
- Definition
Write amplification is the ratio of the amount of data written to the storage device versus the amount of data written to the database.
Read amplification is the number of disk reads per query.
- Parameter
N=Amount of all items (bytes)
L=Average length of one item (bytes)
B=Block size, the number of bytes read from the disk at one time
F=Single file size at the lowest level for lsm-tree; file size of single updatetable in bitalostree for bitalosdb
k=For lsm-tree,(single file size at Ln)/(single file size at Ln-1); for bitalosdb, (single file size of succinctable)/(single file size of updatetable)
j=Bitalostree split coefficient for bitalosdb, split 1 succinctable to j succinctable(s)
- Conclusion
| Write Amp | Read Amp | |
|---|---|---|
| b+tree | O(B/L) | O(logB/L(N/B)) |
| lsm-leveled | O(k*logk(N/F)) | O((log(N/F)*log(N/B))/log(k)) |
| lsm-size tiered | O(logk(N/F)) | O(k*log(N/F)*log(N/B)/log(k)) |
| bitalosdb | (k+1)/2+1/(j-1) | O(log(kF/B)) |
# Write Amplification Comparison
- Engine Configuration
| bitalosdb | pebble(lsm-tree) | bboltdb(b+tree) |
|---|---|---|
| DisableWAL: true UseBithash: true UseMapIndex: true | DisableWAL: true L0FileSize: 256MB Compression: Snappy BlockSize: 32KB L0CompactionThreshold: 8 L0StopWritesThreshold: 32 LBaseMaxBytes: 4GB | NoSync: true NoGrowSync: true NoFreelistSync: false PageSize: 4KB |
- Test Results
| Cumulative Write | 10G | 20G | 100G |
|---|---|---|---|
| Key: 20B Value: 128B | Count: 74051160 | Count: 148102320 | Count: 740511600 |
| bitalosdb | 37GB | 82GB | 331GB |
| pebble | 72GB | 493GB | 7172GB |
| bboltdb (10000 items per commit) | 551GB | 1189GB | 6045GB |
# Key Technology
Bithash (KV separation technology), time complexity of retrieval is O(1). GC can be completed independently, and value and index are decoupled.
Bitalostree (high-performance compression index technology), optimized B+ Tree based on ultra-large Page, using a creative index compression technology, eliminates write amplification based on traditional B+ Tree, and improves the read performance significantly.
Bitalostable (cold and hot data separation technology), stores cold data, improves data compression rate, reduces index memory consumption, and achieves more rational resource utilization. (Open source stable edition has basic features, enterprise edition supports more comprehensive hot and cold separation).
# IO Architecture

# KV Separation
# Solution Analysis
- Solution A

- Solution B

- Solution C

- Analysis

- Conclusion
Solution A&B needs to query & update the index during vlog-GC, which consumes additional CPU and IO resource.
Solution C will trigger multiple random reads when reading vLog, and read performance can be further optimized.
Bitalosdb not only ensures high-performance reading of vLog, but also realizes GC inside vLog without querying & updating index.
# Implementation details
- File structure

- Data writing

- Data reading

- Index write

# High performance compressed index

- Note: The open source version supports 90% of the compression formats in the figure, and the enterprise version supports all index compression formats.