Rust and In-Memory Management Inside TarsDB
The Traditional Garbage Collector (GC) Problem
Databases are typically built in C++ or Java. Java and other JVM languages carry significant garbage collection overhead, including unpredictable collection pauses. When building TarsDB Sense, our main goal was not just to process data but to escape GC pauses.
Rust's ownership and borrowing model gave us exactly the tool we needed.
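To illustrate why ownership removes the need for a collector, here is a minimal sketch. `ColumnBuffer`, `column_len`, and `load_and_measure` are hypothetical names, not TarsDB code. The buffer is borrowed without being copied, and its memory is released deterministically when its owner goes out of scope; the `freed` flag only exists so the drop can be observed.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical owned column buffer. `freed` lets callers observe when
/// Rust releases it; no garbage collector is involved.
pub struct ColumnBuffer {
    data: Vec<u8>,
    freed: Rc<Cell<bool>>,
}

impl Drop for ColumnBuffer {
    // Runs deterministically when the owner goes out of scope.
    fn drop(&mut self) {
        self.freed.set(true);
    }
}

/// Borrows the buffer immutably: no copy, no transfer of ownership.
pub fn column_len(buf: &ColumnBuffer) -> usize {
    buf.data.len()
}

/// Hypothetical loader: owns a buffer only for the duration of the call.
pub fn load_and_measure(size: usize, freed: Rc<Cell<bool>>) -> usize {
    let buf = ColumnBuffer { data: vec![0u8; size], freed };
    column_len(&buf)
    // `buf` is dropped here, at the end of its scope, and its heap
    // allocation is freed right away instead of at a later GC cycle.
}
```

After `load_and_measure` returns, the flag is already set: deallocation happened at a known point in the program, which is what lets a Rust engine avoid stop-the-world pauses.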
Zero-Copy Architecture
Consider how TarsDB's pipeline reads data from large files such as CSV or Excel:
```rust
pub fn parse_columnar_data(buffer: &[u8]) -> Result<Vec<&[u8]>, EbpfError> {
    // Memory is not copied here; it is referenced as a slice.
    let mut columns = Vec::new();
    let mut current_pos = 0;
    while current_pos < buffer.len() {
        // ... Zero-copy reading logic
    }
    Ok(columns)
}
```
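Because each returned column is a slice borrowed from `buffer` rather than a copy of it, the function never duplicates the underlying bytes. The only allocation it makes is the `Vec` of slice handles (a pointer and a length per column), so reading even terabytes of data causes no RAM spikes from copying.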
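The reading logic in the snippet above is elided, so the following is only a self-contained sketch of the same zero-copy idea, not TarsDB's parser. `split_fields` is a hypothetical helper that assumes a single-byte delimiter and no quoting; it advances `current_pos` past each delimiter and pushes a slice of the input for every field.

```rust
/// Hypothetical sketch: splits one delimited record into fields.
/// Every returned slice borrows from `buffer`; no field bytes are copied.
/// Assumes a single-byte delimiter and no quoted fields.
pub fn split_fields(buffer: &[u8], delim: u8) -> Vec<&[u8]> {
    let mut fields = Vec::new();
    let mut current_pos = 0;
    // `<=` so a trailing delimiter yields a final empty field.
    while current_pos <= buffer.len() {
        // Absolute offset of the next delimiter, or the end of the buffer.
        let end = buffer[current_pos..]
            .iter()
            .position(|&b| b == delim)
            .map_or(buffer.len(), |i| current_pos + i);
        fields.push(&buffer[current_pos..end]); // a view, not a copy
        current_pos = end + 1; // step past the delimiter
    }
    fields
}
```

For `b"id,name,score"` this yields `id`, `name`, and `score`, and the `name` slice starts at the same address as byte 3 of the input, which confirms that nothing was copied.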
Associative Engine's Romance with Rust
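The Associative Logic, inspired by Qlik Sense, means continuously tracking which table rows are green, white, or gray in memory. We use Rust's standard-library `BTreeMap` (ordered lookups in O(log n)) together with `BitVec` bit vectors, which store one bit per row. Checking the state of a given row (Selected, Alternative, Excluded) is then an O(1) bit test by row index.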
- Green (Selected): Rows matching the active filter.
- White (Alternative): Other values that can still be selected.
- Gray (Excluded): Rows not included in the current selection.
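A minimal sketch of O(1) row-state lookup follows. To stay dependency-free it uses a small hand-written `BitSet` over `u64` words as a stand-in for a `BitVec` type. `RowStates` and its methods are hypothetical, and treating every row that is neither selected nor excluded as Alternative is an assumption made for this illustration only.

```rust
/// The three row states of the associative model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowState {
    Selected,
    Alternative,
    Excluded,
}

/// Minimal fixed-size bit set over u64 words (std-only stand-in for BitVec).
pub struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    pub fn new(len: usize) -> Self {
        BitSet { words: vec![0; len.div_ceil(64)] }
    }
    pub fn set(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }
    pub fn get(&self, i: usize) -> bool {
        (self.words[i / 64] & (1u64 << (i % 64))) != 0
    }
}

/// Hypothetical per-table state: one bit per row for each non-default state.
pub struct RowStates {
    selected: BitSet,
    excluded: BitSet,
}

impl RowStates {
    pub fn new(rows: usize) -> Self {
        RowStates { selected: BitSet::new(rows), excluded: BitSet::new(rows) }
    }
    pub fn select(&mut self, row: usize) {
        self.selected.set(row);
    }
    pub fn exclude(&mut self, row: usize) {
        self.excluded.set(row);
    }
    /// O(1): at most two word reads and bit tests, regardless of table size.
    pub fn state(&self, row: usize) -> RowState {
        if self.selected.get(row) {
            RowState::Selected
        } else if self.excluded.get(row) {
            RowState::Excluded
        } else {
            RowState::Alternative // assumed default for this sketch
        }
    }
}
```

Each lookup is a division, a shift, and a mask, so its cost does not grow with the number of rows. An ordered map such as `BTreeMap` would instead be the place for value-to-row indexes, where O(log n) lookups are acceptable.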
The result? In its "Nebula" vision, TarsDB delivers massive datasets to the frontend with O(1) row-state lookups and full memory safety.