Architecture

Five layers, listed bottom to top. Each layer depends only on the one below it.

  1. Column Storage: RowGroup, ColumnChunk, NullBitmap, Writer, Reader
  2. Encoding & Compression: Plain, RLE, Dictionary, Delta, LZ4
  3. Vectorized Execution: Scan, Filter, Project, Aggregate, GroupBy, OrderBy, Limit
  4. SQL Frontend: Tokenize, Parse, Plan, Execute
  5. REPL (cmd/columnar-db): interactive SQL shell + demo dataset

Query Pipeline

SELECT city, COUNT(*), AVG(age) FROM people WHERE age > 30 GROUP BY city ORDER BY city

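  1. ScanOp reads a RowGroup and yields Batches.
  2. FilterOp narrows the selection to rows where age > 30.
  3. GroupByOp hashes rows by city and runs the aggregators (COUNT(*), AVG(age)).
  4. OrderByOp sorts the result by city ASC.
  5. The sorted rows are returned as the query results.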

Data Flow

  SQL path: Tokenize → Parse → Plan → operator tree → drain Next()
  Write path: Columns → ColumnChunks → RowGroup → Encoder → LZ4 → disk
  Read path: open file → read footer → decompress → decode → RowGroup
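
The read path starts by reading the footer, because only the footer says where each chunk lives. The Go sketch below assumes a hypothetical layout (encoded chunks back to back, then a footer of little-endian uint64 chunk offsets, then an 8-byte footer length) and works on an in-memory []byte. Encoding and LZ4 compression are omitted, so each chunk stands for already-encoded bytes; the function names are illustrative only.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// WriteFile lays out encoded column chunks back to back, then a footer of
// little-endian uint64 chunk offsets, then the footer length in the last 8 bytes.
func WriteFile(chunks [][]byte) []byte {
	var out []byte
	offsets := make([]uint64, 0, len(chunks)+1)
	for _, c := range chunks {
		offsets = append(offsets, uint64(len(out)))
		out = append(out, c...)
	}
	offsets = append(offsets, uint64(len(out))) // end of the last chunk
	footer := make([]byte, 8*len(offsets))
	for i, off := range offsets {
		binary.LittleEndian.PutUint64(footer[8*i:], off)
	}
	out = append(out, footer...)
	trailer := make([]byte, 8)
	binary.LittleEndian.PutUint64(trailer, uint64(len(footer)))
	return append(out, trailer...)
}

// ReadChunk reads the trailer, then the footer, and only then slices out chunk i.
func ReadChunk(file []byte, i int) ([]byte, error) {
	if len(file) < 8 {
		return nil, fmt.Errorf("file too short: %d bytes", len(file))
	}
	end := len(file) - 8
	footerLen := int(binary.LittleEndian.Uint64(file[end:]))
	if footerLen < 8 || footerLen > end || footerLen%8 != 0 {
		return nil, fmt.Errorf("corrupt footer length %d", footerLen)
	}
	footer := file[end-footerLen : end]
	if n := footerLen/8 - 1; i < 0 || i >= n {
		return nil, fmt.Errorf("chunk %d out of range (file has %d)", i, n)
	}
	start := binary.LittleEndian.Uint64(footer[8*i:])
	stop := binary.LittleEndian.Uint64(footer[8*(i+1):])
	return file[start:stop], nil
}

func main() {
	file := WriteFile([][]byte{[]byte("city-chunk"), []byte("age-chunk")})
	chunk, err := ReadChunk(file, 1)
	fmt.Println(string(chunk), err)
}
```

Reading chunk 1 returns the bytes "age-chunk". Putting the footer at the end lets a writer stream chunks without knowing their sizes up front, and lets a reader reach any chunk after one small read from the end of the file.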

Component Details