Columnar DB

A column-oriented analytical database built from scratch in Go, inspired by DuckDB & ClickHouse.

~19K lines of Go

474 tests

0 external deps

5 phases

Quick Start

go run ./cmd/columnar-db

columnar-db> SELECT city, COUNT(*), AVG(age) FROM people GROUP BY city ORDER BY city
city     | COUNT(*) | AVG(age)
---------+----------+---------
Beijing  | 166      | 47.82
Kunming  | 184      | 45.51
Lyon     | 156      | 44.48
Paris    | 165      | 44.72
Shanghai | 174      | 44.94
Tokyo    | 155      | 45.02
(6 rows)

The REPL ships with a 1000-row demo dataset. Type .schema to see columns, .help for syntax, .quit to exit.

SQL Subset

SELECT { * | col [, ...] | agg(col) [, ...] }
FROM table
[WHERE col { = | != | < | <= | > | >= } literal]
[GROUP BY col [, col ...]]
[ORDER BY col [ASC | DESC]]
[LIMIT n]

Aggregates: COUNT(*), COUNT(col), SUM, MIN, MAX, AVG on Int64 and Float64.

Components

Storage

Column chunks, row groups, null bitmaps, file I/O

Encoding

RLE, dictionary, delta encoding + LZ4 compression

Execution

Vectorized operators, batches of 1024, late materialization

SQL Frontend

Hand-rolled lexer, parser, planner, and REPL
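As an illustration of the Encoding component, here is a minimal sketch of run-length encoding for an Int64 column. The `Run`, `EncodeRLE`, and `DecodeRLE` names are hypothetical and are not taken from the project's source; the actual on-disk layout may differ.

```go
package main

import "fmt"

// Run is one (value, repeat count) pair. Hypothetical type for illustration.
type Run struct {
	Value int64
	Count int
}

// EncodeRLE collapses consecutive equal values into runs.
func EncodeRLE(values []int64) []Run {
	var runs []Run
	for _, v := range values {
		if n := len(runs); n > 0 && runs[n-1].Value == v {
			runs[n-1].Count++ // extend the current run
			continue
		}
		runs = append(runs, Run{Value: v, Count: 1}) // start a new run
	}
	return runs
}

// DecodeRLE expands runs back into the original value sequence.
func DecodeRLE(runs []Run) []int64 {
	total := 0
	for _, r := range runs {
		total += r.Count
	}
	out := make([]int64, 0, total)
	for _, r := range runs {
		for i := 0; i < r.Count; i++ {
			out = append(out, r.Value)
		}
	}
	return out
}

func main() {
	runs := EncodeRLE([]int64{7, 7, 7, 3, 3, 9})
	fmt.Println(runs)            // [{7 3} {3 2} {9 1}]
	fmt.Println(DecodeRLE(runs)) // [7 7 7 3 3 9]
}
```

RLE pays off on sorted or low-cardinality columns, where long runs are common; dictionary and delta encoding cover the cases where it does not.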

Key Concepts

Columnar storage	Contiguous typed arrays + null bitmaps, not row tuples
Vectorized execution	Process ~1024 values per batch in tight typed loops
Late materialization	Selection vectors narrow live rows without copying data
Zero-alloc hot path	Per-batch Update/Finalize allocates nothing at steady state
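The first three concepts can be sketched together in a few lines of Go. This is an illustrative sketch under assumptions, not the project's code: `Int64Column`, `FilterGreater`, and `SumSelected` are hypothetical names, and the bitmap convention (1 = value present) is assumed.

```go
package main

import "fmt"

// BatchSize mirrors the ~1024-value batches described above.
const BatchSize = 1024

// Int64Column is a hypothetical column chunk: a contiguous typed array plus
// a validity bitmap with one bit per row (1 = present, 0 = NULL).
type Int64Column struct {
	Values []int64
	Valid  []uint64
}

// NewInt64Column allocates n rows, all initially NULL.
func NewInt64Column(n int) *Int64Column {
	return &Int64Column{Values: make([]int64, n), Valid: make([]uint64, (n+63)/64)}
}

// Set stores v at row i and marks the row as non-NULL.
func (c *Int64Column) Set(i int, v int64) {
	c.Values[i] = v
	c.Valid[i/64] |= uint64(1) << (uint(i) % 64)
}

// IsNull reports whether row i is NULL.
func (c *Int64Column) IsNull(i int) bool {
	return c.Valid[i/64]&(uint64(1)<<(uint(i)%64)) == 0
}

// FilterGreater fills sel with the indices of non-NULL rows whose value
// exceeds threshold. It reuses sel's backing array, so a caller that passes
// a buffer with capacity >= len(c.Values) causes no allocation.
func FilterGreater(c *Int64Column, threshold int64, sel []uint16) []uint16 {
	sel = sel[:0]
	for i, v := range c.Values {
		if !c.IsNull(i) && v > threshold {
			sel = append(sel, uint16(i))
		}
	}
	return sel
}

// SumSelected reads only the rows named by the selection vector; the
// column data itself is never copied or compacted.
func SumSelected(c *Int64Column, sel []uint16) int64 {
	var sum int64
	for _, i := range sel {
		sum += c.Values[i]
	}
	return sum
}

func main() {
	age := NewInt64Column(4)
	age.Set(0, 25)
	age.Set(1, 61)
	age.Set(3, 70) // row 2 stays NULL

	sel := make([]uint16, 0, BatchSize) // allocated once, reused per batch
	sel = FilterGreater(age, 39, sel)
	fmt.Println(sel, SumSelected(age, sel)) // [1 3] 131
}
```

A `uint16` index is enough here because a batch never exceeds 1024 rows; downstream operators consume the selection vector instead of a filtered copy of the column.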

Build Phases

Phase	Topic	Highlight
1	Column Storage	Types, null bitmap, row groups, file format
2	Encoding	RLE, dictionary, delta, LZ4
3	Vectorized Scan	Batch processing, filters, selection vectors
4	Aggregation	Hash GROUP BY, 1.67x speedup on high-cardinality keys
5	SQL Frontend	Lexer, parser, planner; +0.3% overhead vs direct API
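To make the Phase 4 aggregation concrete, the following is a simplified sketch of a hash GROUP BY that computes COUNT(*) and AVG per key, fed one batch at a time. `HashAgg`, `avgState`, and `Row` are hypothetical names; this version uses Go's built-in map and allocates one state per new group, so it does not reproduce the project's zero-alloc hot path or its measured speedup.

```go
package main

import (
	"fmt"
	"sort"
)

// avgState is the running state for one group.
type avgState struct {
	count int64
	sum   int64
}

// HashAgg groups rows by a string key. Hypothetical type for illustration.
type HashAgg struct {
	groups map[string]*avgState
}

func NewHashAgg() *HashAgg {
	return &HashAgg{groups: make(map[string]*avgState)}
}

// Update folds one batch of (key, value) pairs into the per-group state.
// keys and vals must have the same length.
func (h *HashAgg) Update(keys []string, vals []int64) {
	for i, k := range keys {
		st, ok := h.groups[k]
		if !ok {
			st = &avgState{}
			h.groups[k] = st
		}
		st.count++
		st.sum += vals[i]
	}
}

// Row is one output row: key, COUNT(*), AVG(value).
type Row struct {
	Key   string
	Count int64
	Avg   float64
}

// Finalize emits one row per group, sorted by key for stable output.
func (h *HashAgg) Finalize() []Row {
	rows := make([]Row, 0, len(h.groups))
	for k, st := range h.groups {
		rows = append(rows, Row{Key: k, Count: st.count, Avg: float64(st.sum) / float64(st.count)})
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i].Key < rows[j].Key })
	return rows
}

func main() {
	agg := NewHashAgg()
	agg.Update([]string{"Paris", "Tokyo", "Paris"}, []int64{30, 40, 50}) // batch 1
	agg.Update([]string{"Tokyo"}, []int64{20})                           // batch 2
	for _, r := range agg.Finalize() {
		fmt.Printf("%s | %d | %.2f\n", r.Key, r.Count, r.Avg)
	}
}
```

Keeping the state per group small and updating it in place is what lets the operator stream over batches without holding the input rows.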