Vectorized, pull-based operator pipeline processing ~1024 rows per batch.
A vector is a typed array of up to 1024 values from a single column. It is the basic unit of computation: operators work on whole vectors, not individual values.
A batch is a set of aligned vectors (one per column) plus a Selection, an index list that tracks which rows are "live." All vectors share the same raw length; the selection narrows the visible rows without copying data.
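A minimal sketch of how these pieces might fit together, assuming a column-typed `Vector` with one slice per supported type plus a null mask, and a `Selection` stored as a slice of row indices. The field names, the `Len` method and the `Project` helper are hypothetical, not the package's actual definitions; `Project` illustrates the no-copy column projection that `ProjectOp` is described as performing.

```go
package main

import "fmt"

// BatchSize is the maximum number of rows per batch.
const BatchSize = 1024

// Vector is a hypothetical typed column: only the slice matching Type is populated.
type Vector struct {
	Type     string
	Int64s   []int64
	Float64s []float64
	Strings  []string
	Bools    []bool
	// Nulls[i] is true when row i is NULL (assumed representation).
	Nulls []bool
}

// Selection lists the indices of live rows, in ascending order.
type Selection []int

// Batch holds aligned vectors (one per column) plus the live-row selection.
type Batch struct {
	Vecs []*Vector
	Sel  Selection
}

// Len reports the number of live (selected) rows, not the raw vector length.
func (b *Batch) Len() int { return len(b.Sel) }

// Project returns a batch exposing only the given columns. The *Vector
// pointers and the selection are shared, so no column data is copied.
func (b *Batch) Project(cols ...int) *Batch {
	out := &Batch{Vecs: make([]*Vector, len(cols)), Sel: b.Sel}
	for i, c := range cols {
		out.Vecs[i] = b.Vecs[c]
	}
	return out
}

func main() {
	ids := &Vector{Type: "int64", Int64s: []int64{10, 11, 12, 13}}
	names := &Vector{Type: "string", Strings: []string{"a", "b", "c", "d"}}
	b := &Batch{Vecs: []*Vector{ids, names}, Sel: Selection{0, 2, 3}}

	p := b.Project(1)
	fmt.Println(p.Len(), p.Vecs[0] == names) // 3 true
}
```

Sharing pointers keeps projection O(number of columns) regardless of row count, which matches the "by index, no copying" description.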
Instead of removing filtered rows from vectors, a selection vector records which indices survived. Downstream operators only process selected indices. This avoids expensive data movement.
```go
// Before filter: selection = [0, 1, 2, 3, 4, 5]
// After filter (age > 30): selection = [1, 3, 5]
// Vectors unchanged; only the index list shrinks
```
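The narrowing step can be sketched as a tight loop that compacts surviving indices into the front of the same selection buffer, so filtering moves no column data and allocates nothing. The function name `filterGreaterInt64` and the in-place reuse of the selection slice are illustrative assumptions; the ages are made up to reproduce the `age > 30` case above.

```go
package main

import "fmt"

// filterGreaterInt64 keeps only the selected rows whose value is > c.
// Survivors are compacted into the front of sel and sel[:n] is returned,
// so the selection's backing array is reused and vals is never modified.
func filterGreaterInt64(vals []int64, sel []int, c int64) []int {
	n := 0
	for _, i := range sel {
		if vals[i] > c {
			sel[n] = i
			n++
		}
	}
	return sel[:n]
}

func main() {
	ages := []int64{25, 41, 19, 33, 30, 52}
	sel := []int{0, 1, 2, 3, 4, 5}
	sel = filterGreaterInt64(ages, sel, 30)
	fmt.Println(sel) // [1 3 5]
}
```

Writing `sel[n]` while ranging over `sel` is safe because `n` never exceeds the current read position. A second filter can run directly on the narrowed selection.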
All operators implement the pull-based Operator interface:
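```go
type Operator interface {
	// Next pulls the next batch from upstream; the bool is false once the operator is exhausted.
	Next() (*Batch, bool)
	// Reset returns the operator to its initial state.
	Reset()
}
```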
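To show how pulling works, here is a self-contained sketch with a stripped-down `Batch` (one int64 column plus a selection), a hypothetical slice-backed source standing in for `ScanOp`, and a limit operator in the spirit of `LimitOp`. Each operator calls `Next` on its child only when its own consumer asks for data, so nothing runs ahead of demand. The types, field names and `drain` helper are assumptions for illustration, not the package's API.

```go
package main

import "fmt"

// Batch is a simplified stand-in: one int64 column plus the live-row selection.
type Batch struct {
	Vals []int64
	Sel  []int
}

// Operator matches the interface shown above.
type Operator interface {
	Next() (*Batch, bool)
	Reset()
}

// sliceSource is a hypothetical stand-in for ScanOp: it yields its data
// in batches of at most size rows, with every row initially selected.
type sliceSource struct {
	data []int64
	size int
	pos  int
}

func (s *sliceSource) Next() (*Batch, bool) {
	if s.pos >= len(s.data) {
		return nil, false
	}
	end := s.pos + s.size
	if end > len(s.data) {
		end = len(s.data)
	}
	b := &Batch{Vals: s.data[s.pos:end], Sel: make([]int, end-s.pos)}
	for i := range b.Sel {
		b.Sel[i] = i
	}
	s.pos = end
	return b, true
}

func (s *sliceSource) Reset() { s.pos = 0 }

// limitOp passes through at most n live rows, trimming the selection of the
// batch that crosses the limit, and stops pulling from its child afterwards.
type limitOp struct {
	child Operator
	n     int
	seen  int
}

func (l *limitOp) Next() (*Batch, bool) {
	if l.seen >= l.n {
		return nil, false
	}
	b, ok := l.child.Next()
	if !ok {
		return nil, false
	}
	if rem := l.n - l.seen; len(b.Sel) > rem {
		b.Sel = b.Sel[:rem]
	}
	l.seen += len(b.Sel)
	return b, true
}

func (l *limitOp) Reset() {
	l.seen = 0
	l.child.Reset()
}

// drain is the consumer: it pulls until the root reports exhaustion
// and returns the number of live rows it saw.
func drain(op Operator) int {
	rows := 0
	for {
		b, ok := op.Next()
		if !ok {
			return rows
		}
		rows += len(b.Sel)
	}
}

func main() {
	root := &limitOp{child: &sliceSource{data: make([]int64, 2500), size: 1024}, n: 1500}
	fmt.Println(drain(root)) // 1500
	root.Reset()
	fmt.Println(drain(root)) // 1500
}
```

Here the limit stops after two of the three possible source batches, which is the main payoff of the pull model: downstream demand bounds upstream work.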
| Operator | What it does |
|---|---|
| ScanOp | Reads a RowGroup and yields batches of up to 1024 rows |
| FilterOp | Evaluates a predicate and narrows the selection |
| ProjectOp | Keeps only specified columns (by index, no copying) |
| AggregateOp | Runs aggregators over the entire stream (no grouping) |
| GroupByOp | Single-key hash GROUP BY with fast paths for int64/string |
| OrderByOp | Sorts batches by a column |
| LimitOp | Stops after N rows |
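The core of a single-key hash GROUP BY can be sketched as a map from key to a dense group ID (0, 1, 2, ...), producing one ID per selected row that aggregators can then use to index plain slices. This sketch covers only an int64 key; a string fast path would look the same with `map[string]int`. The `int64Grouper` type, its `Assign` method and the use of Go's built-in map as the hash table are assumptions, and the real `GroupByOp` may hash differently.

```go
package main

import "fmt"

// int64Grouper maps each distinct int64 key to a dense group ID.
type int64Grouper struct {
	ids  map[int64]int
	Keys []int64 // Keys[g] is the key of group g
	gids []int   // reusable per-batch output buffer
}

func newInt64Grouper() *int64Grouper {
	return &int64Grouper{ids: make(map[int64]int)}
}

// Assign returns the group ID of each selected row, in selection order.
// The returned slice is reused (overwritten) by the next call.
func (g *int64Grouper) Assign(keys []int64, sel []int) []int {
	g.gids = g.gids[:0]
	for _, i := range sel {
		id, ok := g.ids[keys[i]]
		if !ok {
			// First time this key is seen: hand out the next dense ID.
			id = len(g.Keys)
			g.ids[keys[i]] = id
			g.Keys = append(g.Keys, keys[i])
		}
		g.gids = append(g.gids, id)
	}
	return g.gids
}

func main() {
	g := newInt64Grouper()
	keys := []int64{7, 3, 7, 9, 3, 7}
	fmt.Println(g.Assign(keys, []int{0, 1, 2, 3, 4, 5})) // [0 1 0 2 1 0]
	fmt.Println(g.Keys)                                  // [7 3 9]
}
```

Because IDs are dense, per-group aggregator state can live in slices instead of one heap object per group, and a repeated key costs one map lookup with no allocation.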
Seven comparison predicates for filter evaluation:
`=`, `!=`, `<`, `<=`, `>`, `>=`
Each predicate is implemented per type (int64, float64, string, bool) as a tight loop over the selection vector.
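One way to keep each comparison a tight loop is to dispatch on the operator once per call, outside the row loop, so the inner loop never branches on which operator is in use. The sketch below uses Go generics over a hypothetical `ordered` constraint to cover `int64`, `float64` and `string` in one function, whereas the package is described as implementing each type separately; `bool` would only need `=` and `!=`. The `CmpOp` enum and the `selectCmp` name are illustrative assumptions.

```go
package main

import "fmt"

// CmpOp enumerates the comparison operators listed above.
type CmpOp int

const (
	Eq CmpOp = iota
	Ne
	Lt
	Le
	Gt
	Ge
)

// ordered is a hypothetical constraint covering the sortable column types.
type ordered interface {
	~int64 | ~float64 | ~string
}

// selectCmp narrows sel in place to the rows where vals[i] <op> c holds.
// The switch runs once per call; each case is a simple loop over sel.
func selectCmp[T ordered](vals []T, sel []int, op CmpOp, c T) []int {
	n := 0
	switch op {
	case Eq:
		for _, i := range sel {
			if vals[i] == c {
				sel[n], n = i, n+1
			}
		}
	case Ne:
		for _, i := range sel {
			if vals[i] != c {
				sel[n], n = i, n+1
			}
		}
	case Lt:
		for _, i := range sel {
			if vals[i] < c {
				sel[n], n = i, n+1
			}
		}
	case Le:
		for _, i := range sel {
			if vals[i] <= c {
				sel[n], n = i, n+1
			}
		}
	case Gt:
		for _, i := range sel {
			if vals[i] > c {
				sel[n], n = i, n+1
			}
		}
	case Ge:
		for _, i := range sel {
			if vals[i] >= c {
				sel[n], n = i, n+1
			}
		}
	}
	return sel[:n]
}

func main() {
	prices := []float64{9.5, 20, 15.25, 3}
	fmt.Println(selectCmp(prices, []int{0, 1, 2, 3}, Ge, 10)) // [1 2]
	names := []string{"ann", "bob", "cy"}
	fmt.Println(selectCmp(names, []int{0, 1, 2}, Ne, "bob")) // [0 2]
}
```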
| Aggregator | Types |
|---|---|
| COUNT(*) | All |
| COUNT(col) | All (skips nulls) |
| SUM | Int64, Float64 |
| MIN | Int64, Float64 |
| MAX | Int64, Float64 |
| AVG | Int64, Float64 |
All 9 concrete aggregator implementations are zero-alloc at steady state: they resize their internal state once per new group, not per row.
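The "resize once per new group" rule can be sketched as an aggregator whose state is a set of slices indexed by dense group ID. The slices grow only when a batch mentions a group ID beyond their current length, so updates that touch only known groups do not allocate. The `avgInt64` type and its `Update`/`Result` methods are hypothetical; the sketch assumes group IDs are dense, that `gids[k]` is the group of row `sel[k]`, and that null rows have already been excluded from the selection.

```go
package main

import "fmt"

// avgInt64 keeps a running sum and count per dense group ID.
type avgInt64 struct {
	sums   []int64
	counts []int64
}

// grow extends the state to cover group IDs [0, n). It is only reached
// when a new group appears, never for rows of existing groups.
func (a *avgInt64) grow(n int) {
	for len(a.sums) < n {
		a.sums = append(a.sums, 0)
		a.counts = append(a.counts, 0)
	}
}

// Update folds the selected rows into their groups; gids[k] is the group of row sel[k].
func (a *avgInt64) Update(gids []int, vals []int64, sel []int) {
	for k, i := range sel {
		g := gids[k]
		if g >= len(a.sums) {
			a.grow(g + 1)
		}
		a.sums[g] += vals[i]
		a.counts[g]++
	}
}

// Result returns the average for group g (NaN if the group saw no rows).
func (a *avgInt64) Result(g int) float64 {
	return float64(a.sums[g]) / float64(a.counts[g])
}

func main() {
	var a avgInt64
	a.Update([]int{0, 1, 0, 2, 1, 0}, []int64{1, 2, 3, 4, 5, 8}, []int{0, 1, 2, 3, 4, 5})
	fmt.Println(a.Result(0), a.Result(1), a.Result(2)) // 4 3.5 4
}
```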
GROUP BY on 1M rows (Apple M4):
| Cardinality | Speedup vs naive | Allocs (naive → vectorized) |
|---|---|---|
| 10 groups (string) | 0.79x | 13 → 0 |
| 10k groups (int64) | 1.67x | 10,033 → 0 |
| 100k groups (int64) | 1.44x | 100,252 → 6 |
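The vectorized approach wins on high-cardinality data (1.67x at 10k groups, 1.44x at 100k groups), where allocation pressure matters: it eliminates nearly all per-group allocations. On low-cardinality data the overhead of batch processing outweighs the gains, and the 10-group case runs at 0.79x the naive speed. Note that the 10-group case uses string keys while the other two use int64 keys, so key type also differs between these rows.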
| File | Role |
|---|---|
| operator.go | Operator interface |
| batch.go | Batch struct (vectors + selection) |
| vector.go | Typed vector (up to 1024 values) |
| selection.go | Selection index list |
| scan.go | ScanOp: RowGroup reader |
| filter.go | FilterOp: predicate evaluation |
| project.go | ProjectOp: column projection |
| predicate.go | 7 comparison predicates |
| aggregate.go | AggregateOp: single-group aggregation |
| aggregators.go | 9 concrete aggregator implementations |
| groupby.go | GroupByOp: hash GROUP BY |
| orderby.go | OrderByOp: sort |
| limit.go | LimitOp: row limit |