Compiler Pipeline

Timber follows a classical compiler architecture with four phases.

Overview

                    ┌─────────────┐
                    │ Model File  │
                    │ .json .pkl  │
                    │ .txt .onnx  │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  Front-End  │  5 format-specific parsers
                    │  (Parsing)  │  → Framework-agnostic IR
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  Middle-End │  6 optimization passes
                    │ (Optimizer) │  → Optimized IR
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  Back-End   │  3 code emitters
                    │  (Codegen)  │  → C99 / WASM / MISRA-C
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  Native     │  gcc/clang compilation
                    │  Compiler   │  → .so / .dylib / .a
                    └─────────────┘

Phase 1: Front-End

Input: Framework-specific model artifact Output: TimberIR — a list of pipeline stages

Each parser converts its native format into Timber's IR. The IR uses a generic tree representation that abstracts away framework-specific details:

XGBoost: Converts base_score from probability to logit space, handles default_left for missing values
LightGBM: Handles negative-indexed leaves, re-indexes to 0-based
scikit-learn: Traverses sklearn tree arrays, handles Pipeline with StandardScaler
CatBoost: Expands oblivious (symmetric) trees into general form
ONNX: Reconstructs trees from flat node arrays in TreeEnsemble operators

Entry point: timber.frontends.auto_detect.parse_model()

Phase 2: Middle-End (Optimizer)

Input: TimberIR Output: Optimized TimberIR

Six passes run sequentially, each transforming the IR. Passes are independent and can be skipped or reordered. Each pass produces an audit log entry.

See Optimization Passes for details.

Entry point: timber.optimizer.pipeline.run()

Phase 3: Back-End (Code Generation)

Input: Optimized TimberIR Output: Dictionary of {filename: content} pairs

Three emitters are available:

C99 (c99.py) — primary target for servers and embedded
WebAssembly (wasm.py) — browser and edge deployment
MISRA-C (misra_c.py) — wraps C99 emitter with compliance transformations

Entry point: timber.codegen.c99.C99Emitter.emit()

Phase 4: Native Compilation

Input: Generated C source files Output: Shared library (.so / .dylib)

Uses the system's C compiler (gcc or clang) with -O3 -shared -std=c99. The Makefile and CMakeLists.txt generated alongside the source provide build configuration.

Pipeline Orchestration

The timber load command orchestrates all four phases:

# Simplified flow inside cli.py
ir = parse_model(model_path, format=format)      # Phase 1
ir = optimizer_pipeline.run(ir)                    # Phase 2
files = C99Emitter(ir).emit()                      # Phase 3
subprocess.run(["gcc", "-O3", "-shared", ...])     # Phase 4

The ModelStore then caches the compiled artifact in ~/.timber/models/<name>/.

Overview​

Phase 1: Front-End​

Phase 2: Middle-End (Optimizer)​

Phase 3: Back-End (Code Generation)​

Phase 4: Native Compilation​

Pipeline Orchestration​