# Getting Started

Get a trained model running at native speed in 60 seconds.
## Prerequisites

- Python 3.10+
- C compiler (`gcc` or `clang`) — comes pre-installed on macOS and most Linux distros
- A trained tree-based model file (XGBoost, LightGBM, scikit-learn, CatBoost, or ONNX)
## Installation

```bash
pip install timber-compiler
```

Verify the install:

```bash
timber --help
```
## Quick Start

### 1. Load a Model

```bash
timber load model.json --name my-model
```
Timber will:

- Auto-detect the framework (XGBoost, LightGBM, etc.)
- Parse into a framework-agnostic IR
- Run 6 optimization passes
- Emit C99 source code
- Compile a native shared library
- Cache everything in `~/.timber/models/my-model/`
### 2. Serve It

```bash
timber serve my-model
```
The server starts on port 11434 (same as Ollama) and exposes a REST API.
### 3. Query It

```bash
curl http://localhost:11434/api/predict \
  -d '{"model": "my-model", "inputs": [[1.0, 2.0, 3.0]]}'
```
Response:

```json
{
  "model": "my-model",
  "outputs": [0.97],
  "n_samples": 1,
  "latency_us": 91.0,
  "done": true
}
```
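If you would rather call the endpoint from Python than `curl`, a minimal stdlib client might look like the sketch below. The endpoint path and payload shape are taken from the example above; the helper names (`build_request`, `predict`) are our own, not part of Timber.

```python
import json
import urllib.request

API = "http://localhost:11434/api/predict"

def build_request(inputs, model="my-model"):
    """Build a POST request matching the documented payload shape."""
    payload = json.dumps({"model": model, "inputs": inputs}).encode()
    return urllib.request.Request(
        API, data=payload, headers={"Content-Type": "application/json"}
    )

def predict(inputs, model="my-model"):
    """Send a batch of feature rows to a running `timber serve` instance."""
    with urllib.request.urlopen(build_request(inputs, model)) as resp:
        return json.load(resp)["outputs"]

# With the server from step 2 running:
# predict([[1.0, 2.0, 3.0]])
```

Each inner list in `inputs` is one sample, so you can batch many rows per request and read them back from `outputs` in order.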
### 4. Manage Models

```bash
# List all loaded models
timber list

# Remove a model
timber remove my-model
```
## What Happened Under the Hood?

When you ran `timber load`, the compiler pipeline executed:
```text
model.json
 │
 ├─ Front-end: XGBoost JSON parser
 │    → Extracted 50 trees, 30 features, binary:logistic objective
 │    → Converted base_score from probability to logit space
 │
 ├─ Optimizer: 6 passes
 │    1. Dead leaf elimination (pruned near-zero leaves)
 │    2. Constant feature detection (folded redundant splits)
 │    3. Threshold quantization (analyzed precision requirements)
 │    4. Branch sorting (optimized for branch prediction)
 │    5. Pipeline fusion (absorbed scaler into thresholds)
 │    6. Vectorization analysis (computed SIMD hints)
 │
 ├─ Code generator: C99 emitter
 │    → model.h (public API with ABI version)
 │    → model_data.c (tree data as static const arrays)
 │    → model.c (inference logic — no malloc, no recursion)
 │    → CMakeLists.txt + Makefile
 │
 └─ Compiler: gcc -O3 -shared -std=c99
      → libtimber_model.so (48 KB)
```
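To make pass 1 concrete, here is a toy sketch of dead leaf elimination on a made-up dict-based tree. This is not Timber's actual IR, only an illustration of the idea: a split whose children both contribute approximately zero to the ensemble sum can be collapsed to a single zero leaf.

```python
EPS = 1e-6  # illustrative cutoff for "near-zero" leaf contributions

def prune(node):
    """Collapse subtrees whose leaves are all near zero into a 0.0 leaf."""
    if "leaf" in node:                       # leaf node: {"leaf": value}
        return node
    left, right = prune(node["left"]), prune(node["right"])
    if (left.get("leaf") is not None and right.get("leaf") is not None
            and abs(left["leaf"]) < EPS and abs(right["leaf"]) < EPS):
        return {"leaf": 0.0}                 # the whole split is dead weight
    return {**node, "left": left, "right": right}

tree = {"feature": 3, "threshold": 0.5,
        "left": {"leaf": 1e-9},
        "right": {"leaf": -1e-9}}
print(prune(tree))  # {'leaf': 0.0}
```

Pruning bottom-up like this lets dead splits cascade: once both grandchildren collapse, the parent split becomes prunable on the same pass.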
When you ran `timber serve`, the HTTP server loaded the pre-compiled `.so` via Python `ctypes`. The actual inference call goes directly to compiled C — Python only handles the HTTP envelope (JSON parsing, buffer copying). That's why inference takes ~2 µs while the HTTP round-trip is ~91 µs.
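The serving path is the standard `ctypes` pattern: load a shared library, declare the C function's argument and return types, and call it. Timber's exported symbol names aren't documented in this guide, so the sketch below uses `libm`'s `sqrt` as a stand-in to show the mechanism:

```python
import ctypes
import ctypes.util

# Stand-in for libtimber_model.so: load the C math library and call
# a compiled C function directly, with no Python-level arithmetic.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.argtypes = [ctypes.c_double]   # declare the C signature
libm.sqrt.restype = ctypes.c_double      # default restype is int, so set it

print(libm.sqrt(2.0))  # 1.4142135623730951
```

The per-call overhead of this boundary crossing is what the ~2 µs figure above reflects; everything past it runs as plain compiled C.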
## Don't Have a Model Yet?

Train a quick one:
```python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, _, y_train, _ = train_test_split(data.data, data.target, random_state=42)

model = xgb.XGBClassifier(n_estimators=50, max_depth=4, random_state=42)
model.fit(X_train, y_train)
model.get_booster().save_model("model.json")
```
Then load and serve it:

```bash
timber load model.json --name breast-cancer
timber serve breast-cancer
```
## Next Steps

- How It Works — deep dive into the compiler pipeline
- Examples — per-framework walkthroughs
- API Reference — complete CLI and HTTP docs