# Benchmarks
This page documents Timber's benchmark methodology and how to reproduce results.
## Reference Claim Context
The commonly cited 336x speedup is measured as:
- Baseline: Python XGBoost (`booster.predict`) single-sample inference
- Timber path: `TimberPredictor` calling the compiled native artifact
- Metric: in-process latency (microseconds), excluding HTTP/network overhead
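The headline figure is simply the ratio of the two median latencies. A minimal illustration of the arithmetic (the latency numbers below are hypothetical placeholders, chosen only to show how the ratio is formed, not measured values):

```python
# Hypothetical median latencies (microseconds per single-sample prediction).
# Illustrative placeholders only -- not measured results.
baseline_us = 840.0  # Python XGBoost booster.predict
timber_us = 2.5      # Timber compiled native predictor

speedup = baseline_us / timber_us
print(f"{speedup:.0f}x speedup")
```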
## Methodology

### Hardware and Environment
Reference setup:
- CPU: Apple M2 Pro
- RAM: 16 GB
- OS: macOS
- Python: 3.11
To record your own hardware metadata:
```shell
python benchmarks/system_info.py
```
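A minimal sketch of what such a metadata script typically captures, using only the standard library (the actual `system_info.py` may record additional fields):

```python
import json
import platform
import sys

def collect_system_info() -> dict:
    """Gather basic hardware/software metadata for benchmark reports."""
    return {
        "machine": platform.machine(),     # e.g. "arm64"
        "processor": platform.processor(),
        "system": platform.system(),       # e.g. "Darwin" on macOS
        "os_release": platform.release(),
        "python": sys.version.split()[0],  # e.g. "3.11.6"
    }

if __name__ == "__main__":
    print(json.dumps(collect_system_info(), indent=2))
```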
### Model Specification
- Framework: XGBoost
- Objective: `binary:logistic`
- Trees: 50
- Max depth: 4
- Features: 30
- Dataset: sklearn `breast_cancer`
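The spec above can be pinned down as a training-parameter fragment. A hedged sketch (parameter names follow XGBoost's scikit-learn API; the project's actual training code may differ):

```python
# Reference model spec expressed as XGBoost sklearn-API parameters.
# A training call would be roughly:
#   xgboost.XGBClassifier(**REFERENCE_PARAMS).fit(X, y)
# with X, y from sklearn.datasets.load_breast_cancer(return_X_y=True).
REFERENCE_PARAMS = {
    "objective": "binary:logistic",
    "n_estimators": 50,  # trees
    "max_depth": 4,
}
N_FEATURES = 30  # breast_cancer feature count
```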
### Benchmark Parameters
- Warmup iterations: 1,000
- Timed iterations: 10,000
- Input shape: single sample (`batch=1`)
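The warmup/timed split can be sketched as a simple measurement loop. In this sketch a toy `predict` callable stands in for the real predictors; the actual runner's implementation may differ:

```python
import statistics
import time

WARMUP_ITERS = 1_000
TIMED_ITERS = 10_000

def benchmark(predict, sample):
    """Median single-sample latency in microseconds, after a warmup phase."""
    for _ in range(WARMUP_ITERS):  # warm caches and code paths, untimed
        predict(sample)
    timings = []
    for _ in range(TIMED_ITERS):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1e6)
    return statistics.median(timings)

# Toy stand-in predictor; batch=1 input as a flat 30-feature list.
toy_predict = lambda x: sum(x) > 0.0
latency_us = benchmark(toy_predict, [0.1] * 30)
```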
## Reproducible Scripts

All scripts are in `benchmarks/`:

- `run_benchmarks.py` — runs Timber vs. Python XGBoost and optional backends
- `render_table.py` — renders a markdown comparison table from JSON
- `system_info.py` — captures hardware/software metadata
Run from repo root:
```shell
python benchmarks/run_benchmarks.py --output benchmarks/results.json
python benchmarks/render_table.py --input benchmarks/results.json
```
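To illustrate the render step, here is a minimal table renderer over a results mapping. The real `render_table.py` and the `results.json` schema may differ; the field layout here is an assumption:

```python
def render_markdown_table(results: dict) -> str:
    """Render {backend_name: median latency in microseconds} as markdown."""
    lines = ["| Backend | Median latency (µs) |", "| --- | --- |"]
    # Fastest backend first.
    for name, latency_us in sorted(results.items(), key=lambda kv: kv[1]):
        lines.append(f"| {name} | {latency_us:.2f} |")
    return "\n".join(lines)

table = render_markdown_table({"python-xgboost": 840.0, "timber": 2.5})
print(table)
```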
## Comparison Targets
The benchmark runner includes:
- Python XGBoost (required)
- Timber native predictor (required)
- ONNX Runtime (optional)
- Treelite runtime (optional)
- lleaves (optional)
Optional targets are skipped automatically when dependencies are missing.
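The skip-if-missing behavior can be implemented with a guarded import check. A minimal sketch using the standard library (module names and logic here are assumptions, not the runner's actual code):

```python
import importlib.util

# Assumed importable module names for the optional backends.
OPTIONAL_BACKENDS = ["onnxruntime", "treelite", "lleaves"]

def available_backends(candidates=OPTIONAL_BACKENDS):
    """Return only the optional backends whose packages are importable."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]

# Backends whose dependencies are missing are simply not benchmarked.
to_run = available_backends()
```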
## Reporting Guidance
When publishing benchmark numbers, include:
- Full hardware metadata (`system_info.py` output)
- Model spec (trees, depth, features)
- Warmup and timed iteration counts
- Baselines used
- Raw `benchmarks/results.json` artifact
This keeps claims auditable and reproducible.