Skip to main content

Differential Compilation

When models are retrained frequently, Timber can diff two model versions and identify exactly what changed.

Computing a Diff

from timber.optimizer.diff_compile import diff_models
from timber.frontends.auto_detect import parse_model

old_ir = parse_model("model_v1.json")
new_ir = parse_model("model_v2.json")

diff = diff_models(old_ir, new_ir)

print(f"Added trees: {len(diff.added)}")
print(f"Removed trees: {len(diff.removed)}")
print(f"Modified trees: {len(diff.modified)}")
print(f"Unchanged trees: {len(diff.unchanged)}")

How Tree Hashing Works

Each tree is assigned a content hash based on:

  • Node structure (parent/child indices)
  • Feature indices at each split
  • Threshold values
  • Leaf values
  • Default-left flags

Two trees with identical structure and values will have the same hash, even if they appear at different positions in the ensemble.

Use Cases

Model Monitoring

# Scheduled job: check how much the model changed after retraining
diff = diff_models(production_ir, retrained_ir)

change_ratio = len(diff.modified) / (len(diff.unchanged) + len(diff.modified))
if change_ratio > 0.5:
alert("Model drift detected: >50% of trees changed")

Incremental Deployment

In pipelines where models are retrained hourly, typically only 5–20% of trees change. Differential compilation identifies exactly which trees need recompilation.

Audit & Compliance

The diff output provides a precise record of what changed between model versions — useful for change management in regulated environments.

API Reference

@dataclass
class ModelDiff:
added: list[str] # Hashes of new trees
removed: list[str] # Hashes of deleted trees
modified: list[str] # Trees with same position but different hash
unchanged: list[str] # Trees identical in both versions