Differential Compilation

When models are retrained frequently, Timber can diff two model versions and identify exactly what changed.

Computing a Diff

from timber.optimizer.diff_compile import diff_models
from timber.frontends.auto_detect import parse_model

old_ir = parse_model("model_v1.json")
new_ir = parse_model("model_v2.json")

diff = diff_models(old_ir, new_ir)

print(f"Added trees:     {len(diff.added)}")
print(f"Removed trees:   {len(diff.removed)}")
print(f"Modified trees:  {len(diff.modified)}")
print(f"Unchanged trees: {len(diff.unchanged)}")

How Tree Hashing Works

Each tree is assigned a content hash based on:

Node structure (parent/child indices)
Feature indices at each split
Threshold values
Leaf values
Default-left flags

Two trees with identical structure and values will have the same hash, even if they appear at different positions in the ensemble.
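As a rough illustration of content hashing over the fields listed above, here is a minimal sketch. The `Node` layout and the `tree_hash` function are hypothetical and are not Timber's internal representation or its actual hashing scheme. They only show how a hash can depend on tree content and not on ensemble position.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node layout for illustration, not Timber's IR.
    left: int           # index of left child, -1 for a leaf
    right: int          # index of right child, -1 for a leaf
    feature: int        # split feature index, -1 for a leaf
    threshold: float    # split threshold (unused for leaves)
    leaf_value: float   # prediction value (unused for internal nodes)
    default_left: bool  # direction taken when the feature is missing


def tree_hash(nodes: list[Node]) -> str:
    """Hash a tree from its node contents only (assumed scheme: SHA-256)."""
    h = hashlib.sha256()
    for n in nodes:
        # Fixed-width little-endian packing, so equal values give equal bytes.
        h.update(struct.pack("<iiidd?", n.left, n.right, n.feature,
                             n.threshold, n.leaf_value, n.default_left))
    return h.hexdigest()


# A depth-1 tree: one split on feature 0, two leaves.
stump = [
    Node(1, 2, 0, 0.5, 0.0, True),
    Node(-1, -1, -1, 0.0, -1.0, False),
    Node(-1, -1, -1, 0.0, 1.0, False),
]
same_content = list(stump)                                 # identical tree elsewhere in the ensemble
new_threshold = [Node(1, 2, 0, 0.75, 0.0, True)] + stump[1:]  # only the threshold differs

print(tree_hash(stump) == tree_hash(same_content))
print(tree_hash(stump) == tree_hash(new_threshold))
```

The sketch hashes nodes in stored order, so it assumes both model versions serialize nodes in the same order. Nothing about the tree's index in the ensemble enters the hash, which is why a tree that moves keeps the same hash.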

Use Cases

Model Monitoring

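# Scheduled job: check how much the model changed after retraining
diff = diff_models(production_ir, retrained_ir)

# Fraction of trees present in both versions whose content changed.
# Guard against an empty comparison set to avoid ZeroDivisionError.
compared = len(diff.unchanged) + len(diff.modified)
change_ratio = len(diff.modified) / compared if compared else 0.0
if change_ratio > 0.5:
    # alert() is a placeholder for your own notification hook
    alert("Model drift detected: >50% of trees changed")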

Incremental Deployment

In pipelines where models are retrained hourly, typically only 5–20% of trees change. Differential compilation identifies exactly which trees need recompilation.
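One way to act on a diff is to split it into trees that need compiling and trees whose earlier output can be reused. The sketch below assumes such a reuse step exists in your pipeline. `recompile_plan` is a hypothetical helper, and `ModelDiff` here is a local stand-in that mirrors the fields in the API reference so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class ModelDiff:
    # Stand-in with the same fields as the documented ModelDiff.
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)


def recompile_plan(diff: ModelDiff) -> dict[str, list[str]]:
    """Group diff entries by the action a build step would take (hypothetical)."""
    return {
        "compile": diff.added + diff.modified,  # new or changed tree content
        "drop": list(diff.removed),             # no longer part of the model
        "reuse": list(diff.unchanged),          # identical in both versions
    }


diff = ModelDiff(
    added=["a1"],
    removed=["r1"],
    modified=["m1", "m2"],
    unchanged=["u1", "u2", "u3", "u4", "u5", "u6"],
)
plan = recompile_plan(diff)
print(f"compile={len(plan['compile'])} drop={len(plan['drop'])} reuse={len(plan['reuse'])}")
```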

Audit & Compliance

The diff output provides a precise record of what changed between model versions, which is useful for change management in regulated environments.
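A diff can be stored as a change record alongside each deployment. The following is a sketch only: the record layout, the `audit_record` helper, and the version labels are assumptions, and `ModelDiff` is again a local stand-in for the documented dataclass.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelDiff:
    # Stand-in with the same fields as the documented ModelDiff.
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)


def audit_record(diff: ModelDiff, old_version: str, new_version: str) -> str:
    """Serialize a diff to JSON with a UTC timestamp (hypothetical record format)."""
    entries = asdict(diff)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "old_version": old_version,
        "new_version": new_version,
        "counts": {name: len(hashes) for name, hashes in entries.items()},
        "trees": entries,
    }
    return json.dumps(record, indent=2, sort_keys=True)


example = ModelDiff(added=["a1"], modified=["m1"], unchanged=["u1", "u2"])
print(audit_record(example, "model_v1.json", "model_v2.json"))
```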

API Reference

from dataclasses import dataclass

@dataclass
class ModelDiff:
    added: list[str]      # Hashes of new trees
    removed: list[str]    # Hashes of deleted trees
    modified: list[str]   # Trees with same position but different hash
    unchanged: list[str]  # Trees identical in both versions
