Embargo
Real-world ML pipelines have latency: data arrives late, ETL jobs run on schedules, and features take time to compute. The embargo parameter models this lag.
Usage
```python
import timefence

rolling_spend = timefence.Feature(
    source=transactions,  # a source defined elsewhere
    sql="SELECT ...",
    embargo="1d",  # Feature becomes available 1 day after the event
)
```
How it works
With embargo="1d", a feature recorded at 2024-03-15 10:00 is only eligible for labels at 2024-03-16 10:00 or later. This prevents optimistic leakage from features that wouldn't actually be available in production.
The full temporal constraint becomes:
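Going by the example above, the constraint amounts to `feature_time + embargo <= label_time`. The sketch below checks that condition with the standard library only. `is_eligible` is a hypothetical helper, not part of the timefence API, and the inclusive boundary is an assumption read from the "or later" wording rather than a confirmed detail of the library.

```python
from datetime import datetime, timedelta


def is_eligible(feature_time: datetime, label_time: datetime, embargo: timedelta) -> bool:
    """Hypothetical helper: True if the feature may be joined to the label."""
    # Assumption: the boundary is inclusive, matching "2024-03-16 10:00 or later".
    return feature_time + embargo <= label_time


feature_time = datetime(2024, 3, 15, 10, 0)
one_day = timedelta(days=1)

# One minute before the embargo expires: the feature is still withheld.
print(is_eligible(feature_time, datetime(2024, 3, 16, 9, 59), one_day))  # False
# Exactly one day after the event: the feature becomes eligible.
print(is_eligible(feature_time, datetime(2024, 3, 16, 10, 0), one_day))  # True
```

With an embargo of zero, the check reduces to an ordinary point-in-time comparison between the feature timestamp and the label timestamp.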
When to use embargo
| Scenario | Recommended Embargo |
|---|---|
| Real-time features | "0d" (no embargo) |
| Daily ETL pipeline | "1d" |
| Weekly batch features | "7d" |
| Monthly aggregates | "30d" |
Tip
When in doubt, set embargo to match your production pipeline's worst-case latency. It's better to be conservative (larger embargo) than to train on features that wouldn't actually be available at prediction time.
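One way to estimate worst-case latency is to compare when events occurred with when their rows actually landed in the warehouse. The sketch below assumes you can collect such pairs from your own pipeline logs. `suggest_embargo` and the sample timestamps are hypothetical and not part of timefence.

```python
import math
from datetime import datetime, timedelta


def suggest_embargo(observations, unit=timedelta(hours=1)):
    """Hypothetical helper: worst observed lag, rounded up to a whole number of units."""
    worst = max(arrived - occurred for occurred, arrived in observations)
    # Round up so the embargo never undershoots the slowest observed arrival.
    return math.ceil(worst / unit) * unit


# (event time, time the row became queryable); illustrative values only.
observations = [
    (datetime(2024, 3, 15, 10, 0), datetime(2024, 3, 15, 23, 30)),
    (datetime(2024, 3, 15, 18, 0), datetime(2024, 3, 16, 14, 45)),
    (datetime(2024, 3, 16, 2, 0), datetime(2024, 3, 16, 9, 0)),
]

print(suggest_embargo(observations))  # 21:00:00
```

Here the slowest row took 20 hours 45 minutes to arrive, so an embargo of "1d" would cover every observed case with some margin.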
Duration format
Accepted formats: "30d", "1d12h", "6h", "30m", "15s".
See Duration Format for the full specification.
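To make the examples above concrete, here is a minimal parser that turns such strings into `datetime.timedelta` values. It is an illustration only, not timefence's implementation: it assumes the units are limited to `d`, `h`, `m`, and `s` with whole-number values, and the full specification may accept more or less than this.

```python
import re
from datetime import timedelta

# Assumed unit letters, taken from the examples in this section.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text: str) -> timedelta:
    """Illustrative parser for strings such as "1d12h"; not timefence's own code."""
    # Require the whole string to be a run of <digits><unit> tokens.
    if not re.fullmatch(r"(?:\d+[dhms])+", text):
        raise ValueError(f"unrecognized duration: {text!r}")
    total = timedelta()
    for value, unit in re.findall(r"(\d+)([dhms])", text):
        total += timedelta(**{_UNITS[unit]: int(value)})
    return total


print(parse_duration("1d12h"))  # 1 day, 12:00:00
print(parse_duration("30m"))    # 0:30:00
```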