Timefence — Temporal correctness for ML training data

Your features know the future.
Your model shouldn't.
When you join features to labels, future data leaks in — no error, no warning.
Timefence finds it, fixes it, and proves every row is clean.
- Guaranteed Correctness
  Enforce feature_time < label_time for every row. Embargo, staleness, and lookback are all configurable.
- Lightning Fast
  Built on DuckDB. Process millions of rows locally in seconds. No Spark cluster or cloud infrastructure needed.
- Audit Any Dataset
  Don't rebuild, just audit. Point at any existing training set and get a full leakage report instantly.
- CI/CD Ready
  --strict exits with code 1 on leakage. Add temporal correctness to your pipeline in one line.
See it in action
Point Timefence at any training set. If future data leaked in, it finds it.
$ timefence audit train_fraud_v3.parquet
TEMPORAL AUDIT REPORT
Scanned 1,247,392 rows across 6 features
WARNING LEAKAGE DETECTED in 2 of 6 features
LEAK merchant_risk_score
41,580 rows (3.3%) — future data
Severity: MEDIUM
LEAK rolling_txn_count_7d
98,423 rows (7.9%) — future data
Severity: HIGH
OK account_age_days — clean
OK avg_txn_amount_30d — clean
OK device_fingerprint_count — clean
OK customer_tenure_months — clean
$ timefence build -o train_fraud_v3_clean.parquet
$ timefence audit train_fraud_v3_clean.parquet
ALL CLEAN — no temporal leakage detected
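The rule behind a report like the one above can be sketched in plain Python. This is only an illustration of the feature_time < label_time check, not Timefence's actual implementation: the function count_leaks, the column names, and the sample rows are all hypothetical.

```python
from datetime import datetime


def count_leaks(rows, feature_time_cols, label_time_col="label_time"):
    """Count, per feature, the rows whose feature timestamp is NOT strictly
    before the label timestamp (hypothetical helper, not a Timefence API)."""
    leaks = {col: 0 for col in feature_time_cols}
    for row in rows:
        for col in feature_time_cols:
            feature_time = row[col]
            # A missing feature value carries no timestamp, so it cannot leak.
            if feature_time is not None and feature_time >= row[label_time_col]:
                leaks[col] += 1
    return leaks


# Made-up sample rows: each carries the label time and the time at which
# each joined feature value was recorded.
rows = [
    {"label_time": datetime(2024, 3, 10),
     "risk_score_time": datetime(2024, 3, 9),
     "txn_count_time": datetime(2024, 3, 12)},   # txn_count is from the future
    {"label_time": datetime(2024, 3, 15),
     "risk_score_time": datetime(2024, 3, 15),   # equal timestamps also count
     "txn_count_time": datetime(2024, 3, 1)},
    {"label_time": datetime(2024, 3, 20),
     "risk_score_time": None,
     "txn_count_time": datetime(2024, 3, 19)},
]

leaks = count_leaks(rows, ["risk_score_time", "txn_count_time"])
print(leaks)
```

Because the comparison is strict, a feature stamped at exactly the label time is treated as leakage here, which matches the "strictly before" wording used for builds.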
How it works
Define your data, and Timefence handles point-in-time correctness for every row.
Define Sources
Tell Timefence where your raw data lives and which columns represent time and entities.
import timefence

# Raw transactions, keyed by user_id and timestamped by the "ts" column.
src = timefence.Source(
    path="txns.parquet",
    keys=["user_id"],
    timestamp="ts",
)
Build Clean Data
For every label, Timefence finds the most recent feature value strictly before the label timestamp — respecting embargo, lookback, and staleness.
df = timefence.build(
    features=features,
    labels=labels,
    output="train.parquet",
)
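The "most recent value strictly before the label timestamp" lookup can be sketched for a single entity with the standard library. This is an assumption-laden illustration rather than how Timefence does it internally: latest_before is a hypothetical helper, embargo is assumed to mean a gap that must separate the feature from the label, and max_age is a hypothetical stand-in for a lookback or staleness limit.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def latest_before(feature_rows, label_time, embargo=timedelta(0), max_age=None):
    """Return the most recent feature value recorded strictly before
    (label_time - embargo), or None if no row qualifies.

    feature_rows: list of (timestamp, value) sorted by timestamp ascending.
    max_age: optional limit on how old the chosen value may be
             (assumed semantics, for illustration only).
    """
    cutoff = label_time - embargo
    times = [ts for ts, _ in feature_rows]
    # bisect_left gives the index of the first timestamp >= cutoff,
    # so the row just before it is the last one strictly earlier.
    idx = bisect_left(times, cutoff) - 1
    if idx < 0:
        return None
    ts, value = feature_rows[idx]
    if max_age is not None and label_time - ts > max_age:
        return None  # the only candidate is too old to use
    return value


# Made-up feature history for one user.
history = [
    (datetime(2024, 3, 1), 0.2),
    (datetime(2024, 3, 8), 0.5),
    (datetime(2024, 3, 10), 0.9),
]
label_time = datetime(2024, 3, 10)

print(latest_before(history, label_time))                             # 0.5
print(latest_before(history, label_time, embargo=timedelta(days=3)))  # 0.2
```

The value stamped at exactly the label time (0.9) is never selected, and adding a three-day embargo pushes the lookup back to the March 1 value.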
Verify & Ship
Audit the output to confirm zero leakage. A build manifest records exactly what happened.
report = timefence.audit(
    "train.parquet",
    features=features,
    labels=labels,
)
report.assert_clean()