Timefence — Temporal correctness for ML training data

Your features know the future.
Your model shouldn't.

When you join features to labels, future data leaks in — no error, no warning.
Timefence finds it, fixes it, and proves every row is clean.

pip install timefence


Guaranteed Correctness

Enforce feature_time < label_time for every row. Embargo, staleness, and lookback — all configurable.

Lightning Fast

Built on DuckDB. Process millions of rows locally in seconds. No Spark cluster or cloud infrastructure needed.

Audit Any Dataset

Don't rebuild — just audit. Point at any existing training set and get a full leakage report instantly.

CI/CD Ready

--strict exits with code 1 on leakage. Add temporal correctness to your pipeline in one line.


See it in action

Point Timefence at any training set. If future data leaked in, it finds it.


$ timefence audit train_fraud_v3.parquet

TEMPORAL AUDIT REPORT
Scanned 1,247,392 rows across 6 features

WARNING  LEAKAGE DETECTED in 2 of 6 features

  LEAK  merchant_risk_score
        41,580 rows (3.3%) — future data
        Severity: MEDIUM

  LEAK  rolling_txn_count_7d
        98,423 rows (7.9%) — future data
        Severity: HIGH

  OK    account_age_days — clean
  OK    avg_txn_amount_30d — clean
  OK    device_fingerprint_count — clean
  OK    customer_tenure_months — clean

$ timefence build -o train_fraud_v3_clean.parquet
$ timefence audit train_fraud_v3_clean.parquet
ALL CLEAN — no temporal leakage detected


How it works

Define your data, and Timefence handles point-in-time correctness for every row.

1. Define Sources

Tell Timefence where your raw data lives and which columns represent time and entities.

import timefence

# Raw transactions: one entity key column and one timestamp column
src = timefence.Source(
  path="txns.parquet",
  keys=["user_id"],
  timestamp="ts",
)

2. Build Clean Data

For every label, Timefence finds the most recent feature value strictly before the label timestamp — respecting embargo, lookback, and staleness.

df = timefence.build(
  features=features,
  labels=labels,
  output="train.parquet",
)
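
To make the "strictly before" rule concrete, here is a minimal sketch of a point-in-time join written with pandas `merge_asof`. It is a stand-in for illustration, not Timefence's implementation (which runs on DuckDB). The function name, the column names `label_time` and `feature_time`, and the reading of embargo as "subtract a gap from the label time" and staleness as "maximum feature age at the cutoff" are all assumptions. Lookback is left out.

```python
import pandas as pd


def point_in_time_join(labels, features, key, embargo=pd.Timedelta(0), max_staleness=None):
    """Attach to each label row the latest feature row with
    feature_time < label_time - embargo, for the same key (hypothetical helper)."""
    left = labels.copy()
    # Cast to nanosecond resolution so both merge keys share one dtype.
    left["cutoff"] = (left["label_time"] - embargo).astype("datetime64[ns]")
    right = features.assign(feature_time=features["feature_time"].astype("datetime64[ns]"))
    joined = pd.merge_asof(
        left.sort_values("cutoff"),
        right.sort_values("feature_time"),
        left_on="cutoff",
        right_on="feature_time",
        by=key,
        direction="backward",        # look only into the past
        allow_exact_matches=False,   # strictly before the cutoff, never equal
        tolerance=max_staleness,     # None disables the staleness limit
    )
    return joined.drop(columns="cutoff")


# Made-up example data
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "label_time": pd.to_datetime(["2024-01-10", "2024-01-20", "2024-01-10"]),
})
features = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "feature_time": pd.to_datetime(["2024-01-05", "2024-01-10", "2024-01-25", "2024-01-10"]),
    "value": [10, 20, 30, 40],
})

train = point_in_time_join(labels, features, key="user_id")
```

In this toy data, user 2 ends up with a missing value: its only feature row carries the same timestamp as the label, and an equal timestamp does not count as "before".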

3. Verify & Ship

Audit the output to confirm zero leakage. A build manifest records exactly what happened.

report = timefence.audit(
  "train.parquet",
  features=features,
  labels=labels,
)
report.assert_clean()
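
As a rough picture of what a row-level audit checks, the sketch below counts, per feature, the rows whose feature timestamp is not strictly earlier than the label timestamp. It assumes a hypothetical training-set layout in which each feature carries its own timestamp column. The function and column names are illustrative only and are not part of the Timefence API.

```python
import pandas as pd


def leakage_counts(train, feature_time_cols, label_time="label_time"):
    """Count leaking rows per feature (hypothetical helper).

    feature_time_cols maps a feature name to the column holding the time
    at which that feature value was observed.
    """
    report = {}
    for feature, ts_col in feature_time_cols.items():
        # A row leaks when the feature is not strictly older than the label.
        # Missing timestamps (NaT) compare as False and are not counted.
        leaked = train[ts_col] >= train[label_time]
        report[feature] = int(leaked.sum())
    return report


# Made-up example data
train = pd.DataFrame({
    "label_time": pd.to_datetime(["2024-01-10", "2024-01-10", "2024-01-10"]),
    "score_time": pd.to_datetime(["2024-01-09", "2024-01-10", "2024-01-11"]),
    "count_time": pd.to_datetime(["2024-01-01", "2024-01-02", None]),
})

report = leakage_counts(train, {"score": "score_time", "count": "count_time"})
```

A check like this could back a CI gate by failing the job whenever any count is non-zero.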

1M+ rows in seconds
Zero infrastructure needed
1 line to add to CI/CD
100% row-level guarantees

Ready to find out if your training data is clean?

Get Started in 60 Seconds Read the Docs