Timefence — Temporal correctness for ML training data

Your features know the future.
Your model shouldn't.

When you join features to labels, future data can leak in with no error and no warning.
Timefence finds it, fixes it, and proves every row is clean.

pip install timefence
License: MIT

  • Guaranteed Correctness


    Enforce feature_time < label_time for every row. Embargo, staleness, and lookback — all configurable.

    Learn more

  • Lightning Fast


    Built on DuckDB. Process millions of rows locally in seconds. No Spark cluster or cloud infrastructure needed.

    See benchmarks

  • Audit Any Dataset


    Don't rebuild — just audit. Point at any existing training set and get a full leakage report instantly.

    Audit guide

  • CI/CD Ready


    The --strict flag exits with code 1 when leakage is found. Add temporal correctness to your pipeline in one line.

    CI guide
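
To make the feature_time < label_time rule concrete, here is a minimal sketch of a per-row admissibility check in plain Python. It is not Timefence's implementation. The function name `is_admissible` is hypothetical, and the meanings given to embargo (a gap just before the label time in which features are excluded) and lookback (a maximum feature age) are assumptions about how such settings are commonly defined. Staleness is left out because the page does not define it.

```python
from datetime import datetime, timedelta

def is_admissible(feature_time, label_time, embargo=timedelta(0), lookback=None):
    """Return True if a feature value may be joined to a label.

    Assumed semantics (not taken from Timefence):
      - the feature must be strictly earlier than label_time - embargo
      - if lookback is set, the feature must be no older than label_time - lookback
    """
    if feature_time >= label_time - embargo:
        return False  # at or after the cutoff: would leak future data
    if lookback is not None and feature_time < label_time - lookback:
        return False  # too old to be used under the assumed lookback rule
    return True

label_time = datetime(2024, 3, 10)
one_day_before = is_admissible(datetime(2024, 3, 9), label_time)
same_instant = is_admissible(label_time, label_time)
inside_embargo = is_admissible(datetime(2024, 3, 9), label_time, embargo=timedelta(days=2))
too_old = is_admissible(datetime(2024, 1, 1), label_time, lookback=timedelta(days=30))
```

With zero embargo and no lookback, the check reduces to the strict inequality stated above, so a feature stamped at exactly the label time is rejected.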

See it in action

Point Timefence at any training set. If future data leaked in, it finds it.

$ timefence audit train_fraud_v3.parquet

TEMPORAL AUDIT REPORT
Scanned 1,247,392 rows across 6 features

WARNING  LEAKAGE DETECTED in 2 of 6 features

  LEAK  merchant_risk_score
        41,580 rows (3.3%) — future data
        Severity: MEDIUM

  LEAK  rolling_txn_count_7d
        98,423 rows (7.9%) — future data
        Severity: HIGH

  OK    account_age_days — clean
  OK    avg_txn_amount_30d — clean
  OK    device_fingerprint_count — clean
  OK    customer_tenure_months — clean

$ timefence build -o train_fraud_v3_clean.parquet
$ timefence audit train_fraud_v3_clean.parquet
ALL CLEAN — no temporal leakage detected
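
As a rough illustration of what a per-feature leakage count means, the sketch below flags rows whose feature timestamp is not strictly before the label timestamp. It uses pandas rather than Timefence, and the helper `count_leaky_rows` and the column names are hypothetical. How Timefence itself finds the relevant timestamps in a Parquet file is not described here.

```python
import pandas as pd

def count_leaky_rows(df, feature_time_col, label_time_col):
    """Return (leaky_rows, total_rows).

    A row counts as leaky when its feature timestamp is at or after
    its label timestamp, i.e. it violates feature_time < label_time.
    """
    leaky = df[feature_time_col] >= df[label_time_col]
    return int(leaky.sum()), len(df)

# Toy training set with one timestamp column per side (names are illustrative).
train = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "label_time": pd.to_datetime(
        ["2024-03-01", "2024-03-01", "2024-03-05", "2024-03-09"]),
    "risk_score_time": pd.to_datetime(
        ["2024-02-28", "2024-03-02", "2024-03-05", "2024-03-01"]),
})

leaky, total = count_leaky_rows(train, "risk_score_time", "label_time")
print(f"{leaky} of {total} rows use future data")
```

In this toy table, rows 2 and 3 are flagged: one feature value is dated after its label, and one shares the label's exact timestamp, which the strict inequality also rules out.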

How it works

Define your data, and Timefence handles point-in-time correctness for every row.

1. Define Sources

Tell Timefence where your raw data lives and which columns represent time and entities.

import timefence

src = timefence.Source(
  path="txns.parquet",   # raw data file
  keys=["user_id"],      # entity column(s)
  timestamp="ts",        # event-time column
)

2. Build Clean Data

For every label, Timefence finds the most recent feature value strictly before the label timestamp — respecting embargo, lookback, and staleness.

df = timefence.build(
  features=features,
  labels=labels,
  output="train.parquet",
)
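
The "most recent feature value strictly before the label timestamp" rule is a point-in-time (as-of) join. The sketch below shows the same idea with pandas `merge_asof`. It is an illustration of the technique, not Timefence's DuckDB-based implementation, and the frames `label_df` and `feature_df` and their columns are made up for the example.

```python
import pandas as pd

label_df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "label_time": pd.to_datetime(["2024-03-10", "2024-03-20", "2024-03-15"]),
    "is_fraud": [0, 1, 0],
})
feature_df = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "ts": pd.to_datetime(["2024-03-01", "2024-03-10", "2024-03-25", "2024-03-15"]),
    "txn_count_7d": [3, 5, 9, 2],
})

# merge_asof needs both frames sorted by their time keys.
train = pd.merge_asof(
    label_df.sort_values("label_time"),
    feature_df.sort_values("ts"),
    left_on="label_time",
    right_on="ts",
    by="user_id",               # match within the same entity only
    direction="backward",       # look back in time, never forward
    allow_exact_matches=False,  # strictly before: ts < label_time
)
print(train[["user_id", "label_time", "ts", "txn_count_7d"]])
```

User 2 ends up with a missing feature value because the only available feature row shares the label's timestamp. A plain equality or nearest-match join would have used that row.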

3. Verify & Ship

Audit the output to confirm zero leakage. A build manifest records exactly what happened.

report = timefence.audit(
  "train.parquet",
  features=features,
  labels=labels,
)
report.assert_clean()

  • 1M+ rows in seconds
  • Zero infrastructure needed
  • 1 line to add to CI/CD
  • 100% row-level guarantees

Ready to find out if your training data is clean?
