v0.9.0 is now available

Stop future-data leakage before you train.

Timefence guarantees point-in-time correctness for ML datasets. Audit pipelines, catch leakage, and build trusted training data in seconds. Zero infrastructure required.

pip install timefence
Start in 3 min
user@machine: ~/churn-project
$ timefence audit data/train_LEAKY.parquet
TEMPORAL AUDIT REPORT
Scanned 5,000 rows

WARNING LEAKAGE DETECTED in 3 of 4 features

LEAK rolling_spend_30d 1,520 rows (30.4%) use feature data from the future Severity: HIGH

OK user_country - clean (5,000 rows)

Why Timefence?

Most ML pipelines leak future data by accident. Timefence checks temporal correctness when the dataset is built, before any model is trained on it.

Guaranteed Correctness

Enforce feature_time < label_time for every row. No more "accidental" peeking at next week's churn event.
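To make the rule concrete, here is a minimal sketch of the check in plain Python. It is not Timefence's implementation; the function name `leaky_rows` and the row layout (dicts with `feature_time` and `label_time` keys) are assumptions chosen for illustration.

```python
from datetime import datetime


def leaky_rows(rows):
    """Return the indices of rows that violate feature_time < label_time.

    A row whose feature timestamp equals the label timestamp is also
    flagged, because the rule is a strict inequality.
    """
    return [
        i for i, row in enumerate(rows)
        if row["feature_time"] >= row["label_time"]
    ]


# Hypothetical sample data: four rows, two of which peek at the future.
rows = [
    {"user_id": 1, "feature_time": datetime(2024, 1, 1), "label_time": datetime(2024, 1, 10)},
    {"user_id": 2, "feature_time": datetime(2024, 1, 12), "label_time": datetime(2024, 1, 10)},
    {"user_id": 3, "feature_time": datetime(2024, 1, 10), "label_time": datetime(2024, 1, 10)},
    {"user_id": 4, "feature_time": datetime(2024, 1, 5), "label_time": datetime(2024, 1, 10)},
]

leaks = leaky_rows(rows)
leak_rate = len(leaks) / len(rows)
print(f"{len(leaks)} of {len(rows)} rows ({leak_rate:.1%}) use feature data from the future")
```

The same per-row comparison, counted per feature, is what a leakage percentage such as the one in the audit report above would be derived from.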

Lightning Fast

Built on DuckDB. Process millions of rows locally in seconds. No Spark cluster or cloud infrastructure needed.

Audit Existing Data

Don't want to rebuild? Just audit.

$ timefence audit old_data.parquet

Declarative & Simple

Define your data sources and features in Python. Timefence handles the complex point-in-time joins automatically.
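For readers new to the term, a point-in-time join attaches to each label the most recent feature value that was already known at label time. The sketch below shows the idea with the standard library only. It is an illustration, not Timefence's internals: `point_in_time_join`, the tuple layouts, and the interpretation of the embargo as "feature_time must be strictly before label_time minus the embargo" are all assumptions.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def point_in_time_join(labels, feature_rows, embargo=timedelta(0)):
    """Attach to each label the latest feature value visible at label time.

    labels:       iterable of (key, label_time)
    feature_rows: iterable of (key, feature_time, value)
    A feature row is visible only if feature_time < label_time - embargo.
    """
    # Build a time-sorted history per key.
    history = {}
    for key, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = history.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for key, label_time in labels:
        times, values = history.get(key, ([], []))
        cutoff = label_time - embargo
        # bisect_left counts the entries strictly earlier than the cutoff.
        idx = bisect_left(times, cutoff)
        joined.append((key, label_time, values[idx - 1] if idx > 0 else None))
    return joined


# Hypothetical sample data.
features = [
    (1, datetime(2024, 1, 1), 10.0),
    (1, datetime(2024, 1, 9), 25.0),
    (1, datetime(2024, 1, 11), 40.0),  # after the label: must never be used
    (2, datetime(2024, 1, 12), 7.0),   # after the label: user 2 gets None
]
labels = [(1, datetime(2024, 1, 10)), (2, datetime(2024, 1, 10))]

no_embargo = point_in_time_join(labels, features)
one_day = point_in_time_join(labels, features, embargo=timedelta(days=1))
print(no_embargo)
print(one_day)
```

With a one-day embargo, the value recorded on 9 January is no longer visible to a label dated 10 January, so the join falls back to the value from 1 January. This mimics a pipeline in which data arrives a day late.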

1. Define Source

Where does history live? (Parquet, CSV, SQL)

2. Define Features

SQL or Python transforms with embargo periods.

3. Build

Generate training data that is point-in-time correct.

features.py
import timefence

# 1. Define Source
transactions = timefence.Source(
    path="data/transactions.parquet",
    keys=["user_id"],
    timestamp="created_at"
)

# 2. Define Feature (Point-in-time correct)
rolling_spend = timefence.Feature(
    source=transactions,
    sql="""
        SELECT user_id, created_at,
        SUM(amount) OVER (PARTITION BY user_id
                          ORDER BY created_at
                          RANGE BETWEEN INTERVAL 30 DAYS PRECEDING
                                    AND CURRENT ROW) AS rolling_spend_30d
    """,
    embargo="1d"  # Simulates 1-day pipeline lag
)

# 3. Build Training Set
timefence.build(
    labels="data/churn_labels.parquet",
    features=[rolling_spend],
    output="train_CLEAN.parquet"
)
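To show what the 30-day window in the feature's SQL computes, here is a small plain-Python equivalent. It is a sketch for intuition only and does not call Timefence or DuckDB; the function name `rolling_spend_30d` and the tuple layout `(user_id, created_at, amount)` are assumptions.

```python
from datetime import datetime, timedelta


def rolling_spend_30d(transactions, window=timedelta(days=30)):
    """For every transaction, sum the same user's amounts whose created_at
    falls in [created_at - window, created_at], the current row included.

    Quadratic on purpose: clarity matters more than speed in this sketch.
    """
    result = []
    for user_id, created_at, _ in transactions:
        total = sum(
            amount
            for other_user, other_time, amount in transactions
            if other_user == user_id
            and created_at - window <= other_time <= created_at
        )
        result.append((user_id, created_at, total))
    return result


# Hypothetical sample data.
transactions = [
    (1, datetime(2024, 1, 1), 20.0),
    (1, datetime(2024, 1, 15), 30.0),
    (1, datetime(2024, 2, 20), 5.0),   # more than 30 days after the others
    (2, datetime(2024, 1, 10), 8.0),
]

spend = rolling_spend_30d(transactions)
for user_id, created_at, total in spend:
    print(user_id, created_at.date(), total)
```

Each output row carries its own timestamp, which is what lets a point-in-time join later decide whether that value was already available when a given label was recorded.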

Ready to trust your training data?

View on GitHub