Timefence guarantees point-in-time correctness for ML datasets. Audit pipelines, catch leakage, and build trusted training data in seconds. Zero infrastructure required.
Most ML pipelines leak future data by accident. Timefence makes temporal correctness a build-time guarantee, enforced when the dataset is built rather than discovered after the model ships.
Enforce feature_time < label_time for every row.
No more "accidental" peeking at next week's churn event.
Built on DuckDB. Process millions of rows locally in seconds. No Spark cluster or cloud infrastructure needed.
Don't want to rebuild? Just audit.
timefence audit old_data.parquet
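An audit boils down to one question: does any row use information recorded at or after its label time? Here is a hand-rolled sketch of that check in plain DuckDB (the feature_time and label_time column names are assumed for illustration):

import duckdb

# Every row whose feature timestamp is at or after its label
# timestamp has, by definition, leaked the future into training.
leaked = duckdb.sql("""
    SELECT COUNT(*) FROM 'old_data.parquet'
    WHERE feature_time >= label_time
""").fetchone()[0]
print(f"{leaked} leaking rows found")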
Define your data sources and features in Python. Timefence handles the complex point-in-time joins automatically.
Sources: where does history live? (Parquet, CSV, SQL)
Features: SQL or Python transforms, with embargo periods.
Build: generate training data that is point-in-time correct.
import timefence

# 1. Define a source: where the raw history lives.
transactions = timefence.Source(
    path="data/transactions.parquet",
    keys=["user_id"],
    timestamp="created_at",
)

# 2. Define a feature (point-in-time correct).
rolling_spend = timefence.Feature(
    source=transactions,
    sql="""
        SELECT user_id, created_at,
               SUM(amount) OVER (
                   PARTITION BY user_id
                   ORDER BY created_at
                   RANGE BETWEEN INTERVAL 30 DAYS PRECEDING AND CURRENT ROW
               ) AS rolling_spend_30d
        FROM transactions
    """,
    embargo="1d",  # Simulates 1-day pipeline lag
)

# 3. Build the training set.
timefence.build(
    labels="data/churn_labels.parquet",
    features=[rolling_spend],
    output="train_CLEAN.parquet",
)
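The embargo is worth a closer look: real pipelines deliver features late, so embargo="1d" pushes every cutoff back by a day. A minimal sketch of the assumed semantics (illustrative, not Timefence source):

from datetime import datetime, timedelta

# With embargo="1d", a label observed at label_time may only use
# features computed strictly before label_time minus one day.
label_time = datetime(2024, 3, 1)
cutoff = label_time - timedelta(days=1)
# A feature row is eligible only if: created_at < cutoff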