Guide: Audit Existing Data¶
Already have a training dataset? Audit it for temporal leakage without rebuilding.
Step 1: Inspect your data¶
This shows column types and suggests which columns are likely keys and timestamps.
Step 2: Run the audit¶
import timefence
transactions = timefence.Source(
path="data/transactions.parquet",
keys=["user_id"],
timestamp="created_at",
)
rolling_spend = timefence.Feature(
source=transactions,
columns=["amount"],
embargo="1d",
)
report = timefence.audit(
data="data/train.parquet",
features=[rolling_spend],
keys=["user_id"],
label_time="label_time",
)
print(report)
Step 3: Inspect results¶
import timefence
# (assuming `report` from step 2)
# Check overall status
report.has_leakage # True/False
report.leaky_features # ["rolling_spend_30d"]
report.clean_features # ["user_country"]
# Per-feature details
detail = report["rolling_spend_30d"]
detail.leaky_row_count # 1520
detail.leaky_row_pct # 0.304
detail.severity # "HIGH"
detail.max_leakage # timedelta(days=15)
# Export
report.to_html("audit_report.html")
report.to_json("audit_report.json")
Step 4: Assert in tests¶
See the audit() API reference for full parameter documentation.