eval_toolkit.leakage#

CrossSplitLeakageCheck

Train↔eval near-duplicate leakage (the genuinely dangerous one).

ExactDuplicateCheck

Within-split exact-duplicate detection.

GroupLeakageCheck

Group-id leakage: a single group spans multiple splits.

LabelConflictCheck

Cross-source label conflicts: same text, different labels across splits.

LeakageCheck

A pluggable validator over a dict of splits.

LeakageFinding

A single LeakageCheck's finding.

LeakageReport

Aggregate of one or more LeakageCheck results.

NearDuplicateCheck

Within-split near-duplicate detection via a pluggable similarity strategy.

NormalizedFormLeakageCheck

Encoding-obfuscated duplicate detection.

TemporalLeakageCheck

Temporal ordering invariant: every earlier split's max(time) ≤ next's min(time).

run_leakage_checks

Run a sequence of LeakageCheck over splits and aggregate.