eval_toolkit.leakage#
|
Train↔eval near-duplicate leakage (the genuinely dangerous one). |
|
Within-split exact-duplicate detection. |
|
Group-id leakage: a single group spans multiple splits. |
|
Cross-source label conflicts: same text, different labels across splits. |
|
A pluggable validator over a dict of splits. |
|
A single |
|
Aggregate of one or more |
|
Within-split near-duplicate detection via a pluggable similarity strategy. |
|
Encoding-obfuscated duplicate detection. |
|
Temporal ordering invariant: every earlier split's max(time) ≤ next's min(time). |
|
Run a sequence of |