eval-toolkit#

Reusable evaluation contracts for binary classification — metrics, bootstrap confidence intervals, calibration, leakage detection, threshold selection, and a pluggable harness that ties them together.