# eval-toolkit

Reusable evaluation contracts for binary classification: metrics, bootstrap confidence intervals, calibration, leakage detection, threshold selection, and a pluggable harness that ties them together.

## Get started

```{toctree}
:maxdepth: 2
:caption: Get started

getting-started
whats-new
```

## Examples

```{toctree}
:maxdepth: 1
:caption: Examples

examples/index
examples/metrics_and_bootstrap
examples/evaluate_harness
examples/calibration
examples/leakage_detection
examples/claims_and_gates
examples/paired_comparison
examples/prompt_injection_walkthrough
examples/pytorch_scorer_example
examples/ood_dataset_from_manifest
examples/character_injection_sweep
examples/activation_delta_probe
examples/spotlighting
examples/recall_at_low_fpr
```

## Methodology

```{toctree}
:maxdepth: 1
:caption: Methodology

methodology/README
methodology/splits
methodology/comparison
methodology/reproducibility
methodology/claims
methodology/artifacts
methodology/evidence
methodology/bootstrap
methodology/calibration
methodology/leakage
methodology/text_dedup
methodology/fairness
methodology/length_stratification
methodology/parallelism
methodology/reading_list
methodology/testing
methodology/thresholds
methodology/versioning
```

## API reference

```{toctree}
:maxdepth: 1
:caption: API reference

api/index
api/analysis
api/artifacts
api/bootstrap
api/calibration
api/claims
api/config
api/docs
api/embeddings
api/evidence
api/harness
api/leakage
api/loaders
api/manifest
api/metrics
api/operating_points
api/paths
api/plotting
api/protocols
api/provenance
api/seeds
api/splits
api/text_dedup
api/thresholds
```

## Migration guides

```{toctree}
:maxdepth: 1
:caption: Upgrade guides

migration/v0.7
migration/v0.8
migration/v0.9
```

## Project

```{toctree}
:maxdepth: 1
:caption: Project

extending
schemas
roadmap
repo-strategy
DEPRECATION
MIGRATION
RELEASING
```

## Indices

- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`