eval_toolkit.metrics

DEFAULT_ASSUMED_PRIORS

Built-in immutable sequence.

ThresholdResult

Outcome of operating-point selection at a given criterion.

brier_decomposition

Murphy 1973 [#murphy]_ decomposition of the Brier score.
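As an illustration of what the decomposition states, the sketch below splits the Brier score into reliability, resolution, and uncertainty terms, so that Brier = reliability - resolution + uncertainty. It groups samples by distinct forecast value, which makes the identity exact; the function name `brier_decomposition_sketch` and its signature are hypothetical and are not taken from `eval_toolkit`, whose implementation may bin forecasts differently.

```python
from collections import defaultdict


def brier_decomposition_sketch(y_true, y_prob):
    """Hypothetical helper: return (reliability, resolution, uncertainty).

    Samples are grouped by distinct forecast value (an assumption of this
    sketch), so reliability - resolution + uncertainty equals the Brier score.
    """
    n = len(y_true)
    if n == 0:
        raise ValueError("need at least one sample")
    base_rate = sum(y_true) / n

    # Collect the observed labels for each distinct forecast value.
    groups = defaultdict(list)
    for y, p in zip(y_true, y_prob):
        groups[p].append(y)

    reliability = 0.0
    resolution = 0.0
    for p, ys in groups.items():
        freq = sum(ys) / len(ys)  # observed positive rate in this group
        reliability += len(ys) * (p - freq) ** 2
        resolution += len(ys) * (freq - base_rate) ** 2

    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


y = [0, 1, 1, 0, 1, 1]
p = [0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
rel, res, unc = brier_decomposition_sketch(y, p)
brier = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)
print(rel - res + unc, brier)
```

Lower reliability and higher resolution are both better; the uncertainty term depends only on the base rate of the labels.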

brier_score

expected_calibration_error

Expected calibration error on equal-width probability bins.
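A minimal sketch of the equal-width computation, assuming binary labels and the usual L1 definition (the bin-weighted average of |observed positive rate - mean predicted probability|). The name `ece_equal_width_sketch` and the `n_bins` default are assumptions for illustration, not the library's signature.

```python
def ece_equal_width_sketch(y_true, y_prob, n_bins=10):
    """Hypothetical helper: L1 ECE over n_bins equal-width bins on [0, 1]."""
    n = len(y_true)
    if n == 0:
        raise ValueError("need at least one sample")

    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        freq = sum(y for y, _ in members) / len(members)
        conf = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(freq - conf)
    return ece


print(ece_equal_width_sketch([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9]))
```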

expected_calibration_error_debiased

Bias-corrected L1 ECE via simulated-H0 Monte Carlo (in the spirit of Roelofs 2022).

expected_calibration_error_equal_mass

ECE on equal-mass (quantile) bins.

expected_calibration_error_l2

Equal-mass L2 ECE: root mean squared bin-level miscalibration.
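To make "equal-mass" and "root mean squared" concrete, here is a rough sketch that sorts samples by predicted probability, cuts them into bins of near-equal size, and takes the square root of the bin-weighted squared gaps. The name `ece_l2_equal_mass_sketch`, the `n_bins` default, and the way ties are split across bin boundaries are all assumptions of this sketch rather than documented library behavior.

```python
import math


def ece_l2_equal_mass_sketch(y_true, y_prob, n_bins=10):
    """Hypothetical helper: L2 ECE over near-equal-size (quantile) bins."""
    pairs = sorted(zip(y_prob, y_true))  # sort by predicted probability
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one sample")
    n_bins = min(n_bins, n)  # guarantees every bin is non-empty

    total = 0.0
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]  # tied scores may straddle a boundary
        conf = sum(p for p, _ in chunk) / len(chunk)
        freq = sum(y for _, y in chunk) / len(chunk)
        total += (len(chunk) / n) * (freq - conf) ** 2
    return math.sqrt(total)


print(ece_l2_equal_mass_sketch([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], n_bins=2))
```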

expected_calibration_error_l2_debiased

Bias-corrected L2 ECE per Kumar 2019 [#kumar]_ §3.3.

headline_metrics

Bundle PR-AUC, ROC-AUC, F1 at three operating points, and per-stratum recall (if provided).

metrics_at_threshold

Precision, recall, F1, accuracy, and confusion counts (TN, FP, FN, TP) at a fixed threshold.
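The quantities listed above can be sketched from a single pass over the scores. This example assumes that a score greater than or equal to the threshold counts as a positive prediction and that undefined ratios are reported as 0.0; both conventions, and the name `metrics_at_threshold_sketch`, are assumptions and may differ from the library.

```python
def metrics_at_threshold_sketch(y_true, y_score, threshold):
    """Hypothetical helper: confusion counts and derived metrics."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold  # assumed tie convention
        if predicted_positive and y:
            tp += 1
        elif predicted_positive and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision, "recall": recall, "f1": f1,
        "accuracy": accuracy, "tn": tn, "fp": fp, "fn": fn, "tp": tp,
    }


print(metrics_at_threshold_sketch([1, 0, 1, 0, 1], [0.9, 0.8, 0.4, 0.2, 0.7], 0.5))
```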

pr_auc

precision_at_prior

Project precision under a different positive-class prior.
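One common way to do such a projection follows from Bayes' rule: the true-positive rate (TPR) and false-positive rate (FPR) do not depend on class balance, so precision under a new prior π is TPR·π / (TPR·π + FPR·(1 - π)). The sketch below applies that formula; whether `precision_at_prior` takes these exact inputs is not stated here, so the name and signature of `precision_at_prior_sketch` are hypothetical.

```python
def precision_at_prior_sketch(tpr, fpr, prior):
    """Hypothetical helper: precision implied by (tpr, fpr) at a given prior."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    expected_tp = tpr * prior          # expected true-positive mass
    expected_fp = fpr * (1.0 - prior)  # expected false-positive mass
    denom = expected_tp + expected_fp
    # No predicted positives at all: report 0.0 (an assumed convention).
    return expected_tp / denom if denom else 0.0


# The same classifier looks much less precise when positives are rare.
print(precision_at_prior_sketch(0.8, 0.1, 0.5))
print(precision_at_prior_sketch(0.8, 0.1, 0.01))
```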

quantile_stratified_pr_auc

PR-AUC on the central [q_low, q_high] range of any 1-D stratifier.

quantile_stratified_report

Full vs trimmed PR-AUC report with a gap-flag (SDD reporting convention).

roc_auc

score_distribution_summary

Threshold-free score-distribution summary.

single_class_threshold_metrics

Operating metrics for all-positive or all-negative slices.

stratified_recall

Recall (TPR) per categorical stratum.
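A small sketch of per-stratum recall, assuming binary labels, hard predictions, and one categorical stratum label per sample. The name `stratified_recall_sketch` and the choice to omit strata that contain no positives are assumptions made for illustration.

```python
from collections import defaultdict


def stratified_recall_sketch(y_true, y_pred, strata):
    """Hypothetical helper: map each stratum to TP / (TP + FN)."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for y, y_hat, s in zip(y_true, y_pred, strata):
        if y:  # recall only looks at actual positives
            positives[s] += 1
            if y_hat:
                true_pos[s] += 1
    # Strata without any positives have undefined recall and are left out.
    return {s: true_pos[s] / positives[s] for s in positives}


y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
strata = ["a", "a", "a", "b", "b", "c"]
print(stratified_recall_sketch(y_true, y_pred, strata))
```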