eval_toolkit.bootstrap

DEFAULT_CONFIDENCE

Module-level float constant: the default confidence level.

DEFAULT_METHOD

Module-level str constant: the default method name.

DEFAULT_N_RESAMPLES

Module-level int constant: the default number of bootstrap resamples.

DEFAULT_SEED

Module-level int constant: the default random seed.

BootstrapCI

95% CI for a metric on a single condition.

DeLongResult

Result of a DeLong paired ROC-AUC comparison.

MDEEstimate

Minimum detectable Δ at the requested (α, 1-β).

MetricFn

PairedBootstrapCI

95% CI for metric(B) − metric(A) on shared resample indices.

ThresholdedMetricFn

ThresholdFn

bootstrap_ci

Per-condition CI via scipy.stats.bootstrap().

cross_validate_metric

K-fold cross-validation of a metric on caller-supplied scores.

cv_clt_ci

CV-corrected confidence interval per Bayle et al. 2020 [#bayle]_ Theorem 3.1.

delong_roc_variance

DeLong's variance of the paired ROC-AUC difference.

mde_from_ci

Derive MDE from an existing BootstrapCI or PairedBootstrapCI.

paired_bootstrap_diff

Paired-bootstrap CI on metric(B) − metric(A) using the same resample indices.

paired_bootstrap_ece_diff

Paired-bootstrap CI on ECE(B) − ECE(A) for two calibrated outputs.

paired_bootstrap_op_point_diff

Two-level paired bootstrap for operating-point lifts.

paired_mde

Minimum detectable paired Δ at (α, power).
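
The paired-bootstrap and MDE entries above can be illustrated with a minimal, stdlib-only sketch. It is not eval_toolkit's implementation or API: the function names (paired_bootstrap_diff_sketch, mde_from_ci_sketch, accuracy), their parameters, and the toy data are all hypothetical. It uses a plain percentile interval and assumes the usual normal-approximation formula MDE = (z_{1-α/2} + z_{1-β}) · SE, with SE recovered from the CI half-width. The library's own method and defaults may differ.

```python
import random
from statistics import NormalDist, mean


def paired_bootstrap_diff_sketch(y, scores_a, scores_b, metric,
                                 n_resamples=2000, confidence=0.95, seed=0):
    """Percentile CI on metric(B) - metric(A) using shared resample indices."""
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    for _ in range(n_resamples):
        # One index draw per resample, reused for both conditions.
        # Sharing the indices is what makes the bootstrap "paired".
        idx = [rng.randrange(n) for _ in range(n)]
        y_r = [y[i] for i in idx]
        a_r = [scores_a[i] for i in idx]
        b_r = [scores_b[i] for i in idx]
        diffs.append(metric(y_r, b_r) - metric(y_r, a_r))
    diffs.sort()
    alpha = 1.0 - confidence
    lo = diffs[int((alpha / 2) * (n_resamples - 1))]
    hi = diffs[int((1 - alpha / 2) * (n_resamples - 1))]
    point = metric(y, scores_b) - metric(y, scores_a)
    return point, lo, hi


def mde_from_ci_sketch(lo, hi, confidence=0.95, alpha=0.05, power=0.80):
    """Normal-approximation MDE derived from an existing CI's width."""
    # Recover the standard error from the CI half-width.
    z_ci = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = (hi - lo) / (2 * z_ci)
    # Two-sided test at level alpha with the requested power (1 - beta).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * se


def accuracy(y_true, y_pred):
    """Fraction of matching labels (hypothetical example metric)."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


# Toy data: condition A is right about 70% of the time, B about 80%.
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(200)]
pred_a = [t if rng.random() < 0.7 else 1 - t for t in y]
pred_b = [t if rng.random() < 0.8 else 1 - t for t in y]

point, lo, hi = paired_bootstrap_diff_sketch(y, pred_a, pred_b, accuracy)
mde = mde_from_ci_sketch(lo, hi)
print(f"diff={point:.3f}  CI=[{lo:.3f}, {hi:.3f}]  MDE={mde:.3f}")
```

Reusing one index draw for both conditions keeps per-example noise common to A and B, which typically gives a tighter interval on the difference than two independent bootstraps would.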