v0.6.x → v0.7.x migration

v0.7.0 is a BREAKING release: the select_threshold string API was removed in favor of a ThresholdSelector Protocol. Five new extension Protocols ship for downstream projects.

This guide lists every change with copy-pastable before/after snippets.

At a glance

| Change | Type |
| --- | --- |
| select_threshold(criterion=str) removed | BREAKING |
| OperatingPoint Literal alias removed | BREAKING (only affects callers using the alias as a type hint) |
| select_threshold moved from eval_toolkit.metrics to eval_toolkit.thresholds | Re-export preserves from eval_toolkit import select_threshold; only direct submodule imports break |
| 5 new Protocol surfaces (ThresholdSelector, LeakageCheck, Splitter, DatasetLoader, Versioned) | Additive |
| evaluate(...) gains leakage_checks / on_leakage / on_scorer_error parameters | Additive (defaults preserve old behavior) |
| RunResult gains by_fold / fold_summary / schema_version fields | Additive (default-empty / "v1") |

1. The select_threshold migration

This is the single most impactful change. Every call site can be updated mechanically: replace the criterion string with the matching selector instance.

| v0.6 | v0.7 |
| --- | --- |
| criterion="max_f1" | criterion=MaxF1Selector() |
| criterion="recall_0.90" | criterion=TargetRecallSelector(0.90) |
| criterion="recall_0.95" | criterion=TargetRecallSelector(0.95) |
| criterion="precision@0.90" (local fork in some consumers) | criterion=TargetPrecisionSelector(0.90) |
| criterion="recall@0.90" (local fork in some consumers) | criterion=TargetRecallSelector(0.90) |
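Because the mapping is mechanical, it can be scripted. The following is a minimal sketch, not part of eval_toolkit: the helper name selector_expression is hypothetical, and it returns the replacement as source text rather than importing the library. It also generalizes the recall_0.90 / recall_0.95 strings to any numeric value, which is an assumption beyond the table.

```python
import re

# Exact legacy string with a fixed replacement, taken from the mapping table.
_FIXED = {"max_f1": "MaxF1Selector()"}

# Matches "recall_0.90", "recall@0.90" and "precision@0.90".
# "precision_<p>" is not listed in the table, so it is deliberately rejected.
_PATTERN = re.compile(r"^(?:(recall)[_@]|(precision)@)(\d*\.?\d+)$")

_CLASS_FOR = {
    "recall": "TargetRecallSelector",
    "precision": "TargetPrecisionSelector",
}


def selector_expression(legacy: str) -> str:
    """Return the v0.7 source expression replacing a v0.6 criterion string."""
    if legacy in _FIXED:
        return _FIXED[legacy]
    match = _PATTERN.match(legacy)
    if match is None:
        raise ValueError(f"unrecognized legacy criterion: {legacy!r}")
    kind = match.group(1) or match.group(2)
    # group(3) is the numeric text exactly as written, e.g. "0.90".
    return f"{_CLASS_FOR[kind]}({match.group(3)})"


for old in ["max_f1", "recall_0.90", "precision@0.90", "recall@0.95"]:
    print(f"{old!r:>18} -> {selector_expression(old)}")
```

A helper like this is most useful for auditing config files or call sites that build the criterion string dynamically; literal call sites are usually quicker to fix by hand.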

Decoding the v0.7 TypeError

If you call select_threshold(y, s, criterion="max_f1") on v0.7, you’ll get:

TypeError: select_threshold requires a ThresholdSelector instance (v0.7.0+);
got str='max_f1'.
Migration:
  'max_f1'      -> MaxF1Selector()
  'recall_0.90' -> TargetRecallSelector(0.90)
  'recall_0.95' -> TargetRecallSelector(0.95)
  'precision@p' -> TargetPrecisionSelector(p)
  'recall@p'    -> TargetRecallSelector(p)
See CHANGELOG v0.7.0 for the full guide.

Worked example

import numpy as np
from eval_toolkit import select_threshold, MaxF1Selector, TargetRecallSelector

y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.2, 0.7, 0.9, 0.3, 0.8])

# v0.6 (now broken):
# tr = select_threshold(y, s, criterion="max_f1")  # TypeError!

# v0.7 (works):
tr = select_threshold(y, s, criterion=MaxF1Selector())
print(f"max-F1 threshold = {tr.threshold:.3f}")

tr_r = select_threshold(y, s, criterion=TargetRecallSelector(0.95))
print(f"recall>=0.95 threshold = {tr_r.threshold:.3f}")

Naming-only difference

If you used criterion="recall@0.90" (an at-sign separator from some local forks), use TargetRecallSelector(0.90) instead; the behavior is the same under a normalized API. Note that the recall@p semantics were also standardized: see the thresholds.md Pitfall on the smallest-vs-highest-threshold-meeting-floor convention.
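To see why the smallest-vs-highest convention matters, here is a small pure-Python sketch using the labels and scores from the worked example. It does not call eval_toolkit and makes no claim about which convention the library adopts (thresholds.md is the authority on that). It assumes a sample is predicted positive when its score is greater than or equal to the threshold; the helper names are hypothetical.

```python
def recall_at(y_true, scores, threshold):
    """Recall when score >= threshold counts as a positive prediction (assumed rule)."""
    positives = sum(1 for y in y_true if y == 1)
    hits = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    return hits / positives


def thresholds_meeting_floor(y_true, scores, floor):
    """Every observed score that, used as a threshold, keeps recall >= floor."""
    return [t for t in sorted(set(scores)) if recall_at(y_true, scores, t) >= floor]


y = [0, 0, 1, 1, 0, 1]
s = [0.1, 0.2, 0.7, 0.9, 0.3, 0.8]

candidates = thresholds_meeting_floor(y, s, floor=0.90)
smallest, highest = min(candidates), max(candidates)
print(f"smallest={smallest}, highest={highest}")
```

On this data both 0.1 and 0.7 satisfy the recall floor, but 0.1 flags every sample as positive while 0.7 excludes all three negatives. If a local fork picked a different end of that range than the standardized selector, the migrated call can return a different threshold even though the name maps one-to-one.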

2. OperatingPoint Literal alias removed

v0.6 exposed OperatingPoint = Literal["max_f1", "recall_0.90", "recall_0.95"] as a type hint. v0.7 removes it because the new ThresholdSelector instance API doesn’t take strings.

| v0.6 | v0.7 |
| --- | --- |
| def my_fn(crit: OperatingPoint = "max_f1") -> ... | def my_fn(crit: ThresholdSelector = MaxF1Selector()) -> ... |
| from eval_toolkit.metrics import OperatingPoint | drop the import; use ThresholdSelector from eval_toolkit.thresholds |
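The shift from a Literal of strings to a Protocol means a type hint now accepts any object with the right methods. The sketch below illustrates that idea with stand-ins only: SelectorLike, its select method, and FixedThreshold are hypothetical names invented for this example, and the real ThresholdSelector method set may differ. It also shows one way to keep an instance default safe, by making the default object immutable.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class SelectorLike(Protocol):
    """Hypothetical stand-in for ThresholdSelector (real method names may differ)."""

    def select(self, y_true: Sequence[int], scores: Sequence[float]) -> float: ...


@dataclass(frozen=True)
class FixedThreshold:
    """Toy selector that ignores the data and returns a constant."""

    value: float = 0.5

    def select(self, y_true: Sequence[int], scores: Sequence[float]) -> float:
        return self.value


# The default instance is created once at definition time; a frozen
# dataclass keeps that shared object from being mutated between calls.
def pick_threshold(y_true, scores, crit: SelectorLike = FixedThreshold()) -> float:
    return crit.select(y_true, scores)


print(pick_threshold([0, 1], [0.2, 0.8]))                       # uses the default
print(pick_threshold([0, 1], [0.2, 0.8], FixedThreshold(0.7)))  # explicit selector
print(isinstance("max_f1", SelectorLike))                       # strings do not qualify
```

Note that FixedThreshold never inherits from SelectorLike; matching the method shape is enough, which is what lets downstream projects supply their own selectors.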

3. select_threshold module location

| v0.6 import | v0.7 status |
| --- | --- |
| from eval_toolkit.metrics import select_threshold | works (forwarded) |
| from eval_toolkit import select_threshold | works (always did) |
| from eval_toolkit.thresholds import select_threshold | works (canonical home) |

The function moved to eval_toolkit.thresholds. eval_toolkit.metrics no longer defines it, but the package-level re-export (from eval_toolkit import select_threshold) is preserved.
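If you would rather point existing code at the canonical module, the rewrite is a one-line substitution. The sketch below is a hypothetical helper, not a tool shipped with eval_toolkit, and it only handles the simple single-name import form; multi-name or aliased imports are left untouched.

```python
import re

# Matches only the simple form: "from eval_toolkit.metrics import select_threshold",
# optionally indented. Anything else is passed through unchanged.
_OLD_IMPORT = re.compile(
    r"^(\s*)from\s+eval_toolkit\.metrics\s+import\s+select_threshold\s*$"
)


def canonicalize_import(line: str) -> str:
    """Rewrite the old submodule import to the eval_toolkit.thresholds path."""
    match = _OLD_IMPORT.match(line)
    if match is None:
        return line
    indent = match.group(1)
    return f"{indent}from eval_toolkit.thresholds import select_threshold"


source = [
    "import numpy as np",
    "from eval_toolkit.metrics import select_threshold",
    "from eval_toolkit import select_threshold",
]
for line in source:
    print(canonicalize_import(line))
```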

5. Adopting the new evaluate(...) parameters

evaluate(...) and evaluate_folded(...) gained three optional parameters in v0.7. None of them breaks v0.6 callers; the defaults preserve the old behavior.

from eval_toolkit import (
    evaluate, EvalSlice,
    NormalizedFormLeakageCheck, LabelConflictCheck,
)
import numpy as np
import pandas as pd

class _Scorer:
    def predict_proba(self, X):
        return np.full(len(X), 0.5)

df = pd.DataFrame({"text": ["a", "b"], "label": [0, 1]})
slice_ = EvalSlice(name="test", df=df)

# v0.7 — inline leakage validation:
result = evaluate(
    {"s": _Scorer()},
    [slice_],
    run_id="r",
    leakage_checks=[NormalizedFormLeakageCheck(), LabelConflictCheck()],
    on_leakage="record",     # or "raise" (default) / "skip"
    on_scorer_error="raise",  # or "record"
)
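For intuition about the "raise" versus "record" choice, here is a generic sketch of that kind of error policy. It is not the toolkit's implementation: run_scorers and its return shape are hypothetical, and the exact way evaluate(...) stores recorded errors is not specified in this guide.

```python
from typing import Callable, Dict, Tuple


def run_scorers(
    scorers: Dict[str, Callable[[], float]],
    on_scorer_error: str = "raise",
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Run each scorer; either propagate the first failure or record it and continue."""
    if on_scorer_error not in ("raise", "record"):
        raise ValueError(f"unknown policy: {on_scorer_error!r}")
    results: Dict[str, float] = {}
    errors: Dict[str, str] = {}
    for name, scorer in scorers.items():
        try:
            results[name] = scorer()
        except Exception as exc:
            if on_scorer_error == "raise":
                raise  # fail fast: one broken scorer aborts the whole run
            # "record": keep going and remember what went wrong
            errors[name] = f"{type(exc).__name__}: {exc}"
    return results, errors


results, errors = run_scorers(
    {"good": lambda: 0.5, "bad": lambda: 1 / 0},
    on_scorer_error="record",
)
print(results)
print(errors)
```

Failing fast suits CI runs where any scorer failure should block; recording suits long sweeps where one failing scorer should not discard the results of the others.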

6. New RunResult fields

RunResult gains three additive fields:

  • by_fold: dict[str, RunResult] — populated by evaluate_folded; empty for non-folded runs.

  • fold_summary: dict — auto-CV-CI summary; empty for non-folded.

  • schema_version: str = "v1" — for downstream JSON parsers.

Existing code that reads result.by_slice[name]["by_scorer"][...] continues to work unchanged.
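A downstream JSON parser can use schema_version to guard against formats it does not understand. The sketch below is one possible approach under stated assumptions: that a serialized RunResult uses the field names above as JSON keys, and that payloads written before v0.7 simply omit the new keys. The function name read_run_result is hypothetical.

```python
import json

SUPPORTED_VERSIONS = {"v1"}


def read_run_result(text: str) -> dict:
    """Parse a serialized run result, tolerating payloads without the v0.7 fields."""
    payload = json.loads(text)
    version = payload.get("schema_version")  # assumed absent in pre-v0.7 payloads
    if version is not None and version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return {
        "schema_version": version,
        "by_slice": payload.get("by_slice", {}),
        # The new fields default to empty, matching non-folded runs.
        "by_fold": payload.get("by_fold", {}),
        "fold_summary": payload.get("fold_summary", {}),
    }


old_style = '{"by_slice": {"test": {"by_scorer": {}}}}'
new_style = '{"schema_version": "v1", "by_slice": {}, "by_fold": {}, "fold_summary": {}}'
print(read_run_result(old_style)["schema_version"])
print(read_run_result(new_style)["schema_version"])
```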

See also

  • docs/migration/v0.8.md — v0.7 → v0.8.

  • The four prompt_injection_* repos’ v0.7 migration commits — real worked examples of the threshold migration applied at scale.