v0.6.x → v0.7.x migration#
v0.7.0 is a BREAKING release: the select_threshold string API
was removed in favor of a ThresholdSelector Protocol. Five new
extension Protocols ship for downstream projects.
This guide lists every change with copy-pastable before/after.
At a glance#
Change |
Type |
|---|---|
|
BREAKING |
|
BREAKING (only affects callers using the alias as a type hint) |
|
Re-export preserves |
5 new Protocol surfaces ( |
Additive |
|
Additive (defaults preserve old behavior) |
|
Additive (default-empty / |
1. The select_threshold migration#
The single most-impactful change. Every call site updates mechanically.
v0.6 |
v0.7 |
|---|---|
|
|
|
|
|
|
|
|
|
|
Decoding the v0.7 TypeError#
If you call select_threshold(y, s, criterion="max_f1") on v0.7, you’ll get:
TypeError: select_threshold requires a ThresholdSelector instance (v0.7.0+);
got str='max_f1'.
Migration:
'max_f1' -> MaxF1Selector()
'recall_0.90' -> TargetRecallSelector(0.90)
'recall_0.95' -> TargetRecallSelector(0.95)
'precision@p' -> TargetPrecisionSelector(p)
'recall@p' -> TargetRecallSelector(p)
See CHANGELOG v0.7.0 for the full guide.
Worked example#
import numpy as np
from eval_toolkit import select_threshold, MaxF1Selector, TargetRecallSelector
y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.2, 0.7, 0.9, 0.3, 0.8])
# v0.6 (now broken):
# tr = select_threshold(y, s, criterion="max_f1") # TypeError!
# v0.7 (works):
tr = select_threshold(y, s, criterion=MaxF1Selector())
print(f"max-F1 threshold = {tr.threshold:.3f}")
tr_r = select_threshold(y, s, criterion=TargetRecallSelector(0.95))
print(f"recall>=0.95 threshold = {tr_r.threshold:.3f}")
Naming-only difference#
If you used criterion="recall@0.90" (an at-sign separator from
some local forks), it’s now TargetRecallSelector(0.90) — same
behavior under a normalized API. Note the recall@p semantics
also standardized — see the
thresholds.md Pitfall
on the smallest-vs-highest-threshold-meeting-floor convention.
2. OperatingPoint Literal alias removed#
v0.6 exposed OperatingPoint = Literal["max_f1", "recall_0.90", "recall_0.95"]
as a type hint. v0.7 removes it because the new
ThresholdSelector instance API doesn’t take strings.
v0.6 |
v0.7 |
|---|---|
|
|
|
drop the import; use |
3. select_threshold module location#
v0.6 |
v0.7 |
|---|---|
|
works (forwarded) |
|
works (always did) |
|
works (canonical home) |
The function moved to eval_toolkit.thresholds; eval_toolkit.metrics
no longer defines it but the package-level re-export
(from eval_toolkit import select_threshold) is preserved.
4. Adopting the new Protocols (optional but recommended)#
v0.7 adds five extension surfaces that the four prompt_injection_*
consumer repos collectively migrated to:
ThresholdSelector— pluggable threshold rules.LeakageCheck— pluggable leakage validators (incl. the newNormalizedFormLeakageCheckfor encoding-obfuscated dupes).Splitter— pluggable train/test splitting (incl.SourceDisjointKFoldSplitter).DatasetLoader— pluggable named-splits dataset loading.Versioned— opt-in per-object version capture forRunManifest.
See docs/extending.md for end-to-end recipes.
5. Adopting the new evaluate(...) parameters#
evaluate(...) and evaluate_folded(...) gained three optional
parameters in v0.7. None break v0.6 callers; defaults preserve old
behavior.
from eval_toolkit import (
evaluate, EvalSlice,
NormalizedFormLeakageCheck, LabelConflictCheck,
)
import pandas as pd, numpy as np
class _Scorer:
def predict_proba(self, X):
return np.full(len(X), 0.5)
df = pd.DataFrame({"text": ["a", "b"], "label": [0, 1]})
slice_ = EvalSlice(name="test", df=df)
# v0.7 — inline leakage validation:
result = evaluate(
{"s": _Scorer()},
[slice_],
run_id="r",
leakage_checks=[NormalizedFormLeakageCheck(), LabelConflictCheck()],
on_leakage="record", # or "raise" (default) / "skip"
on_scorer_error="raise", # or "record"
)
6. New RunResult fields#
RunResult gains three additive fields:
by_fold: dict[str, RunResult]— populated byevaluate_folded; empty for non-folded runs.fold_summary: dict— auto-CV-CI summary; empty for non-folded.schema_version: str = "v1"— for downstream JSON parsers.
Old code that does result.by_slice[name]["by_scorer"][...] is
unchanged.
See also#
docs/migration/v0.8.md— v0.7 → v0.8.The four
prompt_injection_*repos’ v0.7 migration commits — real worked examples of the threshold migration applied at scale.