eval_toolkit.harness
|
A single eval slice (dev test, OOD slice, ablation slice, etc.). |
|
Outcome of a full evaluation run. |
|
Run every scorer on every slice; return a pure |
|
Run a fold aggregator: |
|
Score one scorer on one slice; return headline + bootstrap CI on PR-AUC. |
|
Return a copy of |
|
Write a |
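
The per-slice scoring entry above returns a headline value plus a bootstrap CI on PR-AUC. The following is a minimal sketch of that idea, not the harness's actual implementation. It assumes PR-AUC is estimated as average precision and that the interval is a percentile bootstrap over resampled examples. The names `average_precision` and `bootstrap_pr_auc`, and the parameters `n_boot`, `alpha`, and `seed`, are hypothetical and do not come from `eval_toolkit.harness`.

```python
import random


def average_precision(labels, scores):
    """Average precision, a common step-wise estimate of PR-AUC.

    Assumption: tied scores are ordered by input position (stable sort),
    which is simpler than threshold-based tie handling.
    """
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def bootstrap_pr_auc(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (headline, ci_low, ci_high) using a percentile bootstrap."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(labels)
    headline = average_precision(labels, scores)
    stats = []
    while len(stats) < n_boot:
        # Resample example indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if not any(ys):
            # Average precision is undefined without positives; redraw.
            # This is a simplification and slightly biases tiny slices.
            continue
        stats.append(average_precision(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return headline, lo, hi


if __name__ == "__main__":
    y = [1, 0, 1, 0, 1, 0, 0, 1]
    s = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3]
    print(bootstrap_pr_auc(y, s, n_boot=500))
```

Fixing the seed keeps the interval stable across reruns of the same slice, which helps when comparing scorers. On very small slices the percentile interval can be wide, and it is not guaranteed to contain the headline value.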