Worked example: plot_slice_metric_heatmap — stratified metric grid#
What this shows. A 2-D heatmap of (row_label × col_label → metric) values with colorbar + optional per-cell annotations. Use case: rung × OOD-slice AUPRC grid; model × dataset accuracy matrix; method × fold performance table. Shipped in v0.33.0 (closes upstream issue #16).
Runtime: <1 s. Requires the [plotting] extra.
Setup#
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from eval_toolkit import plot_slice_metric_heatmap
Synthetic data: 5 rungs × 6 OOD slices#
Build an AUPRC grid where some rungs are uniformly strong, some are slice-specific:
rung_labels = ["baseline", "minilm", "deberta", "gpt4_zs", "gpt4_fs"]
slice_labels = ["ood_a", "ood_b", "ood_c", "ood_d", "ood_e", "id_holdout"]
# Random-but-plausible AUPRC values; in production this comes from
# eval-toolkit's evaluate() per (scorer, slice) pair.
rng = np.random.default_rng(42)
grid = rng.uniform(0.55, 0.95, size=(len(rung_labels), len(slice_labels)))
# Make one rung (gpt4_fs) uniformly strong:
grid[4] = np.maximum(grid[4], 0.85)
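# Make one slice (ood_e) uniformly hard. This runs last, so it also caps
# gpt4_fs on ood_e at 0.65, overriding the 0.85 floor applied above:
grid[:, 4] = np.minimum(grid[:, 4], 0.65)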
Basic heatmap#
fig = plot_slice_metric_heatmap(
    grid,
    row_labels=rung_labels,
    col_labels=slice_labels,
    metric_name="AUPRC",
    title="Per-rung × per-slice AUPRC",
)
plt.close(fig)
Per-cell annotations (the AUPRC values) are drawn by default
(annotate=True); the colormap defaults to viridis. The colorbar uses
the supplied metric_name for its label.
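The annotation text is controlled by a format string (the pitfalls section of this page gives annot_fmt="{:.3f}" as the default). A quick way to preview how a candidate format would render typical cell values before plotting is sketched below. It uses only Python's built-in str.format and assumes the function applies the format as annot_fmt.format(value), which this page does not state explicitly.

```python
# Preview how candidate annotation formats render typical cell values.
# Assumption: the plotting function applies the format as annot_fmt.format(value).
sample_values = [0.87654, 0.5, 0.05]
candidate_formats = ["{:.3f}", "{:.0%}", "{:.2g}"]

# Map each format string to the rendered text for every sample value.
preview = {
    fmt: [fmt.format(v) for v in sample_values]
    for fmt in candidate_formats
}

for fmt, rendered in preview.items():
    print(f"{fmt!r:>10} -> {rendered}")
```

Shorter renderings such as "{:.2g}" leave more room per cell, which matters as the grid grows.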
Without per-cell annotations#
For dense grids (e.g., 20 rungs × 30 slices = 600 cells) annotations get visually busy. Disable them:
big_grid = rng.uniform(0.5, 0.95, size=(20, 30))
fig = plot_slice_metric_heatmap(
    big_grid,
    row_labels=[f"r{i}" for i in range(20)],
    col_labels=[f"s{i}" for i in range(30)],
    metric_name="AUPRC",
    annotate=False,  # too dense to annotate readably
    figsize=(12, 6),
)
plt.close(fig)
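If grid sizes vary between runs, the annotate and figsize choices can be derived from the grid dimensions instead of hard-coded. The helper below is a hypothetical sketch, not part of eval_toolkit: the 100-cell threshold and the per-cell sizes are illustrative assumptions, chosen so that the 20 × 30 grid above comes out at roughly (12, 6).

```python
def suggest_heatmap_options(n_rows, n_cols, max_annotated_cells=100,
                            cell_width=0.4, cell_height=0.3):
    """Hypothetical helper: derive annotate/figsize keyword arguments from grid size.

    The threshold and per-cell sizes (in inches) are illustrative defaults,
    not values taken from eval_toolkit.
    """
    n_cells = n_rows * n_cols
    return {
        # Annotate only while the cell count stays readable.
        "annotate": n_cells <= max_annotated_cells,
        # Scale the figure with the grid, but never below 6 x 4 inches.
        "figsize": (max(6.0, cell_width * n_cols), max(4.0, cell_height * n_rows)),
    }


small = suggest_heatmap_options(5, 6)    # 30 cells
dense = suggest_heatmap_options(20, 30)  # 600 cells
print(small)
print(dense)
```

The returned dict could then be unpacked into the call, e.g. plot_slice_metric_heatmap(big_grid, ..., **suggest_heatmap_options(*big_grid.shape)).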
With NaN cells (intentional gaps)#
Some (rung, slice) pairs may be intentionally un-evaluated (e.g., a rung
doesn’t apply to certain slice types). Pass np.nan for those cells; the
heatmap masks them and draws them in a neutral color:
grid_with_gaps = grid.copy()
grid_with_gaps[0, 5] = np.nan # baseline rung not evaluated on id_holdout
grid_with_gaps[1, 4] = np.nan # minilm not evaluated on ood_e
fig = plot_slice_metric_heatmap(
    grid_with_gaps,
    row_labels=rung_labels,
    col_labels=slice_labels,
    metric_name="AUPRC",
    title="With intentional gaps (NaN cells)",
)
plt.close(fig)
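In practice the gaps usually arise because results are collected per (rung, slice) pair and some pairs were never run. The sketch below shows one way to arrange such sparse results into a row-major grid with NaN for the missing pairs. The build_grid helper and the scores are hypothetical and for illustration only; they are not part of eval_toolkit.

```python
import math


def build_grid(results, row_labels, col_labels):
    """Arrange {(row, col): value} results into a row-major nested list.

    Pairs absent from `results` become NaN so they show up as gaps.
    """
    return [
        [results.get((row, col), math.nan) for col in col_labels]
        for row in row_labels
    ]


# Hypothetical per-(rung, slice) scores; "baseline" was never run on "id_holdout".
results = {
    ("baseline", "ood_a"): 0.61,
    ("baseline", "ood_b"): 0.58,
    ("minilm", "ood_a"): 0.72,
    ("minilm", "ood_b"): 0.69,
    ("minilm", "id_holdout"): 0.90,
}
rows = ["baseline", "minilm"]
cols = ["ood_a", "ood_b", "id_holdout"]

nested = build_grid(results, rows, cols)

# List the pairs that ended up as gaps, as a sanity check before plotting.
missing = [
    (r, c)
    for r, line in zip(rows, nested)
    for c, v in zip(cols, line)
    if math.isnan(v)
]
print(missing)  # [('baseline', 'id_holdout')]
```

Converting the nested list with np.asarray(nested, dtype=float) gives an array of the same kind as grid_with_gaps above.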
With caller-managed ax#
fig, ax = plt.subplots(figsize=(8, 5))
plot_slice_metric_heatmap(
    grid,
    row_labels=rung_labels,
    col_labels=slice_labels,
    ax=ax,
    metric_name="AUPRC",
)
plt.close(fig)
Common pitfalls#
Shape mismatch: the function raises ValueError if grid.shape != (len(row_labels), len(col_labels)). Validate upstream.
Annotation format: annot_fmt="{:.3f}" is the default; use "{:.0%}" for percentages or "{:.2g}" for compact scientific. Numbers outside the colormap’s perceptual range (e.g., 1e9) just won’t render legibly; pick a cmap= whose midpoint matches your value range.
Colormap choice: "viridis" is the default (perceptually uniform, colorblind-safe). For diverging metrics (e.g., delta-AUPRC vs baseline, where 0 is the reference), pass cmap="RdBu_r" and center the normalisation.
See also#
plot_metric_bars() for the per-slice (1-D) view; use when you only have one row dimension
plot_confusion_matrix_grid() for the confusion-matrix-shaped equivalent (the only v0.33.0+ plotting fn that doesn’t accept ax= because it’s intrinsically grid-shaped)