---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Worked example: `plot_slice_metric_heatmap` — stratified metric grid

> **What this shows.** A 2-D heatmap of `(row_label × col_label → metric)`
> values, with a colorbar and optional per-cell annotations. Typical uses: a
> rung × OOD-slice AUPRC grid, a model × dataset accuracy matrix, or a
> method × fold performance table. Shipped in v0.33.0 (closes upstream
> issue #16).
>
> **Runtime:** <1 s. Requires the `[plotting]` extra.

## Setup

```{code-cell}
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from eval_toolkit import plot_slice_metric_heatmap
```

## Synthetic data: 5 rungs × 6 OOD slices

Build an AUPRC grid with some visible structure: one rung is strong across
slices, and one slice is hard for every rung:

```{code-cell}
rung_labels = ["baseline", "minilm", "deberta", "gpt4_zs", "gpt4_fs"]
slice_labels = ["ood_a", "ood_b", "ood_c", "ood_d", "ood_e", "id_holdout"]

# Random-but-plausible AUPRC values; in production these come from
# eval-toolkit's evaluate() per (scorer, slice) pair.
rng = np.random.default_rng(42)
grid = rng.uniform(0.55, 0.95, size=(len(rung_labels), len(slice_labels)))
# Raise one rung (gpt4_fs) to at least 0.85 on every slice:
grid[4] = np.maximum(grid[4], 0.85)
# Make one slice (ood_e) uniformly hard. This runs last, so it also caps
# gpt4_fs on ood_e at 0.65:
grid[:, 4] = np.minimum(grid[:, 4], 0.65)
```
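The comment in the cell above notes that production grids come from `evaluate()` run per (scorer, slice) pair. One way to pack such per-pair results into the 2-D array the heatmap expects is sketched below. `build_grid` and the `{(row, col): value}` record shape are hypothetical (not part of eval_toolkit), and the scores are made-up illustrative numbers. Pairs missing from the records become `np.nan`, i.e. intentional gaps.

```python
import numpy as np


def build_grid(scores, row_labels, col_labels):
    """Hypothetical helper: assemble a (rows x cols) grid from {(row, col): value}.

    Pairs absent from `scores` stay NaN so they render as gaps.
    """
    grid = np.full((len(row_labels), len(col_labels)), np.nan)
    row_index = {label: i for i, label in enumerate(row_labels)}
    col_index = {label: j for j, label in enumerate(col_labels)}
    for (row, col), value in scores.items():
        # An unknown label raises KeyError, surfacing typos early.
        grid[row_index[row], col_index[col]] = value
    return grid


# Illustrative values only, not real evaluation results.
scores = {
    ("baseline", "ood_a"): 0.61,
    ("baseline", "ood_b"): 0.58,
    ("minilm", "ood_a"): 0.72,
}
demo_grid = build_grid(scores, ["baseline", "minilm"], ["ood_a", "ood_b"])
print(demo_grid)  # ("minilm", "ood_b") was never scored, so that cell is nan
```

Keying by label rather than by position means result order doesn't matter, and a misspelled rung or slice name fails loudly instead of silently landing in the wrong cell.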

## Basic heatmap

```{code-cell}
fig = plot_slice_metric_heatmap(
    grid,
    row_labels=rung_labels,
    col_labels=slice_labels,
    metric_name="AUPRC",
    title="Per-rung × per-slice AUPRC",
)
plt.close(fig)
```

Per-cell annotations (the AUPRC values) are drawn by default
(`annotate=True`); the colormap defaults to `viridis`. The colorbar uses
the supplied `metric_name` for its label.
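For readers curious what those defaults amount to, here is a rough plain-matplotlib approximation: `imshow` with `viridis`, a colorbar labelled with the metric name, and one `"{:.3f}"`-formatted annotation per cell. It is a sketch under assumptions, not eval_toolkit's implementation; `sketch_heatmap` is a hypothetical name, and the black/white text-contrast rule is an illustrative choice of this sketch.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, matching the Setup cell
import matplotlib.pyplot as plt


def sketch_heatmap(grid, row_labels, col_labels, metric_name, annot_fmt="{:.3f}"):
    """Plain-matplotlib approximation of the documented defaults (not eval_toolkit code)."""
    values = np.asarray(grid, dtype=float)
    fig, ax = plt.subplots()
    # Masking NaNs keeps them out of the color scaling.
    im = ax.imshow(np.ma.masked_invalid(values), cmap="viridis", aspect="auto")

    # Tick labels for both axes.
    ax.set_xticks(range(len(col_labels)))
    ax.set_xticklabels(col_labels)
    ax.set_yticks(range(len(row_labels)))
    ax.set_yticklabels(row_labels)

    # Colorbar labelled with the metric name.
    cbar = fig.colorbar(im, ax=ax)
    cbar.set_label(metric_name)

    # One annotation per non-missing cell; dark text on bright cells, light on dark.
    for i, j in np.ndindex(values.shape):
        value = values[i, j]
        if np.isnan(value):
            continue
        color = "black" if im.norm(value) > 0.5 else "white"
        ax.text(j, i, annot_fmt.format(value), ha="center", va="center", color=color)
    return fig


fig = sketch_heatmap([[0.6, 0.9], [0.7, 0.8]], ["r0", "r1"], ["s0", "s1"], "AUPRC")
plt.close(fig)
```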

## Without per-cell annotations

For dense grids (e.g., 20 rungs × 30 slices = 600 cells), per-cell
annotations become cluttered and hard to read. Disable them:

```{code-cell}
big_grid = rng.uniform(0.5, 0.95, size=(20, 30))
fig = plot_slice_metric_heatmap(
    big_grid,
    row_labels=[f"r{i}" for i in range(20)],
    col_labels=[f"s{i}" for i in range(30)],
    metric_name="AUPRC",
    annotate=False,  # too dense to annotate readably
    figsize=(12, 6),
)
plt.close(fig)
```

## With NaN cells (intentional gaps)

Some `(rung, slice)` pairs may be intentionally left unevaluated (e.g., a
rung that doesn't apply to certain slice types). Pass `np.nan` for those
cells; the heatmap masks them and draws them in a neutral color:

```{code-cell}
grid_with_gaps = grid.copy()
grid_with_gaps[0, 5] = np.nan  # baseline rung not evaluated on id_holdout
grid_with_gaps[1, 4] = np.nan  # minilm not evaluated on ood_e
fig = plot_slice_metric_heatmap(
    grid_with_gaps,
    row_labels=rung_labels,
    col_labels=slice_labels,
    metric_name="AUPRC",
    title="With intentional gaps (NaN cells)",
)
plt.close(fig)
```
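In plain matplotlib, the usual way to get a neutral-colored gap is to mask the NaNs and give the colormap a "bad" color. The sketch below shows that mechanism only; whether `plot_slice_metric_heatmap` uses exactly this approach, and which grey it picks, is an assumption. `lightgrey` is an arbitrary choice here.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

values = np.array([
    [0.72, 0.81, np.nan],  # third slice not evaluated for this rung
    [0.66, np.nan, 0.90],  # second slice not evaluated for this rung
])

# Mask NaNs explicitly so the intent is visible in the code.
masked = np.ma.masked_invalid(values)

# with_extremes() returns a modified copy; the registered "viridis" is untouched.
cmap = plt.get_cmap("viridis").with_extremes(bad="lightgrey")

fig, ax = plt.subplots()
im = ax.imshow(masked, cmap=cmap, aspect="auto")
fig.colorbar(im, ax=ax, label="AUPRC")
plt.close(fig)

print(int(np.ma.count_masked(masked)))  # number of gap cells: 2
```

Masked cells are also excluded from the color limits, so a gap never stretches the scale the way a placeholder value such as 0 would.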

## With caller-managed `ax`

```{code-cell}
fig, ax = plt.subplots(figsize=(8, 5))
plot_slice_metric_heatmap(
    grid,
    row_labels=rung_labels,
    col_labels=slice_labels,
    ax=ax,
    metric_name="AUPRC",
)
plt.close(fig)
```
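Passing `ax=` lets you place the heatmap inside a figure you already manage, for example one panel per evaluation split. The optional-`ax` convention behind this is common in matplotlib-based libraries and can be sketched with a hypothetical `draw_panel` function (not eval_toolkit code): draw into the caller's axes when one is given, otherwise create a fresh figure. The split names are illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def draw_panel(grid, ax=None):
    """Hypothetical function following the optional-`ax` convention.

    With an Axes, draw into it and return its parent figure;
    without one, create a new figure/axes pair.
    """
    if ax is None:
        fig, ax = plt.subplots()
    else:
        fig = ax.figure
    ax.imshow(grid, cmap="viridis", aspect="auto")
    return fig


rng = np.random.default_rng(0)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, title in zip(axes, ["split A", "split B"]):
    draw_panel(rng.uniform(0.5, 0.95, size=(5, 6)), ax=ax)
    ax.set_title(title)  # the caller keeps control of layout and titles
plt.close(fig)
```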

## Common pitfalls

- **Shape mismatch**: the function raises `ValueError` if
  `grid.shape != (len(row_labels), len(col_labels))`. Validate upstream.
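- **Annotation format**: `annot_fmt="{:.3f}"` is the default; use
  `"{:.0%}"` for percentages or `"{:.2g}"` for a compact general format
  that switches to scientific notation for large or very small values.
  Illegible annotations are a formatting problem, not a colormap one: under
  the default format a value like `1e9` renders as `1000000000.000` and
  overflows its cell. The colormap only sees values normalised to [0, 1],
  so changing `cmap=` won't help; pick a compact `annot_fmt` or rescale
  the values instead.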
- **Colormap choice**: `"viridis"` is the default (perceptually uniform,
  colorblind-safe). For diverging metrics (e.g., delta-AUPRC vs baseline,
  where 0 is the reference), pass `cmap="RdBu_r"` and center the color
  normalisation on 0, i.e. use symmetric limits so that 0 maps to the
  colormap's neutral midpoint.
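For the **Shape mismatch** pitfall, a pre-flight check can report which dimension is off before any plotting happens. `check_grid_shape` below is a hypothetical helper, not part of eval_toolkit; a typical trigger is a transposed grid (slices as rows, rungs as columns).

```python
import numpy as np


def check_grid_shape(grid, row_labels, col_labels):
    """Hypothetical pre-flight check mirroring the documented ValueError."""
    grid = np.asarray(grid)
    expected = (len(row_labels), len(col_labels))
    if grid.shape != expected:
        raise ValueError(
            f"grid has shape {grid.shape}, but labels imply {expected} (rows x cols)"
        )
    return grid


rows = ["baseline", "minilm"]
cols = ["ood_a", "ood_b", "ood_c"]
good = np.zeros((2, 3))
check_grid_shape(good, rows, cols)  # passes silently

try:
    check_grid_shape(good.T, rows, cols)  # transposed: (3, 2) instead of (2, 3)
except ValueError as err:
    print(err)
```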
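The format strings from the **Annotation format** bullet can be compared directly in Python. This assumes `annot_fmt` is a `str.format`-style template, as the documented default `"{:.3f}"` suggests; the snippet exercises only Python's own formatting, not the plotting function.

```python
values = [0.8734, 0.05, 1e9]
formats = ["{:.3f}", "{:.0%}", "{:.2g}"]

# Render every value with every candidate annot_fmt.
rendered = {fmt: [fmt.format(v) for v in values] for fmt in formats}
for fmt, texts in rendered.items():
    print(fmt, texts)
# {:.3f} ['0.873', '0.050', '1000000000.000']
# {:.0%} ['87%', '5%', '100000000000%']
# {:.2g} ['0.87', '0.05', '1e+09']
```

Only `"{:.2g}"` keeps the `1e9` label short enough to fit in a cell.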
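For the diverging case in the **Colormap choice** bullet, centering means choosing symmetric color limits so that a delta of 0 maps to the near-white midpoint of `RdBu_r`. This page doesn't document whether `plot_slice_metric_heatmap` accepts `vmin`/`vmax` or a `norm`, so the sketch uses plain matplotlib `imshow` with illustrative random data. `matplotlib.colors.CenteredNorm` computes symmetric limits automatically, and `TwoSlopeNorm` handles a fixed center with asymmetric limits.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
auprc = rng.uniform(0.55, 0.95, size=(5, 6))  # illustrative, not real results
delta = auprc - auprc[0]  # delta vs the first (baseline) row; row 0 becomes all zeros

# Symmetric limits put 0 exactly at the colormap midpoint.
lim = np.nanmax(np.abs(delta))
fig, ax = plt.subplots()
im = ax.imshow(delta, cmap="RdBu_r", vmin=-lim, vmax=lim, aspect="auto")
fig.colorbar(im, ax=ax, label="delta-AUPRC vs baseline")
plt.close(fig)
```

Without symmetric limits, matplotlib scales to the data's own min and max, so a grid that is mostly positive would place 0 well off-center and color "no change" as a mild loss.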

## See also

- {func}`~eval_toolkit.plotting.plot_metric_bars` for the per-slice
  (1-D) view; use it when you have only one labelled dimension.
- {func}`~eval_toolkit.plotting.plot_confusion_matrix_grid` for the
  confusion-matrix-shaped equivalent. It is the only plotting function in
  v0.33.0 and later that doesn't accept `ax=`, because its output is
  intrinsically a grid of plots.