Worked example: PyTorch + LoRA Scorer adapter#

What this shows. How to wrap a PyTorch transformer with a LoRA adapter as a Scorer for eval-toolkit’s harness — batched inference, GPU/CPU placement, deterministic-mode setup, returning a numpy array.

Optional dependencies. This example requires torch ≥ 2.5 and transformers (and peft if you actually fine-tune with LoRA). The toolkit’s core does not depend on any of these — they’re consumer-side concerns.

Code blocks below are marked <!-- skip: next --> so Sybil doesn’t try to execute them in CI without torch installed. The toolkit’s own tests/ includes a guarded smoke test (pytest.importorskip("torch")) that does run end-to-end if torch is available; see the verification step in the v0.7.0 plan.

Setup (CPU baseline; runs in CI)#

import numpy as np
import pandas as pd
from eval_toolkit import EvalSlice, evaluate

Minimal Scorer Protocol shape#

The Protocol is just predict_proba(X) -> np.ndarray:

class _UniformBaseline:
    """Reference shape — not a real model, just shows the Protocol."""
    version = "0.0.0"

    def predict_proba(self, X: list[str]) -> np.ndarray:
        rng = np.random.default_rng(42)
        return rng.uniform(0, 1, size=len(X))


df = pd.DataFrame({"text": [f"row_{i}" for i in range(40)],
                   "label": [i % 2 for i in range(40)]})
slice_ = EvalSlice(name="test", df=df)
result = evaluate({"u": _UniformBaseline()}, [slice_], run_id="proto-shape")
print(f"PR-AUC: {result.by_slice['test']['by_scorer']['u']['pr_auc']:.3f}")

Anything implementing this is a valid Scorer. The PyTorch adapter below is the same shape, with the work happening inside predict_proba.
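
To make the structural-typing point concrete, here is a minimal sketch using typing.Protocol. The names ScorerLike and _ConstantScorer are hypothetical, introduced only for illustration; eval-toolkit’s own Scorer Protocol may declare additional members (for example version), so treat this as an approximation of the idea rather than the toolkit’s definition.

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class ScorerLike(Protocol):
    """Hypothetical stand-in for the toolkit's Scorer Protocol."""

    def predict_proba(self, X: list[str]) -> np.ndarray: ...


class _ConstantScorer:
    """No inheritance needed: having the right method is enough."""
    version = "0.0.0"

    def predict_proba(self, X: list[str]) -> np.ndarray:
        return np.full(len(X), 0.5)


scorer = _ConstantScorer()
# runtime_checkable only verifies that the method exists, not its signature.
is_scorer = isinstance(scorer, ScorerLike)
scores = scorer.predict_proba(["a", "b", "c"])
```

The isinstance check passes even though _ConstantScorer never mentions ScorerLike, which is why a plain class wrapping a PyTorch model can satisfy the same contract.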

Transformer Scorer skeleton#

The pattern: a class wrapping (tokenizer, model, device, batch_size), with predict_proba doing tokenize → forward → softmax → numpy in batches.

# Requires: torch, transformers. Marked skip for Sybil.
from __future__ import annotations

import os
# IMPORTANT: set CUBLAS_WORKSPACE_CONFIG BEFORE `import torch`, so it is in
# place before any CUDA context exists.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# One-time, BEFORE first CUDA op:
torch.use_deterministic_algorithms(True, warn_only=True)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False


class TransformerScorer:
    """eval_toolkit.Scorer wrapping a HuggingFace classification model.

    Run with:
        scorer = TransformerScorer(
            "protectai/deberta-v3-base-prompt-injection-v2",
            device="cuda" if torch.cuda.is_available() else "cpu",
            batch_size=16,
        )
    """
    version = "v2-2025-q4"  # bump when the underlying checkpoint changes

    def __init__(self, model_name: str, *, device: str = "cpu",
                 batch_size: int = 16, max_length: int = 512) -> None:
        self.model_name = model_name
        self.device = device
        self.batch_size = batch_size
        self.max_length = max_length
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = (
            AutoModelForSequenceClassification.from_pretrained(model_name)
            .to(device)
            .eval()
        )

    @torch.no_grad()
    def predict_proba(self, X: list[str]) -> np.ndarray:
        """Return P(positive) for each text. Batched + numpy-out."""
        if not X:  # np.concatenate raises ValueError on an empty list
            return np.empty(0, dtype=np.float32)
        out = []
        for i in range(0, len(X), self.batch_size):
            batch = X[i : i + self.batch_size]
            enc = self.tokenizer(
                batch,
                padding=True,
                truncation=True,
                max_length=self.max_length,
                return_tensors="pt",
            ).to(self.device)
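            logits = self.model(**enc).logits.float()  # (batch, n_classes), fp32 for a stable softmax
            # Index 1 is assumed to be the positive (injection) class; check model.config.id2label.
            probs = torch.softmax(logits, dim=-1)[:, 1]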
            out.append(probs.cpu().numpy())
        return np.concatenate(out)
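
Before handing a new adapter to the harness, it can help to check the returned array directly. The sketch below assumes the contract described in this example (a 1-D NumPy array with one probability per input text); check_scorer_output is a hypothetical helper, not part of eval-toolkit.

```python
import numpy as np


def check_scorer_output(scores: np.ndarray, n_inputs: int) -> np.ndarray:
    """Hypothetical helper: validate the array a predict_proba call returned."""
    if not isinstance(scores, np.ndarray):
        raise TypeError(f"expected np.ndarray, got {type(scores).__name__}")
    if scores.shape != (n_inputs,):
        raise ValueError(f"expected shape ({n_inputs},), got {scores.shape}")
    if not np.all(np.isfinite(scores)):
        raise ValueError("scores contain NaN or inf")
    if scores.size and (scores.min() < 0.0 or scores.max() > 1.0):
        raise ValueError("scores fall outside [0, 1]")
    return scores


# A well-formed result passes through unchanged.
ok = check_scorer_output(np.array([0.1, 0.9, 0.5]), n_inputs=3)
```

A 2-D array of per-class probabilities, a common mistake when the `[:, 1]` indexing is forgotten, fails the shape check.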

With LoRA adapters#

If you fine-tuned with PEFT, load the base model + adapter and otherwise reuse the wrapper above:

# Requires: torch, transformers, peft. Marked skip for Sybil.
from peft import PeftModel  # noqa


class LoRATransformerScorer(TransformerScorer):
    """Loads base model + LoRA adapter; predict_proba reused unchanged."""
    version = "lora-2026-q1"

    def __init__(self, base_model_name: str, lora_path: str, **kwargs) -> None:
        super().__init__(base_model_name, **kwargs)
        self.model = PeftModel.from_pretrained(self.model, lora_path).eval()

With SliceAwareScorer for cost control#

LLM-judge and large-transformer scorers are expensive. Typical production runs skip them on secondary subgroup slices and run them only on headline slices such as test. To opt in, implement SliceAwareScorer’s should_score_slice hook:

# Requires: torch, transformers. Marked skip for Sybil.
class CostControlledTransformerScorer(TransformerScorer):
    """Skip subgroup slices to save GPU minutes."""
    version = "v2-cost-controlled"

    SLICE_ALLOW_LIST: frozenset[str] = frozenset({
        "test", "ood_lakera", "ood_llmail",  # full-cost slices
    })

    def should_score_slice(self, slice_name: str) -> bool:
        return slice_name in self.SLICE_ALLOW_LIST

The harness honors should_score_slice automatically (see evaluate(...)); skipped slices are recorded as RunResult.by_slice[name]["by_scorer"][scorer_name] = {"skipped": "..."}.
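
As a rough illustration of what honoring the hook means, the sketch below shows a loop that consults should_score_slice when a scorer defines it. This is not eval-toolkit’s implementation: score_slices, _AllowListScorer, and the skip-reason string are hypothetical, and the real harness records more than raw scores.

```python
from typing import Any


def score_slices(scorer: Any, slices: dict[str, list[str]]) -> dict[str, Any]:
    """Hypothetical harness loop: consult should_score_slice when present."""
    results: dict[str, Any] = {}
    for name, texts in slices.items():
        hook = getattr(scorer, "should_score_slice", None)
        if callable(hook) and not hook(name):
            # Placeholder reason string; the toolkit's actual message may differ.
            results[name] = {"skipped": "should_score_slice returned False"}
            continue
        results[name] = {"scores": list(scorer.predict_proba(texts))}
    return results


class _AllowListScorer:
    """Toy scorer: only the 'test' slice is worth the cost."""

    def should_score_slice(self, slice_name: str) -> bool:
        return slice_name == "test"

    def predict_proba(self, X: list[str]) -> list[float]:
        return [0.5 for _ in X]


out = score_slices(_AllowListScorer(), {"test": ["a", "b"], "subgroup_x": ["c"]})
```

Scorers without the hook are scored on every slice, so adding should_score_slice is purely opt-in.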

DataLoader worker seeding (when training, not just inference)#

If your Scorer.fit(...) trains a transformer, the DataLoader worker seeding rules in reproducibility.md §“DataLoader worker seeding” apply:

# Requires: torch. Marked skip for Sybil.
import random  # noqa
import numpy as np  # noqa
import torch  # noqa
from torch.utils.data import DataLoader  # noqa

def seed_worker(worker_id: int) -> None:
    """Per-worker reseed used as DataLoader.worker_init_fn."""
    seed = torch.initial_seed() % 2**32
    np.random.seed(seed)
    random.seed(seed)


# generator = torch.Generator()
# generator.manual_seed(42)
# loader = DataLoader(
#     dataset, batch_size=32, shuffle=True,
#     worker_init_fn=seed_worker,
#     generator=generator,
# )

Without these two arguments (worker_init_fn and generator), every fold’s data shuffle is non-deterministic regardless of set_global_seeds(42).
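
The principle behind passing an explicit generator can be shown without torch: give each fold its own random stream derived from one base seed, so the shuffle order never depends on how much global RNG state earlier code consumed. This is a NumPy analogue, not the DataLoader mechanism itself, and fold_orders is a hypothetical name.

```python
import numpy as np


def fold_orders(base_seed: int, n_folds: int, n_items: int) -> list[np.ndarray]:
    """One reproducible shuffle order per fold, each from its own child seed."""
    children = np.random.SeedSequence(base_seed).spawn(n_folds)
    return [np.random.default_rng(child).permutation(n_items) for child in children]


# Two independent "runs" with the same base seed produce identical orders.
run_a = fold_orders(42, n_folds=3, n_items=10)
run_b = fold_orders(42, n_folds=3, n_items=10)
same = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
```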

fp16 / bf16: trade-offs#

Mixed precision speeds up inference 1.5–3× on modern GPUs but introduces small numerical noise. Bootstrap CIs absorb that noise; bit-identity does not survive it. Two implications:

  • Calibrate at inference precision. If you’ll deploy in bf16, run fit_temperature on bf16 logits, not fp32 logits cast to bf16.

  • Don’t compare ECE across precision levels. A 0.001–0.005 ECE delta is well within fp16/bf16 noise on moderate-size eval sets.

# Requires: torch. Marked skip for Sybil.
# bf16 inference:
# self.model = self.model.to(dtype=torch.bfloat16)
# logits.float()  # cast back to fp32 BEFORE softmax for numerical stability
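
The reason for casting before softmax can be reproduced on CPU with NumPy’s float16. This is a sketch under the assumption that NumPy half precision is a fair stand-in for the storage format; GPU kernels may accumulate internally in higher precision, but a probability stored in fp16 still cannot represent values this small.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Max-shifted softmax computed in the dtype of the input."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


logits = np.array([[0.0, -20.0]])
half_logits = logits.astype(np.float16)

p_half = softmax(half_logits)                     # softmax computed in fp16
p_full = softmax(half_logits.astype(np.float32))  # cast to fp32 first

# exp(-20) is about 2e-9, below the smallest fp16 subnormal (about 6e-8),
# so the tail probability is exactly 0 in fp16 and positive in fp32.
tail_half = float(p_half[0, 1])
tail_full = float(p_full[0, 1])
```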

Putting the LoRA scorer into the harness#

Drop-in replacement for _UniformBaseline in the PI walkthrough:

# Requires: torch, transformers, peft. Marked skip for Sybil.
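# from eval_toolkit import evaluate_folded, SourceDisjointKFoldSplitter
# NormalizedFormLeakageCheck and CrossSplitLeakageCheck must also be imported,
# and parent_slice must be an EvalSlice you have already built.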
#
# scorer = LoRATransformerScorer(
#     base_model_name="microsoft/deberta-v3-base",
#     lora_path="my/adapter",
#     device="cuda",
#     batch_size=32,
# )
#
# result = evaluate_folded(
#     {"deberta_lora": scorer},
#     SourceDisjointKFoldSplitter(source_col="source", k=3, seed=42),
#     parent_slice,
#     run_id="lora-v1",
#     leakage_checks=[NormalizedFormLeakageCheck(), CrossSplitLeakageCheck()],
#     on_leakage="record",
# )
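
For readers unfamiliar with source-disjoint splitting, the idea is that every row from a given source lands on the same side of each fold, so a scorer is never tested on a source it trained on. The sketch below illustrates that property only; source_disjoint_folds is hypothetical, ignores seeding, and is not how SourceDisjointKFoldSplitter is implemented.

```python
def source_disjoint_folds(
    sources: list[str], k: int
) -> list[tuple[list[int], list[int]]]:
    """Hypothetical sketch: assign whole sources to folds, round-robin."""
    unique = sorted(set(sources))
    if k > len(unique):
        raise ValueError("k cannot exceed the number of distinct sources")
    fold_of = {src: i % k for i, src in enumerate(unique)}
    folds = []
    for fold in range(k):
        test_idx = [i for i, s in enumerate(sources) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(sources) if fold_of[s] != fold]
        folds.append((train_idx, test_idx))
    return folds


sources = ["a", "a", "b", "c", "c", "d"]
folds = source_disjoint_folds(sources, k=2)
```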

Copy-paste minimal scorer (CPU, no LoRA, no PEFT)#

If you just want to start scoring with a public injection-detector checkpoint:

# Requires: torch, transformers. Marked skip for Sybil.
# import os; os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
# import torch
# torch.use_deterministic_algorithms(True, warn_only=True)
#
# scorer = TransformerScorer(
#     "protectai/deberta-v3-base-prompt-injection-v2",
#     device="cpu",
#     batch_size=8,
# )
# # Now plug into evaluate(...) as in the PI walkthrough.

Pitfalls / Common mistakes#

  • CUBLAS_WORKSPACE_CONFIG set after import. Has no effect once the CUDA context exists. Set in os.environ BEFORE import torch.

  • Calling predict_proba outside torch.no_grad(). Builds the autograd graph; OOMs fast on long inputs. The decorator above prevents this.

  • model.train() left on during eval. Dropout is still active; every predict_proba call returns slightly different scores. model.eval() is mandatory.

  • Returning a torch tensor from predict_proba. The harness expects np.ndarray. Always .cpu().numpy() at the end.

  • fp16 inference with torch.softmax on small logits. Underflows to 0 in the tail; you lose calibration in the [0, 0.01] regime. Cast to fp32 before softmax.
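
The first pitfall is easy to guard against at program start. The helper below is a hypothetical sketch (configure_cublas_workspace is not a torch or eval-toolkit function): it sets the variable and warns if torch has already been imported, since by then a CUDA context may exist.

```python
import os
import sys
import warnings


def configure_cublas_workspace(value: str = ":4096:8") -> str:
    """Hypothetical guard: set the variable and warn if torch is already loaded."""
    if "torch" in sys.modules:
        warnings.warn(
            "torch is already imported; CUBLAS_WORKSPACE_CONFIG may be ignored "
            "if a CUDA context exists",
            RuntimeWarning,
            stacklevel=2,
        )
    # setdefault keeps any value the user already exported in the shell.
    return os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", value)


effective = configure_cublas_workspace()
```

Calling this as the first statement of an entry-point script keeps the ordering requirement in one place instead of relying on import order across modules.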

See also#