`eval_toolkit.probes`#

Activation probes — linear classifiers over transformer hidden states. ActivationDeltaProbe is a port of the TaskTracker methodology (Abdelnabi et al. 2024, arXiv:2406.00799) for prompt-injection detection via linear probing of activation deltas at a chosen transformer layer.

Optional dependency: pip install eval-toolkit[probes] (installs torch

transformers).

`ActivationDeltaProbe`	TaskTracker-style activation-delta linear probe.
`ActivationExtractor`	Extract a hidden-state vector for each input text.
`Probe`	Minimal probe surface: fit + predict on raw text inputs.

eval_toolkit.probes#

`eval_toolkit.probes`#