Threat model — summary

Public-site note. This is a scope reference. It tells you which attack classes were in scope and which were deliberately deferred; it does not claim deployment coverage.

This file is a convenience aggregator. Canonical content lives in WRITEUP.md §0 and §5.6; the decision-ledger rows in SPEC_GREENFIELD.md §0 Threat are the lock points for what is in and out of scope. This file provides a single-page, security-focused entry point to those sources.

Attack classes named

| Class | In scope? | Spec lock |
|---|---|---|
| Direct injection: adversarial text in user input attempting to override system instructions | In scope | ADR-014 (Phase 0-01) |
| Indirect injection: adversarial text arriving via context channels (retrieved docs, tool outputs, file attachments) | In scope | ADR-014 (Phase 0-01) |
| Multi-turn injection: adversarial payload split across multiple conversation turns | Deferred | ADR-014; see WRITEUP §5.6 |
| Encoded payloads: base64 / leetspeak / hex / Unicode confusables | Deferred | ADR-014; see WRITEUP §5.6 |
| Paraphrase attacks: semantic equivalents of training-set injections | Deferred | ADR-014; see WRITEUP §5.6 |
| Adversarial perturbations: gradient-guided or search-based evasion against a specific classifier | Deferred (named, not silently dropped) | ADR-014 |

Scope decisions (all locked at Phase 0-01 via ADR-014)

  • Attack classes in scope: direct + indirect injection (multi-turn / encoded / paraphrase / adversarial perturbations deferred)
  • Language scope: English-only
  • Length cap: 512 tokens, single-turn
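The scope decisions above can be read as a simple admission filter on evaluation samples. The following is a minimal sketch only, not the project's implementation: the names `AttackClass`, `IN_SCOPE`, `Sample`, and `sample_in_scope` are hypothetical, the token count is assumed to come from whatever tokenizer the project uses, and the cap is assumed to be inclusive (exactly 512 tokens is allowed).

```python
from dataclasses import dataclass
from enum import Enum


class AttackClass(Enum):
    """Attack classes named in the threat model (identifiers are illustrative)."""
    DIRECT = "direct_injection"
    INDIRECT = "indirect_injection"
    MULTI_TURN = "multi_turn_injection"
    ENCODED = "encoded_payloads"
    PARAPHRASE = "paraphrase_attacks"
    ADVERSARIAL_PERTURBATION = "adversarial_perturbations"


# Per ADR-014: only direct and indirect injection are in scope; the rest are deferred.
IN_SCOPE = frozenset({AttackClass.DIRECT, AttackClass.INDIRECT})


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; num_tokens is computed elsewhere by a tokenizer."""
    language: str
    num_tokens: int
    num_turns: int


def sample_in_scope(sample: Sample, max_tokens: int = 512) -> bool:
    """Return True if the sample is English, single-turn, and within the length cap.

    Assumption: the 512-token cap is inclusive.
    """
    return (
        sample.language == "en"
        and sample.num_turns == 1
        and sample.num_tokens <= max_tokens
    )
```

For example, `sample_in_scope(Sample("en", 300, 1))` passes the filter, while a two-turn or non-English sample does not.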

Reference-scorer training-overlap audit (locked discipline)

Any external reference scorer evaluated alongside in-house rungs gets a training-data audit. Each audit yields one verdict from a three-state taxonomy:

  • verified_disjoint — training data verifiably disjoint from project sources
  • suspected_contamination — known overlap with one or more project sources
  • vendor_black_box — training data not disclosed (audit shifts to fold-pattern + scope-mismatch analysis)

Verdicts land in EVIDENCE.md §1-2. The taxonomy is encoded in eval-toolkit manifests’ contamination_flags field (see docs/MANIFEST_SCHEMA.md).
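The three-state taxonomy can be sketched as an enumeration plus a small decision function. This is an illustration under stated assumptions, not the eval-toolkit schema: `ContaminationVerdict` and `audit_verdict` are hypothetical names, the inputs (a disclosure flag and a set of overlapping source names) are assumed, and the actual layout of the contamination_flags field is defined in docs/MANIFEST_SCHEMA.md.

```python
from enum import Enum


class ContaminationVerdict(str, Enum):
    """The three audit verdicts; string values match the taxonomy labels."""
    VERIFIED_DISJOINT = "verified_disjoint"
    SUSPECTED_CONTAMINATION = "suspected_contamination"
    VENDOR_BLACK_BOX = "vendor_black_box"


def audit_verdict(training_data_disclosed: bool,
                  overlapping_sources: set[str]) -> ContaminationVerdict:
    """Map audit findings to a verdict (hypothetical decision order).

    Assumption: a disclosed training set with no known overlap has been
    checked thoroughly enough to count as verified disjoint.
    """
    if not training_data_disclosed:
        # Nothing to compare against; the audit falls back to
        # fold-pattern and scope-mismatch analysis.
        return ContaminationVerdict.VENDOR_BLACK_BOX
    if overlapping_sources:
        return ContaminationVerdict.SUSPECTED_CONTAMINATION
    return ContaminationVerdict.VERIFIED_DISJOINT
```

Subclassing `str` keeps the verdict values directly serializable as plain strings, which is convenient if they are written into a manifest.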

Out of scope (named, not silently dropped)

  • Deployment recommendation
  • SOTA chasing
  • Production-readiness testing
  • See WRITEUP §8 for the consolidated deferred list

Cross-references

  • WRITEUP.md §0 Motivation, §5.6 Adversarial robustness
  • SPEC_GREENFIELD.md §0 Threat (Feature Specs section) + decision ledger §0 rows
  • EVIDENCE.md §1-2 (reference scorer audits)
  • docs/MANIFEST_SCHEMA.md (contamination_flags field)
  • docs/research/attacks_defenses/ (literature dossier on attack taxonomies + defenses)