Secrets management — three-store split aligned with execution context (.env + RunPod pod-secrets + GH Actions repo Secrets)
ADR-035: Secrets management — three-store split aligned with execution context
Status
Accepted (2026-05-16). Closes the §Tech-Stack ledger row 305 (Secrets management). Companion to ADR-020 (runpod-deploy compute infrastructure + cost discipline — the RunPod-side mechanism for pod-secrets injection), ADR-030 (Quarto + GH Actions publish — the CI-side surface where GH Actions repo Secrets are consumed), and ADR-032 (HF Hub publication — the primary HF_TOKEN consumer at Phase 5 close).
Context
Five API tokens (four user-managed, plus GITHUB_TOKEN, which the GH Actions runtime injects automatically) span three execution contexts during the submission lifecycle.
| Secret | Required by | Local laptop | RunPod cloud pod | GH Actions runner |
|---|---|---|---|---|
| HF_TOKEN | ADR-016 dataset SHA pinning + ADR-032 model publication | yes (Phase 5 publish) | yes (Phase 2-3 dataset fetch) | conditional (if model card push runs in CI) |
| RUNPOD_API_KEY | ADR-020 runpod-deploy CLI auth | yes (Phase 2-3 dispatch) | n/a (pod itself doesn’t need this) | n/a |
| OPENAI_API_KEY | ADR-018 R-LLM-OpenAI scorer (gpt-4o-2024-08-06) | conditional | yes (Phase 3 eval) | n/a |
| ANTHROPIC_API_KEY | ADR-018 R-LLM-Anthropic scorer (claude-sonnet-4-6) | conditional | yes (Phase 3 eval) | n/a |
| GITHUB_TOKEN | ADR-030 GH Actions publish workflow | n/a | n/a | yes (auto-injected by GH Actions runtime) |
(No WANDB_API_KEY — no W&B integration scoped.)
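The matrix above can also be held as data, so that a check running on a pod does not demand RUNPOD_API_KEY and a CI job does not demand the scorer keys. The following is a minimal sketch, not part of the repo: the names Context, SECRET_MATRIX, and secrets_for are hypothetical, and the True / None / False encoding of yes / conditional / n/a is an assumption made for illustration.

```python
from enum import Enum
from typing import Optional


class Context(Enum):
    """The three execution contexts from the table (hypothetical names)."""
    LAPTOP = "local laptop"
    POD = "RunPod cloud pod"
    CI = "GH Actions runner"


# Assumed encoding: True = yes, None = conditional, False = n/a.
SECRET_MATRIX: dict[str, dict[Context, Optional[bool]]] = {
    "HF_TOKEN": {Context.LAPTOP: True, Context.POD: True, Context.CI: None},
    "RUNPOD_API_KEY": {Context.LAPTOP: True, Context.POD: False, Context.CI: False},
    "OPENAI_API_KEY": {Context.LAPTOP: None, Context.POD: True, Context.CI: False},
    "ANTHROPIC_API_KEY": {Context.LAPTOP: None, Context.POD: True, Context.CI: False},
    "GITHUB_TOKEN": {Context.LAPTOP: False, Context.POD: False, Context.CI: True},
}


def secrets_for(context: Context, include_conditional: bool = False) -> list[str]:
    """Return the secret names needed in a context, sorted alphabetically."""
    names = []
    for name, row in SECRET_MATRIX.items():
        need = row[context]
        # Conditional entries are only included when the caller opts in.
        if need is True or (need is None and include_conditional):
            names.append(name)
    return sorted(names)
```

Keeping the matrix in one place means the table and any context-aware check cannot silently drift apart; whether the project wants that coupling is a design choice this ADR does not make.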
The constraint surface that drives the choice:
- All consumer tools default to env-var discovery: huggingface_hub reads HF_TOKEN; openai reads OPENAI_API_KEY; anthropic reads ANTHROPIC_API_KEY; runpod-deploy reads RUNPOD_API_KEY. The libraries find tokens via env vars without configuration.
- .env is the universal local-dev convention (12-factor app config principle, https://12factor.net/config).
- RunPod has a pod-secrets primitive: set via runpod-deploy config (per ADR-020); injected as env vars on pod start.
- GH Actions has repo-level Secrets: set via repo Settings → Secrets and variables → Actions; available as ${{ secrets.NAME }} in workflow steps.
- Pre-commit gitleaks is already enabled: all prior commits in this session passed gitleaks; it catches accidental .env commits.
Four options were considered (per Phase 0-08 Q5 walk):
- (A) Three-store split aligned with execution context: standard 12-factor pattern.
- (B) Cloud secret manager (Doppler / Infisical / 1Password / AWS Secrets Manager / GCP Secret Manager): centralized.
- (C) Encrypted-in-repo (git-crypt / sops / age): encrypted blobs committed.
- (D) HF-canonical token cache + ad-hoc for others: asymmetric across secrets.
User selection at Q5 walk: A.
Decision
Three-store split
Local laptop — gitignored .env at repo root contains real tokens. Loaded by scripts via python-dotenv or manual os.environ read; consumer libraries find tokens via their default env-var discovery.
RunPod cloud pod — pod-secrets injected via runpod-deploy config (pod.env_vars or equivalent per the runpod-deploy 0.7.7 schema). Tokens become env vars on pod start; consumer libraries find them.
GitHub Actions runner (repo Settings → Secrets and variables → Actions):
- GITHUB_TOKEN is auto-injected by the GH Actions runtime (per ADR-030 publish workflow); no manual configuration.
- HF_TOKEN is added as a repo secret if model card push runs in CI (per ADR-032; final publication-step location TBD at Phase 5, may be local or CI).
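For the local-laptop store, the "manual os.environ read" path can be sketched with the standard library alone. This is an illustrative stand-in, not the project's loader: the function name load_env_file is hypothetical, and the parser handles only plain KEY=VALUE lines, comments, and surrounding quotes. python-dotenv covers more cases and is the better choice where it is already a dependency. The sketch assumes that variables already present in the environment should win over the file, so the same script behaves identically on a pod or CI runner where tokens are injected and no .env exists.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Export KEY=VALUE pairs from a file without overriding existing vars.

    Returns only the pairs that were actually written to os.environ.
    A missing file is not an error: pods and CI runners have no .env.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        if key and key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Calling load_env_file() once at script start, before importing any consumer library, is enough for the libraries' default env-var discovery to pick up the tokens.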
Committed template — .env.example
A committed .env.example at repo root (no real tokens; placeholder values) enumerates the four canonical env vars:
# Required for Phase 1-5 execution; see decisions/ADR-035-secrets-management-three-store-split.md
# Copy this file to .env (gitignored) and fill in real values.
# DO NOT commit .env (.gitignore covers; gitleaks pre-commit hook catches).
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
RUNPOD_API_KEY=YOUR_RUNPOD_API_KEY_HERE
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxxxxxxxxxxxxxxxx
A reviewer reading the repo at the submission tag sees the secret surface immediately, without running anything.
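Because the template is committed, it helps to confirm mechanically that every value in it is still a placeholder. The sketch below is one way to do that under stated assumptions: the function name non_placeholder_keys is hypothetical, and the accepted shapes (a provider prefix followed only by "x" characters, or YOUR_<NAME>_HERE) are inferred from the template above rather than taken from any gitleaks rule.

```python
import re

# Assumed placeholder shapes, mirroring the committed template:
# hf_xxxx..., sk-xxxx..., sk-ant-xxxx..., or YOUR_<NAME>_HERE.
PLACEHOLDER = re.compile(r"^(?:(?:hf_|sk-ant-|sk-)x+|YOUR_[A-Z0-9_]+_HERE)$")


def non_placeholder_keys(template_text: str) -> list[str]:
    """Return the keys whose values do not look like placeholders."""
    suspicious = []
    for raw in template_text.splitlines():
        line = raw.strip()
        # Ignore blanks, comments, and non-assignment lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if not PLACEHOLDER.fullmatch(value.strip()):
            suspicious.append(key.strip())
    return suspicious
```

An empty result means the template contains only placeholder-shaped values; any returned key deserves a manual look before commit.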
Pre-commit gitleaks gate
Already enabled in .pre-commit-config.yaml; all session commits pass the gate. This ADR explicitly ratifies the gate as part of the secrets posture.
Rotation protocol
Token rotation requires updating all three stores in sequence:
- Generate new token at the provider (HuggingFace / RunPod / OpenAI / Anthropic).
- Update local .env.
- Update runpod-deploy config (or RunPod pod-secret directly if config-driven injection is not used).
- Update GH Actions repo Secrets if the token is used in CI.
- Verify via scripts/preflight_secrets.py (Phase 1 work item).
Documented in docs/secrets.md rotation runbook (Phase 1 work item; out-of-scope for Phase 0-08 close).
Pre-flight verification
scripts/preflight_secrets.py (Phase 1 work item) asserts the four env vars are non-empty before any real-cost run. Fails loud per Python standards:
import os


def preflight_secrets() -> None:
    """Verify all required secrets are present before real-cost runs.

    Raises ValueError with explicit context naming each missing token + its consumer.
    """
    # Map each required env var to the consumer that needs it.
    required = {
        "HF_TOKEN": "huggingface_hub (ADR-016 dataset fetch + ADR-032 model publish)",
        "RUNPOD_API_KEY": "runpod-deploy CLI (ADR-020 GPU dispatch)",
        "OPENAI_API_KEY": "openai SDK (ADR-018 R-LLM-OpenAI scorer)",
        "ANTHROPIC_API_KEY": "anthropic SDK (ADR-018 R-LLM-Anthropic scorer)",
    }
    # An unset or empty variable both count as missing.
    missing = [(name, consumer) for name, consumer in required.items() if not os.environ.get(name)]
    if missing:
        details = "; ".join(f"{name} (needed by {consumer})" for name, consumer in missing)
        raise ValueError(f"Missing required env vars: {details}. See .env.example.")

Consequences
Positive
- Aligns with library defaults: every consumer library discovers tokens via env vars natively, so no glue code is needed.
- No new infrastructure: .env + gitignore + gitleaks pre-commit + RunPod pod-secrets + GH Actions repo Secrets are all standard / existing primitives.
- Audit-friendly via .env.example: reviewer sees the secret surface (which tokens, from which providers, for which consumers) without running anything.
- Deadline-realistic: option B (cloud secret manager) would consume 4+ hours of auth-chain setup before any real work; the ADR-001 calendar doesn’t have that budget.
- Defense-in-depth: .gitignore blocks accidental .env staging; gitleaks pre-commit hook catches any leak that slips through; preflight script catches missing tokens before real spend.
- Honors existing posture: the gitleaks pre-commit hook has been passing every commit in this session; this ADR ratifies that gate explicitly.
Negative / cost
- Secrets in three stores means rotation discipline matters; if a token rotates and .env updates but RunPod pod-config doesn’t, the pod fails mid-run. Mitigation: pre-flight script + rotation runbook.
- .env.example placeholder values must look obviously-not-real to avoid reviewer confusion + to avoid gitleaks false positives. Mitigation: use hf_xxxx... / sk-xxxx... / YOUR_..._HERE patterns that gitleaks recognizes as placeholders.
- GH Actions Secrets are repo-scope: stored values cannot be read back through the repo settings, but repo admins can replace them and anyone able to edit a workflow can use them in a run. Acceptable for the public-repo + solo-maintainer posture of this submission; mitigation = rotate after submission if leaving the repo public long-term.
Neutral
- GH Actions HF_TOKEN is conditional: added only if Phase 5 publish-side decides to run model card push in CI. Defer to Phase 5; the discipline is locked at ADR-035; the specific CI-vs-local boundary is per-step.
- Cloud secret manager remains a future extension: not chosen at Phase 0-08 close; revisited if scope extends to production-grade deployment or if Phase 1+ adds 5+ more secrets.
Limitation
Three separate stores means rotation discipline is critical. The pre-flight script catches missing-token cases but cannot catch stale-token cases (where .env has the new token but RunPod pod-config still has the old token). Mitigation: rotation runbook documents the sequence; post-rotation runs are checked via runpod-deploy validate --all (per ADR-020 preflight) which fails on auth errors.
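One way to narrow the stale-token gap without exposing token values is to compare short fingerprints across stores. The sketch below is an assumption-laden illustration, not an agreed mechanism: the helper names are hypothetical, and it presumes that a truncated SHA-256 digest of a high-entropy token is acceptable to print or log. It detects drift between two stores only; it says nothing about whether the provider still accepts either token.

```python
import hashlib
import os


def token_fingerprint(value: str, length: int = 12) -> str:
    """Return a short, non-reversible identifier for a token value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def fingerprint_env(names: list[str]) -> dict[str, str]:
    """Fingerprint each named env var that is set and non-empty."""
    return {n: token_fingerprint(os.environ[n]) for n in names if os.environ.get(n)}


def stale_names(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    """Names present in both stores whose fingerprints differ."""
    shared = local.keys() & remote.keys()
    return sorted(n for n in shared if local[n] != remote[n])
```

In use, fingerprint_env would run once on the laptop and once inside the pod, with the two dictionaries compared by stale_names; how the pod-side result travels back to the laptop is left open here.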
Extension condition for revisit
- Production-grade deployment scope extension triggers migration to a cloud secret manager (option B from the Q5 walk) — Doppler / Infisical / 1Password / AWS Secrets Manager / GCP Secret Manager — via superseding ADR. The CI auth chain + audit logging needed at production grade exceeds what three-store split provides.
- Phase 1+ surfaces 5+ more secrets (e.g., a third LLM-judge ablation per ADR-018 afterword expanding to gpt-4.1 / opus-4-7 / o1 / o3) — friction of three-store rotation starts to dominate; superseding ADR migrates to a secret manager. Currently below the friction threshold (4 secrets).
- Repo migration to a Ciphero org post-submission — GH Actions repo Secrets must be re-provisioned at the new repo location; documented in the rotation runbook.
Alternatives Considered
- (B) Cloud secret manager (Doppler / Infisical / 1Password / AWS Secrets Manager / GCP Secret Manager) — adds infrastructure dependency; auth-chain setup is hours-to-days first-time; overkill for 2-day submission with 4 tokens. Rejected per Q5 walk in favor of A.
- (C) Encrypted-in-repo (git-crypt / sops / age) — encrypted secrets committed; decrypt at use. Key management is its own secret problem (chicken-and-egg); consumer libraries don’t read encrypted blobs; not standard for ML projects. Rejected per Q5 walk.
- (D) HuggingFace CLI token caching (~/.cache/huggingface/token) + RunPod pod-secrets + GH Actions Secrets: HF-canonical for HF only; other tokens still need .env or env-var injection; adds asymmetry across secrets. Rejected for inconsistency.
- Hardcoded in scripts: never acceptable; standard anti-pattern.
References
- 12-Factor App config principles: https://12factor.net/config
- huggingface_hub token authentication: https://huggingface.co/docs/huggingface_hub/quick-start#authentication
- GitHub Actions Encrypted Secrets: https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions
- runpod-deploy (RunPod pod-secrets injection mechanism): https://github.com/brandon-behring/runpod-deploy
- ADR-020 (compute infrastructure: RunPod pod-secrets primitive consumer)
- ADR-018 (reference scorer slate: OPENAI_API_KEY + ANTHROPIC_API_KEY consumers)
- ADR-030 (Quarto + GH Actions publish: GITHUB_TOKEN consumer)
- ADR-032 (HF Hub publication: HF_TOKEN consumer at Phase 5)
Transcript
See transcripts/2026-05-16__phase-0-08__process-tech-stack-acceptance.md for the conversation that led to this decision (Q5 walk + option A selection).