Migration notes (v0.8.2 and prompt-injection-v3)#
Lifecycle policy: stop: → lifecycle: (v0.8.2)#
The YAML stop: block is renamed to lifecycle: with three-valued
actions instead of booleans, and the defaults change so that
successful runs release their volume disk by default.
Motivation#
On 2026-05-17 the repo’s RunPod account held 76 stale EXITED pods
totaling 3,930 GB of preserved volume disk — **\(1.10/hr (~\)26/day,
~\(393/month)** of idle storage burn. The leak existed because
`runpodctl pod stop` (which `runpod-deploy` issued under the old
`on_success: true` default) *pauses* a pod but **keeps the volume
disk allocated indefinitely** at RunPod's \)0.10/GB·month rate.
Operators reasonably assumed “stop” meant “terminated” — the
documentation at lifecycle.md:214-222 literally said so. The
schema change makes the action space explicit and the cost trade-off
visible at config-edit time.
New schema#
lifecycle:
on_success: delete # NEW default — releases volume disk on success
on_failure: stop # NEW default — preserves paused pod for SSH forensics
Each field accepts one of three strings (plus a fourth on
on_success only):
value |
runpodctl call |
volume disk after |
|---|---|---|
|
(none) |
continues at full rate (compute + disk) |
|
|
continues at ~$0.10/GB·month indefinitely |
|
|
released |
|
|
continues at ~$0.10/GB·month; next run resumes this pod ( |
See lifecycle.md §7 for
the full table and lifecycle.md §7b
for the cleanup-after-forensics workflow.
Legacy stop: block — bool shim#
Existing configs using the old stop: {on_success: bool, on_failure: bool}
block continue to parse; a single [deprecated] WARNING is emitted
per parse. The shim maps:
old form |
new equivalent |
|---|---|
|
|
|
|
|
|
|
|
v0.8.3 removed the bool shim. A YAML config containing stop: now
raises ValueError with a message naming the v0.8.3 removal and
pointing at this doc. Consumers pinned to v0.8.2 or earlier continue
to parse the legacy form with a [deprecated] WARNING; pinning to
runpod-deploy>=0.8.3 requires migrating to the lifecycle: block
first.
CLI changes#
old command |
new command |
|---|---|
|
|
(no equivalent — was a manual |
|
(no equivalent) |
|
The stop subcommand remains as a deprecated alias.
Python API changes (breaking for direct importers)#
# Before
from runpod_deploy import StopPolicySpec
from runpod_deploy.provider import stop_pod
# After
from runpod_deploy import LifecyclePolicySpec, LIFECYCLE_ACTIONS, StalePod
from runpod_deploy.provider import cleanup_pod, list_stale_pods, bulk_delete_pods
RunpodJobSpec.stop is renamed to RunpodJobSpec.lifecycle.
What you need to do#
Now: nothing required — your existing configs and any in-flight runs continue to work via the bool shim. Watch the
[deprecated]warnings to gauge your migration backlog.Next sweep / next config edit: rename the
stop:block tolifecycle:and replace booleans with string values. The migration is mechanical; the table above is the full mapping.Audit: run
runpod-deploy ls-staleto find any historical pods that the old code left behind; bulk-release withrunpod-deploy cleanup --all-stopped --yes.Hygiene: wire
runpod-deploy ls-staleinto a weekly cron or CI job to detect drift. Seerecipes/stale-pod-audit.md.
prompt-injection-v3 Migration#
This document walks prompt-injection-v3 consumers through replacing
v3’s hand-rolled deploy commands (uv run reviewer-runpod,
uv run v3-1-runpod, uv run v3-1-runpod-ephemeral) with thin
wrappers around runpod-deploy run.
If you’re migrating a different consumer (e.g., a fresh project), skip
this doc and go straight to quickstart.md.
Why migrate#
prompt-injection-v3 (the project) pre-dates runpod-deploy (the
tool). The v3-era deploy scripts were hand-rolled bash that
duplicated GPU/DC failover, staging, and artifact-pull logic. Every
sweep maintenance change required editing six different scripts in
parallel.
runpod-deploy absorbs those primitives:
GPU/DC failover —
pod.gpu_order+pod.datacentersiterate the matrix automatically; v3 had to encode this in bash per script.Staging excludes —
staging[].excludes_default+ standard rsync excludes replace the--excludeflag stacks v3 maintained inline.Cost capping —
budget.cost_cap_usdenforces both per-invocation budget and derives the implicit runtime ceiling; v3 had cost caps only viatimeoutonrunpodctl pod create.Deploy metadata capture — git SHA + lockfile hash land in
runpod_deploy_pull_manifest.jsonautomatically; v3 hand-rolledGIT_SHA=$(git rev-parse HEAD)injection.Artifact pull manifest —
runpod_deploy_pull_manifest.jsonrecords what was pulled, when, with what cost; v3 had ad-hocpulled_log.txt.
The migration is mechanical: each v3 deploy command becomes a YAML
config + a Makefile target that invokes runpod-deploy run.
One-time setup#
In the prompt-injection-v3 repo:
# Add runpod-deploy as an optional dependency
# (in pyproject.toml's [project.optional-dependencies.cloud]):
# cloud = ["runpod-deploy>=0.8.1"]
uv sync --extra cloud
# runpod-deploy is now at .venv/bin/runpod-deploy
.venv/bin/runpod-deploy --help
This is the recommended “consumer-owned configs” pattern (see the runpod-deploy README’s “Consumer-owned configs” section).
Per-job migration#
Step 1: write the YAML config#
Create configs/runpod/<job-name>.yaml in your v3 repo. Use
quickstart.md as the template; reference
config-reference.md for field semantics.
The v3-era environment variables and command-line flags map to YAML sections as follows:
v3 hand-rolled |
runpod-deploy YAML |
|---|---|
|
|
|
|
|
|
|
|
|
|
Hand-rolled |
|
Hand-rolled |
Auto-captured in manifest |
Inline bash run script |
|
Step 2: validate#
runpod-deploy validate --config configs/runpod/<job-name>.yaml --all
The --all flag runs every opt-in check: schema validation, local
path existence, GPU availability against the configured datacenters,
consumer pyproject scan. Fix anything it flags before paying for a
pod.
Step 3: dry-run#
runpod-deploy run --config configs/runpod/<job-name>.yaml --offline-dry-run
--offline-dry-run walks the command shape without hitting the
network — no runpodctl calls, no SSH, no rsync. Confirms the
orchestrator state machine accepts your config end-to-end.
Step 4: real run#
runpod-deploy run --config configs/runpod/<job-name>.yaml
On success, your artifacts land under artifacts/runpod/<timestamp>/
along with runpod_deploy_pull_manifest.json. The pod is stopped
automatically per stop.on_success: true.
Step 5: keep the v3 command name (optional)#
If you want uv run reviewer-runpod to keep working as a thin shim,
add a one-line wrapper to pyproject.toml:
[project.scripts]
reviewer-runpod = "your_v3_module.cli:reviewer_runpod_main"
Where reviewer_runpod_main is a 3-line Python function that calls
subprocess.run(["runpod-deploy", "run", "--config", "runpod/reviewer.yaml", *sys.argv[1:]]).
This lets you keep your existing tooling (uv run reviewer-runpod --dry-run) while the underlying execution is delegated.
Regression testing#
Before retiring the old v3 hand-rolled deploy scripts, run both in parallel for one billing cycle:
Run the v3 hand-rolled script:
uv run reviewer-runpod. Note pod-id, wall time, cost.Run the runpod-deploy equivalent:
runpod-deploy run --config runpod/reviewer.yaml. Note pod-id, wall time, cost.Compare the pulled artifacts byte-for-byte (
diff -r artifacts/v3-script-output/ artifacts/runpod/<ts>/).
If the artifacts diverge, do NOT retire the v3 script until you’ve
diagnosed the cause. Common causes: missing files in staging[]
(check excludes_default semantics), different environment variables
(check remote_env.exports + secrets), or different gpu_order
producing different GPU classes per shard.
Backwards-compat timeline#
v3.x with hand-rolled scripts: keep working as-is. No runpod-deploy dependency.
v3.x with runpod-deploy >= 0.8.1: add the wrapper per Step 5; both invocation styles work side-by-side.
v4.x (planned): hand-rolled scripts removed; runpod-deploy is the only path. Migration deadline TBD; will be announced in v3’s CHANGELOG when set.
See also#
quickstart.md— the canonical onboarding path for fresh consumers.config-reference.md— full YAML schema.extending.md— the three-tier extension story (consumers / library users / contributors).recipes/local-preflight-then-run.md— the canonical “audit then deploy” pattern; replaces ad-hoc pre-run hooks in v3-era scripts.