# Migration notes (v0.8.2 and prompt-injection-v3)

## Lifecycle policy: `stop:` → `lifecycle:` (v0.8.2)

The YAML `stop:` block is renamed to `lifecycle:`, with string-valued actions instead of booleans, and the defaults change so that successful runs release their volume disk.

### Motivation

On 2026-05-17 the repo’s RunPod account held 76 stale EXITED pods totaling 3,930 GB of preserved volume disk: **$1.10/hr (~$26/day, ~$786/month)** of idle storage burn. The leak existed because `runpodctl pod stop` (which `runpod-deploy` issued under the old `on_success: true` default) *pauses* a pod but **keeps the volume disk allocated indefinitely**, billed at RunPod's stopped-pod volume rate of $0.20/GB·month. Operators reasonably assumed “stop” meant “terminated”; the documentation at lifecycle.md:214-222 literally said so. The schema change makes the action space explicit and the cost trade-off visible at config-edit time.

### New schema

```yaml
lifecycle:
  on_success: delete       # NEW default: releases volume disk on success
  on_failure: stop         # NEW default: preserves the paused pod for SSH forensics
```

Each field accepts one of three strings; `on_success` also accepts a fourth, `recycle`:

| Value | `runpodctl` call | Volume disk afterwards |
|---|---|---|
| `preserve` | (none) | billed at full rate (the pod keeps running: compute + disk) |
| `stop` | `pod stop <id>` | billed at ~$0.20/GB·month (stopped-pod rate) indefinitely |
| `delete` | `pod delete <id>` | released |
| `recycle` | `pod stop <id>` | billed at ~$0.20/GB·month; the next run resumes this pod (`on_success` only) |
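The table above can be read as a small validate-then-dispatch step. The sketch below is a hypothetical illustration, not runpod-deploy's implementation: `validate_lifecycle` and `runpodctl_argv` are made-up names, the defaults come from the schema example, and the `runpodctl` argument lists are copied from the table rather than checked against a particular `runpodctl` release.

```python
from typing import List, Optional

# Allowed values per field, from the table; `recycle` is on_success only.
ON_SUCCESS_ACTIONS = frozenset({"preserve", "stop", "delete", "recycle"})
ON_FAILURE_ACTIONS = frozenset({"preserve", "stop", "delete"})

# v0.8.2 defaults, from the schema example.
DEFAULTS = {"on_success": "delete", "on_failure": "stop"}


def validate_lifecycle(block: dict) -> dict:
    """Merge a `lifecycle:` block over the defaults and reject bad values."""
    unknown = set(block) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown lifecycle keys: {sorted(unknown)}")
    policy = {**DEFAULTS, **block}
    if policy["on_success"] not in ON_SUCCESS_ACTIONS:
        raise ValueError(f"invalid on_success: {policy['on_success']!r}")
    if policy["on_failure"] not in ON_FAILURE_ACTIONS:
        raise ValueError(f"invalid on_failure: {policy['on_failure']!r}")
    return policy


def runpodctl_argv(action: str, pod_id: str) -> Optional[List[str]]:
    """Map an action to the runpodctl call in the table (None means no call)."""
    if action == "preserve":
        return None  # pod keeps running, so compute and disk keep billing
    if action in ("stop", "recycle"):
        # Both pause the pod; recycle additionally lets the next run resume it.
        return ["runpodctl", "pod", "stop", pod_id]
    if action == "delete":
        return ["runpodctl", "pod", "delete", pod_id]
    raise ValueError(f"unknown action: {action!r}")
```

Keeping `recycle` out of `ON_FAILURE_ACTIONS` is what enforces the "`on_success` only" rule at config-load time rather than at the end of a paid run.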

See lifecycle.md §7 for the full table and lifecycle.md §7b for the cleanup-after-forensics workflow.

### Legacy `stop:` block (bool shim)

Under v0.8.2, existing configs using the old `stop: {on_success: bool, on_failure: bool}` block continue to parse, and a single `[deprecated]` WARNING is emitted per parse. The shim maps:

| Old form | New equivalent |
|---|---|
| `stop.on_success: true` | `lifecycle.on_success: delete` |
| `stop.on_success: false` | `lifecycle.on_success: preserve` |
| `stop.on_failure: true` | `lifecycle.on_failure: stop` |
| `stop.on_failure: false` | `lifecycle.on_failure: preserve` |

v0.8.3 removed the bool shim. A YAML config containing `stop:` now raises `ValueError` with a message naming the v0.8.3 removal and pointing at this doc. Consumers pinned below v0.8.3 can still use the legacy form (v0.8.2 accepts it through the shim with a `[deprecated]` WARNING); upgrading to `runpod-deploy>=0.8.3` requires migrating to the `lifecycle:` block first.
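A minimal sketch of what the shim and its v0.8.3 removal amount to, assuming the config has already been loaded into a plain dict. `translate_config` and its `shim_enabled` switch are hypothetical; the real parser lives in runpod-deploy and may differ in detail, for example in how it words the warning or handles a config that sets both blocks.

```python
import logging

log = logging.getLogger("runpod_deploy_sketch")

# Bool-to-action mapping from the shim table.
LEGACY_ON_SUCCESS = {True: "delete", False: "preserve"}
LEGACY_ON_FAILURE = {True: "stop", False: "preserve"}


def _lookup(table: dict, value: object, field: str) -> str:
    # Accept only real booleans, as YAML `true`/`false` would produce.
    if not isinstance(value, bool):
        raise ValueError(f"stop.{field} must be true or false, got {value!r}")
    return table[value]


def translate_config(config: dict, shim_enabled: bool = True) -> dict:
    """Rewrite a legacy `stop:` block into `lifecycle:`.

    shim_enabled=True mimics v0.8.2 (translate and warn once per parse);
    shim_enabled=False mimics v0.8.3 (reject the legacy block).
    """
    if "stop" not in config:
        return config
    if not shim_enabled:
        raise ValueError("`stop:` was removed in v0.8.3; migrate to `lifecycle:`")
    if "lifecycle" in config:
        # Assumption: this sketch refuses ambiguous configs outright.
        raise ValueError("config sets both `stop:` and `lifecycle:`")
    log.warning("[deprecated] `stop:` block; rename it to `lifecycle:`")
    legacy = config["stop"]
    lifecycle = {}
    if "on_success" in legacy:
        lifecycle["on_success"] = _lookup(LEGACY_ON_SUCCESS, legacy["on_success"], "on_success")
    if "on_failure" in legacy:
        lifecycle["on_failure"] = _lookup(LEGACY_ON_FAILURE, legacy["on_failure"], "on_failure")
    rest = {key: value for key, value in config.items() if key != "stop"}
    return {**rest, "lifecycle": lifecycle}
```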

### CLI changes

| Old command | New command |
|---|---|
| `runpod-deploy stop --state-file <path>` | `runpod-deploy cleanup --state-file <path> --mode stop` |
| (no equivalent; previously a manual `xargs` invocation) | `runpod-deploy cleanup --all-stopped [--yes]` |
| (no equivalent) | `runpod-deploy ls-stale [--json]` |

The `stop` subcommand remains as a deprecated alias.

### Python API changes (breaking for direct importers)

```python
# Before
from runpod_deploy import StopPolicySpec
from runpod_deploy.provider import stop_pod

# After
from runpod_deploy import LifecyclePolicySpec, LIFECYCLE_ACTIONS, StalePod
from runpod_deploy.provider import cleanup_pod, list_stale_pods, bulk_delete_pods
```

`RunpodJobSpec.stop` is renamed to `RunpodJobSpec.lifecycle`.

### What you need to do

- **Now:** on v0.8.2, nothing is required; existing configs and any in-flight runs continue to work via the bool shim. Watch the `[deprecated]` warnings to gauge your migration backlog. On v0.8.3 or later the shim is gone, so migrate before upgrading.
- **Next sweep / next config edit:** rename the `stop:` block to `lifecycle:` and replace the booleans with string values. The migration is mechanical; the shim table above is the full mapping.
- **Audit:** run `runpod-deploy ls-stale` to find any historical pods the old code left behind, then bulk-release them with `runpod-deploy cleanup --all-stopped --yes`.
- **Hygiene:** wire `runpod-deploy ls-stale` into a weekly cron or CI job to detect drift. See recipes/stale-pod-audit.md.
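For the weekly audit, a CI job only needs to fail when `ls-stale` reports anything. The sketch below assumes that `runpod-deploy ls-stale --json` prints a JSON array with one element per stale pod; that output shape is an assumption, so check it against your installed version before relying on it. `stale_pod_count` and `main` are hypothetical names.

```python
import json
import subprocess
import sys


def stale_pod_count(ls_stale_json: str) -> int:
    """Count stale pods in `ls-stale --json` output.

    Assumes the output is a JSON array with one element per stale pod.
    """
    data = json.loads(ls_stale_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array from `ls-stale --json`")
    return len(data)


def main() -> int:
    """Exit code for CI: 0 if no stale pods, 1 otherwise."""
    result = subprocess.run(
        ["runpod-deploy", "ls-stale", "--json"],
        capture_output=True,
        text=True,
        check=True,  # a failing CLI call should fail the job loudly
    )
    count = stale_pod_count(result.stdout)
    if count:
        print(
            f"{count} stale pod(s); release with "
            "`runpod-deploy cleanup --all-stopped --yes`",
            file=sys.stderr,
        )
        return 1
    return 0
```

Call `sys.exit(main())` from the cron or CI entry point. The check only reports drift; releasing pods stays a deliberate, separate step.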

## prompt-injection-v3 migration

This section walks prompt-injection-v3 consumers through replacing v3’s hand-rolled deploy commands (`uv run reviewer-runpod`, `uv run v3-1-runpod`, `uv run v3-1-runpod-ephemeral`) with thin wrappers around `runpod-deploy run`.

If you’re migrating a different consumer (e.g., a fresh project), skip this section and go straight to quickstart.md.

### Why migrate

prompt-injection-v3 (the project) pre-dates runpod-deploy (the tool). The v3-era deploy scripts were hand-rolled bash that duplicated GPU/DC failover, staging, and artifact-pull logic. Every sweep maintenance change required editing six different scripts in parallel.

runpod-deploy absorbs those primitives:

- **GPU/DC failover:** `pod.gpu_order` + `pod.datacenters` iterate the GPU/datacenter matrix automatically; v3 had to encode this in bash per script.
- **Staging excludes:** `staging[].excludes_default` plus the standard rsync excludes replace the `--exclude` flag stacks v3 maintained inline.
- **Cost capping:** `budget.cost_cap_usd` enforces a per-invocation budget and derives an implicit runtime ceiling from it; v3 capped cost only indirectly, via a timeout on `runpodctl pod create`.
- **Deploy metadata capture:** the git SHA and lockfile hash land in `runpod_deploy_pull_manifest.json` automatically; v3 hand-rolled `GIT_SHA=$(git rev-parse HEAD)` injection.
- **Artifact pull manifest:** `runpod_deploy_pull_manifest.json` records what was pulled, when, and at what cost; v3 had an ad-hoc `pulled_log.txt`.
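One plausible reading of how a cost cap implies a runtime ceiling is to divide the cap by the pod's hourly price. The helper below is a hypothetical sketch, not runpod-deploy's code: it assumes a flat hourly rate, ignores storage and transfer time, and takes the tighter of the derived ceiling and an explicit `budget.max_runtime_minutes`.

```python
import math
from typing import Optional


def runtime_ceiling_minutes(
    cost_cap_usd: float,
    hourly_rate_usd: float,
    max_runtime_minutes: Optional[int] = None,
) -> int:
    """Whole minutes a pod can run before its compute cost reaches the cap."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly_rate_usd must be positive")
    # Round down so the ceiling never overshoots the cap.
    implied = math.floor(cost_cap_usd / hourly_rate_usd * 60)
    if max_runtime_minutes is None:
        return implied
    # An explicit budget.max_runtime_minutes can only tighten the limit.
    return min(implied, max_runtime_minutes)
```

For example, a $5.00 cap on a $2.50/hr pod gives a 120-minute ceiling, and passing `max_runtime_minutes=90` lowers it to 90.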

The migration is mechanical: each v3 deploy command becomes a YAML config + a Makefile target that invokes runpod-deploy run.

### One-time setup

In the prompt-injection-v3 repo:

```bash
# Add runpod-deploy as an optional dependency in pyproject.toml,
# under the [project.optional-dependencies] table:
#   cloud = ["runpod-deploy>=0.8.1"]
uv sync --extra cloud

# runpod-deploy is now at .venv/bin/runpod-deploy
.venv/bin/runpod-deploy --help
```

This is the recommended “consumer-owned configs” pattern (see the runpod-deploy README’s “Consumer-owned configs” section).

### Per-job migration

#### Step 1: write the YAML config

Create `configs/runpod/<job-name>.yaml` in your v3 repo. Use quickstart.md as the template and consult config-reference.md for field semantics.

The v3-era environment variables and command-line flags map to YAML sections as follows:

| v3 hand-rolled | runpod-deploy YAML |
|---|---|
| `--gpu-type`, `--gpu-type-fallback` | `pod.gpu_order` (ordered list) |
| `--datacenter`, `--datacenter-fallback` | `pod.datacenters` (ordered list) |
| `--cost-cap-usd` (per-script) | `budget.cost_cap_usd` |
| `--timeout-minutes` | `budget.max_runtime_minutes` |
| `--cloud-type` | `pod.cloud_type` (`SECURE` or `COMMUNITY`) |
| Hand-rolled `rsync --exclude=foo` | `staging[].excludes_extra: [foo]` |
| Hand-rolled `git rev-parse HEAD` | Auto-captured in manifest |
| Inline bash run script | `run.body` (multi-line YAML string) |
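If your v3 scripts already parse their flags into a dict, the first five rows of the table can be applied mechanically. This is a hypothetical helper (`v3_flags_to_config`), assuming flag names without leading dashes and a single primary value plus a list of fallbacks; it emits only the `pod` and `budget` sections and leaves `staging[]` and `run.body` to you.

```python
from typing import Any, Dict, List, Optional


def _ordered(primary: Optional[str], fallbacks: List[str]) -> List[str]:
    """Primary value first, then fallbacks in the order they were given."""
    return ([primary] if primary else []) + list(fallbacks)


def v3_flags_to_config(flags: Dict[str, Any]) -> Dict[str, Any]:
    """Map parsed v3 flags (names without leading dashes) to pod/budget sections."""
    pod: Dict[str, Any] = {}
    budget: Dict[str, Any] = {}

    gpus = _ordered(flags.get("gpu-type"), flags.get("gpu-type-fallback", []))
    if gpus:
        pod["gpu_order"] = gpus
    dcs = _ordered(flags.get("datacenter"), flags.get("datacenter-fallback", []))
    if dcs:
        pod["datacenters"] = dcs
    if "cloud-type" in flags:
        cloud = str(flags["cloud-type"]).upper()
        if cloud not in ("SECURE", "COMMUNITY"):
            raise ValueError(f"unsupported cloud type: {cloud}")
        pod["cloud_type"] = cloud

    if "cost-cap-usd" in flags:
        budget["cost_cap_usd"] = float(flags["cost-cap-usd"])
    if "timeout-minutes" in flags:
        budget["max_runtime_minutes"] = int(flags["timeout-minutes"])

    return {"pod": pod, "budget": budget}
```

Serialize the result with whichever YAML library you already use, then fill in the remaining sections by hand.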

#### Step 2: validate

```bash
runpod-deploy validate --config configs/runpod/<job-name>.yaml --all
```

The `--all` flag runs every opt-in check: schema validation, local path existence, GPU availability in the configured datacenters, and a consumer pyproject scan. Fix anything it flags before paying for a pod.

#### Step 3: dry-run

```bash
runpod-deploy run --config configs/runpod/<job-name>.yaml --offline-dry-run
```

`--offline-dry-run` steps through the run's command sequence without touching the network (no `runpodctl` calls, no SSH, no rsync) and confirms that the orchestrator state machine accepts your config end to end.

#### Step 4: real run

```bash
runpod-deploy run --config configs/runpod/<job-name>.yaml
```

On success, your artifacts land under `artifacts/runpod/<timestamp>/` along with `runpod_deploy_pull_manifest.json`. The pod is then handled per `lifecycle.on_success`, which defaults to `delete` (pod and volume disk released) from v0.8.2 onward.
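Steps 2 through 4 form a gate: each command should succeed before the next, more expensive one runs. A minimal Python driver for that ordering, assuming `runpod-deploy` is on `PATH` (for example via `.venv/bin`); `deploy_steps` and `run_steps` are hypothetical names, and the `runner` parameter exists only so the sequence can be exercised without launching anything.

```python
import subprocess
from typing import Callable, List, Sequence


def deploy_steps(config_path: str) -> List[List[str]]:
    """Steps 2-4 for one config, cheapest first."""
    base = ["--config", config_path]
    return [
        ["runpod-deploy", "validate", *base, "--all"],         # Step 2
        ["runpod-deploy", "run", *base, "--offline-dry-run"],  # Step 3
        ["runpod-deploy", "run", *base],                       # Step 4 (paid)
    ]


def _subprocess_runner(argv: Sequence[str]) -> int:
    return subprocess.run(list(argv)).returncode


def run_steps(
    config_path: str,
    runner: Callable[[Sequence[str]], int] = _subprocess_runner,
) -> int:
    """Run each step in order; stop at the first non-zero exit code."""
    for argv in deploy_steps(config_path):
        code = runner(argv)
        if code != 0:
            return code  # never reach the paid run after a failed check
    return 0
```

The same ordering works as a Makefile target with prerequisites; the point is that a failed validation or dry run never falls through to the real run.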

#### Step 5: keep the v3 command name (optional)

If you want `uv run reviewer-runpod` to keep working as a thin shim, add a one-line entry point to `pyproject.toml`:

```toml
[project.scripts]
reviewer-runpod = "your_v3_module.cli:reviewer_runpod_main"
```

Here `reviewer_runpod_main` is a short Python function that calls `subprocess.run(["runpod-deploy", "run", "--config", "configs/runpod/reviewer.yaml", *sys.argv[1:]])` and exits with its `returncode`, so failures still surface to callers. This lets you keep your existing tooling (`uv run reviewer-runpod --dry-run`) while the underlying execution is delegated to runpod-deploy.
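A sketch of that wrapper, assuming the config lives at `configs/runpod/reviewer.yaml` (the Step 1 layout) relative to the directory you invoke `uv run` from. `build_argv` is a hypothetical helper split out for testing. The wrapper forwards extra arguments untouched and exits with runpod-deploy's return code, so scripts and CI still see failures.

```python
import subprocess
import sys
from typing import List

# Assumed location, following the configs/runpod/<job-name>.yaml layout.
CONFIG_PATH = "configs/runpod/reviewer.yaml"


def build_argv(extra_args: List[str]) -> List[str]:
    """runpod-deploy invocation with the caller's extra flags appended."""
    return ["runpod-deploy", "run", "--config", CONFIG_PATH, *extra_args]


def reviewer_runpod_main() -> None:
    """Console-script entry point: delegate and propagate the exit status."""
    result = subprocess.run(build_argv(sys.argv[1:]))
    sys.exit(result.returncode)
```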

### Regression testing

Before retiring the old v3 hand-rolled deploy scripts, run both in parallel for one billing cycle:

1. Run the v3 hand-rolled script (`uv run reviewer-runpod`, before you repoint that name at the Step 5 wrapper). Note the pod ID, wall time, and cost.
2. Run the runpod-deploy equivalent: `runpod-deploy run --config configs/runpod/reviewer.yaml`. Note the pod ID, wall time, and cost.
3. Compare the pulled artifacts byte for byte, skipping the manifest that only runpod-deploy writes: `diff -r -x runpod_deploy_pull_manifest.json artifacts/v3-script-output/ artifacts/runpod/<ts>/`.
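If you want the comparison inside a test rather than a shell session, the same check can be written with the standard library. This is a hypothetical helper (`compare_trees`), assuming both artifact trees are local directories; it skips `runpod_deploy_pull_manifest.json` because only runpod-deploy writes it, and uses `filecmp.cmp(..., shallow=False)` so files are compared by content rather than by size and modification time.

```python
import filecmp
import os
from pathlib import Path
from typing import FrozenSet, List, Set, Tuple

# Written only by runpod-deploy, so it would always show up as a difference.
DEFAULT_IGNORE = frozenset({"runpod_deploy_pull_manifest.json"})


def _files(root: Path, ignore: FrozenSet[str]) -> Set[str]:
    """All file paths under root, relative to root, minus ignored names."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name not in ignore:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(
    a: str, b: str, ignore: FrozenSet[str] = DEFAULT_IGNORE
) -> Tuple[List[str], List[str], List[str]]:
    """Return (only_in_a, only_in_b, differing) as sorted relative paths."""
    root_a, root_b = Path(a), Path(b)
    files_a, files_b = _files(root_a, ignore), _files(root_b, ignore)
    differing = [
        rel
        for rel in sorted(files_a & files_b)
        if not filecmp.cmp(root_a / rel, root_b / rel, shallow=False)
    ]
    return sorted(files_a - files_b), sorted(files_b - files_a), differing
```

Three empty lists mean the trees match; any entry is a reason to hold off on retiring the v3 script.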

If the artifacts diverge, do NOT retire the v3 script until you’ve diagnosed the cause. Common causes are missing files in `staging[]` (check the `excludes_default` semantics), different environment variables (check `remote_env.exports` and secrets), or a different `gpu_order` producing different GPU classes per shard.

### Backwards-compat timeline

- **v3.x with hand-rolled scripts:** keeps working as-is, with no runpod-deploy dependency.
- **v3.x with runpod-deploy >= 0.8.1:** add the wrapper per Step 5; both invocation styles work side by side.
- **v4.x (planned):** hand-rolled scripts are removed and runpod-deploy is the only path. The migration deadline is TBD and will be announced in v3’s CHANGELOG when set.