Public Signup Ops
Source: docs/runbooks/public-signup-ops.md
---
title: Public Signup Operations Runbook
description: Operational controls for signup abuse prevention, queue handling, failure escalation, and cleanup behavior.
owner: ops
last_reviewed: 2026-03-03
audience: internal
status: active
---
# Public Signup Operations Runbook
Purpose: operational SOP for monitoring, support, and cleanup of self-serve signup sessions.
## Daily checks
1. Review sessions in `REVIEW_REQUIRED` and `FAILED`.
2. Confirm OTP delivery metrics and Resend health.
3. Confirm Clerk invitation delivery and acceptance rates.
4. Confirm purge scheduler is running and deleting stale sessions.
## Rate-limit scope note
- Current signup limiter (`apps/api/src/lib/signup/risk-controls.ts`) is process-local in-memory.
- In multi-replica deployments, limits apply per instance, not globally.
- Treat this as an interim control until a shared limiter backend (Redis/KV) is deployed.
## Session status policy
- `PENDING_EMAIL_VERIFICATION`: waiting on OTP verification.
- `EMAIL_VERIFIED`: ready for provisioning.
- `PROVISIONING`: active step-machine execution.
- `PROVISIONED`: completed.
- `REVIEW_REQUIRED`: blocked by dependency/config; manual follow-up required.
- `FAILED`: unrecoverable error; investigate and retry via support.
- `EXPIRED`: timed out/aged out.
## Retention and purge
- Expired in-flight sessions are marked `EXPIRED`.
- Stale `REVIEW_REQUIRED`/`FAILED`/`EXPIRED` sessions are purged after retention threshold.
- Purge scheduler emits counts for:
- expired sessions marked,
- stale sessions deleted.
## Support triage checklist
Capture:
- `sessionId`
- current status
- `failureReason` / `reviewReason`
- admin email domain
- timestamp and environment
Then:
1. Validate feature flag and provider envs.
2. Validate slug conflict state.
3. Retry provision with fresh idempotency key.
4. If still blocked, escalate with session payload snapshot.
## Escalation
- App/runtime issues: platform-oncall.
- Provider outages (Clerk/Resend): incident channel + degraded launch notice.
- Abuse/rate-limit incidents: security/ops review with offending IP/domain/email patterns.