
Source: docs/runbooks/public-signup-ops.md

---
title: Public Signup Operations Runbook
description: Operational controls for signup abuse prevention, queue handling, failure escalation, and cleanup behavior.
owner: ops
last_reviewed: 2026-03-03
audience: internal
status: active
---

# Public Signup Operations Runbook

Purpose: standard operating procedure (SOP) for monitoring, supporting, and cleaning up self-serve signup sessions.

## Daily checks

1. Review sessions in `REVIEW_REQUIRED` and `FAILED`.
2. Confirm OTP delivery metrics and Resend health.
3. Confirm Clerk invitation delivery and acceptance rates.
4. Confirm the purge scheduler is running and deleting stale sessions.

## Rate-limit scope note

- The current signup limiter (`apps/api/src/lib/signup/risk-controls.ts`) keeps its state in process-local memory.
- In multi-replica deployments, limits therefore apply per instance, not globally: with N replicas behind a load balancer, a single client can get up to N times the configured limit.
- Treat this as an interim control until a shared limiter backend (Redis/KV) is deployed.

## Session status policy

- `PENDING_EMAIL_VERIFICATION`: waiting on OTP verification.
- `EMAIL_VERIFIED`: ready for provisioning.
- `PROVISIONING`: active step-machine execution.
- `PROVISIONED`: completed.
- `REVIEW_REQUIRED`: blocked by a dependency or configuration issue; requires manual follow-up.
- `FAILED`: unrecoverable error; investigate and retry via support.
- `EXPIRED`: timed out or aged out.

## Retention and purge

- Expired in-flight sessions are marked `EXPIRED`.
- Stale `REVIEW_REQUIRED`, `FAILED`, and `EXPIRED` sessions are purged after the retention threshold.
- The purge scheduler emits counts for:
  - expired sessions marked,
  - stale sessions deleted.

## Support triage checklist

Capture:

- `sessionId`
- current status
- `failureReason` / `reviewReason`
- admin email domain
- timestamp and environment

Then:

1. Validate the feature flag and provider environment variables.
2. Check the slug conflict state.
3. Retry provisioning with a fresh idempotency key.
4. If the session is still blocked, escalate with a snapshot of the session payload.

## Escalation

- App/runtime issues: platform-oncall.
- Provider outages (Clerk/Resend): incident channel plus a degraded launch notice.
- Abuse/rate-limit incidents: security/ops review, including the offending IP, domain, and email patterns.
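To make the per-instance scope described in the rate-limit scope note concrete, here is a minimal TypeScript sketch of a process-local fixed-window limiter. It is hypothetical: the class name `InMemoryFixedWindowLimiter`, the fixed-window algorithm, the limit of 5 requests per minute, and the sample IP are assumptions for illustration, not the actual logic in `apps/api/src/lib/signup/risk-controls.ts`. Two limiter instances stand in for two replicas that each keep their own counters.

```typescript
// Hypothetical sketch: NOT the implementation in apps/api/src/lib/signup/risk-controls.ts.
// The algorithm (fixed window), limit, and window length are illustrative assumptions.

/** Fixed-window counter whose state lives only in this process's memory. */
class InMemoryFixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly limit: number,    // max requests per key per window (assumed)
    private readonly windowMs: number, // window length in milliseconds (assumed)
  ) {}

  /** Returns true if the request is allowed, false if the key is over its limit. */
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.windows.get(key);
    if (entry === undefined || now - entry.start >= this.windowMs) {
      // No window yet, or the previous window has elapsed: start a new one.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}

// Simulate two API replicas, each with its own in-memory limiter (5 requests per minute).
const replicas = [
  new InMemoryFixedWindowLimiter(5, 60_000),
  new InMemoryFixedWindowLimiter(5, 60_000),
];

// A load balancer spreads 20 signup attempts from one IP round-robin across the replicas,
// all inside the same window (timestamp 0).
let allowed = 0;
for (let i = 0; i < 20; i++) {
  if (replicas[i % replicas.length].allow("203.0.113.7", 0)) allowed += 1;
}

console.log(`allowed ${allowed} of 20`); // allowed 10 of 20 (2 replicas * limit 5)
```

Because each replica only sees its share of the traffic, the client gets twice the intended budget. A shared backend such as Redis/KV would have every replica read and increment the same counter, which is why the runbook treats the in-memory limiter as interim.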
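The retention and purge rules in this runbook can be sketched as a single scheduler pass that first marks timed-out in-flight sessions as `EXPIRED` and then deletes stale `REVIEW_REQUIRED`, `FAILED`, and `EXPIRED` sessions, returning the two counts the scheduler emits. This is a hypothetical TypeScript sketch: `runPurgePass`, `SignupSession`, the `updatedAt` field, the TTL and retention parameters, and the assumption that `PENDING_EMAIL_VERIFICATION`, `EMAIL_VERIFIED`, and `PROVISIONING` count as "in-flight" are all illustrative. Only the status names and the two emitted counts come from the runbook.

```typescript
// Hypothetical sketch of one purge-scheduler pass. Status names come from the runbook;
// field names, the in-flight grouping, and the thresholds are assumptions.

type SignupStatus =
  | "PENDING_EMAIL_VERIFICATION"
  | "EMAIL_VERIFIED"
  | "PROVISIONING"
  | "PROVISIONED"
  | "REVIEW_REQUIRED"
  | "FAILED"
  | "EXPIRED";

interface SignupSession {
  sessionId: string;
  status: SignupStatus;
  updatedAt: number; // epoch ms of the last status change (assumed field)
}

// Assumed grouping: statuses still waiting on the user or the step machine.
const IN_FLIGHT: ReadonlySet<SignupStatus> = new Set<SignupStatus>([
  "PENDING_EMAIL_VERIFICATION",
  "EMAIL_VERIFIED",
  "PROVISIONING",
]);

// Statuses the runbook lists as purgeable once stale.
const PURGEABLE: ReadonlySet<SignupStatus> = new Set<SignupStatus>([
  "REVIEW_REQUIRED",
  "FAILED",
  "EXPIRED",
]);

interface PurgeCounts {
  expiredMarked: number;
  staleDeleted: number;
}

/** Mutates `sessions` in place and returns the counts the scheduler would emit. */
function runPurgePass(
  sessions: Map<string, SignupSession>,
  now: number,
  inFlightTtlMs: number, // assumed: how long an in-flight session may sit idle
  retentionMs: number,   // assumed: retention threshold for purgeable sessions
): PurgeCounts {
  const counts: PurgeCounts = { expiredMarked: 0, staleDeleted: 0 };
  for (const [id, s] of sessions) {
    const age = now - s.updatedAt;
    if (IN_FLIGHT.has(s.status) && age >= inFlightTtlMs) {
      // In-flight session past its TTL: mark EXPIRED (kept until retention elapses).
      sessions.set(id, { ...s, status: "EXPIRED", updatedAt: now });
      counts.expiredMarked += 1;
    } else if (PURGEABLE.has(s.status) && age >= retentionMs) {
      // Stale REVIEW_REQUIRED/FAILED/EXPIRED session past retention: delete it.
      sessions.delete(id);
      counts.staleDeleted += 1;
    }
  }
  return counts;
}

// Usage: three sessions last touched at t=0, evaluated 48 hours later with 24-hour thresholds.
const HOUR = 3_600_000;
const sessions = new Map<string, SignupSession>([
  ["a", { sessionId: "a", status: "PENDING_EMAIL_VERIFICATION", updatedAt: 0 }],
  ["b", { sessionId: "b", status: "FAILED", updatedAt: 0 }],
  ["c", { sessionId: "c", status: "PROVISIONED", updatedAt: 0 }],
]);
const counts = runPurgePass(sessions, 48 * HOUR, 24 * HOUR, 24 * HOUR);
console.log(counts); // { expiredMarked: 1, staleDeleted: 1 }
```

In this sketch a newly expired session gets a fresh `updatedAt`, so it sits out a full retention period before deletion, and `PROVISIONED` sessions are never touched. The real scheduler may measure session age differently.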