Files
sciagent/docs/otp-registration-state-machine.md
T
Thinh Lam 688fac73e9
CI/CD / backend (push) Failing after 2m8s
CI/CD / frontend (push) Failing after 1m40s
CI/CD / deploy (push) Has been skipped
sciagent code + Gitea Actions CI/CD
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00

186 lines
9.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# OTP-verified registration: workflow & state machine (engineering guide)
This document describes **email OTP verification during signup** end-to-end: **frontend**, **backend**, **database**, and **when object storage (e.g. MinIO) is relevant**. It is written so another team can implement the same pattern in a **new** application; it aligns with the reference implementation in this repo (`be0/src/auth_api.py`, `be0/src/auth_mail.py`, `fe0/src/auth/registration/`, migrations `013_*` / `014_*`).
---
## 1. Goals & threat model (what we optimize for)
| Goal | How |
|------|-----|
| Prove control of the inbox before full account use | `users.email_verified = false` until OTP succeeds; **login denied** until verified. |
| Avoid storing OTP plaintext | Persist **only a hash** of the OTP (same idea as password storage). |
| Limit brute-force on OTP | Cap **failed attempts** per pending code; use **constant-time** compare on hash. |
| Limit abuse of “resend” | Rate-limit **per email** and **per IP** (see §7). |
| Avoid email enumeration on resend | Return **same JSON envelope** whether the address exists or needs OTP (reference pattern). |
**MinIO / S3:** OTP delivery does **not** use object storage. Include MinIO in your design doc **only** if registration itself uploads files (avatars, documents); keep OTP and mail independent of buckets.
---
## 2. Account-level state machine (`users`)
After successful `POST /auth/register`, the server creates an **active** user with **`email_verified = false`**. Verification flips the flag; login is allowed only when **`email_verified = true`**.
```mermaid
stateDiagram-v2
[*] --> NoAccount: visitor
NoAccount --> RegisteredUnverified: POST /register OK\n(password hashed, user row created)
RegisteredUnverified --> Verified: POST /verify-otp OK\n(email_verified := true)
RegisteredUnverified --> RegisteredUnverified: POST /resend-otp\n(new OTP row, old pending superseded)
RegisteredUnverified --> NoAccount: optional admin deactivate/\ndelete user (app-specific)
Verified --> [*]: normal user lifecycle\n(login, JWT, etc.)
note right of RegisteredUnverified
Login with correct password
returns 403 until verified
(do not leak OTP validity)
end note
```
**Engineering rules:**
- **Register** must be **atomic**: create user + staff/profile as needed + insert OTP row + commit before sending email (or you risk orphan users without OTP).
- **Login** when password OK but `email_verified` is false: return **403** with a message that tells the user to complete OTP / resend (reference copy mentions OTP resend on registration flow).
---
## 3. OTP row lifecycle (`registration_otp_codes`)
Each issued OTP is one row. Only **one logical “pending”** code per user is typical: **issue** deletes previous unused rows for that user, then inserts a new row.
```mermaid
stateDiagram-v2
[*] --> Pending: issue OTP\n(plaintext only in memory/email)
Pending --> Consumed: verify OK\nused_at := now,\nuser.email_verified := true
Pending --> InvalidWrongCode: wrong OTP\nfailed_attempts += 1
InvalidWrongCode --> Pending: failed_attempts < max
InvalidWrongCap --> [*]: failed_attempts >= max\n(row ignored for verify)
Pending --> Superseded: resend or re-register issue\n(DB deletes unused rows first)
Pending --> Expired: now > expires_at\n(row ignored for verify)
Consumed --> [*]
state InvalidWrongCap <<choice>>
```
**Reference schema** (`014_registration_otp.sql`):
- `user_id``users(id)`
- `otp_hash` — never store plaintext OTP
- `expires_at` — server-side TTL (e.g. env-configurable minutes, clamped to a sane max)
- `failed_attempts` — increment on mismatch; **reject verify** when ≥ max (e.g. 5)
- `used_at` — non-null means consumed
**Verify query logic (conceptual):**
1. Resolve user by **normalized email**, active, **`email_verified` still false** — otherwise reject with **generic** error (same message as bad OTP if you want strict anti-enumeration on verify; reference uses a uniform rejection detail).
2. Select **latest** pending row where `used_at IS NULL`, `expires_at > now()`, `failed_attempts < max`.
3. Compare `hash(submitted_otp)` to `otp_hash` with **constant-time** comparison (`hmac.compare_digest` in Python).
4. On match: set `user.email_verified = true`, `row.used_at = now()`, audit log, commit.
5. On mismatch: increment `failed_attempts`, commit, reject.
---
## 4. Frontend state machine (wizard UX)
The SPA typically uses **local UI states**, not the servers DB state diagram:
```mermaid
stateDiagram-v2
[*] --> Form: land on registration
Form --> Form: client validation\n(email domain, password policy, etc.)
Form --> Otp: POST /register 2xx\nemailVerificationRequired === true
Form --> Form: 4xx/5xx show rejection
Otp --> Success: POST /verify-otp OK
Otp --> Otp: verify 4xx clear inputs/\nshow generic error
Otp --> Otp: POST /resend-otp\n(+ cooldown timer UX)
Success --> Login: navigate / user signs in
```
**Contract hints for engineers:**
- Read **`otpTtlSeconds`** from register response and drive countdown (stay in sync with server TTL; avoid hardcoding if backend can change TTL via env).
- **`otpDeliveryChannel`** (or equivalent): `smtp` | `log_only` | `none` | `smtp_failed` — drives toasts/help text (SMTP missing, dev log-only, SMTP error).
- **Resend UX cooldown** on the client is **supplementary**; server **must** enforce rate limits independently.
- OTP input: **exactly 6 digits** if matching this API; paste-friendly fields improve UX.
---
## 5. Backend API surface (minimal)
| Method | Path | Purpose |
|--------|------|---------|
| `POST` | `/auth/register` | Create user with `email_verified=false`; issue OTP; attempt email delivery; return `{ otpTtlSeconds, otpDeliveryChannel, emailVerificationRequired, ... }`. |
| `POST` | `/auth/verify-otp` | Body: `{ email, otp }` — normalized email + `\d{6}` OTP; on success set verified. |
| `POST` | `/auth/resend-otp` | Body: `{ email }` — enumeration-safe constant response; **429** if rate-limited. |
| `POST` | `/auth/login` | Reject **403** when password OK but email not verified (reference behavior). |
**Optional legacy path in this repo:** `POST /auth/verify-email` with **magic-link token** (`email_verification_tokens`). New apps should prefer **either** OTP **or** magic link consistently to reduce confusion.
---
## 6. Email delivery (no MinIO)
Outbound mail is **SMTP** or **dev log**:
- **Production:** `SMTP_HOST`, `SMTP_USER`, `SMTP_PASSWORD`, `SMTP_PORT`, `SMTP_USE_TLS`, `AUTH_MAIL_FROM`, optional `AUTH_PUBLIC_WEB_ORIGIN` for links in other mails.
- **Dev:** `AUTH_MAIL_LOG_ONLY=1` — log plaintext OTP (**never** enable in production).
If SMTP fails **after** user creation, return a distinct delivery channel (e.g. `smtp_failed`) so the client can explain “account created but email failed”; ops fix SMTP and user uses **resend**.
---
## 7. Rate limiting & scaling caveats
Reference implementation uses **in-process** buckets (`auth_rate_limit.py`): e.g. **5 resends per email per hour** and **30 per IP per hour** (rolling window).
**Greenfield checklist:**
- Single replica: in-memory limits are simple.
- Multiple API workers / pods: use **Redis** (or API gateway limits) for shared counters; tune limits per product policy.
- Always rate-limit **resend** harder than **verify** if verify is already capped by `failed_attempts`.
---
## 8. Database prerequisites
1. **`users.email_verified`** — `BOOLEAN NOT NULL DEFAULT FALSE` (migration `013_email_verification.sql` pattern).
2. **`registration_otp_codes`** — table as in §3 (`014_registration_otp.sql` pattern).
3. **Indexes** — partial index on pending OTP by `user_id` speeds lookup.
**MinIO:** no migration needed for OTP. Add bucket policies only if registration uploads binaries.
---
## 9. Observability & security checklist
- **Audit:** log verification success and resend with actor metadata (no OTP plaintext in audit).
- **Logs:** mail failures at `register` / `resend` with exception detail for ops; avoid logging hashed OTP.
- **HTTPS:** registration and OTP over TLS; HttpOnly cookies if using cookie-based sessions.
- **CORS:** allow credentials only for trusted front-end origins.
- **JWT:** issue tokens **after** login; do not put OTP or `email_verified=false` users into a “logged-in” state unless your product explicitly wants partial tokens (reference does not).
---
## 10. Reference file map (this repository)
| Layer | Location |
|-------|----------|
| Register / verify / resend / login gate | `be0/src/auth_api.py` |
| SMTP + `AUTH_MAIL_LOG_ONLY` | `be0/src/auth_mail.py` |
| Resend rate limits | `be0/src/auth_rate_limit.py` |
| OTP table migration | `be0/migrations/014_registration_otp.sql` |
| `email_verified` + magic-link tokens | `be0/migrations/013_email_verification.sql` |
| Registration UI + delivery UX | `fe0/src/auth/registration/RegistrationWithOtp.tsx` |
| TTL/cooldown constants (keep aligned) | `fe0/src/auth/registration/constants.ts` |
| API client | `fe0/src/lib/auth-service.ts` |
---
## 11. Summary one-liner for architects
**OTP registration is a small state machine on top of `users.email_verified` plus short-lived hashed OTP rows, sent over SMTP (not object storage), with gated login and abuse limits on resend—implement the same invariants even if frameworks differ.**