18 KiB
Database Engineering Review — User Profile Manager State Machine
Reviewer role: Senior database engineer Document under review: User profile manager — integration analysis and implementation state machine Audience: Backend / platform engineering team Status: Constructive feedback before implementation kickoff
1. Overall assessment
The document is well-organized and shows good systems thinking. In particular, three things are already right and should not be relitigated during implementation:
- The MinIO boundary is correctly drawn. Profile scalars belong in PostgreSQL; binary artifacts (proof documents, exports) are the only thing that should ever touch object storage. Keep that line bright.
- The naming separation between RBAC
roles(admin,viewer, …) and the institutionaljob_title/position_titleis essential and easy to get wrong. Holding to it prevents a class of authorization bugs. - Treating
is_active(account disabled) andprofile_verification_status(HR data trusted) as orthogonal axes is correct. They answer different questions and must not collapse into one column.
That said, the document is largely an interface-level plan. Several decisions that look like product-policy knobs are actually database design decisions with long-term cost — schema shape, constraint placement, transition enforcement, index strategy, and migration mechanics. The rest of this review focuses there.
2. Schema shape: pick the 1:1 child table, not extra columns on users
The document offers two options and leans neutral. The team should pick user_staff_profiles as a 1:1 child table keyed by user_id, and commit to it. Reasons:
- Hot-row contention. The
usersrow is read on every authenticated request (token validation,/me, role checks). Every column added to that table widens the row, increases TOAST pressure for nullable text columns, and pulls more bytes through the cache for traffic that doesn't care about HR data. Profile fields are read on profile pages and admin queues — different access pattern, different table. - Clear ownership. Auth code owns
users+user_roles. HR/profile code ownsuser_staff_profiles. The boundary lines up with the API boundary you're already drawing (/auth/*vs/admin/users/*). - Migration safety. Adding columns to
usersin a live system means coordinating with every read path that doesSELECT *(hopefully none, but in practice some). A new table is additive and isolated. - Permissions. Postgres-level GRANTs and row-level security policies, if you ever want them, are far easier to write per-table than per-column.
The one tradeoff is an extra LEFT JOIN on profile reads. That is cheap and indexable; it is not a reason to denormalize.
Recommendation: create user_staff_profiles with user_id PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE. The 1:1 is enforced by the PK.
3. Field-by-field schema critique
3.1 employee_id
The document says "uniqueness policy TBD". This must be decided before the migration is written, because it determines the constraint shape:
- If globally unique when present, use a partial unique index:
CREATE UNIQUE INDEX ... ON user_staff_profiles (employee_id) WHERE employee_id IS NOT NULL;A plainUNIQUEwould forbid more than one NULL on some configurations and is the wrong tool here. - If required at registration, make it
NOT NULLand use a regularUNIQUE. - Decide format validation (length, charset, leading zeros) and enforce with a
CHECKconstraint, not just in Pydantic. The DB is the last line of defense when a future script writes directly.
3.2 academic_title + academic_title_other
The enum + free-text "other" pattern works but has two failure modes worth pre-empting:
- Drift between code and DB. If
academic_titleis a PostgresENUM, adding a value requiresALTER TYPE, which is fine but easy to forget in code review. Prefer a lookup table (academic_titles(code, label_vi, label_en, sort_order, active)) referenced by FK. Adding a value is an INSERT, not a DDL change. Translations and ordering live next to the value. otherinvariant. Enforce at the DB level:CHECK ((academic_title = 'other') = (academic_title_other IS NOT NULL AND length(trim(academic_title_other)) > 0)). Without this, you will eventually see rows whereacademic_title='professor'andacademic_title_other='Something'from a buggy client.
3.3 department vs the existing users.unit_id
This is the most important inconsistency in the proposal. User already has an optional unit_id. The new design proposes department as free text. Pick one:
- If
unitsis a real catalog, dropdepartmentfrom the new design and reuseunit_id. Free text on top of a catalog guarantees data quality decay. - If the catalog is incomplete and free text is a transitional necessity, model it as
unit_id NULLABLE+unit_name_freetext NULLABLEwith aCHECKthat exactly one is set. Plan a backfill story.
Don't ship two parallel notions of "where this person works."
3.4 job_title
Free text is fine here, but:
- Length cap (e.g.
VARCHAR(120)orTEXTwith aCHECK length()). Without one, this is a future abuse vector in admin UI. - Keep it nullable. Some accounts (system, integration) won't have one.
3.5 Verification fields
profile_verification_status, verification_submitted_at, verified_at, verified_by_user_id, rejection_reason — fine, but enforce the invariants between them in the DB, not just in Python. See §4.
3.6 Timestamps
Every _at column should be TIMESTAMPTZ, never TIMESTAMP. You will eventually have admins in different time zones, and "naive timestamps interpreted as UTC by some code paths and as local time by others" is a recurring bug class. Make the DB authoritative.
4. Enforce the state machine in the database, not only in the application
The Mermaid diagram is clear, but right now nothing prevents a buggy code path or a one-off SQL fix from putting a row in an impossible state — e.g. status='verified' with verified_at IS NULL, or status='rejected' with no rejection_reason. Three layers of defense, in order of cost:
4.1 Status as a constrained type
Use either a Postgres ENUM or a CHECK (profile_verification_status IN ('draft','pending','verified','rejected')). Lookup table is also acceptable. Whichever you pick, don't store free-text statuses — they always rot.
4.2 Cross-column invariants
Express the state machine's value invariants as CHECK constraints. Concretely:
CONSTRAINT verified_requires_metadata CHECK (
profile_verification_status <> 'verified'
OR (verified_at IS NOT NULL AND verified_by_user_id IS NOT NULL)
),
CONSTRAINT rejected_requires_reason CHECK (
profile_verification_status <> 'rejected'
OR (rejection_reason IS NOT NULL AND length(trim(rejection_reason)) > 0)
),
CONSTRAINT pending_clears_verification CHECK (
profile_verification_status NOT IN ('draft','pending')
OR (verified_at IS NULL AND verified_by_user_id IS NULL)
)
These are cheap, write-time-only, and they turn an entire category of "how did we get here" production incidents into impossible-to-commit transactions.
4.3 Transition legality
The richer rule — "verified cannot transition to draft directly" — is harder to express as a column constraint. Two options:
- Application-level only, but always wrap transitions in
UPDATE … WHERE id = $1 AND profile_verification_status = $expected_prior_stateand check the affected-row count. This gives you optimistic concurrency for free and rejects illegal transitions implicitly because the WHERE won't match. - Trigger-based transition table. Heavier; only worth it if multiple services write to the table. For a single backend, the conditional UPDATE pattern is enough.
Whichever you pick, document and test the legal transitions explicitly. The diagram is the spec; the test suite should encode it directly (one test per transition, including illegal ones expecting failure).
5. Concurrency — name the races now
Two scenarios will happen in production. Decide the answer before writing code:
- Two admins click "Verify" on the same pending profile within the same second. With the conditional-UPDATE pattern above, the second one's UPDATE affects 0 rows. The handler must return a clean
409 Conflict(or treat as idempotent no-op — pick one and document it). Don't return 200 with a stale read; that's how audit logs end up with two "verified" entries for one transition. - Applicant PATCHes their profile while an admin is reviewing it. If the policy is "edits after verification revert to pending", this resolves itself. If the policy is "freeze after verified", you need a check at PATCH time. In both cases, the admin's verification UPDATE should be conditional on the content the admin reviewed, not just the prior state — consider an
etag/versioninteger column bumped on every applicant edit, and require admins to send back the version they reviewed.
The document mentions idempotency as a "pick one behavior" — agreed, but the team should write down which one and link to it from the admin endpoint code.
6. Audit logging — beyond "extend audit_log"
§3.5 of the source doc says "extend audit_log with entity_type='user_profile', before/after snapshot." Some sharper guidance:
- Store snapshots as
JSONB, not stringified JSON. JSONB lets you query later (e.g. "show me all profiles where the verifier was admin X" or "find profiles whosejob_titlechanged in Q3") without parsing. - Whitelist fields included in snapshots, don't blacklist. The whitelist won't accidentally include
password_hashor future PII columns the auditor didn't think about. - Add a monotonic event ordering column (
created_at TIMESTAMPTZ DEFAULT now()plus anevent_id BIGSERIAL) and index by(entity_type, entity_id, event_id). Querying "the history of this profile in order" is the most common audit access pattern; make it fast. - Audit rows must be written in the same transaction as the state change. If the audit insert fails, the state change must roll back. Otherwise the log is not trustworthy.
- Plan for growth. Audit logs are append-only and grow without bound. Decide retention (e.g. 7 years for HR-relevant entries, 2 years for cosmetic edits) and put a partitioning strategy on the table from day one —
PARTITION BY RANGE (created_at)monthly is a low-effort default and means you can drop old partitions cheaply later.
7. Indexing strategy for the admin queue
The admin "Users" page will have predictable hot queries. Plan indexes for them up front, not after the page is slow:
- Pending queue:
CREATE INDEX ix_usp_pending ON user_staff_profiles (verification_submitted_at) WHERE profile_verification_status = 'pending';Partial indexes are tiny and exactly match the query. - Filter by department/unit: composite index covering the typical filter + sort, e.g.
(unit_id, profile_verification_status, verification_submitted_at). - Lookup by employee_id: covered by the unique index above.
- Admin activity:
(verified_by_user_id, verified_at DESC)for "what did admin X approve recently".
Skip speculative indexes. Add them when the access pattern is real. But the four above will be real on day one.
8. Migration mechanics
The document says "Alembic/SQL migration consistent with repo conventions" — fine, but a few points the team should agree on before writing the migration:
- Backfill plan. New columns on
users(or newuser_staff_profilesrows for existing accounts) need a defined initial state. Recommended: for every existing user, INSERT auser_staff_profilesrow withprofile_verification_status = 'draft'and all institutional fields NULL. Do this in the same migration that creates the table, in a single transaction if the user count is small (<100k); otherwise batch it. - NOT NULL columns must be added in two steps for live tables: add nullable → backfill → set NOT NULL. The Alembic migration should do all three, and the second step should be batched.
- Reversibility. Every migration needs a working
downgrade(). Verification status data will be lost on downgrade — that's fine, but make it explicit (DROP TABLE user_staff_profiles) rather than half-reversible. - Lock duration.
ALTER TABLE users ADD COLUMNwith a default takes anACCESS EXCLUSIVElock; on a busy auth table this is user-visible downtime. Adding a separate table sidesteps this entirely — another reason to preferuser_staff_profiles.
9. Suggested DDL skeleton
Provided as a starting point for the migration author, not as the final word. Field list mirrors §5 of the source doc.
CREATE TYPE profile_verification_status AS ENUM
('draft', 'pending', 'verified', 'rejected');
CREATE TABLE academic_titles (
code TEXT PRIMARY KEY,
label_vi TEXT NOT NULL,
label_en TEXT NOT NULL,
sort_order INTEGER NOT NULL DEFAULT 0,
active BOOLEAN NOT NULL DEFAULT TRUE
);
CREATE TABLE user_staff_profiles (
user_id UUID PRIMARY KEY
REFERENCES users(id) ON DELETE CASCADE,
employee_id TEXT,
academic_title_code TEXT REFERENCES academic_titles(code),
academic_title_other TEXT,
unit_name_freetext TEXT, -- only if catalog is incomplete; see §3.3
job_title TEXT,
profile_verification_status profile_verification_status
NOT NULL DEFAULT 'draft',
verification_submitted_at TIMESTAMPTZ,
verified_at TIMESTAMPTZ,
verified_by_user_id UUID REFERENCES users(id),
rejection_reason TEXT,
version INTEGER NOT NULL DEFAULT 1,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT employee_id_shape
CHECK (employee_id IS NULL OR employee_id ~ '^[A-Z0-9-]{3,32}$'),
CONSTRAINT academic_title_other_invariant CHECK (
(academic_title_code = 'other')
= (academic_title_other IS NOT NULL
AND length(trim(academic_title_other)) > 0)
),
CONSTRAINT verified_requires_metadata CHECK (
profile_verification_status <> 'verified'
OR (verified_at IS NOT NULL AND verified_by_user_id IS NOT NULL)
),
CONSTRAINT rejected_requires_reason CHECK (
profile_verification_status <> 'rejected'
OR (rejection_reason IS NOT NULL
AND length(trim(rejection_reason)) > 0)
),
CONSTRAINT non_terminal_clears_verification CHECK (
profile_verification_status NOT IN ('draft','pending')
OR (verified_at IS NULL AND verified_by_user_id IS NULL)
),
CONSTRAINT job_title_length CHECK (
job_title IS NULL OR length(job_title) <= 120
)
);
CREATE UNIQUE INDEX ix_usp_employee_id_unique
ON user_staff_profiles (employee_id)
WHERE employee_id IS NOT NULL;
CREATE INDEX ix_usp_pending_queue
ON user_staff_profiles (verification_submitted_at)
WHERE profile_verification_status = 'pending';
CREATE INDEX ix_usp_verifier_activity
ON user_staff_profiles (verified_by_user_id, verified_at DESC)
WHERE verified_by_user_id IS NOT NULL;
The version column supports the optimistic concurrency pattern in §5. Bump it in every UPDATE; require it in admin verify/reject calls.
10. Testing the database layer specifically
API tests (mentioned in source §5) are necessary but not sufficient. Add:
- Constraint tests that attempt to INSERT impossible rows and assert the
CHECKrejects them. These guard the DDL itself. - Transition tests that walk every legal edge in the state machine and assert success, plus every illegal edge and assert rejection. The Mermaid diagram is the source of truth — generate the test matrix from it.
- Concurrency tests using two real connections (not mocks) attempting the same verify simultaneously. Pytest can do this with threading + a proper test DB.
- Migration round-trip tests:
upgrade → downgrade → upgradeon a populated test DB to catch reversibility bugs early.
11. Open questions the team must answer before merge
These are not blockers for design, but they are blockers for "is the migration done":
- Is
employee_idrequired at registration, optional, or required-before-pending? (Determines NULL policy andCHECKshape.) - Strict or pragmatic re-verification on edit? (§4.1.B in source doc.) Pick one; encode it in tests.
- Verifying an already-verified profile: idempotent 200, or 409? (§3.4 in source doc.)
- Catalog vs free-text department: is
unitspopulated and trustworthy? If yes, dropunit_name_freetext. If no, what is the backfill plan? - Audit log retention period and partitioning cadence?
- Does verification ever expire (e.g. annual re-verification for HR compliance)? If yes, schema needs
verified_untiland a scheduled job; better to design for it now than retrofit.
12. Summary checklist
| Topic | Recommendation |
|---|---|
| Schema shape | Separate user_staff_profiles 1:1 table; do not widen users. |
employee_id uniqueness |
Partial unique index, NULL policy decided before migration. |
| Academic title | Lookup table over Postgres ENUM; CHECK enforces the other invariant. |
| Department | Reconcile with existing unit_id; do not ship parallel notions. |
| State enforcement | Postgres ENUM/CHECK for status; cross-column CHECKs for invariants. |
| Transitions | Conditional UPDATE on prior state; reject illegal transitions implicitly. |
| Concurrency | version column for optimistic locking; admin verify carries expected version. |
| Audit | JSONB snapshots, whitelist fields, same-transaction writes, partitioned by month. |
| Indexes | Partial index on pending queue; composite on admin activity; unique on employee_id. |
| Timestamps | TIMESTAMPTZ everywhere. |
| Migration | Two-phase NOT NULL; backfill in same migration; reversible downgrade. |
| Tests | Constraint tests, transition matrix, concurrency tests, migration round-trip. |
The plan in the source document is sound. The notes above are the difference between a design that survives one release and one that survives five.
Prepared as a database engineering review, complementary to the original integration analysis. Any path references mirror the source document and should be re-verified against the current tree before code changes land.