tlam89/sciagent

Fork 0

Files

T

Thinh Lam 688fac73e9

CI/CD / backend (push) Failing after 2m8s

Details

CI/CD / frontend (push) Failing after 1m40s

Details

CI/CD / deploy (push) Has been skipped

Details

sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-30 09:38:30 +07:00

12 KiB

Raw Blame History

Architecture improvement proposals

This document compares this repository (Initiative / “Profyt” style stack: fe0 + be0 + PostgreSQL + MinIO) against the layered patterns and RBAC/DB practices described in the reference write-up (Cookiecutter-style Full Stack template: Route → Service → Repository → Database).

It proposes concrete improvements for frontend, backend, and database—not to copy the template verbatim, but to reduce risk and operational cost as the product grows.

1. How your app is structured today (summary)

Area	What you have	Reference template pattern
Backend	A large `be0/main.py` FastAPI app with many endpoints, workflow glue, and file/YAML fallbacks in one module	Small route modules, thin handlers, `services/` + `repositories/`, `schemas/`, `deps.py`
Domain DB	`be0/src/initiative_db/` (async engine, `get_session` with commit/rollback, SQLAlchemy models)	Same async SQLAlchemy idea; stronger separation of data access from HTTP
Auth	`be0/src/auth_api.py` — Argon2, JWT (HS256), roles embedded in `user_roles` + JWT payload, `decode_access_token_user_id` per route	Reusable `get_current_user`, `RoleChecker`, optional claim checks (e.g. access vs refresh)
Frontend	`fe0` React + Vite, centralized `apiClient`, `AuthContext`, permission map in `fe0/src/lib/permissions.ts`	UI gates only; real enforcement on server (you partially do this)
Storage	MinIO via `be0/src/minio/storage.py`, evidence endpoints in `main.py`, `application_artifacts` in PostgreSQL	Similar object-store + DB metadata split
Migrations	SQL files under `be0/migrations/` applied via Docker `initdb.d`	Alembic revisions + env imports for all models

Bottom line: You already have strong pieces (async sessions, MinIO abstraction, initiative schema, evidence pipeline). The main gap is structural—keeping business rules, HTTP, and storage concerns separated and testable, and making role and ownership rules easy to review in one place.

2. Database improvements

2.1 Single source of truth for schema evolution

Today: be0/migrations/*.sql are mounted into Postgres at container init, while SQLAlchemy models in be0/src/initiative_db/models.py are the app’s view of the world. That can drift (manual SQL + model mismatch).

Proposals:

Introduce Alembic (or another migration tool) for be0, targeting the same DATABASE_URL / INITIATIVE_DATABASE_URL, with:
- alembic/env.py importing all model modules so autogenerate sees every table.
- A one-time baseline migration that matches the current SQL, then all future changes via revision files only.
Keep Docker initdb.d for local bootstrap or replace it with alembic upgrade head in an entrypoint—avoid maintaining two parallel migration paths in production.

2.2 Naming convention and metadata hygiene

Reference lesson: A Base.metadata naming convention makes autogenerated diffs stable across environments.

Proposal: Add a SQLAlchemy MetaData(naming_convention=…) on your Base in be0/src/initiative_db/models.py before the next non-trivial migration, so new constraints don’t get database-default names that differ by environment.

2.3 Indexes and hot paths (review pass)

Already good: initiatives.case_code is unique; ownership is by owner_id. Ensure all foreign keys used in list/detail filters have indexes (Postgres does not always index FK columns automatically in every setup—verify in migrations).

Proposal: One focused pass over queries in drafts, submissions, and application_artifacts (evidence) — add composite or partial indexes only where proven by slow query logs (avoid over-indexing).

2.4 Align “roles” in DB vs product language

Observation: 001_initiative_schema.sql defines user_role enum including applicant, council_member, etc. The frontend fe0/src/lib/permissions.ts uses admin | editor | viewer with business descriptions (“Người nộp đơn” tied to viewer). This semantic mapping is easy to get wrong in new endpoints.

Proposals:

Add a short internal doc (table): DB role(s) → JWT/claims → UI permission set.
In API design, avoid duplicating role names in ad hoc strings; prefer one enumeration shared between auth issuance and checks (or a small const module on the server used by all routers).

2.5 Transaction boundaries (you’re on the right track)

get_session commits on success and rolls back on exception—aligned with the reference pattern. Keep service-level logic from calling commit() except where a deliberate long-lived transaction is required.

Proposal: As you split code out of main.py, ensure one unit of work per request unless you document exceptions (e.g. multi-step background handoff).

3. Backend improvements

3.1 Move from monolithic `main.py` to layered packages

Gap: A large main.py mixing workflow state, file exports, application drafts, MinIO, and compliance tools makes it hard to:

test ownership and RBAC in isolation;
reason about what runs on every deploy;
onboard new contributors.

Target shape (incremental, not big-bang):

be0/src/
  api/                 # or routers/
    deps.py            # get_current_user, require_role, get_initiative_scoped, etc.
    routes/
      auth.py
      application_drafts.py
      applications.py
      evidence.py
      admin.py
  services/            # business rules: "can this user add evidence to this case?"
  repositories/        # SQL only: initiative by case_code, artifact by role, …
  schemas/             # Pydantic request/response DTOs
  initiative_db/       # keep models + engine; repositories call into models

Rule (from the reference): Routes never execute raw SQL; they call services that call repositories. This matches your evidence flow conceptually (you’re halfway there with application_storage).

3.2 RBAC and IDOR: centralize “who can touch this row?”

Reference patterns worth adopting:

RoleChecker-style dependencies (even if not a class) so route signatures read: “requires admin” vs “any authenticated user” without re-reading 30 lines of checks.
Ownership (IDOR): For every case_id / initiative_id / application_id, a single function e.g. assert_initiative_access(user_id, case_code, need_write=True) that:
- returns 404 for cross-tenant access when appropriate (no existence leak);
- is reused by evidence upload, draft save, and submit.

Today you check owner_id in several places; factor that into one service method and reuse.

3.3 JWT and token hygiene

Reference: Distinguish access vs refresh token claims (e.g. type) so a refresh token cannot be used as a Bearer on API routes.

Proposal: If you only issue access tokens today, document that. If you add refresh, enforce type == "access" in the dependency that guards APIs.

Already good: JWT_SECRET length check in dev/prod, Argon2 for passwords.

3.4 Reduce dual persistence in production

Today: Some flows can fall back to YAML on disk or in-memory stores when Postgres is off. That’s great for demos; in production it doubles test matrices.

Proposals:

Feature flag or environment switch: INITIATIVES_PERSISTENCE=postgres only in prod; fail fast if misconfigured.
Deprecate or strictly scope file-based application draft storage to “local dev without Docker”.

3.5 MinIO and evidence

Already good: S3Storage, size limits, presigned download URLs, application_artifacts for metadata.

Proposals:

Centralize all MinIO key conventions and role names (evidence_research, evidence_textbook, full_pdf) in one server module to avoid string drift.
Add periodic job (cron/worker) for orphaned objects if uploads fail mid-way—your cleanup module in minio is a start; ensure it’s scheduled in deployed environments.

3.6 Errors and API contract

Reference: Map domain errors to HTTP status in one exception_handlers module.

Proposal: Use consistent JSON error bodies ({ "detail": "...", "code": "EVIDENCE_TOO_LARGE" }) so the frontend can branch without parsing English/Vietnamese strings.

4. Frontend improvements

4.1 Treat server as the only authority

Today: fe0/src/lib/permissions.ts encodes a rich matrix (ROLE_PERMISSIONS). That’s good UX, but must mirror server checks. Any mismatch is a security issue (UI hides a button, API still open—or the reverse).

Proposals:

For sensitive actions, prefer “call API → 403/404” over purely client hasPermission (e.g. hide buttons but always handle 403 on mutation).
Optionally expose a GET /api/v1/auth/me/permissions or embed resolved permissions in JWT claims—if you can keep them small and non-stale. Otherwise, document that fe is best-effort and be is truth.

4.2 Align “applicant” / “viewer” vocabulary

The UI comments describe applicant behavior while the Role type is viewer. New contributors will be confused.

Proposals:

Rename in UI copy only (getRoleDisplayName) or add alias type ApplicantRole = 'viewer' with a comment, and keep one canonical role in code.
Reconcile with SQL enum names in documentation (see §2.4).

4.3 Generated types from OpenAPI

Gap: Hand-maintained DTOs and ad hoc as unknown as casts in draft save/load are brittle.

Proposal: Export OpenAPI from FastAPI (/openapi.json) and generate TypeScript types (or a thin client) in CI. At minimum, generate enums and response shapes for application-drafts, applications, and evidence.

4.4 Draft and file state machine

You combine local IndexedDB, server JSON tabs, and MinIO evidence—users need clear feedback (you’ve started with toasts and size limits).

Proposals:

A visible per-field status for evidence: local only / synced / error (small badge next to the filename).
Idempotent retry: if upload fails, one-click “retry sync” without re-selecting the file (blob still in IDB).

4.5 Performance and DX

Code-split very large forms (InitiativeApplicationForm, InitiativeReportForm) by section for faster first paint.
React Query (you use it in places) for server-backed lists with stable keys; align cache invalidation on submit/delete application.

5. Suggested prioritization (practical order)

DB: Alembic + stop dual SQL/model drift; baseline migration.
Backend: Extract deps.py + initiative ownership helper; use it in evidence + draft + submit routes.
Backend: Split main.py into routes/ by domain (even if services are thin at first).
API: Standard error JSON + OpenAPI export + generated TS types.
Frontend: Permission doc + 403 handling + evidence sync UX (status badge / retry).
Ops: One persistence mode in prod; optional cleanup job for MinIO/DB consistency.

6. Mapping to the reference “lessons” list

Reference lesson	Your project
Four-layer separation	Partial — move HTTP out of `main.py` and add services.
`has_role` on user model / single RBAC place	Partial — centralize in `deps` + one service.
IDOR: always scope by `user_id` in queries	Good direction — consolidate checks.
404 on cross-owner access	Adopt where you still return 403 for existence leaks.
DB timestamps / naming / Alembic imports	Adopt naming + Alembic hygiene.
One transaction per request	Yes in `get_session`.
NullPool for workers	N/A until you have forked workers; then apply.
Don’t create admin over HTTP	OK if admin bootstrap is CLI or SQL—keep that.

Generated from a comparison of this codebase with the structure and ideas in the supplied architecture-analysis.md (vstorm / full-stack template style). Adjust priorities to your release timeline and team size.

12 KiB Raw Blame History Unescape Escape

Architecture improvement proposals

1. How your app is structured today (summary)

2. Database improvements

2.1 Single source of truth for schema evolution

2.2 Naming convention and metadata hygiene

2.3 Indexes and hot paths (review pass)

2.4 Align “roles” in DB vs product language

2.5 Transaction boundaries (you’re on the right track)

3. Backend improvements

3.1 Move from monolithic main.py to layered packages

3.2 RBAC and IDOR: centralize “who can touch this row?”

3.3 JWT and token hygiene

3.4 Reduce dual persistence in production

3.5 MinIO and evidence

3.6 Errors and API contract

4. Frontend improvements

4.1 Treat server as the only authority

4.2 Align “applicant” / “viewer” vocabulary

4.3 Generated types from OpenAPI

4.4 Draft and file state machine

4.5 Performance and DX

5. Suggested prioritization (practical order)

6. Mapping to the reference “lessons” list

12 KiB

Raw Blame History

3.1 Move from monolithic `main.py` to layered packages