commit 688fac73e9ed402e3dad4bde6d740dd33b9a9e4f
Author: Thinh Lam
Date:   Tue Jun 30 09:38:30 2026 +0700

    sciagent code + Gitea Actions CI/CD

    Co-Authored-By: Claude Opus 4.8

diff --git a/.dockerignore b/.dockerignore
new file mode 100644
index 0000000..204678c
--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,23 @@
+# Build context for the frontend_user / frontend_admin images is the repo ROOT (the
+# npm workspace). Keep only root manifests + shared/ + the two app dirs; exclude the
+# rest so the context stays small. (be0 builds from ./be0 and is unaffected by this file.)
+**/node_modules
+**/dist
+**/dist-ssr
+.git
+.gitignore
+.dockerignore
+.claude
+docs
+be0
+fe0
+assets
+database
+Posgresdb
+scripts
+deploy
+*.md
+.env
+.env.*
+.DS_Store
+**/__pycache__
diff --git a/.env.example b/.env.example
new file mode 100644
index 0000000..59c04d2
--- /dev/null
+++ b/.env.example
@@ -0,0 +1,105 @@
+# ============================================================
+# Production / docker-compose.prod.yml
+# -----------------------------------------------------------
+# 1. Copy: cp .env.example .env
+# 2. Fill every value below (never commit .env — it is gitignored).
+# 3. Prefer strong random secrets:
+#    openssl rand -base64 32
+#
+# Before deploy: ./scripts/verify-prod-env.sh
+# Full deploy: ./scripts/deploy-prod.sh
+# Stack map (FE→BE→DB→MinIO): docs/deploy-stack-overview.md
+# Postgres / volume quirks: docs/deploy-production-docker.md
+#
+# If .env was ever committed to git, rotate ALL secrets below.
+# ============================================================
+
+# Public hostname or IP that browsers use to reach this machine.
+PUBLIC_HOST=your-public-hostname-or-ip.example.com
+
+FE_PORT=8081
+
+# Optional: admin/council SPA port. Bound to 127.0.0.1 only in docker-compose.prod.yml
+# (reach it via SSH tunnel or an authenticated reverse-proxy vhost). Defaults to 8082.
+# FE_ADMIN_PORT=8082
+
+# Optional: principal-investigator SPA port (research proposals + project cockpit). Defaults to 8083.
+# FE_INV_PORT=8083
+
+# Optional: publisher SPA port (research-result publication). Defaults to 8084.
+# FE_PUB_PORT=8084
+
+# Optional: extra CORS Allowed-Origins for be0 (comma-separated, no spaces). Production compose sets
+# CORS_ORIGINS to http://${PUBLIC_HOST}:${FE_PORT} plus these extras automatically.
+# CORS_ORIGINS_EXTRA=https://app.example.com,http://internal:8081
+
+MINIO_API_PORT=19000
+MINIO_CONSOLE_PORT=19001
+
+MINIO_ROOT_USER=minio_root_change_me
+MINIO_ROOT_PASSWORD=replace_with_long_random_secret
+
+# --- HTTPS for MinIO presigned URLs (required if the SPA is https://…) ------------
+# Mixed content blocks http://PUBLIC_HOST:19000 embedded from an HTTPS UI. Options:
+#   A) Proxied viewer only (already in-app) — no change needed for preview.
+#   B) HTTPS for direct MinIO links (iframe / “open presigned URL”) — put TLS in front
+#      of the S3 API port and align these with that public URL. See docs/minio-behind-https.md .
+# Example subdomain (recommended):
+# S3_PUBLIC_ENDPOINT_URL=https://minio-api.your-domain.com
+# MINIO_SERVER_URL=https://minio-api.your-domain.com
+# Optionally point the console at HTTPS too:
+# MINIO_BROWSER_REDIRECT_URL=https://minio-console.your-domain.com
+# If omitted, Compose keeps using http://${PUBLIC_HOST}:${MINIO_API_PORT} for both.
+
+# Username + password are fixed the first time the Postgres volume is created (see comment below).
+
+# Identifier only (letters, digits, underscore) — avoids URL / healthcheck pitfalls.
+POSTGRES_USER=postgres_app_user
+POSTGRES_PASSWORD=replace_with_long_random_secret
+
+# Optional: only for scripts/sync-postgres-app-password.sh when the app role is not superuser
+# or you must connect as a different DB superuser (e.g. postgres) to run ALTER ROLE.
+# POSTGRES_SUPERUSER=postgres
+
+# Database name created on first init (normally keep "initiatives").
+POSTGRES_DB=initiatives
+
+# --- Auth (required for production) ------------------------------------------------
+# Generate: openssl rand -base64 48
+JWT_SECRET=replace_with_openssl_rand_base64_48
+
+# MinIO browser CORS — your public SPA origin (scheme + host, no trailing slash).
+MINIO_API_CORS_ALLOW_ORIGIN=https://www.example.com
+
+# Postgres + password caveat:
+# Changing POSTGRES_USER/POSTGRES_PASSWORD here later does NOT change an existing Docker volume —
+# Postgres only reads them when /var/lib/postgresql/data is empty. If login fails after editing .env:
+#   • Use the same password as first boot (e.g. dev stack used initiative / initiative_secret), or
+#   • With docker-compose.prod.yml stopped: docker volume rm …_initiative_pg_data then up again (drops DB), or
+#   • Run ./scripts/sync-postgres-app-password.sh to set the DB role password from this file (no wipe).
+
+# ---------------------------------------------------------------------------
+# SMTP — outbound mail from be0 (registration OTP, password reset)
+# ---------------------------------------------------------------------------
+# docker-compose / docker-compose.prod passes these into the be0 container.
+# Compose substitutes ${SMTP_*} from THIS file (repo-root `.env`), not from be0/.env alone.
+# Omit AUTH_MAIL_LOG_ONLY (or set 0/false) when using real SMTP.
+#
+# SMTP_HOST=smtp.your-mail-provider.com
+# SMTP_PORT=587
+# SMTP_USER=your_smtp_username
+# SMTP_PASSWORD=your_smtp_password
+# AUTH_MAIL_FROM=noreply@your-institution.edu.vn
+# SMTP_USE_TLS=1
+#
+# Public URL of the web app (password-reset / verify links in email). Production example:
+# AUTH_PUBLIC_WEB_ORIGIN=https://your-app.example.com
+#
+# Dev-only: print OTP in be0 logs instead of sending mail
+# AUTH_MAIL_LOG_ONLY=1
+#
+# Microsoft 365 / Outlook (smtp.office365.com), log shows 535 Authentication unsuccessful:
+#   • SMTP_USER = full mailbox address; SMTP_PASSWORD = correct app password if MFA is enabled
+#     (not your normal web-login password unless basic auth is allowed — many tenants require app passwords).
+#   • Exchange admin: enable "Authenticated SMTP" for the mailbox; security defaults may block SMTP AUTH.
+#   • After editing .env: docker compose up -d be0 (so the container reloads env).
diff --git a/.gitea/workflows/ci-cd.yml b/.gitea/workflows/ci-cd.yml
new file mode 100644
index 0000000..106641a
--- /dev/null
+++ b/.gitea/workflows/ci-cd.yml
@@ -0,0 +1,99 @@
+name: CI/CD
+
+# Gitea Actions pipeline for the UMP / ImageHub monorepo.
+#   backend  — be0 (FastAPI, Python 3.11) pytest against a throwaway Postgres
+#   frontend — npm workspaces (shared + 4 Vite/React SPAs): typecheck, build, unit tests
+#   deploy   — on push to main only: build + `docker compose up -d` on the host runner
+#
+# Runner labels expected (act_runner registered on 103.149.170.102):
+#   ci     -> docker mode (clean, ephemeral)  used by backend + frontend
+#   deploy -> host mode (drives host docker)  used by deploy
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  backend:
+    runs-on: ci
+    services:
+      postgres:
+        image: postgres:16-alpine
+        env:
+          POSTGRES_USER: initiative
+          POSTGRES_PASSWORD: initiative_secret
+          POSTGRES_DB: initiatives
+        ports:
+          - 5432:5432
+        options: >-
+          --health-cmd "pg_isready -U initiative -d initiatives"
+          --health-interval 5s --health-timeout 5s --health-retries 10
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install backend deps (+ test deps)
+        working-directory: be0
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements-dev.txt
+      - name: Unit tests — pytest PER FILE (isolates asyncpg event loop per module)
+        working-directory: be0
+        env:
+          INITIATIVE_DATABASE_URL: postgresql+asyncpg://initiative:initiative_secret@postgres:5432/initiatives
+        run: |
+          set -e
+          fail=0
+          for f in tests/test_*.py; do
+            echo "::group::$f"
+            python -m pytest "$f" -q || fail=1
+            echo "::endgroup::"
+          done
+          exit $fail
+
+  frontend:
+    runs-on: ci
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+      - name: Install (workspaces)
+        run: npm ci
+      - name: Typecheck (all workspaces)
+        run: npm run typecheck
+      - name: Build (all workspaces)
+        run: npm run build
+      - name: Unit tests (workspaces w/ vitest — shared, investigator, publisher)
+        run: npm test --workspaces --if-present
+
+  # Deploy runs in HOST mode from a PERSISTENT dir (NOT the ephemeral runner
+  # workspace): docker-compose.prod.yml bind-mounts ./assets/minio-data and
+  # ./be0, so MinIO data + submitted files must live on a stable host path or
+  # they would be wiped on every deploy.
+  deploy:
+    needs: [backend, frontend]
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+    runs-on: deploy
+    steps:
+      - name: Sync code to persistent deploy dir
+        run: |
+          set -euo pipefail
+          DEPLOY_DIR=/srv/sciagent
+          if [ ! -d "$DEPLOY_DIR/.git" ]; then
+            git clone http://localhost:3000/tlam89/sciagent.git "$DEPLOY_DIR"
+          fi
+          cd "$DEPLOY_DIR"
+          git fetch origin main
+          git reset --hard origin/main
+      - name: Materialize prod .env from secret
+        env:
+          PROD_ENV: ${{ secrets.PROD_ENV }}
+        run: |
+          umask 077 && printf '%s' "$PROD_ENV" > /srv/sciagent/.env && chmod 600 /srv/sciagent/.env
+      - name: Deploy stack (build locally, no registry pull)
+        run: cd /srv/sciagent && bash scripts/deploy-prod.sh --no-pull
+      - name: Stack health check
+        run: cd /srv/sciagent && bash scripts/check-prod-stack.sh
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..dee745a
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,41 @@
+# Logs
+logs
+*.log
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+pnpm-debug.log*
+lerna-debug.log*
+
+node_modules
+dist
+dist-ssr
+*.local
+
+# Editor directories and files
+.vscode/*
+!.vscode/extensions.json
+.idea
+.DS_Store
+*.suo
+*.ntvs*
+*.njsproj
+*.sln
+*.sw?
+
+# Secrets — commit only `.env.example`, never `.env`.
+.env
+
+.env.local
+.env.*.local
+
+# Keep the example/template
+!.env.example
+
+assets/minio-data/*
+
+be0/.venv/
+
+# HMW-mode marker — session-local toggle (/ultra-on … /ultra-off). Never commit;
+# committing it would leave a fresh `git clone` stuck in token-burn mode.
+.claude/hmw-mode.on
\ No newline at end of file
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..b09cd78
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,201 @@
+Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/Posgresdb/crud_examples.sql b/Posgresdb/crud_examples.sql
new file mode 100644
index 0000000..ed75882
--- /dev/null
+++ b/Posgresdb/crud_examples.sql
@@ -0,0 +1,137 @@
+-- =============================================================================
+-- CRUD PATTERNS — Sáng kiến application system
+-- =============================================================================
+
+-- =============================================================================
+-- CREATE: Submit a new application with multiple authors (atomic)
+-- =============================================================================
+BEGIN;
+  -- Set audit context
+  SELECT set_config('my.user_id', '42', true);
+
+  -- 1. Main record
+  INSERT INTO applications(code, title, registration_year, status, purpose,
+                           is_technical_solution, primary_unit_id, created_by)
+  VALUES ('SK-2025-007',
+          'Hệ thống tự động điền hồ sơ sáng kiến',
+          2025, 'DRAFT',
+          'Tự động hoá việc điền các mẫu số 01–04',
+          TRUE, 2, 42)
+  RETURNING application_id \gset
+
+  -- 2. Authors (defer contribution-sum check until COMMIT)
+  SET CONSTRAINTS trg_contribution_total DEFERRED;
+  INSERT INTO application_authors(application_id, user_id, contribution_pct, role, display_order) VALUES
+    (:application_id, 42, 60.00, 'PRIMARY', 1),
+    (:application_id, 13, 25.00, 'CO_AUTHOR', 2),
+    (:application_id, 27, 15.00, 'CO_AUTHOR', 3);
+
+  -- 3. Orgs that tested it
+  INSERT INTO application_adopters(application_id, org_name, address, field) VALUES
+    (:application_id, 'Phòng KHCN', '217 Hồng Bàng, Q.5', 'Cải cách hành chính');
+COMMIT;
+
+
+-- =============================================================================
+-- READ: Dashboard — paginated list with filters
+-- =============================================================================
+SELECT * FROM v_application_summary
+ WHERE registration_year = 2025
+   AND status = ANY(ARRAY['UNDER_REVIEW','EVALUATED']::text[])
+   AND title ILIKE '%động vật%' -- uses trigram index
+ ORDER BY avg_score DESC NULLS LAST, submitted_at DESC
+ LIMIT 20 OFFSET 0;
+
+-- Read: full application with nested data (app layer usually does this as N queries
+-- or one JSON aggregate — here's the aggregate version)
+SELECT jsonb_build_object(
+  'application', to_jsonb(a.*),
+  'authors', (SELECT jsonb_agg(jsonb_build_object(
+                'user_id', u.user_id,
+                'name', u.full_name,
+                'pct', aa.contribution_pct,
+                'role', aa.role
+              ) ORDER BY aa.display_order)
+              FROM application_authors aa
+              JOIN users u USING (user_id)
+              WHERE aa.application_id = a.application_id),
+  'evaluations',(SELECT jsonb_agg(to_jsonb(e.*))
+                 FROM evaluations e WHERE e.application_id = a.application_id),
+  'attachments',(SELECT jsonb_agg(to_jsonb(att.*))
+                 FROM attachments att WHERE att.application_id = a.application_id)
+) AS document
+FROM applications a
+WHERE a.application_id = 1 AND a.deleted_at IS NULL;
+
+-- Full-text search (Vietnamese-friendly; combine with unaccent for better recall)
+SELECT application_id, code, title
+  FROM applications
+ WHERE to_tsvector('simple', coalesce(title,'') || ' ' || coalesce(introduction,'') || ' ' || coalesce(novelty_description,''))
+       @@ plainto_tsquery('simple', 'đạo đức động vật')
+ ORDER BY registration_year DESC
+ LIMIT 10;
+
+
+-- =============================================================================
+-- UPDATE: Progress an application through the workflow
+-- =============================================================================
+-- Submit (DRAFT → SUBMITTED). Triggers populate submitted_at automatically.
+UPDATE applications SET status = 'SUBMITTED' WHERE application_id = 7;
+
+-- Assign to review panel
+UPDATE applications SET status = 'UNDER_REVIEW' WHERE application_id = 7;
+
+-- Upsert an evaluation (same evaluator re-scores)
+INSERT INTO evaluations (application_id, evaluator_id, novelty_score, effectiveness_score, conclusion)
+VALUES (7, 99, 32, 48, 'Đề nghị công nhận')
+ON CONFLICT (application_id, evaluator_id)
+DO UPDATE SET
+  novelty_score = EXCLUDED.novelty_score,
+  effectiveness_score = EXCLUDED.effectiveness_score,
+  conclusion = EXCLUDED.conclusion,
+  evaluated_at = NOW();
+
+-- Update JSONB field: patch a single effectiveness sub-field
+UPDATE applications
+   SET effectiveness = effectiveness || jsonb_build_object(
+         'economic',
+         'Tiết kiệm ~30% thời gian xét duyệt'
+       )
+ WHERE application_id = 7;
+
+-- Partial update (PATCH-style) — only update provided fields. The app layer
+-- generates SET clauses from the non-null fields in the request body.
+UPDATE applications
+   SET title = COALESCE($1, title),
+       purpose = COALESCE($2, purpose),
+       updated_at = NOW()
+ WHERE application_id = $3 AND deleted_at IS NULL
+RETURNING *;
+
+
+-- =============================================================================
+-- DELETE: Soft delete + restore
+-- =============================================================================
+-- Soft delete
+UPDATE applications SET deleted_at = NOW() WHERE application_id = 7;
+
+-- Restore
+UPDATE applications SET deleted_at = NULL WHERE application_id = 7;
+
+-- Hard delete (only for drafts, cascades to authors/evaluations/etc.)
+DELETE FROM applications
+ WHERE application_id = 7
+   AND status = 'DRAFT';
+
+
+-- =============================================================================
+-- ANALYTICS: Materialized-view refresh (run nightly via cron/pgAgent)
+-- =============================================================================
+REFRESH MATERIALIZED VIEW CONCURRENTLY mv_annual_stats;
+
+-- Leaderboard: top-scoring approved innovations
+SELECT code, title, avg_score
+  FROM v_application_summary
+ WHERE status = 'APPROVED'
+ ORDER BY avg_score DESC
+ LIMIT 10;
diff --git a/Posgresdb/schema.sql b/Posgresdb/schema.sql
new file mode 100644
index 0000000..f015125
--- /dev/null
+++ b/Posgresdb/schema.sql
@@ -0,0 +1,422 @@
+-- =============================================================================
+-- SÁNG KIẾN (INNOVATION APPLICATION) DATABASE SCHEMA
+-- PostgreSQL 14+
+--
+-- Domain: Manage innovation applications at ĐHYD TP.HCM (Vietnamese medical
+-- university). Supports the full lifecycle: draft → submit → evaluate → approve.
+--
+-- Design principles:
+--   - 3NF for entities, JSONB for semi-structured/optional narrative
+--   - Soft delete (deleted_at) — legal/audit requires historical retention
+--   - State machine on applications.status enforced by trigger
+--   - Full audit_log via trigger on all CUD operations
+--   - Contribution % sums to 100 enforced by DEFERRABLE trigger
+-- =============================================================================
+
+CREATE EXTENSION IF NOT EXISTS pg_trgm; -- fuzzy matching
+CREATE EXTENSION IF NOT EXISTS unaccent; -- Vietnamese diacritics in search
+
+-- Convenience: updated_at auto-maintenance
+CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS TRIGGER AS $$
+BEGIN NEW.updated_at := NOW(); RETURN NEW; END;
+$$ LANGUAGE plpgsql;
+
+
+-- =============================================================================
+-- REFERENCE: UNITS (departments, faculties, centers)
+-- =============================================================================
+CREATE TABLE units (
+    unit_id SERIAL PRIMARY KEY,
+    code VARCHAR(32) UNIQUE NOT NULL,
+    name VARCHAR(255) NOT NULL, -- full Vietnamese name
+    parent_unit_id INT REFERENCES units(unit_id) ON DELETE SET NULL,
+    type VARCHAR(32) NOT NULL
+        CHECK (type IN ('TRUONG','KHOA','PHONG','BO_MON','TRUNG_TAM','KHAC')),
+    is_active BOOLEAN NOT NULL DEFAULT TRUE,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE TRIGGER trg_units_touch BEFORE UPDATE ON units
+    FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
+
+
+-- =============================================================================
+-- USERS (unified: authors, evaluators, admins — a user can wear many hats)
+-- =============================================================================
+CREATE TABLE users (
+    user_id SERIAL PRIMARY KEY,
+    full_name VARCHAR(255) NOT NULL,
+    title VARCHAR(64), -- PGS.TS, TS., GS., CN., ThS.
+ date_of_birth DATE, + email VARCHAR(255) UNIQUE, + phone VARCHAR(32), + id_number VARCHAR(32) UNIQUE, -- CCCD / hộ chiếu + unit_id INT REFERENCES units(unit_id) ON DELETE SET NULL, + position VARCHAR(255), -- chức danh: Trưởng phòng, GV cao cấp + qualification VARCHAR(64), -- trình độ: Tiến sĩ, Thạc sĩ, Cử nhân + user_type VARCHAR(32) NOT NULL DEFAULT 'AUTHOR' + CHECK (user_type IN ('AUTHOR','COUNCIL','ADMIN','STUDENT','EXTERNAL')), + is_active BOOLEAN NOT NULL DEFAULT TRUE, + deleted_at TIMESTAMPTZ, -- soft delete + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +CREATE INDEX idx_users_unit ON users(unit_id); +CREATE INDEX idx_users_active ON users(is_active) WHERE deleted_at IS NULL; +CREATE INDEX idx_users_name_trgm ON users USING GIN (full_name gin_trgm_ops); +CREATE TRIGGER trg_users_touch BEFORE UPDATE ON users + FOR EACH ROW EXECUTE FUNCTION touch_updated_at(); + + +-- ============================================================================= +-- APPLICATIONS (sáng kiến) — the core entity +-- ============================================================================= +CREATE TABLE applications ( + application_id SERIAL PRIMARY KEY, + code VARCHAR(32) UNIQUE NOT NULL, -- e.g., 'SK-2025-001' + title TEXT NOT NULL, + title_en TEXT, + registration_year INT NOT NULL CHECK (registration_year BETWEEN 2000 AND 2100), + field_of_application TEXT, -- lĩnh vực áp dụng + + -- Workflow state (enforced by trigger below) + status VARCHAR(32) NOT NULL DEFAULT 'DRAFT' + CHECK (status IN ( + 'DRAFT','SUBMITTED','UNDER_REVIEW', + 'EVALUATED','APPROVED','REJECTED','WITHDRAWN' + )), + + -- Mẫu 01 narrative (long text) + introduction TEXT, -- 1. 
Mở đầu + current_state TEXT, -- 4.1 Tình trạng đã biết + purpose TEXT, -- Mục đích + implementation_steps TEXT, -- Các bước thực hiện + required_conditions TEXT, -- Điều kiện cần thiết + results_achieved TEXT, -- Kết quả thu được + novelty_description TEXT, -- Tính mới + confidential_info TEXT, -- Thông tin cần bảo mật + + -- 10 effectiveness sub-fields (all optional narrative) → JSONB + effectiveness JSONB NOT NULL DEFAULT '{}'::jsonb, + -- Shape: { "economic":"...", "teaching":"...", "productivity":"...", + -- "work_efficiency":"...", "quality":"...", "cost_reduction":"...", + -- "environment":"...", "health":"...", "safety":"...", "awareness":"..." } + + -- Mẫu 02 fields + owner_org VARCHAR(255), -- chủ đầu tư + first_applied_date DATE, -- ngày áp dụng lần đầu + content_summary TEXT, -- nội dung sáng kiến (short) + author_assessment TEXT, -- đánh giá theo tác giả + org_assessment TEXT, -- đánh giá theo tổ chức + + -- Mẫu 02 classification (mutually exclusive in form, but stored as flags) + is_technical_solution BOOLEAN NOT NULL DEFAULT FALSE, + is_from_research_article BOOLEAN NOT NULL DEFAULT FALSE, + is_from_book_material BOOLEAN NOT NULL DEFAULT FALSE, + CONSTRAINT chk_exactly_one_classification CHECK ( + status = 'DRAFT' OR + (is_technical_solution::int + is_from_research_article::int + is_from_book_material::int) = 1 + ), + + -- Workflow timestamps + submitted_at TIMESTAMPTZ, + decided_at TIMESTAMPTZ, + + primary_unit_id INT REFERENCES units(unit_id), + created_by INT REFERENCES users(user_id), + deleted_at TIMESTAMPTZ, -- soft delete + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_apps_status ON applications(status) WHERE deleted_at IS NULL; +CREATE INDEX idx_apps_year ON applications(registration_year); +CREATE INDEX idx_apps_unit ON applications(primary_unit_id); +CREATE INDEX idx_apps_title_trgm ON applications USING GIN (title gin_trgm_ops); +CREATE INDEX idx_apps_fts ON 
applications USING GIN ( + to_tsvector('simple', + coalesce(title,'') || ' ' || + coalesce(introduction,'') || ' ' || + coalesce(novelty_description,'') + ) +); +CREATE INDEX idx_apps_effectiveness ON applications USING GIN (effectiveness); +CREATE TRIGGER trg_apps_touch BEFORE UPDATE ON applications + FOR EACH ROW EXECUTE FUNCTION touch_updated_at(); + + +-- ============================================================================= +-- APPLICATION_AUTHORS (M:N with contribution %) +-- ============================================================================= +CREATE TABLE application_authors ( + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + user_id INT NOT NULL REFERENCES users(user_id), + contribution_pct NUMERIC(5,2) NOT NULL CHECK (contribution_pct > 0 AND contribution_pct <= 100), + role VARCHAR(32) NOT NULL DEFAULT 'CO_AUTHOR' + CHECK (role IN ('PRIMARY','CO_AUTHOR')), + display_order INT NOT NULL DEFAULT 0, + PRIMARY KEY (application_id, user_id) +); +CREATE INDEX idx_app_authors_user ON application_authors(user_id); + +-- At most one PRIMARY author per application +CREATE UNIQUE INDEX uq_primary_per_app + ON application_authors(application_id) WHERE role = 'PRIMARY'; + +-- Deferrable check: contribution % must total 100 per application +CREATE OR REPLACE FUNCTION check_contribution_total() RETURNS TRIGGER AS $$ +DECLARE v_total NUMERIC; v_app INT; +BEGIN + v_app := COALESCE(NEW.application_id, OLD.application_id); + SELECT COALESCE(SUM(contribution_pct),0) INTO v_total + FROM application_authors WHERE application_id = v_app; + -- Only enforce when application has left DRAFT + IF (SELECT status FROM applications WHERE application_id = v_app) <> 'DRAFT' + AND v_total <> 100 THEN + -- '%%' prints a literal percent sign; each bare % is filled from the arguments + RAISE EXCEPTION 'Contribution %% for application % must sum to 100 (got %)', + v_app, v_total; + END IF; + RETURN NULL; +END; +$$ LANGUAGE plpgsql; + +CREATE CONSTRAINT TRIGGER trg_contribution_total + AFTER INSERT OR UPDATE OR
DELETE ON application_authors + DEFERRABLE INITIALLY DEFERRED + FOR EACH ROW EXECUTE FUNCTION check_contribution_total(); + + +-- ============================================================================= +-- ORGS that tested / adopted the innovation (Mẫu 01 inner table) +-- ============================================================================= +CREATE TABLE application_adopters ( + adopter_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + display_order INT NOT NULL DEFAULT 0, + org_name VARCHAR(255) NOT NULL, + address TEXT, + field TEXT +); +CREATE INDEX idx_adopters_app ON application_adopters(application_id); + + +-- ============================================================================= +-- PARTICIPANTS in first application (Mẫu 02 inner table) +-- ============================================================================= +CREATE TABLE application_participants ( + participant_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + user_id INT REFERENCES users(user_id), -- optional link + display_order INT NOT NULL DEFAULT 0, + full_name VARCHAR(255) NOT NULL, + date_of_birth DATE, + work_unit VARCHAR(255), + position VARCHAR(255), + qualification VARCHAR(64), + support_content TEXT +); +CREATE INDEX idx_participants_app ON application_participants(application_id); + + +-- ============================================================================= +-- EVALUATIONS (Mẫu 04) — council members score applications +-- ============================================================================= +CREATE TABLE evaluations ( + evaluation_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + evaluator_id INT NOT NULL REFERENCES users(user_id), + + novelty_comments TEXT, + novelty_score INT NOT NULL DEFAULT 0 + CHECK (novelty_score BETWEEN 0 AND 40), + + 
effectiveness_comments TEXT, + effectiveness_score INT NOT NULL DEFAULT 0 + CHECK (effectiveness_score BETWEEN 0 AND 60), + + total_score INT GENERATED ALWAYS AS (novelty_score + effectiveness_score) STORED, + conclusion TEXT, + evaluated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + + UNIQUE (application_id, evaluator_id) +); +CREATE INDEX idx_eval_app ON evaluations(application_id); +CREATE INDEX idx_eval_evaluator ON evaluations(evaluator_id); + + +-- ============================================================================= +-- COMMITMENTS (Bản cam kết) — for paper-based innovations +-- ============================================================================= +CREATE TABLE commitments ( + commitment_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + user_id INT NOT NULL REFERENCES users(user_id), + + paper_title TEXT, + role_type VARCHAR(32) NOT NULL + CHECK (role_type IN ('PRIMARY_AUTHOR','CO_AUTHOR')), + + -- 5 commitment checkboxes + is_legal_owner BOOLEAN NOT NULL DEFAULT FALSE, + is_authorized_by_owner BOOLEAN NOT NULL DEFAULT FALSE, + has_coauthor_consent BOOLEAN NOT NULL DEFAULT FALSE, + not_predatory_journal BOOLEAN NOT NULL DEFAULT FALSE, + complies_with_ip_law BOOLEAN NOT NULL DEFAULT FALSE, + + signed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + + UNIQUE (application_id, user_id) +); +CREATE INDEX idx_commit_app ON commitments(application_id); + + +-- ============================================================================= +-- ATTACHMENTS (uploaded files — figures, flowcharts, annexes) +-- ============================================================================= +CREATE TABLE attachments ( + attachment_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + file_name VARCHAR(255) NOT NULL, + file_path TEXT NOT NULL, -- S3/MinIO key + file_size BIGINT, + mime_type VARCHAR(128), + kind VARCHAR(32) -- 'LUU_DO', 'PHU_LUC', 
'KY_SO', 'KHAC' + CHECK (kind IS NULL OR kind IN ('LUU_DO','PHU_LUC','KY_SO','KHAC')), + uploaded_by INT REFERENCES users(user_id), + uploaded_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +CREATE INDEX idx_attach_app ON attachments(application_id); + + +-- ============================================================================= +-- AUDIT LOG — single table, populated by triggers on all CUD operations +-- ============================================================================= +CREATE TABLE audit_log ( + log_id BIGSERIAL PRIMARY KEY, + table_name VARCHAR(64) NOT NULL, + record_id TEXT NOT NULL, + action VARCHAR(16) NOT NULL CHECK (action IN ('INSERT','UPDATE','DELETE')), + changed_by INT, -- set from app via SET LOCAL my.user_id + changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + old_data JSONB, + new_data JSONB +); +CREATE INDEX idx_audit_table_record ON audit_log(table_name, record_id); +CREATE INDEX idx_audit_user_time ON audit_log(changed_by, changed_at DESC); + +-- Generic audit trigger function +CREATE OR REPLACE FUNCTION audit_trigger() RETURNS TRIGGER AS $$ +DECLARE + v_user INT; + v_pk TEXT; +BEGIN + -- Get user_id from session var if app sets it; else NULL + BEGIN v_user := current_setting('my.user_id')::INT; + EXCEPTION WHEN OTHERS THEN v_user := NULL; END; + + v_pk := COALESCE( + (row_to_json(NEW)::jsonb->>TG_ARGV[0]), + (row_to_json(OLD)::jsonb->>TG_ARGV[0]) + ); + + INSERT INTO audit_log(table_name, record_id, action, changed_by, old_data, new_data) + VALUES ( + TG_TABLE_NAME, + v_pk, + TG_OP, + v_user, + CASE WHEN TG_OP IN ('UPDATE','DELETE') THEN to_jsonb(OLD) END, + CASE WHEN TG_OP IN ('INSERT','UPDATE') THEN to_jsonb(NEW) END + ); + RETURN COALESCE(NEW, OLD); +END; +$$ LANGUAGE plpgsql; + +-- Attach audit trigger to the important tables (pass PK column name as arg) +CREATE TRIGGER trg_audit_applications AFTER INSERT OR UPDATE OR DELETE ON applications + FOR EACH ROW EXECUTE FUNCTION audit_trigger('application_id'); +CREATE TRIGGER 
trg_audit_authors AFTER INSERT OR UPDATE OR DELETE ON application_authors + FOR EACH ROW EXECUTE FUNCTION audit_trigger('application_id'); +CREATE TRIGGER trg_audit_evaluations AFTER INSERT OR UPDATE OR DELETE ON evaluations + FOR EACH ROW EXECUTE FUNCTION audit_trigger('evaluation_id'); +CREATE TRIGGER trg_audit_commitments AFTER INSERT OR UPDATE OR DELETE ON commitments + FOR EACH ROW EXECUTE FUNCTION audit_trigger('commitment_id'); + + +-- ============================================================================= +-- WORKFLOW STATE MACHINE ENFORCEMENT +-- ============================================================================= +CREATE OR REPLACE FUNCTION enforce_application_transitions() RETURNS TRIGGER AS $$ +DECLARE + allowed BOOLEAN := FALSE; +BEGIN + IF OLD.status = NEW.status THEN RETURN NEW; END IF; + + -- Allowed transitions + allowed := CASE + WHEN OLD.status = 'DRAFT' AND NEW.status IN ('SUBMITTED','WITHDRAWN') THEN TRUE + WHEN OLD.status = 'SUBMITTED' AND NEW.status IN ('UNDER_REVIEW','WITHDRAWN','DRAFT') THEN TRUE + WHEN OLD.status = 'UNDER_REVIEW' AND NEW.status IN ('EVALUATED','WITHDRAWN') THEN TRUE + WHEN OLD.status = 'EVALUATED' AND NEW.status IN ('APPROVED','REJECTED') THEN TRUE + ELSE FALSE + END; + + IF NOT allowed THEN + RAISE EXCEPTION 'Invalid status transition: % → %', OLD.status, NEW.status; + END IF; + + -- Auto-set timestamps + IF NEW.status = 'SUBMITTED' AND OLD.status = 'DRAFT' THEN + NEW.submitted_at := NOW(); + END IF; + IF NEW.status IN ('APPROVED','REJECTED') THEN + NEW.decided_at := NOW(); + END IF; + + RETURN NEW; +END; +$$ LANGUAGE plpgsql; + +CREATE TRIGGER trg_app_state_machine + BEFORE UPDATE OF status ON applications + FOR EACH ROW EXECUTE FUNCTION enforce_application_transitions(); + + +-- ============================================================================= +-- CONVENIENCE VIEWS +-- ============================================================================= + +-- Dashboard: applications with author names 
and current evaluation average +CREATE VIEW v_application_summary AS +SELECT + a.application_id, + a.code, + a.title, + a.status, + a.registration_year, + u.name AS primary_unit_name, + (SELECT string_agg(usr.full_name, ', ' ORDER BY aa.display_order) + FROM application_authors aa + JOIN users usr ON usr.user_id = aa.user_id + WHERE aa.application_id = a.application_id) AS author_names, + (SELECT ROUND(AVG(total_score),2) + FROM evaluations WHERE application_id = a.application_id) AS avg_score, + (SELECT COUNT(*) FROM evaluations WHERE application_id = a.application_id) AS num_evaluations, + a.submitted_at, + a.decided_at +FROM applications a +LEFT JOIN units u ON u.unit_id = a.primary_unit_id +WHERE a.deleted_at IS NULL; + +-- Materialized view: annual approval statistics (refresh nightly) +CREATE MATERIALIZED VIEW mv_annual_stats AS +SELECT + registration_year, + COUNT(*) FILTER (WHERE status = 'APPROVED') AS approved, + COUNT(*) FILTER (WHERE status = 'REJECTED') AS rejected, + COUNT(*) FILTER (WHERE status NOT IN ('APPROVED','REJECTED')) AS pending, + COUNT(*) AS total +FROM applications +WHERE deleted_at IS NULL +GROUP BY registration_year; +CREATE UNIQUE INDEX ON mv_annual_stats(registration_year); diff --git a/Posgresdb/test_schema.sql b/Posgresdb/test_schema.sql new file mode 100644 index 0000000..73d7e1f --- /dev/null +++ b/Posgresdb/test_schema.sql @@ -0,0 +1,83 @@ +-- Validation tests: run in a single transaction per block +-- =========================================================== + +-- 1. 
SEED: units + users +INSERT INTO units(code, name, type) VALUES + ('DHYD', 'Đại học Y Dược TP.HCM', 'TRUONG'), + ('KHCN', 'Phòng Khoa học Công nghệ', 'PHONG'); + +INSERT INTO users(full_name, title, email, id_number, unit_id, qualification, user_type) VALUES + ('Trần Hùng', 'PGS.TS', 'tranhung@ump.edu.vn', '001001', 1, 'Tiến sĩ', 'AUTHOR'), + ('Đỗ Quốc Vũ', 'CN.', 'doquocvu@ump.edu.vn', '001002', 2, 'Cử nhân', 'AUTHOR'), + ('Nguyễn Hội đồng A', 'PGS.TS', 'hdA@ump.edu.vn', '002001', 1, 'Tiến sĩ', 'COUNCIL'); + +-- 2. CREATE an application in DRAFT state +INSERT INTO applications(code, title, registration_year, status, purpose, primary_unit_id, created_by) +VALUES ('SK-2025-001', + 'Quy trình xét duyệt Đạo đức trong nghiên cứu trên động vật', + 2025, 'DRAFT', + 'Chuẩn hoá quy trình xét duyệt hồ sơ', + 2, 2); + +-- 3. ADD authors with DEFERRED constraint (sums to 100 at COMMIT) +BEGIN; +INSERT INTO application_authors(application_id, user_id, contribution_pct, role) VALUES + (1, 1, 50, 'CO_AUTHOR'), + (1, 2, 50, 'PRIMARY'); +-- At this point sum=100, but app is DRAFT so constraint doesn't even care yet +COMMIT; + +-- Verify +SELECT 'Authors inserted:' AS step, count(*) FROM application_authors; + +-- 4. TRY to submit the application (DRAFT → SUBMITTED): needs classification +-- This should FAIL the check constraint because no classification flag is set +\echo 'Test 4: should FAIL (missing classification)' +UPDATE applications SET status='SUBMITTED' WHERE application_id=1; +\echo '' + +-- Fix and retry +UPDATE applications + SET is_technical_solution = TRUE, + status = 'SUBMITTED' + WHERE application_id = 1; +SELECT 'After submit:' AS step, status, submitted_at FROM applications WHERE application_id=1; + +-- 5. 
TRY invalid transition SUBMITTED → APPROVED (should FAIL) +\echo 'Test 5: should FAIL (illegal transition)' +UPDATE applications SET status='APPROVED' WHERE application_id=1; +\echo '' + +-- Valid transitions +UPDATE applications SET status='UNDER_REVIEW' WHERE application_id=1; + +-- 6. EVALUATOR scores the application +INSERT INTO evaluations(application_id, evaluator_id, novelty_score, effectiveness_score, conclusion) +VALUES (1, 3, 35, 50, 'Đề xuất công nhận'); + +SELECT 'Evaluation:' AS step, novelty_score, effectiveness_score, total_score FROM evaluations; + +-- 7. Move to EVALUATED → APPROVED +UPDATE applications SET status='EVALUATED' WHERE application_id=1; +UPDATE applications SET status='APPROVED' WHERE application_id=1; + +SELECT 'Final status:' AS step, status, decided_at IS NOT NULL AS has_decision_time + FROM applications WHERE application_id=1; + +-- 8. READ: summary view +SELECT code, title, status, author_names, avg_score, num_evaluations + FROM v_application_summary; + +-- 9. AUDIT trail: who changed what? +SELECT table_name, action, changed_at, + (new_data->>'status') AS new_status + FROM audit_log + WHERE table_name = 'applications' + ORDER BY log_id; + +-- 10. 
Bad contribution sum should fail at COMMIT +\echo 'Test 10: should FAIL (sum != 100 on submitted app)' +BEGIN; + UPDATE application_authors SET contribution_pct = 30 WHERE application_id=1 AND user_id=1; + -- sum is now 30+50=80 and the app is APPROVED (not DRAFT), so the deferred trigger rejects the COMMIT +COMMIT; diff --git a/README.md b/README.md new file mode 100644 index 0000000..d2b992a --- /dev/null +++ b/README.md @@ -0,0 +1,254 @@ +# Initiative Management System + +The platform consists of two main services plus an AI integration layer: + +- **Frontend**: React-based web application with TypeScript and Vite +- **Backend**: FastAPI-based REST API with Python 3.11 +- **AI Integration**: Ollama-powered document analysis and compliance checking + +## Project Structure + +``` +poc/ +├── fe0/ # Frontend service +│ ├── src/ # React application source +│ ├── public/ # Static assets +│ ├── package.json # Node.js dependencies +│ └── Dockerfile # Frontend container +├── be0/ # Backend service +│ ├── src/ # Python application source +│ ├── main.py # FastAPI application entry point +│ ├── requirements.txt # Python dependencies +│ └── Dockerfile # Backend container +├── assets/ # Shared resources and data +└── docker-compose.yml # Service orchestration +``` + +## Prerequisites + +- Docker 20.10+ +- Docker Compose 2.0+ +- Git + +## Quick Start + +1. **Clone and set up** + ```bash + git clone + cd poc + ``` + +2. **Start all services** + ```bash + docker-compose up --build + ``` + +3.
**Access the application** + - **Frontend**: http://localhost:8081 + - **Backend API**: http://localhost:4402 + - **API Documentation**: http://localhost:4402/docs + +## Development Setup + +### Frontend Development + +```bash +cd fe0 +npm install +npm run dev +``` + +**Available Scripts:** +- `npm run dev` - Start development server +- `npm run build` - Build for production +- `npm run preview` - Preview production build +- `npm run lint` - Run ESLint + +**Technology Stack:** +- React 18 with TypeScript +- Vite for build tooling +- Tailwind CSS for styling +- shadcn/ui component library +- React Router for navigation +- TanStack Query for state management + +### Backend Development + +```bash +cd be0 +pip install -r requirements.txt +uvicorn main:app --host 0.0.0.0 --port 4402 --reload +``` + +**Technology Stack:** +- FastAPI framework +- Python 3.11 +- Pydantic for data validation +- LangChain for AI workflows +- Ollama for local AI models +- PDF processing with PyPDF and Docling + +## API Documentation + +### Core Endpoints + +#### Workflow Management +- `POST /workflows` - Initialize new compliance workflow +- `GET /workflows/{workflow_id}` - Retrieve workflow status +- `PUT /workflows/{workflow_id}/items` - Update workflow items +- `POST /workflows/{workflow_id}/approvals` - Submit approvals +- `GET /workflows/{workflow_id}/report` - Generate status reports +- `POST /workflows/{workflow_id}/advance` - Progress to next phase + +#### Document Processing +- `POST /upload_document` - Upload and parse documents +- `POST /get_page` - Retrieve specific document pages +- `POST /test_ollama` - Test AI model connectivity + +#### System Health +- `GET /health` - Service health check +- `GET /` - API information and available endpoints + +### Request/Response Examples + +**Create Workflow:** +```json +POST /workflows +{ + "project_name": "ISO 27001 Implementation", + "project_description": "Implement ISO 27001 controls", + "records_officer_email": "officer@company.com" +} 
+``` + +**Update Workflow Item:** +```json +PUT /workflows/{workflow_id}/items +{ + "item_id": 1, + "status": "completed", + "comment": "Implementation completed", + "updated_by": "john.doe@company.com" +} +``` + +## Configuration + +### Environment Variables + +| Variable | Description | Default | +|----------|-------------|---------| +| `GENERIC_TIMEZONE` | Application timezone | `UTC` | +| `NVIDIA_VISIBLE_DEVICES` | GPU access for AI models | `all` | +| `NVIDIA_DRIVER_CAPABILITIES` | GPU capabilities | `compute,utility` | + +### Docker Network Configuration + +Services communicate via a custom Docker network (`profyt-net`) with static IP addressing: +- Frontend: `192.168.42.20` +- Backend: `192.168.42.22` + +## Features + +### Compliance Management +- **ISO 27001** compliance tracking and reporting +- **Records Management** integration workflows +- **Risk Assessment** tools and dashboards +- **Document Processing** with AI-powered analysis + +### Workflow Engine +- Multi-phase compliance workflows +- Approval management system +- Progress tracking and reporting +- Integration with external systems + +### AI-Powered Analysis +- Document parsing and content extraction +- Compliance gap analysis +- Automated report generation +- Natural language processing for policy analysis + +## Deployment + +### Production Deployment + +On the **application host** (SSH), from the repository root: + +1. **Secrets & config** + ```bash + cp .env.example .env + # Edit .env: PUBLIC_HOST, ports, MinIO and Postgres credentials (openssl rand -base64 32). + # Never commit `.env`. Postgres user/password apply only on FIRST empty DB volume — see `.env.example`. + ./scripts/verify-prod-env.sh + ``` + +2.
**Deploy (pull, build, recreate containers)** + ```bash + ./scripts/deploy-prod.sh + # Air-gapped / no registry pull: + # ./scripts/deploy-prod.sh --no-pull + ``` + + Or manually (pass the env file explicitly with `--env-file` if it is not named `.env` and placed next to the compose file): + ```bash + docker compose --env-file .env -f docker-compose.prod.yml pull + docker compose --env-file .env -f docker-compose.prod.yml up -d --build --remove-orphans + ``` + +3. **Smoke checks** (`FE_PORT` and API port come from `.env` / compose; API is `127.0.0.1:4402` in prod compose) + ```bash + # Replace 8081 with the FE_PORT value in .env when different. + curl -sf http://127.0.0.1:8081/ + curl -sf http://127.0.0.1:4402/health + ``` + +### Scaling Considerations + +- **Frontend**: Stateless, horizontally scalable +- **Backend**: Consider database persistence for production +- **AI Models**: A GPU is needed for optimal performance +- **Storage**: Implement proper file storage for documents + +## Monitoring and Logging + +### Application Logs +- Frontend logs: Available via Docker logs +- Backend logs: Stored in `be0/logs/` directory +- System logs: `docker-compose logs [service-name]` + +### Health Monitoring +- Health check endpoints available +- Docker health checks configured +- Log aggregation recommended for production + +## Security Considerations + +### Current Implementation +- CORS enabled for cross-origin requests +- Input validation via Pydantic models +- File upload restrictions + +### Production Recommendations +- Implement authentication/authorization +- Add rate limiting +- Enable HTTPS/TLS +- Implement proper secret management +- Add audit logging + +## Contributing + +1. Fork the repository +2. Create a feature branch (`git checkout -b feature/amazing-feature`) +3. Commit your changes (`git commit -m 'Add amazing feature'`) +4. Push to the branch (`git push origin feature/amazing-feature`) +5.
Open a Pull Request + +### Development Guidelines +- Follow TypeScript best practices +- Write comprehensive tests +- Update documentation for new features +- Follow conventional commit messages + +## License + +This project is licensed under the terms specified in the LICENSE file. diff --git a/be0/.env.example b/be0/.env.example new file mode 100644 index 0000000..d7902a3 --- /dev/null +++ b/be0/.env.example @@ -0,0 +1,29 @@ +# Copy to .env and adjust. docker-compose sets these for the be0 service when using the repo stack. +INITIATIVE_DATABASE_URL=postgresql+asyncpg://initiative:initiative_secret@localhost:15432/initiatives + +# S3 / MinIO — server-to-server (API → object store) +S3_ENDPOINT_URL=http://localhost:19000 +S3_ACCESS_KEY=minio_user +S3_SECRET_KEY=minio_password +S3_BUCKET_ATTACHMENTS=initiative-attachments +S3_BUCKET_EXPORTS=initiative-exports +S3_BUCKET_QUARANTINE=initiative-quarantine + +# Optional: HTTPS base for presigned URLs (must match public MinIO TLS host; see docs/minio-behind-https.md) +# S3_PUBLIC_ENDPOINT_URL=https://minio-api.example.com + +# Optional: comma-separated extra browser origins for CORS (merged with localhost defaults in main.py). +# In Docker dev stack, docker-compose.yml can set this; production compose adds your public UI URL automatically. +# CORS_ORIGINS=http://YOUR_LAN_IP:8081 + +# Local Python runs may load this file; Docker Compose uses the repo-root `.env` for ${SMTP_*} → be0. +# Password reset email (same SMTP block as `.env.example` beside docker-compose for dev stack.) +# OTP + reset use src/auth_mail.py: set SMTP_* for Option A or AUTH_MAIL_LOG_ONLY=1 locally. 
+# AUTH_MAIL_LOG_ONLY=1 +# AUTH_PUBLIC_WEB_ORIGIN=http://localhost:8081 +# SMTP_HOST=smtp.example.com +# SMTP_PORT=587 +# SMTP_USER= +# SMTP_PASSWORD= +# AUTH_MAIL_FROM=noreply@example.com +# SMTP_USE_TLS=1 diff --git a/be0/CHAT_ASSISTANT_README.md b/be0/CHAT_ASSISTANT_README.md new file mode 100644 index 0000000..36280c4 --- /dev/null +++ b/be0/CHAT_ASSISTANT_README.md @@ -0,0 +1,223 @@ +# Chat Assistant Module + +## Overview + +The Chat Assistant module provides a conversational AI interface for answering policy and compliance questions using Ollama. + +## Architecture + +### Backend (`be0/src/chat_assistant.py`) + +The `ChatAssistant` class provides: +- **Chat functionality**: Conversational AI for policy questions +- **Content verification**: Verify content against compliance requirements +- **Policy Q&A**: Answer questions about policies and compliance + +### Frontend (`fe0/src/features/chat/`) + +The frontend chat feature includes: +- **Service layer**: API communication with backend +- **React hooks**: Easy-to-use hooks for chat functionality +- **Type definitions**: TypeScript types for type safety + +## API Endpoints + +### 1. Chat Endpoint +``` +POST /api/v1/chat +``` + +**Request Body:** +```json +{ + "message": "What are ISO 27001 requirements?", + "conversation_history": [ + { + "role": "user", + "content": "Previous message" + }, + { + "role": "assistant", + "content": "Previous response" + } + ], + "context": "Optional context about policies" +} +``` + +**Response:** +```json +{ + "message": "ISO 27001 is an information security management system...", + "model": "gemma3:27b", + "tokens_used": 150 +} +``` + +### 2. 
Verify Content Endpoint +``` +POST /api/v1/chat/verify +``` + +**Form Data:** +- `field_name`: Name of the field being verified +- `content`: Content to verify +- `verification_criteria`: (Optional) Specific criteria to check + +**Response:** +```json +{ + "message": "The content meets compliance requirements...", + "model": "gemma3:27b", + "tokens_used": 200 +} +``` + +### 3. Policy Question Endpoint +``` +POST /api/v1/chat/question +``` + +**Form Data:** +- `question`: The user's question +- `policy_context`: (Optional) Context about specific policies + +**Response:** +```json +{ + "message": "Answer to the policy question...", + "model": "gemma3:27b", + "tokens_used": 180 +} +``` + +## Features + +### 1. Conversational Context +- Maintains conversation history for context-aware responses +- Keeps last 10 messages for context +- System prompt guides the assistant's behavior + +### 2. Policy Expertise +- Specialized in IT governance and compliance +- Knowledgeable about ISO 27001, NIST, GDPR, etc. +- Provides accurate, actionable advice + +### 3. Content Verification +- Analyzes content against compliance requirements +- Provides detailed feedback +- Suggests improvements + +## Usage + +### Backend + +```python +from src.chat_assistant import get_chat_assistant + +# Get chat assistant instance +assistant = get_chat_assistant() + +# Chat +request = ChatRequest( + message="What is ISO 27001?", + context="IT governance" +) +response = await assistant.chat(request) + +# Verify content +response = await assistant.verify_content( + field_name="Project Description", + content="Our project implements security controls..." 
+) +``` + +### Frontend + +```typescript +import { useChat } from '@/features/chat/hooks/useChat'; + +const { sendMessage, verifyContent, isLoading } = useChat(); + +// Send a message +const response = await sendMessage( + "What are compliance requirements?", + conversationHistory, // Optional + "ISO 27001 context" // Optional +); + +// Verify content +const verification = await verifyContent( + "Project Name", + "Project content to verify" +); +``` + +## Configuration + +### Model Selection + +The default model is `gemma3:27b`. To change it: + +```python +# In chat_assistant.py +assistant = ChatAssistant(model_name="your-model-name") +``` + +### System Prompt + +The system prompt can be customized in the `ChatAssistant.__init__` method to change the assistant's behavior and expertise. + +## Logging + +All chat interactions are logged to: +- `be0/logs/ChatAssistant.log` + +This helps with debugging and monitoring. + +## Error Handling + +The module includes comprehensive error handling: +- Catches and logs all exceptions +- Returns user-friendly error messages +- Raises HTTPException for API errors + +## Testing + +To test the chat assistant: + +1. **Start the backend:** + ```bash + cd be0 + docker-compose up be0 + ``` + +2. **Test via API:** + ```bash + curl -X POST http://localhost:4402/api/v1/chat \ + -H "Content-Type: application/json" \ + -d '{"message": "What is ISO 27001?"}' + ``` + +3. **Test via Frontend:** + - Open the Dashboard + - Use the ChatAssistant component + - Ask questions or verify content + +## Integration + +The ChatAssistant is integrated with: +- **ChatAssistant.tsx**: React component in the Dashboard +- **useChat hook**: React hook for chat functionality +- **chatService**: API service layer + +## Future Enhancements + +Potential improvements: +1. Streaming responses for real-time text generation +2. Multi-turn conversation management +3. Document context injection +4. Voice input/output +5. Response rating and feedback +6. 
Conversation export +7. Custom model fine-tuning diff --git a/be0/Dockerfile b/be0/Dockerfile new file mode 100644 index 0000000..752cf2e --- /dev/null +++ b/be0/Dockerfile @@ -0,0 +1,34 @@ +FROM python:3.11 + +# Set the working directory +WORKDIR /app + +# Copy the requirements file +COPY ./requirements.txt /app/ + +# Install dependencies and set up Python environment +RUN apt-get update && apt-get install -y --no-install-recommends \ + zstd \ + curl \ + git \ + build-essential \ + python3-pip \ + libreoffice-writer-nogui \ + && rm -rf /var/lib/apt/lists/* + +# RUN curl -fsSL https://ollama.com/install.sh | sh + + +RUN pip install --upgrade pip + +WORKDIR /app + +RUN pip install --no-cache-dir -r requirements.txt +RUN pip install nltk +# Avoid runtime GitHub downloads (slow/hanging in some networks) before Uvicorn starts. +RUN python3 -m nltk.downloader punkt punkt_tab stopwords averaged_perceptron_tagger_eng wordnet + +COPY . /app/ + +EXPOSE 4402 +ENTRYPOINT ["/app/entrypoint.sh"] \ No newline at end of file diff --git a/be0/GOVERNANCE_LAYER_STATUS.md b/be0/GOVERNANCE_LAYER_STATUS.md new file mode 100644 index 0000000..51f0dd8 --- /dev/null +++ b/be0/GOVERNANCE_LAYER_STATUS.md @@ -0,0 +1,172 @@ +# Governance Layer Status in be0 + +## Current State + +### ✅ What EXISTS (Current Implementation) + +The current `be0` codebase has: + +1. **Basic Workflow System** (`src/domain/entities/workflow.py`, `src/application/services/workflow_service.py`) + - SDLC/RM Integration workflow + - Phase-based progression + - Task/checklist management + - **Location**: `be0/src/domain/entities/workflow.py` + +2. **Compliance Verification** (`src/compliance_verifier.py`) + - Ollama-based compliance checking + - Text generation and similarity analysis + - **Location**: `be0/src/compliance_verifier.py` + +3. **Chat Assistant** (`src/chat_assistant.py`) + - Policy Q&A functionality + - Content verification + - **Location**: `be0/src/chat_assistant.py` + +4. 
**Architecture Foundation** + - Domain/Application/Infrastructure layers + - Repository pattern + - API routes structure + - **Location**: `be0/src/domain/`, `be0/src/application/`, `be0/src/api/` + +--- + +## ❌ What's MISSING (Governance Layer for Initiatives) + +The **Grassroots Initiative Recognition System** governance layer has **NOT been implemented yet**. + +### Missing Components: + +#### 1. **Initiative Management** +- ❌ Initiative entity (initiative_id, group_type, status, etc.) +- ❌ Author management (contribution percentages, lead author logic) +- ❌ Unit/Appraisal Team entities +- **Should be in**: `be0/src/domain/entities/initiative.py` + +#### 2. **Business Rules Engine** +- ❌ Novelty checker (duplicate detection) +- ❌ Scoring algorithm (Group 01 dual/triple reviewer) +- ❌ Auto-classification (Group 02) +- ❌ Author contribution validator +- **Should be in**: `be0/src/domain/rules/` or `be0/src/application/rules/` + +#### 3. **Workflow State Machine** +- ❌ Initiative state transitions (DRAFT → SUBMITTED → UNIT_REVIEW → etc.) +- ❌ Deadline enforcement +- ❌ SLA tracking +- **Should be in**: `be0/src/application/state_machine.py` or `be0/src/domain/workflows/initiative_workflow.py` + +#### 4. **Review Management** +- ❌ Review assignment logic +- ❌ Blind review enforcement +- ❌ Score conflict detection +- ❌ Reviewer assignment service +- **Should be in**: `be0/src/application/services/review_service.py` + +#### 5. **Document Management** +- ❌ Form templates (Form 01, 03, 05, 06) +- ❌ Document versioning +- ❌ File storage integration +- **Should be in**: `be0/src/infrastructure/storage/` + +#### 6. 
**API Endpoints** +- ❌ `/api/v1/initiatives` (CRUD) +- ❌ `/api/v1/initiatives/{id}/submit` +- ❌ `/api/v1/initiatives/{id}/reviews` +- ❌ `/api/v1/reviews/{review_id}/score` +- ❌ `/api/v1/initiatives/{id}/appeal` +- **Should be in**: `be0/src/api/routes/initiatives.py` + +--- + +## Recommended Structure for Governance Layer + +``` +be0/src/ +├── domain/ +│ ├── entities/ +│ │ ├── initiative.py # ❌ MISSING +│ │ ├── author.py # ❌ MISSING +│ │ ├── review.py # ❌ MISSING +│ │ ├── unit.py # ❌ MISSING +│ │ └── appraisal_team.py # ❌ MISSING +│ ├── rules/ +│ │ ├── novelty_checker.py # ❌ MISSING +│ │ ├── scoring_engine.py # ❌ MISSING +│ │ ├── duplicate_detector.py # ❌ MISSING +│ │ └── classification_engine.py # ❌ MISSING +│ └── workflows/ +│ └── initiative_workflow.py # ❌ MISSING +├── application/ +│ ├── services/ +│ │ ├── initiative_service.py # ❌ MISSING +│ │ ├── review_service.py # ❌ MISSING +│ │ ├── notification_service.py # ❌ MISSING +│ │ └── deadline_service.py # ❌ MISSING +│ └── state_machine.py # ❌ MISSING +├── infrastructure/ +│ ├── storage/ +│ │ └── file_storage.py # ❌ MISSING +│ └── database/ +│ └── models.py # ❌ MISSING (SQLAlchemy models) +└── api/ + └── routes/ + ├── initiatives.py # ❌ MISSING + ├── reviews.py # ❌ MISSING + └── reports.py # ❌ MISSING +``` + +--- + +## What to Build Next + +Based on the simplified tech stack we discussed, here's the implementation order: + +### Phase 1: Core Entities & Database +1. Create database models (PostgreSQL) +2. Create domain entities (Initiative, Author, Review, etc.) +3. Create repository interfaces + +### Phase 2: Business Rules +1. Novelty checker (using PostgreSQL pg_trgm) +2. Scoring engine +3. Auto-classification logic + +### Phase 3: Workflow +1. State machine implementation +2. Transition rules +3. Deadline tracking + +### Phase 4: API & Services +1. Initiative service +2. Review service +3. API endpoints +4. Document upload + +--- + +## Current vs. 
Required + +| Component | Current | Required | Status | +|-----------|---------|----------|--------| +| Workflow (SDLC) | ✅ | ✅ | Implemented | +| Initiative Management | ❌ | ✅ | **Missing** | +| Business Rules | ❌ | ✅ | **Missing** | +| Review System | ❌ | ✅ | **Missing** | +| State Machine | ❌ | ✅ | **Missing** | +| Document Storage | ❌ | ✅ | **Missing** | +| Scoring Engine | ❌ | ✅ | **Missing** | + +--- + +## Next Steps + +To implement the governance layer: + +1. **Start with database schema** - Create PostgreSQL tables for initiatives, authors, reviews +2. **Create domain entities** - Python classes for Initiative, Author, Review +3. **Implement business rules** - Novelty checker, scoring engine +4. **Build state machine** - Workflow transitions +5. **Create API endpoints** - RESTful APIs for frontend +6. **Add document storage** - Local filesystem integration + +The foundation (layered architecture, FastAPI, PostgreSQL) is already in place - you just need to build the governance-specific components on top of it. diff --git a/be0/TROUBLESHOOTING_CHAT.md b/be0/TROUBLESHOOTING_CHAT.md new file mode 100644 index 0000000..b510dfc --- /dev/null +++ b/be0/TROUBLESHOOTING_CHAT.md @@ -0,0 +1,150 @@ +# Chat Assistant Troubleshooting Guide + +## Common Errors and Solutions + +### Error: 500 Internal Server Error + +This usually indicates one of the following issues: + +#### 1. Ollama Not Running + +**Symptoms:** +- 500 error on `/api/v1/chat` +- Error message mentions "connection" or "refused" + +**Solution:** +```bash +# Check if Ollama is running in the container +docker exec be0 ps aux | grep ollama + +# If not running, restart the container +docker-compose restart be0 + +# Or start Ollama manually, detached inside the container +docker exec -d be0 ollama serve +``` + +#### 2.
Model Not Available + +**Symptoms:** +- Error mentions "model not found" +- Model name mismatch + +**Solution:** +```bash +# Check available models +docker exec be0 ollama list + +# Pull the required model +docker exec be0 ollama pull gemma3:270M + +# Verify model is available +docker exec be0 ollama list | grep gemma3 +``` + +#### 3. Model Name Mismatch + +**Issue:** Code uses `gemma3:27b` but entrypoint pulls `gemma3:270M` + +**Solution:** +The code has been updated to use `gemma3:270M` to match the entrypoint script. + +#### 4. Network Connectivity + +**Symptoms:** +- Connection refused errors +- Timeout errors + +**Solution:** +```bash +# Check if Ollama is accessible from within the container +docker exec be0 curl http://localhost:11434/api/tags + +# Check Ollama service status +docker exec be0 ollama list +``` + +## Diagnostic Endpoints + +### Health Check +```bash +curl http://localhost:4402/health +``` + +This will show: +- Overall service status +- Ollama connection status +- Available models + +### Test Ollama Directly +```bash +# From inside the container +docker exec be0 ollama run gemma3:270M "Hello" +``` + +## Debugging Steps + +1. **Check Backend Logs:** + ```bash + docker-compose logs be0 | tail -50 + ``` + +2. **Check Chat Assistant Logs:** + ```bash + tail -f be0/logs/ChatAssistant.log + ``` + +3. **Test API Endpoint:** + ```bash + curl -X POST http://localhost:4402/api/v1/chat \ + -H "Content-Type: application/json" \ + -d '{"message": "Hello"}' + ``` + +4. 
**Verify Ollama Service:** + ```bash + docker exec be0 ollama list + docker exec be0 curl http://localhost:11434/api/tags + ``` + +## Common Fixes + +### Fix 1: Restart Ollama Service +```bash +docker exec be0 pkill ollama +docker exec -d be0 ollama serve +sleep 2 +docker exec be0 ollama list +``` + +### Fix 2: Pull Missing Model +```bash +docker exec be0 ollama pull gemma3:270M +``` + +### Fix 3: Restart Container +```bash +docker-compose restart be0 +``` + +### Fix 4: Rebuild Container +```bash +docker-compose down +docker-compose build be0 +docker-compose up be0 +``` + +## Expected Behavior + +When working correctly: +1. Health endpoint shows Ollama as "connected" +2. Available models list includes `gemma3:270M` +3. Chat endpoint returns 200 with a response +4. Logs show successful message processing + +## Still Having Issues? + +1. Check the full error in logs: `docker-compose logs be0` +2. Verify Ollama is running: `docker exec be0 ps aux | grep ollama` +3. Test Ollama directly: `docker exec be0 ollama run gemma3:270M "test"` +4.
Check model availability: `docker exec be0 ollama list` diff --git a/be0/__init__.py b/be0/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/__pycache__/__init__.cpython-313.pyc b/be0/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..e03dbea Binary files /dev/null and b/be0/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/__pycache__/main.cpython-311.pyc b/be0/__pycache__/main.cpython-311.pyc new file mode 100644 index 0000000..ad674b6 Binary files /dev/null and b/be0/__pycache__/main.cpython-311.pyc differ diff --git a/be0/__pycache__/main.cpython-313.pyc b/be0/__pycache__/main.cpython-313.pyc new file mode 100644 index 0000000..55aeb1b Binary files /dev/null and b/be0/__pycache__/main.cpython-313.pyc differ diff --git a/be0/assets/data/pdf/UNObank_IT Governance and Risk Management Policy v1.5.pdf b/be0/assets/data/pdf/UNObank_IT Governance and Risk Management Policy v1.5.pdf new file mode 100644 index 0000000..cbc3fa1 Binary files /dev/null and b/be0/assets/data/pdf/UNObank_IT Governance and Risk Management Policy v1.5.pdf differ diff --git a/be0/entrypoint.sh b/be0/entrypoint.sh new file mode 100755 index 0000000..1a2ebcb --- /dev/null +++ b/be0/entrypoint.sh @@ -0,0 +1,46 @@ +#!/bin/bash + +if command -v ollama >/dev/null 2>&1; then + echo "Starting Ollama server..." + ollama serve & + sleep 1 +else + echo "Ollama not installed in this image; skipping." +fi + +# if ! ollama list | grep -q "qwen2.5:3b"; then +# echo "Model qwen2.5:3b not found. Pulling..." +# ollama pull qwen2.5:3b + +# else +# echo "Model qwen2.5:3b already exists. Skipping pull." +# fi + +# #download embedding model +# if ! ollama list | grep -q "embeddinggemma:300m"; then +# echo "Model embeddinggemma:300m not found. Pulling..." +# ollama pull embeddinggemma:300m + +# else +# echo "Model embeddinggemma:300m already exists. Skipping pull." +# fi + +# NLTK corpora are installed when the image is built (see Dockerfile). 
+# Bind mount overwrites /app; image site-packages may be stale vs mounted requirements.txt. +if [ -f /app/requirements.txt ]; then + echo "Installing/updating Python deps from mounted /app/requirements.txt..." + pip install --no-cache-dir -r /app/requirements.txt || { + echo "ERROR: pip install -r /app/requirements.txt failed; fix deps and restart be0." + exit 1 + } +fi + +echo "Applying idempotent initiative DB migrations (008–014 incl. registration_otp_codes) if needed..." +python /app/scripts/apply_initiative_migrations.py || echo "WARNING: apply_initiative_migrations exited non-zero — check be0 logs (API may return 503 for evidence/artifacts until DB is fixed)." + +echo "Starting FastAPI..." +if [ "${UVICORN_RELOAD:-0}" = "1" ]; then + exec uvicorn main:app --host 0.0.0.0 --port 4402 --reload +else + exec uvicorn main:app --host 0.0.0.0 --port 4402 +fi diff --git a/be0/main.py b/be0/main.py new file mode 100644 index 0000000..22af1b9 --- /dev/null +++ b/be0/main.py @@ -0,0 +1,3726 @@ +from fastapi import FastAPI, HTTPException, BackgroundTasks, Request, Query +from fastapi.responses import JSONResponse, Response, StreamingResponse +from starlette.staticfiles import StaticFiles +from starlette.requests import Request as StarletteRequest +from pydantic import BaseModel, Field +import unicodedata +from typing import Dict, List, Optional, Any, Literal +from datetime import datetime, timezone +from fastapi import File, UploadFile, Form, Header, Body # type: ignore + +from src.auth_jwt import decode_access_token_user_id, decode_bearer_token + +from src.admin_audit_routes import router as admin_audit_router +from src.admin_user_profile_routes import router as admin_user_profile_router +from src.template_routes import router as template_router +from src.research_routes import router as research_router +from src.imagehub_routes import router as imagehub_router + +from fastapi.middleware.cors import CORSMiddleware # type: ignore + +from src.utils import 
initialize_a_logger +import ollama +import numpy as np +from src.internal_control.it_governance.document_io import DocumentIO +np.random.seed(42) +from pathlib import Path +import uuid +import json +import hashlib +import asyncio +from enum import Enum +from pydantic import BaseModel, Field, validator +import os +import subprocess +import sys +import yaml + +# Import the workflow (assuming it's in rm_workflow.py) +from langgraph.graph import StateGraph, START, END +from typing import TypedDict, Literal +import numpy as np +from src.compliance_verifier import Compliance_Verifier, ComplianceRequest, PromptRequest +from src.chat_assistant import ChatAssistant, ChatRequest, ChatResponse, get_chat_assistant +from src.auth_api import router as auth_api_router + +# Re-define the state and workflow components for FastAPI +class RMIntegrationState(TypedDict): + current_phase: str + phase_number: int + checklist_items: List[dict] + completed_items: List[int] + pending_approvals: List[str] + records_officer_involved: bool + project_status: str + comments: dict + validation_results: dict + next_phase_ready: bool + +# Pydantic models for API requests/responses +class TaskStatus(str, Enum): + PENDING = "pending" + IN_PROGRESS = "in_progress" + COMPLETED = "completed" + BLOCKED = "blocked" + +class WorkflowInitRequest(BaseModel): + project_name: str = Field(..., description="Name of the project") + project_description: Optional[str] = Field(None, description="Project description") + records_officer_email: Optional[str] = Field(None, description="Records officer contact") + +class UpdateItemRequest(BaseModel): + item_id: int = Field(..., description="ID of the checklist item to update") + status: TaskStatus = Field(..., description="New status of the item") + comment: Optional[str] = Field(None, description="Comment about the update") + updated_by: Optional[str] = Field(None, description="Who updated the item") + +class ApprovalRequest(BaseModel): + approval_type: str = Field(..., 
description="Type of approval") + approved: bool = Field(..., description="Whether approved or rejected") + approver: str = Field(..., description="Who provided the approval") + comment: Optional[str] = Field(None, description="Approval comment") + +class WorkflowResponse(BaseModel): + workflow_id: str + current_phase: str + phase_number: int + project_status: str + completion_percentage: float + pending_approvals: List[str] + next_phase_ready: bool + timestamp: str + +class StatusReport(BaseModel): + workflow_id: str + current_phase: str + phase_number: int + completion_percentage: float + completed_items: int + total_items: int + pending_approvals: List[str] + validation_results: Dict[str, str] + project_status: str + checklist_items: List[dict] + timestamp: str + +# Chat Assistant Request Models +class VerifyContentRequest(BaseModel): + """Request model for content verification.""" + field_name: str + content: str + verification_criteria: Optional[str] = None + +class PolicyQuestionRequest(BaseModel): + """Request model for policy questions.""" + question: str + policy_context: Optional[str] = None + +# In-memory storage for workflows (in production, use a database) +workflows_storage: Dict[str, Dict[str, Any]] = {} + + +# try: +# logger = initialize_a_logger() +# logger.info("Logger initialized successfully") # Test it immediately +# except Exception as e: +# import logging +# logging.basicConfig(level=logging.INFO) +# logger = logging.getLogger() +# logger.error(f"Logger initialization failed: {e}", exc_info=True) + +logger = initialize_a_logger('./logs/main.log') + +# FastAPI app initialization +app = FastAPI( + title="RM Integration SDLC Workflow API", + description="API for managing Records Management integration into System Development Life Cycle", + version="1.0.0" +) + +app.include_router(auth_api_router, prefix="/api/v1") +app.include_router(admin_audit_router, prefix="/api/v1") +app.include_router(admin_user_profile_router, prefix="/api/v1") 
+app.include_router(template_router, prefix="/api/v1") +app.include_router(research_router, prefix="/api/v1") +app.include_router(imagehub_router, prefix="/api/v1") + +APP_ROOT_DIR = Path(__file__).resolve().parent + + +def _resolved_frontend_public_dir() -> Path: + """`assets` at repo root locally, or `/app/assets` when mounted in Docker.""" + env = os.getenv("APPLICATION_REPORT_EXPORT_DIR") + if env: + return Path(env) + app_assets = APP_ROOT_DIR / "assets" + if app_assets.is_dir(): + return app_assets.resolve() + return (APP_ROOT_DIR.parent / "assets").resolve() + + +def _resolved_application_draft_dir() -> Path: + """ + fe0/public/application-drafts when running from repo; `/app/assets/application-drafts` in Docker + (see docker-compose `./assets:/app/assets`). Using APP_ROOT_DIR.parent/fe0 breaks inside Docker + (parent is `/`, producing `/fe0/...`). + """ + env = os.getenv("APPLICATION_DRAFT_DIR") + if env: + return Path(env) + fe0 = APP_ROOT_DIR.parent / "fe0" + if fe0.is_dir(): + return (fe0 / "public" / "application-drafts").resolve() + return (APP_ROOT_DIR / "assets" / "application-drafts").resolve() + + +FRONTEND_PUBLIC_DIR = _resolved_frontend_public_dir() +APPLICATION_DRAFT_DIR = _resolved_application_draft_dir() + + +def _load_application_draft_yaml(case_id: str) -> Optional[Dict[str, Any]]: + """Load draft from disk; try current path and legacy Docker mis-resolved path.""" + name = f"{case_id}.yml" + candidates = [ + APPLICATION_DRAFT_DIR / name, + APP_ROOT_DIR.parent / "fe0" / "public" / "application-drafts" / name, + Path("/fe0/public/application-drafts") / name, + ] + seen: set[str] = set() + for path in candidates: + try: + key = str(path.resolve(strict=False)) + except (OSError, RuntimeError): + key = str(path) + if key in seen: + continue + seen.add(key) + if path.is_file(): + with open(path, "r", encoding="utf-8") as handle: + data = yaml.safe_load(handle) or {} + if isinstance(data, dict): + return data + return None + + +def 
_empty_applicant_draft_bundle(case_id: str) -> Dict[str, Any]: + """ + Client-generated case IDs can exist in sessionStorage before any POST save. + DB reset / no YAML yet must not 404 — return the same shape as save/load. + """ + return { + "caseId": case_id, + "updatedAt": datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z"), + "tabs": {}, + } + + +class ApplicationDraftSaveRequest(BaseModel): + caseId: Optional[str] = None + tab: str + data: Dict[str, Any] + + +class ReviewDocumentSaveRequest(BaseModel): + caseId: str = Field(..., min_length=1, max_length=128) + officialBieuMau: Dict[str, Any] = Field(default_factory=dict) + templateData: Optional[Dict[str, Any]] = None + fullBundle: Optional[Dict[str, Any]] = None + + +class ReviewDocumentUpdateRequest(BaseModel): + officialBieuMau: Dict[str, Any] = Field(default_factory=dict) + templateData: Optional[Dict[str, Any]] = None + fullBundle: Optional[Dict[str, Any]] = None + + +class PdfLayoutEditPayload(BaseModel): + id: str = Field(default="", max_length=128) + text: str = Field(default="", max_length=20_000) + page: int = Field(default=1, ge=1, le=300) + x: float = 0 + y: float = 0 + fontSize: float = Field(default=12, ge=1, le=256) + lineHeight: float = Field(default=14, ge=1, le=512) + boxWidth: float = Field(default=240, ge=1, le=5_000) + letterSpacing: float = Field(default=0, ge=-50, le=200) + textAlign: Literal["left", "center", "right"] = "left" + fontName: Literal["TimesRoman", "TimesRomanBold", "Helvetica", "HelveticaBold"] = "TimesRoman" + colorHex: str = Field(default="#111827", max_length=16) + + +class UpdateSubmittedApplicationBody(BaseModel): + """Applicant history panel: edit name + submission date (mirrors fe0 ApplicantHistoryCrudDialog).""" + + name: str = Field(..., min_length=1, max_length=500) + submittedDate: str = Field(..., min_length=4, max_length=40) + + +class CreateSubmittedApplicationBody(BaseModel): + """Create a new shell record for applicant and 
immediately allocate `applicationId`.""" + + name: Optional[str] = Field(default=None, max_length=500) + + +def _cors_allow_origins() -> List[str]: + """Localhost defaults plus optional comma-separated `CORS_ORIGINS` (e.g. http://VM_IP:8081).""" + base = [ + "http://localhost:8081", + "http://localhost:8080", + "http://localhost:3000", + "http://127.0.0.1:8081", + "http://127.0.0.1:8080", + "http://127.0.0.1:3000", + ] + extra = os.getenv("CORS_ORIGINS", "").strip() + if not extra: + return base + merged = list(base) + for part in extra.split(","): + o = part.strip() + if o and o not in merged: + merged.append(o) + return merged + + +CORS_ALLOW_ORIGINS = _cors_allow_origins() +if "*" in CORS_ALLOW_ORIGINS: + raise RuntimeError("CORS_ORIGINS must not include '*' when allow_credentials=True") + +# document_parser = DocumentIO() + +app.add_middleware( + CORSMiddleware, + allow_origins=CORS_ALLOW_ORIGINS, + allow_credentials=True, + allow_methods=["GET", "POST", "PUT", "DELETE", "PATCH", "OPTIONS"], + allow_headers=["*"], + expose_headers=["*"], +) + + +@app.middleware("http") +async def _credential_version_middleware(request: StarletteRequest, call_next): + from src.auth_credential_middleware import auth_credential_version_middleware + + return await auth_credential_version_middleware(request, call_next) + + +@app.middleware("http") +async def _security_headers_middleware(request: StarletteRequest, call_next): + response = await call_next(request) + response.headers.setdefault("X-Content-Type-Options", "nosniff") + response.headers.setdefault("X-Frame-Options", "DENY") + response.headers.setdefault("Referrer-Policy", "strict-origin-when-cross-origin") + if os.getenv("ENVIRONMENT", "").lower() == "production": + response.headers.setdefault( + "Strict-Transport-Security", "max-age=31536000; includeSubDomains" + ) + return response + + +@app.on_event("startup") +async def _initiative_db_startup(): + from src.initiative_db.engine import init_engine, is_postgres_enabled + 
+ if is_postgres_enabled(): + await init_engine() + logger.info("Initiative PostgreSQL persistence enabled") + mig_script = APP_ROOT_DIR / "scripts" / "apply_initiative_migrations.py" + if mig_script.is_file(): + try: + proc = await asyncio.create_subprocess_exec( + sys.executable, + str(mig_script), + cwd=str(APP_ROOT_DIR), + env=os.environ.copy(), + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE, + ) + out, err = await asyncio.wait_for(proc.communicate(), timeout=120) + if out.strip(): + logger.info( + "apply_initiative_migrations: %s", + out.decode("utf-8", errors="replace").strip(), + ) + if err.strip(): + logger.warning( + "apply_initiative_migrations stderr: %s", + err.decode("utf-8", errors="replace").strip(), + ) + if proc.returncode != 0: + logger.warning( + "apply_initiative_migrations exited code %s (e.g. missing registration_otp_codes)", + proc.returncode, + ) + except Exception as exc: + logger.warning("apply_initiative_migrations could not run: %s", exc) + + try: + from src.minio.storage import S3Storage, settings as _s3s + + await S3Storage(_s3s).ensure_buckets_exist() + logger.info("MinIO/S3 buckets ensured (attachments/exports/quarantine).") + except Exception as exc: + logger.warning("MinIO/S3 bucket init skipped (configure S3_* env to enable evidence uploads): %s", exc) + + +@app.on_event("shutdown") +async def _initiative_db_shutdown(): + from src.initiative_db.engine import dispose_engine, is_postgres_enabled + + if is_postgres_enabled(): + await dispose_engine() + + +logger.info(f"parser start") +Compliance_Verifier = Compliance_Verifier() +Chat_Assistant = get_chat_assistant() + +# Import or redefine the workflow functions from the previous artifact +def phase1_concept_development(state: RMIntegrationState) -> RMIntegrationState: + """Phase 1: Concept Development - Initial records planning""" + + phase1_checklist = [ + { + "id": 1, + "task": "Include Records Officer in system design process", + "status": "pending", + 
"requires_approval": True, + "approver": "Records Officer" + }, + { + "id": 2, + "task": "Identify records that support the business process", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 3, + "task": "Evaluate current record schedules applicability", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 4, + "task": "Determine if new record schedule is required", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 5, + "task": "Obtain Records Officer signature on Investment Summary Proposal", + "status": "pending", + "requires_approval": True, + "approver": "Records Officer" + } + ] + + state["current_phase"] = "Concept Development" + state["phase_number"] = 1 + state["checklist_items"] = phase1_checklist + state["pending_approvals"] = ["Records Officer - System Design", "Records Officer - Investment Summary"] + + return state + +def create_rm_integration_workflow(): + """Simplified workflow creation for FastAPI""" + # This would include all the phase functions from the previous artifact + # For brevity, I'm showing the structure + workflow = StateGraph(RMIntegrationState) + # Add all nodes and edges as in the previous implementation + return workflow + +@app.get("/") +async def root(): + """Root endpoint with API information""" + return { + "message": "RM Integration SDLC Workflow API", + "version": "1.0.0", + "endpoints": { + "POST /workflows": "Create new workflow", + "GET /workflows/{workflow_id}": "Get workflow status", + "PUT /workflows/{workflow_id}/items": "Update checklist item", + "POST /workflows/{workflow_id}/approvals": "Submit approval", + "GET /workflows/{workflow_id}/report": "Get detailed status report", + "POST /workflows/{workflow_id}/advance": "Advance to next phase", + "GET /workflows": "List all workflows" + } + } + +@app.post("/workflows", response_model=WorkflowResponse) +async def create_workflow(request: 
WorkflowInitRequest): + """Create a new RM integration workflow""" + + workflow_id = str(uuid.uuid4()) + + # Initialize workflow state + initial_state = { + "current_phase": "", + "phase_number": 0, + "checklist_items": [], + "completed_items": [], + "pending_approvals": [], + "records_officer_involved": False, + "project_status": "Starting RM Integration Process", + "comments": {}, + "validation_results": {}, + "next_phase_ready": False + } + + # Start with Phase 1 + state = phase1_concept_development(initial_state) + + # Store workflow + workflows_storage[workflow_id] = { + "state": state, + "metadata": { + "project_name": request.project_name, + "project_description": request.project_description, + "records_officer_email": request.records_officer_email, + "created_at": datetime.now().isoformat(), + "last_updated": datetime.now().isoformat() + } + } + + completed_count = len([item for item in state["checklist_items"] if item["status"] == "completed"]) + total_count = len(state["checklist_items"]) + completion_percentage = (completed_count / total_count * 100) if total_count > 0 else 0 + + return WorkflowResponse( + workflow_id=workflow_id, + current_phase=state["current_phase"], + phase_number=state["phase_number"], + project_status=state["project_status"], + completion_percentage=completion_percentage, + pending_approvals=state["pending_approvals"], + next_phase_ready=state["next_phase_ready"], + timestamp=datetime.now().isoformat() + ) + +@app.get("/workflows/{workflow_id}", response_model=WorkflowResponse) +async def get_workflow_status(workflow_id: str): + """Get current workflow status""" + + if workflow_id not in workflows_storage: + raise HTTPException(status_code=404, detail="Workflow not found") + + state = workflows_storage[workflow_id]["state"] + + completed_count = len([item for item in state["checklist_items"] if item["status"] == "completed"]) + total_count = len(state["checklist_items"]) + completion_percentage = (completed_count / total_count * 
100) if total_count > 0 else 0 + + return WorkflowResponse( + workflow_id=workflow_id, + current_phase=state["current_phase"], + phase_number=state["phase_number"], + project_status=state["project_status"], + completion_percentage=completion_percentage, + pending_approvals=state["pending_approvals"], + next_phase_ready=state["next_phase_ready"], + timestamp=datetime.now().isoformat() + ) + +@app.put("/workflows/{workflow_id}/items") +async def update_checklist_item(workflow_id: str, request: UpdateItemRequest): + """Update a checklist item status""" + + if workflow_id not in workflows_storage: + raise HTTPException(status_code=404, detail="Workflow not found") + + state = workflows_storage[workflow_id]["state"] + + # Find and update the item + item_found = False + for item in state["checklist_items"]: + if item["id"] == request.item_id: + item["status"] = request.status.value + if request.status == TaskStatus.COMPLETED and request.item_id not in state["completed_items"]: + state["completed_items"].append(request.item_id) + item_found = True + break + + if not item_found: + raise HTTPException(status_code=404, detail="Checklist item not found") + + # Add comment if provided + if request.comment: + state["comments"][request.item_id] = { + "comment": request.comment, + "updated_by": request.updated_by, + "timestamp": datetime.now().isoformat() + } + + # Update metadata + workflows_storage[workflow_id]["metadata"]["last_updated"] = datetime.now().isoformat() + + return {"message": f"Item {request.item_id} updated successfully", "status": request.status.value} + +@app.post("/workflows/{workflow_id}/approvals") +async def submit_approval(workflow_id: str, request: ApprovalRequest): + """Submit an approval for the workflow""" + + if workflow_id not in workflows_storage: + raise HTTPException(status_code=404, detail="Workflow not found") + + state = workflows_storage[workflow_id]["state"] + + # Remove approval from pending if approved + if request.approved and 
request.approval_type in state["pending_approvals"]: + state["pending_approvals"].remove(request.approval_type) + + # Log the approval + approval_key = f"approval_{len(state.get('approval_log', []))}" + if "approval_log" not in state: + state["approval_log"] = [] + + state["approval_log"].append({ + "approval_type": request.approval_type, + "approved": request.approved, + "approver": request.approver, + "comment": request.comment, + "timestamp": datetime.now().isoformat() + }) + + # Update metadata + workflows_storage[workflow_id]["metadata"]["last_updated"] = datetime.now().isoformat() + + return { + "message": f"Approval {'granted' if request.approved else 'rejected'} for {request.approval_type}", + "pending_approvals": state["pending_approvals"] + } + +@app.get("/workflows/{workflow_id}/report", response_model=StatusReport) +async def get_detailed_report(workflow_id: str): + """Get detailed status report for a workflow""" + + if workflow_id not in workflows_storage: + raise HTTPException(status_code=404, detail="Workflow not found") + + state = workflows_storage[workflow_id]["state"] + metadata = workflows_storage[workflow_id]["metadata"] + + completed_count = len([item for item in state["checklist_items"] if item["status"] == "completed"]) + total_count = len(state["checklist_items"]) + completion_percentage = (completed_count / total_count * 100) if total_count > 0 else 0 + + return StatusReport( + workflow_id=workflow_id, + current_phase=state["current_phase"], + phase_number=state["phase_number"], + completion_percentage=completion_percentage, + completed_items=completed_count, + total_items=total_count, + pending_approvals=state["pending_approvals"], + validation_results=state["validation_results"], + project_status=state["project_status"], + checklist_items=state["checklist_items"], + timestamp=datetime.now().isoformat() + ) + +@app.post("/workflows/{workflow_id}/advance") +async def advance_workflow(workflow_id: str): + """Attempt to advance workflow to 
next phase""" + + if workflow_id not in workflows_storage: + raise HTTPException(status_code=404, detail="Workflow not found") + + state = workflows_storage[workflow_id]["state"] + + # Validate current phase completion + all_completed = True + validation_results = {} + + for item in state["checklist_items"]: + if item["status"] != "completed": + all_completed = False + validation_results[str(item["id"])] = f"Item {item['id']} not completed: {item['task']}" + + if state["pending_approvals"]: + all_completed = False + validation_results["approvals"] = f"Pending approvals: {', '.join(state['pending_approvals'])}" + + if not all_completed: + state["validation_results"] = validation_results + state["next_phase_ready"] = False + return { + "success": False, + "message": "Cannot advance: Phase requirements not met", + "validation_results": validation_results + } + + # Advance to next phase + current_phase = state["phase_number"] + if current_phase >= 8: + return { + "success": False, + "message": "Workflow already at final phase", + "current_phase": state["current_phase"] + } + + # Here you would call the appropriate next phase function + # For now, just incrementing phase number as example + state["phase_number"] += 1 + state["current_phase"] = f"Phase {state['phase_number']}" + state["project_status"] = f"Advanced to {state['current_phase']}" + state["validation_results"] = {} + state["next_phase_ready"] = True + + # Update metadata + workflows_storage[workflow_id]["metadata"]["last_updated"] = datetime.now().isoformat() + + return { + "success": True, + "message": f"Advanced to {state['current_phase']}", + "current_phase": state["current_phase"], + "phase_number": state["phase_number"] + } + +@app.get("/workflows") +async def list_workflows(): + """List all workflows with basic information""" + + workflows = [] + for workflow_id, data in workflows_storage.items(): + state = data["state"] + metadata = data["metadata"] + + completed_count = len([item for item in 
state["checklist_items"] if item["status"] == "completed"]) + total_count = len(state["checklist_items"]) + completion_percentage = (completed_count / total_count * 100) if total_count > 0 else 0 + + workflows.append({ + "workflow_id": workflow_id, + "project_name": metadata["project_name"], + "current_phase": state["current_phase"], + "phase_number": state["phase_number"], + "completion_percentage": completion_percentage, + "created_at": metadata["created_at"], + "last_updated": metadata["last_updated"] + }) + + return {"workflows": workflows, "total_count": len(workflows)} + +@app.delete("/workflows/{workflow_id}") +async def delete_workflow(workflow_id: str): + """Delete a workflow""" + + if workflow_id not in workflows_storage: + raise HTTPException(status_code=404, detail="Workflow not found") + + del workflows_storage[workflow_id] + + return {"message": f"Workflow {workflow_id} deleted successfully"} + +# Health check endpoint +@app.get("/health") +async def health_check(): + """Health check endpoint""" + # Check Ollama connectivity + ollama_status = "unknown" + try: + import ollama + models = ollama.list() + ollama_status = "connected" + available_models = [m.get("name", "") for m in models.get("models", [])] + except Exception as e: + ollama_status = f"error: {str(e)}" + available_models = [] + + return { + "status": "healthy", + "timestamp": datetime.now().isoformat(), + "active_workflows": len(workflows_storage), + "ollama": { + "status": ollama_status, + "available_models": available_models + } + } + +# Test endpoint for connectivity +@app.get("/api/v1/test") +async def test_endpoint(): + """Simple test endpoint to verify connectivity""" + return { + "message": "Backend is reachable", + "timestamp": datetime.now().isoformat(), + "status": "ok" + } + +# Error handlers +@app.exception_handler(ValueError) +async def value_error_handler(request, exc): + return JSONResponse( + status_code=400, + content={"detail": str(exc)} + ) + + +@app.post("/test_ollama") 
+async def test_ollama(req: PromptRequest, authorization: Optional[str] = Header(None)): + _require_admin_user(authorization) + try: + import ollama # lazy import, as in health_check + + response = ollama.chat( + model="qwen2.5:3b", + messages=[{'role': 'user', 'content': req.prompt}], + options={"temperature": 0.0}, + ) + return {"oss_json": response["message"]["content"]} + except Exception as e: + # ollama.chat does not raise HTTPException; return client/connection errors to the caller + return {"oss_json": str(e)} + +@app.post("/test_ollama_1") +async def test_ollama_1(req: PromptRequest, authorization: Optional[str] = Header(None)): + _require_admin_user(authorization) + result = await Compliance_Verifier.generate_text(req) + return result + +@app.post("/test_ollama_similarity") +async def vectorize_requirement(req: PromptRequest, authorization: Optional[str] = Header(None)): + _require_admin_user(authorization) + result = await Compliance_Verifier.vectorize_requirement(req) + return result + +@app.post("/analyze_compliance") +async def semantic_similarity( + data: ComplianceRequest, authorization: Optional[str] = Header(None) +) -> Dict[str, Any]: + _require_authenticated_user(authorization) + result = await Compliance_Verifier.semantic_similarity(data) + return result + +@app.post("/analyze_structure") +async def structure_similarity( + data: ComplianceRequest, authorization: Optional[str] = Header(None) +) -> Dict[str, Any]: + _require_authenticated_user(authorization) + result = await Compliance_Verifier.structural_similarity(data) + return result + +# Chat Assistant Endpoints +@app.options("/api/v1/chat") +async def chat_options(request: StarletteRequest): + """Handle CORS preflight for chat endpoint""" + origin = request.headers.get("origin", "http://localhost:8080") + allow_origin = origin if origin in CORS_ALLOW_ORIGINS else "http://localhost:8080" + + return Response( + status_code=200, + headers={ + "Access-Control-Allow-Origin": allow_origin, + "Access-Control-Allow-Methods": "POST, OPTIONS", + "Access-Control-Allow-Headers":
"Content-Type, Authorization", + "Access-Control-Allow-Credentials": "true", + } + ) + +@app.post("/api/v1/chat", response_model=ChatResponse) +async def chat_endpoint(request: ChatRequest, authorization: Optional[str] = Header(None)): + """ + Chat endpoint for conversational AI assistant. + Handles policy questions and general compliance queries. + """ + _require_authenticated_user(authorization) + try: + logger.info(f"Chat endpoint called with message: {request.message[:50] if request.message else 'Empty'}...") + logger.debug(f"Full request: message={request.message}, has_history={bool(request.conversation_history)}, context={request.context}") + + # Validate request + if not request.message or not request.message.strip(): + raise HTTPException(status_code=400, detail="Message cannot be empty") + + response = await Chat_Assistant.chat(request) + logger.info(f"Chat response generated successfully: {response.message[:50] if response.message else 'Empty'}...") + return response + except HTTPException as he: + logger.error(f"HTTPException in chat endpoint: {he.status_code} - {he.detail}") + raise + except Exception as e: + # exc_info=True already attaches the full traceback to this log record + logger.error(f"Unexpected error in chat endpoint: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}") from e + +@app.post("/api/v1/chat/verify", response_model=ChatResponse) +async def verify_content_endpoint( + request: VerifyContentRequest, authorization: Optional[str] = Header(None) +): + """ + Verify content against compliance requirements.
+ """ + _require_authenticated_user(authorization) + try: + response = await Chat_Assistant.verify_content( + field_name=request.field_name, + content=request.content, + verification_criteria=request.verification_criteria + ) + return response + except HTTPException: + raise + except Exception as e: + logger.error(f"Error in verify endpoint: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=str(e)) + +@app.post("/api/v1/chat/question", response_model=ChatResponse) +async def answer_policy_question( + request: PolicyQuestionRequest, authorization: Optional[str] = Header(None) +): + """ + Answer a policy or compliance question. + """ + _require_authenticated_user(authorization) + try: + response = await Chat_Assistant.answer_policy_question( + question=request.question, + policy_context=request.policy_context + ) + return response + except HTTPException: + raise + except Exception as e: + logger.error(f"Error in question endpoint: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=str(e)) + +# Idea Management Endpoints +class IdeaRequest(BaseModel): + title: str = Field(..., description="Title of the idea") + description: str = Field(..., description="Description of the idea") + category: Optional[str] = Field(None, description="Category of the idea") + +class IdeaSearchRequest(BaseModel): + query: str = Field(..., description="Search query text") + limit: Optional[int] = Field(5, description="Maximum number of results") + score_threshold: Optional[float] = Field(0.5, description="Minimum similarity score") + +# Initialize Qdrant collection on first API call (lazy initialization) + +@app.post("/api/v1/ideas") +async def add_idea(request: IdeaRequest, authorization: Optional[str] = Header(None)): + """Add a new idea to the vector database""" + _require_admin_user(authorization) + try: + from src.infrastructure.vector_db.qdrant_service import get_qdrant_service + qdrant_service = get_qdrant_service() + # Ensure collection is initialized 
+ await qdrant_service.initialize_collection() + result = await qdrant_service.add_idea( + title=request.title, + description=request.description, + category=request.category + ) + return result + except Exception as e: + logger.error(f"Error adding idea: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=str(e)) + +@app.post("/api/v1/ideas/search") +async def search_ideas(request: IdeaSearchRequest, authorization: Optional[str] = Header(None)): + """Search for similar ideas using vector similarity""" + _require_authenticated_user(authorization) + try: + from src.infrastructure.vector_db.qdrant_service import get_qdrant_service + qdrant_service = get_qdrant_service() + # Ensure collection is initialized + await qdrant_service.initialize_collection() + results = await qdrant_service.search_similar_ideas( + query_text=request.query, + limit=request.limit or 5, + score_threshold=request.score_threshold or 0.5 + ) + return {"results": results, "count": len(results)} + except Exception as e: + logger.error(f"Error searching ideas: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=str(e)) + +@app.get("/api/v1/ideas") +async def get_all_ideas(limit: int = 100, authorization: Optional[str] = Header(None)): + """Get all ideas from the database""" + _require_admin_user(authorization) + try: + from src.infrastructure.vector_db.qdrant_service import get_qdrant_service + qdrant_service = get_qdrant_service() + ideas = await qdrant_service.get_all_ideas(limit=limit) + return {"ideas": ideas, "count": len(ideas)} + except Exception as e: + logger.error(f"Error getting ideas: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=str(e)) + +@app.delete("/api/v1/ideas/{idea_id}") +async def delete_idea(idea_id: str, authorization: Optional[str] = Header(None)): + """Delete an idea from the database""" + _require_admin_user(authorization) + try: + from src.infrastructure.vector_db.qdrant_service import get_qdrant_service + qdrant_service 
= get_qdrant_service() + success = await qdrant_service.delete_idea(idea_id) + if success: + return {"message": "Idea deleted successfully", "id": idea_id} + else: + raise HTTPException(status_code=404, detail="Idea not found") + except HTTPException: + raise + except Exception as e: + logger.error(f"Error deleting idea: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=str(e)) + +@app.post("/api/v1/ideas/bulk-add") +async def bulk_add_ideas(ideas: List[IdeaRequest], authorization: Optional[str] = Header(None)): + """Add multiple ideas at once""" + _require_admin_user(authorization) + if len(ideas) > 50: + raise HTTPException(status_code=422, detail="Tối đa 50 ý tưởng mỗi lần.") + try: + from src.infrastructure.vector_db.qdrant_service import get_qdrant_service + qdrant_service = get_qdrant_service() + # Ensure collection is initialized + await qdrant_service.initialize_collection() + results = [] + for idea in ideas: + result = await qdrant_service.add_idea( + title=idea.title, + description=idea.description, + category=idea.category + ) + results.append(result) + return {"results": results, "count": len(results)} + except Exception as e: + logger.error(f"Error bulk adding ideas: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=str(e)) + +@app.post("/api/v1/ideas/initialize-ump") +async def initialize_ump_ideas(authorization: Optional[str] = Header(None)): + """Initialize database with the 10 UMP innovation ideas""" + _require_admin_user(authorization) + ump_ideas = [ + IdeaRequest( + title="Nền tảng Trợ lý AI học tập lâm sàng (Clinical AI Tutor)", + description="Ứng dụng AI đóng vai trò trợ giảng cho sinh viên y, hỗ trợ phân tích ca bệnh giả lập, giải thích cận lâm sàng, và gợi ý chẩn đoán theo phác đồ Việt Nam.", + category="Giáo dục - AI" + ), + IdeaRequest( + title="Hệ thống bệnh án điện tử học thuật (Academic EMR Sandbox)", + description="Môi trường EMR mô phỏng cho đào tạo và nghiên cứu, cho phép sinh viên và giảng viên 
thực hành nhập – phân tích – khai thác dữ liệu y khoa mà không ảnh hưởng dữ liệu bệnh nhân thật.", + category="Giáo dục - Chuyển đổi số" + ), + IdeaRequest( + title="Trung tâm mô phỏng y khoa bằng AR/VR & Digital Twin", + description="Xây dựng phòng lab mô phỏng phẫu thuật, cấp cứu, và quy trình điều trị bằng AR/VR, kết hợp mô hình \"digital twin\" của cơ thể người phục vụ đào tạo nâng cao.", + category="Giáo dục - AR/VR" + ), + IdeaRequest( + title="Chương trình Y tế cộng đồng số cho vùng sâu vùng xa", + description="Kết hợp telehealth, trợ lý ảo y tế (agentic care) và AI sàng lọc sớm bệnh không lây (NCD) cho người dân vùng nông thôn, miền núi và hải đảo.", + category="Tác động xã hội - Telehealth" + ), + IdeaRequest( + title="Nền tảng nghiên cứu AI y sinh dùng chung (UMP AI Research Hub)", + description="Cung cấp hạ tầng GPU, kho dữ liệu y khoa ẩn danh, và công cụ phân tích AI cho giảng viên – nghiên cứu sinh – startup hợp tác nghiên cứu.", + category="Nghiên cứu - AI" + ), + IdeaRequest( + title="Hệ thống theo dõi và dự báo sức khỏe sinh viên & nhân viên y tế", + description="Ứng dụng phân tích dữ liệu và AI để phát hiện sớm stress, burnout, và vấn đề sức khỏe tâm thần trong cộng đồng sinh viên và nhân viên y tế.", + category="Tác động xã hội - Sức khỏe" + ), + IdeaRequest( + title="Vườn ươm khởi nghiệp công nghệ y sinh (MedTech Incubator)", + description="Hỗ trợ sinh viên, bác sĩ và giảng viên phát triển startup MedTech, HealthTech, AI y tế thông qua mentoring, quỹ seed và kết nối bệnh viện – doanh nghiệp.", + category="Khởi nghiệp - MedTech" + ), + IdeaRequest( + title="Hệ thống quản lý chất lượng đào tạo và kiểm định số", + description="Số hóa toàn bộ quy trình đảm bảo chất lượng nội bộ (IQA), đánh giá chương trình đào tạo, và chuẩn hóa theo tiêu chuẩn quốc tế (WFME, AUN-QA).", + category="Giáo dục - Quản lý chất lượng" + ), + IdeaRequest( + title="Nền tảng dữ liệu lớn phòng chống dịch và bệnh không lây", + description="Phân tích dữ liệu dịch tễ, môi trường, 
và hành vi để dự báo dịch bệnh, hỗ trợ Sở Y tế và Bộ Y tế trong ra quyết định chính sách.", + category="Nghiên cứu - Dịch tễ học" + ), + IdeaRequest( + title="Học viện Y học chính xác & Y học cá thể hóa", + description="Kết hợp dữ liệu gen, hình ảnh y khoa, lối sống và AI để nghiên cứu và ứng dụng điều trị cá thể hóa cho bệnh ung thư, tim mạch và bệnh mạn tính.", + category="Nghiên cứu - Y học chính xác" + ), + ] + + try: + from src.infrastructure.vector_db.qdrant_service import get_qdrant_service + qdrant_service = get_qdrant_service() + # Ensure collection is initialized + await qdrant_service.initialize_collection() + results = [] + for idea in ump_ideas: + result = await qdrant_service.add_idea( + title=idea.title, + description=idea.description, + category=idea.category + ) + results.append(result) + return {"results": results, "count": len(results), "message": f"Successfully added {len(results)} UMP ideas"} + except Exception as e: + logger.error(f"Error initializing UMP ideas: {e}", exc_info=True) + raise HTTPException(status_code=500, detail=str(e)) + + +@app.post("/api/v1/application-reports/excel") +async def save_application_report_excel(caseId: str = Form(...), file: UploadFile = File(...)): + """ + Save uploaded application Excel file to shared assets folder so admin can review it. 
+ """ + if not caseId.strip(): + raise HTTPException(status_code=400, detail="caseId is required") + + FRONTEND_PUBLIC_DIR.mkdir(parents=True, exist_ok=True) + safe_case_id = "".join(ch for ch in caseId if ch.isalnum() or ch in ("-", "_")) + if not safe_case_id: + raise HTTPException(status_code=400, detail="Invalid caseId") + + target_name = f"{safe_case_id}.xlsx" + target_path = FRONTEND_PUBLIC_DIR / target_name + content = await file.read() + with open(target_path, "wb") as output: + output.write(content) + + return { + "caseId": safe_case_id, + "fileName": target_name, + "savedPath": str(target_path), + "publicUrl": f"/assets/{target_name}", + } + + +@app.get("/api/v1/application-reports") +async def list_application_reports(): + """ + List saved report files from shared assets folder. + """ + FRONTEND_PUBLIC_DIR.mkdir(parents=True, exist_ok=True) + files = [] + for path in sorted(FRONTEND_PUBLIC_DIR.glob("*.xlsx"), key=lambda p: p.stat().st_mtime, reverse=True): + stat = path.stat() + files.append( + { + "fileName": path.name, + "publicUrl": f"/assets/{path.name}", + "sizeBytes": stat.st_size, + "updatedAt": datetime.fromtimestamp(stat.st_mtime).isoformat(), + } + ) + return {"files": files} + + +def _normalize_case_id(case_id: Optional[str]) -> str: + raw = case_id or f"CASE-{int(datetime.now().timestamp() * 1000)}" + safe = "".join(ch for ch in raw if ch.isalnum() or ch in ("-", "_")) + if not safe: + raise HTTPException(status_code=400, detail="Invalid caseId") + return safe + + +def _draft_path(case_id: str) -> Path: + APPLICATION_DRAFT_DIR.mkdir(parents=True, exist_ok=True) + return APPLICATION_DRAFT_DIR / f"{case_id}.yml" + + +@app.post("/api/v1/application-drafts") +async def save_application_draft( + request: ApplicationDraftSaveRequest, + authorization: Optional[str] = Header(None), +): + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.drafts import save_application_draft_tab + + case_id = 
_normalize_case_id(request.caseId) + owner_uid = decode_access_token_user_id(authorization) + + if is_postgres_enabled(): + try: + async with get_session() as session: + return await save_application_draft_tab( + session, case_id, request.tab, request.data, owner_id=owner_uid + ) + except HTTPException: + raise + except Exception as e: + logger.exception("application draft save (PostgreSQL) failed") + raise HTTPException(status_code=500, detail="Failed to persist draft") from e + + target = _draft_path(case_id) + + current: Dict[str, Any] = { + "caseId": case_id, + "updatedAt": datetime.now().isoformat(), + "tabs": {}, + } + if target.exists(): + with open(target, "r", encoding="utf-8") as handle: + loaded = yaml.safe_load(handle) or {} + if isinstance(loaded, dict): + current.update(loaded) + current["tabs"] = dict(loaded.get("tabs") or {}) + + current["caseId"] = case_id + current["updatedAt"] = datetime.now().isoformat() + current["tabs"][request.tab] = request.data + + with open(target, "w", encoding="utf-8") as handle: + yaml.safe_dump(current, handle, allow_unicode=True, sort_keys=False) + + return { + "caseId": case_id, + "updatedAt": current["updatedAt"], + "tabs": current["tabs"], + "publicUrl": f"/application-drafts/{case_id}.yml", + } + + +@app.get("/api/v1/application-drafts/{case_id}") +async def get_application_draft(case_id: str): + from sqlalchemy.exc import IntegrityError + + from src.initiative_db.drafts import ( + get_application_draft_document, + insert_initiative_draft_if_missing, + ) + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.submissions import merge_application_draft_document_with_snapshot_if_needed + + safe_case_id = _normalize_case_id(case_id) + if is_postgres_enabled(): + try: + async with get_session() as session: + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + ini_res = await resolve_initiative_for_draft_case_key(session, case_id) + if ini_res is 
not None: + safe_case_id = ini_res.case_code + except Exception: + logger.exception("resolve initiative for application draft GET failed; using path case id") + yaml_fallback = _load_application_draft_yaml(safe_case_id) + + if is_postgres_enabled(): + try: + async with get_session() as session: + try: + doc = await get_application_draft_document(session, safe_case_id) + return await merge_application_draft_document_with_snapshot_if_needed(session, safe_case_id, doc) + except KeyError: + pass + if yaml_fallback is None: + return _empty_applicant_draft_bundle(safe_case_id) + try: + await insert_initiative_draft_if_missing(session, safe_case_id, yaml_fallback) + except IntegrityError: + await session.rollback() + async with get_session() as session: + try: + doc = await get_application_draft_document(session, safe_case_id) + return await merge_application_draft_document_with_snapshot_if_needed(session, safe_case_id, doc) + except KeyError: + return yaml_fallback + except HTTPException: + raise + except Exception as e: + logger.exception("application draft load (PostgreSQL) failed") + raise HTTPException(status_code=500, detail="Failed to load draft") from e + + if yaml_fallback is not None: + return yaml_fallback + return _empty_applicant_draft_bundle(safe_case_id) + + +def _evidence_kind_to_role(kind: object) -> Optional[str]: + """ + Map API kind query/form value → storage role. + + Accepts str or list (duplicate ``kind=`` keys); strips BOM / ZWSP; NFKC-normalizes Unicode + so lookalike Latin cannot bypass the allow-list. 
+ """ + if kind is None: + return None + if isinstance(kind, (list, tuple)): + candidates = [str(x) for x in kind if x is not None and str(x).strip() != ""] + else: + candidates = [str(kind)] + + for c in candidates: + k = unicodedata.normalize("NFKC", (c or "").strip()) + k = k.replace("\ufeff", "").replace("\u200b", "").strip().lower() + if k == "research": + return "research_evidence" + if k == "textbook": + return "textbook_evidence" + if k == "technical": + return "technical_evidence" + return None + + +def _evidence_role_to_api_kind(role: str) -> str: + if role == "research_evidence": + return "research" + if role == "textbook_evidence": + return "textbook" + if role == "technical_evidence": + return "technical" + return "research" + + +def _evidence_row_looks_like_pdf(row, default_name: str) -> bool: + """True when the stored artifact should be shown with an inline PDF presign.""" + mt = (row.mime_type or "").lower() + if "pdf" in mt: + return True + dn = (default_name or "").lower() + return dn.endswith(".pdf") + + +def _jwt_role_strings(authorization: Optional[str]) -> list[str]: + p = decode_bearer_token(authorization) + if not p: + return [] + r = p.get("roles") + if isinstance(r, list): + return [str(x) for x in r] + return [] + + +def _is_staff_reviewer(authorization: Optional[str]) -> bool: + roles = _jwt_role_strings(authorization) + return "admin" in roles or "editor" in roles + + +def _require_admin_user(authorization: Optional[str]) -> uuid.UUID: + """JWT must be valid and include role ``admin``.""" + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để thực hiện thao tác.") + roles = _jwt_role_strings(authorization) + if "admin" not in roles: + raise HTTPException(status_code=403, detail="Chỉ tài khoản quản trị mới thực hiện được.") + return uid + + +def _require_authenticated_user(authorization: Optional[str]) -> uuid.UUID: + uid = 
decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để thực hiện thao tác.") + return uid + + +def _require_staff_reviewer(authorization: Optional[str]) -> uuid.UUID: + uid = _require_authenticated_user(authorization) + if not _is_staff_reviewer(authorization): + raise HTTPException(status_code=403, detail="Không có quyền truy cập.") + return uid + + +async def _assert_initiative_case_access( + session: Any, + case_id: str, + uid: uuid.UUID, + authorization: Optional[str], +) -> None: + """Allow staff or initiative owner for the normalized case code.""" + from sqlalchemy import select + + from src.initiative_db.models import Initiative + + normalized = _normalize_case_id(case_id) + ini = (await session.execute(select(Initiative).where(Initiative.case_code == normalized))).scalar_one_or_none() + if ini is None: + return + if ini.owner_id == uid or _is_staff_reviewer(authorization): + return + raise HTTPException(status_code=403, detail="Không có quyền xem hồ sơ này.") + + +async def _assert_review_document_access( + session: Any, + review_document_id: str, + uid: uuid.UUID, + authorization: Optional[str], +) -> None: + from src.initiative_db.models import ApplicationReviewDocument, Initiative + + try: + rid = uuid.UUID(str(review_document_id)) + except ValueError as exc: + raise HTTPException(status_code=404, detail="Không tìm thấy review document") from exc + doc = await session.get(ApplicationReviewDocument, rid) + if doc is None: + raise HTTPException(status_code=404, detail="Không tìm thấy review document") + ini = await session.get(Initiative, doc.initiative_id) + if ini is not None and ini.owner_id != uid and not _is_staff_reviewer(authorization): + raise HTTPException(status_code=403, detail="Không có quyền xem hồ sơ này.") + + +class AdminApplicationResultBody(BaseModel): + decision: Literal["approved", "rejected"] + feedback: str = Field(default="", max_length=50_000) + rationale: 
Optional[str] = Field(default=None, max_length=50_000) + + +def _initiative_allows_owner_evidence_edit(status: str) -> bool: + s = (status or "").strip().lower() + return s not in ("approved", "rejected") + + +def _normalize_pdf_layout_edits(raw: Any) -> list[Dict[str, Any]]: + if not isinstance(raw, list): + raise HTTPException(status_code=422, detail="layoutEdits phải là mảng.") + if len(raw) > 200: + raise HTTPException(status_code=422, detail="Tối đa 200 mục chỉnh bố cục.") + out: list[Dict[str, Any]] = [] + for idx, item in enumerate(raw): + if not isinstance(item, dict): + raise HTTPException(status_code=422, detail=f"layoutEdits[{idx}] không hợp lệ.") + try: + parsed = PdfLayoutEditPayload(**item) + except Exception as exc: + raise HTTPException(status_code=422, detail=f"layoutEdits[{idx}] lỗi định dạng: {exc}") from exc + row = parsed.model_dump() + if not str(row.get("text") or "").strip(): + continue + out.append(row) + return out + + +def _load_minio_for_evidence(): + """Returns (storage, bucket_name, err_msg). On failure, storage is None.""" + try: + from src.minio.storage import S3Storage, settings as s3settings + + return S3Storage(), s3settings.s3_bucket_attachments, None + except Exception as exc: + logger.warning("MinIO / S3 not available for evidence upload: %s", exc) + return None, None, str(exc) + + +def _load_minio_for_exports(): + """Returns (storage, bucket_name, err_msg). 
 On failure, storage is None.""" + try: + from src.minio.storage import S3Storage, settings as s3settings + + return S3Storage(), s3settings.s3_bucket_exports, None + except Exception as exc: + logger.warning("MinIO / S3 not available for export upload: %s", exc) + return None, None, str(exc) + + +@app.post("/api/v1/application-drafts/{case_id}/evidence") +async def upload_application_draft_evidence( + case_id: str, + kind: str = Form(..., description="research | textbook | technical (evidence 2.1 / 2.2 / technical)"), + file: UploadFile = File(...), + authorization: Optional[str] = Header(None), +): + """ + Upload evidence (PDF, images, Word, Excel, …) to MinIO. Owner only; not allowed once the application has been approved or rejected. + """ + from src.initiative_db.application_storage import ( + get_evidence_artifact_row, + upsert_evidence_artifact, + ) + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.minio.storage import ALLOWED_MIME_TYPES, StorageError + + role = _evidence_kind_to_role(kind) + if role is None: + raise HTTPException(status_code=400, detail="kind phải là research, textbook hoặc technical") + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để tải minh chứng.") + + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để lưu minh chứng trên máy chủ.") + + s3, bucket, _cfg_err = _load_minio_for_evidence() + if s3 is None or bucket is None: + raise HTTPException( + status_code=503, + detail="Lưu tệp MinIO chưa cấu hình.
Đặt biến môi trường S3/MinIO hoặc xem tài liệu triển khai.", + ) + + import mimetypes + + filename_l = (file.filename or "").lower() + guessed, _ = mimetypes.guess_type(filename_l) + content_type = (file.content_type or "").split(";")[0].strip() or "application/octet-stream" + if content_type not in ALLOWED_MIME_TYPES and guessed in ALLOWED_MIME_TYPES: + content_type = guessed + if content_type not in ALLOWED_MIME_TYPES: + raise HTTPException(status_code=422, detail=f"Loại tệp không được phép: {content_type}") + + _max_evidence_bytes = 50 * 1024 * 1024 + if file.size is not None and int(file.size) > _max_evidence_bytes: + raise HTTPException( + status_code=413, + detail="Tệp vượt quá 50 MB. Hãy nén hoặc chia nhỏ tệp trước khi tải lên.", + ) + + async with get_session() as session: + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + ini = await resolve_initiative_for_draft_case_key(session, case_id) + if ini is None: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ (hãy lưu bản nháp trước).") + canonical_case = ini.case_code + if ini.owner_id != uid: + raise HTTPException(status_code=403, detail="Chỉ chủ sở hữu mới tải được minh chứng.") + st = str(ini.status or "") + if not _initiative_allows_owner_evidence_edit(st): + raise HTTPException( + status_code=422, + detail="Hồ sơ đã kết thúc duyệt — không cập nhật minh chứng.", + ) + + old_row = await get_evidence_artifact_row(session, initiative_id=ini.id, role=role) + prior_storage_key = ( + (old_row.storage_uri or "").strip() or None if old_row is not None else None + ) + + object_key = s3.build_key_for_initiative(ini.id, file.filename or "evidence.pdf") + try: + result = await s3.upload( + bucket=bucket, + key=object_key, + fileobj=file.file, + mime_type=content_type, + metadata={"uploaded_by": str(uid), "case_code": canonical_case, "role": role}, + ) + except ValueError as exc: + raise HTTPException(status_code=422, detail=str(exc)) from exc + except StorageError 
as exc: + raise HTTPException(status_code=502, detail=f"Không tải được lên kho: {exc}") from exc + + await upsert_evidence_artifact( + session, + initiative_id=ini.id, + role=role, + storage_uri=object_key, + original_name=file.filename, + byte_size=result.get("size"), + sha256_hex=result.get("sha256"), + uploaded_by=uid, + mime_type=content_type, + ) + + from src.auth_jwt import decode_bearer_token + from src.audit import AuditAction, jwt_payload_actor_email, record_audit + + ae, ar = jwt_payload_actor_email(decode_bearer_token(authorization)) + ev_action = AuditAction.update if old_row is not None else AuditAction.create + await record_audit( + session, + actor_user_id=uid, + actor_email=ae, + actor_role=ar, + action=ev_action, + entity_type="application_evidence", + entity_id=f"{canonical_case}:{role}", + before=( + {"storageKey": prior_storage_key} + if prior_storage_key + else None + ), + after={ + "storageKey": object_key, + "originalName": file.filename, + "mimeType": content_type, + }, + metadata={"minioBucket": bucket, "caseId": canonical_case, "evidenceRole": role}, + ) + await session.commit() + + if prior_storage_key and prior_storage_key != object_key: + try: + await s3.delete(bucket, prior_storage_key) + except StorageError as exc: + logger.warning( + "MinIO delete of replaced evidence object failed (DB already points to new key): %s", + exc, + ) + + k_label = _evidence_role_to_api_kind(role) + return { + "ok": True, + "caseId": canonical_case, + "kind": k_label, + "storageKey": object_key, + "originalName": file.filename, + "byteSize": result.get("size"), + "uploadedAt": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"), + } + + +@app.get("/api/v1/application-drafts/{case_id}/evidence") +async def get_application_draft_evidence( + case_id: str, + authorization: Optional[str] = Header(None), +): + """Metadata (plus time-limited download links) for evidence 2.1 / 2.2 / technical.
 Application owner or admin/council only.""" + from src.initiative_db.application_storage import ( + get_evidence_artifact_row, + ) + from src.initiative_db.engine import get_session, is_postgres_enabled + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xem minh chứng.") + + if not is_postgres_enabled(): + return {"research": None, "textbook": None, "technical": None} + + # Evidence is uploaded to the attachments bucket, so presign against that bucket too + s3, bucket, _ = _load_minio_for_evidence() + + async with get_session() as session: + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + ini = await resolve_initiative_for_draft_case_key(session, case_id) + if ini is None: + return {"research": None, "textbook": None, "technical": None} + is_staff = _is_staff_reviewer(authorization) + if ini.owner_id != uid and not is_staff: + raise HTTPException(status_code=403, detail="Không có quyền xem minh chứng hồ sơ này.") + + r_row = await get_evidence_artifact_row(session, initiative_id=ini.id, role="research_evidence") + t_row = await get_evidence_artifact_row(session, initiative_id=ini.id, role="textbook_evidence") + tech_row = await get_evidence_artifact_row(session, initiative_id=ini.id, role="technical_evidence") + + async def pack(row, kind_key: str): + if row is None: + return None + default_name = row.original_name or "evidence" + if not default_name.lower().endswith((".pdf", ".png", ".jpg", ".jpeg", ".docx", ".xlsx")): + mt = (row.mime_type or "").lower() + if "pdf" in mt: + default_name = f"{default_name}.pdf" + elif "word" in mt or "document" in mt: + default_name = f"{default_name}.docx" + out = { + "kind": kind_key, + "originalName": row.original_name, + "byteSize": row.byte_size, + "mimeType": row.mime_type, + "uploadedAt": row.uploaded_at.isoformat() if row.uploaded_at else None, + "storageKey": row.storage_uri, + "reviewStatus": row.review_status, + "reviewedAt": row.reviewed_at.isoformat() if row.reviewed_at else None, + } + if s3 and
bucket and row.storage_uri: + try: + from src.minio.storage import settings as s3s + + out["downloadUrl"] = await s3.get_download_url( + bucket, + row.storage_uri, + ttl=s3s.s3_signed_url_ttl, + filename=default_name, + inline=False, + ) + if _evidence_row_looks_like_pdf(row, default_name): + out["viewUrl"] = await s3.get_download_url( + bucket, + row.storage_uri, + ttl=s3s.s3_signed_url_ttl, + filename=default_name, + inline=True, + response_content_type="application/pdf", + ) + else: + out["viewUrl"] = None + except Exception: + out["downloadUrl"] = None + out["viewUrl"] = None + return out + + return { + "research": await pack(r_row, "research"), + "textbook": await pack(t_row, "textbook"), + "technical": await pack(tech_row, "technical"), + } + + +@app.get("/api/v1/application-drafts/{case_id}/evidence/content") +async def stream_application_draft_evidence_content( + case_id: str, + request: Request, + kind: str = Query(..., description="research | textbook | technical"), + attachment: bool = Query( + False, + description="If true, Content-Disposition: attachment (download). Otherwise inline for embedding.", + ), + authorization: Optional[str] = Header(None), +): + """ + Stream evidence bytes through the API so browsers on HTTPS avoid mixed-content iframes + (presigned MinIO URLs are often http://HOST:MINIO_PORT). + Same ACL as GET …/evidence. 
+ """ + from src.initiative_db.application_storage import ( + get_evidence_artifact_row, + ) + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.minio.storage import StorageError, _sanitize_filename as minio_safe_fn + + kinds = request.query_params.getlist("kind") + if len(kinds) > 1: + effective_kind: object = kinds + elif len(kinds) == 1: + effective_kind = kinds[0] + else: + effective_kind = kind + + role = _evidence_kind_to_role(effective_kind) + if role is None: + raise HTTPException(status_code=400, detail="kind phải là research, textbook hoặc technical") + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xem minh chứng.") + + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để lấy minh chứng.") + + s3, bucket, _ = _load_minio_for_evidence() + if s3 is None or bucket is None: + raise HTTPException( + status_code=503, + detail="Lưu tệp MinIO chưa cấu hình. 
Đặt biến môi trường S3/MinIO hoặc xem tài liệu triển khai.", + ) + + async with get_session() as session: + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + ini = await resolve_initiative_for_draft_case_key(session, case_id) + if ini is None: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ.") + if ini.owner_id != uid and not _is_staff_reviewer(authorization): + raise HTTPException(status_code=403, detail="Không có quyền xem minh chứng hồ sơ này.") + row = await get_evidence_artifact_row(session, initiative_id=ini.id, role=role) + + if row is None or not (row.storage_uri or "").strip(): + raise HTTPException(status_code=404, detail="Không có tệp minh chứng cho loại này.") + + default_name = row.original_name or "evidence" + if not default_name.lower().endswith((".pdf", ".png", ".jpg", ".jpeg", ".docx", ".xlsx")): + mtguess = (row.mime_type or "").lower() + if "pdf" in mtguess: + default_name = f"{default_name}.pdf" + elif "word" in mtguess or "document" in mtguess: + default_name = f"{default_name}.docx" + + safe_fn = minio_safe_fn(default_name) or "file" + disp = "attachment" if attachment else "inline" + media_type = (row.mime_type or "").strip() or "application/octet-stream" + + # download_stream() is lazy: storage errors surface only while it is iterated, which for a + # StreamingResponse happens after this handler has returned. Pull the first chunk here so a + # missing object or storage failure still maps to 404 / 502 instead of a truncated response. + stream = s3.download_stream(bucket, row.storage_uri).__aiter__() + try: + first_chunk = await stream.__anext__() + except StopAsyncIteration: + first_chunk = b"" + except FileNotFoundError: + raise HTTPException(status_code=404, detail="Tệp không còn trên kho lưu trữ.") from None + except StorageError as exc: + raise HTTPException(status_code=502, detail=f"Không đọc được tệp từ kho: {exc}") from exc + + async def body(): + if first_chunk: + yield first_chunk + async for chunk in stream: + yield chunk + + return StreamingResponse( + body(), + media_type=media_type, + headers={ + "Content-Disposition": f'{disp}; filename="{safe_fn}"', + "Cache-Control": "private, no-store", + }, + ) + + +@app.delete("/api/v1/application-drafts/{case_id}/evidence") +async def delete_application_draft_evidence( + case_id: str, + request: Request, + kind: str = Query(..., description="research | textbook |
technical"), + authorization: Optional[str] = Header(None), +): + from src.initiative_db.application_storage import ( + delete_evidence_artifact_row, + ) + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.minio.storage import StorageError + + kinds = request.query_params.getlist("kind") + if len(kinds) > 1: + effective_kind: object = kinds + elif len(kinds) == 1: + effective_kind = kinds[0] + else: + effective_kind = kind + role = _evidence_kind_to_role(effective_kind) + if role is None: + raise HTTPException(status_code=400, detail="kind phải là research, textbook hoặc technical") + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xóa minh chứng.") + + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL.") + + s3, bucket, _ = _load_minio_for_evidence() + async with get_session() as session: + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + ini = await resolve_initiative_for_draft_case_key(session, case_id) + if ini is None: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ.") + if ini.owner_id != uid: + raise HTTPException(status_code=403, detail="Chỉ chủ sở hữu mới xóa được minh chứng.") + st = str(ini.status or "") + if not _initiative_allows_owner_evidence_edit(st): + raise HTTPException( + status_code=422, + detail="Hồ sơ đã kết thúc duyệt — không xóa minh chứng.", + ) + + old = await delete_evidence_artifact_row(session, initiative_id=ini.id, role=role) + from src.auth_jwt import decode_bearer_token + from src.audit import AuditAction, jwt_payload_actor_email, record_audit + + if old is not None: + ae, ar = jwt_payload_actor_email(decode_bearer_token(authorization)) + await record_audit( + session, + actor_user_id=uid, + actor_email=ae, + actor_role=ar, + action=AuditAction.delete, + entity_type="application_evidence", + entity_id=f"{ini.case_code}:{role}", 
+ before={ + "storageKey": old.storage_uri, + "originalName": old.original_name, + }, + metadata={"caseId": ini.case_code, "minioBucket": bucket}, + ) + removed_key = old.storage_uri if old is not None else None + await session.commit() + + # Remove the MinIO object only after the DB commit succeeds (same order as the upload path), + # so a failed commit cannot leave a row pointing at an object that was already deleted. + if removed_key and s3 and bucket: + try: + await s3.delete(bucket, removed_key) + except StorageError as exc: + logger.warning("MinIO delete failed (DB row already removed): %s", exc) + + return {"ok": True, "kind": _evidence_role_to_api_kind(role)} + + +@app.get("/api/v1/application-drafts/{case_id}/official-form-layout") +async def get_application_draft_official_form_layout( + case_id: str, + authorization: Optional[str] = Header(None), +): + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.drafts import get_official_form_layout_payload + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xem bố cục PDF.") + if not is_postgres_enabled(): + return {"caseId": _normalize_case_id(case_id), "layoutEdits": [], "pdf": None} + + s3, bucket, _ = _load_minio_for_evidence() + safe_case_id = _normalize_case_id(case_id) + async with get_session() as session: + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + ini = await resolve_initiative_for_draft_case_key(session, safe_case_id) + if ini is None: + return {"caseId": safe_case_id, "layoutEdits": [], "pdf": None} + if ini.owner_id != uid and not _is_staff_reviewer(authorization): + raise HTTPException(status_code=403, detail="Không có quyền xem bố cục hồ sơ này.") + payload = await get_official_form_layout_payload(session, ini.case_code) + + if not payload: + return {"caseId": safe_case_id, "layoutEdits": [], "pdf": None} + + edits = payload.get("layoutEdits") + if not isinstance(edits, list): + edits = [] + + pdf_meta: Optional[Dict[str, Any]] = None + storage_key = str(payload.get("storageKey") or "").strip() + if storage_key: + pdf_meta = { + "storageKey": storage_key,
+ "originalName": payload.get("originalName"), + "byteSize": payload.get("byteSize"), + "uploadedAt": payload.get("uploadedAt"), + "downloadUrl": None, + "viewUrl": None, + } + if s3 and bucket: + try: + from src.minio.storage import settings as s3s + + default_name = str(payload.get("originalName") or "official-form-layout.pdf") + pdf_meta["downloadUrl"] = await s3.get_download_url( + bucket, + storage_key, + ttl=s3s.s3_signed_url_ttl, + filename=default_name, + inline=False, + ) + pdf_meta["viewUrl"] = await s3.get_download_url( + bucket, + storage_key, + ttl=s3s.s3_signed_url_ttl, + filename=default_name, + inline=True, + response_content_type="application/pdf", + ) + except Exception: + pdf_meta["downloadUrl"] = None + pdf_meta["viewUrl"] = None + + return { + "caseId": safe_case_id, + "layoutEdits": edits, + "updatedAt": payload.get("updatedAt"), + "pdf": pdf_meta, + } + + +@app.post("/api/v1/application-drafts/{case_id}/official-form-layout") +async def save_application_draft_official_form_layout( + case_id: str, + layout_edits_json: str = Form(..., description="JSON array of PdfTextLayoutEdit"), + file: UploadFile = File(..., description="Edited official-form PDF"), + authorization: Optional[str] = Header(None), +): + from src.initiative_db.drafts import get_official_form_layout_payload, save_official_form_layout_payload + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.minio.storage import StorageError + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để lưu bố cục PDF.") + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để lưu bố cục PDF.") + s3, bucket, _cfg_err = _load_minio_for_exports() + if s3 is None or bucket is None: + raise HTTPException( + status_code=503, + detail="Lưu tệp MinIO chưa cấu hình. 
Đặt biến môi trường S3/MinIO hoặc xem tài liệu triển khai.", + ) + + safe_case_id = _normalize_case_id(case_id) + try: + raw_edits = json.loads(layout_edits_json) + except Exception as exc: + raise HTTPException(status_code=422, detail="layoutEdits JSON không hợp lệ.") from exc + normalized_edits = _normalize_pdf_layout_edits(raw_edits) + + content_type = (file.content_type or "").split(";")[0].strip().lower() or "application/octet-stream" + if content_type != "application/pdf": + raise HTTPException(status_code=422, detail=f"Chỉ nhận PDF, nhận được: {content_type}") + + async with get_session() as session: + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + ini = await resolve_initiative_for_draft_case_key(session, safe_case_id) + if ini is None: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ (hãy lưu bản nháp trước).") + if ini.owner_id != uid: + raise HTTPException(status_code=403, detail="Chỉ chủ sở hữu mới lưu bố cục PDF.") + st = str(ini.status or "") + if not _initiative_allows_owner_evidence_edit(st): + raise HTTPException(status_code=422, detail="Hồ sơ đã kết thúc duyệt — không cập nhật bố cục PDF.") + + existing_payload = await get_official_form_layout_payload(session, ini.case_code) + object_key = s3.build_key_for_initiative(ini.id, file.filename or "official-form-layout.pdf") + try: + upload = await s3.upload( + bucket=bucket, + key=object_key, + fileobj=file.file, + mime_type="application/pdf", + metadata={"uploaded_by": str(uid), "case_code": ini.case_code, "role": "official_form_layout_pdf"}, + ) + except ValueError as exc: + raise HTTPException(status_code=422, detail=str(exc)) from exc + except StorageError as exc: + raise HTTPException(status_code=502, detail=f"Không tải được PDF bố cục lên kho: {exc}") from exc + + now_iso = datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z") + payload = { + "storageKey": object_key, + "originalName": file.filename or 
"official-form-layout.pdf", + "mimeType": "application/pdf", + "byteSize": upload.get("size"), + "sha256": upload.get("sha256"), + "uploadedBy": str(uid), + "uploadedAt": now_iso, + "updatedAt": now_iso, + "layoutEdits": normalized_edits, + "layoutEditCount": len(normalized_edits), + } + await save_official_form_layout_payload( + session, + case_id=ini.case_code, + payload=payload, + owner_id=uid, + ) + await session.commit() + + prior_storage_key = str((existing_payload or {}).get("storageKey") or "").strip() + if prior_storage_key and prior_storage_key != object_key: + try: + await s3.delete(bucket, prior_storage_key) + except StorageError as exc: + logger.warning("MinIO delete of replaced official-form layout PDF failed: %s", exc) + + return { + "ok": True, + "caseId": safe_case_id, + "layoutEdits": normalized_edits, + "storageKey": object_key, + "byteSize": upload.get("size"), + "uploadedAt": payload["uploadedAt"], + } + + +class EvidenceReviewBody(BaseModel): + decision: Literal["approved", "rejected"] + + +@app.patch("/api/v1/application-drafts/{case_id}/evidence/review") +async def patch_evidence_review( + case_id: str, + request: Request, + kind: str = Query(..., description="research | textbook | technical"), + body: EvidenceReviewBody = Body(...), + authorization: Optional[str] = Header(None), +): + """ + Approve or reject an evidence file (admin or council / editor only).
+ """ + from src.initiative_db.application_storage import get_evidence_artifact_row, set_evidence_artifact_review + from src.initiative_db.engine import get_session, is_postgres_enabled + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập.") + if not _is_staff_reviewer(authorization): + raise HTTPException(status_code=403, detail="Chỉ quản trị hoặc hội đồng mới thẩm định minh chứng.") + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL.") + + kinds = request.query_params.getlist("kind") + if len(kinds) > 1: + effective_kind: object = kinds + elif len(kinds) == 1: + effective_kind = kinds[0] + else: + effective_kind = kind + role = _evidence_kind_to_role(effective_kind) + if role is None: + raise HTTPException(status_code=400, detail="kind phải là research, textbook hoặc technical") + + async with get_session() as session: + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + ini = await resolve_initiative_for_draft_case_key(session, case_id) + if ini is None: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ.") + row = await get_evidence_artifact_row(session, initiative_id=ini.id, role=role) + if row is None: + raise HTTPException(status_code=404, detail="Chưa có tệp minh chứng cho loại này.") + before_review = { + "reviewStatus": row.review_status, + "reviewedBy": str(row.reviewed_by) if row.reviewed_by else None, + } + await set_evidence_artifact_review( + session, + initiative_id=ini.id, + role=role, + review_status=body.decision, + reviewer_id=uid, + ) + from src.auth_jwt import decode_bearer_token + from src.audit import AuditAction, jwt_payload_actor_email, record_audit + + ae, ar = jwt_payload_actor_email(decode_bearer_token(authorization)) + await record_audit( + session, + actor_user_id=uid, + actor_email=ae, + actor_role=ar, + action=AuditAction.update,
entity_type="application_evidence_review", + entity_id=f"{ini.case_code}:{role}", + before=before_review, + after={"reviewStatus": body.decision, "reviewedBy": str(uid)}, + metadata={"caseId": ini.case_code}, + ) + await session.commit() + + return {"ok": True, "kind": _evidence_role_to_api_kind(role), "decision": body.decision} + + +@app.get("/api/v1/initiatives/by-case/{case_id}/tab-snapshots") +async def list_initiative_tab_snapshots( + case_id: str, + tab: Optional[str] = Query( + None, description="Optional filter: report | application | contribution" + ), + limit: int = Query(20, ge=1, le=200), + authorization: Optional[str] = Header(None), +): + """List versioned tab payloads for an initiative (Postgres + migration 002). Owner-only.""" + from sqlalchemy import select + + from src.initiative_db.application_storage import list_tab_snapshots_for_case + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.models import Initiative + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xem lịch sử tab.") + + if not is_postgres_enabled(): + return {"data": []} + + safe_case_id = _normalize_case_id(case_id) + try: + async with get_session() as session: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == safe_case_id)) + ).scalar_one_or_none() + if ini is None: + return {"data": []} + if ini.owner_id != uid: + raise HTTPException(status_code=403, detail="Không có quyền xem hồ sơ này.") + data = await list_tab_snapshots_for_case( + session, case_code=safe_case_id, tab=tab, limit=limit + ) + return {"data": data} + except HTTPException: + raise + except Exception: + logger.exception("GET tab-snapshots failed case=%s", safe_case_id) + raise HTTPException(status_code=500, detail="Không tải được lịch sử tab") from None + + +@app.get("/api/v1/initiatives/by-case/{case_id}/submit-snapshots") +async def 
list_initiative_submit_snapshots( + case_id: str, + limit: int = Query(10, ge=1, le=50), + authorization: Optional[str] = Header(None), +): + """Immutable submit snapshots for an initiative (Postgres + migration 002). Owner-only.""" + from sqlalchemy import select + + from src.initiative_db.application_storage import list_submit_snapshots_for_case + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.models import Initiative + + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xem lịch sử nộp.") + + if not is_postgres_enabled(): + return {"data": []} + + safe_case_id = _normalize_case_id(case_id) + try: + async with get_session() as session: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == safe_case_id)) + ).scalar_one_or_none() + if ini is None: + return {"data": []} + if ini.owner_id != uid: + raise HTTPException(status_code=403, detail="Không có quyền xem hồ sơ này.") + data = await list_submit_snapshots_for_case(session, case_code=safe_case_id, limit=limit) + return {"data": data} + except HTTPException: + raise + except Exception: + logger.exception("GET submit-snapshots failed case=%s", safe_case_id) + raise HTTPException(status_code=500, detail="Không tải được lịch sử nộp") from None + + +@app.post("/api/v1/review-documents") +async def create_review_document( + body: ReviewDocumentSaveRequest, + authorization: Optional[str] = Header(None), +): + """ + Persist ReviewPanel JSON bundle. + Primary payload is `officialBieuMau`; `templateData` / `fullBundle` are optional mirrors. 
+ """ + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.submissions import save_review_document_bundle + + # The path/body caseId match for the legacy route is checked in save_review_document(). + safe_case_id = _normalize_case_id(body.caseId) + owner_uid = _require_authenticated_user(authorization) + + if is_postgres_enabled(): + try: + async with get_session() as session: + await _assert_initiative_case_access(session, safe_case_id, owner_uid, authorization) + saved = await save_review_document_bundle( + session, + case_id=safe_case_id, + official_bieu_mau=body.officialBieuMau, + template_data=body.templateData, + full_bundle=body.fullBundle, + owner_user_id=owner_uid, + ) + return saved + except HTTPException: + raise + except Exception: + logger.exception("review-document save (PostgreSQL) failed case=%s", safe_case_id) + raise HTTPException(status_code=500, detail="Không lưu được JSON ReviewPanel") from None + + # File fallback (PostgreSQL disabled): one JSON file per case. + target_dir = APP_ROOT_DIR / "assets" / "review-documents" + target_dir.mkdir(parents=True, exist_ok=True) + payload = { + "caseId": safe_case_id, + "officialBieuMau": body.officialBieuMau, + "templateData": body.templateData, + "fullBundle": body.fullBundle, + "savedAt": datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z"), + } + with open(target_dir / f"{safe_case_id}.json", "w", encoding="utf-8") as handle: + json.dump(payload, handle, ensure_ascii=False, indent=2) + return payload + + +@app.get("/api/v1/review-documents") +async def list_review_documents( + caseId: str, + limit: int = Query(20, ge=1, le=200), + authorization: Optional[str] = Header(None), +): + """List review documents by case id (latest first).""" + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.submissions import list_review_document_bundles + + uid =
_require_authenticated_user(authorization) + safe_case_id = _normalize_case_id(caseId) + + if is_postgres_enabled(): + try: + async with get_session() as session: + await _assert_initiative_case_access(session, safe_case_id, uid, authorization) + rows = await list_review_document_bundles(session, case_id=safe_case_id, limit=limit) + return {"data": rows} + except HTTPException: + raise + except Exception: + logger.exception("review-document load (PostgreSQL) failed case=%s", safe_case_id) + raise HTTPException(status_code=500, detail="Không tải được JSON ReviewPanel") from None + + target_file = APP_ROOT_DIR / "assets" / "review-documents" / f"{safe_case_id}.json" + if not _is_staff_reviewer(authorization): + raise HTTPException(status_code=403, detail="Không có quyền xem hồ sơ này.") + if target_file.exists(): + try: + with open(target_file, "r", encoding="utf-8") as handle: + return {"data": [json.load(handle)]} + except Exception: + logger.exception("review-document file fallback load failed case=%s", safe_case_id) + raise HTTPException(status_code=500, detail="Không tải được JSON ReviewPanel") from None + return {"data": []} + + +@app.get("/api/v1/review-documents/{review_document_id}") +async def get_review_document_by_id( + review_document_id: str, authorization: Optional[str] = Header(None) +): + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.submissions import get_review_document_bundle_by_id + + uid = _require_authenticated_user(authorization) + + if is_postgres_enabled(): + try: + async with get_session() as session: + await _assert_review_document_access(session, review_document_id, uid, authorization) + row = await get_review_document_bundle_by_id( + session, review_document_id=review_document_id + ) + if row is None: + raise HTTPException(status_code=404, detail="Không tìm thấy review document") + return row + except HTTPException: + raise + except Exception: + logger.exception("review-document get by id failed id=%s",
review_document_id) + raise HTTPException(status_code=500, detail="Không tải được review document") from None + raise HTTPException(status_code=501, detail="ID-based lookup requires PostgreSQL mode") + + +@app.put("/api/v1/review-documents/{review_document_id}") +async def update_review_document_by_id( + review_document_id: str, + body: ReviewDocumentUpdateRequest, + authorization: Optional[str] = Header(None), +): + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.submissions import update_review_document_bundle + + uid = _require_authenticated_user(authorization) + + if is_postgres_enabled(): + try: + async with get_session() as session: + await _assert_review_document_access(session, review_document_id, uid, authorization) + row = await update_review_document_bundle( + session, + review_document_id=review_document_id, + official_bieu_mau=body.officialBieuMau, + template_data=body.templateData, + full_bundle=body.fullBundle, + ) + if row is None: + raise HTTPException(status_code=404, detail="Không tìm thấy review document") + return row + except HTTPException: + raise + except Exception: + logger.exception("review-document update failed id=%s", review_document_id) + raise HTTPException(status_code=500, detail="Không cập nhật được review document") from None + raise HTTPException(status_code=501, detail="ID-based update requires PostgreSQL mode") + + +@app.delete("/api/v1/review-documents/{review_document_id}") +async def delete_review_document_by_id( + review_document_id: str, authorization: Optional[str] = Header(None) +): + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.submissions import delete_review_document_bundle + + uid = _require_authenticated_user(authorization) + + if is_postgres_enabled(): + try: + async with get_session() as session: + await _assert_review_document_access(session, review_document_id, uid, authorization) + ok = await 
delete_review_document_bundle( + session, review_document_id=review_document_id + ) + if not ok: + raise HTTPException(status_code=404, detail="Không tìm thấy review document") + return {"deleted": True, "id": review_document_id} + except HTTPException: + raise + except Exception: + logger.exception("review-document delete failed id=%s", review_document_id) + raise HTTPException(status_code=500, detail="Không xóa được review document") from None + raise HTTPException(status_code=501, detail="ID-based delete requires PostgreSQL mode") + + +# Backward-compatible endpoints (deprecated) +@app.post("/api/v1/applications/{case_id}/review-document") +async def save_review_document( + case_id: str, + body: ReviewDocumentSaveRequest, + authorization: Optional[str] = Header(None), +): + if _normalize_case_id(case_id) != _normalize_case_id(body.caseId): + raise HTTPException(status_code=400, detail="caseId in path and body must match.") + return await create_review_document(body, authorization) + + +@app.get("/api/v1/applications/{case_id}/review-document") +async def get_review_document(case_id: str, authorization: Optional[str] = Header(None)): + # Calling an endpoint function directly bypasses FastAPI's parameter injection, so the token + # must be forwarded explicitly; otherwise `authorization` would be the Header(None) marker + # object instead of the caller's bearer token. + rows = await list_review_documents(case_id, limit=1, authorization=authorization) + data = rows.get("data") if isinstance(rows, dict) else None + if isinstance(data, list) and data: + return data[0] + raise HTTPException(status_code=404, detail="Không tìm thấy JSON ReviewPanel") + + +@app.get("/api/v1/applications/{case_id}/review-document/be01-context") +async def get_review_document_be01_context(case_id: str, authorization: Optional[str] = Header(None)): + """Convert latest official ReviewPanel JSON to be01 `data_blank.json` shape.""" + from src.be01.official_to_data_blank import official_to_data_blank + + row = await get_review_document(case_id, authorization) + official = row.get("officialBieuMau") if isinstance(row, dict) else {} + if not isinstance(official, dict) or not official: + raise HTTPException(status_code=404, detail="Không có officialBieuMau để chuyển đổi.") + return { + "caseId": str(row.get("caseId") or case_id), + "be01Context":
official_to_data_blank(official), + } + + +class PreviewApplicationFormDocxRequest(BaseModel): + """Paired with `data_blank.json` after `official_to_data_blank(officialBieuMau)`.""" + + officialBieuMau: dict[str, Any] + + +@app.post("/api/v1/docx/preview-application-form") +async def preview_application_form_docx(body: PreviewApplicationFormDocxRequest): + """ + Render `template_application_form.docx` (Jinja2/docxtpl) and return a filled .docx. + Accepts the same `officialBieuMau` object produced on the « Xem lại » tab. + """ + from src.be01.official_to_data_blank import official_to_data_blank + from src.be01.fill_application_form import fill_application_form_docx + + try: + ctx = official_to_data_blank(body.officialBieuMau or {}) + except Exception as exc: # noqa: BLE001 + raise HTTPException( + status_code=400, detail="Không chuyển officialBieuMau sang dữ liệu biểu mẫu: " + str(exc) + ) from exc + try: + raw = fill_application_form_docx(ctx) + except ImportError as exc: + raise HTTPException( + status_code=501, detail="Thiếu thư viện docxtpl. Cài: pip install docxtpl" + ) from exc + except FileNotFoundError as exc: + raise HTTPException( + status_code=500, detail="Không tìm thấy file mẫu Word trên server: " + str(exc) + ) from exc + except Exception as exc: # noqa: BLE001 + raise HTTPException( + status_code=500, detail="Lỗi khi render mẫu Word: " + str(exc) + ) from exc + return Response( + content=raw, + media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document", + headers={"Content-Disposition": 'inline; filename="mau-don-va-bao-cao-xem-truoc.docx"'}, + ) + + +@app.post("/api/v1/docx/preview-application-form-pdf") +async def preview_application_form_pdf(body: PreviewApplicationFormDocxRequest): + """ + Same merge as `preview-application-form` (docxtpl), then LibreOffice → PDF so layout matches DOCX. 
+ """ + from src.be01.docx_to_pdf import convert_docx_bytes_to_pdf + from src.be01.fill_application_form import fill_application_form_docx + from src.be01.official_to_data_blank import official_to_data_blank + + try: + ctx = official_to_data_blank(body.officialBieuMau or {}) + except Exception as exc: # noqa: BLE001 + raise HTTPException( + status_code=400, detail="Không chuyển officialBieuMau sang dữ liệu biểu mẫu: " + str(exc) + ) from exc + try: + docx_bytes = fill_application_form_docx(ctx) + except ImportError as exc: + raise HTTPException( + status_code=501, detail="Thiếu thư viện docxtpl. Cài: pip install docxtpl" + ) from exc + except FileNotFoundError as exc: + raise HTTPException( + status_code=500, detail="Không tìm thấy file mẫu Word trên server: " + str(exc) + ) from exc + except Exception as exc: # noqa: BLE001 + raise HTTPException( + status_code=500, detail="Lỗi khi render mẫu Word: " + str(exc) + ) from exc + try: + # LibreOffice runs as a blocking subprocess; run it in a worker thread (as /docx/convert-pdf + # does) so a slow conversion does not stall the event loop for every other request. + pdf_bytes = await asyncio.to_thread( + convert_docx_bytes_to_pdf, + docx_bytes, + relax_justified_softbreaks=True, + strip_table_row_heights=False, + ) + except FileNotFoundError as exc: + raise HTTPException( + status_code=501, + detail="Chưa cài LibreOffice để xuất PDF. Docker: thêm libreoffice-writer-nogui; " + "hoặc đặt LIBREOFFICE_PATH. Chi tiết: " + str(exc), + ) from exc + except (RuntimeError, ValueError, subprocess.TimeoutExpired) as exc: + raise HTTPException( + status_code=500, detail="Không chuyển DOCX sang PDF: " + str(exc) + ) from exc + return Response( + content=pdf_bytes, + media_type="application/pdf", + headers={"Content-Disposition": 'inline; filename="mau-ho-so-sang-kien.pdf"'}, + ) + + +@app.post("/api/v1/docx/convert-pdf") +async def convert_uploaded_docx_to_pdf( + file: UploadFile = File(...), + relax_justified_softbreaks: bool = Form(True), + strip_table_row_heights: bool = Form(False), +): + """ + Convert an uploaded `.docx` file to PDF using LibreOffice for near-Word layout fidelity.
+ """ + from src.be01.docx_to_pdf import convert_docx_bytes_to_pdf + + filename = (file.filename or "").strip() + if not filename: + raise HTTPException(status_code=400, detail="Thiếu tên file .docx.") + if not filename.lower().endswith(".docx"): + raise HTTPException(status_code=400, detail="Chỉ hỗ trợ file .docx.") + + try: + docx_bytes = await file.read() + except Exception as exc: # noqa: BLE001 + raise HTTPException(status_code=400, detail="Không đọc được nội dung file upload.") from exc + if not docx_bytes: + raise HTTPException(status_code=400, detail="File .docx rỗng.") + + try: + pdf_bytes = await asyncio.to_thread( + convert_docx_bytes_to_pdf, + docx_bytes, + relax_justified_softbreaks=relax_justified_softbreaks, + strip_table_row_heights=strip_table_row_heights, + ) + except FileNotFoundError as exc: + raise HTTPException( + status_code=501, + detail="Chưa cài LibreOffice để xuất PDF. Docker: thêm libreoffice-writer-nogui; " + "hoặc đặt LIBREOFFICE_PATH. Chi tiết: " + str(exc), + ) from exc + except (RuntimeError, ValueError, subprocess.TimeoutExpired) as exc: + raise HTTPException( + status_code=500, detail="Không chuyển DOCX sang PDF: " + str(exc) + ) from exc + + # Keep the stem ASCII-only: Starlette encodes header values as latin-1, so non-ASCII letters + # (e.g. Vietnamese "ế") in Content-Disposition would raise UnicodeEncodeError. + safe_stem = "".join(ch if (ch.isascii() and ch.isalnum()) or ch in ("-", "_") else "_" for ch in Path(filename).stem) + out_name = (safe_stem or "document") + ".pdf" + return Response( + content=pdf_bytes, + media_type="application/pdf", + headers={"Content-Disposition": f'inline; filename="{out_name}"'}, + ) + + +# --- Initiative applications: PDF submission (applicant) + list for the admin « Danh Sách Sáng kiến » view --- + +SUBMITTED_INITIATIVES_DIR = Path( + os.getenv( + "SUBMITTED_INITIATIVES_DIR", + str((APP_ROOT_DIR.parent / "fe0" / "public" / "submitted-initiatives").resolve()), + ) +) +SUBMITTED_INDEX_PATH = SUBMITTED_INITIATIVES_DIR / "index.json" +SUBMITTED_INITIATIVES_DIR.mkdir(parents=True, exist_ok=True) +app.mount( + "/submitted-initiatives", + StaticFiles(directory=str(SUBMITTED_INITIATIVES_DIR.resolve())),
+    name="submitted_initiatives",
+)
+
+
+def _load_submitted_items() -> List[Dict[str, Any]]:
+    SUBMITTED_INITIATIVES_DIR.mkdir(parents=True, exist_ok=True)
+    if not SUBMITTED_INDEX_PATH.exists():
+        return []
+    try:
+        with open(SUBMITTED_INDEX_PATH, "r", encoding="utf-8") as handle:
+            data = json.load(handle)
+        items = data.get("items") if isinstance(data, dict) else None
+        return list(items) if isinstance(items, list) else []
+    except Exception:
+        logger.exception("Failed to load submitted initiatives index")
+        return []
+
+
+def _save_submitted_items(items: List[Dict[str, Any]]) -> None:
+    SUBMITTED_INITIATIVES_DIR.mkdir(parents=True, exist_ok=True)
+    with open(SUBMITTED_INDEX_PATH, "w", encoding="utf-8") as handle:
+        json.dump({"items": items}, handle, ensure_ascii=False, indent=2)
+
+
+@app.post("/api/applications/submit")
+async def submit_initiative_application(
+    file: UploadFile = File(...),
+    metadata: str = Form(""),
+    authorization: Optional[str] = Header(None),
+):
+    """
+    Receive the complete application PDF from the « Xem lại » (Review) tab, save it under
+    public/submitted-initiatives, and add it to the index so that GET /api/applications returns it to admins.
+    """
+    if not file.filename or not file.filename.lower().endswith(".pdf"):
+        raise HTTPException(status_code=400, detail="Yêu cầu file PDF (.pdf)")
+
+    content = await file.read()
+    if not content or len(content) < 100:
+        raise HTTPException(status_code=400, detail="File PDF không hợp lệ hoặc quá nhỏ")
+
+    try:
+        meta = json.loads(metadata) if metadata.strip() else {}
+    except json.JSONDecodeError:
+        meta = {}
+    if not isinstance(meta, dict):
+        # metadata must be a JSON object; "[]" or "42" would otherwise break the meta.get() calls below
+        meta = {}
+
+    new_id = f"sub-{uuid.uuid4().hex[:16]}"
+    now = datetime.utcnow().replace(microsecond=0).isoformat() + "Z"
+    safe_name = f"{new_id}.pdf"
+    SUBMITTED_INITIATIVES_DIR.mkdir(parents=True, exist_ok=True)
+    pdf_path = SUBMITTED_INITIATIVES_DIR / safe_name
+    with open(pdf_path, "wb") as handle:
+        handle.write(content)
+
+    initiative_name = (meta.get("initiativeName") or meta.get("name") or "").strip() or "Hồ sơ sáng kiến"
+    author_name = (meta.get("authorName") or "").strip() or "—"
+    author_email = (meta.get("authorEmail") or "").strip() or None
+    author_phone = (meta.get("authorPhone") or "").strip() or None
+    case_id = (meta.get("caseId") or "").strip() or None
+
+    item: Dict[str, Any] = {
+        "id": new_id,
+        "submittedDate": now,
+        "name": initiative_name,
+        "author": {
+            "id": case_id or new_id,
+            "name": author_name,
+            "email": author_email,
+            "phone": author_phone,
+        },
+        "subjectId": meta.get("subjectId") or "",
+        "groupId": meta.get("groupId") or "",
+        "status": "pending",
+        "reviewStatus": "not_reviewed",
+        "supervisor": None,
+        "reviewer": None,
+        "reviewDeadline": None,
+        "conference": None,
+        "topicType": str(meta.get("topicType") or "Hồ sơ PDF (đơn + báo cáo)"),
+        "files": {
+            "fullText": {"url": f"/submitted-initiatives/{safe_name}", "type": "pdf"},
+            "abstract": None,
+            "poster": None,
+        },
+    }
+
+    public_url = f"/submitted-initiatives/{safe_name}"
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.submissions import ApplicationSubmitPersistError, save_submitted_application
+    from src.initiative_db.submission_readiness import ApplicationSubmissionNotReadyError
+
+    owner_uid = decode_access_token_user_id(authorization)
+
+    if is_postgres_enabled():
+        try:
+            pdf_sha256 = hashlib.sha256(content).hexdigest()
+            pdf_len = len(content)
+            async with get_session() as session:
+                saved = await save_submitted_application(
+                    session=session,
+                    metadata=meta if isinstance(meta, dict) else {},
+                    file_url=public_url,
+                    submission_id=new_id,
+                    owner_user_id=owner_uid,
+                    pdf_byte_size=pdf_len,
+                    pdf_sha256=pdf_sha256,
+                    pdf_original_name=safe_name,
+                    pdf_body=content,
+                )
+            logger.info("Submitted initiative PDF persisted in PostgreSQL path=%s", pdf_path)
+            return saved
+        except ApplicationSubmissionNotReadyError as exc:
+            try:
+                pdf_path.unlink(missing_ok=True)
+            except OSError:
+                pass
+            raise HTTPException(
+                status_code=400,
+                detail={
+                    "message": "Hồ sơ chưa đủ điều kiện nộp.",
+                    "missing": exc.missing,
+                },
+            ) from exc
+        except ApplicationSubmitPersistError as exc:
+            raise HTTPException(
+                status_code=503,
+                detail=str(exc),
+            ) from exc
+        except Exception:
+            logger.exception("application submission persist (PostgreSQL) failed; fallback to file index")
+
+    items = _load_submitted_items()
+    items.insert(0, item)
+    _save_submitted_items(items)
+
+    logger.info("Submitted initiative PDF id=%s path=%s", new_id, pdf_path)
+    return {
+        "id": new_id,
+        "submittedDate": now,
+        "publicUrl": public_url,
+        "name": initiative_name,
+    }
+
+
+@app.post("/api/applications/new")
+async def create_submitted_application(
+    body: CreateSubmittedApplicationBody,
+    authorization: Optional[str] = Header(None),
+):
+    """Create a new submitted-application shell row and return the generated application id."""
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import User
+    from src.initiative_db.submissions import create_submitted_application_shell
+
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(status_code=401, detail="Đăng nhập để tạo hồ sơ mới.")
+    if not is_postgres_enabled():
+        raise HTTPException(status_code=503, detail="Tính năng này yêu cầu PostgreSQL.")
+
+    try:
+        async with get_session() as session:
+            user = await session.get(User, uid)
+            row = await create_submitted_application_shell(
+                session=session,
+                owner_user_id=uid,
+                name=(body.name or "").strip() or None,
+                author_name=(str(user.full_name).strip() if user and user.full_name else None),
+                author_email=(str(user.email).strip() if user and user.email else None),
+                author_phone=(str(user.phone).strip() if user and user.phone else None),
+            )
+        return {"id": str(row.get("id") or ""), "application": row}
+    except HTTPException:
+        raise
+    except Exception:
+        logger.exception("POST /api/applications/new failed")
+        raise HTTPException(status_code=500, detail="Không thể tạo hồ sơ mới.") from None
+
+
+def _get_application_from_file_index(application_id: str) -> Optional[Dict[str, Any]]:
+    for row in _load_submitted_items():
+        if str(row.get("id")) == application_id:
+            return row
+    return None
+
+
+@app.get("/api/applications/mine")
+async def list_my_applications(authorization: Optional[str] = Header(None)):
+    """Submitted applications for the logged-in applicant (PostgreSQL when enabled, otherwise the JSON file index)."""
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import User
+    from src.initiative_db.submissions import list_my_submitted_applications
+
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(status_code=401, detail="Đăng nhập để xem hồ sơ của bạn.")
+
+    if is_postgres_enabled():
+        try:
+            async with get_session() as session:
+                user = await session.get(User, uid)
+                email = str(user.email) if user is not None else ""
+                data = await list_my_submitted_applications(session, uid, email)
+            return {"data": data}
+        except HTTPException:
+            raise
+        except Exception:
+            logger.exception("GET /api/applications/mine (PostgreSQL) failed")
+            raise HTTPException(status_code=500, detail="Không tải được danh sách hồ sơ") from None
+
+    payload = decode_bearer_token(authorization)
+    token_email = str((payload or {}).get("email") or "").strip().lower()
+    items = _load_submitted_items()
+    filtered: List[Dict[str, Any]] = []
+    for row in items:
+        auth_em = str((row.get("author") or {}).get("email") or "").strip().lower()
+        if token_email and auth_em == token_email:
+            filtered.append(row)
+    filtered.sort(key=lambda x: str(x.get("submittedDate") or ""), reverse=True)
+    for row in filtered:
+        sd = str(row.get("submittedDate") or "")
+        if len(sd) >= 4 and sd[:4].isdigit():
+            row["calendarYear"] = int(sd[:4])
+    return {"data": filtered}
+
+
+async def _enrich_application_detail_full_pdf_presign(session, row: Dict[str, Any]) -> None:
+    """If the full PDF artifact is stored as an object key in the MinIO exports bucket, add a presigned
+    ``files.fullText.viewUrl`` (1-hour TTL) to the row."""
+    from sqlalchemy import select
+
+    from src.initiative_db.models import ApplicationArtifact, Initiative
+
+    case = str(row.get("draft_case_id") or "").strip()
+    if not case:
+        return
+    ini = (
+        await session.execute(select(Initiative).where(Initiative.case_code == case))
+    ).scalar_one_or_none()
+    if ini is None:
+        return
+    art = (
+        await session.execute(
+            select(ApplicationArtifact).where(
+                ApplicationArtifact.initiative_id == ini.id,
+                ApplicationArtifact.role == "full_pdf",
+            )
+        )
+    ).scalar_one_or_none()
+    if art is None or not (art.storage_uri or "").strip():
+        return
+    uri = (art.storage_uri or "").strip()
+    if uri.startswith("/submitted-initiatives") or uri.startswith(("http://", "https://")):
+        return
+    try:
+        from src.minio.storage import S3Storage, settings as s3s
+
+        s3 = S3Storage()
+        bucket = s3s.s3_bucket_exports
+        view_url = await s3.get_download_url(
+            bucket,
+            uri,
+            ttl=3600,
+            filename=(art.original_name or "ho-so.pdf"),
+            inline=True,
+            response_content_type="application/pdf",
+        )
+    except Exception:
+        logger.warning("Presigned URL for submitted full PDF failed (case=%s)", case, exc_info=True)
+        return
+    files = row.setdefault("files", {})
+    ft = files.get("fullText")
+    merged = dict(ft) if isinstance(ft, dict) else {}
+    merged["viewUrl"] = view_url
+    merged["storageKey"] = uri
+    files["fullText"] = merged
+
+
+@app.get("/api/applications/export")
+async def export_applications_excel(
+    authorization: Optional[str] = Header(None),
+    page: int = 1,
+    pageSize: int = 20,
+    name: str = "",
+    authorName: str = "",
+    reviewerName: str = "",
+    status: str = "",
+    reviewStatus: str = "",
+    dateFrom: str = "",
+    dateTo: str = "",
+    sortBy: str = "submittedDate",
+    sortOrder: str = "desc",
+    lifecycle: str = "",
+):
+    """
+    Export the initiative list to Excel (same filters / sorting as GET /api/applications).
+    The TT & MSSK (sequence no. / initiative code) column holds «YYYY-n», where n counts up within
+    each calendar year of the export. Admin accounts only.
+    """
+    _require_admin_user(authorization)
+    from src.be01.export_applications_list_xlsx import build_applications_list_xlsx
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.submissions import submitted_applications_pairs_for_export
+
+    _ = page, pageSize  # the client sends the same query as the list view; the export is not paginated
+
+    if is_postgres_enabled():
+        try:
+            async with get_session() as session:
+                pairs = await submitted_applications_pairs_for_export(
+                    session,
+                    name=name,
+                    author_name=authorName,
+                    reviewer_name=reviewerName,
+                    status=status,
+                    review_status=reviewStatus,
+                    date_from=dateFrom,
+                    date_to=dateTo,
+                    sort_by=sortBy,
+                    sort_order=sortOrder,
+                    lifecycle=lifecycle,
+                )
+        except Exception:
+            logger.exception("GET /api/applications/export (PostgreSQL) failed")
+            raise HTTPException(
+                status_code=503,
+                detail="Không thể xuất Excel từ cơ sở dữ liệu. Vui lòng thử lại sau.",
+            ) from None
+    else:
+        items = _load_submitted_items()
+        lc = (lifecycle or "").strip().lower()
+
+        def match(row: Dict[str, Any], *, skip_status: bool = False) -> bool:
+            row_status = str(row.get("status") or "")
+            if lc == "inbox":
+                if row_status in ("approved", "rejected"):
+                    return False
+            elif lc == "decided":
+                if row_status not in ("approved", "rejected"):
+                    return False
+            n = name.strip().lower()
+            if n and n not in str(row.get("name") or "").lower():
+                return False
+            an = authorName.strip().lower()
+            auth = row.get("author") or {}
+            if an and an not in str(auth.get("name") or "").lower():
+                return False
+            rn = reviewerName.strip().lower()
+            if rn:
+                rev = row.get("reviewer") or {}
+                if rn not in str(rev.get("name") or "").lower():
+                    return False
+            if not skip_status and status and row_status != status:
+                return False
+            if reviewStatus and str(row.get("reviewStatus") or "") != reviewStatus:
+                return False
+            sd = row.get("submittedDate")
+            if dateFrom and sd:
+                sd_day = str(sd)[:10]
+                if len(sd_day) == 10 and sd_day < dateFrom:
+                    return False
+            if dateTo and sd:
+                sd_day = str(sd)[:10]
+                if len(sd_day) == 10 and sd_day > dateTo:
+                    return False
+            return True
+
+        filtered = [x for x in items if match(x, skip_status=False)]
+        reverse = sortOrder != "asc"
+        if sortBy == "name":
+            filtered.sort(key=lambda x: str(x.get("name") or ""), reverse=reverse)
+        elif sortBy == "author":
+            filtered.sort(
+                key=lambda x: str((x.get("author") or {}).get("name") or ""),
+                reverse=reverse,
+            )
+        else:
+            filtered.sort(key=lambda x: str(x.get("submittedDate") or ""), reverse=reverse)
+        pairs = [(r, {}) for r in filtered]
+
+    body = build_applications_list_xlsx(pairs)
+    safe_fn = f"sang-kien-export-{datetime.now(timezone.utc).strftime('%Y%m%d-%H%M')}.xlsx"
+    return Response(
+        content=body,
+        media_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
+        headers={"Content-Disposition": f'attachment; filename="{safe_fn}"'},
+    )
+
+
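The file-index fallback in `export_applications_excel` filters rows in memory (lifecycle bucket, case-insensitive substring match, inclusive `YYYY-MM-DD` range) before sorting, and the endpoint's docstring describes a per-year «YYYY-n» numbering for the TT & MSSK column. Below is a minimal standalone sketch of both ideas. It assumes index rows shaped like the `item` dicts that `submit_initiative_application` writes. `row_matches` and `assign_year_sequence` are hypothetical helpers, not functions from this codebase. The real numbering lives in `build_applications_list_xlsx`, which is not shown here, so the sketch simply assumes rows are numbered in export order.

```python
from typing import Any, Dict, List, Optional

# Statuses that the "decided" lifecycle keeps and the "inbox" lifecycle hides.
DECIDED = ("approved", "rejected")


def row_matches(
    row: Dict[str, Any],
    lifecycle: str = "",
    name: str = "",
    date_from: str = "",
    date_to: str = "",
) -> bool:
    """Hypothetical subset of the fallback filter: lifecycle, name substring, date range."""
    status = str(row.get("status") or "")
    lc = lifecycle.strip().lower()
    if lc == "inbox" and status in DECIDED:
        return False
    if lc == "decided" and status not in DECIDED:
        return False
    needle = name.strip().lower()
    if needle and needle not in str(row.get("name") or "").lower():
        return False
    # Zero-padded ISO-8601 dates sort lexically, so plain string comparison works.
    day = str(row.get("submittedDate") or "")[:10]
    if len(day) == 10:
        if date_from and day < date_from:
            return False
        if date_to and day > date_to:
            return False
    return True


def assign_year_sequence(rows: List[Dict[str, Any]]) -> List[Optional[str]]:
    """Hypothetical «YYYY-n» numbering: n restarts at 1 for each calendar year, in row order."""
    counters: Dict[str, int] = {}
    codes: List[Optional[str]] = []
    for row in rows:
        year = str(row.get("submittedDate") or "")[:4]
        if len(year) != 4 or not year.isdigit():
            codes.append(None)  # no usable submission date: leave the code empty
            continue
        counters[year] = counters.get(year, 0) + 1
        codes.append(f"{year}-{counters[year]}")
    return codes


rows = [
    {"name": "Solar dryer", "status": "pending", "submittedDate": "2025-03-02T08:00:00Z"},
    {"name": "Water filter", "status": "approved", "submittedDate": "2025-05-10T09:30:00Z"},
    {"name": "Solar pump", "status": "pending", "submittedDate": "2026-01-15T07:00:00Z"},
]
inbox = [r for r in rows if row_matches(r, lifecycle="inbox", name="solar")]
print([r["name"] for r in inbox])  # ['Solar dryer', 'Solar pump']
print(assign_year_sequence(rows))  # ['2025-1', '2025-2', '2026-1']
```

Comparing the first ten characters of the ISO-8601 timestamps as strings is safe only because the format is zero-padded and ordered from year to day. A non-ISO date such as `2/3/2025` would silently compare wrongly, which is why both the original code and this sketch skip the range check unless the prefix is exactly ten characters long.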
+BULK_APPLICATION_BACKUPS_MAX = 250
+
+
+@app.get("/api/applications/export-backups")
+async def export_applications_backups_bundle(
+    authorization: Optional[str] = Header(None),
+    page: int = 1,
+    pageSize: int = 20,
+    name: str = "",
+    authorName: str = "",
+    reviewerName: str = "",
+    status: str = "",
+    reviewStatus: str = "",
+    dateFrom: str = "",
+    dateTo: str = "",
+    sortBy: str = "submittedDate",
+    sortOrder: str = "desc",
+    lifecycle: str = "",
+):
+    """
+    Admin-only: one ZIP file that contains each application's backup ZIP (same filters / sorting as the Excel export).
+    """
+    from sqlalchemy import desc, select
+
+    from src.audit import AuditAction, record_audit, resolve_actor_fields
+    from src.initiative_db.application_backup import build_backup_zipstream
+    from src.initiative_db.backup_naming import backup_zip_attachment_filename
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import ApplicationArtifact, ApplicationReviewDocument, User
+    from src.initiative_db.submissions import (
+        _as_review_document_row,
+        resolve_submitted_initiative_for_backup,
+        submitted_applications_pairs_for_export,
+    )
+    from src.minio.storage import _sanitize_filename, settings as s3_settings
+    from zipstream import ZipStream
+
+    admin_uid = _require_admin_user(authorization)
+    if not is_postgres_enabled():
+        raise HTTPException(status_code=503, detail="Sao lưu yêu cầu PostgreSQL.")
+    _ = page, pageSize
+
+    outer = ZipStream()
+    used_inner_names: set[str] = set()
+
+    def _unique_member_name(safe_fn: str) -> str:
+        base = safe_fn or "backup.zip"
+        if base not in used_inner_names:
+            used_inner_names.add(base)
+            return base
+        stem = base[:-4] if base.lower().endswith(".zip") else base
+        n = 2
+        while True:
+            cand = f"{stem}_{n}.zip"
+            if cand not in used_inner_names:
+                used_inner_names.add(cand)
+                return cand
+            n += 1
+
+    packed_ids: List[str] = []
+    async with get_session() as session:
+        try:
+            pairs = await submitted_applications_pairs_for_export(
+                session,
+                name=name,
+                author_name=authorName,
+                reviewer_name=reviewerName,
+                status=status,
+                review_status=reviewStatus,
+                date_from=dateFrom,
+                date_to=dateTo,
+                sort_by=sortBy,
+                sort_order=sortOrder,
+                lifecycle=lifecycle,
+            )
+        except Exception:
+            logger.exception("GET /api/applications/export-backups (list) failed")
+            raise HTTPException(
+                status_code=503,
+                detail="Không thể tải danh sách hồ sơ để đóng gói sao lưu.",
+            ) from None
+
+        if len(pairs) > BULK_APPLICATION_BACKUPS_MAX:
+            raise HTTPException(
+                status_code=413,
+                detail=f"Quá nhiều hồ sơ ({len(pairs)}). Tối đa {BULK_APPLICATION_BACKUPS_MAX} — vui lòng thu hẹp bộ lọc.",
+            )
+
+        for row, _payload in pairs:
+            application_id = str(row.get("id") or "").strip()
+            if not application_id:
+                continue
+            resolved = await resolve_submitted_initiative_for_backup(session, application_id)
+            if resolved is None:
+                continue
+            initiative, public_id = resolved
+
+            arts = (
+                await session.execute(
+                    select(ApplicationArtifact).where(ApplicationArtifact.initiative_id == initiative.id)
+                )
+            ).scalars().all()
+
+            rd = (
+                await session.execute(
+                    select(ApplicationReviewDocument)
+                    .where(ApplicationReviewDocument.initiative_id == initiative.id)
+                    .order_by(desc(ApplicationReviewDocument.document_version))
+                    .limit(1)
+                )
+            ).scalar_one_or_none()
+            review_json = _as_review_document_row(rd) if rd is not None else None
+
+            owner = await session.get(User, initiative.owner_id)
+            inner_fn = backup_zip_attachment_filename(
+                owner_email=owner.email if owner is not None else None,
+                owner_full_name=owner.full_name if owner is not None else None,
+                public_application_id=public_id,
+            )
+            member_name = _unique_member_name(inner_fn)
+
+            inner_z = build_backup_zipstream(
+                settings=s3_settings,
+                initiative=initiative,
+                application_id=public_id,
+                case_code=initiative.case_code,
+                artifacts=list(arts),
+                review_doc_json=review_json,
+                owner_id=str(initiative.owner_id),
+                submitted_at=initiative.submitted_at.isoformat()
+                if initiative.submitted_at is not None
+                else None,
+            )
+            outer.add(iter(inner_z), member_name)
+            packed_ids.append(public_id)
+
+        if not packed_ids:
+            raise HTTPException(
+                status_code=404,
+                detail="Không có hồ sơ nào khớp bộ lọc để đóng gói sao lưu.",
+            )
+
+        actor_email, actor_role = await resolve_actor_fields(session, admin_uid)
+        await record_audit(
+            session,
+            actor_user_id=admin_uid,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.read,
+            entity_type="application_backup_bulk",
+            entity_id="bulk",
+            metadata={
+                "outcome": "nested_zip_stream",
+                "packed_count": len(packed_ids),
+                "application_ids": packed_ids[:100],
+                "truncated_ids": len(packed_ids) > 100,
+            },
+        )
+        await session.commit()
+
+    outer_fn = _sanitize_filename(
+        f"sang-kien-sao-luu-tong-hop-{datetime.now(timezone.utc).strftime('%Y%m%d-%H%M')}.zip"
+    ) or "sang-kien-sao-luu-tong-hop.zip"
+    return StreamingResponse(
+        outer,
+        media_type="application/zip",
+        headers={"Content-Disposition": f'attachment; filename="{outer_fn}"'},
+    )
+
+
+@app.get("/api/applications/{application_id}")
+async def get_application(
+    application_id: str, authorization: Optional[str] = Header(None)
+):
+    """Single submitted application for the review page (admin / applicant deep link)."""
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import User
+    from src.initiative_db.submissions import (
+        _applicant_may_mutate_row,
+        _as_submission_item,
+        _resolve_initiative_and_latest_draft_for_application_id,
+        get_application_by_id,
+    )
+
+    uid = _require_authenticated_user(authorization)
+
+    if is_postgres_enabled():
+        try:
+            async with get_session() as session:
+                if not _is_staff_reviewer(authorization):
+                    try:
+                        initiative, draft = await _resolve_initiative_and_latest_draft_for_application_id(
+                            session, application_id
+                        )
+                    except LookupError as exc:
+                        raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ") from exc
+                    payload = dict(draft.payload) if isinstance(draft.payload, dict) else {}
+                    row = _as_submission_item(initiative, payload)
+                    user = await session.get(User, uid)
+                    email = str(user.email) if user is not None else ""
+                    if not _applicant_may_mutate_row(initiative, row, uid, email):
+                        raise HTTPException(status_code=403, detail="Không có quyền xem hồ sơ này.")
+                row = await get_application_by_id(session, application_id)
+                if row is not None:
+                    await _enrich_application_detail_full_pdf_presign(session, row)
+                    return row
+        except HTTPException:
+            raise
+        except Exception:
+            logger.exception("application detail query (PostgreSQL) failed; refusing file index fallback while DB is configured")
+            raise HTTPException(
+                status_code=503,
+                detail="Không thể tải hồ sơ từ cơ sở dữ liệu. Vui lòng thử lại sau hoặc liên hệ quản trị.",
+            ) from None
+
+    if not _is_staff_reviewer(authorization):
+        payload = decode_bearer_token(authorization)
+        token_email = str((payload or {}).get("email") or "").strip().lower()
+        row_check = _get_application_from_file_index(application_id)
+        if row_check is None:
+            raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ")
+        auth_em = str((row_check.get("author") or {}).get("email") or "").strip().lower()
+        if not token_email or auth_em != token_email:
+            raise HTTPException(status_code=403, detail="Không có quyền xem hồ sơ này.")
+
+    row = _get_application_from_file_index(application_id)
+    if row is not None:
+        return row
+    raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ")
+
+
+@app.get("/api/applications/{application_id}/backup")
+async def download_application_backup(
+    application_id: str,
+    authorization: Optional[str] = Header(None),
+):
+    """
+    Admin-only: stream a ZIP (manifest + submitted PDF + official DOCX/PDF + evidence + optional review JSON).
+    """
+    from sqlalchemy import desc, select
+
+    from src.audit import AuditAction, record_audit, resolve_actor_fields
+    from src.initiative_db.application_backup import build_backup_zipstream
+    from src.initiative_db.backup_naming import backup_zip_attachment_filename
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import ApplicationArtifact, ApplicationReviewDocument, User
+    from src.initiative_db.submissions import _as_review_document_row, resolve_submitted_initiative_for_backup
+    from src.minio.storage import settings as s3_settings
+
+    admin_uid = _require_admin_user(authorization)
+    if not is_postgres_enabled():
+        raise HTTPException(status_code=503, detail="Sao lưu yêu cầu PostgreSQL.")
+
+    async with get_session() as session:
+        resolved = await resolve_submitted_initiative_for_backup(session, application_id)
+        if resolved is None:
+            raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ.")
+        initiative, public_id = resolved
+
+        arts = (
+            await session.execute(
+                select(ApplicationArtifact).where(ApplicationArtifact.initiative_id == initiative.id)
+            )
+        ).scalars().all()
+
+        rd = (
+            await session.execute(
+                select(ApplicationReviewDocument)
+                .where(ApplicationReviewDocument.initiative_id == initiative.id)
+                .order_by(desc(ApplicationReviewDocument.document_version))
+                .limit(1)
+            )
+        ).scalar_one_or_none()
+        review_json = _as_review_document_row(rd) if rd is not None else None
+
+        actor_email, actor_role = await resolve_actor_fields(session, admin_uid)
+        await record_audit(
+            session,
+            actor_user_id=admin_uid,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.read,
+            entity_type="application_backup",
+            entity_id=public_id,
+            metadata={
+                "outcome": "streaming_zip",
+                "initiative_id": str(initiative.id),
+                "case_code": initiative.case_code,
+                "artifact_count": len(arts),
+            },
+        )
+        await session.commit()
+
+        owner = await session.get(User, initiative.owner_id)
+    safe_fn = backup_zip_attachment_filename(
+        owner_email=owner.email if owner is not None else None,
+        owner_full_name=owner.full_name if owner is not None else None,
+        public_application_id=public_id,
+    )
+
+    z = build_backup_zipstream(
+        settings=s3_settings,
+        initiative=initiative,
+        application_id=public_id,
+        case_code=initiative.case_code,
+        artifacts=list(arts),
+        review_doc_json=review_json,
+        owner_id=str(initiative.owner_id),
+        submitted_at=initiative.submitted_at.isoformat()
+        if initiative.submitted_at is not None
+        else None,
+    )
+    return StreamingResponse(
+        z,
+        media_type="application/zip",
+        headers={"Content-Disposition": f'attachment; filename="{safe_fn}"'},
+    )
+
+
+@app.put("/api/applications/{application_id}")
+async def update_submitted_application(
+    application_id: str,
+    body: UpdateSubmittedApplicationBody,
+    authorization: Optional[str] = Header(None),
+):
+    """Update name and submitted date for the applicant's own submission (Postgres or file index)."""
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import User
+    from src.initiative_db.submissions import _parse_submitted_date_input, update_my_submitted_application
+
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(status_code=401, detail="Đăng nhập để cập nhật hồ sơ.")
+
+    if is_postgres_enabled():
+        try:
+            async with get_session() as session:
+                user = await session.get(User, uid)
+                email = str(user.email) if user is not None else ""
+                return await update_my_submitted_application(
+                    session, uid, email, application_id, body.name, body.submittedDate
+                )
+        except LookupError:
+            raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ")
+        except PermissionError:
+            raise HTTPException(status_code=403, detail="Không có quyền cập nhật hồ sơ này.")
+        except HTTPException:
+            raise
+        except Exception:
+            logger.exception("PUT /api/applications (PostgreSQL) failed")
+            raise HTTPException(status_code=500, detail="Không thể cập nhật hồ sơ") from None
+
+    payload = decode_bearer_token(authorization)
+    token_email = str((payload or {}).get("email") or "").strip().lower()
+    items = _load_submitted_items()
+    idx = next((i for i, r in enumerate(items) if str(r.get("id")) == application_id), None)
+    if idx is None:
+        raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ")
+    row = items[idx]
+    auth_em = str((row.get("author") or {}).get("email") or "").strip().lower()
+    if not token_email or auth_em != token_email:
+        raise HTTPException(status_code=403, detail="Không có quyền cập nhật hồ sơ này.")
+    try:
+        dt = _parse_submitted_date_input(body.submittedDate)
+        iso = dt.astimezone(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")
+    except Exception:
+        raise HTTPException(status_code=400, detail="Ngày nộp không hợp lệ.")
+    updated = {**row, "name": body.name.strip(), "submittedDate": iso}
+    if len(iso) >= 4 and iso[:4].isdigit():
+        updated["calendarYear"] = int(iso[:4])
+    items[idx] = updated
+    _save_submitted_items(items)
+    return items[idx]
+
+
+@app.delete("/api/applications/{application_id}")
+async def delete_submitted_application(
+    application_id: str,
+    authorization: Optional[str] = Header(None),
+):
+    """Delete the applicant's own submission (Postgres cascade; file index removes row + PDF if present)."""
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import User
+    from src.initiative_db.submissions import delete_my_submitted_application
+
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(status_code=401, detail="Đăng nhập để xóa hồ sơ.")
+
+    if is_postgres_enabled():
+        try:
+            async with get_session() as session:
+                user = await session.get(User, uid)
+                email = str(user.email) if user is not None else ""
+                await delete_my_submitted_application(session, uid, email, application_id)
+            return {"deleted": True, "id": application_id}
+        except LookupError:
+            raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ")
+        except PermissionError:
+            raise HTTPException(status_code=403, detail="Không có quyền xóa hồ sơ này.")
+        except HTTPException:
+            raise
+        except Exception:
+            logger.exception("DELETE /api/applications (PostgreSQL) failed")
+            raise HTTPException(status_code=500, detail="Không thể xóa hồ sơ") from None
+
+    payload = decode_bearer_token(authorization)
+    token_email = str((payload or {}).get("email") or "").strip().lower()
+    items = _load_submitted_items()
+    idx = next((i for i, r in enumerate(items) if str(r.get("id")) == application_id), None)
+    if idx is None:
+        raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ")
+    row = items[idx]
+    auth_em = str((row.get("author") or {}).get("email") or "").strip().lower()
+    if not token_email or auth_em != token_email:
+        raise HTTPException(status_code=403, detail="Không có quyền xóa hồ sơ này.")
+
+    files = row.get("files") or {}
+    ft = files.get("fullText") if isinstance(files.get("fullText"), dict) else None
+    url = str((ft or {}).get("url") or "") if ft else ""
+    if url.startswith("/submitted-initiatives/"):
+        fname = url.replace("/submitted-initiatives/", "").lstrip("/")
+        safe = "".join(c for c in fname if c.isalnum() or c in ("-", "_", "."))
+        if safe:
+            pdf_path = SUBMITTED_INITIATIVES_DIR / safe
+            try:
+                if pdf_path.is_file():
+                    pdf_path.unlink()
+            except OSError:
+                logger.warning("Could not delete PDF file %s", pdf_path)
+
+    items.pop(idx)
+    _save_submitted_items(items)
+    return {"deleted": True, "id": application_id}
+
+
+@app.get("/api/conferences")
+async def list_conference_filter_options():
+    """
+    Distinct conferences / rounds (hội nghị / đợt, from ``application_workflow.conference``) for council/admin list filters.
+    Returns a list of ``{id, name}`` objects for ``useLookupQuery`` / ``toOptionList``.
+    """
+    from sqlalchemy import select
+
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import ApplicationWorkflow
+
+    if not is_postgres_enabled():
+        return []
+    try:
+        async with get_session() as session:
+            rows = (await session.execute(select(ApplicationWorkflow))).scalars().all()
+        seen: dict[str, dict] = {}
+        for wf in rows:
+            c = wf.conference
+            if not isinstance(c, dict):
+                continue
+            cid = str(c.get("id") or "").strip()
+            cname = str(c.get("name") or "").strip()
+            if not cname:
+                continue
+            opt_id = cid if cid else f"__name__:{cname}"
+            if opt_id not in seen:
+                seen[opt_id] = {"id": opt_id, "name": cname}
+        return sorted(seen.values(), key=lambda x: (str(x.get("name") or "").lower(), str(x.get("id") or "")))
+    except Exception:
+        logger.exception("GET /api/conferences failed")
+        return []
+
+
+@app.get("/api/supervisors")
+async def list_supervisor_filter_options():
+    """Distinct supervisors from ``application_workflow.supervisor`` for dashboard filters."""
+    from sqlalchemy import select
+
+    from src.initiative_db.engine import get_session, is_postgres_enabled
+    from src.initiative_db.models import ApplicationWorkflow
+
+    if not is_postgres_enabled():
+        return []
+    try:
+        async with get_session() as session:
+            rows = (await session.execute(select(ApplicationWorkflow))).scalars().all()
+        seen: dict[str, dict] = {}
+        for wf in rows:
+            s = wf.supervisor
+            if not isinstance(s, dict):
+                continue
+            sid = str(s.get("id") or "").strip()
+            sname = str(s.get("name") or s.get("fullName") or "").strip()
+            if not sname:
+                continue
+            opt_id = sid if sid else f"__name__:{sname}"
+            if opt_id not in seen:
+                seen[opt_id] = {"id": opt_id, "name": sname}
+        return sorted(seen.values(), key=lambda x: (str(x.get("name") or "").lower(), str(x.get("id") or "")))
+    except Exception:
+        logger.exception("GET /api/supervisors failed")
+        return []
+
+
+@app.get("/api/applications")
+async def list_applications(
+    page: int = 1,
pageSize: int = 20, + name: str = "", + authorName: str = "", + reviewerName: str = "", + status: str = "", + reviewStatus: str = "", + dateFrom: str = "", + dateTo: str = "", + sortBy: str = "submittedDate", + sortOrder: str = "desc", + lifecycle: str = "", + authorization: Optional[str] = Header(None), +): + """ + Danh sách hồ sơ đã nộp (PDF) cho dashboard admin / hội đồng. + Dữ liệu lưu cục bộ qua POST /api/applications/submit. + """ + _require_staff_reviewer(authorization) + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.submissions import list_submitted_applications + + page = max(1, page) + page_size = max(1, min(100, pageSize)) + + if is_postgres_enabled(): + try: + async with get_session() as session: + return await list_submitted_applications( + session=session, + page=page, + page_size=page_size, + name=name, + author_name=authorName, + reviewer_name=reviewerName, + status=status, + review_status=reviewStatus, + date_from=dateFrom, + date_to=dateTo, + sort_by=sortBy, + sort_order=sortOrder, + lifecycle=lifecycle, + ) + except Exception: + logger.exception("application list query (PostgreSQL) failed; refusing file index fallback while DB is configured") + raise HTTPException( + status_code=503, + detail="Không thể tải danh sách hồ sơ từ cơ sở dữ liệu. 
Vui lòng thử lại sau hoặc liên hệ quản trị.", + ) from None + + items = _load_submitted_items() + + lc = (lifecycle or "").strip().lower() + + def match(row: Dict[str, Any], *, skip_status: bool = False) -> bool: + row_status = str(row.get("status") or "") + if lc == "inbox": + if row_status in ("approved", "rejected"): + return False + elif lc == "decided": + if row_status not in ("approved", "rejected"): + return False + n = name.strip().lower() + if n and n not in str(row.get("name") or "").lower(): + return False + an = authorName.strip().lower() + auth = row.get("author") or {} + if an and an not in str(auth.get("name") or "").lower(): + return False + rn = reviewerName.strip().lower() + if rn: + rev = row.get("reviewer") or {} + if rn not in str(rev.get("name") or "").lower(): + return False + if not skip_status and status and row_status != status: + return False + if reviewStatus and str(row.get("reviewStatus") or "") != reviewStatus: + return False + sd = row.get("submittedDate") + if dateFrom and sd: + sd_day = str(sd)[:10] + if len(sd_day) == 10 and sd_day < dateFrom: + return False + if dateTo and sd: + sd_day = str(sd)[:10] + if len(sd_day) == 10 and sd_day > dateTo: + return False + return True + + filtered_for_counts = [x for x in items if match(x, skip_status=True)] + status_counts = { + "approved": sum(1 for x in filtered_for_counts if str(x.get("status") or "") == "approved"), + "rejected": sum(1 for x in filtered_for_counts if str(x.get("status") or "") == "rejected"), + } + filtered = [x for x in items if match(x, skip_status=False)] + + reverse = sortOrder != "asc" + if sortBy == "name": + filtered.sort(key=lambda x: str(x.get("name") or ""), reverse=reverse) + elif sortBy == "author": + filtered.sort( + key=lambda x: str((x.get("author") or {}).get("name") or ""), + reverse=reverse, + ) + else: + filtered.sort(key=lambda x: str(x.get("submittedDate") or ""), reverse=reverse) + + total = len(filtered) + start = (page - 1) * page_size + page_data 
= filtered[start : start + page_size] + total_pages = max(1, (total + page_size - 1) // page_size) if total else 1 + + return { + "data": page_data, + "pagination": { + "page": page, + "pageSize": page_size, + "totalItems": total, + "totalPages": total_pages, + }, + "statusCounts": status_counts, + } + + +@app.get("/api/applications/{application_id}/admin-result") +async def get_application_admin_result( + application_id: str, + authorization: Optional[str] = Header(None), +): + """READ the Approve/Reject result recorded by an admin (by ``applicationId``).""" + _require_admin_user(authorization) + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.application_admin_results import get_admin_result_for_application + + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để đọc kết quả.") + async with get_session() as session: + row = await get_admin_result_for_application(session, application_id) + if row is None: + raise HTTPException(status_code=404, detail="Chưa có kết quả cho mã hồ sơ này.") + return row + + +@app.get("/api/notifications") +async def api_list_notifications( + page: int = 1, + pageSize: int = 20, + authorization: Optional[str] = Header(None), +): + """In-app inbox for the current user (applicant receives rows after admin adjudication).""" + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xem thông báo.") + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.user_notifications import list_notifications_for_user + + if not is_postgres_enabled(): + return { + "data": [], + "pagination": {"page": 1, "pageSize": max(1, min(100, pageSize)), "totalItems": 0, "totalPages": 1}, + } + async with get_session() as session: + return await list_notifications_for_user(session, uid, page=page, page_size=pageSize) + + +@app.get("/api/notifications/unread-count")
+async def api_notifications_unread_count(authorization: Optional[str] = Header(None)): + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xem thông báo.") + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.user_notifications import count_unread_notifications + + if not is_postgres_enabled(): + return {"count": 0} + async with get_session() as session: + n = await count_unread_notifications(session, uid) + return {"count": n} + + +@app.patch("/api/notifications/{notification_id}/read") +async def api_mark_notification_read( + notification_id: str, + authorization: Optional[str] = Header(None), +): + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để xem thông báo.") + try: + nid = uuid.UUID(notification_id.strip()) + except ValueError: + raise HTTPException(status_code=404, detail="Không tìm thấy thông báo.") from None + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.user_notifications import mark_notification_read + + if not is_postgres_enabled(): + raise HTTPException(status_code=404, detail="Không tìm thấy thông báo.") + async with get_session() as session: + ok = await mark_notification_read(session, uid, nid) + if not ok: + raise HTTPException(status_code=404, detail="Không tìm thấy thông báo.") + return {"ok": True} + + +@app.post("/api/applications/{application_id}/admin-result") +async def create_application_admin_result( + application_id: str, + body: AdminApplicationResultBody, + authorization: Optional[str] = Header(None), +): + """CREATE the result and sync ``initiatives.status`` (approved / rejected).""" + admin_uid = _require_admin_user(authorization) + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.application_admin_results import create_admin_result + + if not
is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để lưu kết quả.") + from src.initiative_db.user_notifications import best_effort_notify_applicant_after_admin_decision + + result: Optional[Dict[str, Any]] = None + try: + async with get_session() as session: + result = await create_admin_result( + session, + application_id, + admin_uid, + decision=body.decision, + feedback=body.feedback, + rationale=body.rationale, + ) + except LookupError: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ đã nộp với mã đã chọn.") from None + except ValueError as exc: + msg = str(exc) + if msg == "result_already_exists": + raise HTTPException( + status_code=409, + detail="Đã có kết quả cho hồ sơ này — dùng cập nhật hoặc xóa trước.", + ) from None + if msg == "invalid_decision": + raise HTTPException(status_code=422, detail="Quyết định không hợp lệ.") from None + raise HTTPException(status_code=400, detail=msg) from None + + await best_effort_notify_applicant_after_admin_decision(result) + return result + + +@app.put("/api/applications/{application_id}/admin-result") +async def update_application_admin_result( + application_id: str, + body: AdminApplicationResultBody, + authorization: Optional[str] = Header(None), +): + """Idempotent upsert: create or update the result in a single request (syncs ``initiatives.status``).""" + admin_uid = _require_admin_user(authorization) + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.application_admin_results import upsert_admin_result + + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để lưu kết quả.") + from src.initiative_db.user_notifications import best_effort_notify_applicant_after_admin_decision + + result: Optional[Dict[str, Any]] = None + try: + async with get_session() as session: + result = await upsert_admin_result( + session, + application_id, + admin_uid, + decision=body.decision,
feedback=body.feedback, + rationale=body.rationale, + ) + except LookupError: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ đã nộp với mã đã chọn.") from None + except ValueError as exc: + if str(exc) == "invalid_decision": + raise HTTPException(status_code=422, detail="Quyết định không hợp lệ.") from None + raise HTTPException(status_code=400, detail=str(exc)) from None + + await best_effort_notify_applicant_after_admin_decision(result) + return result + + +@app.delete("/api/applications/{application_id}/admin-result") +async def delete_application_admin_result( + application_id: str, + authorization: Optional[str] = Header(None), +): + """DELETE the result and reset ``initiatives.status`` to ``submitted``.""" + from src.initiative_db.engine import get_session, is_postgres_enabled + from src.initiative_db.application_admin_results import delete_admin_result + + admin_uid = _require_admin_user(authorization) + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để xóa kết quả.") + try: + async with get_session() as session: + await delete_admin_result(session, application_id, actor_user_id=admin_uid) + except LookupError as exc: + if exc.args and exc.args[0] == "result_not_found": + raise HTTPException(status_code=404, detail="Chưa có kết quả để xóa.") from None + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ đã nộp với mã đã chọn.") from None + return {"deleted": True, "applicationId": application_id} + + +if __name__ == "__main__": + import uvicorn # type: ignore + uvicorn.run(app, host='0.0.0.0', port=4402, log_level='debug') \ No newline at end of file diff --git a/be0/migrations/001_initiative_schema.sql b/be0/migrations/001_initiative_schema.sql new file mode 100644 index 0000000..fd7bf95 --- /dev/null +++ b/be0/migrations/001_initiative_schema.sql @@ -0,0 +1,251 @@ +-- Initiative Recognition System — PostgreSQL schema (architecture_plan.md §4) +-- Table order respects FKs (units before
users). + +CREATE EXTENSION IF NOT EXISTS citext; + +-- ========= ENUMS ========= +DO $$ BEGIN + CREATE TYPE user_role AS ENUM ('applicant','council_member','editor','admin','viewer'); +EXCEPTION WHEN duplicate_object THEN NULL; +END $$; + +DO $$ BEGIN + CREATE TYPE initiative_class AS ENUM ('technical','research','textbook'); +EXCEPTION WHEN duplicate_object THEN NULL; +END $$; + +DO $$ BEGIN + CREATE TYPE research_evidence AS ENUM ('international','domestic','poster'); +EXCEPTION WHEN duplicate_object THEN NULL; +END $$; + +DO $$ BEGIN + CREATE TYPE eval_level AS ENUM ('high','medium','low'); +EXCEPTION WHEN duplicate_object THEN NULL; +END $$; + +DO $$ BEGIN + CREATE TYPE submission_status AS ENUM ('draft','submitted','under_review','approved','rejected'); +EXCEPTION WHEN duplicate_object THEN NULL; +END $$; + +DO $$ BEGIN + CREATE TYPE recognition_tier AS ENUM ('excellent','good'); +EXCEPTION WHEN duplicate_object THEN NULL; +END $$; + +-- ========= IDENTITY ========= +CREATE TABLE IF NOT EXISTS units ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + name TEXT NOT NULL, + parent_id UUID REFERENCES units(id), + address TEXT +); + +CREATE TABLE IF NOT EXISTS users ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + email CITEXT UNIQUE NOT NULL, + password_hash TEXT NOT NULL, + full_name TEXT NOT NULL, + phone TEXT, + unit_id UUID REFERENCES units(id), + is_active BOOLEAN NOT NULL DEFAULT TRUE, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE TABLE IF NOT EXISTS user_roles ( + user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + role user_role NOT NULL, + PRIMARY KEY (user_id, role) +); + +-- System user for anonymous draft saves (no login yet) +INSERT INTO users (id, email, password_hash, full_name) +VALUES ( + '00000000-0000-4000-8000-000000000001', + 'system@draft.local', + '-', + 'System (draft owner)' +) +ON CONFLICT (email) DO NOTHING; + +-- ========= CASE / INITIATIVE ROOT 
========= +CREATE TABLE IF NOT EXISTS initiatives ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + case_code TEXT UNIQUE NOT NULL, + owner_id UUID NOT NULL REFERENCES users(id), + status submission_status NOT NULL DEFAULT 'draft', + recognition_tier recognition_tier, + submitted_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_initiatives_owner_status ON initiatives(owner_id, status); + +-- ========= DRAFT SNAPSHOTS ========= +CREATE TABLE IF NOT EXISTS drafts ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + draft_code TEXT UNIQUE NOT NULL, + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + payload JSONB NOT NULL, + version INTEGER NOT NULL DEFAULT 1, + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_drafts_initiative ON drafts(initiative_id); + +-- ========= ĐƠN (APPLICATION) ========= +CREATE TABLE IF NOT EXISTS applications ( + initiative_id UUID PRIMARY KEY REFERENCES initiatives(id) ON DELETE CASCADE, + initiative_name TEXT NOT NULL, + investor_name TEXT, + application_field TEXT, + first_apply_date DATE, + initiative_classification initiative_class, + research_evidence_kind research_evidence, + international_journal_decl TEXT, + content_summary TEXT, + confidential_info TEXT, + conditions TEXT, + author_evaluation TEXT, + trial_evaluation TEXT, + submission_day SMALLINT, + submission_month SMALLINT, + submission_year SMALLINT, + honesty_confirmed BOOLEAN NOT NULL DEFAULT FALSE, + CONSTRAINT chk_first_apply_window + CHECK (first_apply_date IS NULL + OR first_apply_date BETWEEN DATE '2025-04-15' AND DATE '2026-04-15') +); + +CREATE TABLE IF NOT EXISTS authors ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + user_id UUID REFERENCES users(id), + ordinal SMALLINT NOT NULL, + full_name TEXT NOT NULL, + dob DATE, + 
workplace TEXT, + title TEXT, + qualification TEXT, + contribution_percent NUMERIC(5,2) NOT NULL, + is_representative BOOLEAN NOT NULL DEFAULT FALSE, + CHECK (contribution_percent >= 0 AND contribution_percent <= 100) +); +CREATE UNIQUE INDEX IF NOT EXISTS uq_authors_repr ON authors(initiative_id) WHERE is_representative; + +CREATE TABLE IF NOT EXISTS support_staff ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + full_name TEXT, + dob DATE, + workplace TEXT, + title TEXT, + qualification TEXT, + support_content TEXT +); + +CREATE TABLE IF NOT EXISTS evidence_files ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + kind TEXT NOT NULL CHECK (kind IN ('textbook','research','technical')), + storage_uri TEXT NOT NULL, + original_name TEXT NOT NULL, + mime_type TEXT NOT NULL DEFAULT 'application/pdf', + byte_size BIGINT NOT NULL, + sha256 CHAR(64) NOT NULL, + uploaded_by UUID NOT NULL REFERENCES users(id), + uploaded_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE UNIQUE INDEX IF NOT EXISTS uq_evidence_kind ON evidence_files(initiative_id, kind); + +-- ========= BÁO CÁO (REPORT) ========= +CREATE TABLE IF NOT EXISTS reports ( + initiative_id UUID PRIMARY KEY REFERENCES initiatives(id) ON DELETE CASCADE, + introduction TEXT, + representative_phone TEXT, + representative_email TEXT, + current_status TEXT, + purpose TEXT, + implementation_steps TEXT, + first_applied_unit TEXT, + achieved_result TEXT, + novelty TEXT, + effectiveness JSONB NOT NULL DEFAULT '{}'::jsonb, + submission_date DATE +); + +CREATE TABLE IF NOT EXISTS trial_units ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + name TEXT NOT NULL, + address TEXT, + field TEXT, + ordinal SMALLINT +); + +-- ========= CONTRIBUTION CONFIRMATION ========= +CREATE TABLE IF NOT EXISTS 
contributions ( + initiative_id UUID PRIMARY KEY REFERENCES initiatives(id) ON DELETE CASCADE, + main_author TEXT NOT NULL, + position TEXT, + representative_percent NUMERIC(5,2), + submission_date TIMESTAMPTZ, + digital_signature_confirmed BOOLEAN NOT NULL DEFAULT FALSE +); + +CREATE TABLE IF NOT EXISTS contribution_participants ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + full_name TEXT, + work_unit TEXT, + contribution_percent NUMERIC(5,2) +); + +-- ========= PHIẾU ĐÁNH GIÁ (EVALUATION FORM) ========= +CREATE TABLE IF NOT EXISTS evaluations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + council_member_id UUID NOT NULL REFERENCES users(id), + position TEXT, + evaluation_date DATE NOT NULL, + novelty_level eval_level, + novelty_score SMALLINT, + novelty_comment TEXT, + effectiveness_level eval_level, + effectiveness_score SMALLINT, + effectiveness_comment TEXT, + total_score SMALLINT GENERATED ALWAYS AS + (COALESCE(novelty_score,0) + COALESCE(effectiveness_score,0)) STORED, + conclusion TEXT, + status submission_status NOT NULL DEFAULT 'draft', + submitted_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + CHECK (novelty_score IS NULL OR (novelty_score BETWEEN 0 AND 40)), + CHECK (effectiveness_score IS NULL OR (effectiveness_score BETWEEN 0 AND 60)), + UNIQUE (initiative_id, council_member_id) +); +CREATE INDEX IF NOT EXISTS idx_eval_initiative ON evaluations(initiative_id); + +-- ========= ADMIN VERIFY ========= +CREATE TABLE IF NOT EXISTS verifications ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + field_name TEXT NOT NULL, + content_hash CHAR(64) NOT NULL, + verified_by UUID NOT NULL REFERENCES users(id), + verified_at TIMESTAMPTZ NOT NULL DEFAULT now(), + result TEXT +); + +-- ========= AUDIT TRAIL ========= +CREATE
TABLE IF NOT EXISTS audit_log ( + id BIGSERIAL PRIMARY KEY, + actor_id UUID REFERENCES users(id), + action TEXT NOT NULL, + entity TEXT NOT NULL, + entity_id UUID NOT NULL, + diff JSONB, + occurred_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_audit_entity ON audit_log(entity, entity_id); diff --git a/be0/migrations/002_application_storage_extensions.sql b/be0/migrations/002_application_storage_extensions.sql new file mode 100644 index 0000000..402a40b --- /dev/null +++ b/be0/migrations/002_application_storage_extensions.sql @@ -0,0 +1,71 @@ +-- Versioned tab payloads + immutable submit snapshots + workflow/taxonomy + artifact registry. +-- Apply on existing DBs: psql "$INITIATIVE_DATABASE_URL" -f migrations/002_application_storage_extensions.sql +-- (use sync driver URL, not asyncpg, for psql) + +-- ========= DRAFT TAB SNAPSHOTS (fe0: report | application | contribution) ========= +CREATE TABLE IF NOT EXISTS draft_tab_snapshots ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + draft_id UUID REFERENCES drafts(id) ON DELETE SET NULL, + tab TEXT NOT NULL CHECK (tab IN ('report', 'application', 'contribution')), + tab_version INTEGER NOT NULL DEFAULT 1, + payload JSONB NOT NULL DEFAULT '{}'::jsonb, + source TEXT NOT NULL DEFAULT 'autosave', + captured_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_draft_tab_snapshots_init_tab_ver + ON draft_tab_snapshots (initiative_id, tab, tab_version DESC); +CREATE INDEX IF NOT EXISTS idx_draft_tab_snapshots_captured + ON draft_tab_snapshots (captured_at DESC); + +-- ========= SUBMIT SNAPSHOTS (immutable row per successful submit) ========= +CREATE TABLE IF NOT EXISTS application_submit_snapshots ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + submission_record_id TEXT NOT NULL, + merged_tabs JSONB NOT NULL DEFAULT 
'{}'::jsonb, + submit_metadata JSONB NOT NULL DEFAULT '{}'::jsonb, + full_pdf_uri TEXT NOT NULL, + captured_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_submit_snapshots_init_time + ON application_submit_snapshots (initiative_id, captured_at DESC); + +-- ========= WORKFLOW / LIST PROJECTION (council fields) ========= +CREATE TABLE IF NOT EXISTS application_workflow ( + initiative_id UUID PRIMARY KEY REFERENCES initiatives(id) ON DELETE CASCADE, + review_status TEXT NOT NULL DEFAULT 'not_reviewed', + review_deadline DATE, + reviewer JSONB, + supervisor JSONB, + conference JSONB, + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- ========= TAXONOMY (subjectId, groupId, topicType from fe0 ApplicationItem) ========= +CREATE TABLE IF NOT EXISTS application_taxonomy ( + initiative_id UUID PRIMARY KEY REFERENCES initiatives(id) ON DELETE CASCADE, + subject_id TEXT NOT NULL DEFAULT '', + group_id TEXT NOT NULL DEFAULT '', + topic_type TEXT NOT NULL DEFAULT '', + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- ========= ARTIFACTS (PDF + future abstract/poster URIs; complements evidence_files) ========= +CREATE TABLE IF NOT EXISTS application_artifacts ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + role TEXT NOT NULL CHECK (role IN ( + 'full_pdf', 'abstract', 'poster', + 'textbook_evidence', 'research_evidence', 'technical_evidence', 'other' + )), + storage_uri TEXT NOT NULL, + original_name TEXT, + mime_type TEXT NOT NULL DEFAULT 'application/pdf', + byte_size BIGINT, + sha256 CHAR(64), + uploaded_by UUID REFERENCES users(id), + uploaded_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (initiative_id, role) +); +CREATE INDEX IF NOT EXISTS idx_application_artifacts_init ON application_artifacts (initiative_id); diff --git a/be0/migrations/003_review_documents.sql b/be0/migrations/003_review_documents.sql new file mode 100644 index 0000000..6126364 --- 
/dev/null +++ b/be0/migrations/003_review_documents.sql @@ -0,0 +1,22 @@ +-- Persist ReviewPanel JSON bundles (templateData + officialBieuMau + full trees) +-- Apply on existing DBs: +-- psql "$INITIATIVE_DATABASE_URL" -f migrations/003_review_documents.sql + +CREATE TABLE IF NOT EXISTS application_review_documents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + case_id TEXT NOT NULL, + document_version INTEGER NOT NULL DEFAULT 1, + official_bieu_mau JSONB NOT NULL DEFAULT '{}'::jsonb, + template_data JSONB, + full_bundle JSONB, + created_by UUID REFERENCES users(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (initiative_id, document_version) +); + +CREATE INDEX IF NOT EXISTS idx_review_docs_initiative_time + ON application_review_documents (initiative_id, created_at DESC); +CREATE INDEX IF NOT EXISTS idx_review_docs_case_time + ON application_review_documents (case_id, created_at DESC); + diff --git a/be0/migrations/004_application_admin_results.sql b/be0/migrations/004_application_admin_results.sql new file mode 100644 index 0000000..9eeecab --- /dev/null +++ b/be0/migrations/004_application_admin_results.sql @@ -0,0 +1,18 @@ +-- Admin-recorded adjudication outcome per initiative (linked to applicant application id API). 
+-- One row per initiative; CRUD via /api/applications/{applicationId}/admin-result + +CREATE TABLE IF NOT EXISTS application_admin_results ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + initiative_id UUID NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE, + decision TEXT NOT NULL CHECK (decision IN ('approved','rejected')), + feedback TEXT NOT NULL DEFAULT '', + rationale TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID REFERENCES users(id), + updated_by UUID REFERENCES users(id), + CONSTRAINT uq_application_admin_results_initiative UNIQUE (initiative_id) +); + +CREATE INDEX IF NOT EXISTS idx_application_admin_results_initiative + ON application_admin_results(initiative_id); diff --git a/be0/migrations/004_evidence_artifact_review.sql b/be0/migrations/004_evidence_artifact_review.sql new file mode 100644 index 0000000..4c02fd3 --- /dev/null +++ b/be0/migrations/004_evidence_artifact_review.sql @@ -0,0 +1,13 @@ +-- Evidence staff review (approve / reject) on application_artifacts — must match be0/src/initiative_db/models.py ApplicationArtifact +-- New DBs: loaded by docker-compose postgres init (04_...). +-- Existing DBs: run once, e.g. 
+-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/004_evidence_artifact_review.sql +-- # or: psql "$INITIATIVE_DATABASE_URL" -f be0/migrations/004_evidence_artifact_review.sql + +ALTER TABLE application_artifacts + ADD COLUMN IF NOT EXISTS review_status TEXT, + ADD COLUMN IF NOT EXISTS reviewed_by UUID REFERENCES users (id) ON DELETE SET NULL, + ADD COLUMN IF NOT EXISTS reviewed_at TIMESTAMPTZ; + +CREATE INDEX IF NOT EXISTS idx_application_artifacts_review + ON application_artifacts (initiative_id, review_status); diff --git a/be0/migrations/006_user_notifications.sql b/be0/migrations/006_user_notifications.sql new file mode 100644 index 0000000..eda4665 --- /dev/null +++ b/be0/migrations/006_user_notifications.sql @@ -0,0 +1,26 @@ +-- In-app notifications for applicants (admin adjudication → inbox). +-- Best-effort insert after PUT/POST admin-result; full text duplicated for read UX. + +CREATE TABLE IF NOT EXISTS user_notifications ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + recipient_user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + type TEXT NOT NULL CHECK (type IN ('admin_application_decision')), + title TEXT NOT NULL, + body TEXT NOT NULL, + application_id TEXT NOT NULL, + related_initiative_id UUID REFERENCES initiatives(id) ON DELETE SET NULL, + source_admin_result_id UUID REFERENCES application_admin_results(id) ON DELETE SET NULL, + decision TEXT NOT NULL CHECK (decision IN ('approved','rejected')), + merit_category_label TEXT, + feedback_text TEXT NOT NULL DEFAULT '', + rationale_text TEXT, + read_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX IF NOT EXISTS user_notifications_inbox_idx + ON user_notifications (recipient_user_id, created_at DESC); + +CREATE INDEX IF NOT EXISTS user_notifications_unread_idx + ON user_notifications (recipient_user_id) + WHERE read_at IS NULL; diff --git a/be0/migrations/007_user_roles_email_policy_admin.sql 
b/be0/migrations/007_user_roles_email_policy_admin.sql new file mode 100644 index 0000000..21e3248 --- /dev/null +++ b/be0/migrations/007_user_roles_email_policy_admin.sql @@ -0,0 +1,33 @@ +-- Policy-sourced admin rows: safe to drop when email leaves AUTH_ADMIN_EMAILS (app reconciliation). +-- Apply on existing DBs: docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/007_user_roles_email_policy_admin.sql +-- Fresh docker-compose init: add this file as docker-entrypoint-initdb.d/07_*.sql + +ALTER TABLE user_roles ADD COLUMN IF NOT EXISTS admin_from_email_policy BOOLEAN NOT NULL DEFAULT FALSE; + +COMMENT ON COLUMN user_roles.admin_from_email_policy IS + 'TRUE when admin was granted by email allow-list (AUTH_ADMIN_EMAILS). Reconciliation may DELETE this row if the user email is no longer in the list. FALSE preserves manually granted admin (future / exceptional).'; + +-- One-time cleanup: remove admin for addresses not in the default institutional allow-list +-- (must match default in auth_api._DEFAULT_POLICY_ADMIN_EMAILS when AUTH_ADMIN_EMAILS is unset). +DELETE FROM user_roles ur +USING users u +WHERE ur.user_id = u.id + AND ur.role::text = 'admin' + AND lower(u.email::text) NOT IN ( + 'thaontt@ump.edu.vn', + 'nltanh@ump.edu.vn', + 'ldbaochau@ump.edu.vn', + 'htchuong@ump.edu.vn' + ); + +UPDATE user_roles ur +SET admin_from_email_policy = TRUE +FROM users u +WHERE ur.user_id = u.id + AND ur.role::text = 'admin' + AND lower(u.email::text) IN ( + 'thaontt@ump.edu.vn', + 'nltanh@ump.edu.vn', + 'ldbaochau@ump.edu.vn', + 'htchuong@ump.edu.vn' + ); diff --git a/be0/migrations/008_audit_events.sql b/be0/migrations/008_audit_events.sql new file mode 100644 index 0000000..3ad294f --- /dev/null +++ b/be0/migrations/008_audit_events.sql @@ -0,0 +1,38 @@ +-- Unified append-only audit trail (see assets/docs/audit-log-implementation.md). +-- Application role should be granted INSERT, SELECT only (configure per deployment). 
+ +DO $$ +BEGIN + CREATE TYPE audit_action AS ENUM ( + 'create', + 'read', + 'update', + 'delete', + 'login', + 'logout', + 'login_failed' + ); +EXCEPTION + WHEN duplicate_object THEN NULL; +END +$$; + +CREATE TABLE IF NOT EXISTS audit_events ( + id BIGSERIAL PRIMARY KEY, + occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(), + actor_user_id UUID REFERENCES users(id) ON DELETE SET NULL, + actor_email TEXT NOT NULL, + actor_role TEXT NOT NULL, + action audit_action NOT NULL, + entity_type TEXT NOT NULL, + entity_id TEXT, + before JSONB, + after JSONB, + metadata JSONB NOT NULL DEFAULT '{}'::jsonb, + request_id UUID +); + +CREATE INDEX IF NOT EXISTS idx_audit_actor_time ON audit_events (actor_user_id, occurred_at DESC); +CREATE INDEX IF NOT EXISTS idx_audit_events_entity ON audit_events (entity_type, entity_id, occurred_at DESC); +CREATE INDEX IF NOT EXISTS idx_audit_action_time ON audit_events (action, occurred_at DESC); +CREATE INDEX IF NOT EXISTS idx_audit_metadata_gin ON audit_events USING gin (metadata); diff --git a/be0/migrations/009_backup_artifact_roles_storage_kind.sql b/be0/migrations/009_backup_artifact_roles_storage_kind.sql new file mode 100644 index 0000000..875f62c --- /dev/null +++ b/be0/migrations/009_backup_artifact_roles_storage_kind.sql @@ -0,0 +1,35 @@ +-- Backup / canonical storage: official printable DOCX+PDF roles + explicit storage_kind.
+-- Apply: psql "$INITIATIVE_DATABASE_URL" -f migrations/009_backup_artifact_roles_storage_kind.sql + +ALTER TABLE application_artifacts DROP CONSTRAINT IF EXISTS application_artifacts_role_check; +ALTER TABLE application_artifacts ADD CONSTRAINT application_artifacts_role_check CHECK (role IN ( + 'full_pdf', + 'abstract', + 'poster', + 'textbook_evidence', + 'research_evidence', + 'technical_evidence', + 'other', + 'official_form_docx', + 'official_form_pdf' +)); + +ALTER TABLE application_artifacts + ADD COLUMN IF NOT EXISTS storage_kind TEXT; + +UPDATE application_artifacts SET storage_kind = CASE + WHEN storage_uri LIKE 'http://%' OR storage_uri LIKE 'https://%' THEN 'external_url' + WHEN storage_uri LIKE '/submitted-initiatives/%' THEN 'filesystem' + WHEN role IN ('research_evidence', 'textbook_evidence', 'technical_evidence') THEN 'minio_attachments' + ELSE 'minio_exports' +END +WHERE storage_kind IS NULL; + +ALTER TABLE application_artifacts DROP CONSTRAINT IF EXISTS application_artifacts_storage_kind_check; +ALTER TABLE application_artifacts ADD CONSTRAINT application_artifacts_storage_kind_check + CHECK (storage_kind IS NULL OR storage_kind IN ( + 'minio_exports', + 'minio_attachments', + 'filesystem', + 'external_url' + )); diff --git a/be0/migrations/010_user_staff_profiles.sql b/be0/migrations/010_user_staff_profiles.sql new file mode 100644 index 0000000..32a5710 --- /dev/null +++ b/be0/migrations/010_user_staff_profiles.sql @@ -0,0 +1,114 @@ +-- User staff profiles (1:1 with users) — HR / verification workflow +-- Apply: docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/010_user_staff_profiles.sql + +DO $$ BEGIN + CREATE TYPE profile_verification_status AS ENUM ('draft', 'pending', 'verified', 'rejected'); +EXCEPTION WHEN duplicate_object THEN NULL; +END $$; + +CREATE TABLE IF NOT EXISTS academic_titles ( + code TEXT PRIMARY KEY, + label_vi TEXT NOT NULL, + label_en TEXT NOT NULL, + sort_order INTEGER NOT NULL 
DEFAULT 0, + active BOOLEAN NOT NULL DEFAULT TRUE +); + +INSERT INTO academic_titles (code, label_vi, label_en, sort_order) VALUES + ('professor', 'Giáo sư', 'Professor', 10), + ('associate_professor', 'Phó Giáo sư', 'Associate Professor', 20), + ('doctor_sc', 'Tiến sĩ', 'Doctor of Science', 30), + ('bsckii', 'BSCKII', 'Specialist level II', 35), + ('bscki', 'BSCKI', 'Specialist level I', 36), + ('master', 'Thạc sĩ', 'Master', 40), + ('doctor_md', 'Bác sĩ', 'Physician', 45), + ('pharmacist', 'Dược sĩ', 'Pharmacist', 46), + ('bachelor', 'Cử nhân', 'Bachelor', 50), + ('other', 'Khác (ghi rõ)', 'Other (specify)', 100) +ON CONFLICT (code) DO NOTHING; + +CREATE TABLE IF NOT EXISTS user_staff_profiles ( + user_id UUID PRIMARY KEY + REFERENCES users(id) ON DELETE CASCADE, + + employee_id TEXT, + academic_title_code TEXT REFERENCES academic_titles(code), + academic_title_other TEXT, + unit_name_freetext TEXT, + job_title TEXT, + + profile_verification_status profile_verification_status + NOT NULL DEFAULT 'draft', + verification_submitted_at TIMESTAMPTZ, + verified_at TIMESTAMPTZ, + verified_by_user_id UUID REFERENCES users(id), + rejection_reason TEXT, + + version INTEGER NOT NULL DEFAULT 1, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + + CONSTRAINT employee_id_shape + CHECK (employee_id IS NULL OR employee_id ~ '^[A-Z0-9-]{3,32}$'), + + CONSTRAINT academic_title_other_invariant CHECK ( + CASE + WHEN academic_title_code IS NULL THEN academic_title_other IS NULL + WHEN academic_title_code = 'other' THEN + academic_title_other IS NOT NULL AND length(trim(academic_title_other)) > 0 + ELSE academic_title_other IS NULL + END + ), + + CONSTRAINT verified_requires_metadata CHECK ( + profile_verification_status <> 'verified' + OR (verified_at IS NOT NULL AND verified_by_user_id IS NOT NULL) + ), + + CONSTRAINT rejected_requires_reason CHECK ( + profile_verification_status <> 'rejected' + OR (rejection_reason IS NOT NULL AND 
length(trim(rejection_reason)) > 0) + ), + + CONSTRAINT non_terminal_clears_verification CHECK ( + profile_verification_status NOT IN ('draft', 'pending') + OR (verified_at IS NULL AND verified_by_user_id IS NULL) + ), + + CONSTRAINT rejected_clears_verification_metadata CHECK ( + profile_verification_status <> 'rejected' + OR (verified_at IS NULL AND verified_by_user_id IS NULL) + ), + + CONSTRAINT verified_clears_rejection CHECK ( + profile_verification_status <> 'verified' + OR rejection_reason IS NULL + ), + + CONSTRAINT job_title_length CHECK ( + job_title IS NULL OR length(job_title) <= 120 + ) +); + +CREATE UNIQUE INDEX IF NOT EXISTS ix_usp_employee_id_unique + ON user_staff_profiles (employee_id) + WHERE employee_id IS NOT NULL; + +CREATE INDEX IF NOT EXISTS ix_usp_pending_queue + ON user_staff_profiles (verification_submitted_at) + WHERE profile_verification_status = 'pending'; + +CREATE INDEX IF NOT EXISTS ix_usp_verifier_activity + ON user_staff_profiles (verified_by_user_id, verified_at DESC) + WHERE verified_by_user_id IS NOT NULL; + +-- Backfill one row per existing user (draft, NULL fields) +INSERT INTO user_staff_profiles (user_id, profile_verification_status) +SELECT u.id, 'draft'::profile_verification_status +FROM users u +WHERE NOT EXISTS ( + SELECT 1 FROM user_staff_profiles p WHERE p.user_id = u.id +); + +COMMENT ON TABLE user_staff_profiles IS + 'Institutional staff profile and verification state; scalars only — no MinIO.'; diff --git a/be0/migrations/011_academic_titles_vn.sql b/be0/migrations/011_academic_titles_vn.sql new file mode 100644 index 0000000..eb53874 --- /dev/null +++ b/be0/migrations/011_academic_titles_vn.sql @@ -0,0 +1,19 @@ +-- Extend / refresh academic_titles for UMP staff profile dropdown (VN labels + BSCK codes). 
+-- Apply after 010: psql … -f be0/migrations/011_academic_titles_vn.sql + +INSERT INTO academic_titles (code, label_vi, label_en, sort_order, active) VALUES + ('professor', 'Giáo sư', 'Professor', 10, TRUE), + ('associate_professor', 'Phó Giáo sư', 'Associate Professor', 20, TRUE), + ('doctor_sc', 'Tiến sĩ', 'Doctor of Science', 30, TRUE), + ('bsckii', 'BSCKII', 'Specialist level II', 35, TRUE), + ('bscki', 'BSCKI', 'Specialist level I', 36, TRUE), + ('master', 'Thạc sĩ', 'Master', 40, TRUE), + ('doctor_md', 'Bác sĩ', 'Physician', 45, TRUE), + ('pharmacist', 'Dược sĩ', 'Pharmacist', 46, TRUE), + ('bachelor', 'Cử nhân', 'Bachelor', 50, TRUE), + ('other', 'Khác (ghi rõ)', 'Other (specify)', 100, TRUE) +ON CONFLICT (code) DO UPDATE SET + label_vi = EXCLUDED.label_vi, + label_en = EXCLUDED.label_en, + sort_order = EXCLUDED.sort_order, + active = EXCLUDED.active; diff --git a/be0/migrations/012_password_reset.sql b/be0/migrations/012_password_reset.sql new file mode 100644 index 0000000..cbe7f6c --- /dev/null +++ b/be0/migrations/012_password_reset.sql @@ -0,0 +1,19 @@ +-- Password reset tokens + JWT credential invalidation (see auth_api, auth_credential_middleware). +-- Apply: docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/012_password_reset.sql + +ALTER TABLE users ADD COLUMN IF NOT EXISTS credential_version INTEGER NOT NULL DEFAULT 0; + +COMMENT ON COLUMN users.credential_version IS + 'Incremented on password change/reset. 
JWT ''cv'' claim must match or token is rejected.'; + +CREATE TABLE IF NOT EXISTS password_reset_tokens ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + token_hash TEXT NOT NULL UNIQUE, + expires_at TIMESTAMPTZ NOT NULL, + used_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX IF NOT EXISTS idx_password_reset_tokens_user_id ON password_reset_tokens(user_id); +CREATE INDEX IF NOT EXISTS idx_password_reset_tokens_expires_at ON password_reset_tokens(expires_at); diff --git a/be0/migrations/013_email_verification.sql b/be0/migrations/013_email_verification.sql new file mode 100644 index 0000000..89cedd5 --- /dev/null +++ b/be0/migrations/013_email_verification.sql @@ -0,0 +1,21 @@ +-- Email verification before login (see auth_api deliver_email_verification_email). +-- Apply: docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/013_email_verification.sql + +ALTER TABLE users ADD COLUMN IF NOT EXISTS email_verified BOOLEAN NOT NULL DEFAULT FALSE; + +UPDATE users SET email_verified = TRUE WHERE email_verified = FALSE; + +COMMENT ON COLUMN users.email_verified IS + 'FALSE until user confirms institutional inbox via email link; login and API tokens require TRUE.'; + +CREATE TABLE IF NOT EXISTS email_verification_tokens ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + token_hash TEXT NOT NULL UNIQUE, + expires_at TIMESTAMPTZ NOT NULL, + used_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX IF NOT EXISTS idx_email_verification_tokens_user_id ON email_verification_tokens(user_id); +CREATE INDEX IF NOT EXISTS idx_email_verification_tokens_expires_at ON email_verification_tokens(expires_at); diff --git a/be0/migrations/014_registration_otp.sql b/be0/migrations/014_registration_otp.sql new file mode 100644 index 0000000..6c2f45d --- /dev/null +++ 
b/be0/migrations/014_registration_otp.sql @@ -0,0 +1,20 @@ +-- Registration email verification via 6-digit OTP (replaces magic-link issuance on register). +-- Apply after 013_email_verification.sql: +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/014_registration_otp.sql + +CREATE TABLE IF NOT EXISTS registration_otp_codes ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + otp_hash TEXT NOT NULL, + expires_at TIMESTAMPTZ NOT NULL, + failed_attempts INT NOT NULL DEFAULT 0, + used_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX IF NOT EXISTS idx_registration_otp_codes_user_pending + ON registration_otp_codes (user_id) + WHERE used_at IS NULL; + +COMMENT ON TABLE registration_otp_codes IS + 'Hashed 6-digit OTP for register verification; pending rows deleted when superseded by resend.'; diff --git a/be0/migrations/015_document_templates.sql b/be0/migrations/015_document_templates.sql new file mode 100644 index 0000000..e162383 --- /dev/null +++ b/be0/migrations/015_document_templates.sql @@ -0,0 +1,24 @@ +-- Admin-managed document templates: a .docx (stored in MinIO bucket initiative-templates) +-- plus its extracted Jinja placeholder fields. Applicants render a filled PDF by template id. 
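Migration 014 stores only `otp_hash`, `expires_at`, `failed_attempts` and `used_at` for each registration code. The migration does not say how the hash is computed. The sketch below therefore assumes an HMAC-SHA256 keyed with a server-side secret, because a bare SHA-256 of a 6-digit code can be brute-forced over its 10**6 possible values. `SERVER_PEPPER`, `MAX_ATTEMPTS`, `TTL` and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

SERVER_PEPPER = b"hypothetical-server-secret"  # assumption: loaded from config in practice
MAX_ATTEMPTS = 5                                 # assumption: real limit is not in the migration
TTL = timedelta(minutes=10)                      # assumption: real expiry is not in the migration


def hash_otp(code: str) -> str:
    # Keyed hash: without the server secret, a leaked otp_hash cannot be
    # brute-forced offline over all 10**6 codes.
    return hmac.new(SERVER_PEPPER, code.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class OtpRow:
    # Hypothetical stand-in for one registration_otp_codes row.
    otp_hash: str
    expires_at: datetime
    failed_attempts: int = 0
    used_at: Optional[datetime] = None


def issue_otp(now: datetime) -> Tuple[str, OtpRow]:
    code = f"{secrets.randbelow(10**6):06d}"  # uniform over 000000..999999
    return code, OtpRow(otp_hash=hash_otp(code), expires_at=now + TTL)


def verify_otp(row: OtpRow, submitted: str, now: datetime) -> bool:
    if row.used_at is not None or now >= row.expires_at or row.failed_attempts >= MAX_ATTEMPTS:
        return False
    if hmac.compare_digest(row.otp_hash, hash_otp(submitted)):
        row.used_at = now  # single use
        return True
    row.failed_attempts += 1
    return False
```

The plain code is returned once so it can be emailed. Only the hash is persisted, and a resend would delete this pending row and issue a new one.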
+-- Apply after 014_registration_otp.sql: +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/015_document_templates.sql + +CREATE TABLE IF NOT EXISTS document_templates ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + name TEXT NOT NULL, + description TEXT, + storage_key TEXT NOT NULL, + original_filename TEXT, + content_sha256 TEXT, + fields JSONB NOT NULL DEFAULT '[]'::jsonb, + is_active BOOLEAN NOT NULL DEFAULT TRUE, + created_by UUID REFERENCES users(id) ON DELETE SET NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX IF NOT EXISTS idx_document_templates_active + ON document_templates (is_active, created_at DESC); + +COMMENT ON TABLE document_templates IS + 'Admin-managed DOCX templates (file in MinIO initiative-templates) with extracted Jinja placeholder fields. Applicants render filled PDFs by template id.'; diff --git a/be0/migrations/016_research_projects.sql b/be0/migrations/016_research_projects.sql new file mode 100644 index 0000000..1031001 --- /dev/null +++ b/be0/migrations/016_research_projects.sql @@ -0,0 +1,133 @@ +-- Research-project proposals (Thuyết minh đề tài, Mẫu III.06-TM.ĐTUD) + the PI "cockpit" entities. +-- A proposal row IS the project across its lifecycle: draft -> submitted -> approved | rejected. +-- On approval the cockpit unlocks; child tables (members/datasets/models/assets/milestones) hang off it. +-- Owner+admin authz (v1): a project is owned by owner_user_id; admins may review/approve/reject. 
+-- Apply after 015_document_templates.sql: +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/016_research_projects.sql + +CREATE TABLE IF NOT EXISTS research_projects ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + owner_user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + status TEXT NOT NULL DEFAULT 'draft' CHECK (status IN ('draft','submitted','approved','rejected')), + code TEXT, + title TEXT NOT NULL DEFAULT '', + level TEXT NOT NULL DEFAULT '', + pi_name TEXT NOT NULL DEFAULT '', + period_months INTEGER, + budget_total NUMERIC(14,2), + content JSONB NOT NULL DEFAULT '{}'::jsonb, + submitted_at TIMESTAMPTZ, + reviewed_by UUID REFERENCES users(id) ON DELETE SET NULL, + reviewed_at TIMESTAMPTZ, + review_note TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX IF NOT EXISTS idx_research_projects_owner ON research_projects (owner_user_id, created_at DESC); +CREATE INDEX IF NOT EXISTS idx_research_projects_status ON research_projects (status, created_at DESC); + +CREATE TABLE IF NOT EXISTS research_project_members ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + project_id UUID NOT NULL REFERENCES research_projects(id) ON DELETE CASCADE, + sort_order INTEGER NOT NULL DEFAULT 0, + name TEXT NOT NULL DEFAULT '', + role TEXT NOT NULL DEFAULT '', + access TEXT NOT NULL DEFAULT '', + org TEXT NOT NULL DEFAULT '', + email TEXT NOT NULL DEFAULT '', + months INTEGER, + tasks TEXT NOT NULL DEFAULT '', + status TEXT NOT NULL DEFAULT '', + user_id UUID REFERENCES users(id) ON DELETE SET NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_research_project_members_project ON research_project_members (project_id, sort_order); + +CREATE TABLE IF NOT EXISTS research_project_datasets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + project_id UUID NOT NULL 
REFERENCES research_projects(id) ON DELETE CASCADE, + sort_order INTEGER NOT NULL DEFAULT 0, + name TEXT NOT NULL DEFAULT '', + type TEXT NOT NULL DEFAULT '', + records INTEGER, + source TEXT NOT NULL DEFAULT '', + sensitivity TEXT NOT NULL DEFAULT '', + ethics TEXT NOT NULL DEFAULT '', + owner TEXT NOT NULL DEFAULT '', + status TEXT NOT NULL DEFAULT '', + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_research_project_datasets_project ON research_project_datasets (project_id, sort_order); + +CREATE TABLE IF NOT EXISTS research_project_models ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + project_id UUID NOT NULL REFERENCES research_projects(id) ON DELETE CASCADE, + sort_order INTEGER NOT NULL DEFAULT 0, + name TEXT NOT NULL DEFAULT '', + task TEXT NOT NULL DEFAULT '', + framework TEXT NOT NULL DEFAULT '', + version TEXT NOT NULL DEFAULT '', + dataset TEXT NOT NULL DEFAULT '', + auc NUMERIC(6,4), + sensitivity NUMERIC(6,4), + specificity NUMERIC(6,4), + accuracy NUMERIC(6,4), + owner TEXT NOT NULL DEFAULT '', + notes TEXT NOT NULL DEFAULT '', + status TEXT NOT NULL DEFAULT '', + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_research_project_models_project ON research_project_models (project_id, sort_order); + +CREATE TABLE IF NOT EXISTS research_project_assets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + project_id UUID NOT NULL REFERENCES research_projects(id) ON DELETE CASCADE, + sort_order INTEGER NOT NULL DEFAULT 0, + name TEXT NOT NULL DEFAULT '', + category TEXT NOT NULL DEFAULT '', + acquisition TEXT NOT NULL DEFAULT '', + value NUMERIC(14,2), + owner TEXT NOT NULL DEFAULT '', + notes TEXT NOT NULL DEFAULT '', + status TEXT NOT NULL DEFAULT '', + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS 
idx_research_project_assets_project ON research_project_assets (project_id, sort_order); + +CREATE TABLE IF NOT EXISTS research_project_milestones ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + project_id UUID NOT NULL REFERENCES research_projects(id) ON DELETE CASCADE, + sort_order INTEGER NOT NULL DEFAULT 0, + title TEXT NOT NULL DEFAULT '', + deliverable TEXT NOT NULL DEFAULT '', + start_period TEXT NOT NULL DEFAULT '', + end_period TEXT NOT NULL DEFAULT '', + owner TEXT NOT NULL DEFAULT '', + budget NUMERIC(14,2), + progress INTEGER NOT NULL DEFAULT 0, + status TEXT NOT NULL DEFAULT '', + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_research_project_milestones_project ON research_project_milestones (project_id, sort_order); + +CREATE TABLE IF NOT EXISTS research_project_audit ( + id BIGSERIAL PRIMARY KEY, + project_id UUID NOT NULL REFERENCES research_projects(id) ON DELETE CASCADE, + occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(), + actor_user_id UUID REFERENCES users(id) ON DELETE SET NULL, + actor_name TEXT NOT NULL DEFAULT '', + role_label TEXT NOT NULL DEFAULT '', + action TEXT NOT NULL, + subject TEXT NOT NULL DEFAULT '', + detail TEXT NOT NULL DEFAULT '' +); +CREATE INDEX IF NOT EXISTS idx_research_project_audit_project ON research_project_audit (project_id, occurred_at DESC); + +COMMENT ON TABLE research_projects IS + 'Research-project proposals (Thuyet minh de tai) that become managed projects on approval. Owner and admin authz. Content JSONB holds the full proposal form. Child research_project_* tables hold cockpit entities.'; diff --git a/be0/migrations/017_imagehub_datasets.sql b/be0/migrations/017_imagehub_datasets.sql new file mode 100644 index 0000000..139bd07 --- /dev/null +++ b/be0/migrations/017_imagehub_datasets.sql @@ -0,0 +1,76 @@ +-- ImageHub: content-addressed imaging dataset versioning (milestone 1 walking skeleton). 
+-- A dataset is owned by a user (investigator/PI). Files are stored as content-addressed, +-- globally deduped blobs in MinIO (one imagehub_blobs row per distinct sha256). The current +-- working file set lives in imagehub_dataset_files; a version freezes a manifest snapshot. +-- Admin sees all datasets (clinical data repository); owners see their own (research data). +-- Apply after 016_research_projects.sql: +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/017_imagehub_datasets.sql + +CREATE TABLE IF NOT EXISTS imagehub_datasets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + owner_user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + name TEXT NOT NULL DEFAULT '', + slug TEXT NOT NULL DEFAULT '', + description TEXT NOT NULL DEFAULT '', + visibility TEXT NOT NULL DEFAULT 'private' CHECK (visibility IN ('private','internal','public')), + modality_tags JSONB NOT NULL DEFAULT '[]'::jsonb, + default_branch TEXT NOT NULL DEFAULT 'main', + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX IF NOT EXISTS idx_imagehub_datasets_owner ON imagehub_datasets (owner_user_id, created_at DESC); + +-- Globally content-addressed blob registry: identical bytes across datasets dedupe to one row. +CREATE TABLE IF NOT EXISTS imagehub_blobs ( + sha256 TEXT PRIMARY KEY, + size_bytes BIGINT NOT NULL DEFAULT 0, + media_type TEXT NOT NULL DEFAULT 'application/octet-stream', + storage_bucket TEXT NOT NULL DEFAULT '', + storage_key TEXT NOT NULL DEFAULT '', + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- Current working file set on a dataset default branch (one row per logical path). 
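Migration 017 describes how global dedup works. Each distinct SHA-256 gets one `imagehub_blobs` row, the working file set maps paths to digests, and a version freezes a manifest. Below is an in-memory sketch of that flow. The class, the `storage_key` layout and the manifest entry shape are hypothetical. The file key is `(dataset_id, logical_path)` as in this migration; migration 026 later adds `folder_path` to it.

```python
import hashlib
from typing import Dict, List, Tuple


def storage_key(sha256: str) -> str:
    # Hypothetical MinIO key layout; the real one is recorded in imagehub_blobs.storage_key.
    return f"blobs/{sha256[:2]}/{sha256}"


class ContentAddressedStore:
    """In-memory sketch of imagehub_blobs + imagehub_dataset_files + imagehub_versions."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}            # sha256 -> bytes, one entry per distinct digest
        self.files: Dict[Tuple[str, str], str] = {}  # (dataset_id, logical_path) -> sha256

    def put_file(self, dataset_id: str, logical_path: str, data: bytes) -> str:
        sha = hashlib.sha256(data).hexdigest()
        # Like INSERT ... ON CONFLICT (sha256) DO NOTHING: identical bytes are stored once.
        self.blobs.setdefault(sha, data)
        # Like an upsert on the unique (dataset_id, logical_path) index.
        self.files[(dataset_id, logical_path)] = sha
        return sha

    def freeze_manifest(self, dataset_id: str) -> List[Dict[str, str]]:
        # A version snapshot records only path -> digest; the bytes stay in the blob store.
        return sorted(
            ({"path": path, "sha256": sha} for (ds, path), sha in self.files.items() if ds == dataset_id),
            key=lambda entry: entry["path"],
        )
```

Because files reference blobs by digest, a frozen manifest stays valid as long as its blobs exist. This is consistent with `ON DELETE RESTRICT` on `blob_sha256`.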
+CREATE TABLE IF NOT EXISTS imagehub_dataset_files ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + dataset_id UUID NOT NULL REFERENCES imagehub_datasets(id) ON DELETE CASCADE, + logical_path TEXT NOT NULL DEFAULT '', + blob_sha256 TEXT NOT NULL REFERENCES imagehub_blobs(sha256) ON DELETE RESTRICT, + size_bytes BIGINT NOT NULL DEFAULT 0, + media_type TEXT NOT NULL DEFAULT 'application/octet-stream', + imaging_meta JSONB NOT NULL DEFAULT '{}'::jsonb, + uploaded_by UUID REFERENCES users(id) ON DELETE SET NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE UNIQUE INDEX IF NOT EXISTS uq_imagehub_dataset_files_path ON imagehub_dataset_files (dataset_id, logical_path); + +-- Frozen version snapshots (the versioning spine; DAG-ready via parent_version_id). +CREATE TABLE IF NOT EXISTS imagehub_versions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + dataset_id UUID NOT NULL REFERENCES imagehub_datasets(id) ON DELETE CASCADE, + seq INTEGER NOT NULL DEFAULT 1, + message TEXT NOT NULL DEFAULT '', + manifest JSONB NOT NULL DEFAULT '[]'::jsonb, + parent_version_id UUID REFERENCES imagehub_versions(id) ON DELETE SET NULL, + author_user_id UUID REFERENCES users(id) ON DELETE SET NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE UNIQUE INDEX IF NOT EXISTS uq_imagehub_versions_seq ON imagehub_versions (dataset_id, seq); + +-- Append-only audit trail per dataset. 
+CREATE TABLE IF NOT EXISTS imagehub_dataset_audit ( + id BIGSERIAL PRIMARY KEY, + dataset_id UUID NOT NULL REFERENCES imagehub_datasets(id) ON DELETE CASCADE, + occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(), + actor_user_id UUID REFERENCES users(id) ON DELETE SET NULL, + actor_name TEXT NOT NULL DEFAULT '', + role_label TEXT NOT NULL DEFAULT '', + action TEXT NOT NULL, + subject TEXT NOT NULL DEFAULT '', + detail TEXT NOT NULL DEFAULT '' +); +CREATE INDEX IF NOT EXISTS idx_imagehub_dataset_audit_dataset ON imagehub_dataset_audit (dataset_id, occurred_at DESC); + +COMMENT ON TABLE imagehub_datasets IS + 'ImageHub content-addressed imaging datasets. Owner and admin authz. Files dedupe into imagehub_blobs by sha256 — imagehub_versions freezes a manifest snapshot.'; diff --git a/be0/migrations/018_imagehub_segmentation_links.sql b/be0/migrations/018_imagehub_segmentation_links.sql new file mode 100644 index 0000000..38f1b37 --- /dev/null +++ b/be0/migrations/018_imagehub_segmentation_links.sql @@ -0,0 +1,21 @@ +-- ImageHub: link organ-segmentation masks to their parent image file (Phase D). +-- A mask file (file_kind='segmentation') points at the image it segments via a +-- self-referential parent_file_id (e.g. an organ mask of ct.nii.gz); organ_label +-- names the organ. Regular files stay file_kind='image'. Idempotent (ADD COLUMN IF +-- NOT EXISTS) so the startup runner can apply it to volumes that predate it. 
+-- Apply after 017_imagehub_datasets.sql (no semicolons inside comments — the runner +-- splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/018_imagehub_segmentation_links.sql + +ALTER TABLE imagehub_dataset_files + ADD COLUMN IF NOT EXISTS file_kind TEXT NOT NULL DEFAULT 'image' CHECK (file_kind IN ('image','segmentation')); + +ALTER TABLE imagehub_dataset_files + ADD COLUMN IF NOT EXISTS parent_file_id UUID REFERENCES imagehub_dataset_files(id) ON DELETE CASCADE; + +ALTER TABLE imagehub_dataset_files + ADD COLUMN IF NOT EXISTS organ_label TEXT NOT NULL DEFAULT ''; + +-- List all masks of an image efficiently. +CREATE INDEX IF NOT EXISTS idx_imagehub_dataset_files_parent + ON imagehub_dataset_files (parent_file_id); diff --git a/be0/migrations/019_imagehub_cloud_import.sql b/be0/migrations/019_imagehub_cloud_import.sql new file mode 100644 index 0000000..20b6454 --- /dev/null +++ b/be0/migrations/019_imagehub_cloud_import.sql @@ -0,0 +1,53 @@ +-- ImageHub: Cloud Import — storage methods + external (referenced, not copied) dataset files. +-- A storage method holds verified credentials (config_encrypted, never returned to the client) +-- for an external bucket (S3/GCS/Azure). A dataset file is then EITHER a local content-addressed +-- blob (blob_sha256 set) OR an external reference (storage_method_id + external_path set) that +-- streams from the bucket and is never copied to our servers (privacy rule C4). Idempotent +-- (CREATE/ADD ... IF NOT EXISTS) so the startup runner can apply it to volumes that predate it. 
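The repeated warning about semicolons in comments follows from splitting a migration file on `;`. The sketch below contrasts a naive split with a small splitter that skips `--` comments and honours single-quoted literals, including the `''` escape. Both functions are hypothetical and are not the project's startup runner. The safer variant still ignores `/* */` comments and dollar-quoted bodies.

```python
from typing import List


def naive_split(sql: str) -> List[str]:
    # What a "naive" runner does: any ';' ends a statement, even inside comments or strings.
    return [s.strip() for s in sql.split(";") if s.strip()]


def split_sql(sql: str) -> List[str]:
    """Split on ';' outside '--' comments and single-quoted string literals."""
    stmts: List[str] = []
    buf: List[str] = []
    in_str = False
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if in_str:
            buf.append(ch)
            if ch == "'":
                if i + 1 < n and sql[i + 1] == "'":  # '' is an escaped quote, stay in the string
                    buf.append("'")
                    i += 1
                else:
                    in_str = False
        elif ch == "'":
            in_str = True
            buf.append(ch)
        elif ch == "-" and sql.startswith("--", i):
            nl = sql.find("\n", i)
            i = n if nl == -1 else nl  # drop the comment, keep the newline
            continue
        elif ch == ";":
            stmt = "".join(buf).strip()
            if stmt:
                stmts.append(stmt)
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        stmts.append(tail)
    return stmts
```

For example, `-- note; with a semicolon` makes `naive_split` emit `-- note` as its own statement. `split_sql` instead returns only the real statements.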
+-- Apply after 018 (no semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/019_imagehub_cloud_import.sql + +CREATE TABLE IF NOT EXISTS imagehub_storage_methods ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + owner_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + name TEXT NOT NULL, + provider TEXT NOT NULL CHECK (provider IN ('s3','gcs','azure')), + access_mode TEXT NOT NULL DEFAULT 'read' CHECK (access_mode IN ('read','readwrite')), + bucket TEXT NOT NULL, + region TEXT, + config_encrypted TEXT NOT NULL, + verification_status TEXT NOT NULL DEFAULT 'pending' CHECK (verification_status IN ('pending','verified','failed')), + verification_reason TEXT, + verification_checked_at TIMESTAMPTZ, + created_by UUID REFERENCES users(id) ON DELETE SET NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX IF NOT EXISTS idx_imagehub_storage_methods_owner + ON imagehub_storage_methods (owner_id); + +-- Allow a dataset file to be an external reference instead of a local blob. Existing rows keep +-- blob_sha256 set and the new columns NULL, so they satisfy the local-blob branch of the CHECK. +ALTER TABLE imagehub_dataset_files + ALTER COLUMN blob_sha256 DROP NOT NULL; + +ALTER TABLE imagehub_dataset_files + ADD COLUMN IF NOT EXISTS storage_method_id UUID REFERENCES imagehub_storage_methods(id) ON DELETE RESTRICT; + +ALTER TABLE imagehub_dataset_files + ADD COLUMN IF NOT EXISTS external_path TEXT; + +-- A file is EITHER a local content-addressed blob OR an external reference, never both or neither. 
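Migration 019 makes every dataset file exactly one of two kinds: a local content-addressed blob, or an external reference that is streamed and never copied. A minimal Python mirror of the `ck_imagehub_file_storage_mode` CHECK is shown below. The function name and its return labels are hypothetical.

```python
from typing import Optional


def storage_mode(blob_sha256: Optional[str], storage_method_id: Optional[str],
                 external_path: Optional[str]) -> str:
    """Classify a dataset-file row the way ck_imagehub_file_storage_mode does."""
    local = blob_sha256 is not None and storage_method_id is None and external_path is None
    external = blob_sha256 is None and storage_method_id is not None and external_path is not None
    if local:
        return "local"      # bytes live in our MinIO as a content-addressed blob
    if external:
        return "external"   # streamed from the external bucket, never copied
    raise ValueError("row must be exactly one of: local blob or external reference")
```

Existing rows have `blob_sha256` set and the new columns NULL, so they fall into the `"local"` branch. That is why the migration can drop `NOT NULL` safely.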
+ALTER TABLE imagehub_dataset_files + DROP CONSTRAINT IF EXISTS ck_imagehub_file_storage_mode; + +ALTER TABLE imagehub_dataset_files + ADD CONSTRAINT ck_imagehub_file_storage_mode CHECK ( + (blob_sha256 IS NOT NULL AND storage_method_id IS NULL AND external_path IS NULL) + OR + (blob_sha256 IS NULL AND storage_method_id IS NOT NULL AND external_path IS NOT NULL) + ); + +CREATE INDEX IF NOT EXISTS idx_imagehub_dataset_files_storage_method + ON imagehub_dataset_files (storage_method_id); diff --git a/be0/migrations/020_imagehub_dataset_stages.sql b/be0/migrations/020_imagehub_dataset_stages.sql new file mode 100644 index 0000000..d295673 --- /dev/null +++ b/be0/migrations/020_imagehub_dataset_stages.sql @@ -0,0 +1,26 @@ +-- ImageHub: labeling-pipeline stages on a dataset (Label -> Review_1 -> Review_2 ...). Each stage +-- has a kind (label/review), an order (seq), an optional review_percent (review stages only), and +-- an auto_assign flag (the "Automatic Task Assignment" toggle). Idempotent (CREATE ... IF NOT +-- EXISTS) so the startup runner can apply it to volumes that predate it. Apply after 019 (no +-- semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/020_imagehub_dataset_stages.sql + +CREATE TABLE IF NOT EXISTS imagehub_dataset_stages ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + dataset_id UUID NOT NULL REFERENCES imagehub_datasets(id) ON DELETE CASCADE, + name TEXT NOT NULL, + kind TEXT NOT NULL DEFAULT 'label' CHECK (kind IN ('label','review')), + seq INTEGER NOT NULL DEFAULT 0, + review_percent INTEGER CHECK (review_percent IS NULL OR (review_percent >= 0 AND review_percent <= 100)), + auto_assign BOOLEAN NOT NULL DEFAULT TRUE, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- Stages of a dataset, in pipeline order. 
+CREATE INDEX IF NOT EXISTS idx_imagehub_dataset_stages_dataset + ON imagehub_dataset_stages (dataset_id, seq); + +-- A stage name is unique within its dataset. +CREATE UNIQUE INDEX IF NOT EXISTS uq_imagehub_dataset_stages_name + ON imagehub_dataset_stages (dataset_id, name); diff --git a/be0/migrations/021_imagehub_task_pipeline.sql b/be0/migrations/021_imagehub_task_pipeline.sql new file mode 100644 index 0000000..7cba7ad --- /dev/null +++ b/be0/migrations/021_imagehub_task_pipeline.sql @@ -0,0 +1,37 @@ +-- ImageHub: per-file work TASKS that flow through a dataset's pipeline stages (single-user MVP). +-- A task is a NEW join row (one per dataset file) carrying its pipeline position (current_stage_id +-- + pipeline_state), per-user queue status, assignee, priority, and the Ground-Truth reference flag. +-- The file row itself (imagehub_dataset_files) stays a pure storage record. Membership / multi-labeler +-- assignment is a later phase, so for now task access reuses the dataset owner-or-admin gate. +-- Idempotent (CREATE ... IF NOT EXISTS) so the startup runner can apply it to volumes that predate it. 
+-- Apply after 020 (no semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/021_imagehub_task_pipeline.sql + +CREATE TABLE IF NOT EXISTS imagehub_tasks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + dataset_id UUID NOT NULL REFERENCES imagehub_datasets(id) ON DELETE CASCADE, + dataset_file_id UUID NOT NULL REFERENCES imagehub_dataset_files(id) ON DELETE CASCADE, + name TEXT NOT NULL DEFAULT '', + current_stage_id UUID REFERENCES imagehub_dataset_stages(id) ON DELETE SET NULL, + pipeline_state TEXT NOT NULL DEFAULT 'inLabel' CHECK (pipeline_state IN ('inLabel','inReview','groundTruth','issue')), + queue_status TEXT NOT NULL DEFAULT 'assigned' CHECK (queue_status IN ('assigned','saved','pendingFinalization','skipped')), + assignee_user_id UUID REFERENCES users(id) ON DELETE SET NULL, + assignment_mode TEXT NOT NULL DEFAULT 'auto' CHECK (assignment_mode IN ('auto','manual')), + priority DOUBLE PRECISION NOT NULL DEFAULT 0 CHECK (priority >= 0 AND priority <= 1), + is_reference_standard BOOLEAN NOT NULL DEFAULT FALSE, + skipped_seq BIGINT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- One task per file (MVP simplification — droppable later for multi-task-per-file). +CREATE UNIQUE INDEX IF NOT EXISTS uq_imagehub_tasks_file + ON imagehub_tasks (dataset_file_id); + +-- Queue scan: a dataset's tasks at a given stage and status, highest priority first. +CREATE INDEX IF NOT EXISTS idx_imagehub_tasks_queue + ON imagehub_tasks (dataset_id, current_stage_id, queue_status, priority DESC); + +-- A user's personal labeling queue across datasets. 
+CREATE INDEX IF NOT EXISTS idx_imagehub_tasks_assignee + ON imagehub_tasks (assignee_user_id, queue_status); diff --git a/be0/migrations/022_imagehub_task_annotations.sql b/be0/migrations/022_imagehub_task_annotations.sql new file mode 100644 index 0000000..e7bd10e --- /dev/null +++ b/be0/migrations/022_imagehub_task_annotations.sql @@ -0,0 +1,8 @@ +-- ImageHub: a task's labeler annotations (bbox / points / pen / brush / polygon) stored as JSON. +-- The shared viewer's annotation overlay emits normalized [0..1] vector geometry per slice — small +-- JSON, persisted on the task so the AnnotationTool can load + save a labeler's work. Idempotent +-- (ADD COLUMN IF NOT EXISTS) so the startup runner can apply it to volumes that predate it. Apply +-- after 021 (no semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/022_imagehub_task_annotations.sql + +ALTER TABLE imagehub_tasks ADD COLUMN IF NOT EXISTS annotations JSONB NOT NULL DEFAULT '[]'::jsonb; diff --git a/be0/migrations/023_imagehub_dataset_members.sql b/be0/migrations/023_imagehub_dataset_members.sql new file mode 100644 index 0000000..5badfa5 --- /dev/null +++ b/be0/migrations/023_imagehub_dataset_members.sql @@ -0,0 +1,23 @@ +-- ImageHub: dataset membership — lets users other than the owner work a dataset's tasks +-- (multi-labeler). MVP treats all members as labelers: they view the dataset and work tasks +-- assigned to them, while dataset / stage / settings management stays with the owner + platform +-- admins. The role column is reserved for a future project-admin tier. Idempotent. 
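`idx_imagehub_tasks_queue` in migration 021 supports a "next task" scan over a dataset's tasks at one stage and status, highest priority first. The Python sketch below picks the next task that way. The `Task` dataclass, the `created_at` tie-break and the rule that unassigned tasks are claimable are assumptions, not behaviour stated in the migration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Task:
    # Hypothetical stand-in for the imagehub_tasks columns the queue scan touches.
    id: str
    dataset_id: str
    current_stage_id: Optional[str]
    queue_status: str  # 'assigned' | 'saved' | 'pendingFinalization' | 'skipped'
    priority: float    # the CHECK keeps it within [0, 1]
    created_at: datetime
    assignee_user_id: Optional[str] = None


def next_task(tasks: Iterable[Task], dataset_id: str, stage_id: str, user_id: str) -> Optional[Task]:
    """Highest-priority 'assigned' task at a stage, oldest first on ties.

    Roughly: WHERE dataset_id = $1 AND current_stage_id = $2 AND queue_status = 'assigned'
    ORDER BY priority DESC, created_at LIMIT 1 (the created_at tie-break is an assumption).
    """
    candidates = [
        t for t in tasks
        if t.dataset_id == dataset_id
        and t.current_stage_id == stage_id
        and t.queue_status == "assigned"
        and t.assignee_user_id in (None, user_id)  # assumption: unassigned tasks are claimable
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: (-t.priority, t.created_at))
```

The leading columns of the index are `(dataset_id, current_stage_id, queue_status, priority DESC)`. A query shaped like the docstring can therefore read matches in priority order instead of sorting the whole dataset.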
Apply after 022 +-- (no semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/023_imagehub_dataset_members.sql + +CREATE TABLE IF NOT EXISTS imagehub_dataset_members ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + dataset_id UUID NOT NULL REFERENCES imagehub_datasets(id) ON DELETE CASCADE, + user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + role TEXT NOT NULL DEFAULT 'member' CHECK (role IN ('project_admin','member')), + added_by UUID REFERENCES users(id) ON DELETE SET NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- One membership per user per dataset. +CREATE UNIQUE INDEX IF NOT EXISTS uq_imagehub_dataset_members_user + ON imagehub_dataset_members (dataset_id, user_id); + +-- "Datasets I am a member of" lookup (the member's dataset list). +CREATE INDEX IF NOT EXISTS idx_imagehub_dataset_members_user + ON imagehub_dataset_members (user_id); diff --git a/be0/migrations/024_imagehub_dataset_project_link.sql b/be0/migrations/024_imagehub_dataset_project_link.sql new file mode 100644 index 0000000..c6c61b4 --- /dev/null +++ b/be0/migrations/024_imagehub_dataset_project_link.sql @@ -0,0 +1,13 @@ +-- ImageHub: link a dataset to a research project (the "workspace" superstructure). Nullable, +-- so existing datasets stay unlinked and a dataset can still exist standalone. A dataset created +-- from a project cockpit attaches to that project. ON DELETE SET NULL so deleting a project +-- orphans its datasets rather than dropping the imaging data. Idempotent. 
Apply after 023 +-- (no semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/024_imagehub_dataset_project_link.sql + +ALTER TABLE imagehub_datasets + ADD COLUMN IF NOT EXISTS research_project_id UUID REFERENCES research_projects(id) ON DELETE SET NULL; + +-- "Datasets in this project" lookup (the project-scoped dataset list). +CREATE INDEX IF NOT EXISTS idx_imagehub_datasets_research_project + ON imagehub_datasets (research_project_id); diff --git a/be0/migrations/025_imagehub_task_review_events.sql b/be0/migrations/025_imagehub_task_review_events.sql new file mode 100644 index 0000000..3bb91c9 --- /dev/null +++ b/be0/migrations/025_imagehub_task_review_events.sql @@ -0,0 +1,25 @@ +-- ImageHub: structured review decisions. The task pipeline applies accept/acceptWithCorrections/ +-- reject moves, but until now the verdict survived only as a free-text Vietnamese audit string — +-- not queryable, no reviewer/stage FK, no reject reason. This append-only table records every +-- review decision so review history + per-reviewer accept/reject counters become real. Idempotent. 
+-- Apply after 024 (no semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/025_imagehub_task_review_events.sql + +CREATE TABLE IF NOT EXISTS imagehub_task_review_events ( + id BIGSERIAL PRIMARY KEY, + dataset_id UUID NOT NULL REFERENCES imagehub_datasets(id) ON DELETE CASCADE, + task_id UUID NOT NULL REFERENCES imagehub_tasks(id) ON DELETE CASCADE, + stage_id UUID REFERENCES imagehub_dataset_stages(id) ON DELETE SET NULL, + reviewer_user_id UUID REFERENCES users(id) ON DELETE SET NULL, + decision TEXT NOT NULL CHECK (decision IN ('accept','acceptWithCorrections','reject')), + note TEXT NOT NULL DEFAULT '', + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- Per-reviewer counters over a date window (the productivity panel query). +CREATE INDEX IF NOT EXISTS idx_imagehub_review_events_reviewer + ON imagehub_task_review_events (dataset_id, reviewer_user_id, created_at); + +-- A task's review history (chronological). +CREATE INDEX IF NOT EXISTS idx_imagehub_review_events_task + ON imagehub_task_review_events (task_id, created_at); diff --git a/be0/migrations/026_imagehub_file_folder_path.sql b/be0/migrations/026_imagehub_file_folder_path.sql new file mode 100644 index 0000000..ab5cf80 --- /dev/null +++ b/be0/migrations/026_imagehub_file_folder_path.sql @@ -0,0 +1,21 @@ +-- ImageHub: persist the relative folder path of each uploaded file (Option B — real folders inside +-- a dataset). Until now logical_path was basename-flattened, so an uploaded directory structure +-- (e.g. the nnU-Net imagesTr/labelsTr layout) was lost once files reached MinIO. folder_path keeps +-- the relative directory so the dataset browser can render a real folder tree and the structure +-- round-trips. 
The working-file natural key moves from (dataset_id, logical_path) to +-- (dataset_id, folder_path, logical_path) so two files sharing a basename in different folders no +-- longer collide and silently merge. Existing rows default folder_path to the empty string, so the +-- new key stays unique wherever the old one was. Idempotent. +-- Apply after 025 (no semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/026_imagehub_file_folder_path.sql + +ALTER TABLE imagehub_dataset_files + ADD COLUMN IF NOT EXISTS folder_path TEXT NOT NULL DEFAULT ''; + +DROP INDEX IF EXISTS uq_imagehub_dataset_files_path; + +CREATE UNIQUE INDEX IF NOT EXISTS uq_imagehub_dataset_files_folder_path + ON imagehub_dataset_files (dataset_id, folder_path, logical_path); + +CREATE INDEX IF NOT EXISTS idx_imagehub_dataset_files_folder + ON imagehub_dataset_files (dataset_id, folder_path); diff --git a/be0/migrations/027_imagehub_dataset_label_map.sql b/be0/migrations/027_imagehub_dataset_label_map.sql new file mode 100644 index 0000000..b671b8b --- /dev/null +++ b/be0/migrations/027_imagehub_dataset_label_map.sql @@ -0,0 +1,12 @@ +-- ImageHub: per-dataset value to name label map for multi-label segmentation masks. A multi-label +-- labelsTr/.nii.gz encodes each organ or structure as an integer voxel value (1, 2, 3 …). Until +-- now the viewer named those values from a fixed TotalSegmentator-v2 117-class map, so a non +-- TotalSegmentator dataset (KiTS = 1 kidney / 2 tumor / 3 cyst, or any custom nnU-Net labels) showed +-- confidently-wrong organ names. label_map stores the dataset own value to name mapping (a JSON object +-- with string keys), so the organ panel labels each overlay correctly and a user can edit them. The +-- empty default keeps the TotalSegmentator fallback for datasets without a map. Idempotent. 
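Migration 027's `label_map` is a JSON object with string keys that maps each voxel value to a name, and an empty `{}` means "use the built-in fallback". Below is a sketch of the lookup. The function name, the `label N` placeholder and the rule that a non-empty dataset map is never mixed with the fallback are assumptions. The fallback dict passed in stands in for the real TotalSegmentator class list, which is not reproduced here.

```python
import json
from typing import Mapping


def label_name(label_map_json: str, value: int, fallback: Mapping[int, str]) -> str:
    """Name a voxel value of a multi-label segmentation mask."""
    label_map = json.loads(label_map_json or "{}")  # JSONB object: keys are strings
    if label_map:
        # The dataset has its own map: use it exclusively, so a value missing from it
        # is never given a confidently-wrong fallback organ name.
        return label_map.get(str(value), f"label {value}")
    # Empty '{}' default: fall back to the built-in map.
    return fallback.get(value, f"label {value}")
```

With the KiTS map from the comment above (`{"1": "kidney", "2": "tumor", "3": "cyst"}`), value 2 resolves to `tumor` whatever the fallback says.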
+-- Apply after 026 (no semicolons inside comments or string literals — the runner splitter is naive): +-- docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/027_imagehub_dataset_label_map.sql + +ALTER TABLE imagehub_datasets + ADD COLUMN IF NOT EXISTS label_map JSONB NOT NULL DEFAULT '{}'::jsonb; diff --git a/be0/requirements-dev.txt b/be0/requirements-dev.txt new file mode 100644 index 0000000..60d3c00 --- /dev/null +++ b/be0/requirements-dev.txt @@ -0,0 +1,6 @@ +# Test-only dependencies for CI (not installed in the runtime image). +# be0 tests are a mix of unittest.TestCase (incl. IsolatedAsyncioTestCase) and +# pytest-style; pytest runs both. pytest-asyncio covers the pytest async tests. +-r requirements.txt +pytest>=8,<9 +pytest-asyncio>=0.23,<0.24 diff --git a/be0/requirements.txt b/be0/requirements.txt new file mode 100644 index 0000000..63a66a3 --- /dev/null +++ b/be0/requirements.txt @@ -0,0 +1,43 @@ +uvicorn[standard] +httpx +sqlalchemy[asyncio]>=2.0 +asyncpg>=0.29 +greenlet>=3.0 +argon2-cffi>=23.1.0 +PyJWT>=2.8.0 +ollama +fastapi +asyncio +python-multipart + +langchain +langchain-core +langgraph + +langchain-community +sentence-transformers +huggingface +scikit-learn + +neo4j + +nltk +rake-nltk +pypdf +pydantic +pydantic-settings +aioboto3 +zipstream-ng +boto3 +numpy +pandas + +pyvi +docling +pymupdf +docxtpl>=0.16 +openpyxl>=3.1.0 + +# ImageHub: best-effort imaging metadata sniff (DICOM / NIfTI). See src/imagehub_routes.py. 
+pydicom +nibabel \ No newline at end of file diff --git a/be0/scripts/__pycache__/apply_initiative_migrations.cpython-313.pyc b/be0/scripts/__pycache__/apply_initiative_migrations.cpython-313.pyc new file mode 100644 index 0000000..3be66ad Binary files /dev/null and b/be0/scripts/__pycache__/apply_initiative_migrations.cpython-313.pyc differ diff --git a/be0/scripts/__pycache__/repair_split_submission.cpython-313.pyc b/be0/scripts/__pycache__/repair_split_submission.cpython-313.pyc new file mode 100644 index 0000000..64eb3f4 Binary files /dev/null and b/be0/scripts/__pycache__/repair_split_submission.cpython-313.pyc differ diff --git a/be0/scripts/add_ump_ideas.py b/be0/scripts/add_ump_ideas.py new file mode 100644 index 0000000..afd9964 --- /dev/null +++ b/be0/scripts/add_ump_ideas.py @@ -0,0 +1,93 @@ +""" +Script to add the 10 UMP innovation ideas to the vector database +""" +import asyncio +import sys +from pathlib import Path + +# Add parent directory to path +sys.path.insert(0, str(Path(__file__).parent.parent)) + +from src.infrastructure.vector_db.qdrant_service import get_qdrant_service + +UMP_IDEAS = [ + { + "title": "Nền tảng Trợ lý AI học tập lâm sàng (Clinical AI Tutor)", + "description": "Ứng dụng AI đóng vai trò trợ giảng cho sinh viên y, hỗ trợ phân tích ca bệnh giả lập, giải thích cận lâm sàng, và gợi ý chẩn đoán theo phác đồ Việt Nam.", + "category": "Giáo dục - AI" + }, + { + "title": "Hệ thống bệnh án điện tử học thuật (Academic EMR Sandbox)", + "description": "Môi trường EMR mô phỏng cho đào tạo và nghiên cứu, cho phép sinh viên và giảng viên thực hành nhập – phân tích – khai thác dữ liệu y khoa mà không ảnh hưởng dữ liệu bệnh nhân thật.", + "category": "Giáo dục - Chuyển đổi số" + }, + { + "title": "Trung tâm mô phỏng y khoa bằng AR/VR & Digital Twin", + "description": "Xây dựng phòng lab mô phỏng phẫu thuật, cấp cứu, và quy trình điều trị bằng AR/VR, kết hợp mô hình \"digital twin\" của cơ thể người phục vụ đào tạo nâng cao.", + "category": 
"Giáo dục - AR/VR" + }, + { + "title": "Chương trình Y tế cộng đồng số cho vùng sâu vùng xa", + "description": "Kết hợp telehealth, trợ lý ảo y tế (agentic care) và AI sàng lọc sớm bệnh không lây (NCD) cho người dân vùng nông thôn, miền núi và hải đảo.", + "category": "Tác động xã hội - Telehealth" + }, + { + "title": "Nền tảng nghiên cứu AI y sinh dùng chung (UMP AI Research Hub)", + "description": "Cung cấp hạ tầng GPU, kho dữ liệu y khoa ẩn danh, và công cụ phân tích AI cho giảng viên – nghiên cứu sinh – startup hợp tác nghiên cứu.", + "category": "Nghiên cứu - AI" + }, + { + "title": "Hệ thống theo dõi và dự báo sức khỏe sinh viên & nhân viên y tế", + "description": "Ứng dụng phân tích dữ liệu và AI để phát hiện sớm stress, burnout, và vấn đề sức khỏe tâm thần trong cộng đồng sinh viên và nhân viên y tế.", + "category": "Tác động xã hội - Sức khỏe" + }, + { + "title": "Vườn ươm khởi nghiệp công nghệ y sinh (MedTech Incubator)", + "description": "Hỗ trợ sinh viên, bác sĩ và giảng viên phát triển startup MedTech, HealthTech, AI y tế thông qua mentoring, quỹ seed và kết nối bệnh viện – doanh nghiệp.", + "category": "Khởi nghiệp - MedTech" + }, + { + "title": "Hệ thống quản lý chất lượng đào tạo và kiểm định số", + "description": "Số hóa toàn bộ quy trình đảm bảo chất lượng nội bộ (IQA), đánh giá chương trình đào tạo, và chuẩn hóa theo tiêu chuẩn quốc tế (WFME, AUN-QA).", + "category": "Giáo dục - Quản lý chất lượng" + }, + { + "title": "Nền tảng dữ liệu lớn phòng chống dịch và bệnh không lây", + "description": "Phân tích dữ liệu dịch tễ, môi trường, và hành vi để dự báo dịch bệnh, hỗ trợ Sở Y tế và Bộ Y tế trong ra quyết định chính sách.", + "category": "Nghiên cứu - Dịch tễ học" + }, + { + "title": "Học viện Y học chính xác & Y học cá thể hóa", + "description": "Kết hợp dữ liệu gen, hình ảnh y khoa, lối sống và AI để nghiên cứu và ứng dụng điều trị cá thể hóa cho bệnh ung thư, tim mạch và bệnh mạn tính.", + "category": "Nghiên cứu - Y học chính xác" + } +] + 
+async def main(): + """Add all UMP ideas to the database""" + print("Initializing Qdrant service...") + qdrant_service = get_qdrant_service() + + print("Initializing collection...") + await qdrant_service.initialize_collection() + + print(f"Adding {len(UMP_IDEAS)} ideas to the database...") + results = [] + for i, idea in enumerate(UMP_IDEAS, 1): + try: + print(f"Adding idea {i}/{len(UMP_IDEAS)}: {idea['title']}") + result = await qdrant_service.add_idea( + title=idea['title'], + description=idea['description'], + category=idea['category'] + ) + results.append(result) + print(f"✓ Added: {result['id']}") + except Exception as e: + print(f"✗ Error adding idea {i}: {e}") + + print(f"\n✓ Successfully added {len(results)}/{len(UMP_IDEAS)} ideas") + return results + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/be0/scripts/apply-migration-007.sh b/be0/scripts/apply-migration-007.sh new file mode 100755 index 0000000..8a8811d --- /dev/null +++ b/be0/scripts/apply-migration-007.sh @@ -0,0 +1,86 @@ +#!/usr/bin/env bash +# Apply migration 007 (user_roles.admin_from_email_policy) to an EXISTING Postgres. +# initdb scripts in docker-entrypoint-initdb.d run only on first volume creation. +# +# Default (full SQL file): adds column, runs one-time policy DELETE/UPDATE (see +# be0/migrations/007_user_roles_email_policy_admin.sql before running on prod). 
+# +# Usage (from anywhere): +# ./be0/scripts/apply-migration-007.sh +# ./be0/scripts/apply-migration-007.sh --schema-only # only ADD COLUMN (safest repeat) +# +# On a remote host (SSH to be0/docker host, repo or copy of migrations present): +# export POSTGRES_CONTAINER=initiative-postgres POSTGRES_USER=initiative POSTGRES_DB=initiatives +# ./be0/scripts/apply-migration-007.sh +# +# From repo root (wrapper): +# ./scripts/apply-migration-007-postgres.sh +set -euo pipefail + +SCHEMA_ONLY=0 +for arg in "$@"; do + case "$arg" in + --schema-only) SCHEMA_ONLY=1 ;; + -h|--help) + sed -n '2,17p' "$0" + exit 0 + ;; + esac +done + +BE0_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +SQL_FULL="$BE0_ROOT/migrations/007_user_roles_email_policy_admin.sql" +CONTAINER="${POSTGRES_CONTAINER:-initiative-postgres}" +PGUSER="${POSTGRES_USER:-initiative}" +PGDATABASE="${POSTGRES_DB:-initiatives}" + +if ! docker info >/dev/null 2>&1; then + echo "error: Docker is not reachable (is the daemon running?)" >&2 + exit 1 +fi +if ! docker inspect "$CONTAINER" >/dev/null 2>&1; then + echo "error: container not found: $CONTAINER (set POSTGRES_CONTAINER)" >&2 + exit 1 +fi +if [[ "$(docker inspect -f '{{.State.Running}}' "$CONTAINER" 2>/dev/null || echo false)" != "true" ]]; then + echo "error: container is not running: $CONTAINER" >&2 + exit 1 +fi + +apply_schema_only() { + docker exec -i "$CONTAINER" psql -U "$PGUSER" -d "$PGDATABASE" -v ON_ERROR_STOP=1 <<'SQL' +ALTER TABLE user_roles ADD COLUMN IF NOT EXISTS admin_from_email_policy BOOLEAN NOT NULL DEFAULT FALSE; + +COMMENT ON COLUMN user_roles.admin_from_email_policy IS + 'TRUE when admin was granted by email allow-list (AUTH_ADMIN_EMAILS). Reconciliation may DELETE this row if the user email is no longer in the list. FALSE preserves manually granted admin (future / exceptional).'; +SQL +} + +apply_full() { + if [[ !
-f "$SQL_FULL" ]]; then + echo "error: missing migration file: $SQL_FULL" >&2 + exit 1 + fi + docker exec -i "$CONTAINER" psql -U "$PGUSER" -d "$PGDATABASE" -v ON_ERROR_STOP=1 <"$SQL_FULL" +} + +verify_column() { + local out + out="$(docker exec "$CONTAINER" psql -U "$PGUSER" -d "$PGDATABASE" -tAc \ + "SELECT 1 FROM information_schema.columns WHERE table_schema = 'public' AND table_name = 'user_roles' AND column_name = 'admin_from_email_policy'")" + if [[ "${out//$'\r'/}" != "1" ]]; then + echo "error: verification failed: column admin_from_email_policy missing on public.user_roles" >&2 + exit 1 + fi +} + +if (( SCHEMA_ONLY )); then + echo "Applying schema only (ADD COLUMN + COMMENT) → $CONTAINER / $PGDATABASE" + apply_schema_only +else + echo "Applying full 007_user_roles_email_policy_admin.sql → $CONTAINER / $PGDATABASE" + apply_full +fi + +verify_column +echo "ok: user_roles.admin_from_email_policy is present; admin register/login should work with current be0." diff --git a/be0/scripts/apply_initiative_migrations.py b/be0/scripts/apply_initiative_migrations.py new file mode 100644 index 0000000..7529823 --- /dev/null +++ b/be0/scripts/apply_initiative_migrations.py @@ -0,0 +1,533 @@ +""" +Apply idempotent SQL fixes when the DB volume predates newer migrations. + +- ``008_audit_events.sql`` when ``audit_events`` is missing (older volumes never + ran ``docker-entrypoint-initdb.d`` for new files). +- ``009_backup_artifact_roles_storage_kind.sql`` when ``storage_kind`` is missing. +- ``010_user_staff_profiles.sql`` + ``011_academic_titles_vn.sql`` when + ``academic_titles`` is missing (staff profile / register flow). +- ``013_email_verification.sql`` when ``email_verification_tokens`` is missing. +- ``014_registration_otp.sql`` when ``registration_otp_codes`` is missing; 015 through 027 follow the same pattern, each gated on its own marker table or column. + +Run automatically from entrypoint when ``INITIATIVE_DATABASE_URL`` is set.
+Standalone: + + INITIATIVE_DATABASE_URL=postgresql+asyncpg://user:pass@host:5432/dbname \\ + python scripts/apply_initiative_migrations.py +""" + +from __future__ import annotations + +import asyncio +import os +import sys +from pathlib import Path + + +def _async_url_to_asyncpg_dsn(url: str) -> str: + u = url.strip() + if "+asyncpg" in u: + u = u.replace("postgresql+asyncpg://", "postgresql://", 1) + return u + + +def _strip_sql_comments(text: str) -> str: + lines: list[str] = [] + for line in text.splitlines(): + s = line.strip() + if s.startswith("--"): + continue + lines.append(line) + return "\n".join(lines) + + +def _split_sql_statements(text: str) -> list[str]: + """Split on semicolons outside ``$$`` dollar-quoted blocks (008 uses ``DO $$``).""" + statements: list[str] = [] + buf: list[str] = [] + i = 0 + n = len(text) + in_dollar = False + while i < n: + if text.startswith("$$", i): + in_dollar = not in_dollar + buf.append("$$") + i += 2 + continue + ch = text[i] + if ch == ";" and not in_dollar: + stmt = "".join(buf).strip() + if stmt: + statements.append(stmt) + buf = [] + i += 1 + continue + buf.append(ch) + i += 1 + tail = "".join(buf).strip() + if tail: + statements.append(tail) + return statements + + +async def _needs_audit_events_migration(conn) -> bool: + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'audit_events' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_backup_migration(conn) -> bool: + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.columns + WHERE table_schema = 'public' + AND table_name = 'application_artifacts' + AND column_name = 'storage_kind' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_staff_profiles_migration(conn) -> bool: + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'academic_titles' + LIMIT 1 + """ + ) + 
return row is None + + +async def _needs_email_verification_migration(conn) -> bool: + """True when verification tokens table is missing (013 also adds users.email_verified).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'email_verification_tokens' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_registration_otp_migration(conn) -> bool: + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'registration_otp_codes' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_document_templates_migration(conn) -> bool: + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'document_templates' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_research_projects_migration(conn) -> bool: + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'research_projects' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_datasets_migration(conn) -> bool: + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'imagehub_datasets' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_segmentation_columns_migration(conn) -> bool: + """True when imagehub_dataset_files lacks the segmentation-link columns (018).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.columns + WHERE table_schema = 'public' + AND table_name = 'imagehub_dataset_files' + AND column_name = 'file_kind' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_cloud_import_migration(conn) -> bool: + """True when the cloud-import storage_methods table is absent (019).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM 
information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'imagehub_storage_methods' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_stages_migration(conn) -> bool: + """True when the dataset-stages table is absent (020).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'imagehub_dataset_stages' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_tasks_migration(conn) -> bool: + """True when the per-file task-pipeline table is absent (021).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'imagehub_tasks' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_task_annotations_migration(conn) -> bool: + """True when imagehub_tasks lacks the annotations column (022).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.columns + WHERE table_schema = 'public' + AND table_name = 'imagehub_tasks' + AND column_name = 'annotations' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_members_migration(conn) -> bool: + """True when the dataset-membership table is absent (023).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'imagehub_dataset_members' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_dataset_project_link_migration(conn) -> bool: + """True when imagehub_datasets.research_project_id is absent (024).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.columns + WHERE table_schema = 'public' + AND table_name = 'imagehub_datasets' + AND column_name = 'research_project_id' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_review_events_migration(conn) -> bool: + """True when the task-review-events table is absent (025).""" + row = 
await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'imagehub_task_review_events' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_folder_path_migration(conn) -> bool: + """True when imagehub_dataset_files.folder_path is absent (026).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.columns + WHERE table_schema = 'public' + AND table_name = 'imagehub_dataset_files' + AND column_name = 'folder_path' + LIMIT 1 + """ + ) + return row is None + + +async def _needs_imagehub_label_map_migration(conn) -> bool: + """True when imagehub_datasets.label_map is absent (027).""" + row = await conn.fetchrow( + """ + SELECT 1 + FROM information_schema.columns + WHERE table_schema = 'public' + AND table_name = 'imagehub_datasets' + AND column_name = 'label_map' + LIMIT 1 + """ + ) + return row is None + + +async def _apply_sql_file(conn, path: Path, label: str) -> None: + body = _strip_sql_comments(path.read_text(encoding="utf-8")) + for stmt in _split_sql_statements(body): + await conn.execute(stmt) + print(f"apply_initiative_migrations: {label} applied.") + + +async def main() -> int: + raw_url = (os.environ.get("INITIATIVE_DATABASE_URL") or "").strip() + if not raw_url.lower().startswith("postgresql"): + print("apply_initiative_migrations: no PostgreSQL URL; skipping.", file=sys.stderr) + return 0 + + root = Path(__file__).resolve().parent.parent + m008 = root / "migrations" / "008_audit_events.sql" + m009 = root / "migrations" / "009_backup_artifact_roles_storage_kind.sql" + m010 = root / "migrations" / "010_user_staff_profiles.sql" + m011 = root / "migrations" / "011_academic_titles_vn.sql" + for p in (m008, m009, m010, m011): + if not p.is_file(): + print(f"apply_initiative_migrations: missing {p}", file=sys.stderr) + return 1 + + import asyncpg + + dsn = _async_url_to_asyncpg_dsn(raw_url) + conn = await asyncpg.connect(dsn, timeout=60) + try: + if await 
_needs_audit_events_migration(conn): + print("apply_initiative_migrations: applying 008_audit_events …") + await _apply_sql_file(conn, m008, "008_audit_events") + else: + print("apply_initiative_migrations: audit_events present; OK.") + + if await _needs_backup_migration(conn): + print("apply_initiative_migrations: applying 009_backup_artifact_roles_storage_kind …") + await _apply_sql_file(conn, m009, "009_backup_artifact_roles_storage_kind") + else: + print("apply_initiative_migrations: application_artifacts.storage_kind present; OK.") + + if await _needs_staff_profiles_migration(conn): + print("apply_initiative_migrations: applying 010_user_staff_profiles …") + await _apply_sql_file(conn, m010, "010_user_staff_profiles") + print("apply_initiative_migrations: applying 011_academic_titles_vn …") + await _apply_sql_file(conn, m011, "011_academic_titles_vn") + else: + print("apply_initiative_migrations: academic_titles present; OK.") + + m013 = root / "migrations" / "013_email_verification.sql" + if not m013.is_file(): + print(f"apply_initiative_migrations: missing {m013}", file=sys.stderr) + return 1 + if await _needs_email_verification_migration(conn): + print("apply_initiative_migrations: applying 013_email_verification …") + await _apply_sql_file(conn, m013, "013_email_verification") + else: + print("apply_initiative_migrations: email_verification_tokens present; OK.") + + m014 = root / "migrations" / "014_registration_otp.sql" + if not m014.is_file(): + print(f"apply_initiative_migrations: missing {m014}", file=sys.stderr) + return 1 + if await _needs_registration_otp_migration(conn): + print("apply_initiative_migrations: applying 014_registration_otp …") + await _apply_sql_file(conn, m014, "014_registration_otp") + else: + print("apply_initiative_migrations: registration_otp_codes present; OK.") + + m015 = root / "migrations" / "015_document_templates.sql" + if not m015.is_file(): + print(f"apply_initiative_migrations: missing {m015}", file=sys.stderr) + return 
1 + if await _needs_document_templates_migration(conn): + print("apply_initiative_migrations: applying 015_document_templates …") + await _apply_sql_file(conn, m015, "015_document_templates") + else: + print("apply_initiative_migrations: document_templates present; OK.") + + m016 = root / "migrations" / "016_research_projects.sql" + if not m016.is_file(): + print(f"apply_initiative_migrations: missing {m016}", file=sys.stderr) + return 1 + if await _needs_research_projects_migration(conn): + print("apply_initiative_migrations: applying 016_research_projects …") + await _apply_sql_file(conn, m016, "016_research_projects") + else: + print("apply_initiative_migrations: research_projects present; OK.") + + m017 = root / "migrations" / "017_imagehub_datasets.sql" + if not m017.is_file(): + print(f"apply_initiative_migrations: missing {m017}", file=sys.stderr) + return 1 + if await _needs_imagehub_datasets_migration(conn): + print("apply_initiative_migrations: applying 017_imagehub_datasets …") + await _apply_sql_file(conn, m017, "017_imagehub_datasets") + else: + print("apply_initiative_migrations: imagehub_datasets present; OK.") + + m018 = root / "migrations" / "018_imagehub_segmentation_links.sql" + if not m018.is_file(): + print(f"apply_initiative_migrations: missing {m018}", file=sys.stderr) + return 1 + if await _needs_imagehub_segmentation_columns_migration(conn): + print("apply_initiative_migrations: applying 018_imagehub_segmentation_links …") + await _apply_sql_file(conn, m018, "018_imagehub_segmentation_links") + else: + print("apply_initiative_migrations: imagehub_dataset_files.file_kind present; OK.") + + m019 = root / "migrations" / "019_imagehub_cloud_import.sql" + if not m019.is_file(): + print(f"apply_initiative_migrations: missing {m019}", file=sys.stderr) + return 1 + if await _needs_cloud_import_migration(conn): + print("apply_initiative_migrations: applying 019_imagehub_cloud_import …") + await _apply_sql_file(conn, m019, 
"019_imagehub_cloud_import") + else: + print("apply_initiative_migrations: imagehub_storage_methods present; OK.") + + m020 = root / "migrations" / "020_imagehub_dataset_stages.sql" + if not m020.is_file(): + print(f"apply_initiative_migrations: missing {m020}", file=sys.stderr) + return 1 + if await _needs_imagehub_stages_migration(conn): + print("apply_initiative_migrations: applying 020_imagehub_dataset_stages …") + await _apply_sql_file(conn, m020, "020_imagehub_dataset_stages") + else: + print("apply_initiative_migrations: imagehub_dataset_stages present; OK.") + + m021 = root / "migrations" / "021_imagehub_task_pipeline.sql" + if not m021.is_file(): + print(f"apply_initiative_migrations: missing {m021}", file=sys.stderr) + return 1 + if await _needs_imagehub_tasks_migration(conn): + print("apply_initiative_migrations: applying 021_imagehub_task_pipeline …") + await _apply_sql_file(conn, m021, "021_imagehub_task_pipeline") + else: + print("apply_initiative_migrations: imagehub_tasks present; OK.") + + m022 = root / "migrations" / "022_imagehub_task_annotations.sql" + if not m022.is_file(): + print(f"apply_initiative_migrations: missing {m022}", file=sys.stderr) + return 1 + if await _needs_imagehub_task_annotations_migration(conn): + print("apply_initiative_migrations: applying 022_imagehub_task_annotations …") + await _apply_sql_file(conn, m022, "022_imagehub_task_annotations") + else: + print("apply_initiative_migrations: imagehub_tasks.annotations present; OK.") + + m023 = root / "migrations" / "023_imagehub_dataset_members.sql" + if not m023.is_file(): + print(f"apply_initiative_migrations: missing {m023}", file=sys.stderr) + return 1 + if await _needs_imagehub_members_migration(conn): + print("apply_initiative_migrations: applying 023_imagehub_dataset_members …") + await _apply_sql_file(conn, m023, "023_imagehub_dataset_members") + else: + print("apply_initiative_migrations: imagehub_dataset_members present; OK.") + + m024 = root / "migrations" / 
"024_imagehub_dataset_project_link.sql" + if not m024.is_file(): + print(f"apply_initiative_migrations: missing {m024}", file=sys.stderr) + return 1 + if await _needs_imagehub_dataset_project_link_migration(conn): + print("apply_initiative_migrations: applying 024_imagehub_dataset_project_link …") + await _apply_sql_file(conn, m024, "024_imagehub_dataset_project_link") + else: + print("apply_initiative_migrations: imagehub_datasets.research_project_id present; OK.") + + m025 = root / "migrations" / "025_imagehub_task_review_events.sql" + if not m025.is_file(): + print(f"apply_initiative_migrations: missing {m025}", file=sys.stderr) + return 1 + if await _needs_imagehub_review_events_migration(conn): + print("apply_initiative_migrations: applying 025_imagehub_task_review_events …") + await _apply_sql_file(conn, m025, "025_imagehub_task_review_events") + else: + print("apply_initiative_migrations: imagehub_task_review_events present; OK.") + + m026 = root / "migrations" / "026_imagehub_file_folder_path.sql" + if not m026.is_file(): + print(f"apply_initiative_migrations: missing {m026}", file=sys.stderr) + return 1 + if await _needs_imagehub_folder_path_migration(conn): + print("apply_initiative_migrations: applying 026_imagehub_file_folder_path …") + await _apply_sql_file(conn, m026, "026_imagehub_file_folder_path") + else: + print("apply_initiative_migrations: imagehub_dataset_files.folder_path present; OK.") + + m027 = root / "migrations" / "027_imagehub_dataset_label_map.sql" + if not m027.is_file(): + print(f"apply_initiative_migrations: missing {m027}", file=sys.stderr) + return 1 + if await _needs_imagehub_label_map_migration(conn): + print("apply_initiative_migrations: applying 027_imagehub_dataset_label_map …") + await _apply_sql_file(conn, m027, "027_imagehub_dataset_label_map") + else: + print("apply_initiative_migrations: imagehub_datasets.label_map present; OK.") + + return 0 + except Exception as exc: + print(f"apply_initiative_migrations: FAILED: 
{exc}", file=sys.stderr) + if os.environ.get("INITIATIVE_DB_STRICT_MIGRATE", "").strip().lower() in ("1", "true", "yes"): + return 1 + return 0 + finally: + await conn.close() + + +if __name__ == "__main__": + raise SystemExit(asyncio.run(main())) diff --git a/be0/scripts/repair_split_submission.py b/be0/scripts/repair_split_submission.py new file mode 100644 index 0000000..f0bf943 --- /dev/null +++ b/be0/scripts/repair_split_submission.py @@ -0,0 +1,90 @@ +#!/usr/bin/env python3 +""" +CLI: merge a mis-linked submission onto the real CASE-* initiative row and delete the orphan initiative. + +Usage (dry-run — default, no writes): + + cd be0 + export INITIATIVE_DATABASE_URL="postgresql+asyncpg://user:pass@host:5432/initiatives" + python scripts/repair_split_submission.py --submission-id sub-d560fbb6f2944ec6 + +Apply (commits one transaction): + + python scripts/repair_split_submission.py --submission-id sub-... --good-case CASE-YOURCODE --execute + +Requires the same Postgres URL as the API (`INITIATIVE_DATABASE_URL` / `DATABASE_URL`). +""" +from __future__ import annotations + +import argparse +import asyncio +import os +import sys + +SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__)) +ROOT = os.path.abspath(os.path.join(SCRIPT_DIR, "..")) +if ROOT not in sys.path: + sys.path.insert(0, ROOT) + + +async def _main_async() -> int: + p = argparse.ArgumentParser(description="Repair split submission / wrong initiative linkage.") + p.add_argument( + "--submission-id", + required=True, + help="submissionRecord.id (e.g. 
sub-d560fbb6f2944ec6)", + ) + p.add_argument( + "--good-case", + dest="good_case", + default=None, + help="Explicit CASE-* code for the autosave row (recommended if owner has multiple drafts)", + ) + p.add_argument( + "--execute", + action="store_true", + help="Apply changes (otherwise dry-run only)", + ) + args = p.parse_args() + + os.environ.setdefault("INITIATIVE_DATABASE_URL", os.getenv("DATABASE_URL") or "") + from src.initiative_db.engine import get_session, init_engine, is_postgres_enabled + from src.initiative_db.repair_split_submission import repair_submission_cross_initiative_merge + + if not is_postgres_enabled(): + print("Error: set INITIATIVE_DATABASE_URL=postgresql+asyncpg://.../initiatives", file=sys.stderr) + return 2 + + await init_engine() + + async with get_session() as session: + report = await repair_submission_cross_initiative_merge( + session, + submission_record_id=args.submission_id.strip(), + good_case_code_explicit=(args.good_case or "").strip() or None, + dry_run=not args.execute, + ) + + lines = [ + f"dry_run={report.dry_run}", + f"submission_record_id={report.submission_record_id}", + f"owner_id={report.owner_id or '(n/a)'}", + f"bad_case={report.bad_case_code or '(n/a)'}", + f"good_case={report.good_case_code or '(n/a)'}", + ] + if report.skipped: + lines.append(f"SKIPPED: {report.skipped}") + lines.extend(report.actions) + print("\n".join(lines)) + + if args.execute and report.skipped: + return 3 + return 0 + + +def main() -> None: + raise SystemExit(asyncio.run(_main_async())) + + +if __name__ == "__main__": + main() diff --git a/be0/src/__init__.py b/be0/src/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/__pycache__/MCP.cpython-311.pyc b/be0/src/__pycache__/MCP.cpython-311.pyc new file mode 100644 index 0000000..5e92aba Binary files /dev/null and b/be0/src/__pycache__/MCP.cpython-311.pyc differ diff --git a/be0/src/__pycache__/Memory_Manager.cpython-311.pyc 
b/be0/src/__pycache__/Memory_Manager.cpython-311.pyc new file mode 100644 index 0000000..b1fe4d8 Binary files /dev/null and b/be0/src/__pycache__/Memory_Manager.cpython-311.pyc differ diff --git a/be0/src/__pycache__/Memory_Manager.cpython-38.pyc b/be0/src/__pycache__/Memory_Manager.cpython-38.pyc new file mode 100644 index 0000000..5ed2184 Binary files /dev/null and b/be0/src/__pycache__/Memory_Manager.cpython-38.pyc differ diff --git a/be0/src/__pycache__/Pdf_Manager.cpython-311.pyc b/be0/src/__pycache__/Pdf_Manager.cpython-311.pyc new file mode 100644 index 0000000..c43de93 Binary files /dev/null and b/be0/src/__pycache__/Pdf_Manager.cpython-311.pyc differ diff --git a/be0/src/__pycache__/Response_Manager.cpython-311.pyc b/be0/src/__pycache__/Response_Manager.cpython-311.pyc new file mode 100644 index 0000000..1a471a5 Binary files /dev/null and b/be0/src/__pycache__/Response_Manager.cpython-311.pyc differ diff --git a/be0/src/__pycache__/__init__.cpython-311.pyc b/be0/src/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 0000000..d61ed95 Binary files /dev/null and b/be0/src/__pycache__/__init__.cpython-311.pyc differ diff --git a/be0/src/__pycache__/__init__.cpython-313.pyc b/be0/src/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..fef79e2 Binary files /dev/null and b/be0/src/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/__pycache__/__init__.cpython-38.pyc b/be0/src/__pycache__/__init__.cpython-38.pyc new file mode 100644 index 0000000..53fadf0 Binary files /dev/null and b/be0/src/__pycache__/__init__.cpython-38.pyc differ diff --git a/be0/src/__pycache__/admin_audit_routes.cpython-311.pyc b/be0/src/__pycache__/admin_audit_routes.cpython-311.pyc new file mode 100644 index 0000000..5bc8eb4 Binary files /dev/null and b/be0/src/__pycache__/admin_audit_routes.cpython-311.pyc differ diff --git a/be0/src/__pycache__/admin_audit_routes.cpython-313.pyc b/be0/src/__pycache__/admin_audit_routes.cpython-313.pyc new 
file mode 100644 index 0000000..12c3548 Binary files /dev/null and b/be0/src/__pycache__/admin_audit_routes.cpython-313.pyc differ diff --git a/be0/src/__pycache__/admin_user_profile_routes.cpython-311.pyc b/be0/src/__pycache__/admin_user_profile_routes.cpython-311.pyc new file mode 100644 index 0000000..8e8d204 Binary files /dev/null and b/be0/src/__pycache__/admin_user_profile_routes.cpython-311.pyc differ diff --git a/be0/src/__pycache__/admin_user_profile_routes.cpython-313.pyc b/be0/src/__pycache__/admin_user_profile_routes.cpython-313.pyc new file mode 100644 index 0000000..9bb7f71 Binary files /dev/null and b/be0/src/__pycache__/admin_user_profile_routes.cpython-313.pyc differ diff --git a/be0/src/__pycache__/audit.cpython-311.pyc b/be0/src/__pycache__/audit.cpython-311.pyc new file mode 100644 index 0000000..704db57 Binary files /dev/null and b/be0/src/__pycache__/audit.cpython-311.pyc differ diff --git a/be0/src/__pycache__/audit.cpython-313.pyc b/be0/src/__pycache__/audit.cpython-313.pyc new file mode 100644 index 0000000..0da919a Binary files /dev/null and b/be0/src/__pycache__/audit.cpython-313.pyc differ diff --git a/be0/src/__pycache__/auth_api.cpython-311.pyc b/be0/src/__pycache__/auth_api.cpython-311.pyc new file mode 100644 index 0000000..2d8866d Binary files /dev/null and b/be0/src/__pycache__/auth_api.cpython-311.pyc differ diff --git a/be0/src/__pycache__/auth_api.cpython-313.pyc b/be0/src/__pycache__/auth_api.cpython-313.pyc new file mode 100644 index 0000000..27a8e94 Binary files /dev/null and b/be0/src/__pycache__/auth_api.cpython-313.pyc differ diff --git a/be0/src/__pycache__/auth_credential_middleware.cpython-311.pyc b/be0/src/__pycache__/auth_credential_middleware.cpython-311.pyc new file mode 100644 index 0000000..0a0ee6e Binary files /dev/null and b/be0/src/__pycache__/auth_credential_middleware.cpython-311.pyc differ diff --git a/be0/src/__pycache__/auth_credential_middleware.cpython-313.pyc 
b/be0/src/__pycache__/auth_credential_middleware.cpython-313.pyc new file mode 100644 index 0000000..b74153b Binary files /dev/null and b/be0/src/__pycache__/auth_credential_middleware.cpython-313.pyc differ diff --git a/be0/src/__pycache__/auth_jwt.cpython-311.pyc b/be0/src/__pycache__/auth_jwt.cpython-311.pyc new file mode 100644 index 0000000..7a622f4 Binary files /dev/null and b/be0/src/__pycache__/auth_jwt.cpython-311.pyc differ diff --git a/be0/src/__pycache__/auth_jwt.cpython-313.pyc b/be0/src/__pycache__/auth_jwt.cpython-313.pyc new file mode 100644 index 0000000..1d6bc31 Binary files /dev/null and b/be0/src/__pycache__/auth_jwt.cpython-313.pyc differ diff --git a/be0/src/__pycache__/auth_mail.cpython-311.pyc b/be0/src/__pycache__/auth_mail.cpython-311.pyc new file mode 100644 index 0000000..7487c48 Binary files /dev/null and b/be0/src/__pycache__/auth_mail.cpython-311.pyc differ diff --git a/be0/src/__pycache__/auth_mail.cpython-313.pyc b/be0/src/__pycache__/auth_mail.cpython-313.pyc new file mode 100644 index 0000000..01ca3aa Binary files /dev/null and b/be0/src/__pycache__/auth_mail.cpython-313.pyc differ diff --git a/be0/src/__pycache__/auth_rate_limit.cpython-311.pyc b/be0/src/__pycache__/auth_rate_limit.cpython-311.pyc new file mode 100644 index 0000000..c831059 Binary files /dev/null and b/be0/src/__pycache__/auth_rate_limit.cpython-311.pyc differ diff --git a/be0/src/__pycache__/auth_rate_limit.cpython-313.pyc b/be0/src/__pycache__/auth_rate_limit.cpython-313.pyc new file mode 100644 index 0000000..b778b2c Binary files /dev/null and b/be0/src/__pycache__/auth_rate_limit.cpython-313.pyc differ diff --git a/be0/src/__pycache__/awareness_manager.cpython-311.pyc b/be0/src/__pycache__/awareness_manager.cpython-311.pyc new file mode 100644 index 0000000..ff74473 Binary files /dev/null and b/be0/src/__pycache__/awareness_manager.cpython-311.pyc differ diff --git a/be0/src/__pycache__/chat_assistant.cpython-311.pyc 
b/be0/src/__pycache__/chat_assistant.cpython-311.pyc new file mode 100644 index 0000000..06967de Binary files /dev/null and b/be0/src/__pycache__/chat_assistant.cpython-311.pyc differ diff --git a/be0/src/__pycache__/chat_assistant.cpython-313.pyc b/be0/src/__pycache__/chat_assistant.cpython-313.pyc new file mode 100644 index 0000000..4a8ea6b Binary files /dev/null and b/be0/src/__pycache__/chat_assistant.cpython-313.pyc differ diff --git a/be0/src/__pycache__/compliance_verifier.cpython-311.pyc b/be0/src/__pycache__/compliance_verifier.cpython-311.pyc new file mode 100644 index 0000000..b386a03 Binary files /dev/null and b/be0/src/__pycache__/compliance_verifier.cpython-311.pyc differ diff --git a/be0/src/__pycache__/compliance_verifier.cpython-313.pyc b/be0/src/__pycache__/compliance_verifier.cpython-313.pyc new file mode 100644 index 0000000..02d257c Binary files /dev/null and b/be0/src/__pycache__/compliance_verifier.cpython-313.pyc differ diff --git a/be0/src/__pycache__/config.cpython-311.pyc b/be0/src/__pycache__/config.cpython-311.pyc new file mode 100644 index 0000000..ca407cb Binary files /dev/null and b/be0/src/__pycache__/config.cpython-311.pyc differ diff --git a/be0/src/__pycache__/config.cpython-38.pyc b/be0/src/__pycache__/config.cpython-38.pyc new file mode 100644 index 0000000..5b39d9d Binary files /dev/null and b/be0/src/__pycache__/config.cpython-38.pyc differ diff --git a/be0/src/__pycache__/main_new.cpython-313.pyc b/be0/src/__pycache__/main_new.cpython-313.pyc new file mode 100644 index 0000000..cd64919 Binary files /dev/null and b/be0/src/__pycache__/main_new.cpython-313.pyc differ diff --git a/be0/src/__pycache__/memory_manager.cpython-313.pyc b/be0/src/__pycache__/memory_manager.cpython-313.pyc new file mode 100644 index 0000000..c1e5209 Binary files /dev/null and b/be0/src/__pycache__/memory_manager.cpython-313.pyc differ diff --git a/be0/src/__pycache__/schemas.cpython-311.pyc b/be0/src/__pycache__/schemas.cpython-311.pyc new file mode 
100644 index 0000000..003bc84 Binary files /dev/null and b/be0/src/__pycache__/schemas.cpython-311.pyc differ diff --git a/be0/src/__pycache__/staff_profile_domain.cpython-311.pyc b/be0/src/__pycache__/staff_profile_domain.cpython-311.pyc new file mode 100644 index 0000000..5b3ba38 Binary files /dev/null and b/be0/src/__pycache__/staff_profile_domain.cpython-311.pyc differ diff --git a/be0/src/__pycache__/staff_profile_domain.cpython-313.pyc b/be0/src/__pycache__/staff_profile_domain.cpython-313.pyc new file mode 100644 index 0000000..9ae156b Binary files /dev/null and b/be0/src/__pycache__/staff_profile_domain.cpython-313.pyc differ diff --git a/be0/src/__pycache__/structure_analysis.cpython-311.pyc b/be0/src/__pycache__/structure_analysis.cpython-311.pyc new file mode 100644 index 0000000..608a291 Binary files /dev/null and b/be0/src/__pycache__/structure_analysis.cpython-311.pyc differ diff --git a/be0/src/__pycache__/structure_analysis.cpython-313.pyc b/be0/src/__pycache__/structure_analysis.cpython-313.pyc new file mode 100644 index 0000000..489013f Binary files /dev/null and b/be0/src/__pycache__/structure_analysis.cpython-313.pyc differ diff --git a/be0/src/__pycache__/template_manager.cpython-311.pyc b/be0/src/__pycache__/template_manager.cpython-311.pyc new file mode 100644 index 0000000..68adc37 Binary files /dev/null and b/be0/src/__pycache__/template_manager.cpython-311.pyc differ diff --git a/be0/src/__pycache__/text_io.cpython-311.pyc b/be0/src/__pycache__/text_io.cpython-311.pyc new file mode 100644 index 0000000..e44b789 Binary files /dev/null and b/be0/src/__pycache__/text_io.cpython-311.pyc differ diff --git a/be0/src/__pycache__/text_io.cpython-38.pyc b/be0/src/__pycache__/text_io.cpython-38.pyc new file mode 100644 index 0000000..f907e0b Binary files /dev/null and b/be0/src/__pycache__/text_io.cpython-38.pyc differ diff --git a/be0/src/__pycache__/utils.cpython-311.pyc b/be0/src/__pycache__/utils.cpython-311.pyc new file mode 100644 index 
0000000..eba5c69 Binary files /dev/null and b/be0/src/__pycache__/utils.cpython-311.pyc differ diff --git a/be0/src/__pycache__/utils.cpython-313.pyc b/be0/src/__pycache__/utils.cpython-313.pyc new file mode 100644 index 0000000..5898417 Binary files /dev/null and b/be0/src/__pycache__/utils.cpython-313.pyc differ diff --git a/be0/src/__pycache__/utils.cpython-38.pyc b/be0/src/__pycache__/utils.cpython-38.pyc new file mode 100644 index 0000000..241e0d3 Binary files /dev/null and b/be0/src/__pycache__/utils.cpython-38.pyc differ diff --git a/be0/src/admin_audit_routes.py b/be0/src/admin_audit_routes.py new file mode 100644 index 0000000..dd6236f --- /dev/null +++ b/be0/src/admin_audit_routes.py @@ -0,0 +1,235 @@ +"""Admin-only audit log query API (GET /api/v1/admin/audit).""" + +from __future__ import annotations + +import uuid +from datetime import datetime, timedelta, timezone +from typing import Annotated, Any, Optional + +from fastapi import APIRouter, Header, HTTPException, Query +from pydantic import BaseModel, Field +from sqlalchemy import asc, desc, func, select + +from src.auth_jwt import decode_access_token_user_id, decode_bearer_token +from src.initiative_db.engine import get_session, init_engine, is_postgres_enabled +from src.initiative_db.models import AuditEvent + +router = APIRouter(prefix="/admin", tags=["admin-audit"]) + + +def _jwt_role_strings(authorization: str | None) -> list[str]: + p = decode_bearer_token(authorization) + if not p: + return [] + r = p.get("roles") + if isinstance(r, list): + return [str(x) for x in r] + return [] + + +def require_admin_uid(authorization: str | None) -> uuid.UUID: + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để thực hiện thao tác.") + if "admin" not in _jwt_role_strings(authorization): + raise HTTPException(status_code=403, detail="Chỉ tài khoản quản trị mới thực hiện được.") + return uid + + +class AuditEventListItem(BaseModel): + 
model_config = {"from_attributes": True} + + id: int + occurred_at: datetime + actor_user_id: Optional[uuid.UUID] = None + actor_email: str + actor_role: str + action: str + entity_type: str + entity_id: Optional[str] = None + metadata: dict[str, Any] = Field(default_factory=dict) + request_id: Optional[uuid.UUID] = None + has_before: bool = False + has_after: bool = False + + +class AuditListResponse(BaseModel): + items: list[AuditEventListItem] + total: int + page: int + page_size: int + + +class AuditEventDetail(BaseModel): + id: int + occurred_at: datetime + actor_user_id: Optional[uuid.UUID] = None + actor_email: str + actor_role: str + action: str + entity_type: str + entity_id: Optional[str] = None + before: Optional[dict[str, Any]] = None + after: Optional[dict[str, Any]] = None + metadata: dict[str, Any] = Field(default_factory=dict) + request_id: Optional[uuid.UUID] = None + + +_AUDIT_ACTIONS = frozenset( + {"create", "read", "update", "delete", "login", "logout", "login_failed"} +) + + +def _parse_sort(sort: str) -> bool: + """True when sorting occurred_at ascending.""" + s = (sort or "occurred_at:desc").strip().lower() + if ":" in s: + col_name, direction = s.split(":", 1) + else: + col_name, direction = s, "desc" + if col_name != "occurred_at": + raise HTTPException(status_code=400, detail='sort chỉ hỗ trợ occurred_at (+ asc|desc)') + return direction in ("asc", "ascending", "old", "older") + + +def _where_audit( + *, + from_ts: datetime, + to_ts: datetime, + actor_user_id: Optional[uuid.UUID], + actor_email: Optional[str], + entity_type: Optional[str], + entity_id: Optional[str], + actions: Optional[list[str]], + request_id: Optional[uuid.UUID], +): + parts = [ + AuditEvent.occurred_at >= from_ts, + AuditEvent.occurred_at <= to_ts, + ] + if actor_user_id is not None: + parts.append(AuditEvent.actor_user_id == actor_user_id) + if actor_email: + parts.append(AuditEvent.actor_email == actor_email.strip().lower()) + if entity_type: + 
parts.append(AuditEvent.entity_type == entity_type.strip()) + if entity_id is not None and entity_id.strip() != "": + parts.append(AuditEvent.entity_id == entity_id.strip()) + if actions: + parts.append(AuditEvent.action.in_(actions)) + if request_id is not None: + parts.append(AuditEvent.request_id == request_id) + return parts + + +@router.get("/audit", response_model=AuditListResponse) +async def list_audit_events( + authorization: Annotated[str | None, Header()] = None, + from_: Annotated[ + Optional[datetime], + Query(alias="from", description="Inclusive lower bound (UTC). Default: now−7d"), + ] = None, + to: Annotated[ + Optional[datetime], + Query(description="Inclusive upper bound (UTC). Default: now"), + ] = None, + actor_user_id: Optional[uuid.UUID] = None, + actor_email: Optional[str] = None, + entity_type: Optional[str] = None, + entity_id: Optional[str] = None, + action: Optional[str] = Query( + None, description="Comma-separated audit_action values" + ), + request_id: Optional[uuid.UUID] = None, + page: int = Query(1, ge=1), + page_size: int = Query(50, ge=1, le=100), + sort: str = Query("occurred_at:desc", description='e.g. 
"occurred_at:desc"'), +): + require_admin_uid(authorization) + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để đọc audit.") + + await init_engine() + now = datetime.now(timezone.utc) # naive from/to values are treated as UTC so they compare safely with aware datetimes + end = (to.replace(tzinfo=timezone.utc) if to is not None and to.tzinfo is None else to) or now + start = (from_.replace(tzinfo=timezone.utc) if from_ is not None and from_.tzinfo is None else from_) or (end - timedelta(days=7)) + if end < start: + raise HTTPException(status_code=400, detail="Tham số to phải >= from") + + actions_list: Optional[list[str]] = None + if action: + raw = [a.strip().lower() for a in action.split(",") if a.strip()] + bad = [a for a in raw if a not in _AUDIT_ACTIONS] + if bad: + raise HTTPException(status_code=400, detail=f"action không hợp lệ: {bad}") + actions_list = raw + + asc_order = _parse_sort(sort) + offset = (page - 1) * page_size + wh = _where_audit( + from_ts=start, + to_ts=end, + actor_user_id=actor_user_id, + actor_email=actor_email, + entity_type=entity_type, + entity_id=entity_id, + actions=actions_list, + request_id=request_id, + ) + + async with get_session() as session: + cnt_stmt = select(func.count()).select_from(AuditEvent).where(*wh) + total = int((await session.execute(cnt_stmt)).scalar_one()) + + order_clause = asc(AuditEvent.occurred_at) if asc_order else desc(AuditEvent.occurred_at) + stmt = select(AuditEvent).where(*wh).order_by(order_clause).limit(page_size).offset(offset) + rows = (await session.execute(stmt)).scalars().all() + + items = [ + AuditEventListItem( + id=r.id, + occurred_at=r.occurred_at, + actor_user_id=r.actor_user_id, + actor_email=r.actor_email, + actor_role=r.actor_role, + action=str(r.action), + entity_type=r.entity_type, + entity_id=r.entity_id, + metadata=dict(r.metadata_) if isinstance(r.metadata_, dict) else {}, + request_id=r.request_id, + has_before=r.before is not None, + has_after=r.after is not None, + ) + for r in rows + ] + return AuditListResponse(items=items, total=total, page=page, page_size=page_size) + + +@router.get("/audit/{event_id:int}", response_model=AuditEventDetail) +async def get_audit_event_detail( +
event_id: int, + authorization: Annotated[str | None, Header()] = None, +): + require_admin_uid(authorization) + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cần PostgreSQL để đọc audit.") + + await init_engine() + async with get_session() as session: + row = await session.get(AuditEvent, event_id) + if row is None: + raise HTTPException(status_code=404, detail="Không có sự kiện audit.") + return AuditEventDetail( + id=row.id, + occurred_at=row.occurred_at, + actor_user_id=row.actor_user_id, + actor_email=row.actor_email, + actor_role=row.actor_role, + action=str(row.action), + entity_type=row.entity_type, + entity_id=row.entity_id, + before=dict(row.before) if isinstance(row.before, dict) else row.before, + after=dict(row.after) if isinstance(row.after, dict) else row.after, + metadata=dict(row.metadata_) if isinstance(row.metadata_, dict) else {}, + request_id=row.request_id, + ) diff --git a/be0/src/admin_user_profile_routes.py b/be0/src/admin_user_profile_routes.py new file mode 100644 index 0000000..7da96e0 --- /dev/null +++ b/be0/src/admin_user_profile_routes.py @@ -0,0 +1,609 @@ +"""Admin APIs for staff profile verification queue (conditional updates + audit).""" + +from __future__ import annotations + +import uuid +from datetime import datetime, timezone +from typing import Any, Optional + +from fastapi import APIRouter, Header, HTTPException +from pydantic import BaseModel, Field +from sqlalchemy import delete, func, select, text, update +from sqlalchemy.exc import IntegrityError, ProgrammingError +from sqlalchemy.ext.asyncio import AsyncSession + +from src.audit import AuditAction, record_audit, resolve_actor_fields +from src.auth_api import _policy_admin_emails +from src.auth_jwt import decode_access_token_user_id, decode_bearer_token +from src.initiative_db.engine import get_session, is_postgres_enabled +from src.initiative_db.models import ( + AcademicTitle, + ApplicationAdminResult, + ApplicationArtifact, + 
ApplicationReviewDocument, + AuditLog, + Initiative, + Unit, + User, + UserRoleRow, + UserStaffProfile, +) +from src.staff_profile_domain import staff_row_for_audit + +router = APIRouter(prefix="/admin/user-profiles", tags=["admin-user-profiles"]) + +SYSTEM_DRAFT_USER_ID = uuid.UUID("00000000-0000-4000-8000-000000000001") +_FRONTEND_ROLES = frozenset({"admin", "editor", "viewer"}) + + +def _jwt_role_strings(authorization: str | None) -> list[str]: + p = decode_bearer_token(authorization) + if not p: + return [] + r = p.get("roles") + if isinstance(r, list): + return [str(x) for x in r] + return [] + + +def _require_admin_uid(authorization: str | None) -> uuid.UUID: + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để thực hiện thao tác.") + if "admin" not in _jwt_role_strings(authorization): + raise HTTPException(status_code=403, detail="Chỉ tài khoản quản trị mới thực hiện được.") + return uid + + +class PendingProfileItem(BaseModel): + userId: str + email: str + fullName: str + employeeId: Optional[str] = None + jobTitle: Optional[str] = None + verificationSubmittedAt: Optional[datetime] = None + version: int + + +class RegisteredUserItem(BaseModel): + """Active accounts with staff profile snapshot (admin read-only directory).""" + + userId: str + email: str + fullName: str + phone: Optional[str] = None + createdAt: datetime + employeeId: Optional[str] = None + jobTitle: Optional[str] = None + unitNameFreetext: Optional[str] = None + unitCatalogName: Optional[str] = None + academicTitleLabelVi: Optional[str] = None + academicTitleOther: Optional[str] = None + profileVerificationStatus: str = "draft" + roles: list[str] = Field(default_factory=list) + adminFromPolicy: bool = False + policyAdminLocked: bool = False + + +class ProfileDetailResponse(BaseModel): + userId: str + email: str + fullName: str + phone: Optional[str] = None + unitId: Optional[str] = None + unitCatalogName: 
Optional[str] = None + staffProfile: dict[str, Any] + + +class VerifyBody(BaseModel): + expectedVersion: int = Field(..., ge=1) + + +class RejectBody(BaseModel): + expectedVersion: int = Field(..., ge=1) + reason: str = Field(..., min_length=1, max_length=2000) + + +class RemoveUserBody(BaseModel): + confirmEmail: str = Field(..., min_length=3, max_length=320) + + +class SetUserRolesBody(BaseModel): + admin: bool = False + editor: bool = False + viewer: bool = False + + +class UserRolesStateResponse(BaseModel): + roles: list[str] + adminFromPolicy: bool + policyAdminLocked: bool + + +@router.get("/pending", response_model=list[PendingProfileItem]) +async def list_pending(authorization: str | None = Header(None)) -> list[PendingProfileItem]: + _require_admin_uid(authorization) + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.") + + async with get_session() as session: + stmt = ( + select(User, UserStaffProfile) + .join(UserStaffProfile, UserStaffProfile.user_id == User.id) + .where(UserStaffProfile.profile_verification_status == "pending", User.is_active.is_(True)) + .order_by(UserStaffProfile.verification_submitted_at.asc().nulls_last()) + ) + rows = (await session.execute(stmt)).all() + out: list[PendingProfileItem] = [] + for user, sp in rows: + out.append( + PendingProfileItem( + userId=str(user.id), + email=user.email, + fullName=user.full_name, + employeeId=sp.employee_id, + jobTitle=sp.job_title, + verificationSubmittedAt=sp.verification_submitted_at, + version=sp.version, + ) + ) + return out + + +@router.get("/registry", response_model=list[RegisteredUserItem]) +async def list_registered_users(authorization: str | None = Header(None)) -> list[RegisteredUserItem]: + """All active user accounts (successful registration) with HR fields for review / export.""" + _require_admin_uid(authorization) + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.") 
+ + async with get_session() as session: + stmt = ( + select(User, UserStaffProfile, Unit, AcademicTitle) + .outerjoin(UserStaffProfile, UserStaffProfile.user_id == User.id) + .outerjoin(Unit, Unit.id == User.unit_id) + .outerjoin(AcademicTitle, AcademicTitle.code == UserStaffProfile.academic_title_code) + .where(User.is_active.is_(True)) + .order_by(User.created_at.desc()) + ) + rows = (await session.execute(stmt)).all() + by_user_roles: dict[uuid.UUID, list[UserRoleRow]] = {} + if rows: + uids = [r[0].id for r in rows] + role_stmt = select(UserRoleRow).where(UserRoleRow.user_id.in_(uids)) + for rr in (await session.execute(role_stmt)).scalars().all(): + by_user_roles.setdefault(rr.user_id, []).append(rr) + policy = _policy_admin_emails() + out: list[RegisteredUserItem] = [] + for user, sp, unit, title in rows: + status = "draft" + if sp is not None: + status = sp.profile_verification_status or "draft" + ur = by_user_roles.get(user.id, []) + role_set = sorted({str(x.role) for x in ur if str(x.role) in _FRONTEND_ROLES}) + admin_fp = any( + str(x.role) == "admin" and bool(x.admin_from_email_policy) for x in ur + ) + email_norm = user.email.strip().lower() + policy_lock = email_norm in policy + out.append( + RegisteredUserItem( + userId=str(user.id), + email=user.email, + fullName=user.full_name, + phone=user.phone, + createdAt=user.created_at, + employeeId=sp.employee_id if sp else None, + jobTitle=sp.job_title if sp else None, + unitNameFreetext=sp.unit_name_freetext if sp else None, + unitCatalogName=unit.name if unit is not None else None, + academicTitleLabelVi=title.label_vi if title is not None else None, + academicTitleOther=sp.academic_title_other if sp else None, + profileVerificationStatus=status, + roles=role_set, + adminFromPolicy=admin_fp, + policyAdminLocked=policy_lock, + ) + ) + return out + + +async def _detail(session: AsyncSession, user_id: uuid.UUID) -> ProfileDetailResponse | None: + stmt = ( + select(User, UserStaffProfile) + 
.join(UserStaffProfile, UserStaffProfile.user_id == User.id) + .where(User.id == user_id) + ) + row = (await session.execute(stmt)).first() + if row is None: + return None + user, sp = row + unit_name: str | None = None + if user.unit_id is not None: + u = await session.get(Unit, user.unit_id) + if u is not None: + unit_name = u.name + staff = { + "employeeId": sp.employee_id, + "academicTitleCode": sp.academic_title_code, + "academicTitleOther": sp.academic_title_other, + "unitNameFreetext": sp.unit_name_freetext, + "jobTitle": sp.job_title, + "profileVerificationStatus": sp.profile_verification_status, + "verificationSubmittedAt": sp.verification_submitted_at, + "verifiedAt": sp.verified_at, + "verifiedByUserId": str(sp.verified_by_user_id) if sp.verified_by_user_id else None, + "rejectionReason": sp.rejection_reason, + "version": sp.version, + } + return ProfileDetailResponse( + userId=str(user.id), + email=user.email, + fullName=user.full_name, + phone=user.phone, + unitId=str(user.unit_id) if user.unit_id else None, + unitCatalogName=unit_name, + staffProfile=staff, + ) + + +@router.get("/{user_id}", response_model=ProfileDetailResponse) +async def get_profile_detail( + user_id: uuid.UUID, authorization: str | None = Header(None) +) -> ProfileDetailResponse: + _require_admin_uid(authorization) + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.") + + async with get_session() as session: + detail = await _detail(session, user_id) + if detail is None: + raise HTTPException(status_code=404, detail="Không tìm thấy người dùng.") + return detail + + +@router.patch("/{user_id}/roles", response_model=UserRolesStateResponse) +async def set_user_roles( + user_id: uuid.UUID, + body: SetUserRolesBody, + authorization: str | None = Header(None), +) -> UserRolesStateResponse: + """Replace app-facing roles (admin / editor / viewer) for a user.""" + admin_id = _require_admin_uid(authorization) + if user_id == 
SYSTEM_DRAFT_USER_ID: + raise HTTPException(status_code=400, detail="Không thể sửa vai trò tài khoản hệ thống.") + + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.") + + desired: set[str] = set() + if body.admin: + desired.add("admin") + if body.editor: + desired.add("editor") + if body.viewer: + desired.add("viewer") + if not desired: + raise HTTPException(status_code=400, detail="Chọn ít nhất một vai trò.") + + policy = _policy_admin_emails() + + async with get_session() as session: + user = await session.get(User, user_id) + if user is None or not user.is_active: + raise HTTPException(status_code=404, detail="Không tìm thấy người dùng.") + + email_norm = user.email.strip().lower() + if email_norm in policy and "admin" not in desired: + raise HTTPException( + status_code=400, + detail="Không thể gỡ quyền Quản trị: email thuộc danh sách quản trị hệ thống.", + ) + if user_id == admin_id and "admin" not in desired: + raise HTTPException( + status_code=400, + detail="Không thể tự gỡ quyền Quản trị của chính mình.", + ) + + stmt = select(UserRoleRow).where(UserRoleRow.user_id == user_id) + existing = list((await session.execute(stmt)).scalars().all()) + current_front = {str(r.role) for r in existing if str(r.role) in _FRONTEND_ROLES} + before_roles = sorted(current_front) + + to_remove = current_front - desired + to_add = desired - current_front + + for role in to_remove: + await session.execute( + delete(UserRoleRow).where( + UserRoleRow.user_id == user_id, + UserRoleRow.role == role, + ) + ) + + for role in to_add: + session.add( + UserRoleRow( + user_id=user_id, + role=role, + admin_from_email_policy=bool(role == "admin" and email_norm in policy), + ) + ) + + if to_remove or to_add: + user.credential_version = int(user.credential_version or 0) + 1 + user.updated_at = datetime.now(timezone.utc) + + after_roles = sorted(desired) + if before_roles != after_roles: + actor_email, actor_role = await 
resolve_actor_fields(session, admin_id) + await record_audit( + session, + actor_user_id=admin_id, + actor_email=actor_email, + actor_role=actor_role, + action=AuditAction.update, + entity_type="user_roles", + entity_id=str(user_id), + before={"roles": before_roles}, + after={"roles": after_roles}, + metadata={"action": "set_roles"}, + ) + + await session.flush() + stmt2 = select(UserRoleRow).where(UserRoleRow.user_id == user_id) + final_rows = list((await session.execute(stmt2)).scalars().all()) + role_list = sorted({str(r.role) for r in final_rows if str(r.role) in _FRONTEND_ROLES}) + admin_fp = any( + str(r.role) == "admin" and bool(r.admin_from_email_policy) for r in final_rows + ) + return UserRolesStateResponse( + roles=role_list, + adminFromPolicy=admin_fp, + policyAdminLocked=(email_norm in policy), + ) + + +@router.post("/{user_id}/verify") +async def verify_profile( + user_id: uuid.UUID, + body: VerifyBody, + authorization: str | None = Header(None), +) -> dict[str, str]: + admin_id = _require_admin_uid(authorization) + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.") + + now = datetime.now(timezone.utc) + async with get_session() as session: + sp = await session.get(UserStaffProfile, user_id) + user = await session.get(User, user_id) + if sp is None or user is None: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ.") + before = staff_row_for_audit(sp, user.unit_id) + + stmt = ( + update(UserStaffProfile) + .where( + UserStaffProfile.user_id == user_id, + UserStaffProfile.profile_verification_status == "pending", + UserStaffProfile.version == body.expectedVersion, + ) + .values( + profile_verification_status="verified", + verified_at=now, + verified_by_user_id=admin_id, + rejection_reason=None, + version=UserStaffProfile.version + 1, + updated_at=now, + ) + ) + res = await session.execute(stmt) + if res.rowcount == 0: + raise HTTPException( + status_code=409, + detail="Không thể 
xác minh: trạng thái đã đổi hoặc phiên bản không khớp (vui lòng tải lại).", + ) + + await session.refresh(sp) + after = staff_row_for_audit(sp, user.unit_id) + actor_email, actor_role = await resolve_actor_fields(session, admin_id) + await record_audit( + session, + actor_user_id=admin_id, + actor_email=actor_email, + actor_role=actor_role, + action=AuditAction.update, + entity_type="user_profile", + entity_id=str(user_id), + before=before, + after=after, + metadata={"action": "verify"}, + ) + return {"status": "verified"} + + +@router.post("/{user_id}/reject") +async def reject_profile( + user_id: uuid.UUID, + body: RejectBody, + authorization: str | None = Header(None), +) -> dict[str, str]: + admin_id = _require_admin_uid(authorization) + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.") + + reason = body.reason.strip() + if not reason: + raise HTTPException(status_code=400, detail="Cần lý do từ chối.") + + now = datetime.now(timezone.utc) + async with get_session() as session: + sp = await session.get(UserStaffProfile, user_id) + user = await session.get(User, user_id) + if sp is None or user is None: + raise HTTPException(status_code=404, detail="Không tìm thấy hồ sơ.") + before = staff_row_for_audit(sp, user.unit_id) + + stmt = ( + update(UserStaffProfile) + .where( + UserStaffProfile.user_id == user_id, + UserStaffProfile.profile_verification_status == "pending", + UserStaffProfile.version == body.expectedVersion, + ) + .values( + profile_verification_status="rejected", + verified_at=None, + verified_by_user_id=None, + rejection_reason=reason, + version=UserStaffProfile.version + 1, + updated_at=now, + ) + ) + res = await session.execute(stmt) + if res.rowcount == 0: + raise HTTPException( + status_code=409, + detail="Không thể từ chối: trạng thái đã đổi hoặc phiên bản không khớp (vui lòng tải lại).", + ) + + await session.refresh(sp) + after = staff_row_for_audit(sp, user.unit_id) + actor_email, 
actor_role = await resolve_actor_fields(session, admin_id) + await record_audit( + session, + actor_user_id=admin_id, + actor_email=actor_email, + actor_role=actor_role, + action=AuditAction.update, + entity_type="user_profile", + entity_id=str(user_id), + before=before, + after=after, + metadata={"action": "reject"}, + ) + return {"status": "rejected"} + + +@router.post("/{user_id}/remove") +async def remove_user_account( + user_id: uuid.UUID, + body: RemoveUserBody, + authorization: str | None = Header(None), +) -> dict[str, str]: + """ + Permanently delete a user row (cascades roles, staff profile, OTP tokens, etc.). + Blocked for admins, the system draft user, self-delete, and accounts that still own initiatives. + """ + admin_id = _require_admin_uid(authorization) + if user_id == admin_id: + raise HTTPException(status_code=400, detail="Không thể xóa chính mình.") + if user_id == SYSTEM_DRAFT_USER_ID: + raise HTTPException(status_code=400, detail="Không thể xóa tài khoản hệ thống.") + + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.") + + confirm = body.confirmEmail.strip().lower() + if not confirm: + raise HTTPException(status_code=400, detail="Cần nhập email để xác nhận.") + + async with get_session() as session: + user = await session.get(User, user_id) + if user is None: + raise HTTPException(status_code=404, detail="Không tìm thấy người dùng.") + if user.email.strip().lower() != confirm: + raise HTTPException(status_code=400, detail="Email xác nhận không khớp tài khoản.") + + admin_stmt = select(UserRoleRow.user_id).where( + UserRoleRow.user_id == user_id, + UserRoleRow.role == "admin", + ) + if (await session.execute(admin_stmt)).first() is not None: + raise HTTPException(status_code=403, detail="Không thể xóa tài khoản quản trị.") + + own_count = ( + await session.execute( + select(func.count()).select_from(Initiative).where(Initiative.owner_id == user_id) + ) + ).scalar_one() + if own_count and 
int(own_count) > 0: + raise HTTPException( + status_code=409, + detail="Tài khoản còn sáng kiến/đơn (owner). Xóa hoặc chuyển dữ liệu trước.", + ) + + await session.execute( + update(UserStaffProfile) + .where(UserStaffProfile.verified_by_user_id == user_id) + .values(verified_by_user_id=None) + ) + await session.execute( + update(ApplicationArtifact) + .where(ApplicationArtifact.uploaded_by == user_id) + .values(uploaded_by=None) + ) + await session.execute( + update(ApplicationReviewDocument) + .where(ApplicationReviewDocument.created_by == user_id) + .values(created_by=None) + ) + await session.execute( + update(ApplicationAdminResult) + .where(ApplicationAdminResult.created_by == user_id) + .values(created_by=None) + ) + await session.execute( + update(ApplicationAdminResult) + .where(ApplicationAdminResult.updated_by == user_id) + .values(updated_by=None) + ) + await session.execute(update(AuditLog).where(AuditLog.actor_id == user_id).values(actor_id=None)) + + async with session.begin_nested(): + try: + await session.execute( + text("UPDATE authors SET user_id = NULL WHERE user_id = CAST(:uid AS uuid)"), + {"uid": str(user_id)}, + ) + except ProgrammingError: + pass + + before_user = { + "userId": str(user.id), + "email": user.email, + "fullName": user.full_name, + } + actor_email, actor_role = await resolve_actor_fields(session, admin_id) + await record_audit( + session, + actor_user_id=admin_id, + actor_email=actor_email, + actor_role=actor_role, + action=AuditAction.delete, + entity_type="user", + entity_id=str(user_id), + before=before_user, + after=None, + metadata={"action": "admin_remove_user"}, + ) + + # Delete staff profile before User: ORM would otherwise try to NULL user_staff_profiles.user_id, + # which is the row's primary key (AssertionError on flush). 
+ sp_row = await session.get(UserStaffProfile, user_id) + if sp_row is not None: + await session.delete(sp_row) + await session.flush() + + try: + await session.delete(user) + await session.flush() + except IntegrityError: + await session.rollback() + raise HTTPException( + status_code=409, + detail="Không thể xóa: tài khoản còn được tham chiếu (ví dụ: minh chứng, đánh giá hội đồng).", + ) from None + + return {"status": "deleted"} \ No newline at end of file diff --git a/be0/src/application/__init__.py b/be0/src/application/__init__.py new file mode 100644 index 0000000..3e0635b --- /dev/null +++ b/be0/src/application/__init__.py @@ -0,0 +1,6 @@ +"""Application layer — use cases that orchestrate domain objects via ports. + +Depends on ``domain`` + ``shared_kernel`` only. Knows nothing about FastAPI, +SQLAlchemy, JWT, or argon2 — those arrive as ``ports`` (Protocols) injected by the +composition root. A use case is one business operation, testable with fakes. +""" diff --git a/be0/src/application/identity/__init__.py b/be0/src/application/identity/__init__.py new file mode 100644 index 0000000..1f2eb9e --- /dev/null +++ b/be0/src/application/identity/__init__.py @@ -0,0 +1 @@ +"""Identity use cases (Login is the first cut-over reference).""" diff --git a/be0/src/application/identity/dto.py b/be0/src/application/identity/dto.py new file mode 100644 index 0000000..6c64150 --- /dev/null +++ b/be0/src/application/identity/dto.py @@ -0,0 +1,24 @@ +"""Application DTOs for Identity — the inputs/outputs of use cases (not API schemas).""" + +from __future__ import annotations + +from dataclasses import dataclass + +from src.domain.identity.entities import User + + +@dataclass(frozen=True) +class LoginCommand: + email: str + password: str + client_ip: str + + +@dataclass(frozen=True) +class AuthenticatedUser: + """Result of a successful authentication. The API layer assembles the public + response (incl. 
staff profile) from this + a profile read.""" + + user: User + roles: list[str] + access_token: str diff --git a/be0/src/application/identity/ports.py b/be0/src/application/identity/ports.py new file mode 100644 index 0000000..0006c28 --- /dev/null +++ b/be0/src/application/identity/ports.py @@ -0,0 +1,54 @@ +"""Driven ports for the Identity application layer. + +Each is a structural ``Protocol`` implemented by an adapter in ``infrastructure``. +The use cases program against these, never against the concrete library. +""" + +from __future__ import annotations + +import uuid +from datetime import datetime +from typing import Protocol + + +class PasswordHasher(Protocol): + """Argon2id in production (``infrastructure.identity.argon2_hasher``).""" + + def hash(self, plain: str) -> str: ... + def verify(self, plain: str, hashed: str) -> bool: ... + + +class TokenIssuer(Protocol): + """Signs the access token from claims built by the domain.""" + + def issue( + self, + user_id: uuid.UUID, + email: str, + roles: list[str], + credential_version: int, + ) -> str: ... + + +class LoginRateLimiter(Protocol): + """Per-(email, ip) sliding window; returns False when the request must be denied.""" + + def allow(self, email: str, client_ip: str) -> bool: ... + + +class Clock(Protocol): + """Injectable time source (keeps use cases deterministic in tests).""" + + def now(self) -> datetime: ... + + +class AuthAuditSink(Protocol): + """Append-only audit of authentication outcomes.""" + + async def login_succeeded( + self, *, user_id: uuid.UUID, email: str, roles: list[str] + ) -> None: ... + + async def login_failed( + self, *, email: str, user_id: uuid.UUID | None, reason: str | None + ) -> None: ... 
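The `LoginRateLimiter` port above specifies a per-(email, ip) sliding window that returns False to deny a request. The `Clock` port exists to keep use cases deterministic in tests. Below is a minimal in-memory sketch of fakes that structurally satisfy both ports.

Everything in it is an assumption rather than part of this commit:

- The class names `FixedClock` and `InMemoryLoginRateLimiter` are hypothetical.
- The default of 5 attempts per 15 minutes is illustrative. The real thresholds behind `src.auth_rate_limit.allow_login` are not shown here.

```python
"""Hypothetical in-memory fakes for the Identity ports (not part of this commit)."""

from __future__ import annotations

from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone
from typing import Protocol


class Clock(Protocol):
    """Local copy of the ``Clock`` port so this sketch is self-contained."""

    def now(self) -> datetime: ...


class FixedClock:
    """Settable time source for tests; structurally satisfies ``Clock``."""

    def __init__(self, start: datetime) -> None:
        self.current = start

    def now(self) -> datetime:
        return self.current

    def advance(self, delta: timedelta) -> None:
        self.current += delta


class InMemoryLoginRateLimiter:
    """Sliding window keyed by (email, ip); satisfies ``LoginRateLimiter``.

    Assumed policy: at most ``max_attempts`` allowed calls within any ``window``.
    """

    def __init__(
        self,
        clock: Clock,
        max_attempts: int = 5,
        window: timedelta = timedelta(minutes=15),
    ) -> None:
        self._clock = clock
        self._max_attempts = max_attempts
        self._window = window
        # One deque of allowed-attempt timestamps per (email, ip) bucket.
        self._hits: defaultdict[tuple[str, str], deque[datetime]] = defaultdict(deque)

    def allow(self, email: str, client_ip: str) -> bool:
        now = self._clock.now()
        hits = self._hits[(email, client_ip)]
        # Evict attempts that have slid out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max_attempts:
            return False  # deny; denied calls are not recorded
        hits.append(now)
        return True


if __name__ == "__main__":
    clock = FixedClock(datetime(2026, 1, 1, tzinfo=timezone.utc))
    limiter = InMemoryLoginRateLimiter(clock, max_attempts=2, window=timedelta(minutes=1))
    print([limiter.allow("a@ump.edu.vn", "10.0.0.1") for _ in range(3)])  # [True, True, False]
```

Denied calls are not appended to the window. A client that keeps retrying is therefore released as soon as its oldest allowed attempt ages out. Recording denials as well would be the stricter alternative.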
diff --git a/be0/src/application/identity/use_cases/__init__.py b/be0/src/application/identity/use_cases/__init__.py new file mode 100644 index 0000000..e494b47 --- /dev/null +++ b/be0/src/application/identity/use_cases/__init__.py @@ -0,0 +1 @@ +"""One module per use case — one business operation each.""" diff --git a/be0/src/application/identity/use_cases/authenticate_user.py b/be0/src/application/identity/use_cases/authenticate_user.py new file mode 100644 index 0000000..85a250a --- /dev/null +++ b/be0/src/application/identity/use_cases/authenticate_user.py @@ -0,0 +1,80 @@ +"""AuthenticateUser — the ``POST /auth/login`` orchestration, framework-free. + +Behavior mirrors ``auth_api.login`` exactly so the cut-over is invisible to clients: +institutional-email normalization → rate limit (429) → credential check (401) → +email-verified check (403) → role reconcile → audit → signed token. + +Depends only on ports + domain; unit-tested with fakes (no DB). Status mapping +(DomainError subclass → HTTP code) happens in the API layer. +""" + +from __future__ import annotations + +from src.application.identity.dto import AuthenticatedUser, LoginCommand +from src.application.identity.ports import ( + AuthAuditSink, + LoginRateLimiter, + PasswordHasher, + TokenIssuer, +) +from src.domain.identity.errors import EmailNotVerified, InvalidCredentials +from src.domain.identity.repository import UserRepository +from src.domain.identity.value_objects import InstitutionalEmail +from src.shared_kernel.errors import RateLimited + +# Vietnamese messages preserved verbatim from auth_api.login. +_INVALID_CREDENTIALS_MSG = "Email hoặc mật khẩu không đúng." +_EMAIL_UNVERIFIED_MSG = ( + "Vui lòng xác minh email trước khi đăng nhập. Kiểm tra hộp thư " + "hoặc dùng chức năng gửi lại mã OTP trên trang đăng ký." +) +_RATE_LIMITED_MSG = "Quá nhiều lần đăng nhập. Vui lòng thử lại sau." 
+ + + class AuthenticateUser: + def __init__( + self, + *, + users: UserRepository, + hasher: PasswordHasher, + tokens: TokenIssuer, + rate_limiter: LoginRateLimiter, + audit: AuthAuditSink, + ) -> None: + self._users = users + self._hasher = hasher + self._tokens = tokens + self._rate_limiter = rate_limiter + self._audit = audit + + async def execute(self, command: LoginCommand) -> AuthenticatedUser: + email = InstitutionalEmail.parse(command.email) # bad domain raises a domain error; the API layer maps it to 400 + + if not self._rate_limiter.allow(email.value, command.client_ip): + raise RateLimited(_RATE_LIMITED_MSG) + + user = await self._users.get_by_email(email.value) + # Wrong creds and unknown/inactive email are indistinguishable → 401. + if ( + user is None + or not user.can_authenticate() + or not self._hasher.verify(command.password, user.password_hash) + ): + await self._audit.login_failed( + email=email.value, + user_id=user.id if user is not None else None, + reason=None, + ) + raise InvalidCredentials(_INVALID_CREDENTIALS_MSG) + + # Correct creds but unverified email → 403 (distinct from 401).
+ if user.requires_email_verification(): + await self._audit.login_failed( + email=email.value, user_id=user.id, reason="email_unverified" + ) + raise EmailNotVerified(_EMAIL_UNVERIFIED_MSG) + + roles = await self._users.roles_after_reconcile(user) + await self._audit.login_succeeded(user_id=user.id, email=user.email, roles=roles) + token = self._tokens.issue(user.id, user.email, roles, user.credential_version) + return AuthenticatedUser(user=user, roles=roles, access_token=token) diff --git a/be0/src/audit.py b/be0/src/audit.py new file mode 100644 index 0000000..aa32814 --- /dev/null +++ b/be0/src/audit.py @@ -0,0 +1,176 @@ +"""Append-only audit events (PostgreSQL audit_events table).""" + +from __future__ import annotations + +import enum +import logging +import uuid +from typing import Any, Mapping, Optional, Sequence + +from sqlalchemy import select +from sqlalchemy.exc import InvalidRequestError, ProgrammingError +from sqlalchemy.orm import object_session +from sqlalchemy.ext.asyncio import AsyncSession + +from src.initiative_db.engine import get_session_factory, init_engine, is_postgres_enabled + +logger = logging.getLogger(__name__) + + +class AuditAction(str, enum.Enum): + create = "create" + read = "read" + update = "update" + delete = "delete" + login = "login" + logout = "logout" + login_failed = "login_failed" + + +def _audit_table_missing_error(exc: BaseException) -> bool: + parts: list[str] = [str(exc)] + orig = getattr(exc, "orig", None) + if orig is not None: + parts.append(str(orig)) + chain = getattr(exc, "__cause__", None) + if chain is not None: + parts.append(str(chain)) + blob = " ".join(parts).lower() + return "audit_events" in blob and "does not exist" in blob + + +async def record_audit( + session: AsyncSession, + *, + actor_user_id: Optional[uuid.UUID], + actor_email: str, + actor_role: str, + action: AuditAction, + entity_type: str, + entity_id: Optional[str] = None, + before: Optional[dict[str, Any]] = None, + after: Optional[dict[str, 
Any]] = None, + metadata: Optional[dict[str, Any]] = None, + request_id: Optional[uuid.UUID] = None, +) -> None: + """ + Insert one audit row in a SAVEPOINT so missing ``audit_events`` (migration not applied) + does not abort the surrounding business transaction. + """ + from src.initiative_db.models import AuditEvent + + row = AuditEvent( + actor_user_id=actor_user_id, + actor_email=actor_email, + actor_role=actor_role, + action=action.value, + entity_type=entity_type, + entity_id=entity_id, + before=before, + after=after, + metadata_=metadata or {}, + request_id=request_id, + ) + try: + async with session.begin_nested(): + session.add(row) + await session.flush() + except ProgrammingError as e: + if _audit_table_missing_error(e): + # If the ORM row is still tracked, drop it so outer commit() does not retry the INSERT. + if object_session(row) is session: + try: + session.expunge(row) + except InvalidRequestError: + pass + logger.warning( + "audit_events table missing — apply be0/migrations/008_audit_events.sql; skipping audit (%s)", + action.value, + ) + return + raise + + +async def persist_audit_standalone( + *, + actor_user_id: Optional[uuid.UUID], + actor_email: str, + actor_role: str, + action: AuditAction, + entity_type: str, + entity_id: Optional[str] = None, + before: Optional[dict[str, Any]] = None, + after: Optional[dict[str, Any]] = None, + metadata: Optional[dict[str, Any]] = None, + request_id: Optional[uuid.UUID] = None, +) -> None: + """ + Commit a single audit row in its own transaction (e.g. failed login where the + main request transaction rolls back). 
+ """ + if not is_postgres_enabled(): + return + from src.initiative_db.models import AuditEvent + + await init_engine() + factory = get_session_factory() + async with factory() as session: + session.add( + AuditEvent( + actor_user_id=actor_user_id, + actor_email=actor_email, + actor_role=actor_role, + action=action.value, + entity_type=entity_type, + entity_id=entity_id, + before=before, + after=after, + metadata_=metadata or {}, + request_id=request_id, + ) + ) + try: + await session.commit() + except ProgrammingError as e: + if _audit_table_missing_error(e): + logger.warning( + "audit_events table missing — apply migration 008; skipping standalone audit (%s)", + action.value, + ) + await session.rollback() + return + await session.rollback() + raise + + +async def resolve_actor_fields(session: AsyncSession, user_id: uuid.UUID) -> tuple[str, str]: + """Return ``(email, roles_csv)`` for denormalized audit columns.""" + from src.initiative_db.models import User, UserRoleRow + + user = await session.get(User, user_id) + if user is None: + return "unknown@invalid", "none" + stmt = select(UserRoleRow.role).where(UserRoleRow.user_id == user_id) + roles = (await session.execute(stmt)).scalars().all() + r = ",".join(sorted({str(x) for x in roles})) if roles else "none" + return user.email, r + + +def roles_from_jwt_list(roles: Sequence[str] | None) -> str: + if not roles: + return "none" + return ",".join(sorted({str(x) for x in roles})) + + +def jwt_payload_actor_email(payload: Mapping[str, Any] | None) -> tuple[str, str]: + """Extract ``(email, roles_csv)`` from JWT payload claims.""" + if not payload: + return "", "none" + email = str(payload.get("email") or "") + raw = payload.get("roles") + if isinstance(raw, list): + return email, roles_from_jwt_list([str(x) for x in raw]) + return email, "none" + + +jwt_payload_actor_email_role = jwt_payload_actor_email diff --git a/be0/src/auth_api.py b/be0/src/auth_api.py new file mode 100644 index 0000000..6210225 --- /dev/null 
+++ b/be0/src/auth_api.py @@ -0,0 +1,1527 @@ +""" +Registration and login API — passwords hashed with Argon2id; JWT access tokens (HS256). +Requires PostgreSQL (users + user_roles tables). + +New accounts verify email with a **6-digit OTP** (email via SMTP or ``AUTH_MAIL_LOG_ONLY``); see ``/auth/verify-otp``, +``/auth/resend-otp``, migrations ``013_email_verification.sql`` + ``014_registration_otp.sql``, and optional env ``REGISTER_OTP_TTL_MINUTES`` (default **1**). +Legacy ``/auth/verify-email`` (magic link) remains for old tokens only. + +Registration requires a complete staff profile (same rules as profile verification submit): +employee id, academic title (+ detail when «other»), unit (catalog UUID and/or freetext name), job title. + +Server-derived roles: + - Emails in AUTH_ADMIN_EMAILS (comma-separated env) get role ``admin`` with + ``user_roles.admin_from_email_policy = TRUE`` (reconciled on register/login/refresh/me/profile). + - When AUTH_ADMIN_EMAILS is unset, a built-in UMP allow-list applies (institution defaults). + - All other allowed institutional emails get ``viewer`` on register (Người nộp đơn). + - Client-supplied ``role`` on register is ignored (privilege escalation fix). 
+""" + +from __future__ import annotations + +import hashlib +import hmac +import os +import re +import secrets +import uuid +from datetime import datetime, timedelta, timezone +from typing import Any, Literal + +import jwt +from argon2 import PasswordHasher + +try: + from argon2.exceptions import InvalidHashError, VerifyMismatchError +except ImportError: # some envs ship argon2 bindings without InvalidHashError + from argon2.exceptions import VerifyMismatchError + + class InvalidHashError(Exception): # noqa: N818 — mirror argon2-cffi + pass + +from fastapi import APIRouter, Header, HTTPException, Request, Response +from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator +from sqlalchemy import delete, select +from sqlalchemy.exc import IntegrityError, ProgrammingError + +from src.auth_jwt import ( + decode_access_token_user_id, + decode_bearer_token, + jwt_credential_version_from_payload, + jwt_secret as jwt_secret_key, +) +from src.auth_mail import ( + deliver_email_verification_email, + deliver_password_reset_email, + deliver_registration_otp_email, + mail_delivery_configured, +) +from src.auth_rate_limit import ( + allow_forgot_password, + allow_login, + allow_resend_registration_otp, + allow_resend_verification, + allow_reset_password, +) +from src.staff_profile_domain import ( + apply_reverify_from_verified, + assert_complete_for_submission, + assert_employee_id_shape, + assert_unit_exclusive, + material_staff_fields_changed, + normalize_employee_id, + staff_row_for_audit, +) +from src.audit import ( + AuditAction, + jwt_payload_actor_email, + persist_audit_standalone, + record_audit, + resolve_actor_fields, +) +from src.initiative_db.engine import get_session, is_postgres_enabled +from src.initiative_db.models import ( + AcademicTitle, + EmailVerificationToken, + PasswordResetToken, + RegistrationOtpCode, + Unit, + User, + UserRoleRow, + UserStaffProfile, +) + +router = APIRouter(prefix="/auth", tags=["auth"]) + +_pwd = 
PasswordHasher() + +MAX_PASSWORD_INPUT_CHARS = 512 + +RESET_TOKEN_TTL = timedelta(hours=1) +VERIFY_EMAIL_TOKEN_TTL = timedelta(hours=48) +_register_otp_minutes_raw = os.getenv("REGISTER_OTP_TTL_MINUTES", "1").strip() +try: + _REGISTER_OTP_MINUTES = int(_register_otp_minutes_raw or "1") +except ValueError: + _REGISTER_OTP_MINUTES = 1 +REGISTER_OTP_TTL = timedelta(minutes=max(1, min(_REGISTER_OTP_MINUTES, 24 * 60))) +REGISTER_OTP_MAX_FAILED_ATTEMPTS = 5 +_OTP_VERIFY_REJECT_DETAIL = "Mã OTP không đúng hoặc đã hết hạn." + +FORGOT_PASSWORD_RESPONSE: dict[str, str] = { + "message": "Nếu email tồn tại trong hệ thống, hướng dẫn đặt lại mật khẩu đã được gửi.", +} + +RESEND_VERIFICATION_RESPONSE: dict[str, str] = { + "message": ( + "Nếu tài khoản cần xác minh, mã OTP đã được gửi đến email (kiểm tra hộp thư/spam)." + ), +} + +RESEND_OTP_RESPONSE: dict[str, str] = { + "message": "Nếu tài khoản cần xác minh, mã OTP đã được gửi đến email (kiểm tra hộp thư/spam).", +} + +VERIFY_EMAIL_SUCCESS: dict[str, str] = { + "message": "Email đã xác minh. Bạn có thể đăng nhập.", +} + +PASSWORD_RESET_SUCCESS: dict[str, str] = { + "message": "Mật khẩu đã được cập nhật. Vui lòng đăng nhập bằng mật khẩu mới.", +} + +# UMP or UMC faculty email (authoritative allow-list for *domain*; role allow-list is separate). +INSTITUTIONAL_EMAIL_RE = re.compile( + r"^[a-zA-Z0-9._%+-]+@(ump|umc)\.edu\.vn\Z", re.IGNORECASE +) + +# Default policy admins when AUTH_ADMIN_EMAILS is not set (must stay in sync with migration 007 cleanup list). +_DEFAULT_POLICY_ADMIN_EMAILS: frozenset[str] = frozenset( + { + "thaontt@ump.edu.vn", + "nltanh@ump.edu.vn", + "ldbaochau@ump.edu.vn", + "htchuong@ump.edu.vn", + "ththinh@ump.edu.vn", + "lhthinh@ump.edu.vn", + } +) + +ROLE_LITERAL = Literal["admin", "editor", "viewer"] + + +def _policy_admin_emails() -> frozenset[str]: + """ + Emails that receive ``admin`` from institutional policy (not self-service). 
+ + If AUTH_ADMIN_EMAILS is set, **only** that comma-separated list is used (lowercased). + If unset, the built-in UMP allow-list above applies. + """ + raw = os.getenv("AUTH_ADMIN_EMAILS", "").strip() + if raw: + return frozenset(part.strip().lower() for part in raw.split(",") if part.strip()) + return _DEFAULT_POLICY_ADMIN_EMAILS + + +def _hash_password(plain: str) -> str: + return _pwd.hash(plain) + + +def _verify_password(plain: str, hashed: str) -> bool: + try: + _pwd.verify(hashed, plain) + return True + except (VerifyMismatchError, InvalidHashError): + return False + + +def _assert_password_policy(password: str) -> None: + if len(password) < 6: + raise HTTPException(status_code=400, detail="Mật khẩu tối thiểu 6 ký tự.") + if len(password) > MAX_PASSWORD_INPUT_CHARS: + raise HTTPException(status_code=400, detail="Mật khẩu quá dài.") + if not re.search(r"[a-z]", password): + raise HTTPException(status_code=400, detail="Mật khẩu phải có ít nhất một chữ cái thường.") + if not re.search(r"[A-Z]", password): + raise HTTPException(status_code=400, detail="Mật khẩu phải có ít nhất một chữ cái hoa.") + if not re.search(r"\d", password): + raise HTTPException(status_code=400, detail="Mật khẩu phải có ít nhất một chữ số.") + if not re.search(r"[^A-Za-z0-9]", password): + raise HTTPException( + status_code=400, + detail="Mật khẩu phải có ít nhất một ký tự đặc biệt (không chỉ chữ và số).", + ) + + +def _normalize_institutional_email(email: str) -> str: + e = email.strip().lower() + if not INSTITUTIONAL_EMAIL_RE.match(e): + raise HTTPException( + status_code=400, + detail="Email phải là địa chỉ UMP hoặc UMC hợp lệ (dạng ten@ump.edu.vn hoặc ten@umc.edu.vn).", + ) + return e + + +def _hash_reset_token(raw: str) -> str: + return hashlib.sha256(raw.encode("utf-8")).hexdigest() + + +def _otp_ttl_label_vi() -> str: + """Human-readable validity window for API copy (matches REGISTER_OTP_TTL).""" + total = max(1, int(REGISTER_OTP_TTL.total_seconds())) + if total >= 3600: + return 
f"{total // 3600} giờ" + if total >= 60: + m = max(1, total // 60) + return "1 phút" if m == 1 else f"{m} phút" + return f"{total} giây" + + +def _register_success_otp_message() -> str: + label = _otp_ttl_label_vi() + return ( + "Đăng ký thành công. Mã OTP 6 số đã được gửi đến email UMP/UMC " + "(kiểm tra cả thư mục spam). " + f"Mã có hiệu lực trong {label}. Nhập mã trên trang đăng ký để kích hoạt tài khoản." + ) + + +def _maybe_raise_otp_table_missing(exc: ProgrammingError) -> None: + blob = str(exc).lower() + if "registration_otp_codes" not in blob: + return + if "does not exist" not in blob and "undefinedtable" not in blob.replace(" ", ""): + return + raise HTTPException( + status_code=503, + detail=( + "Cơ sở dữ liệu chưa có bảng mã OTP (registration_otp_codes). " + "Khởi động lại API để áp dụng migration, hoặc chạy be0/migrations/014_registration_otp.sql." + ), + ) from exc + + +async def _delete_pending_registration_otps(session: Any, user_id: uuid.UUID) -> None: + try: + await session.execute( + delete(RegistrationOtpCode).where( + RegistrationOtpCode.user_id == user_id, + RegistrationOtpCode.used_at.is_(None), + ) + ) + except ProgrammingError as e: + _maybe_raise_otp_table_missing(e) + raise + + +async def _issue_registration_otp(session: Any, user_id: uuid.UUID) -> str: + try: + await _delete_pending_registration_otps(session, user_id) + plaintext = f"{secrets.randbelow(10**6):06d}" + otp_hash = _hash_reset_token(plaintext) + session.add( + RegistrationOtpCode( + user_id=user_id, + otp_hash=otp_hash, + expires_at=datetime.now(timezone.utc) + REGISTER_OTP_TTL, + failed_attempts=0, + ) + ) + await session.flush() + return plaintext + except ProgrammingError as e: + _maybe_raise_otp_table_missing(e) + raise + + +def _client_ip(request: Request) -> str: + xf = request.headers.get("x-forwarded-for") + if xf: + part = xf.split(",")[0].strip() + if part: + return part + if request.client and request.client.host: + return request.client.host + return "unknown" + 
+ +def _issue_token(user_id: uuid.UUID, email: str, roles: list[str], credential_version: int) -> str: + now = datetime.now(timezone.utc) + exp = now + timedelta(hours=int(os.getenv("JWT_EXPIRE_HOURS", "12"))) + payload: dict[str, Any] = { + "sub": str(user_id), + "email": email, + "roles": roles, + "cv": int(credential_version), + "iat": int(now.timestamp()), + "exp": int(exp.timestamp()), + } + return jwt.encode(payload, jwt_secret_key(), algorithm="HS256") + + +async def _load_roles(session: Any, user_id: uuid.UUID) -> list[str]: + stmt = select(UserRoleRow.role).where(UserRoleRow.user_id == user_id) + rows = (await session.execute(stmt)).scalars().all() + out = [str(r) for r in rows] + return sorted(set(out)) + + +async def _reconcile_policy_admin(session: Any, user: User) -> None: + """ + Mandatory policy sync for ``admin`` tied to AUTH_ADMIN_EMAILS / defaults. + + - Allow-listed email: ensure ``admin`` row with admin_from_email_policy=TRUE. + - Not allow-listed: delete ``admin`` row only when admin_from_email_policy is TRUE, + preserving exceptional manual admin rows (FALSE). 
+ """ + email_norm = user.email.strip().lower() + policy = _policy_admin_emails() + + stmt = select(UserRoleRow).where( + UserRoleRow.user_id == user.id, + UserRoleRow.role == "admin", + ) + admin_row = (await session.execute(stmt)).scalar_one_or_none() + + if email_norm in policy: + if admin_row is None: + session.add( + UserRoleRow( + user_id=user.id, + role="admin", + admin_from_email_policy=True, + ) + ) + else: + admin_row.admin_from_email_policy = True + elif admin_row is not None and admin_row.admin_from_email_policy: + await session.delete(admin_row) + + await session.flush() + + +async def _roles_after_reconcile(session: Any, user: User) -> list[str]: + await _reconcile_policy_admin(session, user) + return await _load_roles(session, user.id) + + +async def _load_staff_profile(session: Any, user_id: uuid.UUID) -> UserStaffProfile: + sp = await session.get(UserStaffProfile, user_id) + if sp is None: + sp = UserStaffProfile(user_id=user_id) + session.add(sp) + await session.flush() + return sp + + +async def _assert_academic_title_active(session: Any, code: str | None) -> None: + if not code: + return + row = await session.get(AcademicTitle, code) + if row is None or not row.active: + raise HTTPException(status_code=400, detail="Học hàm / học vị không hợp lệ.") + + +async def _assert_unit_exists(session: Any, unit_id: uuid.UUID | None) -> None: + if unit_id is None: + return + if await session.get(Unit, unit_id) is None: + raise HTTPException(status_code=400, detail="Đơn vị không tồn tại trong danh mục.") + + +def _staff_profile_api_dict(user: User, sp: UserStaffProfile) -> dict[str, Any]: + return { + "employeeId": sp.employee_id, + "academicTitleCode": sp.academic_title_code, + "academicTitleOther": sp.academic_title_other, + "unitId": str(user.unit_id) if user.unit_id else None, + "unitNameFreetext": sp.unit_name_freetext, + "jobTitle": sp.job_title, + "profileVerificationStatus": sp.profile_verification_status, + "verificationSubmittedAt": 
sp.verification_submitted_at, + "verifiedAt": sp.verified_at, + "verifiedByUserId": str(sp.verified_by_user_id) if sp.verified_by_user_id else None, + "rejectionReason": sp.rejection_reason, + "version": sp.version, + } + + +def _user_public_dict(user: User, roles: list[str], sp: UserStaffProfile | None = None) -> dict[str, Any]: + out: dict[str, Any] = { + "id": str(user.id), + "email": user.email, + "name": user.full_name, + "roles": roles, + "phone": user.phone, + "emailVerified": bool(getattr(user, "email_verified", True)), + } + if sp is not None: + out["staffProfile"] = _staff_profile_api_dict(user, sp) + return out + + +class RegisterBody(BaseModel): + model_config = ConfigDict(extra="ignore") + + fullName: str = Field(..., min_length=2, max_length=200) + email: str = Field(..., min_length=5, max_length=254) + password: str = Field(..., min_length=1) + passwordConfirm: str = Field(..., min_length=1) + # Deprecated: ignored; server derives roles (admin vs viewer) from email policy. 
+ role: ROLE_LITERAL | None = Field(default=None, description="Ignored.") + employeeId: str = Field(..., min_length=1, max_length=40) + academicTitleCode: str = Field(..., min_length=1, max_length=64) + academicTitleOther: str | None = Field(default=None, max_length=200) + unitId: uuid.UUID | None = None + unitNameFreetext: str | None = Field(default=None, max_length=300) + jobTitle: str = Field(..., min_length=1, max_length=120) + + @field_validator("fullName") + @classmethod + def strip_name(cls, v: str) -> str: + s = v.strip() + if len(s) < 2: + raise ValueError("Họ tên quá ngắn.") + return s + + @field_validator("academicTitleCode", mode="before") + @classmethod + def strip_title_code(cls, v: object) -> str: + if v is None or v == "": + raise ValueError("Chọn học hàm / học vị.") + s = str(v).strip() + if not s: + raise ValueError("Chọn học hàm / học vị.") + return s + + @field_validator("academicTitleOther", "unitNameFreetext", mode="before") + @classmethod + def strip_optional_text(cls, v: object) -> str | None: + if v is None or v == "": + return None + s = str(v).strip() + return s or None + + @field_validator("employeeId", mode="before") + @classmethod + def strip_employee_required(cls, v: object) -> str: + if v is None or v == "": + raise ValueError("Vui lòng nhập mã số nhân sự.") + s = str(v).strip() + if not s: + raise ValueError("Vui lòng nhập mã số nhân sự.") + return s + + @field_validator("jobTitle", mode="before") + @classmethod + def strip_job_required(cls, v: object) -> str: + if v is None or v == "": + raise ValueError("Nhập chức vụ công tác.") + s = str(v).strip() + if not s: + raise ValueError("Nhập chức vụ công tác.") + return s + + @model_validator(mode="after") + def staff_cross_field(self) -> RegisterBody: + if self.academicTitleCode == "other": + if not self.academicTitleOther or not str(self.academicTitleOther).strip(): + raise ValueError( + "Khi chọn «Khác», vui lòng nhập nội dung học hàm / học vị." 
+ ) + has_unit = self.unitId is not None or ( + self.unitNameFreetext is not None and len(str(self.unitNameFreetext).strip()) > 0 + ) + if not has_unit: + raise ValueError("Chọn đơn vị công tác hoặc nhập tên đơn vị.") + return self + + +class LoginBody(BaseModel): + email: str = Field(..., max_length=254) + password: str = Field(..., min_length=1, max_length=MAX_PASSWORD_INPUT_CHARS) + + @field_validator("email") + @classmethod + def strip_login_email(cls, v: str) -> str: + s = v.strip() + if not s: + raise ValueError("Vui lòng nhập email.") + return s + + @field_validator("password") + @classmethod + def reject_blank_password(cls, v: str) -> str: + if not v.strip(): + raise ValueError("Vui lòng nhập mật khẩu.") + return v + + +class ForgotPasswordBody(BaseModel): + model_config = ConfigDict(extra="ignore") + + email: str = Field(..., min_length=5, max_length=254) + + +class ResetPasswordBody(BaseModel): + model_config = ConfigDict(extra="ignore") + + token: str = Field(..., min_length=10, max_length=512) + newPassword: str = Field(..., min_length=1) + newPasswordConfirm: str = Field(..., min_length=1) + + +@router.post("/register") +async def register(body: RegisterBody) -> dict[str, Any]: + if not is_postgres_enabled(): + raise HTTPException( + status_code=503, + detail="Đăng ký tạm thời không khả dụng (cơ sở dữ liệu chưa cấu hình).", + ) + + if body.password != body.passwordConfirm: + raise HTTPException(status_code=400, detail="Mật khẩu xác nhận không khớp.") + + _assert_password_policy(body.password) + email_n = _normalize_institutional_email(body.email) + policy = _policy_admin_emails() + + pwd_hash = _hash_password(body.password) + + mail_out: tuple[str, str] | None = None + async with get_session() as session: + existing = ( + await session.execute(select(User.id).where(User.email == email_n)) + ).scalar_one_or_none() + if existing is not None: + raise HTTPException(status_code=409, detail="Email này đã được đăng ký.") + + user = User( + id=uuid.uuid4(), + 
email=email_n,
+            password_hash=pwd_hash,
+            full_name=body.fullName,
+            email_verified=False,
+        )
+        if body.unitId is not None:
+            user.unit_id = body.unitId
+        session.add(user)
+        await session.flush()
+
+        await _assert_unit_exists(session, user.unit_id)
+        await _assert_academic_title_active(session, body.academicTitleCode)
+        emp = normalize_employee_id(body.employeeId)
+
+        staff = UserStaffProfile(
+            user_id=user.id,
+            employee_id=emp,
+            academic_title_code=body.academicTitleCode,
+            academic_title_other=body.academicTitleOther,
+            unit_name_freetext=body.unitNameFreetext,
+            job_title=body.jobTitle,
+        )
+        try:
+            assert_unit_exclusive(user, staff)
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e)) from e
+        try:
+            assert_complete_for_submission(user, staff)
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e)) from e
+        session.add(staff)
+        try:
+            await session.flush()
+        except IntegrityError as e:
+            raise HTTPException(
+                status_code=409,
+                detail="Mã số nhân sự đã được sử dụng hoặc dữ liệu không hợp lệ.",
+            ) from e
+
+        if email_n in policy:
+            session.add(
+                UserRoleRow(
+                    user_id=user.id,
+                    role="admin",
+                    admin_from_email_policy=True,
+                )
+            )
+        else:
+            session.add(
+                UserRoleRow(
+                    user_id=user.id,
+                    role="viewer",
+                    admin_from_email_policy=False,
+                )
+            )
+        try:
+            await session.flush()
+        except IntegrityError:
+            raise HTTPException(status_code=409, detail="Không thể gán vai trò — thử lại.") from None
+
+        roles = await _roles_after_reconcile(session, user)
+        actor_email, actor_role = await resolve_actor_fields(session, user.id)
+        await record_audit(
+            session,
+            actor_user_id=user.id,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.create,
+            entity_type="user",
+            entity_id=str(user.id),
+            after={
+                "email": user.email,
+                "fullName": user.full_name,
+                "roles": roles,
+            },
+            metadata={"source": "auth_register"},
+        )
+        otp_plain = await _issue_registration_otp(session, user.id)
+        public_user = _user_public_dict(user, roles, staff)
+        mail_out = (email_n, otp_plain)
+
+    otp_delivery = "none"
+    if mail_out is not None:
+        try:
+            ch = await deliver_registration_otp_email(mail_out[0], mail_out[1])
+            if ch in ("smtp", "log_only", "none"):
+                otp_delivery = ch
+        except Exception as e:
+            import logging
+
+            logging.getLogger(__name__).exception("register: OTP mail failed: %s", e)
+            otp_delivery = "smtp_failed" if mail_delivery_configured() else "none"
+
+    return {
+        "message": _register_success_otp_message(),
+        "email": email_n,
+        "emailVerificationRequired": True,
+        "otpTtlSeconds": int(REGISTER_OTP_TTL.total_seconds()),
+        "otpDeliveryChannel": otp_delivery,
+        "user": public_user,
+    }
+
+
+@router.post("/login")
+async def login(body: LoginBody, request: Request) -> dict[str, Any]:
+    if not is_postgres_enabled():
+        raise HTTPException(
+            status_code=503,
+            detail="Đăng nhập qua máy chủ yêu cầu cơ sở dữ liệu.",
+        )
+
+    email_n = _normalize_institutional_email(body.email)
+    if not allow_login(email_n, _client_ip(request)):
+        raise HTTPException(
+            status_code=429,
+            detail="Quá nhiều lần đăng nhập. Vui lòng thử lại sau.",
+        )
+
+    user: User | None = None
+    roles: list[str] = []
+    staff_profile: UserStaffProfile | None = None
+    login_ok = False
+    needs_email_verify = False
+    failed_uid: uuid.UUID | None = None
+    failed_roles = "none"
+
+    async with get_session() as session:
+        stmt = select(User).where(User.email == email_n, User.is_active.is_(True))
+        user = (await session.execute(stmt)).scalar_one_or_none()
+
+        if user is not None and _verify_password(body.password, user.password_hash):
+            if not user.email_verified:
+                needs_email_verify = True
+                failed_uid = user.id
+                _, failed_roles = await resolve_actor_fields(session, user.id)
+            else:
+                roles = await _roles_after_reconcile(session, user)
+                actor_email, actor_role = await resolve_actor_fields(session, user.id)
+                await record_audit(
+                    session,
+                    actor_user_id=user.id,
+                    actor_email=actor_email,
+                    actor_role=actor_role,
+                    action=AuditAction.login,
+                    entity_type="auth",
+                    entity_id=str(user.id),
+                    metadata={"path": "/auth/login"},
+                )
+                staff_profile = await _load_staff_profile(session, user.id)
+                login_ok = True
+        elif user is not None:
+            failed_uid = user.id
+            _, failed_roles = await resolve_actor_fields(session, user.id)
+
+    if needs_email_verify:
+        await persist_audit_standalone(
+            actor_user_id=failed_uid,
+            actor_email=email_n,
+            actor_role=failed_roles,
+            action=AuditAction.login_failed,
+            entity_type="auth",
+            metadata={"event": "login_failed", "reason": "email_unverified"},
+        )
+        raise HTTPException(
+            status_code=403,
+            detail="Vui lòng xác minh email trước khi đăng nhập. Kiểm tra hộp thư "
+            "hoặc dùng chức năng gửi lại mã OTP trên trang đăng ký.",
+        )
+
+    if not login_ok:
+        await persist_audit_standalone(
+            actor_user_id=failed_uid,
+            actor_email=email_n,
+            actor_role=failed_roles,
+            action=AuditAction.login_failed,
+            entity_type="auth",
+            metadata={"event": "login_failed"},
+        )
+        raise HTTPException(status_code=401, detail="Email hoặc mật khẩu không đúng.")
+
+    assert user is not None
+
+    token = _issue_token(user.id, user.email, roles, int(user.credential_version))
+    return {"accessToken": token, "user": _user_public_dict(user, roles, staff_profile)}
+
+
+@router.post("/forgot-password")
+async def forgot_password(body: ForgotPasswordBody, request: Request) -> dict[str, str]:
+    """Always same response for unknown / inactive email; rate-limited."""
+    if not is_postgres_enabled():
+        return FORGOT_PASSWORD_RESPONSE
+
+    email_n = _normalize_institutional_email(body.email)
+    ip = _client_ip(request)
+    if not allow_forgot_password(email_n, ip):
+        raise HTTPException(
+            status_code=429,
+            detail="Quá nhiều yêu cầu. Vui lòng thử lại sau.",
+        )
+
+    if not mail_delivery_configured():
+        import logging
+
+        logging.getLogger(__name__).warning(
+            "forgot-password: mail not configured (SMTP_HOST or AUTH_MAIL_LOG_ONLY)"
+        )
+        return FORGOT_PASSWORD_RESPONSE
+
+    mail_to_send: tuple[str, str] | None = None
+    async with get_session() as session:
+        user = (
+            await session.execute(
+                select(User).where(User.email == email_n, User.is_active.is_(True))
+            )
+        ).scalar_one_or_none()
+
+        if user is None:
+            return FORGOT_PASSWORD_RESPONSE
+
+        raw = secrets.token_urlsafe(32)
+        th = _hash_reset_token(raw)
+        await session.execute(
+            delete(PasswordResetToken).where(
+                PasswordResetToken.user_id == user.id,
+                PasswordResetToken.used_at.is_(None),
+            )
+        )
+        session.add(
+            PasswordResetToken(
+                user_id=user.id,
+                token_hash=th,
+                expires_at=datetime.now(timezone.utc) + RESET_TOKEN_TTL,
+            )
+        )
+        await session.flush()
+        actor_email, actor_role = await resolve_actor_fields(session, user.id)
+        await record_audit(
+            session,
+            actor_user_id=user.id,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.update,
+            entity_type="auth",
+            entity_id=str(user.id),
+            metadata={"source": "auth_forgot_password", "event": "password_reset_requested"},
+        )
+        mail_to_send = (str(user.email), raw)
+
+    if mail_to_send:
+        try:
+            await deliver_password_reset_email(mail_to_send[0], mail_to_send[1])
+        except Exception as e:
+            import logging
+
+            logging.getLogger(__name__).exception("forgot-password: mail send failed: %s", e)
+
+    return FORGOT_PASSWORD_RESPONSE
+
+
+@router.post("/reset-password")
+async def reset_password(body: ResetPasswordBody, request: Request) -> dict[str, str]:
+    if not is_postgres_enabled():
+        raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.")
+
+    if not allow_reset_password(_client_ip(request)):
+        raise HTTPException(
+            status_code=429,
+            detail="Quá nhiều yêu cầu. Vui lòng thử lại sau.",
+        )
+
+    if body.newPassword != body.newPasswordConfirm:
+        raise HTTPException(status_code=400, detail="Mật khẩu xác nhận không khớp.")
+    _assert_password_policy(body.newPassword)
+
+    raw = body.token.strip()
+    if not raw or len(raw) < 10:
+        raise HTTPException(status_code=400, detail="Liên kết không hợp lệ hoặc đã hết hạn.")
+
+    th = _hash_reset_token(raw)
+    now = datetime.now(timezone.utc)
+
+    async with get_session() as session:
+        row = (
+            await session.execute(
+                select(PasswordResetToken).where(PasswordResetToken.token_hash == th)
+            )
+        ).scalar_one_or_none()
+
+        if row is None or row.used_at is not None or row.expires_at <= now:
+            raise HTTPException(status_code=400, detail="Liên kết không hợp lệ hoặc đã hết hạn.")
+
+        user = await session.get(User, row.user_id)
+        if user is None or not user.is_active:
+            raise HTTPException(status_code=400, detail="Liên kết không hợp lệ hoặc đã hết hạn.")
+
+        user.password_hash = _hash_password(body.newPassword)
+        user.credential_version = int(user.credential_version or 0) + 1
+        user.updated_at = now
+        row.used_at = now
+        await session.flush()
+
+        actor_email, actor_role = await resolve_actor_fields(session, user.id)
+        await record_audit(
+            session,
+            actor_user_id=user.id,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.update,
+            entity_type="user",
+            entity_id=str(user.id),
+            before={"password": "[redacted]"},
+            after={"password": "[changed]"},
+            metadata={"source": "auth_reset_password", "event": "password_reset_completed"},
+        )
+
+    return PASSWORD_RESET_SUCCESS
+
+
+class VerifyEmailBody(BaseModel):
+    model_config = ConfigDict(extra="ignore")
+
+    token: str = Field(..., min_length=10, max_length=512)
+
+
+class ResendVerificationBody(BaseModel):
+    model_config = ConfigDict(extra="ignore")
+
+    email: str = Field(..., min_length=5, max_length=254)
+
+
+class VerifyRegistrationOtpBody(BaseModel):
+    model_config = ConfigDict(extra="ignore")
+
+    email: str = Field(..., min_length=5, max_length=254)
+    otp: str = Field(..., min_length=6, max_length=32)
+
+    @field_validator("email")
+    @classmethod
+    def strip_email_votp(cls, v: object) -> str:
+        s = str(v).strip().lower()
+        if not s:
+            raise ValueError("Vui lòng nhập email.")
+        return s
+
+    @field_validator("otp")
+    @classmethod
+    def otp_six_digits(cls, v: object) -> str:
+        s = str(v).strip()
+        if not re.fullmatch(r"\d{6}", s):
+            raise ValueError("Mã gồm đúng 6 chữ số.")
+        return s
+
+
+class ResendRegistrationOtpBody(BaseModel):
+    model_config = ConfigDict(extra="ignore")
+
+    email: str = Field(..., min_length=5, max_length=254)
+
+    @field_validator("email")
+    @classmethod
+    def strip_email_resend_otp(cls, v: object) -> str:
+        return str(v).strip().lower()
+
+
+@router.post("/verify-otp")
+async def verify_registration_otp(body: VerifyRegistrationOtpBody) -> dict[str, str]:
+    if not is_postgres_enabled():
+        raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.")
+
+    try:
+        email_n = _normalize_institutional_email(body.email)
+    except HTTPException:
+        raise HTTPException(status_code=400, detail=_OTP_VERIFY_REJECT_DETAIL) from None
+
+    otp_hash_expected = _hash_reset_token(body.otp)
+    now = datetime.now(timezone.utc)
+
+    async with get_session() as session:
+        user = (
+            await session.execute(
+                select(User).where(User.email == email_n, User.is_active.is_(True))
+            )
+        ).scalar_one_or_none()
+
+        if user is None or user.email_verified:
+            raise HTTPException(status_code=400, detail=_OTP_VERIFY_REJECT_DETAIL)
+
+        stmt = (
+            select(RegistrationOtpCode)
+            .where(
+                RegistrationOtpCode.user_id == user.id,
+                RegistrationOtpCode.used_at.is_(None),
+                RegistrationOtpCode.expires_at > now,
+                RegistrationOtpCode.failed_attempts < REGISTER_OTP_MAX_FAILED_ATTEMPTS,
+            )
+            .order_by(RegistrationOtpCode.created_at.desc())
+            .limit(1)
+        )
+        row = (await session.execute(stmt)).scalar_one_or_none()
+
+        if row is None:
+            raise HTTPException(status_code=400, detail=_OTP_VERIFY_REJECT_DETAIL)
+
+        if not hmac.compare_digest(row.otp_hash, otp_hash_expected):
+            row.failed_attempts += 1
+            await session.flush()
+            raise HTTPException(status_code=400, detail=_OTP_VERIFY_REJECT_DETAIL)
+
+        user.email_verified = True
+        user.updated_at = now
+        row.used_at = now
+        await session.flush()
+
+        actor_email, actor_role = await resolve_actor_fields(session, user.id)
+        await record_audit(
+            session,
+            actor_user_id=user.id,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.update,
+            entity_type="user",
+            entity_id=str(user.id),
+            after={"emailVerified": True},
+            metadata={"source": "auth_verify_registration_otp"},
+        )
+
+    return VERIFY_EMAIL_SUCCESS
+
+
+@router.post("/resend-otp")
+async def resend_registration_otp(body: ResendRegistrationOtpBody, request: Request) -> dict[str, str]:
+    """Enumeration-safe envelope (same response whether email exists / needs OTP)."""
+    if not is_postgres_enabled():
+        return RESEND_OTP_RESPONSE
+
+    try:
+        email_n = _normalize_institutional_email(body.email)
+    except HTTPException:
+        return RESEND_OTP_RESPONSE
+
+    ip = _client_ip(request)
+    if not allow_resend_registration_otp(email_n, ip):
+        raise HTTPException(
+            status_code=429,
+            detail="Quá nhiều yêu cầu. Vui lòng thử lại sau.",
+        )
+
+    if not mail_delivery_configured():
+        import logging
+
+        logging.getLogger(__name__).warning(
+            "resend-otp: mail not configured (SMTP_HOST or AUTH_MAIL_LOG_ONLY)"
+        )
+        return RESEND_OTP_RESPONSE
+
+    mail_to_send: tuple[str, str] | None = None
+    async with get_session() as session:
+        user = (
+            await session.execute(
+                select(User).where(User.email == email_n, User.is_active.is_(True))
+            )
+        ).scalar_one_or_none()
+
+        if user is not None and not user.email_verified:
+            code = await _issue_registration_otp(session, user.id)
+            await session.flush()
+            actor_email, actor_role = await resolve_actor_fields(session, user.id)
+            await record_audit(
+                session,
+                actor_user_id=user.id,
+                actor_email=actor_email,
+                actor_role=actor_role,
+                action=AuditAction.update,
+                entity_type="auth",
+                entity_id=str(user.id),
+                metadata={"source": "auth_resend_registration_otp"},
+            )
+            mail_to_send = (str(user.email), code)
+
+    if mail_to_send is not None:
+        try:
+            await deliver_registration_otp_email(mail_to_send[0], mail_to_send[1])
+        except Exception as e:
+            import logging
+
+            logging.getLogger(__name__).exception("resend-otp: mail send failed: %s", e)
+
+    return RESEND_OTP_RESPONSE
+
+
+@router.post("/verify-email")
+async def verify_email(body: VerifyEmailBody) -> dict[str, str]:
+    if not is_postgres_enabled():
+        raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.")
+
+    raw = body.token.strip()
+    if len(raw) < 10:
+        raise HTTPException(status_code=400, detail="Liên kết không hợp lệ hoặc đã hết hạn.")
+
+    th = _hash_reset_token(raw)
+    now = datetime.now(timezone.utc)
+
+    async with get_session() as session:
+        row = (
+            await session.execute(
+                select(EmailVerificationToken).where(EmailVerificationToken.token_hash == th)
+            )
+        ).scalar_one_or_none()
+
+        if row is None or row.used_at is not None or row.expires_at <= now:
+            raise HTTPException(status_code=400, detail="Liên kết không hợp lệ hoặc đã hết hạn.")
+
+        user = await session.get(User, row.user_id)
+        if user is None or not user.is_active:
+            raise HTTPException(status_code=400, detail="Liên kết không hợp lệ hoặc đã hết hạn.")
+
+        user.email_verified = True
+        user.updated_at = now
+        row.used_at = now
+        await session.flush()
+
+        actor_email, actor_role = await resolve_actor_fields(session, user.id)
+        await record_audit(
+            session,
+            actor_user_id=user.id,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.update,
+            entity_type="user",
+            entity_id=str(user.id),
+            after={"emailVerified": True},
+            metadata={"source": "auth_verify_email"},
+        )
+
+    return VERIFY_EMAIL_SUCCESS
+
+
+@router.post("/resend-verification")
+async def resend_verification(body: ResendVerificationBody, request: Request) -> dict[str, str]:
+    """Same envelope whether or not the account exists / needs verification (enumeration-safe)."""
+    if not is_postgres_enabled():
+        return RESEND_VERIFICATION_RESPONSE
+
+    email_n = _normalize_institutional_email(body.email)
+    ip = _client_ip(request)
+    if not allow_resend_verification(email_n, ip):
+        raise HTTPException(
+            status_code=429,
+            detail="Quá nhiều yêu cầu. Vui lòng thử lại sau.",
+        )
+
+    if not mail_delivery_configured():
+        import logging
+
+        logging.getLogger(__name__).warning(
+            "resend-verification: mail not configured (SMTP_HOST or AUTH_MAIL_LOG_ONLY)"
+        )
+        return RESEND_VERIFICATION_RESPONSE
+
+    mail_to_send: tuple[str, str] | None = None
+    async with get_session() as session:
+        user = (
+            await session.execute(
+                select(User).where(User.email == email_n, User.is_active.is_(True))
+            )
+        ).scalar_one_or_none()
+
+        if user is not None and not user.email_verified:
+            code = await _issue_registration_otp(session, user.id)
+            await session.flush()
+            actor_email, actor_role = await resolve_actor_fields(session, user.id)
+            await record_audit(
+                session,
+                actor_user_id=user.id,
+                actor_email=actor_email,
+                actor_role=actor_role,
+                action=AuditAction.update,
+                entity_type="auth",
+                entity_id=str(user.id),
+                metadata={"source": "auth_resend_verification"},
+            )
+            mail_to_send = (str(user.email), code)
+
+    if mail_to_send:
+        try:
+            await deliver_registration_otp_email(mail_to_send[0], mail_to_send[1])
+        except Exception as e:
+            import logging
+
+            logging.getLogger(__name__).exception("resend-verification: mail send failed: %s", e)
+
+    return RESEND_VERIFICATION_RESPONSE
+
+
+@router.post("/refresh")
+async def refresh_session(authorization: str | None = Header(None)) -> dict[str, Any]:
+    if not authorization or not authorization.lower().startswith("bearer "):
+        raise HTTPException(401, detail="Thiếu token.")
+    raw = authorization.split(None, 1)[1].strip()
+    try:
+        payload = jwt.decode(raw, jwt_secret_key(), algorithms=["HS256"])
+    except jwt.ExpiredSignatureError:
+        raise HTTPException(
+            status_code=401,
+            detail="Phiên đăng nhập hết hạn. Vui lòng đăng nhập lại.",
+        ) from None
+    except jwt.PyJWTError:
+        raise HTTPException(401, detail="Token không hợp lệ.") from None
+
+    try:
+        uid = uuid.UUID(str(payload["sub"]))
+    except (ValueError, KeyError):
+        raise HTTPException(401, detail="Token không hợp lệ.") from None
+
+    jwt_cv = jwt_credential_version_from_payload(payload)
+
+    async with get_session() as session:
+        user = await session.get(User, uid)
+        if user is None or not user.is_active:
+            raise HTTPException(401, detail="Tài khoản không còn hiệu lực.")
+        if not user.email_verified:
+            raise HTTPException(
+                status_code=403,
+                detail="Vui lòng xác minh email trước khi tiếp tục.",
+            )
+        db_cv = int(user.credential_version or 0)
+        if jwt_cv != db_cv:
+            raise HTTPException(
+                status_code=401,
+                detail="Phiên đăng nhập hết hạn. Vui lòng đăng nhập lại.",
+            )
+        roles = await _roles_after_reconcile(session, user)
+        sp = await _load_staff_profile(session, user.id)
+
+    token = _issue_token(user.id, user.email, roles, db_cv)
+    return {"accessToken": token, "user": _user_public_dict(user, roles, sp)}
+
+
+@router.post("/logout")
+async def logout(authorization: str | None = Header(None)) -> Response:
+    """Client clears JWT locally; audit row only when Bearer decodes cleanly."""
+    if is_postgres_enabled():
+        payload = decode_bearer_token(authorization)
+        if payload:
+            uid = decode_access_token_user_id(authorization)
+            email, roles_csv = jwt_payload_actor_email(payload)
+            await persist_audit_standalone(
+                actor_user_id=uid,
+                actor_email=email if email else "",
+                actor_role=roles_csv,
+                action=AuditAction.logout,
+                entity_type="auth",
+                entity_id=str(uid) if uid else None,
+                metadata={"path": "/auth/logout"},
+            )
+    return Response(status_code=204)
+
+
+@router.get("/reference/academic-titles")
+async def reference_academic_titles() -> list[dict[str, Any]]:
+    if not is_postgres_enabled():
+        return []
+    async with get_session() as session:
+        stmt = (
+            select(AcademicTitle)
+            .where(AcademicTitle.active.is_(True))
+            .order_by(AcademicTitle.sort_order, AcademicTitle.code)
+        )
+        rows = (await session.execute(stmt)).scalars().all()
+        return [{"code": r.code, "labelVi": r.label_vi, "labelEn": r.label_en} for r in rows]
+
+
+@router.get("/reference/units")
+async def reference_units() -> list[dict[str, str]]:
+    if not is_postgres_enabled():
+        return []
+    async with get_session() as session:
+        stmt = select(Unit).order_by(Unit.name)
+        rows = (await session.execute(stmt)).scalars().all()
+        return [{"id": str(r.id), "name": r.name} for r in rows]
+
+
+@router.get("/me")
+async def get_current_profile(authorization: str | None = Header(None)) -> dict[str, Any]:
+    if not is_postgres_enabled():
+        raise HTTPException(
+            status_code=503,
+            detail="Hồ sơ yêu cầu cơ sở dữ liệu.",
+        )
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(401, detail="Vui lòng đăng nhập.")
+
+    async with get_session() as session:
+        user = await session.get(User, uid)
+        if user is None or not user.is_active:
+            raise HTTPException(401, detail="Tài khoản không còn hiệu lực.")
+        if not user.email_verified:
+            raise HTTPException(
+                status_code=403,
+                detail="Vui lòng xác minh email trước khi tiếp tục.",
+            )
+        roles = await _roles_after_reconcile(session, user)
+        sp = await _load_staff_profile(session, user.id)
+        return _user_public_dict(user, roles, sp)
+
+
+class UpdateProfileBody(BaseModel):
+    model_config = ConfigDict(extra="ignore")
+
+    fullName: str | None = None
+    phone: str | None = None
+    employeeId: str | None = None
+    academicTitleCode: str | None = None
+    academicTitleOther: str | None = None
+    unitId: uuid.UUID | None = None
+    unitNameFreetext: str | None = None
+    jobTitle: str | None = None
+
+    @field_validator("fullName", mode="before")
+    @classmethod
+    def strip_full_name(cls, v: object) -> str | None:
+        if v is None:
+            return None
+        s = str(v).strip()
+        if len(s) < 2:
+            raise ValueError("Họ tên phải có ít nhất 2 ký tự.")
+        return s
+
+    @field_validator("phone", mode="before")
+    @classmethod
+    def strip_phone(cls, v: object) -> str | None:
+        if v is None or v == "":
+            return None
+        s = str(v).strip()
+        if len(s) > 32:
+            raise ValueError("Số điện thoại quá dài.")
+        return s
+
+    @field_validator("academicTitleCode", mode="before")
+    @classmethod
+    def strip_title_code(cls, v: object) -> str | None:
+        if v is None or v == "":
+            return None
+        s = str(v).strip()
+        return s or None
+
+    @field_validator("academicTitleOther", "unitNameFreetext", "jobTitle", mode="before")
+    @classmethod
+    def strip_optional_text(cls, v: object) -> str | None:
+        if v is None or v == "":
+            return None
+        s = str(v).strip()
+        return s or None
+
+    @field_validator("employeeId", mode="before")
+    @classmethod
+    def strip_employee(cls, v: object) -> str | None:
+        if v is None or v == "":
+            return None
+        return str(v).strip()
+
+
+@router.patch("/profile")
+async def update_profile(
+    body: UpdateProfileBody, authorization: str | None = Header(None)
+) -> dict[str, Any]:
+    if not is_postgres_enabled():
+        raise HTTPException(
+            status_code=503,
+            detail="Cập nhật hồ sơ yêu cầu cơ sở dữ liệu.",
+        )
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(401, detail="Vui lòng đăng nhập.")
+
+    patch = body.model_dump(exclude_unset=True)
+    if not patch:
+        raise HTTPException(400, detail="Không có dữ liệu để cập nhật.")
+
+    now = datetime.now(timezone.utc)
+    async with get_session() as session:
+        user = await session.get(User, uid)
+        if user is None or not user.is_active:
+            raise HTTPException(401, detail="Tài khoản không còn hiệu lực.")
+        sp = await _load_staff_profile(session, user.id)
+        if sp.profile_verification_status == "pending":
+            raise HTTPException(
+                status_code=409,
+                detail="Hồ sơ đang chờ xác minh — không thể chỉnh sửa cho đến khi quản trị xử lý.",
+            )
+        staff_before = staff_row_for_audit(sp, user.unit_id)
+        user_before = {"fullName": user.full_name, "phone": user.phone}
+
+        if "fullName" in patch and patch["fullName"] is not None:
+            user.full_name = str(patch["fullName"])
+        if "phone" in patch:
+            user.phone = patch["phone"] if patch["phone"] else None
+
+        if "unitId" in patch:
+            user.unit_id = patch["unitId"]
+            if patch["unitId"] is not None:
+                sp.unit_name_freetext = None
+            await _assert_unit_exists(session, user.unit_id)
+
+        if "unitNameFreetext" in patch:
+            sp.unit_name_freetext = patch["unitNameFreetext"]
+            if sp.unit_name_freetext:
+                user.unit_id = None
+
+        if "employeeId" in patch:
+            emp = normalize_employee_id(patch.get("employeeId"))
+            try:
+                assert_employee_id_shape(emp)
+            except ValueError as e:
+                raise HTTPException(status_code=400, detail=str(e)) from e
+            sp.employee_id = emp
+
+        if "academicTitleCode" in patch:
+            code = patch["academicTitleCode"]
+            await _assert_academic_title_active(session, code)
+            sp.academic_title_code = code
+            if code != "other":
+                sp.academic_title_other = None
+
+        if "academicTitleOther" in patch:
+            sp.academic_title_other = patch["academicTitleOther"]
+
+        if "jobTitle" in patch:
+            sp.job_title = patch["jobTitle"]
+
+        try:
+            assert_unit_exclusive(user, sp)
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e)) from e
+
+        staff_after_partial = staff_row_for_audit(sp, user.unit_id)
+        if material_staff_fields_changed(staff_before, staff_after_partial):
+            sp.version += 1
+            if sp.profile_verification_status == "verified":
+                apply_reverify_from_verified(sp, now)
+
+        user.updated_at = now
+        sp.updated_at = now
+        try:
+            await session.flush()
+        except IntegrityError as e:
+            raise HTTPException(
+                status_code=409,
+                detail="Mã số nhân sự đã được sử dụng hoặc dữ liệu không hợp lệ.",
+            ) from e
+
+        staff_final = staff_row_for_audit(sp, user.unit_id)
+        roles = await _roles_after_reconcile(session, user)
+        actor_email, actor_role = await resolve_actor_fields(session, user.id)
+        user_after = {"fullName": user.full_name, "phone": user.phone}
+        if user_before != user_after:
+            await record_audit(
+                session,
+                actor_user_id=user.id,
+                actor_email=actor_email,
+                actor_role=actor_role,
+                action=AuditAction.update,
+                entity_type="user",
+                entity_id=str(user.id),
+                before=user_before,
+                after=user_after,
+                metadata={"source": "auth_profile"},
+            )
+        if staff_before != staff_final:
+            await record_audit(
+                session,
+                actor_user_id=user.id,
+                actor_email=actor_email,
+                actor_role=actor_role,
+                action=AuditAction.update,
+                entity_type="user_profile",
+                entity_id=str(user.id),
+                before=staff_before,
+                after=staff_final,
+                metadata={"source": "auth_profile_staff"},
+            )
+        return _user_public_dict(user, roles, sp)
+
+
+@router.post("/profile/submit-verification")
+async def submit_profile_verification(authorization: str | None = Header(None)) -> dict[str, Any]:
+    if not is_postgres_enabled():
+        raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa cấu hình.")
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(401, detail="Vui lòng đăng nhập.")
+
+    now = datetime.now(timezone.utc)
+    async with get_session() as session:
+        user = await session.get(User, uid)
+        if user is None or not user.is_active:
+            raise HTTPException(401, detail="Tài khoản không còn hiệu lực.")
+        sp = await _load_staff_profile(session, user.id)
+        if sp.profile_verification_status not in ("draft", "rejected"):
+            raise HTTPException(status_code=400, detail="Chỉ gửi xác minh khi hồ sơ ở trạng thái nháp hoặc bị từ chối.")
+        try:
+            assert_complete_for_submission(user, sp)
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e)) from e
+        before = staff_row_for_audit(sp, user.unit_id)
+        sp.profile_verification_status = "pending"
+        sp.verification_submitted_at = now
+        sp.rejection_reason = None
+        sp.verified_at = None
+        sp.verified_by_user_id = None
+        sp.version += 1
+        sp.updated_at = now
+        await session.flush()
+        after = staff_row_for_audit(sp, user.unit_id)
+        roles = await _roles_after_reconcile(session, user)
+        actor_email, actor_role = await resolve_actor_fields(session, user.id)
+        await record_audit(
+            session,
+            actor_user_id=user.id,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.update,
+            entity_type="user_profile",
+            entity_id=str(user.id),
+            before=before,
+            after=after,
+            metadata={"source": "auth_submit_verification"},
+        )
+        return _user_public_dict(user, roles, sp)
+
+
+class ChangePasswordBody(BaseModel):
+    currentPassword: str = Field(..., min_length=1)
+    newPassword: str = Field(..., min_length=1)
+    newPasswordConfirm: str = Field(..., min_length=1)
+
+
+@router.post("/change-password")
+async def change_password(
+    body: ChangePasswordBody, authorization: str | None = Header(None)
+) -> dict[str, Any]:
+    if not is_postgres_enabled():
+        raise HTTPException(
+            status_code=503,
+            detail="Đổi mật khẩu yêu cầu cơ sở dữ liệu.",
+        )
+    if body.newPassword != body.newPasswordConfirm:
+        raise HTTPException(400, detail="Mật khẩu mới xác nhận không khớp.")
+    _assert_password_policy(body.newPassword)
+
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(401, detail="Vui lòng đăng nhập.")
+
+    async with get_session() as session:
+        user = await session.get(User, uid)
+        if user is None or not user.is_active:
+            raise HTTPException(401, detail="Tài khoản không còn hiệu lực.")
+        if not _verify_password(body.currentPassword, user.password_hash):
+            raise HTTPException(400, detail="Mật khẩu hiện tại không đúng.")
+
+        user.password_hash = _hash_password(body.newPassword)
+        user.credential_version = int(user.credential_version or 0) + 1
+        user.updated_at = datetime.now(timezone.utc)
+        await session.flush()
+        actor_email, actor_role = await resolve_actor_fields(session, user.id)
+        await record_audit(
+            session,
+            actor_user_id=user.id,
+            actor_email=actor_email,
+            actor_role=actor_role,
+            action=AuditAction.update,
+            entity_type="user",
+            entity_id=str(user.id),
+            before={"password": "[redacted]"},
+            after={"password": "[changed]"},
+            metadata={"source": "auth_change_password"},
+        )
+        roles = await _roles_after_reconcile(session, user)
+        sp = await _load_staff_profile(session, user.id)
+        user_payload = _user_public_dict(user, roles, sp)
+        new_cv = int(user.credential_version)
+
+    token = _issue_token(uid, user_payload["email"], roles, new_cv)
+    return {"accessToken": token, "user": user_payload}
diff --git a/be0/src/auth_credential_middleware.py b/be0/src/auth_credential_middleware.py
new file mode 100644
index 0000000..414ffc1
--- /dev/null
+++ b/be0/src/auth_credential_middleware.py
@@ -0,0 +1,72 @@
+"""Reject JWTs whose credential_version no longer matches the user (password change / reset)."""
+
+from __future__ import annotations
+
+import uuid
+from typing import Awaitable, Callable
+
+from fastapi import Request, Response
+from starlette.responses import JSONResponse
+
+from src.auth_jwt import decode_bearer_token, jwt_credential_version_from_payload
+from src.initiative_db.engine import get_session, is_postgres_enabled
+from src.initiative_db.models import User
+
+# POST-only auth paths that accept requests without cv check (unauthenticated or pre-migration tokens).
+_AUTH_PUBLIC: set[tuple[str, str]] = {
+    ("/api/v1/auth/register", "POST"),
+    ("/api/v1/auth/login", "POST"),
+    ("/api/v1/auth/forgot-password", "POST"),
+    ("/api/v1/auth/reset-password", "POST"),
+    ("/api/v1/auth/verify-email", "POST"),
+    ("/api/v1/auth/resend-verification", "POST"),
+    ("/api/v1/auth/verify-otp", "POST"),
+    ("/api/v1/auth/resend-otp", "POST"),
+}
+
+
+def _is_public_auth_path(path: str, method: str) -> bool:
+    return (path, method.upper()) in _AUTH_PUBLIC
+
+
+async def auth_credential_version_middleware(
+    request: Request, call_next: Callable[[Request], Awaitable[Response]]
+) -> Response:
+    if not is_postgres_enabled():
+        return await call_next(request)
+
+    auth = request.headers.get("authorization")
+    if not auth:
+        return await call_next(request)
+
+    path = request.url.path
+    method = request.method.upper()
+    if _is_public_auth_path(path, method):
+        return await call_next(request)
+
+    payload = decode_bearer_token(auth)
+    if payload is None:
+        return JSONResponse({"detail": "Token không hợp lệ."}, status_code=401)
+
+    try:
+        uid = uuid.UUID(str(payload["sub"]))
+    except (KeyError, ValueError):
+        return JSONResponse({"detail": "Token không hợp lệ."}, status_code=401)
+
+    jwt_cv = jwt_credential_version_from_payload(payload)
+
+    async with get_session() as session:
+        user = await session.get(User, uid)
+        if user is None or not user.is_active:
+            return JSONResponse(
+                {"detail": "Tài khoản không còn hiệu lực."},
+                status_code=401,
+            )
+        db_cv = int(user.credential_version or 0)
+        if jwt_cv != db_cv:
+            return JSONResponse(
+                {"detail": "Phiên đăng nhập hết hạn. Vui lòng đăng nhập lại."},
+                status_code=401,
+            )
+
+    return await call_next(request)
diff --git a/be0/src/auth_jwt.py b/be0/src/auth_jwt.py
new file mode 100644
index 0000000..89ea5ea
--- /dev/null
+++ b/be0/src/auth_jwt.py
@@ -0,0 +1,58 @@
+"""JWT decode helpers shared by API routes (no Argon2 import — avoids heavy auth stack)."""
+
+from __future__ import annotations
+
+import os
+import uuid
+from typing import Any
+
+import jwt
+
+
+def jwt_secret() -> str:
+    secret = os.getenv("JWT_SECRET", "").strip()
+    env = os.getenv("ENVIRONMENT", "development").lower()
+    if not secret:
+        if env in ("production", "staging"):
+            raise RuntimeError("JWT_SECRET must be set when ENVIRONMENT is production or staging")
+        return "dev-only-change-me-jwt-secret-min-32-chars!!"
+    if len(secret) < 32:
+        raise ValueError("JWT_SECRET must be at least 32 characters")
+    return secret
+
+
+def decode_bearer_token(authorization: str | None) -> dict[str, Any] | None:
+    """Return JWT claims dict or None."""
+    if not authorization or not authorization.lower().startswith("bearer "):
+        return None
+    raw = authorization.split(None, 1)[1].strip()
+    try:
+        return jwt.decode(raw, jwt_secret(), algorithms=["HS256"])  # type: ignore[no-any-return]
+    except jwt.PyJWTError:
+        return None
+
+
+def decode_access_token_user_id(authorization: str | None) -> uuid.UUID | None:
+    payload = decode_bearer_token(authorization)
+    if not payload or "sub" not in payload:
+        return None
+    try:
+        return uuid.UUID(str(payload["sub"]))
+    except ValueError:
+        return None
+
+
+def jwt_credential_version_from_payload(payload: dict[str, Any] | None) -> int:
+    """
+    Credential version embedded in JWT (``cv``). Missing claim defaults to 0 so legacy
+    tokens issued before ``credential_version`` match users still at version 0.
+ """ + if not payload: + return 0 + v = payload.get("cv") + if v is None: + return 0 + try: + return int(v) + except (TypeError, ValueError): + return 0 diff --git a/be0/src/auth_mail.py b/be0/src/auth_mail.py new file mode 100644 index 0000000..dd7cd87 --- /dev/null +++ b/be0/src/auth_mail.py @@ -0,0 +1,189 @@ +"""Outbound email for auth (password reset). Configure SMTP or AUTH_MAIL_LOG_ONLY for dev.""" + +from __future__ import annotations + +import asyncio +import logging +import os +import smtplib +import ssl +from email.message import EmailMessage +from typing import Literal + +logger = logging.getLogger(__name__) + +OtpDeliveryChannel = Literal["smtp", "log_only", "none"] + + +def public_web_origin() -> str: + raw = (os.getenv("AUTH_PUBLIC_WEB_ORIGIN") or os.getenv("PUBLIC_WEB_ORIGIN") or "").strip() + if raw: + return raw.rstrip("/") + return "http://localhost:8081" + + +def reset_link(raw_token: str) -> str: + base = public_web_origin() + return f"{base}/reset-password?token={raw_token}" + + +def verify_email_link(raw_token: str) -> str: + base = public_web_origin() + return f"{base}/verify-email?token={raw_token}" + + +def _registration_otp_validity_phrase_vi() -> str: + """Mirror auth_api REGISTER_OTP_TTL_MINUTES default for email copy.""" + raw = os.getenv("REGISTER_OTP_TTL_MINUTES", "1").strip() + try: + minutes = int(raw or "1") + except ValueError: + minutes = 1 + minutes = max(1, min(minutes, 24 * 60)) + if minutes == 1: + return "1 phút" + return f"{minutes} phút" + + + + +def mail_delivery_configured() -> bool: + if os.getenv("AUTH_MAIL_LOG_ONLY", "").lower() in ("1", "true", "yes"): + return True + return bool(os.getenv("SMTP_HOST", "").strip()) + + +def _send_smtp_sync(to_email: str, subject: str, text_body: str, html_body: str) -> None: + host = os.getenv("SMTP_HOST", "").strip() + port = int(os.getenv("SMTP_PORT", "587")) + user = os.getenv("SMTP_USER", "").strip() + password = os.getenv("SMTP_PASSWORD", "").strip() + mail_from = 
os.getenv("AUTH_MAIL_FROM", user or "noreply@localhost").strip() + + if user and not password: + logger.warning( + "SMTP_USER is set but SMTP_PASSWORD is empty; login will fail (535). " + "With Docker Compose from the repo root, define SMTP_* in the root `.env` " + "so interpolation passes them into the be0 service (be0/.env is not loaded unless env_file is set)." + ) + + msg = EmailMessage() + msg["Subject"] = subject + msg["From"] = mail_from + msg["To"] = to_email + msg.set_content(text_body) + msg.add_alternative(html_body, subtype="html") + + context = ssl.create_default_context() + with smtplib.SMTP(host, port, timeout=30) as server: + if os.getenv("SMTP_USE_TLS", "1").lower() in ("1", "true", "yes"): + server.starttls(context=context) + if user: + try: + server.login(user, password) + except smtplib.SMTPAuthenticationError as e: + logger.warning( + "SMTP login failed for user %s (code %s): %s. " + "If using Microsoft 365 / Outlook, use the mailbox full email as SMTP_USER, " + "set SMTP_PASSWORD to an app password when MFA is on, and ensure " + "Authenticated SMTP (SMTP AUTH) is enabled for that mailbox in the tenant.", + user, + e.smtp_code, + e.smtp_error.decode(errors="replace") if isinstance(e.smtp_error, bytes) else e.smtp_error, + ) + raise + server.send_message(msg) + + +async def deliver_password_reset_email(to_email: str, raw_token: str) -> None: + """Log link, send via SMTP, or no-op with warning if misconfigured.""" + link = reset_link(raw_token) + if os.getenv("AUTH_MAIL_LOG_ONLY", "").lower() in ("1", "true", "yes"): + logger.info("AUTH_MAIL_LOG_ONLY: password reset link for %s: %s", to_email, link) + return + + host = os.getenv("SMTP_HOST", "").strip() + if not host: + logger.warning( + "Password reset token created but mail is not configured " + "(set SMTP_HOST or AUTH_MAIL_LOG_ONLY=1). 
User: %s", + to_email, + ) + return + + subject = "Đặt lại mật khẩu — hệ thống sáng kiến" + text_body = ( + "Bạn (hoặc ai đó) đã yêu cầu đặt lại mật khẩu.\n\n" + f"Mở liên kết sau (hiệu lực giới hạn):\n{link}\n\n" + "Nếu bạn không yêu cầu, hãy bỏ qua email này." + ) + html_body = ( + "

<p>Bạn (hoặc ai đó) đã yêu cầu đặt lại mật khẩu.</p>" + f'<p><a href="{link}">Đặt lại mật khẩu</a></p>' + "<p>Nếu bạn không yêu cầu, hãy bỏ qua email này.</p>
" + ) + await asyncio.to_thread(_send_smtp_sync, to_email, subject, text_body, html_body) + + +async def deliver_registration_otp_email(to_email: str, otp_plaintext: str) -> OtpDeliveryChannel: + """Send OTP via SMTP, log plaintext when AUTH_MAIL_LOG_ONLY=1, or no-op when SMTP unset.""" + ttl_phrase = _registration_otp_validity_phrase_vi() + if os.getenv("AUTH_MAIL_LOG_ONLY", "").lower() in ("1", "true", "yes"): + logger.info("AUTH_MAIL_LOG_ONLY: registration OTP for %s: %s", to_email, otp_plaintext) + return "log_only" + + host = os.getenv("SMTP_HOST", "").strip() + if not host: + logger.warning( + "Registration OTP created but mail is not configured " + "(set SMTP_HOST or AUTH_MAIL_LOG_ONLY=1). User: %s", + to_email, + ) + return "none" + + subject = "Mã xác minh đăng ký — hệ thống sáng kiến" + text_body = ( + "Cảm ơn bạn đã đăng ký.\n\n" + f"Mã xác minh của bạn: {otp_plaintext}\n\n" + f"Mã có hiệu lực trong {ttl_phrase}. Nhập đúng 6 chữ số trên trang đăng ký " + "ngay sau khi nhận email.\n" + "Nếu bạn không đăng ký, hãy bỏ qua email này." + ) + html_body = ( + "

<p>Cảm ơn bạn đã đăng ký.</p>" + f"<p>Mã xác minh (nhập trên trang đăng ký): {otp_plaintext}</p>" + f"<p>Mã có hiệu lực trong {ttl_phrase}; vui lòng nhập mã trong thời gian này.</p>" + "<p>Nếu bạn không đăng ký, hãy bỏ qua email này.</p>
" + ) + await asyncio.to_thread(_send_smtp_sync, to_email, subject, text_body, html_body) + return "smtp" + + +async def deliver_email_verification_email(to_email: str, raw_token: str) -> None: + """Log link, send via SMTP, or warn if misconfigured (same policy as password reset).""" + link = verify_email_link(raw_token) + if os.getenv("AUTH_MAIL_LOG_ONLY", "").lower() in ("1", "true", "yes"): + logger.info("AUTH_MAIL_LOG_ONLY: verify-email link for %s: %s", to_email, link) + return + + host = os.getenv("SMTP_HOST", "").strip() + if not host: + logger.warning( + "Email verification token created but mail is not configured " + "(set SMTP_HOST or AUTH_MAIL_LOG_ONLY=1). User: %s", + to_email, + ) + return + + subject = "Xác minh email — hệ thống sáng kiến" + text_body = ( + "Cảm ơn bạn đã đăng ký.\n\n" + f"Mở liên kết sau để kích hoạt tài khoản (hiệu lực giới hạn):\n{link}\n\n" + "Nếu bạn không đăng ký, hãy bỏ qua email này." + ) + html_body = ( + "

<p>Cảm ơn bạn đã đăng ký.</p>" + f'<p><a href="{link}">Xác minh email và kích hoạt tài khoản</a></p>' + "<p>Nếu bạn không đăng ký, hãy bỏ qua email này.</p>
" + ) + await asyncio.to_thread(_send_smtp_sync, to_email, subject, text_body, html_body) diff --git a/be0/src/auth_rate_limit.py b/be0/src/auth_rate_limit.py new file mode 100644 index 0000000..319c274 --- /dev/null +++ b/be0/src/auth_rate_limit.py @@ -0,0 +1,89 @@ +"""Simple in-memory rate limits for unauthenticated auth endpoints (single-process).""" + +from __future__ import annotations + +import threading +import time +from collections import defaultdict + +_lock = threading.Lock() +_buckets: dict[str, list[float]] = defaultdict(list) + +_WINDOW_SEC = 3600.0 +_MAX_FORGOT_PER_EMAIL = 5 +_MAX_FORGOT_PER_IP = 30 +_MAX_RESET_PER_IP = 60 +_MAX_RESEND_VERIFY_PER_EMAIL = 5 +_MAX_RESEND_VERIFY_PER_IP = 30 +_MAX_RESEND_OTP_PER_EMAIL = 5 +_MAX_RESEND_OTP_PER_IP = 30 + + +def _prune(ts_list: list[float], now: float) -> None: + cutoff = now - _WINDOW_SEC + while ts_list and ts_list[0] < cutoff: + ts_list.pop(0) + + +def _hit(key: str, max_hits: int) -> bool: + """Return True if under limit (request allowed). False if rate limited.""" + now = time.monotonic() + with _lock: + bucket = _buckets[key] + _prune(bucket, now) + if len(bucket) >= max_hits: + return False + bucket.append(now) + return True + + +_MAX_LOGIN_PER_EMAIL = 5 +_MAX_LOGIN_PER_IP = 10 +_LOGIN_WINDOW_SEC = 900.0 + + +def allow_login(email_normalized: str, client_ip: str) -> bool: + """Return True if under limit (request allowed). 
False if rate limited.""" + now = time.monotonic() + with _lock: + for key, max_hits in ( + (f"login:email:{email_normalized}", _MAX_LOGIN_PER_EMAIL), + (f"login:ip:{client_ip}", _MAX_LOGIN_PER_IP), + ): + bucket = _buckets[key] + cutoff = now - _LOGIN_WINDOW_SEC + while bucket and bucket[0] < cutoff: + bucket.pop(0) + if len(bucket) >= max_hits: + return False + for key in (f"login:email:{email_normalized}", f"login:ip:{client_ip}"): + _buckets[key].append(now) + return True + + +def allow_forgot_password(email_normalized: str, client_ip: str) -> bool: + if not _hit(f"forgot:email:{email_normalized}", _MAX_FORGOT_PER_EMAIL): + return False + if not _hit(f"forgot:ip:{client_ip}", _MAX_FORGOT_PER_IP): + return False + return True + + +def allow_reset_password(client_ip: str) -> bool: + return _hit(f"reset:ip:{client_ip}", _MAX_RESET_PER_IP) + + +def allow_resend_verification(email_normalized: str, client_ip: str) -> bool: + if not _hit(f"resend_verify:email:{email_normalized}", _MAX_RESEND_VERIFY_PER_EMAIL): + return False + if not _hit(f"resend_verify:ip:{client_ip}", _MAX_RESEND_VERIFY_PER_IP): + return False + return True + + +def allow_resend_registration_otp(email_normalized: str, client_ip: str) -> bool: + if not _hit(f"resend_otp:email:{email_normalized}", _MAX_RESEND_OTP_PER_EMAIL): + return False + if not _hit(f"resend_otp:ip:{client_ip}", _MAX_RESEND_OTP_PER_IP): + return False + return True diff --git a/be0/src/be01/README.md b/be0/src/be01/README.md new file mode 100644 index 0000000..b1650fe --- /dev/null +++ b/be0/src/be01/README.md @@ -0,0 +1,86 @@ +# Biểu mẫu Sáng kiến (Initiative Forms) – Auto-fill Toolkit + +Automatically fills the initiative dossier (Forms No. 01–04 + the commitment statement) from a JSON file.
+ +## Files in the toolkit + +| File | Description | +|---|---| +| `template_sang_kien.docx` | Word template with `{{...}}` placeholders (docxtpl/Jinja2) | +| `data_blank.json` | Empty JSON schema: copy it, then fill in your data | +| `data_sop.json` | Pre-filled example (an SOP for ethics review of animal research) | +| `fill_template.py` | Script that fills the template from JSON and writes a `.docx` file | +| `build_template.py` | Script that regenerates the template (when the layout needs changes) | + +## Installation + +```bash +pip install docxtpl +``` + +## Usage + +1. Copy `data_blank.json` → `data_cua_toi.json` +2. Fill your data into the JSON file +3. Run: + ```bash + python fill_template.py data_cua_toi.json output.docx + ``` + +## JSON conventions + +### Plain text +```json +"ten_sang_kien": "Tên sáng kiến của tôi" +``` + +### Tables (multiple rows) +Add one object per row to the array; the template repeats the row automatically: +```json +"danh_sach_tac_gia": [ + {"stt": "1", "ho_ten": "Nguyễn Văn A", "ngay_sinh": "01/01/1990", ...}, + {"stt": "2", "ho_ten": "Trần Thị B", "ngay_sinh": "02/02/1992", ...} +] +``` + +### Checkboxes +The `//` comments below are annotations only; JSON does not allow comments, so leave them out of a real data file: +```json +"phan_loai": { + "giai_phap_ky_thuat": true, // ☑ checked + "sang_kien_tu_nckh": false, // ☐ unchecked + "sang_kien_tu_sach": false +} +``` + +### Dates +```json +"ngay_ky": {"ngay": "15", "thang": "10", "nam": "2025"} +``` + +## JSON structure + +``` +trang_bia → Cover pages (outer + inner) +mau_01 → Form No. 01: Initiative description report +mau_02 → Form No. 02: Application for recognition of the initiative +mau_03 → Form No. 03: Confirmation of contribution percentages +mau_04 → Form No. 04: Evaluation sheet +ban_cam_ket → Commitment statement +``` + +## Integration with fe0 (applicant) + +- **Frontend** (`fe0`) keeps a copy of the form whose keys are **Vietnamese labels**: `fe0/public/assets/bieu_mau_sang_kien_template.json` (plus the imported copy in `fe0/src/components/applicant/initiative-draft/bieu_mau_sang_kien_template.json`).
+- The `buildOfficialBieuMauFromDraft` function in `mapDraftToOfficialBieuMau.ts` fills this object from the **Đơn / Báo cáo / Xác nhận đóng góp** (Application / Report / Contribution confirmation) data, the same source used by `ReviewPanel`. +- The **`officialBieuMau`** field in the JSON bundle exported from **Review** (`Tải JSON đầy đủ`, "Download full JSON") has the same structure. The backend can store it as JSONB (or serialize it with `json.dumps`) and then run `fill_template.py` after mapping it to the `data_blank.json` shape (snake_case) if the Word template uses Latin keys. + +## Technical notes + +- The template uses `docxtpl` syntax (a Jinja2 extension for Word). +- Tables use the `{%tr for %}` / `{%tr endfor %}` pattern: the two marker rows are removed at render time, leaving only the data rows. +- Checkboxes use `{% if %}☑{% else %}☐{% endif %}`. +- For a line break inside a single text field, use `\n` in the JSON (docxtpl handles it automatically). + +## Attached evidence (2.1 / 2.2) + +Evidence files (PDF and other types) do **not** go through `fill_template.py`. They are uploaded to MinIO via **be0** and displayed in **fe0** (the Minh chứng / Evidence tab). The API returns `downloadUrl` (download or open in a new tab) and `viewUrl` (embedded PDF view served with `Content-Disposition: inline`).
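The `{%tr for %}` table convention and the checkbox expression can be made concrete without Word or docxtpl. The following is a minimal, stdlib-only simulation under stated assumptions: a table is modeled as a list of rows of cell strings (not real DOCX XML), only `{{ item.field }}` placeholders are substituted, and `expand_table_rows`, `lookup`, and `checkbox` are hypothetical helpers, not part of `fill_template.py` or the docxtpl API.

```python
import re

# Hypothetical model: a table is a list of rows, each row a list of cell strings.
# Real docxtpl works on the DOCX XML; this only mimics the documented behaviour.
FOR_ROW = re.compile(r"\{%tr\s+for\s+(\w+)\s+in\s+([\w.]+)\s*%\}")
ENDFOR_ROW = re.compile(r"\{%tr\s+endfor\s*%\}")
FIELD = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def lookup(context: dict, dotted: str):
    """Resolve a dotted path such as 'mau_02.danh_sach_tac_gia' in nested dicts."""
    value = context
    for part in dotted.split("."):
        value = value[part]
    return value


def expand_table_rows(rows: list[list[str]], context: dict) -> list[list[str]]:
    """Drop {%tr for %}/{%tr endfor %} marker rows; repeat the rows between them per item."""
    out: list[list[str]] = []
    i = 0
    while i < len(rows):
        match = FOR_ROW.search(" ".join(rows[i]))
        if match is None:
            out.append(list(rows[i]))  # ordinary row (e.g. the header): copied unchanged
            i += 1
            continue
        var, source = match.groups()
        j = i + 1
        while j < len(rows) and not ENDFOR_ROW.search(" ".join(rows[j])):
            j += 1
        if j == len(rows):
            raise ValueError("{%tr for %} row without a matching {%tr endfor %} row")
        body = rows[i + 1 : j]  # template rows between the two marker rows
        for item in lookup(context, source):
            for template_row in body:
                out.append([
                    FIELD.sub(
                        lambda m: str(item[m.group(2)]) if m.group(1) == var else m.group(0),
                        cell,
                    )
                    for cell in template_row
                ])
        i = j + 1  # skip past the endfor marker row
    return out


def checkbox(flag: bool) -> str:
    """Equivalent of the template expression {% if flag %}☑{% else %}☐{% endif %}."""
    return "☑" if flag else "☐"


if __name__ == "__main__":
    table = [
        ["STT", "Họ và tên"],
        ["{%tr for item in mau_02.danh_sach_tac_gia %}", ""],
        ["{{ item.stt }}", "{{ item.ho_ten }}"],
        ["{%tr endfor %}", ""],
    ]
    data = {"mau_02": {"danh_sach_tac_gia": [
        {"stt": "1", "ho_ten": "Nguyễn Văn A"},
        {"stt": "2", "ho_ten": "Trần Thị B"},
    ]}}
    print(expand_table_rows(table, data))
    # [['STT', 'Họ và tên'], ['1', 'Nguyễn Văn A'], ['2', 'Trần Thị B']]
    print(checkbox(True), checkbox(False))  # ☑ ☐
```

Modeling rows as plain lists keeps the sketch easy to test; in the real pipeline, docxtpl performs this expansion on the document XML when `DocxTemplate.render` is called.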
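The `viewUrl` / `downloadUrl` split comes down to the `Content-Disposition` response header: `inline` lets the browser embed the PDF, while `attachment` asks it to download the file. Evidence file names are likely to contain Vietnamese diacritics, which a plain `filename="..."` parameter cannot carry portably. The hypothetical helper below (not taken from be0, whose implementation is not shown here) builds an RFC 6266 value with an ASCII `filename` fallback plus an RFC 8187-encoded `filename*`. With MinIO presigned URLs, one common approach is to pass such a value through the S3 `response-content-disposition` override when presigning; whether be0 does exactly that is an assumption.

```python
import unicodedata
from urllib.parse import quote


def content_disposition(filename: str, inline: bool) -> str:
    """Build a Content-Disposition value for a possibly non-ASCII file name.

    ``inline`` asks the browser to display the file (e.g. embed a PDF);
    ``attachment`` asks it to download. ``filename*`` carries the exact UTF-8
    name; ``filename`` is an ASCII fallback for clients that ignore ``filename*``.
    """
    disposition = "inline" if inline else "attachment"
    # Vietnamese "đ"/"Đ" have no Unicode decomposition, so map them by hand first.
    ascii_base = filename.replace("đ", "d").replace("Đ", "D")
    # NFKD separates base letters from combining diacritics, which are then dropped.
    decomposed = unicodedata.normalize("NFKD", ascii_base)
    fallback = "".join(
        ch for ch in decomposed
        if ch.isascii() and ch.isprintable() and ch not in '"\\'
    ) or "download"
    encoded = quote(filename, safe="")  # percent-encode the UTF-8 bytes
    return f"{disposition}; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


if __name__ == "__main__":
    print(content_disposition("Minh chứng.pdf", inline=True))
    # inline; filename="Minh chung.pdf"; filename*=UTF-8''Minh%20ch%E1%BB%A9ng.pdf
```

Sending both parameters follows the RFC 6266 recommendation: modern browsers prefer `filename*`, and older clients still get a readable ASCII name.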
diff --git a/be0/src/be01/__pycache__/build_template.cpython-313.pyc b/be0/src/be01/__pycache__/build_template.cpython-313.pyc new file mode 100644 index 0000000..5a09bc8 Binary files /dev/null and b/be0/src/be01/__pycache__/build_template.cpython-313.pyc differ diff --git a/be0/src/be01/__pycache__/docx_normalize.cpython-311.pyc b/be0/src/be01/__pycache__/docx_normalize.cpython-311.pyc new file mode 100644 index 0000000..5583d1a Binary files /dev/null and b/be0/src/be01/__pycache__/docx_normalize.cpython-311.pyc differ diff --git a/be0/src/be01/__pycache__/docx_normalize.cpython-313.pyc b/be0/src/be01/__pycache__/docx_normalize.cpython-313.pyc new file mode 100644 index 0000000..963ef0c Binary files /dev/null and b/be0/src/be01/__pycache__/docx_normalize.cpython-313.pyc differ diff --git a/be0/src/be01/__pycache__/docx_to_pdf.cpython-311.pyc b/be0/src/be01/__pycache__/docx_to_pdf.cpython-311.pyc new file mode 100644 index 0000000..b12589b Binary files /dev/null and b/be0/src/be01/__pycache__/docx_to_pdf.cpython-311.pyc differ diff --git a/be0/src/be01/__pycache__/docx_to_pdf.cpython-313.pyc b/be0/src/be01/__pycache__/docx_to_pdf.cpython-313.pyc new file mode 100644 index 0000000..0e1bea6 Binary files /dev/null and b/be0/src/be01/__pycache__/docx_to_pdf.cpython-313.pyc differ diff --git a/be0/src/be01/__pycache__/export_applications_list_xlsx.cpython-311.pyc b/be0/src/be01/__pycache__/export_applications_list_xlsx.cpython-311.pyc new file mode 100644 index 0000000..7b624eb Binary files /dev/null and b/be0/src/be01/__pycache__/export_applications_list_xlsx.cpython-311.pyc differ diff --git a/be0/src/be01/__pycache__/export_applications_list_xlsx.cpython-313.pyc b/be0/src/be01/__pycache__/export_applications_list_xlsx.cpython-313.pyc new file mode 100644 index 0000000..5e0f9c1 Binary files /dev/null and b/be0/src/be01/__pycache__/export_applications_list_xlsx.cpython-313.pyc differ diff --git a/be0/src/be01/__pycache__/fill_application_form.cpython-311.pyc 
b/be0/src/be01/__pycache__/fill_application_form.cpython-311.pyc new file mode 100644 index 0000000..4f24504 Binary files /dev/null and b/be0/src/be01/__pycache__/fill_application_form.cpython-311.pyc differ diff --git a/be0/src/be01/__pycache__/fill_application_form.cpython-313.pyc b/be0/src/be01/__pycache__/fill_application_form.cpython-313.pyc new file mode 100644 index 0000000..3be130c Binary files /dev/null and b/be0/src/be01/__pycache__/fill_application_form.cpython-313.pyc differ diff --git a/be0/src/be01/__pycache__/fill_template.cpython-313.pyc b/be0/src/be01/__pycache__/fill_template.cpython-313.pyc new file mode 100644 index 0000000..c047d00 Binary files /dev/null and b/be0/src/be01/__pycache__/fill_template.cpython-313.pyc differ diff --git a/be0/src/be01/__pycache__/official_to_data_blank.cpython-311.pyc b/be0/src/be01/__pycache__/official_to_data_blank.cpython-311.pyc new file mode 100644 index 0000000..d743b09 Binary files /dev/null and b/be0/src/be01/__pycache__/official_to_data_blank.cpython-311.pyc differ diff --git a/be0/src/be01/__pycache__/official_to_data_blank.cpython-313.pyc b/be0/src/be01/__pycache__/official_to_data_blank.cpython-313.pyc new file mode 100644 index 0000000..ba546b4 Binary files /dev/null and b/be0/src/be01/__pycache__/official_to_data_blank.cpython-313.pyc differ diff --git a/be0/src/be01/build_template.py b/be0/src/be01/build_template.py new file mode 100644 index 0000000..f5db2f5 --- /dev/null +++ b/be0/src/be01/build_template.py @@ -0,0 +1,583 @@ +""" +Build a DOCX template with docxtpl (Jinja2) placeholders. 
+The template mirrors the original 'Biểu mẫu kèm TB' structure: + - Trang bìa + - Mẫu số 01: Báo cáo mô tả sáng kiến + - Mẫu số 02: Đơn đề nghị công nhận sáng kiến + - Mẫu số 03: Bản xác nhận tỷ lệ đóng góp + - Bản cam kết +""" +from docx import Document +from docx.shared import Pt, Cm, Inches +from docx.enum.text import WD_ALIGN_PARAGRAPH, WD_BREAK +from docx.enum.table import WD_ALIGN_VERTICAL +from docx.oxml.ns import qn +from docx.oxml import OxmlElement + +OUTPUT = "/home/claude/template_project/template_sang_kien.docx" + +# Page margins — edit these to retune layout (python-docx: Cm / Inches / Pt) +MARGIN_TOP_CM = 2.0 +MARGIN_BOTTOM_CM = 2.0 +MARGIN_LEFT_IN = 0.79 +MARGIN_RIGHT_IN = 0.49 + + +# ---------- helpers ---------- +def apply_page_margins(doc: Document) -> None: + """Apply section margins to every section in the document.""" + for section in doc.sections: + section.top_margin = Cm(MARGIN_TOP_CM) + section.bottom_margin = Cm(MARGIN_BOTTOM_CM) + section.left_margin = Inches(MARGIN_LEFT_IN) + section.right_margin = Inches(MARGIN_RIGHT_IN) + + +def set_cell_border(cell, **kwargs): + tc = cell._tc + tcPr = tc.get_or_add_tcPr() + tcBorders = OxmlElement("w:tcBorders") + for edge in ("top", "left", "bottom", "right"): + if edge in kwargs: + border = OxmlElement(f"w:{edge}") + border.set(qn("w:val"), kwargs[edge].get("val", "single")) + border.set(qn("w:sz"), str(kwargs[edge].get("sz", 4))) + border.set(qn("w:color"), kwargs[edge].get("color", "000000")) + tcBorders.append(border) + tcPr.append(tcBorders) + + +def set_paragraph_indent(paragraph, *, left=0, right=0): + """Set paragraph left/right indents in twips.""" + p = paragraph._p + pPr = p.get_or_add_pPr() + ind = pPr.find(qn("w:ind")) + if ind is None: + ind = OxmlElement("w:ind") + pPr.append(ind) + ind.set(qn("w:left"), str(left)) + ind.set(qn("w:right"), str(right)) + + +def set_cell_margins(cell, *, top=0, right=0, bottom=0, left=0): + tc = cell._tc + tcPr = tc.get_or_add_tcPr() + tcMar = 
tcPr.find(qn("w:tcMar")) + if tcMar is None: + tcMar = OxmlElement("w:tcMar") + tcPr.append(tcMar) + for side, value in (("top", top), ("right", right), ("bottom", bottom), ("left", left)): + node = tcMar.find(qn(f"w:{side}")) + if node is None: + node = OxmlElement(f"w:{side}") + tcMar.append(node) + node.set(qn("w:w"), str(value)) + node.set(qn("w:type"), "dxa") + + +def add_para(doc, text="", bold=False, italic=False, align=None, size=13, before=0, after=0): + p = doc.add_paragraph() + if align == "center": + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + elif align == "right": + p.alignment = WD_ALIGN_PARAGRAPH.RIGHT + elif align == "justify": + p.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY + pf = p.paragraph_format + pf.space_before = Pt(before) + pf.space_after = Pt(after) + if text: + r = p.add_run(text) + r.bold = bold + r.italic = italic + r.font.size = Pt(size) + r.font.name = "Times New Roman" + return p + + +def add_run(p, text, bold=False, italic=False, size=13): + r = p.add_run(text) + r.bold = bold + r.italic = italic + r.font.size = Pt(size) + r.font.name = "Times New Roman" + return r + + +def page_break(doc): + p = doc.add_paragraph() + p.add_run().add_break(WD_BREAK.PAGE) + + +def set_cell_text(cell, text, bold=False, italic=False, align=None, size=13): + cell.text = "" + p = cell.paragraphs[0] + if align == "center": + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + elif align == "right": + p.alignment = WD_ALIGN_PARAGRAPH.RIGHT + elif align == "justify": + p.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY + r = p.add_run(text) + r.bold = bold + r.italic = italic + r.font.size = Pt(size) + r.font.name = "Times New Roman" + return p + + +def add_cell_para(cell, text, bold=False, italic=False, align=None, size=13): + p = cell.add_paragraph() + if align == "center": + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + elif align == "right": + p.alignment = WD_ALIGN_PARAGRAPH.RIGHT + r = p.add_run(text) + r.bold = bold + r.italic = italic + r.font.size = Pt(size) + r.font.name = 
"Times New Roman" + return p + + +# ---------- build ---------- +doc = Document() + +# Default font + page margins +style = doc.styles["Normal"] +style.font.name = "Times New Roman" +style.font.size = Pt(13) +apply_page_margins(doc) + + +# ===================================================================== +# TRANG BÌA (rendered twice: outer + inner, identical content) +# ===================================================================== +def render_cover(doc): + add_para(doc, "BỘ Y TẾ", bold=False, align="center", size=13, before=20) + add_para(doc, "ĐẠI HỌC Y DƯỢC\nTHÀNH PHỐ HỒ CHÍ MINH", bold=True, align="center", size=11) + add_para(doc, "===== ===== =====", bold=True, align="center", size=13, after=30) + + add_para(doc, "BÁO CÁO MÔ TẢ SÁNG KIẾN", bold=True, align="center", size=18, before=40, after=40) + + # Tên sáng kiến + p = add_para(doc, "", align="center", before=20, after=10) + add_run(p, "Tên sáng kiến (Tiếng Việt): ", bold=True, size=14) + add_run(p, "{{ trang_bia.ten_sang_kien }}", bold=True, size=14) + + # Tác giả + p = add_para(doc, "", align="center", before=20, after=10) + add_run(p, "Tác giả/nhóm tác giả sáng kiến: ", bold=True, size=14) + add_run(p, "{{ trang_bia.tac_gia }}", size=14) + + # Đơn vị + p = add_para(doc, "", align="center", before=20, after=10) + add_run(p, "Đơn vị công tác: ", bold=True, size=14) + add_run(p, "{{ trang_bia.don_vi }}", size=14) + + # Liên hệ + p = add_para(doc, "", align="center", before=20, after=10) + add_run(p, "Thông tin liên hệ (Điện thoại, Email): ", bold=True, size=14) + add_run(p, "{{ trang_bia.thong_tin_lien_he }}", size=14) + + # Năm + p = add_para(doc, "", align="center", before=60, after=10) + add_run(p, "Tp. 
Hồ Chí Minh – Năm {{ trang_bia.nam }}", bold=True, size=14) + + +render_cover(doc) +page_break(doc) +render_cover(doc) +page_break(doc) + + +# ===================================================================== +# MẪU SỐ 01 – BÁO CÁO MÔ TẢ SÁNG KIẾN +# ===================================================================== +add_para(doc, "Mẫu số 01", italic=True, align="right", size=12) +add_para(doc, "BÁO CÁO MÔ TẢ SÁNG KIẾN", bold=True, align="center", size=16, before=12, after=12) + +# 1. Mở đầu +p = add_para(doc, "") +add_run(p, "1. Mở đầu ", bold=True) +add_run(p, "(Giới thiệu về những vấn đề liên quan đến sáng kiến ở trong và ngoài đơn vị/trường mà tác giả đã biết, những khó khăn/bất cập/hạn chế tại đơn vị/trường liên quan đến nội dung của sáng kiến; từ đó nêu ra sự cần thiết phải thực hiện sáng kiến):", italic=True) +add_para(doc, "{{ mau_01.mo_dau }}", align="justify") + +# 2. Tên sáng kiến +p = add_para(doc, "") +add_run(p, "2. Tên sáng kiến (tên quy trình, giải pháp, phương pháp): ", bold=True) +add_run(p, "{{ mau_01.ten_sang_kien }}") + +# 3. Lĩnh vực áp dụng +p = add_para(doc, "") +add_run(p, "3. Lĩnh vực áp dụng của sáng kiến ", bold=True) +add_run(p, "(ví dụ: cải cách hành chính, quản lý giáo dục, bảo vệ môi trường, …): ", italic=True) +add_run(p, "{{ mau_01.linh_vuc_ap_dung }}") + +# 4. Mô tả sáng kiến +add_para(doc, "4. Mô tả sáng kiến:", bold=True) + +# 4.1 +p = add_para(doc, "") +add_run(p, "4.1 Tình trạng giải pháp đã biết hoặc hiện trạng công tác khi chưa có sáng kiến ", bold=True) +add_run(p, "(nêu hiện trạng trước khi áp dụng giải pháp mới/chưa có sáng kiến, phân tích ưu nhược điểm của giải pháp cũ để cho thấy sự cần thiết của việc đề xuất giải pháp mới để khắc phục nhược điểm của giải pháp cũ):", italic=True) +add_para(doc, "{{ mau_01.tinh_trang_da_biet }}", align="justify") + +# 4.2 +add_para(doc, "4.2. 
Nội dung giải pháp đề nghị công nhận là sáng kiến:", bold=True) + +p = add_para(doc, "") +add_run(p, "- Mục đích của sáng kiến (nêu vấn đề cần giải quyết): ", bold=True) +add_run(p, "{{ mau_01.muc_dich }}") + +p = add_para(doc, "") +add_run(p, "- Về nội dung của sáng kiến: ", bold=True) +add_run(p, "Mô tả ngắn gọn, đầy đủ và rõ ràng:", italic=True) + +add_para(doc, "+ Các bước thực hiện giải pháp:", bold=True) +add_para(doc, "{{ mau_01.cac_buoc_thuc_hien }}", align="justify") + +add_para(doc, "+ Các điều kiện cần thiết để áp dụng giải pháp:", bold=True) +add_para(doc, "{{ mau_01.dieu_kien_ap_dung }}", align="justify") + +p = add_para(doc, "") +add_run(p, "+ Lĩnh vực áp dụng: ", bold=True) +add_run(p, "{{ mau_01.linh_vuc_ap_dung_2 }}") + +add_para(doc, "+ Kết quả thu được:", bold=True) +add_para(doc, "{{ mau_01.ket_qua_thu_duoc }}", align="justify") + +add_para(doc, "+ Danh sách đơn vị/cá nhân đã tham gia áp dụng thử hoặc lần đầu (nếu có):", bold=True) + +# Table for danh sách áp dụng (3-row pattern for {%tr %} loops) +tbl = doc.add_table(rows=4, cols=4) +tbl.style = "Table Grid" +hdr = tbl.rows[0].cells +for i, h in enumerate(["TT", "Tên tổ chức/cá nhân", "Địa chỉ", "Lĩnh vực áp dụng sáng kiến"]): + set_cell_text(hdr[i], h, bold=True, align="center") +# marker row: {%tr for %} +set_cell_text(tbl.rows[1].cells[0], "{%tr for item in mau_01.danh_sach_ap_dung %}") +# data row +set_cell_text(tbl.rows[2].cells[0], "{{ item.tt }}", align="center") +set_cell_text(tbl.rows[2].cells[1], "{{ item.ten_to_chuc }}") +set_cell_text(tbl.rows[2].cells[2], "{{ item.dia_chi }}") +set_cell_text(tbl.rows[2].cells[3], "{{ item.linh_vuc }}") +# marker row: {%tr endfor %} +set_cell_text(tbl.rows[3].cells[0], "{%tr endfor %}") + +# Tính mới +p = add_para(doc, "", before=12) +add_run(p, "- Về tính mới của sáng kiến:", bold=True) +add_para(doc, "{{ mau_01.tinh_moi }}", align="justify") + +# Tính hiệu quả +p = add_para(doc, "", before=12) +add_run(p, "- Về tính hiệu quả: ", bold=True) 
+add_run(p, "So sánh hiệu quả về mặt kinh tế, xã hội, khoa học thu được hoặc dự kiến thu được khi áp dụng sáng kiến (phải có số liệu cụ thể, căn cứ để tính toán, kiểm tra, đánh giá):", italic=True) + +hieu_qua_fields = [ + ("+ Tạo ra lợi ích kinh tế: ", "loi_ich_kinh_te"), + ("+ Đem lại hiệu quả trong giảng dạy: ", "hieu_qua_giang_day"), + ("+ Tăng năng suất lao động: ", "tang_nang_suat"), + ("+ Nâng cao hiệu quả công việc: ", "nang_cao_hieu_qua"), + ("+ Nâng cao chất lượng công việc, dịch vụ: ", "nang_cao_chat_luong"), + ("+ Giảm chi phí: ", "giam_chi_phi"), + ("+ Cải thiện môi trường, điều kiện học tập, làm việc, sống: ", "cai_thien_moi_truong"), + ("+ Bảo vệ sức khỏe: ", "bao_ve_suc_khoe"), + ("+ Đảm bảo an toàn lao động, PCCC: ", "an_toan_lao_dong"), + ("+ Nâng cao khả năng, trình độ, nhận thức, trách nhiệm: ", "nang_cao_nhan_thuc"), +] +for label, key in hieu_qua_fields: + p = add_para(doc, "") + add_run(p, label, bold=True) + add_run(p, "{{ mau_01.tinh_hieu_qua." + key + " }}") + +# 6. Bảo mật +p = add_para(doc, "", before=12) +add_run(p, "6. Những thông tin cần được bảo mật (nếu có): ", bold=True) +add_run(p, "{{ mau_01.thong_tin_bao_mat }}") + +# Chữ ký Mẫu 01 +add_para(doc, "") +sig = doc.add_table(rows=1, cols=2) +sig.autofit = True +left, right = sig.rows[0].cells +set_cell_text(left, "LÃNH ĐẠO ĐƠN VỊ", bold=True, align="center") +add_cell_para(left, "(Ký, ghi rõ họ tên)", italic=True, align="center") +add_cell_para(left, "") +add_cell_para(left, "") +add_cell_para(left, "{{ mau_01.lanh_dao_don_vi }}", bold=True, align="center") + +set_cell_text(right, "Tp. 
Hồ Chí Minh, ngày {{ mau_01.ngay_ky.ngay }} tháng {{ mau_01.ngay_ky.thang }} năm {{ mau_01.ngay_ky.nam }}", italic=True, align="center") +add_cell_para(right, "Tác giả chính / Đại diện nhóm tác giả sáng kiến", bold=True, align="center") +add_cell_para(right, "(chữ ký và ghi rõ họ tên)", italic=True, align="center") +add_cell_para(right, "") +add_cell_para(right, "{{ mau_01.tac_gia_sang_kien }}", bold=True, align="center") +# remove borders on signature table +for row in sig.rows: + for cell in row.cells: + set_cell_border(cell, + top={"val": "nil"}, bottom={"val": "nil"}, + left={"val": "nil"}, right={"val": "nil"}) + +page_break(doc) + + +# ===================================================================== +# MẪU SỐ 02 – ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN +# ===================================================================== +# Header block (institution | CHXHCN VN) +hdr = doc.add_table(rows=1, cols=2) +p = set_cell_text(hdr.rows[0].cells[0], "ĐẠI HỌC Y DƯỢC\nTHÀNH PHỐ HỒ CHÍ MINH", bold=True, align="center", size=11) +set_paragraph_indent(p, left=-150, right=0) +p = add_cell_para(hdr.rows[0].cells[0], "{{ mau_02.don_vi|upper }}", bold=False, align="center", size=11) +set_paragraph_indent(p, left=-150, right=0) +p = set_cell_text(hdr.rows[0].cells[1], "CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM", bold=True, align="left", size=10.7) +set_paragraph_indent(p, left=-150, right=0) +add_cell_para(hdr.rows[0].cells[1], "Độc lập - Tự do - Hạnh phúc", bold=True, align="center", size=12) +set_cell_margins(hdr.rows[0].cells[1], top=0, right=0, bottom=0, left=0) +for row in hdr.rows: + for cell in row.cells: + set_cell_border(cell, top={"val": "nil"}, bottom={"val": "nil"}, left={"val": "nil"}, right={"val": "nil"}) + +add_para(doc, "Mẫu số 02", italic=True, align="right", size=12, before=12) +add_para(doc, "ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN", bold=True, align="center", size=16, before=6, after=6) +add_para(doc, "Kính gửi: Hội đồng sáng kiến Đại học Y Dược TP. 
Hồ Chí Minh", italic=True, align="center") + +add_para(doc, "Tên tôi (chúng tôi) là:", before=12) + +# Tác giả table (3-row pattern for {%tr %} loop) +t = doc.add_table(rows=4, cols=7) +t.style = "Table Grid" +headers = ["STT", "Họ và tên", "Ngày tháng năm sinh", "Nơi công tác", "Chức danh", "Trình độ chuyên môn", "Tỷ lệ (%) đóng góp"] +for i, h in enumerate(headers): + set_cell_text(t.rows[0].cells[i], h, bold=True, align="center", size=11) +set_cell_text(t.rows[1].cells[0], "{%tr for item in mau_02.danh_sach_tac_gia %}", size=11) +r = t.rows[2].cells +set_cell_text(r[0], "{{ item.stt }}", align="center", size=11) +set_cell_text(r[1], "{{ item.ho_ten }}", size=11) +set_cell_text(r[2], "{{ item.ngay_sinh }}", align="center", size=11) +set_cell_text(r[3], "{{ item.noi_cong_tac }}", size=11) +set_cell_text(r[4], "{{ item.chuc_danh }}", size=11) +set_cell_text(r[5], "{{ item.trinh_do }}", size=11) +set_cell_text(r[6], "{{ item.ty_le }}", align="center", size=11) +set_cell_text(t.rows[3].cells[0], "{%tr endfor %}", size=11) + +# Fields +p = add_para(doc, "", before=12) +add_run(p, "- Là tác giả (nhóm tác giả) đề nghị xét công nhận sáng kiến: ") +add_run(p, "\u201C{{ mau_02.ten_sang_kien }}\u201D", bold=True) + +p = add_para(doc, "") +add_run(p, "- Chủ đầu tư tạo ra sáng kiến: ") +add_run(p, "{{ mau_02.chu_dau_tu }}") + +p = add_para(doc, "") +add_run(p, "- Lĩnh vực áp dụng sáng kiến: ") +add_run(p, "{{ mau_02.linh_vuc_ap_dung }}") + +p = add_para(doc, "") +add_run(p, "- Ngày sáng kiến được áp dụng: ") +add_run(p, "{{ mau_02.ngay_ap_dung }}") + +add_para(doc, "- Nội dung của sáng kiến:", bold=True) +add_para(doc, "{{ mau_02.noi_dung }}", align="justify") + +add_para(doc, "- Sáng kiến này là:", bold=True, before=6) +add_para(doc, "{% if mau_02.phan_loai.giai_phap_ky_thuat %}☑{% else %}☐{% endif %} Giải pháp kỹ thuật, quản lý, tác nghiệp, ứng dụng tiến bộ kỹ thuật áp dụng cho Đại học Y Dược TP.HCM") +add_para(doc, "{% if mau_02.phan_loai.sang_kien_tu_nckh %}☑{% else %}☐{% 
endif %} Sáng kiến – cải tiến kỹ thuật từ các nghiên cứu khoa học có kết quả được đăng tải trên các tạp chí, hội nghị trong nước và quốc tế") +add_para(doc, "{% if mau_02.phan_loai.sang_kien_tu_sach %}☑{% else %}☐{% endif %} Sáng kiến – cải tiến kỹ thuật từ sách, giáo trình, tài liệu tham khảo") + +p = add_para(doc, "", before=6) +add_run(p, "- Những thông tin cần được bảo mật (nếu có): ", bold=True) +add_run(p, "{{ mau_02.thong_tin_bao_mat }}") + +add_para(doc, "- Các điều kiện cần thiết để áp dụng sáng kiến:", bold=True) +add_para(doc, "{{ mau_02.dieu_kien_ap_dung }}", align="justify") + +add_para(doc, "- Đánh giá lợi ích thu được hoặc dự kiến có thể thu được do áp dụng sáng kiến theo ý kiến của tác giả:", bold=True) +add_para(doc, "{{ mau_02.danh_gia_tac_gia }}", align="justify") + +add_para(doc, "- Đánh giá lợi ích thu được hoặc dự kiến có thể thu được do áp dụng sáng kiến theo ý kiến của tổ chức, cá nhân đã tham gia áp dụng sáng kiến lần đầu, kể cả áp dụng thử (nếu có):", bold=True) +add_para(doc, "{{ mau_02.danh_gia_to_chuc }}", align="justify") + +add_para(doc, "Danh sách những người đã tham gia áp dụng thử hoặc áp dụng sáng kiến lần đầu (nếu có):", bold=True, before=6) + +t2 = doc.add_table(rows=4, cols=7) +t2.style = "Table Grid" +headers2 = ["Số TT", "Họ và tên", "Ngày tháng năm sinh", "Nơi công tác", "Chức danh", "Trình độ chuyên môn", "Nội dung công việc hỗ trợ"] +for i, h in enumerate(headers2): + set_cell_text(t2.rows[0].cells[i], h, bold=True, align="center", size=11) +set_cell_text(t2.rows[1].cells[0], "{%tr for item in mau_02.danh_sach_tham_gia %}", size=11) +r2 = t2.rows[2].cells +set_cell_text(r2[0], "{{ item.stt }}", align="center", size=11) +set_cell_text(r2[1], "{{ item.ho_ten }}", size=11) +set_cell_text(r2[2], "{{ item.ngay_sinh }}", align="center", size=11) +set_cell_text(r2[3], "{{ item.noi_cong_tac }}", size=11) +set_cell_text(r2[4], "{{ item.chuc_danh }}", size=11) +set_cell_text(r2[5], "{{ item.trinh_do }}", size=11) 
+set_cell_text(r2[6], "{{ item.noi_dung_ho_tro }}", size=11) +set_cell_text(t2.rows[3].cells[0], "{%tr endfor %}", size=11) + +add_para(doc, "Tôi xin cam đoan mọi thông tin nêu trong đơn là trung thực, đúng sự thật và hoàn toàn chịu trách nhiệm trước pháp luật./.", before=12, align="justify") + +add_para(doc, "TP. Hồ Chí Minh, ngày {{ mau_02.ngay_ky.ngay }} tháng {{ mau_02.ngay_ky.thang }} năm {{ mau_02.ngay_ky.nam }}", italic=True, align="right", before=12) + +# signature table +sig2 = doc.add_table(rows=1, cols=2) +l2, r2c = sig2.rows[0].cells +set_cell_text(l2, "Xác nhận của lãnh đạo", bold=True, align="center") +add_cell_para(l2, "{{ mau_02.don_vi|upper }}", bold=False, align="center", size=11) +add_cell_para(l2, "") +add_cell_para(l2, "") +add_cell_para(l2, "{{ mau_02.lanh_dao_don_vi }}", bold=True, align="center") + +set_cell_text(r2c, "Tác giả chính / Đại diện nhóm tác giả sáng kiến", bold=True, align="center") +add_cell_para(r2c, "(chữ ký và ghi rõ họ tên)", italic=True, align="center") +add_cell_para(r2c, "") +add_cell_para(r2c, "") +add_cell_para(r2c, "{{ mau_02.tac_gia_sang_kien }}", bold=True, align="center") +for row in sig2.rows: + for cell in row.cells: + set_cell_border(cell, top={"val": "nil"}, bottom={"val": "nil"}, left={"val": "nil"}, right={"val": "nil"}) + +page_break(doc) + + +# ===================================================================== +# MẪU SỐ 03 – BẢN XÁC NHẬN TỶ LỆ (%) ĐÓNG GÓP +# ===================================================================== +hdr3 = doc.add_table(rows=1, cols=2) +set_cell_text(hdr3.rows[0].cells[0], "BỘ Y TẾ", bold=False, align="center", size=12) +p = add_cell_para(hdr3.rows[0].cells[0], "ĐẠI HỌC Y DƯỢC\nTHÀNH PHỐ HỒ CHÍ MINH", bold=True, align="center", size=14) +set_paragraph_indent(p, left=-150, right=0) +p = set_cell_text(hdr3.rows[0].cells[1], "CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM", bold=True, align="left", size=10.7) +set_paragraph_indent(p, left=-150, right=0) +add_cell_para(hdr3.rows[0].cells[1], 
"Độc lập – Tự do – Hạnh phúc", bold=True, align="center", size=12) +set_cell_margins(hdr3.rows[0].cells[1], top=0, right=0, bottom=0, left=0) +for row in hdr3.rows: + for cell in row.cells: + set_cell_border(cell, top={"val": "nil"}, bottom={"val": "nil"}, left={"val": "nil"}, right={"val": "nil"}) + +add_para(doc, "Mẫu số 03", italic=True, align="right", size=12, before=12) +add_para(doc, "TP. Hồ Chí Minh, ngày {{ mau_03.ngay_ky.ngay }} tháng {{ mau_03.ngay_ky.thang }} năm {{ mau_03.ngay_ky.nam }}", italic=True, align="center") +add_para(doc, "BẢN XÁC NHẬN", bold=True, align="center", size=16, before=12) +add_para(doc, "TỶ LỆ (%) ĐÓNG GÓP VÀO VIỆC TẠO RA SÁNG KIẾN", bold=True, align="center", size=14, after=12) + +p = add_para(doc, "") +add_run(p, "1. Tên sáng kiến: ", bold=True) +add_run(p, "{{ mau_03.ten_sang_kien }}") + +p = add_para(doc, "") +add_run(p, "2. Tác giả chính / Đại diện nhóm tác giả sáng kiến: ", bold=True) +add_run(p, "{{ mau_03.tac_gia_chinh }}") + +p = add_para(doc, "") +add_run(p, "Chức vụ, đơn vị công tác: ", bold=True) +add_run(p, "{{ mau_03.chuc_vu_don_vi }}") + +add_para(doc, "Tỷ lệ đóng góp:", bold=True, before=6) + +t3 = doc.add_table(rows=4, cols=5) +t3.style = "Table Grid" +for i, h in enumerate(["STT", "Họ và tên", "Đơn vị công tác", "% đóng góp", "Chữ ký xác nhận"]): + set_cell_text(t3.rows[0].cells[i], h, bold=True, align="center") +set_cell_text(t3.rows[1].cells[0], "{%tr for item in mau_03.ty_le_dong_gop %}") +r3 = t3.rows[2].cells +set_cell_text(r3[0], "{{ item.stt }}", align="center") +set_cell_text(r3[1], "{{ item.ho_ten }}") +set_cell_text(r3[2], "{{ item.don_vi }}") +set_cell_text(r3[3], "{{ item.phan_tram }}", align="center") +set_cell_text(r3[4], "{{ item.chu_ky }}", align="center") +set_cell_text(t3.rows[3].cells[0], "{%tr endfor %}") + +# Total ("TỔNG") row: the label cell spans the first three columns +row_tong = t3.add_row().cells +set_cell_text(row_tong[0], "TỔNG", bold=True, align="center") +row_tong[0].merge(row_tong[1]).merge(row_tong[2]) +set_cell_text(row_tong[3], 
"100", bold=True, align="center") +set_cell_text(row_tong[4], "", align="center") + +add_para(doc, "Lưu ý:", bold=True, italic=True, before=12) +add_para(doc, "- Tổng % đóng góp phải bằng 100", italic=True) +add_para(doc, "- Sinh viên trong nhóm tác giả cần ghi rõ là \u201CSinh viên khoa/trường…., Đại học Y Dược TP.HCM\u201D", italic=True) +add_para(doc, "- Tác giả là người ngoài trường cần ghi đúng đơn vị công tác hiện tại hoặc đơn vị công tác tự do nếu chưa có đơn vị công tác cụ thể.", italic=True) + +add_para(doc, "", before=20) +add_para(doc, "TÁC GIẢ CHÍNH / ĐẠI DIỆN NHÓM TÁC GIẢ SÁNG KIẾN", bold=True, align="right") +add_para(doc, "(chữ ký và ghi rõ họ tên)", italic=True, align="right") +add_para(doc, "") +add_para(doc, "") +add_para(doc, "{{ mau_03.tac_gia_chinh_ky }}", bold=True, align="right") + +page_break(doc) + + +# ===================================================================== +# BẢN CAM KẾT +# ===================================================================== +add_para(doc, "CỘNG HOÀ XÃ HỘI CHỦ NGHĨA VIỆT NAM", bold=True, align="center") +add_para(doc, "Độc lập - Tự do - Hạnh phúc", bold=True, align="center") +add_para(doc, "TP. Hồ Chí Minh, ngày {{ ban_cam_ket.ngay_ky.ngay }} tháng {{ ban_cam_ket.ngay_ky.thang }} năm {{ ban_cam_ket.ngay_ky.nam }}", italic=True, align="center", before=6) + +add_para(doc, "BẢN CAM KẾT", bold=True, align="center", size=16, before=12) +add_para(doc, "(Áp dụng đối với cá nhân đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại Đại học Y Dược TP. Hồ Chí Minh năm {{ ban_cam_ket.nam_xet }} là tác giả của bài báo khoa học)", italic=True, align="center") + +add_para(doc, "I. 
THÔNG TIN CHỦ THỂ CAM KẾT:", bold=True, before=12) + +p = add_para(doc, "") +add_run(p, "Tác giả đăng ký sáng kiến: ") +add_run(p, "{{ ban_cam_ket.tac_gia_dang_ky }}") + +p = add_para(doc, "") +add_run(p, "CCCD/Hộ chiếu số: ") +add_run(p, "{{ ban_cam_ket.cccd }}") + +p = add_para(doc, "") +add_run(p, "Đơn vị: ") +add_run(p, "{{ ban_cam_ket.don_vi }}") + +p = add_para(doc, "") +add_run(p, "Tên Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH được đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD TP.HCM năm ") +add_run(p, "{{ ban_cam_ket.nam_xet }}", bold=True) +add_run(p, ": ") +add_run(p, "{{ ban_cam_ket.ten_bai_bao }}") + +p = add_para(doc, "", before=6) +add_run(p, "Với vai trò đối với bài báo ", italic=False) +add_run(p, "(☑ vào ô tương ứng)", italic=True) +add_run(p, ":") +add_para(doc, "{% if ban_cam_ket.vai_tro.tac_gia_chinh %}☑{% else %}☐{% endif %} Tác giả chính Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH.") +add_para(doc, "{% if ban_cam_ket.vai_tro.dong_tac_gia %}☑{% else %}☐{% endif %} Đồng tác giả Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH.") + +add_para(doc, "II. 
CAM KẾT NỘI DUNG (☑ vào ô tương ứng):", bold=True, before=12) + +add_para(doc, "- Quyền sở hữu đối với bài báo trong nước/quốc tế", bold=True, before=6) +add_para(doc, "{% if ban_cam_ket.cam_ket.quyen_so_huu_1 %}☑{% else %}☐{% endif %} Tôi là chủ sở hữu hợp pháp của bài báo hoặc được chủ sở hữu/đồng chủ sở hữu đồng ý cho sử dụng bài báo có tên nêu trên làm sản phẩm đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD.", align="justify") +add_para(doc, "{% if ban_cam_ket.cam_ket.quyen_so_huu_2 %}☑{% else %}☐{% endif %} Trường hợp bài báo là sản phẩm của nhiệm vụ NCKH: chủ sở hữu bài báo (cơ quan) đồng ý cho tác giả/nhóm tác giả sử dụng bài báo có tên nêu trên để đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD.", align="justify") + +add_para(doc, "- Đồng thuận của đồng tác giả bài báo trong nước/quốc tế", bold=True, before=6) +add_para(doc, "{% if ban_cam_ket.cam_ket.dong_thuan %}☑{% else %}☐{% endif %} Tất cả đồng tác giả đã biết, đồng ý và ký xác nhận cho phép Tác giả đăng ký sáng kiến được sử dụng bài báo có tên nêu trên để đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD.", align="justify") + +add_para(doc, "- Cam kết bài báo trong nước/quốc tế uy tín", bold=True, before=6) +add_para(doc, "{% if ban_cam_ket.cam_ket.bai_bao_uy_tin %}☑{% else %}☐{% endif %} Cá nhân đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD đối với bài báo trong nước/quốc tế cam kết bài báo không thuộc \u201CTạp chí săn mồi\u201D. Tôi xin chịu trách nhiệm kiểm tra, đối chiếu và cung cấp bằng chứng khi được yêu cầu.", align="justify") + +add_para(doc, "- Tuân thủ pháp luật sở hữu trí tuệ", bold=True, before=6) +add_para(doc, "{% if ban_cam_ket.cam_ket.tuan_thu_phap_luat %}☑{% else %}☐{% endif %} Tôi cam kết rằng việc sử dụng bài báo đăng ký xét công nhận sáng kiến tại ĐHYD sẽ không gây tranh chấp về: quyền tác giả/quyền liên quan, quyền sở hữu công nghiệp, tiết lộ bí mật kinh doanh, vi phạm bảo mật dữ liệu của bất kỳ bên thứ ba nào. 
Tôi chịu trách nhiệm trước pháp luật về tính trung thực, hợp pháp của hồ sơ.", align="justify") + +add_para(doc, "III. HẬU QUẢ PHÁP LÝ KHI THÔNG TIN KHÔNG TRUNG THỰC", bold=True, before=12) +add_para(doc, "Tôi xin cam kết chịu trách nhiệm đối với các thông tin kê khai nêu trên. Nếu thông tin được khai trong bản cam kết này không đúng thì tôi chấp nhận:", align="justify") +add_para(doc, "- Hủy kết quả công nhận sáng kiến đã được xét (nếu có);") +add_para(doc, "- Thu hồi, hủy các danh hiệu thi đua, khen thưởng, hoặc các quyền lợi phát sinh có sử dụng sáng kiến này để xét;") +add_para(doc, "- Xử lý theo quy định pháp luật hiện hành và theo quy chế/quy định của ĐHYD.") +add_para(doc, "Cam kết này có hiệu lực kể từ ngày ký và ràng buộc đối với cá nhân cam kết trong suốt thời gian xét công nhận sáng kiến và sau khi kết thúc 02 năm.", align="justify") + +add_para(doc, "", before=12) +add_para(doc, "NGƯỜI CAM KẾT", bold=True, align="right") +add_para(doc, "(Ký tên, ghi rõ họ tên)", italic=True, align="right") +add_para(doc, "") +add_para(doc, "") +add_para(doc, "{{ ban_cam_ket.nguoi_cam_ket }}", bold=True, align="right") + +doc.save(OUTPUT) +print(f"✓ Template saved to: {OUTPUT}") diff --git a/be0/src/be01/data_blank.json b/be0/src/be01/data_blank.json new file mode 100644 index 0000000..7d79600 --- /dev/null +++ b/be0/src/be01/data_blank.json @@ -0,0 +1,133 @@ +{ + "trang_bia": { + "ten_sang_kien": "", + "tac_gia": "", + "don_vi": "", + "thong_tin_lien_he": "", + "nam": "" + }, + "mau_01": { + "mo_dau": "", + "ten_sang_kien": "", + "linh_vuc_ap_dung": "", + "tinh_trang_da_biet": "", + "muc_dich": "", + "cac_buoc_thuc_hien": "", + "dieu_kien_ap_dung": "", + "linh_vuc_ap_dung_2": "", + "ket_qua_thu_duoc": "", + "danh_sach_ap_dung": [ + {"tt": "1", "ten_to_chuc": "", "dia_chi": "", "linh_vuc": ""} + ], + "tinh_moi": "", + "tinh_hieu_qua": { + "loi_ich_kinh_te": "", + "hieu_qua_giang_day": "", + "tang_nang_suat": "", + "nang_cao_hieu_qua": "", + "nang_cao_chat_luong": "", + 
"giam_chi_phi": "", + "cai_thien_moi_truong": "", + "bao_ve_suc_khoe": "", + "an_toan_lao_dong": "", + "nang_cao_nhan_thuc": "" + }, + "thong_tin_bao_mat": "", + "ngay_ky": {"ngay": "", "thang": "", "nam": ""}, + "lanh_dao_don_vi": "", + "tac_gia_sang_kien": "" + }, + "mau_02": { + "don_vi": "", + "danh_sach_tac_gia": [ + {"stt": "1", "ho_ten": "", "ngay_sinh": "", "noi_cong_tac": "", "chuc_danh": "", "trinh_do": "", "ty_le": ""} + ], + "ten_sang_kien": "", + "chu_dau_tu": "", + "linh_vuc_ap_dung": "", + "ngay_ap_dung": "", + "noi_dung": "", + "phan_loai": { + "giai_phap_ky_thuat": false, + "sang_kien_tu_nckh": false, + "sang_kien_tu_sach": false + }, + "thong_tin_bao_mat": "", + "dieu_kien_ap_dung": "", + "danh_gia_tac_gia": "", + "danh_gia_to_chuc": "", + "danh_sach_tham_gia": [ + {"stt": "1", "ho_ten": "", "ngay_sinh": "", "noi_cong_tac": "", "chuc_danh": "", "trinh_do": "", "noi_dung_ho_tro": ""} + ], + "ngay_ky": {"ngay": "", "thang": "", "nam": ""}, + "lanh_dao_don_vi": "", + "tac_gia_sang_kien": "" + }, + "mau_03": { + "ngay_ky": {"ngay": "", "thang": "", "nam": ""}, + "ten_sang_kien": "", + "tac_gia_chinh": "", + "chuc_vu_don_vi": "", + "ty_le_dong_gop": [ + {"stt": "1", "ho_ten": "", "don_vi": "", "phan_tram": "", "chu_ky": ""} + ], + "tac_gia_chinh_ky": "" + }, + "mau_04": { + "ten_sang_kien": "", + "tac_gia": "", + "chuc_vu_don_vi": "", + "tinh_moi": {"nhan_xet": "", "diem": ""}, + "tinh_hieu_qua": {"nhan_xet": "", "diem": ""}, + "tong_cong": "", + "ket_luan": "", + "ngay_ky": {"ngay": "", "thang": "", "nam": ""}, + "thanh_vien_hoi_dong": "" + }, + "ban_cam_ket": { + "ngay_ky": {"ngay": "", "thang": "", "nam": ""}, + "tac_gia_dang_ky": "", + "cccd": "", + "don_vi": "", + "ten_bai_bao": "", + "nam_xet": "", + "vai_tro": {"tac_gia_chinh": false, "dong_tac_gia": false}, + "cam_ket": { + "quyen_so_huu_1": false, + "quyen_so_huu_2": false, + "dong_thuan": false, + "bai_bao_uy_tin": false, + "tuan_thu_phap_luat": false + }, + "nguoi_cam_ket": "" + }, + 
"reference_material_honesty": { + "ngay_ky": {"ngay": "", "thang": "", "nam": ""}, + "tac_gia_dang_ky": "", + "cccd": "", + "don_vi": "", + "ten_tai_lieu": "", + "nam_xet": "", + "cam_ket": { + "thong_tin_trung_thuc": false, + "trach_nhiem_phap_luat": false, + "bo_sung_khi_yeu_cau": false + }, + "nguoi_cam_ket": "" + }, + "research_domestic_honesty": { + "ngay_ky": {"ngay": "", "thang": "", "nam": ""}, + "tieu_de_phu": "", + "tac_gia_dang_ky": "", + "cccd": "", + "don_vi": "", + "ten_bai_bao": "", + "nam_xet": "", + "cam_ket": { + "thong_tin_trung_thuc": false, + "trach_nhiem_phap_luat": false, + "bo_sung_khi_yeu_cau": false + }, + "nguoi_cam_ket": "" + } +} diff --git a/be0/src/be01/data_sop.json b/be0/src/be01/data_sop.json new file mode 100644 index 0000000..b939d08 --- /dev/null +++ b/be0/src/be01/data_sop.json @@ -0,0 +1,123 @@ +{ + "trang_bia": { + "ten_sang_kien": "Quy trình xét duyệt Đạo đức trong nghiên cứu trên động vật Đại học Y Dược thành phố Hồ Chí Minh", + "tac_gia": "Trần Hùng, Đỗ Thị Hồng Tươi, Trần Mạnh Hùng, Lê Thị Lan Phương, Trịnh Túy An, Võ Minh Tuấn, Trần Ngọc Đăng, Đỗ Quốc Vũ", + "don_vi": "Phòng Khoa học Công nghệ - Đại học Y Dược Thành phố Hồ Chí Minh", + "thong_tin_lien_he": "Đỗ Quốc Vũ – SĐT 0377318854; Email: doquocvu@ump.edu.vn", + "nam": "2025" + }, + "mau_01": { + "mo_dau": "Trong lĩnh vực nghiên cứu khoa học, việc sử dụng động vật có xương sống cho mục đích nghiên cứu, giảng dạy và dịch vụ khoa học công nghệ đã trở nên phổ biến và đóng vai trò quan trọng trong việc phát triển tri thức và ứng dụng thực tiễn. 
Tuy nhiên, việc thực hiện các nghiên cứu trên động vật cũng đặt ra nhiều vấn đề liên quan đến đạo đức và phúc lợi động vật.\nTại Đại học Y Dược Thành phố Hồ Chí Minh, hoạt động nghiên cứu trên động vật ngày càng phát triển nhưng vẫn còn tồn tại nhiều khó khăn, bất cập trong công tác xét duyệt đạo đức...\nDo đó, việc xây dựng và triển khai một sáng kiến về quy trình xét duyệt đạo đức trong nghiên cứu trên động vật đồng bộ, minh bạch, chuẩn hóa là rất cần thiết.", + "ten_sang_kien": "Quy trình xét duyệt Đạo đức trong nghiên cứu trên động vật Đại học Y Dược thành phố Hồ Chí Minh", + "linh_vuc_ap_dung": "Cải cách hành chính", + "tinh_trang_da_biet": "Trước khi ban hành và áp dụng quy trình xét duyệt đạo đức trong nghiên cứu trên động vật được chuẩn hóa, công tác xét duyệt tại Đại học Y Dược Thành phố Hồ Chí Minh còn tồn tại nhiều hạn chế và khó khăn cụ thể như: hiện trạng thủ công và thiếu đồng bộ; thiếu tiêu chí, chỉ tiêu đánh giá rõ ràng; thời gian xử lý kéo dài; quản lý lưu trữ hồ sơ chưa hiệu quả; nhận thức và trách nhiệm chưa đồng đều.", + "muc_dich": "Xây dựng và triển khai quy trình xét duyệt đạo đức trong nghiên cứu trên động vật đồng bộ, chuẩn hóa, minh bạch nhằm: Chuẩn hoá và tối ưu hoá quy trình xét duyệt hồ sơ; Đảm bảo tính minh bạch, tiêu chí đánh giá rõ ràng, tránh kéo dài thời gian xử lý hồ sơ; Đảm bảo chất lượng và nâng cao hiệu quả quản lý cũng như trách nhiệm tuân thủ đạo đức của các bên liên quan tại Đại học Y Dược TP. Hồ Chí Minh.", + "cac_buoc_thuc_hien": "Quy trình xét duyệt hồ sơ đạo đức trong nghiên cứu trên động vật tại Đại học Y Dược TP. Hồ Chí Minh được xây dựng và hoàn thiện qua 05 bước:\nBước 1. Thu thập cơ sở pháp lý: Tìm hiểu các văn bản, quy định của Bộ Y tế, trường và thông tin từ các phòng ban chức năng liên quan.\nBước 2. Soạn thảo quy trình: Xây dựng dự thảo quy trình xét duyệt đạo đức, sau đó gửi đến các đơn vị liên quan.\nBước 3. Lấy ý kiến và thống nhất: Tổ chức cuộc họp với lãnh đạo các đơn vị liên quan và Ban Giám hiệu.\nBước 4. 
Thẩm định: Điều chỉnh, hoàn thiện và gửi hồ sơ về Hội đồng thẩm định.\nBước 5. Hoàn thiện và ban hành: Điều chỉnh theo góp ý của Hội đồng thẩm định và ban hành Quy trình.", + "dieu_kien_ap_dung": "Đối tượng áp dụng: Tất cả các nghiên cứu trên động vật với đối tượng nghiên cứu là động vật có xương sống sử dụng cho mục đích nghiên cứu khoa học, giảng dạy hoặc hợp tác, dịch vụ khoa học công nghệ.\nPhạm vi áp dụng: Áp dụng đối với Hội đồng Đạo đức trong nghiên cứu trên động vật của Đại học Y Dược Thành phố Hồ Chí Minh, các cơ quan, đơn vị, tổ chức, cá nhân có triển khai hoạt động sử dụng động vật, nghiên cứu trên động vật.", + "linh_vuc_ap_dung_2": "Cải cách hành chính", + "ket_qua_thu_duoc": "Ngày 14/4/2025, Quy trình xét duyệt đạo đức trong nghiên cứu trên động vật chính thức được ban hành và triển khai toàn diện tại Đại học Y Dược TP. Hồ Chí Minh. Thời gian xét duyệt hồ sơ được rút ngắn rõ rệt; Tăng cường sự minh bạch và rõ ràng trong toàn bộ quá trình xét duyệt; Nâng cao hiệu quả quản lý và hỗ trợ của các đơn vị chức năng.", + "danh_sach_ap_dung": [ + { + "tt": "1", + "ten_to_chuc": "Đại học Y Dược TP. 
Hồ Chí Minh", + "dia_chi": "217 Hồng Bàng, P.11, Q.5, TP.HCM", + "linh_vuc": "Cải cách hành chính" + } + ], + "tinh_moi": "Sáng kiến lần đầu tiên xây dựng và chuẩn hóa quy trình xét duyệt đạo đức trong nghiên cứu trên động vật tại Đại học Y Dược TP.HCM: Quy trình được thiết lập theo hướng chuẩn hóa, hệ thống hóa và liên thông chặt chẽ giữa các đơn vị trong nhà trường; xác định rõ luồng xử lý từ người nộp hồ sơ → phòng Khoa học Công nghệ → Hội đồng đạo đức trên động vật → Ban Giám hiệu; áp dụng tiêu chí đánh giá minh bạch, khách quan.", + "tinh_hieu_qua": { + "loi_ich_kinh_te": "Sáng kiến giúp giảm thiểu các sai sót và việc chỉnh sửa hồ sơ nhiều lần, từ đó tiết kiệm chi phí nhân lực và vật liệu trong công tác xét duyệt.", + "hieu_qua_giang_day": "", + "tang_nang_suat": "Quy trình chuẩn hóa giúp cán bộ và nhà nghiên cứu dễ dàng chuẩn bị, nộp hồ sơ theo mẫu sẵn có và theo dõi tiến độ xử lý một cách hiệu quả, tiết kiệm thời gian và công sức.", + "nang_cao_hieu_qua": "Quy trình minh bạch và có tiêu chí đánh giá rõ ràng giúp nâng cao chất lượng xét duyệt, đồng thời tạo điều kiện thuận lợi và hỗ trợ tốt hơn cho các nhà nghiên cứu trong quá trình chuẩn bị hồ sơ.", + "nang_cao_chat_luong": "", + "giam_chi_phi": "", + "cai_thien_moi_truong": "Tạo điều kiện cho người nộp hồ sơ chủ động trong việc nộp hồ sơ và thành viên hội đồng có các biểu mẫu rõ ràng trong việc đánh giá hồ sơ.", + "bao_ve_suc_khoe": "", + "an_toan_lao_dong": "", + "nang_cao_nhan_thuc": "Tăng tính chủ động, chuyên nghiệp, nhận thức tầm quan trọng trong việc chuẩn bị hồ sơ của người nộp. Tăng sự phối hợp chặt chẽ với Hội đồng xét duyệt, nhà nghiên cứu trong việc tuân thủ quy trình và các quy định đạo đức trong nghiên cứu trên động vật." 
+ }, + "thong_tin_bao_mat": "Không", + "ngay_ky": {"ngay": "", "thang": "", "nam": ""}, + "lanh_dao_don_vi": "Trần Ngọc Đăng", + "tac_gia_sang_kien": "Đỗ Quốc Vũ" + }, + "mau_02": { + "don_vi": "Phòng Khoa học Công nghệ", + "danh_sach_tac_gia": [ + {"stt": "1", "ho_ten": "PGS.TS Trần Hùng", "ngay_sinh": "", "noi_cong_tac": "BM Dược Liệu - Khoa Dược", "chuc_danh": "Chủ tịch HĐĐ trên động vật, Giảng viên cao cấp", "trinh_do": "Tiến sĩ", "ty_le": "15%"}, + {"stt": "2", "ho_ten": "PGS.TS Đỗ Thị Hồng Tươi", "ngay_sinh": "", "noi_cong_tac": "Phòng TCCB BM Dược Lý – Khoa Dược", "chuc_danh": "Trưởng phòng, Giảng viên cao cấp", "trinh_do": "Tiến sĩ", "ty_le": "15%"}, + {"stt": "3", "ho_ten": "PGS.TS Trần Mạnh Hùng", "ngay_sinh": "", "noi_cong_tac": "BM Dược Lý – Khoa Dược", "chuc_danh": "Trưởng bộ môn", "trinh_do": "Tiến sĩ", "ty_le": "10%"}, + {"stt": "4", "ho_ten": "TS. Lê Thị Lan Phương", "ngay_sinh": "", "noi_cong_tac": "Khoa YHCT", "chuc_danh": "Phó trưởng khoa", "trinh_do": "Tiến sĩ", "ty_le": "10%"}, + {"stt": "5", "ho_ten": "TS. Trịnh Túy An", "ngay_sinh": "", "noi_cong_tac": "Trung tâm Sapharcen", "chuc_danh": "Nghiên cứu viên", "trinh_do": "Tiến sĩ", "ty_le": "15%"}, + {"stt": "6", "ho_ten": "GS.TS Võ Minh Tuấn", "ngay_sinh": "", "noi_cong_tac": "Bộ môn Sản, Khoa Y", "chuc_danh": "Phó trưởng Bộ môn", "trinh_do": "Tiến sĩ", "ty_le": "10%"}, + {"stt": "7", "ho_ten": "PGS.TS Trần Ngọc Đăng", "ngay_sinh": "", "noi_cong_tac": "Phòng KHCN", "chuc_danh": "Phó trưởng phòng", "trinh_do": "Tiến sĩ", "ty_le": "10%"}, + {"stt": "8", "ho_ten": "CN. 
Đỗ Quốc Vũ", "ngay_sinh": "14/09/1996", "noi_cong_tac": "Phòng KHCN", "chuc_danh": "Chuyên viên", "trinh_do": "Cử nhân", "ty_le": "15%"} + ], + "ten_sang_kien": "Quy trình xét duyệt Đạo đức trong nghiên cứu trên động vật Đại học Y Dược thành phố Hồ Chí Minh", + "chu_dau_tu": "Đại học Y Dược thành phố Hồ Chí Minh", + "linh_vuc_ap_dung": "Cải cách hành chính", + "ngay_ap_dung": "14/4/2025", + "noi_dung": "Quy trình xét duyệt hồ sơ đạo đức trong nghiên cứu trên động vật tại Đại học Y Dược TP. Hồ Chí Minh được xây dựng và hoàn thiện qua 05 bước. Ngày 14/4/2025 quy trình chính thức được ban hành, giúp rút ngắn thời gian xét duyệt, tăng cường minh bạch, nâng cao hiệu quả quản lý.", + "phan_loai": { + "giai_phap_ky_thuat": true, + "sang_kien_tu_nckh": false, + "sang_kien_tu_sach": false + }, + "thong_tin_bao_mat": "Không", + "dieu_kien_ap_dung": "Đối tượng áp dụng: Tất cả các nghiên cứu trên động vật với đối tượng nghiên cứu là động vật có xương sống. Phạm vi áp dụng: Hội đồng Đạo đức trong nghiên cứu trên động vật của ĐHYD TP.HCM và các đơn vị liên quan.", + "danh_gia_tac_gia": "Sáng kiến giúp tạo sự rõ ràng, minh bạch về các hồ sơ cần nộp để xét duyệt, giúp người nộp chuẩn bị hồ sơ đầy đủ, chính xác, tránh sai sót. Quy trình xác định rõ thời gian xét duyệt, từ đó rút ngắn thời gian xử lý hồ sơ. Sáng kiến còn giúp giảm chi phí và nhân lực, tạo điều kiện thuận lợi cho viên chức, người lao động và người học. 
Sự minh bạch và chuẩn hóa quy trình cũng nâng cao nhận thức, trách nhiệm và hiệu quả phối hợp giữa các đơn vị.", + "danh_gia_to_chuc": "", + "danh_sach_tham_gia": [ + {"stt": "1", "ho_ten": "", "ngay_sinh": "", "noi_cong_tac": "", "chuc_danh": "", "trinh_do": "", "noi_dung_ho_tro": ""} + ], + "ngay_ky": {"ngay": "", "thang": "", "nam": "2025"}, + "lanh_dao_don_vi": "Trần Ngọc Đăng", + "tac_gia_sang_kien": "Đỗ Quốc Vũ" + }, + "mau_03": { + "ngay_ky": {"ngay": "", "thang": "", "nam": "2025"}, + "ten_sang_kien": "Quy trình xét duyệt Đạo đức trong nghiên cứu trên động vật Đại học Y Dược thành phố Hồ Chí Minh", + "tac_gia_chinh": "Đỗ Quốc Vũ", + "chuc_vu_don_vi": "Chuyên viên, Phòng Khoa học Công nghệ - Đại học Y Dược TP.HCM", + "ty_le_dong_gop": [ + {"stt": "1", "ho_ten": "PGS.TS Trần Hùng", "don_vi": "BM Dược Liệu - Khoa Dược", "phan_tram": "15", "chu_ky": ""}, + {"stt": "2", "ho_ten": "PGS.TS Đỗ Thị Hồng Tươi", "don_vi": "Phòng TCCB - BM Dược Lý, Khoa Dược", "phan_tram": "15", "chu_ky": ""}, + {"stt": "3", "ho_ten": "PGS.TS Trần Mạnh Hùng", "don_vi": "BM Dược Lý - Khoa Dược", "phan_tram": "10", "chu_ky": ""}, + {"stt": "4", "ho_ten": "TS. Lê Thị Lan Phương", "don_vi": "Khoa YHCT", "phan_tram": "10", "chu_ky": ""}, + {"stt": "5", "ho_ten": "TS. Trịnh Túy An", "don_vi": "Trung tâm Sapharcen", "phan_tram": "15", "chu_ky": ""}, + {"stt": "6", "ho_ten": "GS.TS Võ Minh Tuấn", "don_vi": "Bộ môn Sản, Khoa Y", "phan_tram": "10", "chu_ky": ""}, + {"stt": "7", "ho_ten": "PGS.TS Trần Ngọc Đăng", "don_vi": "Phòng KHCN", "phan_tram": "10", "chu_ky": ""}, + {"stt": "8", "ho_ten": "CN. 
Đỗ Quốc Vũ", "don_vi": "Phòng KHCN", "phan_tram": "15", "chu_ky": ""} + ], + "tac_gia_chinh_ky": "Đỗ Quốc Vũ" + }, + "mau_04": { + "ten_sang_kien": "Quy trình xét duyệt Đạo đức trong nghiên cứu trên động vật Đại học Y Dược thành phố Hồ Chí Minh", + "tac_gia": "Đỗ Quốc Vũ và nhóm tác giả", + "chuc_vu_don_vi": "Chuyên viên, Phòng Khoa học Công nghệ - Đại học Y Dược TP.HCM", + "tinh_moi": {"nhan_xet": "", "diem": ""}, + "tinh_hieu_qua": {"nhan_xet": "", "diem": ""}, + "tong_cong": "", + "ket_luan": "", + "ngay_ky": {"ngay": "", "thang": "", "nam": "2025"}, + "thanh_vien_hoi_dong": "" + }, + "ban_cam_ket": { + "ngay_ky": {"ngay": "", "thang": "", "nam": "2026"}, + "tac_gia_dang_ky": "", + "cccd": "", + "don_vi": "", + "ten_bai_bao": "", + "nam_xet": "2026", + "vai_tro": {"tac_gia_chinh": false, "dong_tac_gia": false}, + "cam_ket": { + "quyen_so_huu_1": false, + "quyen_so_huu_2": false, + "dong_thuan": false, + "bai_bao_uy_tin": false, + "tuan_thu_phap_luat": false + }, + "nguoi_cam_ket": "" + } +} diff --git a/be0/src/be01/docx_normalize.py b/be0/src/be01/docx_normalize.py new file mode 100644 index 0000000..fd93781 --- /dev/null +++ b/be0/src/be01/docx_normalize.py @@ -0,0 +1,1405 @@ +"""Normalize OOXML in generated .docx bytes for safer layout in browsers (docx-preview) and print.""" + +from __future__ import annotations + +import io +import re +import zipfile +import xml.etree.ElementTree as ET +from copy import deepcopy + +# Self-closing trHeight, e.g. <w:trHeight w:val="..."/>
+_TR_HEIGHT_RE = re.compile(r"<w:trHeight\b[^>]*/>", re.IGNORECASE) + +# Optional paired form (unusual for trHeight but defensive) +_TR_HEIGHT_BLOCK_RE = re.compile( + r"<w:trHeight\b[^>]*>.*?</w:trHeight>", + re.IGNORECASE | re.DOTALL, +) + +_PARAGRAPH_RE = re.compile(r"<w:p\b[^>]*>.*?</w:p>", re.IGNORECASE | re.DOTALL) +_RUN_RE = re.compile(r"<w:r\b[^>]*>.*?</w:r>", re.IGNORECASE | re.DOTALL) +_RUN_SZ_RE = re.compile(r'(<w:sz\b[^>]*\bw:val=")(\d+)(")', re.IGNORECASE) +_RUN_SZCS_RE = re.compile(r'(<w:szCs\b[^>]*\bw:val=")(\d+)(")', re.IGNORECASE) + +_SHRINK_TARGET_PHRASES = ( + "ĐẠI HỌC Y DƯỢC THÀNH PHỐ HỒ CHÍ MINH", + "ĐẠI HỌC Y DƯỢCTHÀNH PHỐ HỒ CHÍ MINH", + "CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM", + "Phòng Khoa học Công nghệ", +) +_LEFT_SHIFT_TARGETS = ( + "ĐẠI HỌC Y DƯỢC", + "THÀNH PHỐ HỒ CHÍ MINH", + "PHÒNG KHOA HỌC CÔNG NGHỆ", + "{{ mau_02.don_vi|upper }}", +) + +_W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main" +_NS = {"w": _W_NS} + + +def _paragraph_plain_text(p: ET.Element) -> str: + return "".join((t.text or "") for t in p.findall(".//w:t", _NS)) + + +def _compact_no_inner_spaces(text: str) -> str: + """Collapse whitespace incl. 
NBSP / figure space so « BỘ Y TẾ » matches reliably.""" + t = ( + text.replace("\u00a0", " ") + .replace("\u2007", " ") + .replace("\u202f", " ") + .replace("\u3000", " ") + ) + return "".join(t.split()) + + +def _ensure_paragraph_centered_no_indent(p: ET.Element) -> None: + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + p_pr = ET.SubElement(p, f"{{{_W_NS}}}pPr") + + ind = p_pr.find("w:ind", _NS) + if ind is not None: + p_pr.remove(ind) + + jc = p_pr.find("w:jc", _NS) + if jc is None: + jc = ET.SubElement(p_pr, f"{{{_W_NS}}}jc") + jc.set(f"{{{_W_NS}}}val", "center") + + +def _ensure_run_times_new_roman(r_pr: ET.Element) -> None: + fonts = r_pr.find("w:rFonts", _NS) + if fonts is None: + fonts = ET.SubElement(r_pr, f"{{{_W_NS}}}rFonts") + tnr = "Times New Roman" + fonts.set(f"{{{_W_NS}}}ascii", tnr) + fonts.set(f"{{{_W_NS}}}hAnsi", tnr) + fonts.set(f"{{{_W_NS}}}cs", tnr) + fonts.set(f"{{{_W_NS}}}eastAsia", tnr) + + +def _ensure_run_bold(r_pr: ET.Element) -> None: + for tag in ("b", "bCs"): + el = r_pr.find(f"w:{tag}", _NS) + if el is None: + el = ET.SubElement(r_pr, f"{{{_W_NS}}}{tag}") + el.set(f"{{{_W_NS}}}val", "1") + + +def _ensure_run_not_bold(r_pr: ET.Element) -> None: + """Strip bold so the run renders regular weight. 
Removes the tag entirely (rather than + setting val="0") because the cover ministry line never inherits bold from the styles we + emit, and tests pin the post-normalization XML to having no <w:b> at all.""" + for tag in ("b", "bCs"): + el = r_pr.find(f"w:{tag}", _NS) + if el is not None: + r_pr.remove(el) + + +def _ensure_run_not_italic(r_pr: ET.Element) -> None: + """Force upright text with val="0" rather than removing the tag, so it overrides any + italic inherited from paragraph / character styles.""" + for tag in ("i", "iCs"): + el = r_pr.find(f"w:{tag}", _NS) + if el is None: + el = ET.SubElement(r_pr, f"{{{_W_NS}}}{tag}") + el.set(f"{{{_W_NS}}}val", "0") + + +def _run_has_text_content(run: ET.Element) -> bool: + for t in run.findall(".//w:t", _NS): + if (t.text or "").strip(): + return True + return False + + +def _local_tag(elem: ET.Element) -> str: + tag = elem.tag + return tag.split("}", 1)[-1] if "}" in tag else tag + + +# Letterhead: one paragraph with a soft break, or two stacked paragraphs.
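The two letterhead layouts named in the comment above can be illustrated with a minimal standalone sketch. It works on plain strings rather than `w:p` elements and mirrors the intent of `_compact_no_inner_spaces`, `_canonicalize_dhyd_typo` and `_is_university_letterhead_paragraph`; the names `canonical_compact` and `is_letterhead_text`, the extra NFC normalization step, and the use of compacted containment for the one-paragraph case are assumptions for illustration, not part of this commit.

```python
import unicodedata

# Hypothetical helpers (not in this commit) sketching the letterhead matching on plain strings.
DHYD = "ĐẠI HỌC Y DƯỢC"
TPHCM = "THÀNH PHỐ HỒ CHÍ MINH"
# NBSP, figure space, narrow NBSP, ideographic space: treated like ordinary spaces.
_SPACE_LIKE = ("\u00a0", "\u2007", "\u202f", "\u3000")


def canonical_compact(text: str) -> str:
    """Drop all whitespace and fold the « HỘC » typo onto « HỌC »."""
    # Assumption: NFC first, so decomposed input (base letter + combining marks)
    # compares equal to the precomposed letters used in the constants.
    text = unicodedata.normalize("NFC", text)
    for ch in _SPACE_LIKE:
        text = text.replace(ch, " ")
    text = text.replace("\u1ed8", "\u1ecc").replace("\u1ed9", "\u1ecd")  # Ộ→Ọ, ộ→ọ
    return "".join(text.split())


def is_letterhead_text(text: str) -> bool:
    compact = canonical_compact(text)
    dhyd, tphcm = canonical_compact(DHYD), canonical_compact(TPHCM)
    if dhyd in compact and tphcm in compact:
        return True  # layout 1: one paragraph carrying both lines (soft break between them)
    return compact in (dhyd, tphcm)  # layout 2: one of two stacked paragraphs
```

For instance, `is_letterhead_text("ĐẠI HỘC Y DƯỢC")` returns True because the « HỘC » typo is folded onto « HỌC » before comparison, whereas the ministry line `"BỘ Y TẾ"` matches neither layout.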
+_UNI_LINE_DHYD_COMPACT = "".join("ĐẠI HỌC Y DƯỢC".split()) +_UNI_LINE_TPHCM_COMPACT = "".join("THÀNH PHỐ HỒ CHÍ MINH".split()) + + +def _canonicalize_dhyd_typo(s: str) -> str: + """Map the official template's misspelled « HỘC » (Ộ = U+1ED8, Ô + dot below) onto the + correct « HỌC » (Ọ = U+1ECC, O + dot below) so all letterhead comparisons can use a + single canonical compact form regardless of which spelling the source DOCX carries.""" + return s.replace("\u1ed8", "\u1ecc").replace("\u1ed9", "\u1ecd") + + +def _is_university_letterhead_paragraph(para_text: str) -> bool: + n = ( + para_text.replace("\u00a0", " ") + .replace("\u2007", " ") + .replace("\u202f", " ") + .replace("\u3000", " ") + ) + canonical = _canonicalize_dhyd_typo(n) + if "ĐẠI HỌC Y DƯỢC" in canonical and "THÀNH PHỐ HỒ CHÍ MINH" in canonical: + return True + compact = "".join(canonical.split()) + return compact in (_UNI_LINE_DHYD_COMPACT, _UNI_LINE_TPHCM_COMPACT) + + +def _canonical_compact(text: str) -> str: + """Compact (drop all whitespace) and canonicalize the dhyd typo in one step.""" + return _canonicalize_dhyd_typo(_compact_no_inner_spaces(text)) + + +def _university_paragraph_has_soft_break(p: ET.Element) -> bool: + """True when a non-page <w:br> already separates the two letterhead phrases in + document order inside this paragraph (handles a soft break inside one run, or a + break between two runs).""" + type_attr = f"{{{_W_NS}}}type" + dhyd = _UNI_LINE_DHYD_COMPACT + tphcm = _UNI_LINE_TPHCM_COMPACT + seen_dhyd = False + for elem in p.iter(): + local = _local_tag(elem) + if local == "t": + compact = _canonical_compact(elem.text or "") + if not seen_dhyd and dhyd in compact: + seen_dhyd = True + tphcm_at = compact.find(tphcm) + if tphcm_at != -1 and tphcm_at > compact.find(dhyd): + # Both phrases sit in the same <w:t> with nothing between them.
+ return False + elif seen_dhyd and tphcm in compact: + return False + elif local == "br" and seen_dhyd: + if elem.attrib.get(type_attr) != "page": + return True + return False + + +def _ensure_university_letterhead_visual_split(p: ET.Element) -> None: + """Insert a soft <w:br/> immediately before the city-line phrase when both letterhead + phrases share a paragraph on one visual line. + + Idempotent: skips paragraphs that only carry one of the phrases (two-paragraph layout + is handled by the caller) and paragraphs already broken with a soft <w:br/>. + + Handles three shapes: + - both phrases inside a single <w:t> -> split that <w:t> + - phrases in two adjacent <w:t> in the same <w:r> -> insert <w:br/> between them + - phrases in <w:t> elements that live in different <w:r> -> insert <w:br/> inside + the run carrying the city line, right before its <w:t> + """ + full_compact = _canonical_compact(_paragraph_plain_text(p)) + if _UNI_LINE_DHYD_COMPACT not in full_compact: + return + if _UNI_LINE_TPHCM_COMPACT not in full_compact: + return + if _university_paragraph_has_soft_break(p): + return + + target_compact = _UNI_LINE_TPHCM_COMPACT + head_char = target_compact[0] + + for run in p.findall("w:r", _NS): + for child_idx, child in enumerate(list(run)): + if _local_tag(child) != "t": + continue + text = child.text or "" + if target_compact not in _canonical_compact(text): + continue + + split_at = -1 + for i, ch in enumerate(text): + if ch == head_char and _canonical_compact(text[i:]).startswith(target_compact): + split_at = i + break + if split_at < 0: + continue + + br = ET.Element(f"{{{_W_NS}}}br") + if split_at == 0: + run.insert(child_idx, br) + else: + before = text[:split_at].rstrip() + after = text[split_at:] + child.text = before + new_t = ET.Element(f"{{{_W_NS}}}t") + new_t.set("{http://www.w3.org/XML/1998/namespace}space", "preserve") + new_t.text = after + run.insert(child_idx + 1, br) + run.insert(child_idx + 2, new_t) + return + + +def _paragraphs_document_order(root: ET.Element) -> list[ET.Element]: + """All ``w:p`` in
serialization (depth-first) order.""" + return root.findall(".//w:p", _NS) + + +def _first_hard_page_break_paragraph_index(paragraphs: list[ET.Element]) -> int | None: + """Index of the first paragraph that contains ``w:br`` with ``type`` = ``page``.""" + type_attr = f"{{{_W_NS}}}type" + for i, p in enumerate(paragraphs): + for br in p.findall(".//w:br", _NS): + if br.attrib.get(type_attr) == "page": + return i + return None + + +def _paragraph_ids_first_physical_page(paragraphs: list[ET.Element]) -> set[int]: + """ + Paragraph object ids on the first printed page: everything up to and including the first + paragraph with a hard page break. If there is no hard break (rare), only the first 24 paragraphs + of the body are considered so duplicate letterheads in later tables keep template styling. + """ + brk = _first_hard_page_break_paragraph_index(paragraphs) + if brk is not None: + return {id(paragraphs[i]) for i in range(brk + 1)} + cap = min(len(paragraphs), 24) + return {id(paragraphs[i]) for i in range(cap)} + + +def strip_table_row_height_rules_from_docx(data: bytes) -> bytes: + """ + Remove `<w:trHeight>` from `word/document.xml` so rows use automatic height in Word OOXML. + + Fixed / atLeast heights are often mapped incorrectly by docx-preview to CSS `height`, which + makes wrapped Vietnamese text overlap inside table cells when exporting to PDF in the + browser. LibreOffice also lays out more predictably without contradictory height hints.
+ """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + text = raw.decode("utf-8") + text, n1 = _TR_HEIGHT_RE.subn("", text) + text, n2 = _TR_HEIGHT_BLOCK_RE.subn("", text) + if n1 or n2: + raw = text.encode("utf-8") + zout.writestr(info, raw) + return out.getvalue() + + +def _should_rewrite_distribute_jc_in_aux_word_part(filename: str) -> bool: + """Parts besides ``document.xml`` that may carry ``<w:jc w:val="distribute"/>`` (styles, notes, headers).""" + if filename == "word/document.xml": + return False + if not filename.startswith("word/") or not filename.endswith(".xml"): + return False + if filename in ( + "word/styles.xml", + "word/footnotes.xml", + "word/endnotes.xml", + "word/comments.xml", + ): + return True + base = filename.split("/")[-1] + if base.startswith("header") and base.endswith(".xml"): + return True + if base.startswith("footer") and base.endswith(".xml"): + return True + return False + + +def _patch_settings_add_do_not_expand_shift_return(raw: bytes) -> bytes: + """Ensure ``w:doNotExpandShiftReturn`` so lines ending in soft breaks are not stretched when justified.""" + val_attr = f"{{{_W_NS}}}val" + root = ET.fromstring(raw) + compat = root.find("w:compat", _NS) + if compat is None: + compat = ET.SubElement(root, f"{{{_W_NS}}}compat") + dnsr = None + for child in compat: + if _local_tag(child) == "doNotExpandShiftReturn": + dnsr = child + break + if dnsr is None: + dnsr = ET.SubElement(compat, f"{{{_W_NS}}}doNotExpandShiftReturn") + dnsr.set(val_attr, "1") + return ET.tostring(root, encoding="utf-8", xml_declaration=True) + + +def relax_justified_softbreak_paragraphs_in_docx(data: bytes) -> bytes: + """ + Cap word-spacing growth in justified paragraphs so word gaps stay closer to normal + word space (target « not more than a few spaces » between words
on short lines). + + Transforms: + + 1. ``<w:jc w:val="distribute"/>`` -> ``<w:jc w:val="both"/>`` everywhere it appears + (``document.xml``, ``styles.xml``, headers/footers, footnotes…). ``distribute`` also + adds inter-*character* pitch on every line and is the worst case for gappy Vietnamese. + 2. In ``word/document.xml`` only: justified (``both``) paragraphs that contain a non-page + soft ``<w:br/>`` are split into one paragraph per soft-break segment so each visual + line before a soft break becomes the last line of its own paragraph and is not stretched. + 3. ``word/settings.xml``: set compatibility ``w:doNotExpandShiftReturn`` so consumers that + honor it also skip stretching lines that end in a soft line break. + + The XML structure is preserved otherwise (runs, run properties, text content, page + breaks all survive unchanged); only soft ``<w:br/>`` elements are consumed during the + paragraph split. + """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + _convert_distribute_to_both(root) + _split_justified_paragraphs_at_softbreaks(root) + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + elif _should_rewrite_distribute_jc_in_aux_word_part(info.filename): + root = ET.fromstring(raw) + _convert_distribute_to_both(root) + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + elif info.filename == "word/settings.xml": + raw = _patch_settings_add_do_not_expand_shift_return(raw) + zout.writestr(info, raw) + return out.getvalue() + + +def _convert_distribute_to_both(root: ET.Element) -> None: + """Rewrite every ``<w:jc w:val="distribute"/>`` to ``both`` so the last line of each + paragraph stops being stretched to fill the column width.""" + val_attr = f"{{{_W_NS}}}val" + for jc in root.findall(".//w:jc", _NS): + if jc.attrib.get(val_attr) == "distribute": + jc.attrib[val_attr] =
"both" + + +def _paragraph_is_justified_both(p: ET.Element) -> bool: + val_attr = f"{{{_W_NS}}}val" + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + return False + jc = p_pr.find("w:jc", _NS) + if jc is None: + return False + return jc.attrib.get(val_attr) == "both" + + +def _paragraph_has_soft_break(p: ET.Element) -> bool: + type_attr = f"{{{_W_NS}}}type" + for br in p.findall(".//w:br", _NS): + if br.attrib.get(type_attr) != "page": + return True + return False + + +def _clone_paragraph_shell(p: ET.Element) -> ET.Element: + """Create an empty ``<w:p>`` element that copies ``p``'s attributes and a deep copy of + its ``<w:pPr>`` (so paragraph alignment, indent, spacing and paragraph-mark run properties + travel into the split fragments). Body content is not copied.""" + new_p = ET.Element(p.tag, dict(p.attrib)) + p_pr = p.find("w:pPr", _NS) + if p_pr is not None: + new_p.append(deepcopy(p_pr)) + return new_p + + +def _split_run_at_softbreak(run: ET.Element) -> list[ET.Element]: + """Split a single ``<w:r>`` containing one or more non-page ``<w:br/>`` into a list of + runs. The fragments are returned in document order, so callers can detect break points: each + consecutive pair represents content that a soft break separated in the original run.
Runs without a soft + break return a single-element list containing the original run.""" + type_attr = f"{{{_W_NS}}}type" + rPr = run.find("w:rPr", _NS) + body_children = [child for child in run if _local_tag(child) != "rPr"] + has_soft_break = any( + _local_tag(c) == "br" and c.attrib.get(type_attr) != "page" for c in body_children + ) + if not has_soft_break: + return [run] + + fragments: list[ET.Element] = [] + current = ET.Element(run.tag, dict(run.attrib)) + if rPr is not None: + current.append(deepcopy(rPr)) + + def _flush() -> None: + nonlocal current + fragments.append(current) + current = ET.Element(run.tag, dict(run.attrib)) + if rPr is not None: + current.append(deepcopy(rPr)) + + for child in body_children: + if _local_tag(child) == "br" and child.attrib.get(type_attr) != "page": + _flush() + continue + current.append(deepcopy(child)) + fragments.append(current) + return fragments + + +def _split_justified_paragraphs_at_softbreaks(root: ET.Element) -> None: + """For each paragraph with ``<w:jc w:val="both"/>`` that contains a non-page soft break, + replace the paragraph with N paragraphs, splitting at each soft break. + + Walks each ``<w:p>``'s direct children in document order. Soft breaks inside runs are + handled by :func:`_split_run_at_softbreak`; soft breaks that appear as direct paragraph + children (not schema-valid, but tolerated here) trigger a paragraph boundary directly. Paragraph property + elements (``<w:pPr>``) are deep-copied into every fragment so alignment / indent / + spacing survive.
+ """ + type_attr = f"{{{_W_NS}}}type" + parent_map = {child: parent for parent in root.iter() for child in parent} + + for p in list(root.findall(".//w:p", _NS)): + if not _paragraph_is_justified_both(p): + continue + if not _paragraph_has_soft_break(p): + continue + + fragments: list[ET.Element] = [_clone_paragraph_shell(p)] + + def _start_new_fragment() -> None: + fragments.append(_clone_paragraph_shell(p)) + + for child in list(p): + local = _local_tag(child) + if local == "pPr": + continue + if local == "br" and child.attrib.get(type_attr) != "page": + _start_new_fragment() + continue + if local == "r": + run_parts = _split_run_at_softbreak(child) + if len(run_parts) == 1: + fragments[-1].append(run_parts[0]) + continue + for idx, part in enumerate(run_parts): + if idx > 0: + _start_new_fragment() + fragments[-1].append(part) + continue + fragments[-1].append(deepcopy(child)) + + if len(fragments) < 2: + continue + + parent = parent_map.get(p) + if parent is None: + continue + siblings = list(parent) + try: + idx = siblings.index(p) + except ValueError: + continue + parent.remove(p) + for offset, frag in enumerate(fragments): + parent.insert(idx + offset, frag) + + +def shrink_overflow_sensitive_text_half_point(data: bytes) -> bytes: + """ + Reduce font size by 0.5pt (1 half-point in OOXML) for specific long phrases + that frequently wrap inside narrow header/table cells. 
+ """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + text = raw.decode("utf-8") + + def _patch_paragraph(match: re.Match[str]) -> str: + para = match.group(0) + para_text = "".join(re.findall(r"<w:t(?:\s[^>]*)?>(.*?)</w:t>", para, re.IGNORECASE | re.DOTALL)) + if not any(t in para_text for t in _SHRINK_TARGET_PHRASES): + return para + + def _patch_run(run_match: re.Match[str]) -> str: + run = run_match.group(0) + + def _dec_sz(m: re.Match[str]) -> str: + next_val = max(2, int(m.group(2)) - 1) + return f"{m.group(1)}{next_val}{m.group(3)}" + + run = _RUN_SZ_RE.sub(_dec_sz, run) + run = _RUN_SZCS_RE.sub(_dec_sz, run) + return run + + return _RUN_RE.sub(_patch_run, para) + + text = _PARAGRAPH_RE.sub(_patch_paragraph, text) + raw = text.encode("utf-8") + zout.writestr(info, raw) + return out.getvalue() + + +def style_national_header_line(data: bytes) -> bytes: + """ + Force `CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM` paragraphs to left alignment with a -270 twip left indent, and their runs to: + - 10.5pt (`w:sz` = 21 half-points) + - bold + - slightly condensed character spacing + """ + target = "CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM" + + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + for p in root.findall(".//w:p", _NS): + para_text = "".join((t.text or "") for t in p.findall(".//w:t", _NS)) + if target not in para_text: + continue + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + p_pr = ET.SubElement(p, f"{{{_W_NS}}}pPr") + + jc = p_pr.find("w:jc", _NS) + if jc is None: + jc = ET.SubElement(p_pr, f"{{{_W_NS}}}jc") + jc.set(f"{{{_W_NS}}}val", "left") + + ind = p_pr.find("w:ind", _NS) + if ind is None: + ind = ET.SubElement(p_pr,
f"{{{_W_NS}}}ind") + # Shift further left by ~10px (150 twips) from previous layout. + ind.set(f"{{{_W_NS}}}left", "-270") + ind.set(f"{{{_W_NS}}}right", "0") + + for r in p.findall("w:r", _NS): + rPr = r.find("w:rPr", _NS) + if rPr is None: + rPr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + + sz = rPr.find("w:sz", _NS) + if sz is None: + sz = ET.SubElement(rPr, f"{{{_W_NS}}}sz") + sz.set(f"{{{_W_NS}}}val", "21") # 10.5pt (closest OOXML half-point to 10.7) + + sz_cs = rPr.find("w:szCs", _NS) + if sz_cs is None: + sz_cs = ET.SubElement(rPr, f"{{{_W_NS}}}szCs") + sz_cs.set(f"{{{_W_NS}}}val", "21") + + b = rPr.find("w:b", _NS) + if b is None: + b = ET.SubElement(rPr, f"{{{_W_NS}}}b") + b.set(f"{{{_W_NS}}}val", "1") + + b_cs = rPr.find("w:bCs", _NS) + if b_cs is None: + b_cs = ET.SubElement(rPr, f"{{{_W_NS}}}bCs") + b_cs.set(f"{{{_W_NS}}}val", "1") + + spacing = rPr.find("w:spacing", _NS) + if spacing is None: + spacing = ET.SubElement(rPr, f"{{{_W_NS}}}spacing") + spacing.set(f"{{{_W_NS}}}val", "-6") + + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +def shift_selected_header_lines_left(data: bytes) -> bytes: + """Shift selected header/unit lines left by ~10px (150 twips).""" + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + for p in root.findall(".//w:p", _NS): + para_text = "".join((t.text or "") for t in p.findall(".//w:t", _NS)) + if not any(tok in para_text for tok in _LEFT_SHIFT_TARGETS): + continue + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + p_pr = ET.SubElement(p, f"{{{_W_NS}}}pPr") + ind = p_pr.find("w:ind", _NS) + if ind is None: + ind = ET.SubElement(p_pr, f"{{{_W_NS}}}ind") + ind.set(f"{{{_W_NS}}}left", "-150") + 
ind.set(f"{{{_W_NS}}}right", "0") + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +def normalize_bo_y_te_header_lines(data: bytes) -> bytes: + """ + First-page letterhead only: + + - Ministry line « BỘ Y TẾ »: centered, regular weight (bold tag stripped), upright. + - University block « ĐẠI HỘC Y DƯỢC » + « THÀNH PHỐ HỒ CHÍ MINH »: centered, bold, upright. + When the two phrases live in one paragraph on a single visual line, a soft <w:br/> is + inserted before the city line so the cover renders the phrases on two stacked lines. + + Later pages / tables that repeat the ministry block are left unchanged. + + Run after `shift_selected_header_lines_left`: this pass clears paragraph indents, so the negative indents applied there do not offset these blocks. + """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + paragraphs = _paragraphs_document_order(root) + first_page_ids = _paragraph_ids_first_physical_page(paragraphs) + for p in paragraphs: + if id(p) not in first_page_ids: + continue + para_text = _paragraph_plain_text(p) + compact = _compact_no_inner_spaces(para_text) + + if compact == "BỘYTẾ": + _ensure_paragraph_centered_no_indent(p) + for r in p.findall("w:r", _NS): + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + _ensure_run_not_bold(r_pr) + _ensure_run_not_italic(r_pr) + _ensure_run_times_new_roman(r_pr) + continue + + if _is_university_letterhead_paragraph(para_text): + _ensure_paragraph_centered_no_indent(p) + _ensure_university_letterhead_visual_split(p) + for r in p.findall("w:r", _NS): + if not _run_has_text_content(r): + continue + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + _ensure_run_bold(r_pr) +
_ensure_run_not_italic(r_pr) + _ensure_run_times_new_roman(r_pr) + + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +_DON_VI_PREFIX = "ĐƠN VỊ:" +_DON_VI_SIGNATURE_PREFIX = "Đơn vị " +_SIGNATURE_LEADER_PHRASE = "Xác nhận của lãnh đạo" +_NATIONAL_HEADER_PHRASE = "CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM" + + +def _table_plain_text(tbl: ET.Element) -> str: + return "".join((t.text or "") for t in tbl.findall(".//w:t", _NS)) + + +def _strip_text_prefix_from_paragraph(p: ET.Element, prefix: str) -> bool: + """Remove the first occurrence of ``prefix`` from the first <w:t> in this paragraph that contains it. Returns + True when the prefix was found and stripped. The resulting text is left-stripped + so the space that followed the prefix does not become a leading-space artifact.""" + for t in p.findall(".//w:t", _NS): + text = t.text or "" + idx = text.find(prefix) + if idx < 0: + continue + remainder = text[idx + len(prefix):].lstrip() + t.text = (text[:idx] + remainder).lstrip() + t.set("{http://www.w3.org/XML/1998/namespace}space", "preserve") + return True + return False + + +def _strip_don_vi_prefix_from_paragraph(p: ET.Element) -> None: + """Remove the literal « ĐƠN VỊ: » prefix from the first <w:t> that contains it; keep + the rest of the paragraph untouched (Jinja placeholder, trailing whitespace, etc.).""" + _strip_text_prefix_from_paragraph(p, _DON_VI_PREFIX) + + +def normalize_mau_02_letterhead_table(data: bytes) -> bytes: + """Tighten the Mẫu số 02 letterhead table so the four header lines align cleanly: + + - Right cell « CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM »: shrink font so the phrase fits + on a single line within the narrow right column, override « left » alignment set + by :func:`style_national_header_line` back to centered, and clear its negative + indent (which only made sense for the full-width header on other pages).
+ - Left cell « ĐẠI HỌC Y DƯỢC THÀNH PHỐ HỒ CHÍ MINH »: insert a soft <w:br/> before + « THÀNH PHỐ HỒ CHÍ MINH » so the phrase wraps cleanly into two stacked lines, and + strip bold from every run. + - Left cell « ĐƠN VỊ: {{ mau_02.don_vi }} »: drop the « ĐƠN VỊ: » literal prefix so + only the resolved unit name remains; force bold and strip italic on every run. + + Scoped to the Mẫu số 02 table by requiring both « ĐƠN VỊ: » and the national-header + phrase in the same <w:tbl>, so the duplicate letterhead blocks on Mẫu số 03 / Bản + cam kết (which have « BỘ Y TẾ » instead of « ĐƠN VỊ ») are left untouched. + """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + for tbl in root.findall(".//w:tbl", _NS): + all_text = _table_plain_text(tbl) + if _DON_VI_PREFIX not in all_text: + continue + if _NATIONAL_HEADER_PHRASE not in all_text: + continue + for p in tbl.findall(".//w:p", _NS): + para_text = _paragraph_plain_text(p) + + if _NATIONAL_HEADER_PHRASE in para_text: + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + p_pr = ET.SubElement(p, f"{{{_W_NS}}}pPr") + ind = p_pr.find("w:ind", _NS) + if ind is not None: + p_pr.remove(ind) + jc = p_pr.find("w:jc", _NS) + if jc is None: + jc = ET.SubElement(p_pr, f"{{{_W_NS}}}jc") + jc.set(f"{{{_W_NS}}}val", "center") + for r in p.findall("w:r", _NS): + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + sz = r_pr.find("w:sz", _NS) + if sz is None: + sz = ET.SubElement(r_pr, f"{{{_W_NS}}}sz") + sz.set(f"{{{_W_NS}}}val", "19") # 9.5pt + sz_cs = r_pr.find("w:szCs", _NS) + if sz_cs is None: + sz_cs = ET.SubElement(r_pr, f"{{{_W_NS}}}szCs") + sz_cs.set(f"{{{_W_NS}}}val", "19") + spacing = r_pr.find("w:spacing", _NS) + if spacing is None: + spacing =
ET.SubElement(r_pr, f"{{{_W_NS}}}spacing") + spacing.set(f"{{{_W_NS}}}val", "-8") + continue + + if _is_university_letterhead_paragraph(para_text): + _ensure_paragraph_centered_no_indent(p) + _ensure_university_letterhead_visual_split(p) + for r in p.findall("w:r", _NS): + if not _run_has_text_content(r): + continue + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + _ensure_run_not_bold(r_pr) + _ensure_run_not_italic(r_pr) + _ensure_run_times_new_roman(r_pr) + continue + + if _DON_VI_PREFIX in para_text: + _strip_don_vi_prefix_from_paragraph(p) + _ensure_paragraph_centered_no_indent(p) + for r in p.findall("w:r", _NS): + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + _ensure_run_bold(r_pr) + _ensure_run_not_italic(r_pr) + _ensure_run_times_new_roman(r_pr) + + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +_BO_Y_TE_COMPACT = "BỘYTẾ" + + +_MAU_04_HEADER_COMPACT = "Mẫusố04" + + +def _paragraph_has_page_break(p: ET.Element) -> bool: + """True when ``p`` contains at least one ``<w:br w:type="page"/>``.""" + type_attr = f"{{{_W_NS}}}type" + for br in p.findall(".//w:br", _NS): + if br.attrib.get(type_attr) == "page": + return True + return False + + +def _paragraph_is_empty_page_break(p: ET.Element) -> bool: + """True when the paragraph contains a page break and no visible text, i.e. its + only purpose is to force a page break. Such paragraphs render as a blank page in + docx-preview when followed by a table, even though LibreOffice/Word collapse them.
+ """ + if not _paragraph_has_page_break(p): + return False + for t in p.findall(".//w:t", _NS): + if (t.text or "").strip(): + return False + return True + + +def _ensure_pPr(p: ET.Element) -> ET.Element: + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + p_pr = ET.Element(f"{{{_W_NS}}}pPr") + p.insert(0, p_pr) + return p_pr + + +def _ensure_page_break_before(p: ET.Element) -> None: + """Add ``<w:pageBreakBefore/>`` to the paragraph's ``<w:pPr>`` (after any ``pStyle``, + ``keepNext`` or ``keepLines``, to satisfy the OOXML element order). Idempotent.""" + p_pr = _ensure_pPr(p) + if p_pr.find("w:pageBreakBefore", _NS) is not None: + return + pbb = ET.Element(f"{{{_W_NS}}}pageBreakBefore") + # Schema order inside pPr starts: pStyle, keepNext, keepLines, pageBreakBefore, ... + insert_at = 0 + for i, child in enumerate(p_pr): + if _local_tag(child) in ("pStyle", "keepNext", "keepLines"): + insert_at = i + 1 + p_pr.insert(insert_at, pbb) + + +def _first_paragraph_inside_table(tbl: ET.Element) -> ET.Element | None: + """Return the first ``<w:p>`` inside the first cell of the first row of ``tbl``, or + ``None`` if the table has no paragraph (should not happen: valid OOXML requires every cell to contain a paragraph).""" + first_row = tbl.find("w:tr", _NS) + if first_row is None: + return None + first_cell = first_row.find("w:tc", _NS) + if first_cell is None: + return None + return first_cell.find("w:p", _NS) + + +def collapse_empty_page_break_paragraphs_in_docx(data: bytes) -> bytes: + """Eliminate blank pages introduced by empty paragraphs whose only purpose is to + host a ``<w:br w:type="page"/>``. The browser-side docx-preview renderer treats such + paragraphs as occupying their own page when followed by a table (e.g. the rendered + docx showed « page 4 / 10 » as a blank sheet between the Mẫu số + 01 signature table and the Mẫu số 02 letterhead table), even though LibreOffice/Word collapse + them onto the surrounding pages. + + Transform pattern (per empty page-break paragraph ``P_break``): + + 1. Look at the *next* body sibling of ``P_break``: + - ``<w:p>``: add ``<w:pageBreakBefore/>`` to its ``<w:pPr>`` (idempotent), then + remove ``P_break`` from the body.
The next paragraph now starts on a new page. + - ``<w:tbl>``: add ``<w:pageBreakBefore/>`` to the ``<w:pPr>`` of the first + paragraph inside the first cell of the first row of the table, then remove + ``P_break``. The table is then anchored to a new page via the cell paragraph. + - Anything else / no next sibling: leave ``P_break`` untouched so the original + page break is preserved as a fallback. + + Idempotent: once a paragraph already carries ``<w:pageBreakBefore/>``, a second pass + does not add another one; once the empty break paragraph is removed, there is + nothing left to collapse. + """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + body = root.find("w:body", _NS) + if body is not None: + _collapse_empty_page_break_paragraphs(body) + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +def _collapse_empty_page_break_paragraphs(body: ET.Element) -> None: + """Body-direct-children scan: do not recurse into tables (page breaks inside table + cells are out of scope here).""" + children = list(body) + for i, child in enumerate(children): + if _local_tag(child) != "p": + continue + if not _paragraph_is_empty_page_break(child): + continue + nxt = children[i + 1] if i + 1 < len(children) else None + if nxt is None: + continue + + nxt_local = _local_tag(nxt) + if nxt_local == "p": + _ensure_page_break_before(nxt) + body.remove(child) + elif nxt_local == "tbl": + anchor = _first_paragraph_inside_table(nxt) + if anchor is None: + continue + _ensure_page_break_before(anchor) + body.remove(child) + + +def strip_mau_04_evaluation_section_in_docx(data: bytes) -> bytes: + """Remove the « Mẫu số 04 — PHIẾU ĐÁNH GIÁ SÁNG KIẾN » section from the applicant + template.
The council evaluation form is filled in the dedicated council UI + (``fe0/src/components/council/evaluation/InitiativeEvaluationForm.tsx``); the + applicant's submission package must not embed an empty council scorecard. + + Section boundaries inside ``word/document.xml``: + + - Locate the ``<w:p>`` whose plain text is « Mẫu số 04 » (compact match handles any + stray NBSP / figure-space variants). + - Walk backwards through the body children until we hit the nearest ``<w:p>`` (normally empty) that + contains a ``<w:br w:type="page"/>``; that page break ends the previous section's + page and starts the Mẫu số 04 page. Everything from that page-break paragraph up + to (and including) the Mẫu số 04 header is removed. + - Walk forwards from the Mẫu số 04 header through the body children until we hit + either the next ``<w:p>`` with a page break or a ``<w:sectPr>``. We remove + everything strictly before that boundary, so the page break leading into the next + section (« Bản cam kết ») is preserved and the section keeps its own page. + + Idempotent: a second pass finds no « Mẫu số 04 » paragraph and is a no-op.
+ """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + body = root.find("w:body", _NS) + if body is not None: + _strip_mau_04_section_from_body(body) + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +def _strip_mau_04_section_from_body(body: ET.Element) -> None: + """Locate the Mẫu số 04 header paragraph and remove the surrounding section bounded + by the page break before it and the page break (or ``<w:sectPr>``) after it.""" + children = list(body) + + target_idx: int | None = None + for i, child in enumerate(children): + if _local_tag(child) != "p": + continue + compact = _compact_no_inner_spaces(_paragraph_plain_text(child)).strip() + if compact == _MAU_04_HEADER_COMPACT: + target_idx = i + break + + if target_idx is None: + return + + # Walk backwards to find the page-break paragraph that opens the Mẫu số 04 page. + # Bail out (do nothing) if no page break is found, so we never strip into the + # previous section by mistake. + start_idx: int | None = None + for j in range(target_idx - 1, -1, -1): + child = children[j] + if _local_tag(child) == "p" and _paragraph_has_page_break(child): + start_idx = j + break + if start_idx is None: + return + + # Walk forwards from the Mẫu số 04 header until we hit either the next page break + # paragraph (separator into the next section) or a <w:sectPr>. Stop one element + # before that boundary so the boundary marker itself survives.
+ end_idx: int = len(children) - 1 + for j in range(target_idx + 1, len(children)): + child = children[j] + if _local_tag(child) == "sectPr": + end_idx = j - 1 + break + if _local_tag(child) == "p" and _paragraph_has_page_break(child): + end_idx = j - 1 + break + + for child in children[start_idx : end_idx + 1]: + body.remove(child) + + +def normalize_subsequent_letterhead_tables(data: bytes) -> bytes: + """Two letterhead tables on later pages (Mẫu số 03 + Bản cam kết) carry the same + three-line block « BỘ Y TẾ », « ĐẠI HỌC Y DƯỢC THÀNH PHỐ HỒ CHÍ MINH » and + « CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM ». First-page-only + :func:`normalize_bo_y_te_header_lines` skips them, so this pass mirrors that + treatment inside the table cells: + + - « BỘ Y TẾ »: strip bold, center. + - University block: insert a soft <w:br/> between the two phrases so the cell + wraps cleanly into two stacked lines; force bold; force upright and Times New + Roman; center. + - « CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM »: shrink to 9.5pt with condensed + character spacing and center so the phrase fits on a single line within the + narrow right cell (the same trick used for the Mẫu số 02 letterhead table). + + Scoped to tables containing both « BỘ Y TẾ » and the national-header phrase, so + the Mẫu số 02 letterhead table (which uses « ĐƠN VỊ: » instead of « BỘ Y TẾ ») + and unrelated tables are left untouched.
+ """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + for tbl in root.findall(".//w:tbl", _NS): + all_text = _table_plain_text(tbl) + if _NATIONAL_HEADER_PHRASE not in all_text: + continue + if _compact_no_inner_spaces(all_text).find(_BO_Y_TE_COMPACT) < 0: + continue + for p in tbl.findall(".//w:p", _NS): + para_text = _paragraph_plain_text(p) + compact = _compact_no_inner_spaces(para_text) + + if compact == _BO_Y_TE_COMPACT: + _ensure_paragraph_centered_no_indent(p) + for r in p.findall("w:r", _NS): + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + _ensure_run_not_bold(r_pr) + _ensure_run_not_italic(r_pr) + _ensure_run_times_new_roman(r_pr) + continue + + if _is_university_letterhead_paragraph(para_text): + _ensure_paragraph_centered_no_indent(p) + _ensure_university_letterhead_visual_split(p) + for r in p.findall("w:r", _NS): + if not _run_has_text_content(r): + continue + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + _ensure_run_bold(r_pr) + _ensure_run_not_italic(r_pr) + _ensure_run_times_new_roman(r_pr) + continue + + if _NATIONAL_HEADER_PHRASE in para_text: + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + p_pr = ET.SubElement(p, f"{{{_W_NS}}}pPr") + ind = p_pr.find("w:ind", _NS) + if ind is not None: + p_pr.remove(ind) + jc = p_pr.find("w:jc", _NS) + if jc is None: + jc = ET.SubElement(p_pr, f"{{{_W_NS}}}jc") + jc.set(f"{{{_W_NS}}}val", "center") + for r in p.findall("w:r", _NS): + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + sz = r_pr.find("w:sz", _NS) + if sz is None: + sz = ET.SubElement(r_pr, f"{{{_W_NS}}}sz") + sz.set(f"{{{_W_NS}}}val", "19") # 9.5pt + sz_cs = 
r_pr.find("w:szCs", _NS) + if sz_cs is None: + sz_cs = ET.SubElement(r_pr, f"{{{_W_NS}}}szCs") + sz_cs.set(f"{{{_W_NS}}}val", "19") + spacing = r_pr.find("w:spacing", _NS) + if spacing is None: + spacing = ET.SubElement(r_pr, f"{{{_W_NS}}}spacing") + spacing.set(f"{{{_W_NS}}}val", "-8") + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +def normalize_mau_02_signature_unit_prefix(data: bytes) -> bytes: + """In the Mẫu số 02 sign-off table (« Xác nhận của lãnh đạo / Tác giả sáng kiến »), + drop the literal « Đơn vị » word that the template prints before + ``{{ mau_02.don_vi }}`` so the rendered cell shows only the resolved unit name. + Forces the unit-name run bold and non-italic. + + Scoped to tables containing « Xác nhận của lãnh đạo », so the Mẫu số 03 column + header « Đơn vị công tác » and the trang bìa label « Đơn vị công tác: » remain + untouched. + """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + for tbl in root.findall(".//w:tbl", _NS): + all_text = _table_plain_text(tbl) + if _SIGNATURE_LEADER_PHRASE not in all_text: + continue + if _DON_VI_SIGNATURE_PREFIX not in all_text: + continue + for p in tbl.findall(".//w:p", _NS): + para_text = _paragraph_plain_text(p) + if not para_text.lstrip().startswith(_DON_VI_SIGNATURE_PREFIX): + continue + # Skip the header-style label « Đơn vị công tác » used elsewhere. 
+ if "công tác" in para_text: + continue + if not _strip_text_prefix_from_paragraph(p, _DON_VI_SIGNATURE_PREFIX): + continue + for r in p.findall("w:r", _NS): + if not _run_has_text_content(r): + continue + r_pr = r.find("w:rPr", _NS) + if r_pr is None: + r_pr = ET.SubElement(r, f"{{{_W_NS}}}rPr") + _ensure_run_bold(r_pr) + _ensure_run_not_italic(r_pr) + _ensure_run_times_new_roman(r_pr) + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +def normalize_mau_02_body_alignment_spacing(data: bytes) -> bytes: + """Normalize top-level paragraph alignment/spacing across the whole Mẫu số 02 section. + + The reference style is the compact, single-spaced body layout used in the sample + screenshot: headings keep their visual alignment, while regular body lines share + the same no-gap spacing so the page density remains consistent. + + Scope: + - Only `word/document.xml` + - Only top-level body paragraphs between `Mẫu số 02` and `Mẫu số 03` + - Table cell paragraphs are intentionally untouched + """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + body = root.find("w:body", _NS) + if body is not None: + children = list(body) + start_idx = None + end_idx = None + for idx, child in enumerate(children): + if _local_tag(child) != "p": + continue + text = _paragraph_plain_text(child).strip() + if start_idx is None and "Mẫu số 02" in text: + start_idx = idx + continue + if start_idx is not None and "Mẫu số 03" in text: + end_idx = idx + break + if start_idx is not None: + if end_idx is None: + end_idx = len(children) + for child in children[start_idx:end_idx]: + if _local_tag(child) != "p": + continue + p = child + text = _paragraph_plain_text(p).strip() + if not text: 
+ continue + + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + p_pr = ET.SubElement(p, f"{{{_W_NS}}}pPr") + + # Remove manual paragraph indentation so each line follows + # one consistent body column. + ind = p_pr.find("w:ind", _NS) + if ind is not None: + p_pr.remove(ind) + + spacing = p_pr.find("w:spacing", _NS) + if spacing is None: + spacing = ET.SubElement(p_pr, f"{{{_W_NS}}}spacing") + spacing.set(f"{{{_W_NS}}}before", "0") + spacing.set(f"{{{_W_NS}}}after", "0") + spacing.set(f"{{{_W_NS}}}line", "240") + spacing.set(f"{{{_W_NS}}}lineRule", "auto") + + jc = p_pr.find("w:jc", _NS) + if jc is None: + jc = ET.SubElement(p_pr, f"{{{_W_NS}}}jc") + + if "Mẫu số 02" in text: + jc.set(f"{{{_W_NS}}}val", "right") + spacing.set(f"{{{_W_NS}}}before", "120") + elif "ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN" in text: + jc.set(f"{{{_W_NS}}}val", "center") + spacing.set(f"{{{_W_NS}}}before", "80") + spacing.set(f"{{{_W_NS}}}after", "80") + elif text.startswith("Kính gửi:"): + jc.set(f"{{{_W_NS}}}val", "center") + elif text.startswith("TP. Hồ Chí Minh, ngày"): + jc.set(f"{{{_W_NS}}}val", "right") + spacing.set(f"{{{_W_NS}}}before", "80") + else: + jc.set(f"{{{_W_NS}}}val", "left") + + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +_SIGNATURE_DATE_PREFIX = "Tp. 
Hồ Chí Minh, ngày" + + +def _signature_table_date_already_lifted(tbl: ET.Element, col_count: int) -> bool: + """True when the table already starts with a single-cell gridSpan row hosting the date.""" + type_val_attr = f"{{{_W_NS}}}val" + for row in tbl.findall("w:tr", _NS): + cells = row.findall("w:tc", _NS) + if len(cells) != 1: + return False + tc_pr = cells[0].find("w:tcPr", _NS) + if tc_pr is None: + return False + grid_span = tc_pr.find("w:gridSpan", _NS) + if grid_span is None or grid_span.attrib.get(type_val_attr) != str(col_count): + return False + for p in cells[0].findall("w:p", _NS): + if _SIGNATURE_DATE_PREFIX in _paragraph_plain_text(p): + return True + return False + return False + + +def move_signature_date_to_top_row(data: bytes) -> bytes: + """Reflow each signature table so its «Tp. Hồ Chí Minh, ngày … tháng … năm …» paragraph + sits in its own row at the top of the table, hosted in a single cell that spans every + column. + + Two effects observable in the rendered PDF: + + - The date paragraph occupies a cell as wide as the table, so it fits on one visual line + (it was wrapping at half-cell width). + - The original row's other content rises one line, putting the signing-role labels + « LÃNH ĐẠO ĐƠN VỊ » and « Tác giả sáng kiến » on the same visual line. + + Idempotent: a second pass detects the new top row and is a no-op. Tables that do not + contain the date prefix are left untouched. 
+ """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/document.xml": + root = ET.fromstring(raw) + for tbl in root.findall(".//w:tbl", _NS): + grid = tbl.find("w:tblGrid", _NS) + if grid is None: + continue + col_count = len(grid.findall("w:gridCol", _NS)) + if col_count < 2: + continue + if _signature_table_date_already_lifted(tbl, col_count): + continue + + hit = None + for cell in tbl.findall(".//w:tc", _NS): + for p in cell.findall("w:p", _NS): + if _SIGNATURE_DATE_PREFIX in _paragraph_plain_text(p): + hit = (cell, p) + break + if hit: + break + if not hit: + continue + + cell, p = hit + cell.remove(p) + if cell.find("w:p", _NS) is None: + ET.SubElement(cell, f"{{{_W_NS}}}p") + + # Right-align the lifted paragraph and drop any negative indent so + # the full row width is available. + p_pr = p.find("w:pPr", _NS) + if p_pr is None: + p_pr = ET.Element(f"{{{_W_NS}}}pPr") + p.insert(0, p_pr) + ind = p_pr.find("w:ind", _NS) + if ind is not None: + p_pr.remove(ind) + jc = p_pr.find("w:jc", _NS) + if jc is None: + jc = ET.SubElement(p_pr, f"{{{_W_NS}}}jc") + jc.set(f"{{{_W_NS}}}val", "right") + + new_row = ET.Element(f"{{{_W_NS}}}tr") + new_cell = ET.SubElement(new_row, f"{{{_W_NS}}}tc") + new_tc_pr = ET.SubElement(new_cell, f"{{{_W_NS}}}tcPr") + grid_span = ET.SubElement(new_tc_pr, f"{{{_W_NS}}}gridSpan") + grid_span.set(f"{{{_W_NS}}}val", str(col_count)) + # Preserve the borderless look of the surrounding signature cells. 
+ tc_borders = ET.SubElement(new_tc_pr, f"{{{_W_NS}}}tcBorders") + for side in ("top", "left", "bottom", "right"): + b = ET.SubElement(tc_borders, f"{{{_W_NS}}}{side}") + b.set(f"{{{_W_NS}}}val", "nil") + new_cell.append(p) + + first_row_index = None + for idx, child in enumerate(list(tbl)): + if _local_tag(child) == "tr": + first_row_index = idx + break + if first_row_index is None: + tbl.append(new_row) + else: + tbl.insert(first_row_index, new_row) + + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() + + +def force_times_new_roman_in_styles_docx(data: bytes) -> bytes: + """ + Built-in paragraph styles still reference Calibri in `word/styles.xml`; use Times New Roman + and set docDefaults `rFonts` so inherited runs match when exporting to PDF. + """ + inp = io.BytesIO(data) + out = io.BytesIO() + with zipfile.ZipFile(inp, "r") as zin: + with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout: + for info in zin.infolist(): + raw = zin.read(info.filename) + if info.filename == "word/styles.xml": + root = ET.fromstring(raw) + for fonts in root.findall(".//w:rFonts", _NS): + for key in list(fonts.attrib.keys()): + if key.endswith("}ascii") or key.endswith( + "}hAnsi" + ) or key.endswith("}cs") or key.endswith("}eastAsia"): + fonts.attrib[key] = "Times New Roman" + rpr_default = root.find("./w:docDefaults/w:rPrDefault/w:rPr", _NS) + if rpr_default is not None: + fonts = rpr_default.find("w:rFonts", _NS) + if fonts is None: + fonts = ET.SubElement(rpr_default, f"{{{_W_NS}}}rFonts") + tnr = "Times New Roman" + fonts.set(f"{{{_W_NS}}}ascii", tnr) + fonts.set(f"{{{_W_NS}}}hAnsi", tnr) + fonts.set(f"{{{_W_NS}}}cs", tnr) + fonts.set(f"{{{_W_NS}}}eastAsia", tnr) + raw = ET.tostring(root, encoding="utf-8", xml_declaration=True) + zout.writestr(info, raw) + return out.getvalue() diff --git a/be0/src/be01/docx_to_pdf.py b/be0/src/be01/docx_to_pdf.py new file mode 100644 index 0000000..0d16b71 --- 
/dev/null +++ b/be0/src/be01/docx_to_pdf.py @@ -0,0 +1,96 @@ +"""Convert DOCX bytes to PDF using headless LibreOffice (layout closely approximates Word's PDF export).""" + +from __future__ import annotations + +import os +import shutil +import subprocess +import tempfile +from pathlib import Path + +from src.be01.docx_normalize import ( + relax_justified_softbreak_paragraphs_in_docx, + strip_table_row_height_rules_from_docx, +) + + +def resolve_libreoffice_soffice() -> str: + """Return the path to the `soffice`/`libreoffice` binary (LIBREOFFICE_PATH, then PATH, then the macOS app bundle).""" + env = (os.environ.get("LIBREOFFICE_PATH") or "").strip() + if env: + p = Path(env) + if p.is_file(): + return str(p.resolve()) + w = shutil.which(env) + if w: + return w + for name in ("soffice", "libreoffice"): + found = shutil.which(name) + if found: + return found + mac = Path("/Applications/LibreOffice.app/Contents/MacOS/soffice") + if mac.is_file(): + return str(mac) + raise FileNotFoundError( + "Không tìm thấy LibreOffice (soffice). Cài đặt libreoffice-writer-nogui hoặc đặt LIBREOFFICE_PATH." + ) + + +def convert_docx_bytes_to_pdf( + docx_bytes: bytes, + *, + timeout_sec: float = 120.0, + relax_justified_softbreaks: bool = True, + strip_table_row_heights: bool = False, +) -> bytes: + """ + Write the DOCX to a temp file, convert it with headless LibreOffice and return the PDF bytes. + + Uses the same rendering stack as a manual « Save as PDF » in LibreOffice (close to, but not identical with, Word's layout). 
+ + ``relax_justified_softbreaks`` defaults to ``True`` to run + :func:`~src.be01.docx_normalize.relax_justified_softbreak_paragraphs_in_docx` + (``distribute`` → ``both`` in document/styles/headers, split soft breaks in the body, + ``doNotExpandShiftReturn`` in settings) so LibreOffice and other Word-like renderers avoid excessive + inter-word gaps while keeping paragraphs justified. ``strip_table_row_heights=True`` first strips table row-height rules.
+ """ + if not docx_bytes or len(docx_bytes) < 100: + raise ValueError("DOCX payload quá nhỏ hoặc rỗng.") + + soffice = resolve_libreoffice_soffice() + if strip_table_row_heights: + docx_bytes = strip_table_row_height_rules_from_docx(docx_bytes) + if relax_justified_softbreaks: + docx_bytes = relax_justified_softbreak_paragraphs_in_docx(docx_bytes) + + with tempfile.TemporaryDirectory(prefix="docx2pdf-") as td: + td_path = Path(td) + docx_path = td_path / "document.docx" + docx_path.write_bytes(docx_bytes) + home = td_path / ".lo_home" + home.mkdir(parents=True, exist_ok=True) + cmd = [ + soffice, + "--headless", + "--nologo", + "--nodefault", + "--nofirststartwizard", + "--convert-to", + "pdf", + "--outdir", + str(td_path), + str(docx_path), + ] + env = {**os.environ, "HOME": str(home)} + proc = subprocess.run( + cmd, + capture_output=True, + text=True, + timeout=timeout_sec, + env=env, + ) + pdf_path = td_path / "document.pdf" + if proc.returncode != 0 or not pdf_path.is_file(): + err = (proc.stderr or proc.stdout or "").strip() or f"exit {proc.returncode}" + raise RuntimeError(f"LibreOffice không chuyển được DOCX sang PDF: {err}") + return pdf_path.read_bytes() diff --git a/be0/src/be01/export_applications_list_xlsx.py b/be0/src/be01/export_applications_list_xlsx.py new file mode 100644 index 0000000..ddec186 --- /dev/null +++ b/be0/src/be01/export_applications_list_xlsx.py @@ -0,0 +1,90 @@ +"""Excel export for the admin initiative list (danh sách sáng kiến); TT / MSSK codes are «YYYY-n», numbered per application year.""" + +from __future__ import annotations + +from collections import defaultdict +from datetime import datetime, timezone +from io import BytesIO +from typing import Any, Dict, List, Tuple + +import pandas as pd + + +def _calendar_year(row: Dict[str, Any]) -> int: + cy = row.get("calendarYear") + if isinstance(cy, int) and cy > 1900: + return cy + sd = str(row.get("submittedDate") or "") + if len(sd) >= 4 and sd[:4].isdigit(): + return int(sd[:4]) + return datetime.now(timezone.utc).year + +
+def _merit_xep_loai(row: Dict[str, Any]) -> str: + """Rating (xếp loại) aligned with the fe0 «Đã duyệt» (approved) view: approved research with 2.1.1 / 2.1.2 evidence (``international`` / ``domestic``) or textbook evidence of kind ``book`` (Sách giáo trình) → «Xuất sắc» (excellent); any other approved row → «Khá» (good); non-approved rows → an empty string.""" + if str(row.get("status") or "") != "approved": + return "" + ic = row.get("initiativeClassification") + rek = str(row.get("researchEvidenceKind") or row.get("research_evidence_kind") or "") + tek = str(row.get("textbookEvidenceKind") or row.get("textbook_evidence_kind") or "") + if ic == "research" and rek in ("international", "domestic"): + return "Xuất sắc" + if ic == "textbook" and tek == "book": + return "Xuất sắc" + # Every other approved row (research, textbook, technical or unclassified) + # is rated «Khá». + return "Khá" + + +def _authors_cell(row: Dict[str, Any], payload: Dict[str, Any]) -> str: + tabs = payload.get("tabs") if isinstance(payload.get("tabs"), dict) else {} + app_tab = tabs.get("application") if isinstance(tabs.get("application"), dict) else {} + authors = app_tab.get("authors") + if isinstance(authors, list) and authors: + names: List[str] = [] + for a in authors: + if isinstance(a, dict): + n = str(a.get("name") or "").strip() + if n: + names.append(n) + if names: + return ", ".join(names) + author = row.get("author") or {} + single = str((author.get("name") if isinstance(author, dict) else "") or "").strip() + return single or "—" + + +def _year_stt_codes(pairs: List[Tuple[Dict[str, Any], Dict[str, Any]]]) -> List[str]: + """Per calendar year, running index 1..n → «2026-1», «2026-2», …""" + by_year: Dict[int, int] = defaultdict(int) + out: List[str] = [] + for row, _ in pairs: + y = _calendar_year(row) + by_year[y] += 1 + out.append(f"{y}-{by_year[y]}") + return out + + +def build_applications_list_xlsx(pairs: List[Tuple[Dict[str, Any], Dict[str, Any]]]) -> bytes: + """ + One sheet with headers TT (No.), MSSK (initiative code), Tên sáng kiến (name), Tác giả (authors), Xếp loại (rating). + TT and MSSK both use «YYYY-n», where n counts up in export order and restarts at 1 each calendar year.
+ """ + codes = _year_stt_codes(pairs) + rows_out: List[Dict[str, Any]] = [] + for i, (row, payload) in enumerate(pairs): + code = codes[i] + rows_out.append( + { + "TT": code, + "MSSK": code, + "Tên sáng kiến": str(row.get("name") or ""), + "Tác giả": _authors_cell(row, payload), + "Xếp loại": _merit_xep_loai(row), + } + ) + df = pd.DataFrame(rows_out) + buf = BytesIO() + with pd.ExcelWriter(buf, engine="openpyxl") as writer: + df.to_excel(writer, index=False, sheet_name="Danh sách") + return buf.getvalue() diff --git a/be0/src/be01/fill_application_form.py b/be0/src/be01/fill_application_form.py new file mode 100644 index 0000000..4b689d0 --- /dev/null +++ b/be0/src/be01/fill_application_form.py @@ -0,0 +1,121 @@ +"""Fill `fe0/public/assets/template_application_form.docx` (docxtpl) from be01 `data_blank` context.""" + +from __future__ import annotations + +import io +import os +import re +import zipfile +from pathlib import Path +from typing import Any, Dict + + +def get_application_template_path() -> Path: + """Resolve the Word file used by the applicant « application form » preview.""" + env = (os.environ.get("TEMPLATE_APPLICATION_FORM_DOCX") or "").strip() + if env: + p = Path(env) + if p.is_file(): + return p + # This file is be0/src/be01/fill_application_form.py: parents[0..3] are be01, src, be0 + # and the repo root, so fall back to the template under the repo's fe0 SPA assets. + here = Path(__file__).resolve() + root = here.parents[3] + candidate = root / "fe0" / "public" / "assets" / "template_application_form.docx" + # In Docker the project root is often not mounted; set TEMPLATE_APPLICATION_FORM_DOCX (see docker-compose). + if candidate.is_file(): + return candidate + raise FileNotFoundError( + f"Không tìm thấy template_application_form.docx (đã tìm {candidate}. 
" + "Đặt biến môi trường TEMPLATE_APPLICATION_FORM_DOCX tới file .docx hợp lệ.)" + ) + + +def fill_application_form_docx(context: Dict[str, Any]) -> bytes: + """Run docxtpl render in memory; `context` is the be01 / `data_blank.json` object tree.""" + from docxtpl import DocxTemplate + from jinja2.exceptions import TemplateSyntaxError + from src.be01.docx_normalize import ( + collapse_empty_page_break_paragraphs_in_docx, + force_times_new_roman_in_styles_docx, + move_signature_date_to_top_row, + normalize_bo_y_te_header_lines, + normalize_mau_02_body_alignment_spacing, + normalize_mau_02_letterhead_table, + normalize_mau_02_signature_unit_prefix, + normalize_subsequent_letterhead_tables, + relax_justified_softbreak_paragraphs_in_docx, + shrink_overflow_sensitive_text_half_point, + shift_selected_header_lines_left, + strip_mau_04_evaluation_section_in_docx, + strip_table_row_height_rules_from_docx, + style_national_header_line, + ) + + def _rewrite_structural_docxtpl_tags(template_bytes: bytes) -> bytes: + """ + Convert docxtpl structural tags (`{%tr ... %}`, `{%tc ... %}`, etc.) to + plain Jinja tags so rendering can proceed when control tags were placed + in paragraphs instead of the expected OOXML container. 
+ """ + marker = re.compile(r"\{%\s*(tr|tc|p|r)\s+", flags=re.IGNORECASE) + in_buf = io.BytesIO(template_bytes) + out_buf = io.BytesIO() + with zipfile.ZipFile(in_buf, "r") as zin, zipfile.ZipFile(out_buf, "w") as zout: + for info in zin.infolist(): + data = zin.read(info.filename) + if info.filename.endswith(".xml"): + text = data.decode("utf-8", errors="ignore") + text = marker.sub("{% ", text) + data = text.encode("utf-8") + zout.writestr(info, data) + return out_buf.getvalue() + + template_path = get_application_template_path() + doc = DocxTemplate(str(template_path)) + try: + doc.render(context) + except TemplateSyntaxError as exc: + msg = str(exc).lower() + if not any(f"unknown tag '{tag}'" in msg for tag in ("tr", "tc", "p", "r")): + raise + patched_template = _rewrite_structural_docxtpl_tags(template_path.read_bytes()) + doc = DocxTemplate(io.BytesIO(patched_template)) + doc.render(context) + buf = io.BytesIO() + doc.docx.save(buf) + raw = buf.getvalue() + # Mẫu số 04 (« PHIẾU ĐÁNH GIÁ SÁNG KIẾN ») is filled by council members in the + # dedicated council UI (InitiativeEvaluationForm.tsx under /council). The applicant + # submission package must not embed a blank scorecard, so strip the section out of + # the rendered DOCX before any other normalization pass touches its letterhead. + raw = strip_mau_04_evaluation_section_in_docx(raw) + # docx-preview (the browser-side viewer) renders an empty paragraph whose only content + # is a page break as a blank page of its own when the next sibling is a table, which + # is the case between sections of the rendered application form (e.g. between the + # Mẫu số 01 signature table and the Mẫu số 02 letterhead table). Collapse those + # page-break paragraphs into an equivalent page break on the next content so + # docx-preview no longer inserts a blank page. LibreOffice/Word are unaffected since + # they treat both page-break encodings the same way at render time.
+ raw = collapse_empty_page_break_paragraphs_in_docx(raw) + raw = shrink_overflow_sensitive_text_half_point(raw) + raw = style_national_header_line(raw) + raw = shift_selected_header_lines_left(raw) + raw = normalize_bo_y_te_header_lines(raw) + raw = normalize_mau_02_body_alignment_spacing(raw) + raw = normalize_mau_02_letterhead_table(raw) + raw = normalize_mau_02_signature_unit_prefix(raw) + raw = normalize_subsequent_letterhead_tables(raw) + raw = move_signature_date_to_top_row(raw) + raw = force_times_new_roman_in_styles_docx(raw) + # Justified paragraphs that contain soft line breaks (or use distribute alignment) + # stretch short lines to the full column width, leaving wide gaps between words. + # Split such paragraphs at the soft breaks (keeping ``both`` justification, which leaves + # each paragraph's last line unstretched) and rewrite distribute -> both. + raw = relax_justified_softbreak_paragraphs_in_docx(raw) + + # Off by default for layout fidelity: row-height rules are stripped only when the + # DOCX_STRIP_TABLE_ROW_HEIGHTS env flag is set, to work around specific browser-fallback issues. + if (os.environ.get("DOCX_STRIP_TABLE_ROW_HEIGHTS") or "").strip().lower() in {"1", "true", "yes"}: + raw = strip_table_row_height_rules_from_docx(raw) + return raw diff --git a/be0/src/be01/fill_template.py b/be0/src/be01/fill_template.py new file mode 100644 index 0000000..77f575a --- /dev/null +++ b/be0/src/be01/fill_template.py @@ -0,0 +1,45 @@ +""" +Fill the template DOCX with data from a JSON file.
+ +Usage: + python fill_template.py <json_file> [<out_file>] + +Example: + python fill_template.py data_sop.json output.docx + python fill_template.py data_blank.json blank_output.docx + +Requirements: + pip install docxtpl +""" +import json +import sys +from pathlib import Path +from docxtpl import DocxTemplate + +TEMPLATE = Path(__file__).parent / "template_sang_kien.docx" + + +def fill(json_path: str, output_path: str = "filled.docx"): + # Load data + with open(json_path, "r", encoding="utf-8") as f: + context = json.load(f) + + # Newlines (\n) in string values are passed to docxtpl unchanged; this script + # does not convert them into explicit Word line breaks. If a field must show its + # line breaks reliably, wrap the value in docxtpl's Listing or RichText helper + # before rendering (docxtpl 0.16+ renders newlines passed through Listing or + # RichText). + + tpl = DocxTemplate(str(TEMPLATE)) + tpl.render(context) + tpl.save(output_path) + print(f"✓ Filled document saved: {output_path}") + + +if __name__ == "__main__": + if len(sys.argv) < 2: + print("Usage: python fill_template.py <json_file> [<out_file>]") + sys.exit(1) + json_file = sys.argv[1] + out_file = sys.argv[2] if len(sys.argv) > 2 else "filled.docx" + fill(json_file, out_file) diff --git a/be0/src/be01/filled_sop.docx b/be0/src/be01/filled_sop.docx new file mode 100644 index 0000000..3ad49aa Binary files /dev/null and b/be0/src/be01/filled_sop.docx differ diff --git a/be0/src/be01/official_to_data_blank.py b/be0/src/be01/official_to_data_blank.py new file mode 100644 index 0000000..2a8942b --- /dev/null +++ b/be0/src/be01/official_to_data_blank.py @@ -0,0 +1,414 @@ +"""Convert official Vietnamese-key ReviewPanel JSON into be01 `data_blank.json` shape.""" + +from __future__ import annotations + +import copy +import json +from pathlib import Path +from typing import Any, Dict + +_DATA_BLANK_PATH = Path(__file__).parent / "data_blank.json" + + +def _s(v: Any) -> str: + if v is None: + return "" + return 
str(v).strip() + + +def _resolve_don_vi_cong_tac(official: Dict[str, Any]) -> str: + """ + TRANG BÌA / Mẫu 02 « Đơn vị » for docxtpl (`trang_bia.don_vi`, `mau_02.don_vi`). + + Aligns with fe0 `resolveDonViCongTacDisplay`: explicit cover/Mẫu02 fields first, then + first author « Nơi công tác », then Mẫu 03 « Chức vụ, đơn vị », then first Mẫu 03 + table row. Older rows in DB may have empty « Đơn vị công tác » but filled author + workplace — this fills the Word context without re-saving from the SPA. + """ + bia = official.get("TRANG BÌA") if isinstance(official.get("TRANG BÌA"), dict) else {} + m02 = official.get("MẪU SỐ 02 - ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN") + m02 = m02 if isinstance(m02, dict) else {} + m03 = official.get("MẪU SỐ 03 - BẢN XÁC NHẬN TỶ LỆ (%) ĐÓNG GÓP VÀO VIỆC TẠO RA SÁNG KIẾN") + m03 = m03 if isinstance(m03, dict) else {} + + for val in (_s(bia.get("Đơn vị công tác")), _s(m02.get("Đơn vị"))): + if val: + return val + + dstg = m02.get("Danh sách tác giả") + if isinstance(dstg, list) and dstg: + row0 = dstg[0] + if isinstance(row0, dict): + noi = _s(row0.get("Nơi công tác")) + if noi: + return noi + + chuc = _s(m03.get("Chức vụ, đơn vị công tác")) + if chuc: + return chuc + + tyle = m03.get("Tỷ lệ đóng góp") + if isinstance(tyle, list): + for row in tyle: + if isinstance(row, dict): + dv = _s(row.get("Đơn vị công tác")) + if dv: + return dv + return "" + + +def _blank() -> Dict[str, Any]: + with open(_DATA_BLANK_PATH, "r", encoding="utf-8") as handle: + return json.load(handle) + + +def official_to_data_blank(official: Dict[str, Any]) -> Dict[str, Any]: + out = copy.deepcopy(_blank()) + don_vi_resolved = _resolve_don_vi_cong_tac(official) + + bia = official.get("TRANG BÌA") if isinstance(official.get("TRANG BÌA"), dict) else {} + out["trang_bia"]["ten_sang_kien"] = str(bia.get("Tên sáng kiến (Tiếng Việt)") or "") + out["trang_bia"]["tac_gia"] = str(bia.get("Tác giả/nhóm tác giả sáng kiến") or "") + out["trang_bia"]["don_vi"] = don_vi_resolved + 
out["trang_bia"]["thong_tin_lien_he"] = str(bia.get("Thông tin liên hệ (Điện thoại, Email)") or "") + out["trang_bia"]["nam"] = str(bia.get("Năm") or "") + + m01 = official.get("MẪU SỐ 01 - BÁO CÁO MÔ TẢ SÁNG KIẾN") + if isinstance(m01, dict): + out["mau_01"]["mo_dau"] = str(m01.get("1. Mở đầu") or "") + out["mau_01"]["ten_sang_kien"] = str( + m01.get("2. Tên sáng kiến (tên quy trình, giải pháp, phương pháp)") or "" + ) + out["mau_01"]["linh_vuc_ap_dung"] = str(m01.get("3. Lĩnh vực áp dụng của sáng kiến") or "") + m4 = m01.get("4. Mô tả sáng kiến") if isinstance(m01.get("4. Mô tả sáng kiến"), dict) else {} + out["mau_01"]["tinh_trang_da_biet"] = str( + m4.get("4.1 Tình trạng giải pháp đã biết hoặc hiện trạng công tác khi chưa có sáng kiến") or "" + ) + inner = ( + m4.get("4.2 Nội dung giải pháp đề nghị công nhận là sáng kiến") + if isinstance(m4.get("4.2 Nội dung giải pháp đề nghị công nhận là sáng kiến"), dict) + else {} + ) + out["mau_01"]["muc_dich"] = str(inner.get("Mục đích của sáng kiến") or "") + nd = inner.get("Về nội dung của sáng kiến") if isinstance(inner.get("Về nội dung của sáng kiến"), dict) else {} + out["mau_01"]["cac_buoc_thuc_hien"] = str(nd.get("Các bước thực hiện giải pháp") or "") + out["mau_01"]["dieu_kien_ap_dung"] = str(nd.get("Các điều kiện cần thiết để áp dụng giải pháp") or "") + out["mau_01"]["linh_vuc_ap_dung_2"] = str(nd.get("Lĩnh vực áp dụng") or "") + out["mau_01"]["ket_qua_thu_duoc"] = str(nd.get("Kết quả thu được") or "") + ds = nd.get("Danh sách đơn vị/cá nhân đã tham gia áp dụng thử hoặc lần đầu") + if isinstance(ds, list) and ds: + out["mau_01"]["danh_sach_ap_dung"] = [ + { + "tt": str(x.get("TT") or ""), + "ten_to_chuc": str(x.get("Tên tổ chức/cá nhân") or ""), + "dia_chi": str(x.get("Địa chỉ") or ""), + "linh_vuc": str(x.get("Lĩnh vực áp dụng sáng kiến") or ""), + } + for x in ds + if isinstance(x, dict) + ] or out["mau_01"]["danh_sach_ap_dung"] + out["mau_01"]["tinh_moi"] = str(inner.get("Về tính mới của sáng kiến") or "") + 
thq = inner.get("Về tính hiệu quả") if isinstance(inner.get("Về tính hiệu quả"), dict) else {} + out["mau_01"]["tinh_hieu_qua"]["loi_ich_kinh_te"] = str(thq.get("Tạo ra lợi ích kinh tế") or "") + out["mau_01"]["tinh_hieu_qua"]["hieu_qua_giang_day"] = str(thq.get("Đem lại hiệu quả trong giảng dạy") or "") + out["mau_01"]["tinh_hieu_qua"]["tang_nang_suat"] = str(thq.get("Tăng năng suất lao động") or "") + out["mau_01"]["tinh_hieu_qua"]["nang_cao_hieu_qua"] = str(thq.get("Nâng cao hiệu quả công việc") or "") + out["mau_01"]["tinh_hieu_qua"]["nang_cao_chat_luong"] = str( + thq.get("Nâng cao chất lượng công việc, dịch vụ") or "" + ) + out["mau_01"]["tinh_hieu_qua"]["giam_chi_phi"] = str(thq.get("Giảm chi phí") or "") + out["mau_01"]["tinh_hieu_qua"]["cai_thien_moi_truong"] = str( + thq.get("Cải thiện môi trường, điều kiện học tập, làm việc, sống") or "" + ) + out["mau_01"]["tinh_hieu_qua"]["bao_ve_suc_khoe"] = str(thq.get("Bảo vệ sức khỏe") or "") + out["mau_01"]["tinh_hieu_qua"]["an_toan_lao_dong"] = str(thq.get("Đảm bảo an toàn lao động, PCCC") or "") + out["mau_01"]["tinh_hieu_qua"]["nang_cao_nhan_thuc"] = str( + thq.get("Nâng cao khả năng, trình độ, nhận thức, trách nhiệm") or "" + ) + out["mau_01"]["thong_tin_bao_mat"] = str(m01.get("6. 
Những thông tin cần được bảo mật (nếu có)") or "") + nk = m01.get("Ngày ký") if isinstance(m01.get("Ngày ký"), dict) else {} + out["mau_01"]["ngay_ky"]["ngay"] = str(nk.get("Ngày") or "") + out["mau_01"]["ngay_ky"]["thang"] = str(nk.get("Tháng") or "") + out["mau_01"]["ngay_ky"]["nam"] = str(nk.get("Năm") or "") + out["mau_01"]["lanh_dao_don_vi"] = str(m01.get("Lãnh đạo đơn vị (Ký, ghi rõ họ tên)") or "") + out["mau_01"]["tac_gia_sang_kien"] = str(m01.get("Tác giả sáng kiến (Ký, ghi rõ họ tên)") or "") + + m02 = official.get("MẪU SỐ 02 - ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN") + if isinstance(m02, dict): + out["mau_02"]["don_vi"] = don_vi_resolved + dstg = m02.get("Danh sách tác giả") + if isinstance(dstg, list) and dstg: + out["mau_02"]["danh_sach_tac_gia"] = [ + { + "stt": str(x.get("STT") or ""), + "ho_ten": str(x.get("Họ và tên") or ""), + "ngay_sinh": str(x.get("Ngày tháng năm sinh") or ""), + "noi_cong_tac": str(x.get("Nơi công tác") or ""), + "chuc_danh": str(x.get("Chức danh") or ""), + "trinh_do": str(x.get("Trình độ chuyên môn") or ""), + "ty_le": str(x.get("Tỷ lệ (%) đóng góp vào việc tạo ra sáng kiến") or ""), + } + for x in dstg + if isinstance(x, dict) + ] or out["mau_02"]["danh_sach_tac_gia"] + out["mau_02"]["ten_sang_kien"] = str(m02.get("Tên sáng kiến đề nghị xét công nhận") or "") + out["mau_02"]["chu_dau_tu"] = str(m02.get("Chủ đầu tư tạo ra sáng kiến") or "") + out["mau_02"]["linh_vuc_ap_dung"] = str(m02.get("Lĩnh vực áp dụng sáng kiến") or "") + out["mau_02"]["ngay_ap_dung"] = str(m02.get("Ngày sáng kiến được áp dụng") or "") + out["mau_02"]["noi_dung"] = str(m02.get("Nội dung của sáng kiến") or "") + pl = m02.get("Phân loại sáng kiến (đánh dấu ☑)") + if isinstance(pl, dict): + k1 = "Giải pháp kỹ thuật, quản lý, tác nghiệp, ứng dụng tiến bộ kỹ thuật áp dụng cho ĐHYD TP.HCM" + k2 = "Sáng kiến – cải tiến kỹ thuật từ các nghiên cứu khoa học có kết quả được đăng tải trên các tạp chí, hội nghị trong nước và quốc tế" + k3 = "Sáng kiến – cải tiến kỹ thuật 
từ sách, giáo trình, tài liệu tham khảo" + out["mau_02"]["phan_loai"]["giai_phap_ky_thuat"] = bool(pl.get(k1)) + out["mau_02"]["phan_loai"]["sang_kien_tu_nckh"] = bool(pl.get(k2)) + out["mau_02"]["phan_loai"]["sang_kien_tu_sach"] = bool(pl.get(k3)) + out["mau_02"]["thong_tin_bao_mat"] = str(m02.get("Những thông tin cần được bảo mật (nếu có)") or "") + out["mau_02"]["dieu_kien_ap_dung"] = str(m02.get("Các điều kiện cần thiết để áp dụng sáng kiến") or "") + out["mau_02"]["danh_gia_tac_gia"] = str( + m02.get("Đánh giá lợi ích theo ý kiến của tác giả") or "" + ) + out["mau_02"]["danh_gia_to_chuc"] = str( + m02.get("Đánh giá lợi ích theo ý kiến của tổ chức, cá nhân đã tham gia áp dụng sáng kiến lần đầu") + or "" + ) + dsg = m02.get("Danh sách những người đã tham gia áp dụng thử hoặc áp dụng sáng kiến lần đầu") + if isinstance(dsg, list) and dsg: + out["mau_02"]["danh_sach_tham_gia"] = [ + { + "stt": str(x.get("Số TT") or ""), + "ho_ten": str(x.get("Họ và tên") or ""), + "ngay_sinh": str(x.get("Ngày tháng năm sinh") or ""), + "noi_cong_tac": str(x.get("Nơi công tác") or ""), + "chuc_danh": str(x.get("Chức danh") or ""), + "trinh_do": str(x.get("Trình độ chuyên môn") or ""), + "noi_dung_ho_tro": str(x.get("Nội dung công việc hỗ trợ") or ""), + } + for x in dsg + if isinstance(x, dict) + ] or out["mau_02"]["danh_sach_tham_gia"] + m02nk = m02.get("Ngày ký") if isinstance(m02.get("Ngày ký"), dict) else {} + out["mau_02"]["ngay_ky"]["ngay"] = str(m02nk.get("Ngày") or "") + out["mau_02"]["ngay_ky"]["thang"] = str(m02nk.get("Tháng") or "") + out["mau_02"]["ngay_ky"]["nam"] = str(m02nk.get("Năm") or "") + out["mau_02"]["lanh_dao_don_vi"] = str(m02.get("Xác nhận của lãnh đạo Đơn vị") or "") + out["mau_02"]["tac_gia_sang_kien"] = str(m02.get("Tác giả sáng kiến (Ký, ghi rõ họ tên)") or "") + + m03 = official.get("MẪU SỐ 03 - BẢN XÁC NHẬN TỶ LỆ (%) ĐÓNG GÓP VÀO VIỆC TẠO RA SÁNG KIẾN") + if isinstance(m03, dict): + m03nk = m03.get("Ngày ký") if isinstance(m03.get("Ngày ký"), dict) 
else {} + out["mau_03"]["ngay_ky"]["ngay"] = str(m03nk.get("Ngày") or "") + out["mau_03"]["ngay_ky"]["thang"] = str(m03nk.get("Tháng") or "") + out["mau_03"]["ngay_ky"]["nam"] = str(m03nk.get("Năm") or "") + out["mau_03"]["ten_sang_kien"] = str(m03.get("1. Tên sáng kiến") or "") + out["mau_03"]["tac_gia_chinh"] = str( + m03.get("2. Tác giả chính/Đại diện nhóm tác giả sáng kiến") or "" + ) + out["mau_03"]["chuc_vu_don_vi"] = str(m03.get("Chức vụ, đơn vị công tác") or "") + tyle = m03.get("Tỷ lệ đóng góp") + if isinstance(tyle, list) and tyle: + out["mau_03"]["ty_le_dong_gop"] = [ + { + "stt": str(x.get("STT") or ""), + "ho_ten": str(x.get("Họ và tên") or ""), + "don_vi": str(x.get("Đơn vị công tác") or ""), + "phan_tram": str(x.get("% đóng góp") or ""), + "chu_ky": str(x.get("Chữ ký xác nhận") or ""), + } + for x in tyle + if isinstance(x, dict) + ] or out["mau_03"]["ty_le_dong_gop"] + out["mau_03"]["tac_gia_chinh_ky"] = str( + m03.get("Tác giả chính/Đại diện nhóm tác giả sáng kiến (chữ ký và ghi rõ họ tên)") + or "" + ) + + m04 = official.get("MẪU SỐ 04 - PHIẾU ĐÁNH GIÁ SÁNG KIẾN") + if isinstance(m04, dict): + out["mau_04"]["ten_sang_kien"] = str(m04.get("1. Tên sáng kiến") or "") + out["mau_04"]["tac_gia"] = str(m04.get("2. Tác giả/đồng tác giả sáng kiến") or "") + out["mau_04"]["chuc_vu_don_vi"] = str(m04.get("Chức vụ, đơn vị công tác") or "") + ndg = m04.get("3. Nội dung đánh giá") if isinstance(m04.get("3. 
Nội dung đánh giá"), dict) else {} + tm = ndg.get("Tính mới (Tối đa 40 điểm)") if isinstance(ndg.get("Tính mới (Tối đa 40 điểm)"), dict) else {} + th = ( + ndg.get("Tính hiệu quả (Tối đa 60 điểm)") + if isinstance(ndg.get("Tính hiệu quả (Tối đa 60 điểm)"), dict) + else {} + ) + out["mau_04"]["tinh_moi"]["nhan_xet"] = str(tm.get("Nhận xét") or "") + out["mau_04"]["tinh_moi"]["diem"] = str(tm.get("Điểm chấm") or "") + out["mau_04"]["tinh_hieu_qua"]["nhan_xet"] = str(th.get("Nhận xét") or "") + out["mau_04"]["tinh_hieu_qua"]["diem"] = str(th.get("Điểm chấm") or "") + out["mau_04"]["tong_cong"] = str(ndg.get("Tổng cộng") or "") + out["mau_04"]["ket_luan"] = str(m04.get("Kết luận") or "") + m4nk = m04.get("Ngày ký") if isinstance(m04.get("Ngày ký"), dict) else {} + out["mau_04"]["ngay_ky"]["ngay"] = str(m4nk.get("Ngày") or "") + out["mau_04"]["ngay_ky"]["thang"] = str(m4nk.get("Tháng") or "") + out["mau_04"]["ngay_ky"]["nam"] = str(m4nk.get("Năm") or "") + out["mau_04"]["thanh_vien_hoi_dong"] = str( + m04.get("Thành viên Hội đồng (Ký, ghi rõ họ tên)") or "" + ) + + bck = official.get("BẢN CAM KẾT") + if isinstance(bck, dict): + bnk = bck.get("Ngày ký") if isinstance(bck.get("Ngày ký"), dict) else {} + out["ban_cam_ket"]["ngay_ky"]["ngay"] = str(bnk.get("Ngày") or "") + out["ban_cam_ket"]["ngay_ky"]["thang"] = str(bnk.get("Tháng") or "") + out["ban_cam_ket"]["ngay_ky"]["nam"] = str(bnk.get("Năm") or "") + i1 = bck.get("I. 
THÔNG TIN CHỦ THỂ CAM KẾT") + if isinstance(i1, dict): + out["ban_cam_ket"]["tac_gia_dang_ky"] = str(i1.get("Tác giả đăng ký sáng kiến") or "") + out["ban_cam_ket"]["cccd"] = str(i1.get("CCCD/Hộ chiếu số") or "") + out["ban_cam_ket"]["don_vi"] = str(i1.get("Đơn vị") or "") + ten_raw = i1.get( + "Tên Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH" + ) or i1.get("Tên Bài báo trong nước/quốc tế") + out["ban_cam_ket"]["ten_bai_bao"] = str(ten_raw or "") + out["ban_cam_ket"]["nam_xet"] = str(i1.get("Năm xét công nhận sáng kiến") or "") + vt = i1.get("Vai trò đối với bài báo (☑ vào ô tương ứng)") + if isinstance(vt, dict): + out["ban_cam_ket"]["vai_tro"]["tac_gia_chinh"] = bool( + vt.get("Tác giả chính Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH") + ) + out["ban_cam_ket"]["vai_tro"]["dong_tac_gia"] = bool( + vt.get("Đồng tác giả Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH") + ) + ii = bck.get("II. CAM KẾT NỘI DUNG (☑ vào ô tương ứng)") + if isinstance(ii, dict): + # Keys must match `bieu_mau_sang_kien_template.json` (numbered subsections). Legacy unnumbered keys kept as fallback. + quyen = ii.get("1. 
Quyền sở hữu đối với bài báo trong nước/quốc tế") + if not isinstance(quyen, dict): + quyen = ii.get("Quyền sở hữu đối với bài báo trong nước/quốc tế") + if isinstance(quyen, dict): + kq1_full = ( + "Tôi là chủ sở hữu hợp pháp của bài báo hoặc được chủ sở hữu/đồng chủ sở hữu đồng ý cho sử dụng bài báo có tên nêu trên làm sản phẩm đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD" + ) + kq1_short = "Tôi là chủ sở hữu hợp pháp của bài báo hoặc được chủ sở hữu/đồng chủ sở hữu đồng ý cho sử dụng bài báo" + kq2_full = ( + "Trường hợp bài báo là sản phẩm của nhiệm vụ NCKH: chủ sở hữu bài báo (cơ quan) đồng ý cho tác giả/nhóm tác giả sử dụng bài báo có tên nêu trên để đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD" + ) + kq2_short = "Trường hợp bài báo là sản phẩm của nhiệm vụ NCKH: chủ sở hữu bài báo (cơ quan) đồng ý cho tác giả/nhóm tác giả sử dụng" + out["ban_cam_ket"]["cam_ket"]["quyen_so_huu_1"] = bool( + quyen.get(kq1_full) or quyen.get(kq1_short) + ) + out["ban_cam_ket"]["cam_ket"]["quyen_so_huu_2"] = bool( + quyen.get(kq2_full) or quyen.get(kq2_short) + ) + dt = ii.get("2. Đồng thuận của đồng tác giả bài báo trong nước/quốc tế") + if not isinstance(dt, dict): + dt = ii.get("Đồng thuận của đồng tác giả bài báo trong nước/quốc tế") + if isinstance(dt, dict): + kd_full = ( + "Tất cả đồng tác giả đã biết, đồng ý và ký xác nhận cho phép Tác giả đăng ký sáng kiến được sử dụng bài báo có tên nêu trên để đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD" + ) + kd_short = "Tất cả đồng tác giả đã biết, đồng ý và ký xác nhận cho phép Tác giả đăng ký sáng kiến" + out["ban_cam_ket"]["cam_ket"]["dong_thuan"] = bool( + dt.get(kd_full) or dt.get(kd_short) + ) + uy = ii.get("3. 
Cam kết bài báo trong nước/quốc tế uy tín") + if not isinstance(uy, dict): + uy = ii.get("Cam kết bài báo trong nước/quốc tế uy tín") + if isinstance(uy, dict): + ku_full = ( + "Cá nhân đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD đối với bài báo trong nước/quốc tế cam kết bài báo không thuộc 'Tạp chí săn mồi'. Tôi xin chịu trách nhiệm kiểm tra, đối chiếu và cung cấp bằng chứng khi được yêu cầu" + ) + ku_short = "Cam kết bài báo không thuộc 'Tạp chí săn mồi'" + out["ban_cam_ket"]["cam_ket"]["bai_bao_uy_tin"] = bool( + uy.get(ku_full) or uy.get(ku_short) + ) + tt = ii.get("4. Tuân thủ pháp luật sở hữu trí tuệ") + if not isinstance(tt, dict): + tt = ii.get("Tuân thủ pháp luật sở hữu trí tuệ") + if isinstance(tt, dict): + kt_full = ( + "Tôi cam kết rằng việc sử dụng bài báo đăng ký xét công nhận sáng kiến tại ĐHYD sẽ không gây tranh chấp về: quyền tác giả/quyền liên quan, quyền sở hữu công nghiệp, tiết lộ bí mật kinh doanh, vi phạm bảo mật dữ liệu của bất kỳ bên thứ ba nào. Tôi chịu trách nhiệm trước pháp luật về tính trung thực, hợp pháp của hồ sơ" + ) + kt_short = "Cam kết việc sử dụng bài báo sẽ không gây tranh chấp về quyền tác giả, sở hữu trí tuệ, bí mật kinh doanh, bảo mật dữ liệu" + out["ban_cam_ket"]["cam_ket"]["tuan_thu_phap_luat"] = bool( + tt.get(kt_full) or tt.get(kt_short) + ) + out["ban_cam_ket"]["nguoi_cam_ket"] = str( + bck.get("Người cam kết (Ký tên, ghi rõ họ tên)") or "" + ) + + bx = official.get("BẢN XÁC NHẬN TÀI LIỆU THAM KHẢO (2.2.2)") + if isinstance(bx, dict): + bnk = bx.get("Ngày ký") if isinstance(bx.get("Ngày ký"), dict) else {} + out["reference_material_honesty"]["ngay_ky"]["ngay"] = str(bnk.get("Ngày") or "") + out["reference_material_honesty"]["ngay_ky"]["thang"] = str(bnk.get("Tháng") or "") + out["reference_material_honesty"]["ngay_ky"]["nam"] = str(bnk.get("Năm") or "") + i1 = bx.get("I. 
THÔNG TIN ĐĂNG KÝ") + if isinstance(i1, dict): + out["reference_material_honesty"]["tac_gia_dang_ky"] = str(i1.get("Tác giả đăng ký sáng kiến") or "") + out["reference_material_honesty"]["cccd"] = str(i1.get("CCCD/Hộ chiếu số") or "") + out["reference_material_honesty"]["don_vi"] = str(i1.get("Đơn vị") or "") + out["reference_material_honesty"]["ten_tai_lieu"] = str( + i1.get("Tên tài liệu tham khảo (theo Quyết định xuất bản)") or "" + ) + out["reference_material_honesty"]["nam_xet"] = str(i1.get("Năm xét công nhận sáng kiến") or "") + ii = bx.get("II. XÁC NHẬN VÀ CAM KẾT (☑ vào ô tương ứng)") + if isinstance(ii, dict): + _RM_ONE_FULL = ( + "Tôi cam đoan các thông tin kê khai và minh chứng đính kèm đối với tài liệu tham khảo là trung thực, đúng sự thật và phù hợp với Quyết định xuất bản trong giai đoạn quy định (15/4/2025–15/4/2026)." + ) + _RM_TWO_FULL = ( + "Tôi hoàn toàn chịu trách nhiệm trước pháp luật và trước nhà trường về tính hợp pháp của tài liệu và nội dung đăng ký." + ) + _RM_THREE_FULL = "Tôi đồng ý bổ sung hoặc chỉnh sửa hồ sơ khi được yêu cầu." + s1 = ii.get("1. Trung thực thông tin và minh chứng") + if isinstance(s1, dict): + out["reference_material_honesty"]["cam_ket"]["thong_tin_trung_thuc"] = bool(s1.get(_RM_ONE_FULL)) + s2 = ii.get("2. Trách nhiệm pháp luật") + if isinstance(s2, dict): + out["reference_material_honesty"]["cam_ket"]["trach_nhiem_phap_luat"] = bool(s2.get(_RM_TWO_FULL)) + s3 = ii.get("3. 
Bổ sung hồ sơ khi được yêu cầu") + if isinstance(s3, dict): + out["reference_material_honesty"]["cam_ket"]["bo_sung_khi_yeu_cau"] = bool(s3.get(_RM_THREE_FULL)) + out["reference_material_honesty"]["nguoi_cam_ket"] = str( + bx.get("Người cam kết (Ký tên, ghi rõ họ tên)") or "" + ) + + dj = official.get("BẢN XÁC NHẬN BÀI BÁO TRONG NƯỚC (2.1.2)") + if isinstance(dj, dict): + dnk = dj.get("Ngày ký") if isinstance(dj.get("Ngày ký"), dict) else {} + out["research_domestic_honesty"]["ngay_ky"]["ngay"] = str(dnk.get("Ngày") or "") + out["research_domestic_honesty"]["ngay_ky"]["thang"] = str(dnk.get("Tháng") or "") + out["research_domestic_honesty"]["ngay_ky"]["nam"] = str(dnk.get("Năm") or "") + _dj_sub = ( + "Tiêu đề phụ (Áp dụng đối với cá nhân đăng ký minh chứng là bài báo tạp chí trong nước nhóm 2.1.2)" + ) + out["research_domestic_honesty"]["tieu_de_phu"] = str(dj.get(_dj_sub) or "") + i1d = dj.get("I. THÔNG TIN ĐĂNG KÝ") + if isinstance(i1d, dict): + out["research_domestic_honesty"]["tac_gia_dang_ky"] = str(i1d.get("Tác giả đăng ký sáng kiến") or "") + out["research_domestic_honesty"]["cccd"] = str(i1d.get("CCCD/Hộ chiếu số") or "") + out["research_domestic_honesty"]["don_vi"] = str(i1d.get("Đơn vị") or "") + out["research_domestic_honesty"]["ten_bai_bao"] = str( + i1d.get("Tên bài báo (tạp chí trong nước, giai đoạn xuất bản quy định)") or "" + ) + out["research_domestic_honesty"]["nam_xet"] = str(i1d.get("Năm xét công nhận sáng kiến") or "") + iid = dj.get("II. XÁC NHẬN VÀ CAM KẾT (☑ vào ô tương ứng)") + if isinstance(iid, dict): + _DJ_ONE_FULL = ( + "Tôi cam đoan các thông tin kê khai và minh chứng đính kèm đối với bài báo trên tạp chí trong nước là trung thực, đúng sự thật và phù hợp với thời điểm xuất bản trong giai đoạn quy định (15/4/2025–15/4/2026)." + ) + _DJ_TWO_FULL = ( + "Tôi hoàn toàn chịu trách nhiệm trước pháp luật và trước nhà trường về tính hợp pháp của bài báo và nội dung đăng ký." 
+ ) + _DJ_THREE_FULL = "Tôi đồng ý bổ sung hoặc chỉnh sửa hồ sơ khi được yêu cầu." + d1 = iid.get("1. Trung thực thông tin và minh chứng") + if isinstance(d1, dict): + out["research_domestic_honesty"]["cam_ket"]["thong_tin_trung_thuc"] = bool(d1.get(_DJ_ONE_FULL)) + d2 = iid.get("2. Trách nhiệm pháp luật") + if isinstance(d2, dict): + out["research_domestic_honesty"]["cam_ket"]["trach_nhiem_phap_luat"] = bool(d2.get(_DJ_TWO_FULL)) + d3 = iid.get("3. Bổ sung hồ sơ khi được yêu cầu") + if isinstance(d3, dict): + out["research_domestic_honesty"]["cam_ket"]["bo_sung_khi_yeu_cau"] = bool(d3.get(_DJ_THREE_FULL)) + out["research_domestic_honesty"]["nguoi_cam_ket"] = str( + dj.get("Người cam kết (Ký tên, ghi rõ họ tên)") or "" + ) + + return out + diff --git a/be0/src/be01/template_sang_kien.docx b/be0/src/be01/template_sang_kien.docx new file mode 100644 index 0000000..f975a0b Binary files /dev/null and b/be0/src/be01/template_sang_kien.docx differ diff --git a/be0/src/chat_assistant.py b/be0/src/chat_assistant.py new file mode 100644 index 0000000..48e5212 --- /dev/null +++ b/be0/src/chat_assistant.py @@ -0,0 +1,344 @@ +""" +Chat Assistant Module +Implements a conversational AI assistant using Ollama for policy and compliance questions. 
+""" + +import ollama +from typing import List, Dict, Optional, Any +from pydantic import BaseModel +from fastapi import HTTPException +from src.utils import initialize_a_logger + + +class ChatMessage(BaseModel): + """Represents a single chat message.""" + role: str # "user" or "assistant" + content: str + + +class ChatRequest(BaseModel): + """Request model for chat messages.""" + message: str + conversation_history: Optional[List[ChatMessage]] = None + context: Optional[str] = None # Additional context about policies/documents + + +class ChatResponse(BaseModel): + """Response model for chat messages.""" + message: str + model: str + tokens_used: Optional[int] = None + + +class ChatAssistant: + """ + Chat Assistant for answering policy and compliance questions. + Uses Ollama to provide intelligent responses about IT governance, compliance, and policies. + """ + + def __init__(self, model_name: str = "qwen2.5:3b", config: Optional[Dict] = None): + """ + Initialize the Chat Assistant. + + Args: + model_name: Name of the Ollama model to use + config: Optional configuration dictionary + """ + self.model_name = model_name + self.config = config or {} + self.logger = initialize_a_logger('./logs/ChatAssistant.log') + self.logger.info(f"ChatAssistant initialized with model: {model_name}") + + # Check Ollama connectivity on initialization + self._check_ollama_connection() + + # System prompt for the assistant + self.system_prompt = """You are a helpful compliance and policy assistant. Your role is to: +1. Answer questions about IT governance, compliance policies, and regulatory requirements +2. Provide guidance on ISO 27001, NIST, GDPR, and other compliance frameworks +3. Help users understand workflow processes and requirements +4. Verify content against compliance standards +5. Be accurate, helpful, and concise in your responses + +Always provide clear, actionable advice. If you're unsure about something, say so rather than guessing. 
+When discussing compliance requirements, cite specific standards or frameworks when possible.""" + + def _check_ollama_connection(self): + """Check if Ollama is accessible and model is available.""" + try: + models = ollama.list() + # Newer ollama-python releases expose the tag under "model"; older ones under "name". + model_names = [m.get("model") or m.get("name", "") for m in models.get("models", [])] + self.logger.info(f"Ollama connected. Available models: {model_names}") + + # Check if our model is available + if self.model_name not in model_names: + self.logger.warning( + f"Model '{self.model_name}' not found in available models. " + f"Available models: {model_names}. " + f"Trying to use it anyway - it may need to be pulled." + ) + except Exception as e: + self.logger.error(f"Failed to connect to Ollama: {e}") + self.logger.warning( + "Ollama connection check failed. " + "The service may not be available. " + "Chat functionality may not work until Ollama is running." + ) + + def _build_messages( + self, + user_message: str, + conversation_history: Optional[List[ChatMessage]] = None, + context: Optional[str] = None + ) -> List[Dict[str, str]]: + """ + Build the message list for Ollama API.
+ + Args: + user_message: The current user message + conversation_history: Previous messages in the conversation + context: Additional context to include + + Returns: + List of message dictionaries for Ollama + """ + messages = [] + + # Add system prompt + messages.append({ + "role": "system", + "content": self.system_prompt + }) + + # Add context if provided + if context: + messages.append({ + "role": "system", + "content": f"Additional context: {context}" + }) + + # Add conversation history + if conversation_history: + for msg in conversation_history[-10:]: # Keep last 10 messages for context + messages.append({ + "role": msg.role, + "content": msg.content + }) + + # Add current user message + messages.append({ + "role": "user", + "content": user_message + }) + + return messages + + async def chat(self, request: ChatRequest) -> ChatResponse: + """ + Process a chat message and return a response. + + Args: + request: Chat request with message and optional history + + Returns: + Chat response with assistant's message + + Raises: + HTTPException: If the chat request fails + """ + try: + self.logger.info(f"Processing chat message: {request.message[:100] if request.message else 'Empty message'}...") + + # Validate request: an empty message is a client error (400), not a server failure (500) + if not request.message or not request.message.strip(): + raise HTTPException(status_code=400, detail="Message cannot be empty") + + # Build messages for Ollama + messages = self._build_messages( + user_message=request.message, + conversation_history=request.conversation_history, + context=request.context + ) + + self.logger.debug(f"Sending {len(messages)} messages to Ollama model: {self.model_name}") + + # Call Ollama API + try: + response = ollama.chat( + model=self.model_name, + messages=messages, + options={ + "temperature": 0.7, # Slightly creative for conversational responses + "top_p": 0.9, + } + ) + except ConnectionError as e: + self.logger.error(f"Ollama connection error: {e}", exc_info=True) + raise HTTPException( + status_code=503, + detail="Cannot connect to
Ollama service. Please ensure Ollama is running and accessible." + ) + except Exception as ollama_error: + error_str = str(ollama_error).lower() + self.logger.error(f"Ollama API error: {ollama_error}", exc_info=True) + + # Check if it's a connection error + if "connection" in error_str or "refused" in error_str or "connect" in error_str: + raise HTTPException( + status_code=503, + detail="Ollama service is not available. Please ensure Ollama is running on localhost:11434." + ) + # Check if model is not found + if "not found" in error_str or ("model" in error_str and "not" in error_str): + raise HTTPException( + status_code=404, + detail=f"Model '{self.model_name}' not found. Please ensure the model is available. Try: ollama pull {self.model_name}" + ) + # Generic Ollama error + raise HTTPException( + status_code=500, + detail=f"Ollama error: {str(ollama_error)}" + ) + + # Extract response content + assistant_message = response.get("message", {}).get("content", "") + + if not assistant_message: + self.logger.warning("Empty response from Ollama") + assistant_message = "I apologize, but I couldn't generate a response. Please try again." + + self.logger.info(f"Generated response: {assistant_message[:100]}...") + + return ChatResponse( + message=assistant_message, + model=self.model_name, + tokens_used=response.get("eval_count", 0) + ) + + except HTTPException: + # Re-raise HTTP exceptions as-is + raise + except Exception as e: + error_message = str(e) + self.logger.error(f"Error in chat: {error_message}", exc_info=True) + raise HTTPException( + status_code=500, + detail=f"Failed to generate chat response: {error_message}" + ) + + async def verify_content( + self, + field_name: str, + content: str, + verification_criteria: Optional[str] = None + ) -> ChatResponse: + """ + Verify content against compliance requirements. 
+ + Args: + field_name: Name of the field being verified + content: Content to verify + verification_criteria: Optional specific criteria to check against + + Returns: + Chat response with verification feedback + """ + try: + self.logger.info(f"Verifying content for field: {field_name}") + + # Build verification prompt + verification_prompt = f"""Please review and verify the following content from the field "{field_name}". + +Content to verify: +"{content}" + +Please provide: +1. Whether the content meets compliance requirements +2. Any suggestions for improvement +3. Any potential issues or concerns +4. Specific recommendations + +""" + + if verification_criteria: + verification_prompt += f"\nSpecific criteria to check:\n{verification_criteria}\n" + + verification_prompt += "\nProvide a detailed, helpful verification response." + + # Create chat request + chat_request = ChatRequest( + message=verification_prompt, + context=f"Content verification for field: {field_name}" + ) + + return await self.chat(chat_request) + + except HTTPException: + raise + except Exception as e: + error_message = str(e) + self.logger.error(f"Error verifying content: {error_message}", exc_info=True) + raise HTTPException( + status_code=500, + detail=f"Failed to verify content: {error_message}" + ) + + async def answer_policy_question( + self, + question: str, + policy_context: Optional[str] = None + ) -> ChatResponse: + """ + Answer a question about policies or compliance. 
+ + Args: + question: The user's question + policy_context: Optional context about specific policies + + Returns: + Chat response with the answer + """ + try: + self.logger.info(f"Answering policy question: {question[:100]}...") + + # Enhance question with policy context + enhanced_question = question + if policy_context: + enhanced_question = f"Context: {policy_context}\n\nQuestion: {question}" + + chat_request = ChatRequest( + message=enhanced_question, + context="Policy and compliance question answering" + ) + + return await self.chat(chat_request) + + except HTTPException: + raise + except Exception as e: + error_message = str(e) + self.logger.error(f"Error answering policy question: {error_message}", exc_info=True) + raise HTTPException( + status_code=500, + detail=f"Failed to answer question: {error_message}" + ) + + +# Global instance +_chat_assistant_instance: Optional[ChatAssistant] = None + + +def get_chat_assistant(model_name: str = "qwen2.5:3b") -> ChatAssistant: + """ + Get or create the global ChatAssistant instance. 
+ + Args: + model_name: Name of the Ollama model to use + + Returns: + ChatAssistant instance + """ + global _chat_assistant_instance + if _chat_assistant_instance is None: + _chat_assistant_instance = ChatAssistant(model_name=model_name) + return _chat_assistant_instance diff --git a/be0/src/compliance_verifier.py b/be0/src/compliance_verifier.py new file mode 100644 index 0000000..0fef8b7 --- /dev/null +++ b/be0/src/compliance_verifier.py @@ -0,0 +1,142 @@ + +import ollama + +from pathlib import Path +import uuid +import json +import asyncio +from enum import Enum +from fastapi import HTTPException +from pydantic import BaseModel, Field, validator +from typing import List, Dict, TypedDict, Literal, Any +import numpy as np +np.random.seed(42) +from src.utils import initialize_a_logger +from src.structure_analysis import StructureAnalyzer + + +class PromptRequest(BaseModel): + prompt: str + +class ComplianceRequest(BaseModel): + external_requirements: List[str] + internal_requirements: List[str] + + +class Compliance_Verifier(object): + def __init__(self, config=None): + self.config = config + self.logger = initialize_a_logger('./logs/ComplianceVerifier.log') + self.logger.debug(f"Compliance start") + self.structure_analyzer = StructureAnalyzer() + + async def generate_text(self, req: PromptRequest): + """ + Sends a prompt to the Qwen 2.5 3B model running on the local Ollama server. 
+ """ + model_name = "qwen2.5:3b" + + try: + response = ollama.chat( + model=model_name, + messages=[{'role': 'user', 'content': req.prompt}], + options={"temperature": 0.0}, + ) + + content = response.get("message", {}).get("content", "Error: No content found in response.") + + return {"oss_json": content} + except Exception as e: + # Raising an HTTPException will return the error to the user via FastAPI + error_message = str(e) + self.logger.error(f"Error generating text: {error_message}") + raise HTTPException(status_code=500, detail=error_message) + + async def vectorize_requirement(self, req: str): + # `req` is the plain requirement text; callers pass a joined string, not a PromptRequest + self.logger.debug(f"embed req : {req }") + try: + response = ollama.embeddings( + model="embeddinggemma:300m", + prompt=req + ) + + embedding = response["embedding"] + self.logger.debug(f"embedding : {embedding }") + + return { + "embedding_preview": embedding, + "total_dimensions": len(embedding), + "model": "embeddinggemma:300m" + } + + except Exception as e: + # `embedding` is unbound if the request itself failed, so log the exception instead + self.logger.error(f"Embedding request failed: {e}") + return {"error": str(e)} + + async def structural_similarity(self, data: ComplianceRequest) -> Dict[str, Any]: + req1 = " ".join(data.external_requirements) + req2 = " ".join(data.internal_requirements) + self.logger.debug(f"req1 : {req1 }") + self.logger.debug(f"req2 : {req2}") + try: + keywords_req1 = self.structure_analyzer.extract_keywords(req1) + keywords_req2 = self.structure_analyzer.extract_keywords(req2) + + self.logger.debug(f"keywords_req1 : {keywords_req1 }") + self.logger.debug(f"keywords_req2 : {keywords_req2}") + + common = list(set(keywords_req1) & set(keywords_req2)) + self.logger.debug(f"common : {common}") + + return { + "keywords_req1": keywords_req1, + "keywords_req2": keywords_req2, + "structure_match": common + } + except Exception as e: + self.logger.error(f"Failed to compute structure_match: {e}") + return { + "keywords_req1": [], + "keywords_req2": [], + "structure_match": -1 + } + + async def semantic_similarity(self, data:
ComplianceRequest) -> Dict[str, Any]: + req1 = " ".join(data.external_requirements) + req2 = " ".join(data.internal_requirements) + self.logger.debug(f"req1 : {req1 }") + self.logger.debug(f"req2 : {req2}") + try: + + result1 = await self.vectorize_requirement(req1) + result2 = await self.vectorize_requirement(req2) + self.logger.debug(f"result1 : {result1 }") + self.logger.debug(f"result2 : {result2 }") + if "error" in result1 or "error" in result2: + return { + "error": "Failed to generate embeddings", + "result1": result1, + "result2": result2 + } + + emb1 = np.array(result1.get('embedding_preview', [])) + emb2 = np.array(result2.get('embedding_preview', [])) + self.logger.debug(f"emb1: {emb1.shape }") + self.logger.debug(f"emb2: {emb2.shape }") + similarity = None + if len(emb1) > 0 and len(emb2) > 0: + # Cosine similarity; the epsilon belongs in the denominator to guard against a zero-norm vector + similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-9) + self.logger.debug(f"cosine similarity: {similarity}") + self.logger.info(f"similarity_score {similarity}") + return { + "requirement1": req1, + "requirement2": req2, + "similarity_score": float(similarity) if similarity is not None else None + } + except Exception as e: + self.logger.error(f"Failed to compute similarity_score: {e}") + return { + "requirement1": req1, + "requirement2": req2, + "similarity_score": -1 + } \ No newline at end of file diff --git a/be0/src/domain/__init__.py b/be0/src/domain/__init__.py new file mode 100644 index 0000000..1895df7 --- /dev/null +++ b/be0/src/domain/__init__.py @@ -0,0 +1,5 @@ +"""Domain layer — pure business model + rules, organized by bounded context. + +No imports of FastAPI, SQLAlchemy, aioboto3, jwt, argon2, or ``os.getenv``. The +domain expresses *what* is true; adapters in ``infrastructure`` provide *how*.
+""" diff --git a/be0/src/domain/identity/__init__.py b/be0/src/domain/identity/__init__.py new file mode 100644 index 0000000..51d29ee --- /dev/null +++ b/be0/src/domain/identity/__init__.py @@ -0,0 +1,6 @@ +"""Identity bounded context — Users, roles, credentials, authentication rules. + +Reference slice for the Clean Architecture refactor. The rules here are extracted +verbatim (behavior-preserving) from ``src/auth_api.py`` so the live monolith and the +layered version agree until each endpoint is cut over. +""" diff --git a/be0/src/domain/identity/entities.py b/be0/src/domain/identity/entities.py new file mode 100644 index 0000000..9cba7b1 --- /dev/null +++ b/be0/src/domain/identity/entities.py @@ -0,0 +1,41 @@ +"""The User aggregate — identity, credentials, roles, verification state.""" + +from __future__ import annotations + +import uuid +from dataclasses import dataclass, field + +from src.shared_kernel.entity import AggregateRoot + + +@dataclass(eq=False) +class User(AggregateRoot): + """User aggregate root. + + Mutable entity (``eq=False`` so identity-based equality from ``AggregateRoot`` + is kept). Holds the persisted ``password_hash``; verifying a plaintext password + is delegated to a ``PasswordHasher`` port in the application layer (the domain + never imports argon2). + """ + + id: uuid.UUID + email: str + full_name: str + password_hash: str + email_verified: bool + is_active: bool + credential_version: int + roles: tuple[str, ...] 
= () + phone: str | None = None + unit_id: uuid.UUID | None = field(default=None) + + def can_authenticate(self) -> bool: + """Account must be active to authenticate at all.""" + return self.is_active + + def requires_email_verification(self) -> bool: + return not self.email_verified + + def bump_credential_version(self) -> None: + """Invalidate all previously issued JWTs (password change/reset).""" + self.credential_version = int(self.credential_version or 0) + 1 diff --git a/be0/src/domain/identity/errors.py b/be0/src/domain/identity/errors.py new file mode 100644 index 0000000..4cdb831 --- /dev/null +++ b/be0/src/domain/identity/errors.py @@ -0,0 +1,25 @@ +"""Identity-context domain errors (subclasses carry the exact Vietnamese messages).""" + +from __future__ import annotations + +from src.shared_kernel.errors import ( + AuthenticationError, + AuthorizationError, + ValidationError, +) + + +class InvalidInstitutionalEmail(ValidationError): + """Email is not a valid @ump.edu.vn / @umc.edu.vn address.""" + + +class WeakPassword(ValidationError): + """Password failed the strength policy.""" + + +class InvalidCredentials(AuthenticationError): + """Email/password did not match an active account.""" + + +class EmailNotVerified(AuthorizationError): + """Account exists but the email has not been verified yet.""" diff --git a/be0/src/domain/identity/repository.py b/be0/src/domain/identity/repository.py new file mode 100644 index 0000000..7d193ef --- /dev/null +++ b/be0/src/domain/identity/repository.py @@ -0,0 +1,28 @@ +"""UserRepository PORT — the domain's contract; implemented in ``infrastructure``. + +The application layer depends on this Protocol, never on SQLAlchemy. The concrete +``SqlAlchemyUserRepository`` lives in ``infrastructure/identity`` and maps the ORM +``User`` row to the domain ``User`` aggregate. 
+""" + +from __future__ import annotations + +import uuid +from typing import Protocol, runtime_checkable + +from src.domain.identity.entities import User + + +@runtime_checkable +class UserRepository(Protocol): + async def get_by_email(self, email: str) -> User | None: + """Return the active user for ``email`` (already normalized) or None.""" + ... + + async def get_by_id(self, user_id: uuid.UUID) -> User | None: + """Return the active user by id or None.""" + ... + + async def roles_after_reconcile(self, user: User) -> list[str]: + """Apply the policy-admin reconcile, then return the user's sorted roles.""" + ... diff --git a/be0/src/domain/identity/services.py b/be0/src/domain/identity/services.py new file mode 100644 index 0000000..9e9e2c0 --- /dev/null +++ b/be0/src/domain/identity/services.py @@ -0,0 +1,94 @@ +"""Pure domain services for Identity — role policy + access-token claims. + +These are decisions, not side effects: they take plain data and return plain data +or an action enum. Persisting the decision (DB writes) and signing the token live in +infrastructure. Mirrors ``auth_api._policy_admin_emails``, ``_reconcile_policy_admin`` +and ``_issue_token``. +""" + +from __future__ import annotations + +import uuid +from datetime import datetime, timedelta +from enum import Enum, auto +from typing import Any + +# Default policy admins when AUTH_ADMIN_EMAILS is unset +# (must stay in sync with migration 007 cleanup list + auth_api._DEFAULT_POLICY_ADMIN_EMAILS). +DEFAULT_POLICY_ADMIN_EMAILS: frozenset[str] = frozenset( + { + "thaontt@ump.edu.vn", + "nltanh@ump.edu.vn", + "ldbaochau@ump.edu.vn", + "htchuong@ump.edu.vn", + } +) + + +def policy_admin_emails(auth_admin_emails_env: str | None) -> frozenset[str]: + """Emails that receive ``admin`` from institutional policy. + + If ``AUTH_ADMIN_EMAILS`` is set, ONLY that comma-separated list applies + (lowercased). If unset, the built-in UMP allow-list applies. 
Pure — the env + value is passed in by the caller (composition layer), not read here. + """ + raw = (auth_admin_emails_env or "").strip() + if raw: + return frozenset(part.strip().lower() for part in raw.split(",") if part.strip()) + return DEFAULT_POLICY_ADMIN_EMAILS + + +class AdminReconcileAction(Enum): + """What to do with a user's ``admin`` role row given email policy.""" + + none = auto() + add_admin = auto() # policy admin, no row → INSERT (admin_from_email_policy=True) + mark_policy = auto() # policy admin, row exists → ensure admin_from_email_policy=True + remove_admin = auto() # not policy admin, policy-granted row exists → DELETE + + +def reconcile_admin_action( + email: str, + policy_admins: frozenset[str], + has_admin_row: bool, + admin_from_policy: bool, +) -> AdminReconcileAction: + """Pure decision mirroring ``auth_api._reconcile_policy_admin``. + + Manual admin rows (``admin_from_email_policy=False``) are preserved when the + email is not allow-listed. + """ + email_norm = email.strip().lower() + if email_norm in policy_admins: + return ( + AdminReconcileAction.mark_policy + if has_admin_row + else AdminReconcileAction.add_admin + ) + if has_admin_row and admin_from_policy: + return AdminReconcileAction.remove_admin + return AdminReconcileAction.none + + +def build_access_token_claims( + user_id: uuid.UUID, + email: str, + roles: list[str], + credential_version: int, + now: datetime, + expire_hours: int, +) -> dict[str, Any]: + """Build HS256 JWT claims (mirror of ``auth_api._issue_token``); signing is infra. + + A session-scoped token carries exactly one identity + the active roles + ``cv`` + (credential version) so a password change invalidates it. 
+    """
+    exp = now + timedelta(hours=int(expire_hours))
+    return {
+        "sub": str(user_id),
+        "email": email,
+        "roles": roles,
+        "cv": int(credential_version),
+        "iat": int(now.timestamp()),
+        "exp": int(exp.timestamp()),
+    }
diff --git a/be0/src/domain/identity/value_objects.py b/be0/src/domain/identity/value_objects.py
new file mode 100644
index 0000000..c177337
--- /dev/null
+++ b/be0/src/domain/identity/value_objects.py
@@ -0,0 +1,74 @@
+"""Pure value objects + policy for the Identity context.
+
+Behavior mirrors ``src/auth_api.py`` exactly (same regex, same Vietnamese messages,
+same password rules) so the layered slice and the live monolith stay in lock-step.
+"""
+
+from __future__ import annotations
+
+import re
+from dataclasses import dataclass
+from enum import Enum
+
+from src.domain.identity.errors import InvalidInstitutionalEmail, WeakPassword
+from src.shared_kernel.value_object import ValueObject
+
+# Authoritative *domain* allow-list for UMP/UMC faculty email
+# (mirrors auth_api.INSTITUTIONAL_EMAIL_RE).
+_INSTITUTIONAL_EMAIL_RE = re.compile(
+    r"^[a-zA-Z0-9._%+-]+@(ump|umc)\.edu\.vn\Z", re.IGNORECASE
+)
+
+_INVALID_EMAIL_MSG = (
+    "Email phải là địa chỉ UMP hoặc UMC hợp lệ "
+    "(dạng ten@ump.edu.vn hoặc ten@umc.edu.vn)."
+)
+
+MAX_PASSWORD_INPUT_CHARS = 512
+
+
+class Role(str, Enum):
+    """The three canonical system roles (FE labels map to these 1:1)."""
+
+    admin = "admin"  # Quản trị viên (administrator)
+    editor = "editor"  # Hội đồng (council)
+    viewer = "viewer"  # Người nộp đơn (applicant)
+
+
+@dataclass(frozen=True)
+class InstitutionalEmail(ValueObject):
+    """A normalized, validated UMP/UMC institutional email."""
+
+    value: str
+
+    @classmethod
+    def parse(cls, raw: str) -> "InstitutionalEmail":
+        normalized = (raw or "").strip().lower()
+        if not _INSTITUTIONAL_EMAIL_RE.match(normalized):
+            raise InvalidInstitutionalEmail(_INVALID_EMAIL_MSG)
+        return cls(normalized)
+
+    def __str__(self) -> str:  # pragma: no cover - trivial
+        return self.value
+
+
+def assert_password_policy(password: str) -> None:
+    """Raise :class:`WeakPassword` if ``password`` violates the policy.
+
+    Exact mirror of ``auth_api._assert_password_policy`` — messages preserved so
+    API responses are identical pre/post cut-over.
+    """
+    if len(password) < 6:
+        raise WeakPassword("Mật khẩu tối thiểu 6 ký tự.")
+    if len(password) > MAX_PASSWORD_INPUT_CHARS:
+        raise WeakPassword("Mật khẩu quá dài.")
+    if not re.search(r"[a-z]", password):
+        raise WeakPassword("Mật khẩu phải có ít nhất một chữ cái thường.")
+    if not re.search(r"[A-Z]", password):
+        raise WeakPassword("Mật khẩu phải có ít nhất một chữ cái hoa.")
+    if not re.search(r"\d", password):
+        raise WeakPassword("Mật khẩu phải có ít nhất một chữ số.")
+    if not re.search(r"[^A-Za-z0-9]", password):
+        raise WeakPassword(
+            "Mật khẩu phải có ít nhất một ký tự đặc biệt (không chỉ chữ và số)."
+        )
diff --git a/be0/src/imagehub_routes.py b/be0/src/imagehub_routes.py
new file mode 100644
index 0000000..000719c
--- /dev/null
+++ b/be0/src/imagehub_routes.py
@@ -0,0 +1,1981 @@
+"""ImageHub — content-addressed imaging dataset versioning (milestone 1 walking skeleton).
+
+A user (investigator/PI) creates a dataset, uploads imaging files, and snapshots versions.
+Files are stored as content-addressed, globally deduped blobs in MinIO (one blob per distinct
+sha256). The current working file set lives in ``imagehub_dataset_files``; a version freezes a
+manifest snapshot. Admin sees every dataset (the clinical data repository); a non-admin sees
+only datasets they own (their research data) or belong to. Every mutation writes an append-only audit row.
+
+Mounted under ``/api/v1`` in main.py → routes live at ``/api/v1/datasets/*``.
+"""
+from __future__ import annotations
+
+import json
+import re
+import unicodedata
+import uuid
+from datetime import datetime, timedelta, timezone
+from typing import Any, Optional
+
+from fastapi import APIRouter, Body, File, Form, Header, HTTPException, Query, UploadFile
+from pydantic import BaseModel, Field
+from sqlalchemy import func, or_, select
+from sqlalchemy.exc import IntegrityError
+
+from src.auth_jwt import decode_access_token_user_id, decode_bearer_token
+from src.imagehub_segmentation import MaskUpload, SegmentationError, SegmentationService
+from src.imagehub_task_pipeline import (
+    StageInfo,
+    TaskPipelineError,
+    compute_finalize,
+    compute_review,
+    initial_transition,
+    validate_set_reference,
+)
+from src.initiative_db.engine import get_session, is_postgres_enabled
+from src.initiative_db.models import (
+    ImagehubBlob,
+    ImagehubDataset,
+    ImagehubDatasetAudit,
+    ImagehubDatasetFile,
+    ImagehubDatasetMember,
+    ImagehubDatasetStage,
+    ImagehubTask,
+    ImagehubTaskReviewEvent,
+    ImagehubVersion,
+    ResearchProject,
+    User,
+)
+from src.minio.storage import StorageError, storage
+
+router = APIRouter(prefix="/datasets", tags=["imagehub"])
+
+_VISIBILITIES = ("private", "internal", "public")
+_ROLE_ADMIN = "Quản trị viên"
+_ROLE_OWNER = "Chủ sở hữu"
+
+
+# --------------------------------------------------------------------------- #
+# Auth (mirrors research_routes / the extracted admin routers)
+# --------------------------------------------------------------------------- #
+def _jwt_roles(authorization: str | None) -> list[str]:
+    p = decode_bearer_token(authorization)
+    if not p:
+        return []
+    r = p.get("roles")
+    return [str(x) for x in r] if isinstance(r, list) else []
+
+
+def _require_authed_uid(authorization: str | None) -> uuid.UUID:
+    uid = decode_access_token_user_id(authorization)
+    if uid is None:
+        raise HTTPException(status_code=401, detail="Đăng nhập để thực hiện thao tác.")
+    return uid
+
+
+def _is_admin(authorization: str | None) -> bool:
+    return "admin" in _jwt_roles(authorization)
+
+
+def _require_admin_uid(authorization: str | None) -> uuid.UUID:
+    uid = _require_authed_uid(authorization)
+    if not _is_admin(authorization):
+        raise HTTPException(status_code=403, detail="Chỉ tài khoản quản trị mới thực hiện được.")
+    return uid
+
+
+def _require_db() -> None:
+    if not is_postgres_enabled():
+        raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa sẵn sàng.")
+
+
+def _role_label(authorization: str | None) -> str:
+    return _ROLE_ADMIN if _is_admin(authorization) else _ROLE_OWNER
+
+
+# --------------------------------------------------------------------------- #
+# Helpers
+# --------------------------------------------------------------------------- #
+def _slugify(name: str) -> str:
+    base = unicodedata.normalize("NFKD", name or "").encode("ascii", "ignore").decode("ascii")
+    base = re.sub(r"[^a-zA-Z0-9]+", "-", base).strip("-").lower()
+    return base[:60] or "dataset"
+
+
+def _safe_logical_path(name: Optional[str]) -> str:
+    """Flatten an uploaded filename to a safe logical path (basename; diacritics preserved)."""
+    raw = (name or "").strip().replace("\\", "/")
+    base = raw.rsplit("/", 1)[-1]
+    base = re.sub(r"\s+", "_", base)
+    base = "".join(ch for ch in base if unicodedata.category(ch)[0] != "C")
+    return base[:200] or "file"
+
+
+def _safe_folder_path(name: Optional[str]) -> str:
+    """The sanitized relative directory of an uploaded path (folders kept, basename dropped).
+
+    Mirrors _safe_logical_path but preserves slashes so a dataset can hold a real folder tree
+    (e.g. the nnU-Net imagesTr/labelsTr layout). Rejects traversal, leading slashes and control
+    characters. Returns '' when the upload carries no directory component.
+    """
+    raw = (name or "").strip().replace("\\", "/")
+    head = raw.rsplit("/", 1)[0] if "/" in raw else ""
+    parts: list[str] = []
+    for seg in head.split("/"):
+        seg = re.sub(r"\s+", "_", seg.strip())
+        seg = "".join(ch for ch in seg if unicodedata.category(ch)[0] != "C")
+        if seg in ("", ".", ".."):
+            continue
+        parts.append(seg)
+    return "/".join(parts)[:400]
+
+
+def _split_name_ext(name: str) -> tuple[str, str]:
+    """Split a filename into (stem, ext), treating .nii.gz / .tar.gz as one extension."""
+    low = name.lower()
+    for double in (".nii.gz", ".tar.gz"):
+        if low.endswith(double):
+            return name[: -len(double)], name[-len(double):]
+    dot = name.rfind(".")
+    return (name, "") if dot <= 0 else (name[:dot], name[dot:])
+
+
+def _case_number(stem: str) -> int | None:
+    """The trailing integer of a case stem (after stripping any _NNNN channel tag), else None.
+    e.g. "1"->1, "100"->100, "POLYP25_00001"->1, "10_0000"->10."""
+    base = re.sub(r"_\d{4}$", "", stem)
+    m = re.search(r"(\d+)$", base)
+    return int(m.group(1)) if m else None
+
+
+def _normalized_name(logical_path: str, prefix: str, is_label: bool) -> str | None:
+    """Target name: image -> {prefix}_{NNNNN}_0000{ext}, label -> {prefix}_{NNNNN}{ext}, where
+    NNNNN is the file's case number 5-digit zero-padded (so an image and its label share a case
+    identifier, e.g. POLYP25_00001_0000.png + POLYP25_00001.png). Returns None when no case number
+    can be derived or the name is already correct (idempotent)."""
+    stem, ext = _split_name_ext(logical_path)
+    num = _case_number(stem)
+    if num is None:
+        return None
+    case_id = f"{prefix}_{num:05d}"
+    new = f"{case_id}{ext}" if is_label else f"{case_id}_0000{ext}"
+    return new if new != logical_path else None
+
+
+def _coerce_tags(v: Any) -> list[str]:
+    if isinstance(v, list):
+        return [str(x).strip() for x in v if str(x).strip()]
+    return []
+
+
+def _sniff_imaging_meta(filename: Optional[str], data: bytes, media_type: str) -> dict[str, Any]:
+    """Best-effort, synchronous imaging metadata. Never raises — returns {} on any failure.
+
+    Heavy extraction (full tag dump, thumbnails, conversion) is deferred to the worker tier.
+    """
+    name = (filename or "").lower()
+    # DICOM: "DICM" magic at byte 128, or a .dcm extension.
+    if (len(data) > 132 and data[128:132] == b"DICM") or name.endswith(".dcm"):
+        try:
+            import io as _io
+
+            import pydicom
+
+            ds = pydicom.dcmread(_io.BytesIO(data), stop_before_pixels=True, force=True)
+            out: dict[str, Any] = {"format": "dicom"}
+            for tag, key in (
+                ("Modality", "modality"),
+                ("Rows", "rows"),
+                ("Columns", "columns"),
+                ("BodyPartExamined", "bodyPart"),
+                ("StudyInstanceUID", "studyUid"),
+                ("SeriesInstanceUID", "seriesUid"),
+            ):
+                val = getattr(ds, tag, None)
+                if val is not None:
+                    out[key] = int(val) if key in ("rows", "columns") else str(val)
+            return out
+        except Exception:
+            return {}
+    # NIfTI: .nii / .nii.gz — load via a temp file (nibabel needs a path for gz).
+    if name.endswith(".nii") or name.endswith(".nii.gz"):
+        import os as _os
+        import tempfile as _tempfile
+
+        suffix = ".nii.gz" if name.endswith(".nii.gz") else ".nii"
+        tmp_path = ""
+        try:
+            import nibabel as nib
+
+            with _tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
+                tmp.write(data)
+                tmp_path = tmp.name
+            img = nib.load(tmp_path)
+            zooms = [float(z) for z in img.header.get_zooms()]
+            return {"format": "nifti", "shape": [int(s) for s in img.shape], "voxelSize": zooms}
+        except Exception:
+            return {}
+        finally:
+            if tmp_path:
+                try:
+                    _os.unlink(tmp_path)
+                except OSError:
+                    pass
+    return {}
+
+
+async def _actor_name(session, uid: uuid.UUID) -> str:
+    u = await session.get(User, uid)
+    if u is None:
+        return ""
+    return (u.full_name or u.email or "").strip()
+
+
+async def _write_audit(
+    session,
+    dataset_id: uuid.UUID,
+    actor_uid: Optional[uuid.UUID],
+    actor_name: str,
+    role_label: str,
+    action: str,
+    subject: str = "",
+    detail: str = "",
+) -> None:
+    session.add(
+        ImagehubDatasetAudit(
+            dataset_id=dataset_id,
+            actor_user_id=actor_uid,
+            actor_name=actor_name or "",
+            role_label=role_label or "",
+            action=action,
+            subject=subject or "",
+            detail=detail or "",
+        )
+    )
+
+
+async def _load_dataset(session, dataset_id: str, uid: uuid.UUID, is_admin: bool) -> ImagehubDataset:
+    """Fetch a dataset enforcing owner-or-admin access (404 hides others' rows)."""
+    try:
+        did = uuid.UUID(dataset_id)
+    except (ValueError, TypeError):
+        raise HTTPException(status_code=404, detail="Không tìm thấy bộ dữ liệu.")
+    row = (
+        await session.execute(select(ImagehubDataset).where(ImagehubDataset.id == did))
+    ).scalar_one_or_none()
+    if row is None or (not is_admin and row.owner_user_id != uid):
+        raise HTTPException(status_code=404, detail="Không tìm thấy bộ dữ liệu.")
+    return row
+
+
+# --------------------------------------------------------------------------- #
+# Membership-aware access (multi-labeler). `_load_dataset` above stays owner-or-platform-admin and
+# guards the management ops (create/delete/settings, upload, stage CRUD, member CRUD, generate,
+# assign-to-others). The helpers below additionally admit dataset MEMBERS, for the read + task-work
+# surface a labeler needs.
+# --------------------------------------------------------------------------- #
+_PROJECT_ADMIN_ROLES = ("owner", "admin", "project_admin")
+
+
+async def _member_role(
+    session, dataset: ImagehubDataset, uid: uuid.UUID, is_admin: bool
+) -> Optional[str]:
+    """The caller's effective role on a dataset: 'admin' (platform), 'owner', 'project_admin' or
+    'member' (from the membership table), else None (no access)."""
+    if is_admin:
+        return "admin"
+    if dataset.owner_user_id == uid:
+        return "owner"
+    return (
+        await session.execute(
+            select(ImagehubDatasetMember.role).where(
+                ImagehubDatasetMember.dataset_id == dataset.id,
+                ImagehubDatasetMember.user_id == uid,
+            )
+        )
+    ).scalar_one_or_none()
+
+
+async def _load_dataset_any(
+    session, dataset_id: str, uid: uuid.UUID, is_admin: bool
+) -> tuple[ImagehubDataset, str]:
+    """Fetch a dataset enforcing owner-OR-member-OR-admin access (404 hides others). Returns
+    (dataset, role) so callers can gate by role."""
+    try:
+        did = uuid.UUID(dataset_id)
+    except (ValueError, TypeError):
+        raise HTTPException(status_code=404, detail="Không tìm thấy bộ dữ liệu.")
+    row = (
+        await session.execute(select(ImagehubDataset).where(ImagehubDataset.id == did))
+    ).scalar_one_or_none()
+    if row is None:
+        raise HTTPException(status_code=404, detail="Không tìm thấy bộ dữ liệu.")
+    role = await _member_role(session, row, uid, is_admin)
+    if role is None:
+        raise HTTPException(status_code=404, detail="Không tìm thấy bộ dữ liệu.")
+    return row, role
+
+
+async def _load_dataset_read(session, dataset_id: str, uid: uuid.UUID, is_admin: bool) -> ImagehubDataset:
+    """Owner/member/admin read access — the dataset only (drops the role)."""
+    ds, _role = await _load_dataset_any(session, dataset_id, uid, is_admin)
+    return ds
+
+
+async def _load_dataset_admin(
+    session, dataset_id: str, uid: uuid.UUID, is_admin: bool
+) -> tuple[ImagehubDataset, str]:
+    """Project-management access: owner, platform-admin, or a project_admin member (403 for plain
+    members, 404 for non-members). Dataset-structural / destructive ops still use `_load_dataset`."""
+    ds, role = await _load_dataset_any(session, dataset_id, uid, is_admin)
+    if role not in _PROJECT_ADMIN_ROLES:
+        raise HTTPException(status_code=403, detail="Chỉ quản trị dự án mới thực hiện được.")
+    return ds, role
+
+
+def _can_work_task(role: str, task_assignee: Optional[uuid.UUID], uid: uuid.UUID) -> bool:
+    """A project admin/owner may work any task; a plain member only the tasks assigned to them."""
+    if role in _PROJECT_ADMIN_ROLES:
+        return True
+    return task_assignee is not None and task_assignee == uid
+
+
+async def _counts(session, dataset_ids: list[uuid.UUID]) -> tuple[dict, dict]:
+    """Grouped file + version counts for a set of datasets (avoids N+1 in the list view)."""
+    if not dataset_ids:
+        return {}, {}
+    file_rows = (
+        await session.execute(
+            select(ImagehubDatasetFile.dataset_id, func.count())
+            .where(ImagehubDatasetFile.dataset_id.in_(dataset_ids))
+            .group_by(ImagehubDatasetFile.dataset_id)
+        )
+    ).all()
+    ver_rows = (
+        await session.execute(
+            select(ImagehubVersion.dataset_id, func.count())
+            .where(ImagehubVersion.dataset_id.in_(dataset_ids))
+            .group_by(ImagehubVersion.dataset_id)
+        )
+    ).all()
+    return {r[0]: int(r[1]) for r in file_rows}, {r[0]: int(r[1]) for r in ver_rows}
+
+
+# --------------------------------------------------------------------------- #
+# Schemas
+# --------------------------------------------------------------------------- #
+class DatasetOut(BaseModel):
+    id: str
+    ownerUserId: str
+    ownerEmail: Optional[str] = None
+    name: str = ""
+    slug: str = ""
+    description: str = ""
+    visibility: str = "private"
+    modalityTags: list[str] = Field(default_factory=list)
+    labelMap: dict[str, str] = Field(default_factory=dict)
+    defaultBranch: str = "main"
+    researchProjectId: Optional[str] = None
+    fileCount: int = 0
+    versionCount: int = 0
+    createdAt: Optional[datetime] = None
+    updatedAt: Optional[datetime] = None
+
+
+class DatasetCreateIn(BaseModel):
+    name: str = Field(default="", max_length=300)
+    description: str = Field(default="", max_length=4000)
+    visibility: str = "private"
+    modalityTags: list[str] = Field(default_factory=list)
+    researchProjectId: Optional[str] = None
+
+
+class DatasetUpdateIn(BaseModel):
+    name: Optional[str] = Field(default=None, max_length=300)
+    description: Optional[str] = Field(default=None, max_length=4000)
+    visibility: Optional[str] = None
+    modalityTags: Optional[list[str]] = None
+    labelMap: Optional[dict[str, str]] = None
+
+
+class VersionCreateIn(BaseModel):
+    message: Optional[str] = Field(default=None, max_length=2000)
+
+
+class FileOut(BaseModel):
+    id: str
+    logicalPath: str
+    folderPath: str = ""
+    sha256: str
+    size: int
+    mediaType: str
+    imagingMeta: dict[str, Any] = Field(default_factory=dict)
+    fileKind: str = "image"
+    parentFileId: Optional[str] = None
+    organLabel: str = ""
+    uploadedAt: Optional[datetime] = None
+    downloadUrl: Optional[str] = None
+
+
+class NormalizeIn(BaseModel):
+    prefix: str  # lesion+year code, e.g. "POLYP25"
+
+
+class NormalizeResultOut(BaseModel):
+    ok: bool
+    renamed: int
+    skipped: int
+    files: list[dict[str, str]]  # {id, oldPath, newPath, folderPath}
+
+
+class VersionOut(BaseModel):
+    id: str
+    seq: int
+    message: str = ""
+    fileCount: int = 0
+    manifest: list[dict[str, Any]] = Field(default_factory=list)
+    createdAt: Optional[datetime] = None
+
+
+class AuditOut(BaseModel):
+    id: int
+    occurredAt: Optional[datetime] = None
+    actorName: str = ""
+    roleLabel: str = ""
+    action: str
+    subject: str = ""
+    detail: str = ""
+
+
+def _coerce_label_map(value: Any) -> dict[str, str]:
+    """Sanitize a per-dataset value->name label map: positive-int string keys, trimmed non-empty
+    names (<=100 chars), bounded to 512 entries. Returns {} for anything malformed so the viewer
+    falls back to the built-in TotalSegmentator names."""
+    if not isinstance(value, dict):
+        return {}
+    out: dict[str, str] = {}
+    for k, v in value.items():
+        ks = str(k)
+        if not (ks.isascii() and ks.isdigit()):  # plain positive-int keys only ("1".."N")
+            continue
+        iv = int(ks)
+        if iv <= 0 or not isinstance(v, str):
+            continue
+        name = v.strip()
+        if not name:
+            continue
+        out[str(iv)] = name[:100]
+        if len(out) >= 512:
+            break
+    return out
+
+
+def _ds_to_out(
+    row: ImagehubDataset, file_count: int = 0, version_count: int = 0, owner_email: Optional[str] = None
+) -> DatasetOut:
+    return DatasetOut(
+        id=str(row.id),
+        ownerUserId=str(row.owner_user_id),
+        ownerEmail=owner_email,
+        name=row.name or "",
+        slug=row.slug or "",
+        description=row.description or "",
+        visibility=row.visibility or "private",
+        modalityTags=_coerce_tags(row.modality_tags),
+        labelMap=_coerce_label_map(row.label_map),
+        defaultBranch=row.default_branch or "main",
+        researchProjectId=str(row.research_project_id) if row.research_project_id else None,
+        fileCount=file_count,
+        versionCount=version_count,
+        createdAt=row.created_at,
+        updatedAt=row.updated_at,
+    )
+
+
+def _version_to_out(row: ImagehubVersion) -> VersionOut:
+    manifest = row.manifest if isinstance(row.manifest, list) else []
+    return VersionOut(
+        id=str(row.id),
+        seq=row.seq,
+        message=row.message or "",
+        fileCount=len(manifest),
+        manifest=manifest,
+        createdAt=row.created_at,
+    )
+
+
+# --------------------------------------------------------------------------- #
+# Endpoints — datasets
+# --------------------------------------------------------------------------- #
+@router.post("", response_model=DatasetOut)
+async def create_dataset(
+    payload: Optional[DatasetCreateIn] = Body(None),
+    authorization: Optional[str] = Header(None),
+) -> DatasetOut:
+    """Authed: create a dataset owned by the current user."""
+    _require_db()
+    uid = _require_authed_uid(authorization)
+    p = payload or DatasetCreateIn()
+    name = (p.name or "").strip()
+    if not name:
+        raise HTTPException(status_code=422, detail="Cần nhập tên bộ dữ liệu.")
+    visibility = p.visibility if p.visibility in _VISIBILITIES else "private"
+    project_uuid: Optional[uuid.UUID] = None
+    if p.researchProjectId:
+        try:
+            project_uuid = uuid.UUID(p.researchProjectId)
+        except (ValueError, TypeError):
+            raise HTTPException(status_code=422, detail="Đề tài không hợp lệ.")
+    async with get_session() as session:
+        # If linking to a research project ("workspace"), it must exist and be owned by the
+        # caller (or the caller is a platform admin). Read BEFORE add to avoid an autoflush.
+        if project_uuid is not None:
+            owner = (
+                await session.execute(
+                    select(ResearchProject.owner_user_id).where(ResearchProject.id == project_uuid)
+                )
+            ).scalar_one_or_none()
+            if owner is None:
+                raise HTTPException(status_code=422, detail="Đề tài không tồn tại.")
+            if owner != uid and not _is_admin(authorization):
+                raise HTTPException(status_code=403, detail="Bạn không sở hữu đề tài này.")
+        row = ImagehubDataset(
+            id=uuid.uuid4(),
+            owner_user_id=uid,
+            name=name,
+            slug=_slugify(name),
+            description=(p.description or "").strip(),
+            visibility=visibility,
+            modality_tags=_coerce_tags(p.modalityTags),
+            research_project_id=project_uuid,
+        )
+        session.add(row)
+        await session.flush()
+        await _write_audit(
+            session, row.id, uid, await _actor_name(session, uid), _ROLE_OWNER,
+            "Tạo bộ dữ liệu", name,
+        )
+        await session.commit()
+        await session.refresh(row)
+        return _ds_to_out(row)
+
+
+@router.get("", response_model=list[DatasetOut])
+async def list_datasets(
+    scope: str = "mine",
+    authorization: Optional[str] = Header(None),
+    projectId: Optional[str] = None,
+) -> list[DatasetOut]:
+    """List datasets the caller owns or is a member of; an admin may pass ?scope=all (clinical repo).
+    ?projectId= further restricts to datasets linked to that research project ("workspace")."""
+    _require_db()
+    uid = _require_authed_uid(authorization)
+    is_admin = _is_admin(authorization)
+    async with get_session() as session:
+        stmt = (
+            select(ImagehubDataset, User.email)
+            .join(User, User.id == ImagehubDataset.owner_user_id)
+            .order_by(ImagehubDataset.created_at.desc())
+        )
+        if scope != "all" or not is_admin:
+            member_ds = select(ImagehubDatasetMember.dataset_id).where(
+                ImagehubDatasetMember.user_id == uid
+            )
+            stmt = stmt.where(
+                or_(ImagehubDataset.owner_user_id == uid, ImagehubDataset.id.in_(member_ds))
+            )
+        if projectId:
+            try:
+                stmt = stmt.where(ImagehubDataset.research_project_id == uuid.UUID(projectId))
+            except (ValueError, TypeError):
+                raise HTTPException(status_code=422, detail="Đề tài không hợp lệ.")
+        rows = (await session.execute(stmt)).all()
+        datasets = [r[0] for r in rows]
+        emails = {r[0].id: r[1] for r in rows}
+        files, versions = await _counts(session, [d.id for d in datasets])
+        return [
+            _ds_to_out(d, files.get(d.id, 0), versions.get(d.id, 0), emails.get(d.id))
+            for d in datasets
+        ]
+
+
+@router.get("/{dataset_id}", response_model=DatasetOut)
+async def get_dataset(dataset_id: str, authorization: Optional[str] = Header(None)) -> DatasetOut:
+    _require_db()
+    uid = _require_authed_uid(authorization)
+    async with get_session() as session:
+        row = await _load_dataset_read(session, dataset_id, uid, _is_admin(authorization))
+        files, versions = await _counts(session, [row.id])
+        owner = await session.get(User, row.owner_user_id)
+        return _ds_to_out(row, files.get(row.id, 0), versions.get(row.id, 0), owner.email if owner else None)
+
+
+@router.put("/{dataset_id}", response_model=DatasetOut)
+async def update_dataset(
+    dataset_id: str,
+    payload: DatasetUpdateIn = Body(...),
+    authorization: Optional[str] = Header(None),
+) -> DatasetOut:
+    """Owner or admin: update dataset metadata."""
+    _require_db()
+    uid = _require_authed_uid(authorization)
+    async with get_session() as session:
+        row = await _load_dataset(session, dataset_id, uid, _is_admin(authorization))
+        if payload.name is not None:
+            new_name = payload.name.strip()
+            if not new_name:
+                raise HTTPException(status_code=422, detail="Tên bộ dữ liệu không được để trống.")
+            row.name = new_name
+            row.slug = _slugify(new_name)
+        if payload.description is not None:
+            row.description = payload.description.strip()
+        if payload.visibility is not None:
+            if payload.visibility not in _VISIBILITIES:
+                raise HTTPException(status_code=422, detail="Mức hiển thị không hợp lệ.")
+            row.visibility = payload.visibility
+        if payload.modalityTags is not None:
+            row.modality_tags = _coerce_tags(payload.modalityTags)
+        if payload.labelMap is not None:
+            row.label_map = _coerce_label_map(payload.labelMap)
+        row.updated_at = datetime.now(tz=timezone.utc)
+        await _write_audit(
+            session, row.id, uid, await _actor_name(session, uid), _role_label(authorization),
+            "Cập nhật bộ dữ liệu", row.name,
+        )
+        await session.commit()
+        await session.refresh(row)
+        files, versions = await _counts(session, [row.id])
+        return _ds_to_out(row, files.get(row.id, 0), versions.get(row.id, 0))
+
+
+@router.delete("/{dataset_id}")
+async def delete_dataset(dataset_id: str, authorization: Optional[str] = Header(None)) -> dict[str, Any]:
+    """Owner or admin: delete a dataset. Files/versions/audit cascade; blobs are left (no GC in v1)."""
+    _require_db()
+    uid = _require_authed_uid(authorization)
+    async with get_session() as session:
+        row = await _load_dataset(session, dataset_id, uid, _is_admin(authorization))
+        await session.delete(row)
+        await session.commit()
+        return {"ok": True}
+
+
+# --------------------------------------------------------------------------- #
+# Endpoints — files (content-addressed upload + browse)
+# --------------------------------------------------------------------------- #
+@router.post("/{dataset_id}/files")
+async def upload_files(
+    dataset_id: str,
+    files: list[UploadFile] = File(...),
+    authorization: Optional[str] = Header(None),
+    paths: Optional[str] = Form(None),
+) -> dict[str, Any]:
+    """Owner or admin: upload one or more files. Each is content-addressed + deduped; a file at an
+    existing (folder, logical path) is replaced. ``paths`` is an optional JSON array of relative
+    upload paths (index-aligned with ``files``) so a directory structure is preserved as folder_path.
+ Best-effort imaging metadata is sniffed synchronously.""" + _require_db() + uid = _require_authed_uid(authorization) + is_admin = _is_admin(authorization) + results: list[dict[str, Any]] = [] + async with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, is_admin) + rel_paths: list[str] = [] + if paths: + try: + parsed_paths = json.loads(paths) + if isinstance(parsed_paths, list): + rel_paths = [str(p) for p in parsed_paths] + except (ValueError, TypeError): + rel_paths = [] + for i, uf in enumerate(files): + data = await uf.read() + if not data: + continue + media_type = uf.content_type or "application/octet-stream" + try: + blob = await storage.put_blob(data, media_type) + except ValueError as exc: + raise HTTPException(status_code=413, detail=str(exc)) from exc + except StorageError as exc: + raise HTTPException(status_code=502, detail=f"Lưu trữ thất bại: {exc}") from exc + if await session.get(ImagehubBlob, blob["sha256"]) is None: + session.add( + ImagehubBlob( + sha256=blob["sha256"], + size_bytes=blob["size"], + media_type=media_type, + storage_bucket=blob["bucket"], + storage_key=blob["key"], + ) + ) + await session.flush() + logical_path = _safe_logical_path(uf.filename) + rel = rel_paths[i] if i < len(rel_paths) else (uf.filename or "") + folder_path = _safe_folder_path(rel) + meta = _sniff_imaging_meta(uf.filename, data, media_type) + file_row = ( + await session.execute( + select(ImagehubDatasetFile).where( + ImagehubDatasetFile.dataset_id == ds.id, + ImagehubDatasetFile.folder_path == folder_path, + ImagehubDatasetFile.logical_path == logical_path, + ) + ) + ).scalar_one_or_none() + if file_row is None: + file_row = ImagehubDatasetFile( + id=uuid.uuid4(), dataset_id=ds.id, folder_path=folder_path, logical_path=logical_path + ) + session.add(file_row) + file_row.blob_sha256 = blob["sha256"] + file_row.size_bytes = blob["size"] + file_row.media_type = media_type + file_row.imaging_meta = meta + file_row.uploaded_by = uid + 
file_row.updated_at = datetime.now(tz=timezone.utc) + results.append( + { + "path": f"{folder_path}/{logical_path}" if folder_path else logical_path, + "sha256": blob["sha256"], + "size": blob["size"], + "deduped": blob["deduped"], + "imagingMeta": meta, + } + ) + ds.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, await _actor_name(session, uid), _role_label(authorization), + "Tải tệp lên", f"{len(results)} tệp", + ) + await session.commit() + return {"ok": True, "files": results} + + +@router.get("/{dataset_id}/files", response_model=list[FileOut]) +async def list_files(dataset_id: str, authorization: Optional[str] = Header(None)) -> list[FileOut]: + """Owner, member or admin: the dataset's current working files, each with a presigned URL.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset_read(session, dataset_id, uid, _is_admin(authorization)) + rows = ( + await session.execute( + select(ImagehubDatasetFile, ImagehubBlob) + .join(ImagehubBlob, ImagehubBlob.sha256 == ImagehubDatasetFile.blob_sha256) + .where(ImagehubDatasetFile.dataset_id == ds.id) + .order_by(ImagehubDatasetFile.folder_path, ImagehubDatasetFile.logical_path) + ) + ).all() + out: list[FileOut] = [] + for f, b in rows: + try: + url = await storage.get_download_url( + b.storage_bucket, b.storage_key, filename=f.logical_path, inline=False + ) + except Exception: + url = None + out.append( + FileOut( + id=str(f.id), + logicalPath=f.logical_path, + folderPath=f.folder_path or "", + sha256=f.blob_sha256, + size=f.size_bytes, + mediaType=f.media_type, + imagingMeta=f.imaging_meta if isinstance(f.imaging_meta, dict) else {}, + fileKind=f.file_kind or "image", + parentFileId=str(f.parent_file_id) if f.parent_file_id else None, + organLabel=f.organ_label or "", + uploadedAt=f.created_at, + downloadUrl=url, + ) + ) + return out + + +@router.get("/{dataset_id}/files/{file_id}/download") +async 
def download_file( + dataset_id: str, file_id: str, authorization: Optional[str] = Header(None) +) -> dict[str, Any]: + """Owner, member or admin: a fresh presigned download URL for a single file.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset_read(session, dataset_id, uid, _is_admin(authorization)) + try: + fid = uuid.UUID(file_id) + except (ValueError, TypeError): + raise HTTPException(status_code=404, detail="Không tìm thấy tệp.") + row = ( + await session.execute( + select(ImagehubDatasetFile, ImagehubBlob) + .join(ImagehubBlob, ImagehubBlob.sha256 == ImagehubDatasetFile.blob_sha256) + .where(ImagehubDatasetFile.id == fid, ImagehubDatasetFile.dataset_id == ds.id) + ) + ).first() + if row is None: + raise HTTPException(status_code=404, detail="Không tìm thấy tệp.") + f, b = row + url = await storage.get_download_url( + b.storage_bucket, b.storage_key, filename=f.logical_path, inline=False + ) + return {"url": url, "logicalPath": f.logical_path} + + +@router.post("/{dataset_id}/files/{parent_file_id}/segmentations") +async def upload_segmentations( + dataset_id: str, + parent_file_id: str, + masks: list[UploadFile] = File(...), + organs: Optional[list[str]] = Form(None), + authorization: Optional[str] = Header(None), +) -> dict[str, Any]: + """Owner or admin: upload organ-mask files and link them to a parent image file. + + Each mask is content-addressed + stored like any file, but recorded with + ``file_kind='segmentation'`` and ``parent_file_id`` pointing at the image it + segments. ``organs[i]`` is the organ label for ``masks[i]`` (parallel arrays; + falls back to the filename). 
Domain rules live in ``SegmentationService`` — the + handler is just transport (read multipart → service → audit → commit).""" + _require_db() + uid = _require_authed_uid(authorization) + is_admin = _is_admin(authorization) + organ_labels = organs or [] + async with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, is_admin) + items: list[MaskUpload] = [] + for i, uf in enumerate(masks): + items.append( + MaskUpload( + filename=uf.filename or "mask.nii.gz", + data=await uf.read(), + media_type=uf.content_type or "application/octet-stream", + organ_label=organ_labels[i] if i < len(organ_labels) else "", + ) + ) + service = SegmentationService( + session, + put_blob=storage.put_blob, + sniff_meta=_sniff_imaging_meta, + safe_name=_safe_logical_path, + ) + try: + linked = await service.link_masks(ds, parent_file_id, items, uid) + except SegmentationError as exc: + raise HTTPException(status_code=exc.status, detail=str(exc)) from exc + except ValueError as exc: # storage.put_blob size cap → 413 + raise HTTPException(status_code=413, detail=str(exc)) from exc + except StorageError as exc: + raise HTTPException(status_code=502, detail=f"Lưu trữ thất bại: {exc}") from exc + ds.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, await _actor_name(session, uid), _role_label(authorization), + "Tải mặt nạ phân vùng", f"{len(linked)} mặt nạ", + ) + await session.commit() + results = [ + { + "id": str(r.id), + "logicalPath": r.logical_path, + "organLabel": r.organ_label, + "parentFileId": str(r.parent_file_id) if r.parent_file_id else None, + } + for r in linked + ] + return {"ok": True, "masks": results} + + +@router.post("/{dataset_id}/files/normalize-channels", response_model=NormalizeResultOut) +async def normalize_channels( + dataset_id: str, payload: NormalizeIn, authorization: Optional[str] = Header(None) +) -> NormalizeResultOut: + """Owner or admin: rename files to the {prefix}_{caseID}_0000.{ext} 
convention: images under + imagesTr/imagesTs get the _0000 channel suffix, labels under labelsTr/labelsTs stay channel-free, + and caseID is the file's number zero-padded to 5 digits, so an image and its label share a case id + (e.g. POLYP25_00001_0000.png + POLYP25_00001.png). Idempotent; a logical rename only, no blob is moved.""" + _require_db() + uid = _require_authed_uid(authorization) + is_admin = _is_admin(authorization) + prefix = re.sub(r"[^A-Za-z0-9]", "", payload.prefix or "").upper() + if not prefix: + raise HTTPException(status_code=400, detail="Thiếu mã tổn thương (tiền tố).") + async with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, is_admin) + rows = ( + await session.execute( + select(ImagehubDatasetFile).where(ImagehubDatasetFile.dataset_id == ds.id) + ) + ).scalars().all() + existing = {(r.folder_path, r.logical_path) for r in rows} + # Resolve the actor + role BEFORE mutating any logical_path: a read after a mutation would + # autoflush the pending UPDATE and raise a unique-violation outside the commit try/except.
+ actor = await _actor_name(session, uid) + role = _role_label(authorization) + results: list[dict[str, str]] = [] + plan: list[tuple[ImagehubDatasetFile, str]] = [] + skipped = 0 + for r in rows: + seg = (r.folder_path or "").strip("/").split("/")[-1] + if r.file_kind == "segmentation": + continue + if seg in ("imagesTr", "imagesTs"): + is_label = False + elif seg in ("labelsTr", "labelsTs"): + is_label = True + else: + continue # root / other folders are left alone + new = _normalized_name(r.logical_path, prefix, is_label) + if new is None: + skipped += 1 + continue + if (r.folder_path, new) in existing: + skipped += 1 + continue + results.append( + { + "id": str(r.id), + "oldPath": r.logical_path, + "newPath": new, + "folderPath": r.folder_path or "", + } + ) + plan.append((r, new)) + renamed = len(plan) + if renamed: + for r, new in plan: + r.logical_path = new + r.updated_at = datetime.now(tz=timezone.utc) + ds.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, role, + action="Chuẩn hoá tên tệp", subject=f"{renamed} tệp", + detail=f"Đổi tên sang định dạng {prefix}__0000", + ) + try: + await session.commit() + except IntegrityError: + await session.rollback() + raise HTTPException(status_code=409, detail="Tên tệp bị trùng sau khi chuẩn hoá.") + return NormalizeResultOut(ok=True, renamed=renamed, skipped=skipped, files=results) + + +# --------------------------------------------------------------------------- # +# Endpoints — versions (the snapshot spine) + audit +# --------------------------------------------------------------------------- # +@router.post("/{dataset_id}/versions", response_model=VersionOut) +async def create_version( + dataset_id: str, + payload: Optional[VersionCreateIn] = Body(None), + authorization: Optional[str] = Header(None), +) -> VersionOut: + """Owner or admin: freeze the current working files into a new version snapshot.""" + _require_db() + uid = _require_authed_uid(authorization) + async 
with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, _is_admin(authorization)) + files = ( + await session.execute( + select(ImagehubDatasetFile) + .where(ImagehubDatasetFile.dataset_id == ds.id) + .order_by(ImagehubDatasetFile.logical_path) + ) + ).scalars().all() + if not files: + raise HTTPException(status_code=422, detail="Bộ dữ liệu chưa có tệp nào để tạo phiên bản.") + manifest = [ + {"logicalPath": f.logical_path, "blobSha256": f.blob_sha256, + "size": f.size_bytes, "mediaType": f.media_type} + for f in files + ] + max_seq = ( + await session.execute( + select(func.coalesce(func.max(ImagehubVersion.seq), 0)).where( + ImagehubVersion.dataset_id == ds.id + ) + ) + ).scalar_one() + parent = ( + await session.execute( + select(ImagehubVersion.id) + .where(ImagehubVersion.dataset_id == ds.id) + .order_by(ImagehubVersion.seq.desc()) + .limit(1) + ) + ).scalar_one_or_none() + msg = ((payload.message if payload else None) or "").strip() + row = ImagehubVersion( + id=uuid.uuid4(), + dataset_id=ds.id, + seq=int(max_seq) + 1, + message=msg, + manifest=manifest, + parent_version_id=parent, + author_user_id=uid, + ) + session.add(row) + await session.flush() + await _write_audit( + session, ds.id, uid, await _actor_name(session, uid), _role_label(authorization), + "Tạo phiên bản", f"v{row.seq}", msg, + ) + await session.commit() + await session.refresh(row) + return _version_to_out(row) + + +@router.get("/{dataset_id}/versions", response_model=list[VersionOut]) +async def list_versions(dataset_id: str, authorization: Optional[str] = Header(None)) -> list[VersionOut]: + """Owner or admin: version history, newest first.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, _is_admin(authorization)) + rows = ( + await session.execute( + select(ImagehubVersion) + .where(ImagehubVersion.dataset_id == ds.id) + .order_by(ImagehubVersion.seq.desc()) 
+ ) + ).scalars().all() + return [_version_to_out(r) for r in rows] + + +@router.get("/{dataset_id}/audit", response_model=list[AuditOut]) +async def list_audit(dataset_id: str, authorization: Optional[str] = Header(None)) -> list[AuditOut]: + """Owner or admin: the append-only audit trail for a dataset, newest first.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, _is_admin(authorization)) + rows = ( + await session.execute( + select(ImagehubDatasetAudit) + .where(ImagehubDatasetAudit.dataset_id == ds.id) + .order_by(ImagehubDatasetAudit.occurred_at.desc(), ImagehubDatasetAudit.id.desc()) + ) + ).scalars().all() + return [ + AuditOut( + id=r.id, + occurredAt=r.occurred_at, + actorName=r.actor_name, + roleLabel=r.role_label, + action=r.action, + subject=r.subject, + detail=r.detail, + ) + for r in rows + ] + + +# --------------------------------------------------------------------------- # +# Endpoints — labeling-pipeline stages (Label -> Review_1 -> Review_2 ...) 
+# --------------------------------------------------------------------------- # +_STAGE_KINDS = ("label", "review") + + +class StageOut(BaseModel): + id: str + name: str = "" + kind: str = "label" + seq: int = 0 + reviewPercent: Optional[int] = None + autoAssign: bool = True + + +class StageCreateIn(BaseModel): + name: str = Field(default="", max_length=200) + kind: str = "label" + reviewPercent: Optional[int] = Field(default=None, ge=0, le=100) + autoAssign: bool = True + + +class StageUpdateIn(BaseModel): + name: Optional[str] = Field(default=None, max_length=200) + reviewPercent: Optional[int] = Field(default=None, ge=0, le=100) + autoAssign: Optional[bool] = None + seq: Optional[int] = None + + +def _stage_to_out(row: ImagehubDatasetStage) -> StageOut: + return StageOut( + id=str(row.id), + name=row.name or "", + kind=row.kind or "label", + seq=row.seq, + reviewPercent=row.review_percent, + autoAssign=bool(row.auto_assign), + ) + + +async def _load_stage(session, dataset_id: uuid.UUID, stage_id: str) -> ImagehubDatasetStage: + try: + sid = uuid.UUID(stage_id) + except (ValueError, TypeError): + raise HTTPException(status_code=404, detail="Không tìm thấy giai đoạn.") + stage = ( + await session.execute( + select(ImagehubDatasetStage).where( + ImagehubDatasetStage.id == sid, ImagehubDatasetStage.dataset_id == dataset_id + ) + ) + ).scalar_one_or_none() + if stage is None: + raise HTTPException(status_code=404, detail="Không tìm thấy giai đoạn.") + return stage + + +@router.get("/{dataset_id}/stages", response_model=list[StageOut]) +async def list_stages(dataset_id: str, authorization: Optional[str] = Header(None)) -> list[StageOut]: + """Owner, member or admin: the dataset's labeling-pipeline stages, in pipeline order.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset_read(session, dataset_id, uid, _is_admin(authorization)) + rows = ( + await session.execute( + 
select(ImagehubDatasetStage) + .where(ImagehubDatasetStage.dataset_id == ds.id) + .order_by(ImagehubDatasetStage.seq, ImagehubDatasetStage.created_at) + ) + ).scalars().all() + return [_stage_to_out(r) for r in rows] + + +@router.post("/{dataset_id}/stages", response_model=StageOut) +async def add_stage( + dataset_id: str, + payload: StageCreateIn = Body(...), + authorization: Optional[str] = Header(None), +) -> StageOut: + """Owner or admin: append a stage to the pipeline.""" + _require_db() + uid = _require_authed_uid(authorization) + if payload.kind not in _STAGE_KINDS: + raise HTTPException(status_code=422, detail="Loại giai đoạn không hợp lệ.") + name = (payload.name or "").strip() + if not name: + raise HTTPException(status_code=422, detail="Tên giai đoạn không được để trống.") + async with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, _is_admin(authorization)) + # Resolve the actor BEFORE adding the new row: a query here would otherwise autoflush the + # pending INSERT and raise the unique-violation outside the commit() try/except below. 
+ actor = await _actor_name(session, uid) + max_seq = ( + await session.execute( + select(func.max(ImagehubDatasetStage.seq)).where(ImagehubDatasetStage.dataset_id == ds.id) + ) + ).scalar() + stage = ImagehubDatasetStage( + dataset_id=ds.id, + name=name, + kind=payload.kind, + seq=(int(max_seq) + 1) if max_seq is not None else 0, + review_percent=payload.reviewPercent if payload.kind == "review" else None, + auto_assign=payload.autoAssign, + ) + session.add(stage) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), + "Thêm giai đoạn", name, + ) + try: + await session.commit() + except IntegrityError: + await session.rollback() + raise HTTPException(status_code=409, detail="Tên giai đoạn đã tồn tại trong bộ dữ liệu.") + await session.refresh(stage) + return _stage_to_out(stage) + + +@router.patch("/{dataset_id}/stages/{stage_id}", response_model=StageOut) +async def update_stage( + dataset_id: str, + stage_id: str, + payload: StageUpdateIn = Body(...), + authorization: Optional[str] = Header(None), +) -> StageOut: + """Owner or admin: rename a stage, set its review %, toggle Automatic Task Assignment, reorder.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, _is_admin(authorization)) + stage = await _load_stage(session, ds.id, stage_id) + # Resolve the actor before mutating the row (autoflush-before-commit guard; see add_stage). 
+ actor = await _actor_name(session, uid) + if payload.name is not None: + nm = payload.name.strip() + if not nm: + raise HTTPException(status_code=422, detail="Tên giai đoạn không được để trống.") + stage.name = nm + if payload.reviewPercent is not None: + stage.review_percent = payload.reviewPercent + if payload.autoAssign is not None: + stage.auto_assign = payload.autoAssign + if payload.seq is not None: + stage.seq = payload.seq + stage.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), + "Cập nhật giai đoạn", stage.name, + ) + try: + await session.commit() + except IntegrityError: + await session.rollback() + raise HTTPException(status_code=409, detail="Tên giai đoạn đã tồn tại trong bộ dữ liệu.") + await session.refresh(stage) + return _stage_to_out(stage) + + +@router.delete("/{dataset_id}/stages/{stage_id}") +async def delete_stage( + dataset_id: str, stage_id: str, authorization: Optional[str] = Header(None) +) -> dict[str, Any]: + """Owner or admin: remove a stage from the pipeline.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset(session, dataset_id, uid, _is_admin(authorization)) + stage = await _load_stage(session, ds.id, stage_id) + actor = await _actor_name(session, uid) + name = stage.name + await session.delete(stage) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), + "Xóa giai đoạn", name, + ) + await session.commit() + return {"ok": True} + + +# --------------------------------------------------------------------------- # +# Endpoints — task pipeline (a file flows Label -> Review_n -> Ground Truth) +# +# A Task is one row per *image* file (imagehub_tasks), created on demand from the dataset's files +# once the pipeline has at least one stage. Owner, members and admins can read tasks; a member may +# only work tasks assigned to them (_can_work_task), while generation, priority and reference-standard +# changes go through _load_dataset_admin.
Every write follows the +# autoflush-before-commit guard (resolve reads before add/mutate; see add_stage). +# --------------------------------------------------------------------------- # +class TaskOut(BaseModel): + id: str + name: str = "" + fileId: str + fileLogicalPath: str = "" + currentStageId: Optional[str] = None + currentStageName: Optional[str] = None + pipelineState: str = "inLabel" + queueStatus: str = "assigned" + assigneeUserId: Optional[str] = None + assigneeName: Optional[str] = None + assignmentMode: str = "auto" + priority: float = 0.0 + isReferenceStandard: bool = False + createdAt: datetime + updatedAt: datetime + + +class GenerateTasksOut(BaseModel): + created: int = 0 + total: int = 0 + + +class ReviewIn(BaseModel): + decision: str = Field(..., description="accept | acceptWithCorrections | reject") + note: Optional[str] = Field(default=None, max_length=2000) + + +class ReviewStatsOut(BaseModel): + accepted: int = 0 + acceptWithCorrections: int = 0 + rejected: int = 0 + + +class PriorityIn(BaseModel): + priority: float = Field(..., ge=0, le=1) + + +class SetReferenceIn(BaseModel): + isReferenceStandard: bool = True + + +class TaskDetailOut(TaskOut): + """A single task plus its saved annotations (the AnnotationTool's working payload).""" + + annotations: list[Any] = Field(default_factory=list) + + +class SaveIn(BaseModel): + annotations: list[Any] = Field(default_factory=list) + submit: bool = False + + +def _assignee_label(full_name: Optional[str], email: Optional[str]) -> Optional[str]: + name = (full_name or email or "").strip() + return name or None + + +def _build_task_out( + task: ImagehubTask, + file_logical_path: Optional[str], + stage_name: Optional[str], + assignee_name: Optional[str], +) -> TaskOut: + return TaskOut( + id=str(task.id), + name=task.name or "", + fileId=str(task.dataset_file_id), + fileLogicalPath=file_logical_path or "", + currentStageId=str(task.current_stage_id) if task.current_stage_id else None, + 
currentStageName=stage_name, + pipelineState=task.pipeline_state, + queueStatus=task.queue_status, + assigneeUserId=str(task.assignee_user_id) if task.assignee_user_id else None, + assigneeName=assignee_name, + assignmentMode=task.assignment_mode, + priority=float(task.priority or 0.0), + isReferenceStandard=bool(task.is_reference_standard), + createdAt=task.created_at, + updatedAt=task.updated_at, + ) + + +async def _stage_infos(session, dataset_id: uuid.UUID) -> list[StageInfo]: + rows = ( + await session.execute( + select(ImagehubDatasetStage.id, ImagehubDatasetStage.kind, ImagehubDatasetStage.seq).where( + ImagehubDatasetStage.dataset_id == dataset_id + ) + ) + ).all() + return [StageInfo(id=str(sid), kind=kind or "label", seq=int(seq)) for sid, kind, seq in rows] + + +async def _load_task(session, dataset_id: uuid.UUID, task_id: str) -> ImagehubTask: + try: + tid = uuid.UUID(task_id) + except (ValueError, TypeError): + raise HTTPException(status_code=404, detail="Không tìm thấy công việc.") + task = ( + await session.execute( + select(ImagehubTask).where(ImagehubTask.id == tid, ImagehubTask.dataset_id == dataset_id) + ) + ).scalar_one_or_none() + if task is None: + raise HTTPException(status_code=404, detail="Không tìm thấy công việc.") + return task + + +async def _task_out_after(session, task: ImagehubTask) -> TaskOut: + """Build a TaskOut for one task, looking up its file path / stage name / assignee name.""" + f = await session.get(ImagehubDatasetFile, task.dataset_file_id) + file_path = f.logical_path if f else "" + stage_name: Optional[str] = None + if task.current_stage_id: + st = await session.get(ImagehubDatasetStage, task.current_stage_id) + stage_name = st.name if st else None + assignee_name: Optional[str] = None + if task.assignee_user_id: + u = await session.get(User, task.assignee_user_id) + if u is not None: + assignee_name = _assignee_label(u.full_name, u.email) + return _build_task_out(task, file_path, stage_name, assignee_name) + + +async 
def _task_detail_after(session, task: ImagehubTask) -> TaskDetailOut: + """A single task as TaskDetailOut (TaskOut fields + its saved annotations).""" + base = await _task_out_after(session, task) + return TaskDetailOut(**base.model_dump(), annotations=task.annotations or []) + + +@router.post("/{dataset_id}/tasks/generate", response_model=GenerateTasksOut) +async def generate_tasks( + dataset_id: str, authorization: Optional[str] = Header(None) +) -> GenerateTasksOut: + """Owner or admin: create one task per image file that doesn't have one yet. + + Requires the dataset to have at least one pipeline stage (409 otherwise); tasks start at the + first stage. Idempotent — files that already have a task are skipped. + """ + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, _role = await _load_dataset_admin(session, dataset_id, uid, _is_admin(authorization)) + # All reads BEFORE any add (autoflush-before-commit guard; see add_stage). + stages = await _stage_infos(session, ds.id) + if not stages: + raise HTTPException( + status_code=409, + detail="Hãy thêm giai đoạn quy trình trong Cài đặt trước khi tạo công việc.", + ) + init = initial_transition(stages) + actor = await _actor_name(session, uid) + files = ( + await session.execute( + select(ImagehubDatasetFile.id, ImagehubDatasetFile.logical_path).where( + ImagehubDatasetFile.dataset_id == ds.id, + ImagehubDatasetFile.file_kind == "image", + ) + ) + ).all() + existing = set( + ( + await session.execute( + select(ImagehubTask.dataset_file_id).where(ImagehubTask.dataset_id == ds.id) + ) + ).scalars().all() + ) + start_stage = uuid.UUID(init.current_stage_id) if init.current_stage_id else None + created = 0 + for fid, lpath in files: + if fid in existing: + continue + session.add( + ImagehubTask( + dataset_id=ds.id, + dataset_file_id=fid, + name=lpath or "", + current_stage_id=start_stage, + pipeline_state=init.pipeline_state, + queue_status="assigned", + ) + ) + created 
+= 1 + if created: + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), + "Tạo công việc", f"{created} tác vụ", + ) + await session.commit() + return GenerateTasksOut(created=created, total=len(files)) + + +@router.get("/{dataset_id}/tasks", response_model=list[TaskOut]) +async def list_tasks( + dataset_id: str, + stage: Optional[str] = Query(None), + status: Optional[str] = Query(None), + state: Optional[str] = Query(None), + reference: Optional[bool] = Query(None), + mine: Optional[bool] = Query(None), + authorization: Optional[str] = Header(None), +) -> list[TaskOut]: + """Owner, member or admin: the dataset's tasks, highest priority first (ties: oldest first). + Optional filters: stage (stage id), status (queue status), state (pipeline state), reference, + and mine (mine=true → only tasks assigned to the caller).""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset_read(session, dataset_id, uid, _is_admin(authorization)) + q = ( + select( + ImagehubTask, + ImagehubDatasetFile.logical_path, + ImagehubDatasetStage.name, + User.full_name, + User.email, + ) + .join(ImagehubDatasetFile, ImagehubTask.dataset_file_id == ImagehubDatasetFile.id) + .outerjoin(ImagehubDatasetStage, ImagehubTask.current_stage_id == ImagehubDatasetStage.id) + .outerjoin(User, ImagehubTask.assignee_user_id == User.id) + .where(ImagehubTask.dataset_id == ds.id) + ) + if stage: + try: + q = q.where(ImagehubTask.current_stage_id == uuid.UUID(stage)) + except (ValueError, TypeError): + raise HTTPException(status_code=422, detail="Giai đoạn không hợp lệ.") + if status: + q = q.where(ImagehubTask.queue_status == status) + if state: + q = q.where(ImagehubTask.pipeline_state == state) + if reference is not None: + q = q.where(ImagehubTask.is_reference_standard == reference) + if mine: + q = q.where(ImagehubTask.assignee_user_id == uid) + q = q.order_by(ImagehubTask.priority.desc(), ImagehubTask.created_at.asc()) + rows = (await session.execute(q)).all() + return [ + _build_task_out(t, lpath, sname,
_assignee_label(fn, em)) + for (t, lpath, sname, fn, em) in rows + ] + + +class AssignIn(BaseModel): + userId: Optional[str] = None + + +@router.post("/{dataset_id}/tasks/{task_id}/assign", response_model=TaskOut) +async def assign_task( + dataset_id: str, + task_id: str, + payload: Optional[AssignIn] = Body(None), + authorization: Optional[str] = Header(None), +) -> TaskOut: + """Assign a task. No body (or your own id) = claim it yourself; a project admin / owner may pass + another member's userId to assign it to them (the target must be the owner or a member).""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, role = await _load_dataset_any(session, dataset_id, uid, _is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + target_uid = uid + if payload and payload.userId: + try: + target_uid = uuid.UUID(payload.userId) + except (ValueError, TypeError): + raise HTTPException(status_code=422, detail="Người dùng không hợp lệ.") + if target_uid != uid and role not in _PROJECT_ADMIN_ROLES: + raise HTTPException( + status_code=403, detail="Chỉ quản trị dự án mới gán việc cho người khác." + ) + actor = await _actor_name(session, uid) + # The assignee must be the dataset owner or an existing member. + if target_uid != ds.owner_user_id: + is_member = ( + await session.execute( + select(ImagehubDatasetMember.id).where( + ImagehubDatasetMember.dataset_id == ds.id, + ImagehubDatasetMember.user_id == target_uid, + ) + ) + ).scalar_one_or_none() + if is_member is None: + raise HTTPException( + status_code=422, detail="Người được gán không phải thành viên của bộ dữ liệu." 
+ ) + task.assignee_user_id = target_uid + task.assignment_mode = "manual" + task.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), "Gán việc", task.name, + ) + await session.commit() + await session.refresh(task) + return await _task_out_after(session, task) + + +@router.post("/{dataset_id}/tasks/{task_id}/unassign", response_model=TaskOut) +async def unassign_task( + dataset_id: str, task_id: str, authorization: Optional[str] = Header(None) +) -> TaskOut: + """Owner/admin can clear any task's assignee; a member may release only their own task.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, role = await _load_dataset_any(session, dataset_id, uid, _is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + if not _can_work_task(role, task.assignee_user_id, uid): + raise HTTPException(status_code=403, detail="Bạn không được phân công công việc này.") + actor = await _actor_name(session, uid) + task.assignee_user_id = None + task.assignment_mode = "auto" + task.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), "Bỏ gán việc", task.name, + ) + await session.commit() + await session.refresh(task) + return await _task_out_after(session, task) + + +@router.post("/{dataset_id}/tasks/{task_id}/finalize", response_model=TaskOut) +async def finalize_task( + dataset_id: str, task_id: str, authorization: Optional[str] = Header(None) +) -> TaskOut: + """TP1: finalize a Label task, advancing it to the next stage (or to Ground Truth).""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, role = await _load_dataset_any(session, dataset_id, uid, _is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + if not _can_work_task(role, task.assignee_user_id, uid): + raise HTTPException(status_code=403,
detail="Bạn không được phân công công việc này.") + stages = await _stage_infos(session, ds.id) + actor = await _actor_name(session, uid) + try: + t = compute_finalize( + task.pipeline_state, + str(task.current_stage_id) if task.current_stage_id else None, + stages, + ) + except TaskPipelineError as exc: + raise HTTPException(status_code=exc.status, detail=str(exc)) + task.pipeline_state = t.pipeline_state + task.current_stage_id = uuid.UUID(t.current_stage_id) if t.current_stage_id else None + task.queue_status = t.queue_status + task.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), t.action, task.name, + ) + await session.commit() + await session.refresh(task) + return await _task_out_after(session, task) + + +@router.post("/{dataset_id}/tasks/{task_id}/review", response_model=TaskOut) +async def review_task( + dataset_id: str, + task_id: str, + payload: ReviewIn = Body(...), + authorization: Optional[str] = Header(None), +) -> TaskOut: + """TP2/TP3: accept (optionally with corrections) advances; reject returns to the first stage.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, role = await _load_dataset_any(session, dataset_id, uid, _is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + if not _can_work_task(role, task.assignee_user_id, uid): + raise HTTPException(status_code=403, detail="Bạn không được phân công công việc này.") + stages = await _stage_infos(session, ds.id) + actor = await _actor_name(session, uid) + try: + t = compute_review( + task.pipeline_state, + str(task.current_stage_id) if task.current_stage_id else None, + stages, + payload.decision, + ) + except TaskPipelineError as exc: + raise HTTPException(status_code=exc.status, detail=str(exc)) + review_stage_id = task.current_stage_id # the Review stage where the decision is made + task.pipeline_state = t.pipeline_state + 
task.current_stage_id = uuid.UUID(t.current_stage_id) if t.current_stage_id else None + task.queue_status = t.queue_status + task.updated_at = datetime.now(tz=timezone.utc) + # Record the structured verdict (queryable history + per-reviewer counters + reject reason). + session.add( + ImagehubTaskReviewEvent( + dataset_id=ds.id, + task_id=task.id, + stage_id=review_stage_id, + reviewer_user_id=uid, + decision=payload.decision, + note=(payload.note or "").strip(), + ) + ) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), t.action, task.name, + ) + await session.commit() + await session.refresh(task) + return await _task_out_after(session, task) + + +@router.get("/{dataset_id}/review-stats", response_model=ReviewStatsOut) +async def review_stats( + dataset_id: str, + userId: Optional[str] = None, + days: int = 30, + authorization: Optional[str] = Header(None), +) -> ReviewStatsOut: + """Accept/reject/corrections tallies over the last ``days`` (the productivity panel). 
Optional + ``userId`` scopes the tallies to one reviewer; omit it for dataset-wide counts. ``days`` is clamped to 1..365.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, _role = await _load_dataset_any(session, dataset_id, uid, _is_admin(authorization)) + cutoff = datetime.now(tz=timezone.utc) - timedelta(days=max(1, min(days, 365))) + conds = [ + ImagehubTaskReviewEvent.dataset_id == ds.id, + ImagehubTaskReviewEvent.created_at >= cutoff, + ] + if userId: + try: + conds.append(ImagehubTaskReviewEvent.reviewer_user_id == uuid.UUID(userId)) + except (ValueError, TypeError): + raise HTTPException(status_code=422, detail="Người dùng không hợp lệ.") + rows = ( + await session.execute( + select(ImagehubTaskReviewEvent.decision, func.count()) + .where(*conds) + .group_by(ImagehubTaskReviewEvent.decision) + ) + ).all() + counts = {str(d): int(c) for d, c in rows} + return ReviewStatsOut( + accepted=counts.get("accept", 0), + acceptWithCorrections=counts.get("acceptWithCorrections", 0), + rejected=counts.get("reject", 0), + ) + + +@router.post("/{dataset_id}/tasks/{task_id}/skip", response_model=TaskOut) +async def skip_task( + dataset_id: str, task_id: str, authorization: Optional[str] = Header(None) +) -> TaskOut: + """Q2: send a task to the end of the queue; it keeps its current assignee.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, role = await _load_dataset_any(session, dataset_id, uid, _is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + if not _can_work_task(role, task.assignee_user_id, uid): + raise HTTPException(status_code=403, detail="Bạn không được phân công công việc này.") + # Read the current max skip-seq BEFORE mutating this row (autoflush guard).
+ max_seq = ( + await session.execute( + select(func.max(ImagehubTask.skipped_seq)).where(ImagehubTask.dataset_id == ds.id) + ) + ).scalar() + actor = await _actor_name(session, uid) + task.queue_status = "skipped" + task.skipped_seq = (int(max_seq) + 1) if max_seq is not None else 0 + task.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), "Bỏ qua công việc", task.name, + ) + await session.commit() + await session.refresh(task) + return await _task_out_after(session, task) + + +@router.post("/{dataset_id}/tasks/{task_id}/priority", response_model=TaskOut) +async def set_task_priority( + dataset_id: str, + task_id: str, + payload: PriorityIn = Body(...), + authorization: Optional[str] = Header(None), +) -> TaskOut: + """PR4: set a task's priority (float 0..1; higher floats to the top of queues).""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, _role = await _load_dataset_admin(session, dataset_id, uid, _is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + actor = await _actor_name(session, uid) + task.priority = float(payload.priority) + task.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), + "Đặt độ ưu tiên", f"{task.name} = {task.priority:.2f}", + ) + await session.commit() + await session.refresh(task) + return await _task_out_after(session, task) + + +@router.post("/{dataset_id}/tasks/{task_id}/reference", response_model=TaskOut) +async def set_task_reference( + dataset_id: str, + task_id: str, + payload: SetReferenceIn = Body(...), + authorization: Optional[str] = Header(None), +) -> TaskOut: + """RS1/RS2: set or unset a Ground Truth task as a project reference standard.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, _role = await _load_dataset_admin(session, dataset_id, uid, 
_is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + actor = await _actor_name(session, uid) + try: + validate_set_reference(task.pipeline_state, payload.isReferenceStandard) + except TaskPipelineError as exc: + raise HTTPException(status_code=exc.status, detail=str(exc)) + task.is_reference_standard = bool(payload.isReferenceStandard) + task.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), + "Đặt chuẩn tham chiếu" if task.is_reference_standard else "Bỏ chuẩn tham chiếu", + task.name, + ) + await session.commit() + await session.refresh(task) + return await _task_out_after(session, task) + + +@router.get("/{dataset_id}/tasks/{task_id}", response_model=TaskDetailOut) +async def get_task( + dataset_id: str, task_id: str, authorization: Optional[str] = Header(None) +) -> TaskDetailOut: + """Owner, member or admin: one task with its saved annotations (for the AnnotationTool).""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds = await _load_dataset_read(session, dataset_id, uid, _is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + return await _task_detail_after(session, task) + + +@router.post("/{dataset_id}/tasks/{task_id}/save", response_model=TaskDetailOut) +async def save_task( + dataset_id: str, + task_id: str, + payload: SaveIn = Body(...), + authorization: Optional[str] = Header(None), +) -> TaskDetailOut: + """Persist the labeler's annotations. 
Q3 save -> 'saved'; Q1 submit-draft -> 'pendingFinalization' + (a draft stays in the queue until Finalize advances the stage).""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, role = await _load_dataset_any(session, dataset_id, uid, _is_admin(authorization)) + task = await _load_task(session, ds.id, task_id) + if not _can_work_task(role, task.assignee_user_id, uid): + raise HTTPException(status_code=403, detail="Bạn không được phân công công việc này.") + if task.pipeline_state == "groundTruth": + raise HTTPException(status_code=409, detail="Tác vụ đã hoàn tất, không thể chỉnh sửa.") + actor = await _actor_name(session, uid) + task.annotations = payload.annotations + task.queue_status = "pendingFinalization" if payload.submit else "saved" + task.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), + "Nộp bản nháp" if payload.submit else "Lưu chú thích", task.name, + ) + await session.commit() + await session.refresh(task) + return await _task_detail_after(session, task) + + +# --------------------------------------------------------------------------- # +# Endpoints — dataset membership (multi-labeler). Owner + platform-admin manage; a member gets read +# access + may work tasks assigned to them (see _load_dataset_any / _can_work_task). 
+# --------------------------------------------------------------------------- # +_MEMBER_ROLES = ("member", "project_admin") + + +class MemberOut(BaseModel): + userId: str + email: str = "" + fullName: str = "" + role: str = "member" + createdAt: Optional[datetime] = None + + +class MemberAddIn(BaseModel): + email: Optional[str] = None + userId: Optional[str] = None + role: str = "member" + + +@router.get("/{dataset_id}/members", response_model=list[MemberOut]) +async def list_members( + dataset_id: str, authorization: Optional[str] = Header(None) +) -> list[MemberOut]: + """Owner, admin or project-admin: the dataset's members.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, _role = await _load_dataset_admin(session, dataset_id, uid, _is_admin(authorization)) + rows = ( + await session.execute( + select(ImagehubDatasetMember, User.email, User.full_name) + .join(User, User.id == ImagehubDatasetMember.user_id) + .where(ImagehubDatasetMember.dataset_id == ds.id) + .order_by(ImagehubDatasetMember.created_at) + ) + ).all() + return [ + MemberOut( + userId=str(m.user_id), + email=email or "", + fullName=full_name or "", + role=m.role, + createdAt=m.created_at, + ) + for m, email, full_name in rows + ] + + +@router.post("/{dataset_id}/members", response_model=MemberOut) +async def add_member( + dataset_id: str, payload: MemberAddIn = Body(...), authorization: Optional[str] = Header(None) +) -> MemberOut: + """Owner / admin / project-admin: add a member by email or userId so they can be assigned tasks. 
+ Only the owner + platform-admin may grant the project_admin role (anti-escalation).""" + _require_db() + uid = _require_authed_uid(authorization) + new_role = payload.role if payload.role in _MEMBER_ROLES else "member" + async with get_session() as session: + ds, caller_role = await _load_dataset_admin(session, dataset_id, uid, _is_admin(authorization)) + if new_role == "project_admin" and caller_role not in ("owner", "admin"): + raise HTTPException( + status_code=403, detail="Chỉ chủ sở hữu mới cấp quyền quản trị dự án." + ) + # Resolve the target user + dup-check BEFORE add (autoflush-before-commit guard). + target: Optional[User] = None + if payload.userId: + try: + target = await session.get(User, uuid.UUID(payload.userId)) + except (ValueError, TypeError): + target = None + elif payload.email: + target = ( + await session.execute( + select(User).where(func.lower(User.email) == payload.email.strip().lower()) + ) + ).scalar_one_or_none() + if target is None: + raise HTTPException(status_code=404, detail="Không tìm thấy người dùng.") + if target.id == ds.owner_user_id: + raise HTTPException(status_code=409, detail="Người dùng đã là chủ sở hữu bộ dữ liệu.") + actor = await _actor_name(session, uid) + existing = ( + await session.execute( + select(ImagehubDatasetMember).where( + ImagehubDatasetMember.dataset_id == ds.id, + ImagehubDatasetMember.user_id == target.id, + ) + ) + ).scalar_one_or_none() + if existing is not None: + raise HTTPException(status_code=409, detail="Người dùng đã là thành viên.") + member = ImagehubDatasetMember(dataset_id=ds.id, user_id=target.id, role=new_role, added_by=uid) + session.add(member) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), + "Thêm thành viên", target.email or str(target.id), + ) + try: + await session.commit() + except IntegrityError: + await session.rollback() + raise HTTPException(status_code=409, detail="Người dùng đã là thành viên.") + await session.refresh(member) + return 
MemberOut( + userId=str(target.id), + email=target.email or "", + fullName=target.full_name or "", + role=new_role, + createdAt=member.created_at, + ) + + +@router.delete("/{dataset_id}/members/{user_id}") +async def remove_member( + dataset_id: str, user_id: str, authorization: Optional[str] = Header(None) +) -> dict[str, Any]: + """Owner / admin / project-admin: remove a member. A project-admin may not remove another + project-admin (only the owner + platform-admin can).""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + ds, caller_role = await _load_dataset_admin(session, dataset_id, uid, _is_admin(authorization)) + try: + tid = uuid.UUID(user_id) + except (ValueError, TypeError): + raise HTTPException(status_code=404, detail="Không tìm thấy thành viên.") + member = ( + await session.execute( + select(ImagehubDatasetMember).where( + ImagehubDatasetMember.dataset_id == ds.id, + ImagehubDatasetMember.user_id == tid, + ) + ) + ).scalar_one_or_none() + if member is None: + raise HTTPException(status_code=404, detail="Không tìm thấy thành viên.") + if member.role == "project_admin" and caller_role not in ("owner", "admin"): + raise HTTPException( + status_code=403, detail="Chỉ chủ sở hữu mới gỡ quyền quản trị dự án." + ) + actor = await _actor_name(session, uid) + await session.delete(member) + await _write_audit( + session, ds.id, uid, actor, _role_label(authorization), "Xóa thành viên", user_id, + ) + await session.commit() + return {"ok": True} diff --git a/be0/src/imagehub_segmentation.py b/be0/src/imagehub_segmentation.py new file mode 100644 index 0000000..7de8576 --- /dev/null +++ b/be0/src/imagehub_segmentation.py @@ -0,0 +1,166 @@ +"""ImageHub — organ-segmentation linking (Phase D). + +A cohesive domain/service module for linking organ-mask files to the image they +segment. 
The HTTP route (``imagehub_routes.upload_segmentations``) stays a thin +transport layer; the domain rules live here, in one unit-testable place: + + * a mask may only attach to an existing *image* file in the *same* dataset; + * masks are namespaced under their parent (``<parent stem>.seg/<safe mask filename>``) so they + never collide with — or replace — a sibling image row under the dataset's + UNIQUE (dataset_id, logical_path); + * the link is recorded explicitly (``file_kind='segmentation'`` + ``parent_file_id`` + + ``organ_label``), not inferred. + +Infrastructure (blob storage, imaging-metadata sniffing, filename safety) is +*injected*, not imported — so this module has no FastAPI/HTTP coupling, no circular +dependency on the router, and can be unit-tested with fakes. This is the project's +pragmatic take on Clean Architecture: a cohesive service over the flat routers, not +the (unwired) domain/application/infrastructure tree. +""" +from __future__ import annotations + +import uuid +from dataclasses import dataclass +from datetime import datetime, timezone +from typing import Any, Awaitable, Callable, Optional + +from sqlalchemy import select + +from src.initiative_db.models import ImagehubBlob, ImagehubDataset, ImagehubDatasetFile + +# Injected infrastructure contracts (kept HTTP-free + testable).
+PutBlob = Callable[[bytes, Optional[str]], Awaitable[dict[str, Any]]] +SniffMeta = Callable[[Optional[str], bytes, str], dict[str, Any]] +SafeName = Callable[[Optional[str]], str] + +_IMAGING_EXTS = (".nii.gz", ".nii", ".dcm", ".dicom") + + +class SegmentationError(Exception): + """A domain rule was violated; ``status`` is the HTTP code the caller should map to.""" + + def __init__(self, message: str, status: int = 422) -> None: + super().__init__(message) + self.status = status + + +@dataclass(frozen=True) +class MaskUpload: + """One organ-mask file to be linked to a parent image.""" + + filename: str + data: bytes + media_type: str + organ_label: str + + +class SegmentationService: + """Links organ-mask files to a parent image within a single dataset.""" + + def __init__( + self, session, *, put_blob: PutBlob, sniff_meta: SniffMeta, safe_name: SafeName + ) -> None: + self._session = session + self._put_blob = put_blob + self._sniff_meta = sniff_meta + self._safe_name = safe_name + + def _mask_logical_path(self, parent_logical_path: str, mask_filename: str) -> str: + """``<parent stem>.seg/<safe mask filename>`` — groups masks under their parent + and keeps them out of the parent's logical-path namespace.""" + stem = parent_logical_path + low = stem.lower() + for ext in _IMAGING_EXTS: + if low.endswith(ext): + stem = stem[: -len(ext)] + break + return f"{stem}.seg/{self._safe_name(mask_filename)}" + + async def _resolve_parent( + self, dataset: ImagehubDataset, parent_file_id: str + ) -> ImagehubDatasetFile: + try: + pid = uuid.UUID(parent_file_id) + except (ValueError, TypeError): + raise SegmentationError("Không tìm thấy tệp ảnh gốc.", 404) + parent = ( + await self._session.execute( + select(ImagehubDatasetFile).where( + ImagehubDatasetFile.id == pid, + ImagehubDatasetFile.dataset_id == dataset.id, + ) + ) + ).scalar_one_or_none() + if parent is None: + raise SegmentationError("Không tìm thấy tệp ảnh gốc.", 404) + if parent.file_kind != "image": + raise 
SegmentationError("Chỉ có thể gắn mặt
nạ vào tệp ảnh gốc.", 422) + return parent + + async def _ensure_blob(self, data: bytes, media_type: str) -> dict[str, Any]: + blob = await self._put_blob(data, media_type) + if await self._session.get(ImagehubBlob, blob["sha256"]) is None: + self._session.add( + ImagehubBlob( + sha256=blob["sha256"], + size_bytes=blob["size"], + media_type=media_type, + storage_bucket=blob["bucket"], + storage_key=blob["key"], + ) + ) + await self._session.flush() + return blob + + async def link_masks( + self, + dataset: ImagehubDataset, + parent_file_id: str, + masks: list[MaskUpload], + actor_uid: uuid.UUID, + ) -> list[ImagehubDatasetFile]: + """Store + link each mask to the parent image. Returns the created/updated rows. + + Re-linking a mask whose namespaced path already exists replaces it in place + (a corrected mask for the same organ). Raises ``SegmentationError`` on a bad + parent or an empty payload; propagates the injected ``put_blob`` errors. + """ + parent = await self._resolve_parent(dataset, parent_file_id) + if not masks: + raise SegmentationError("Chưa chọn tệp mặt nạ nào.", 422) + + linked: list[ImagehubDatasetFile] = [] + for m in masks: + if not m.data: + continue + blob = await self._ensure_blob(m.data, m.media_type) + logical_path = self._mask_logical_path(parent.logical_path, m.filename) + meta = self._sniff_meta(m.filename, m.data, m.media_type) + organ = (m.organ_label or "").strip() or self._safe_name(m.filename) + row = ( + await self._session.execute( + select(ImagehubDatasetFile).where( + ImagehubDatasetFile.dataset_id == dataset.id, + ImagehubDatasetFile.logical_path == logical_path, + ) + ) + ).scalar_one_or_none() + if row is None: + row = ImagehubDatasetFile( + id=uuid.uuid4(), dataset_id=dataset.id, logical_path=logical_path + ) + self._session.add(row) + row.blob_sha256 = blob["sha256"] + row.size_bytes = blob["size"] + row.media_type = m.media_type + row.imaging_meta = meta + row.file_kind = "segmentation" + row.parent_file_id = parent.id + 
row.organ_label = organ + row.uploaded_by = actor_uid + row.updated_at = datetime.now(tz=timezone.utc) + linked.append(row) + + if not linked: + raise SegmentationError("Tệp mặt nạ rỗng.", 422) + return linked diff --git a/be0/src/imagehub_task_pipeline.py b/be0/src/imagehub_task_pipeline.py new file mode 100644 index 0000000..9858188 --- /dev/null +++ b/be0/src/imagehub_task_pipeline.py @@ -0,0 +1,136 @@ +"""ImageHub — task pipeline state machine (project-workflow §3/§4, single-user MVP). + +A cohesive, HTTP-free domain module: the per-task transitions that move a task through a +dataset's ordered pipeline stages (Label -> Review_1 -> Review_n -> Ground Truth). The HTTP +routes in ``imagehub_routes`` stay a thin transport layer; the rules live here so they are +unit-testable with plain data (no DB, no FastAPI), mirroring ``imagehub_segmentation``. + +Mapped from ``docs/workflows/project-workflow-spec.md``: + * TP1 finalize advances a Label task to the next stage (or Ground Truth if none follow); + * TP2 review accept (with/without corrections) advances; TP3 reject returns to the first stage; + * §9 RS1 only a Ground Truth task may be a reference standard. + +Multi-labeler assignment, issues, comments, time and evaluation are later phases; this module is +deliberately limited to the pipeline + the bits the Data Page MVP needs. +""" +from __future__ import annotations + +from dataclasses import dataclass +from typing import Optional + +# State vocabularies — must match the CHECK constraints in migration 021. 
+PIPELINE_STATES = ("inLabel", "inReview", "groundTruth", "issue") +QUEUE_STATUSES = ("assigned", "saved", "pendingFinalization", "skipped") +REVIEW_DECISIONS = ("accept", "acceptWithCorrections", "reject") + + +class TaskPipelineError(Exception): + """A pipeline rule was violated; ``status`` is the HTTP code the caller should map to.""" + + def __init__(self, message: str, status: int = 422) -> None: + super().__init__(message) + self.status = status + + +@dataclass(frozen=True) +class StageInfo: + """The minimum a transition needs to know about a pipeline stage.""" + + id: str + kind: str # 'label' | 'review' + seq: int + + +@dataclass(frozen=True) +class Transition: + """The computed result of an event: the task's new pipeline position + an audit label.""" + + pipeline_state: str + current_stage_id: Optional[str] + queue_status: str + action: str + + +def order_stages(stages: list[StageInfo]) -> list[StageInfo]: + """Pipeline order: by ``seq`` then ``id`` (stable, deterministic).""" + return sorted(stages, key=lambda s: (s.seq, s.id)) + + +def first_stage(stages: list[StageInfo]) -> StageInfo: + ordered = order_stages(stages) + if not ordered: + raise TaskPipelineError( + "Bộ dữ liệu chưa có giai đoạn nào trong quy trình.", status=409 + ) + return ordered[0] + + +def stage_after(stages: list[StageInfo], stage_id: Optional[str]) -> Optional[StageInfo]: + """The stage following ``stage_id`` in pipeline order, or None if it is the last / unknown.""" + if stage_id is None: + return None + ordered = order_stages(stages) + for i, s in enumerate(ordered): + if s.id == stage_id: + return ordered[i + 1] if i + 1 < len(ordered) else None + return None + + +def state_for_stage(stage: StageInfo) -> str: + """A review stage holds tasks ``inReview``; any other (label/pre-label) holds them ``inLabel``.""" + return "inReview" if stage.kind == "review" else "inLabel" + + +def initial_transition(stages: list[StageInfo]) -> Transition: + """Where a brand-new task starts: the first 
stage, ``assigned`` and unworked.""" + s0 = first_stage(stages) + return Transition(state_for_stage(s0), s0.id, "assigned", "Tạo công việc") + + +def _advance(current_stage_id: Optional[str], stages: list[StageInfo]) -> Transition: + """Move to the next stage, or to terminal Ground Truth when none follows.""" + nxt = stage_after(stages, current_stage_id) + if nxt is None: + return Transition("groundTruth", None, "assigned", "") + return Transition(state_for_stage(nxt), nxt.id, "assigned", "") + + +def compute_finalize( + pipeline_state: str, current_stage_id: Optional[str], stages: list[StageInfo] +) -> Transition: + """TP1: finalizing a Label task advances it to the next stage (or Ground Truth).""" + if pipeline_state != "inLabel": + raise TaskPipelineError( + "Chỉ có thể hoàn tất công việc đang ở giai đoạn gán nhãn.", status=409 + ) + t = _advance(current_stage_id, stages) + action = "Hoàn tất → Ground Truth" if t.pipeline_state == "groundTruth" else "Hoàn tất gán nhãn" + return Transition(t.pipeline_state, t.current_stage_id, t.queue_status, action) + + +def compute_review( + pipeline_state: str, current_stage_id: Optional[str], stages: list[StageInfo], decision: str +) -> Transition: + """TP2 accept / TP3 reject from a Review stage.""" + if pipeline_state != "inReview": + raise TaskPipelineError( + "Chỉ có thể duyệt công việc đang ở giai đoạn rà soát.", status=409 + ) + if decision not in REVIEW_DECISIONS: + raise TaskPipelineError("Quyết định duyệt không hợp lệ.", status=422) + if decision == "reject": + s0 = first_stage(stages) + return Transition("inLabel", s0.id, "assigned", "Từ chối — trả lại để chỉnh sửa") + t = _advance(current_stage_id, stages) + label = "Chấp nhận (có chỉnh sửa)" if decision == "acceptWithCorrections" else "Chấp nhận" + if t.pipeline_state == "groundTruth": + label += " → Ground Truth" + return Transition(t.pipeline_state, t.current_stage_id, t.queue_status, label) + + +def validate_set_reference(pipeline_state: str, value: bool) -> 
None: + """RS1: only a Ground Truth task may be flagged as a reference standard.""" + if value and pipeline_state != "groundTruth": + raise TaskPipelineError( + "Chỉ tác vụ Ground Truth mới được đặt làm chuẩn tham chiếu.", status=409 + ) diff --git a/be0/src/infrastructure/config/__pycache__/settings.cpython-313.pyc b/be0/src/infrastructure/config/__pycache__/settings.cpython-313.pyc new file mode 100644 index 0000000..60b8155 Binary files /dev/null and b/be0/src/infrastructure/config/__pycache__/settings.cpython-313.pyc differ diff --git a/be0/src/infrastructure/config/settings.py b/be0/src/infrastructure/config/settings.py new file mode 100644 index 0000000..3e20df6 --- /dev/null +++ b/be0/src/infrastructure/config/settings.py @@ -0,0 +1,74 @@ +""" +Application settings and configuration management. +Uses Pydantic Settings for type-safe configuration with environment variable support. +""" +from pydantic_settings import BaseSettings +from typing import List, Optional +from functools import lru_cache + + +class Settings(BaseSettings): + """Application configuration settings.""" + + # Application + app_name: str = "ProfytAI Compliance Platform" + app_version: str = "1.0.0" + debug: bool = False + environment: str = "development" # development, staging, production + + # Server + host: str = "0.0.0.0" + port: int = 4402 + + # Database - Neo4j + neo4j_uri: str = "bolt://localhost:7687" + neo4j_user: str = "neo4j" + neo4j_password: str = "password" + neo4j_database: str = "neo4j" + + # Security + secret_key: str = "your-secret-key-change-in-production" # Should be in .env + algorithm: str = "HS256" + access_token_expire_minutes: int = 30 + refresh_token_expire_days: int = 7 + cors_origins: List[str] = ["http://localhost:8080", "http://localhost:3000"] + + # AI/ML - Ollama + ollama_base_url: str = "http://localhost:11434" + ollama_model: str = "qwen2.5:3b" + embedding_model: str = "embeddinggemma:300m" + ollama_timeout: int = 300 # 5 minutes + + # Storage + upload_dir: 
str = "./assets/data/uploads" + max_upload_size: int = 10 * 1024 * 1024 # 10MB + allowed_file_extensions: List[str] = [".pdf", ".docx", ".txt"] + + # Rate Limiting + rate_limit_per_minute: int = 60 + rate_limit_per_hour: int = 1000 + + # Logging + log_level: str = "INFO" + log_file: str = "./logs/app.log" + log_format: str = "%(asctime)s - %(name)s - %(levelname)s - %(message)s" + + # Workflow + max_workflow_items: int = 100 + workflow_cleanup_days: int = 90 # Auto-cleanup old workflows + + class Config: + env_file = ".env" + env_file_encoding = "utf-8" + case_sensitive = False + env_nested_delimiter = "__" + + +@lru_cache() +def get_settings() -> Settings: + """Get cached settings instance.""" + return Settings() + + +# Global settings instance +settings = get_settings() diff --git a/be0/src/infrastructure/vector_db/__pycache__/qdrant_service.cpython-311.pyc b/be0/src/infrastructure/vector_db/__pycache__/qdrant_service.cpython-311.pyc new file mode 100644 index 0000000..4c2db6b Binary files /dev/null and b/be0/src/infrastructure/vector_db/__pycache__/qdrant_service.cpython-311.pyc differ diff --git a/be0/src/infrastructure/vector_db/__pycache__/qdrant_service.cpython-313.pyc b/be0/src/infrastructure/vector_db/__pycache__/qdrant_service.cpython-313.pyc new file mode 100644 index 0000000..9a62700 Binary files /dev/null and b/be0/src/infrastructure/vector_db/__pycache__/qdrant_service.cpython-313.pyc differ diff --git a/be0/src/infrastructure/vector_db/qdrant_service.py b/be0/src/infrastructure/vector_db/qdrant_service.py new file mode 100644 index 0000000..665faca --- /dev/null +++ b/be0/src/infrastructure/vector_db/qdrant_service.py @@ -0,0 +1,246 @@ +""" +Qdrant Vector Database Service +Manages idea storage and similarity search using Qdrant +""" +import httpx +from typing import List, Dict, Optional, Any +from pydantic import BaseModel +from src.utils import initialize_a_logger +import ollama +import uuid +from datetime import datetime + +class Idea(BaseModel): + 
+ """Idea model for storage""" + id: str + title: str + description: str + category: Optional[str] = None + created_at: str + embedding: Optional[List[float]] = None + +class QdrantService: + """Service for managing ideas in Qdrant vector database""" + + def __init__(self, qdrant_url: Optional[str] = None, collection_name: str = "ump_ideas"): + # Explicit qdrant_url wins, then the QDRANT_URL env var (e.g. a Docker service URL), then localhost + import os + self.qdrant_url = qdrant_url or os.getenv("QDRANT_URL", "http://localhost:6333") + self.collection_name = collection_name + self.logger = initialize_a_logger('./logs/QdrantService.log') + self.embedding_model = "embeddinggemma:300m" + + async def initialize_collection(self): + """Initialize Qdrant collection if it doesn't exist""" + try: + # Check if collection exists + async with httpx.AsyncClient() as client: + response = await client.get( + f"{self.qdrant_url}/collections/{self.collection_name}" + ) + if response.status_code == 200: + self.logger.info(f"Collection {self.collection_name} already exists") + return + + # Create collection if it doesn't exist + if response.status_code == 404: + self.logger.info(f"Creating collection {self.collection_name}") + # First, get embedding dimension by generating a test embedding + try: + test_embedding = await self.generate_embedding("test") + vector_size = len(test_embedding) + self.logger.info(f"Detected embedding dimension: {vector_size}") + except Exception as e: + self.logger.warning(f"Could not detect embedding size, using default: {e}") + vector_size = 768 # Default fallback + + create_response = await client.put( + f"{self.qdrant_url}/collections/{self.collection_name}", + json={ + "vectors": { + "size": vector_size, + "distance": "Cosine" + } + } + ) + if create_response.status_code in [200, 201]: + self.logger.info(f"Collection {self.collection_name} created successfully") + else: + self.logger.error(f"Failed 
to create collection: {create_response.text}") + except Exception as e: + self.logger.error(f"Error
initializing collection: {e}", exc_info=True) + + async def generate_embedding(self, text: str) -> List[float]: + """Generate vector embedding for text using Ollama""" + try: + # Use the async Ollama client so embedding inference does not block the event loop + response = await ollama.AsyncClient().embeddings( + model=self.embedding_model, + prompt=text + ) + embedding = response.get("embedding", []) + if not embedding: + raise ValueError("Empty embedding returned") + return embedding + except Exception as e: + self.logger.error(f"Error generating embedding: {e}", exc_info=True) + raise + + async def add_idea(self, title: str, description: str, category: Optional[str] = None) -> Dict[str, Any]: + """Add a new idea to the vector database""" + try: + # Generate embedding for the idea (combine title and description) + idea_text = f"{title}\n{description}" + embedding = await self.generate_embedding(idea_text) + + # Create idea object + idea_id = str(uuid.uuid4()) + idea = Idea( + id=idea_id, + title=title, + description=description, + category=category, + created_at=datetime.now().isoformat(), + embedding=embedding + ) + + # Store in Qdrant + async with httpx.AsyncClient() as client: + response = await client.put( + f"{self.qdrant_url}/collections/{self.collection_name}/points", + json={ + "points": [{ + "id": idea_id, + "vector": embedding, + "payload": { + "title": title, + "description": description, + "category": category, + "created_at": idea.created_at + } + }] + } + ) + + if response.status_code in [200, 201]: + self.logger.info(f"Idea {idea_id} added successfully") + return { + "id": idea_id, + "title": title, + "description": description, + "category": category, + "created_at": idea.created_at, + "status": "success" + } + else: + error_msg = f"Failed to add idea: {response.text}" + self.logger.error(error_msg) + raise Exception(error_msg) + + except Exception as e: + self.logger.error(f"Error adding idea: {e}", exc_info=True) + 
raise + + async def search_similar_ideas(self, query_text: str, limit: int = 5,
score_threshold: float = 0.5) -> List[Dict[str, Any]]: + """Search for similar ideas using vector similarity""" + try: + # Generate embedding for query + query_embedding = await self.generate_embedding(query_text) + + # Search in Qdrant + async with httpx.AsyncClient() as client: + response = await client.post( + f"{self.qdrant_url}/collections/{self.collection_name}/points/search", + json={ + "vector": query_embedding, + "limit": limit, + "score_threshold": score_threshold, + "with_payload": True + } + ) + + if response.status_code == 200: + results = response.json() + similar_ideas = [] + for result in results.get("result", []): + payload = result.get("payload", {}) + similar_ideas.append({ + "id": result.get("id"), + "title": payload.get("title", ""), + "description": payload.get("description", ""), + "category": payload.get("category"), + "created_at": payload.get("created_at"), + "similarity_score": result.get("score", 0.0) + }) + self.logger.info(f"Found {len(similar_ideas)} similar ideas") + return similar_ideas + else: + error_msg = f"Failed to search ideas: {response.text}" + self.logger.error(error_msg) + return [] + + except Exception as e: + self.logger.error(f"Error searching ideas: {e}", exc_info=True) + return [] + + async def get_all_ideas(self, limit: int = 100) -> List[Dict[str, Any]]: + """Get all ideas from the database""" + try: + async with httpx.AsyncClient() as client: + response = await client.post( + f"{self.qdrant_url}/collections/{self.collection_name}/points/scroll", + json={ + "limit": limit, + "with_payload": True + } + ) + + if response.status_code == 200: + data = response.json() + ideas = [] + for point in data.get("result", {}).get("points", []): + payload = point.get("payload", {}) + ideas.append({ + "id": point.get("id"), + "title": payload.get("title", ""), + "description": payload.get("description", ""), + "category": payload.get("category"), + "created_at": payload.get("created_at") + }) + return ideas + else: + return [] + 
except Exception as e: + self.logger.error(f"Error getting all ideas: {e}", exc_info=True) + return [] + + async def delete_idea(self, idea_id: str) -> bool: + """Delete an idea from the database""" + try: + async with httpx.AsyncClient() as client: + response = await client.post( + f"{self.qdrant_url}/collections/{self.collection_name}/points/delete", + json={ + "points": [idea_id] + } + ) + + if response.status_code in [200, 201]: + self.logger.info(f"Idea {idea_id} deleted successfully") + return True + else: + self.logger.error(f"Failed to delete idea: {response.text}") + return False + except Exception as e: + self.logger.error(f"Error deleting idea: {e}", exc_info=True) + return False + +# Global instance +_qdrant_service: Optional[QdrantService] = None + +def get_qdrant_service() -> QdrantService: + """Get or create Qdrant service instance""" + global _qdrant_service + if _qdrant_service is None: + _qdrant_service = QdrantService() + return _qdrant_service diff --git a/be0/src/initiative_db/__init__.py b/be0/src/initiative_db/__init__.py new file mode 100644 index 0000000..e58f433 --- /dev/null +++ b/be0/src/initiative_db/__init__.py @@ -0,0 +1,17 @@ +"""PostgreSQL persistence for initiative cases, drafts, and related documents.""" + +from src.initiative_db.engine import ( + dispose_engine, + get_database_url, + get_session, + init_engine, + is_postgres_enabled, +) + +__all__ = [ + "get_database_url", + "is_postgres_enabled", + "init_engine", + "dispose_engine", + "get_session", +] diff --git a/be0/src/initiative_db/__pycache__/__init__.cpython-311.pyc b/be0/src/initiative_db/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 0000000..4678d70 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/__init__.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/__init__.cpython-313.pyc b/be0/src/initiative_db/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..30934cc Binary files /dev/null and 
b/be0/src/initiative_db/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/application_admin_results.cpython-311.pyc b/be0/src/initiative_db/__pycache__/application_admin_results.cpython-311.pyc new file mode 100644 index 0000000..7689fdf Binary files /dev/null and b/be0/src/initiative_db/__pycache__/application_admin_results.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/application_admin_results.cpython-313.pyc b/be0/src/initiative_db/__pycache__/application_admin_results.cpython-313.pyc new file mode 100644 index 0000000..b235196 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/application_admin_results.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/application_backup.cpython-311.pyc b/be0/src/initiative_db/__pycache__/application_backup.cpython-311.pyc new file mode 100644 index 0000000..98fd131 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/application_backup.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/application_backup.cpython-313.pyc b/be0/src/initiative_db/__pycache__/application_backup.cpython-313.pyc new file mode 100644 index 0000000..6ee86ea Binary files /dev/null and b/be0/src/initiative_db/__pycache__/application_backup.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/application_storage.cpython-311.pyc b/be0/src/initiative_db/__pycache__/application_storage.cpython-311.pyc new file mode 100644 index 0000000..3183de9 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/application_storage.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/application_storage.cpython-313.pyc b/be0/src/initiative_db/__pycache__/application_storage.cpython-313.pyc new file mode 100644 index 0000000..86f6bfb Binary files /dev/null and b/be0/src/initiative_db/__pycache__/application_storage.cpython-313.pyc differ diff --git 
a/be0/src/initiative_db/__pycache__/backup_naming.cpython-311.pyc b/be0/src/initiative_db/__pycache__/backup_naming.cpython-311.pyc new file mode 100644 index 0000000..a494c65 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/backup_naming.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/backup_naming.cpython-313.pyc b/be0/src/initiative_db/__pycache__/backup_naming.cpython-313.pyc new file mode 100644 index 0000000..5a3784f Binary files /dev/null and b/be0/src/initiative_db/__pycache__/backup_naming.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/drafts.cpython-311.pyc b/be0/src/initiative_db/__pycache__/drafts.cpython-311.pyc new file mode 100644 index 0000000..789e38b Binary files /dev/null and b/be0/src/initiative_db/__pycache__/drafts.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/drafts.cpython-313.pyc b/be0/src/initiative_db/__pycache__/drafts.cpython-313.pyc new file mode 100644 index 0000000..54a4396 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/drafts.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/engine.cpython-311.pyc b/be0/src/initiative_db/__pycache__/engine.cpython-311.pyc new file mode 100644 index 0000000..cf05dc3 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/engine.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/engine.cpython-313.pyc b/be0/src/initiative_db/__pycache__/engine.cpython-313.pyc new file mode 100644 index 0000000..1822029 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/engine.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/models.cpython-311.pyc b/be0/src/initiative_db/__pycache__/models.cpython-311.pyc new file mode 100644 index 0000000..1d472ea Binary files /dev/null and b/be0/src/initiative_db/__pycache__/models.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/models.cpython-313.pyc 
b/be0/src/initiative_db/__pycache__/models.cpython-313.pyc new file mode 100644 index 0000000..0b1b5cf Binary files /dev/null and b/be0/src/initiative_db/__pycache__/models.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/repair_split_submission.cpython-313.pyc b/be0/src/initiative_db/__pycache__/repair_split_submission.cpython-313.pyc new file mode 100644 index 0000000..d0a9136 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/repair_split_submission.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/submission_readiness.cpython-311.pyc b/be0/src/initiative_db/__pycache__/submission_readiness.cpython-311.pyc new file mode 100644 index 0000000..3120ba2 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/submission_readiness.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/submission_readiness.cpython-313.pyc b/be0/src/initiative_db/__pycache__/submission_readiness.cpython-313.pyc new file mode 100644 index 0000000..f8f6c6c Binary files /dev/null and b/be0/src/initiative_db/__pycache__/submission_readiness.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/submissions.cpython-311.pyc b/be0/src/initiative_db/__pycache__/submissions.cpython-311.pyc new file mode 100644 index 0000000..eb0390c Binary files /dev/null and b/be0/src/initiative_db/__pycache__/submissions.cpython-311.pyc differ diff --git a/be0/src/initiative_db/__pycache__/submissions.cpython-313.pyc b/be0/src/initiative_db/__pycache__/submissions.cpython-313.pyc new file mode 100644 index 0000000..4824489 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/submissions.cpython-313.pyc differ diff --git a/be0/src/initiative_db/__pycache__/user_notifications.cpython-311.pyc b/be0/src/initiative_db/__pycache__/user_notifications.cpython-311.pyc new file mode 100644 index 0000000..79574cf Binary files /dev/null and b/be0/src/initiative_db/__pycache__/user_notifications.cpython-311.pyc differ diff 
--git a/be0/src/initiative_db/__pycache__/user_notifications.cpython-313.pyc b/be0/src/initiative_db/__pycache__/user_notifications.cpython-313.pyc new file mode 100644 index 0000000..d3ad018 Binary files /dev/null and b/be0/src/initiative_db/__pycache__/user_notifications.cpython-313.pyc differ diff --git a/be0/src/initiative_db/application_admin_results.py b/be0/src/initiative_db/application_admin_results.py new file mode 100644 index 0000000..ab7c4e1 --- /dev/null +++ b/be0/src/initiative_db/application_admin_results.py @@ -0,0 +1,317 @@ +"""Admin CRUD for per-initiative adjudication results (approve / reject).""" + +from __future__ import annotations + +import uuid +from datetime import datetime, timezone +from typing import Any, Dict, Optional + +from sqlalchemy import select +from sqlalchemy.ext.asyncio import AsyncSession + +from src.audit import AuditAction, record_audit, resolve_actor_fields +from src.initiative_db.models import ApplicationAdminResult, Initiative, User +from src.initiative_db.submissions import ( + _as_submission_item, + _resolve_initiative_and_latest_draft_for_application_id, +) + +__all__ = [ + "get_admin_result_for_application", + "create_admin_result", + "update_admin_result", + "upsert_admin_result", + "delete_admin_result", +] + + +def _audit_admin_result_snapshot(rec: ApplicationAdminResult) -> Dict[str, Any]: + return { + "id": str(rec.id), + "decision": rec.decision, + "feedback": rec.feedback, + "rationale": rec.rationale, + "initiativeId": str(rec.initiative_id), + } + + +def _iso_utc(dt: Optional[datetime]) -> str: + if dt is None: + return "" + v = dt.astimezone(timezone.utc).replace(microsecond=0).isoformat() + return v.replace("+00:00", "Z") + + +def _row_to_api( + initiative: Initiative, + payload: dict[str, Any], + rec: ApplicationAdminResult, + application_id: str, +) -> Dict[str, Any]: + return { + "id": str(rec.id), + "applicationId": application_id, + "initiativeId": str(initiative.id), + "decision": rec.decision, +
"feedback": rec.feedback or "", + "rationale": rec.rationale, + "createdAt": _iso_utc(rec.created_at), + "updatedAt": _iso_utc(rec.updated_at), + "createdBy": str(rec.created_by) if rec.created_by else None, + "updatedBy": str(rec.updated_by) if rec.updated_by else None, + "createdByFullName": None, + "updatedByFullName": None, + "applicationName": str(_as_submission_item(initiative, payload).get("name") or ""), + } + + +async def _attach_admin_user_full_names(session: AsyncSession, row: Dict[str, Any], rec: ApplicationAdminResult) -> None: + """Mutates ``row`` in place: resolves ``users.full_name`` for created/updated audit ids.""" + ids = [uid for uid in (rec.created_by, rec.updated_by) if uid is not None] + if not ids: + return + stmt = select(User.id, User.full_name).where(User.id.in_(ids)) + fetched = (await session.execute(stmt)).all() + by_id = {uid: (fn or "").strip() or None for uid, fn in fetched} + if rec.created_by is not None: + row["createdByFullName"] = by_id.get(rec.created_by) + if rec.updated_by is not None: + row["updatedByFullName"] = by_id.get(rec.updated_by) + + +async def get_admin_result_for_application( + session: AsyncSession, + application_id: str, +) -> Optional[Dict[str, Any]]: + try: + initiative, draft = await _resolve_initiative_and_latest_draft_for_application_id(session, application_id) + except LookupError: + return None + + payload = dict(draft.payload) if isinstance(draft.payload, dict) else {} + stmt = select(ApplicationAdminResult).where(ApplicationAdminResult.initiative_id == initiative.id) + rec = (await session.execute(stmt)).scalar_one_or_none() + if rec is None: + return None + + aid = application_id.strip() + row = _row_to_api(initiative, payload, rec, aid) + await _attach_admin_user_full_names(session, row, rec) + return row + + +async def create_admin_result( + session: AsyncSession, + application_id: str, + admin_user_id: uuid.UUID, + *, + decision: str, + feedback: str, + rationale: Optional[str], +) -> Dict[str, 
Any]: + initiative, draft = await _resolve_initiative_and_latest_draft_for_application_id(session, application_id) + + payload = dict(draft.payload) if isinstance(draft.payload, dict) else {} + existing = ( + await session.execute( + select(ApplicationAdminResult).where(ApplicationAdminResult.initiative_id == initiative.id) + ) + ).scalar_one_or_none() + if existing is not None: + raise ValueError("result_already_exists") + + d = (decision or "").strip().lower() + if d not in ("approved", "rejected"): + raise ValueError("invalid_decision") + + rec = ApplicationAdminResult( + initiative_id=initiative.id, + decision=d, + feedback=(feedback or "").strip(), + rationale=(rationale or "").strip() or None, + created_by=admin_user_id, + updated_by=admin_user_id, + ) + session.add(rec) + + initiative.status = d + initiative.updated_at = datetime.now(timezone.utc) + + await session.flush() + await session.refresh(rec) + + row = _row_to_api(initiative, payload, rec, application_id.strip()) + await _attach_admin_user_full_names(session, row, rec) + ae, ar = await resolve_actor_fields(session, admin_user_id) + await record_audit( + session, + actor_user_id=admin_user_id, + actor_email=ae, + actor_role=ar, + action=AuditAction.create, + entity_type="application_admin_result", + entity_id=application_id.strip(), + after=_audit_admin_result_snapshot(rec), + metadata={"initiativeId": str(initiative.id)}, + ) + return row + + +async def update_admin_result( + session: AsyncSession, + application_id: str, + admin_user_id: uuid.UUID, + *, + decision: str, + feedback: str, + rationale: Optional[str], +) -> Dict[str, Any]: + initiative, draft = await _resolve_initiative_and_latest_draft_for_application_id(session, application_id) + + payload = dict(draft.payload) if isinstance(draft.payload, dict) else {} + stmt = select(ApplicationAdminResult).where(ApplicationAdminResult.initiative_id == initiative.id) + rec = (await session.execute(stmt)).scalar_one_or_none() + if rec is None: + 
raise LookupError("result_not_found") + + before_audit = _audit_admin_result_snapshot(rec) + + d = (decision or "").strip().lower() + if d not in ("approved", "rejected"): + raise ValueError("invalid_decision") + + rec.decision = d + rec.feedback = (feedback or "").strip() + rec.rationale = (rationale or "").strip() or None + rec.updated_by = admin_user_id + rec.updated_at = datetime.now(timezone.utc) + + initiative.status = d + initiative.updated_at = datetime.now(timezone.utc) + + await session.flush() + await session.refresh(rec) + + row = _row_to_api(initiative, payload, rec, application_id.strip()) + await _attach_admin_user_full_names(session, row, rec) + ae, ar = await resolve_actor_fields(session, admin_user_id) + await record_audit( + session, + actor_user_id=admin_user_id, + actor_email=ae, + actor_role=ar, + action=AuditAction.update, + entity_type="application_admin_result", + entity_id=application_id.strip(), + before=before_audit, + after=_audit_admin_result_snapshot(rec), + metadata={"initiativeId": str(initiative.id)}, + ) + return row + + +async def upsert_admin_result( + session: AsyncSession, + application_id: str, + admin_user_id: uuid.UUID, + *, + decision: str, + feedback: str, + rationale: Optional[str], +) -> Dict[str, Any]: + """ + Create or replace admin adjudication in one transaction (idempotent PUT semantics). + Keeps ``initiatives.status`` in sync with ``decision``. 
+ """ + initiative, draft = await _resolve_initiative_and_latest_draft_for_application_id(session, application_id) + + payload = dict(draft.payload) if isinstance(draft.payload, dict) else {} + stmt = select(ApplicationAdminResult).where(ApplicationAdminResult.initiative_id == initiative.id) + rec = (await session.execute(stmt)).scalar_one_or_none() + + d = (decision or "").strip().lower() + if d not in ("approved", "rejected"): + raise ValueError("invalid_decision") + + fb = (feedback or "").strip() + rat = (rationale or "").strip() or None + now = datetime.now(timezone.utc) + + audit_action = AuditAction.create + before_snap: Dict[str, Any] | None = None + if rec is None: + rec = ApplicationAdminResult( + initiative_id=initiative.id, + decision=d, + feedback=fb, + rationale=rat, + created_by=admin_user_id, + updated_by=admin_user_id, + ) + session.add(rec) + else: + audit_action = AuditAction.update + before_snap = _audit_admin_result_snapshot(rec) + rec.decision = d + rec.feedback = fb + rec.rationale = rat + rec.updated_by = admin_user_id + rec.updated_at = now + + initiative.status = d + initiative.updated_at = now + + await session.flush() + await session.refresh(rec) + + row = _row_to_api(initiative, payload, rec, application_id.strip()) + await _attach_admin_user_full_names(session, row, rec) + ae, ar = await resolve_actor_fields(session, admin_user_id) + await record_audit( + session, + actor_user_id=admin_user_id, + actor_email=ae, + actor_role=ar, + action=audit_action, + entity_type="application_admin_result", + entity_id=application_id.strip(), + before=before_snap, + after=_audit_admin_result_snapshot(rec), + metadata={"initiativeId": str(initiative.id)}, + ) + return row + + +async def delete_admin_result( + session: AsyncSession, + application_id: str, + *, + actor_user_id: uuid.UUID, +) -> None: + initiative, _draft = await _resolve_initiative_and_latest_draft_for_application_id(session, application_id) + + stmt = 
select(ApplicationAdminResult).where(ApplicationAdminResult.initiative_id == initiative.id) + rec = (await session.execute(stmt)).scalar_one_or_none() + if rec is None: + raise LookupError("result_not_found") + + snap = _audit_admin_result_snapshot(rec) + aid = application_id.strip() + await session.delete(rec) + + initiative.status = "submitted" + initiative.updated_at = datetime.now(timezone.utc) + + await session.flush() + ae, ar = await resolve_actor_fields(session, actor_user_id) + await record_audit( + session, + actor_user_id=actor_user_id, + actor_email=ae, + actor_role=ar, + action=AuditAction.delete, + entity_type="application_admin_result", + entity_id=aid, + before=snap, + metadata={"initiativeId": str(initiative.id)}, + ) diff --git a/be0/src/initiative_db/application_backup.py b/be0/src/initiative_db/application_backup.py new file mode 100644 index 0000000..3636876 --- /dev/null +++ b/be0/src/initiative_db/application_backup.py @@ -0,0 +1,342 @@ +"""Admin streaming ZIP backup: full PDF, official DOCX/PDF, evidence — manifest + SHA-256 verify.""" + +from __future__ import annotations + +import hashlib +import json +import logging +import os +from pathlib import Path +from typing import Any, Dict, Iterable, Iterator, List, Optional + +import boto3 +from botocore.config import Config as BotoConfig +from botocore.exceptions import ClientError +from zipstream import ZipStream + +from src.initiative_db.application_storage import ( + EVIDENCE_ROLE_RESEARCH, + EVIDENCE_ROLE_TECHNICAL, + EVIDENCE_ROLE_TEXTBOOK, + ROLE_OFFICIAL_FORM_DOCX, + ROLE_OFFICIAL_FORM_PDF, + STORAGE_EXTERNAL_URL, + STORAGE_FILESYSTEM, + STORAGE_MINIO_ATTACHMENTS, + STORAGE_MINIO_EXPORTS, + effective_storage_kind, +) +from src.initiative_db.backup_naming import official_form_pdf_backup_zip_path +from src.initiative_db.models import ApplicationArtifact, Initiative +from src.minio.storage import S3Settings, _sanitize_filename + +logger = logging.getLogger(__name__) + + +class 
BackupIntegrityError(Exception): + """SHA-256 mismatch while packing an artifact into the backup ZIP.""" + + +def _sync_s3_client(settings: S3Settings): + return boto3.client( + "s3", + endpoint_url=settings.s3_endpoint_url, + aws_access_key_id=settings.s3_access_key, + aws_secret_access_key=settings.s3_secret_key, + region_name=settings.s3_region, + config=BotoConfig(signature_version="s3v4"), + ) + + +def _iter_chunks_s3(settings: S3Settings, bucket: str, key: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]: + # Not a generator function: get_object runs at call time, so a missing key raises + # FileNotFoundError here (where build_backup_zipstream catches it) instead of mid-stream. + s3 = _sync_s3_client(settings) + try: + obj = s3.get_object(Bucket=bucket, Key=key) + except ClientError as exc: + code = exc.response.get("Error", {}).get("Code", "") + if code in ("404", "NoSuchKey"): + raise FileNotFoundError(f"s3://{bucket}/{key}") from exc + raise + body = obj["Body"] + + def _chunks() -> Iterator[bytes]: + try: + while True: + chunk = body.read(chunk_size) + if not chunk: + break + yield chunk + finally: + body.close() + + return _chunks() + + +def _submitted_initiatives_dir() -> Path: + return Path( + os.getenv( + "SUBMITTED_INITIATIVES_DIR", + str((Path(__file__).resolve().parents[3] / "fe0" / "public" / "submitted-initiatives").resolve()), + ) + ) + + +def _filesystem_path_for_uri(storage_uri: str) -> Path: + base = _submitted_initiatives_dir().resolve() + u = storage_uri.strip() + prefix = "/submitted-initiatives/" + if u.startswith(prefix): + rel = u[len(prefix) :].lstrip("/") + elif u.startswith("/"): + raw = u.lstrip("/") + if raw.startswith("submitted-initiatives/"): + rel = raw[len("submitted-initiatives/") :].lstrip("/") + else: + rel = raw + else: + rel = u.lstrip("/") + + candidate = (base / rel).resolve() + try: + candidate.relative_to(base) + except ValueError as exc: + raise ValueError(f"Invalid backup path outside submitted-initiatives: {storage_uri}") from exc + return candidate + + +def _iter_file_chunks(path: Path, chunk_size: int = 64 * 1024) -> Iterator[bytes]: + with open(path, "rb") as handle: + while True: + chunk = handle.read(chunk_size) + if not chunk: + break
+ yield chunk + + +def _guarded_chunk_iter( + chunks: Iterable[bytes], + expected_sha256: Optional[str], + role: str, + arcname: str, +) -> Iterator[bytes]: + h = hashlib.sha256() + for chunk in chunks: + h.update(chunk) + yield chunk + digest = h.hexdigest() + if expected_sha256 and digest.lower() != str(expected_sha256).lower(): + raise BackupIntegrityError( + f"SHA-256 mismatch for {role} ({arcname}): stored={expected_sha256[:16]}… verified={digest[:16]}…" + ) + + +def _zip_path_for_artifact(art: ApplicationArtifact) -> str: + role = art.role + if role == "full_pdf": + return "submitted/full-package.pdf" + if role == ROLE_OFFICIAL_FORM_DOCX: + return "submitted/official-form.docx" + if role == ROLE_OFFICIAL_FORM_PDF: + return "submitted/official-form.pdf" + if role == EVIDENCE_ROLE_RESEARCH: + name = _sanitize_filename(art.original_name or "evidence") + return f"evidence/research/{name}" + if role == EVIDENCE_ROLE_TEXTBOOK: + name = _sanitize_filename(art.original_name or "evidence") + return f"evidence/textbook/{name}" + if role == EVIDENCE_ROLE_TECHNICAL: + name = _sanitize_filename(art.original_name or "evidence") + return f"evidence/technical/{name}" + safe = _sanitize_filename(art.original_name or role) + return f"other/{role}/{safe}" + + +def _synthesize_official_form_pdf_bytes(obm: Dict[str, Any]) -> bytes: + """Same pipeline as ``POST /api/v1/docx/preview-application-form-pdf`` and submit-time persist.""" + from src.be01.docx_to_pdf import convert_docx_bytes_to_pdf + from src.be01.fill_application_form import fill_application_form_docx + from src.be01.official_to_data_blank import official_to_data_blank + + ctx = official_to_data_blank(dict(obm)) + docx_bytes = fill_application_form_docx(ctx) + return convert_docx_bytes_to_pdf( + docx_bytes, + relax_justified_softbreaks=True, + strip_table_row_heights=False, + ) + + +def _iter_memory_chunks(data: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]: + for i in range(0, len(data), chunk_size): + yield 
data[i : i + chunk_size] + + +def build_backup_zipstream( + *, + settings: S3Settings, + initiative: Initiative, + application_id: str, + case_code: str, + artifacts: List[ApplicationArtifact], + review_doc_json: Optional[Dict[str, Any]], + owner_id: Optional[str], + submitted_at: Optional[str], +) -> ZipStream: + z = ZipStream() + manifest_files: List[Dict[str, Any]] = [] + + official_obm: Optional[Dict[str, Any]] = None + if isinstance(review_doc_json, dict): + raw_obm = review_doc_json.get("officialBieuMau") + if isinstance(raw_obm, dict) and len(raw_obm) > 0: + official_obm = raw_obm + + synth_pdf: Optional[bytes] = None + synth_arc: Optional[str] = None + if official_obm is not None: + try: + synth_pdf = _synthesize_official_form_pdf_bytes(official_obm) + synth_arc = official_form_pdf_backup_zip_path(official_obm) or "submitted/official-form.pdf" + except Exception: + logger.exception( + "Backup: failed to synthesize official form PDF from officialBieuMau; falling back to stored artifact if any" + ) + synth_pdf = None + synth_arc = None + + for art in sorted(artifacts, key=lambda a: _zip_path_for_artifact(a)): + role = art.role + if role == ROLE_OFFICIAL_FORM_PDF and synth_pdf is not None: + continue + uri = (art.storage_uri or "").strip() + if not uri: + continue + sk = effective_storage_kind(role, uri, art.storage_kind) + arcname = _zip_path_for_artifact(art) + if role == ROLE_OFFICIAL_FORM_PDF and official_obm is not None: + custom_pdf_path = official_form_pdf_backup_zip_path(official_obm) + if custom_pdf_path: + arcname = custom_pdf_path + + if sk == STORAGE_EXTERNAL_URL: + manifest_files.append( + { + "role": role, + "zip_path": arcname, + "original_name": art.original_name, + "mime_type": art.mime_type, + "byte_size": art.byte_size, + "stored_sha256": art.sha256, + "verified_sha256": None, + "storage_kind": sk, + "skipped": True, + "skip_reason": "external_url_not_packed", + } + ) + continue + + raw_iter: Optional[Iterator[bytes]] = None + if sk == 
STORAGE_FILESYSTEM: + path = _filesystem_path_for_uri(uri) + if not path.is_file(): + logger.warning("Backup skip missing filesystem object role=%s path=%s", role, path) + manifest_files.append( + { + "role": role, + "zip_path": arcname, + "original_name": art.original_name, + "mime_type": art.mime_type, + "byte_size": art.byte_size, + "stored_sha256": art.sha256, + "verified_sha256": None, + "storage_kind": sk, + "skipped": True, + "skip_reason": "filesystem_missing", + } + ) + continue + raw_iter = _iter_file_chunks(path) + elif sk in (STORAGE_MINIO_EXPORTS, STORAGE_MINIO_ATTACHMENTS): + bucket = ( + settings.s3_bucket_exports if sk == STORAGE_MINIO_EXPORTS else settings.s3_bucket_attachments + ) + key = uri + try: + raw_iter = _iter_chunks_s3(settings, bucket, key) + except FileNotFoundError: + logger.warning("Backup skip missing MinIO object role=%s bucket=%s key=%s", role, bucket, key[:64]) + manifest_files.append( + { + "role": role, + "zip_path": arcname, + "original_name": art.original_name, + "mime_type": art.mime_type, + "byte_size": art.byte_size, + "stored_sha256": art.sha256, + "verified_sha256": None, + "storage_kind": sk, + "skipped": True, + "skip_reason": "minio_missing", + } + ) + continue + else: + continue + + chunk_iter: Iterator[bytes] = ( + _guarded_chunk_iter(raw_iter, art.sha256, role, arcname) if art.sha256 else raw_iter + ) + z.add(chunk_iter, arcname) + + manifest_files.append( + { + "role": role, + "zip_path": arcname, + "original_name": art.original_name, + "mime_type": art.mime_type, + "byte_size": art.byte_size, + "stored_sha256": art.sha256, + "verified_sha256": art.sha256 if art.sha256 else None, + "integrity_note": "sha256_checked_during_zip" if art.sha256 else "no_stored_sha256", + "storage_kind": sk, + "skipped": False, + } + ) + + if synth_pdf is not None and synth_arc is not None: + z.add(_iter_memory_chunks(synth_pdf), synth_arc) + digest = hashlib.sha256(synth_pdf).hexdigest() + manifest_files.append( + { + "role": 
ROLE_OFFICIAL_FORM_PDF, + "zip_path": synth_arc, + "original_name": synth_arc.rsplit("/", 1)[-1], + "mime_type": "application/pdf", + "byte_size": len(synth_pdf), + "stored_sha256": None, + "verified_sha256": digest, + "integrity_note": "synthesized_for_backup_match_preview_endpoint", + "storage_kind": "synthesized", + "skipped": False, + } + ) + + missing_official = ( + not any(a.role in (ROLE_OFFICIAL_FORM_DOCX, ROLE_OFFICIAL_FORM_PDF) for a in artifacts) and synth_pdf is None + ) + + manifest: Dict[str, Any] = { + "applicationId": application_id, + "case_code": case_code, + "initiative_id": str(initiative.id), + "owner_id": owner_id, + "submitted_at": submitted_at, + "missing_official_form_artifacts": missing_official, + "files": manifest_files, + } + + if review_doc_json is not None: + z.add( + json.dumps(review_doc_json, ensure_ascii=False, indent=2).encode("utf-8"), + "metadata/application_review_documents.json", + ) + + z.add(json.dumps(manifest, ensure_ascii=False, indent=2).encode("utf-8"), "manifest.json") + + return z diff --git a/be0/src/initiative_db/application_storage.py b/be0/src/initiative_db/application_storage.py new file mode 100644 index 0000000..c08c378 --- /dev/null +++ b/be0/src/initiative_db/application_storage.py @@ -0,0 +1,420 @@ +"""Append-only tab snapshots, submit snapshots, taxonomy/workflow projection, artifact registry.""" + +from __future__ import annotations + +import uuid +from datetime import date, datetime, timezone +from typing import Any, Dict, List, Mapping, Optional + +from sqlalchemy import desc, func, select +from sqlalchemy.dialects.postgresql import insert as pg_insert +from sqlalchemy.ext.asyncio import AsyncSession + +from src.initiative_db.models import ( + ApplicationArtifact, + ApplicationSubmitSnapshot, + ApplicationTaxonomy, + ApplicationWorkflow, + DraftTabSnapshot, + Initiative, +) + +# Initiative application evidence: 2.1 research / 2.2 textbook / technical (group 1) +# Must match application_artifacts.role CHECK
in 002_application_storage_extensions.sql +EVIDENCE_ROLE_RESEARCH = "research_evidence" +EVIDENCE_ROLE_TEXTBOOK = "textbook_evidence" +EVIDENCE_ROLE_TECHNICAL = "technical_evidence" + +ROLE_OFFICIAL_FORM_DOCX = "official_form_docx" +ROLE_OFFICIAL_FORM_PDF = "official_form_pdf" + +STORAGE_MINIO_EXPORTS = "minio_exports" +STORAGE_MINIO_ATTACHMENTS = "minio_attachments" +STORAGE_FILESYSTEM = "filesystem" +STORAGE_EXTERNAL_URL = "external_url" + +VALID_DRAFT_TABS = frozenset({"report", "application", "contribution"}) + + +def effective_storage_kind(role: str, storage_uri: str, declared: Optional[str]) -> str: + """Resolve ``storage_kind`` from DB column or legacy ``storage_uri`` shape.""" + if declared in ( + STORAGE_MINIO_EXPORTS, + STORAGE_MINIO_ATTACHMENTS, + STORAGE_FILESYSTEM, + STORAGE_EXTERNAL_URL, + ): + return declared + u = (storage_uri or "").strip() + if u.startswith("http://") or u.startswith("https://"): + return STORAGE_EXTERNAL_URL + if u.startswith("/submitted-initiatives/"): + return STORAGE_FILESYSTEM + if u.startswith("/") and not u.startswith("/initiatives/"): + return STORAGE_FILESYSTEM + if role in (EVIDENCE_ROLE_RESEARCH, EVIDENCE_ROLE_TEXTBOOK, EVIDENCE_ROLE_TECHNICAL): + return STORAGE_MINIO_ATTACHMENTS + return STORAGE_MINIO_EXPORTS + + +def _parse_date_only(value: Any) -> Optional[date]: + if value is None: + return None + raw = str(value).strip() + if len(raw) < 10: + return None + try: + return date.fromisoformat(raw[:10]) + except ValueError: + return None + + +async def record_tab_snapshot( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + draft_id: Optional[uuid.UUID], + tab: str, + payload: Mapping[str, Any], + source: str = "autosave", +) -> None: + if tab not in VALID_DRAFT_TABS: + return + stmt = select(func.coalesce(func.max(DraftTabSnapshot.tab_version), 0)).where( + DraftTabSnapshot.initiative_id == initiative_id, + DraftTabSnapshot.tab == tab, + ) + max_v = (await session.execute(stmt)).scalar() + next_v = int(max_v or 
0) + 1 + session.add( + DraftTabSnapshot( + initiative_id=initiative_id, + draft_id=draft_id, + tab=tab, + tab_version=next_v, + payload=dict(payload), + source=source, + ) + ) + + +async def record_submit_snapshot( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + submission_record_id: str, + merged_tabs: Mapping[str, Any], + submit_metadata: Mapping[str, Any], + full_pdf_uri: str, +) -> None: + session.add( + ApplicationSubmitSnapshot( + initiative_id=initiative_id, + submission_record_id=submission_record_id, + merged_tabs=dict(merged_tabs), + submit_metadata=dict(submit_metadata), + full_pdf_uri=full_pdf_uri, + ) + ) + + +async def upsert_application_taxonomy( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + subject_id: str, + group_id: str, + topic_type: str, +) -> None: + now = datetime.now(timezone.utc) + stmt = pg_insert(ApplicationTaxonomy).values( + initiative_id=initiative_id, + subject_id=subject_id or "", + group_id=group_id or "", + topic_type=topic_type or "", + updated_at=now, + ) + stmt = stmt.on_conflict_do_update( + index_elements=[ApplicationTaxonomy.initiative_id], + set_={ + "subject_id": stmt.excluded.subject_id, + "group_id": stmt.excluded.group_id, + "topic_type": stmt.excluded.topic_type, + "updated_at": stmt.excluded.updated_at, + }, + ) + await session.execute(stmt) + + +async def upsert_application_workflow( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + submission_record: Mapping[str, Any], +) -> None: + review_status = str(submission_record.get("reviewStatus") or "not_reviewed") + review_deadline = _parse_date_only(submission_record.get("reviewDeadline")) + reviewer = submission_record.get("reviewer") + supervisor = submission_record.get("supervisor") + conference = submission_record.get("conference") + now = datetime.now(timezone.utc) + stmt = pg_insert(ApplicationWorkflow).values( + initiative_id=initiative_id, + review_status=review_status, + review_deadline=review_deadline, + 
reviewer=dict(reviewer) if isinstance(reviewer, dict) else None, + supervisor=dict(supervisor) if isinstance(supervisor, dict) else None, + conference=dict(conference) if isinstance(conference, dict) else None, + updated_at=now, + ) + stmt = stmt.on_conflict_do_update( + index_elements=[ApplicationWorkflow.initiative_id], + set_={ + "review_status": stmt.excluded.review_status, + "review_deadline": stmt.excluded.review_deadline, + "reviewer": stmt.excluded.reviewer, + "supervisor": stmt.excluded.supervisor, + "conference": stmt.excluded.conference, + "updated_at": stmt.excluded.updated_at, + }, + ) + await session.execute(stmt) + + +async def upsert_artifact_full_pdf( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + storage_uri: str, + original_name: Optional[str], + byte_size: Optional[int], + sha256_hex: Optional[str], + uploaded_by: Optional[uuid.UUID], + storage_kind: Optional[str] = None, +) -> None: + now = datetime.now(timezone.utc) + stmt = pg_insert(ApplicationArtifact).values( + initiative_id=initiative_id, + role="full_pdf", + storage_uri=storage_uri, + storage_kind=storage_kind, + original_name=original_name, + mime_type="application/pdf", + byte_size=byte_size, + sha256=sha256_hex, + uploaded_by=uploaded_by, + uploaded_at=now, + ) + stmt = stmt.on_conflict_do_update( + index_elements=[ApplicationArtifact.initiative_id, ApplicationArtifact.role], + set_={ + "storage_uri": stmt.excluded.storage_uri, + "storage_kind": stmt.excluded.storage_kind, + "original_name": stmt.excluded.original_name, + "byte_size": stmt.excluded.byte_size, + "sha256": stmt.excluded.sha256, + "uploaded_by": stmt.excluded.uploaded_by, + "uploaded_at": stmt.excluded.uploaded_at, + }, + ) + await session.execute(stmt) + + +async def upsert_artifact_official_form( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + role: str, + storage_uri: str, + original_name: Optional[str], + byte_size: Optional[int], + sha256_hex: Optional[str], + uploaded_by: 
Optional[uuid.UUID], + mime_type: str, + storage_kind: str = STORAGE_MINIO_EXPORTS, +) -> None: + now = datetime.now(timezone.utc) + stmt = pg_insert(ApplicationArtifact).values( + initiative_id=initiative_id, + role=role, + storage_uri=storage_uri, + storage_kind=storage_kind, + original_name=original_name, + mime_type=mime_type, + byte_size=byte_size, + sha256=sha256_hex, + uploaded_by=uploaded_by, + uploaded_at=now, + ) + stmt = stmt.on_conflict_do_update( + index_elements=[ApplicationArtifact.initiative_id, ApplicationArtifact.role], + set_={ + "storage_uri": stmt.excluded.storage_uri, + "storage_kind": stmt.excluded.storage_kind, + "original_name": stmt.excluded.original_name, + "byte_size": stmt.excluded.byte_size, + "sha256": stmt.excluded.sha256, + "uploaded_by": stmt.excluded.uploaded_by, + "uploaded_at": stmt.excluded.uploaded_at, + "mime_type": stmt.excluded.mime_type, + }, + ) + await session.execute(stmt) + + +async def upsert_evidence_artifact( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + role: str, + storage_uri: str, + original_name: Optional[str], + byte_size: Optional[int], + sha256_hex: Optional[str], + uploaded_by: Optional[uuid.UUID], + mime_type: str = "application/pdf", +) -> None: + now = datetime.now(timezone.utc) + stmt = pg_insert(ApplicationArtifact).values( + initiative_id=initiative_id, + role=role, + storage_uri=storage_uri, + storage_kind=STORAGE_MINIO_ATTACHMENTS, + original_name=original_name, + mime_type=mime_type, + byte_size=byte_size, + sha256=sha256_hex, + uploaded_by=uploaded_by, + uploaded_at=now, + ) + stmt = stmt.on_conflict_do_update( + index_elements=[ApplicationArtifact.initiative_id, ApplicationArtifact.role], + set_={ + "storage_uri": stmt.excluded.storage_uri, + "storage_kind": stmt.excluded.storage_kind, + "original_name": stmt.excluded.original_name, + "byte_size": stmt.excluded.byte_size, + "sha256": stmt.excluded.sha256, + "uploaded_by": stmt.excluded.uploaded_by, + "uploaded_at": 
stmt.excluded.uploaded_at, + "mime_type": stmt.excluded.mime_type, + # Re-upload clears staff decision + "review_status": None, + "reviewed_by": None, + "reviewed_at": None, + }, + ) + await session.execute(stmt) + + +async def get_evidence_artifact_row( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + role: str, +) -> Optional[ApplicationArtifact]: + stmt = select(ApplicationArtifact).where( + ApplicationArtifact.initiative_id == initiative_id, + ApplicationArtifact.role == role, + ) + return (await session.execute(stmt)).scalar_one_or_none() + + +async def set_evidence_artifact_review( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + role: str, + review_status: str, + reviewer_id: uuid.UUID, +) -> Optional[ApplicationArtifact]: + row = await get_evidence_artifact_row(session, initiative_id=initiative_id, role=role) + if row is None: + return None + now = datetime.now(timezone.utc) + row.review_status = review_status + row.reviewed_by = reviewer_id + row.reviewed_at = now + await session.flush() + return row + + +async def delete_evidence_artifact_row( + session: AsyncSession, + *, + initiative_id: uuid.UUID, + role: str, +) -> Optional[ApplicationArtifact]: + row = await get_evidence_artifact_row(session, initiative_id=initiative_id, role=role) + if row is None: + return None + await session.delete(row) + return row + + +async def list_tab_snapshots_for_case( + session: AsyncSession, + *, + case_code: str, + tab: Optional[str], + limit: int, +) -> List[Dict[str, Any]]: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one_or_none() + if ini is None: + return [] + stmt = select(DraftTabSnapshot).where(DraftTabSnapshot.initiative_id == ini.id) + if tab: + stmt = stmt.where(DraftTabSnapshot.tab == tab) + stmt = stmt.order_by(desc(DraftTabSnapshot.captured_at)).limit(max(1, min(limit, 200))) + rows = (await session.execute(stmt)).scalars().all() + out: List[Dict[str, Any]] = [] + for r in 
rows: + out.append( + { + "id": str(r.id), + "initiativeId": str(r.initiative_id), + "draftId": str(r.draft_id) if r.draft_id else None, + "tab": r.tab, + "tabVersion": r.tab_version, + "payload": r.payload, + "source": r.source, + "capturedAt": r.captured_at.isoformat() if r.captured_at else None, + } + ) + return out + + +async def list_submit_snapshots_for_case( + session: AsyncSession, + *, + case_code: str, + limit: int, +) -> List[Dict[str, Any]]: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one_or_none() + if ini is None: + return [] + stmt = ( + select(ApplicationSubmitSnapshot) + .where(ApplicationSubmitSnapshot.initiative_id == ini.id) + .order_by(desc(ApplicationSubmitSnapshot.captured_at)) + .limit(max(1, min(limit, 50))) + ) + rows = (await session.execute(stmt)).scalars().all() + out: List[Dict[str, Any]] = [] + for r in rows: + out.append( + { + "id": str(r.id), + "initiativeId": str(r.initiative_id), + "submissionRecordId": r.submission_record_id, + "mergedTabs": r.merged_tabs, + "submitMetadata": r.submit_metadata, + "fullPdfUri": r.full_pdf_uri, + "capturedAt": r.captured_at.isoformat() if r.captured_at else None, + } + ) + return out diff --git a/be0/src/initiative_db/backup_naming.py b/be0/src/initiative_db/backup_naming.py new file mode 100644 index 0000000..cdebcb6 --- /dev/null +++ b/be0/src/initiative_db/backup_naming.py @@ -0,0 +1,80 @@ +"""Naming helpers for admin application backup downloads (no heavy deps).""" + +from __future__ import annotations + +import re +import unicodedata +from typing import Any, Dict, Optional + +_TRANG_BIA_TEN_SK = "Tên sáng kiến (Tiếng Việt)" +_TRANG_BIA_TAC_GIA = "Tác giả/nhóm tác giả sáng kiến" +_TRANG_BIA_LIEN_HE = "Thông tin liên hệ (Điện thoại, Email)" +_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}") + + +def _sanitize_filename(name: str) -> str: + """Match ``src.minio.storage._sanitize_filename`` without importing 
boto/S3 stack.""" + name = name.replace("/", "_").replace("\\", "_") + name = re.sub(r"\s+", "_", name.strip()) + name = "".join(ch for ch in name if unicodedata.category(ch)[0] != "C") + return name[:200] or "file" + + +def _extract_contact_email(text: str) -> str: + """Return the first e-mail address in « Thông tin liên hệ » (contact info); otherwise the whole trimmed string.""" + raw = (text or "").strip() + if not raw: + return "" + m = _EMAIL_RE.search(raw) + return m.group(0) if m else raw + + +def official_form_pdf_backup_zip_path(official_bieu_mau: Optional[Dict[str, Any]]) -> Optional[str]: + """ + Relative path inside admin backup ZIP for the official merged PDF. + + Format: ``submitted/{initiative_title}_{author_names}_{contact_email}.pdf`` (segments sanitized, stem capped at 180 chars). + Returns ``None`` when ``TRANG BÌA`` (cover page) is missing or its title, author and contact fields are all empty, + so callers keep ``submitted/official-form.pdf``. + """ + if not isinstance(official_bieu_mau, dict): + return None + bia = official_bieu_mau.get("TRANG BÌA") + if not isinstance(bia, dict): + return None + ten = str(bia.get(_TRANG_BIA_TEN_SK) or "").strip() + tac = str(bia.get(_TRANG_BIA_TAC_GIA) or "").strip() + lien = str(bia.get(_TRANG_BIA_LIEN_HE) or "").strip() + email = _extract_contact_email(lien) + if not ten and not tac and not email: + return None + part_ten = _sanitize_filename(ten) if ten else "Ten_sang_kien" + part_tac = _sanitize_filename(tac) if tac else "tac_gia" + part_mail = _sanitize_filename(email) if email else "email_lien_he" + stem = f"{part_ten}_{part_tac}_{part_mail}" + if len(stem) > 180: + stem = stem[:180] + return f"submitted/{stem}.pdf" + + +def backup_zip_attachment_filename( + *, + owner_email: Optional[str], + owner_full_name: Optional[str], + public_application_id: str, +) -> str: + """ + Safe Content-Disposition name: {username}_{applicationId}.zip + Username prefers email local-part, else full name; falls back to ``applicant``.
+ """ + user_seg = "applicant" + email = (owner_email or "").strip() + if email and "@" in email: + local = email.split("@", 1)[0].strip() + user_seg = _sanitize_filename(local) or "applicant" + else: + name = (owner_full_name or "").strip() + if name: + user_seg = _sanitize_filename(name) or "applicant" + pid = (public_application_id or "").strip() + base = f"{user_seg}_{pid}.zip" + return _sanitize_filename(base) or base diff --git a/be0/src/initiative_db/drafts.py b/be0/src/initiative_db/drafts.py new file mode 100644 index 0000000..d724560 --- /dev/null +++ b/be0/src/initiative_db/drafts.py @@ -0,0 +1,252 @@ +"""Persist tab-based application drafts in `drafts.payload` (JSONB).""" + +from __future__ import annotations + +import logging +import uuid +from datetime import datetime, timezone +from typing import Any, Dict, Optional + +from sqlalchemy import select +from sqlalchemy.exc import ProgrammingError +from sqlalchemy.ext.asyncio import AsyncSession + +from src.audit import AuditAction, record_audit, resolve_actor_fields +from src.initiative_db.application_storage import record_tab_snapshot +from src.initiative_db.models import AuditLog, Draft, Initiative + +# Seed user from migrations/001_initiative_schema.sql +SYSTEM_DRAFT_OWNER_ID = uuid.UUID("00000000-0000-4000-8000-000000000001") +logger = logging.getLogger(__name__) +OFFICIAL_FORM_LAYOUT_PAYLOAD_KEY = "officialFormPdfLayout" + + +async def save_application_draft_tab( + session: AsyncSession, + case_id: str, + tab: str, + data: Dict[str, Any], + owner_id: Optional[uuid.UUID] = None, +) -> Dict[str, Any]: + stmt = select(Initiative).where(Initiative.case_code == case_id) + initiative = (await session.execute(stmt)).scalar_one_or_none() + + effective_owner = owner_id or SYSTEM_DRAFT_OWNER_ID + + if initiative is None: + initiative = Initiative(case_code=case_id, owner_id=effective_owner) + session.add(initiative) + await session.flush() + draft = Draft( + draft_code=f"DRAFT-{case_id}", + 
initiative_id=initiative.id, + payload=_empty_payload(case_id), + ) + session.add(draft) + else: + stmt_d = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(stmt_d)).scalar_one_or_none() + if draft is None: + draft = Draft( + draft_code=f"DRAFT-{case_id}", + initiative_id=initiative.id, + payload=_empty_payload(case_id), + ) + session.add(draft) + + if owner_id and initiative.owner_id == SYSTEM_DRAFT_OWNER_ID: + initiative.owner_id = owner_id + + current = dict(draft.payload) if isinstance(draft.payload, dict) else {} + tabs = dict(current.get("tabs") or {}) + tabs[tab] = data + now_iso = datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z") + current["caseId"] = case_id + current["updatedAt"] = now_iso + current["tabs"] = tabs + + draft.payload = current + draft.version = (draft.version or 0) + 1 + + await session.flush() + + if owner_id is not None: + ae, ar = await resolve_actor_fields(session, owner_id) + await record_audit( + session, + actor_user_id=owner_id, + actor_email=ae, + actor_role=ar, + action=AuditAction.update, + entity_type="application_draft", + entity_id=case_id, + after={"tab": tab, "version": draft.version}, + metadata={"source": "autosave_tab"}, + ) + + # Snapshot history is optional telemetry. + # Run it in a savepoint so missing table errors don't poison the main transaction. 
+ try: + async with session.begin_nested(): + await record_tab_snapshot( + session, + initiative_id=initiative.id, + draft_id=draft.id, + tab=tab, + payload=data, + source="autosave", + ) + except ProgrammingError as exc: + if "draft_tab_snapshots" in str(exc).lower(): + logger.warning( + "draft_tab_snapshots table missing; skipping tab snapshot for case %s (tab=%s)", + case_id, + tab, + ) + else: + raise + + session.add( + AuditLog( + actor_id=SYSTEM_DRAFT_OWNER_ID, + action="update", + entity="draft", + entity_id=draft.id, + diff={"tab": tab, "version": draft.version}, + ) + ) + + return { + "caseId": case_id, + "updatedAt": now_iso, + "tabs": tabs, + "version": draft.version, + "publicUrl": f"/application-drafts/{case_id}.yml", + } + + +async def insert_initiative_draft_if_missing( + session: AsyncSession, case_id: str, payload: Dict[str, Any] +) -> None: + """ + Copy a YAML/file-shaped bundle into initiatives + drafts when Postgres was empty. + Idempotent: no-op if initiative already exists. 
+ """ + stmt = select(Initiative).where(Initiative.case_code == case_id) + if (await session.execute(stmt)).scalar_one_or_none() is not None: + return + + merged: Dict[str, Any] = dict(payload) if isinstance(payload, dict) else {} + tabs = merged.get("tabs") + merged["tabs"] = dict(tabs) if isinstance(tabs, dict) else {} + merged["caseId"] = case_id + + initiative = Initiative(case_code=case_id, owner_id=SYSTEM_DRAFT_OWNER_ID) + session.add(initiative) + await session.flush() + draft = Draft( + draft_code=f"DRAFT-{case_id}", + initiative_id=initiative.id, + payload=merged, + ) + session.add(draft) + await session.flush() + + +async def get_application_draft_document(session: AsyncSession, case_id: str) -> Dict[str, Any]: + stmt = select(Initiative).where(Initiative.case_code == case_id) + initiative = (await session.execute(stmt)).scalar_one_or_none() + if initiative is None: + raise KeyError(case_id) + + stmt_d = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(stmt_d)).scalar_one_or_none() + if draft is None or not isinstance(draft.payload, dict): + raise KeyError(case_id) + return draft.payload + + +def _empty_payload(case_id: str) -> Dict[str, Any]: + return { + "caseId": case_id, + "updatedAt": datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z"), + "tabs": {}, + } + + +async def get_latest_draft_for_case(session: AsyncSession, case_id: str) -> Optional[Draft]: + stmt = select(Initiative).where(Initiative.case_code == case_id) + initiative = (await session.execute(stmt)).scalar_one_or_none() + if initiative is None: + return None + stmt_d = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + return (await session.execute(stmt_d)).scalar_one_or_none() + + +async def get_official_form_layout_payload(session: AsyncSession, case_id: str) -> Optional[Dict[str, Any]]: 
+ draft = await get_latest_draft_for_case(session, case_id) + if draft is None or not isinstance(draft.payload, dict): + return None + slot = draft.payload.get(OFFICIAL_FORM_LAYOUT_PAYLOAD_KEY) + if not isinstance(slot, dict): + return None + return dict(slot) + + +async def save_official_form_layout_payload( + session: AsyncSession, + *, + case_id: str, + payload: Dict[str, Any], + owner_id: Optional[uuid.UUID] = None, +) -> Dict[str, Any]: + stmt = select(Initiative).where(Initiative.case_code == case_id) + initiative = (await session.execute(stmt)).scalar_one_or_none() + if initiative is None: + initiative = Initiative(case_code=case_id, owner_id=owner_id or SYSTEM_DRAFT_OWNER_ID) + session.add(initiative) + await session.flush() + + stmt_d = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(stmt_d)).scalar_one_or_none() + if draft is None: + draft = Draft( + draft_code=f"DRAFT-{case_id}", + initiative_id=initiative.id, + payload=_empty_payload(case_id), + ) + session.add(draft) + await session.flush() + + if owner_id and initiative.owner_id == SYSTEM_DRAFT_OWNER_ID: + initiative.owner_id = owner_id + + now_iso = datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z") + current = dict(draft.payload) if isinstance(draft.payload, dict) else _empty_payload(case_id) + current["caseId"] = case_id + current["updatedAt"] = now_iso + current[OFFICIAL_FORM_LAYOUT_PAYLOAD_KEY] = dict(payload) + draft.payload = current + draft.version = (draft.version or 0) + 1 + await session.flush() + return {"caseId": case_id, "updatedAt": now_iso, "version": draft.version} diff --git a/be0/src/initiative_db/engine.py b/be0/src/initiative_db/engine.py new file mode 100644 index 0000000..931fa3c --- /dev/null +++ b/be0/src/initiative_db/engine.py @@ -0,0 +1,115 @@ +"""Async SQLAlchemy engine for initiative PostgreSQL persistence.""" + +from __future__ import 
annotations + +import logging +import os +from contextlib import asynccontextmanager +from typing import AsyncGenerator, Optional + +from sqlalchemy import text +from sqlalchemy.ext.asyncio import AsyncConnection, AsyncEngine, AsyncSession, async_sessionmaker, create_async_engine + +logger = logging.getLogger(__name__) + +_engine: Optional[AsyncEngine] = None +_session_factory: Optional[async_sessionmaker[AsyncSession]] = None + +# Inline from migrations/014_registration_otp.sql — applied on first engine init if missing +# (avoids relying on subprocess migrate script; idempotent). +_DDL_REGISTRATION_OTP_TABLE = """ +CREATE TABLE IF NOT EXISTS registration_otp_codes ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + otp_hash TEXT NOT NULL, + expires_at TIMESTAMPTZ NOT NULL, + failed_attempts INT NOT NULL DEFAULT 0, + used_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +) +""" +_DDL_REGISTRATION_OTP_INDEX = """ +CREATE INDEX IF NOT EXISTS idx_registration_otp_codes_user_pending + ON registration_otp_codes (user_id) + WHERE used_at IS NULL +""" + + +async def _ensure_registration_otp_table(conn: AsyncConnection) -> None: + row = ( + await conn.execute( + text( + """ + SELECT 1 + FROM information_schema.tables + WHERE table_schema = 'public' + AND table_name = 'registration_otp_codes' + LIMIT 1 + """ + ) + ) + ).first() + if row is not None: + return + logger.info("initiative_db: creating registration_otp_codes (migration 014 equivalent)") + await conn.execute(text(_DDL_REGISTRATION_OTP_TABLE)) + await conn.execute(text(_DDL_REGISTRATION_OTP_INDEX)) + await conn.execute( + text( + "COMMENT ON TABLE registration_otp_codes IS " + "'Hashed 6-digit OTP for register verification; pending rows deleted when superseded by resend.'" + ) + ) + + +def get_database_url() -> Optional[str]: + return os.getenv("INITIATIVE_DATABASE_URL") or os.getenv("DATABASE_URL") + + +def is_postgres_enabled() -> 
bool: + url = get_database_url() + return bool(url and url.startswith("postgresql")) + + +async def init_engine() -> None: + """Create async engine and session factory. No-op if URL missing.""" + global _engine, _session_factory + if not is_postgres_enabled(): + return + if _engine is not None: + return + url = get_database_url() + assert url is not None + _engine = create_async_engine(url, echo=os.getenv("SQL_ECHO", "").lower() in ("1", "true", "yes")) + _session_factory = async_sessionmaker(_engine, expire_on_commit=False) + async with _engine.begin() as conn: + await conn.execute(text("SELECT 1")) + await _ensure_registration_otp_table(conn) + + +async def dispose_engine() -> None: + global _engine, _session_factory + if _engine is not None: + await _engine.dispose() + _engine = None + _session_factory = None + + +def get_session_factory() -> async_sessionmaker[AsyncSession]: + if _session_factory is None: + raise RuntimeError("Initiative DB not initialized; call init_engine() on startup.") + return _session_factory + + +@asynccontextmanager +async def get_session() -> AsyncGenerator[AsyncSession, None]: + if is_postgres_enabled() and _session_factory is None: + await init_engine() + factory = get_session_factory() + async with factory() as session: + try: + yield session + await session.commit() + except Exception: + await session.rollback() + raise diff --git a/be0/src/initiative_db/models.py b/be0/src/initiative_db/models.py new file mode 100644 index 0000000..984fd3b --- /dev/null +++ b/be0/src/initiative_db/models.py @@ -0,0 +1,907 @@ +"""SQLAlchemy models for initiative persistence (subset used by API; full schema in migrations).""" + +from __future__ import annotations + +import uuid +from datetime import date, datetime +from typing import Any, Optional + +from sqlalchemy import BigInteger, Boolean, Date, DateTime, Float, ForeignKey, Integer, Numeric, Text, UniqueConstraint, text +from sqlalchemy.dialects.postgresql import ENUM, JSONB, UUID +from 
sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship + + +class Base(DeclarativeBase): + pass + + +submission_status_enum = ENUM( + "draft", + "submitted", + "under_review", + "approved", + "rejected", + name="submission_status", + create_type=False, +) + +recognition_tier_enum = ENUM("excellent", "good", name="recognition_tier", create_type=False) + +user_role_enum = ENUM( + "applicant", + "council_member", + "editor", + "admin", + "viewer", + name="user_role", + create_type=False, +) + +profile_verification_status_enum = ENUM( + "draft", + "pending", + "verified", + "rejected", + name="profile_verification_status", + create_type=False, +) + +audit_action_enum = ENUM( + "create", + "read", + "update", + "delete", + "login", + "logout", + "login_failed", + name="audit_action", + create_type=False, +) + + +class Unit(Base): + """Faculty / department catalog (see migrations/001_initiative_schema.sql).""" + + __tablename__ = "units" + + id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + name: Mapped[str] = mapped_column(Text, nullable=False) + parent_id: Mapped[Optional[uuid.UUID]] = mapped_column( + UUID(as_uuid=True), ForeignKey("units.id", ondelete="SET NULL"), nullable=True + ) + address: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + + +class User(Base): + __tablename__ = "users" + + id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + email: Mapped[str] = mapped_column(Text, unique=True, nullable=False) + password_hash: Mapped[str] = mapped_column(Text, nullable=False) + full_name: Mapped[str] = mapped_column(Text, nullable=False) + phone: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + unit_id: Mapped[Optional[uuid.UUID]] = mapped_column( + UUID(as_uuid=True), ForeignKey("units.id", ondelete="SET NULL"), nullable=True + ) + is_active: Mapped[bool] = mapped_column(Boolean, default=True) + email_verified: Mapped[bool] = 
mapped_column( + Boolean, nullable=False, server_default=text("false"), default=False + ) + credential_version: Mapped[int] = mapped_column( + Integer, nullable=False, server_default=text("0"), default=0 + ) + created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()")) + updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()")) + + staff_profile: Mapped[Optional["UserStaffProfile"]] = relationship( + "UserStaffProfile", + back_populates="user", + uselist=False, + foreign_keys="UserStaffProfile.user_id", + ) + + +class UserRoleRow(Base): + __tablename__ = "user_roles" + + user_id: Mapped[uuid.UUID] = mapped_column( + UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), primary_key=True + ) + role: Mapped[str] = mapped_column(user_role_enum, primary_key=True) + admin_from_email_policy: Mapped[bool] = mapped_column( + Boolean, nullable=False, server_default=text("false") + ) + + +class AcademicTitle(Base): + """Lookup for academic ranks / degrees (avoid Postgres ENUM drift for title list).""" + + __tablename__ = "academic_titles" + + code: Mapped[str] = mapped_column(Text, primary_key=True) + label_vi: Mapped[str] = mapped_column(Text, nullable=False) + label_en: Mapped[str] = mapped_column(Text, nullable=False) + sort_order: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0") + active: Mapped[bool] = mapped_column(Boolean, nullable=False, server_default=text("true")) + + +class UserStaffProfile(Base): + """1:1 institutional profile + verification state (HR scalars; not MinIO).""" + + __tablename__ = "user_staff_profiles" + + user_id: Mapped[uuid.UUID] = mapped_column( + UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), primary_key=True + ) + employee_id: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + academic_title_code: Mapped[Optional[str]] = mapped_column( + Text, ForeignKey("academic_titles.code"), nullable=True + ) + 
academic_title_other: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + unit_name_freetext: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + job_title: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + + profile_verification_status: Mapped[str] = mapped_column( + profile_verification_status_enum, nullable=False, server_default=text("'draft'") + ) + verification_submitted_at: Mapped[Optional[datetime]] = mapped_column( + DateTime(timezone=True), nullable=True + ) + verified_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True) + verified_by_user_id: Mapped[Optional[uuid.UUID]] = mapped_column( + UUID(as_uuid=True), ForeignKey("users.id"), nullable=True + ) + rejection_reason: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + + version: Mapped[int] = mapped_column(Integer, nullable=False, server_default="1") + created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()")) + updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()")) + + user: Mapped["User"] = relationship( + "User", + back_populates="staff_profile", + foreign_keys=[user_id], + ) + + +class EmailVerificationToken(Base): + """One-time email verification link secrets (store hash only).""" + + __tablename__ = "email_verification_tokens" + + id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + user_id: Mapped[uuid.UUID] = mapped_column( + UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False + ) + token_hash: Mapped[str] = mapped_column(Text, unique=True, nullable=False) + expires_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False) + used_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True) + created_at: Mapped[datetime] = mapped_column( + DateTime(timezone=True), server_default=text("now()"), nullable=False + ) + + +class 
RegistrationOtpCode(Base): + """Hashed 6-digit OTP for registration verification (magic-link flow superseded on register).""" + + __tablename__ = "registration_otp_codes" + + id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + user_id: Mapped[uuid.UUID] = mapped_column( + UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False + ) + otp_hash: Mapped[str] = mapped_column(Text, nullable=False) + expires_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False) + failed_attempts: Mapped[int] = mapped_column(Integer, nullable=False, server_default=text("0")) + used_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True) + created_at: Mapped[datetime] = mapped_column( + DateTime(timezone=True), server_default=text("now()"), nullable=False + ) + + +class PasswordResetToken(Base): + """One-time password reset secrets (store hash only).""" + + __tablename__ = "password_reset_tokens" + + id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + user_id: Mapped[uuid.UUID] = mapped_column( + UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False + ) + token_hash: Mapped[str] = mapped_column(Text, unique=True, nullable=False) + expires_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False) + used_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True) + created_at: Mapped[datetime] = mapped_column( + DateTime(timezone=True), server_default=text("now()"), nullable=False + ) + + +class Initiative(Base): + __tablename__ = "initiatives" + + id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + case_code: Mapped[str] = mapped_column(Text, unique=True, nullable=False) + owner_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False) + status: Mapped[str] = 
mapped_column(submission_status_enum, nullable=False, server_default="draft") + recognition_tier: Mapped[Optional[str]] = mapped_column(recognition_tier_enum, nullable=True) + submitted_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True) + created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()")) + updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()")) + + drafts: Mapped[list["Draft"]] = relationship( + "Draft", + back_populates="initiative", + cascade="all, delete-orphan", + ) + + +class Draft(Base): + __tablename__ = "drafts" + + id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + draft_code: Mapped[str] = mapped_column(Text, unique=True, nullable=False) + initiative_id: Mapped[uuid.UUID] = mapped_column( + UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="CASCADE"), nullable=False + ) + payload: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False) + version: Mapped[int] = mapped_column(Integer, nullable=False, server_default="1") + updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()")) + + initiative: Mapped["Initiative"] = relationship("Initiative", back_populates="drafts") + + +class AuditLog(Base): + __tablename__ = "audit_log" + + id: Mapped[int] = mapped_column(BigInteger, primary_key=True, autoincrement=True) + actor_id: Mapped[Optional[uuid.UUID]] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=True) + action: Mapped[str] = mapped_column(Text, nullable=False) + entity: Mapped[str] = mapped_column(Text, nullable=False) + entity_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), nullable=False) + diff: Mapped[Optional[dict[str, Any]]] = mapped_column(JSONB, nullable=True) + occurred_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()")) + + +class AuditEvent(Base): + 
"""Append-only CRUD / auth audit trail (see migrations/008_audit_events.sql)."""
+
+    __tablename__ = "audit_events"
+
+    id: Mapped[int] = mapped_column(BigInteger, primary_key=True, autoincrement=True)
+    occurred_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), server_default=text("now()"), nullable=False
+    )
+    actor_user_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    actor_email: Mapped[str] = mapped_column(Text, nullable=False)
+    actor_role: Mapped[str] = mapped_column(Text, nullable=False)
+    action: Mapped[str] = mapped_column(audit_action_enum, nullable=False)
+    entity_type: Mapped[str] = mapped_column(Text, nullable=False)
+    entity_id: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    before: Mapped[Optional[dict[str, Any]]] = mapped_column(JSONB, nullable=True)
+    after: Mapped[Optional[dict[str, Any]]] = mapped_column(JSONB, nullable=True)
+    metadata_: Mapped[dict[str, Any]] = mapped_column(
+        "metadata", JSONB, nullable=False, server_default=text("'{}'::jsonb")
+    )
+    request_id: Mapped[Optional[uuid.UUID]] = mapped_column(UUID(as_uuid=True), nullable=True)
+
+
+class DraftTabSnapshot(Base):
+    """Per-tab JSONB version history for an initiative draft (autosave / explicit save)."""
+
+    __tablename__ = "draft_tab_snapshots"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    initiative_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="CASCADE"), nullable=False
+    )
+    draft_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("drafts.id", ondelete="SET NULL"), nullable=True
+    )
+    tab: Mapped[str] = mapped_column(Text, nullable=False)
+    tab_version: Mapped[int] = mapped_column(Integer, nullable=False, server_default="1")
+    payload: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False)
+    source: Mapped[str] = mapped_column(Text, nullable=False, server_default="autosave")
+    captured_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ApplicationSubmitSnapshot(Base):
+    """Immutable snapshot of merged tabs + metadata at submit time."""
+
+    __tablename__ = "application_submit_snapshots"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    initiative_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="CASCADE"), nullable=False
+    )
+    submission_record_id: Mapped[str] = mapped_column(Text, nullable=False)
+    merged_tabs: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False)
+    submit_metadata: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False)
+    full_pdf_uri: Mapped[str] = mapped_column(Text, nullable=False)
+    captured_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ApplicationWorkflow(Base):
+    """Council-facing workflow projection (review status, deadlines, assignments)."""
+
+    __tablename__ = "application_workflow"
+
+    initiative_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="CASCADE"), primary_key=True
+    )
+    review_status: Mapped[str] = mapped_column(Text, nullable=False, server_default="not_reviewed")
+    review_deadline: Mapped[Optional[date]] = mapped_column(Date, nullable=True)
+    reviewer: Mapped[Optional[dict[str, Any]]] = mapped_column(JSONB, nullable=True)
+    supervisor: Mapped[Optional[dict[str, Any]]] = mapped_column(JSONB, nullable=True)
+    conference: Mapped[Optional[dict[str, Any]]] = mapped_column(JSONB, nullable=True)
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ApplicationTaxonomy(Base):
+    """Subject / group / topic type for list views and analytics."""
+
+    __tablename__ = "application_taxonomy"
+
+    initiative_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="CASCADE"), primary_key=True
+    )
+    subject_id: Mapped[str] = mapped_column(Text, nullable=False, server_default="")
+    group_id: Mapped[str] = mapped_column(Text, nullable=False, server_default="")
+    topic_type: Mapped[str] = mapped_column(Text, nullable=False, server_default="")
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ApplicationArtifact(Base):
+    """Registered deliverables (full PDF, evidence kinds, future abstract/poster)."""
+
+    __tablename__ = "application_artifacts"
+    __table_args__ = (UniqueConstraint("initiative_id", "role", name="uq_application_artifacts_init_role"),)
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    initiative_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="CASCADE"), nullable=False
+    )
+    role: Mapped[str] = mapped_column(Text, nullable=False)
+    storage_kind: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    storage_uri: Mapped[str] = mapped_column(Text, nullable=False)
+    original_name: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    mime_type: Mapped[str] = mapped_column(Text, nullable=False, server_default="application/pdf")
+    byte_size: Mapped[Optional[int]] = mapped_column(BigInteger, nullable=True)
+    sha256: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    uploaded_by: Mapped[Optional[uuid.UUID]] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=True)
+    uploaded_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    # Staff (admin / council) review of evidence uploads; applicants do not set these
+    review_status: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    reviewed_by: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    reviewed_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
+
+
+class ApplicationAdminResult(Base):
+    """Admin-recorded Approve/Reject decision per initiative (one record per application)."""
+
+    __tablename__ = "application_admin_results"
+    __table_args__ = (UniqueConstraint("initiative_id", name="uq_application_admin_results_initiative"),)
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    initiative_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="CASCADE"), nullable=False
+    )
+    decision: Mapped[str] = mapped_column(Text, nullable=False)
+    feedback: Mapped[str] = mapped_column(Text, nullable=False, server_default="")
+    rationale: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    created_by: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    updated_by: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+
+
+class UserNotification(Base):
+    """Applicant inbox row; created when admin saves adjudication (best-effort, post-commit)."""
+
+    __tablename__ = "user_notifications"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    recipient_user_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
+    )
+    type: Mapped[str] = mapped_column(Text, nullable=False)
+    title: Mapped[str] = mapped_column(Text, nullable=False)
+    body: Mapped[str] = mapped_column(Text, nullable=False)
+    application_id: Mapped[str] = mapped_column(Text, nullable=False)
+    related_initiative_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="SET NULL"), nullable=True
+    )
+    source_admin_result_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("application_admin_results.id", ondelete="SET NULL"), nullable=True
+    )
+    decision: Mapped[str] = mapped_column(Text, nullable=False)
+    merit_category_label: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    feedback_text: Mapped[str] = mapped_column(Text, nullable=False, server_default="")
+    rationale_text: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    read_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ApplicationReviewDocument(Base):
+    """Versioned ReviewPanel JSON bundles for DOCX/backoffice pipelines."""
+
+    __tablename__ = "application_review_documents"
+    __table_args__ = (UniqueConstraint("initiative_id", "document_version", name="uq_review_docs_init_ver"),)
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    initiative_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("initiatives.id", ondelete="CASCADE"), nullable=False
+    )
+    case_id: Mapped[str] = mapped_column(Text, nullable=False)
+    document_version: Mapped[int] = mapped_column(Integer, nullable=False, server_default="1")
+    official_bieu_mau: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False, server_default=text("'{}'::jsonb"))
+    template_data: Mapped[Optional[dict[str, Any]]] = mapped_column(JSONB, nullable=True)
+    full_bundle: Mapped[Optional[dict[str, Any]]] = mapped_column(JSONB, nullable=True)
+    created_by: Mapped[Optional[uuid.UUID]] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=True)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class DocumentTemplate(Base):
+    """Admin-managed DOCX template (file in MinIO) + extracted Jinja placeholder fields."""
+
+    __tablename__ = "document_templates"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    name: Mapped[str] = mapped_column(Text, nullable=False)
+    description: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    storage_key: Mapped[str] = mapped_column(Text, nullable=False)
+    original_filename: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    content_sha256: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    fields: Mapped[list[dict[str, Any]]] = mapped_column(
+        JSONB, nullable=False, server_default=text("'[]'::jsonb")
+    )
+    is_active: Mapped[bool] = mapped_column(Boolean, nullable=False, server_default=text("true"))
+    created_by: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+# --------------------------------------------------------------------------- #
+# Research projects (Thuyết minh đề tài, i.e. project proposals) + PI cockpit entities — migration 016.
+# The proposal row IS the project across its lifecycle (draft→submitted→approved|rejected);
+# child research_project_* tables hold the cockpit entities. Owner+admin authz (v1).
+# --------------------------------------------------------------------------- #
+class ResearchProject(Base):
+    """A research-project proposal that becomes a managed project on approval."""
+
+    __tablename__ = "research_projects"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    owner_user_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
+    )
+    status: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'draft'"))
+    code: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    title: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    level: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    pi_name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    period_months: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
+    budget_total: Mapped[Optional[float]] = mapped_column(Numeric(14, 2), nullable=True)
+    content: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False, server_default=text("'{}'::jsonb"))
+    submitted_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
+    reviewed_by: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    reviewed_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
+    review_note: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ResearchProjectMember(Base):
+    __tablename__ = "research_project_members"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    project_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("research_projects.id", ondelete="CASCADE"), nullable=False
+    )
+    sort_order: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0")
+    name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    role: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    access: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    org: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    email: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    months: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
+    tasks: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    status: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    user_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ResearchProjectDataset(Base):
+    __tablename__ = "research_project_datasets"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    project_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("research_projects.id", ondelete="CASCADE"), nullable=False
+    )
+    sort_order: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0")
+    name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    type: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    records: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
+    source: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    sensitivity: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    ethics: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    owner: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    status: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ResearchProjectModel(Base):
+    __tablename__ = "research_project_models"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    project_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("research_projects.id", ondelete="CASCADE"), nullable=False
+    )
+    sort_order: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0")
+    name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    task: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    framework: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    version: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    dataset: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    auc: Mapped[Optional[float]] = mapped_column(Numeric(6, 4), nullable=True)
+    sensitivity: Mapped[Optional[float]] = mapped_column(Numeric(6, 4), nullable=True)
+    specificity: Mapped[Optional[float]] = mapped_column(Numeric(6, 4), nullable=True)
+    accuracy: Mapped[Optional[float]] = mapped_column(Numeric(6, 4), nullable=True)
+    owner: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    notes: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    status: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ResearchProjectAsset(Base):
+    __tablename__ = "research_project_assets"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    project_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("research_projects.id", ondelete="CASCADE"), nullable=False
+    )
+    sort_order: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0")
+    name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    category: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    acquisition: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    value: Mapped[Optional[float]] = mapped_column(Numeric(14, 2), nullable=True)
+    owner: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    notes: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    status: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ResearchProjectMilestone(Base):
+    __tablename__ = "research_project_milestones"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    project_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("research_projects.id", ondelete="CASCADE"), nullable=False
+    )
+    sort_order: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0")
+    title: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    deliverable: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    start_period: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    end_period: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    owner: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    budget: Mapped[Optional[float]] = mapped_column(Numeric(14, 2), nullable=True)
+    progress: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0")
+    status: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ResearchProjectAudit(Base):
+    """Append-only audit trail for a research project (lifecycle + entity CRUD)."""
+
+    __tablename__ = "research_project_audit"
+
+    id: Mapped[int] = mapped_column(BigInteger, primary_key=True, autoincrement=True)
+    project_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("research_projects.id", ondelete="CASCADE"), nullable=False
+    )
+    occurred_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), server_default=text("now()"), nullable=False
+    )
+    actor_user_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    actor_name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    role_label: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    action: Mapped[str] = mapped_column(Text, nullable=False)
+    subject: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    detail: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+
+
+# --------------------------------------------------------------------------- #
+# ImageHub: content-addressed imaging dataset versioning — migration 017.
+# A dataset is owned by a user; files dedupe into imagehub_blobs by sha256; an
+# imagehub_version freezes a manifest snapshot of the working files. Owner+admin authz (v1).
+# --------------------------------------------------------------------------- #
+class ImagehubDataset(Base):
+    """An ImageHub dataset (a versioned collection of imaging files)."""
+
+    __tablename__ = "imagehub_datasets"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    owner_user_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
+    )
+    name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    slug: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    description: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    visibility: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'private'"))
+    modality_tags: Mapped[Any] = mapped_column(JSONB, nullable=False, server_default=text("'[]'::jsonb"))
+    # Per-dataset value->name label map for multi-label masks (migration 027), e.g. {"1":"kidney"}.
+    label_map: Mapped[Any] = mapped_column(JSONB, nullable=False, server_default=text("'{}'::jsonb"))
+    default_branch: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'main'"))
+    # The research project (the "workspace") this dataset belongs to, if any (migration 024).
+    research_project_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("research_projects.id", ondelete="SET NULL"), nullable=True
+    )
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ImagehubBlob(Base):
+    """Globally content-addressed blob registry (dedup by sha256)."""
+
+    __tablename__ = "imagehub_blobs"
+
+    sha256: Mapped[str] = mapped_column(Text, primary_key=True)
+    size_bytes: Mapped[int] = mapped_column(BigInteger, nullable=False, server_default="0")
+    media_type: Mapped[str] = mapped_column(
+        Text, nullable=False, server_default=text("'application/octet-stream'")
+    )
+    storage_bucket: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    storage_key: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ImagehubStorageMethod(Base):
+    """A verified external storage method (S3/GCS/Azure) for Cloud Import. Credentials live
+    encrypted in config_encrypted and are NEVER returned to the client (privacy rule SM7)."""
+
+    __tablename__ = "imagehub_storage_methods"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    owner_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
+    )
+    name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    provider: Mapped[str] = mapped_column(Text, nullable=False)
+    access_mode: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'read'"))
+    bucket: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    region: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    config_encrypted: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    verification_status: Mapped[str] = mapped_column(
+        Text, nullable=False, server_default=text("'pending'")
+    )
+    verification_reason: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    verification_checked_at: Mapped[Optional[datetime]] = mapped_column(
+        DateTime(timezone=True), nullable=True
+    )
+    created_by: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ImagehubDatasetFile(Base):
+    """Current working file set on a dataset's default branch (one row per folder_path + logical_path)."""
+
+    __tablename__ = "imagehub_dataset_files"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    dataset_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_datasets.id", ondelete="CASCADE"), nullable=False
+    )
+    logical_path: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    # Folder path (migration 026): the relative directory of the file inside the dataset so an
+    # uploaded tree (e.g. nnU-Net imagesTr/labelsTr) is preserved. '' = dataset root.
+    folder_path: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    blob_sha256: Mapped[Optional[str]] = mapped_column(
+        Text, ForeignKey("imagehub_blobs.sha256", ondelete="RESTRICT"), nullable=True
+    )
+    size_bytes: Mapped[int] = mapped_column(BigInteger, nullable=False, server_default="0")
+    media_type: Mapped[str] = mapped_column(
+        Text, nullable=False, server_default=text("'application/octet-stream'")
+    )
+    imaging_meta: Mapped[Any] = mapped_column(JSONB, nullable=False, server_default=text("'{}'::jsonb"))
+    # Segmentation linking (migration 018): a mask row (file_kind='segmentation') points at the
+    # image it segments via parent_file_id; organ_label names the organ. Regular files are 'image'.
+    file_kind: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'image'"))
+    parent_file_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True),
+        ForeignKey("imagehub_dataset_files.id", ondelete="CASCADE"),
+        nullable=True,
+    )
+    organ_label: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    # Cloud Import (migration 019): a file is EITHER a local content-addressed blob (blob_sha256)
+    # OR an external reference (storage_method_id + external_path) that streams from a verified
+    # storage method and is never copied to our servers (privacy rule C4).
+    storage_method_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True),
+        ForeignKey("imagehub_storage_methods.id", ondelete="RESTRICT"),
+        nullable=True,
+    )
+    external_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+    uploaded_by: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ImagehubVersion(Base):
+    """A frozen manifest snapshot of a dataset's working files (DAG-ready via parent_version_id)."""
+
+    __tablename__ = "imagehub_versions"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    dataset_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_datasets.id", ondelete="CASCADE"), nullable=False
+    )
+    seq: Mapped[int] = mapped_column(Integer, nullable=False, server_default="1")
+    message: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    manifest: Mapped[Any] = mapped_column(JSONB, nullable=False, server_default=text("'[]'::jsonb"))
+    parent_version_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_versions.id", ondelete="SET NULL"), nullable=True
+    )
+    author_user_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ImagehubDatasetStage(Base):
+    """A labeling-pipeline stage on a dataset (Label -> Review_1 -> Review_2 ...). `auto_assign`
+    is the 'Automatic Task Assignment' toggle; `review_percent` applies to review stages."""
+
+    __tablename__ = "imagehub_dataset_stages"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    dataset_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_datasets.id", ondelete="CASCADE"), nullable=False
+    )
+    name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    kind: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'label'"))
+    seq: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0")
+    review_percent: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
+    auto_assign: Mapped[bool] = mapped_column(Boolean, nullable=False, server_default=text("true"))
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ImagehubDatasetAudit(Base):
+    """Append-only audit trail for an ImageHub dataset."""
+
+    __tablename__ = "imagehub_dataset_audit"
+
+    id: Mapped[int] = mapped_column(BigInteger, primary_key=True, autoincrement=True)
+    dataset_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_datasets.id", ondelete="CASCADE"), nullable=False
+    )
+    occurred_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), server_default=text("now()"), nullable=False
+    )
+    actor_user_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    actor_name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    role_label: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    action: Mapped[str] = mapped_column(Text, nullable=False)
+    subject: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    detail: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+
+
+class ImagehubTask(Base):
+    """A unit of labeling work: one dataset file moving through the dataset's pipeline stages.
+
+    A NEW join row (not the file mutated in place): it carries the file's pipeline position
+    (``current_stage_id`` + ``pipeline_state``), the per-user queue ``queue_status``, the
+    ``assignee``, a ``priority`` float (0..1), and the Ground-Truth ``is_reference_standard`` flag.
+    Single-user MVP: task access reuses the dataset owner-or-admin gate; multi-labeler membership is
+    a later phase. ``UNIQUE(dataset_file_id)`` enforces one task per file (droppable later).
+    """
+
+    __tablename__ = "imagehub_tasks"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    dataset_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_datasets.id", ondelete="CASCADE"), nullable=False
+    )
+    dataset_file_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_dataset_files.id", ondelete="CASCADE"), nullable=False
+    )
+    name: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    current_stage_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_dataset_stages.id", ondelete="SET NULL"), nullable=True
+    )
+    # §3 pipeline state: inLabel -> inReview -> groundTruth (terminal); issue diverts (later phase).
+    pipeline_state: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'inLabel'"))
+    # §4 per-user queue status within a stage.
+    queue_status: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'assigned'"))
+    assignee_user_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    # A2 manual assignment is sticky (never auto-reassigned); 'auto' is first-come-first-serve.
+    assignment_mode: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'auto'"))
+    priority: Mapped[float] = mapped_column(Float, nullable=False, server_default=text("0"))
+    is_reference_standard: Mapped[bool] = mapped_column(
+        Boolean, nullable=False, server_default=text("false")
+    )
+    # The labeler's vector annotations (bbox/points/pen/brush/polygon), normalized [0..1] geometry
+    # per slice — persisted as JSON so the AnnotationTool can load + save a task's work (migration 022).
+    annotations: Mapped[Any] = mapped_column(JSONB, nullable=False, server_default=text("'[]'::jsonb"))
+    # Q2: a skipped task goes to the end of the queue, ordered by this monotonic seq.
+    skipped_seq: Mapped[Optional[int]] = mapped_column(BigInteger, nullable=True)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ImagehubDatasetMember(Base):
+    """A user (other than the owner) granted access to a dataset's labeling work (multi-labeler).
+
+    MVP: all members are labelers — they view the dataset and work tasks assigned to them; dataset /
+    stage / settings management stays with the owner + platform admins. ``role`` is reserved for a
+    future project-admin tier. UNIQUE(dataset_id, user_id): one membership per user per dataset.
+    """
+
+    __tablename__ = "imagehub_dataset_members"
+
+    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    dataset_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_datasets.id", ondelete="CASCADE"), nullable=False
+    )
+    user_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
+    )
+    role: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("'member'"))
+    added_by: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
+
+
+class ImagehubTaskReviewEvent(Base):
+    """An append-only record of a task review decision (accept / acceptWithCorrections / reject),
+    capturing the reviewer, the stage reviewed, and an optional note (e.g. a reject reason).
+    Powers review history + per-reviewer accept/reject counters (migration 025)."""
+
+    __tablename__ = "imagehub_task_review_events"
+
+    id: Mapped[int] = mapped_column(BigInteger, primary_key=True, autoincrement=True)
+    dataset_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_datasets.id", ondelete="CASCADE"), nullable=False
+    )
+    task_id: Mapped[uuid.UUID] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_tasks.id", ondelete="CASCADE"), nullable=False
+    )
+    stage_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("imagehub_dataset_stages.id", ondelete="SET NULL"), nullable=True
+    )
+    reviewer_user_id: Mapped[Optional[uuid.UUID]] = mapped_column(
+        UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
+    )
+    decision: Mapped[str] = mapped_column(Text, nullable=False)
+    note: Mapped[str] = mapped_column(Text, nullable=False, server_default=text("''"))
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=text("now()"))
diff --git a/be0/src/initiative_db/repair_split_submission.py b/be0/src/initiative_db/repair_split_submission.py
new file mode 100644
index 0000000..815ba93
--- /dev/null
+++ b/be0/src/initiative_db/repair_split_submission.py
@@ -0,0 +1,315 @@
+"""
+One-off repair: merge a mis-linked submission (saved under DR…/wrong initiative) onto the autosave CASE-* initiative.
+
+Safe by default (`dry_run=True`). Does not alter application HTTP handlers — callers are scripts/operators only.
+
+See `scripts/repair_split_submission.py`.
+"""
+
+from __future__ import annotations
+
+import uuid
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from typing import Any, Dict, Optional
+
+from sqlalchemy import delete, select, text, update
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from src.initiative_db.application_storage import upsert_artifact_full_pdf
+from src.initiative_db.models import ApplicationArtifact
+from src.initiative_db.models import ApplicationReviewDocument
+from src.initiative_db.models import ApplicationSubmitSnapshot
+from src.initiative_db.models import ApplicationTaxonomy
+from src.initiative_db.models import ApplicationWorkflow
+from src.initiative_db.models import Draft, Initiative
+
+
+def tabs_effectively_empty(tabs: Any) -> bool:
+    if not isinstance(tabs, dict) or len(tabs) == 0:
+        return True
+    for key in ("report", "application", "contribution"):
+        val = tabs.get(key)
+        if isinstance(val, dict) and len(val) > 0:
+            return False
+        if val:
+            return False
+    return True
+
+
+def merge_payload_for_case_repair(
+    *,
+    target_case_code: str,
+    good_payload: Dict[str, Any],
+    bad_payload: Dict[str, Any],
+) -> Dict[str, Any]:
+    """
+    Preserve tab JSON from whichever side still has autosave (`good` wins if non-empty),
+    attach submission envelope from `bad` (submissionRecord, submissionFile).
+ """ + out = dict(good_payload) if isinstance(good_payload, dict) else {} + gp = dict(good_payload.get("tabs") or {}) if isinstance(good_payload, dict) else {} + bp = dict(bad_payload.get("tabs") or {}) if isinstance(bad_payload, dict) else {} + + if not tabs_effectively_empty(gp): + merged_tabs = {**gp} + elif not tabs_effectively_empty(bp): + merged_tabs = {**bp} + else: + merged_tabs = {**gp, **bp} + + out["tabs"] = merged_tabs + out["caseId"] = target_case_code.strip() + if isinstance(bad_payload, dict): + if isinstance(bad_payload.get("submissionRecord"), dict): + out["submissionRecord"] = dict(bad_payload["submissionRecord"]) # type: ignore[arg-type] + if isinstance(bad_payload.get("submissionFile"), dict): + out["submissionFile"] = dict(bad_payload["submissionFile"]) # type: ignore[arg-type] + + ts = datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z") + out["updatedAt"] = ts + return out + + +@dataclass +class RepairReport: + """Human-readable audit trail for CLI / logs.""" + + dry_run: bool + submission_record_id: str + owner_id: str + good_case_code: str + bad_case_code: str + actions: list[str] = field(default_factory=list) + skipped: Optional[str] = None + + +async def _latest_draft(session: AsyncSession, initiative_id: uuid.UUID) -> Optional[Draft]: + stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative_id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + return (await session.execute(stmt)).scalar_one_or_none() + + +async def find_initiative_by_submission_payload_id(session: AsyncSession, submission_record_id: str) -> tuple[Optional[Initiative], Optional[Draft]]: + """Locate initiative+draft whose `payload.submissionRecord.id` matches.""" + sid = (submission_record_id or "").strip() + if not sid or not sid.startswith("sub-"): + return None, None + + # PostgreSQL JSONB path (portable across SQLAlchemy JSON variants) + stmt = ( + select(Initiative, Draft) + .join(Draft, Draft.initiative_id == 
Initiative.id) + .where(text("(drafts.payload->'submissionRecord'->>'id') = :sid")) + .params(sid=sid) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + row = (await session.execute(stmt)).first() + if row is None: + return None, None + return row[0], row[1] + + + async def resolve_good_initiative( + session: AsyncSession, + *, + owner_id: uuid.UUID, + exclude_initiative_ids: tuple[uuid.UUID, ...], + explicit_good_case_code: Optional[str], + ) -> Optional[Initiative]: + """Prefer explicit CASE-* code; otherwise latest CASE-* initiative for this owner (with non-empty drafts preferred).""" + from sqlalchemy import desc + + if explicit_good_case_code: + stmt = ( + select(Initiative) + .where( + Initiative.owner_id == owner_id, + Initiative.case_code == explicit_good_case_code.strip(), + ) + .limit(1) + ) + return (await session.execute(stmt)).scalar_one_or_none() + + stmt_all = ( + select(Initiative) + .where( + Initiative.owner_id == owner_id, + Initiative.id.not_in(exclude_initiative_ids), + Initiative.case_code.ilike(r"CASE-%"), + ) + .order_by(desc(Initiative.updated_at)) + ) + initiatives = (await session.execute(stmt_all)).scalars().all() + nonempty_first: Optional[Initiative] = None + for ini in initiatives: + d = await _latest_draft(session, ini.id) + if d is None: + continue + pl = dict(d.payload) if isinstance(d.payload, dict) else {} + if not tabs_effectively_empty(pl.get("tabs")): + nonempty_first = ini + break + if nonempty_first is not None: + return nonempty_first + return initiatives[0] if initiatives else None + + + async def repair_submission_cross_initiative_merge( + session: AsyncSession, + *, + submission_record_id: str, + good_case_code_explicit: Optional[str] = None, + dry_run: bool = True, + ) -> RepairReport: + """ + Attach submission data sitting on « bad » initiative to « good » CASE-* initiative, then delete the bad initiative. + + Preconditions (enforced): same owner_id; submission id found on bad initiative only.
+ + Caller must ``commit()`` the session unless dry_run. + """ + sub_id = submission_record_id.strip() + report = RepairReport( + dry_run=dry_run, + submission_record_id=sub_id, + owner_id="", + good_case_code="", + bad_case_code="", + actions=["resolve bad initiative row by submissionRecord.id"], + ) + + bad_ini, bad_draft = await find_initiative_by_submission_payload_id(session, sub_id) + if bad_ini is None or bad_draft is None: + report.skipped = f"No initiative found with drafts.payload submissionRecord.id = {sub_id!r}" + return report + + report.bad_case_code = bad_ini.case_code or "" + report.owner_id = str(bad_ini.owner_id) + + good_ini = await resolve_good_initiative( + session, + owner_id=bad_ini.owner_id, + exclude_initiative_ids=(bad_ini.id,), + explicit_good_case_code=good_case_code_explicit, + ) + if good_ini is None: + report.skipped = "Could not resolve a target CASE-* initiative for the same owner (pass --good-case explicitly)." + return report + + if good_ini.id == bad_ini.id: + report.skipped = "Bad and good initiative are identical — nothing to repair." + return report + + report.good_case_code = good_ini.case_code or "" + + good_draft = await _latest_draft(session, good_ini.id) + if good_draft is None: + report.skipped = "Target CASE initiative has no draft row — fix data manually." 
+ return report + + good_pl = dict(good_draft.payload) if isinstance(good_draft.payload, dict) else {} + bad_pl = dict(bad_draft.payload) if isinstance(bad_draft.payload, dict) else {} + merged = merge_payload_for_case_repair( + target_case_code=good_ini.case_code or "", + good_payload=good_pl, + bad_payload=bad_pl, + ) + + if dry_run: + report.actions.extend( + [ + f"Would update good draft id={good_draft.id} with merged tabs + submission payload", + "Would set good initiative status=submitted submitted_at=max(bad, good), or now() if neither is set", + f"Would repoint application_submit_snapshots from bad initiative {bad_ini.id} -> good {good_ini.id}", + "Would upsert full_pdf artifact from bad -> good (if bad has one)", + "Would copy application_workflow / application_taxonomy rows from bad only if missing on good", + "Would delete application_review_documents on bad (good keeps its own review JSON)", + f"Would DELETE initiative id={bad_ini.id} ({report.bad_case_code}); CASCADE removes orphan drafts and rows tied to bad only", + "WARN: if evidence uploads were stored only on bad initiative, MinIO metadata rows would cascade-delete — verify in DB first", + ] + ) + return report + + # --- apply --- + good_draft.payload = merged + good_draft.version = (good_draft.version or 0) + 1 + good_ini.status = "submitted" + # Keep the later of the two submission timestamps; fall back to "now" only if neither is set. + if bad_ini.submitted_at and (good_ini.submitted_at is None or bad_ini.submitted_at > good_ini.submitted_at): + good_ini.submitted_at = bad_ini.submitted_at + elif good_ini.submitted_at is None: + good_ini.submitted_at = datetime.now(timezone.utc) + + await session.flush() + + await session.execute( + update(ApplicationSubmitSnapshot) + .where(ApplicationSubmitSnapshot.initiative_id == bad_ini.id) + .values(initiative_id=good_ini.id) + ) + + bad_pdf = ( + await session.execute( + select(ApplicationArtifact).where( + ApplicationArtifact.initiative_id == bad_ini.id, + ApplicationArtifact.role
== "full_pdf", + ) + ) + ).scalar_one_or_none() + if bad_pdf is not None: + await upsert_artifact_full_pdf( + session, + initiative_id=good_ini.id, + storage_uri=bad_pdf.storage_uri, + original_name=bad_pdf.original_name, + byte_size=bad_pdf.byte_size, + sha256_hex=bad_pdf.sha256, + uploaded_by=bad_pdf.uploaded_by, + storage_kind=getattr(bad_pdf, "storage_kind", None), + ) + + bad_wf = await session.get(ApplicationWorkflow, bad_ini.id) + good_wf = await session.get(ApplicationWorkflow, good_ini.id) + if bad_wf is not None and good_wf is None: + session.add( + ApplicationWorkflow( + initiative_id=good_ini.id, + review_status=bad_wf.review_status, + review_deadline=bad_wf.review_deadline, + reviewer=bad_wf.reviewer, + supervisor=bad_wf.supervisor, + conference=bad_wf.conference, + ) + ) + bad_tx = await session.get(ApplicationTaxonomy, bad_ini.id) + good_tx = await session.get(ApplicationTaxonomy, good_ini.id) + if bad_tx is not None and good_tx is None: + session.add( + ApplicationTaxonomy( + initiative_id=good_ini.id, + subject_id=bad_tx.subject_id, + group_id=bad_tx.group_id, + topic_type=bad_tx.topic_type, + ) + ) + + await session.execute(delete(ApplicationReviewDocument).where(ApplicationReviewDocument.initiative_id == bad_ini.id)) + + await session.delete(bad_ini) + await session.flush() + + report.actions.extend( + [ + f"Updated good draft {good_draft.id}, initiative {good_ini.case_code}", + "Repointed submit snapshots; copied full_pdf artifact (if present); cleared review docs on bad", + f"Deleted bad initiative {report.bad_case_code}", + ] + ) + return report diff --git a/be0/src/initiative_db/submission_readiness.py b/be0/src/initiative_db/submission_readiness.py new file mode 100644 index 0000000..7f5fafe --- /dev/null +++ b/be0/src/initiative_db/submission_readiness.py @@ -0,0 +1,398 @@ +"""Server-side readiness checks before final PDF submit (aligned with fe0 applicantHonestyPrerequisites).""" + +from __future__ import annotations + +import re +import uuid +from
datetime import date +from typing import Any, Dict, List, Mapping, Optional + +from sqlalchemy import select +from sqlalchemy.ext.asyncio import AsyncSession + +from src.initiative_db.application_storage import ( + EVIDENCE_ROLE_RESEARCH, + EVIDENCE_ROLE_TEXTBOOK, + EVIDENCE_ROLE_TECHNICAL, +) +from src.initiative_db.models import ApplicationArtifact + +FIRST_APPLY_MIN = date(2025, 4, 15) +FIRST_APPLY_MAX = date(2026, 4, 15) + + +class ApplicationSubmissionNotReadyError(Exception): + """Tabs JSON or MinIO evidence registry does not satisfy submit rules.""" + + def __init__(self, missing: List[str]) -> None: + self.missing = list(missing) + msg = "; ".join(self.missing[:10]) + if len(self.missing) > 10: + msg += " …" + super().__init__(msg) + + +def _txt(value: Any) -> str: + return str(value or "").strip() + + +def _truthy(value: Any) -> bool: + if value is True: + return True + if isinstance(value, str) and value.strip().lower() in ("true", "1", "yes"): + return True + return False + + +def _parse_dd_mm_yyyy(value: str) -> Optional[date]: + m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", value.strip()) + if not m: + return None + d, mo, y = int(m.group(1)), int(m.group(2)), int(m.group(3)) + try: + return date(y, mo, d) + except ValueError: + return None + + +def _eff(report: Mapping[str, Any], key: str) -> str: + eff = report.get("effectiveness") + if not isinstance(eff, dict): + return "" + return _txt(eff.get(key)) + + +def _json_evidence_pdf_present(slot: Any) -> bool: + if slot is None: + return False + if isinstance(slot, dict): + if _txt(slot.get("serverStorageKey")): + return True + try: + if int(slot.get("size") or 0) > 0: + return True + except (TypeError, ValueError): + pass + return False + + +def _digits_cccd(s: str) -> str: + return "".join(c for c in s if c.isdigit()) + + +def _ban_cam_ket_complete(b: Any) -> bool: + if not isinstance(b, dict): + return False + nk = b.get("ngay_ky") if isinstance(b.get("ngay_ky"), dict) else {} + if not 
_txt(b.get("tac_gia_dang_ky")) or not _txt(b.get("don_vi")): + return False + if not _txt(b.get("ten_bai_bao")) or not _txt(b.get("nguoi_cam_ket")): + return False + cc = _digits_cccd(_txt(b.get("cccd"))) + if len(cc) < 8 or len(cc) > 12: + return False + nx = _txt(b.get("nam_xet")) + if not re.fullmatch(r"\d{4}", nx): + return False + vt = b.get("vai_tro") if isinstance(b.get("vai_tro"), dict) else {} + tc = _truthy(vt.get("tac_gia_chinh")) + dt = _truthy(vt.get("dong_tac_gia")) + if int(tc) + int(dt) != 1: + return False + ck = b.get("cam_ket") if isinstance(b.get("cam_ket"), dict) else {} + if not _truthy(ck.get("quyen_so_huu_1")) and not _truthy(ck.get("quyen_so_huu_2")): + return False + if not ( + _truthy(ck.get("dong_thuan")) + and _truthy(ck.get("bai_bao_uy_tin")) + and _truthy(ck.get("tuan_thu_phap_luat")) + ): + return False + ng = _txt(nk.get("ngay")) + th = _txt(nk.get("thang")) + nam = _txt(nk.get("nam")) + if not ng or not th or not re.fullmatch(r"\d{4}", nam): + return False + return True + + +def _reference_material_honesty_complete(h: Any) -> bool: + if not isinstance(h, dict): + return False + nk = h.get("ngay_ky") if isinstance(h.get("ngay_ky"), dict) else {} + if ( + not _txt(h.get("tac_gia_dang_ky")) + or not _txt(h.get("don_vi")) + or not _txt(h.get("ten_tai_lieu")) + or not _txt(h.get("nguoi_cam_ket")) + ): + return False + cc = _digits_cccd(_txt(h.get("cccd"))) + if len(cc) < 8 or len(cc) > 12: + return False + nx = _txt(h.get("nam_xet")) + if not re.fullmatch(r"\d{4}", nx): + return False + ck = h.get("cam_ket") if isinstance(h.get("cam_ket"), dict) else {} + if not ( + _truthy(ck.get("thong_tin_trung_thuc")) + and _truthy(ck.get("trach_nhiem_phap_luat")) + and _truthy(ck.get("bo_sung_khi_yeu_cau")) + ): + return False + ng = _txt(nk.get("ngay")) + th = _txt(nk.get("thang")) + nam = _txt(nk.get("nam")) + if not ng or not th or not re.fullmatch(r"\d{4}", nam): + return False + return True + + +def _research_domestic_honesty_complete(h: Any) 
-> bool: + if not isinstance(h, dict): + return False + nk = h.get("ngay_ky") if isinstance(h.get("ngay_ky"), dict) else {} + if ( + not _txt(h.get("tac_gia_dang_ky")) + or not _txt(h.get("don_vi")) + or not _txt(h.get("ten_bai_bao")) + or not _txt(h.get("nguoi_cam_ket")) + ): + return False + cc = _digits_cccd(_txt(h.get("cccd"))) + if len(cc) < 8 or len(cc) > 12: + return False + nx = _txt(h.get("nam_xet")) + if not re.fullmatch(r"\d{4}", nx): + return False + ck = h.get("cam_ket") if isinstance(h.get("cam_ket"), dict) else {} + if not ( + _truthy(ck.get("thong_tin_trung_thuc")) + and _truthy(ck.get("trach_nhiem_phap_luat")) + and _truthy(ck.get("bo_sung_khi_yeu_cau")) + ): + return False + ng = _txt(nk.get("ngay")) + th = _txt(nk.get("thang")) + nam = _txt(nk.get("nam")) + if not ng or not th or not re.fullmatch(r"\d{4}", nam): + return False + return True + + +def _push(ok: bool, gaps: List[str], msg: str) -> None: + if not ok: + gaps.append(msg) + + +def collect_report_tab_gaps(r: Mapping[str, Any]) -> List[str]: + gaps: List[str] = [] + _push(bool(_txt(r.get("introduction"))), gaps, "Báo cáo: Mở đầu (§1)") + _push(bool(_txt(r.get("initiativeName"))), gaps, "Báo cáo: Tên sáng kiến") + _push(bool(_txt(r.get("representativeAuthor"))), gaps, "Báo cáo: Tác giả đại diện") + _push(bool(_txt(r.get("representativePhone"))), gaps, "Báo cáo: Điện thoại") + _push(bool(_txt(r.get("representativeEmail"))), gaps, "Báo cáo: Email") + _push(bool(_txt(r.get("applicationField"))), gaps, "Báo cáo: Lĩnh vực áp dụng") + _push(bool(_txt(r.get("currentStatus"))), gaps, "Báo cáo: Hiện trạng / giải pháp đã biết (§4.1)") + _push(bool(_txt(r.get("purpose"))), gaps, "Báo cáo: Mục đích (§4.2)") + _push(bool(_txt(r.get("implementationSteps"))), gaps, "Báo cáo: Các bước thực hiện") + _push(bool(_txt(r.get("firstAppliedUnit"))), gaps, "Báo cáo: Đơn vị áp dụng lần đầu") + _push(bool(_txt(r.get("achievedResult"))), gaps, "Báo cáo: Kết quả thu được") + _push(bool(_txt(r.get("conditions"))), 
gaps, "Báo cáo: Điều kiện áp dụng") + _push(bool(_txt(r.get("novelty"))), gaps, "Báo cáo: Tính mới của sáng kiến") + _push(bool(_eff(r, "economic")), gaps, "Báo cáo: Hiệu quả kinh tế") + _push(bool(_eff(r, "teaching")), gaps, "Báo cáo: Hiệu quả công việc / giảng dạy") + _push(bool(_eff(r, "safety")), gaps, "Báo cáo: Môi trường & an toàn") + _push(bool(_eff(r, "social")), gaps, "Báo cáo: Nhận thức & xã hội") + return gaps + + +def _collect_author_gaps(authors: Any) -> List[str]: + gaps: List[str] = [] + if not isinstance(authors, list) or len(authors) == 0: + gaps.append("Đơn: thêm ít nhất một tác giả.") + return gaps + for i, a in enumerate(authors): + if not isinstance(a, dict): + gaps.append(f"Đơn — tác giả {i + 1}: dữ liệu không hợp lệ") + continue + p = f"Đơn — tác giả {i + 1}" + _push(bool(_txt(a.get("name"))), gaps, f"{p}: họ tên") + _push(bool(_txt(a.get("dob"))), gaps, f"{p}: ngày sinh") + _push(bool(_txt(a.get("workplace"))), gaps, f"{p}: nơi công tác") + _push(bool(_txt(a.get("title"))), gaps, f"{p}: chức danh") + _push(bool(_txt(a.get("qualification"))), gaps, f"{p}: trình độ") + total = 0 + for a in authors: + if isinstance(a, dict): + try: + total += int(float(a.get("contributionPercent") or 0)) + except (TypeError, ValueError): + pass + _push(total == 100, gaps, "Đơn: tổng % đóng góp của các tác giả phải bằng 100%.") + return gaps + + +def _classification_pdf_ok( + classification: str, + application: Mapping[str, Any], + evidence_flags: Mapping[str, bool], +) -> tuple[bool, List[str]]: + """Returns (all_ok, gap_messages).""" + gaps: List[str] = [] + + def research_ok() -> bool: + return bool(evidence_flags.get("research")) or _json_evidence_pdf_present(application.get("researchEvidenceFile")) + + def textbook_ok() -> bool: + return bool(evidence_flags.get("textbook")) or _json_evidence_pdf_present(application.get("textbookEvidenceFile")) + + def technical_ok() -> bool: + return bool(evidence_flags.get("technical")) or 
_json_evidence_pdf_present(application.get("technicalEvidenceFile")) + + if classification == "technical": + _push(technical_ok(), gaps, "Đơn (Nhóm 1): cần tệp PDF minh chứng kỹ thuật / văn bản đơn vị (đã tải lên máy chủ).") + return (len(gaps) == 0, gaps) + + if classification == "textbook": + kind = _txt(application.get("textbookEvidenceKind")) + _push(bool(kind), gaps, "Đơn (Nhóm 2.2): chọn loại minh chứng (xuất sắc / tài liệu tham khảo).") + _push(textbook_ok(), gaps, "Đơn (Nhóm 2.2): cần tệp PDF minh chứng (Quyết định xuất bản…).") + if kind == "book": + _push(_ban_cam_ket_complete(application.get("banCamKet")), gaps, "Đơn (Nhóm 2.2 — sách/giáo trình): hoàn thành bản cam kết tác giả.") + if kind == "reference": + _push( + _reference_material_honesty_complete(application.get("referenceMaterialHonesty")), + gaps, + "Đơn (Nhóm 2.2 — tài liệu tham khảo): hoàn thành biểu xác minh trung thực.", + ) + return (len(gaps) == 0, gaps) + + if classification == "research": + rek = _txt(application.get("researchEvidenceKind")) + _push(bool(rek), gaps, "Đơn (Nhóm 2.1): chọn loại minh chứng (tạp chí / poster…).") + _push(research_ok(), gaps, "Đơn (Nhóm 2.1): cần tệp PDF minh chứng bài báo / poster.") + if rek == "international": + ban_ok = _ban_cam_ket_complete(application.get("banCamKet")) + legacy = bool(_txt(application.get("internationalJournalDeclaration"))) + _push(ban_ok or legacy, gaps, "Đơn (Nhóm 2.1 — quốc tế): hoàn thành bản cam kết tác giả hoặc tuyên bố tạp chí.") + if rek == "domestic": + ban_ok_dom = _ban_cam_ket_complete(application.get("banCamKet")) + _push( + ban_ok_dom + or _research_domestic_honesty_complete(application.get("researchDomesticHonesty")), + gaps, + "Đơn (Nhóm 2.1 — trong nước): hoàn thành biểu xác nhận bài báo trong nước.", + ) + return (len(gaps) == 0, gaps) + + return (True, []) + + +def collect_application_tab_gaps( + application: Mapping[str, Any], + evidence_flags: Mapping[str, bool], +) -> List[str]: + gaps: List[str] = [] + unit_ok = 
bool(_txt(application.get("unitName"))) + if not unit_ok: + authors0 = application.get("authors") + if isinstance(authors0, list) and len(authors0) > 0 and isinstance(authors0[0], dict): + unit_ok = bool(_txt(authors0[0].get("workplace"))) + _push(unit_ok, gaps, "Đơn: Tên đơn vị") + gaps.extend(_collect_author_gaps(application.get("authors"))) + _push(bool(_txt(application.get("initiativeName"))), gaps, "Đơn: Tên sáng kiến") + _push(bool(_txt(application.get("investorName"))), gaps, "Đơn: Chủ đầu tư") + _push(bool(_txt(application.get("applicationField"))), gaps, "Đơn: Lĩnh vực áp dụng") + fad_raw = _txt(application.get("firstApplyDate")) + fad = _parse_dd_mm_yyyy(fad_raw) if fad_raw else None + _push(fad is not None, gaps, "Đơn: Ngày áp dụng lần đầu (định dạng dd/mm/yyyy)") + if fad is not None and (fad < FIRST_APPLY_MIN or fad > FIRST_APPLY_MAX): + gaps.append("Đơn: Ngày áp dụng lần đầu phải từ 15/04/2025 đến 15/04/2026.") + + ic = application.get("initiativeClassification") + ic_str = _txt(ic) if ic is not None else "" + _push(ic_str != "", gaps, "Đơn: Phân loại sáng kiến (Nhóm 1 / 2.1 / 2.2)") + if ic_str: + _, cgaps = _classification_pdf_ok(ic_str, application, evidence_flags) + gaps.extend(cgaps) + + _push(bool(_txt(application.get("contentSummary"))), gaps, "Đơn: Nội dung của sáng kiến (§4)") + _push(bool(_txt(application.get("conditions"))), gaps, "Đơn: Điều kiện áp dụng") + _push(bool(_txt(application.get("authorEvaluation"))), gaps, "Đơn: Đánh giá lợi ích (tác giả)") + _push(bool(_txt(application.get("trialEvaluation"))), gaps, "Đơn: Đánh giá (đơn vị áp dụng thử)") + + ss = application.get("supportStaff") + if isinstance(ss, list) and len(ss) > 0: + for i, row in enumerate(ss): + if not isinstance(row, dict): + gaps.append(f"Đơn — người hỗ trợ {i + 1}: dữ liệu không hợp lệ") + continue + p = f"Đơn — người hỗ trợ {i + 1}" + _push(bool(_txt(row.get("name"))), gaps, f"{p}: họ tên") + _push(bool(_txt(row.get("dob"))), gaps, f"{p}: ngày sinh") + 
_push(bool(_txt(row.get("workplace"))), gaps, f"{p}: nơi công tác") + _push(bool(_txt(row.get("title"))), gaps, f"{p}: chức danh") + _push(bool(_txt(row.get("qualification"))), gaps, f"{p}: trình độ") + _push(bool(_txt(row.get("supportContent"))), gaps, f"{p}: nội dung hỗ trợ") + + sd = application.get("submissionDay") + sm = application.get("submissionMonth") + _push(sd is not None and _txt(sd) != "", gaps, "Đơn: Ngày ký (ngày)") + _push(sm is not None and _txt(sm) != "", gaps, "Đơn: Ngày ký (tháng)") + _push(bool(_txt(application.get("submissionYear"))), gaps, "Đơn: Ngày ký (năm)") + return gaps + + +def collect_submission_readiness_gaps( + tabs: Mapping[str, Any], + evidence_flags: Mapping[str, bool], +) -> List[str]: + """Validate merged draft tabs + Postgres/MinIO evidence registry.""" + gaps: List[str] = [] + raw_r = tabs.get("report") + raw_a = tabs.get("application") + raw_c = tabs.get("contribution") + if not isinstance(raw_r, dict) or not isinstance(raw_a, dict) or not isinstance(raw_c, dict): + gaps.append("Bản nháp thiếu dữ liệu một hoặc nhiều tab (báo cáo / đơn / xác nhận đóng góp).") + + report = raw_r if isinstance(raw_r, dict) else {} + application = raw_a if isinstance(raw_a, dict) else {} + contribution = raw_c if isinstance(raw_c, dict) else {} + + gaps.extend(collect_report_tab_gaps(report)) + gaps.extend(collect_application_tab_gaps(application, evidence_flags)) + + _push(_truthy(report.get("honestyConfirmed")), gaps, "Báo cáo: cần tick ô cam kết trung thực ở cuối tab Báo cáo.") + _push(_truthy(application.get("honestyConfirmed")), gaps, "Đơn: cần tick ô cam kết trung thực ở cuối tab Đơn.") + _push( + _truthy(contribution.get("digitalSignatureConfirmed")), + gaps, + "Xác nhận đóng góp: cần tick ô cam kết trung thực ở tab Xác nhận tỷ lệ đóng góp trước khi gửi.", + ) + return gaps + + +async def fetch_evidence_presence_flags(session: AsyncSession, initiative_id: uuid.UUID) -> Dict[str, bool]: + stmt = select(ApplicationArtifact.role, 
ApplicationArtifact.storage_uri).where( + ApplicationArtifact.initiative_id == initiative_id, + ApplicationArtifact.role.in_( + ( + EVIDENCE_ROLE_RESEARCH, + EVIDENCE_ROLE_TEXTBOOK, + EVIDENCE_ROLE_TECHNICAL, + ) + ), + ) + rows = (await session.execute(stmt)).all() + out = {"research": False, "textbook": False, "technical": False} + for role, uri in rows: + if not _txt(uri): + continue + if role == EVIDENCE_ROLE_RESEARCH: + out["research"] = True + elif role == EVIDENCE_ROLE_TEXTBOOK: + out["textbook"] = True + elif role == EVIDENCE_ROLE_TECHNICAL: + out["technical"] = True + return out diff --git a/be0/src/initiative_db/submissions.py b/be0/src/initiative_db/submissions.py new file mode 100644 index 0000000..78daa99 --- /dev/null +++ b/be0/src/initiative_db/submissions.py @@ -0,0 +1,1359 @@ +"""Persist and query submitted applications in PostgreSQL-backed initiative tables.""" + +from __future__ import annotations + +import asyncio +import logging +import uuid +from datetime import datetime, timezone +from io import BytesIO +from typing import Any, Dict, List, Optional, Tuple + +from sqlalchemy import desc, select +from sqlalchemy.ext.asyncio import AsyncSession + +from src.initiative_db.application_storage import ( + ROLE_OFFICIAL_FORM_DOCX, + ROLE_OFFICIAL_FORM_PDF, + STORAGE_FILESYSTEM, + STORAGE_MINIO_EXPORTS, + record_submit_snapshot, + upsert_application_taxonomy, + upsert_application_workflow, + upsert_artifact_full_pdf, + upsert_artifact_official_form, +) +from src.initiative_db.drafts import SYSTEM_DRAFT_OWNER_ID +from src.initiative_db.models import ApplicationAdminResult +from src.initiative_db.models import ApplicationReviewDocument +from src.initiative_db.models import ApplicationSubmitSnapshot +from src.initiative_db.models import Draft, Initiative, User +from src.initiative_db.submission_readiness import ( + ApplicationSubmissionNotReadyError, + collect_submission_readiness_gaps, + fetch_evidence_presence_flags, +) + +logger = 
logging.getLogger(__name__) + + +class ApplicationSubmitPersistError(Exception): + """Canonical storage (MinIO / printable forms) failed during submit — do not silently fall back.""" + + +def _now_utc() -> datetime: + return datetime.now(timezone.utc).replace(microsecond=0) + + +def _iso_utc(dt: Optional[datetime]) -> str: + if dt is None: + return "" + value = dt.astimezone(timezone.utc).replace(microsecond=0).isoformat() + return value.replace("+00:00", "Z") + + +def _normalize_case_id(case_id: Optional[str], fallback_prefix: str = "CASE") -> str: + raw = case_id or f"{fallback_prefix}-{int(datetime.now().timestamp() * 1000)}" + safe = "".join(ch for ch in raw if ch.isalnum() or ch in ("-", "_")) + if not safe: + raise ValueError("Invalid case id") + return safe + + +async def _get_or_create_latest_draft(session: AsyncSession, initiative: Initiative, case_id: str) -> Draft: + draft_stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(draft_stmt)).scalar_one_or_none() + if draft is not None: + return draft + + draft = Draft( + draft_code=f"DRAFT-{case_id}", + initiative_id=initiative.id, + payload={"caseId": case_id, "updatedAt": _iso_utc(_now_utc()), "tabs": {}}, + ) + session.add(draft) + await session.flush() + return draft + + +def _map_status_for_client(raw: str) -> str: + """Align DB enum values with frontend `ApplicationStatus` (mock list uses `pending` for new).""" + if raw == "submitted": + return "pending" + return raw + + +def _calendar_year_from_submitted(row: Dict[str, Any]) -> Optional[int]: + sd = str(row.get("submittedDate") or "") + if len(sd) >= 4 and sd[:4].isdigit(): + return int(sd[:4]) + return None + + +def _as_review_document_row(doc: ApplicationReviewDocument) -> Dict[str, Any]: + return { + "id": str(doc.id), + "initiativeId": str(doc.initiative_id), + "caseId": doc.case_id, + "documentVersion": int(doc.document_version or 1), + 
"officialBieuMau": dict(doc.official_bieu_mau or {}), + "templateData": dict(doc.template_data or {}) if isinstance(doc.template_data, dict) else None, + "fullBundle": dict(doc.full_bundle or {}) if isinstance(doc.full_bundle, dict) else None, + "createdBy": str(doc.created_by) if doc.created_by else None, + "createdAt": _iso_utc(doc.created_at), + } + + +def _submission_display_id(initiative: Initiative, stored: Dict[str, Any]) -> str: + """Public row id for list + GET /api/applications/{id} (must stay in sync everywhere we resolve by id).""" + sid = stored.get("id") + if sid is not None and str(sid).strip(): + return str(sid) + # Match `sub-{uuid4().hex[:16]}` (no hyphens); avoid `str(uuid)[:16]` which truncates dashed UUID awkwardly. + raw_id = getattr(initiative, "id", None) + if raw_id is not None: + hex32 = getattr(raw_id, "hex", None) + if isinstance(hex32, str) and len(hex32) >= 16: + return f"sub-{hex32[:16]}" + compact = str(raw_id).replace("-", "") + if len(compact) >= 16: + return f"sub-{compact[:16]}" + case = str(getattr(initiative, "case_code", "") or "") + tail = "".join(c for c in case if c.isalnum())[:16] or "noid" + return f"sub-{tail}" + + +def _sanitize_case_key_fragment(case_key: str) -> str: + """Same character allow-list as ``main._normalize_case_id`` without inventing a fallback id.""" + raw = (case_key or "").strip() + return "".join(ch for ch in raw if ch.isalnum() or ch in ("-", "_")) + + +async def resolve_initiative_for_draft_case_key( + session: AsyncSession, + case_key: str, +) -> Optional[Initiative]: + """ + Resolve ``Initiative`` for draft/evidence URLs keyed by ``Initiative.case_code`` *or* the public + submission row id (``sub-…`` / ``SUB-…``, case-insensitive — must match ``_submission_display_id``). 
+ """ + safe = _sanitize_case_key_fragment(case_key) + if not safe: + return None + + ini = (await session.execute(select(Initiative).where(Initiative.case_code == safe))).scalar_one_or_none() + if ini is not None: + return ini + + key_lower = safe.lower() + stmt = ( + select(Initiative) + .where(Initiative.status != "draft") + .order_by(desc(Initiative.submitted_at), desc(Initiative.updated_at)) + ) + initiatives = (await session.execute(stmt)).scalars().all() + for initiative in initiatives: + draft_stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(draft_stmt)).scalar_one_or_none() + if draft is None: + continue + payload = dict(draft.payload) if isinstance(draft.payload, dict) else {} + stored = payload.get("submissionRecord") if isinstance(payload.get("submissionRecord"), dict) else {} + disp = _submission_display_id(initiative, stored) + if disp.lower() == key_lower: + return initiative + return None + + +def _as_submission_item(initiative: Initiative, payload: Dict[str, Any]) -> Dict[str, Any]: + stored = payload.get("submissionRecord") if isinstance(payload.get("submissionRecord"), dict) else {} + item_id = _submission_display_id(initiative, stored) + submitted_at = initiative.submitted_at + submitted_date = _iso_utc(submitted_at) or str(stored.get("submittedDate") or "") + submission_file = payload.get("submissionFile") if isinstance(payload.get("submissionFile"), dict) else {} + author = stored.get("author") if isinstance(stored.get("author"), dict) else {} + + row: Dict[str, Any] = { + "id": item_id, + "submittedDate": submitted_date, + "name": str(stored.get("name") or "Hồ sơ sáng kiến"), + "author": { + "id": str(author.get("id") or initiative.case_code), + "name": str(author.get("name") or "—"), + "email": author.get("email"), + "phone": author.get("phone"), + }, + "subjectId": str(stored.get("subjectId") or ""), + "groupId": 
str(stored.get("groupId") or ""), + "status": _map_status_for_client(str(initiative.status or stored.get("status") or "submitted")), + "reviewStatus": str(stored.get("reviewStatus") or "not_reviewed"), + "supervisor": stored.get("supervisor"), + "reviewer": stored.get("reviewer"), + "reviewDeadline": stored.get("reviewDeadline"), + "conference": stored.get("conference"), + "topicType": str(stored.get("topicType") or "Hồ sơ PDF (đơn + báo cáo)"), + "files": { + "fullText": submission_file if submission_file.get("url") else None, + "abstract": None, + "poster": None, + }, + } + cy = _calendar_year_from_submitted(row) + if cy is not None: + row["calendarYear"] = cy + # Initiative.case_code is the key for application-drafts; submission "id" is often sub-… and must not be used to load tabs. + row["draft_case_id"] = initiative.case_code + tabs = payload.get("tabs") if isinstance(payload.get("tabs"), dict) else {} + app_tab = tabs.get("application") if isinstance(tabs.get("application"), dict) else {} + ic = app_tab.get("initiativeClassification") + if ic is not None and ic != "": + row["initiativeClassification"] = ic + rek = app_tab.get("researchEvidenceKind") + if rek is not None and rek != "": + row["researchEvidenceKind"] = rek + tek = app_tab.get("textbookEvidenceKind") + if tek is not None and tek != "": + row["textbookEvidenceKind"] = tek + row["textbook_evidence_kind"] = tek + return row + + +async def _admin_feedback_by_initiative_ids( + session: AsyncSession, + initiative_ids: List[uuid.UUID], +) -> Dict[uuid.UUID, str]: + """Initiative id → admin result ``feedback`` (trimmed) for the API field ``nhan_xet`` / the «Nhận xét» ("Comments") column.""" + if not initiative_ids: + return {} + stmt = select(ApplicationAdminResult).where(ApplicationAdminResult.initiative_id.in_(initiative_ids)) + rows = (await session.execute(stmt)).scalars().all() + out: Dict[uuid.UUID, str] = {} + for rec in rows: + text = (rec.feedback or "").strip() + if text: + out[rec.initiative_id] = text + return out + +
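The public-id rules above (`_submission_display_id`, `_sanitize_case_key_fragment`, and the case-insensitive match in `resolve_initiative_for_draft_case_key`) can be illustrated with a minimal standalone sketch. The helper names below are hypothetical and not part of this commit; the sketch assumes the initiative id is a `uuid.UUID` and covers only the UUID branch, not the `case_code` fallback. It shows why the code takes `uuid.hex[:16]` rather than `str(uuid)[:16]`, which keeps a hyphen inside the first 16 characters.

```python
import uuid

# Hypothetical helpers for illustration only; they mirror two rules from this diff
# (the UUID branch of _submission_display_id and the allow-list of
# _sanitize_case_key_fragment) but are not part of the commit.


def display_id_from_uuid(raw_id: uuid.UUID) -> str:
    """Public row id: "sub-" plus the first 16 hex digits of the UUID (no hyphens)."""
    return f"sub-{raw_id.hex[:16]}"


def sanitize_case_key(case_key: str) -> str:
    """Keep only alphanumerics, '-' and '_' (same allow-list as the diff)."""
    raw = (case_key or "").strip()
    return "".join(ch for ch in raw if ch.isalnum() or ch in ("-", "_"))


def key_matches_display_id(case_key: str, display_id: str) -> bool:
    """Case-insensitive comparison, so "SUB-..." and "sub-..." resolve to the same row."""
    safe = sanitize_case_key(case_key)
    return bool(safe) and safe.lower() == display_id.lower()


if __name__ == "__main__":
    u = uuid.UUID("12345678-9abc-def0-1234-56789abcdef0")
    disp = display_id_from_uuid(u)
    print(disp)          # sub-123456789abcdef0
    print(str(u)[:16])   # 12345678-9abc-de (the dashed prefix the code comment warns against)
    print(key_matches_display_id(" SUB-123456789ABCDEF0 ", disp))  # True
```

Because the sanitizer drops every other character before comparing, a key that contains nothing allowed (for example `"!!!"`) never matches any row.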
+async def _admin_decision_reviewers_by_initiative_ids( + session: AsyncSession, + initiative_ids: List[uuid.UUID], +) -> Dict[uuid.UUID, Dict[str, Any]]: + """ + Initiative id → ``reviewer`` person for rows with ``application_admin_results``. + + Uses ``updated_by`` (the last admin who saved the approve/reject decision) and ``users.full_name`` so + GET /api/applications fills the «Người đánh giá» ("Reviewer") column consistently with adjudication history. + """ + if not initiative_ids: + return {} + stmt = select(ApplicationAdminResult).where(ApplicationAdminResult.initiative_id.in_(initiative_ids)) + recs = (await session.execute(stmt)).scalars().all() + if not recs: + return {} + + uid_list = [r.updated_by for r in recs if r.updated_by is not None] + name_by_uid: Dict[uuid.UUID, str] = {} + if uid_list: + urows = (await session.execute(select(User).where(User.id.in_(uid_list)))).scalars().all() + for u in urows: + name_by_uid[u.id] = (u.full_name or "").strip() or "—" + + out: Dict[uuid.UUID, Dict[str, Any]] = {} + for rec in recs: + uid = rec.updated_by + if uid is None: + out[rec.initiative_id] = {"id": "", "name": "—"} + else: + out[rec.initiative_id] = { + "id": str(uid), + "name": name_by_uid.get(uid, "—"), + } + return out + + +META_CASE_KEYS = ( + "initiativeCaseId", + "applicationCaseId", + "draftCaseId", + "caseCode", +) + + +def _looks_like_client_draft_session_id(value: str) -> bool: + s = value.strip() + return s.upper().startswith("DRAFT-") + + +async def _find_initiative_for_submit_metadata( + session: AsyncSession, + metadata: Dict[str, Any], + owner_user_id: Optional[uuid.UUID], +) -> Optional[Initiative]: + """Match an existing applicant draft (CASE-…) when the client sends Postgres case keys; ignore DRAFT-* session ids.""" + candidates: List[str] = [] + for key in META_CASE_KEYS: + raw = metadata.get(key) + if isinstance(raw, str) and raw.strip(): + candidates.append(raw.strip()) + + raw_fallback = metadata.get("caseId") + if 
isinstance(raw_fallback, str) and raw_fallback.strip(): +
s = raw_fallback.strip() + if not _looks_like_client_draft_session_id(s): + candidates.append(s) + + seen: set[str] = set() + for c in candidates: + if c in seen: + continue + seen.add(c) + try: + nid = _normalize_case_id(c, fallback_prefix="CASE") + except ValueError: + continue + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == nid)) + ).scalar_one_or_none() + if ini is not None: + logger.info( + "Submit linked to existing initiative case_code=%s id=%s", + ini.case_code, + ini.id, + ) + return ini + + if owner_user_id is None: + return None + + stmt = ( + select(Initiative) + .where( + Initiative.owner_id == owner_user_id, + Initiative.case_code.ilike(r"CASE-%"), + ) + .order_by(desc(Initiative.updated_at)) + .limit(1) + ) + fb = (await session.execute(stmt)).scalar_one_or_none() + if fb is not None: + logger.warning( + "Submit matched initiative via owner CASE-* fallback case_code=%s owner=%s " + "(prefer sending initiativeCaseId from the applicant dashboard)", + fb.case_code, + owner_user_id, + ) + return fb + + +def _new_case_code_for_fresh_submit(metadata: Dict[str, Any]) -> str: + """When no existing initiative matches, allocate a stable case_code from metadata or SUB-….""" + for key in META_CASE_KEYS: + raw = metadata.get(key) + if isinstance(raw, str) and raw.strip(): + cand = raw.strip() + if not _looks_like_client_draft_session_id(cand): + return _normalize_case_id(cand, fallback_prefix="CASE") + raw2 = metadata.get("caseId") + if isinstance(raw2, str) and raw2.strip(): + cand2 = raw2.strip() + if not _looks_like_client_draft_session_id(cand2): + return _normalize_case_id(cand2, fallback_prefix="CASE") + return _normalize_case_id(None, fallback_prefix="SUB") + + +def _tabs_effectively_empty(tabs: Any) -> bool: + if not isinstance(tabs, dict) or len(tabs) == 0: + return True + for key in ("report", "application", "contribution"): + val = tabs.get(key) + if isinstance(val, dict) and len(val) > 0: + return False + if val: + 
return False + return True + + +async def _upload_submitted_pdf_to_exports_minio_required( + initiative: Initiative, + pdf_body: bytes, + original_name: Optional[str], +) -> str: + """Upload full submitted PDF to exports bucket; required for HTTP submit path.""" + from src.minio.storage import S3Storage, StorageError, settings as s3s + + s3 = S3Storage() + bucket = s3s.s3_bucket_exports + key = s3.build_key_for_initiative(initiative.id, original_name or "ho-so.pdf") + try: + await s3.upload( + bucket, + key, + BytesIO(pdf_body), + "application/pdf", + metadata={ + "case_code": str(initiative.case_code or ""), + "role": "full_pdf_submission", + }, + ) + except StorageError as exc: + raise ApplicationSubmitPersistError(f"Lưu PDF hồ sơ lên MinIO thất bại: {exc}") from exc + except Exception as exc: + raise ApplicationSubmitPersistError(f"Lưu PDF hồ sơ lên MinIO thất bại: {exc}") from exc + logger.info("Submitted PDF copied to MinIO exports bucket key=%s", key[:48]) + return key + + +async def _persist_official_application_forms( + session: AsyncSession, + initiative: Initiative, + owner_user_id: Optional[uuid.UUID], +) -> None: + """ + If a review-document row has non-empty officialBieuMau, render DOCX+PDF and register MinIO artifacts. + Skips when no bundle exists (legacy submits). 
+ """ + from src.be01.docx_to_pdf import convert_docx_bytes_to_pdf + from src.be01.fill_application_form import fill_application_form_docx + from src.be01.official_to_data_blank import official_to_data_blank + from src.minio.storage import S3Storage, StorageError, settings as s3s + + stmt = ( + select(ApplicationReviewDocument) + .where(ApplicationReviewDocument.initiative_id == initiative.id) + .order_by(desc(ApplicationReviewDocument.document_version)) + .limit(1) + ) + doc = (await session.execute(stmt)).scalar_one_or_none() + if doc is None: + return + obm = doc.official_bieu_mau + if not isinstance(obm, dict) or len(obm) == 0: + return + + try: + ctx = official_to_data_blank(dict(obm)) + except Exception as exc: + raise ApplicationSubmitPersistError(f"Dữ liệu biểu mẫu không hợp lệ: {exc}") from exc + + try: + docx_bytes = await asyncio.to_thread(fill_application_form_docx, ctx) + pdf_bytes = await asyncio.to_thread( + convert_docx_bytes_to_pdf, + docx_bytes, + relax_justified_softbreaks=True, + strip_table_row_heights=False, + ) + except FileNotFoundError as exc: + raise ApplicationSubmitPersistError( + "Không thể tạo PDF mẫu: thiếu LibreOffice hoặc file mẫu Word. 
" + str(exc) + ) from exc + except Exception as exc: + raise ApplicationSubmitPersistError(f"Không thể tạo biểu mẫu DOCX/PDF: {exc}") from exc + + s3 = S3Storage() + bucket = s3s.s3_bucket_exports + safe_case = str(initiative.case_code or "case").replace("/", "_")[:80] + base_name = f"official-form-{safe_case}" + docx_key = s3.build_key_for_initiative(initiative.id, f"{base_name}.docx") + pdf_key = s3.build_key_for_initiative(initiative.id, f"{base_name}.pdf") + + try: + docx_res = await s3.upload( + bucket, + docx_key, + BytesIO(docx_bytes), + "application/vnd.openxmlformats-officedocument.wordprocessingml.document", + metadata={"case_code": str(initiative.case_code or ""), "role": ROLE_OFFICIAL_FORM_DOCX}, + ) + pdf_res = await s3.upload( + bucket, + pdf_key, + BytesIO(pdf_bytes), + "application/pdf", + metadata={"case_code": str(initiative.case_code or ""), "role": ROLE_OFFICIAL_FORM_PDF}, + ) + except StorageError as exc: + raise ApplicationSubmitPersistError(f"Tải biểu mẫu lên MinIO thất bại: {exc}") from exc + + await upsert_artifact_official_form( + session, + initiative_id=initiative.id, + role=ROLE_OFFICIAL_FORM_DOCX, + storage_uri=docx_key, + original_name=f"{base_name}.docx", + byte_size=docx_res["size"], + sha256_hex=docx_res["sha256"], + uploaded_by=owner_user_id, + mime_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document", + storage_kind=STORAGE_MINIO_EXPORTS, + ) + await upsert_artifact_official_form( + session, + initiative_id=initiative.id, + role=ROLE_OFFICIAL_FORM_PDF, + storage_uri=pdf_key, + original_name=f"{base_name}.pdf", + byte_size=pdf_res["size"], + sha256_hex=pdf_res["sha256"], + uploaded_by=owner_user_id, + mime_type="application/pdf", + storage_kind=STORAGE_MINIO_EXPORTS, + ) + + +async def merge_applicant_draft_bundle_tabs_from_snapshot_if_empty( + session: AsyncSession, + *, + initiative: Initiative, + bundle: Dict[str, Any], +) -> Dict[str, Any]: + """ + Recover tab JSON for admins when drafts.payload.tabs 
stayed empty but a submit snapshot exists + (e.g. older bug linked submit to wrong initiative row). + """ + tabs = bundle.get("tabs") + if not _tabs_effectively_empty(tabs): + return bundle + stmt = ( + select(ApplicationSubmitSnapshot) + .where(ApplicationSubmitSnapshot.initiative_id == initiative.id) + .order_by(desc(ApplicationSubmitSnapshot.captured_at)) + .limit(1) + ) + snap = (await session.execute(stmt)).scalar_one_or_none() + if snap is None or not isinstance(snap.merged_tabs, dict): + return bundle + snap_tabs = snap.merged_tabs + if _tabs_effectively_empty(snap_tabs): + return bundle + merged = dict(bundle) + # Overlay only non-empty draft tabs: an "effectively empty" bundle may still carry placeholders + # such as {"report": {}}, which would otherwise overwrite the snapshot's real tab data. + draft_tabs = merged.get("tabs") + overlay = {k: v for k, v in draft_tabs.items() if v} if isinstance(draft_tabs, dict) else {} + merged["tabs"] = {**snap_tabs, **overlay} + merged.setdefault("caseId", initiative.case_code) + return merged + + +async def merge_application_draft_document_with_snapshot_if_needed( + session: AsyncSession, + case_code: str, + doc: Dict[str, Any], +) -> Dict[str, Any]: + """GET /api/v1/application-drafts/:caseId — hydrate empty tabs from immutable submit snapshot when present.""" + raw = (case_code or "").strip() + if not raw: + return doc + try: + norm = _normalize_case_id(raw, fallback_prefix="CASE") + except ValueError: + norm = raw + ini = (await session.execute(select(Initiative).where(Initiative.case_code == norm))).scalar_one_or_none() + if ini is None: + return doc + return await merge_applicant_draft_bundle_tabs_from_snapshot_if_empty(session, initiative=ini, bundle=doc) + + +async def save_submitted_application( + session: AsyncSession, + metadata: Dict[str, Any], + file_url: str, + submission_id: Optional[str] = None, + owner_user_id: Optional[uuid.UUID] = None, + *, + pdf_byte_size: 
Optional[int] = None, + pdf_sha256: Optional[str] = None, + pdf_original_name: Optional[str] = None, + pdf_body: Optional[bytes] = None, +) -> Dict[str, Any]: + initiative_name = str(metadata.get("initiativeName") or metadata.get("name") or "").strip() or "Hồ sơ sáng kiến" + author_name = str(metadata.get("authorName") or "").strip() or
"—" + author_email = str(metadata.get("authorEmail") or "").strip() or None + author_phone = str(metadata.get("authorPhone") or "").strip() or None + now = _now_utc() + resolved_submission_id = submission_id or f"sub-{uuid.uuid4().hex[:16]}" + effective_owner = owner_user_id or SYSTEM_DRAFT_OWNER_ID + + matched = await _find_initiative_for_submit_metadata(session, metadata, owner_user_id) + + if matched is not None: + initiative = matched + case_id = initiative.case_code + else: + case_id = _new_case_code_for_fresh_submit(metadata) + initiative = (await session.execute(select(Initiative).where(Initiative.case_code == case_id))).scalar_one_or_none() + if initiative is None: + initiative = Initiative(case_code=case_id, owner_id=effective_owner) + session.add(initiative) + await session.flush() + elif owner_user_id and initiative.owner_id == SYSTEM_DRAFT_OWNER_ID: + initiative.owner_id = owner_user_id + + assert initiative is not None + + if owner_user_id and initiative.owner_id == SYSTEM_DRAFT_OWNER_ID: + initiative.owner_id = owner_user_id + + draft = await _get_or_create_latest_draft(session, initiative, case_id) + payload_pre: Dict[str, Any] = dict(draft.payload) if isinstance(draft.payload, dict) else {} + tabs_pre = dict(payload_pre.get("tabs") or {}) + bundle_v = await merge_applicant_draft_bundle_tabs_from_snapshot_if_empty( + session, + initiative=initiative, + bundle={"tabs": tabs_pre, "caseId": case_id}, + ) + merged_tabs_validate = dict(bundle_v.get("tabs") or {}) + evidence_flags = await fetch_evidence_presence_flags(session, initiative.id) + missing_ready = collect_submission_readiness_gaps(merged_tabs_validate, evidence_flags) + if missing_ready: + raise ApplicationSubmissionNotReadyError(missing_ready) + + initiative.status = "submitted" + initiative.submitted_at = now + + current_payload: Dict[str, Any] = payload_pre + current_payload["caseId"] = case_id + current_payload["updatedAt"] = _iso_utc(now) + current_payload["tabs"] = 
dict(current_payload.get("tabs") or {}) + current_payload["submissionFile"] = {"url": file_url, "type": "pdf"} + current_payload["submissionRecord"] = { + "id": resolved_submission_id, + "submittedDate": _iso_utc(now), + "name": initiative_name, + "author": { + "id": case_id, + "name": author_name, + "email": author_email, + "phone": author_phone, + }, + "subjectId": str(metadata.get("subjectId") or ""), + "groupId": str(metadata.get("groupId") or ""), + "status": "submitted", + "reviewStatus": "not_reviewed", + "supervisor": None, + "reviewer": None, + "reviewDeadline": None, + "conference": None, + "topicType": str(metadata.get("topicType") or "Hồ sơ PDF (đơn + báo cáo)"), + } + + draft.payload = current_payload + draft.version = (draft.version or 0) + 1 + await session.flush() + + sr = current_payload.get("submissionRecord") + sr_dict = dict(sr) if isinstance(sr, dict) else {} + merged_tabs = dict(current_payload.get("tabs") or {}) + submit_meta: Dict[str, Any] = { + "caseId": case_id, + "draftVersion": draft.version, + "submissionFile": current_payload.get("submissionFile"), + } + if isinstance(metadata, dict): + for k in ( + "initiativeName", + "name", + "authorName", + "authorEmail", + "topicType", + "subjectId", + "groupId", + "initiativeCaseId", + "applicationCaseId", + "draftCaseId", + ): + if k in metadata: + submit_meta[k] = metadata[k] + + artifact_storage_uri = file_url + artifact_storage_kind: Optional[str] = STORAGE_FILESYSTEM if file_url else None + if pdf_body and len(pdf_body) >= 50: + minio_key = await _upload_submitted_pdf_to_exports_minio_required( + initiative, pdf_body, pdf_original_name or f"{resolved_submission_id}.pdf" + ) + artifact_storage_uri = minio_key + artifact_storage_kind = STORAGE_MINIO_EXPORTS + + await record_submit_snapshot( + session, + initiative_id=initiative.id, + submission_record_id=resolved_submission_id, + merged_tabs=merged_tabs, + submit_metadata=submit_meta, + full_pdf_uri=file_url, + ) + await 
upsert_application_taxonomy( + session, + initiative_id=initiative.id, + subject_id=str(sr_dict.get("subjectId") or metadata.get("subjectId") or ""), + group_id=str(sr_dict.get("groupId") or metadata.get("groupId") or ""), + topic_type=str(sr_dict.get("topicType") or metadata.get("topicType") or ""), + ) + await upsert_application_workflow( + session, + initiative_id=initiative.id, + submission_record=sr_dict, + ) + await upsert_artifact_full_pdf( + session, + initiative_id=initiative.id, + storage_uri=artifact_storage_uri, + original_name=pdf_original_name, + byte_size=pdf_byte_size, + sha256_hex=pdf_sha256, + uploaded_by=owner_user_id, + storage_kind=artifact_storage_kind, + ) + await _persist_official_application_forms(session, initiative, owner_user_id) + + return { + "id": resolved_submission_id, + "submittedDate": _iso_utc(now), + "publicUrl": file_url, + "name": initiative_name, + } + + +async def create_submitted_application_shell( + session: AsyncSession, + *, + owner_user_id: uuid.UUID, + name: Optional[str] = None, + author_name: Optional[str] = None, + author_email: Optional[str] = None, + author_phone: Optional[str] = None, +) -> Dict[str, Any]: + """ + Create a new submitted-application shell record and return list-item shape. + This allocates an `applicationId` before the applicant uploads the final PDF. 
+ """ + now = _now_utc() + case_id = _normalize_case_id(f"SUB-{uuid.uuid4().hex[:12]}", fallback_prefix="SUB") + submission_id = f"sub-{uuid.uuid4().hex[:16]}" + + initiative = Initiative( + case_code=case_id, + owner_id=owner_user_id, + status="submitted", + submitted_at=now, + ) + session.add(initiative) + await session.flush() + + submission_record = { + "id": submission_id, + "submittedDate": _iso_utc(now), + "name": (name or "").strip() or "Hồ sơ mới", + "author": { + "id": case_id, + "name": (author_name or "").strip() or "—", + "email": (author_email or "").strip() or None, + "phone": (author_phone or "").strip() or None, + }, + "subjectId": "", + "groupId": "", + "status": "submitted", + "reviewStatus": "not_reviewed", + "supervisor": None, + "reviewer": None, + "reviewDeadline": None, + "conference": None, + "topicType": "Hồ sơ khởi tạo", + } + payload: Dict[str, Any] = { + "caseId": case_id, + "updatedAt": _iso_utc(now), + "tabs": {}, + "submissionRecord": submission_record, + } + draft = Draft( + draft_code=f"DRAFT-{case_id}", + initiative_id=initiative.id, + payload=payload, + ) + session.add(draft) + await session.flush() + return _as_submission_item(initiative, payload) + + +async def save_review_document_bundle( + session: AsyncSession, + *, + case_id: str, + official_bieu_mau: Dict[str, Any], + template_data: Optional[Dict[str, Any]], + full_bundle: Optional[Dict[str, Any]], + owner_user_id: Optional[uuid.UUID], +) -> Dict[str, Any]: + normalized_case = _normalize_case_id(case_id, fallback_prefix="CASE") + stmt = select(Initiative).where(Initiative.case_code == normalized_case) + initiative = (await session.execute(stmt)).scalar_one_or_none() + effective_owner = owner_user_id or SYSTEM_DRAFT_OWNER_ID + if initiative is None: + initiative = Initiative(case_code=normalized_case, owner_id=effective_owner) + session.add(initiative) + await session.flush() + elif owner_user_id and initiative.owner_id == SYSTEM_DRAFT_OWNER_ID: + initiative.owner_id = 
owner_user_id + + max_stmt = ( + select(ApplicationReviewDocument.document_version) + .where(ApplicationReviewDocument.initiative_id == initiative.id) + .order_by(desc(ApplicationReviewDocument.document_version)) + .limit(1) + ) + latest_ver = (await session.execute(max_stmt)).scalar_one_or_none() + next_ver = int(latest_ver or 0) + 1 + row = ApplicationReviewDocument( + initiative_id=initiative.id, + case_id=normalized_case, + document_version=next_ver, + official_bieu_mau=dict(official_bieu_mau or {}), + template_data=dict(template_data or {}) if isinstance(template_data, dict) else None, + full_bundle=dict(full_bundle or {}) if isinstance(full_bundle, dict) else None, + created_by=owner_user_id, + ) + session.add(row) + await session.flush() + return _as_review_document_row(row) + + +async def get_latest_review_document_bundle( + session: AsyncSession, *, case_id: str +) -> Optional[Dict[str, Any]]: + normalized_case = _normalize_case_id(case_id, fallback_prefix="CASE") + stmt = select(Initiative).where(Initiative.case_code == normalized_case) + initiative = (await session.execute(stmt)).scalar_one_or_none() + if initiative is None: + return None + doc_stmt = ( + select(ApplicationReviewDocument) + .where(ApplicationReviewDocument.initiative_id == initiative.id) + .order_by(desc(ApplicationReviewDocument.document_version), desc(ApplicationReviewDocument.created_at)) + .limit(1) + ) + doc = (await session.execute(doc_stmt)).scalar_one_or_none() + if doc is None: + return None + return _as_review_document_row(doc) + + +async def list_review_document_bundles( + session: AsyncSession, *, case_id: str, limit: int = 20 +) -> List[Dict[str, Any]]: + normalized_case = _normalize_case_id(case_id, fallback_prefix="CASE") + stmt = select(Initiative).where(Initiative.case_code == normalized_case) + initiative = (await session.execute(stmt)).scalar_one_or_none() + if initiative is None: + return [] + doc_stmt = ( + select(ApplicationReviewDocument) + 
.where(ApplicationReviewDocument.initiative_id == initiative.id) + .order_by(desc(ApplicationReviewDocument.document_version), desc(ApplicationReviewDocument.created_at)) + .limit(max(1, min(limit, 200))) + ) + rows = (await session.execute(doc_stmt)).scalars().all() + return [_as_review_document_row(x) for x in rows] + + +async def get_review_document_bundle_by_id( + session: AsyncSession, *, review_document_id: str +) -> Optional[Dict[str, Any]]: + try: + rid = uuid.UUID(str(review_document_id)) + except ValueError: + return None + row = await session.get(ApplicationReviewDocument, rid) + if row is None: + return None + return _as_review_document_row(row) + + +async def update_review_document_bundle( + session: AsyncSession, + *, + review_document_id: str, + official_bieu_mau: Dict[str, Any], + template_data: Optional[Dict[str, Any]], + full_bundle: Optional[Dict[str, Any]], +) -> Optional[Dict[str, Any]]: + try: + rid = uuid.UUID(str(review_document_id)) + except ValueError: + return None + row = await session.get(ApplicationReviewDocument, rid) + if row is None: + return None + row.official_bieu_mau = dict(official_bieu_mau or {}) + row.template_data = dict(template_data or {}) if isinstance(template_data, dict) else None + row.full_bundle = dict(full_bundle or {}) if isinstance(full_bundle, dict) else None + await session.flush() + return _as_review_document_row(row) + + +async def delete_review_document_bundle( + session: AsyncSession, *, review_document_id: str +) -> bool: + try: + rid = uuid.UUID(str(review_document_id)) + except ValueError: + return False + row = await session.get(ApplicationReviewDocument, rid) + if row is None: + return False + await session.delete(row) + await session.flush() + return True + + +async def resolve_submitted_initiative_for_backup( + session: AsyncSession, + application_id: str, +) -> Optional[Tuple[Initiative, str]]: + """ + Resolve initiative + public submission id (``sub-…``) the same way as ``get_application_by_id``. 
+ Used by the admin backup ZIP builder. + """ + aid = (application_id or "").strip() + if not aid: + return None + + stmt = ( + select(Initiative) + .where(Initiative.status != "draft") + .order_by(desc(Initiative.submitted_at), desc(Initiative.updated_at)) + ) + initiatives = (await session.execute(stmt)).scalars().all() + for initiative in initiatives: + draft_stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(draft_stmt)).scalar_one_or_none() + payload = dict(draft.payload) if draft is not None and isinstance(draft.payload, dict) else {} + stored = payload.get("submissionRecord") if isinstance(payload.get("submissionRecord"), dict) else {} + if _submission_display_id(initiative, stored) == aid or initiative.case_code == aid: + return initiative, _submission_display_id(initiative, stored) + return None + + +async def get_application_by_id(session: AsyncSession, application_id: str) -> Optional[Dict[str, Any]]: + """Return one application row (same shape as list items) or None.""" + aid = application_id.strip() + if not aid: + return None + + stmt = ( + select(Initiative) + .where(Initiative.status != "draft") + .order_by(desc(Initiative.submitted_at), desc(Initiative.updated_at)) + ) + initiatives = (await session.execute(stmt)).scalars().all() + _iids = [i.id for i in initiatives] + feedback_map = await _admin_feedback_by_initiative_ids(session, _iids) + reviewer_map = await _admin_decision_reviewers_by_initiative_ids(session, _iids) + + for initiative in initiatives: + draft_stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(draft_stmt)).scalar_one_or_none() + payload = dict(draft.payload) if draft is not None and isinstance(draft.payload, dict) else {} + stored = payload.get("submissionRecord") if isinstance(payload.get("submissionRecord"), dict) 
else {} + if _submission_display_id(initiative, stored) == aid or initiative.case_code == aid: + row = _as_submission_item(initiative, payload) + fb = feedback_map.get(initiative.id) + if fb: + row["nhan_xet"] = fb + rev = reviewer_map.get(initiative.id) + if rev is not None: + row["reviewer"] = rev + return row + + return None + + +def _matches_filters( + row: Dict[str, Any], + *, + name: str, + author_name: str, + reviewer_name: str, + status: str, + review_status: str, + date_from: str, + date_to: str, + lifecycle: str = "", + skip_status_filter: bool = False, +) -> bool: + row_status = str(row.get("status") or "") + lc = (lifecycle or "").strip().lower() + if lc == "inbox": + if row_status in ("approved", "rejected"): + return False + elif lc == "decided": + if row_status not in ("approved", "rejected"): + return False + n = name.strip().lower() + if n and n not in str(row.get("name") or "").lower(): + return False + an = author_name.strip().lower() + author = row.get("author") or {} + if an and an not in str(author.get("name") or "").lower(): + return False + rn = reviewer_name.strip().lower() + if rn: + reviewer = row.get("reviewer") or {} + if rn not in str(reviewer.get("name") or "").lower(): + return False + if not skip_status_filter and status and row_status != status: + return False + if review_status and str(row.get("reviewStatus") or "") != review_status: + return False + sd = str(row.get("submittedDate") or "") + sd_day = sd[:10] if len(sd) >= 10 else "" + if date_from and sd_day and sd_day < date_from: + return False + if date_to and sd_day and sd_day > date_to: + return False + return True + + +def _sort_submission_pairs_in_place( + pairs: List[Tuple[Dict[str, Any], Dict[str, Any]]], + *, + sort_by: str, + sort_order: str, +) -> None: + """Same ordering as GET /api/applications (after optional ``status`` query narrowing).""" + reverse = sort_order != "asc" + if sort_by == "name": + pairs.sort(key=lambda x: str(x[0].get("name") or ""), 
reverse=reverse) + elif sort_by == "author": + pairs.sort(key=lambda x: str((x[0].get("author") or {}).get("name") or ""), reverse=reverse) + else: + pairs.sort(key=lambda x: str(x[0].get("submittedDate") or ""), reverse=reverse) + + +async def collect_submission_row_payload_pairs( + session: AsyncSession, + *, + name: str, + author_name: str, + reviewer_name: str, + review_status: str, + date_from: str, + date_to: str, + lifecycle: str = "", +) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]: + """ + Submissions matching list filters (lifecycle, text, dates), before ``status`` query param + and before sort. Each entry is ``(list_row, draft_payload_dict)``. + """ + stmt = ( + select(Initiative) + .where(Initiative.status != "draft") + .order_by(desc(Initiative.submitted_at), desc(Initiative.updated_at)) + ) + initiatives = (await session.execute(stmt)).scalars().all() + _iids = [i.id for i in initiatives] + feedback_map = await _admin_feedback_by_initiative_ids(session, _iids) + reviewer_map = await _admin_decision_reviewers_by_initiative_ids(session, _iids) + + pairs: List[Tuple[Dict[str, Any], Dict[str, Any]]] = [] + for initiative in initiatives: + draft_stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(draft_stmt)).scalar_one_or_none() + payload = dict(draft.payload) if draft is not None and isinstance(draft.payload, dict) else {} + row = _as_submission_item(initiative, payload) + fb = feedback_map.get(initiative.id) + if fb: + row["nhan_xet"] = fb + rev = reviewer_map.get(initiative.id) + if rev is not None: + row["reviewer"] = rev + if not _matches_filters( + row, + name=name, + author_name=author_name, + reviewer_name=reviewer_name, + status="", + review_status=review_status, + date_from=date_from, + date_to=date_to, + lifecycle=lifecycle, + skip_status_filter=True, + ): + continue + pairs.append((row, payload)) + + return pairs + + +async def 
list_submitted_applications( + session: AsyncSession, + *, + page: int, + page_size: int, + name: str, + author_name: str, + reviewer_name: str, + status: str, + review_status: str, + date_from: str, + date_to: str, + sort_by: str, + sort_order: str, + lifecycle: str = "", +) -> Dict[str, Any]: + pairs = await collect_submission_row_payload_pairs( + session, + name=name, + author_name=author_name, + reviewer_name=reviewer_name, + review_status=review_status, + date_from=date_from, + date_to=date_to, + lifecycle=lifecycle, + ) + items = [r for r, _ in pairs] + + status_counts = { + "approved": sum(1 for r in items if str(r.get("status") or "") == "approved"), + "rejected": sum(1 for r in items if str(r.get("status") or "") == "rejected"), + } + if status: + pairs = [(r, p) for r, p in pairs if str(r.get("status") or "") == status] + _sort_submission_pairs_in_place(pairs, sort_by=sort_by, sort_order=sort_order) + items = [r for r, _ in pairs] + + total = len(items) + start = (page - 1) * page_size + page_data = items[start : start + page_size] + total_pages = max(1, (total + page_size - 1) // page_size) if total else 1 + + return { + "data": page_data, + "pagination": { + "page": page, + "pageSize": page_size, + "totalItems": total, + "totalPages": total_pages, + }, + "statusCounts": status_counts, + } + + +async def submitted_applications_pairs_for_export( + session: AsyncSession, + *, + name: str, + author_name: str, + reviewer_name: str, + status: str, + review_status: str, + date_from: str, + date_to: str, + sort_by: str, + sort_order: str, + lifecycle: str = "", +) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]: + """(row, draft_payload) tuples after the same filters + sort as GET /api/applications (all pages).""" + pairs = await collect_submission_row_payload_pairs( + session, + name=name, + author_name=author_name, + reviewer_name=reviewer_name, + review_status=review_status, + date_from=date_from, + date_to=date_to, + lifecycle=lifecycle, + ) + if status: + 
pairs = [(r, p) for r, p in pairs if str(r.get("status") or "") == status] + _sort_submission_pairs_in_place(pairs, sort_by=sort_by, sort_order=sort_order) + return pairs + + +async def list_my_submitted_applications( + session: AsyncSession, + user_id: uuid.UUID, + user_email: str, +) -> List[Dict[str, Any]]: + """ + Submissions for the logged-in applicant: owned initiatives, or legacy rows matched by author email. + """ + stmt = ( + select(Initiative) + .where(Initiative.status != "draft") + .order_by(desc(Initiative.submitted_at), desc(Initiative.updated_at)) + ) + initiatives = (await session.execute(stmt)).scalars().all() + _iids = [i.id for i in initiatives] + feedback_map = await _admin_feedback_by_initiative_ids(session, _iids) + reviewer_map = await _admin_decision_reviewers_by_initiative_ids(session, _iids) + + email_norm = user_email.strip().lower() + seen: set[str] = set() + out: List[Dict[str, Any]] = [] + + for initiative in initiatives: + draft_stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(draft_stmt)).scalar_one_or_none() + payload = dict(draft.payload) if draft is not None and isinstance(draft.payload, dict) else {} + row = _as_submission_item(initiative, payload) + fb = feedback_map.get(initiative.id) + if fb: + row["nhan_xet"] = fb + rev = reviewer_map.get(initiative.id) + if rev is not None: + row["reviewer"] = rev + rid = str(row.get("id") or "") + if not rid or rid in seen: + continue + + owns = initiative.owner_id == user_id + auth_email = str((row.get("author") or {}).get("email") or "").strip().lower() + legacy_email_match = bool(email_norm and auth_email and auth_email == email_norm) + + if owns or legacy_email_match: + seen.add(rid) + out.append(row) + + out.sort(key=lambda x: str(x.get("submittedDate") or ""), reverse=True) + return out + + +def _applicant_may_mutate_row( + initiative: Initiative, + row: Dict[str, Any], + 
user_id: uuid.UUID, + user_email: str, +) -> bool: + """Same visibility rule as list_my_submitted_applications: owner or legacy author email match.""" + if initiative.owner_id == user_id: + return True + email_norm = user_email.strip().lower() + auth_email = str((row.get("author") or {}).get("email") or "").strip().lower() + return bool(email_norm and auth_email and auth_email == email_norm) + + +async def _resolve_initiative_and_latest_draft_for_application_id( + session: AsyncSession, + application_id: str, +) -> tuple[Initiative, Draft]: + aid = application_id.strip() + if not aid: + raise LookupError("missing_id") + + stmt = ( + select(Initiative) + .where(Initiative.status != "draft") + .order_by(desc(Initiative.submitted_at), desc(Initiative.updated_at)) + ) + initiatives = (await session.execute(stmt)).scalars().all() + + for initiative in initiatives: + draft_stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative.id) + .order_by(Draft.updated_at.desc()) + .limit(1) + ) + draft = (await session.execute(draft_stmt)).scalar_one_or_none() + if draft is None: + continue + payload = dict(draft.payload) if isinstance(draft.payload, dict) else {} + stored = payload.get("submissionRecord") if isinstance(payload.get("submissionRecord"), dict) else {} + disp = _submission_display_id(initiative, stored) + if disp.lower() == aid.lower() or initiative.case_code == aid: + return initiative, draft + + raise LookupError("not_found") + + +def _parse_submitted_date_input(submitted_date: str) -> datetime: + """Accept `YYYY-MM-DD` (from HTML date input) or ISO strings.""" + s = submitted_date.strip() + if len(s) >= 10 and s[4] == "-" and s[7] == "-": + y, mo, d = int(s[0:4]), int(s[5:7]), int(s[8:10]) + return datetime(y, mo, d, 12, 0, 0, tzinfo=timezone.utc) + raw = s.replace("Z", "+00:00") + dt = datetime.fromisoformat(raw) + if dt.tzinfo is None: + dt = dt.replace(tzinfo=timezone.utc) + return dt.astimezone(timezone.utc) + + +async def 
update_my_submitted_application( + session: AsyncSession, + user_id: uuid.UUID, + user_email: str, + application_id: str, + name: str, + submitted_date: str, +) -> Dict[str, Any]: + """ + Update display fields on a submitted initiative (submissionRecord + initiative.submitted_at). + Raises LookupError if not found, PermissionError if caller cannot mutate this row. + """ + initiative, draft = await _resolve_initiative_and_latest_draft_for_application_id(session, application_id) + payload = dict(draft.payload) if isinstance(draft.payload, dict) else {} + row = _as_submission_item(initiative, payload) + if not _applicant_may_mutate_row(initiative, row, user_id, user_email): + raise PermissionError("forbidden") + + new_name = name.strip() or str(row.get("name") or "Hồ sơ sáng kiến") + submitted_at = _parse_submitted_date_input(submitted_date) + initiative.submitted_at = submitted_at + + sr = dict(payload.get("submissionRecord") or {}) + sr["name"] = new_name + sr["submittedDate"] = _iso_utc(submitted_at) + payload["submissionRecord"] = sr + payload["updatedAt"] = _iso_utc(_now_utc()) + draft.payload = payload + draft.version = (draft.version or 0) + 1 + await session.flush() + + return _as_submission_item(initiative, payload) + + +async def delete_my_submitted_application( + session: AsyncSession, + user_id: uuid.UUID, + user_email: str, + application_id: str, +) -> None: + """Delete initiative (cascades drafts). 
Raises LookupError / PermissionError.""" + initiative, draft = await _resolve_initiative_and_latest_draft_for_application_id(session, application_id) + payload = dict(draft.payload) if isinstance(draft.payload, dict) else {} + row = _as_submission_item(initiative, payload) + if not _applicant_may_mutate_row(initiative, row, user_id, user_email): + raise PermissionError("forbidden") + + await session.delete(initiative) + await session.flush() diff --git a/be0/src/initiative_db/user_notifications.py b/be0/src/initiative_db/user_notifications.py new file mode 100644 index 0000000..e7c899e --- /dev/null +++ b/be0/src/initiative_db/user_notifications.py @@ -0,0 +1,207 @@ +"""Applicant in-app notifications (admin adjudication).""" + +from __future__ import annotations + +import logging +import math +import uuid +from datetime import datetime, timezone +from typing import Any, Dict, List, Optional + +from sqlalchemy import desc, func, select, update +from sqlalchemy.ext.asyncio import AsyncSession + +from src.initiative_db.models import Draft, Initiative, UserNotification + +logger = logging.getLogger(__name__) + +__all__ = [ + "merit_category_label_from_draft_payload", + "best_effort_notify_applicant_after_admin_decision", + "list_notifications_for_user", + "count_unread_notifications", + "mark_notification_read", +] + + +def merit_category_label_from_draft_payload(payload: Dict[str, Any]) -> Optional[str]: + """ + Align with fe0 `getApplicationMeritCategoryHint`: 2.1.1 / 2.1.2 and textbooks (textbook + book) → Xuất sắc (Excellent); + 2.1.4 (`poster-without-review`) → Trung bình (Average); else Khá (Good) for known groups. + Returns None if classification is missing (caller may still show «Khá» for approved).
+ """ + tabs = payload.get("tabs") if isinstance(payload.get("tabs"), dict) else {} + app = tabs.get("application") if isinstance(tabs.get("application"), dict) else {} + c = app.get("initiativeClassification") + k = str(app.get("researchEvidenceKind") or "").strip() + if c == "research" and k in ("international", "domestic"): + return "Xuất sắc" + if c == "research" and k == "poster-without-review": + return "Trung bình" + tb = str(app.get("textbookEvidenceKind") or "").strip() + if c == "textbook" and tb == "book": + return "Xuất sắc" + if c in ("technical", "research", "textbook"): + return "Khá" + return None + + +def _iso(dt: Optional[datetime]) -> Optional[str]: + if dt is None: + return None + v = dt.astimezone(timezone.utc).replace(microsecond=0).isoformat() + return v.replace("+00:00", "Z") + + +def _notification_to_api(row: UserNotification) -> Dict[str, Any]: + return { + "id": str(row.id), + "type": row.type, + "title": row.title, + "body": row.body, + "applicationId": row.application_id, + "relatedInitiativeId": str(row.related_initiative_id) if row.related_initiative_id else None, + "sourceAdminResultId": str(row.source_admin_result_id) if row.source_admin_result_id else None, + "decision": row.decision, + "meritCategoryLabel": row.merit_category_label, + "feedback": row.feedback_text or "", + "rationale": row.rationale_text, + "readAt": _iso(row.read_at), + "createdAt": _iso(row.created_at), + } + + +async def best_effort_notify_applicant_after_admin_decision(result: Dict[str, Any]) -> None: + """ + Second transaction after admin-result commit: insert inbox row for initiative owner. + Swallows all errors (logged); never raises. 
+ """ + try: + from src.initiative_db.engine import get_session, is_postgres_enabled + + if not is_postgres_enabled(): + return + + initiative_id = uuid.UUID(str(result["initiativeId"])) + application_id = str(result["applicationId"]).strip() + decision = str(result.get("decision") or "").strip().lower() + if decision not in ("approved", "rejected"): + return + + feedback = str(result.get("feedback") or "") + rationale = result.get("rationale") + rationale_s = (str(rationale).strip() if rationale is not None else None) or None + admin_result_id = uuid.UUID(str(result["id"])) + + async with get_session() as session: + ini = await session.get(Initiative, initiative_id) + if ini is None: + return + recipient_id = ini.owner_id + + draft_stmt = ( + select(Draft) + .where(Draft.initiative_id == initiative_id) + .order_by(desc(Draft.updated_at)) + .limit(1) + ) + draft = (await session.execute(draft_stmt)).scalar_one_or_none() + payload: Dict[str, Any] = dict(draft.payload) if draft and isinstance(draft.payload, dict) else {} + + merit: Optional[str] = None + if decision == "approved": + merit = merit_category_label_from_draft_payload(payload) + if merit is None: + merit = "Khá" + + if decision == "approved": + title = "Hồ sơ được duyệt" + else: + title = "Hồ sơ không được duyệt" + + body_parts: List[str] = [] + if decision == "approved" and merit: + body_parts.append(f"Phân hạng đề xuất: {merit}.") + if feedback.strip(): + fb = feedback.strip() + body_parts.append(fb[:280] + ("…" if len(fb) > 280 else "")) + elif decision == "rejected": + body_parts.append("Xem phản hồi và lý do chi tiết bên dưới.") + body = " ".join(body_parts) if body_parts else title + + row = UserNotification( + recipient_user_id=recipient_id, + type="admin_application_decision", + title=title, + body=body, + application_id=application_id, + related_initiative_id=initiative_id, + source_admin_result_id=admin_result_id, + decision=decision, + merit_category_label=merit if decision == "approved" else 
None, + feedback_text=feedback, + rationale_text=rationale_s, + ) + session.add(row) + await session.commit() + except Exception: + logger.exception("best_effort_notify_applicant_after_admin_decision failed") + + +async def list_notifications_for_user( + session: AsyncSession, + user_id: uuid.UUID, + *, + page: int, + page_size: int, +) -> Dict[str, Any]: + page = max(1, page) + page_size = max(1, min(100, page_size)) + + count_stmt = select(func.count()).select_from(UserNotification).where(UserNotification.recipient_user_id == user_id) + total = int((await session.execute(count_stmt)).scalar_one()) + total_pages = max(1, math.ceil(total / page_size)) if total else 1 + start = (page - 1) * page_size + + stmt = ( + select(UserNotification) + .where(UserNotification.recipient_user_id == user_id) + .order_by(desc(UserNotification.created_at)) + .offset(start) + .limit(page_size) + ) + rows = (await session.execute(stmt)).scalars().all() + + return { + "data": [_notification_to_api(r) for r in rows], + "pagination": { + "page": page, + "pageSize": page_size, + "totalItems": total, + "totalPages": total_pages, + }, + } + + +async def count_unread_notifications(session: AsyncSession, user_id: uuid.UUID) -> int: + stmt = select(func.count()).select_from(UserNotification).where( + UserNotification.recipient_user_id == user_id, + UserNotification.read_at.is_(None), + ) + return int((await session.execute(stmt)).scalar_one()) + + +async def mark_notification_read(session: AsyncSession, user_id: uuid.UUID, notification_id: uuid.UUID) -> bool: + """Return True if a row was updated.""" + now = datetime.now(timezone.utc) + stmt = ( + update(UserNotification) + .where( + UserNotification.id == notification_id, + UserNotification.recipient_user_id == user_id, + ) + .values(read_at=now) + ) + res = await session.execute(stmt) + await session.flush() + return (res.rowcount or 0) > 0 diff --git a/be0/src/internal_control/__init__.py b/be0/src/internal_control/__init__.py new file 
mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/__pycache__/__init__.cpython-311.pyc b/be0/src/internal_control/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 0000000..b13673b Binary files /dev/null and b/be0/src/internal_control/__pycache__/__init__.cpython-311.pyc differ diff --git a/be0/src/internal_control/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..6ec52c6 Binary files /dev/null and b/be0/src/internal_control/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/access_control/__init__.py b/be0/src/internal_control/access_control/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/access_control/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/access_control/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..2f8cb78 Binary files /dev/null and b/be0/src/internal_control/access_control/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/cloud_infrastructure_controls/__init__.py b/be0/src/internal_control/cloud_infrastructure_controls/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/cloud_infrastructure_controls/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/cloud_infrastructure_controls/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..a54e24b Binary files /dev/null and b/be0/src/internal_control/cloud_infrastructure_controls/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/data_integrity_security/__init__.py b/be0/src/internal_control/data_integrity_security/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/data_integrity_security/__pycache__/__init__.cpython-313.pyc 
b/be0/src/internal_control/data_integrity_security/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..93d9097 Binary files /dev/null and b/be0/src/internal_control/data_integrity_security/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/it_asset_management/__init__.py b/be0/src/internal_control/it_asset_management/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/it_asset_management/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/it_asset_management/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..b967662 Binary files /dev/null and b/be0/src/internal_control/it_asset_management/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/it_change_management/__init__.py b/be0/src/internal_control/it_change_management/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/it_change_management/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/it_change_management/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..514d81c Binary files /dev/null and b/be0/src/internal_control/it_change_management/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/it_governance/__init__.py b/be0/src/internal_control/it_governance/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/it_governance/__pycache__/__init__.cpython-311.pyc b/be0/src/internal_control/it_governance/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 0000000..32a2368 Binary files /dev/null and b/be0/src/internal_control/it_governance/__pycache__/__init__.cpython-311.pyc differ diff --git a/be0/src/internal_control/it_governance/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/it_governance/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..219552e Binary files 
/dev/null and b/be0/src/internal_control/it_governance/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/it_governance/__pycache__/document_io.cpython-311.pyc b/be0/src/internal_control/it_governance/__pycache__/document_io.cpython-311.pyc new file mode 100644 index 0000000..062fc68 Binary files /dev/null and b/be0/src/internal_control/it_governance/__pycache__/document_io.cpython-311.pyc differ diff --git a/be0/src/internal_control/it_governance/__pycache__/document_io.cpython-313.pyc b/be0/src/internal_control/it_governance/__pycache__/document_io.cpython-313.pyc new file mode 100644 index 0000000..a78d7be Binary files /dev/null and b/be0/src/internal_control/it_governance/__pycache__/document_io.cpython-313.pyc differ diff --git a/be0/src/internal_control/it_governance/__pycache__/memory_manager.cpython-313.pyc b/be0/src/internal_control/it_governance/__pycache__/memory_manager.cpython-313.pyc new file mode 100644 index 0000000..dc0bc3c Binary files /dev/null and b/be0/src/internal_control/it_governance/__pycache__/memory_manager.cpython-313.pyc differ diff --git a/be0/src/internal_control/it_governance/__pycache__/response_manager.cpython-313.pyc b/be0/src/internal_control/it_governance/__pycache__/response_manager.cpython-313.pyc new file mode 100644 index 0000000..8d592d0 Binary files /dev/null and b/be0/src/internal_control/it_governance/__pycache__/response_manager.cpython-313.pyc differ diff --git a/be0/src/internal_control/it_governance/document_io.py b/be0/src/internal_control/it_governance/document_io.py new file mode 100644 index 0000000..0e1b35a --- /dev/null +++ b/be0/src/internal_control/it_governance/document_io.py @@ -0,0 +1,141 @@ +import os +import fitz +# easyocr is imported lazily inside create_document_with_easyocr (heavy optional dependency) +import pymupdf +import json + +from typing import Dict, List, Optional, Any +from src.utils import initialize_a_logger + + +class DocumentIO: + def __init__(self): + 
self.logger = initialize_a_logger("./logs/DocumentIO.log") + 
self.input_filename = "" + self.output_filename = "" + self.cur_page_number = None + self.cur_page_content = None + self.content = {} + + + async def upload_document(self, input_filename:str, ext="json"): + self.input_filename = './' + input_filename + self.logger.info(f"input filename {self.input_filename}") + + self.output_filename = os.path.splitext(self.input_filename)[0] + "." + ext # swap only the extension; str.replace("pdf", ...) also rewrote "pdf" inside directory names + self.logger.info(f"output filename {self.output_filename}") + result = await self.load_json_file(self.output_filename) + return result + + def create_document_with_easyocr(self, file_path): + import easyocr # imported here so the module loads even when easyocr is not installed + doc = fitz.open(file_path) + reader = easyocr.Reader(['en']) # build the reader once per document and reuse it for every page + results = {} + + for page_num in range(doc.page_count): + page = doc[page_num] + + # Convert page to image + pix = page.get_pixmap(dpi=500) # Higher DPI for better OCR + img_data = pix.tobytes("png") + + ocr_results = reader.readtext(img_data) + + # Concatenate the recognized text of this page + page_text = "" + + for (_ , text, confidence) in ocr_results: + if confidence > 0.2: # Filter low-confidence results + page_text += text + " " + + results[f'page_{page_num + 1}'] = { + 'text': page_text.strip(), + } + + doc.close() + return results + + + async def load_json_file(self, output_filename: str) -> Dict[str, Any]: + if not os.path.exists(output_filename): + results: Dict[str, Any] = self.create_document_with_easyocr(self.input_filename) + self.content = results + + with open(output_filename, "w", encoding="utf-8") as f: + json.dump(results, f, indent=4, ensure_ascii=False) + else: + with open(output_filename, "r", encoding="utf-8") as f: + try: + results = json.load(f) + self.content = results + except json.JSONDecodeError: + if self.logger: + self.logger.warning(f"load_json_file: {output_filename} is 
not valid JSON") + self.content = {} + return self.content + + def load_page(self, page_id: int) -> Dict[str, Any]: + page_key = f"page_{page_id}" + if not self.content: + if self.logger: + self.logger.debug("Content not loaded yet. Run load_json_file first.") + return {} + + # Content is loaded here (guard above); load_json_file is async and cannot be called from this sync method + return self.content.get(page_key, {})
+ + def extract_header(self, pdf_path:str, page_num:int=2, header_height_ratio=0.1): + """ + Extracts text from the top portion of a page (header area). + """ + doc = fitz.open(pdf_path) + page = doc[page_num] + blocks = page.get_text("blocks") # list of (x0, y0, x1, y1, text, block_no, block_type) + + page_height = page.rect.height + header_cutoff = page_height * header_height_ratio + doc.close() # blocks and the cutoff are plain values, so the file handle can be released here + + header_text = [] + for b in blocks: + x0, y0, x1, y1, text, *_ = b + if y1 <= header_cutoff: # only take text in the top area + header_text.append(text.strip()) + + return "\n".join(header_text) + + + def extract_footer(self, pdf_path, page_num=2, footer_height_ratio=0.1): + """ + Extracts text from the bottom portion of a page (footer area).
+ """ + doc = fitz.open(pdf_path) + page = doc[page_num] + blocks = page.get_text("blocks") # (x0, y0, x1, y1, text, block_no, block_type) + + page_height = page.rect.height + footer_cutoff = page_height * (1 - footer_height_ratio) + doc.close() # release the file handle; only plain values are used below + + footer_text = [] + for b in blocks: + x0, y0, x1, y1, text, *_ = b + if y0 >= footer_cutoff: # only take text in the bottom area + footer_text.append(text.strip()) + + return "\n".join(footer_text) + + def create_document_with_pymupdf(self, filepath): + doc_map = {} + doc = pymupdf.open(filepath) + + for i,page in enumerate(doc): # iterate the document pages + header = self.extract_header(filepath, page_num=i) + footer = self.extract_footer(filepath, page_num=i) + text = page.get_text() + text = text.replace(header, "") + text = text.replace(footer, "") + doc_map[f"page {i}"]= text + doc.close() + return doc_map \ No newline at end of file diff --git a/be0/src/internal_control/it_governance/memory_manager.py b/be0/src/internal_control/it_governance/memory_manager.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/it_governance/response_manager.py b/be0/src/internal_control/it_governance/response_manager.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/it_operations/__init__.py b/be0/src/internal_control/it_operations/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/it_operations/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/it_operations/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..d1d6a83 Binary files /dev/null and b/be0/src/internal_control/it_operations/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/logical_physical_security/__init__.py b/be0/src/internal_control/logical_physical_security/__init__.py new file mode 100644 index 0000000..e69de29 diff --git 
a/be0/src/internal_control/logical_physical_security/__pycache__/__init__.cpython-313.pyc
b/be0/src/internal_control/logical_physical_security/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..0c08ce4 Binary files /dev/null and b/be0/src/internal_control/logical_physical_security/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/monitoring_logging/__init__.py b/be0/src/internal_control/monitoring_logging/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/monitoring_logging/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/monitoring_logging/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..bda21a7 Binary files /dev/null and b/be0/src/internal_control/monitoring_logging/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/network_security/__init__.py b/be0/src/internal_control/network_security/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/network_security/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/network_security/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..cbbbc78 Binary files /dev/null and b/be0/src/internal_control/network_security/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/patch_vuln_management/__init__.py b/be0/src/internal_control/patch_vuln_management/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/patch_vuln_management/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/patch_vuln_management/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..8a3afaf Binary files /dev/null and b/be0/src/internal_control/patch_vuln_management/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/security_awareness_training/__init__.py b/be0/src/internal_control/security_awareness_training/__init__.py new file mode 100644 index 0000000..e69de29 diff --git 
a/be0/src/internal_control/security_awareness_training/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/security_awareness_training/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..4932409 Binary files /dev/null and b/be0/src/internal_control/security_awareness_training/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/system_dev_lifecycle/__init__.py b/be0/src/internal_control/system_dev_lifecycle/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/system_dev_lifecycle/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/system_dev_lifecycle/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..04c2455 Binary files /dev/null and b/be0/src/internal_control/system_dev_lifecycle/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/system_dev_lifecycle/__pycache__/sdlc_planning.cpython-313.pyc b/be0/src/internal_control/system_dev_lifecycle/__pycache__/sdlc_planning.cpython-313.pyc new file mode 100644 index 0000000..a2b3f65 Binary files /dev/null and b/be0/src/internal_control/system_dev_lifecycle/__pycache__/sdlc_planning.cpython-313.pyc differ diff --git a/be0/src/internal_control/system_dev_lifecycle/sdlc_planning.py b/be0/src/internal_control/system_dev_lifecycle/sdlc_planning.py new file mode 100644 index 0000000..ca0981c --- /dev/null +++ b/be0/src/internal_control/system_dev_lifecycle/sdlc_planning.py @@ -0,0 +1,561 @@ + + + + +from langgraph.graph import StateGraph, END +from typing import TypedDict, List, Optional, Literal +from datetime import datetime +import json + +# Define the state structure for the workflow +class RMIntegrationState(TypedDict): + current_phase: str + phase_number: int + checklist_items: List[dict] + completed_items: List[int] + pending_approvals: List[str] + records_officer_involved: bool + project_status: str + comments: dict + validation_results: dict + 
next_phase_ready: bool + +# Phase 1: Concept Development +def phase1_concept_development(state: RMIntegrationState) -> RMIntegrationState: + """Phase 1: Concept Development - Initial records planning""" + + phase1_checklist = [ + { + "id": 1, + "task": "Include Records Officer in system design process", + "status": "pending", + "requires_approval": True, + "approver": "Records Officer" + }, + { + "id": 2, + "task": "Identify records that support the business process", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 3, + "task": "Evaluate current record schedules applicability", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 4, + "task": "Determine if new record schedule is required", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 5, + "task": "Obtain Records Officer signature on Investment Summary Proposal", + "status": "pending", + "requires_approval": True, + "approver": "Records Officer" + } + ] + + state["current_phase"] = "Concept Development" + state["phase_number"] = 1 + state["checklist_items"] = phase1_checklist + state["pending_approvals"] = ["Records Officer - System Design", "Records Officer - Investment Summary"] + + return state + +# Phase 2: Requirements Document +def phase2_requirements_document(state: RMIntegrationState) -> RMIntegrationState: + """Phase 2: Requirements Document - Document all records requirements""" + + phase2_checklist = [ + { + "id": 6, + "task": "Identify and incorporate all records-related requirements into CONOPS Report", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 7, + "task": "Draft new records schedules if needed", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 8, + "task": "Obtain Records Officer signature on requirements document", + "status": "pending", + "requires_approval": True, + "approver": "Records 
Officer" + } + ] + + state["current_phase"] = "Requirements Document" + state["phase_number"] = 2 + state["checklist_items"] = phase2_checklist + state["pending_approvals"] = ["Records Officer - Requirements Document"] + + return state + +# Phase 3: Design +def phase3_design(state: RMIntegrationState) -> RMIntegrationState: + """Phase 3: Design - Incorporate records management into system design""" + + phase3_checklist = [ + { + "id": 9, + "task": "Incorporate records management requirements into system design", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 10, + "task": "Obtain Records Officer signature on system design document", + "status": "pending", + "requires_approval": True, + "approver": "Records Officer" + } + ] + + state["current_phase"] = "Design" + state["phase_number"] = 3 + state["checklist_items"] = phase3_checklist + state["pending_approvals"] = ["Records Officer - System Design"] + + return state + +# Phase 4: Detailed Design +def phase4_detailed_design(state: RMIntegrationState) -> RMIntegrationState: + """Phase 4: Detailed Design - Include records staff in project meetings""" + + phase4_checklist = [ + { + "id": 11, + "task": "Include agency records management staff in project status meetings", + "status": "pending", + "requires_approval": False, + "approver": None + } + ] + + state["current_phase"] = "Detailed Design" + state["phase_number"] = 4 + state["checklist_items"] = phase4_checklist + state["pending_approvals"] = [] + + return state + +# Phase 5: Development +def phase5_development(state: RMIntegrationState) -> RMIntegrationState: + """Phase 5: Development - Continue records staff involvement and submit schedules""" + + phase5_checklist = [ + { + "id": 12, + "task": "Continue including records management staff in project meetings", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 13, + "task": "Submit proposed records schedules to NARA", + "status": 
"pending", + "requires_approval": False, + "approver": None + } + ] + + state["current_phase"] = "Development" + state["phase_number"] = 5 + state["checklist_items"] = phase5_checklist + state["pending_approvals"] = [] + + return state + +# Phase 6: Integration & System Testing +def phase6_integration_testing(state: RMIntegrationState) -> RMIntegrationState: + """Phase 6: Integration & System Testing - Test records management integration""" + + phase6_checklist = [ + { + "id": 14, + "task": "Incorporate records management requirements into system testing", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 15, + "task": "Obtain Records Officer signature on Systems Test Report", + "status": "pending", + "requires_approval": True, + "approver": "Records Officer" + } + ] + + state["current_phase"] = "Integration & System Testing" + state["phase_number"] = 6 + state["checklist_items"] = phase6_checklist + state["pending_approvals"] = ["Records Officer - Systems Test Report"] + + return state + +# Phase 7: Deployment & Acceptance +def phase7_deployment_acceptance(state: RMIntegrationState) -> RMIntegrationState: + """Phase 7: Deployment & Acceptance - Final approvals and deployment""" + + phase7_checklist = [ + { + "id": 16, + "task": "Continue including records management staff in project meetings", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 17, + "task": "Obtain Records Officer signature on deployment approval document", + "status": "pending", + "requires_approval": True, + "approver": "Records Officer" + } + ] + + state["current_phase"] = "Deployment & Acceptance" + state["phase_number"] = 7 + state["checklist_items"] = phase7_checklist + state["pending_approvals"] = ["Records Officer - Deployment Approval"] + + return state + +# Phase 8: Production +def phase8_production(state: RMIntegrationState) -> RMIntegrationState: + """Phase 8: Production - Post-deployment monitoring and 
compliance""" + + phase8_checklist = [ + { + "id": 18, + "task": "Complete Mid-Cycle Review (3 years after production)", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 19, + "task": "Implement disposition authorities per approved dispositions", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 20, + "task": "Send Mid-Cycle Review report to records management staff", + "status": "pending", + "requires_approval": False, + "approver": None + }, + { + "id": 21, + "task": "Obtain Records Officer signature on Mid-Cycle Review certification", + "status": "pending", + "requires_approval": True, + "approver": "Records Officer" + } + ] + + state["current_phase"] = "Production" + state["phase_number"] = 8 + state["checklist_items"] = phase8_checklist + state["pending_approvals"] = ["Records Officer - Mid-Cycle Review Certification"] + + return state + +# Validation and routing functions +def validate_phase_completion(state: RMIntegrationState) -> RMIntegrationState: + """Validate that all required items in current phase are completed""" + + validation_results = {} + all_completed = True + + for item in state["checklist_items"]: + if item["status"] != "completed": + all_completed = False + validation_results[item["id"]] = f"Item {item['id']} not completed: {item['task']}" + + if state["pending_approvals"]: + all_completed = False + validation_results["approvals"] = f"Pending approvals: {', '.join(state['pending_approvals'])}" + + state["validation_results"] = validation_results + state["next_phase_ready"] = all_completed + + return state + +def route_next_phase(state: RMIntegrationState) -> Literal["next_phase", "current_phase", "end"]: + """Route to next phase or stay in current phase based on completion status""" + + if not state["next_phase_ready"]: + return "current_phase" + elif state["phase_number"] >= 8: + return "end" + else: + return "next_phase" + +def progress_to_next_phase(state: 
RMIntegrationState) -> RMIntegrationState: + """Progress to the next phase in the workflow""" + + next_phase = state["phase_number"] + 1 + + # Map phase numbers to phase functions + phase_functions = { + 1: phase1_concept_development, + 2: phase2_requirements_document, + 3: phase3_design, + 4: phase4_detailed_design, + 5: phase5_development, + 6: phase6_integration_testing, + 7: phase7_deployment_acceptance, + 8: phase8_production + } + + if next_phase in phase_functions: + return phase_functions[next_phase](state) + else: + state["current_phase"] = "Completed" + state["project_status"] = "All phases completed successfully" + return state + +def stay_current_phase(state: RMIntegrationState) -> RMIntegrationState: + """Stay in current phase until requirements are met""" + + state["project_status"] = f"Phase {state['phase_number']} ({state['current_phase']}) - Requirements not met" + return state + +# Create the workflow graph +def create_rm_integration_workflow(): + """Create the LangGraph workflow for RM Integration""" + + workflow = StateGraph(RMIntegrationState) + + # Add nodes + workflow.add_node("phase1", phase1_concept_development) + workflow.add_node("phase2", phase2_requirements_document) + workflow.add_node("phase3", phase3_design) + workflow.add_node("phase4", phase4_detailed_design) + workflow.add_node("phase5", phase5_development) + workflow.add_node("phase6", phase6_integration_testing) + workflow.add_node("phase7", phase7_deployment_acceptance) + workflow.add_node("phase8", phase8_production) + workflow.add_node("validate", validate_phase_completion) + workflow.add_node("next_phase", progress_to_next_phase) + workflow.add_node("current_phase", stay_current_phase) + + # Set entry point + workflow.set_entry_point("phase1") + + # Add edges + workflow.add_edge("phase1", "validate") + workflow.add_edge("phase2", "validate") + workflow.add_edge("phase3", "validate") + workflow.add_edge("phase4", "validate") + workflow.add_edge("phase5", "validate") + 
workflow.add_edge("phase6", "validate") + workflow.add_edge("phase7", "validate") + workflow.add_edge("phase8", "validate") + + # Add conditional edges from validate + workflow.add_conditional_edges( + "validate", + route_next_phase, + { + "next_phase": "next_phase", + "current_phase": "current_phase", + "end": END + } + ) + + # After advancing, re-validate the newly loaded phase + workflow.add_edge("next_phase", "validate") + workflow.add_edge("current_phase", END) + + return workflow.compile() + +# Example usage and testing +def run_rm_integration_example(): + """Example of how to run the RM integration workflow""" + + # Initialize the workflow + app = create_rm_integration_workflow() + + # Initial state + initial_state = { + "current_phase": "", + "phase_number": 0, + "checklist_items": [], + "completed_items": [], + "pending_approvals": [], + "records_officer_involved": False, + "project_status": "Starting RM Integration Process", + "comments": {}, + "validation_results": {}, + "next_phase_ready": False + } + + # Run the workflow + result = app.invoke(initial_state) + + return result + +# Utility functions for workflow management +def update_item_status(state: RMIntegrationState, item_id: int, status: str, comment: str = "") -> RMIntegrationState: + """Update the status of a specific checklist item""" + + for item in state["checklist_items"]: + if item["id"] == item_id: + item["status"] = status + if status == "completed" and item_id not in state["completed_items"]: + state["completed_items"].append(item_id) + break + + if comment: + state["comments"][item_id] = comment + + return state + +def generate_status_report(state: RMIntegrationState) -> dict: + """Generate a comprehensive status report""" + + completed_count = len([item for item in state["checklist_items"] if item["status"] == "completed"]) + total_count = len(state["checklist_items"]) + + report = { + "current_phase": state["current_phase"], + "phase_number": state["phase_number"], + 
"completion_percentage": (completed_count / total_count * 100) if total_count > 0 else 0, + "completed_items": completed_count, + "total_items": total_count, + "pending_approvals": state["pending_approvals"], + "validation_results": state["validation_results"], + "project_status": state["project_status"], + "timestamp": datetime.now().isoformat() + } + + return report + +def test_ollama_similarity(requirement: str) -> Dict[str, Any]: + """ + Call the /test_ollama_similarity endpoint with a requirement text + + Args: + requirement: The requirement text to generate embeddings for + + Returns: + Dictionary containing embedding preview, dimensions, and model info + """ + try: + # Prepare the request payload + payload = { + "prompt": requirement + } + + # Make POST request to the API + response = requests.post( + f"{API_BASE_URL}/test_ollama_similarity", + json=payload, + headers={"Content-Type": "application/json"} + ) + + # Check if request was successful + response.raise_for_status() + + # Parse and return the JSON response + result = response.json() + + return result + + except requests.exceptions.RequestException as e: + return {"error": f"Request failed: {str(e)}"} + except json.JSONDecodeError as e: + return {"error": f"Failed to parse response: {str(e)}"} + +def test_multiple_requirements(requirements: list) -> list: + """ + Process multiple requirements and return their embeddings + + Args: + requirements: List of requirement texts + + Returns: + List of results for each requirement + """ + results = [] + + for req in requirements: + result = test_ollama_similarity(req) + results.append({ + "requirement": req, + "result": result + }) + + return results + +def compare_requirements_similarity(req1: str, req2: str) -> Dict[str, Any]: + """ + Compare similarity between two requirements using cosine similarity + + Args: + req1: First requirement text + req2: Second requirement text + + Returns: + Dictionary with both requirement texts and their similarity score, or an error entry + """ + 
import numpy as np + + # Get embeddings for both requirements + result1 = test_ollama_similarity(req1) + result2 = test_ollama_similarity(req2) + + if "error" in result1 or "error" in result2: + return { + "error": "Failed to generate embeddings", + "result1": result1, + "result2": result2 + } + + # The endpoint only returns an embedding preview, so this score covers + # only the preview values (demo only; use full embeddings for real comparisons) + emb1 = np.array(result1.get('embedding_preview', [])) + emb2 = np.array(result2.get('embedding_preview', [])) + + # Cosine similarity needs non-empty, equal-length, non-zero vectors + if len(emb1) > 0 and len(emb1) == len(emb2): + dot_product = np.dot(emb1, emb2) + norm1 = np.linalg.norm(emb1) + norm2 = np.linalg.norm(emb2) + similarity = dot_product / (norm1 * norm2) if norm1 and norm2 else None + else: + similarity = None + + return { + "requirement1": req1, + "requirement2": req2, + "similarity_score": float(similarity) if similarity is not None else None + } + + + +if __name__ == "__main__": + # Run example + result = run_rm_integration_example() + print("RM Integration Workflow Result:") + print(json.dumps(generate_status_report(result), indent=2)) \ No newline at end of file diff --git a/be0/src/internal_control/system_interface_data_transfer/__init__.py b/be0/src/internal_control/system_interface_data_transfer/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/internal_control/system_interface_data_transfer/__pycache__/__init__.cpython-313.pyc b/be0/src/internal_control/system_interface_data_transfer/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..b3db5db Binary files /dev/null and b/be0/src/internal_control/system_interface_data_transfer/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/internal_control/vendor_management/__init__.py b/be0/src/internal_control/vendor_management/__init__.py new file mode 100644 index 
0000000..e69de29 diff --git a/be0/src/internal_control/vendor_management/__pycache__/__init__.cpython-313.pyc 
b/be0/src/internal_control/vendor_management/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..5356883 Binary files /dev/null and b/be0/src/internal_control/vendor_management/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/memory_manager.py b/be0/src/memory_manager.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/minio/002_add_upload_status.py b/be0/src/minio/002_add_upload_status.py new file mode 100644 index 0000000..6d44ea1 --- /dev/null +++ b/be0/src/minio/002_add_upload_status.py @@ -0,0 +1,51 @@ +"""add upload_status to attachments + +Tracks the two-phase upload lifecycle: + PENDING - row created, awaiting client's direct upload to MinIO + CONFIRMED - client finished upload, /confirm endpoint verified head_object + FAILED - presign issued but client never completed; swept by cron + +Revision ID: 002_add_upload_status +Revises: 001_initial_schema +""" +from alembic import op +import sqlalchemy as sa + + +revision = "002_add_upload_status" +down_revision = "001_initial_schema" + + +def upgrade(): + op.add_column( + "attachments", + sa.Column( + "upload_status", + sa.String(16), + nullable=False, + server_default="CONFIRMED", # existing rows are already confirmed + ), + ) + op.create_check_constraint( + "chk_attachment_upload_status", + "attachments", + "upload_status IN ('PENDING','CONFIRMED','FAILED')", + ) + op.add_column( + "attachments", + sa.Column("sha256", sa.String(64), nullable=True), + ) + # Partial index so the cleanup cron can find stale PENDINGs fast + op.create_index( + "idx_attach_pending", + "attachments", + ["uploaded_at"], + postgresql_where=sa.text("upload_status = 'PENDING'"), + ) + + +def downgrade(): + op.drop_index("idx_attach_pending", table_name="attachments") + op.drop_column("attachments", "sha256") + op.drop_constraint("chk_attachment_upload_status", "attachments") + op.drop_column("attachments", "upload_status") diff --git a/be0/src/minio/__init__.py 
b/be0/src/minio/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/be0/src/minio/__pycache__/002_add_upload_status.cpython-313.pyc b/be0/src/minio/__pycache__/002_add_upload_status.cpython-313.pyc new file mode 100644 index 0000000..130fe84 Binary files /dev/null and b/be0/src/minio/__pycache__/002_add_upload_status.cpython-313.pyc differ diff --git a/be0/src/minio/__pycache__/__init__.cpython-311.pyc b/be0/src/minio/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 0000000..6bd455c Binary files /dev/null and b/be0/src/minio/__pycache__/__init__.cpython-311.pyc differ diff --git a/be0/src/minio/__pycache__/__init__.cpython-313.pyc b/be0/src/minio/__pycache__/__init__.cpython-313.pyc new file mode 100644 index 0000000..194664a Binary files /dev/null and b/be0/src/minio/__pycache__/__init__.cpython-313.pyc differ diff --git a/be0/src/minio/__pycache__/attachments.cpython-313.pyc b/be0/src/minio/__pycache__/attachments.cpython-313.pyc new file mode 100644 index 0000000..5d47bcd Binary files /dev/null and b/be0/src/minio/__pycache__/attachments.cpython-313.pyc differ diff --git a/be0/src/minio/__pycache__/cleanup.cpython-313.pyc b/be0/src/minio/__pycache__/cleanup.cpython-313.pyc new file mode 100644 index 0000000..0611569 Binary files /dev/null and b/be0/src/minio/__pycache__/cleanup.cpython-313.pyc differ diff --git a/be0/src/minio/__pycache__/storage.cpython-311.pyc b/be0/src/minio/__pycache__/storage.cpython-311.pyc new file mode 100644 index 0000000..e8ff5ca Binary files /dev/null and b/be0/src/minio/__pycache__/storage.cpython-311.pyc differ diff --git a/be0/src/minio/__pycache__/storage.cpython-313.pyc b/be0/src/minio/__pycache__/storage.cpython-313.pyc new file mode 100644 index 0000000..75279f1 Binary files /dev/null and b/be0/src/minio/__pycache__/storage.cpython-313.pyc differ diff --git a/be0/src/minio/__pycache__/test_storage.cpython-313-pytest-8.3.4.pyc b/be0/src/minio/__pycache__/test_storage.cpython-313-pytest-8.3.4.pyc 
new file mode 100644 index 0000000..6ab6747 Binary files /dev/null and b/be0/src/minio/__pycache__/test_storage.cpython-313-pytest-8.3.4.pyc differ diff --git a/be0/src/minio/__pycache__/test_storage.cpython-313.pyc b/be0/src/minio/__pycache__/test_storage.cpython-313.pyc new file mode 100644 index 0000000..4384814 Binary files /dev/null and b/be0/src/minio/__pycache__/test_storage.cpython-313.pyc differ diff --git a/be0/src/minio/attachments.py b/be0/src/minio/attachments.py new file mode 100644 index 0000000..948688a --- /dev/null +++ b/be0/src/minio/attachments.py @@ -0,0 +1,322 @@ +""" +Attachment API endpoints. + +Two upload flows: + (A) Small files (< 10 MB): multipart/form-data → FastAPI → MinIO + (B) Large files: client requests signed URL → uploads direct to MinIO +""" +from __future__ import annotations + +import io +from typing import Annotated + +from fastapi import APIRouter, Depends, File, HTTPException, UploadFile, status +from fastapi.responses import StreamingResponse +from pydantic import BaseModel, Field + +from app.auth.dependencies import get_current_user, User +from app.db import get_db +from app.storage import storage, settings, ALLOWED_MIME_TYPES, StorageError +from app.repositories import attachment_repo, application_repo + +router = APIRouter(prefix="/applications/{app_id}/attachments", tags=["attachments"]) + +SMALL_FILE_LIMIT_BYTES = 10 * 1024 * 1024 # 10 MB + + +# ---------------------------------------------------------------------- # +# Request / response models +# ---------------------------------------------------------------------- # +class AttachmentOut(BaseModel): + attachment_id: int + file_name: str + file_size: int + mime_type: str + kind: str | None + uploaded_at: str + download_url: str | None = None + + +class PresignedUploadRequest(BaseModel): + file_name: str = Field(min_length=1, max_length=255) + mime_type: str + file_size: int = Field(gt=0) + kind: str | None = None + + +class PresignedUploadResponse(BaseModel): + 
upload_url: str + headers: dict[str, str] + object_key: str + # Client calls POST /attachments/confirm with this after upload completes + pending_attachment_id: int + + +# ---------------------------------------------------------------------- # +# (A) Direct upload: multipart → FastAPI → MinIO +# ---------------------------------------------------------------------- # +@router.post("", response_model=AttachmentOut, status_code=status.HTTP_201_CREATED) +async def upload_attachment( + app_id: int, + file: Annotated[UploadFile, File(description="Attachment file")], + kind: str | None = None, + user: User = Depends(get_current_user), + db=Depends(get_db), +): + """ + Upload a small file (< 10 MB) through the API. + + For files larger than this, call POST /attachments/presign instead — + proxying large uploads through FastAPI wastes memory and CPU. + """ + # 1. Authorization: user must be author of the app, and app must be DRAFT + app = await application_repo.get(db, app_id) + if app is None: + raise HTTPException(404, "Application not found") + if app.status != "DRAFT": + raise HTTPException(422, "Attachments can only be added to DRAFT applications") + if not await application_repo.is_author(db, app_id, user.id): + raise HTTPException(403, "Only authors may upload attachments") + + # 2. Validate MIME type — never trust the client alone + if file.content_type not in ALLOWED_MIME_TYPES: + raise HTTPException(422, f"MIME type {file.content_type} not allowed") + + # 3. Check declared size + if file.size and file.size > SMALL_FILE_LIMIT_BYTES: + raise HTTPException( + 413, + "File too large for direct upload. Use POST /presign for large files.", + ) + + # 4. 
Stream to MinIO + object_key = storage.build_key(app_id, file.filename or "file") + try: + result = await storage.upload( + bucket=settings.s3_bucket_attachments, + key=object_key, + fileobj=file.file, + mime_type=file.content_type, + metadata={"uploaded_by": str(user.id), "app_id": str(app_id)}, + ) + except ValueError as exc: + raise HTTPException(422, str(exc)) from exc + except StorageError as exc: + raise HTTPException(502, f"Storage unavailable: {exc}") from exc + + # 5. Write metadata to Postgres (storage is the source of truth for bytes, + # DB is the source of truth for metadata — two-phase write) + attachment = await attachment_repo.insert( + db, + application_id=app_id, + file_name=file.filename, + file_path=object_key, + file_size=result["size"], + mime_type=file.content_type, + kind=kind, + uploaded_by=user.id, + ) + await db.commit() + + return AttachmentOut( + attachment_id=attachment.attachment_id, + file_name=attachment.file_name, + file_size=attachment.file_size, + mime_type=attachment.mime_type, + kind=attachment.kind, + uploaded_at=attachment.uploaded_at.isoformat(), + ) + + +# ---------------------------------------------------------------------- # +# (B) Large-file upload: presigned URL flow +# ---------------------------------------------------------------------- # +@router.post("/presign", response_model=PresignedUploadResponse) +async def presign_upload( + app_id: int, + req: PresignedUploadRequest, + user: User = Depends(get_current_user), + db=Depends(get_db), +): + """ + Step 1 of large-file upload: client requests a signed URL. + + Flow: + 1. Client POSTs metadata → receives {upload_url, pending_attachment_id} + 2. Client PUTs the file bytes to upload_url (direct to MinIO) + 3. 
Client POSTs /attachments/confirm with pending_attachment_id + """ + app = await application_repo.get(db, app_id) + if app is None or app.status != "DRAFT": + raise HTTPException(422, "Invalid application state") + if not await application_repo.is_author(db, app_id, user.id): + raise HTTPException(403, "Only authors may upload attachments") + if req.mime_type not in ALLOWED_MIME_TYPES: + raise HTTPException(422, "MIME type not allowed") + if req.file_size > settings.max_upload_size_mb * 1024 * 1024: + raise HTTPException(413, "File exceeds maximum size") + + object_key = storage.build_key(app_id, req.file_name) + + # Write a PENDING attachment row — gets finalized on /confirm + pending = await attachment_repo.insert_pending( + db, + application_id=app_id, + file_name=req.file_name, + file_path=object_key, + file_size=req.file_size, + mime_type=req.mime_type, + kind=req.kind, + uploaded_by=user.id, + ) + await db.commit() + + presigned = await storage.get_upload_url( + bucket=settings.s3_bucket_attachments, + key=object_key, + mime_type=req.mime_type, + ) + + return PresignedUploadResponse( + upload_url=presigned["url"], + headers=presigned["headers"], + object_key=object_key, + pending_attachment_id=pending.attachment_id, + ) + + +@router.post("/{pending_id}/confirm", response_model=AttachmentOut) +async def confirm_upload( + app_id: int, + pending_id: int, + user: User = Depends(get_current_user), + db=Depends(get_db), +): + """Step 3: confirm the direct upload succeeded; verify the object exists.""" + pending = await attachment_repo.get(db, pending_id) + if pending is None or pending.application_id != app_id: + raise HTTPException(404, "Pending attachment not found") + if pending.uploaded_by != user.id: + raise HTTPException(403, "Not your upload") + + # Verify MinIO actually received the file + try: + head = await storage.head( + bucket=settings.s3_bucket_attachments, key=pending.file_path + ) + except FileNotFoundError: + raise HTTPException(422, "Upload never 
completed") + + # Verify the stored object's size matches the declared size + actual_size = head["ContentLength"] + if actual_size != pending.file_size: + # Client lied about size — quarantine and reject + await storage.move( + settings.s3_bucket_attachments, + pending.file_path, + settings.s3_bucket_quarantine, + pending.file_path, + ) + await attachment_repo.delete(db, pending_id) + await db.commit() + raise HTTPException(422, "File size mismatch") + + await attachment_repo.mark_confirmed(db, pending_id) + await db.commit() + + return AttachmentOut( + attachment_id=pending.attachment_id, + file_name=pending.file_name, + file_size=actual_size, + mime_type=pending.mime_type, + kind=pending.kind, + uploaded_at=pending.uploaded_at.isoformat(), + ) + + + # ---------------------------------------------------------------------- # +# Download +# ---------------------------------------------------------------------- # +@router.get("/{attachment_id}") +async def download_attachment( + app_id: int, + attachment_id: int, + user: User = Depends(get_current_user), + db=Depends(get_db), +): + """ + Returns a signed URL. The browser follows it to download directly from + MinIO — our API doesn't proxy the bytes, which saves significant bandwidth + and CPU. + """ + attachment = await attachment_repo.get(db, attachment_id) + if attachment is None or attachment.application_id != app_id: + raise HTTPException(404) + + # Authorization + if not await application_repo.user_can_read(db, app_id, user.id): + raise HTTPException(403) + + url = await storage.get_download_url( + bucket=settings.s3_bucket_attachments, + key=attachment.file_path, + filename=attachment.file_name, + ) + return {"download_url": url, "expires_in": settings.s3_signed_url_ttl} + + +@router.get("/{attachment_id}/stream") +async def stream_attachment( + app_id: int, + attachment_id: int, + user: User = Depends(get_current_user), + db=Depends(get_db), +): + """ + Alternative: proxy the stream through FastAPI. 
Use this when you need + to apply additional authorization/watermarking server-side, at the cost + of extra bandwidth on your API servers. + """ + attachment = await attachment_repo.get(db, attachment_id) + if attachment is None or attachment.application_id != app_id: + raise HTTPException(404) + if not await application_repo.user_can_read(db, app_id, user.id): + raise HTTPException(403) + + return StreamingResponse( + storage.download_stream( + bucket=settings.s3_bucket_attachments, key=attachment.file_path + ), + media_type=attachment.mime_type, + headers={ + "Content-Disposition": f'attachment; filename="{attachment.file_name}"' + }, + ) + + + # ---------------------------------------------------------------------- # +# Delete +# ---------------------------------------------------------------------- # +@router.delete("/{attachment_id}", status_code=status.HTTP_204_NO_CONTENT) +async def delete_attachment( + app_id: int, + attachment_id: int, + user: User = Depends(get_current_user), + db=Depends(get_db), +): + app = await application_repo.get(db, app_id) + if app is None: + raise HTTPException(404, "Application not found") + if app.status != "DRAFT": + raise HTTPException(422, "Can only delete attachments on DRAFT applications") + if not await application_repo.is_author(db, app_id, user.id): + raise HTTPException(403) + + attachment = await attachment_repo.get(db, attachment_id) + if attachment is None or attachment.application_id != app_id: + raise HTTPException(404) + + # Delete the DB row first: if the MinIO delete then fails, the leftover object is an orphan that cleanup.py quarantines. + await attachment_repo.delete(db, attachment_id) + await db.commit() + await storage.delete(settings.s3_bucket_attachments, attachment.file_path) diff --git a/be0/src/minio/cleanup.py b/be0/src/minio/cleanup.py new file mode 100644 index 0000000..727085f --- /dev/null +++ b/be0/src/minio/cleanup.py @@ -0,0 +1,121 @@ +""" +Orphan cleanup — runs daily via Celery Beat. 
+ +Sweeps two kinds of inconsistency between PostgreSQL and MinIO: + 1. PENDING attachments older than 1 hour → delete the row (and object if any) + 2. Objects in MinIO with no DB row → move to quarantine for 7 days then delete + +Run schedule: every day at 03:00 UTC (after nightly backup, before business hours) +""" +from __future__ import annotations + +import logging +from datetime import datetime, timezone, timedelta + +from celery import shared_task +from sqlalchemy import text + +from app.db import sync_engine +from app.storage import storage, settings + +logger = logging.getLogger(__name__) + +STALE_PENDING_AGE = timedelta(hours=1) + + +@shared_task(bind=True, max_retries=3) +def cleanup_storage_orphans(self): + """Reconcile PostgreSQL attachments table with MinIO bucket contents.""" + stats = {"pending_deleted": 0, "orphan_objects_quarantined": 0} + + # ---- Pass 1: expire stale PENDINGs ---- + cutoff = datetime.now(tz=timezone.utc) - STALE_PENDING_AGE + with sync_engine.begin() as conn: + rows = conn.execute( + text( + """ + SELECT attachment_id, file_path + FROM attachments + WHERE upload_status = 'PENDING' + AND uploaded_at < :cutoff + """ + ), + {"cutoff": cutoff}, + ).fetchall() + + for row in rows: + # Try to delete from MinIO (may not exist) + try: + _sync_delete(settings.s3_bucket_attachments, row.file_path) + except Exception as exc: + logger.warning("MinIO delete failed for %s: %s", row.file_path, exc) + + conn.execute( + text("DELETE FROM attachments WHERE attachment_id = :id"), + {"id": row.attachment_id}, + ) + stats["pending_deleted"] += 1 + + # ---- Pass 2: find objects in MinIO with no matching DB row ---- + with sync_engine.begin() as conn: + db_keys = { + r.file_path + for r in conn.execute(text("SELECT file_path FROM attachments")) + } + + orphan_keys = [] + for key in _sync_list_objects(settings.s3_bucket_attachments): + if key not in db_keys: + orphan_keys.append(key) + + # Quarantine orphans — don't delete outright in case it's a race 
condition + # with an in-flight upload. The quarantine bucket auto-expires in 7 days. + for key in orphan_keys: + try: + _sync_move( + settings.s3_bucket_attachments, + key, + settings.s3_bucket_quarantine, + key, + ) + stats["orphan_objects_quarantined"] += 1 + except Exception as exc: + logger.exception("Failed to quarantine %s: %s", key, exc) + + logger.info("Orphan cleanup done: %s", stats) + return stats + + +# Sync helpers for boto3 inside Celery (aioboto3 is for async FastAPI routes) +def _sync_client(): + import boto3 + + return boto3.client( + "s3", + endpoint_url=settings.s3_endpoint_url, + aws_access_key_id=settings.s3_access_key, + aws_secret_access_key=settings.s3_secret_key, + region_name=settings.s3_region, + ) + + +def _sync_delete(bucket, key): + _sync_client().delete_object(Bucket=bucket, Key=key) + + +def _sync_list_objects(bucket): + s3 = _sync_client() + paginator = s3.get_paginator("list_objects_v2") + for page in paginator.paginate(Bucket=bucket): + for obj in page.get("Contents", []): + yield obj["Key"] + + +def _sync_move(src_bucket, src_key, dst_bucket, dst_key): + s3 = _sync_client() + s3.copy_object( + Bucket=dst_bucket, + Key=dst_key, + CopySource={"Bucket": src_bucket, "Key": src_key}, + ) + s3.delete_object(Bucket=src_bucket, Key=src_key) diff --git a/be0/src/minio/storage.py b/be0/src/minio/storage.py new file mode 100644 index 0000000..47c8385 --- /dev/null +++ b/be0/src/minio/storage.py @@ -0,0 +1,407 @@ +""" +Async S3/MinIO client. Abstracts the details so application code never +touches boto3 directly. 
+""" +from __future__ import annotations + +import hashlib +import io +import logging +import uuid +from datetime import datetime, timezone +from typing import AsyncIterator, BinaryIO + +import aioboto3 +from botocore.config import Config as BotoConfig +from botocore.exceptions import ClientError +from pydantic_settings import BaseSettings + +logger = logging.getLogger(__name__) + + +class S3Settings(BaseSettings): + s3_endpoint_url: str + """Host the API uses to reach S3 (e.g. http://minio:9000 inside Docker).""" + s3_public_endpoint_url: str | None = None + """If set, presigned GET/PUT URLs use this host so browsers can open them (e.g. http://localhost:19000).""" + s3_region: str = "us-east-1" + s3_access_key: str + s3_secret_key: str + s3_bucket_attachments: str + s3_bucket_exports: str + s3_bucket_quarantine: str + s3_bucket_templates: str = "initiative-templates" + """Admin-managed .docx templates (server-side only; no browser CORS needed).""" + s3_bucket_imagehub_blobs: str = "imagehub-blobs" + """ImageHub content-addressed blob store (imaging dataset files; deduped by sha256).""" + s3_signed_url_ttl: int = 900 # 15 minutes + max_upload_size_mb: int = 50 + max_blob_size_mb: int = 2048 # ImageHub imaging blobs are large (DICOM series, NIfTI volumes) + + class Config: + env_file = ".env" + + +settings = S3Settings() + + +# Allowed MIME types — validated server-side, never trust the client +ALLOWED_MIME_TYPES = { + "application/pdf", + "image/png", + "image/jpeg", + "image/webp", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document", # docx + "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", # xlsx + "application/vnd.openxmlformats-officedocument.presentationml.presentation", # pptx + "application/msword", + "application/vnd.ms-excel", + "text/plain", +} + + +class S3Storage: + """ + Async context-manager friendly wrapper around MinIO/S3. 
+ + Design choices: + - Keys are unique per upload: {app_id}/{yyyy}/{mm}/{uuid}-{safe_filename} + The random prefix avoids collisions, the app prefix makes listing by app cheap, + and sharding by month keeps any single prefix from growing too large. + - Downloads are streamed in chunks; upload() buffers the body in memory + (capped by max_upload_size_mb), so large files should use presigned URLs. + - Upload/download client errors are re-raised as StorageError (FileNotFoundError for missing keys). + """ + + def __init__(self, settings: S3Settings = settings): + self._settings = settings + self._session = aioboto3.Session( + aws_access_key_id=settings.s3_access_key, + aws_secret_access_key=settings.s3_secret_key, + region_name=settings.s3_region, + ) + + def _client(self): + """Return a client as an async context manager.""" + return self._session.client( + "s3", + endpoint_url=self._settings.s3_endpoint_url, + # SigV4 signing is accepted by both MinIO and S3 + config=BotoConfig(signature_version="s3v4"), + ) + + def _client_presign(self): + """Presigned URLs must use a host reachable from the user's browser (often not minio:9000).""" + ep = self._settings.s3_public_endpoint_url or self._settings.s3_endpoint_url + return self._session.client( + "s3", + endpoint_url=ep, + config=BotoConfig(signature_version="s3v4"), + ) + + async def ensure_buckets_exist(self) -> None: + """Create configured buckets if they do not exist (local MinIO / first boot).""" + names = ( + self._settings.s3_bucket_attachments, + self._settings.s3_bucket_exports, + self._settings.s3_bucket_quarantine, + self._settings.s3_bucket_templates, + self._settings.s3_bucket_imagehub_blobs, + ) + for name in names: + async with self._client() as s3: + try: + await s3.create_bucket(Bucket=name) + except ClientError as exc: + code = (exc.response or {}).get("Error", {}).get("Code", "") + if code in ( + "BucketAlreadyOwnedByYou", + "BucketAlreadyExists", + ): + continue + raise + logger.info("S3 bucket ready: %s", name) + + # ------------------------------------------------------------------ # + # Key construction + # 
------------------------------------------------------------------ # + @staticmethod + def build_key(application_id: int, filename: str) -> str: + """ + Build a unique, safe object key for an attachment. + + Example: 42/2025/10/7c9e8a1b4d5f6a7b-flowchart.pdf + """ + now = datetime.now(tz=timezone.utc) + safe = _sanitize_filename(filename) + unique = uuid.uuid4().hex[:16] + return f"{application_id}/{now:%Y}/{now:%m}/{unique}-{safe}" + + @staticmethod + def build_key_for_initiative(initiative_id: uuid.UUID, filename: str) -> str: + """ + Same as build_key but namespaces by initiative UUID (initiative application drafts). + """ + now = datetime.now(tz=timezone.utc) + safe = _sanitize_filename(filename) + unique = uuid.uuid4().hex[:16] + iid = str(initiative_id).replace("-", "") + return f"initiatives/{iid}/{now:%Y}/{now:%m}/{unique}-{safe}" + + # ------------------------------------------------------------------ # + # Upload + # ------------------------------------------------------------------ # + async def upload( + self, + bucket: str, + key: str, + fileobj: BinaryIO, + mime_type: str, + metadata: dict[str, str] | None = None, + ) -> dict: + """ + Upload a file-like object to MinIO. Validates MIME type and size, + computes SHA-256 for integrity, returns result metadata. 
+ """ + if mime_type not in ALLOWED_MIME_TYPES: + raise ValueError(f"MIME type not allowed: {mime_type}") + + # Compute size + hash while buffering + data = fileobj.read() + size = len(data) + max_bytes = self._settings.max_upload_size_mb * 1024 * 1024 + if size > max_bytes: + raise ValueError( + f"File too large: {size} bytes > {max_bytes} bytes limit" + ) + sha256 = hashlib.sha256(data).hexdigest() + + meta = {"sha256": sha256, **(metadata or {})} + + async with self._client() as s3: + try: + await s3.put_object( + Bucket=bucket, + Key=key, + Body=io.BytesIO(data), + ContentType=mime_type, + Metadata=meta, + # Omit ServerSideEncryption: default MinIO returns NotImplemented if KMS/SSE + # is not configured. Use bucket policies or a compatible backend for SSE. + ) + except ClientError as exc: + logger.exception("S3 upload failed: bucket=%s key=%s", bucket, key) + raise StorageError(f"Upload failed: {exc}") from exc + + logger.info( + "Uploaded s3://%s/%s size=%d sha256=%s", bucket, key, size, sha256[:12] + ) + return {"bucket": bucket, "key": key, "size": size, "sha256": sha256} + + # ------------------------------------------------------------------ # + # Download (streamed) + # ------------------------------------------------------------------ # + async def download_stream( + self, bucket: str, key: str, chunk_size: int = 64 * 1024 + ) -> AsyncIterator[bytes]: + """ + Stream the object body in chunks. Use for FastAPI StreamingResponse. + """ + async with self._client() as s3: + try: + obj = await s3.get_object(Bucket=bucket, Key=key) + except ClientError as exc: + if exc.response["Error"]["Code"] == "NoSuchKey": + raise FileNotFoundError(f"Object not found: {key}") from exc + raise StorageError(f"Download failed: {exc}") from exc + + # aiobotocore: `async with body as stream` yields aiohttp ClientResponse; + # `stream.read(n)` fails (read() takes no size). Keep the Body wrapper and + # read from `body` inside `async with body:` so chunk sizes work. 
See aiobotocore#1156. + body = obj["Body"] + async with body: + while True: + chunk = await body.read(chunk_size) + if not chunk: + break + yield chunk + + # ------------------------------------------------------------------ # + # Signed URLs + # ------------------------------------------------------------------ # + async def get_download_url( + self, + bucket: str, + key: str, + ttl: int | None = None, + filename: str | None = None, + *, + inline: bool = False, + response_content_type: str | None = None, + ) -> str: + """ + Generate a pre-signed URL so the browser can GET directly from MinIO. + + - ``inline=False`` (default): Content-Disposition attachment — save-as / download. + - ``inline=True``: Content-Disposition inline — PDF/image viewers and iframes. + - ``response_content_type``: optional override (e.g. application/pdf) for clients + that rely on the response header when the stored object metadata is wrong. + """ + ttl = ttl or self._settings.s3_signed_url_ttl + params: dict = {"Bucket": bucket, "Key": key} + if filename: + safe = _sanitize_filename(filename) + disp = "inline" if inline else "attachment" + params["ResponseContentDisposition"] = f'{disp}; filename="{safe}"' + elif inline: + params["ResponseContentDisposition"] = "inline" + if response_content_type: + params["ResponseContentType"] = response_content_type + async with self._client_presign() as s3: + return await s3.generate_presigned_url( + "get_object", Params=params, ExpiresIn=ttl + ) + + async def get_upload_url( + self, bucket: str, key: str, mime_type: str, ttl: int | None = None + ) -> dict: + """ + Generate a pre-signed URL for direct browser → MinIO upload. + Returns URL + required headers. Use this for large files to bypass + the FastAPI request-body limit. 
+ """ + ttl = ttl or self._settings.s3_signed_url_ttl + async with self._client_presign() as s3: + url = await s3.generate_presigned_url( + "put_object", + Params={ + "Bucket": bucket, + "Key": key, + "ContentType": mime_type, + }, + ExpiresIn=ttl, + ) + return {"url": url, "headers": {"Content-Type": mime_type}} + + # ------------------------------------------------------------------ # + # Delete / move + # ------------------------------------------------------------------ # + async def delete(self, bucket: str, key: str) -> None: + """ + Delete an object. With versioning enabled, this creates a delete + marker — the previous version is still recoverable until lifecycle + rules remove it. + """ + async with self._client() as s3: + await s3.delete_object(Bucket=bucket, Key=key) + logger.info("Deleted s3://%s/%s", bucket, key) + + async def move( + self, src_bucket: str, src_key: str, dst_bucket: str, dst_key: str + ) -> None: + """Move as copy-then-delete (not atomic: a failure in between leaves both copies). Used when quarantining suspicious files.""" + async with self._client() as s3: + await s3.copy_object( + Bucket=dst_bucket, + Key=dst_key, + CopySource={"Bucket": src_bucket, "Key": src_key}, + ) + await s3.delete_object(Bucket=src_bucket, Key=src_key) + + async def head(self, bucket: str, key: str) -> dict: + """Fetch object metadata without downloading it.""" + async with self._client() as s3: + try: + return await s3.head_object(Bucket=bucket, Key=key) + except ClientError as exc: + if exc.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"): + raise FileNotFoundError(f"Object not found: {key}") from exc + raise + + # ------------------------------------------------------------------ # + # Content-addressed blobs (ImageHub) — dedup by sha256 + # ------------------------------------------------------------------ # + @staticmethod + def build_blob_key(sha256: str) -> str: + """Content-addressed key ``blobs/{aa}/{bb}/{sha256}`` (sharded by the first two byte pairs of the hex digest).""" + h = (sha256 or "").lower() + aa = h[0:2] if len(h) >= 2 else "00" + bb = h[2:4] if
len(h) >= 4 else "00" + return f"blobs/{aa}/{bb}/{h}" + + async def blob_exists(self, bucket: str, key: str) -> bool: + """True if an object already exists at key (used for content-addressed dedup).""" + async with self._client() as s3: + try: + await s3.head_object(Bucket=bucket, Key=key) + return True + except ClientError as exc: + code = (exc.response or {}).get("Error", {}).get("Code", "") + if code in ("404", "NoSuchKey", "NotFound"): + return False + raise + + async def put_blob(self, data: bytes, media_type: str | None = None) -> dict: + """ + Store raw bytes as a content-addressed, globally deduped blob. + + Hashes the bytes (sha256) and PUTs to ``blobs/{aa}/{bb}/{sha256}`` in the ImageHub + bucket only if not already present (``deduped`` tells the caller which happened). + Accepts any ``media_type`` — imaging MIME is unreliable (DICOM is often + ``application/octet-stream``) — but enforces ``max_blob_size_mb``. + + NOTE: like ``upload()``, this buffers the whole blob in memory; resumable/ + multipart upload for very large series is a later milestone.
+ """ + size = len(data) + max_bytes = self._settings.max_blob_size_mb * 1024 * 1024 + if size > max_bytes: + raise ValueError(f"Blob too large: {size} bytes > {max_bytes} bytes limit") + sha256 = hashlib.sha256(data).hexdigest() + bucket = self._settings.s3_bucket_imagehub_blobs + key = self.build_blob_key(sha256) + ctype = media_type or "application/octet-stream" + if await self.blob_exists(bucket, key): + return {"sha256": sha256, "size": size, "bucket": bucket, "key": key, + "media_type": ctype, "deduped": True} + async with self._client() as s3: + try: + await s3.put_object( + Bucket=bucket, Key=key, Body=io.BytesIO(data), ContentType=ctype, + Metadata={"sha256": sha256}, + ) + except ClientError as exc: + logger.exception("S3 put_blob failed: bucket=%s key=%s", bucket, key) + raise StorageError(f"Blob upload failed: {exc}") from exc + logger.info("Stored blob s3://%s/%s size=%d sha256=%s", bucket, key, size, sha256[:12]) + return {"sha256": sha256, "size": size, "bucket": bucket, "key": key, + "media_type": ctype, "deduped": False} + + +# ---------------------------------------------------------------------- # +# Helpers +# ---------------------------------------------------------------------- # +def _sanitize_filename(name: str) -> str: + """ + Replace path separators and whitespace with underscores and drop control characters. Preserves + Vietnamese diacritics because MinIO keys are UTF-8.
+ """ + import re + import unicodedata + + # Replace path separators so the name cannot add key "directories" + name = name.replace("/", "_").replace("\\", "_") + # Collapse whitespace + name = re.sub(r"\s+", "_", name.strip()) + # Remove control chars + name = "".join(ch for ch in name if unicodedata.category(ch)[0] != "C") + # Max length 200 chars (the S3 key limit is 1024 bytes of UTF-8, but be conservative) + return name[:200] or "file" + + +class StorageError(Exception): + """Raised for any storage-layer failure.""" + + +# Module-level singleton +storage = S3Storage() diff --git a/be0/src/minio/test_storage.py b/be0/src/minio/test_storage.py new file mode 100644 index 0000000..c3f9aea --- /dev/null +++ b/be0/src/minio/test_storage.py @@ -0,0 +1,131 @@ +""" +Validation test — exercises the S3/MinIO integration logic against a mock S3. +Verifies: upload, download, presigned URLs, metadata, MIME validation. +""" +import io +import boto3 +from moto import mock_aws +import hashlib + +BUCKET = "sangkien-attachments" + +ALLOWED = {"application/pdf", "image/png", "image/jpeg"} + + +def build_key(app_id, filename): + from datetime import datetime, timezone + import uuid + now = datetime.now(tz=timezone.utc) + unique = uuid.uuid4().hex[:16] + safe = filename.replace("/", "_").replace(" ", "_") + return f"{app_id}/{now:%Y}/{now:%m}/{unique}-{safe}" + + +def upload(s3, key, data, mime_type, metadata): + if mime_type not in ALLOWED: + raise ValueError(f"MIME not allowed: {mime_type}") + sha = hashlib.sha256(data).hexdigest() + s3.put_object( + Bucket=BUCKET, Key=key, Body=data, + ContentType=mime_type, Metadata={"sha256": sha, **metadata}, + ServerSideEncryption="AES256", + ) + return {"key": key, "size": len(data), "sha256": sha} + + +@mock_aws +def test_full_flow(): + s3 = boto3.client("s3", region_name="us-east-1") + s3.create_bucket(Bucket=BUCKET) + + # 1.
Upload a fake PDF + fake_pdf = b"%PDF-1.4\n%fake content for testing\n" * 100 + key = build_key(app_id=42, filename="flowchart sáng kiến.pdf") + result = upload(s3, key, fake_pdf, "application/pdf", + {"uploaded_by": "7", "app_id": "42"}) + print(f"✓ Uploaded: {result['key']}") + print(f" size={result['size']} sha256={result['sha256'][:16]}...") + + # 2. Head object — verify metadata survived + head = s3.head_object(Bucket=BUCKET, Key=key) + assert head["ContentType"] == "application/pdf" + assert head["Metadata"]["uploaded_by"] == "7" + assert head["Metadata"]["sha256"] == result["sha256"] + print(f"✓ Metadata preserved: uploaded_by={head['Metadata']['uploaded_by']}") + + # 3. Generate pre-signed download URL + download_url = s3.generate_presigned_url( + "get_object", + Params={"Bucket": BUCKET, "Key": key, + "ResponseContentDisposition": 'attachment; filename="flowchart.pdf"'}, + ExpiresIn=900, + ) + assert "Signature=" in download_url and "Expires=" in download_url + print(f"✓ Signed download URL generated (TTL=900s)") + + # 4. Generate pre-signed upload URL (for large-file flow) + upload_url = s3.generate_presigned_url( + "put_object", + Params={"Bucket": BUCKET, "Key": "42/2025/10/new-large-file.pdf", + "ContentType": "application/pdf"}, + ExpiresIn=900, + ) + assert "Signature=" in upload_url + print(f"✓ Signed upload URL generated") + + # 5. Download and verify bytes match + obj = s3.get_object(Bucket=BUCKET, Key=key) + got = obj["Body"].read() + assert got == fake_pdf + assert hashlib.sha256(got).hexdigest() == result["sha256"] + print(f"✓ Download: {len(got)} bytes, hash matches") + + # 6. MIME validation rejects bad types + try: + upload(s3, "bad.exe", b"MZ\x90", "application/x-msdownload", {}) + assert False, "Should have rejected .exe" + except ValueError as e: + print(f"✓ MIME validation blocked: {e}") + + # 7. 
Delete creates a delete marker (versioning), object recoverable + s3.put_bucket_versioning( + Bucket=BUCKET, + VersioningConfiguration={"Status": "Enabled"}, + ) + # Re-upload with versioning on, then delete + key2 = "42/2025/10/versioned.pdf" + s3.put_object(Bucket=BUCKET, Key=key2, Body=b"v1", ContentType="application/pdf") + s3.delete_object(Bucket=BUCKET, Key=key2) + versions = s3.list_object_versions(Bucket=BUCKET, Prefix=key2) + has_delete_marker = any(dm for dm in versions.get("DeleteMarkers", [])) + has_version = any(v for v in versions.get("Versions", [])) + assert has_delete_marker and has_version + print(f"✓ Versioning: delete marker present, previous version recoverable") + + # 8. List objects under a prefix (application-scoped listing) + resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="42/") + keys = [o["Key"] for o in resp.get("Contents", [])] + print(f"✓ Listed {len(keys)} objects under prefix 42/") + + print("\n✅ All checks passed") + + +@mock_aws +def test_research_evidence_pdf_upload_metadata(): + """Parity check: applicant research PDF uses same bucket + PDF MIME as attachment pipeline.""" + s3 = boto3.client("s3", region_name="us-east-1") + s3.create_bucket(Bucket=BUCKET) + pdf = b"%PDF-1.4 research evidence fixture\n" + key = build_key(app_id=99, filename="minh-chung-nhom-2.1.4.pdf") + meta = {"uploaded_by": "1", "app_id": "99", "case_code": "CASE-MERIT", "role": "research_evidence"} + result = upload(s3, key, pdf, "application/pdf", meta) + head = s3.head_object(Bucket=BUCKET, Key=key) + assert head["Metadata"]["role"] == "research_evidence" + assert head["Metadata"]["case_code"] == "CASE-MERIT" + assert head["ContentType"] == "application/pdf" + assert result["sha256"] == hashlib.sha256(pdf).hexdigest() + + +if __name__ == "__main__": + test_full_flow() + test_research_evidence_pdf_upload_metadata() diff --git a/be0/src/research_routes.py b/be0/src/research_routes.py new file mode 100644 index 0000000..511308b --- /dev/null +++ 
b/be0/src/research_routes.py @@ -0,0 +1,827 @@ +"""Research-project proposals (Thuyết minh đề tài) + lifecycle. + +A PI fills the proposal form; the row goes draft → submitted; an admin approves it +(→ approved, the project's cockpit unlocks) or rejects it. The proposal row *is* the +project across its lifecycle; the cockpit's child entities (members/datasets/models/ +assets/milestones) hang off it (added in phase 2). + +Authz (v1): a project is readable / mutable by its owner OR a platform admin. +Approve / reject are admin-only. Every lifecycle transition writes an append-only audit row. + +Mounted under ``/api/v1`` in main.py → routes live at ``/api/v1/research/*``. +""" +from __future__ import annotations + +import uuid +from datetime import datetime, timezone +from typing import Any, Optional + +from fastapi import APIRouter, Body, Header, HTTPException +from pydantic import BaseModel, Field +from sqlalchemy import func, select + +from src.auth_jwt import decode_access_token_user_id, decode_bearer_token +from src.initiative_db.engine import get_session, is_postgres_enabled +from src.initiative_db.models import ( + ResearchProject, + ResearchProjectAsset, + ResearchProjectAudit, + ResearchProjectDataset, + ResearchProjectMember, + ResearchProjectMilestone, + ResearchProjectModel, + User, +) + +router = APIRouter(prefix="/research", tags=["research"]) + +_STATUS_DRAFT = "draft" +_STATUS_SUBMITTED = "submitted" +_STATUS_APPROVED = "approved" +_STATUS_REJECTED = "rejected" + +_ROLE_PI = "Chủ nhiệm (PI)" +_ROLE_ADMIN = "Quản trị viên" + + +# --------------------------------------------------------------------------- # +# Auth (mirrors template_routes / the extracted admin routers) +# --------------------------------------------------------------------------- # +def _jwt_roles(authorization: str | None) -> list[str]: + p = decode_bearer_token(authorization) + if not p: + return [] + r = p.get("roles") + return [str(x) for x in r] if isinstance(r, list) else [] + + 
+def _require_authed_uid(authorization: str | None) -> uuid.UUID: + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để thực hiện thao tác.") + return uid + + +def _is_admin(authorization: str | None) -> bool: + return "admin" in _jwt_roles(authorization) + + +def _require_admin_uid(authorization: str | None) -> uuid.UUID: + uid = _require_authed_uid(authorization) + if not _is_admin(authorization): + raise HTTPException(status_code=403, detail="Chỉ tài khoản quản trị mới thực hiện được.") + return uid + + +def _require_db() -> None: + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa sẵn sàng.") + + +# --------------------------------------------------------------------------- # +# Scalar extraction from the proposal content blob (for listing / filtering / overview) +# --------------------------------------------------------------------------- # +def _coerce_int(v: Any) -> Optional[int]: + if v is None or v == "": + return None + try: + return int(float(str(v).strip())) + except (ValueError, TypeError): + return None + + +def _coerce_float(v: Any) -> Optional[float]: + if v is None or v == "": + return None + try: + return float(str(v).strip()) + except (ValueError, TypeError): + return None + + +def _extract_scalars(content: Any) -> dict[str, Any]: + """Pull the queryable scalars out of the (flat, dotted-key) proposal form values.""" + c = content if isinstance(content, dict) else {} + return { + "title": str(c.get("tenDeTai") or "").strip(), + "level": str(c.get("capDeTai") or "").strip(), + "pi_name": str(c.get("chuNhiem.hoTen") or "").strip(), + "period_months": _coerce_int(c.get("thoiGianThucHienThang")), + "budget_total": _coerce_float(c.get("tongKinhPhi")), + } + + +# --------------------------------------------------------------------------- # +# Schemas +# --------------------------------------------------------------------------- # 
+class ProjectOut(BaseModel): + id: str + ownerUserId: str + status: str + code: Optional[str] = None + title: str = "" + level: str = "" + piName: str = "" + periodMonths: Optional[int] = None + budgetTotal: Optional[float] = None + content: dict[str, Any] = Field(default_factory=dict) + submittedAt: Optional[datetime] = None + reviewedAt: Optional[datetime] = None + reviewNote: Optional[str] = None + createdAt: Optional[datetime] = None + updatedAt: Optional[datetime] = None + + +class ProjectCreateIn(BaseModel): + content: dict[str, Any] = Field(default_factory=dict) + + +class ProjectUpdateIn(BaseModel): + content: dict[str, Any] = Field(default_factory=dict) + + +class ProjectDetailPatchIn(BaseModel): + """Partial administrative-detail patch merged into an approved project's content.""" + patch: dict[str, Any] = Field(default_factory=dict) + + +class ApproveIn(BaseModel): + code: Optional[str] = Field(default=None, max_length=100) + note: Optional[str] = Field(default=None, max_length=2000) + + +class RejectIn(BaseModel): + note: Optional[str] = Field(default=None, max_length=2000) + + +class AuditOut(BaseModel): + id: int + occurredAt: Optional[datetime] = None + actorName: str = "" + roleLabel: str = "" + action: str + subject: str = "" + detail: str = "" + + +def _to_out(row: ResearchProject) -> ProjectOut: + return ProjectOut( + id=str(row.id), + ownerUserId=str(row.owner_user_id), + status=row.status, + code=row.code, + title=row.title or "", + level=row.level or "", + piName=row.pi_name or "", + periodMonths=row.period_months, + budgetTotal=float(row.budget_total) if row.budget_total is not None else None, + content=row.content if isinstance(row.content, dict) else {}, + submittedAt=row.submitted_at, + reviewedAt=row.reviewed_at, + reviewNote=row.review_note, + createdAt=row.created_at, + updatedAt=row.updated_at, + ) + + +# --------------------------------------------------------------------------- # +# Helpers +# 
--------------------------------------------------------------------------- # +async def _load_project(session, project_id: str, uid: uuid.UUID, is_admin: bool) -> ResearchProject: + """Fetch a project enforcing owner-or-admin read access (404 hides others' rows).""" + try: + pid = uuid.UUID(project_id) + except (ValueError, TypeError): + raise HTTPException(status_code=404, detail="Không tìm thấy đề tài.") + row = ( + await session.execute(select(ResearchProject).where(ResearchProject.id == pid)) + ).scalar_one_or_none() + if row is None or (not is_admin and row.owner_user_id != uid): + raise HTTPException(status_code=404, detail="Không tìm thấy đề tài.") + return row + + +async def _actor_name(session, uid: uuid.UUID) -> str: + u = await session.get(User, uid) + if u is None: + return "" + return (u.full_name or u.email or "").strip() + + +async def _write_audit( + session, + project_id: uuid.UUID, + actor_uid: Optional[uuid.UUID], + actor_name: str, + role_label: str, + action: str, + subject: str = "", + detail: str = "", +) -> None: + session.add( + ResearchProjectAudit( + project_id=project_id, + actor_user_id=actor_uid, + actor_name=actor_name or "", + role_label=role_label or "", + action=action, + subject=subject or "", + detail=detail or "", + ) + ) + + +# --------------------------------------------------------------------------- # +# Endpoints — proposals lifecycle +# --------------------------------------------------------------------------- # +@router.post("/projects", response_model=ProjectOut) +async def create_project( + payload: Optional[ProjectCreateIn] = Body(None), + authorization: Optional[str] = Header(None), +) -> ProjectOut: + """Authed: create a draft proposal owned by the current user.""" + _require_db() + uid = _require_authed_uid(authorization) + content = (payload.content if payload and isinstance(payload.content, dict) else {}) + scalars = _extract_scalars(content) + async with get_session() as session: + row = 
ResearchProject(id=uuid.uuid4(), owner_user_id=uid, status=_STATUS_DRAFT, content=content, **scalars) + session.add(row) + await session.flush() + await _write_audit( + session, row.id, uid, await _actor_name(session, uid), _ROLE_PI, + "Tạo bản thảo đề tài", scalars["title"] or "(chưa đặt tên)", + ) + await session.commit() + await session.refresh(row) + return _to_out(row) + + +@router.get("/projects", response_model=list[ProjectOut]) +async def list_projects(mine: bool = True, authorization: Optional[str] = Header(None)) -> list[ProjectOut]: + """List projects. Non-admin always sees only their own; admin can pass ?mine=false to see all.""" + _require_db() + uid = _require_authed_uid(authorization) + is_admin = _is_admin(authorization) + async with get_session() as session: + stmt = select(ResearchProject).order_by(ResearchProject.created_at.desc()) + if mine or not is_admin: + stmt = stmt.where(ResearchProject.owner_user_id == uid) + rows = (await session.execute(stmt)).scalars().all() + return [_to_out(r) for r in rows] + + +@router.get("/projects/{project_id}", response_model=ProjectOut) +async def get_project(project_id: str, authorization: Optional[str] = Header(None)) -> ProjectOut: + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + return _to_out(await _load_project(session, project_id, uid, _is_admin(authorization))) + + +@router.put("/projects/{project_id}", response_model=ProjectOut) +async def update_project( + project_id: str, + payload: ProjectUpdateIn = Body(...), + authorization: Optional[str] = Header(None), +) -> ProjectOut: + """Owner: replace the draft proposal content. 
Allowed only while status=draft.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + row = await _load_project(session, project_id, uid, _is_admin(authorization)) + if row.owner_user_id != uid: + raise HTTPException(status_code=403, detail="Chỉ chủ nhiệm mới sửa được bản thảo.") + if row.status != _STATUS_DRAFT: + raise HTTPException(status_code=409, detail="Chỉ sửa được khi đề tài ở trạng thái bản thảo.") + content = payload.content if isinstance(payload.content, dict) else {} + row.content = content + for k, v in _extract_scalars(content).items(): + setattr(row, k, v) + row.updated_at = datetime.now(tz=timezone.utc) + await session.commit() + await session.refresh(row) + return _to_out(row) + + +@router.put("/projects/{project_id}/detail", response_model=ProjectOut) +async def update_project_detail( + project_id: str, + payload: ProjectDetailPatchIn = Body(...), + authorization: Optional[str] = Header(None), +) -> ProjectOut: + """Owner-or-admin: shallow-merge administrative-detail fields into an approved + project's content JSONB. Unlike ``update_project`` (draft-only, wholesale-replace), + this serves the cockpit: it is allowed once the project is approved and MERGES the + patch into existing content so the original proposal keys survive. 
Writes an audit row.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + row = await _load_project(session, project_id, uid, _is_admin(authorization)) + _require_approved(row) + patch = payload.patch if isinstance(payload.patch, dict) else {} + merged = {**(row.content if isinstance(row.content, dict) else {}), **patch} + row.content = merged + for k, v in _extract_scalars(merged).items(): + setattr(row, k, v) + row.updated_at = datetime.now(tz=timezone.utc) + await _write_audit( + session, row.id, uid, await _actor_name(session, uid), + _role_label(authorization), "Cập nhật thông tin đề tài", "", + f"{len(patch)} trường", + ) + await session.commit() + await session.refresh(row) + return _to_out(row) + + +@router.post("/projects/{project_id}/submit", response_model=ProjectOut) +async def submit_project(project_id: str, authorization: Optional[str] = Header(None)) -> ProjectOut: + """Owner: submit a draft for review (draft → submitted).""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + row = await _load_project(session, project_id, uid, _is_admin(authorization)) + if row.owner_user_id != uid: + raise HTTPException(status_code=403, detail="Chỉ chủ nhiệm mới nộp được đề tài.") + if row.status != _STATUS_DRAFT: + raise HTTPException(status_code=409, detail="Đề tài đã được nộp hoặc đã xử lý.") + if not (row.title or "").strip(): + raise HTTPException(status_code=422, detail="Cần nhập tên đề tài trước khi nộp.") + row.status = _STATUS_SUBMITTED + row.submitted_at = datetime.now(tz=timezone.utc) + row.updated_at = row.submitted_at + await _write_audit( + session, row.id, uid, await _actor_name(session, uid), _ROLE_PI, "Nộp đề tài", row.title, + ) + await session.commit() + await session.refresh(row) + return _to_out(row) + + +@router.post("/projects/{project_id}/approve", response_model=ProjectOut) +async def approve_project( + project_id: str, + payload: 
Optional[ApproveIn] = Body(None), + authorization: Optional[str] = Header(None), +) -> ProjectOut: + """Admin: approve a submitted proposal (submitted → approved); optionally assign its code.""" + _require_db() + admin_uid = _require_admin_uid(authorization) + async with get_session() as session: + row = await _load_project(session, project_id, admin_uid, True) + if row.status != _STATUS_SUBMITTED: + raise HTTPException(status_code=409, detail="Chỉ duyệt được đề tài đang chờ duyệt.") + code = (payload.code if payload else None) or "" + note = (payload.note if payload else None) or "" + if code.strip(): + row.code = code.strip() + row.status = _STATUS_APPROVED + row.reviewed_by = admin_uid + row.reviewed_at = datetime.now(tz=timezone.utc) + row.review_note = note.strip() or None + row.updated_at = row.reviewed_at + await _seed_cockpit_from_proposal(session, row) + await _write_audit( + session, row.id, admin_uid, await _actor_name(session, admin_uid), _ROLE_ADMIN, + "Phê duyệt đề tài", row.code or row.title, row.review_note or "", + ) + await session.commit() + await session.refresh(row) + return _to_out(row) + + +@router.post("/projects/{project_id}/reject", response_model=ProjectOut) +async def reject_project( + project_id: str, + payload: Optional[RejectIn] = Body(None), + authorization: Optional[str] = Header(None), +) -> ProjectOut: + """Admin: reject a submitted proposal (submitted → rejected) with an optional note.""" + _require_db() + admin_uid = _require_admin_uid(authorization) + async with get_session() as session: + row = await _load_project(session, project_id, admin_uid, True) + if row.status != _STATUS_SUBMITTED: + raise HTTPException(status_code=409, detail="Chỉ từ chối được đề tài đang chờ duyệt.") + note = (payload.note if payload else None) or "" + row.status = _STATUS_REJECTED + row.reviewed_by = admin_uid + row.reviewed_at = datetime.now(tz=timezone.utc) + row.review_note = note.strip() or None + row.updated_at = row.reviewed_at + await 
_write_audit( + session, row.id, admin_uid, await _actor_name(session, admin_uid), _ROLE_ADMIN, + "Từ chối đề tài", row.title, row.review_note or "", + ) + await session.commit() + await session.refresh(row) + return _to_out(row) + + +@router.delete("/projects/{project_id}") +async def delete_project(project_id: str, authorization: Optional[str] = Header(None)) -> dict[str, Any]: + """Owner may delete a draft; admin may delete any. Children + audit cascade in the DB.""" + _require_db() + uid = _require_authed_uid(authorization) + is_admin = _is_admin(authorization) + async with get_session() as session: + row = await _load_project(session, project_id, uid, is_admin) + if not is_admin: + if row.owner_user_id != uid: + raise HTTPException(status_code=404, detail="Không tìm thấy đề tài.") + if row.status != _STATUS_DRAFT: + raise HTTPException(status_code=409, detail="Chỉ xóa được bản thảo; đề tài đã nộp cần liên hệ quản trị.") + await session.delete(row) + await session.commit() + return {"ok": True} + + +@router.get("/projects/{project_id}/audit", response_model=list[AuditOut]) +async def list_audit(project_id: str, authorization: Optional[str] = Header(None)) -> list[AuditOut]: + """Owner or admin: the append-only audit trail for a project, newest first.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + row = await _load_project(session, project_id, uid, _is_admin(authorization)) + rows = ( + await session.execute( + select(ResearchProjectAudit) + .where(ResearchProjectAudit.project_id == row.id) + .order_by(ResearchProjectAudit.occurred_at.desc(), ResearchProjectAudit.id.desc()) + ) + ).scalars().all() + return [ + AuditOut( + id=r.id, + occurredAt=r.occurred_at, + actorName=r.actor_name, + roleLabel=r.role_label, + action=r.action, + subject=r.subject, + detail=r.detail, + ) + for r in rows + ] + + +# --------------------------------------------------------------------------- # +# Cockpit entities — generic, 
config-driven CRUD (phase 2) +# The 5 child entity types share one CRUD surface keyed by an entity config. +# Mutations require the project to be approved (the cockpit "unlocks" on approval) and +# are allowed for the owner OR an admin; every mutation writes an audit row. +# --------------------------------------------------------------------------- # +_TEXT, _INT, _NUM = "text", "int", "num" + +_ENTITY_CONFIG: dict[str, dict[str, Any]] = { + "members": { + "model": ResearchProjectMember, + "singular": "thành viên", + "primary": "name", + "fields": [ + ("name", "name", _TEXT), + ("role", "role", _TEXT), + ("access", "access", _TEXT), + ("org", "org", _TEXT), + ("email", "email", _TEXT), + ("months", "months", _INT), + ("tasks", "tasks", _TEXT), + ("status", "status", _TEXT), + ], + }, + "datasets": { + "model": ResearchProjectDataset, + "singular": "bộ dữ liệu", + "primary": "name", + "fields": [ + ("name", "name", _TEXT), + ("type", "type", _TEXT), + ("records", "records", _INT), + ("source", "source", _TEXT), + ("sensitivity", "sensitivity", _TEXT), + ("ethics", "ethics", _TEXT), + ("owner", "owner", _TEXT), + ("status", "status", _TEXT), + ], + }, + "models": { + "model": ResearchProjectModel, + "singular": "mô hình", + "primary": "name", + "fields": [ + ("name", "name", _TEXT), + ("task", "task", _TEXT), + ("framework", "framework", _TEXT), + ("version", "version", _TEXT), + ("dataset", "dataset", _TEXT), + ("auc", "auc", _NUM), + ("sensitivity", "sensitivity", _NUM), + ("specificity", "specificity", _NUM), + ("accuracy", "accuracy", _NUM), + ("owner", "owner", _TEXT), + ("notes", "notes", _TEXT), + ("status", "status", _TEXT), + ], + }, + "assets": { + "model": ResearchProjectAsset, + "singular": "tài sản", + "primary": "name", + "fields": [ + ("name", "name", _TEXT), + ("category", "category", _TEXT), + ("acquisition", "acquisition", _TEXT), + ("value", "value", _NUM), + ("owner", "owner", _TEXT), + ("notes", "notes", _TEXT), + ("status", "status", _TEXT), + 
], + }, + "milestones": { + "model": ResearchProjectMilestone, + "singular": "mốc tiến độ", + "primary": "title", + "fields": [ + ("title", "title", _TEXT), + ("deliverable", "deliverable", _TEXT), + ("start", "start_period", _TEXT), + ("end", "end_period", _TEXT), + ("owner", "owner", _TEXT), + ("budget", "budget", _NUM), + ("progress", "progress", _INT), + ("status", "status", _TEXT), + ], + }, +} + + +def _coerce_value(kind: str, v: Any) -> Any: + if kind == _INT: + return _coerce_int(v) + if kind == _NUM: + return _coerce_float(v) + return "" if v is None else str(v) + + +def _apply_fields(row: Any, cfg: dict[str, Any], data: dict[str, Any]) -> None: + """Whitelist-copy known fields from a client dict into an ORM row (prevents column injection).""" + for json_key, column, kind in cfg["fields"]: + if json_key in data: + setattr(row, column, _coerce_value(kind, data[json_key])) + + +def _entity_to_out(cfg: dict[str, Any], row: Any) -> dict[str, Any]: + out: dict[str, Any] = {"id": str(row.id), "sortOrder": row.sort_order} + for json_key, column, kind in cfg["fields"]: + val = getattr(row, column) + out[json_key] = float(val) if (kind == _NUM and val is not None) else val + return out + + +def _entity_cfg_or_404(entity: str) -> dict[str, Any]: + cfg = _ENTITY_CONFIG.get(entity) + if cfg is None: + raise HTTPException(status_code=404, detail="Không tìm thấy loại dữ liệu.") + return cfg + + +def _role_label(authorization: str | None) -> str: + return _ROLE_ADMIN if _is_admin(authorization) else _ROLE_PI + + +def _require_approved(project: ResearchProject) -> None: + if project.status != _STATUS_APPROVED: + raise HTTPException(status_code=409, detail="Chỉ quản lý được dữ liệu sau khi đề tài được phê duyệt.") + + +async def _entity_list(session, cfg: dict[str, Any], project_id: uuid.UUID) -> list[dict[str, Any]]: + model = cfg["model"] + rows = ( + await session.execute( + select(model).where(model.project_id == project_id).order_by(model.sort_order, model.created_at) 
+ ) + ).scalars().all() + return [_entity_to_out(cfg, r) for r in rows] + + +async def _load_entity_or_404(session, cfg: dict[str, Any], project_id: uuid.UUID, item_id: str): + model = cfg["model"] + try: + iid = uuid.UUID(item_id) + except (ValueError, TypeError): + raise HTTPException(status_code=404, detail="Không tìm thấy mục.") + row = ( + await session.execute(select(model).where(model.id == iid, model.project_id == project_id)) + ).scalar_one_or_none() + if row is None: + raise HTTPException(status_code=404, detail="Không tìm thấy mục.") + return row + + +# --- seeding the cockpit from the proposal content (best-effort, on approval) --- +def _seed_members_from_content(content: Any) -> list[dict[str, Any]]: + c = content if isinstance(content, dict) else {} + out: list[dict[str, Any]] = [] + pi_name = str(c.get("chuNhiem.hoTen") or "").strip() + if pi_name: + out.append( + { + "name": pi_name, + "role": "Chủ nhiệm đề tài", + "access": _ROLE_PI, + "org": str(c.get("chuNhiem.tenToChuc") or ""), + "email": str(c.get("chuNhiem.email") or ""), + "status": "Đang hoạt động", + } + ) + sec_name = str(c.get("thuKy.hoTen") or "").strip() + if sec_name: + out.append( + { + "name": sec_name, + "role": "Thư ký khoa học", + "access": "Điều phối / Thư ký", + "org": str(c.get("thuKy.tenToChuc") or ""), + "email": str(c.get("thuKy.email") or ""), + "status": "Đang hoạt động", + } + ) + members_raw = c.get("thanhVienThucHien") + for m in members_raw if isinstance(members_raw, list) else []: + if isinstance(m, dict) and str(m.get("hoTenHocVi") or "").strip(): + out.append( + { + "name": str(m.get("hoTenHocVi")), + "role": str(m.get("chucDanh") or "Thành viên chính"), + "org": str(m.get("toChucCongTac") or ""), + "status": "Đang hoạt động", + } + ) + return out + + +def _seed_milestones_from_content(content: Any) -> list[dict[str, Any]]: + c = content if isinstance(content, dict) else {} + out: list[dict[str, Any]] = [] + timeline_raw = c.get("tienDoThucHien") + for t in 
timeline_raw if isinstance(timeline_raw, list) else []: + if isinstance(t, dict) and str(t.get("noiDungCongViec") or "").strip(): + out.append( + { + "title": str(t.get("noiDungCongViec")), + "deliverable": str(t.get("ketQua") or ""), + "start": str(t.get("thoiGian") or ""), + "owner": str(t.get("caNhanToChuc") or ""), + "budget": t.get("kinhPhi"), + "progress": 0, + "status": "Chưa bắt đầu", + } + ) + return out + + +async def _seed_cockpit_from_proposal(session, project: ResearchProject) -> None: + """On approval, populate members + milestones from the proposal (only when none exist yet).""" + existing = ( + await session.execute( + select(func.count()) + .select_from(ResearchProjectMember) + .where(ResearchProjectMember.project_id == project.id) + ) + ).scalar_one() + if existing: + return + for i, m in enumerate(_seed_members_from_content(project.content)): + row = ResearchProjectMember(id=uuid.uuid4(), project_id=project.id, sort_order=i) + _apply_fields(row, _ENTITY_CONFIG["members"], m) + session.add(row) + for i, t in enumerate(_seed_milestones_from_content(project.content)): + row = ResearchProjectMilestone(id=uuid.uuid4(), project_id=project.id, sort_order=i) + _apply_fields(row, _ENTITY_CONFIG["milestones"], t) + session.add(row) + + +# --- entity endpoints --- +@router.get("/projects/{project_id}/entities/{entity}") +async def list_entity( + project_id: str, entity: str, authorization: Optional[str] = Header(None) +) -> list[dict[str, Any]]: + _require_db() + uid = _require_authed_uid(authorization) + cfg = _entity_cfg_or_404(entity) + async with get_session() as session: + project = await _load_project(session, project_id, uid, _is_admin(authorization)) + return await _entity_list(session, cfg, project.id) + + +@router.post("/projects/{project_id}/entities/{entity}") +async def create_entity( + project_id: str, + entity: str, + payload: Optional[dict[str, Any]] = Body(None), + authorization: Optional[str] = Header(None), +) -> dict[str, Any]: + 
_require_db() + uid = _require_authed_uid(authorization) + cfg = _entity_cfg_or_404(entity) + data = payload if isinstance(payload, dict) else {} + async with get_session() as session: + project = await _load_project(session, project_id, uid, _is_admin(authorization)) + _require_approved(project) + count = ( + await session.execute( + select(func.count()).select_from(cfg["model"]).where(cfg["model"].project_id == project.id) + ) + ).scalar_one() + row = cfg["model"](id=uuid.uuid4(), project_id=project.id, sort_order=int(count or 0)) + _apply_fields(row, cfg, data) + session.add(row) + await session.flush() + subject = str(getattr(row, cfg["primary"]) or "") + await _write_audit( + session, project.id, uid, await _actor_name(session, uid), + _role_label(authorization), f"Thêm {cfg['singular']}", subject, + ) + await session.commit() + await session.refresh(row) + return _entity_to_out(cfg, row) + + +@router.put("/projects/{project_id}/entities/{entity}/{item_id}") +async def update_entity( + project_id: str, + entity: str, + item_id: str, + payload: dict[str, Any] = Body(...), + authorization: Optional[str] = Header(None), +) -> dict[str, Any]: + _require_db() + uid = _require_authed_uid(authorization) + cfg = _entity_cfg_or_404(entity) + async with get_session() as session: + project = await _load_project(session, project_id, uid, _is_admin(authorization)) + _require_approved(project) + row = await _load_entity_or_404(session, cfg, project.id, item_id) + _apply_fields(row, cfg, payload if isinstance(payload, dict) else {}) + row.updated_at = datetime.now(tz=timezone.utc) + await session.flush() + subject = str(getattr(row, cfg["primary"]) or "") + await _write_audit( + session, project.id, uid, await _actor_name(session, uid), + _role_label(authorization), f"Cập nhật {cfg['singular']}", subject, + ) + await session.commit() + await session.refresh(row) + return _entity_to_out(cfg, row) + + +@router.delete("/projects/{project_id}/entities/{entity}/{item_id}") +async 
def delete_entity( + project_id: str, + entity: str, + item_id: str, + authorization: Optional[str] = Header(None), +) -> dict[str, Any]: + _require_db() + uid = _require_authed_uid(authorization) + cfg = _entity_cfg_or_404(entity) + async with get_session() as session: + project = await _load_project(session, project_id, uid, _is_admin(authorization)) + _require_approved(project) + row = await _load_entity_or_404(session, cfg, project.id, item_id) + subject = str(getattr(row, cfg["primary"]) or "") + await session.delete(row) + await _write_audit( + session, project.id, uid, await _actor_name(session, uid), + _role_label(authorization), f"Xóa {cfg['singular']}", subject, + ) + await session.commit() + return {"ok": True} + + +@router.get("/projects/{project_id}/cockpit") +async def get_cockpit(project_id: str, authorization: Optional[str] = Header(None)) -> dict[str, Any]: + """Owner or admin: the whole cockpit in one shot — project + 5 entity lists + recent audit.""" + _require_db() + uid = _require_authed_uid(authorization) + async with get_session() as session: + project = await _load_project(session, project_id, uid, _is_admin(authorization)) + bundle: dict[str, Any] = {"project": _to_out(project).model_dump(mode="json")} + for name, cfg in _ENTITY_CONFIG.items(): + bundle[name] = await _entity_list(session, cfg, project.id) + audit_rows = ( + await session.execute( + select(ResearchProjectAudit) + .where(ResearchProjectAudit.project_id == project.id) + .order_by(ResearchProjectAudit.occurred_at.desc(), ResearchProjectAudit.id.desc()) + .limit(200) + ) + ).scalars().all() + bundle["audit"] = [ + AuditOut( + id=r.id, occurredAt=r.occurred_at, actorName=r.actor_name, roleLabel=r.role_label, + action=r.action, subject=r.subject, detail=r.detail, + ).model_dump(mode="json") + for r in audit_rows + ] + return bundle diff --git a/be0/src/shared_kernel/__init__.py b/be0/src/shared_kernel/__init__.py new file mode 100644 index 0000000..717e9ab --- /dev/null +++ 
b/be0/src/shared_kernel/__init__.py @@ -0,0 +1,5 @@ +"""Shared kernel — framework-free building blocks reused across bounded contexts. + +Nothing here may import FastAPI, SQLAlchemy, or any adapter. Inner layers +(`domain`, `application`) depend on this; it depends on nothing in the project. +""" diff --git a/be0/src/shared_kernel/entity.py b/be0/src/shared_kernel/entity.py new file mode 100644 index 0000000..4caf73b --- /dev/null +++ b/be0/src/shared_kernel/entity.py @@ -0,0 +1,21 @@ +"""Entity / AggregateRoot bases — identity-based equality (not attribute-based).""" + +from __future__ import annotations + +from typing import Any + + +class Entity: + """A domain entity: equality + hash by its ``id``, regardless of other fields.""" + + id: Any + + def __eq__(self, other: object) -> bool: + return isinstance(other, type(self)) and getattr(other, "id", None) == self.id + + def __hash__(self) -> int: + return hash((type(self).__name__, self.id)) + + +class AggregateRoot(Entity): + """The consistency boundary a repository loads and persists as a whole.""" diff --git a/be0/src/shared_kernel/errors.py b/be0/src/shared_kernel/errors.py new file mode 100644 index 0000000..0b3314e --- /dev/null +++ b/be0/src/shared_kernel/errors.py @@ -0,0 +1,40 @@ +"""Domain error hierarchy. The API layer is the ONLY place these map to HTTP status. + +Each carries a user-safe, Vietnamese-ready ``message``. Raise these from the +domain/application layers instead of ``fastapi.HTTPException`` so inner layers stay +framework-free. +""" + +from __future__ import annotations + + +class DomainError(Exception): + """Base for all domain-rule violations.""" + + def __init__(self, message: str) -> None: + super().__init__(message) + self.message = message + + +class ValidationError(DomainError): + """Input violated a domain rule. → HTTP 400.""" + + +class AuthenticationError(DomainError): + """Identity could not be authenticated. 
→ HTTP 401.""" + + +class AuthorizationError(DomainError): + """Authenticated but not permitted / not in the required state. → HTTP 403.""" + + +class RateLimited(DomainError): + """Too many attempts in the rate-limit window. → HTTP 429.""" + + +class NotFoundError(DomainError): + """A required aggregate does not exist. → HTTP 404.""" + + +class ConflictError(DomainError): + """State conflict, e.g. a uniqueness violation. → HTTP 409.""" diff --git a/be0/src/shared_kernel/value_object.py b/be0/src/shared_kernel/value_object.py new file mode 100644 index 0000000..0373c1a --- /dev/null +++ b/be0/src/shared_kernel/value_object.py @@ -0,0 +1,15 @@ +"""Value-object base — immutable, compared by value. + +Subclass with ``@dataclass(frozen=True)`` and add fields; equality/hash come from +the dataclass. Value objects validate their own invariants in ``__post_init__`` or a +``parse`` classmethod and are never mutated after construction. +""" + +from __future__ import annotations + +from dataclasses import dataclass + + +@dataclass(frozen=True) +class ValueObject: + """Marker base for immutable, value-compared objects.""" diff --git a/be0/src/staff_profile_domain.py b/be0/src/staff_profile_domain.py new file mode 100644 index 0000000..5127c10 --- /dev/null +++ b/be0/src/staff_profile_domain.py @@ -0,0 +1,123 @@ +"""Shared validation + DTO helpers for user staff profiles (no FastAPI imports).""" + +from __future__ import annotations + +import re +import uuid +from datetime import datetime, timezone +from typing import Any, Mapping, Optional + +from src.initiative_db.models import User, UserStaffProfile + +EMPLOYEE_ID_PATTERN = re.compile(r"^[A-Z0-9-]{3,32}$") + +STAFF_PROFILE_AUDIT_KEYS: frozenset[str] = frozenset( + { + "employee_id", + "academic_title_code", + "academic_title_other", + "unit_id", + "unit_name_freetext", + "job_title", + "profile_verification_status", + "verification_submitted_at", + "verified_at", + "verified_by_user_id", + "rejection_reason", + "version", + 
} +) + + +def normalize_employee_id(raw: str | None) -> str | None: + if raw is None: + return None + s = str(raw).strip().upper() + return s or None + + +def assert_employee_id_shape(emp: str | None) -> None: + if emp is None: + return + if not EMPLOYEE_ID_PATTERN.match(emp): + raise ValueError("Mã nhân sự không hợp lệ (3–32 ký tự A-Z, 0-9, gạch ngang).") + + +def assert_unit_exclusive(user: User, sp: UserStaffProfile) -> None: + if user.unit_id is not None and sp.unit_name_freetext: + t = sp.unit_name_freetext.strip() + if t: + raise ValueError("Chọn đơn vị trong danh mục hoặc nhập tên tự do — không dùng cùng lúc.") + + +def staff_row_for_audit(sp: UserStaffProfile, user_unit_id: Optional[uuid.UUID]) -> dict[str, Any]: + """Whitelist snapshot for audit_events.before/after (JSONB).""" + out: dict[str, Any] = { + "employee_id": sp.employee_id, + "academic_title_code": sp.academic_title_code, + "academic_title_other": sp.academic_title_other, + "unit_id": str(user_unit_id) if user_unit_id else None, + "unit_name_freetext": sp.unit_name_freetext, + "job_title": sp.job_title, + "profile_verification_status": sp.profile_verification_status, + "verification_submitted_at": _iso(sp.verification_submitted_at), + "verified_at": _iso(sp.verified_at), + "verified_by_user_id": str(sp.verified_by_user_id) if sp.verified_by_user_id else None, + "rejection_reason": sp.rejection_reason, + "version": sp.version, + } + return {k: v for k, v in out.items() if k in STAFF_PROFILE_AUDIT_KEYS} + + +def _iso(dt: datetime | None) -> str | None: + if dt is None: + return None + if dt.tzinfo is None: + return dt.replace(tzinfo=timezone.utc).isoformat() + return dt.isoformat() + + +def material_staff_fields_changed( + before: Mapping[str, Any], + after: Mapping[str, Any], +) -> bool: + keys = ( + "employee_id", + "academic_title_code", + "academic_title_other", + "unit_id", + "unit_name_freetext", + "job_title", + ) + return any(before.get(k) != after.get(k) for k in keys) + + +def 
apply_reverify_from_verified(sp: UserStaffProfile, now: datetime) -> None: + """Strict policy: verified profile returns to pending when institutional fields change.""" + sp.profile_verification_status = "pending" + sp.verification_submitted_at = now + sp.verified_at = None + sp.verified_by_user_id = None + sp.rejection_reason = None + + +def assert_complete_for_submission(user: User, sp: UserStaffProfile) -> None: + emp = normalize_employee_id(sp.employee_id) + if not emp: + raise ValueError("Cần mã số nhân sự trước khi gửi xác minh.") + assert_employee_id_shape(emp) + + if not sp.academic_title_code: + raise ValueError("Chọn học hàm / học vị.") + if sp.academic_title_code == "other": + if not sp.academic_title_other or not str(sp.academic_title_other).strip(): + raise ValueError("Nhập nội dung khi chọn «Khác».") + + has_unit = user.unit_id is not None or ( + sp.unit_name_freetext and len(sp.unit_name_freetext.strip()) > 0 + ) + if not has_unit: + raise ValueError("Chọn đơn vị công tác hoặc nhập tên đơn vị.") + + if not sp.job_title or not str(sp.job_title).strip(): + raise ValueError("Nhập chức vụ công tác.") diff --git a/be0/src/structure_analysis.py b/be0/src/structure_analysis.py new file mode 100644 index 0000000..78ed6b2 --- /dev/null +++ b/be0/src/structure_analysis.py @@ -0,0 +1,59 @@ +from collections import Counter + +from nltk import pos_tag, word_tokenize +from nltk.corpus import stopwords +from nltk.stem import WordNetLemmatizer +from rake_nltk import Rake + +class StructureAnalyzer(object): + def __init__(self, config=None): + self.config = config + self.keywords = [] + self.extractor = Rake(min_length=1, max_length=3) + + + def extract_keywords(self, text): + + # Phrase length is bounded by Rake(min_length, max_length) in __init__.
+ self.extractor.extract_keywords_from_text(text) + keywords = self.extractor.get_ranked_phrases() + + # Optional: POS tagging to keep only nouns and verbs + # pos_tags = pos_tag(keywords) + # keywords = [word for word, pos in pos_tags if pos.startswith('NN') or pos.startswith('VB')] + + return keywords + + def extract_keywords_combined(self, text, top_n=10): + """ + Combined approach: keep nouns, adjectives and verbs (POS filter), lemmatize them, and rank by frequency. + Requires the NLTK tokenizer, POS tagger, stopwords and WordNet data to be downloaded. + """ + try: + lemmatizer = WordNetLemmatizer() + + # Tokenize + tokens = word_tokenize(text.lower()) + + # POS tagging + pos_tags = pos_tag(tokens) + + # Filter for important POS tags and lemmatize + stop_words = set(stopwords.words('english')) + keywords = [ + lemmatizer.lemmatize(word) for word, pos in pos_tags + if (pos.startswith('NN') or pos.startswith('JJ') or pos.startswith('VB')) + and word.isalnum() + and word not in stop_words + and len(word) > 2 # Filter out very short words + ] + + # Count frequency + keyword_freq = Counter(keywords) + + self.keywords = keyword_freq.most_common(top_n) + + return self.keywords + except Exception as e: + print(f"Error in combined extraction: {e}") + return [] \ No newline at end of file diff --git a/be0/src/template_routes.py b/be0/src/template_routes.py new file mode 100644 index 0000000..a94ab24 --- /dev/null +++ b/be0/src/template_routes.py @@ -0,0 +1,402 @@ +"""Admin-managed document templates: upload a .docx, extract its {{placeholders}}, and let +applicants render a filled DOCX/PDF by template id. + +- Storage: MinIO bucket ``s3_bucket_templates`` (the .docx file). +- Fields: Jinja placeholder names extracted from the .docx, persisted as JSONB on the row. +- Render: docxtpl fills the template with submitted values; LibreOffice converts to PDF. + +Mounted under ``/api/v1`` in main.py → routes live at ``/api/v1/templates``.
+""" +from __future__ import annotations + +import io +import re +import uuid +import zipfile +from datetime import datetime, timezone +from typing import Any, Optional + +from fastapi import APIRouter, Body, File, Form, Header, HTTPException, Response, UploadFile +from pydantic import BaseModel, Field +from sqlalchemy import select + +from src.auth_jwt import decode_access_token_user_id, decode_bearer_token +from src.initiative_db.engine import get_session, is_postgres_enabled +from src.initiative_db.models import DocumentTemplate + +router = APIRouter(prefix="/templates", tags=["templates"]) + +_DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document" + + +# --------------------------------------------------------------------------- # +# Auth (mirrors the extracted admin routers) +# --------------------------------------------------------------------------- # +def _jwt_roles(authorization: str | None) -> list[str]: + p = decode_bearer_token(authorization) + if not p: + return [] + r = p.get("roles") + return [str(x) for x in r] if isinstance(r, list) else [] + + +def _require_authed_uid(authorization: str | None) -> uuid.UUID: + uid = decode_access_token_user_id(authorization) + if uid is None: + raise HTTPException(status_code=401, detail="Đăng nhập để thực hiện thao tác.") + return uid + + +def _require_admin_uid(authorization: str | None) -> uuid.UUID: + uid = _require_authed_uid(authorization) + if "admin" not in _jwt_roles(authorization): + raise HTTPException(status_code=403, detail="Chỉ tài khoản quản trị mới thực hiện được.") + return uid + + +def _require_db() -> None: + if not is_postgres_enabled(): + raise HTTPException(status_code=503, detail="Cơ sở dữ liệu chưa sẵn sàng.") + + +# --------------------------------------------------------------------------- # +# Storage (lazy — MinIO config may be absent in some environments) +# --------------------------------------------------------------------------- # +def 
_templates_storage(): + """Return (storage, bucket). Raises 503 if MinIO/S3 is not configured.""" + try: + from src.minio.storage import S3Storage, settings as s3_settings + + bucket = getattr(s3_settings, "s3_bucket_templates", None) or "initiative-templates" + return S3Storage(), bucket + except Exception as exc: # noqa: BLE001 — surface a clean 503 + raise HTTPException(status_code=503, detail=f"Lưu trữ tệp chưa sẵn sàng: {exc}") from exc + + +# --------------------------------------------------------------------------- # +# Placeholder extraction +# --------------------------------------------------------------------------- # +_VAR_RE = re.compile(r"\{\{\s*([a-zA-Z_][\w]*)") + + +def _humanize(key: str) -> str: + words = re.sub(r"[_.]+", " ", key).split() + return " ".join(w[:1].upper() + w[1:] for w in words) if words else key + + +def _regex_extract(docx_bytes: bytes) -> set[str]: + """Fallback: strip XML tags (placeholders may split across runs) then scan for {{ var }}.""" + found: set[str] = set() + try: + with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z: + for name in z.namelist(): + if not name.endswith(".xml"): + continue + try: + flat = re.sub(r"<[^>]+>", "", z.read(name).decode("utf-8", "ignore")) + except Exception: # noqa: BLE001 + continue + for m in _VAR_RE.finditer(flat): + found.add(m.group(1)) + except (zipfile.BadZipFile, OSError): + pass + return found + + +def _extract_fields(docx_bytes: bytes) -> list[dict[str, str]]: + keys: set[str] = set() + try: + from docxtpl import DocxTemplate + + tpl = DocxTemplate(io.BytesIO(docx_bytes)) + keys = {str(v) for v in tpl.get_undeclared_template_variables()} + except Exception: # noqa: BLE001 — jinja control tags can trip the parser; fall back + keys = set() + if not keys: + keys = _regex_extract(docx_bytes) + roots = sorted({k.split(".")[0].split("[")[0].strip() for k in keys if k and k.strip()}) + return [{"key": k, "label": _humanize(k), "type": "text"} for k in roots] + + +def _safe_filename(name: 
str) -> str: + base = (name or "template.docx").strip().replace("\\", "/").split("/")[-1] + base = re.sub(r"[^A-Za-z0-9._-]+", "_", base) or "template.docx" + return base if base.lower().endswith(".docx") else f"{base}.docx" + + +# --------------------------------------------------------------------------- # +# Schemas +# --------------------------------------------------------------------------- # +class TemplateField(BaseModel): + key: str + label: str + type: str = "text" + + +class TemplateOut(BaseModel): + id: str + name: str + description: Optional[str] = None + fields: list[TemplateField] + originalFilename: Optional[str] = None + isActive: bool + createdAt: Optional[datetime] = None + updatedAt: Optional[datetime] = None + + +class TemplateUpdateIn(BaseModel): + name: Optional[str] = Field(default=None, max_length=300) + description: Optional[str] = None + isActive: Optional[bool] = None + + +def _to_out(row: DocumentTemplate) -> TemplateOut: + raw_fields = row.fields if isinstance(row.fields, list) else [] + return TemplateOut( + id=str(row.id), + name=row.name, + description=row.description, + fields=[TemplateField(**f) for f in raw_fields if isinstance(f, dict) and f.get("key")], + originalFilename=row.original_filename, + isActive=bool(row.is_active), + createdAt=row.created_at, + updatedAt=row.updated_at, + ) + + +# --------------------------------------------------------------------------- # +# Endpoints +# --------------------------------------------------------------------------- # +@router.post("", response_model=TemplateOut) +async def create_template( + name: str = Form(..., max_length=300), + description: str = Form(""), + file: UploadFile = File(...), + authorization: Optional[str] = Header(None), +) -> TemplateOut: + """Admin: upload a .docx template; extract its placeholder fields; store in MinIO.""" + _require_db() + admin_uid = _require_admin_uid(authorization) + if not (name or "").strip(): + raise HTTPException(status_code=422, 
detail="Tên mẫu không được để trống.") + + data = await file.read() + if len(data) < 100: + raise HTTPException(status_code=422, detail="Tệp .docx rỗng hoặc quá nhỏ.") + ctype = (file.content_type or "").strip() + fname = file.filename or "template.docx" + if ctype != _DOCX_MIME and not fname.lower().endswith(".docx"): + raise HTTPException(status_code=422, detail="Chỉ chấp nhận tệp .docx (Word).") + + fields = _extract_fields(data) + template_id = uuid.uuid4() + safe = _safe_filename(fname) + key = f"{template_id}/{safe}" + + storage, bucket = _templates_storage() + try: + result = await storage.upload(bucket, key, io.BytesIO(data), _DOCX_MIME) + except ValueError as exc: + raise HTTPException(status_code=422, detail=str(exc)) from exc + except Exception as exc: # noqa: BLE001 + raise HTTPException(status_code=502, detail=f"Tải tệp lên thất bại: {exc}") from exc + + async with get_session() as session: + row = DocumentTemplate( + id=template_id, + name=name.strip(), + description=(description or "").strip() or None, + storage_key=key, + original_filename=safe, + content_sha256=result.get("sha256"), + fields=fields, + is_active=True, + created_by=admin_uid, + ) + session.add(row) + await session.commit() + await session.refresh(row) + return _to_out(row) + + +@router.get("", response_model=list[TemplateOut]) +async def list_templates(authorization: Optional[str] = Header(None)) -> list[TemplateOut]: + """Authed: list templates. 
Non-admin sees active only; admin sees all.""" + _require_db() + _require_authed_uid(authorization) + is_admin = "admin" in _jwt_roles(authorization) + async with get_session() as session: + stmt = select(DocumentTemplate).order_by(DocumentTemplate.created_at.desc()) + if not is_admin: + stmt = stmt.where(DocumentTemplate.is_active.is_(True)) + rows = (await session.execute(stmt)).scalars().all() + return [_to_out(r) for r in rows] + + +async def _get_row_or_404(session, template_id: str, *, is_admin: bool) -> DocumentTemplate: + try: + tid = uuid.UUID(template_id) + except (ValueError, TypeError): + raise HTTPException(status_code=404, detail="Không tìm thấy mẫu.") + row = ( + await session.execute(select(DocumentTemplate).where(DocumentTemplate.id == tid)) + ).scalar_one_or_none() + if row is None or (not is_admin and not row.is_active): + raise HTTPException(status_code=404, detail="Không tìm thấy mẫu.") + return row + + +@router.get("/{template_id}", response_model=TemplateOut) +async def get_template(template_id: str, authorization: Optional[str] = Header(None)) -> TemplateOut: + _require_db() + _require_authed_uid(authorization) + is_admin = "admin" in _jwt_roles(authorization) + async with get_session() as session: + return _to_out(await _get_row_or_404(session, template_id, is_admin=is_admin)) + + +@router.get("/{template_id}/file") +async def download_template_file(template_id: str, authorization: Optional[str] = Header(None)) -> Response: + """Admin: download the raw .docx (for preview/editing in the admin UI).""" + _require_db() + _require_admin_uid(authorization) + async with get_session() as session: + row = await _get_row_or_404(session, template_id, is_admin=True) + key, fname = row.storage_key, row.original_filename or "template.docx" + storage, bucket = _templates_storage() + chunks: list[bytes] = [] + try: + async for chunk in storage.download_stream(bucket, key): + chunks.append(chunk) + except FileNotFoundError: + raise 
HTTPException(status_code=404, detail="Tệp mẫu không tồn tại trong kho.")
+    except Exception as exc:  # noqa: BLE001
+        raise HTTPException(status_code=502, detail=f"Không tải được tệp mẫu: {exc}") from exc
+    return Response(
+        content=b"".join(chunks),
+        media_type=_DOCX_MIME,
+        headers={"Content-Disposition": f'attachment; filename="{fname}"'},
+    )
+
+
+@router.put("/{template_id}", response_model=TemplateOut)
+async def update_template(
+    template_id: str,
+    payload: TemplateUpdateIn = Body(...),
+    authorization: Optional[str] = Header(None),
+) -> TemplateOut:
+    """Admin: update metadata (name / description / active). File replace is a separate step."""
+    _require_db()
+    _require_admin_uid(authorization)
+    async with get_session() as session:
+        row = await _get_row_or_404(session, template_id, is_admin=True)
+        if payload.name is not None:
+            if not payload.name.strip():
+                raise HTTPException(status_code=422, detail="Tên mẫu không được để trống.")
+            row.name = payload.name.strip()
+        if payload.description is not None:
+            row.description = payload.description.strip() or None
+        if payload.isActive is not None:
+            row.is_active = payload.isActive
+        row.updated_at = datetime.now(tz=timezone.utc)
+        await session.commit()
+        await session.refresh(row)
+        return _to_out(row)
+
+
+@router.delete("/{template_id}")
+async def delete_template(
+    template_id: str,
+    hard: bool = False,
+    authorization: Optional[str] = Header(None),
+) -> dict[str, Any]:
+    """Admin: soft-delete (is_active=false) by default; ?hard=true removes the row + MinIO object."""
+    _require_db()
+    _require_admin_uid(authorization)
+    async with get_session() as session:
+        row = await _get_row_or_404(session, template_id, is_admin=True)
+        key = row.storage_key
+        if hard:
+            await session.delete(row)
+        else:
+            row.is_active = False
+            row.updated_at = datetime.now(tz=timezone.utc)
+        await session.commit()
+    if hard:
+        try:
+            storage, bucket = _templates_storage()
+            async with storage._client() as s3:  # noqa: SLF001 — single-object delete
+                await s3.delete_object(Bucket=bucket, Key=key)
+        except Exception:  # noqa: BLE001 — row already gone; object cleanup is best-effort
+            pass
+    return {"ok": True, "hardDeleted": hard}
+
+
+class RenderIn(BaseModel):
+    values: dict[str, Any] = Field(default_factory=dict)
+    format: str = "pdf"  # "pdf" | "docx"
+
+
+@router.post("/{template_id}/render")
+async def render_template(
+    template_id: str,
+    payload: RenderIn = Body(...),
+    authorization: Optional[str] = Header(None),
+) -> Response:
+    """Authed: fill the template with `values` → DOCX (docxtpl), optionally PDF (LibreOffice)."""
+    _require_db()
+    _require_authed_uid(authorization)
+    fmt = (payload.format or "pdf").lower()
+    if fmt not in ("pdf", "docx"):
+        raise HTTPException(status_code=422, detail="format phải là 'pdf' hoặc 'docx'.")
+
+    async with get_session() as session:
+        row = await _get_row_or_404(session, template_id, is_admin="admin" in _jwt_roles(authorization))
+        key, base = row.storage_key, (row.original_filename or "document.docx").rsplit(".", 1)[0]
+
+    storage, bucket = _templates_storage()
+    chunks: list[bytes] = []
+    try:
+        async for chunk in storage.download_stream(bucket, key):
+            chunks.append(chunk)
+    except FileNotFoundError:
+        raise HTTPException(status_code=404, detail="Tệp mẫu không tồn tại trong kho.")
+    except Exception as exc:  # noqa: BLE001
+        raise HTTPException(status_code=502, detail=f"Không tải được tệp mẫu: {exc}") from exc
+    template_bytes = b"".join(chunks)
+
+    try:
+        from docxtpl import DocxTemplate
+        from jinja2.exceptions import TemplateError
+
+        doc = DocxTemplate(io.BytesIO(template_bytes))
+        try:
+            doc.render(payload.values or {})
+        except TemplateError as exc:
+            raise HTTPException(status_code=400, detail=f"Mẫu có cú pháp không hợp lệ: {exc}") from exc
+        out = io.BytesIO()
+        doc.docx.save(out)
+        docx_bytes = out.getvalue()
+    except HTTPException:
+        raise
+    except Exception as exc:  # noqa: BLE001
+        raise HTTPException(status_code=500, detail=f"Điền mẫu thất bại: {exc}") from exc
+
+    if fmt == "docx":
+        return Response(
+            content=docx_bytes,
+            media_type=_DOCX_MIME,
+            headers={"Content-Disposition": f'attachment; filename="{base}.docx"'},
+        )
+
+    try:
+        from src.be01.docx_to_pdf import convert_docx_bytes_to_pdf
+
+        pdf_bytes = convert_docx_bytes_to_pdf(docx_bytes)
+    except Exception as exc:  # noqa: BLE001 — LibreOffice missing / convert failure
+        raise HTTPException(status_code=502, detail=f"Chuyển PDF thất bại: {exc}") from exc
+    return Response(
+        content=pdf_bytes,
+        media_type="application/pdf",
+        headers={"Content-Disposition": f'inline; filename="{base}.pdf"'},
+    )
diff --git a/be0/src/test/__init__.py b/be0/src/test/__init__.py
new file mode 100644
index 0000000..9645e6d
--- /dev/null
+++ b/be0/src/test/__init__.py
@@ -0,0 +1 @@
+# Test helpers and fixtures for be0 (e.g. DOCX fill smoke tests).
diff --git a/be0/src/test/__pycache__/__init__.cpython-313.pyc b/be0/src/test/__pycache__/__init__.cpython-313.pyc
new file mode 100644
index 0000000..7c3c94d
Binary files /dev/null and b/be0/src/test/__pycache__/__init__.cpython-313.pyc differ
diff --git a/be0/src/test/__pycache__/test_docx_pseudo_fill.cpython-313-pytest-8.3.4.pyc b/be0/src/test/__pycache__/test_docx_pseudo_fill.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..6110dae
Binary files /dev/null and b/be0/src/test/__pycache__/test_docx_pseudo_fill.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/src/test/__pycache__/test_docx_pseudo_fill.cpython-313.pyc b/be0/src/test/__pycache__/test_docx_pseudo_fill.cpython-313.pyc
new file mode 100644
index 0000000..154b695
Binary files /dev/null and b/be0/src/test/__pycache__/test_docx_pseudo_fill.cpython-313.pyc differ
diff --git a/be0/src/test/pseudo_data_blank.json b/be0/src/test/pseudo_data_blank.json
new file mode 100644
index 0000000..efb4863
--- /dev/null
+++ b/be0/src/test/pseudo_data_blank.json
@@ -0,0 +1,107 @@
+{
+  "trang_bia": {
+    "ten_sang_kien": "[TEST] Quy trình số hóa hồ sơ sáng kiến — bộ dữ liệu tự động",
+    "tac_gia": "[TEST] Nguyễn Văn An; [TEST] Trần Thị Bình",
+    "don_vi": "[TEST] Khoa Điều dưỡng — ĐHYD TP.HCM",
+    "thong_tin_lien_he": "[TEST] 028-1234-5678 · test.applicant@example.edu.vn",
+    "nam": "2026"
+  },
+  "mau_01": {
+    "mo_dau": "[TEST] Đơn vị đang quản lý hồ sơ thủ công, dễ trễ hạn và khó tra cứu.",
+    "ten_sang_kien": "[TEST] Quy trình số hóa hồ sơ sáng kiến — bộ dữ liệu tự động",
+    "linh_vuc_ap_dung": "[TEST] Quản trị văn phòng / CNTT phục vụ đào tạo",
+    "tinh_trang_da_biet": "[TEST] Hiện dùng biểu mẫu Word rời, email và file PDF tách biệt.",
+    "muc_dich": "[TEST] Chuẩn hóa một luồng nộp — duyệt — lưu trữ.",
+    "cac_buoc_thuc_hien": "[TEST] (1) Nhập liệu (2) Kiểm tra (3) Ký số (4) Lưu kho.",
+    "dieu_kien_ap_dung": "[TEST] Máy tính kết nối mạng nội bộ; tài khoản SSO.",
+    "linh_vuc_ap_dung_2": "[TEST] Áp dụng nội bộ trường và đơn vị trực thuộc.",
+    "ket_qua_thu_duoc": "[TEST] Bản demo điền form đầy đủ trong < 20 phút.",
+    "danh_sach_ap_dung": [
+      {"tt": "1", "ten_to_chuc": "[TEST] Phòng Đào tạo", "dia_chi": "[TEST] Cơ sở 1", "linh_vuc": "[TEST] Hành chính"},
+      {"tt": "2", "ten_to_chuc": "[TEST] Khoa X", "dia_chi": "[TEST] Cơ sở 2", "linh_vuc": "[TEST] Lâm sàng"}
+    ],
+    "tinh_moi": "[TEST] Lần đầu gộp ba biểu mẫu vào một luồng nháp tự lưu.",
+    "tinh_hieu_qua": {
+      "loi_ich_kinh_te": "[TEST] Tiết kiệm ~15% thời gian xử lý hồ sơ.",
+      "hieu_qua_giang_day": "[TEST] Minh họa trực quan hơn cho sinh viên.",
+      "tang_nang_suat": "[TEST] Giảm thao tác lặp khi nhập liệu.",
+      "nang_cao_hieu_qua": "[TEST] Thống nhất quy trình giữa các đơn vị.",
+      "nang_cao_chat_luong": "[TEST] Hồ sơ đầy đủ, ít sai sót hơn.",
+      "giam_chi_phi": "[TEST] Giảm in ấn và lưu trữ giấy.",
+      "cai_thien_moi_truong": "[TEST] Ưu tiên hồ sơ điện tử.",
+      "bao_ve_suc_khoe": "[TEST] Hạn chế tiếp xúc hồ sơ giấy tại quầy.",
+      "an_toan_lao_dong": "[TEST] Quy trình rõ ràng, giảm nhầm lẫn.",
+      "nang_cao_nhan_thuc": "[TEST] Tăng hiểu biết về quy định nộp sáng kiến."
+    },
+    "thong_tin_bao_mat": "[TEST] Không có thông tin mật trong bản thử nghiệm.",
+    "ngay_ky": {"ngay": "18", "thang": "6", "nam": "2026"},
+    "lanh_dao_don_vi": "",
+    "tac_gia_sang_kien": "[TEST] Nguyễn Văn An"
+  },
+  "mau_02": {
+    "don_vi": "[TEST] Khoa Điều dưỡng — ĐHYD TP.HCM",
+    "danh_sach_tac_gia": [
+      {"stt": "1", "ho_ten": "[TEST] Nguyễn Văn An", "ngay_sinh": "15/05/1988", "noi_cong_tac": "[TEST] Khoa Điều dưỡng", "chuc_danh": "[TEST] Giảng viên", "trinh_do": "[TEST] Thạc sĩ", "ty_le": "60"},
+      {"stt": "2", "ho_ten": "[TEST] Trần Thị Bình", "ngay_sinh": "22/11/1990", "noi_cong_tac": "[TEST] Phòng NCKH", "chuc_danh": "[TEST] Cán bộ", "trinh_do": "[TEST] Cử nhân", "ty_le": "40"}
+    ],
+    "ten_sang_kien": "[TEST] Quy trình số hóa hồ sơ sáng kiến — bộ dữ liệu tự động",
+    "chu_dau_tu": "[TEST] Đại học Y Dược TP.HCM",
+    "linh_vuc_ap_dung": "[TEST] Quản trị văn phòng / CNTT phục vụ đào tạo",
+    "ngay_ap_dung": "10/09/2025",
+    "noi_dung": "[TEST] Mô tả ngắn: mẫu điền tự động kiểm tra textarea Đơn §4.",
+    "phan_loai": {
+      "giai_phap_ky_thuat": true,
+      "sang_kien_tu_nckh": false,
+      "sang_kien_tu_sach": false
+    },
+    "thong_tin_bao_mat": "[TEST] Không có thông tin mật trong bản thử nghiệm.",
+    "dieu_kien_ap_dung": "[TEST] Thiết bị đọc PDF; trình duyệt hiện đại.",
+    "danh_gia_tac_gia": "[TEST] Sáng kiến giúp giảm công sức nhập liệu.",
+    "danh_gia_to_chuc": "[TEST] Đơn vị thử nghiệm phản hồi tích cực.",
+    "danh_sach_tham_gia": [
+      {"stt": "1", "ho_ten": "[TEST] Lê Văn Cường", "ngay_sinh": "01/01/1992", "noi_cong_tac": "[TEST] IT", "chuc_danh": "[TEST] Kỹ thuật", "trinh_do": "[TEST] Kỹ sư", "noi_dung_ho_tro": "[TEST] Hỗ trợ triển khai pilot."}
+    ],
+    "ngay_ky": {"ngay": "18", "thang": "6", "nam": "2026"},
+    "lanh_dao_don_vi": "",
+    "tac_gia_sang_kien": "[TEST] Nguyễn Văn An"
+  },
+  "mau_03": {
+    "ngay_ky": {"ngay": "20", "thang": "6", "nam": "2026"},
+    "ten_sang_kien": "[TEST] Quy trình số hóa hồ sơ sáng kiến — bộ dữ liệu tự động",
+    "tac_gia_chinh": "[TEST] Nguyễn Văn An",
+    "chuc_vu_don_vi": "[TEST] Giảng viên — Khoa Điều dưỡng",
+    "ty_le_dong_gop": [
+      {"stt": "1", "ho_ten": "[TEST] Nguyễn Văn An", "don_vi": "[TEST] Khoa Điều dưỡng", "phan_tram": "60", "chu_ky": ""},
+      {"stt": "2", "ho_ten": "[TEST] Trần Thị Bình", "don_vi": "[TEST] Phòng NCKH", "phan_tram": "40", "chu_ky": ""}
+    ],
+    "tac_gia_chinh_ky": "[TEST] Nguyễn Văn An"
+  },
+  "mau_04": {
+    "ten_sang_kien": "[TEST] Quy trình số hóa hồ sơ sáng kiến — bộ dữ liệu tự động",
+    "tac_gia": "[TEST] Nguyễn Văn An",
+    "chuc_vu_don_vi": "[TEST] Giảng viên — Khoa Điều dưỡng",
+    "tinh_moi": {"nhan_xet": "[TEST] Có tính mới ở mức khá trong đơn vị.", "diem": "28"},
+    "tinh_hieu_qua": {"nhan_xet": "[TEST] Hiệu quả rõ trên phạm vi khoa.", "diem": "42"},
+    "tong_cong": "70",
+    "ket_luan": "[TEST] Đạt mức khá — đủ điều kiện xét tiếp (dữ liệu thử).",
+    "ngay_ky": {"ngay": "25", "thang": "6", "nam": "2026"},
+    "thanh_vien_hoi_dong": "[TEST] TS. A, PGS.TS. B, TS. C (thử nghiệm form)."
+  },
+  "ban_cam_ket": {
+    "ngay_ky": {"ngay": "25", "thang": "6", "nam": "2026"},
+    "tac_gia_dang_ky": "[TEST] Nguyễn Văn An",
+    "cccd": "[TEST] 079088012345",
+    "don_vi": "[TEST] Khoa Điều dưỡng",
+    "ten_bai_bao": "[TEST] Bài báo minh họa (test)",
+    "nam_xet": "2026",
+    "vai_tro": {"tac_gia_chinh": true, "dong_tac_gia": false},
+    "cam_ket": {
+      "quyen_so_huu_1": true,
+      "quyen_so_huu_2": true,
+      "dong_thuan": true,
+      "bai_bao_uy_tin": true,
+      "tuan_thu_phap_luat": true
+    },
+    "nguoi_cam_ket": "[TEST] Nguyễn Văn An"
+  }
+}
diff --git a/be0/src/test/test_docx_pseudo_fill.py b/be0/src/test/test_docx_pseudo_fill.py
new file mode 100644
index 0000000..f930e11
--- /dev/null
+++ b/be0/src/test/test_docx_pseudo_fill.py
@@ -0,0 +1,51 @@
+"""
+Smoke test: render `template_application_form.docx` with pseudo `data_blank.json`-shaped context.
+
+Run from repo root or be0:
+    cd be0 && python -m unittest src.test.test_docx_pseudo_fill -v
+
+Skips when docxtpl is missing or the Word template path is not on disk (e.g. CI without fe0 mount).
+"""
+
+from __future__ import annotations
+
+import json
+import unittest
+from pathlib import Path
+
+_JSON = Path(__file__).resolve().parent / "pseudo_data_blank.json"
+
+
+class DocxPseudoFillTests(unittest.TestCase):
+    def test_pseudo_json_loads_and_matches_blank_shape(self) -> None:
+        ctx = json.loads(_JSON.read_text(encoding="utf-8"))
+        self.assertIn("trang_bia", ctx)
+        self.assertIn("mau_01", ctx)
+        self.assertIn("mau_02", ctx)
+        self.assertIn("mau_03", ctx)
+        self.assertIn("mau_04", ctx)
+        self.assertIn("ban_cam_ket", ctx)
+        self.assertTrue(str(ctx["trang_bia"]["ten_sang_kien"]).startswith("[TEST]"))
+
+    def test_docx_render_returns_bytes(self) -> None:
+        try:
+            from src.be01.fill_application_form import fill_application_form_docx
+        except ImportError as e:
+            self.skipTest(f"be01 import failed: {e}")
+
+        ctx = json.loads(_JSON.read_text(encoding="utf-8"))
+        try:
+            out = fill_application_form_docx(ctx)
+        except FileNotFoundError as e:
+            self.skipTest(str(e))
+        except ModuleNotFoundError as e:
+            if "docxtpl" in str(e).lower():
+                self.skipTest(str(e))
+            raise
+
+        self.assertIsInstance(out, (bytes, bytearray))
+        self.assertGreater(len(out), 4000)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/be0/src/utils.py b/be0/src/utils.py
new file mode 100644
index 0000000..343d686
--- /dev/null
+++ b/be0/src/utils.py
@@ -0,0 +1,411 @@
+import base64
+import json
+import logging
+import os
+import tempfile
+from pathlib import Path
+from typing import Dict, List, Tuple, Optional, Union
+
+import fitz  # PyMuPDF
+import ollama  # Ollama Python client
+import requests
+from pypdf import PdfReader  # assumed dependency; PyPDF2 exports the same PdfReader class
+
+logger = logging.getLogger(__name__)
+# Shared Ollama client used by the helpers below (reads OLLAMA_HOST, else the local default).
+client = ollama.Client()
+
+class Section:
+    """Represents a document section with type, content, and bounding box."""
+
+    def __init__(self, section_type: str, content: str, bbox: Tuple[float, float, float, float]):
+        self.type = section_type
+        self.content = content
+        self.bbox = bbox
+
+    def to_dict(self) -> Dict:
+        return {
+            'type': self.type,
+            'content': self.content,
+            'bbox': self.bbox
+        }
+
+def get_available_models() -> List[str]:
+    """
+    Get a list of available models from Ollama.
+
+    Returns:
+        List[str]: A list of available model names.
+    """
+    try:
+        models_response = client.list()
+        logger.info(f"Ollama models response: {models_response}")
+
+        if "models" in models_response and isinstance(models_response["models"], list):
+            model_names = [model["model"] for model in models_response["models"]]
+            return model_names
+        else:
+            logger.warning("No models found in Ollama response")
+            return []
+    except Exception as e:
+        logger.error(f"Error fetching Ollama models: {e}")
+        return []
+
+def _extract_text_blocks(page: fitz.Page, show_text: bool) -> List[Section]:
+    """Extract text blocks from a PDF page."""
+    sections = []
+    blocks = page.get_text("blocks")
+
+    for block in blocks:
+        if block[6] == 0:  # Text block
+            rect = fitz.Rect(block[:4])
+            content = block[4] if block[4] is not None else ''
+
+            if content.strip():  # Only add non-empty text
+                section = Section('text', content, (rect.x0, rect.y0, rect.x1, rect.y1))
+                sections.append(section)
+
+                if show_text:
+                    page.draw_rect(rect, color=(1, 0, 0), width=1.5)
+
+    return sections
+
+def _extract_image_data(page: fitz.Page, img_index: int) -> Optional[bytes]:
+    """Extract image data as bytes from PDF page."""
+    try:
+        img_list = page.get_images(full=True)
+        if img_index < len(img_list):
+            img_info = img_list[img_index]
+            xref = img_info[0]
+            pix = fitz.Pixmap(page.parent, xref)
+
+            # PNG output needs gray or RGB; convert CMYK (4 colour channels) to RGB first
+            if pix.n - pix.alpha < 4:
+                img_data = pix.tobytes("png")
+            else:
+                pix1 = fitz.Pixmap(fitz.csRGB, pix)
+                img_data = pix1.tobytes("png")
+                pix1 = None
+
+            pix = None
+            return img_data
+    except Exception as e:
+        logger.warning(f"Failed to extract image data: {e}")
+        return None
+
+def _process_image_with_ai(img_data: bytes, model: str = "qwen2.5vl:7b") -> str:
+    """Process image with AI to extract text or analyze diagrams."""
+    try:
+        # Encode image to base64
+        img_base64 = base64.b64encode(img_data).decode()
+
+        # Create prompt for image analysis
+        prompt = """Extract all text from this image exactly as it appears. Do not analyze, describe, or interpret the image. Only output the readable text you see, preserving the original formatting and layout as much as possible. If there is no text, respond with "[No text found]"."""
+
+        # Send to Ollama with image
+        response = client.chat(
+            model=model,
+            messages=[
+                {
+                    'role': 'user',
+                    'content': prompt,
+                    'images': [img_base64]
+                }
+            ]
+        )
+
+        if response and 'message' in response and 'content' in response['message']:
+            return response['message']['content']
+        else:
+            logger.error("Invalid response format from Ollama vision model")
+            return "[Image: Could not analyze]"
+
+    except Exception as e:
+        logger.error(f"Error processing image with AI: {e}")
+        return "[Image: Analysis failed]"
+
+def _extract_images(page: fitz.Page, show_images: bool, process_with_ai: bool = False) -> List[Section]:
+    """Extract images from a PDF page and optionally process with AI."""
+    sections = []
+
+    img_list = page.get_images(full=True)
+    for img_index, img in enumerate(img_list):
+        try:
+            bbox = page.get_image_bbox(img)
+            if bbox:
+                content = "[Image]"
+
+                # Process image with AI if enabled
+                if process_with_ai:
+                    logger.info(f"Processing image {img_index + 1} with AI...")
+                    img_data = _extract_image_data(page, img_index)
+                    if img_data:
+                        ai_analysis = _process_image_with_ai(img_data)
+                        content = f"**[Image Analysis]**\n\n{ai_analysis}"
+
+                section = Section('image', content, (bbox.x0, bbox.y0, bbox.x1, bbox.y1))
+                sections.append(section)
+
+                if show_images:
+                    page.draw_rect(bbox, color=(0, 0, 1), width=1.5)
+        except Exception as e:
+            logger.warning(f"Failed to extract image: {e}")
+            continue
+
+    return sections
+
+# Removed table extraction functions - focusing on text and images only
+
+def process_pdf(pdf_path: str, show_text: bool, show_images: bool) -> Tuple[fitz.Document, Dict[int, List[Dict]]]:
+    """
+    Process the PDF to extract sections (text and images) in reading order.
+
+    Args:
+        pdf_path (str): Path to the PDF file.
+        show_text (bool): Whether to draw borders around text blocks.
+        show_images (bool): Whether to draw borders around images.
+
+    Returns:
+        Tuple[fitz.Document, Dict[int, List[Dict]]]: The processed PDF document and a dictionary of page sections.
+    """
+    try:
+        doc = fitz.open(pdf_path)
+        page_sections: Dict[int, List[Dict]] = {}
+
+        for page_num in range(len(doc)):
+            page = doc.load_page(page_num)
+
+            # Extract different types of content
+            text_sections = _extract_text_blocks(page, show_text)
+            image_sections = _extract_images(page, show_images, process_with_ai=False)
+
+            # Combine all sections
+            all_sections = text_sections + image_sections
+
+            # Sort sections by reading order (top to bottom, left to right)
+            all_sections.sort(key=lambda s: (s.bbox[1], s.bbox[0]))
+
+            # Convert to dictionary format
+            page_sections[page_num] = [section.to_dict() for section in all_sections]
+
+            logger.info(f"Page {page_num + 1}: Found {len(text_sections)} text, "
+                        f"{len(image_sections)} image sections")
+
+        return doc, page_sections
+
+    except Exception as e:
+        logger.error(f"Error processing PDF: {e}")
+        raise
+
+def get_page_images(doc: fitz.Document, zoom_level: float = 1.5) -> Dict[int, str]:
+    """
+    Render each page of the PDF as a base64-encoded PNG image.
+
+    Args:
+        doc (fitz.Document): The PDF document.
+        zoom_level (float): The zoom level for rendering (affects image size).
+
+    Returns:
+        Dict[int, str]: A dictionary of page images as base64 strings.
+    """
+    page_images = {}
+
+    try:
+        for page_num in range(len(doc)):
+            page = doc.load_page(page_num)
+
+            # Create transformation matrix for zoom
+            matrix = fitz.Matrix(zoom_level, zoom_level)
+            pix = page.get_pixmap(matrix=matrix)
+
+            # Convert to PNG bytes and encode to base64
+            img_bytes = pix.tobytes("png")
+            img_base64 = base64.b64encode(img_bytes).decode()
+            page_images[page_num] = img_base64
+
+            logger.debug(f"Generated preview image for page {page_num + 1}")
+
+    except Exception as e:
+        logger.error(f"Error generating page images: {e}")
+
+    return page_images
+
+def save_and_encode_pdf(doc: fitz.Document) -> Tuple[bytes, str]:
+    """
+    Save the modified PDF to a temporary file and encode it to base64.
+
+    Args:
+        doc (fitz.Document): The PDF document to save and encode.
+
+    Returns:
+        Tuple[bytes, str]: The raw PDF bytes and the base64-encoded string.
+    """
+    try:
+        # Create temporary file
+        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as temp_file:
+            modified_pdf_path = temp_file.name
+
+        # Save document
+        doc.save(modified_pdf_path)
+        doc.close()
+
+        # Read and encode
+        with open(modified_pdf_path, "rb") as f:
+            modified_pdf_bytes = f.read()
+
+        # Clean up temporary file
+        try:
+            os.unlink(modified_pdf_path)
+        except OSError:
+            logger.warning(f"Could not delete temporary file: {modified_pdf_path}")
+
+        base64_encoded = base64.b64encode(modified_pdf_bytes).decode("utf-8")
+        logger.info("PDF successfully saved and encoded")
+
+        return modified_pdf_bytes, base64_encoded
+
+    except Exception as e:
+        logger.error(f"Error saving and encoding PDF: {e}")
+        raise
+
+def format_text_with_llama(text: str, model: str) -> str:
+    """
+    Use the selected Ollama model to format the text into Markdown.
+
+    Args:
+        text (str): The text to format.
+        model (str): The model to use for formatting.
+
+    Returns:
+        str: The formatted Markdown text.
+    """
+    if not text.strip():
+        logger.warning("Empty text provided for formatting")
+        return text
+
+    try:
+        logger.info(f"Formatting text with model: {model}")
+        response = client.generate(
+            model=model,
+            prompt=text
+        )
+
+        if "response" in response:
+            return response["response"]
+        else:
+            logger.error("Invalid response format from Ollama")
+            return text
+
+    except Exception as e:
+        logger.error(f"Error formatting text with Ollama: {e}")
+        return text  # Return the original text if formatting fails
+
+def initialize_a_logger(logger_name: str = "./logs/ChatBot.log"):
+    logger = logging.getLogger('example_logger')
+
+    # Set the logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
+    logger.setLevel(logging.DEBUG)
+
+    log_path = Path(logger_name).expanduser().resolve()
+    log_path.parent.mkdir(parents=True, exist_ok=True)
+    # Use the same resolved path as mkdir — relative paths + cwd quirks broke FileHandler in Docker.
+    fh = logging.FileHandler(str(log_path), mode="w")
+    logger.addHandler(fh)
+    return logger
+
+def build_prompt(text: str) -> str:
+
+
+    return f"""
+    As an expert auditor, read the text line by line, analyze semantic chunks,
+    and infer the compliance rules in the text thoroughly.
+    Return only valid JSON. If information is missing, leave fields empty.
+
+    Input Text:
+    {text}
+
+    LLM Tasks:
+    - Parse text as semantic chunks, analyze input and the Type descriptions and concisely extract compliance rules based on the format.
+    1. Ubiquitous Requirements. Description: No pre-condition, Always active
+        Format:
+        Example: "The Bank SHALL maintain audit trails for all change requests."
+
+    2. Event-driven Requirements. Description: Begin with WHEN, Triggered by specific events
+        Format: WHEN
+        Example: "WHEN acquiring software packages, the Bank SHALL perform a detailed evaluation to ensure user and business requirements are met."
+
+    3. Unwanted Behavior Requirements. Description: Begin with IF...THEN, Express undesirable situations to be handled
+        Format: IF THEN
+        Example: "IF a supplier's financial stability is in doubt, THEN the Bank SHALL have alternatives to mitigate potential loss of service."
+
+    4. State-driven Requirements. Description: Begin with WHILE, Active during specific system states
+        Format: WHILE
+        Example: "WHILE systems are in production, the Bank SHALL maintain hardware and software requirements at recovery locations."
+
+    5. Optional Feature Requirements. Description: Begin with WHERE, Apply when certain features are enabled/present
+        Format: WHERE
+        Example: "WHERE third-party service providers are involved, the Bank SHALL implement secure remote access mechanisms."
+
+
+    6. Complex Requirements. Description: Combine multiple condition types, Handle more sophisticated scenarios
+        Example: "WHEN changes are implemented to information systems, IF the change is an emergency change, THEN the Bank SHALL follow formal emergency change management procedures, INCLUDING regularization of implemented emergency changes."
+
+
+    - Normalize language → turn “should/must/required/have to” into clear compliance rules.
+
+
+    Output JSON:
+    {{
+        "requirements": ["output 1", "output 2", ...]
+    }}
+    """
+
+def extract_pdf_text(pdf_path: str):
+    """
+    Extracts text page-by-page from a PDF, sends each to the Ollama endpoint,
+    and returns structured compliance rules in JSON.
+    """
+
+    def parse_pdf_to_json(text: str):
+        prompt = build_prompt(text)
+        url = "http://localhost:4402/test_ollama_1"
+
+        try:
+            # Send prompt to your FastAPI Ollama endpoint
+            response = requests.post(url, json={"prompt": prompt})
+            response.raise_for_status()
+            raw = response.json()
+
+            # Try to clean and parse LLM JSON output
+            content = raw.get("oss_json", "")
+            cleaned = content.replace("```json", "").replace("```", "").strip()
+
+            try:
+                parsed = json.loads(cleaned)
+            except json.JSONDecodeError:
+                parsed = {"requirements": [cleaned]}
+
+            return parsed
+
+        except Exception as e:
+            print(f"Error processing page: {e}")
+            return {"error": str(e)}
+
+    reader = PdfReader(pdf_path)
+    all_requirements = []
+
+    for i, page in enumerate(reader.pages):
+        text = page.extract_text()
+        if not text:
+            continue
+
+        print(f"Processing page {i + 1}...")
+        result = parse_pdf_to_json(text)
+        all_requirements.append({"page": i + 1, "result": result})
+
+    return all_requirements
+
+
+
+
+
diff --git a/be0/template_application_form.docx b/be0/template_application_form.docx
new file mode 100644
index 0000000..e69de29
diff --git a/be0/tests/__pycache__/auth_register_staff_fixture.cpython-313.pyc b/be0/tests/__pycache__/auth_register_staff_fixture.cpython-313.pyc
new file mode 100644
index 0000000..5b032b8
Binary files /dev/null and b/be0/tests/__pycache__/auth_register_staff_fixture.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/security_token_fixture.cpython-313.pyc b/be0/tests/__pycache__/security_token_fixture.cpython-313.pyc
new file mode 100644
index 0000000..2f7b253
Binary files /dev/null and b/be0/tests/__pycache__/security_token_fixture.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_admin_audit_routes.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_admin_audit_routes.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..ec661f1
Binary files /dev/null and b/be0/tests/__pycache__/test_admin_audit_routes.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_admin_audit_routes.cpython-313.pyc b/be0/tests/__pycache__/test_admin_audit_routes.cpython-313.pyc
new file mode 100644
index 0000000..b540915
Binary files /dev/null and b/be0/tests/__pycache__/test_admin_audit_routes.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_application_backup.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_application_backup.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..6d3ba2b
Binary files /dev/null and b/be0/tests/__pycache__/test_application_backup.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_application_backup.cpython-313.pyc b/be0/tests/__pycache__/test_application_backup.cpython-313.pyc
new file mode 100644
index 0000000..85bc0a7
Binary files /dev/null and b/be0/tests/__pycache__/test_application_backup.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_application_drafts_get.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_application_drafts_get.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..41231d7
Binary files /dev/null and b/be0/tests/__pycache__/test_application_drafts_get.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_application_drafts_get.cpython-313.pyc b/be0/tests/__pycache__/test_application_drafts_get.cpython-313.pyc
new file mode 100644
index 0000000..3448f65
Binary files /dev/null and b/be0/tests/__pycache__/test_application_drafts_get.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_applications_db_integration.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_applications_db_integration.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..29ff54f
Binary files /dev/null and b/be0/tests/__pycache__/test_applications_db_integration.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_applications_db_integration.cpython-313.pyc b/be0/tests/__pycache__/test_applications_db_integration.cpython-313.pyc
new file mode 100644
index 0000000..6574726
Binary files /dev/null and b/be0/tests/__pycache__/test_applications_db_integration.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_auth_password_reset_integration.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_auth_password_reset_integration.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..03c64ec
Binary files /dev/null and b/be0/tests/__pycache__/test_auth_password_reset_integration.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_auth_password_reset_integration.cpython-313.pyc b/be0/tests/__pycache__/test_auth_password_reset_integration.cpython-313.pyc
new file mode 100644
index 0000000..696534f
Binary files /dev/null and b/be0/tests/__pycache__/test_auth_password_reset_integration.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_auth_policy_integration.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_auth_policy_integration.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..067e0e9
Binary files /dev/null and b/be0/tests/__pycache__/test_auth_policy_integration.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_auth_policy_integration.cpython-313.pyc b/be0/tests/__pycache__/test_auth_policy_integration.cpython-313.pyc
new file mode 100644
index 0000000..208396f
Binary files /dev/null and b/be0/tests/__pycache__/test_auth_policy_integration.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_backup_e2e.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_backup_e2e.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..1e2b7b0
Binary files /dev/null and b/be0/tests/__pycache__/test_backup_e2e.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_backup_e2e.cpython-313.pyc b/be0/tests/__pycache__/test_backup_e2e.cpython-313.pyc
new file mode 100644
index 0000000..bb3d6ab
Binary files /dev/null and b/be0/tests/__pycache__/test_backup_e2e.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_dashboard_lookup_routes.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_dashboard_lookup_routes.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..d99c563
Binary files /dev/null and b/be0/tests/__pycache__/test_dashboard_lookup_routes.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_dashboard_lookup_routes.cpython-313.pyc b/be0/tests/__pycache__/test_dashboard_lookup_routes.cpython-313.pyc
new file mode 100644
index 0000000..3873cd9
Binary files /dev/null and b/be0/tests/__pycache__/test_dashboard_lookup_routes.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_docx_normalize.cpython-311.pyc b/be0/tests/__pycache__/test_docx_normalize.cpython-311.pyc
new file mode 100644
index 0000000..fa90816
Binary files /dev/null and b/be0/tests/__pycache__/test_docx_normalize.cpython-311.pyc differ
diff --git a/be0/tests/__pycache__/test_docx_normalize.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_docx_normalize.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..5ef949c
Binary files /dev/null and b/be0/tests/__pycache__/test_docx_normalize.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_docx_normalize.cpython-313.pyc b/be0/tests/__pycache__/test_docx_normalize.cpython-313.pyc
new file mode 100644
index 0000000..c20c527
Binary files /dev/null and b/be0/tests/__pycache__/test_docx_normalize.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_evidence_initiative_resolution.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_evidence_initiative_resolution.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..e60b173
Binary files /dev/null and b/be0/tests/__pycache__/test_evidence_initiative_resolution.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_evidence_initiative_resolution.cpython-313.pyc b/be0/tests/__pycache__/test_evidence_initiative_resolution.cpython-313.pyc
new file mode 100644
index 0000000..198e210
Binary files /dev/null and b/be0/tests/__pycache__/test_evidence_initiative_resolution.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_evidence_kind_parsing.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_evidence_kind_parsing.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..ff87990
Binary files /dev/null and b/be0/tests/__pycache__/test_evidence_kind_parsing.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_evidence_kind_parsing.cpython-313.pyc b/be0/tests/__pycache__/test_evidence_kind_parsing.cpython-313.pyc
new file mode 100644
index 0000000..5915e43
Binary files /dev/null and b/be0/tests/__pycache__/test_evidence_kind_parsing.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_official_to_data_blank_ban_cam_ket.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_official_to_data_blank_ban_cam_ket.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..c3a6679
Binary files /dev/null and b/be0/tests/__pycache__/test_official_to_data_blank_ban_cam_ket.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_official_to_data_blank_ban_cam_ket.cpython-313.pyc b/be0/tests/__pycache__/test_official_to_data_blank_ban_cam_ket.cpython-313.pyc
new file mode 100644
index 0000000..c3b39b2
Binary files /dev/null and b/be0/tests/__pycache__/test_official_to_data_blank_ban_cam_ket.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_official_to_data_blank_don_vi.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_official_to_data_blank_don_vi.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..beb30a5
Binary files /dev/null and b/be0/tests/__pycache__/test_official_to_data_blank_don_vi.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_official_to_data_blank_don_vi.cpython-313.pyc b/be0/tests/__pycache__/test_official_to_data_blank_don_vi.cpython-313.pyc
new file mode 100644
index 0000000..42a372d
Binary files /dev/null and b/be0/tests/__pycache__/test_official_to_data_blank_don_vi.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_registration_otp.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_registration_otp.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..95f76d6
Binary files /dev/null and b/be0/tests/__pycache__/test_registration_otp.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_registration_otp.cpython-313.pyc b/be0/tests/__pycache__/test_registration_otp.cpython-313.pyc
new file mode 100644
index 0000000..560b42d
Binary files /dev/null and b/be0/tests/__pycache__/test_registration_otp.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_registration_stack_alignment.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_registration_stack_alignment.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..2639dd2
Binary files /dev/null and b/be0/tests/__pycache__/test_registration_stack_alignment.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_registration_stack_alignment.cpython-313.pyc b/be0/tests/__pycache__/test_registration_stack_alignment.cpython-313.pyc
new file mode 100644
index 0000000..763bafc
Binary files /dev/null and b/be0/tests/__pycache__/test_registration_stack_alignment.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_repair_split_submission.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_repair_split_submission.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..e2cd18e
Binary files /dev/null and b/be0/tests/__pycache__/test_repair_split_submission.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_repair_split_submission.cpython-313.pyc b/be0/tests/__pycache__/test_repair_split_submission.cpython-313.pyc
new file mode 100644
index 0000000..4f8f0ee
Binary files /dev/null and b/be0/tests/__pycache__/test_repair_split_submission.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_security_routes.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_security_routes.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..ef87e53
Binary files /dev/null and b/be0/tests/__pycache__/test_security_routes.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_security_routes.cpython-313.pyc b/be0/tests/__pycache__/test_security_routes.cpython-313.pyc
new file mode 100644
index 0000000..0de625f
Binary files /dev/null and b/be0/tests/__pycache__/test_security_routes.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_staff_profile_domain.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_staff_profile_domain.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..2164654
Binary files /dev/null and b/be0/tests/__pycache__/test_staff_profile_domain.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_staff_profile_domain.cpython-313.pyc b/be0/tests/__pycache__/test_staff_profile_domain.cpython-313.pyc
new file mode 100644
index 0000000..bbe5f7f
Binary files /dev/null and b/be0/tests/__pycache__/test_staff_profile_domain.cpython-313.pyc differ
diff --git a/be0/tests/__pycache__/test_submission_readiness.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_submission_readiness.cpython-313-pytest-8.3.4.pyc
new file mode 100644
index 0000000..8856fdc
Binary files /dev/null and b/be0/tests/__pycache__/test_submission_readiness.cpython-313-pytest-8.3.4.pyc differ
diff --git a/be0/tests/__pycache__/test_submission_readiness.cpython-313.pyc b/be0/tests/__pycache__/test_submission_readiness.cpython-313.pyc
new file mode 100644
index 0000000..9b4ba39
Binary files /dev/null and b/be0/tests/__pycache__/test_submission_readiness.cpython-313.pyc differ diff --git a/be0/tests/__pycache__/test_submissions_projection_research_kind.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_submissions_projection_research_kind.cpython-313-pytest-8.3.4.pyc new file mode 100644 index 0000000..d2ecea0 Binary files /dev/null and b/be0/tests/__pycache__/test_submissions_projection_research_kind.cpython-313-pytest-8.3.4.pyc differ diff --git a/be0/tests/__pycache__/test_submissions_projection_research_kind.cpython-313.pyc b/be0/tests/__pycache__/test_submissions_projection_research_kind.cpython-313.pyc new file mode 100644 index 0000000..1060fea Binary files /dev/null and b/be0/tests/__pycache__/test_submissions_projection_research_kind.cpython-313.pyc differ diff --git a/be0/tests/__pycache__/test_user_notifications_merit.cpython-313-pytest-8.3.4.pyc b/be0/tests/__pycache__/test_user_notifications_merit.cpython-313-pytest-8.3.4.pyc new file mode 100644 index 0000000..4161ff6 Binary files /dev/null and b/be0/tests/__pycache__/test_user_notifications_merit.cpython-313-pytest-8.3.4.pyc differ diff --git a/be0/tests/__pycache__/test_user_notifications_merit.cpython-313.pyc b/be0/tests/__pycache__/test_user_notifications_merit.cpython-313.pyc new file mode 100644 index 0000000..f4f3959 Binary files /dev/null and b/be0/tests/__pycache__/test_user_notifications_merit.cpython-313.pyc differ diff --git a/be0/tests/auth_register_staff_fixture.py b/be0/tests/auth_register_staff_fixture.py new file mode 100644 index 0000000..406b022 --- /dev/null +++ b/be0/tests/auth_register_staff_fixture.py @@ -0,0 +1,16 @@ +"""Minimal valid staff fields for POST /api/v1/auth/register in integration tests.""" + +from __future__ import annotations + +import uuid + + +def register_staff_fields() -> dict[str, str]: + """Unique employee_id per call (DB partial unique index on user_staff_profiles.employee_id).""" + suffix = uuid.uuid4().hex[:8].upper() + 
return { + "employeeId": f"CB-{suffix}", + "academicTitleCode": "master", + "unitNameFreetext": "Khoa kiểm thử", + "jobTitle": "Cán bộ", + } diff --git a/be0/tests/fixtures/__pycache__/minimal_submit_bundle.cpython-313.pyc b/be0/tests/fixtures/__pycache__/minimal_submit_bundle.cpython-313.pyc new file mode 100644 index 0000000..3d12d6d Binary files /dev/null and b/be0/tests/fixtures/__pycache__/minimal_submit_bundle.cpython-313.pyc differ diff --git a/be0/tests/fixtures/minimal_submit_bundle.py b/be0/tests/fixtures/minimal_submit_bundle.py new file mode 100644 index 0000000..a6aaa25 --- /dev/null +++ b/be0/tests/fixtures/minimal_submit_bundle.py @@ -0,0 +1,99 @@ +"""Minimal valid tab JSON for submit readiness checks (technical classification).""" + +from __future__ import annotations + +from typing import Any, Dict + + +def minimal_report_tab(*, initiative_name: str = "Test initiative") -> Dict[str, Any]: + return { + "introduction": "Mở đầu đủ.", + "initiativeName": initiative_name, + "representativeAuthor": "Nguyễn Văn A", + "representativePhone": "0900000000", + "representativeEmail": "a@ump.edu.vn", + "applicationField": "Y tế", + "currentStatus": "Hiện trạng.", + "purpose": "Mục đích.", + "solutionContent": "Nội dung giải pháp.", + "implementationSteps": "Các bước.", + "firstAppliedUnit": "Đơn vị.", + "achievedResult": "Kết quả.", + "conditions": "Điều kiện.", + "trialUnits": [], + "novelty": "Tính mới.", + "effectiveness": { + "economic": "Kinh tế.", + "social": "Xã hội.", + "teaching": "Giảng dạy.", + "productivity": "", + "quality": "", + "environment": "", + "safety": "An toàn.", + }, + "confidentialInfo": "", + "submissionDate": "05/05/2026", + "authorName": "Nguyễn Văn A", + "honestyConfirmed": True, + } + + +def minimal_application_tab_technical(*, initiative_name: str = "Test initiative") -> Dict[str, Any]: + return { + "unitName": "Đơn vị A", + "authors": [ + { + "id": 1, + "name": "Nguyễn Văn A", + "dob": "01/01/1980", + "workplace": "UMP", + 
"title": "GV", + "qualification": "TS", + "contributionPercent": 100, + } + ], + "initiativeName": initiative_name, + "investorName": "Chủ đầu tư", + "applicationField": "Y tế", + "firstApplyDate": "15/04/2025", + "initiativeClassification": "technical", + "textbookEvidenceKind": "", + "researchEvidenceKind": "", + "researchEvidenceFile": None, + "textbookEvidenceFile": None, + "technicalEvidenceFile": None, + "internationalJournalDeclaration": "", + "banCamKet": {}, + "referenceMaterialHonesty": {}, + "researchDomesticHonesty": {}, + "contentSummary": "Tóm tắt nội dung.", + "confidentialInfo": "", + "conditions": "Điều kiện đơn.", + "authorEvaluation": "Đánh giá tác giả.", + "trialEvaluation": "Đánh giá thử.", + "supportStaff": [], + "honestyConfirmed": True, + "submissionDay": 5, + "submissionMonth": 5, + "submissionYear": "2026", + } + + +def minimal_contribution_tab(*, initiative_name: str = "Test initiative") -> Dict[str, Any]: + return { + "initiativeName": initiative_name, + "mainAuthor": "Nguyễn Văn A", + "position": "UMP", + "representativePercent": 100, + "submissionDate": "2026-05-05T00:00:00.000Z", + "participants": [], + "digitalSignatureConfirmed": True, + } + + +def minimal_tabs_bundle(*, initiative_name: str = "Test initiative") -> Dict[str, Dict[str, Any]]: + return { + "report": minimal_report_tab(initiative_name=initiative_name), + "application": minimal_application_tab_technical(initiative_name=initiative_name), + "contribution": minimal_contribution_tab(initiative_name=initiative_name), + } diff --git a/be0/tests/security_token_fixture.py b/be0/tests/security_token_fixture.py new file mode 100644 index 0000000..2490a87 --- /dev/null +++ b/be0/tests/security_token_fixture.py @@ -0,0 +1,32 @@ +"""Shared JWT bearer header for route security tests (uses auth_jwt.jwt_secret()).""" + +from __future__ import annotations + +import uuid +from datetime import datetime, timedelta, timezone +from typing import Sequence + +import jwt + +from src.auth_jwt 
import jwt_secret + + +def mint_bearer_token( + *, + roles: Sequence[str] = ("viewer",), + sub: uuid.UUID | None = None, + email: str = "security-test@ump.edu.vn", + credential_version: int = 0, +) -> str: + user_id = sub or uuid.uuid4() + now = datetime.now(timezone.utc) + payload = { + "sub": str(user_id), + "email": email, + "roles": list(roles), + "cv": credential_version, + "iat": int(now.timestamp()), + "exp": int((now + timedelta(hours=1)).timestamp()), + } + token = jwt.encode(payload, jwt_secret(), algorithm="HS256") + return f"Bearer {token}" diff --git a/be0/tests/test_admin_audit_routes.py b/be0/tests/test_admin_audit_routes.py new file mode 100644 index 0000000..f282a55 --- /dev/null +++ b/be0/tests/test_admin_audit_routes.py @@ -0,0 +1,27 @@ +"""Sanity checks for admin audit router registration (no DB required).""" + +from __future__ import annotations + +import unittest + + +class AdminAuditRouterSmokeTests(unittest.TestCase): + def test_audit_router_registers_list_and_detail(self) -> None: + from src.admin_audit_routes import router + + paths = [getattr(r, "path", "") for r in router.routes] + self.assertIn("/admin/audit", paths) + self.assertTrue( + any(isinstance(p, str) and p.startswith("/admin/audit/") for p in paths), + msg=f"detail route missing under router, paths={paths}", + ) + + def test_parse_sort_behavior(self) -> None: + from src.admin_audit_routes import _parse_sort + + self.assertFalse(_parse_sort("occurred_at:desc")) + self.assertTrue(_parse_sort("occurred_at:asc")) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_application_backup.py b/be0/tests/test_application_backup.py new file mode 100644 index 0000000..27c0aa1 --- /dev/null +++ b/be0/tests/test_application_backup.py @@ -0,0 +1,99 @@ +"""Unit tests for backup ZIP helpers (no database).""" + +import unittest + +from src.initiative_db.backup_naming import backup_zip_attachment_filename, official_form_pdf_backup_zip_path +from 
src.initiative_db.application_storage import ( + EVIDENCE_ROLE_RESEARCH, + STORAGE_FILESYSTEM, + STORAGE_MINIO_ATTACHMENTS, + STORAGE_MINIO_EXPORTS, + effective_storage_kind, +) + + +class OfficialFormPdfZipPathTests(unittest.TestCase): + def test_trang_bia_fields(self) -> None: + obm = { + "TRANG BÌA": { + "Tên sáng kiến (Tiếng Việt)": " Khảo sát thảo dược ", + "Tác giả/nhóm tác giả sáng kiến": "Lê Thị A", + "Thông tin liên hệ (Điện thoại, Email)": "0909, a.b@ump.edu.vn", + } + } + p = official_form_pdf_backup_zip_path(obm) + self.assertEqual(p, "submitted/Khảo_sát_thảo_dược_Lê_Thị_A_a.b@ump.edu.vn.pdf") + + def test_no_trang_bia_returns_none(self) -> None: + self.assertIsNone(official_form_pdf_backup_zip_path({})) + self.assertIsNone(official_form_pdf_backup_zip_path({"OTHER": {}})) + + def test_empty_cover_returns_none(self) -> None: + self.assertIsNone( + official_form_pdf_backup_zip_path({"TRANG BÌA": {"Tên sáng kiến (Tiếng Việt)": " "}}) + ) + + +class BackupZipFilenameTests(unittest.TestCase): + def test_email_local_part_and_sub_id(self) -> None: + self.assertEqual( + backup_zip_attachment_filename( + owner_email=" nguyen.van.a@ump.edu.vn ", + owner_full_name="Nguyễn Văn A", + public_application_id="sub-deadbeef", + ), + "nguyen.van.a_sub-deadbeef.zip", + ) + + def test_fallback_name_when_no_email(self) -> None: + fn = backup_zip_attachment_filename( + owner_email=None, + owner_full_name=" Lê Thị B ", + public_application_id="sub-001", + ) + self.assertTrue(fn.endswith("_sub-001.zip")) + self.assertIn("Lê", fn) + + def test_applicant_fallback(self) -> None: + self.assertEqual( + backup_zip_attachment_filename( + owner_email="", + owner_full_name="", + public_application_id="sub-x", + ), + "applicant_sub-x.zip", + ) + + +class EffectiveStorageKindTests(unittest.TestCase): + def test_full_pdf_minio_key(self) -> None: + self.assertEqual( + effective_storage_kind("full_pdf", "initiatives/abcd/2025/01/x-file.pdf", None), + STORAGE_MINIO_EXPORTS, + ) + + def 
test_full_pdf_filesystem(self) -> None: + self.assertEqual( + effective_storage_kind("full_pdf", "/submitted-initiatives/sub-abc.pdf", None), + STORAGE_FILESYSTEM, + ) + + def test_evidence_attachments(self) -> None: + self.assertEqual( + effective_storage_kind( + EVIDENCE_ROLE_RESEARCH, + "initiatives/abcd/2025/01/x-file.pdf", + None, + ), + STORAGE_MINIO_ATTACHMENTS, + ) + + def test_respects_declared(self) -> None: + self.assertEqual( + effective_storage_kind("full_pdf", "/any", STORAGE_MINIO_EXPORTS), + STORAGE_MINIO_EXPORTS, + ) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_application_drafts_get.py b/be0/tests/test_application_drafts_get.py new file mode 100644 index 0000000..4c5475c --- /dev/null +++ b/be0/tests/test_application_drafts_get.py @@ -0,0 +1,33 @@ +""" +QA: GET draft bundle returns 200 with empty tabs when nothing is stored yet. + +Run: cd be0 && python -m unittest tests.test_application_drafts_get -v +""" + +from __future__ import annotations + +import unittest +from unittest.mock import patch + +_CASE = "CASE-1776577845956" + + +class ApplicationDraftsGetTests(unittest.TestCase): + @patch("src.initiative_db.engine.is_postgres_enabled", return_value=False) + @patch("main._load_application_draft_yaml", return_value=None) + def test_unknown_case_returns_200_empty_shape(self, _mock_yaml, _mock_pg) -> None: + from fastapi.testclient import TestClient + + from main import app + + client = TestClient(app) + r = client.get(f"/api/v1/application-drafts/{_CASE}") + self.assertEqual(r.status_code, 200, r.text) + body = r.json() + self.assertEqual(body.get("caseId"), _CASE) + self.assertEqual(body.get("tabs"), {}) + self.assertIn("updatedAt", body) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_applications_db_integration.py b/be0/tests/test_applications_db_integration.py new file mode 100644 index 0000000..c1655b8 --- /dev/null +++ b/be0/tests/test_applications_db_integration.py @@ -0,0 +1,797 
@@ +""" +PostgreSQL integration tests for submitted applications (update / delete). + +These tests exercise `src.initiative_db.submissions` against a real database. +They are skipped unless INITIATIVE_DATABASE_URL points at PostgreSQL (asyncpg). + +Example (host port from repo docker-compose.yml): + + export INITIATIVE_DATABASE_URL="postgresql+asyncpg://initiative:initiative_secret@127.0.0.1:15432/initiatives" + cd be0 && python -m unittest tests.test_applications_db_integration -v + +Prerequisites: + - Schema applied: `001_initiative_schema.sql` and `002_application_storage_extensions.sql` (see docker-compose postgres init mounts), or equivalent. + - Network reachable from the machine running tests. +""" + +from __future__ import annotations + +import os +import unittest +import uuid +from datetime import datetime, timezone + +from sqlalchemy import select + +_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql") + +_HAS_MINIO = all( + os.getenv(k, "").strip() + for k in ( + "S3_ENDPOINT_URL", + "S3_ACCESS_KEY", + "S3_SECRET_KEY", + "S3_BUCKET_ATTACHMENTS", + "S3_BUCKET_EXPORTS", + "S3_BUCKET_QUARANTINE", + ) +) + + +@unittest.skipUnless( + _RUN_DB, + "Set INITIATIVE_DATABASE_URL=postgresql+asyncpg://.../initiatives to run DB integration tests", +) +class ApplicationsDbIntegrationTests(unittest.IsolatedAsyncioTestCase): + """End-to-end persistence for applicant submission update + delete.""" + + async def asyncSetUp(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + await eng.init_engine() + + async def asyncTearDown(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + + async def test_update_then_delete_submission_round_trip(self) -> None: + from src.initiative_db.engine import get_session + from src.initiative_db.models import Draft, Initiative, User + from src.initiative_db.submissions import ( + delete_my_submitted_application, + 
get_application_by_id, + update_my_submitted_application, + ) + + owner_id = uuid.uuid4() + owner_email = f"dbtest-{owner_id.hex[:8]}@ump.edu.vn" + case_code = f"TESTCASE-{uuid.uuid4().hex[:10]}" + submission_id = f"sub-{uuid.uuid4().hex[:16]}" + + # --- seed owner + submitted initiative + draft payload --- + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="DB Test Applicant", + ) + ) + await session.flush() + ini = Initiative( + case_code=case_code, + owner_id=owner_id, + status="submitted", + submitted_at=datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc), + ) + session.add(ini) + await session.flush() + payload = { + "caseId": case_code, + "updatedAt": "2026-01-01T12:00:00Z", + "tabs": {}, + "submissionRecord": { + "id": submission_id, + "submittedDate": "2026-01-01T12:00:00.000Z", + "name": "Original title", + "author": { + "id": case_code, + "name": "DB Test Applicant", + "email": owner_email, + }, + "status": "submitted", + "reviewStatus": "not_reviewed", + }, + } + session.add( + Draft( + draft_code=f"DRAFT-{case_code}", + initiative_id=ini.id, + payload=payload, + version=1, + ) + ) + + # --- update (same semantics as PUT /api/applications/{id}) --- + async with get_session() as session: + row = await update_my_submitted_application( + session, + owner_id, + owner_email, + submission_id, + "Renamed via DB test", + "2026-06-15", + ) + + self.assertEqual(row.get("name"), "Renamed via DB test") + self.assertIn("2026-06-15", str(row.get("submittedDate") or "")) + + async with get_session() as session: + loaded = await get_application_by_id(session, submission_id) + self.assertIsNotNone(loaded) + assert loaded is not None + self.assertEqual(loaded.get("name"), "Renamed via DB test") + + # --- delete (same semantics as DELETE /api/applications/{id}) --- + async with get_session() as session: + await delete_my_submitted_application(session, owner_id, owner_email, submission_id) + + 
async with get_session() as session: + gone = await get_application_by_id(session, submission_id) + ini_row = (await session.execute(select(Initiative).where(Initiative.case_code == case_code))).scalar_one_or_none() + + self.assertIsNone(gone) + self.assertIsNone(ini_row) + + # --- cleanup user (initiative already cascade-deleted) --- + async with get_session() as session: + u = await session.get(User, owner_id) + if u is not None: + await session.delete(u) + + async def test_get_application_by_id_matches_fallback_sub_id_when_record_has_no_id(self) -> None: + """List rows use sub-{initiative.id[:16]} when submissionRecord.id is absent; GET must accept the same id.""" + from src.initiative_db.engine import get_session + from src.initiative_db.models import Draft, Initiative, User + from src.initiative_db.submissions import get_application_by_id + + owner_id = uuid.uuid4() + owner_email = f"dbtest-fallback-{owner_id.hex[:8]}@ump.edu.vn" + case_code = f"TESTCASE-{uuid.uuid4().hex[:10]}" + + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="Fallback Id Test", + ) + ) + await session.flush() + ini = Initiative( + case_code=case_code, + owner_id=owner_id, + status="submitted", + submitted_at=datetime(2026, 3, 1, 12, 0, 0, tzinfo=timezone.utc), + ) + session.add(ini) + await session.flush() + # Must match `_submission_display_id`: first 16 hex digits of UUID, not `str(uuid)[:16]`. 
+ expected_list_id = f"sub-{ini.id.hex[:16]}" + payload = { + "caseId": case_code, + "updatedAt": "2026-03-01T12:00:00Z", + "tabs": {}, + "submissionRecord": { + "submittedDate": "2026-03-01T12:00:00.000Z", + "name": "No explicit submission id", + "author": { + "id": case_code, + "name": "Fallback Id Test", + "email": owner_email, + }, + "status": "submitted", + "reviewStatus": "not_reviewed", + }, + } + session.add( + Draft( + draft_code=f"DRAFT-{case_code}", + initiative_id=ini.id, + payload=payload, + version=1, + ) + ) + + async with get_session() as session: + loaded = await get_application_by_id(session, expected_list_id) + self.assertIsNotNone(loaded) + assert loaded is not None + self.assertEqual(loaded.get("id"), expected_list_id) + self.assertEqual(loaded.get("name"), "No explicit submission id") + + async with get_session() as session: + ini_row = (await session.execute(select(Initiative).where(Initiative.case_code == case_code))).scalar_one() + await session.delete(ini_row) + u = await session.get(User, owner_id) + if u is not None: + await session.delete(u) + + async def test_update_forbidden_for_non_owner_mismatched_email(self) -> None: + from src.initiative_db.engine import get_session + from src.initiative_db.models import Draft, Initiative, User + from src.initiative_db.submissions import update_my_submitted_application + + owner_id = uuid.uuid4() + owner_email = f"owner-{owner_id.hex[:8]}@ump.edu.vn" + intruder_id = uuid.uuid4() + intruder_email = f"other-{intruder_id.hex[:8]}@ump.edu.vn" + case_code = f"TESTCASE-{uuid.uuid4().hex[:10]}" + submission_id = f"sub-{uuid.uuid4().hex[:16]}" + + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="Owner", + ) + ) + session.add( + User( + id=intruder_id, + email=intruder_email, + password_hash="-", + full_name="Intruder", + ) + ) + await session.flush() + ini = Initiative( + case_code=case_code, + owner_id=owner_id, + 
status="submitted", + submitted_at=datetime(2026, 2, 1, 12, 0, 0, tzinfo=timezone.utc), + ) + session.add(ini) + await session.flush() + session.add( + Draft( + draft_code=f"DRAFT-{case_code}", + initiative_id=ini.id, + payload={ + "caseId": case_code, + "submissionRecord": { + "id": submission_id, + "submittedDate": "2026-02-01T12:00:00.000Z", + "name": "Sealed", + "author": {"id": case_code, "name": "Owner", "email": owner_email}, + }, + }, + version=1, + ) + ) + + async with get_session() as session: + with self.assertRaises(PermissionError): + await update_my_submitted_application( + session, + intruder_id, + intruder_email, + submission_id, + "Should not apply", + "2026-02-02", + ) + + async with get_session() as session: + ini = (await session.execute(select(Initiative).where(Initiative.case_code == case_code))).scalar_one() + d = ( + await session.execute(select(Draft).where(Draft.initiative_id == ini.id).limit(1)) + ).scalar_one() + name = (d.payload or {}).get("submissionRecord", {}).get("name") + + self.assertEqual(name, "Sealed") + + async with get_session() as session: + ini = (await session.execute(select(Initiative).where(Initiative.case_code == case_code))).scalar_one_or_none() + if ini is not None: + await session.delete(ini) + u1 = await session.get(User, owner_id) + u2 = await session.get(User, intruder_id) + if u1 is not None: + await session.delete(u1) + if u2 is not None: + await session.delete(u2) + + async def test_save_submitted_application_rejects_when_readiness_not_met(self) -> None: + """Server readiness runs before official-form MinIO work (no S3_* required).""" + from src.initiative_db.engine import get_session + from src.initiative_db.models import Draft, Initiative, User + from src.initiative_db.submission_readiness import ApplicationSubmissionNotReadyError + from src.initiative_db.submissions import save_submitted_application + + owner_id = uuid.uuid4() + owner_email = f"ready-{owner_id.hex[:8]}@ump.edu.vn" + case_code = 
f"READYCASE-{uuid.uuid4().hex[:10]}" + + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="Readiness tester", + ) + ) + await session.flush() + ini = Initiative(case_code=case_code, owner_id=owner_id, status="draft") + session.add(ini) + await session.flush() + session.add( + Draft( + draft_code=f"DRAFT-{case_code}", + initiative_id=ini.id, + payload={ + "caseId": case_code, + "tabs": {}, + "updatedAt": datetime.now(timezone.utc) + .replace(microsecond=0) + .isoformat() + .replace("+00:00", "Z"), + }, + ) + ) + await session.flush() + with self.assertRaises(ApplicationSubmissionNotReadyError) as ctx: + await save_submitted_application( + session, + metadata={ + "caseId": case_code, + "initiativeName": "Incomplete", + "authorName": "X", + "authorEmail": owner_email, + }, + file_url="/submitted-initiatives/x.pdf", + owner_user_id=owner_id, + pdf_byte_size=50, + pdf_sha256="ab" * 32, + pdf_original_name="x.pdf", + ) + self.assertTrue(len(ctx.exception.missing) > 0) + + async with get_session() as session: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one_or_none() + if ini is not None: + await session.delete(ini) + u = await session.get(User, owner_id) + if u is not None: + await session.delete(u) + + @unittest.skipUnless( + _HAS_MINIO, + "Needs S3_* env — submit persists official forms via MinIO after snapshots.", + ) + async def test_save_submitted_application_writes_storage_tables(self) -> None: + """Requires migration 002 (submit snapshots, taxonomy, workflow, artifacts).""" + from src.initiative_db.engine import get_session + from src.initiative_db.models import ( + ApplicationArtifact, + ApplicationSubmitSnapshot, + ApplicationTaxonomy, + ApplicationWorkflow, + Draft, + Initiative, + User, + ) + from src.initiative_db.submissions import save_submitted_application + + from tests.fixtures.minimal_submit_bundle import 
minimal_tabs_bundle + + owner_id = uuid.uuid4() + owner_email = f"snaps-{owner_id.hex[:8]}@ump.edu.vn" + case_code = f"SNAPCASE-{uuid.uuid4().hex[:10]}" + sha = "ab" * 32 + + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="Snap tester", + ) + ) + await session.flush() + ini = Initiative(case_code=case_code, owner_id=owner_id, status="draft") + session.add(ini) + await session.flush() + tabs_payload = minimal_tabs_bundle(initiative_name="Storage ext test") + session.add( + Draft( + draft_code=f"DRAFT-{case_code}", + initiative_id=ini.id, + payload={ + "caseId": case_code, + "tabs": tabs_payload, + "updatedAt": datetime.now(timezone.utc) + .replace(microsecond=0) + .isoformat() + .replace("+00:00", "Z"), + }, + ) + ) + session.add( + ApplicationArtifact( + initiative_id=ini.id, + role="technical_evidence", + storage_uri="initiatives/test/evidence-key.pdf", + mime_type="application/pdf", + byte_size=900, + ) + ) + await session.flush() + await save_submitted_application( + session, + metadata={ + "caseId": case_code, + "initiativeName": "Storage ext test", + "authorName": "Snap", + "authorEmail": owner_email, + "subjectId": "math", + "groupId": "g1", + "topicType": "Hồ sơ PDF", + }, + file_url="/submitted-initiatives/t.pdf", + owner_user_id=owner_id, + pdf_byte_size=123, + pdf_sha256=sha, + pdf_original_name="t.pdf", + ) + + async with get_session() as session: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one() + snaps = ( + await session.execute( + select(ApplicationSubmitSnapshot).where( + ApplicationSubmitSnapshot.initiative_id == ini.id + ) + ) + ).scalars().all() + arts = ( + await session.execute( + select(ApplicationArtifact).where(ApplicationArtifact.initiative_id == ini.id) + ) + ).scalars().all() + tax = ( + await session.execute( + select(ApplicationTaxonomy).where(ApplicationTaxonomy.initiative_id == ini.id) + ) + 
).scalar_one() + wf = ( + await session.execute( + select(ApplicationWorkflow).where(ApplicationWorkflow.initiative_id == ini.id) + ) + ).scalar_one() + + self.assertEqual(len(snaps), 1) + self.assertEqual(snaps[0].submission_record_id[:4], "sub-") + full_pdf_rows = [a for a in arts if a.role == "full_pdf"] + self.assertEqual(len(full_pdf_rows), 1) + self.assertEqual(full_pdf_rows[0].byte_size, 123) + self.assertEqual(tax.subject_id, "math") + self.assertEqual(wf.review_status, "not_reviewed") + + async with get_session() as session: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one_or_none() + if ini is not None: + await session.delete(ini) + u = await session.get(User, owner_id) + if u is not None: + await session.delete(u) + + async def test_draft_tab_save_records_tab_snapshot(self) -> None: + """Requires migration 002 (`draft_tab_snapshots`).""" + from src.initiative_db.drafts import save_application_draft_tab + from src.initiative_db.engine import get_session + from src.initiative_db.models import DraftTabSnapshot, Initiative, User + + owner_id = uuid.uuid4() + owner_email = f"tabsnap-{owner_id.hex[:8]}@ump.edu.vn" + case_code = f"TABCASE-{uuid.uuid4().hex[:10]}" + + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="Tab snap", + ) + ) + await session.flush() + await save_application_draft_tab( + session, case_code, "report", {"title": "Chapter 1"}, owner_id=owner_id + ) + await save_application_draft_tab( + session, case_code, "report", {"title": "Chapter 2"}, owner_id=owner_id + ) + + async with get_session() as session: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one() + rows = ( + await session.execute( + select(DraftTabSnapshot) + .where(DraftTabSnapshot.initiative_id == ini.id) + .where(DraftTabSnapshot.tab == "report") + 
.order_by(DraftTabSnapshot.tab_version) + ) + ).scalars().all() + + self.assertEqual(len(rows), 2) + self.assertEqual(rows[0].tab_version, 1) + self.assertEqual(rows[0].payload.get("title"), "Chapter 1") + self.assertEqual(rows[1].tab_version, 2) + self.assertEqual(rows[1].payload.get("title"), "Chapter 2") + + async with get_session() as session: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one_or_none() + if ini is not None: + await session.delete(ini) + u = await session.get(User, owner_id) + if u is not None: + await session.delete(u) + + async def test_admin_result_upsert_appears_in_decided_list(self) -> None: + """PUT-style upsert updates initiative status; decided lifecycle list includes the row.""" + from src.initiative_db.application_admin_results import upsert_admin_result + from src.initiative_db.engine import get_session + from src.initiative_db.models import Draft, Initiative, User + from src.initiative_db.submissions import list_submitted_applications + + admin_id = uuid.uuid4() + owner_id = uuid.uuid4() + owner_email = f"admin-upsert-{owner_id.hex[:8]}@ump.edu.vn" + admin_email = f"admin-{admin_id.hex[:8]}@ump.edu.vn" + case_code = f"ADM-{uuid.uuid4().hex[:10]}" + submission_id = f"sub-{uuid.uuid4().hex[:16]}" + + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="Owner upsert", + ) + ) + session.add( + User( + id=admin_id, + email=admin_email, + password_hash="-", + full_name="Admin upsert", + ) + ) + await session.flush() + ini = Initiative( + case_code=case_code, + owner_id=owner_id, + status="submitted", + submitted_at=datetime(2026, 4, 1, 12, 0, 0, tzinfo=timezone.utc), + ) + session.add(ini) + await session.flush() + session.add( + Draft( + draft_code=f"DRAFT-{case_code}", + initiative_id=ini.id, + payload={ + "caseId": case_code, + "updatedAt": "2026-04-01T12:00:00Z", + "tabs": {}, + "submissionRecord": { 
+ "id": submission_id, + "submittedDate": "2026-04-01T12:00:00.000Z", + "name": "Upsert list test", + "author": { + "id": case_code, + "name": "Owner upsert", + "email": owner_email, + }, + "status": "submitted", + "reviewStatus": "not_reviewed", + }, + }, + version=1, + ) + ) + + async with get_session() as session: + await upsert_admin_result( + session, + submission_id, + admin_id, + decision="approved", + feedback="ok", + rationale=None, + ) + + async with get_session() as session: + out = await list_submitted_applications( + session=session, + page=1, + page_size=50, + name="", + author_name="", + reviewer_name="", + status="", + review_status="", + date_from="", + date_to="", + sort_by="submittedDate", + sort_order="desc", + lifecycle="decided", + ) + ids = {str(r.get("id")) for r in out.get("data") or []} + self.assertIn(submission_id, ids) + decided_row = next( + (r for r in (out.get("data") or []) if str(r.get("id")) == submission_id), + None, + ) + self.assertIsNotNone(decided_row) + assert decided_row is not None + self.assertEqual(decided_row.get("nhan_xet"), "ok", "Admin feedback must surface as nhan_xet for «Nhận xét» in admin list") + reviewer = decided_row.get("reviewer") or {} + self.assertEqual( + reviewer.get("name"), + "Admin upsert", + "Người đánh giá should show adjudicating admin users.full_name (updated_by)", + ) + self.assertEqual(str(reviewer.get("id")), str(admin_id)) + + async with get_session() as session: + ini_row = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one_or_none() + if ini_row is not None: + await session.delete(ini_row) + for uid in (owner_id, admin_id): + u = await session.get(User, uid) + if u is not None: + await session.delete(u) + + async def test_notification_inbox_after_admin_upsert(self) -> None: + """Requires migration 006 (`user_notifications`). 
Applicant receives inbox row after admin upsert + best_effort.""" + from src.initiative_db.application_admin_results import upsert_admin_result + from src.initiative_db.engine import get_session + from src.initiative_db.models import Draft, Initiative, User + from src.initiative_db.user_notifications import ( + best_effort_notify_applicant_after_admin_decision, + count_unread_notifications, + list_notifications_for_user, + mark_notification_read, + ) + + admin_id = uuid.uuid4() + owner_id = uuid.uuid4() + owner_email = f"notif-{owner_id.hex[:8]}@ump.edu.vn" + admin_email = f"notif-adm-{admin_id.hex[:8]}@ump.edu.vn" + case_code = f"NTF-{uuid.uuid4().hex[:10]}" + submission_id = f"sub-{uuid.uuid4().hex[:16]}" + + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="Notif owner", + ) + ) + session.add( + User( + id=admin_id, + email=admin_email, + password_hash="-", + full_name="Notif admin", + ) + ) + await session.flush() + ini = Initiative( + case_code=case_code, + owner_id=owner_id, + status="submitted", + submitted_at=datetime(2026, 5, 1, 12, 0, 0, tzinfo=timezone.utc), + ) + session.add(ini) + await session.flush() + session.add( + Draft( + draft_code=f"DRAFT-{case_code}", + initiative_id=ini.id, + payload={ + "caseId": case_code, + "tabs": { + "application": { + "initiativeClassification": "research", + "researchEvidenceKind": "international", + } + }, + "submissionRecord": { + "id": submission_id, + "submittedDate": "2026-05-01T12:00:00.000Z", + "name": "Notif seed", + "author": { + "id": case_code, + "name": "Notif owner", + "email": owner_email, + }, + "status": "submitted", + "reviewStatus": "not_reviewed", + }, + }, + version=1, + ) + ) + + result: dict + async with get_session() as session: + result = await upsert_admin_result( + session, + submission_id, + admin_id, + decision="approved", + feedback="Kết quả thử nghiệm thông báo.", + rationale=None, + ) + + await 
best_effort_notify_applicant_after_admin_decision(result) + + async with get_session() as session: + unread_before = await count_unread_notifications(session, owner_id) + inbox = await list_notifications_for_user(session, owner_id, page=1, page_size=10) + + self.assertEqual(unread_before, 1) + self.assertEqual(inbox["pagination"]["totalItems"], 1) + row = inbox["data"][0] + self.assertEqual(row["applicationId"], submission_id) + self.assertEqual(row["decision"], "approved") + self.assertEqual(row["meritCategoryLabel"], "Xuất sắc") + self.assertIn("Kết quả thử nghiệm", row["feedback"]) + self.assertIsNone(row["readAt"]) + + nid = uuid.UUID(row["id"]) + async with get_session() as session: + ok = await mark_notification_read(session, owner_id, nid) + self.assertTrue(ok) + unread_after = await count_unread_notifications(session, owner_id) + self.assertEqual(unread_after, 0) + + async with get_session() as session: + ini_row = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_code)) + ).scalar_one_or_none() + if ini_row is not None: + await session.delete(ini_row) + for uid in (owner_id, admin_id): + u = await session.get(User, uid) + if u is not None: + await session.delete(u) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_auth_password_reset_integration.py b/be0/tests/test_auth_password_reset_integration.py new file mode 100644 index 0000000..1895656 --- /dev/null +++ b/be0/tests/test_auth_password_reset_integration.py @@ -0,0 +1,217 @@ +""" +Password reset + credential_version JWT integration. + +Requires Postgres, migrations through 013 (email_verified) and **014** (registration_otp_codes). 
+ + export INITIATIVE_DATABASE_URL="postgresql+asyncpg://initiative:initiative_secret@127.0.0.1:15432/initiatives" + docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/012_password_reset.sql + docker exec -i initiative-postgres psql -U initiative -d initiatives < be0/migrations/013_email_verification.sql + cd be0 && python -m unittest tests.test_auth_password_reset_integration -v +""" + +from __future__ import annotations + +import asyncio +import os +import unittest +import uuid +from unittest.mock import patch + +from sqlalchemy import delete + +from tests.auth_register_staff_fixture import register_staff_fields + +_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql") + +_TEST_PASSWORD = "Testpass1!" +_NEW_PASSWORD = "Newpass1!" + + +@unittest.skipUnless( + _RUN_DB, + "Set INITIATIVE_DATABASE_URL=postgresql+asyncpg://.../initiatives to run DB integration tests", +) +class AuthPasswordResetIntegrationTests(unittest.TestCase): + def _delete_user_email(self, email: str) -> None: + """Cleanup after TestClient — use a fresh engine (TestClient tears down the app engine).""" + + async def go() -> None: + from src.initiative_db.engine import dispose_engine, get_session, init_engine, is_postgres_enabled + + if not is_postgres_enabled(): + return + await init_engine() + try: + from sqlalchemy import delete + + from src.initiative_db.models import User + + async with get_session() as session: + await session.execute(delete(User).where(User.email == email)) + finally: + await dispose_engine() + + asyncio.run(go()) + + def _register(self, email: str) -> object: + from fastapi.testclient import TestClient + + from main import app + + captured: list[str] = [] + + async def grab(_to: str, raw: str) -> None: + captured.append(raw) + + with patch.dict(os.environ, {"AUTH_MAIL_LOG_ONLY": "1"}): + with patch("src.auth_api.deliver_registration_otp_email", side_effect=grab): + with TestClient(app) as client: + r = 
client.post( + "/api/v1/auth/register", + json={ + "fullName": "Reset Test", + "email": email, + "password": _TEST_PASSWORD, + "passwordConfirm": _TEST_PASSWORD, + **register_staff_fields(), + }, + ) + if r.status_code == 200 and captured: + client.post( + "/api/v1/auth/verify-otp", + json={"email": email, "otp": captured[0]}, + ) + return r + + def test_forgot_password_unknown_email_same_message(self) -> None: + from fastapi.testclient import TestClient + + from main import app + + with patch.dict(os.environ, {"AUTH_MAIL_LOG_ONLY": "1"}): + with TestClient(app) as client: + r = client.post( + "/api/v1/auth/forgot-password", + json={"email": f"nope-{uuid.uuid4().hex[:8]}@ump.edu.vn"}, + ) + self.assertEqual(r.status_code, 200, r.text) + msg = r.json().get("message", "") + self.assertIn("Nếu email", msg) + + def test_forgot_password_invalid_domain_400(self) -> None: + from fastapi.testclient import TestClient + + from main import app + + with TestClient(app) as client: + r = client.post( + "/api/v1/auth/forgot-password", + json={"email": "x@gmail.com"}, + ) + self.assertEqual(r.status_code, 400, r.text) + + def test_reset_flow_login_and_stale_jwt(self) -> None: + email = f"reset-{uuid.uuid4().hex[:12]}@ump.edu.vn" + try: + reg = self._register(email) + self.assertEqual(reg.status_code, 200, reg.text) + from fastapi.testclient import TestClient + + from main import app + + with TestClient(app) as client_pre: + lg0 = client_pre.post( + "/api/v1/auth/login", + json={"email": email, "password": _TEST_PASSWORD}, + ) + self.assertEqual(lg0.status_code, 200, lg0.text) + access_before = lg0.json()["accessToken"] + + captured: list[str] = [] + + async def grab(_to: str, raw: str) -> None: + captured.append(raw) + + with patch.dict(os.environ, {"AUTH_MAIL_LOG_ONLY": "1"}): + with patch("src.auth_api.deliver_password_reset_email", side_effect=grab): + with TestClient(app) as client: + fr = client.post( + "/api/v1/auth/forgot-password", + json={"email": email}, + ) + 
self.assertEqual(fr.status_code, 200, fr.text) + + rr = client.post( + "/api/v1/auth/reset-password", + json={ + "token": captured[0], + "newPassword": _NEW_PASSWORD, + "newPasswordConfirm": _NEW_PASSWORD, + }, + ) + self.assertEqual(rr.status_code, 200, rr.text) + + bad = client.post( + "/api/v1/auth/login", + json={"email": email, "password": _TEST_PASSWORD}, + ) + self.assertEqual(bad.status_code, 401, bad.text) + + ok = client.post( + "/api/v1/auth/login", + json={"email": email, "password": _NEW_PASSWORD}, + ) + self.assertEqual(ok.status_code, 200, ok.text) + + me_old = client.get( + "/api/v1/auth/me", + headers={"Authorization": f"Bearer {access_before}"}, + ) + self.assertEqual(me_old.status_code, 401, me_old.text) + finally: + self._delete_user_email(email) + + def test_reset_token_single_use(self) -> None: + email = f"reset2-{uuid.uuid4().hex[:12]}@ump.edu.vn" + try: + reg = self._register(email) + self.assertEqual(reg.status_code, 200, reg.text) + + captured: list[str] = [] + + async def grab(_to: str, raw: str) -> None: + captured.append(raw) + + with patch.dict(os.environ, {"AUTH_MAIL_LOG_ONLY": "1"}): + with patch("src.auth_api.deliver_password_reset_email", side_effect=grab): + from fastapi.testclient import TestClient + + from main import app + + with TestClient(app) as client: + client.post("/api/v1/auth/forgot-password", json={"email": email}) + tok = captured[0] + r1 = client.post( + "/api/v1/auth/reset-password", + json={ + "token": tok, + "newPassword": _NEW_PASSWORD, + "newPasswordConfirm": _NEW_PASSWORD, + }, + ) + self.assertEqual(r1.status_code, 200, r1.text) + r2 = client.post( + "/api/v1/auth/reset-password", + json={ + "token": tok, + "newPassword": _NEW_PASSWORD, + "newPasswordConfirm": _NEW_PASSWORD, + }, + ) + self.assertEqual(r2.status_code, 400, r2.text) + finally: + self._delete_user_email(email) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_auth_policy_integration.py 
b/be0/tests/test_auth_policy_integration.py new file mode 100644 index 0000000..90b37b0 --- /dev/null +++ b/be0/tests/test_auth_policy_integration.py @@ -0,0 +1,126 @@ +""" +Auth policy integration: server-derived admin vs viewer, ignored client role, UMC domain. + +Requires Postgres + migration 007 (admin_from_email_policy column) and 013 (email_verified). + + export INITIATIVE_DATABASE_URL="postgresql+asyncpg://initiative:initiative_secret@127.0.0.1:15432/initiatives" + cd be0 && python -m unittest tests.test_auth_policy_integration -v +""" + +from __future__ import annotations + +import os +import unittest +import uuid +from unittest.mock import patch + +from sqlalchemy import select + +from tests.auth_register_staff_fixture import register_staff_fields + +_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql") + +_TEST_PASSWORD = "Testpass1!" + + +@unittest.skipUnless( + _RUN_DB, + "Set INITIATIVE_DATABASE_URL=postgresql+asyncpg://.../initiatives to run DB integration tests", +) +class AuthPolicyIntegrationTests(unittest.IsolatedAsyncioTestCase): + async def asyncSetUp(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + await eng.init_engine() + + async def asyncTearDown(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + + async def _delete_user_email(self, email: str) -> None: + from src.initiative_db.engine import get_session + from src.initiative_db.models import User + + async with get_session() as session: + user = ( + await session.execute(select(User).where(User.email == email)) + ).scalar_one_or_none() + if user is not None: + await session.delete(user) + + def _register(self, email: str, full_name: str = "Policy Test", extra: dict | None = None): + from fastapi.testclient import TestClient + + from main import app + + body = { + "fullName": full_name, + "email": email, + "password": _TEST_PASSWORD, + "passwordConfirm": 
_TEST_PASSWORD, + **register_staff_fields(), + } + if extra: + body.update(extra) + with TestClient(app) as client: + return client.post("/api/v1/auth/register", json=body) + + async def test_register_default_viewer_ump(self) -> None: + email = f"applicant-{uuid.uuid4().hex[:12]}@ump.edu.vn" + try: + r = self._register(email) + self.assertEqual(r.status_code, 200, r.text) + data = r.json() + self.assertTrue(data.get("emailVerificationRequired")) + self.assertNotIn("accessToken", data) + roles = data["user"]["roles"] + self.assertIn("viewer", roles) + self.assertNotIn("admin", roles) + finally: + await self._delete_user_email(email) + + async def test_register_default_viewer_umc(self) -> None: + email = f"applicant-{uuid.uuid4().hex[:12]}@umc.edu.vn" + try: + r = self._register(email) + self.assertEqual(r.status_code, 200, r.text) + self.assertTrue(r.json().get("emailVerificationRequired")) + roles = r.json()["user"]["roles"] + self.assertIn("viewer", roles) + self.assertNotIn("admin", roles) + finally: + await self._delete_user_email(email) + + async def test_register_ignores_client_admin_role(self) -> None: + email = f"privesc-{uuid.uuid4().hex[:12]}@ump.edu.vn" + try: + r = self._register( + email, + extra={"role": "admin"}, + ) + self.assertEqual(r.status_code, 200, r.text) + self.assertTrue(r.json().get("emailVerificationRequired")) + roles = r.json()["user"]["roles"] + self.assertIn("viewer", roles) + self.assertNotIn("admin", roles) + finally: + await self._delete_user_email(email) + + async def test_policy_env_makes_admin(self) -> None: + email = f"stub-admin-{uuid.uuid4().hex[:12]}@ump.edu.vn" + try: + with patch.dict(os.environ, {"AUTH_ADMIN_EMAILS": email}): + r = self._register(email) + self.assertEqual(r.status_code, 200, r.text) + self.assertTrue(r.json().get("emailVerificationRequired")) + roles = r.json()["user"]["roles"] + self.assertIn("admin", roles) + self.assertNotIn("viewer", roles) + finally: + await self._delete_user_email(email) + + +if 
__name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_authenticate_user.py b/be0/tests/test_authenticate_user.py new file mode 100644 index 0000000..4845813 --- /dev/null +++ b/be0/tests/test_authenticate_user.py @@ -0,0 +1,146 @@ +"""Unit tests for the AuthenticateUser use case using fakes (no DB, no FastAPI). + +Async use case is driven via ``asyncio.run`` so no pytest-asyncio plugin is needed. +""" + +from __future__ import annotations + +import asyncio +import uuid + +import pytest + +from src.application.identity.dto import LoginCommand +from src.application.identity.use_cases.authenticate_user import AuthenticateUser +from src.domain.identity.entities import User +from src.domain.identity.errors import ( + EmailNotVerified, + InvalidCredentials, + InvalidInstitutionalEmail, +) +from src.shared_kernel.errors import RateLimited + + +class FakeUsers: + def __init__(self, user: User | None = None, roles: list[str] | None = None) -> None: + self._user = user + self._roles = roles or [] + self.reconciled = False + + async def get_by_email(self, email: str) -> User | None: + return self._user if (self._user and self._user.email == email) else None + + async def get_by_id(self, user_id): # pragma: no cover - unused here + return self._user + + async def roles_after_reconcile(self, user: User) -> list[str]: + self.reconciled = True + return self._roles + + +class FakeHasher: + def __init__(self, ok: bool = True) -> None: + self.ok = ok + + def hash(self, plain: str) -> str: + return "h:" + plain + + def verify(self, plain: str, hashed: str) -> bool: + return self.ok + + +class FakeTokens: + def issue(self, user_id, email, roles, credential_version) -> str: + return f"tok:{user_id}:{credential_version}:{','.join(roles)}" + + +class FakeRateLimiter: + def __init__(self, allow: bool = True) -> None: + self._allow = allow + + def allow(self, email: str, client_ip: str) -> bool: + return self._allow + + +class FakeAudit: + def __init__(self) -> None: + 
self.events: list[tuple] = [] + + async def login_succeeded(self, *, user_id, email, roles) -> None: + self.events.append(("ok", email, tuple(roles))) + + async def login_failed(self, *, email, user_id, reason) -> None: + self.events.append(("fail", email, reason)) + + +def _user(**kw) -> User: + base = dict( + id=uuid.uuid4(), + email="a@ump.edu.vn", + full_name="Test", + password_hash="x", + email_verified=True, + is_active=True, + credential_version=2, + ) + base.update(kw) + return User(**base) + + +def _build(users: FakeUsers, hasher=None, rate_limiter=None, audit=None) -> tuple: + audit = audit or FakeAudit() + uc = AuthenticateUser( + users=users, + hasher=hasher or FakeHasher(ok=True), + tokens=FakeTokens(), + rate_limiter=rate_limiter or FakeRateLimiter(allow=True), + audit=audit, + ) + return uc, audit + + +def test_login_success_returns_token_and_reconciles_roles() -> None: + users = FakeUsers(user=_user(), roles=["admin", "viewer"]) + uc, audit = _build(users) + result = asyncio.run(uc.execute(LoginCommand("A@ump.edu.vn", "pw", "1.2.3.4"))) + assert result.access_token.endswith(":2:admin,viewer") + assert result.roles == ["admin", "viewer"] + assert users.reconciled is True + assert audit.events == [("ok", "a@ump.edu.vn", ("admin", "viewer"))] + + +def test_wrong_password_raises_401_and_audits_failure() -> None: + users = FakeUsers(user=_user()) + uc, audit = _build(users, hasher=FakeHasher(ok=False)) + with pytest.raises(InvalidCredentials) as exc: + asyncio.run(uc.execute(LoginCommand("a@ump.edu.vn", "bad", "ip"))) + assert exc.value.message == "Email hoặc mật khẩu không đúng." 
+ assert audit.events == [("fail", "a@ump.edu.vn", None)] + + +def test_unknown_email_raises_401() -> None: + uc, _ = _build(FakeUsers(user=None)) + with pytest.raises(InvalidCredentials): + asyncio.run(uc.execute(LoginCommand("nobody@ump.edu.vn", "pw", "ip"))) + + +def test_unverified_email_raises_403_with_reason() -> None: + users = FakeUsers(user=_user(email_verified=False)) + uc, audit = _build(users) + with pytest.raises(EmailNotVerified): + asyncio.run(uc.execute(LoginCommand("a@ump.edu.vn", "pw", "ip"))) + assert audit.events == [("fail", "a@ump.edu.vn", "email_unverified")] + + +def test_rate_limited_raises_429_before_db() -> None: + users = FakeUsers(user=_user()) + uc, _ = _build(users, rate_limiter=FakeRateLimiter(allow=False)) + with pytest.raises(RateLimited): + asyncio.run(uc.execute(LoginCommand("a@ump.edu.vn", "pw", "ip"))) + + +def test_non_institutional_email_rejected_before_lookup() -> None: + users = FakeUsers(user=_user()) + uc, _ = _build(users) + with pytest.raises(InvalidInstitutionalEmail): + asyncio.run(uc.execute(LoginCommand("a@gmail.com", "pw", "ip"))) diff --git a/be0/tests/test_backup_e2e.py b/be0/tests/test_backup_e2e.py new file mode 100644 index 0000000..0e65d8c --- /dev/null +++ b/be0/tests/test_backup_e2e.py @@ -0,0 +1,257 @@ +""" +Full-stack backup E2E: HTTP submit (PDF → Postgres + MinIO) then admin ZIP download. + +Requires PostgreSQL with migrations through **009** and reachable **MinIO (S3 API)**. 
+ +Run (host → docker-compose ports): + + export INITIATIVE_DATABASE_URL="postgresql+asyncpg://initiative:initiative_secret@127.0.0.1:15432/initiatives" + export S3_ENDPOINT_URL="http://127.0.0.1:19000" + export S3_PUBLIC_ENDPOINT_URL="http://127.0.0.1:19000" + export S3_ACCESS_KEY="minio_user" + export S3_SECRET_KEY="minio_password" + export S3_BUCKET_ATTACHMENTS="initiative-attachments" + export S3_BUCKET_EXPORTS="initiative-exports" + export S3_BUCKET_QUARANTINE="initiative-quarantine" + export E2E_BACKUP=1 + cd be0 && python -m unittest tests.test_backup_e2e -v + +Browser E2E: see ``fe0/e2e/backup-admin-download.spec.ts`` (requires the same stack plus ``fe0`` + ``E2E_ADMIN_EMAIL`` on ``AUTH_ADMIN_EMAILS``). +""" + +from __future__ import annotations + +import io +import json +import os +import unittest +import uuid +import zipfile +from unittest.mock import patch + +from sqlalchemy import select + +from tests.auth_register_staff_fixture import register_staff_fields + +_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql") +_S3_KEYS = ( + "S3_ENDPOINT_URL", + "S3_ACCESS_KEY", + "S3_SECRET_KEY", + "S3_BUCKET_ATTACHMENTS", + "S3_BUCKET_EXPORTS", + "S3_BUCKET_QUARANTINE", +) +_HAS_S3 = all(os.getenv(k, "").strip() for k in _S3_KEYS) +_RUN_BACKUP = os.getenv("E2E_BACKUP", "").strip().lower() in ("1", "true", "yes") + +_TEST_PASSWORD = "Testpass1!" 
+ +_MIN_PDF = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj<<>>endobj\ntrailer<<>>\n%%EOF\n" + b"0" * 120 + + +@unittest.skipUnless( + _RUN_DB and _HAS_S3 and _RUN_BACKUP, + "Need INITIATIVE_DATABASE_URL, full S3_* env, and E2E_BACKUP=1 (see module docstring).", +) +class BackupFullStackApiE2ETests(unittest.IsolatedAsyncioTestCase): + async def asyncSetUp(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + await eng.init_engine() + + async def asyncTearDown(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + + async def _delete_users_by_email(self, emails: list[str]) -> None: + from src.initiative_db.engine import get_session + from src.initiative_db.models import Initiative, User + + async with get_session() as session: + for email in emails: + user = ( + await session.execute(select(User).where(User.email == email)) + ).scalar_one_or_none() + if user is None: + continue + inis = ( + await session.execute(select(Initiative).where(Initiative.owner_id == user.id)) + ).scalars().all() + for ini in inis: + await session.delete(ini) + await session.flush() + await session.delete(user) + await session.commit() + + async def test_submit_creates_minio_full_pdf_then_backup_zip(self) -> None: + from fastapi.testclient import TestClient + + from main import app + from src.initiative_db.engine import get_session + from src.initiative_db.models import ApplicationArtifact, Initiative + + applicant_email = f"e2e-backup-app-{uuid.uuid4().hex[:10]}@ump.edu.vn" + admin_email = f"e2e-backup-adm-{uuid.uuid4().hex[:10]}@ump.edu.vn" + + try: + with patch.dict(os.environ, {"AUTH_MAIL_LOG_ONLY": "1"}): + with TestClient(app) as client: + cap_app: list[str] = [] + + async def grab_app(_t: str, raw: str) -> None: + cap_app.append(raw) + + with patch("src.auth_api.deliver_registration_otp_email", side_effect=grab_app): + r = client.post( + "/api/v1/auth/register", + json={ + "fullName": "E2E Applicant", + "email": 
applicant_email, + "password": _TEST_PASSWORD, + "passwordConfirm": _TEST_PASSWORD, + **register_staff_fields(), + }, + ) + self.assertEqual(r.status_code, 200, r.text) + self.assertTrue(cap_app) + r = client.post( + "/api/v1/auth/verify-otp", + json={"email": applicant_email, "otp": cap_app[0]}, + ) + self.assertEqual(r.status_code, 200, r.text) + r = client.post( + "/api/v1/auth/login", + json={"email": applicant_email, "password": _TEST_PASSWORD}, + ) + self.assertEqual(r.status_code, 200, r.text) + applicant_token = r.json()["accessToken"] + + r = client.post( + "/api/applications/new", + headers={"Authorization": f"Bearer {applicant_token}"}, + json={"name": "E2E backup row"}, + ) + self.assertEqual(r.status_code, 200, r.text) + shell = r.json().get("application") or {} + case_id = str(shell.get("draft_case_id") or "").strip() + application_id = str(r.json().get("id") or shell.get("id") or "").strip() + self.assertTrue(case_id, shell) + self.assertTrue(application_id, shell) + + from tests.fixtures.minimal_submit_bundle import minimal_tabs_bundle + + bundle = minimal_tabs_bundle(initiative_name="E2E Backup Initiative") + for tab_name in ("report", "application", "contribution"): + r = client.post( + "/api/v1/application-drafts", + headers={"Authorization": f"Bearer {applicant_token}"}, + json={"caseId": case_id, "tab": tab_name, "data": bundle[tab_name]}, + ) + self.assertEqual(r.status_code, 200, r.text) + + r = client.post( + f"/api/v1/application-drafts/{case_id}/evidence", + headers={"Authorization": f"Bearer {applicant_token}"}, + data={"kind": "technical"}, + files={"file": ("mc.pdf", io.BytesIO(_MIN_PDF), "application/pdf")}, + ) + self.assertEqual(r.status_code, 200, r.text) + + meta = { + "initiativeCaseId": case_id, + "initiativeName": "E2E Backup Initiative", + "authorName": "Applicant", + "authorEmail": applicant_email, + "subjectId": "s1", + "groupId": "g1", + "topicType": "Hồ sơ PDF", + } + r = client.post( + "/api/applications/submit", + 
headers={"Authorization": f"Bearer {applicant_token}"}, + files={"file": ("e2e.pdf", io.BytesIO(_MIN_PDF), "application/pdf")}, + data={"metadata": json.dumps(meta)}, + ) + self.assertEqual(r.status_code, 200, r.text) + application_id = str((r.json() or {}).get("id") or application_id) + + with patch.dict(os.environ, {"AUTH_ADMIN_EMAILS": admin_email}): + cap_adm: list[str] = [] + + async def grab_adm(_t: str, raw: str) -> None: + cap_adm.append(raw) + + with patch( + "src.auth_api.deliver_registration_otp_email", + side_effect=grab_adm, + ): + r = client.post( + "/api/v1/auth/register", + json={ + "fullName": "E2E Admin", + "email": admin_email, + "password": _TEST_PASSWORD, + "passwordConfirm": _TEST_PASSWORD, + **register_staff_fields(), + }, + ) + self.assertEqual(r.status_code, 200, r.text) + self.assertTrue(cap_adm) + self.assertIn("admin", r.json()["user"]["roles"]) + r = client.post( + "/api/v1/auth/verify-otp", + json={"email": admin_email, "otp": cap_adm[0]}, + ) + self.assertEqual(r.status_code, 200, r.text) + r = client.post( + "/api/v1/auth/login", + json={"email": admin_email, "password": _TEST_PASSWORD}, + ) + self.assertEqual(r.status_code, 200, r.text) + admin_token = r.json()["accessToken"] + + r = client.get( + f"/api/applications/{application_id}/backup", + headers={"Authorization": f"Bearer {admin_token}"}, + ) + self.assertEqual(r.status_code, 200, r.text) + self.assertEqual(r.headers.get("content-type", "").split(";")[0], "application/zip") + + buf = io.BytesIO(r.content) + with zipfile.ZipFile(buf, "r") as zf: + names = zf.namelist() + self.assertIn("manifest.json", names) + self.assertIn("submitted/full-package.pdf", names) + manifest = json.loads(zf.read("manifest.json").decode("utf-8")) + self.assertEqual(str(manifest.get("applicationId")), application_id) + self.assertIn("initiative_id", manifest) + packed = {str(x.get("zip_path")): x for x in manifest.get("files") or []} + self.assertIn("submitted/full-package.pdf", packed) +
self.assertFalse(packed["submitted/full-package.pdf"].get("skipped")) + + async with get_session() as session: + ini = ( + await session.execute(select(Initiative).where(Initiative.case_code == case_id)) + ).scalar_one() + row = ( + await session.execute( + select(ApplicationArtifact).where( + ApplicationArtifact.initiative_id == ini.id, + ApplicationArtifact.role == "full_pdf", + ) + ) + ).scalar_one_or_none() + self.assertIsNotNone(row) + assert row is not None + uri = (row.storage_uri or "").strip() + self.assertTrue(uri.startswith("initiatives/"), uri) + self.assertEqual(row.storage_kind, "minio_exports") + finally: + await self._delete_users_by_email([applicant_email, admin_email]) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_dashboard_lookup_routes.py b/be0/tests/test_dashboard_lookup_routes.py new file mode 100644 index 0000000..5cc6819 --- /dev/null +++ b/be0/tests/test_dashboard_lookup_routes.py @@ -0,0 +1,31 @@ +""" +GET /api/conferences and /api/supervisors — dashboard filter lookups. 
+ +Run: cd be0 && python -m unittest tests.test_dashboard_lookup_routes -v +""" + +from __future__ import annotations + +import unittest +from unittest.mock import patch + + +class DashboardLookupRoutesTests(unittest.TestCase): + def test_no_db_returns_empty_lists(self) -> None: + from fastapi.testclient import TestClient + + from main import app + from src.initiative_db import engine as eng + + with patch.object(eng, "is_postgres_enabled", return_value=False): + client = TestClient(app) + r1 = client.get("/api/conferences") + r2 = client.get("/api/supervisors") + self.assertEqual(r1.status_code, 200, r1.text) + self.assertEqual(r2.status_code, 200, r2.text) + self.assertEqual(r1.json(), []) + self.assertEqual(r2.json(), []) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_docx_normalize.py b/be0/tests/test_docx_normalize.py new file mode 100644 index 0000000..dc7791c --- /dev/null +++ b/be0/tests/test_docx_normalize.py @@ -0,0 +1,678 @@ +"""Tests for OOXML normalization used after docxtpl render.""" + +from __future__ import annotations + +import io +import re +import unittest +import zipfile + +from src.be01.docx_normalize import ( + collapse_empty_page_break_paragraphs_in_docx, + force_times_new_roman_in_styles_docx, + move_signature_date_to_top_row, + normalize_bo_y_te_header_lines, + relax_justified_softbreak_paragraphs_in_docx, + shift_selected_header_lines_left, + strip_mau_04_evaluation_section_in_docx, + strip_table_row_height_rules_from_docx, +) + + +def _wrap_doc_in_zip(doc_xml: bytes) -> bytes: + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + return buf.getvalue() + + +def _read_document_xml(docx_bytes: bytes) -> str: + with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z: + return z.read("word/document.xml").decode("utf-8") + + +class DocxNormalizeTests(unittest.TestCase): + def 
test_strip_tr_height_removes_self_closing(self) -> None: + xml = ( + b'' + b"" + b'' + b"a" + b"" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = strip_table_row_height_rules_from_docx(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + self.assertNotIn("trHeight", doc) + self.assertNotIn("720", doc) + + def test_normalize_bo_y_te_strips_ministry_bold_centers(self) -> None: + doc_xml = """ + + + +BỘ Y TẾ + + +ĐẠI HỘC Y DƯỢCTHÀNH PHỐ HỒ CHÍ MINH + +""".encode( + "utf-8" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + phase1 = shift_selected_header_lines_left(buf.getvalue()) + out = normalize_bo_y_te_header_lines(phase1) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + ministry = re.search(r"<[^>]*:p\b[^>]*>.*?BỘ Y TẾ.*?]*:p>", doc, re.DOTALL | re.IGNORECASE) + self.assertIsNotNone(ministry) + assert ministry is not None + block = ministry.group(0) + self.assertNotIn("ns0:b", block.split("BỘ Y TẾ")[0]) + self.assertIn('val="center"', block) + uni = re.search(r"<[^>]*:p\b[^>]*>.*?ĐẠI HỘC Y DƯỢC.*?]*:p>", doc, re.DOTALL | re.IGNORECASE) + self.assertIsNotNone(uni) + assert uni is not None + self.assertIn("ns0:b", uni.group(0)) + self.assertIn("Times New Roman", uni.group(0)) + + def test_university_letterhead_two_paragraphs_bold_centered(self) -> None: + """Cover may use two paragraphs instead of one line break.""" + doc_xml = """ + + + +ĐẠI HỘC Y DƯỢC + + +THÀNH PHỐ HỒ CHÍ MINH + +""".encode( + "utf-8" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = 
normalize_bo_y_te_header_lines(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + for label, needle in ( + ("dhyd", "ĐẠI HỘC Y DƯỢC"), + ("tphcm", "THÀNH PHỐ HỒ CHÍ MINH"), + ): + blk = re.search( + rf"<[^>]*:p\b[^>]*>.*?{re.escape(needle)}.*?]*:p>", + doc, + re.DOTALL | re.IGNORECASE, + ) + self.assertIsNotNone(blk, msg=label) + assert blk is not None + b = blk.group(0) + self.assertIn("ns0:b", b, msg=label) + self.assertIn('val="center"', b, msg=label) + + def test_university_letterhead_one_paragraph_gets_soft_break_inserted(self) -> None: + """When both letterhead phrases share one paragraph on a single visual line, a + soft ``w:br`` line break is inserted before the city line so the cover renders on two lines. + + Also asserts the runs end up bold + upright (no italic) + Times New Roman.""" + doc_xml = """ + + + +ĐẠI HỘC Y DƯỢC THÀNH PHỐ HỒ CHÍ MINH + +""".encode( + "utf-8" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = normalize_bo_y_te_header_lines(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + # A soft break should now sit between the two phrases. + self.assertRegex( + doc, + r"ĐẠI HỘC Y DƯỢC.*?<[^>]*:br[^>]*/?>.*?THÀNH PHỐ HỒ CHÍ MINH", + ) + # Paragraph is centered, runs are bold + not italic + Times New Roman.
+ self.assertIn('val="center"', doc) + self.assertIn("ns0:b", doc) + self.assertIn('ns0:i ns0:val="0"', doc) + self.assertIn("Times New Roman", doc) + + def test_university_letterhead_soft_break_idempotent(self) -> None: + """Running normalize twice should not stack additional elements between + the letterhead phrases.""" + doc_xml = """ + + + +ĐẠI HỘC Y DƯỢC THÀNH PHỐ HỒ CHÍ MINH + +""".encode( + "utf-8" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + once = normalize_bo_y_te_header_lines(buf.getvalue()) + twice = normalize_bo_y_te_header_lines(once) + with zipfile.ZipFile(io.BytesIO(twice)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + br_count = len(re.findall(r"<[^>]*:br\b[^>]*/?>", doc)) + self.assertEqual(br_count, 1, msg=f"expected exactly one , got {br_count}: {doc!r}") + + def test_first_page_scope_second_bo_te_unchanged(self) -> None: + """Only the cover « BỘ Y TẾ » is stripped of bold; a later duplicate keeps bold.""" + doc_xml = """ + + +BỘ Y TẾ +ĐẠI HỘC Y DƯỢCTHÀNH PHỐ HỒ CHÍ MINH + +BỘ Y TẾ +""".encode( + "utf-8" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = normalize_bo_y_te_header_lines(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + paras = re.findall(r"<[^>]*:p\b[^>]*>.*?</[^>]*:p>", doc, re.DOTALL | re.IGNORECASE) + self.assertEqual(len(paras), 4, msg="expected 4 paragraphs") + first_bo = paras[0] + late_bo = paras[3] + self.assertNotIn("ns0:b", first_bo.split("BỘ Y TẾ")[0]) + self.assertIn("ns0:b", late_bo) + + def test_move_signature_date_creates_full_width_top_row(self) -> None: + """The date paragraph is lifted into a single-cell top row spanning every column.""" + doc_xml = """ + + + + + +LÃNH
ĐẠO ĐƠN VỊ(Ký, ghi rõ họ tên) +Tp. Hồ Chí Minh, ngày 11 tháng 5 năm 2026Tác giả sáng kiến + + +""".encode( + "utf-8" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = move_signature_date_to_top_row(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + + rows = re.findall(r"<[^>]*:tr\b[^>]*>.*?</[^>]*:tr>", doc, re.DOTALL) + self.assertEqual(len(rows), 2, msg=f"expected 2 rows after lift, got: {doc!r}") + + first_row, second_row = rows + # Top row: single cell, gridSpan=2, contains the date, right-aligned. + self.assertEqual(first_row.count("") + first_row.count("]*:gridSpan\s+[^>]*:val="2"') + self.assertIn("Tp. Hồ Chí Minh, ngày 11 tháng 5 năm 2026", first_row) + self.assertRegex(first_row, r'<[^>]*:jc\s+[^>]*:val="right"') + + # Second row: original 2 cells. Right cell starts with "Tác giả sáng kiến" + # (no date paragraph anymore), so it aligns with "LÃNH ĐẠO ĐƠN VỊ". + self.assertNotIn("Tp. Hồ Chí Minh, ngày", second_row) + self.assertIn("LÃNH ĐẠO ĐƠN VỊ", second_row) + self.assertIn("Tác giả sáng kiến", second_row) + + def test_move_signature_date_is_idempotent(self) -> None: + """A second pass over an already-lifted table is a no-op (still exactly 2 rows).""" + doc_xml = """ + + + + + +LÃNH ĐẠO ĐƠN VỊ +Tp.
Hồ Chí Minh, ngày 11 tháng 5 năm 2026Tác giả sáng kiến + + +""".encode( + "utf-8" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + once = move_signature_date_to_top_row(buf.getvalue()) + twice = move_signature_date_to_top_row(once) + with zipfile.ZipFile(io.BytesIO(twice)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + rows = re.findall(r"<[^>]*:tr\b[^>]*>.*?</[^>]*:tr>", doc, re.DOTALL) + self.assertEqual(len(rows), 2, msg=f"expected 2 rows after second pass, got: {doc!r}") + date_hits = doc.count("Tp. Hồ Chí Minh, ngày") + self.assertEqual(date_hits, 1, msg=f"date should appear exactly once, got {date_hits}") + + def test_move_signature_date_skips_table_without_date(self) -> None: + """Tables that do not contain the date prefix are left untouched.""" + doc_xml = """ + + + + +cell Acell B + +""".encode( + "utf-8" + ) + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = move_signature_date_to_top_row(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + doc = z2.read("word/document.xml").decode("utf-8") + rows = re.findall(r"<[^>]*:tr\b[^>]*>.*?</[^>]*:tr>", doc, re.DOTALL) + self.assertEqual(len(rows), 1, msg="non-signature tables must be left untouched") + + def test_relax_justified_splits_paragraph_at_soft_break_in_run(self) -> None: + """Justified paragraph with a soft mid-run is split into two paragraphs.
+ Both fragments keep so the layout stays justified, and the + line that used to be followed by the soft break (« first chunk ») becomes the + last line of its own paragraph -> stops being stretched.""" + doc_xml = """ + + + +first chunksecond chunk + +""".encode( + "utf-8" + ) + out = relax_justified_softbreak_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + self.assertNotRegex( + doc, r"<[^>]*:br\b(?![^>]*:type=\"page\")[^>]*/?>", + msg="soft should be consumed by the split", + ) + paras = re.findall(r"<[^>]*:p\b[^>]*>.*?</[^>]*:p>", doc, re.DOTALL) + self.assertEqual(len(paras), 2, msg=f"expected 2 paragraphs after split: {doc!r}") + for p in paras: + self.assertRegex(p, r'<[^>]*:jc\s+[^>]*:val="both"') + self.assertIn("Arial", p) # run properties preserved on both fragments + self.assertIn("first chunk", paras[0]) + self.assertIn("second chunk", paras[1]) + self.assertNotIn("second chunk", paras[0]) + + def test_relax_justified_distribute_becomes_both(self) -> None: + """`distribute` stretches every line including the last; rewrite it to `both`.""" + doc_xml = """ + + +solo line +""".encode( + "utf-8" + ) + out = relax_justified_softbreak_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + self.assertNotIn('val="distribute"', doc) + self.assertRegex(doc, r'<[^>]*:jc\s+[^>]*:val="both"') + + def test_relax_justified_rewrites_distribute_in_styles_xml(self) -> None: + """Paragraph styles may use ``distribute``; rewrite so body text is justified like Word ``both``.""" + doc_xml = """ + +x""".encode("utf-8") + styles_xml = """ + + + + +""".encode("utf-8") + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr("word/styles.xml", styles_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = relax_justified_softbreak_paragraphs_in_docx(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + styles = 
z2.read("word/styles.xml").decode("utf-8") + self.assertNotIn('val="distribute"', styles) + self.assertIn('val="both"', styles) + + def test_relax_justified_merges_do_not_expand_shift_return_in_settings(self) -> None: + """Compatibility flag so lines ending in soft breaks are not fully stretched when justified.""" + doc_xml = """ + +x""".encode("utf-8") + settings_xml = b""" + + +""" + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/document.xml", doc_xml) + z.writestr("word/settings.xml", settings_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = relax_justified_softbreak_paragraphs_in_docx(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + settings = z2.read("word/settings.xml").decode("utf-8") + self.assertIn("doNotExpandShiftReturn", settings) + self.assertRegex(settings, r'doNotExpandShiftReturn[^>]*val="1"') + + def test_relax_justified_preserves_non_justified_paragraphs(self) -> None: + """Soft breaks in non-justified paragraphs are left alone (no surprise splits).""" + doc_xml = """ + + + +line1line2 + +""".encode( + "utf-8" + ) + out = relax_justified_softbreak_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + paras = re.findall(r"<[^>]*:p\b[^>]*>.*?</[^>]*:p>", doc, re.DOTALL) + self.assertEqual(len(paras), 1, msg="left-aligned paragraphs must not be split") + self.assertRegex(doc, r"<[^>]*:br\b[^>]*/?>", msg="soft break should survive") + + def test_relax_justified_preserves_page_break(self) -> None: + """Page breaks (``) are NOT treated as soft breaks.""" + doc_xml = """ + + + +beforeafter + +""".encode( + "utf-8" + ) + out = relax_justified_softbreak_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + paras = re.findall(r"<[^>]*:p\b[^>]*>.*?</[^>]*:p>", doc, re.DOTALL) + self.assertEqual(len(paras), 1, msg="page breaks must not trigger paragraph split") + self.assertRegex(doc, r'<[^>]*:br\s+[^>]*:type="page"') + + def 
test_relax_justified_idempotent(self) -> None: + """Running twice produces the same output as running once.""" + doc_xml = """ + + + +aaabbbccc + +""".encode( + "utf-8" + ) + once = relax_justified_softbreak_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + twice = relax_justified_softbreak_paragraphs_in_docx(once) + self.assertEqual( + _read_document_xml(once), + _read_document_xml(twice), + msg="second pass should be a no-op", + ) + paras = re.findall( + r"<[^>]*:p\b[^>]*>.*?</[^>]*:p>", _read_document_xml(once), re.DOTALL + ) + self.assertEqual(len(paras), 3, msg="two soft breaks should yield three paragraphs") + + def test_strip_mau_04_removes_section_between_page_breaks(self) -> None: + """Body order: mau_03 sig, page-break, letterhead table, « Mẫu số 04 », content, + page-break, « Bản cam kết ». Strip should drop everything from the leading + page-break paragraph through the last Mẫu số 04 content paragraph (inclusive), + keeping the trailing page-break paragraph that opens « Bản cam kết ».""" + doc_xml = """ + + +{{ mau_03.tac_gia_chinh_ky }} + +BỘ Y TẾ +Mẫu số 04 +PHIẾU ĐÁNH GIÁ SÁNG KIẾN +1. Tên sáng kiến: {{ mau_04.ten_sang_kien }} +Kết luận: {{ mau_04.ket_luan }} +{{ mau_04.thanh_vien_hoi_dong }} + +CỘNG HOÀ XÃ HỘI CHỦ NGHĨA VIỆT NAM +BẢN CAM KẾT + +""".encode( + "utf-8" + ) + out = strip_mau_04_evaluation_section_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + self.assertNotIn("Mẫu số 04", doc) + self.assertNotIn("PHIẾU ĐÁNH GIÁ", doc) + self.assertNotIn("mau_04", doc) + # The leading page break + letterhead + content are gone, but the trailing + # page-break paragraph (now the only page break) must survive so Bản cam kết + # still starts on its own page.
+ page_breaks = re.findall(r'<[^>]*:br\s+[^>]*:type="page"', doc) + self.assertEqual(len(page_breaks), 1, msg=f"expected 1 page break, got {len(page_breaks)}: {doc!r}") + self.assertIn("mau_03.tac_gia_chinh_ky", doc) + self.assertIn("BẢN CAM KẾT", doc) + self.assertIn("CỘNG HOÀ XÃ HỘI CHỦ NGHĨA VIỆT NAM", doc) + # sectPr must survive the trim. + self.assertRegex(doc, r"<[^>]*:sectPr") + + def test_strip_mau_04_is_idempotent(self) -> None: + """Second pass over an already-stripped document is a no-op.""" + doc_xml = """ + + +mau_03 end + +Mẫu số 04 +content + +BẢN CAM KẾT +""".encode( + "utf-8" + ) + once = strip_mau_04_evaluation_section_in_docx(_wrap_doc_in_zip(doc_xml)) + twice = strip_mau_04_evaluation_section_in_docx(once) + self.assertEqual(_read_document_xml(once), _read_document_xml(twice)) + self.assertNotIn("Mẫu số 04", _read_document_xml(once)) + + def test_strip_mau_04_noop_when_marker_missing(self) -> None: + """Documents that don't carry the « Mẫu số 04 » header are left untouched.""" + doc_xml = """ + + +only mau_03 + +BẢN CAM KẾT +""".encode( + "utf-8" + ) + out = strip_mau_04_evaluation_section_in_docx(_wrap_doc_in_zip(doc_xml)) + before = _read_document_xml(_wrap_doc_in_zip(doc_xml)) + after = _read_document_xml(out) + # Allow whitespace / declaration differences from ElementTree round-trip; the + # human-readable text content must be unchanged. 
+ for needle in ("only mau_03", "BẢN CAM KẾT"): + self.assertIn(needle, after) + self.assertNotIn("Mẫu số 04", after) + + def test_strip_mau_04_bails_out_without_leading_page_break(self) -> None: + """If there's no page break before the Mẫu số 04 header (malformed template), + leave the document alone instead of removing the previous section by mistake.""" + doc_xml = """ + + +previous section content +Mẫu số 04 +{{ mau_04.ten_sang_kien }} +""".encode( + "utf-8" + ) + out = strip_mau_04_evaluation_section_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + self.assertIn("previous section content", doc) + self.assertIn("Mẫu số 04", doc, msg="strip must not run when leading page break is missing") + + def test_collapse_empty_pagebreak_before_table_uses_pagebreakbefore(self) -> None: + """An empty paragraph that hosts only ```` followed by a + table is removed; the first paragraph in the first cell of the table gets + ```` so the table anchors to a new page without an + intervening empty body paragraph.""" + doc_xml = """ + + +previous content + + + +letterhead cell + + +next section paragraph +""".encode( + "utf-8" + ) + out = collapse_empty_page_break_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + self.assertNotRegex( + doc, r'<[^>]*:br\s+[^>]*:type="page"', + msg="inline should be replaced", + ) + self.assertRegex(doc, r"<[^>]*:pageBreakBefore") + # The empty page-break paragraph is gone but original content survives. 
+ self.assertIn("previous content", doc) + self.assertIn("letterhead cell", doc) + self.assertIn("next section paragraph", doc) + + def test_collapse_empty_pagebreak_before_paragraph(self) -> None: + """Empty page-break paragraph followed by a non-empty paragraph: the empty + paragraph is removed and ```` is added to the next paragraph + so it starts on a new page.""" + doc_xml = """ + + +A + +B +""".encode( + "utf-8" + ) + out = collapse_empty_page_break_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + # Exactly two body paragraphs left (empty break paragraph collapsed). + paras = re.findall(r"<[^>]*:p\b[^>]*>.*?</[^>]*:p>", doc, re.DOTALL) + self.assertEqual(len(paras), 2, msg=f"expected 2 paragraphs, got {len(paras)}: {doc!r}") + # The B paragraph keeps its center alignment AND gains pageBreakBefore. + b_para = next(p for p in paras if "B]*:jc\s+[^>]*:val="center"') + + def test_collapse_empty_pagebreak_idempotent(self) -> None: + """Second pass produces the same output as first pass.""" + doc_xml = """ + + +A + +B +""".encode( + "utf-8" + ) + once = collapse_empty_page_break_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + twice = collapse_empty_page_break_paragraphs_in_docx(once) + self.assertEqual(_read_document_xml(once), _read_document_xml(twice)) + # And exactly one pageBreakBefore in the result (not double-registered).
+ pbb_count = len(re.findall(r"<[^>]*:pageBreakBefore", _read_document_xml(once))) + self.assertEqual(pbb_count, 1) + + def test_collapse_empty_pagebreak_preserves_text_carrying_breaks(self) -> None: + """A paragraph that carries real text *and* an inline page break (rare; usually + Word-edited) must not be collapsed: dropping the text would lose content.""" + doc_xml = """ + + +visible text +after +""".encode( + "utf-8" + ) + out = collapse_empty_page_break_paragraphs_in_docx(_wrap_doc_in_zip(doc_xml)) + doc = _read_document_xml(out) + self.assertRegex(doc, r'<[^>]*:br\s+[^>]*:type="page"', msg="break must survive") + self.assertIn("visible text", doc) + self.assertIn("after", doc) + self.assertNotIn("pageBreakBefore", doc) + + def test_force_times_new_roman_styles(self) -> None: + styles_xml = b""" + + + +""" + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z: + z.writestr("word/styles.xml", styles_xml) + z.writestr( + "[Content_Types].xml", + b'', + ) + out = force_times_new_roman_in_styles_docx(buf.getvalue()) + with zipfile.ZipFile(io.BytesIO(out)) as z2: + st = z2.read("word/styles.xml").decode("utf-8") + self.assertNotIn("Calibri", st) + self.assertIn("Times New Roman", st) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_evidence_initiative_resolution.py b/be0/tests/test_evidence_initiative_resolution.py new file mode 100644 index 0000000..e0372c6 --- /dev/null +++ b/be0/tests/test_evidence_initiative_resolution.py @@ -0,0 +1,108 @@ +""" +``resolve_initiative_for_draft_case_key`` — evidence URLs may use ``sub-…`` or ``SUB-…`` instead of ``Initiative.case_code``. + +Set INITIATIVE_DATABASE_URL to run (same as tests.test_applications_db_integration). 
+ +Run: + cd be0 && python -m unittest tests.test_evidence_initiative_resolution -v +""" + +from __future__ import annotations + +import os +import unittest +import uuid +from datetime import datetime, timezone + +_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql") + + +@unittest.skipUnless( + _RUN_DB, + "Set INITIATIVE_DATABASE_URL=postgresql+asyncpg://.../initiatives to run DB integration tests", +) +class EvidenceInitiativeResolutionTests(unittest.IsolatedAsyncioTestCase): + async def asyncSetUp(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + await eng.init_engine() + + async def asyncTearDown(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + + async def test_resolves_by_public_submission_id_case_insensitive(self) -> None: + from sqlalchemy import delete + + from src.initiative_db.engine import get_session + from src.initiative_db.models import Draft, Initiative, User + from src.initiative_db.submissions import resolve_initiative_for_draft_case_key + + owner_id = uuid.uuid4() + owner_email = f"evtest-{owner_id.hex[:8]}@ump.edu.vn" + case_code = f"EVCASE-{uuid.uuid4().hex[:10]}" + submission_id = f"sub-{uuid.uuid4().hex[:16]}" + + async with get_session() as session: + session.add( + User( + id=owner_id, + email=owner_email, + password_hash="-", + full_name="Evidence resolver test", + ) + ) + await session.flush() + ini = Initiative( + case_code=case_code, + owner_id=owner_id, + status="submitted", + submitted_at=datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc), + ) + session.add(ini) + await session.flush() + payload = { + "caseId": case_code, + "updatedAt": "2026-01-01T12:00:00Z", + "tabs": {}, + "submissionRecord": { + "id": submission_id, + "submittedDate": "2026-01-01T12:00:00.000Z", + "name": "Test", + "author": {"id": case_code, "name": "T", "email": owner_email}, + "status": "submitted", + "reviewStatus": "not_reviewed", + 
}, + } + session.add( + Draft( + draft_code=f"DRAFT-{case_code}", + initiative_id=ini.id, + payload=payload, + version=1, + ) + ) + await session.commit() + ini_id = ini.id + + async with get_session() as session: + upper_alias = "SUB-" + submission_id.split("-", 1)[1] + r1 = await resolve_initiative_for_draft_case_key(session, submission_id) + r2 = await resolve_initiative_for_draft_case_key(session, upper_alias) + self.assertIsNotNone(r1) + self.assertIsNotNone(r2) + assert r1 is not None and r2 is not None + self.assertEqual(r1.case_code, case_code) + self.assertEqual(r2.case_code, case_code) + + async with get_session() as session: + await session.execute(delete(Draft).where(Draft.initiative_id == ini_id)) + await session.execute(delete(Initiative).where(Initiative.id == ini_id)) + await session.execute(delete(User).where(User.id == owner_id)) + await session.commit() + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_evidence_kind_parsing.py b/be0/tests/test_evidence_kind_parsing.py new file mode 100644 index 0000000..3be5b40 --- /dev/null +++ b/be0/tests/test_evidence_kind_parsing.py @@ -0,0 +1,35 @@ +"""Unit tests for ``_evidence_kind_to_role`` (query/form normalization). 
+ +Run: cd be0 && python -m unittest tests.test_evidence_kind_parsing -v +""" + +from __future__ import annotations + +import unittest + + +class EvidenceKindParsingTests(unittest.TestCase): + def test_plain_strings(self) -> None: + from main import _evidence_kind_to_role + + self.assertEqual(_evidence_kind_to_role("research"), "research_evidence") + self.assertEqual(_evidence_kind_to_role("TextBook"), "textbook_evidence") + self.assertEqual(_evidence_kind_to_role("TECHNICAL"), "technical_evidence") + self.assertIsNone(_evidence_kind_to_role("other")) + + def test_prefers_valid_entry_in_list(self) -> None: + """Duplicate or noisy ``kind=`` values: first matching token wins.""" + from main import _evidence_kind_to_role + + self.assertEqual(_evidence_kind_to_role(["", "bad", "research"]), "research_evidence") + self.assertEqual(_evidence_kind_to_role(["research", "textbook"]), "research_evidence") + + def test_strips_zwsp_and_bom(self) -> None: + from main import _evidence_kind_to_role + + self.assertEqual(_evidence_kind_to_role("research\u200b"), "research_evidence") + self.assertEqual(_evidence_kind_to_role("\ufeffresearch"), "research_evidence") + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_filename_normalize.py b/be0/tests/test_filename_normalize.py new file mode 100644 index 0000000..8cf00d7 --- /dev/null +++ b/be0/tests/test_filename_normalize.py @@ -0,0 +1,76 @@ +"""Unit tests for the pure filename-normalization helpers in imagehub_routes. + +These cover the {prefix}_{caseID}_0000.{ext} rename convention (caseID = the file's number, +5-digit zero-padded), the case-number extraction, and the double-extension split — with no +Postgres / MinIO dependency. 
+""" +from __future__ import annotations + +import unittest + +from src.imagehub_routes import _case_number, _normalized_name, _split_name_ext + + +class TestSplitNameExt(unittest.TestCase): + def test_double_extension_nii_gz(self): + self.assertEqual(_split_name_ext("a.nii.gz"), ("a", ".nii.gz")) + + def test_double_extension_tar_gz(self): + self.assertEqual(_split_name_ext("a.tar.gz"), ("a", ".tar.gz")) + + def test_single_extension(self): + self.assertEqual(_split_name_ext("a.png"), ("a", ".png")) + + def test_no_extension(self): + self.assertEqual(_split_name_ext("a"), ("a", "")) + + +class TestCaseNumber(unittest.TestCase): + def test_plain_number(self): + self.assertEqual(_case_number("1"), 1) + self.assertEqual(_case_number("100"), 100) + + def test_strips_channel_tag(self): + self.assertEqual(_case_number("10_0000"), 10) + + def test_prefixed_stem(self): + # year digits in the prefix must NOT be mistaken for the case number + self.assertEqual(_case_number("POLYP25_00001"), 1) + self.assertEqual(_case_number("POLYP25_00123_0000"), 123) + + def test_no_digits(self): + self.assertIsNone(_case_number("frame")) + + +class TestNormalizedName(unittest.TestCase): + def test_image_gets_prefix_padding_and_channel(self): + self.assertEqual(_normalized_name("1.png", "POLYP25", False), "POLYP25_00001_0000.png") + self.assertEqual(_normalized_name("100.png", "POLYP25", False), "POLYP25_00100_0000.png") + + def test_label_gets_prefix_padding_no_channel(self): + self.assertEqual(_normalized_name("1.png", "POLYP25", True), "POLYP25_00001.png") + + def test_image_and_label_share_case_id(self): + img = _normalized_name("7.png", "POLYP25", False) + lbl = _normalized_name("7.png", "POLYP25", True) + self.assertEqual(img, "POLYP25_00007_0000.png") + self.assertEqual(lbl, "POLYP25_00007.png") + + def test_double_extension(self): + self.assertEqual(_normalized_name("5.nii.gz", "RIB25", False), "RIB25_00005_0000.nii.gz") + + def test_idempotent_already_correct(self): + 
self.assertIsNone(_normalized_name("POLYP25_00001_0000.png", "POLYP25", False)) + self.assertIsNone(_normalized_name("POLYP25_00001.png", "POLYP25", True)) + + def test_reprefix_changes_prefix_keeps_case(self): + self.assertEqual( + _normalized_name("OLD25_00009_0000.png", "POLYP25", False), "POLYP25_00009_0000.png" + ) + + def test_no_case_number_returns_none(self): + self.assertIsNone(_normalized_name("frame.png", "POLYP25", False)) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_identity_domain.py b/be0/tests/test_identity_domain.py new file mode 100644 index 0000000..c0fbef5 --- /dev/null +++ b/be0/tests/test_identity_domain.py @@ -0,0 +1,133 @@ +"""Unit tests for the pure Identity domain layer. + +No DB, no FastAPI — runs anywhere (``python -m pytest tests/test_identity_domain.py``). +Pins the behavior extracted from auth_api.py so the eventual cut-over can't drift. +""" + +from __future__ import annotations + +import uuid +from datetime import datetime, timezone + +import pytest + +from src.domain.identity.entities import User +from src.domain.identity.errors import InvalidInstitutionalEmail, WeakPassword +from src.domain.identity.services import ( + DEFAULT_POLICY_ADMIN_EMAILS, + AdminReconcileAction, + build_access_token_claims, + policy_admin_emails, + reconcile_admin_action, +) +from src.domain.identity.value_objects import ( + InstitutionalEmail, + Role, + assert_password_policy, +) + + +class TestInstitutionalEmail: + @pytest.mark.parametrize("raw", [" ThaoNTT@UMP.edu.vn ", "x@umc.edu.vn"]) + def test_parse_normalizes_and_accepts(self, raw: str) -> None: + assert InstitutionalEmail.parse(raw).value == raw.strip().lower() + + @pytest.mark.parametrize("raw", ["a@gmail.com", "a@ump.edu.vn.evil.com", "", " "]) + def test_parse_rejects_non_institutional(self, raw: str) -> None: + with pytest.raises(InvalidInstitutionalEmail): + InstitutionalEmail.parse(raw) + + def test_value_object_equality_by_value(self) -> None: + assert 
InstitutionalEmail.parse("A@ump.edu.vn") == InstitutionalEmail.parse("a@ump.edu.vn") + + +class TestPasswordPolicy: + def test_accepts_strong_password(self) -> None: + assert_password_policy("Abcdef1!") # no raise + + @pytest.mark.parametrize( + "pwd, msg", + [ + ("Ab1!", "Mật khẩu tối thiểu 6 ký tự."), + ("abcdef1!", "Mật khẩu phải có ít nhất một chữ cái hoa."), + ("ABCDEF1!", "Mật khẩu phải có ít nhất một chữ cái thường."), + ("Abcdefg!", "Mật khẩu phải có ít nhất một chữ số."), + ("Abcdef12", "Mật khẩu phải có ít nhất một ký tự đặc biệt (không chỉ chữ và số)."), + ], + ) + def test_rejects_with_exact_message(self, pwd: str, msg: str) -> None: + with pytest.raises(WeakPassword) as exc: + assert_password_policy(pwd) + assert exc.value.message == msg + + def test_rejects_overlong(self) -> None: + with pytest.raises(WeakPassword): + assert_password_policy("Ab1!" + "a" * 600) + + +class TestRolePolicy: + def test_env_overrides_defaults(self) -> None: + assert policy_admin_emails("A@ump.edu.vn, b@umc.edu.vn ") == frozenset( + {"a@ump.edu.vn", "b@umc.edu.vn"} + ) + + def test_unset_uses_builtin_allowlist(self) -> None: + assert policy_admin_emails(None) == DEFAULT_POLICY_ADMIN_EMAILS + assert policy_admin_emails(" ") == DEFAULT_POLICY_ADMIN_EMAILS + + @pytest.mark.parametrize( + "email, has_row, from_policy, expected", + [ + ("a@ump.edu.vn", False, False, AdminReconcileAction.add_admin), + ("a@ump.edu.vn", True, True, AdminReconcileAction.mark_policy), + ("b@ump.edu.vn", True, True, AdminReconcileAction.remove_admin), + ("b@ump.edu.vn", True, False, AdminReconcileAction.none), # manual admin preserved + ("b@ump.edu.vn", False, False, AdminReconcileAction.none), + ], + ) + def test_reconcile_decision(self, email, has_row, from_policy, expected) -> None: + policy = frozenset({"a@ump.edu.vn"}) + assert reconcile_admin_action(email, policy, has_row, from_policy) == expected + + +class TestTokenClaims: + def test_claim_shape(self) -> None: + uid = uuid.uuid4() + now = 
datetime(2026, 6, 13, 12, 0, tzinfo=timezone.utc) + claims = build_access_token_claims(uid, "a@ump.edu.vn", ["admin", "viewer"], 3, now, 12) + assert claims["sub"] == str(uid) + assert claims["email"] == "a@ump.edu.vn" + assert claims["roles"] == ["admin", "viewer"] + assert claims["cv"] == 3 + assert claims["exp"] - claims["iat"] == 12 * 3600 + + +class TestUserAggregate: + def _user(self, **kw) -> User: + base = dict( + id=uuid.uuid4(), + email="a@ump.edu.vn", + full_name="Test", + password_hash="x", + email_verified=True, + is_active=True, + credential_version=0, + ) + base.update(kw) + return User(**base) + + def test_can_authenticate_requires_active(self) -> None: + assert self._user(is_active=True).can_authenticate() + assert not self._user(is_active=False).can_authenticate() + + def test_bump_credential_version(self) -> None: + u = self._user(credential_version=2) + u.bump_credential_version() + assert u.credential_version == 3 + + def test_identity_equality(self) -> None: + uid = uuid.uuid4() + assert self._user(id=uid, full_name="A") == self._user(id=uid, full_name="B") + + def test_role_enum_values(self) -> None: + assert {r.value for r in Role} == {"admin", "editor", "viewer"} diff --git a/be0/tests/test_imagehub_datasets.py b/be0/tests/test_imagehub_datasets.py new file mode 100644 index 0000000..efa9627 --- /dev/null +++ b/be0/tests/test_imagehub_datasets.py @@ -0,0 +1,407 @@ +"""Tests for the ImageHub dataset routes (milestone 1 walking skeleton). + +Pure-helper unit tests always run. The full integration test (create dataset → upload with +content-addressed dedup → version snapshot → owner/admin authz → audit) runs only when BOTH: + - INITIATIVE_DATABASE_URL points at PostgreSQL (asyncpg), and + - S3_ENDPOINT_URL is set (a reachable MinIO; the dev stack maps it to http://localhost:19000). 
+ + export INITIATIVE_DATABASE_URL="postgresql+asyncpg://initiative:initiative_secret@127.0.0.1:15432/initiatives" + export S3_ENDPOINT_URL="http://localhost:19000" S3_ACCESS_KEY=minio_user S3_SECRET_KEY=minio_password \\ + S3_BUCKET_ATTACHMENTS=initiative-attachments S3_BUCKET_EXPORTS=initiative-exports \\ + S3_BUCKET_QUARANTINE=initiative-quarantine S3_PUBLIC_ENDPOINT_URL=http://localhost:19000 + cd be0 && python -m unittest tests.test_imagehub_datasets -v + +Prereq for the integration test: migration 017_imagehub_datasets.sql applied (compose init mount +or scripts/apply_initiative_migrations.py). +""" +from __future__ import annotations + +import io +import os +import unittest +import uuid + +_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql") +_RUN_S3 = bool(os.getenv("S3_ENDPOINT_URL", "").strip()) + +# Let the module (which imports src.minio.storage → S3Settings()) import even when not running +# against a real MinIO, so the pure-unit tests below can always run. These defaults match the +# dev stack's host-mapped MinIO; the integration test only fires when S3_ENDPOINT_URL was set. 
+os.environ.setdefault("S3_ENDPOINT_URL", "http://localhost:19000") +os.environ.setdefault("S3_ACCESS_KEY", "minio_user") +os.environ.setdefault("S3_SECRET_KEY", "minio_password") +os.environ.setdefault("S3_BUCKET_ATTACHMENTS", "initiative-attachments") +os.environ.setdefault("S3_BUCKET_EXPORTS", "initiative-exports") +os.environ.setdefault("S3_BUCKET_QUARANTINE", "initiative-quarantine") +os.environ.setdefault("S3_PUBLIC_ENDPOINT_URL", "http://localhost:19000") + + +class PureHelperTests(unittest.TestCase): + """No DB / no network — string + sniff helpers.""" + + def test_build_blob_key_is_content_addressed(self) -> None: + from src.minio.storage import S3Storage + + key = S3Storage.build_blob_key("AbCdEf0123456789") + self.assertEqual(key, "blobs/ab/cd/abcdef0123456789") + + def test_slugify_strips_diacritics_and_punct(self) -> None: + from src.imagehub_routes import _slugify + + self.assertEqual(_slugify("Bộ dữ liệu CT Ngực!! 2026"), "bo-du-lieu-ct-nguc-2026") + self.assertEqual(_slugify(""), "dataset") + + def test_safe_logical_path_basename_only(self) -> None: + from src.imagehub_routes import _safe_logical_path + + self.assertEqual(_safe_logical_path("/evil/../a b.dcm"), "a_b.dcm") + self.assertEqual(_safe_logical_path("C:\\scans\\series1.nii.gz"), "series1.nii.gz") + self.assertEqual(_safe_logical_path(""), "file") + + def test_safe_folder_path_preserves_dirs_rejects_traversal(self) -> None: + from src.imagehub_routes import _safe_folder_path + + # the directory is kept (basename dropped) so an uploaded tree round-trips + self.assertEqual(_safe_folder_path("imagesTr/ct_001.nii.gz"), "imagesTr") + self.assertEqual(_safe_folder_path("a/b/c/scan.nii.gz"), "a/b/c") + # no directory component → dataset root + self.assertEqual(_safe_folder_path("readme.txt"), "") + self.assertEqual(_safe_folder_path(""), "") + # leading slash + ".." 
traversal segments are stripped + self.assertEqual(_safe_folder_path("/evil/../x/y.dcm"), "evil/x") + # backslashes normalise to forward slashes + self.assertEqual(_safe_folder_path("labelsTr\\sub\\m.nii.gz"), "labelsTr/sub") + + def test_coerce_tags(self) -> None: + from src.imagehub_routes import _coerce_tags + + self.assertEqual(_coerce_tags(["CT", " MRI ", "", 7]), ["CT", "MRI", "7"]) + self.assertEqual(_coerce_tags("nope"), []) + + def test_coerce_label_map(self) -> None: + from src.imagehub_routes import _coerce_label_map + + # valid entries kept + trimmed; non-positive / non-int keys and empty/non-str values dropped + self.assertEqual( + _coerce_label_map( + {"1": " kidney ", "2": "tumor", "0": "bad", "-3": "bad", "x": "bad", "4": "", "+5": "bad", "1_0": "bad"} + ), + {"1": "kidney", "2": "tumor"}, + ) + # integer keys coerce to strings; non-dict input → {} + self.assertEqual(_coerce_label_map({1: "kidney"}), {"1": "kidney"}) + self.assertEqual(_coerce_label_map("nope"), {}) + self.assertEqual(_coerce_label_map(None), {}) + + def test_sniff_never_raises_on_non_imaging(self) -> None: + from src.imagehub_routes import _sniff_imaging_meta + + # plain bytes → {}; a .dcm name with junk must degrade to {} (never raise) + self.assertEqual(_sniff_imaging_meta("notes.txt", b"hello world", "text/plain"), {}) + self.assertIsInstance(_sniff_imaging_meta("x.dcm", b"DICM" + b"\x00" * 200, "application/dicom"), dict) + + +def _bearer(uid: uuid.UUID, roles: list[str]) -> str: + import jwt + + from src.auth_jwt import jwt_secret + + return "Bearer " + jwt.encode({"sub": str(uid), "roles": roles, "cv": 0}, jwt_secret(), algorithm="HS256") + + +def _upload(name: str, data: bytes, ctype: str = "application/octet-stream"): + from starlette.datastructures import Headers, UploadFile + + return UploadFile(io.BytesIO(data), filename=name, headers=Headers({"content-type": ctype})) + + +@unittest.skipUnless( + _RUN_DB and _RUN_S3, + "Set INITIATIVE_DATABASE_URL=postgresql+asyncpg://… 
and S3_ENDPOINT_URL=… to run the integration test", +) +class ImagehubDatasetDbTests(unittest.IsolatedAsyncioTestCase): + """End-to-end: create → upload (content-addressed dedup) → version → owner/admin authz → audit.""" + + async def asyncSetUp(self) -> None: + from src.initiative_db import engine as eng + from src.minio.storage import storage + + await eng.dispose_engine() + await eng.init_engine() + try: + await storage.ensure_buckets_exist() + except Exception as exc: # MinIO not reachable → skip rather than error + self.skipTest(f"MinIO not reachable: {exc}") + self._user_ids: list[uuid.UUID] = [] + self._dataset_ids: list[uuid.UUID] = [] + + async def asyncTearDown(self) -> None: + from sqlalchemy import delete + + from src.initiative_db import engine as eng + from src.initiative_db.engine import get_session + from src.initiative_db.models import ImagehubDataset, User + + async with get_session() as session: + for did in self._dataset_ids: + await session.execute(delete(ImagehubDataset).where(ImagehubDataset.id == did)) + for uid in self._user_ids: + await session.execute(delete(User).where(User.id == uid)) + await session.commit() + await eng.dispose_engine() + + async def _seed_user(self, *, admin: bool = False) -> uuid.UUID: + from src.initiative_db.engine import get_session + from src.initiative_db.models import User + + uid = uuid.uuid4() + async with get_session() as session: + session.add( + User( + id=uid, + email=f"ih-{uid.hex[:10]}@ump.edu.vn", + password_hash="x", + full_name=("Quản trị" if admin else "Nhà nghiên cứu") + " Test", + ) + ) + await session.commit() + self._user_ids.append(uid) + return uid + + async def test_dataset_research_project_link(self) -> None: + """A dataset can be created linked to a research project ("workspace"); the list can be + filtered to that project; bad/foreign project ids are rejected (migration 024).""" + from fastapi import HTTPException + + from src.imagehub_routes import DatasetCreateIn, create_dataset, 
list_datasets + from src.initiative_db.engine import get_session + from src.initiative_db.models import ResearchProject + + owner = await self._seed_user() + owner_tok = _bearer(owner, ["viewer"]) + + # seed a research project ("workspace") owned by the user (cascade-cleaned with the user) + proj_id = uuid.uuid4() + async with get_session() as session: + session.add(ResearchProject(id=proj_id, owner_user_id=owner, title="Đề tài thử nghiệm")) + await session.commit() + + # create a dataset linked to the project → the link is persisted + ds = await create_dataset( + DatasetCreateIn(name="Bộ dữ liệu thuộc đề tài", researchProjectId=str(proj_id)), + owner_tok, + ) + self._dataset_ids.append(uuid.UUID(ds.id)) + self.assertEqual(ds.researchProjectId, str(proj_id)) + + # a standalone dataset (no project) is still allowed and stays unlinked + ds2 = await create_dataset(DatasetCreateIn(name="Bộ dữ liệu độc lập"), owner_tok) + self._dataset_ids.append(uuid.UUID(ds2.id)) + self.assertIsNone(ds2.researchProjectId) + + # a non-existent project id is rejected (422) + with self.assertRaises(HTTPException) as ctx: + await create_dataset( + DatasetCreateIn(name="x", researchProjectId=str(uuid.uuid4())), owner_tok + ) + self.assertEqual(ctx.exception.status_code, 422) + + # ?projectId= filters the list to that project only (3rd positional arg = projectId) + in_proj = await list_datasets("mine", owner_tok, str(proj_id)) + ids_in_proj = [d.id for d in in_proj] + self.assertIn(ds.id, ids_in_proj) + self.assertNotIn(ds2.id, ids_in_proj) + self.assertTrue(all(d.researchProjectId == str(proj_id) for d in in_proj)) + + async def test_update_label_map_sanitizes_and_persists(self) -> None: + """update_dataset accepts a per-value label map, sanitizes it, and round-trips it (migration 027).""" + from src.imagehub_routes import ( + DatasetCreateIn, + DatasetUpdateIn, + create_dataset, + get_dataset, + update_dataset, + ) + + owner = await self._seed_user() + owner_tok = _bearer(owner, 
["viewer"]) + + ds = await create_dataset(DatasetCreateIn(name="KiTS labels"), owner_tok) + self._dataset_ids.append(uuid.UUID(ds.id)) + self.assertEqual(ds.labelMap, {}) # empty by default + + # garbage keys/values are dropped; valid ones trimmed + kept + updated = await update_dataset( + ds.id, + DatasetUpdateIn(labelMap={"1": "kidney", "2": "tumor", "3": "cyst", "0": "bad", "x": "bad"}), + owner_tok, + ) + self.assertEqual(updated.labelMap, {"1": "kidney", "2": "tumor", "3": "cyst"}) + + # persisted: a fresh read returns the same map + fresh = await get_dataset(ds.id, owner_tok) + self.assertEqual(fresh.labelMap, {"1": "kidney", "2": "tumor", "3": "cyst"}) + + async def test_review_persists_decision_and_stats(self) -> None: + """review_task writes a structured review event; review-stats tallies it per reviewer (025).""" + from sqlalchemy import select + + from src.imagehub_routes import ReviewIn, review_stats, review_task + from src.initiative_db.engine import get_session + from src.initiative_db.models import ( + ImagehubBlob, + ImagehubDataset, + ImagehubDatasetFile, + ImagehubDatasetStage, + ImagehubTask, + ImagehubTaskReviewEvent, + ) + + owner = await self._seed_user() + owner_tok = _bearer(owner, ["viewer"]) + + # build the minimal chain (no upload): dataset + a Review stage + a file + a task already + # advanced to that Review stage, assigned to the owner. 
+ ds_id, stage_id, file_id, task_id = (uuid.uuid4() for _ in range(4)) + sha = uuid.uuid4().hex + async with get_session() as session: + session.add(ImagehubDataset(id=ds_id, owner_user_id=owner, name="Review demo")) + session.add( + ImagehubDatasetStage(id=stage_id, dataset_id=ds_id, name="Rà soát 1", kind="review", seq=1) + ) + session.add(ImagehubBlob(sha256=sha, size_bytes=1)) + session.add( + ImagehubDatasetFile(id=file_id, dataset_id=ds_id, logical_path="ct.nii.gz", blob_sha256=sha) + ) + session.add( + ImagehubTask( + id=task_id, dataset_id=ds_id, dataset_file_id=file_id, name="ct.nii.gz", + current_stage_id=stage_id, pipeline_state="inReview", queue_status="assigned", + assignee_user_id=owner, + ) + ) + await session.commit() + self._dataset_ids.append(ds_id) # cascade-cleans stage/file/task/events in teardown + + # accept the review → a structured event is persisted (decision + reviewer + stage + note) + await review_task(str(ds_id), str(task_id), ReviewIn(decision="accept", note="Đạt"), owner_tok) + async with get_session() as session: + evs = ( + await session.execute( + select(ImagehubTaskReviewEvent).where(ImagehubTaskReviewEvent.task_id == task_id) + ) + ).scalars().all() + self.assertEqual(len(evs), 1) + self.assertEqual(evs[0].decision, "accept") + self.assertEqual(evs[0].reviewer_user_id, owner) + self.assertEqual(evs[0].stage_id, stage_id) + self.assertEqual(evs[0].note, "Đạt") + + # the stats endpoint tallies it for the reviewer (authorization is the LAST positional arg) + stats = await review_stats(str(ds_id), str(owner), 30, owner_tok) + self.assertEqual(stats.accepted, 1) + self.assertEqual(stats.rejected, 0) + # a foreign reviewer has no tally + empty = await review_stats(str(ds_id), str(uuid.uuid4()), 30, owner_tok) + self.assertEqual(empty.accepted, 0) + + async def test_create_upload_dedup_version_authz_audit(self) -> None: + from fastapi import HTTPException + from sqlalchemy import func, select + + from src.imagehub_routes import ( + 
DatasetCreateIn, + VersionCreateIn, + create_dataset, + create_version, + get_dataset, + list_audit, + list_datasets, + list_files, + list_versions, + upload_files, + ) + from src.initiative_db.engine import get_session + from src.initiative_db.models import ImagehubBlob, ImagehubDatasetFile + + owner = await self._seed_user() + admin = await self._seed_user(admin=True) + other = await self._seed_user() + owner_tok = _bearer(owner, ["viewer"]) + admin_tok = _bearer(admin, ["admin"]) + other_tok = _bearer(other, ["viewer"]) + + # create + ds = await create_dataset( + DatasetCreateIn(name="CT Ngực thử nghiệm", description="demo", modalityTags=["CT"]), + owner_tok, + ) + self._dataset_ids.append(uuid.UUID(ds.id)) + self.assertEqual(ds.name, "CT Ngực thử nghiệm") + self.assertEqual(ds.modalityTags, ["CT"]) + self.assertEqual(ds.fileCount, 0) + + # upload the SAME content under two names → content-addressed dedup + blob_bytes = uuid.uuid4().bytes * 64 # unique per run + res = await upload_files( + ds.id, [_upload("scan_a.bin", blob_bytes), _upload("scan_b.bin", blob_bytes)], owner_tok + ) + self.assertTrue(res["ok"]) + shas = {f["sha256"] for f in res["files"]} + self.assertEqual(len(shas), 1, "same content must hash to one sha256") + deduped_flags = sorted(f["deduped"] for f in res["files"]) + self.assertEqual(deduped_flags, [False, True], "first stores the blob, second dedups") + + # DB: exactly one blob row for that sha256, two file rows for the dataset + sha = next(iter(shas)) + async with get_session() as session: + blob_count = ( + await session.execute( + select(func.count()).select_from(ImagehubBlob).where(ImagehubBlob.sha256 == sha) + ) + ).scalar_one() + file_count = ( + await session.execute( + select(func.count()) + .select_from(ImagehubDatasetFile) + .where(ImagehubDatasetFile.dataset_id == uuid.UUID(ds.id)) + ) + ).scalar_one() + self.assertEqual(blob_count, 1) + self.assertEqual(file_count, 2) + + # browse files (each carries a presigned download URL) + 
files = await list_files(ds.id, owner_tok)
+        self.assertEqual(len(files), 2)
+        self.assertTrue(all(f.downloadUrl for f in files))
+
+        # authz: a non-admin other user can't see or read it
+        owner_list = await list_datasets("mine", owner_tok)
+        self.assertIn(ds.id, [d.id for d in owner_list])
+        other_list = await list_datasets("all", other_tok)  # non-admin: scope=all ignored
+        self.assertNotIn(ds.id, [d.id for d in other_list])
+        with self.assertRaises(HTTPException) as ctx:
+            await get_dataset(ds.id, other_tok)
+        self.assertEqual(ctx.exception.status_code, 404)
+
+        # admin sees every dataset (the clinical data repository)
+        admin_list = await list_datasets("all", admin_tok)
+        self.assertIn(ds.id, [d.id for d in admin_list])
+
+        # version snapshot freezes the 2-file manifest
+        ver = await create_version(ds.id, VersionCreateIn(message="phiên bản đầu"), owner_tok)
+        self.assertEqual(ver.seq, 1)
+        self.assertEqual(ver.fileCount, 2)
+        versions = await list_versions(ds.id, owner_tok)
+        self.assertEqual(len(versions), 1)
+
+        # audit trail recorded each mutation
+        audit = await list_audit(ds.id, owner_tok)
+        actions = [a.action for a in audit]
+        self.assertIn("Tạo bộ dữ liệu", actions)
+        self.assertIn("Tải tệp lên", actions)
+        self.assertIn("Tạo phiên bản", actions)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/be0/tests/test_imagehub_segmentation.py b/be0/tests/test_imagehub_segmentation.py
new file mode 100644
index 0000000..1134817
--- /dev/null
+++ b/be0/tests/test_imagehub_segmentation.py
@@ -0,0 +1,157 @@
+"""Unit tests for the ImageHub segmentation-linking domain service.
+
+The service was built with INJECTED infrastructure (put_blob / sniff_meta / safe_name)
+precisely so the domain rules can be exercised with fakes — no Postgres, no MinIO.
+Covers: parent validation (bad uuid / not-found / not-an-image), the mask path
+namespacing, organ-label fallback, and the empty-payload guard.
+""" +from __future__ import annotations + +import unittest +import uuid + +from src.imagehub_segmentation import MaskUpload, SegmentationError, SegmentationService +from src.initiative_db.models import ImagehubDataset, ImagehubDatasetFile + + +class _FakeResult: + def __init__(self, value): + self._value = value + + def scalar_one_or_none(self): + return self._value + + +class _FakeSession: + """Minimal AsyncSession stand-in. The 1st execute() resolves the parent file; + every later execute() resolves the 'existing mask at this path' lookup (None = + create new). Records added rows.""" + + def __init__(self, parent=None, existing=None): + self._parent = parent + self._existing = existing + self.added: list = [] + self.exec_calls = 0 + self.flushes = 0 + + async def execute(self, _stmt): + self.exec_calls += 1 + return _FakeResult(self._parent if self.exec_calls == 1 else self._existing) + + async def get(self, _model, _key): + return None # blob absent → service will add it + + def add(self, obj): + self.added.append(obj) + + async def flush(self): + self.flushes += 1 + + +def _safe_name(name): + base = (name or "").strip().replace("\\", "/").rsplit("/", 1)[-1] + return base or "file" + + +async def _put_blob(data, media_type): + return { + "sha256": "deadbeef" + str(len(data)), + "size": len(data), + "bucket": "imagehub-blobs", + "key": "blobs/de/ad/deadbeef", + "media_type": media_type or "application/octet-stream", + "deduped": False, + } + + +def _sniff(_filename, _data, _media): + return {"format": "nifti", "shape": [4, 4, 4]} + + +def _service(session): + return SegmentationService(session, put_blob=_put_blob, sniff_meta=_sniff, safe_name=_safe_name) + + +def _image_parent(dataset_id, logical_path="ct.nii.gz"): + return ImagehubDatasetFile( + id=uuid.uuid4(), dataset_id=dataset_id, logical_path=logical_path, file_kind="image" + ) + + +def _mask(filename="liver.nii.gz", organ="Gan", data=b"xyz"): + return MaskUpload(filename=filename, data=data, 
media_type="application/gzip", organ_label=organ) + + +class TestMaskLogicalPath(unittest.TestCase): + def test_namespaces_under_parent_stem(self): + svc = _service(_FakeSession()) + self.assertEqual(svc._mask_logical_path("ct.nii.gz", "liver.nii.gz"), "ct.seg/liver.nii.gz") + self.assertEqual(svc._mask_logical_path("scan.nii", "a.nii.gz"), "scan.seg/a.nii.gz") + self.assertEqual(svc._mask_logical_path("study.dcm", "k.nii.gz"), "study.seg/k.nii.gz") + + def test_mask_path_cannot_collide_with_a_real_image_path(self): + # A real file's logical_path never contains '/', a mask path always does. + svc = _service(_FakeSession()) + self.assertIn("/", svc._mask_logical_path("ct.nii.gz", "ct.nii.gz")) + + +class TestLinkMasks(unittest.IsolatedAsyncioTestCase): + def setUp(self): + self.ds = ImagehubDataset(id=uuid.uuid4(), owner_user_id=uuid.uuid4()) + self.uid = uuid.uuid4() + + async def test_happy_path_links_mask_to_image(self): + parent = _image_parent(self.ds.id) + sess = _FakeSession(parent=parent, existing=None) + rows = await _service(sess).link_masks(self.ds, str(parent.id), [_mask()], self.uid) + self.assertEqual(len(rows), 1) + r = rows[0] + self.assertEqual(r.file_kind, "segmentation") + self.assertEqual(r.parent_file_id, parent.id) + self.assertEqual(r.organ_label, "Gan") + self.assertEqual(r.logical_path, "ct.seg/liver.nii.gz") + self.assertEqual(r.dataset_id, self.ds.id) + + async def test_organ_label_falls_back_to_filename(self): + parent = _image_parent(self.ds.id) + sess = _FakeSession(parent=parent) + rows = await _service(sess).link_masks(self.ds, str(parent.id), [_mask(organ=" ")], self.uid) + self.assertEqual(rows[0].organ_label, "liver.nii.gz") + + async def test_bad_parent_uuid_is_404(self): + with self.assertRaises(SegmentationError) as ctx: + await _service(_FakeSession()).link_masks(self.ds, "not-a-uuid", [_mask()], self.uid) + self.assertEqual(ctx.exception.status, 404) + + async def test_parent_not_found_is_404(self): + sess = 
_FakeSession(parent=None)
+        with self.assertRaises(SegmentationError) as ctx:
+            await _service(sess).link_masks(self.ds, str(uuid.uuid4()), [_mask()], self.uid)
+        self.assertEqual(ctx.exception.status, 404)
+
+    async def test_attaching_to_a_mask_is_422(self):
+        mask_parent = ImagehubDatasetFile(
+            id=uuid.uuid4(), dataset_id=self.ds.id, logical_path="ct.seg/x.nii.gz", file_kind="segmentation"
+        )
+        sess = _FakeSession(parent=mask_parent)
+        with self.assertRaises(SegmentationError) as ctx:
+            await _service(sess).link_masks(self.ds, str(mask_parent.id), [_mask()], self.uid)
+        self.assertEqual(ctx.exception.status, 422)
+
+    async def test_empty_payload_is_422(self):
+        parent = _image_parent(self.ds.id)
+        sess = _FakeSession(parent=parent)
+        with self.assertRaises(SegmentationError) as ctx:
+            await _service(sess).link_masks(self.ds, str(parent.id), [], self.uid)
+        self.assertEqual(ctx.exception.status, 422)
+
+    async def test_all_empty_byte_masks_is_422(self):
+        parent = _image_parent(self.ds.id)
+        sess = _FakeSession(parent=parent)
+        with self.assertRaises(SegmentationError) as ctx:
+            await _service(sess).link_masks(self.ds, str(parent.id), [_mask(data=b"")], self.uid)
+        self.assertEqual(ctx.exception.status, 422)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/be0/tests/test_imagehub_tasks.py b/be0/tests/test_imagehub_tasks.py
new file mode 100644
index 0000000..8d731ee
--- /dev/null
+++ b/be0/tests/test_imagehub_tasks.py
@@ -0,0 +1,157 @@
+"""Unit tests for the ImageHub task-pipeline domain (project-workflow §3/§4 transitions).
+
+The pipeline service is pure (plain functions over ``StageInfo`` lists — no Postgres, no FastAPI),
+so the whole state machine is exercised here with plain data. The thin HTTP wrappers in
+``imagehub_routes`` are verified live; this file owns the transition rules.
+ +Covers: stage ordering, the new-task start state, TP1 finalize (advance / -> Ground Truth), +TP2 accept + accept-with-corrections, TP3 reject (-> first stage), the guards (wrong-state finalize/ +review, bad decision), and RS1 reference-standard (Ground-Truth-only). +""" +from __future__ import annotations + +import unittest + +from src.imagehub_task_pipeline import ( + StageInfo, + TaskPipelineError, + compute_finalize, + compute_review, + first_stage, + initial_transition, + order_stages, + stage_after, + state_for_stage, + validate_set_reference, +) + +# A canonical pipeline: Label(0) -> Review_1(1) -> Review_2(2). Deliberately unsorted in the list +# so order_stages / first_stage / stage_after are actually exercised, not given pre-sorted input. +LABEL = StageInfo(id="s-label", kind="label", seq=0) +REV1 = StageInfo(id="s-rev1", kind="review", seq=1) +REV2 = StageInfo(id="s-rev2", kind="review", seq=2) +PIPELINE = [REV2, LABEL, REV1] # out of order on purpose + + +class StageOrderingTests(unittest.TestCase): + def test_order_stages_sorts_by_seq(self): + self.assertEqual([s.id for s in order_stages(PIPELINE)], ["s-label", "s-rev1", "s-rev2"]) + + def test_first_stage_is_lowest_seq(self): + self.assertEqual(first_stage(PIPELINE).id, "s-label") + + def test_first_stage_empty_raises_409(self): + with self.assertRaises(TaskPipelineError) as ctx: + first_stage([]) + self.assertEqual(ctx.exception.status, 409) + + def test_stage_after_returns_next(self): + self.assertEqual(stage_after(PIPELINE, "s-label").id, "s-rev1") + self.assertEqual(stage_after(PIPELINE, "s-rev1").id, "s-rev2") + + def test_stage_after_last_is_none(self): + self.assertIsNone(stage_after(PIPELINE, "s-rev2")) + + def test_stage_after_unknown_or_none_is_none(self): + self.assertIsNone(stage_after(PIPELINE, "nope")) + self.assertIsNone(stage_after(PIPELINE, None)) + + def test_state_for_stage(self): + self.assertEqual(state_for_stage(LABEL), "inLabel") + self.assertEqual(state_for_stage(REV1), 
"inReview") + + +class InitialTransitionTests(unittest.TestCase): + def test_new_task_starts_inlabel_at_first_stage(self): + t = initial_transition(PIPELINE) + self.assertEqual(t.pipeline_state, "inLabel") + self.assertEqual(t.current_stage_id, "s-label") + self.assertEqual(t.queue_status, "assigned") + + def test_new_task_starts_inreview_when_first_stage_is_review(self): + t = initial_transition([REV1]) + self.assertEqual(t.pipeline_state, "inReview") + self.assertEqual(t.current_stage_id, "s-rev1") + + +class FinalizeTests(unittest.TestCase): + def test_finalize_label_advances_to_first_review(self): + t = compute_finalize("inLabel", "s-label", PIPELINE) + self.assertEqual(t.pipeline_state, "inReview") + self.assertEqual(t.current_stage_id, "s-rev1") + self.assertEqual(t.queue_status, "assigned") + + def test_finalize_advances_label_to_next_label(self): + # PreLabel(0,label) -> Label(1,label): finalizing the first stays inLabel at the next. + prelabel = StageInfo(id="s-pre", kind="label", seq=0) + label = StageInfo(id="s-lab", kind="label", seq=1) + t = compute_finalize("inLabel", "s-pre", [prelabel, label]) + self.assertEqual(t.pipeline_state, "inLabel") + self.assertEqual(t.current_stage_id, "s-lab") + + def test_finalize_single_label_goes_to_ground_truth(self): + t = compute_finalize("inLabel", "s-label", [LABEL]) + self.assertEqual(t.pipeline_state, "groundTruth") + self.assertIsNone(t.current_stage_id) + + def test_finalize_when_not_inlabel_raises_409(self): + with self.assertRaises(TaskPipelineError) as ctx: + compute_finalize("inReview", "s-rev1", PIPELINE) + self.assertEqual(ctx.exception.status, 409) + + def test_finalize_when_groundtruth_raises_409(self): + with self.assertRaises(TaskPipelineError): + compute_finalize("groundTruth", None, PIPELINE) + + +class ReviewTests(unittest.TestCase): + def test_accept_advances_to_next_review(self): + t = compute_review("inReview", "s-rev1", PIPELINE, "accept") + self.assertEqual(t.pipeline_state, "inReview") + 
self.assertEqual(t.current_stage_id, "s-rev2") + + def test_accept_on_last_review_goes_to_ground_truth(self): + t = compute_review("inReview", "s-rev2", PIPELINE, "accept") + self.assertEqual(t.pipeline_state, "groundTruth") + self.assertIsNone(t.current_stage_id) + + def test_accept_with_corrections_advances(self): + t = compute_review("inReview", "s-rev1", PIPELINE, "acceptWithCorrections") + self.assertEqual(t.pipeline_state, "inReview") + self.assertEqual(t.current_stage_id, "s-rev2") + + def test_reject_returns_to_first_stage(self): + t = compute_review("inReview", "s-rev2", PIPELINE, "reject") + self.assertEqual(t.pipeline_state, "inLabel") + self.assertEqual(t.current_stage_id, "s-label") + self.assertEqual(t.queue_status, "assigned") + + def test_review_when_not_inreview_raises_409(self): + with self.assertRaises(TaskPipelineError) as ctx: + compute_review("inLabel", "s-label", PIPELINE, "accept") + self.assertEqual(ctx.exception.status, 409) + + def test_review_invalid_decision_raises_422(self): + with self.assertRaises(TaskPipelineError) as ctx: + compute_review("inReview", "s-rev1", PIPELINE, "maybe") + self.assertEqual(ctx.exception.status, 422) + + +class ReferenceStandardTests(unittest.TestCase): + def test_set_reference_ok_on_ground_truth(self): + # Should not raise. + validate_set_reference("groundTruth", True) + + def test_set_reference_blocked_off_ground_truth(self): + with self.assertRaises(TaskPipelineError) as ctx: + validate_set_reference("inLabel", True) + self.assertEqual(ctx.exception.status, 409) + + def test_unset_reference_allowed_in_any_state(self): + # Removing the flag is always fine, even mid-pipeline. 
+        validate_set_reference("inLabel", False)
+        validate_set_reference("inReview", False)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/be0/tests/test_official_to_data_blank_ban_cam_ket.py b/be0/tests/test_official_to_data_blank_ban_cam_ket.py
new file mode 100644
index 0000000..07b9879
--- /dev/null
+++ b/be0/tests/test_official_to_data_blank_ban_cam_ket.py
@@ -0,0 +1,71 @@
+"""`BẢN CAM KẾT` → `ban_cam_ket` matches `bieu_mau_sang_kien_template.json` keys (fe0)."""
+
+import copy
+import json
+from pathlib import Path
+
+import pytest
+
+from src.be01.official_to_data_blank import official_to_data_blank
+
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+_TEMPLATE_PATH = _REPO_ROOT / "fe0" / "public" / "assets" / "bieu_mau_sang_kien_template.json"
+
+
+def _minimal_official_with_bck() -> dict:
+    if not _TEMPLATE_PATH.is_file():
+        pytest.skip("fe0 template JSON not found at %s" % _TEMPLATE_PATH)
+    raw = json.loads(_TEMPLATE_PATH.read_text(encoding="utf-8"))
+    official = {"BẢN CAM KẾT": copy.deepcopy(raw["BẢN CAM KẾT"])}
+    bck = official["BẢN CAM KẾT"]
+    bck["Ngày ký"] = {"Ngày": "15", "Tháng": "4", "Năm": "2026"}
+    i1 = bck["I. THÔNG TIN CHỦ THỂ CAM KẾT"]
+    i1["Tác giả đăng ký sáng kiến"] = "Nguyễn Văn A"
+    i1["CCCD/Hộ chiếu số"] = "079012345678"
+    i1["Đơn vị"] = "Khoa X"
+    i1["Tên Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH"] = "Bài báo thử nghiệm"
+    i1["Năm xét công nhận sáng kiến"] = "2026"
+    vt = i1["Vai trò đối với bài báo (☑ vào ô tương ứng)"]
+    vt["Tác giả chính Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH"] = True
+    vt["Đồng tác giả Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH"] = False
+    ii = bck["II. CAM KẾT NỘI DUNG (☑ vào ô tương ứng)"]
+    q = ii["1.
Quyền sở hữu đối với bài báo trong nước/quốc tế"] + k1 = ( + "Tôi là chủ sở hữu hợp pháp của bài báo hoặc được chủ sở hữu/đồng chủ sở hữu đồng ý cho sử dụng bài báo có tên nêu trên làm sản phẩm đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD" + ) + k2 = ( + "Trường hợp bài báo là sản phẩm của nhiệm vụ NCKH: chủ sở hữu bài báo (cơ quan) đồng ý cho tác giả/nhóm tác giả sử dụng bài báo có tên nêu trên để đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD" + ) + q[k1] = True + q[k2] = False + kd = "Tất cả đồng tác giả đã biết, đồng ý và ký xác nhận cho phép Tác giả đăng ký sáng kiến được sử dụng bài báo có tên nêu trên để đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD" + ii["2. Đồng thuận của đồng tác giả bài báo trong nước/quốc tế"][kd] = True + ku = ( + "Cá nhân đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD đối với bài báo trong nước/quốc tế cam kết bài báo không thuộc 'Tạp chí săn mồi'. Tôi xin chịu trách nhiệm kiểm tra, đối chiếu và cung cấp bằng chứng khi được yêu cầu" + ) + ii["3. Cam kết bài báo trong nước/quốc tế uy tín"][ku] = True + kt = ( + "Tôi cam kết rằng việc sử dụng bài báo đăng ký xét công nhận sáng kiến tại ĐHYD sẽ không gây tranh chấp về: quyền tác giả/quyền liên quan, quyền sở hữu công nghiệp, tiết lộ bí mật kinh doanh, vi phạm bảo mật dữ liệu của bất kỳ bên thứ ba nào. Tôi chịu trách nhiệm trước pháp luật về tính trung thực, hợp pháp của hồ sơ" + ) + ii["4. 
Tuân thủ pháp luật sở hữu trí tuệ"][kt] = True
+    bck["Người cam kết (Ký tên, ghi rõ họ tên)"] = "Nguyễn Văn A"
+    return official
+
+
+def test_ban_cam_ket_from_numbered_template_keys():
+    out = official_to_data_blank(_minimal_official_with_bck())
+    b = out["ban_cam_ket"]
+    assert b["tac_gia_dang_ky"] == "Nguyễn Văn A"
+    assert b["cccd"] == "079012345678"
+    assert b["don_vi"] == "Khoa X"
+    assert b["ten_bai_bao"] == "Bài báo thử nghiệm"
+    assert b["nam_xet"] == "2026"
+    assert b["ngay_ky"] == {"ngay": "15", "thang": "4", "nam": "2026"}
+    assert b["vai_tro"]["tac_gia_chinh"] is True
+    assert b["vai_tro"]["dong_tac_gia"] is False
+    assert b["cam_ket"]["quyen_so_huu_1"] is True
+    assert b["cam_ket"]["quyen_so_huu_2"] is False
+    assert b["cam_ket"]["dong_thuan"] is True
+    assert b["cam_ket"]["bai_bao_uy_tin"] is True
+    assert b["cam_ket"]["tuan_thu_phap_luat"] is True
+    assert b["nguoi_cam_ket"] == "Nguyễn Văn A"
diff --git a/be0/tests/test_official_to_data_blank_don_vi.py b/be0/tests/test_official_to_data_blank_don_vi.py
new file mode 100644
index 0000000..ee4e313
--- /dev/null
+++ b/be0/tests/test_official_to_data_blank_don_vi.py
@@ -0,0 +1,72 @@
+"""Cover / Mẫu 02 don_vi resolution for legacy officialBieuMau JSON.
+
+Run: cd be0 && python -m unittest tests.test_official_to_data_blank_don_vi -v
+"""
+
+from __future__ import annotations
+
+import unittest
+
+from src.be01.official_to_data_blank import _resolve_don_vi_cong_tac, official_to_data_blank
+
+
+class OfficialToDataBlankDonViTests(unittest.TestCase):
+    def test_resolve_prefers_explicit_cover(self) -> None:
+        official = {
+            "TRANG BÌA": {"Đơn vị công tác": " Phòng A "},
+            "MẪU SỐ 02 - ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN": {
+                "Đơn vị": "",
+                "Danh sách tác giả": [
+                    {"STT": "1", "Họ và tên": "X", "Nơi công tác": "Phòng B"},
+                ],
+            },
+        }
+        self.assertEqual(_resolve_don_vi_cong_tac(official), "Phòng A")
+
+    def test_resolve_falls_back_to_first_author_workplace(self) -> None:
+        official = {
+            "TRANG BÌA": {"Đơn vị công tác": ""},
+            "MẪU SỐ 02 - ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN": {
+                "Đơn vị": "",
+                "Danh sách tác giả": [
+                    {
+                        "STT": "1",
+                        "Họ và tên": "Nguyễn Văn A",
+                        "Nơi công tác": "Trường Y",
+                    },
+                ],
+            },
+        }
+        self.assertEqual(_resolve_don_vi_cong_tac(official), "Trường Y")
+
+    def test_official_to_data_blank_sets_trang_bia_and_mau02(self) -> None:
+        official = {
+            "TRANG BÌA": {
+                "Tên sáng kiến (Tiếng Việt)": "SK1",
+                "Tác giả/nhóm tác giả sáng kiến": "A",
+                "Đơn vị công tác": "",
+                "Thông tin liên hệ (Điện thoại, Email)": "",
+                "Năm": "2026",
+            },
+            "MẪU SỐ 02 - ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN": {
+                "Đơn vị": "",
+                "Danh sách tác giả": [
+                    {
+                        "STT": "1",
+                        "Họ và tên": "A",
+                        "Ngày tháng năm sinh": "",
+                        "Nơi công tác": "Khoa X",
+                        "Chức danh": "",
+                        "Trình độ chuyên môn": "",
+                        "Tỷ lệ (%) đóng góp vào việc tạo ra sáng kiến": "100",
+                    },
+                ],
+            },
+        }
+        ctx = official_to_data_blank(official)
+        self.assertEqual(ctx["trang_bia"]["don_vi"], "Khoa X")
+        self.assertEqual(ctx["mau_02"]["don_vi"], "Khoa X")
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/be0/tests/test_registration_otp.py b/be0/tests/test_registration_otp.py
new file mode 100644
index 0000000..bbd05ec
--- /dev/null
+++ b/be0/tests/test_registration_otp.py
@@ -0,0 +1,135 @@
+"""
+Registration OTP API (PostgreSQL + mocked outbound mail).
+
+    export INITIATIVE_DATABASE_URL="postgresql+asyncpg://initiative:initiative_secret@127.0.0.1:15432/initiatives"
+    # Ensure migrations through 014_registration_otp.sql are applied.
+    cd be0 && python -m unittest tests.test_registration_otp -v
+
+Optional live login smoke (**credentials must never be committed**):
+
+    export TEST_LIVE_AUTH_EMAIL="nltanh@ump.edu.vn"
+    export TEST_LIVE_AUTH_PASSWORD=''
+    cd be0 && python -m unittest tests.test_registration_otp.LiveAuthLoginOptionalTests.test_login_with_env_credentials -v
+"""
+
+from __future__ import annotations
+
+import os
+import unittest
+import uuid
+from unittest.mock import patch
+
+from sqlalchemy import select
+
+from tests.auth_register_staff_fixture import register_staff_fields
+
+_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql")
+
+_TEST_PASSWORD = "Testpass1!"
+
+
+@unittest.skipUnless(
+    _RUN_DB,
+    "Set INITIATIVE_DATABASE_URL=postgresql+asyncpg://.../initiatives",
+)
+class RegistrationOtpApiTests(unittest.IsolatedAsyncioTestCase):
+    async def asyncSetUp(self) -> None:
+        from src.initiative_db import engine as eng
+
+        await eng.dispose_engine()
+        await eng.init_engine()
+
+    async def asyncTearDown(self) -> None:
+        from src.initiative_db import engine as eng
+
+        await eng.dispose_engine()
+
+    async def _delete_user(self, email: str) -> None:
+        from src.initiative_db.engine import get_session
+        from src.initiative_db.models import User
+
+        async with get_session() as session:
+            user = (
+                await session.execute(select(User).where(User.email == email))
+            ).scalar_one_or_none()
+            if user is not None:
+                await session.delete(user)
+
+    async def test_register_verify_otp_then_login(self) -> None:
+        from fastapi.testclient import TestClient
+
+        from main import app
+
+        email = f"otp-{uuid.uuid4().hex[:12]}@ump.edu.vn"
+        captured: list[str] = []
+
async def grab(_to: str, raw: str) -> None: + captured.append(raw) + + try: + with patch.dict(os.environ, {"AUTH_MAIL_LOG_ONLY": "1"}): + with patch("src.auth_api.deliver_registration_otp_email", side_effect=grab): + with TestClient(app) as client: + r = client.post( + "/api/v1/auth/register", + json={ + "fullName": "OTP Tester", + "email": email, + "password": _TEST_PASSWORD, + "passwordConfirm": _TEST_PASSWORD, + **register_staff_fields(), + }, + ) + self.assertEqual(r.status_code, 200, r.text) + self.assertTrue(captured) + otp = captured[0] + self.assertEqual(len(otp), 6) + self.assertTrue(otp.isdigit()) + + with TestClient(app) as client: + blocked = client.post( + "/api/v1/auth/login", + json={"email": email, "password": _TEST_PASSWORD}, + ) + self.assertEqual(blocked.status_code, 403, blocked.text) + + bad = client.post("/api/v1/auth/verify-otp", json={"email": email, "otp": "000000"}) + self.assertEqual(bad.status_code, 400, bad.text) + + ok = client.post("/api/v1/auth/verify-otp", json={"email": email, "otp": otp}) + self.assertEqual(ok.status_code, 200, ok.text) + + lg = client.post( + "/api/v1/auth/login", + json={"email": email, "password": _TEST_PASSWORD}, + ) + self.assertEqual(lg.status_code, 200, lg.text) + self.assertTrue(lg.json().get("accessToken")) + finally: + await self._delete_user(email) + + +_LIVE_PW = os.getenv("TEST_LIVE_AUTH_PASSWORD", "").strip() +_LIVE_EMAIL = os.getenv("TEST_LIVE_AUTH_EMAIL", "nltanh@ump.edu.vn").strip().lower() + + +@unittest.skipUnless( + _RUN_DB and bool(_LIVE_PW), + "Set INITIATIVE_DATABASE_URL and TEST_LIVE_AUTH_PASSWORD for optional login smoke " + "(use TEST_LIVE_AUTH_EMAIL to override default nltanh@ump.edu.vn).", +) +class LiveAuthLoginOptionalTests(unittest.TestCase): + """Uses secrets from env only — passwords must never appear in source control.""" + + def test_login_with_env_credentials(self) -> None: + from fastapi.testclient import TestClient + + from main import app + + with TestClient(app) as client: + 
r = client.post( + "/api/v1/auth/login", + json={"email": _LIVE_EMAIL, "password": _LIVE_PW}, + ) + self.assertEqual(r.status_code, 200, r.text) + self.assertTrue(r.json().get("accessToken")) diff --git a/be0/tests/test_registration_stack_alignment.py b/be0/tests/test_registration_stack_alignment.py new file mode 100644 index 0000000..e0af092 --- /dev/null +++ b/be0/tests/test_registration_stack_alignment.py @@ -0,0 +1,309 @@ +""" +Registration stack alignment: frontend-shaped payloads, API, PostgreSQL, and MinIO. + +Mirrors the JSON body produced by fe0 ``src/lib/auth-service.ts`` (``register()``); +when changing that client, update this test's payload builder if needed. + +Flow: + 1. POST /api/v1/auth/register — expect verification flow (no JWT); DB rows match payload. + 2. Login blocked with 403 until verify-otp. + 3. POST verify-otp — DB email_verified, OTP row consumed. + 4. Login — JWT issued. + 5. Minimal draft save + evidence upload — MinIO attachments bucket contains object at ``storageKey`` (``head_object``). 
+ +Run (same env style as test_backup_e2e): + + export INITIATIVE_DATABASE_URL="postgresql+asyncpg://initiative:initiative_secret@127.0.0.1:15432/initiatives" + export S3_ENDPOINT_URL="http://127.0.0.1:19000" + export S3_PUBLIC_ENDPOINT_URL="http://127.0.0.1:19000" + export S3_ACCESS_KEY="minio_user" + export S3_SECRET_KEY="minio_password" + export S3_BUCKET_ATTACHMENTS="initiative-attachments" + export S3_BUCKET_EXPORTS="initiative-exports" + export S3_BUCKET_QUARANTINE="initiative-quarantine" + export REGISTRATION_STACK_TEST=1 + cd be0 && python -m unittest tests.test_registration_stack_alignment -v +""" + +from __future__ import annotations + +import hashlib +import io +import os +import unittest +import uuid +from unittest.mock import patch + +from sqlalchemy import select + +from tests.auth_register_staff_fixture import register_staff_fields +from tests.fixtures.minimal_submit_bundle import minimal_tabs_bundle + +_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql") +_S3_KEYS = ( + "S3_ENDPOINT_URL", + "S3_ACCESS_KEY", + "S3_SECRET_KEY", + "S3_BUCKET_ATTACHMENTS", + "S3_BUCKET_EXPORTS", + "S3_BUCKET_QUARANTINE", +) +_HAS_S3 = all(os.getenv(k, "").strip() for k in _S3_KEYS) +_RUN_ALIGN = os.getenv("REGISTRATION_STACK_TEST", "").strip().lower() in ("1", "true", "yes") + +_TEST_PASSWORD = "Testpass1!" 
+_MIN_PDF = ( + b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj<<>>endobj\ntrailer<<>>\n%%EOF\n" + b"0" * 120 +) + + +def _fe0_style_register_json( + *, + email: str, + full_name: str, + password: str, + staff: dict[str, str], +) -> dict: + """Same keys as fe0 auth-service.register() JSON body (no ``role``, no camelCase drift).""" + return { + "fullName": full_name, + "email": email, + "password": password, + "passwordConfirm": password, + "employeeId": staff["employeeId"], + "academicTitleCode": staff["academicTitleCode"], + "unitNameFreetext": staff["unitNameFreetext"], + "jobTitle": staff["jobTitle"], + } + + +def _token_hash(raw: str) -> str: + return hashlib.sha256(raw.encode("utf-8")).hexdigest() + + +def _s3_client(): + import boto3 + from botocore.config import Config as BotoConfig + + return boto3.client( + "s3", + endpoint_url=os.environ["S3_ENDPOINT_URL"].strip(), + aws_access_key_id=os.environ["S3_ACCESS_KEY"].strip(), + aws_secret_access_key=os.environ["S3_SECRET_KEY"].strip(), + region_name=os.getenv("S3_REGION", "us-east-1"), + config=BotoConfig(signature_version="s3v4"), + ) + + +@unittest.skipUnless( + _RUN_DB and _HAS_S3 and _RUN_ALIGN, + "Need INITIATIVE_DATABASE_URL, full S3_* env, REGISTRATION_STACK_TEST=1 (see module docstring).", +) +class RegistrationStackAlignmentTests(unittest.IsolatedAsyncioTestCase): + async def asyncSetUp(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + await eng.init_engine() + + async def asyncTearDown(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + + async def _delete_user_and_initiatives(self, email: str) -> None: + from src.initiative_db.engine import get_session + from src.initiative_db.models import Initiative, User + + async with get_session() as session: + user = ( + await session.execute(select(User).where(User.email == email)) + ).scalar_one_or_none() + if user is None: + return + inis = ( + await 
session.execute(select(Initiative).where(Initiative.owner_id == user.id)) + ).scalars().all() + for ini in inis: + await session.delete(ini) + await session.flush() + await session.delete(user) + await session.commit() + + async def test_register_api_db_login_verify_minio_alignment(self) -> None: + from fastapi.testclient import TestClient + + from main import app + from src.initiative_db.engine import get_session + from src.initiative_db.models import ( + RegistrationOtpCode, + User, + UserRoleRow, + UserStaffProfile, + ) + + email = f"align-reg-{uuid.uuid4().hex[:12]}@ump.edu.vn" + staff = register_staff_fields() + full_name = "Alignment Register User" + body = _fe0_style_register_json( + email=email, + full_name=full_name, + password=_TEST_PASSWORD, + staff=staff, + ) + + captured: list[str] = [] + + async def grab_mail(_to: str, raw: str) -> None: + captured.append(raw) + + bucket = os.environ["S3_BUCKET_ATTACHMENTS"].strip() + s3 = _s3_client() + storage_key: str | None = None + + try: + with patch.dict(os.environ, {"AUTH_MAIL_LOG_ONLY": "1"}): + with patch("src.auth_api.deliver_registration_otp_email", side_effect=grab_mail): + with TestClient(app) as client: + # --- Register (mirrors fe0 fetch body) --- + r = client.post("/api/v1/auth/register", json=body) + self.assertEqual(r.status_code, 200, r.text) + payload = r.json() + self.assertTrue(payload.get("emailVerificationRequired"), payload) + self.assertNotIn("accessToken", payload) + self.assertEqual(payload.get("email"), email) + self.assertIn("message", payload) + + u_out = payload.get("user") or {} + self.assertEqual(u_out.get("email"), email) + self.assertEqual(u_out.get("name"), full_name) + self.assertIs(u_out.get("emailVerified"), False) + self.assertIn("viewer", u_out.get("roles") or []) + sp_out = u_out.get("staffProfile") or {} + self.assertEqual(sp_out.get("employeeId"), staff["employeeId"]) + self.assertEqual(sp_out.get("academicTitleCode"), staff["academicTitleCode"]) + 
self.assertEqual(sp_out.get("unitNameFreetext"), staff["unitNameFreetext"]) + self.assertEqual(sp_out.get("jobTitle"), staff["jobTitle"]) + + self.assertTrue(captured, "OTP should be issued") + raw_otp = captured[0] + self.assertEqual(len(raw_otp), 6, raw_otp) + self.assertTrue(raw_otp.isdigit()) + + with TestClient(app) as client: + blocked = client.post( + "/api/v1/auth/login", + json={"email": email, "password": _TEST_PASSWORD}, + ) + self.assertEqual(blocked.status_code, 403, blocked.text) + + # --- PostgreSQL: user, profile, role, OTP row --- + async with get_session() as session: + user = ( + await session.execute(select(User).where(User.email == email)) + ).scalar_one() + self.assertFalse(user.email_verified) + self.assertEqual(user.full_name, full_name) + + profile = await session.get(UserStaffProfile, user.id) + self.assertIsNotNone(profile) + assert profile is not None + self.assertEqual(profile.employee_id, staff["employeeId"]) + self.assertEqual(profile.academic_title_code, staff["academicTitleCode"]) + self.assertEqual(profile.job_title, staff["jobTitle"]) + + roles = ( + await session.execute( + select(UserRoleRow.role).where(UserRoleRow.user_id == user.id) + ) + ).scalars().all() + self.assertEqual(sorted(roles), ["viewer"]) + + otps = ( + await session.execute( + select(RegistrationOtpCode).where( + RegistrationOtpCode.user_id == user.id + ) + ) + ).scalars().all() + self.assertEqual(len(otps), 1) + o = otps[0] + self.assertIsNone(o.used_at) + self.assertEqual(o.otp_hash, _token_hash(raw_otp)) + + # --- Verify OTP --- + with TestClient(app) as client: + vr = client.post("/api/v1/auth/verify-otp", json={"email": email, "otp": raw_otp}) + self.assertEqual(vr.status_code, 200, vr.text) + + async with get_session() as session: + user = ( + await session.execute(select(User).where(User.email == email)) + ).scalar_one() + self.assertTrue(user.email_verified) + otps = ( + await session.execute( + select(RegistrationOtpCode).where( + 
+ RegistrationOtpCode.user_id == user.id + ) + ) + ).scalars().all() + self.assertEqual(len(otps), 1) + self.assertIsNotNone(otps[0].used_at) + + # --- Login --- + with TestClient(app) as client: + ok = client.post( + "/api/v1/auth/login", + json={"email": email, "password": _TEST_PASSWORD}, + ) + self.assertEqual(ok.status_code, 200, ok.text) + token = ok.json()["accessToken"] + self.assertTrue(token) + + cr = client.post( + "/api/applications/new", + headers={"Authorization": f"Bearer {token}"}, + json={"name": "Alignment case"}, + ) + self.assertEqual(cr.status_code, 200, cr.text) + shell = cr.json().get("application") or {} + case_id = str(shell.get("draft_case_id") or "").strip() + self.assertTrue(case_id, shell) + + bundle = minimal_tabs_bundle(initiative_name="Align Initiative") + for tab_name in ("report", "application", "contribution"): + dr = client.post( + "/api/v1/application-drafts", + headers={"Authorization": f"Bearer {token}"}, + json={"caseId": case_id, "tab": tab_name, "data": bundle[tab_name]}, + ) + self.assertEqual(dr.status_code, 200, dr.text) + + er = client.post( + f"/api/v1/application-drafts/{case_id}/evidence", + headers={"Authorization": f"Bearer {token}"}, + data={"kind": "technical"}, + files={"file": ("align.pdf", io.BytesIO(_MIN_PDF), "application/pdf")}, + ) + self.assertEqual(er.status_code, 200, er.text) + ev_body = er.json() + storage_key = str(ev_body.get("storageKey") or "").strip() + self.assertTrue(storage_key, ev_body) + + # MinIO must contain bytes at the key the API recorded.
+ assert storage_key is not None + head = s3.head_object(Bucket=bucket, Key=storage_key) + self.assertGreater(int(head["ContentLength"]), 0) + + finally: + await self._delete_user_and_initiatives(email) + if storage_key: + try: + s3.delete_object(Bucket=bucket, Key=storage_key) + except Exception: + pass + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_repair_split_submission.py b/be0/tests/test_repair_split_submission.py new file mode 100644 index 0000000..77ff334 --- /dev/null +++ b/be0/tests/test_repair_split_submission.py @@ -0,0 +1,54 @@ +""" +Unit tests for `repair_split_submission` merge logic (no PostgreSQL required). + +DB integration for the full repair is gated on INITIATIVE_DATABASE_URL (see test_applications_db_integration.py). +""" + +from __future__ import annotations + +import unittest + +from src.initiative_db.repair_split_submission import ( + merge_payload_for_case_repair, + tabs_effectively_empty, +) + + +class RepairSplitSubmissionPureTests(unittest.TestCase): + def test_tabs_effectively_empty_true(self) -> None: + self.assertTrue(tabs_effectively_empty({})) + self.assertTrue(tabs_effectively_empty({"report": {}, "application": {}, "contribution": {}})) + self.assertTrue(tabs_effectively_empty(None)) + + def test_tabs_effectively_empty_false(self) -> None: + self.assertFalse(tabs_effectively_empty({"report": {"x": 1}})) + + def test_merge_prefers_good_tabs(self) -> None: + good = { + "tabs": {"application": {"initiativeName": "A"}, "report": {}, "contribution": {}}, + "caseId": "CASE-OLD", + } + bad = { + "tabs": {}, + "submissionRecord": {"id": "sub-abc"}, + "submissionFile": {"url": "/submitted-initiatives/x.pdf", "type": "pdf"}, + } + m = merge_payload_for_case_repair( + target_case_code="CASE-OK", + good_payload=good, + bad_payload=bad, + ) + self.assertEqual(m["caseId"], "CASE-OK") + self.assertEqual(m["submissionRecord"]["id"], "sub-abc") + self.assertEqual(m["submissionFile"]["url"], 
"/submitted-initiatives/x.pdf") + self.assertEqual(m["tabs"]["application"]["initiativeName"], "A") + + def test_merge_falls_back_to_bad_tabs_when_good_empty(self) -> None: + good = {"tabs": {}, "caseId": "CASE-OK"} + bad = {"tabs": {"application": {"k": "v"}}, "submissionRecord": {"id": "sub-x"}} + m = merge_payload_for_case_repair(target_case_code="CASE-OK", good_payload=good, bad_payload=bad) + self.assertEqual(m["tabs"]["application"]["k"], "v") + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_research_routes.py b/be0/tests/test_research_routes.py new file mode 100644 index 0000000..4c71cd9 --- /dev/null +++ b/be0/tests/test_research_routes.py @@ -0,0 +1,403 @@ +"""Tests for the research-project routes (proposals lifecycle). + +The pure-helper unit tests always run. The full lifecycle integration test runs only when +INITIATIVE_DATABASE_URL points at PostgreSQL (asyncpg), e.g.: + + export INITIATIVE_DATABASE_URL="postgresql+asyncpg://initiative:initiative_secret@127.0.0.1:15432/initiatives" + cd be0 && python -m unittest tests.test_research_routes -v + +Prereq for the DB test: migration 016_research_projects.sql applied (compose init mount or +scripts/apply_initiative_migrations.py). +""" +from __future__ import annotations + +import os +import unittest +import uuid + +_RUN_DB = os.getenv("INITIATIVE_DATABASE_URL", "").strip().lower().startswith("postgresql") + + +class ExtractScalarsTests(unittest.TestCase): + """Pure unit tests for the proposal-content scalar extraction (no DB).""" + + def test_reads_dotted_keys_and_coerces(self) -> None: + from src.research_routes import _extract_scalars + + s = _extract_scalars( + { + "tenDeTai": " AI dự đoán di căn ", + "capDeTai": "Thành phố", + "chuNhiem.hoTen": "TS. Nguyễn Văn A", + "thoiGianThucHienThang": "24", + "tongKinhPhi": "1800.5", + } + ) + self.assertEqual(s["title"], "AI dự đoán di căn") + self.assertEqual(s["level"], "Thành phố") + self.assertEqual(s["pi_name"], "TS. 
Nguyễn Văn A") + self.assertEqual(s["period_months"], 24) + self.assertAlmostEqual(s["budget_total"], 1800.5) + + def test_missing_and_garbage_values(self) -> None: + from src.research_routes import _extract_scalars + + empty = _extract_scalars({}) + self.assertEqual(empty["title"], "") + self.assertIsNone(empty["period_months"]) + self.assertIsNone(empty["budget_total"]) + + garbage = _extract_scalars({"thoiGianThucHienThang": "abc", "tongKinhPhi": ""}) + self.assertIsNone(garbage["period_months"]) + self.assertIsNone(garbage["budget_total"]) + + def test_non_dict_content(self) -> None: + from src.research_routes import _extract_scalars + + self.assertEqual(_extract_scalars(None)["title"], "") + self.assertEqual(_extract_scalars("nope")["pi_name"], "") + + +def _bearer(uid: uuid.UUID, roles: list[str]) -> str: + import jwt + + from src.auth_jwt import jwt_secret + + return "Bearer " + jwt.encode({"sub": str(uid), "roles": roles, "cv": 0}, jwt_secret(), algorithm="HS256") + + +@unittest.skipUnless( + _RUN_DB, + "Set INITIATIVE_DATABASE_URL=postgresql+asyncpg://.../initiatives to run DB integration tests", +) +class ResearchLifecycleDbTests(unittest.IsolatedAsyncioTestCase): + """End-to-end: draft → submit → approve, owner/admin authz, and the audit trail.""" + + async def asyncSetUp(self) -> None: + from src.initiative_db import engine as eng + + await eng.dispose_engine() + await eng.init_engine() + self._user_ids: list[uuid.UUID] = [] + self._project_ids: list[uuid.UUID] = [] + + async def asyncTearDown(self) -> None: + from sqlalchemy import delete + + from src.initiative_db import engine as eng + from src.initiative_db.engine import get_session + from src.initiative_db.models import ResearchProject, User + + async with get_session() as session: + for pid in self._project_ids: + await session.execute(delete(ResearchProject).where(ResearchProject.id == pid)) + for uid in self._user_ids: + await session.execute(delete(User).where(User.id == uid)) + await 
session.commit() + await eng.dispose_engine() + + async def _seed_user(self, *, admin: bool = False) -> uuid.UUID: + from src.initiative_db.engine import get_session + from src.initiative_db.models import User + + uid = uuid.uuid4() + async with get_session() as session: + session.add( + User( + id=uid, + email=f"rp-{uid.hex[:10]}@ump.edu.vn", + password_hash="x", + full_name=("Quản trị" if admin else "Chủ nhiệm") + " Test", + ) + ) + await session.commit() + self._user_ids.append(uid) + return uid + + async def test_full_lifecycle_and_authz(self) -> None: + from fastapi import HTTPException + + from src.research_routes import ( + ApproveIn, + ProjectCreateIn, + approve_project, + create_project, + get_project, + list_audit, + submit_project, + ) + + owner = await self._seed_user() + admin = await self._seed_user(admin=True) + other = await self._seed_user() + owner_tok = _bearer(owner, ["viewer"]) + admin_tok = _bearer(admin, ["admin"]) + other_tok = _bearer(other, ["viewer"]) + + # create draft + created = await create_project( + ProjectCreateIn( + content={ + "tenDeTai": "Đề tài thử nghiệm", + "capDeTai": "Cơ sở", + "thoiGianThucHienThang": "12", + "tongKinhPhi": "500", + } + ), + owner_tok, + ) + self._project_ids.append(uuid.UUID(created.id)) + self.assertEqual(created.status, "draft") + self.assertEqual(created.title, "Đề tài thử nghiệm") + self.assertEqual(created.periodMonths, 12) + + # another user cannot read it (404 hides existence) + with self.assertRaises(HTTPException) as ctx_read: + await get_project(created.id, other_tok) + self.assertEqual(ctx_read.exception.status_code, 404) + + # submit (owner) + submitted = await submit_project(created.id, owner_tok) + self.assertEqual(submitted.status, "submitted") + self.assertIsNotNone(submitted.submittedAt) + + # owner cannot approve (admin-only) → 403 + with self.assertRaises(HTTPException) as ctx_appr: + await approve_project(created.id, ApproveIn(), owner_tok) + 
self.assertEqual(ctx_appr.exception.status_code, 403) + + # admin approves with a code + approved = await approve_project(created.id, ApproveIn(code="ĐTUD-TEST", note="Đạt"), admin_tok) + self.assertEqual(approved.status, "approved") + self.assertEqual(approved.code, "ĐTUD-TEST") + self.assertIsNotNone(approved.reviewedAt) + + # cannot re-approve (not submitted anymore) → 409 + with self.assertRaises(HTTPException) as ctx_reappr: + await approve_project(created.id, ApproveIn(), admin_tok) + self.assertEqual(ctx_reappr.exception.status_code, 409) + + # audit trail recorded each transition (owner can read) + audit = await list_audit(created.id, owner_tok) + actions = [a.action for a in audit] + self.assertIn("Tạo bản thảo đề tài", actions) + self.assertIn("Nộp đề tài", actions) + self.assertIn("Phê duyệt đề tài", actions) + + async def test_reject_path_and_admin_cannot_edit_others_draft(self) -> None: + from fastapi import HTTPException + + from src.research_routes import ( + ProjectCreateIn, + ProjectUpdateIn, + RejectIn, + create_project, + reject_project, + submit_project, + update_project, + ) + + owner = await self._seed_user() + admin = await self._seed_user(admin=True) + owner_tok = _bearer(owner, ["viewer"]) + admin_tok = _bearer(admin, ["admin"]) + + created = await create_project(ProjectCreateIn(content={"tenDeTai": "Đề tài bị từ chối"}), owner_tok) + self._project_ids.append(uuid.UUID(created.id)) + + # admin can load it but cannot update/submit someone else's draft (owner-only) → 403 + with self.assertRaises(HTTPException) as ctx_upd: + await update_project(created.id, ProjectUpdateIn(content={"tenDeTai": "x"}), admin_tok) + self.assertEqual(ctx_upd.exception.status_code, 403) + with self.assertRaises(HTTPException) as ctx_sub: + await submit_project(created.id, admin_tok) + self.assertEqual(ctx_sub.exception.status_code, 403) + + # owner submits, admin rejects with a note + await submit_project(created.id, owner_tok) + rejected = await 
reject_project(created.id, RejectIn(note="Chưa đạt yêu cầu"), admin_tok) + self.assertEqual(rejected.status, "rejected") + self.assertEqual(rejected.reviewNote, "Chưa đạt yêu cầu") + + # cannot submit a rejected proposal (not draft) → 409 + with self.assertRaises(HTTPException) as ctx_resub: + await submit_project(created.id, owner_tok) + self.assertEqual(ctx_resub.exception.status_code, 409) + + async def test_cockpit_entities_crud_seeding_and_gate(self) -> None: + from fastapi import HTTPException + + from src.research_routes import ( + ApproveIn, + ProjectCreateIn, + approve_project, + create_entity, + create_project, + delete_entity, + get_cockpit, + list_entity, + submit_project, + update_entity, + ) + + owner = await self._seed_user() + admin = await self._seed_user(admin=True) + owner_tok = _bearer(owner, ["viewer"]) + admin_tok = _bearer(admin, ["admin"]) + + created = await create_project( + ProjectCreateIn( + content={ + "tenDeTai": "Đề tài cockpit", + "chuNhiem.hoTen": "TS. PI A", + "thanhVienThucHien": [{"hoTenHocVi": "KS. 
Lê C", "chucDanh": "Thành viên chính"}], + "tienDoThucHien": [{"noiDungCongViec": "ND1", "ketQua": "Báo cáo", "thoiGian": "T1-T3"}], + } + ), + owner_tok, + ) + pid = created.id + self._project_ids.append(uuid.UUID(pid)) + + # entity mutation before approval is locked → 409 + with self.assertRaises(HTTPException) as ctx_gate: + await create_entity(pid, "datasets", {"name": "X"}, owner_tok) + self.assertEqual(ctx_gate.exception.status_code, 409) + + # unknown entity type → 404 + with self.assertRaises(HTTPException) as ctx_unknown: + await list_entity(pid, "nope", owner_tok) + self.assertEqual(ctx_unknown.exception.status_code, 404) + + await submit_project(pid, owner_tok) + await approve_project(pid, ApproveIn(code="C1"), admin_tok) + + # approval seeded members (PI + 1 member) + milestones (1) from the proposal content + members = await list_entity(pid, "members", owner_tok) + self.assertGreaterEqual(len(members), 2) + self.assertTrue(any(m["name"] == "TS. PI A" for m in members)) + milestones = await list_entity(pid, "milestones", owner_tok) + self.assertGreaterEqual(len(milestones), 1) + self.assertEqual(milestones[0]["start"], "T1-T3") + + # create + coercion (records "620" → int 620) + ds = await create_entity(pid, "datasets", {"name": "Ảnh CLVT", "records": "620", "status": "Sẵn sàng"}, owner_tok) + self.assertEqual(ds["name"], "Ảnh CLVT") + self.assertEqual(ds["records"], 620) + + # update + upd = await update_entity(pid, "datasets", ds["id"], {"status": "Khóa"}, owner_tok) + self.assertEqual(upd["status"], "Khóa") + + # cockpit bundle reflects entities + audit captured the actions + bundle = await get_cockpit(pid, admin_tok) + self.assertEqual(bundle["project"]["status"], "approved") + self.assertEqual(len(bundle["datasets"]), 1) + audit_actions = [a["action"] for a in bundle["audit"]] + self.assertIn("Thêm bộ dữ liệu", audit_actions) + self.assertIn("Cập nhật bộ dữ liệu", audit_actions) + + # delete + await delete_entity(pid, "datasets", ds["id"], 
owner_tok) + self.assertEqual(len(await list_entity(pid, "datasets", owner_tok)), 0) + + async def test_seeding_survives_malformed_content(self) -> None: + """A PI can put arbitrary JSON in content; approve-time seeding must never crash (best-effort).""" + from src.research_routes import ( + ApproveIn, + ProjectCreateIn, + approve_project, + create_project, + list_entity, + submit_project, + ) + + owner = await self._seed_user() + admin = await self._seed_user(admin=True) + owner_tok = _bearer(owner, ["viewer"]) + admin_tok = _bearer(admin, ["admin"]) + + created = await create_project( + ProjectCreateIn( + content={ + "tenDeTai": "Đề tài lỗi định dạng", + "chuNhiem.hoTen": "TS. PI A", + "thanhVienThucHien": 5, # truthy non-list — must be ignored, not crash + "tienDoThucHien": "oops", # truthy non-list + } + ), + owner_tok, + ) + pid = created.id + self._project_ids.append(uuid.UUID(pid)) + await submit_project(pid, owner_tok) + + approved = await approve_project(pid, ApproveIn(), admin_tok) # must not raise + self.assertEqual(approved.status, "approved") + # malformed repeatables seeded nothing; only the PI member came from chuNhiem.hoTen + self.assertEqual(len(await list_entity(pid, "milestones", owner_tok)), 0) + members = await list_entity(pid, "members", owner_tok) + self.assertTrue(any(m["name"] == "TS. 
PI A" for m in members)) + + async def test_update_detail_merges_after_approval(self) -> None: + """The cockpit detail endpoint: approved-only, owner-or-admin, shallow-merge + audit.""" + from fastapi import HTTPException + + from src.research_routes import ( + ApproveIn, + ProjectCreateIn, + ProjectDetailPatchIn, + approve_project, + create_project, + get_cockpit, + submit_project, + update_project_detail, + ) + + owner = await self._seed_user() + admin = await self._seed_user(admin=True) + other = await self._seed_user() + owner_tok = _bearer(owner, ["viewer"]) + admin_tok = _bearer(admin, ["admin"]) + other_tok = _bearer(other, ["viewer"]) + + created = await create_project( + ProjectCreateIn(content={"tenDeTai": "Đề tài chi tiết", "tongKinhPhi": "300"}), + owner_tok, + ) + pid = created.id + self._project_ids.append(uuid.UUID(pid)) + + # detail patch is rejected before approval (cockpit-only) → 409 + with self.assertRaises(HTTPException) as ctx_draft: + await update_project_detail(pid, ProjectDetailPatchIn(patch={"soHopDong": "x"}), owner_tok) + self.assertEqual(ctx_draft.exception.status_code, 409) + + await submit_project(pid, owner_tok) + await approve_project(pid, ApproveIn(code="C-DET"), admin_tok) + + # owner patches admin-detail fields; merge preserves the original proposal key + re-derives scalars + patched = await update_project_detail( + pid, + ProjectDetailPatchIn(patch={"soHopDong": "205/2024/HĐ", "tongKinhPhi": "29900000"}), + owner_tok, + ) + self.assertEqual(patched.content["soHopDong"], "205/2024/HĐ") + self.assertEqual(patched.content["tenDeTai"], "Đề tài chi tiết") # untouched proposal key + self.assertAlmostEqual(patched.budgetTotal, 29900000.0) + + # admin may also patch (owner-or-admin — unlike draft update_project which is owner-only) + patched2 = await update_project_detail( + pid, ProjectDetailPatchIn(patch={"khoaDonVi": "Dược"}), admin_tok + ) + self.assertEqual(patched2.content["khoaDonVi"], "Dược") + 
self.assertEqual(patched2.content["soHopDong"], "205/2024/HĐ") # earlier patch survives + + # a stranger cannot patch (404 hides the row) + with self.assertRaises(HTTPException) as ctx_other: + await update_project_detail(pid, ProjectDetailPatchIn(patch={"x": "y"}), other_tok) + self.assertEqual(ctx_other.exception.status_code, 404) + + # audit captured the update + bundle = await get_cockpit(pid, owner_tok) + self.assertIn("Cập nhật thông tin đề tài", [a["action"] for a in bundle["audit"]]) diff --git a/be0/tests/test_security_routes.py b/be0/tests/test_security_routes.py new file mode 100644 index 0000000..13288ab --- /dev/null +++ b/be0/tests/test_security_routes.py @@ -0,0 +1,158 @@ +""" +Security regression tests for authenticated / removed routes (no Postgres required). + +Run: cd be0 && python -m unittest tests.test_security_routes -v +""" + +from __future__ import annotations + +import os +import unittest +from unittest.mock import patch + +from tests.security_token_fixture import mint_bearer_token + + +class SecurityRoutesTests(unittest.TestCase): + def _client(self): + from fastapi.testclient import TestClient + + from main import app + + return TestClient(app) + + def test_removed_upload_document_returns_404(self) -> None: + client = self._client() + r = client.post("/upload_document", files={"file": ("x.pdf", b"%PDF", "application/pdf")}) + self.assertEqual(r.status_code, 404) + + def test_removed_get_page_returns_404(self) -> None: + client = self._client() + r = client.post("/get_page", data={"new_page_number": "1"}) + self.assertEqual(r.status_code, 404) + + def test_list_applications_requires_auth(self) -> None: + client = self._client() + r = client.get("/api/applications") + self.assertEqual(r.status_code, 401) + + def test_list_applications_rejects_viewer(self) -> None: + client = self._client() + headers = {"Authorization": mint_bearer_token(roles=("viewer",))} + with patch("src.initiative_db.engine.is_postgres_enabled", return_value=False): + r 
= client.get("/api/applications", headers=headers) + self.assertEqual(r.status_code, 403) + + def test_list_applications_allows_staff_without_db(self) -> None: + client = self._client() + headers = {"Authorization": mint_bearer_token(roles=("admin",))} + with patch("src.initiative_db.engine.is_postgres_enabled", return_value=False): + with patch("main._load_submitted_items", return_value=[]): + r = client.get("/api/applications", headers=headers) + self.assertEqual(r.status_code, 200, r.text) + self.assertIn("data", r.json()) + + def test_get_application_requires_auth(self) -> None: + client = self._client() + r = client.get("/api/applications/sub-deadbeefdeadbeef") + self.assertEqual(r.status_code, 401) + + def test_get_application_rejects_viewer_without_row(self) -> None: + client = self._client() + headers = {"Authorization": mint_bearer_token(roles=("viewer",), email="viewer@ump.edu.vn")} + with patch("src.initiative_db.engine.is_postgres_enabled", return_value=False): + with patch("main._get_application_from_file_index", return_value=None): + r = client.get("/api/applications/sub-deadbeefdeadbeef", headers=headers) + self.assertEqual(r.status_code, 404) + + def test_review_documents_list_requires_auth(self) -> None: + client = self._client() + r = client.get("/api/v1/review-documents", params={"caseId": "CASE-1"}) + self.assertEqual(r.status_code, 401) + + def test_review_documents_create_requires_auth(self) -> None: + client = self._client() + r = client.post( + "/api/v1/review-documents", + json={"caseId": "CASE-1", "officialBieuMau": {}}, + ) + self.assertEqual(r.status_code, 401) + + def test_chat_requires_auth(self) -> None: + client = self._client() + r = client.post("/api/v1/chat", json={"message": "hello"}) + self.assertEqual(r.status_code, 401) + + def test_analyze_compliance_requires_auth(self) -> None: + client = self._client() + r = client.post( + "/analyze_compliance", + json={"external_requirements": ["ext"], "internal_requirements": ["int"]}, + 
) + self.assertEqual(r.status_code, 401) + + def test_test_ollama_requires_admin(self) -> None: + client = self._client() + viewer = {"Authorization": mint_bearer_token(roles=("viewer",))} + r_viewer = client.post("/test_ollama", json={"prompt": "hi"}, headers=viewer) + self.assertEqual(r_viewer.status_code, 403) + + admin = {"Authorization": mint_bearer_token(roles=("admin",))} + with patch( + "main.ollama.chat", + return_value={"message": {"content": "ok"}}, + ): + r_admin = client.post("/test_ollama", json={"prompt": "hi"}, headers=admin) + self.assertEqual(r_admin.status_code, 200, r_admin.text) + + def test_ideas_post_requires_admin(self) -> None: + client = self._client() + headers = {"Authorization": mint_bearer_token(roles=("viewer",))} + r = client.post( + "/api/v1/ideas", + json={"title": "t", "description": "d"}, + headers=headers, + ) + self.assertEqual(r.status_code, 403) + + def test_security_headers_on_health(self) -> None: + client = self._client() + r = client.get("/health") + self.assertEqual(r.status_code, 200) + self.assertEqual(r.headers.get("x-content-type-options"), "nosniff") + self.assertEqual(r.headers.get("x-frame-options"), "DENY") + self.assertIn("referrer-policy", r.headers) + + +class JwtSecretTests(unittest.TestCase): + def test_production_requires_secret(self) -> None: + from src.auth_jwt import jwt_secret + + env = {k: v for k, v in os.environ.items() if k not in ("JWT_SECRET", "ENVIRONMENT")} + env["ENVIRONMENT"] = "production" + with patch.dict(os.environ, env, clear=True): + with self.assertRaises(RuntimeError): + jwt_secret() + + def test_development_allows_dev_fallback(self) -> None: + from src.auth_jwt import jwt_secret + + with patch.dict(os.environ, {"ENVIRONMENT": "development"}, clear=False): + os.environ.pop("JWT_SECRET", None) + secret = jwt_secret() + self.assertGreaterEqual(len(secret), 32) + + +class LoginRateLimitTests(unittest.TestCase): + def test_login_rate_limit_blocks_after_threshold(self) -> None: + from 
src.auth_rate_limit import allow_login + + email = "ratelimit-test@ump.edu.vn" + ip = "203.0.113.50" + for _ in range(5): + self.assertTrue(allow_login(email, ip)) + self.assertFalse(allow_login(email, ip)) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_staff_profile_domain.py b/be0/tests/test_staff_profile_domain.py new file mode 100644 index 0000000..58d0fe3 --- /dev/null +++ b/be0/tests/test_staff_profile_domain.py @@ -0,0 +1,95 @@ +"""Unit tests for staff_profile_domain (no database).""" + +from __future__ import annotations + +import unittest +import uuid + +from src.initiative_db.models import User, UserStaffProfile +from src.staff_profile_domain import ( + apply_reverify_from_verified, + assert_complete_for_submission, + assert_employee_id_shape, + assert_unit_exclusive, + material_staff_fields_changed, + normalize_employee_id, + staff_row_for_audit, +) + + +class StaffProfileDomainTests(unittest.TestCase): + def test_normalize_employee_id(self) -> None: + self.assertIsNone(normalize_employee_id(None)) + self.assertEqual(normalize_employee_id(" ab-12 "), "AB-12") + + def test_employee_id_shape(self) -> None: + assert_employee_id_shape(None) + assert_employee_id_shape("ABC-123") + with self.assertRaises(ValueError): + assert_employee_id_shape("ab") + + def test_unit_exclusive(self) -> None: + uid = uuid.uuid4() + user = User( + id=uid, + email="t@ump.edu.vn", + password_hash="x", + full_name="T", + unit_id=uuid.uuid4(), + ) + sp = UserStaffProfile(user_id=uid, unit_name_freetext=" Khoa X ") + with self.assertRaises(ValueError): + assert_unit_exclusive(user, sp) + + def test_material_staff_fields_changed(self) -> None: + a = staff_row_for_audit( + UserStaffProfile(user_id=uuid.uuid4(), job_title="A"), + None, + ) + b = staff_row_for_audit( + UserStaffProfile(user_id=uuid.uuid4(), job_title="B"), + None, + ) + self.assertTrue(material_staff_fields_changed(a, b)) + self.assertFalse(material_staff_fields_changed(a, a)) + + def 
test_apply_reverify_sets_pending(self) -> None: + from datetime import datetime, timezone + + sp = UserStaffProfile( + user_id=uuid.uuid4(), + profile_verification_status="verified", + verified_at=datetime.now(timezone.utc), + verified_by_user_id=uuid.uuid4(), + rejection_reason=None, + ) + now = datetime.now(timezone.utc) + apply_reverify_from_verified(sp, now) + self.assertEqual(sp.profile_verification_status, "pending") + self.assertIsNone(sp.verified_at) + self.assertEqual(sp.verification_submitted_at, now) + + def test_assert_complete_for_submission(self) -> None: + uid = uuid.uuid4() + user = User( + id=uid, + email="t@ump.edu.vn", + password_hash="x", + full_name="T", + unit_id=uuid.uuid4(), + ) + sp = UserStaffProfile( + user_id=uid, + employee_id="CB-001", + academic_title_code="master", + job_title="GV", + ) + assert_complete_for_submission(user, sp) + + sp2 = UserStaffProfile(user_id=uid, employee_id=None) + with self.assertRaises(ValueError): + assert_complete_for_submission(user, sp2) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_submission_readiness.py b/be0/tests/test_submission_readiness.py new file mode 100644 index 0000000..01e12ed --- /dev/null +++ b/be0/tests/test_submission_readiness.py @@ -0,0 +1,47 @@ +"""Unit tests for submit readiness validation (no database).""" + +from __future__ import annotations + +import unittest + +from src.initiative_db.submission_readiness import ( + ApplicationSubmissionNotReadyError, + collect_submission_readiness_gaps, +) + +from tests.fixtures.minimal_submit_bundle import minimal_tabs_bundle + + +class SubmissionReadinessTests(unittest.TestCase): + def test_minimal_tabs_with_technical_evidence_ok(self) -> None: + tabs = minimal_tabs_bundle() + gaps = collect_submission_readiness_gaps( + tabs, + {"research": False, "textbook": False, "technical": True}, + ) + self.assertEqual(gaps, []) + + def test_missing_evidence_fails(self) -> None: + tabs = minimal_tabs_bundle() + gaps = 
collect_submission_readiness_gaps( + tabs, + {"research": False, "textbook": False, "technical": False}, + ) + self.assertTrue(any("Nhóm 1" in g for g in gaps)) + + def test_missing_honesty_flags(self) -> None: + tabs = minimal_tabs_bundle() + tabs["report"]["honestyConfirmed"] = False + gaps = collect_submission_readiness_gaps( + tabs, + {"research": False, "textbook": False, "technical": True}, + ) + self.assertTrue(any("Báo cáo" in g and "cam kết" in g for g in gaps)) + + def test_exception_carries_missing(self) -> None: + exc = ApplicationSubmissionNotReadyError(["a", "b"]) + self.assertEqual(exc.missing, ["a", "b"]) + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_submissions_projection_research_kind.py b/be0/tests/test_submissions_projection_research_kind.py new file mode 100644 index 0000000..3ed4329 --- /dev/null +++ b/be0/tests/test_submissions_projection_research_kind.py @@ -0,0 +1,43 @@ +"""Projection of `tabs.application.researchEvidenceKind` onto list rows (no DB). 
+ +Run: cd be0 && python -m unittest tests.test_submissions_projection_research_kind -v +""" + +from __future__ import annotations + +import unittest +import uuid +from datetime import datetime, timezone +from types import SimpleNamespace + + +class SubmissionsResearchEvidenceKindProjectionTests(unittest.TestCase): + def test_poster_without_review_round_trips_on_api_row(self) -> None: + from src.initiative_db.submissions import _as_submission_item + + ini = SimpleNamespace( + id=uuid.uuid4(), + case_code="CASE-PROJ-RK", + status="submitted", + submitted_at=datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc), + ) + payload = { + "submissionRecord": { + "id": "sub-deadbeefcafe", + "submittedDate": "2026-01-01T12:00:00.000Z", + "name": "Test", + }, + "tabs": { + "application": { + "initiativeClassification": "research", + "researchEvidenceKind": "poster-without-review", + } + }, + } + row = _as_submission_item(ini, payload) # type: ignore[arg-type] + self.assertEqual(row.get("researchEvidenceKind"), "poster-without-review") + self.assertEqual(row.get("initiativeClassification"), "research") + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tests/test_user_notifications_merit.py b/be0/tests/test_user_notifications_merit.py new file mode 100644 index 0000000..e624109 --- /dev/null +++ b/be0/tests/test_user_notifications_merit.py @@ -0,0 +1,66 @@ +"""Unit tests for merit label derivation from draft JSON (notification body). 
+ +Run: cd be0 && python -m unittest tests.test_user_notifications_merit -v +""" + +from __future__ import annotations + +import unittest + + +class MeritCategoryFromDraftTests(unittest.TestCase): + def test_poster_without_review_is_trung_binh(self) -> None: + from src.initiative_db.user_notifications import merit_category_label_from_draft_payload + + payload = { + "tabs": { + "application": { + "initiativeClassification": "research", + "researchEvidenceKind": "poster-without-review", + } + } + } + self.assertEqual(merit_category_label_from_draft_payload(payload), "Trung bình") + + def test_international_remains_xuat_sac(self) -> None: + from src.initiative_db.user_notifications import merit_category_label_from_draft_payload + + payload = { + "tabs": { + "application": { + "initiativeClassification": "research", + "researchEvidenceKind": "international", + } + } + } + self.assertEqual(merit_category_label_from_draft_payload(payload), "Xuất sắc") + + def test_textbook_book_is_xuat_sac(self) -> None: + from src.initiative_db.user_notifications import merit_category_label_from_draft_payload + + payload = { + "tabs": { + "application": { + "initiativeClassification": "textbook", + "textbookEvidenceKind": "book", + } + } + } + self.assertEqual(merit_category_label_from_draft_payload(payload), "Xuất sắc") + + def test_poster_with_review_still_kha_bucket(self) -> None: + from src.initiative_db.user_notifications import merit_category_label_from_draft_payload + + payload = { + "tabs": { + "application": { + "initiativeClassification": "research", + "researchEvidenceKind": "poster", + } + } + } + self.assertEqual(merit_category_label_from_draft_payload(payload), "Khá") + + +if __name__ == "__main__": + unittest.main() diff --git a/be0/tools/__pycache__/e2e_application_form_pdf_export.cpython-313.pyc b/be0/tools/__pycache__/e2e_application_form_pdf_export.cpython-313.pyc new file mode 100644 index 0000000..415b3a6 Binary files /dev/null and 
b/be0/tools/__pycache__/e2e_application_form_pdf_export.cpython-313.pyc differ diff --git a/be0/tools/e2e_application_form_pdf_export.py b/be0/tools/e2e_application_form_pdf_export.py new file mode 100644 index 0000000..13f99bd --- /dev/null +++ b/be0/tools/e2e_application_form_pdf_export.py @@ -0,0 +1,140 @@ +#!/usr/bin/env python3 +""" +Smoke / E2E test for application-form PDF export (docxtpl + LibreOffice). + + Direct (no HTTP): exercises merge + conversion on this machine / container. + cd be0 && python tools/e2e_application_form_pdf_export.py --direct --out /tmp/e2e-mau.pdf + + HTTP: needs FastAPI listening (the --http BASE_URL is the API origin; do not include the /api prefix). + python tools/e2e_application_form_pdf_export.py --http http://127.0.0.1:4402 --out /tmp/e2e-mau.pdf + + Pure curl + jq (no Python on host; official JSON must be wrapped): + cd /path/to/repo && jq -n --slurpfile o fe0/public/assets/bieu_mau_sang_kien_template.json \ + '{officialBieuMau: $o[0]}' | curl -sfS -X POST http://127.0.0.1:4402/api/v1/docx/preview-application-form-pdf \ + -H 'Content-Type: application/json' -o /tmp/e2e-mau.pdf && file /tmp/e2e-mau.pdf + + If the API returns HTTP 501, rebuild/run be0 with LibreOffice (see be0/Dockerfile: libreoffice-writer-nogui). + +Docker (stack up): + docker compose exec be0 python tools/e2e_application_form_pdf_export.py --direct --out /tmp/e2e-mau.pdf + +Refresh bundled sample after template changes: + cp ../fe0/public/assets/bieu_mau_sang_kien_template.json tools/e2e_sample_official_bieu_mau.json + +Exit 0 on success; non-zero on failure. +"""
+ +from __future__ import annotations + +import argparse +import json +import sys +from pathlib import Path +from typing import Any, Dict + +BE0_ROOT = Path(__file__).resolve().parent.parent +if str(BE0_ROOT) not in sys.path: + sys.path.insert(0, str(BE0_ROOT)) + + +def _repo_root() -> Path: + return BE0_ROOT.parent + + +def load_sample_official() -> Dict[str, Any]: + """officialBieuMau-shaped JSON (same as Review « Xem lại »).""" + candidates = [ + Path(__file__).resolve().parent / "e2e_sample_official_bieu_mau.json", + _repo_root() / "fe0/public/assets/bieu_mau_sang_kien_template.json", + ] + for p in candidates: + if p.is_file(): + with open(p, encoding="utf-8") as f: + data = json.load(f) + break + else: + raise FileNotFoundError( + "No sample official JSON found. Expected tools/e2e_sample_official_bieu_mau.json " + "or fe0/public/assets/bieu_mau_sang_kien_template.json next to be0/." + ) + + # Visible markers in generated PDF/DOCX + if isinstance(data.get("TRANG BÌA"), dict): + data["TRANG BÌA"]["Tên sáng kiến (Tiếng Việt)"] = "E2E PDF export — tên sáng kiến kiểm thử" + data["TRANG BÌA"]["Năm"] = "2026" + return data + + +def run_direct(out_path: Path | None) -> bytes: + from src.be01.docx_to_pdf import convert_docx_bytes_to_pdf + from src.be01.fill_application_form import fill_application_form_docx + from src.be01.official_to_data_blank import official_to_data_blank + + official = load_sample_official() + ctx = official_to_data_blank(official) + docx = fill_application_form_docx(ctx) + pdf = convert_docx_bytes_to_pdf(docx) + if not pdf.startswith(b"%PDF"): + raise RuntimeError("Output is not a PDF (missing %PDF header).") + if out_path: + out_path.parent.mkdir(parents=True, exist_ok=True) + out_path.write_bytes(pdf) + print(f"Wrote {len(pdf)} bytes → {out_path}") + else: + print(f"OK: generated PDF, {len(pdf)} bytes (no --out)") + return pdf + + +def run_http(base_url: str, out_path: Path | None) -> bytes: + import urllib.error + import urllib.request + + 
official = load_sample_official() + url = base_url.rstrip("/") + "/api/v1/docx/preview-application-form-pdf" + body = json.dumps({"officialBieuMau": official}).encode("utf-8") + req = urllib.request.Request( + url, + data=body, + method="POST", + headers={"Content-Type": "application/json", "Accept": "application/pdf"}, + ) + try: + with urllib.request.urlopen(req, timeout=180) as resp: + raw = resp.read() + except urllib.error.HTTPError as e: + detail = e.read().decode("utf-8", errors="replace") + raise SystemExit(f"HTTP {e.code}: {detail}") from e + + if not raw.startswith(b"%PDF"): + raise SystemExit(f"Expected PDF, got {len(raw)} bytes, head={raw[:64]!r}") + if out_path: + out_path.parent.mkdir(parents=True, exist_ok=True) + out_path.write_bytes(raw) + print(f"Wrote {len(raw)} bytes → {out_path}") + else: + print(f"OK: HTTP PDF, {len(raw)} bytes") + return raw + + +def main() -> None: + ap = argparse.ArgumentParser(description=__doc__) + g = ap.add_mutually_exclusive_group(required=True) + g.add_argument("--direct", action="store_true", help="Run docxtpl + LibreOffice in-process") + g.add_argument("--http", metavar="BASE_URL", help="POST to FastAPI (e.g. 
http://127.0.0.1:4402)") + ap.add_argument("--out", type=Path, metavar="FILE.pdf", help="Write PDF to this path") + args = ap.parse_args() + + try: + if args.direct: + run_direct(args.out) + else: + run_http(args.http, args.out) + except FileNotFoundError as e: + print(f"FAIL: {e}", file=sys.stderr) + sys.exit(2) + except Exception as e: + print(f"FAIL: {e}", file=sys.stderr) + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/be0/tools/e2e_sample_official_bieu_mau.json b/be0/tools/e2e_sample_official_bieu_mau.json new file mode 100644 index 0000000..6cfec36 --- /dev/null +++ b/be0/tools/e2e_sample_official_bieu_mau.json @@ -0,0 +1,181 @@ +{ + "TRANG BÌA": { + "Tên sáng kiến (Tiếng Việt)": "", + "Tác giả/nhóm tác giả sáng kiến": "", + "Đơn vị công tác": "", + "Thông tin liên hệ (Điện thoại, Email)": "", + "Năm": "" + }, + "MẪU SỐ 01 - BÁO CÁO MÔ TẢ SÁNG KIẾN": { + "1. Mở đầu": "", + "2. Tên sáng kiến (tên quy trình, giải pháp, phương pháp)": "", + "3. Lĩnh vực áp dụng của sáng kiến": "", + "4. 
Mô tả sáng kiến": { + "4.1 Tình trạng giải pháp đã biết hoặc hiện trạng công tác khi chưa có sáng kiến": "", + "4.2 Nội dung giải pháp đề nghị công nhận là sáng kiến": { + "Mục đích của sáng kiến": "", + "Về nội dung của sáng kiến": { + "Các bước thực hiện giải pháp": "", + "Các điều kiện cần thiết để áp dụng giải pháp": "", + "Lĩnh vực áp dụng": "", + "Kết quả thu được": "", + "Danh sách đơn vị/cá nhân đã tham gia áp dụng thử hoặc lần đầu": [ + { + "TT": "", + "Tên tổ chức/cá nhân": "", + "Địa chỉ": "", + "Lĩnh vực áp dụng sáng kiến": "" + } + ] + }, + "Về tính mới của sáng kiến": "", + "Về tính hiệu quả": { + "Tạo ra lợi ích kinh tế": "", + "Đem lại hiệu quả trong giảng dạy": "", + "Tăng năng suất lao động": "", + "Nâng cao hiệu quả công việc": "", + "Nâng cao chất lượng công việc, dịch vụ": "", + "Giảm chi phí": "", + "Cải thiện môi trường, điều kiện học tập, làm việc, sống": "", + "Bảo vệ sức khỏe": "", + "Đảm bảo an toàn lao động, PCCC": "", + "Nâng cao khả năng, trình độ, nhận thức, trách nhiệm": "" + } + } + }, + "6. 
Những thông tin cần được bảo mật (nếu có)": "", + "Ngày ký": { + "Ngày": "", + "Tháng": "", + "Năm": "" + }, + "Lãnh đạo đơn vị (Ký, ghi rõ họ tên)": "", + "Tác giả sáng kiến (Ký, ghi rõ họ tên)": "" + }, + "MẪU SỐ 02 - ĐƠN ĐỀ NGHỊ CÔNG NHẬN SÁNG KIẾN": { + "Đơn vị": "", + "Danh sách tác giả": [ + { + "STT": "", + "Họ và tên": "", + "Ngày tháng năm sinh": "", + "Nơi công tác": "", + "Chức danh": "", + "Trình độ chuyên môn": "", + "Tỷ lệ (%) đóng góp vào việc tạo ra sáng kiến": "" + } + ], + "Tên sáng kiến đề nghị xét công nhận": "", + "Chủ đầu tư tạo ra sáng kiến": "", + "Lĩnh vực áp dụng sáng kiến": "", + "Ngày sáng kiến được áp dụng": "", + "Nội dung của sáng kiến": "", + "Phân loại sáng kiến (đánh dấu ☑)": { + "Giải pháp kỹ thuật, quản lý, tác nghiệp, ứng dụng tiến bộ kỹ thuật áp dụng cho ĐHYD TP.HCM": false, + "Sáng kiến – cải tiến kỹ thuật từ các nghiên cứu khoa học có kết quả được đăng tải trên các tạp chí, hội nghị trong nước và quốc tế": false, + "Sáng kiến – cải tiến kỹ thuật từ sách, giáo trình, tài liệu tham khảo": false + }, + "Những thông tin cần được bảo mật (nếu có)": "", + "Các điều kiện cần thiết để áp dụng sáng kiến": "", + "Đánh giá lợi ích theo ý kiến của tác giả": "", + "Đánh giá lợi ích theo ý kiến của tổ chức, cá nhân đã tham gia áp dụng sáng kiến lần đầu": "", + "Danh sách những người đã tham gia áp dụng thử hoặc áp dụng sáng kiến lần đầu": [ + { + "Số TT": "", + "Họ và tên": "", + "Ngày tháng năm sinh": "", + "Nơi công tác": "", + "Chức danh": "", + "Trình độ chuyên môn": "", + "Nội dung công việc hỗ trợ": "" + } + ], + "Ngày ký": { + "Ngày": "", + "Tháng": "", + "Năm": "" + }, + "Xác nhận của lãnh đạo Đơn vị": "", + "Tác giả sáng kiến (Ký, ghi rõ họ tên)": "" + }, + "MẪU SỐ 03 - BẢN XÁC NHẬN TỶ LỆ (%) ĐÓNG GÓP VÀO VIỆC TẠO RA SÁNG KIẾN": { + "Ngày ký": { + "Ngày": "", + "Tháng": "", + "Năm": "" + }, + "1. Tên sáng kiến": "", + "2. 
Tác giả chính/Đại diện nhóm tác giả sáng kiến": "", + "Chức vụ, đơn vị công tác": "", + "Tỷ lệ đóng góp": [ + { + "STT": "", + "Họ và tên": "", + "Đơn vị công tác": "", + "% đóng góp": "", + "Chữ ký xác nhận": "" + } + ], + "Tổng % đóng góp": "100", + "Tác giả chính/Đại diện nhóm tác giả sáng kiến (chữ ký và ghi rõ họ tên)": "" + }, + "MẪU SỐ 04 - PHIẾU ĐÁNH GIÁ SÁNG KIẾN": { + "1. Tên sáng kiến": "", + "2. Tác giả/đồng tác giả sáng kiến": "", + "Chức vụ, đơn vị công tác": "", + "3. Nội dung đánh giá": { + "Tính mới (Tối đa 40 điểm)": { + "Nhận xét": "", + "Điểm chấm": "" + }, + "Tính hiệu quả (Tối đa 60 điểm)": { + "Nhận xét": "", + "Điểm chấm": "" + }, + "Tổng cộng": "" + }, + "Kết luận": "", + "Ngày ký": { + "Ngày": "", + "Tháng": "", + "Năm": "" + }, + "Thành viên Hội đồng (Ký, ghi rõ họ tên)": "" + }, + "BẢN CAM KẾT": { + "Ngày ký": { + "Ngày": "", + "Tháng": "", + "Năm": "" + }, + "Tiêu đề phụ (Áp dụng đối với cá nhân đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại Đại học Y Dược TP. Hồ Chí Minh là tác giả của bài báo khoa học)": "", + "I. THÔNG TIN CHỦ THỂ CAM KẾT": { + "Tác giả đăng ký sáng kiến": "", + "CCCD/Hộ chiếu số": "", + "Đơn vị": "", + "Tên Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH": "", + "Năm xét công nhận sáng kiến": "", + "Vai trò đối với bài báo (☑ vào ô tương ứng)": { + "Tác giả chính Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH": false, + "Đồng tác giả Bài báo trong nước/quốc tế là sản phẩm của nhiệm vụ NCKH": false + } + }, + "II. CAM KẾT NỘI DUNG (☑ vào ô tương ứng)": { + "1. 
Quyền sở hữu đối với bài báo trong nước/quốc tế": { + "Tôi là chủ sở hữu hợp pháp của bài báo hoặc được chủ sở hữu/đồng chủ sở hữu đồng ý cho sử dụng bài báo có tên nêu trên làm sản phẩm đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD": false, + "Trường hợp bài báo là sản phẩm của nhiệm vụ NCKH: chủ sở hữu bài báo (cơ quan) đồng ý cho tác giả/nhóm tác giả sử dụng bài báo có tên nêu trên để đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD": false + }, + "2. Đồng thuận của đồng tác giả bài báo trong nước/quốc tế": { + "Tất cả đồng tác giả đã biết, đồng ý và ký xác nhận cho phép Tác giả đăng ký sáng kiến được sử dụng bài báo có tên nêu trên để đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD": false + }, + "3. Cam kết bài báo trong nước/quốc tế uy tín": { + "Cá nhân đăng ký xét công nhận sáng kiến – cải tiến kỹ thuật tại ĐHYD đối với bài báo trong nước/quốc tế cam kết bài báo không thuộc 'Tạp chí săn mồi'. Tôi xin chịu trách nhiệm kiểm tra, đối chiếu và cung cấp bằng chứng khi được yêu cầu": false + }, + "4. Tuân thủ pháp luật sở hữu trí tuệ": { + "Tôi cam kết rằng việc sử dụng bài báo đăng ký xét công nhận sáng kiến tại ĐHYD sẽ không gây tranh chấp về: quyền tác giả/quyền liên quan, quyền sở hữu công nghiệp, tiết lộ bí mật kinh doanh, vi phạm bảo mật dữ liệu của bất kỳ bên thứ ba nào. Tôi chịu trách nhiệm trước pháp luật về tính trung thực, hợp pháp của hồ sơ": false + } + }, + "III. HẬU QUẢ PHÁP LÝ KHI THÔNG TIN KHÔNG TRUNG THỰC": "Tôi xin cam kết chịu trách nhiệm đối với các thông tin kê khai nêu trên. Nếu thông tin được khai trong bản cam kết này không đúng thì tôi chấp nhận: Hủy kết quả công nhận sáng kiến đã được xét (nếu có); Thu hồi, hủy các danh hiệu thi đua, khen thưởng, hoặc các quyền lợi phát sinh có sử dụng sáng kiến này để xét; Xử lý theo quy định pháp luật hiện hành và theo quy chế/quy định của ĐHYD. 
Cam kết này có hiệu lực kể từ ngày ký và ràng buộc đối với cá nhân cam kết trong suốt thời gian xét công nhận sáng kiến và sau khi kết thúc 02 năm.", + "Người cam kết (Ký tên, ghi rõ họ tên)": "" + } +} diff --git a/database/crud_examples.sql b/database/crud_examples.sql new file mode 100644 index 0000000..9f3cb60 --- /dev/null +++ b/database/crud_examples.sql @@ -0,0 +1,171 @@ +-- ============================================================================= +-- CRUD PATTERNS — Sáng kiến application system +-- ============================================================================= + +-- ============================================================================= +-- CREATE: Submit a new application with multiple authors (atomic) +-- ============================================================================= +BEGIN; + -- Set audit context + SELECT set_config('my.user_id', '42', true); + + -- 1. Main record + INSERT INTO applications(code, title, registration_year, status, purpose, + is_technical_solution, primary_unit_id, created_by) + VALUES ('SK-2025-007', + 'Hệ thống tự động điền hồ sơ sáng kiến', + 2025, 'DRAFT', + 'Tự động hoá việc điền các mẫu số 01–04', + TRUE, 2, 42) + RETURNING application_id \gset + + -- 2. Authors (defer contribution-sum check until COMMIT) + SET CONSTRAINTS trg_contribution_total DEFERRED; + INSERT INTO application_authors(application_id, user_id, contribution_pct, role, display_order) VALUES + (:application_id, 42, 60.00, 'PRIMARY', 1), + (:application_id, 13, 25.00, 'CO_AUTHOR', 2), + (:application_id, 27, 15.00, 'CO_AUTHOR', 3); + + -- 3. 
Orgs that tested it + INSERT INTO application_adopters(application_id, org_name, address, field) VALUES + (:application_id, 'Phòng KHCN', '217 Hồng Bàng, Q.5', 'Cải cách hành chính'); +COMMIT; + + +-- ============================================================================= +-- READ: Dashboard — paginated list with filters +-- ============================================================================= +SELECT * FROM v_application_summary + WHERE registration_year = 2025 + AND status = ANY(ARRAY['UNDER_REVIEW','EVALUATED']::text[]) + AND title ILIKE '%động vật%' -- uses trigram index + ORDER BY avg_score DESC NULLS LAST, submitted_at DESC + LIMIT 20 OFFSET 0; + +-- Read: full application with nested data (app layer usually does this as N queries +-- or one JSON aggregate — here's the aggregate version) +SELECT jsonb_build_object( + 'application', to_jsonb(a.*), + 'authors', (SELECT jsonb_agg(jsonb_build_object( + 'user_id', u.user_id, + 'name', u.full_name, + 'pct', aa.contribution_pct, + 'role', aa.role + ) ORDER BY aa.display_order) + FROM application_authors aa + JOIN users u USING (user_id) + WHERE aa.application_id = a.application_id), + 'evaluations',(SELECT jsonb_agg(to_jsonb(e.*)) + FROM evaluations e WHERE e.application_id = a.application_id), + 'attachments',(SELECT jsonb_agg(to_jsonb(att.*)) + FROM attachments att WHERE att.application_id = a.application_id) +) AS document +FROM applications a +WHERE a.application_id = 1 AND a.deleted_at IS NULL; + +-- Full-text search ('simple' config; combine with unaccent for better recall). The expression mirrors idx_apps_fts so the index can be used. +SELECT application_id, code, title + FROM applications + WHERE to_tsvector('simple', coalesce(title,'') || ' ' || coalesce(introduction,'') || ' ' || coalesce(novelty_description,'')) + @@ plainto_tsquery('simple', 'đạo đức động vật') AND deleted_at IS NULL + ORDER BY registration_year DESC + LIMIT 10; + + +-- ============================================================================= +-- UPDATE: Progress an application through the workflow +--
============================================================================= +-- Submit (DRAFT → SUBMITTED). Triggers populate submitted_at automatically. +UPDATE applications SET status = 'SUBMITTED' WHERE application_id = 7; + +-- Assign to review panel +UPDATE applications SET status = 'UNDER_REVIEW' WHERE application_id = 7; + +-- Upsert an evaluation (same evaluator re-scores) +INSERT INTO evaluations (application_id, evaluator_id, novelty_score, effectiveness_score, conclusion) +VALUES (7, 99, 32, 48, 'Đề nghị công nhận') +ON CONFLICT (application_id, evaluator_id) +DO UPDATE SET + novelty_score = EXCLUDED.novelty_score, + effectiveness_score = EXCLUDED.effectiveness_score, + conclusion = EXCLUDED.conclusion, + evaluated_at = NOW(); + +-- Update JSONB field: patch a single effectiveness sub-field +UPDATE applications + SET effectiveness = effectiveness || jsonb_build_object( + 'economic', + 'Tiết kiệm ~30% thời gian xét duyệt' + ) + WHERE application_id = 7; + +-- Partial update (PATCH-style) — only update provided fields. The app layer +-- generates SET clauses from the non-null fields in the request body. +UPDATE applications + SET title = COALESCE($1, title), + purpose = COALESCE($2, purpose), + updated_at = NOW() + WHERE application_id = $3 AND deleted_at IS NULL +RETURNING *; + + +-- ============================================================================= +-- DELETE: Soft delete + restore +-- ============================================================================= +-- Soft delete +UPDATE applications SET deleted_at = NOW() WHERE application_id = 7; + +-- Restore +UPDATE applications SET deleted_at = NULL WHERE application_id = 7; + +-- Hard delete (only for drafts, cascades to authors/evaluations/etc.) 
+DELETE FROM applications + WHERE application_id = 7 + AND status = 'DRAFT'; + + +-- ============================================================================= +-- ANALYTICS: Materialized-view refresh (run nightly via cron/pgAgent) +-- ============================================================================= +REFRESH MATERIALIZED VIEW CONCURRENTLY mv_annual_stats; + +-- Leaderboard: top-scoring approved innovations +SELECT code, title, avg_score + FROM v_application_summary + WHERE status = 'APPROVED' + ORDER BY avg_score DESC NULLS LAST + LIMIT 10; + + +-- ============================================================================= +-- REVIEW JSON: Persist / retrieve ReviewPanel bundle +-- ============================================================================= +BEGIN; + SELECT set_config('my.user_id', '42', true); + + -- Next document_version for this application (current MAX + 1) + WITH v AS ( + SELECT COALESCE(MAX(document_version), 0) + 1 AS next_ver + FROM application_review_documents + WHERE application_id = 7 + ) + INSERT INTO application_review_documents( + application_id, case_id, document_version, official_bieu_mau, template_data, full_bundle, created_by + ) + SELECT + 7, + 'CASE-2026-0007', + v.next_ver, + $${"TRANG BÌA":{"Tên sáng kiến (Tiếng Việt)":"Ví dụ"}}$$::jsonb, + $${"initiativeName":"Ví dụ"}$$::jsonb, + $${"meta":{"caseId":"CASE-2026-0007"}}$$::jsonb, + 42 + FROM v; +COMMIT; + +-- Load latest ReviewPanel bundle by case id +SELECT * + FROM application_review_documents + WHERE case_id = 'CASE-2026-0007' + ORDER BY document_version DESC, created_at DESC + LIMIT 1; diff --git a/database/schema.sql b/database/schema.sql new file mode 100644 index 0000000..a479426 --- /dev/null +++ b/database/schema.sql @@ -0,0 +1,441 @@ +-- ============================================================================= +-- SÁNG KIẾN (INNOVATION APPLICATION) DATABASE SCHEMA +-- PostgreSQL 14+ +-- +-- Domain: Manage innovation applications at ĐHYD TP.HCM (Vietnamese medical +--
university). Supports the full lifecycle: draft → submit → evaluate → approve. +-- +-- Design principles: +-- - 3NF for entities, JSONB for semi-structured/optional narrative +-- - Soft delete (deleted_at) — legal/audit requires historical retention +-- - State machine on applications.status enforced by trigger +-- - Full audit_log via trigger on all CUD operations +-- - Contribution % sums to 100 enforced by DEFERRABLE trigger +-- ============================================================================= + +CREATE EXTENSION IF NOT EXISTS pg_trgm; -- fuzzy matching +CREATE EXTENSION IF NOT EXISTS unaccent; -- Vietnamese diacritics in search + +-- Convenience: updated_at auto-maintenance +CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS TRIGGER AS $$ +BEGIN NEW.updated_at := NOW(); RETURN NEW; END; +$$ LANGUAGE plpgsql; + + +-- ============================================================================= +-- REFERENCE: UNITS (departments, faculties, centers) +-- ============================================================================= +CREATE TABLE units ( + unit_id SERIAL PRIMARY KEY, + code VARCHAR(32) UNIQUE NOT NULL, + name VARCHAR(255) NOT NULL, -- full Vietnamese name + parent_unit_id INT REFERENCES units(unit_id) ON DELETE SET NULL, + type VARCHAR(32) NOT NULL + CHECK (type IN ('TRUONG','KHOA','PHONG','BO_MON','TRUNG_TAM','KHAC')), + is_active BOOLEAN NOT NULL DEFAULT TRUE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +CREATE TRIGGER trg_units_touch BEFORE UPDATE ON units + FOR EACH ROW EXECUTE FUNCTION touch_updated_at(); + + +-- ============================================================================= +-- USERS (unified: authors, evaluators, admins — a user can wear many hats) +-- ============================================================================= +CREATE TABLE users ( + user_id SERIAL PRIMARY KEY, + full_name VARCHAR(255) NOT NULL, + title VARCHAR(64), -- PGS.TS, TS., GS., 
CN., ThS. + date_of_birth DATE, + email VARCHAR(255) UNIQUE, + phone VARCHAR(32), + id_number VARCHAR(32) UNIQUE, -- CCCD / hộ chiếu + unit_id INT REFERENCES units(unit_id) ON DELETE SET NULL, + position VARCHAR(255), -- chức danh: Trưởng phòng, GV cao cấp + qualification VARCHAR(64), -- trình độ: Tiến sĩ, Thạc sĩ, Cử nhân + user_type VARCHAR(32) NOT NULL DEFAULT 'AUTHOR' + CHECK (user_type IN ('AUTHOR','COUNCIL','ADMIN','STUDENT','EXTERNAL')), + is_active BOOLEAN NOT NULL DEFAULT TRUE, + deleted_at TIMESTAMPTZ, -- soft delete + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +CREATE INDEX idx_users_unit ON users(unit_id); +CREATE INDEX idx_users_active ON users(is_active) WHERE deleted_at IS NULL; +CREATE INDEX idx_users_name_trgm ON users USING GIN (full_name gin_trgm_ops); +CREATE TRIGGER trg_users_touch BEFORE UPDATE ON users + FOR EACH ROW EXECUTE FUNCTION touch_updated_at(); + + +-- ============================================================================= +-- APPLICATIONS (sáng kiến) — the core entity +-- ============================================================================= +CREATE TABLE applications ( + application_id SERIAL PRIMARY KEY, + code VARCHAR(32) UNIQUE NOT NULL, -- e.g., 'SK-2025-001' + title TEXT NOT NULL, + title_en TEXT, + registration_year INT NOT NULL CHECK (registration_year BETWEEN 2000 AND 2100), + field_of_application TEXT, -- lĩnh vực áp dụng + + -- Workflow state (enforced by trigger below) + status VARCHAR(32) NOT NULL DEFAULT 'DRAFT' + CHECK (status IN ( + 'DRAFT','SUBMITTED','UNDER_REVIEW', + 'EVALUATED','APPROVED','REJECTED','WITHDRAWN' + )), + + -- Mẫu 01 narrative (long text) + introduction TEXT, -- 1. 
Mở đầu + current_state TEXT, -- 4.1 Tình trạng đã biết + purpose TEXT, -- Mục đích + implementation_steps TEXT, -- Các bước thực hiện + required_conditions TEXT, -- Điều kiện cần thiết + results_achieved TEXT, -- Kết quả thu được + novelty_description TEXT, -- Tính mới + confidential_info TEXT, -- Thông tin cần bảo mật + + -- 10 effectiveness sub-fields (all optional narrative) → JSONB + effectiveness JSONB NOT NULL DEFAULT '{}'::jsonb, + -- Shape: { "economic":"...", "teaching":"...", "productivity":"...", + -- "work_efficiency":"...", "quality":"...", "cost_reduction":"...", + -- "environment":"...", "health":"...", "safety":"...", "awareness":"..." } + + -- Mẫu 02 fields + owner_org VARCHAR(255), -- chủ đầu tư + first_applied_date DATE, -- ngày áp dụng lần đầu + content_summary TEXT, -- nội dung sáng kiến (short) + author_assessment TEXT, -- đánh giá theo tác giả + org_assessment TEXT, -- đánh giá theo tổ chức + + -- Mẫu 02 classification (mutually exclusive in form, but stored as flags) + is_technical_solution BOOLEAN NOT NULL DEFAULT FALSE, + is_from_research_article BOOLEAN NOT NULL DEFAULT FALSE, + is_from_book_material BOOLEAN NOT NULL DEFAULT FALSE, + CONSTRAINT chk_exactly_one_classification CHECK ( + status = 'DRAFT' OR + (is_technical_solution::int + is_from_research_article::int + is_from_book_material::int) = 1 + ), + + -- Workflow timestamps + submitted_at TIMESTAMPTZ, + decided_at TIMESTAMPTZ, + + primary_unit_id INT REFERENCES units(unit_id), + created_by INT REFERENCES users(user_id), + deleted_at TIMESTAMPTZ, -- soft delete + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_apps_status ON applications(status) WHERE deleted_at IS NULL; +CREATE INDEX idx_apps_year ON applications(registration_year); +CREATE INDEX idx_apps_unit ON applications(primary_unit_id); +CREATE INDEX idx_apps_title_trgm ON applications USING GIN (title gin_trgm_ops); +CREATE INDEX idx_apps_fts ON 
applications USING GIN ( + to_tsvector('simple', + coalesce(title,'') || ' ' || + coalesce(introduction,'') || ' ' || + coalesce(novelty_description,'') + ) +); +CREATE INDEX idx_apps_effectiveness ON applications USING GIN (effectiveness); +CREATE TRIGGER trg_apps_touch BEFORE UPDATE ON applications + FOR EACH ROW EXECUTE FUNCTION touch_updated_at(); + + +-- ============================================================================= +-- APPLICATION_AUTHORS (M:N with contribution %) +-- ============================================================================= +CREATE TABLE application_authors ( + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + user_id INT NOT NULL REFERENCES users(user_id), + contribution_pct NUMERIC(5,2) NOT NULL CHECK (contribution_pct > 0 AND contribution_pct <= 100), + role VARCHAR(32) NOT NULL DEFAULT 'CO_AUTHOR' + CHECK (role IN ('PRIMARY','CO_AUTHOR')), + display_order INT NOT NULL DEFAULT 0, + PRIMARY KEY (application_id, user_id) +); +CREATE INDEX idx_app_authors_user ON application_authors(user_id); + +-- At most one PRIMARY author per application +CREATE UNIQUE INDEX uq_primary_per_app + ON application_authors(application_id) WHERE role = 'PRIMARY'; + +-- Deferrable check: contribution % must total 100 per application +CREATE OR REPLACE FUNCTION check_contribution_total() RETURNS TRIGGER AS $$ +DECLARE v_total NUMERIC; v_app INT; +BEGIN + v_app := COALESCE(NEW.application_id, OLD.application_id); + SELECT COALESCE(SUM(contribution_pct),0) INTO v_total + FROM application_authors WHERE application_id = v_app; + -- Only enforce when application has left DRAFT + IF (SELECT status FROM applications WHERE application_id = v_app) <> 'DRAFT' + AND v_total <> 100 THEN + RAISE EXCEPTION 'Contribution % for application % must sum to 100 (got %)', + '%', v_app, v_total; + END IF; + RETURN NULL; +END; +$$ LANGUAGE plpgsql; + +CREATE CONSTRAINT TRIGGER trg_contribution_total + AFTER INSERT OR UPDATE OR 
DELETE ON application_authors + DEFERRABLE INITIALLY DEFERRED + FOR EACH ROW EXECUTE FUNCTION check_contribution_total(); + + +-- ============================================================================= +-- ORGS that tested / adopted the innovation (Mẫu 01 inner table) +-- ============================================================================= +CREATE TABLE application_adopters ( + adopter_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + display_order INT NOT NULL DEFAULT 0, + org_name VARCHAR(255) NOT NULL, + address TEXT, + field TEXT +); +CREATE INDEX idx_adopters_app ON application_adopters(application_id); + + +-- ============================================================================= +-- PARTICIPANTS in first application (Mẫu 02 inner table) +-- ============================================================================= +CREATE TABLE application_participants ( + participant_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + user_id INT REFERENCES users(user_id), -- optional link + display_order INT NOT NULL DEFAULT 0, + full_name VARCHAR(255) NOT NULL, + date_of_birth DATE, + work_unit VARCHAR(255), + position VARCHAR(255), + qualification VARCHAR(64), + support_content TEXT +); +CREATE INDEX idx_participants_app ON application_participants(application_id); + + +-- ============================================================================= +-- EVALUATIONS (Mẫu 04) — council members score applications +-- ============================================================================= +CREATE TABLE evaluations ( + evaluation_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + evaluator_id INT NOT NULL REFERENCES users(user_id), + + novelty_comments TEXT, + novelty_score INT NOT NULL DEFAULT 0 + CHECK (novelty_score BETWEEN 0 AND 40), + + 
effectiveness_comments TEXT, + effectiveness_score INT NOT NULL DEFAULT 0 + CHECK (effectiveness_score BETWEEN 0 AND 60), + + total_score INT GENERATED ALWAYS AS (novelty_score + effectiveness_score) STORED, + conclusion TEXT, + evaluated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + + UNIQUE (application_id, evaluator_id) +); +CREATE INDEX idx_eval_app ON evaluations(application_id); +CREATE INDEX idx_eval_evaluator ON evaluations(evaluator_id); + + +-- ============================================================================= +-- COMMITMENTS (Bản cam kết) — for paper-based innovations +-- ============================================================================= +CREATE TABLE commitments ( + commitment_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + user_id INT NOT NULL REFERENCES users(user_id), + + paper_title TEXT, + role_type VARCHAR(32) NOT NULL + CHECK (role_type IN ('PRIMARY_AUTHOR','CO_AUTHOR')), + + -- 5 commitment checkboxes + is_legal_owner BOOLEAN NOT NULL DEFAULT FALSE, + is_authorized_by_owner BOOLEAN NOT NULL DEFAULT FALSE, + has_coauthor_consent BOOLEAN NOT NULL DEFAULT FALSE, + not_predatory_journal BOOLEAN NOT NULL DEFAULT FALSE, + complies_with_ip_law BOOLEAN NOT NULL DEFAULT FALSE, + + signed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + + UNIQUE (application_id, user_id) +); +CREATE INDEX idx_commit_app ON commitments(application_id); + + +-- ============================================================================= +-- ATTACHMENTS (uploaded files — figures, flowcharts, annexes) +-- ============================================================================= +CREATE TABLE attachments ( + attachment_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + file_name VARCHAR(255) NOT NULL, + file_path TEXT NOT NULL, -- S3/MinIO key + file_size BIGINT, + mime_type VARCHAR(128), + kind VARCHAR(32) -- 'LUU_DO', 'PHU_LUC', 
'KY_SO', 'KHAC' + CHECK (kind IS NULL OR kind IN ('LUU_DO','PHU_LUC','KY_SO','KHAC')), + uploaded_by INT REFERENCES users(user_id), + uploaded_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +CREATE INDEX idx_attach_app ON attachments(application_id); + + +-- ============================================================================= +-- AUDIT LOG — single table, populated by triggers on all CUD operations +-- ============================================================================= +CREATE TABLE audit_log ( + log_id BIGSERIAL PRIMARY KEY, + table_name VARCHAR(64) NOT NULL, + record_id TEXT NOT NULL, + action VARCHAR(16) NOT NULL CHECK (action IN ('INSERT','UPDATE','DELETE')), + changed_by INT, -- set from app via SET LOCAL my.user_id + changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + old_data JSONB, + new_data JSONB +); +CREATE INDEX idx_audit_table_record ON audit_log(table_name, record_id); +CREATE INDEX idx_audit_user_time ON audit_log(changed_by, changed_at DESC); + +-- Generic audit trigger function +CREATE OR REPLACE FUNCTION audit_trigger() RETURNS TRIGGER AS $$ +DECLARE + v_user INT; + v_pk TEXT; +BEGIN + -- Get user_id from session var if app sets it; else NULL + BEGIN v_user := current_setting('my.user_id')::INT; + EXCEPTION WHEN OTHERS THEN v_user := NULL; END; + + v_pk := COALESCE( + (row_to_json(NEW)::jsonb->>TG_ARGV[0]), + (row_to_json(OLD)::jsonb->>TG_ARGV[0]) + ); + + INSERT INTO audit_log(table_name, record_id, action, changed_by, old_data, new_data) + VALUES ( + TG_TABLE_NAME, + v_pk, + TG_OP, + v_user, + CASE WHEN TG_OP IN ('UPDATE','DELETE') THEN to_jsonb(OLD) END, + CASE WHEN TG_OP IN ('INSERT','UPDATE') THEN to_jsonb(NEW) END + ); + RETURN COALESCE(NEW, OLD); +END; +$$ LANGUAGE plpgsql; + +-- Attach audit trigger to the important tables (pass PK column name as arg) +CREATE TRIGGER trg_audit_applications AFTER INSERT OR UPDATE OR DELETE ON applications + FOR EACH ROW EXECUTE FUNCTION audit_trigger('application_id'); +CREATE TRIGGER 
trg_audit_authors AFTER INSERT OR UPDATE OR DELETE ON application_authors + FOR EACH ROW EXECUTE FUNCTION audit_trigger('application_id'); +CREATE TRIGGER trg_audit_evaluations AFTER INSERT OR UPDATE OR DELETE ON evaluations + FOR EACH ROW EXECUTE FUNCTION audit_trigger('evaluation_id'); +CREATE TRIGGER trg_audit_commitments AFTER INSERT OR UPDATE OR DELETE ON commitments + FOR EACH ROW EXECUTE FUNCTION audit_trigger('commitment_id'); + + +-- ============================================================================= +-- WORKFLOW STATE MACHINE ENFORCEMENT +-- ============================================================================= +CREATE OR REPLACE FUNCTION enforce_application_transitions() RETURNS TRIGGER AS $$ +DECLARE + allowed BOOLEAN := FALSE; +BEGIN + IF OLD.status = NEW.status THEN RETURN NEW; END IF; + + -- Allowed transitions + allowed := CASE + WHEN OLD.status = 'DRAFT' AND NEW.status IN ('SUBMITTED','WITHDRAWN') THEN TRUE + WHEN OLD.status = 'SUBMITTED' AND NEW.status IN ('UNDER_REVIEW','WITHDRAWN','DRAFT') THEN TRUE + WHEN OLD.status = 'UNDER_REVIEW' AND NEW.status IN ('EVALUATED','WITHDRAWN') THEN TRUE + WHEN OLD.status = 'EVALUATED' AND NEW.status IN ('APPROVED','REJECTED') THEN TRUE + ELSE FALSE + END; + + IF NOT allowed THEN + RAISE EXCEPTION 'Invalid status transition: % → %', OLD.status, NEW.status; + END IF; + + -- Auto-set timestamps + IF NEW.status = 'SUBMITTED' AND OLD.status = 'DRAFT' THEN + NEW.submitted_at := NOW(); + END IF; + IF NEW.status IN ('APPROVED','REJECTED') THEN + NEW.decided_at := NOW(); + END IF; + + RETURN NEW; +END; +$$ LANGUAGE plpgsql; + +CREATE TRIGGER trg_app_state_machine + BEFORE UPDATE OF status ON applications + FOR EACH ROW EXECUTE FUNCTION enforce_application_transitions(); + + +-- ============================================================================= +-- CONVENIENCE VIEWS +-- ============================================================================= + +-- Dashboard: applications with author names 
and current evaluation average +CREATE VIEW v_application_summary AS +SELECT + a.application_id, + a.code, + a.title, + a.status, + a.registration_year, + u.name AS primary_unit_name, + (SELECT string_agg(usr.full_name, ', ' ORDER BY aa.display_order) + FROM application_authors aa + JOIN users usr ON usr.user_id = aa.user_id + WHERE aa.application_id = a.application_id) AS author_names, + (SELECT ROUND(AVG(total_score),2) + FROM evaluations WHERE application_id = a.application_id) AS avg_score, + (SELECT COUNT(*) FROM evaluations WHERE application_id = a.application_id) AS num_evaluations, + a.submitted_at, + a.decided_at +FROM applications a +LEFT JOIN units u ON u.unit_id = a.primary_unit_id +WHERE a.deleted_at IS NULL; + +-- Materialized view: annual approval statistics (refresh nightly) +CREATE MATERIALIZED VIEW mv_annual_stats AS +SELECT + registration_year, + COUNT(*) FILTER (WHERE status = 'APPROVED') AS approved, + COUNT(*) FILTER (WHERE status = 'REJECTED') AS rejected, + COUNT(*) FILTER (WHERE status NOT IN ('APPROVED','REJECTED')) AS pending, + COUNT(*) AS total +FROM applications +WHERE deleted_at IS NULL +GROUP BY registration_year; +CREATE UNIQUE INDEX ON mv_annual_stats(registration_year); + + +-- ============================================================================= +-- REVIEW DOCUMENT JSON SNAPSHOTS (ReviewPanel bundle persistence) +-- ============================================================================= +CREATE TABLE IF NOT EXISTS application_review_documents ( + review_document_id SERIAL PRIMARY KEY, + application_id INT NOT NULL REFERENCES applications(application_id) ON DELETE CASCADE, + case_id VARCHAR(128) NOT NULL, + document_version INT NOT NULL DEFAULT 1, + official_bieu_mau JSONB NOT NULL DEFAULT '{}'::jsonb, + template_data JSONB, + full_bundle JSONB, + created_by INT REFERENCES users(user_id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (application_id, document_version) +); +CREATE INDEX 
idx_review_docs_app_time ON application_review_documents(application_id, created_at DESC); +CREATE INDEX idx_review_docs_case_time ON application_review_documents(case_id, created_at DESC); diff --git a/database/test_schema.sql b/database/test_schema.sql new file mode 100644 index 0000000..73d7e1f --- /dev/null +++ b/database/test_schema.sql @@ -0,0 +1,83 @@ +-- Validation tests: run with psql in autocommit mode (without -1), so expected "should FAIL" errors do not abort later blocks +-- =========================================================== + +-- 1. SEED: units + users +INSERT INTO units(code, name, type) VALUES + ('DHYD', 'Đại học Y Dược TP.HCM', 'TRUONG'), + ('KHCN', 'Phòng Khoa học Công nghệ', 'PHONG'); + +INSERT INTO users(full_name, title, email, id_number, unit_id, qualification, user_type) VALUES + ('Trần Hùng', 'PGS.TS', 'tranhung@ump.edu.vn', '001001', 1, 'Tiến sĩ', 'AUTHOR'), + ('Đỗ Quốc Vũ', 'CN.', 'doquocvu@ump.edu.vn', '001002', 2, 'Cử nhân', 'AUTHOR'), + ('Nguyễn Hội đồng A', 'PGS.TS', 'hdA@ump.edu.vn', '002001', 1, 'Tiến sĩ', 'COUNCIL'); + +-- 2. CREATE an application in DRAFT state +INSERT INTO applications(code, title, registration_year, status, purpose, primary_unit_id, created_by) +VALUES ('SK-2025-001', + 'Quy trình xét duyệt Đạo đức trong nghiên cứu trên động vật', + 2025, 'DRAFT', + 'Chuẩn hoá quy trình xét duyệt hồ sơ', + 2, 2); + +-- 3. ADD authors under the DEFERRED constraint trigger (sum checked at COMMIT) +BEGIN; +INSERT INTO application_authors(application_id, user_id, contribution_pct, role) VALUES + (1, 1, 50, 'CO_AUTHOR'), + (1, 2, 50, 'PRIMARY'); +-- Sum is 100 here; the app is also still DRAFT, so the trigger would not enforce it yet +COMMIT; + +-- Verify +SELECT 'Authors inserted:' AS step, count(*) FROM application_authors; + +-- 4.
TRY to submit the application (DRAFT → SUBMITTED): needs classification +-- This should FAIL the check constraint because no classification flag is set +\echo 'Test 4: should FAIL (missing classification)' +UPDATE applications SET status='SUBMITTED' WHERE application_id=1; +\echo '' + +-- Fix and retry +UPDATE applications + SET is_technical_solution = TRUE, + status = 'SUBMITTED' + WHERE application_id = 1; +SELECT 'After submit:' AS step, status, submitted_at FROM applications WHERE application_id=1; + +-- 5. TRY invalid transition SUBMITTED → APPROVED (should FAIL) +\echo 'Test 5: should FAIL (illegal transition)' +UPDATE applications SET status='APPROVED' WHERE application_id=1; +\echo '' + +-- Valid transitions +UPDATE applications SET status='UNDER_REVIEW' WHERE application_id=1; + +-- 6. EVALUATOR scores the application +INSERT INTO evaluations(application_id, evaluator_id, novelty_score, effectiveness_score, conclusion) +VALUES (1, 3, 35, 50, 'Đề xuất công nhận'); + +SELECT 'Evaluation:' AS step, novelty_score, effectiveness_score, total_score FROM evaluations; + +-- 7. Move to EVALUATED → APPROVED +UPDATE applications SET status='EVALUATED' WHERE application_id=1; +UPDATE applications SET status='APPROVED' WHERE application_id=1; + +SELECT 'Final status:' AS step, status, decided_at IS NOT NULL AS has_decision_time + FROM applications WHERE application_id=1; + +-- 8. READ: summary view +SELECT code, title, status, author_names, avg_score, num_evaluations + FROM v_application_summary; + +-- 9. AUDIT trail: who changed what? +SELECT table_name, action, changed_at, + (new_data->>'status') AS new_status + FROM audit_log + WHERE table_name = 'applications' + ORDER BY log_id; + +-- 10. 
Bad contribution sum should fail at COMMIT +\echo 'Test 10: should FAIL (sum != 100 on submitted app)' +BEGIN; + UPDATE application_authors SET contribution_pct = 30 WHERE application_id=1 AND user_id=1; + -- sum is now 30+50=80 and the app is APPROVED (not DRAFT), so the trigger rejects it at COMMIT +COMMIT; diff --git a/deploy/nginx/minio-s3-proxy.conf.example b/deploy/nginx/minio-s3-proxy.conf.example new file mode 100644 index 0000000..bbe68e2 --- /dev/null +++ b/deploy/nginx/minio-s3-proxy.conf.example @@ -0,0 +1,47 @@ +# Example: expose the MinIO S3 API over HTTPS for presigned URLs (avoids mixed-content blocking when the app is served from https://your-app). +# +# 1. DNS: A/AAAA record for MINIO_API_HOST → your VPS. +# 2. TLS: obtain a certificate for MINIO_API_HOST (e.g. certbot --nginx). +# 3. Replace MINIO_API_HOST and adjust the upstream port if MINIO_API_PORT ≠ 19000. +# 4. Set in .env (same hostname and scheme, no trailing slash): +# S3_PUBLIC_ENDPOINT_URL=https://MINIO_API_HOST +# MINIO_SERVER_URL=https://MINIO_API_HOST +# 5. Recreate or restart be0 so presigned URLs use this host. +# +# Optionally publish MinIO's S3 port on localhost only, so nginx is the sole public entry point: +# "127.0.0.1:19000:9000" + +upstream minio_s3_api { + server 127.0.0.1:19000; + keepalive 32; +} + +server { + listen 443 ssl http2; + server_name MINIO_API_HOST; + + ssl_certificate /fullchain.pem; + ssl_certificate_key /privkey.pem; + + # Large evidence PDF uploads go through be0, not nginx→MinIO, but presigned PUTs can still be large. + client_max_body_size 50m; + + # Stream request and response bodies instead of buffering them (large presigned PUTs/GETs).
+ proxy_buffering off; + proxy_request_buffering off; + + location / { + proxy_http_version 1.1; + proxy_set_header Host $http_host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + proxy_set_header Connection ""; + + proxy_connect_timeout 300; + proxy_send_timeout 300; + proxy_read_timeout 300; + + proxy_pass http://minio_s3_api; + } +} diff --git a/docker-compose.prod.yml b/docker-compose.prod.yml new file mode 100644 index 0000000..b5e5553 --- /dev/null +++ b/docker-compose.prod.yml @@ -0,0 +1,211 @@ +# Requires a `.env` next to this file (or exported vars). +# Validates: scripts/verify-prod-env.sh +# +# Images are pinned instead of `:latest` for reproducible builds and supply-chain hygiene. +services: + minio: + image: quay.io/minio/minio:RELEASE.2025-09-07T16-13-09Z + container_name: minio + ports: + - "${MINIO_API_PORT}:9000" # S3 API → http://${PUBLIC_HOST}:${MINIO_API_PORT} + - "127.0.0.1:${MINIO_CONSOLE_PORT}:9001" # Console admin-only via SSH tunnel + environment: + MINIO_ROOT_USER: ${MINIO_ROOT_USER} + MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD} + # Public URL browsers use for the S3 API (match reverse-proxy TLS scheme/host when applicable). + MINIO_SERVER_URL: ${MINIO_SERVER_URL:-http://${PUBLIC_HOST}:${MINIO_API_PORT}} + MINIO_BROWSER_REDIRECT_URL: ${MINIO_BROWSER_REDIRECT_URL:-http://${PUBLIC_HOST}:${MINIO_CONSOLE_PORT}} + # Community MinIO has no per-bucket PutBucketCors; set explicit SPA origin(s) in `.env`. + MINIO_API_CORS_ALLOW_ORIGIN: ${MINIO_API_CORS_ALLOW_ORIGIN:?Set MINIO_API_CORS_ALLOW_ORIGIN to your HTTPS SPA origin} + volumes: + - ./assets/minio-data:/data + command: server /data --console-address ":9001" + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"] + interval: 30s + timeout: 20s + retries: 3 + restart: unless-stopped + + # One-shot: ensure buckets. 
Browser CORS is MINIO_API_CORS_ALLOW_ORIGIN on the minio service. + minio-cors: + image: quay.io/minio/mc:RELEASE.2025-08-13T08-35-41Z + container_name: minio-cors + depends_on: + minio: + condition: service_healthy + environment: + MINIO_ROOT_USER: ${MINIO_ROOT_USER} + MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD} + entrypoint: ["/bin/sh", "-c"] + command: + - | + mc alias set local http://minio:9000 "$$MINIO_ROOT_USER" "$$MINIO_ROOT_PASSWORD" + for b in initiative-attachments initiative-exports initiative-quarantine imagehub-blobs; do + mc mb -p "local/$$b" 2>/dev/null || true + done + echo "MinIO buckets ensured." + + # Auth + roles: POSTGRES_* apply only on first volume init — see docs/deploy-production-docker.md + postgres: + image: postgres:16-alpine + container_name: initiative-postgres + # Bind to localhost only — DB is not for the public internet. + ports: + - "127.0.0.1:15432:5432" + environment: + POSTGRES_USER: ${POSTGRES_USER} + POSTGRES_PASSWORD: ${POSTGRES_PASSWORD} + POSTGRES_DB: ${POSTGRES_DB} + volumes: + - initiative_pg_data:/var/lib/postgresql/data + - ./be0/migrations/001_initiative_schema.sql:/docker-entrypoint-initdb.d/01_initiative_schema.sql:ro + - ./be0/migrations/002_application_storage_extensions.sql:/docker-entrypoint-initdb.d/02_application_storage_extensions.sql:ro + - ./be0/migrations/003_review_documents.sql:/docker-entrypoint-initdb.d/03_review_documents.sql:ro + - ./be0/migrations/004_evidence_artifact_review.sql:/docker-entrypoint-initdb.d/04_evidence_artifact_review.sql:ro + - ./be0/migrations/004_application_admin_results.sql:/docker-entrypoint-initdb.d/05_application_admin_results.sql:ro + - ./be0/migrations/006_user_notifications.sql:/docker-entrypoint-initdb.d/06_user_notifications.sql:ro + - ./be0/migrations/007_user_roles_email_policy_admin.sql:/docker-entrypoint-initdb.d/07_user_roles_email_policy_admin.sql:ro + - ./be0/migrations/008_audit_events.sql:/docker-entrypoint-initdb.d/08_audit_events.sql:ro + - 
./be0/migrations/009_backup_artifact_roles_storage_kind.sql:/docker-entrypoint-initdb.d/09_backup_artifact_roles_storage_kind.sql:ro + - ./be0/migrations/010_user_staff_profiles.sql:/docker-entrypoint-initdb.d/10_user_staff_profiles.sql:ro + - ./be0/migrations/011_academic_titles_vn.sql:/docker-entrypoint-initdb.d/11_academic_titles_vn.sql:ro + - ./be0/migrations/012_password_reset.sql:/docker-entrypoint-initdb.d/12_password_reset.sql:ro + - ./be0/migrations/013_email_verification.sql:/docker-entrypoint-initdb.d/13_email_verification.sql:ro + - ./be0/migrations/014_registration_otp.sql:/docker-entrypoint-initdb.d/14_registration_otp.sql:ro + - ./be0/migrations/015_document_templates.sql:/docker-entrypoint-initdb.d/15_document_templates.sql:ro + - ./be0/migrations/016_research_projects.sql:/docker-entrypoint-initdb.d/16_research_projects.sql:ro + - ./be0/migrations/017_imagehub_datasets.sql:/docker-entrypoint-initdb.d/17_imagehub_datasets.sql:ro + - ./be0/migrations/018_imagehub_segmentation_links.sql:/docker-entrypoint-initdb.d/18_imagehub_segmentation_links.sql:ro + - ./be0/migrations/019_imagehub_cloud_import.sql:/docker-entrypoint-initdb.d/19_imagehub_cloud_import.sql:ro + - ./be0/migrations/020_imagehub_dataset_stages.sql:/docker-entrypoint-initdb.d/20_imagehub_dataset_stages.sql:ro + - ./be0/migrations/021_imagehub_task_pipeline.sql:/docker-entrypoint-initdb.d/21_imagehub_task_pipeline.sql:ro + - ./be0/migrations/022_imagehub_task_annotations.sql:/docker-entrypoint-initdb.d/22_imagehub_task_annotations.sql:ro + - ./be0/migrations/023_imagehub_dataset_members.sql:/docker-entrypoint-initdb.d/23_imagehub_dataset_members.sql:ro + - ./be0/migrations/024_imagehub_dataset_project_link.sql:/docker-entrypoint-initdb.d/24_imagehub_dataset_project_link.sql:ro + - ./be0/migrations/025_imagehub_task_review_events.sql:/docker-entrypoint-initdb.d/25_imagehub_task_review_events.sql:ro + - 
./be0/migrations/026_imagehub_file_folder_path.sql:/docker-entrypoint-initdb.d/26_imagehub_file_folder_path.sql:ro + - ./be0/migrations/027_imagehub_dataset_label_map.sql:/docker-entrypoint-initdb.d/27_imagehub_dataset_label_map.sql:ro + # Evaluate user/db inside the container ($$…) so Compose .env substitution stays in sync at runtime. + healthcheck: + test: ["CMD-SHELL", "pg_isready -U \"$$POSTGRES_USER\" -d \"$$POSTGRES_DB\""] + interval: 10s + timeout: 5s + retries: 12 + start_period: 30s + restart: unless-stopped + + # API: must become healthy (Postgres + MinIO + successful startup) before the frontend_* services start. + be0: + build: + context: ./be0 + dockerfile: Dockerfile + container_name: be0 + ipc: host + ports: + - "127.0.0.1:4402:4402" + healthcheck: + test: ["CMD-SHELL", "curl -sf http://127.0.0.1:4402/health >/dev/null"] + interval: 10s + timeout: 10s + retries: 15 + start_period: 180s + environment: + - GENERIC_TIMEZONE=UTC + - ENVIRONMENT=production + - JWT_SECRET=${JWT_SECRET:?Set JWT_SECRET in .env — openssl rand -base64 48} + - INITIATIVE_DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB} + - APPLICATION_DRAFT_DIR=/app/assets/application-drafts + - SUBMITTED_INITIATIVES_DIR=/app/submitted-initiatives + - S3_ENDPOINT_URL=http://minio:9000 + - S3_ACCESS_KEY=${MINIO_ROOT_USER} + - S3_SECRET_KEY=${MINIO_ROOT_PASSWORD} + - S3_BUCKET_ATTACHMENTS=initiative-attachments + - S3_BUCKET_EXPORTS=initiative-exports + - S3_BUCKET_QUARANTINE=initiative-quarantine + # Public host embedded in the presigned GET/PUT URLs that the browser opens; must be HTTPS when the SPA is HTTPS (see docs/minio-behind-https.md). + - S3_PUBLIC_ENDPOINT_URL=${S3_PUBLIC_ENDPOINT_URL:-http://${PUBLIC_HOST}:${MINIO_API_PORT}} + - CORS_ORIGINS=http://${PUBLIC_HOST}:${FE_PORT},${CORS_ORIGINS_EXTRA:-} + - AUTH_ADMIN_EMAILS=${AUTH_ADMIN_EMAILS:-} + # SMTP — registration OTP + password reset (same vars as docs; set in `.env`).
+ - SMTP_HOST=${SMTP_HOST:-} + - SMTP_PORT=${SMTP_PORT:-587} + - SMTP_USER=${SMTP_USER:-} + - SMTP_PASSWORD=${SMTP_PASSWORD:-} + - AUTH_MAIL_FROM=${AUTH_MAIL_FROM:-} + - SMTP_USE_TLS=${SMTP_USE_TLS:-1} + - AUTH_PUBLIC_WEB_ORIGIN=${AUTH_PUBLIC_WEB_ORIGIN:-} + - AUTH_MAIL_LOG_ONLY=${AUTH_MAIL_LOG_ONLY:-} + - TEMPLATE_APPLICATION_FORM_DOCX=/app/template_application_form.docx + volumes: + - ./be0:/app + - ./assets:/app/assets + - ./assets/submitted-initiatives:/app/submitted-initiatives + - ./fe0/public/assets/template_application_form.docx:/app/template_application_form.docx:ro + depends_on: + postgres: + condition: service_healthy + minio: + condition: service_healthy + # Dockerfile entrypoint: NLTK download + pip install + uvicorn (no --reload in prod). + restart: unless-stopped + + # Public applicant SPA — minified static build served by nginx (NOT the Vite dev server). + # Build context is the repo root (npm workspace); see frontend_user/Dockerfile.prod. + frontend_user: + build: + context: . + dockerfile: frontend_user/Dockerfile.prod + container_name: frontend_user + ports: + - "${FE_PORT}:8080" + depends_on: + be0: + condition: service_healthy + restart: unless-stopped + + # Admin / council SPA — also a hardened static build, but bound to LOCALHOST only. + # Reach it via an SSH tunnel or a separate authenticated reverse-proxy vhost. + # NOTE: the council review UI is still in progress — keep it off the public internet for now. + frontend_admin: + build: + context: . + dockerfile: frontend_admin/Dockerfile.prod + container_name: frontend_admin + ports: + - "127.0.0.1:${FE_ADMIN_PORT:-8082}:8080" + depends_on: + be0: + condition: service_healthy + restart: unless-stopped + + # Principal-investigator SPA (research proposals + project cockpit) — hardened static build. + frontend_investigator: + build: + context: . 
+ dockerfile: frontend_investigator/Dockerfile.prod + container_name: frontend_investigator + ports: + - "${FE_INV_PORT:-8083}:8080" + depends_on: + be0: + condition: service_healthy + restart: unless-stopped + + frontend_publisher: + build: + context: . + dockerfile: frontend_publisher/Dockerfile.prod + container_name: frontend_publisher + ports: + - "${FE_PUB_PORT:-8084}:8080" + depends_on: + be0: + condition: service_healthy + restart: unless-stopped + +volumes: + initiative_pg_data: + +# All services join Compose’s default project network; DNS names postgres, be0, minio work. +# Do not set your public VPS IP here — use PUBLIC_HOST + ports in `.env` / `ports:`. diff --git a/docker-compose.yml b/docker-compose.yml new file mode 100644 index 0000000..594e4ff --- /dev/null +++ b/docker-compose.yml @@ -0,0 +1,286 @@ +services: + minio: + image: quay.io/minio/minio:latest + container_name: minio + # Host 9000/9001 are often taken; map to free ports on your machine (S3 API / web UI). + ports: + - "19000:9000" # API / S3 endpoint → http://localhost:19000 + - "19001:9001" # Web console → http://localhost:19001 + environment: + MINIO_ROOT_USER: minio_user + MINIO_ROOT_PASSWORD: minio_password # Use strong password in real projects! + # Community MinIO has no per-bucket PutBucketCors (AiStor-only). Browsers need global API CORS for presigned GETs. + MINIO_API_CORS_ALLOW_ORIGIN: ${MINIO_API_CORS_ALLOW_ORIGIN:-*} + volumes: + - ./assets/minio-data:/data + command: server /data --console-address ":9001" + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"] + interval: 30s + timeout: 20s + retries: 3 + networks: + profyt-net: + ipv4_address: "10.5.0.3" + + # One-shot: ensure buckets. Browser CORS is MINIO_API_CORS_ALLOW_ORIGIN on the minio service (not mc cors set). 
+ minio-cors: + image: quay.io/minio/mc:latest + container_name: minio-cors + depends_on: + minio: + condition: service_healthy + entrypoint: ["/bin/sh", "-c"] + command: + - | + mc alias set local http://minio:9000 minio_user minio_password + for b in initiative-attachments initiative-exports initiative-quarantine imagehub-blobs; do + mc mb -p "local/$$b" 2>/dev/null || true + done + echo "MinIO buckets ensured." + networks: + profyt-net: + ipv4_address: "10.5.0.5" + restart: "no" + postgres: + image: postgres:16-alpine + container_name: initiative-postgres + # Host 5432 is often taken by a local Postgres; map a different port for host access. + ports: + - "15432:5432" + environment: + POSTGRES_USER: initiative + POSTGRES_PASSWORD: initiative_secret + POSTGRES_DB: initiatives + volumes: + - initiative_pg_data:/var/lib/postgresql/data + # Schema lives under be0 (dyd/0backend/migrations is not in this repo). + - ./be0/migrations/001_initiative_schema.sql:/docker-entrypoint-initdb.d/01_initiative_schema.sql:ro + - ./be0/migrations/002_application_storage_extensions.sql:/docker-entrypoint-initdb.d/02_application_storage_extensions.sql:ro + - ./be0/migrations/003_review_documents.sql:/docker-entrypoint-initdb.d/03_review_documents.sql:ro + - ./be0/migrations/004_evidence_artifact_review.sql:/docker-entrypoint-initdb.d/04_evidence_artifact_review.sql:ro + - ./be0/migrations/004_application_admin_results.sql:/docker-entrypoint-initdb.d/05_application_admin_results.sql:ro + - ./be0/migrations/006_user_notifications.sql:/docker-entrypoint-initdb.d/06_user_notifications.sql:ro + - ./be0/migrations/007_user_roles_email_policy_admin.sql:/docker-entrypoint-initdb.d/07_user_roles_email_policy_admin.sql:ro + - ./be0/migrations/008_audit_events.sql:/docker-entrypoint-initdb.d/08_audit_events.sql:ro + - ./be0/migrations/009_backup_artifact_roles_storage_kind.sql:/docker-entrypoint-initdb.d/09_backup_artifact_roles_storage_kind.sql:ro + - 
./be0/migrations/010_user_staff_profiles.sql:/docker-entrypoint-initdb.d/10_user_staff_profiles.sql:ro + - ./be0/migrations/011_academic_titles_vn.sql:/docker-entrypoint-initdb.d/11_academic_titles_vn.sql:ro + - ./be0/migrations/012_password_reset.sql:/docker-entrypoint-initdb.d/12_password_reset.sql:ro + - ./be0/migrations/013_email_verification.sql:/docker-entrypoint-initdb.d/13_email_verification.sql:ro + - ./be0/migrations/014_registration_otp.sql:/docker-entrypoint-initdb.d/14_registration_otp.sql:ro + - ./be0/migrations/015_document_templates.sql:/docker-entrypoint-initdb.d/15_document_templates.sql:ro + - ./be0/migrations/016_research_projects.sql:/docker-entrypoint-initdb.d/16_research_projects.sql:ro + - ./be0/migrations/017_imagehub_datasets.sql:/docker-entrypoint-initdb.d/17_imagehub_datasets.sql:ro + - ./be0/migrations/018_imagehub_segmentation_links.sql:/docker-entrypoint-initdb.d/18_imagehub_segmentation_links.sql:ro + - ./be0/migrations/019_imagehub_cloud_import.sql:/docker-entrypoint-initdb.d/19_imagehub_cloud_import.sql:ro + - ./be0/migrations/020_imagehub_dataset_stages.sql:/docker-entrypoint-initdb.d/20_imagehub_dataset_stages.sql:ro + - ./be0/migrations/021_imagehub_task_pipeline.sql:/docker-entrypoint-initdb.d/21_imagehub_task_pipeline.sql:ro + - ./be0/migrations/022_imagehub_task_annotations.sql:/docker-entrypoint-initdb.d/22_imagehub_task_annotations.sql:ro + - ./be0/migrations/023_imagehub_dataset_members.sql:/docker-entrypoint-initdb.d/23_imagehub_dataset_members.sql:ro + - ./be0/migrations/024_imagehub_dataset_project_link.sql:/docker-entrypoint-initdb.d/24_imagehub_dataset_project_link.sql:ro + - ./be0/migrations/025_imagehub_task_review_events.sql:/docker-entrypoint-initdb.d/25_imagehub_task_review_events.sql:ro + - ./be0/migrations/026_imagehub_file_folder_path.sql:/docker-entrypoint-initdb.d/26_imagehub_file_folder_path.sql:ro + - 
./be0/migrations/027_imagehub_dataset_label_map.sql:/docker-entrypoint-initdb.d/27_imagehub_dataset_label_map.sql:ro + healthcheck: + test: ["CMD-SHELL", "pg_isready -U initiative -d initiatives"] + interval: 3s + timeout: 5s + retries: 20 + start_period: 10s + restart: unless-stopped + networks: + profyt-net: + ipv4_address: "10.5.0.10" + + # ── Frontends ─────────────────────────────────────────────────────────────── + # Four SPAs (user, admin, investigator, publisher) built from the npm workspace (shared kernel + each app). The browser + # calls same-origin /api/*; Vite proxies to be0 (localhost:4402 is wrong inside the + # container). Build context is the repo ROOT — the workspace — not the app dir, so + # `@ump/shared` (../shared/src) resolves. Dev mode: bind-mount the workspace + reinstall + # on start so new deps land in the isolated node_modules volume. + frontend_user: + build: + context: . + dockerfile: frontend_user/Dockerfile + container_name: frontend_user + ipc: host + ports: + - 8081:5173 + environment: + - GENERIC_TIMEZONE=UTC + - VITE_DEV_PROXY_TARGET=http://be0:4402 + # When unset, Vite allows all hosts. Set e.g. YOUR_IP,localhost for cloud/LAN dev. + - VITE_ALLOWED_HOSTS=${VITE_ALLOWED_HOSTS:-} + volumes: + - ./package.json:/app/package.json + - ./package-lock.json:/app/package-lock.json + - ./shared:/app/shared + - ./frontend_user:/app/frontend_user + - ./frontend_admin:/app/frontend_admin + - ./frontend_investigator:/app/frontend_investigator + - ./frontend_publisher:/app/frontend_publisher + # Isolated workspace-hoisted node_modules (not shadowed by the host). + - /app/node_modules + command: ["sh", "-c", "npm install && npm run dev -w frontend_user -- --host 0.0.0.0 --port 5173"] + depends_on: + be0: + condition: service_started + networks: + profyt-net: + ipv4_address: "10.5.0.4" + + frontend_admin: + build: + context: .
+ dockerfile: frontend_admin/Dockerfile + container_name: frontend_admin + ipc: host + ports: + - 8082:5174 + environment: + - GENERIC_TIMEZONE=UTC + - VITE_DEV_PROXY_TARGET=http://be0:4402 + - VITE_ALLOWED_HOSTS=${VITE_ALLOWED_HOSTS:-} + volumes: + - ./package.json:/app/package.json + - ./package-lock.json:/app/package-lock.json + - ./shared:/app/shared + - ./frontend_user:/app/frontend_user + - ./frontend_admin:/app/frontend_admin + - ./frontend_investigator:/app/frontend_investigator + - ./frontend_publisher:/app/frontend_publisher + - /app/node_modules + command: ["sh", "-c", "npm install && npm run dev -w frontend_admin -- --host 0.0.0.0 --port 5174"] + depends_on: + be0: + condition: service_started + networks: + profyt-net: + ipv4_address: "10.5.0.6" + + frontend_investigator: + build: + context: . + dockerfile: frontend_investigator/Dockerfile + container_name: frontend_investigator + ipc: host + ports: + - 8083:5175 + environment: + - GENERIC_TIMEZONE=UTC + - VITE_DEV_PROXY_TARGET=http://be0:4402 + - VITE_ALLOWED_HOSTS=${VITE_ALLOWED_HOSTS:-} + volumes: + - ./package.json:/app/package.json + - ./package-lock.json:/app/package-lock.json + - ./shared:/app/shared + - ./frontend_user:/app/frontend_user + - ./frontend_admin:/app/frontend_admin + - ./frontend_investigator:/app/frontend_investigator + - ./frontend_publisher:/app/frontend_publisher + - /app/node_modules + command: ["sh", "-c", "npm install && npm run dev -w frontend_investigator -- --host 0.0.0.0 --port 5175"] + depends_on: + be0: + condition: service_started + networks: + profyt-net: + ipv4_address: "10.5.0.7" + + frontend_publisher: + build: + context: . 
+ dockerfile: frontend_publisher/Dockerfile + container_name: frontend_publisher + ipc: host + ports: + - 8084:5176 + environment: + - GENERIC_TIMEZONE=UTC + - VITE_DEV_PROXY_TARGET=http://be0:4402 + - VITE_ALLOWED_HOSTS=${VITE_ALLOWED_HOSTS:-} + volumes: + - ./package.json:/app/package.json + - ./package-lock.json:/app/package-lock.json + - ./shared:/app/shared + - ./frontend_user:/app/frontend_user + - ./frontend_admin:/app/frontend_admin + - ./frontend_investigator:/app/frontend_investigator + - ./frontend_publisher:/app/frontend_publisher + - /app/node_modules + command: ["sh", "-c", "npm install && npm run dev -w frontend_publisher -- --host 0.0.0.0 --port 5176"] + depends_on: + be0: + condition: service_started + networks: + profyt-net: + ipv4_address: "10.5.0.8" + + be0: + build: + context: ./be0 + dockerfile: Dockerfile + container_name: be0 + ipc: host + ports: + - 4402:4402 + environment: + # Dev stack: hot-reload API when bind-mounting ./be0 + - UVICORN_RELOAD=1 + - GENERIC_TIMEZONE=UTC + - INITIATIVE_DATABASE_URL=postgresql+asyncpg://initiative:initiative_secret@postgres:5432/initiatives + - APPLICATION_DRAFT_DIR=/app/assets/application-drafts + # Shared with fe0 `public/submitted-initiatives` so PDFs written by be0 are served by Vite as static files. + - SUBMITTED_INITIATIVES_DIR=/app/submitted-initiatives + # From inside the be0 container, reach MinIO on the shared Docker network (not localhost:19000). + - S3_ENDPOINT_URL=http://minio:9000 + - S3_ACCESS_KEY=minio_user + - S3_SECRET_KEY=minio_password + - S3_BUCKET_ATTACHMENTS=initiative-attachments + - S3_BUCKET_EXPORTS=initiative-exports + - S3_BUCKET_QUARANTINE=initiative-quarantine + # Presigned "View / download" (« Xem / tải ») links in the browser must hit the host-mapped MinIO port, not minio:9000. + - S3_PUBLIC_ENDPOINT_URL=${S3_PUBLIC_ENDPOINT_URL:-http://localhost:19000} + # Optional: comma-separated; merged with localhost defaults (e.g. http://YOUR_IP:8081 for LAN deploys).
+ - CORS_ORIGINS=${CORS_ORIGINS:-} + # Optional: comma-separated institutional admin emails. When unset, auth_api uses built-in UMP allow-list. + - AUTH_ADMIN_EMAILS=${AUTH_ADMIN_EMAILS:-} + # SMTP (Option A) — OTP + password-reset email. Set SMTP_HOST (and secrets) in repo-root .env; see .env.example. + - SMTP_HOST=${SMTP_HOST:-} + - SMTP_PORT=${SMTP_PORT:-587} + - SMTP_USER=${SMTP_USER:-} + - SMTP_PASSWORD=${SMTP_PASSWORD:-} + - AUTH_MAIL_FROM=${AUTH_MAIL_FROM:-} + - SMTP_USE_TLS=${SMTP_USE_TLS:-1} + - AUTH_PUBLIC_WEB_ORIGIN=${AUTH_PUBLIC_WEB_ORIGIN:-http://localhost:8081} + # Dev-only: OTP/link in stdout instead of SMTP — leave unset when using SMTP above. + - AUTH_MAIL_LOG_ONLY=${AUTH_MAIL_LOG_ONLY:-} + # Application-form DOCX template for the "Xem lại" (review) view; same file as fe0/public/…/template_application_form.docx + - TEMPLATE_APPLICATION_FORM_DOCX=/app/template_application_form.docx + volumes: + - ./be0:/app + - ./assets:/app/assets + - ./assets/submitted-initiatives:/app/submitted-initiatives + - ./fe0/public/assets/template_application_form.docx:/app/template_application_form.docx:ro + depends_on: + postgres: + condition: service_healthy + minio: + condition: service_healthy + # One-shot minio-cors must finish first so buckets exist (Compose v2.13+).
+ minio-cors: + condition: service_completed_successfully + networks: + profyt-net: + ipv4_address: "10.5.0.2" + +volumes: + initiative_pg_data: + +networks: + profyt-net: + driver: bridge + ipam: + config: + - subnet: "10.5.0.0/16" \ No newline at end of file diff --git a/docs/ADMIN_APPLICANT_NOTIFICATION_SYSTEM_ANALYSIS.md b/docs/ADMIN_APPLICANT_NOTIFICATION_SYSTEM_ANALYSIS.md new file mode 100644 index 0000000..39f4aa7 --- /dev/null +++ b/docs/ADMIN_APPLICANT_NOTIFICATION_SYSTEM_ANALYSIS.md @@ -0,0 +1,221 @@ +# Analysis: notification system for admin → applicant (status & feedback) + +This document describes **how the stack works today**, **what is missing** for true notifications, and a **concrete v1 path** aligned with **current repo complexity** (`fe0` / `be0` / PostgreSQL). It incorporates a **review** of the refined draft (`ADMIN_APPLICANT_NOTIFICATION_SYSTEM_ANALYSIS.md` from review) and **adjusts** a few points for this codebase. + +It complements `assets/APPLICANT_STATUS_NOTIFICATIONS_PLAN.md` (council + broader product) by anchoring on **`application_admin_results`** and **`PUT /api/applications/{applicationId}/admin-result`**. + +--- + +## 0. Evaluation of the reviewed draft (summary) + +The reviewed version improves the original repo doc in several ways; **adopt these**: + +| Theme | Verdict | +|--------|---------| +| **Locked v1 scope** | In-app inbox only; ~60s polling + `refetchOnWindowFocus`; **no** email, MinIO PDF, WebSocket, or council unification in v1. Reduces scope and matches current team capacity. | +| **Append-on-every-save** | Explicit product choice: new row per admin save; optional **UI-only** collapsing of consecutive rows. Clear and simple. | +| **Schema pragmatism** | v1 `type` with `TEXT + CHECK`; **omit `JSONB` until a second notification type** exists. Fewer columns to maintain. | +| **Indexes** | `created_at DESC` inbox index + **partial index** on `(recipient_user_id) WHERE read_at IS NULL` for unread count. Appropriate. 
| +| **Security** | `PATCH .../read` returns **404** for foreign rows (same as missing id) to avoid user enumeration. Good default. | +| **Helper surface** | `notification_service.create_admin_decision_notification(...)` keeps the admin route thin and future council hook consistent. | + +**Adjust for this repository (critical):** + +1. **`application_id` type** — Public application identifiers in this project are **strings** (e.g. `sub-{hex}`, case codes), not UUIDs. The notification table should use **`application_id TEXT NOT NULL`** (or `VARCHAR`) for deep links, matching `ApplicationItem.id` and API paths—not `UUID`. +2. **“try/except without rolling back the decision”** — With **`get_session()`**, everything runs in **one transaction** that **commits once** at context exit. If the notification `INSERT` fails **after** the upsert has flushed, the **entire** transaction—including the decision—rolls back on exception, unless you: + - use a **`begin_nested()` savepoint** around the notification insert (rollback to savepoint on failure, then continue), or + - perform the notification insert in a **separate short session after** the admin-result transaction **commits** (best-effort second transaction). + + The reviewed draft’s intent (“decision is sacred; notification best-effort”) is right; the implementation must use one of the patterns above, not a bare `try/except` in the same flat transaction. + +--- + +## 1. v1 scope (locked) + +- **In-app notifications only:** PostgreSQL table + applicant `GET` / `PATCH` + `/dashboard/notifications` + optional header bell. +- **Delivery:** React Query `refetchInterval: 60_000` and `refetchOnWindowFocus: true` (no SSE/WebSocket in v1). +- **Write trigger:** successful **admin-result** upsert (same product semantics as today’s `AdminStaffReadonlyReviewDialog` / `ResultManager`). 
+- **Deferred to v2:** email + outbox worker, MinIO/PDF letter, council `localStorage` → API unification, notification preferences, i18n beyond Vietnamese, retention jobs. + +--- + +## 2. Current state (baseline) + +### 2.1 Data and decisions + +| Layer | What exists | +|--------|-------------| +| **PostgreSQL** | `initiatives.status` ↔ **`application_admin_results`** on **`PUT …/admin-result`** (idempotent upsert). | +| **Applicant reads** | `GET /api/applications/mine`, `GET /api/applications/{id}` — status and **`nhan_xet`** can reflect admin feedback after submission enrichment. | +| **Admin** | Decided list `lifecycle=decided`; React Query invalidation on save. | +| **Council** | Some flows still use **`localStorage`**; not applicant-visible until server-backed (v2; see assets plan). | + +**Gap:** No **`user_notifications`** (or equivalent); applicants only discover changes by refetching lists/detail. + +### 2.2 Frontend + +- **Sonner:** admin-only affordance at save time. +- **React Query:** `applications`, `applications-mine`, `application-detail`, etc.—no notification queries yet. +- **`/dashboard/notifications`:** linked from sidebars; **no page implementation** observed. + +### 2.3 Backend + +- FastAPI + transactional `get_session()`; natural hook: after admin-result body returns, or inside handler with savepoint / post-commit insert (see §0). + +### 2.4 MinIO (v2 only for notifications) + +Evidence/exports buckets are **not** the source of truth for notification text. Optional v2 PDF letter remains separate from v1. + +--- + +## 3. 
Target architecture (v1) + +```mermaid +flowchart LR + subgraph admin [Admin UI] + A[Confirm / ResultManager] + end + subgraph be [be0] + B[PUT admin-result] + C[Notification helper] + end + subgraph db [PostgreSQL] + D[application_admin_results] + E[initiatives] + F[user_notifications] + end + subgraph fe [Applicant FE] + H[Polling + onFocus] + I[Inbox + bell] + end + A --> B + B --> D + B --> E + B --> C + C --> F + F --> H + H --> I +``` + +--- + +## 4. Database design (v1) + +### 4.1 Principles + +- **`application_admin_results`** remains canonical for full feedback/rationale. +- Notification rows are **summaries + pointer** to the public **`application_id` string** and optional FKs for audit. + +### 4.2 Table: `user_notifications` + +| Column | Type | Notes | +|--------|------|------| +| `id` | UUID PK | | +| `recipient_user_id` | UUID FK → `users.id` (`ON DELETE CASCADE`) | From `initiatives.owner_id` at insert. | +| `type` | `TEXT NOT NULL` | v1: `CHECK (type IN ('admin_application_decision'))`. | +| `title` | `TEXT NOT NULL` | e.g. “Kết quả duyệt hồ sơ” (“Application review result”). | +| `body` | `TEXT NOT NULL` | Decision label + up to ~280 chars of feedback (newlines stripped). | +| `application_id` | `TEXT NOT NULL` | **Public** id (`sub-…` / case-shaped), matches API list/detail. | +| `related_initiative_id` | UUID FK → `initiatives.id` (`ON DELETE SET NULL`) | | +| `source_admin_result_id` | UUID FK → `application_admin_results.id` (`ON DELETE SET NULL`) | | +| `read_at` | `TIMESTAMPTZ` nullable | | +| `created_at` | `TIMESTAMPTZ NOT NULL DEFAULT now()` | | + +**v1:** no `payload JSONB`; add when a second `type` needs extra fields.
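The table and column notes above can be checked quickly with the standard library. This is a minimal sketch under stated assumptions, not the be0 model.

- SQLite stands in for PostgreSQL. UUID and TIMESTAMPTZ become TEXT, and the foreign keys are omitted.
- `build_body`, `insert_notification` and `FEEDBACK_LIMIT` are hypothetical names.
- Truncating with a trailing ellipsis is only one reading of "~280 chars of feedback".

The sketch shows two things: the v1 `CHECK` constraint on `type` rejects any other notification type, and the body is built with newlines stripped.

```python
import sqlite3
import uuid

# Hypothetical SQLite stand-in for the PostgreSQL table in §4.2:
# UUID / TIMESTAMPTZ columns become TEXT; foreign keys are omitted.
DDL = """
CREATE TABLE user_notifications (
    id                TEXT PRIMARY KEY,
    recipient_user_id TEXT NOT NULL,
    type              TEXT NOT NULL
                      CHECK (type IN ('admin_application_decision')),
    title             TEXT NOT NULL,
    body              TEXT NOT NULL,
    application_id    TEXT NOT NULL,  -- public id such as 'sub-...', not a UUID
    read_at           TEXT,
    created_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

FEEDBACK_LIMIT = 280  # assumed reading of the "~280 chars" budget


def build_body(decision_label: str, feedback: str, limit: int = FEEDBACK_LIMIT) -> str:
    """Decision label plus feedback, whitespace collapsed, truncated to `limit` chars."""
    flat = " ".join(feedback.split())  # drops newlines and runs of whitespace
    if len(flat) > limit:
        flat = flat[: limit - 1].rstrip() + "…"
    return f"{decision_label}: {flat}" if flat else decision_label


def insert_notification(conn, *, recipient, ntype, title, body, application_id):
    """Insert one inbox row; the CHECK constraint rejects unknown `type` values."""
    notification_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO user_notifications "
        "(id, recipient_user_id, type, title, body, application_id) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (notification_id, recipient, ntype, title, body, application_id),
    )
    return notification_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    insert_notification(
        conn,
        recipient="user-1",
        ntype="admin_application_decision",
        title="Kết quả duyệt hồ sơ",
        body=build_body("Approved", "Well documented.\nMinor fixes needed."),
        application_id="sub-1a2b3c",
    )
    try:
        insert_notification(conn, recipient="user-1", ntype="council_decision",
                            title="t", body="b", application_id="sub-1a2b3c")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)  # a v2 type needs the CHECK list extended first
```

A v2 notification type, such as the council type in §9, needs the `CHECK` list widened before the first insert of that type. Until then, inserts of the new type fail loudly instead of landing silently.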
+ +### 4.3 Indexes + +```sql +CREATE INDEX user_notifications_inbox_idx + ON user_notifications (recipient_user_id, created_at DESC); + +CREATE INDEX user_notifications_unread_idx + ON user_notifications (recipient_user_id) + WHERE read_at IS NULL; +``` + +### 4.4 Insertion semantics + +- **Product:** append **one row per successful admin-result save** (including typo fixes). +- **Technical:** implement **best-effort** notification with **savepoint** (`session.begin_nested()`) or **post-commit** insert so a notification failure **never** rolls back the adjudication. Document the chosen pattern in code comments next to the handler. + +--- + +## 5. Backend API (`be0`) + +### 5.1 Write path + +After **`upsert_admin_result`** succeeds, resolve `owner_id`, build title/body, insert `user_notifications` via helper using the patterns in §4.4. + +Sketch: + +```text +notification_service.create_admin_decision_notification( + session, *, initiative, admin_result_row, application_id_public: str, decision_label: str +) -> UserNotification | None +``` + +### 5.2 Read paths (applicant-only in v1) + +| Method | Purpose | +|--------|---------| +| `GET /api/notifications` | Paginated; `recipient_user_id = current user`; fields: `id`, `type`, `title`, `body`, `read_at`, `created_at`, `application_id`, `related_initiative_id`. | +| `GET /api/notifications/unread-count` | Count unread; benefits from partial index. | +| `PATCH /api/notifications/{id}/read` | Set `read_at = now()` only if row belongs to user. | + +**Authorization:** for `PATCH`, return **404** if row missing **or** not owned (same body as missing). + +### 5.3 Relationship to applications API + +Notifications complement **`GET /api/applications/{id}`** (status + feedback); they do not replace it. + +--- + +## 6. Frontend (`fe0`) + +- **Page:** implement **`/dashboard/notifications`**. +- **React Query:** `["notifications", { page }]` and `["notifications-unread-count"]`. 
+- **Polling:** `refetchInterval: 60_000`, `refetchOnWindowFocus: true`. +- **UX:** row click → `PATCH .../read` (optimistic unread decrement optional) → navigate using **`application_id`** string to existing applicant routes. +- **Bell:** subscribe to unread-count query; same polling cadence. +- **Admin UI:** no change required for v1 unless product adds “don’t notify” toggle later. + +--- + +## 7. Security and privacy + +- Scope all reads/patch to authenticated **recipient**. +- Do not log full notification bodies in verbose HTTP logs in production. +- `recipient_user_id` snapshot at insert time: historical rows stay with original recipient if ownership changes. + +--- + +## 8. Rollout order (v1) + +1. Migration: table + indexes. +2. SQLAlchemy model + `notification_service` helper (with savepoint or post-commit). +3. Wire **admin-result** handler. +4. Applicant `GET` list, `GET` unread-count, `PATCH` read. +5. FE: inbox page + bell. +6. Tests: notification appears after PUT (applicant token); PATCH read; PATCH foreign id → 404; **notification failure does not undo admin-result** (integration test with forced insert error). + +--- + +## 9. v2 candidates (out of scope for v1) + +| Item | Notes | +|------|------| +| Email | Outbox + worker; `user_notifications` stays canonical. | +| MinIO PDF | Generate on save; store artifact key; optional `payload` JSONB for metadata. | +| Council | New `type` + `CHECK` extension + second writer from council API. | +| Preferences / retention | Add when volume or compliance requires. | + +--- + +## 10. Relation to other docs + +- **`assets/APPLICANT_STATUS_NOTIFICATIONS_PLAN.md`** — council outcomes, workflow, broader audit. This file is the **admin-first, v1-scoped** implementation companion. 
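The sketch below exercises two rules end to end with the standard library:

- the savepoint option from §0 and §4.4: the notification is best-effort, and the decision always commits;
- the "404 for missing or foreign rows" rule from §5.2.

This is a minimal sketch, not the be0 handler, and it rests on these assumptions:

- SQLite stands in for PostgreSQL.
- Raw `SAVEPOINT` SQL stands in for SQLAlchemy's `session.begin_nested()`.
- The trimmed table shapes, `save_admin_result` and `mark_read` are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema: SQLite stands in for PostgreSQL, keeping only
# the columns needed to show the transaction boundaries.
SCHEMA = """
CREATE TABLE initiatives (
    application_id TEXT PRIMARY KEY,
    owner_id       TEXT NOT NULL,
    status         TEXT NOT NULL DEFAULT 'submitted'
);
CREATE TABLE application_admin_results (
    application_id TEXT PRIMARY KEY,
    decision       TEXT NOT NULL
);
CREATE TABLE user_notifications (
    id                INTEGER PRIMARY KEY,
    recipient_user_id TEXT NOT NULL,
    type              TEXT NOT NULL CHECK (type IN ('admin_application_decision')),
    application_id    TEXT NOT NULL,
    read_at           TEXT
);
"""


def connect() -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions, so BEGIN,
    # SAVEPOINT and COMMIT below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def save_admin_result(conn, application_id, decision, ntype="admin_application_decision"):
    """Persist the decision; write the notification best-effort inside a savepoint.

    Returns True when the notification row was written. A failed notification
    rolls back only to the savepoint, so the decision still commits; a failure
    in the decision writes rolls back the whole transaction and re-raises.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT INTO application_admin_results (application_id, decision) VALUES (?, ?) "
            "ON CONFLICT (application_id) DO UPDATE SET decision = excluded.decision",
            (application_id, decision),
        )
        conn.execute("UPDATE initiatives SET status = ? WHERE application_id = ?",
                     (decision, application_id))
        (owner_id,) = conn.execute(
            "SELECT owner_id FROM initiatives WHERE application_id = ?", (application_id,)
        ).fetchone()

        conn.execute("SAVEPOINT notify")
        try:
            conn.execute(
                "INSERT INTO user_notifications (recipient_user_id, type, application_id) "
                "VALUES (?, ?, ?)",
                (owner_id, ntype, application_id),
            )
            notified = True
        except sqlite3.Error:
            conn.execute("ROLLBACK TO SAVEPOINT notify")  # undo the notification only
            notified = False
        conn.execute("RELEASE SAVEPOINT notify")
        conn.execute("COMMIT")
        return notified
    except Exception:
        conn.execute("ROLLBACK")
        raise


def mark_read(conn, notification_id, current_user_id) -> int:
    """PATCH .../read: the same 404 for a missing row and for someone else's row."""
    cur = conn.execute(
        "UPDATE user_notifications SET read_at = COALESCE(read_at, CURRENT_TIMESTAMP) "
        "WHERE id = ? AND recipient_user_id = ?",
        (notification_id, current_user_id),
    )
    return 200 if cur.rowcount == 1 else 404
```

Passing an invalid `ntype` plays the role of the "forced insert error" integration test listed in §8. The decision and `initiatives.status` are still committed; only the notification is missing.

With SQLAlchemy, the same shape would roughly be an `async with session.begin_nested():` block around the notification insert, with the exception caught outside that block.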
+ +--- + +*Update when migrations land, when the transaction strategy (savepoint vs post-commit) is fixed in code, or when v2 scope is agreed.* diff --git a/docs/ADMIN_APPLICATIONS_ADMIN_RESULT_RADICAL_FIX_PLAN.md b/docs/ADMIN_APPLICATIONS_ADMIN_RESULT_RADICAL_FIX_PLAN.md new file mode 100644 index 0000000..d06e6bd --- /dev/null +++ b/docs/ADMIN_APPLICATIONS_ADMIN_RESULT_RADICAL_FIX_PLAN.md @@ -0,0 +1,175 @@ +# Radical fix plan: admin applications, admin results, and dashboard consistency + +This document captures how the current stack is wired, what went wrong with “save decision → see it in **Kết quả đăng ký** (Registration results)”, and a **phased, radical** plan to make the behavior **reliable by design**—not only by patching one screen. + +--- + +## 1. Problem statement (what users experience) + +1. An admin uses **Mẫu hồ sơ và Minh chứng** (“Application templates and evidence”) → **Duyệt (xem trước)** (“Review (preview)”) → **Xác nhận** (“Confirm”) (`AdminStaffReadonlyReviewDialog` + `upsertAdminApplicationResult`). +2. They expect the submission to appear under **Kết quả đăng ký** (`ConsideredInitiativesList` → `ApprovedApplicationsList` with `lifecycle=decided`). +3. Sometimes nothing changes, or the behavior is confusing, because **several independent subsystems** must stay aligned: HTTP client semantics, API routes, PostgreSQL `initiatives.status`, optional file fallback, React Query keys, and optional MinIO (evidence—not the same as admin result). + +--- + +## 2. Implementation map (how it fits together) + +### 2.1 Frontend + +| Piece | Role | +|--------|------| +| `ConsideredInitiativesList` | Renders `ApprovedApplicationsList` with `lifecycle="decided"` — only rows whose **list `status`** is `approved` or `rejected`. | +| `ApprovedApplicationsList` | `useQuery(["applications", filters])` → `GET /api/applications` with filters including `lifecycle`. | +| `AdminDocxTemplatePreview` | Opens `AdminStaffReadonlyReviewDialog`; confirm calls `upsertAdminApplicationResult`.
| +| `applicationAdminResultApi` | `fetch` + `create` / `update`; `upsert` = **GET then POST or PUT**. | +| `shared/api/client.ts` | Axios `validateStatus: (s) => s < 500` — **4xx responses do not throw** unless callers pass an override. | + +### 2.2 Backend + +| Piece | Role | +|--------|------| +| `POST/PUT/GET/DELETE …/admin-result` | Persists `application_admin_results` and sets **`initiatives.status`** to `approved` / `rejected` (PostgreSQL). | +| `GET /api/applications` | **Primary:** `list_submitted_applications` from Postgres. **Fallback:** `_load_submitted_items()` file index if Postgres fails or is disabled. | +| `GET /api/applications/{id}` | Same pattern: Postgres first, then file index. | +| MinIO (S3-compatible) | Evidence / attachments buckets; **not** where admin “decision rows” live. Decisions are **Postgres**; MinIO only matters for evidence flows. | + +### 2.3 Data flow (intended) + +```mermaid +sequenceDiagram + participant UI as Admin UI + participant API as be0 FastAPI + participant PG as Postgres + participant List as GET /api/applications + + UI->>API: upsert admin-result (POST or PUT) + API->>PG: application_admin_results + initiatives.status + UI->>API: invalidate + refetch applications + List->>PG: list initiatives + drafts → status approved/rejected + List-->>UI: decided list +``` + +Breakage happens when **any** step returns “success” without real data, reads from a **different backend** than writes, or treats **HTTP 404** as a valid JSON body. + +--- + +## 3. Root causes (systemic, not one-line bugs) + +### A. Global Axios: 4xx treated as success (`validateStatus < 500`) + +- For `GET /api/.../admin-result` with **no row**, the server returns **404** + `{ detail: "…" }`. +- With the default client, that **resolves** instead of rejects. +- Any code that checks `if (!data)` **fails** → truthy object `{ detail }` looks like “existing result”. 
+- **`upsert`** then chose **PUT** instead of **POST** on first save → no row created, **silent wrong success** possible. + +**Partial fix already applied:** pass `axiosSuccessStatusOnly` (2xx-only) on all `applicationAdminResultApi` calls. + +**Radical fix:** see §5.1 — stop relying on opt-in overrides. + +### B. Client upsert = GET + mutate (race + footguns) + +- Two round trips; duplicated server rules; easy to get wrong if GET semantics change. +- Prefer **idempotent server upsert** (`PUT` with “create or replace” semantics) **or** `POST`-only with clear 409 handling. + +### C. Dual source of truth for application lists (Postgres vs file index) + +- Listing can **silently fall back** to `_load_submitted_items()` when Postgres throws. +- Admin-result writes **only** hit Postgres. +- Result: UI can show **stale or empty** “decided” data while DB was actually updated (or the reverse in dev). + +### D. React Query key `["applications", filters]` + +- Invalidating `["applications"]` is correct for TanStack Query partial matching, but **any** cache/subscription edge case should be covered by tests. + +### E. MinIO vs admin decision (scope confusion) + +- **Fixing “Kết quả đăng ký” (Registration results)** does not require MinIO updates. +- Evidence upload paths are separate; do not conflate them in testing or plans. + +--- + +## 4. Verification already performed (baseline) + +- **Docker:** `postgres`, `minio`, `be0` healthy; `GET /api/v1/test` OK. +- **Postgres + `create_admin_result`:** `initiatives.status` and `application_admin_results` stay in sync when using the Python layer directly. +- **Integration test `test_applications_db_integration`:** one failing test (`get_application_by_id` fallback `sub-…` without `submissionRecord.id`) — suggests **ID-resolution** edge cases are still risky; align with list/GET contract in the same plan. +- **Host Python** may lack `boto3`; validate MinIO from **`be0` container** or install dev deps locally for S3 tests. + +--- + +## 5.
Radical fixes (options, from smaller to larger) + +### 5.1 Frontend: default to real HTTP semantics (highest leverage) + +**Goal:** No API call “succeeds” with a 404/422 body unless explicitly handled. + +**Options:** + +1. **Change default `validateStatus` to 2xx-only** in `ApiClient`, then **globally fix** call sites that depended on reading `{ detail }` from a resolved 4xx response (likely few; grep for patterns). +2. **Two clients:** `apiClientStrict` (2xx-only) for CRUD and `apiClient` legacy only where needed—migrate modules incrementally. +3. **Response interceptor:** if `status >= 400`, reject with unified `ApiError` (preserves current “no throw on 4xx” idea but **never** returns `res.data` as success to `.then()`). + +**Acceptance:** ESLint rule or CI script: forbid `apiClient.get/post/put/delete` without explicit `validateStatus` or wrapper. + +### 5.2 Backend: atomic admin-result upsert + +**Goal:** Single request, no client-side GET-before-POST. + +- Expose **`PUT /api/applications/{id}/admin-result`** as **idempotent upsert** (create or update in one transaction), or document **`POST`** as upsert with unique constraint handling. +- Optionally return **updated application row** snippet (`status`, `applicationId`) so the client can patch cache without listing. + +### 5.3 Backend: single listing source in production + +**Goal:** No silent list fallback when `INITIATIVE_DATABASE_URL` is set. + +- On Postgres enabled: **fail listing with 503** and a clear JSON error instead of falling back to files; or log + metrics + feature flag. +- Deprecate file index for `/api/applications` in environments where submissions are always in PG. + +### 5.4 Contract tests (API + FE) + +- **pytest/httpx:** `POST admin-result` → `GET /api/applications?lifecycle=decided` contains that `id`. +- **Playwright or MSW:** Admin flow confirm → list row appears (requires auth fixture). 
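The failure chain in §3A/§3B and the fixes in §5.1, §5.2 and §5.4 can be reproduced without Axios, FastAPI or Postgres. The sketch below is a hypothetical in-memory model; `FakeBackend`, `lenient`, `strict` and `client_upsert` are illustrative names, not repo code.

- `lenient` mirrors the global `validateStatus: (s) => s < 500`.
- `strict` mirrors the 2xx-only `axiosSuccessStatusOnly` override.
- `put_result` is the single-call idempotent upsert proposed in §5.2.

```python
from dataclasses import dataclass, field
from typing import Optional


class HttpError(Exception):
    """Raised by the client wrappers for responses they treat as failures."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


@dataclass
class Response:
    status: int
    data: dict


@dataclass
class FakeBackend:
    """Hypothetical in-memory stand-in for be0's admin-result and list routes."""

    status: dict = field(default_factory=dict)   # application_id -> initiatives.status
    results: dict = field(default_factory=dict)  # application_id -> admin-result row

    def get_result(self, app_id: str) -> Response:
        if app_id in self.results:
            return Response(200, self.results[app_id])
        return Response(404, {"detail": "Admin result not found"})

    def create_result(self, app_id: str, decision: str) -> Response:
        # POST: create only; 409 when a row already exists.
        if app_id in self.results:
            return Response(409, {"detail": "Admin result already exists"})
        return self.put_result(app_id, decision)

    def update_result(self, app_id: str, decision: str) -> Response:
        # Update-only PUT: 404 when there is nothing to update.
        if app_id not in self.results:
            return Response(404, {"detail": "Admin result not found"})
        return self.put_result(app_id, decision)

    def put_result(self, app_id: str, decision: str) -> Response:
        # §5.2: idempotent create-or-replace; also sets the status the list filters on.
        self.results[app_id] = {"applicationId": app_id, "decision": decision}
        self.status[app_id] = decision
        return Response(200, self.results[app_id])

    def list_applications(self, lifecycle: Optional[str] = None) -> list:
        if lifecycle == "decided":
            return [a for a, s in self.status.items() if s in ("approved", "rejected")]
        return list(self.status)


def lenient(resp: Response) -> dict:
    """Like validateStatus: (s) => s < 500: a 4xx body comes back as data."""
    if resp.status >= 500:
        raise HttpError(resp.status)
    return resp.data


def strict(resp: Response) -> dict:
    """Like the 2xx-only axiosSuccessStatusOnly override: any non-2xx raises."""
    if not 200 <= resp.status < 300:
        raise HttpError(resp.status)
    return resp.data


def client_upsert(backend: FakeBackend, app_id: str, decision: str, call) -> dict:
    """The client-side GET, then POST or PUT, upsert described in §3A and §3B."""
    try:
        existing = call(backend.get_result(app_id))
    except HttpError:
        existing = None
    if existing:  # a lenient 404 body {"detail": ...} is truthy and looks like a hit
        return call(backend.update_result(app_id, decision))
    return call(backend.create_result(app_id, decision))
```

With `lenient`, the first save returns `{"detail": ...}` as if it had succeeded, and the decided list stays empty. That is the silent wrong success from §3A. With `strict`, or with a single `put_result` call, the application appears under `lifecycle=decided`. The §5.4 contract test should assert exactly that against the real API.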
+ +### 5.5 Observability + +- Structured logs: `application_id`, `initiative_id`, `decision`, `source=postgres|file_fallback`. +- Metrics: `applications_list_fallback_total`, `admin_result_upsert_duration_ms`. + +--- + +## 6. Recommended phases (practical rollout) + +| Phase | Scope | Outcome | +|-------|--------|---------| +| **P0** | Keep `axiosSuccessStatusOnly` on admin-result API; add **one** e2e/API test: upsert → decided list. | Regression guard. | +| **P1** | Introduce **strict HTTP** default or interceptor (§5.1); fix broken call sites. | Class of bugs eliminated. | +| **P2** | Backend **idempotent PUT** upsert; simplify client to single call. | Fewer races, simpler mental model. | +| **P3** | Remove or gate **file fallback** for `/api/applications` when PG is configured. | Align list with admin writes. | +| **P4** | Fix failing DB test for `submissionRecord.id` omission; document canonical `applicationId` rules. | Predictable IDs end-to-end. | + +--- + +## 7. Out of scope / non-goals + +- **MinIO** consistency for “admin approve” — wrong layer unless the feature explicitly writes objects on decision. +- **Council** flow (`saveCouncilReviewOutcome` / local storage) — separate product path; only mention if merging with admin outcomes. + +--- + +## 8. Decision log (fill in as you implement) + +| Date | Decision | Rationale | +|------|----------|-----------| +| | | | + +--- + +## References (code) + +- `fe0/src/shared/api/client.ts` — global `validateStatus`. +- `fe0/src/lib/applicationReviewApi.ts` — `axiosSuccessStatusOnly`. +- `fe0/src/lib/applicationAdminResultApi.ts` — admin-result CRUD + upsert. +- `fe0/src/components/admin/result/ConsideredInitiativesList.tsx` — decided list entry point. +- `be0/src/initiative_db/application_admin_results.py` — DB writes + `initiative.status`. +- `be0/main.py` — `list_applications` Postgres vs file fallback. +- `be0/tests/test_applications_db_integration.py` — Postgres integration tests. 
diff --git a/docs/ARCHITECTURE_REDESIGN.md b/docs/ARCHITECTURE_REDESIGN.md new file mode 100644 index 0000000..d39f630 --- /dev/null +++ b/docs/ARCHITECTURE_REDESIGN.md @@ -0,0 +1,871 @@ +# Architecture Redesign Proposal + +## Overview + +This document outlines a comprehensive architectural redesign for the ProfytAI Compliance Management Platform, addressing critical issues identified in the current implementation. + +## Design Principles + +1. **Separation of Concerns**: Clear boundaries between layers +2. **Dependency Injection**: Loose coupling, easy testing +3. **Domain-Driven Design**: Business logic in domain layer +4. **Security First**: Authentication, authorization, input validation +5. **Testability**: All components should be easily testable +6. **Scalability**: Support for horizontal scaling +7. **Maintainability**: Clear structure, minimal complexity + +--- + +## Proposed Architecture: Layered Architecture with Clean Architecture Principles + +``` +┌─────────────────────────────────────────────────────────┐ +│ Presentation Layer │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ API Routes │ │ Middleware │ │ WebSocket │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +└─────────────────────────────────────────────────────────┘ + │ +┌─────────────────────────────────────────────────────────┐ +│ Application Layer │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Services │ │ Use Cases │ │ DTOs │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +└─────────────────────────────────────────────────────────┘ + │ +┌─────────────────────────────────────────────────────────┐ +│ Domain Layer │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Entities │ │ Interfaces │ │ Value Obj. 
│ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +└─────────────────────────────────────────────────────────┘ + │ +┌─────────────────────────────────────────────────────────┐ +│ Infrastructure Layer │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Repositories │ │ External │ │ Config │ │ +│ │ │ │ Services │ │ │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +--- + +## New Directory Structure + +``` +be0/ +├── src/ +│ ├── api/ # API Layer +│ │ ├── __init__.py +│ │ ├── dependencies.py # Dependency injection +│ │ ├── middleware/ +│ │ │ ├── __init__.py +│ │ │ ├── auth.py # Authentication middleware +│ │ │ ├── cors.py # CORS configuration +│ │ │ ├── rate_limit.py # Rate limiting +│ │ │ └── error_handler.py # Global error handling +│ │ ├── routes/ +│ │ │ ├── __init__.py +│ │ │ ├── workflows.py # Workflow endpoints +│ │ │ ├── documents.py # Document endpoints +│ │ │ ├── compliance.py # Compliance endpoints +│ │ │ ├── health.py # Health check +│ │ │ └── auth.py # Authentication endpoints +│ │ └── schemas/ # Request/Response schemas +│ │ ├── __init__.py +│ │ ├── workflow.py +│ │ ├── document.py +│ │ └── compliance.py +│ │ +│ ├── application/ # Application Layer +│ │ ├── __init__.py +│ │ ├── services/ +│ │ │ ├── __init__.py +│ │ │ ├── workflow_service.py +│ │ │ ├── document_service.py +│ │ │ ├── compliance_service.py +│ │ │ └── ai_service.py +│ │ ├── use_cases/ +│ │ │ ├── __init__.py +│ │ │ ├── create_workflow.py +│ │ │ ├── update_workflow_item.py +│ │ │ ├── analyze_compliance.py +│ │ │ └── process_document.py +│ │ └── dto/ # Data Transfer Objects +│ │ ├── __init__.py +│ │ ├── workflow_dto.py +│ │ └── compliance_dto.py +│ │ +│ ├── domain/ # Domain Layer +│ │ ├── __init__.py +│ │ ├── entities/ +│ │ │ ├── __init__.py +│ │ │ ├── workflow.py +│ │ │ ├── workflow_item.py +│ │ │ ├── document.py +│ │ │ └── compliance_rule.py +│ │ ├── value_objects/ +│ │ │ ├── __init__.py +│ │ │ 
├── task_status.py +│ │ │ └── workflow_phase.py +│ │ ├── interfaces/ # Repository interfaces +│ │ │ ├── __init__.py +│ │ │ ├── workflow_repository.py +│ │ │ ├── document_repository.py +│ │ │ └── compliance_repository.py +│ │ └── exceptions/ +│ │ ├── __init__.py +│ │ ├── domain_exceptions.py +│ │ └── service_exceptions.py +│ │ +│ ├── infrastructure/ # Infrastructure Layer +│ │ ├── __init__.py +│ │ ├── database/ +│ │ │ ├── __init__.py +│ │ │ ├── connection.py # DB connection pool +│ │ │ ├── repositories/ +│ │ │ │ ├── __init__.py +│ │ │ │ ├── workflow_repository_impl.py +│ │ │ │ ├── document_repository_impl.py +│ │ │ │ └── neo4j_repository.py +│ │ │ └── migrations/ +│ │ ├── external/ +│ │ │ ├── __init__.py +│ │ │ ├── ollama_client.py # Ollama service client +│ │ │ └── storage/ +│ │ │ ├── __init__.py +│ │ │ └── file_storage.py # File storage abstraction +│ │ ├── config/ +│ │ │ ├── __init__.py +│ │ │ ├── settings.py # Pydantic settings +│ │ │ └── logging_config.py +│ │ └── security/ +│ │ ├── __init__.py +│ │ ├── auth.py # JWT, password hashing +│ │ └── permissions.py +│ │ +│ ├── core/ # Core utilities +│ │ ├── __init__.py +│ │ ├── logging.py +│ │ ├── exceptions.py +│ │ └── constants.py +│ │ +│ └── main.py # Application entry point +│ +├── tests/ # Test suite +│ ├── __init__.py +│ ├── unit/ +│ │ ├── domain/ +│ │ ├── application/ +│ │ └── infrastructure/ +│ ├── integration/ +│ │ ├── api/ +│ │ └── database/ +│ ├── fixtures/ +│ └── conftest.py +│ +├── alembic/ # Database migrations +│ ├── versions/ +│ └── env.py +│ +├── requirements.txt +├── requirements-dev.txt +├── .env.example +└── Dockerfile +``` + +--- + +## Key Architectural Components + +### 1. API Layer (Presentation) + +**Purpose**: Handle HTTP requests, validate input, return responses + +**Responsibilities**: +- Route definitions +- Request/Response serialization +- Input validation +- Authentication/Authorization checks +- Error handling + +### 2. 
Application Layer + +**Purpose**: Orchestrate business logic, coordinate between domain and infrastructure + +**Responsibilities**: +- Use case implementation +- Service orchestration +- DTO transformation +- Transaction management + +### 3. Domain Layer + +**Purpose**: Core business logic, entities, and business rules + +**Responsibilities**: +- Domain entities +- Business rules +- Value objects +- Domain events +- Repository interfaces (abstractions) + +### 4. Infrastructure Layer + +**Purpose**: External concerns: database, file system, external APIs + +**Responsibilities**: +- Database access +- External API clients +- File storage +- Configuration +- Security implementation + +--- + +## Implementation Examples + +### Example 1: Configuration Management + +```python +# infrastructure/config/settings.py +from pydantic_settings import BaseSettings, SettingsConfigDict +from typing import List + +class Settings(BaseSettings): + # Application + app_name: str = "ProfytAI Compliance Platform" + app_version: str = "1.0.0" + debug: bool = False + + # Server + host: str = "0.0.0.0" + port: int = 4402 + + # Database + neo4j_uri: str + neo4j_user: str + neo4j_password: str + + # Security + secret_key: str + algorithm: str = "HS256" + access_token_expire_minutes: int = 30 + # List fields are read from the environment as JSON, e.g. CORS_ORIGINS='["https://app.example.com"]' + cors_origins: List[str] = [] + + # AI/ML + ollama_base_url: str = "http://localhost:11434" + ollama_model: str = "gemma3:27b" + embedding_model: str = "embeddinggemma:300m" + + # Storage + upload_dir: str = "./assets/data/uploads" + max_upload_size: int = 10 * 1024 * 1024 # 10MB + + # Rate Limiting + rate_limit_per_minute: int = 60 + + # pydantic-settings v2 style (replaces the deprecated inner `class Config`) + model_config = SettingsConfigDict(env_file=".env", case_sensitive=False) + +settings = Settings() +``` + +### Example 2: Domain Entity + +```python +# domain/entities/workflow.py +from dataclasses import dataclass, field +from datetime import datetime +from typing import List, Optional +from uuid import UUID, uuid4 +from domain.value_objects.task_status import TaskStatus +from
domain.value_objects.workflow_phase import WorkflowPhase + +@dataclass +class WorkflowItem: + id: int + task: str + status: TaskStatus + requires_approval: bool + approver: Optional[str] = None + comment: Optional[str] = None + updated_by: Optional[str] = None + updated_at: Optional[datetime] = None + +@dataclass +class Workflow: + id: UUID + project_name: str + project_description: Optional[str] + records_officer_email: Optional[str] + current_phase: WorkflowPhase + checklist_items: List[WorkflowItem] = field(default_factory=list) + completed_items: List[int] = field(default_factory=list) + pending_approvals: List[str] = field(default_factory=list) + comments: dict = field(default_factory=dict) + validation_results: dict = field(default_factory=dict) + created_at: datetime = field(default_factory=datetime.utcnow) + updated_at: datetime = field(default_factory=datetime.utcnow) + + def add_item(self, item: WorkflowItem) -> None: + """Add a checklist item to the workflow.""" + self.checklist_items.append(item) + self.updated_at = datetime.utcnow() + + def update_item_status( + self, + item_id: int, + status: TaskStatus, + updated_by: str, + comment: Optional[str] = None + ) -> None: + """Update the status of a workflow item.""" + item = next((i for i in self.checklist_items if i.id == item_id), None) + if not item: + raise ValueError(f"Item {item_id} not found") + + item.status = status + item.updated_by = updated_by + item.updated_at = datetime.utcnow() + if comment: + item.comment = comment + + if status == TaskStatus.COMPLETED and item_id not in self.completed_items: + self.completed_items.append(item_id) + + self.updated_at = datetime.utcnow() + + def can_advance_phase(self) -> bool: + """Check if workflow can advance to next phase.""" + all_completed = all( + item.status == TaskStatus.COMPLETED + for item in self.checklist_items + ) + no_pending_approvals = len(self.pending_approvals) == 0 + return all_completed and no_pending_approvals + + @property + def 
completion_percentage(self) -> float: + """Calculate completion percentage from item statuses.""" + if not self.checklist_items: + return 0.0 + # Count statuses directly: completed_items is never pruned when an item + # moves back out of COMPLETED, so len(completed_items) can overcount. + completed = sum( + 1 for i in self.checklist_items if i.status == TaskStatus.COMPLETED + ) + total = len(self.checklist_items) + return (completed / total) * 100 +``` + +### Example 3: Repository Interface (Domain) + +```python +# domain/interfaces/workflow_repository.py +from abc import ABC, abstractmethod +from typing import List, Optional +from uuid import UUID +from domain.entities.workflow import Workflow + +class IWorkflowRepository(ABC): + """Repository interface for workflow persistence.""" + + @abstractmethod + async def create(self, workflow: Workflow) -> Workflow: + """Create a new workflow.""" + pass + + @abstractmethod + async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]: + """Get workflow by ID.""" + pass + + @abstractmethod + async def get_all(self, skip: int = 0, limit: int = 100) -> List[Workflow]: + """Get all workflows with pagination.""" + pass + + @abstractmethod + async def update(self, workflow: Workflow) -> Workflow: + """Update an existing workflow.""" + pass + + @abstractmethod + async def delete(self, workflow_id: UUID) -> bool: + """Delete a workflow.""" + pass +``` + +### Example 4: Repository Implementation (Infrastructure) + +```python +# infrastructure/database/repositories/workflow_repository_impl.py +from typing import List, Optional +from uuid import UUID +from domain.entities.workflow import Workflow +from domain.interfaces.workflow_repository import IWorkflowRepository +from neo4j import AsyncSession # async session from the official Neo4j Python driver + +class WorkflowRepository(IWorkflowRepository): + """Neo4j implementation of workflow repository.""" + + def __init__(self, session: AsyncSession): + self.session = session + + async def create(self, workflow: Workflow) -> Workflow: + """Create workflow in Neo4j.""" + query = """ + CREATE (w:Workflow { + id: $id, + project_name:
$project_name, + project_description: $project_description, + records_officer_email: $records_officer_email, + current_phase: $current_phase, + created_at: $created_at, + updated_at: $updated_at + }) + RETURN w + """ + # Implementation details... + return workflow + + async def get_by_id(self, workflow_id: UUID) -> Optional[Workflow]: + """Get workflow by ID from Neo4j.""" + query = """ + MATCH (w:Workflow {id: $workflow_id}) + OPTIONAL MATCH (w)-[:HAS_ITEM]->(i:WorkflowItem) + RETURN w, collect(i) as items + """ + # Implementation details... + pass + + # ... other methods +``` + +### Example 5: Service Layer + +```python +# application/services/workflow_service.py +from typing import List, Optional +from uuid import UUID, uuid4 +from domain.entities.workflow import Workflow, WorkflowItem +from domain.interfaces.workflow_repository import IWorkflowRepository +from domain.value_objects.workflow_phase import WorkflowPhase +from domain.value_objects.task_status import TaskStatus +from domain.exceptions.domain_exceptions import WorkflowNotFoundError + +class WorkflowService: + """Service for workflow business logic.""" + + def __init__(self, workflow_repository: IWorkflowRepository): + self.workflow_repository = workflow_repository + + async def create_workflow( + self, + project_name: str, + project_description: Optional[str], + records_officer_email: Optional[str] + ) -> Workflow: + """Create a new workflow with initial phase.""" + workflow = Workflow( + id=uuid4(), # random ID; UUID() with no arguments raises TypeError + project_name=project_name, + project_description=project_description, + records_officer_email=records_officer_email, + current_phase=WorkflowPhase.CONCEPT_DEVELOPMENT + ) + + # Initialize Phase 1 items + phase1_items = self._get_phase1_items() + for item in phase1_items: + workflow.add_item(item) + + return await self.workflow_repository.create(workflow) + + async def get_workflow(self, workflow_id: UUID) -> Workflow: + """Get workflow by ID.""" + workflow = await self.workflow_repository.get_by_id(workflow_id)
+ if not workflow: + raise WorkflowNotFoundError(f"Workflow {workflow_id} not found") + return workflow + + async def update_workflow_item( + self, + workflow_id: UUID, + item_id: int, + status: TaskStatus, + updated_by: str, + comment: Optional[str] = None + ) -> Workflow: + """Update a workflow item.""" + workflow = await self.get_workflow(workflow_id) + workflow.update_item_status(item_id, status, updated_by, comment) + return await self.workflow_repository.update(workflow) + + async def advance_workflow(self, workflow_id: UUID) -> Workflow: + """Advance workflow to next phase.""" + workflow = await self.get_workflow(workflow_id) + + if not workflow.can_advance_phase(): + raise ValueError("Cannot advance: Phase requirements not met") + + # Advance to next phase logic... + return await self.workflow_repository.update(workflow) + + def _get_phase1_items(self) -> List[WorkflowItem]: + """Get Phase 1 checklist items.""" + return [ + WorkflowItem( + id=1, + task="Include Records Officer in system design process", + status=TaskStatus.PENDING, + requires_approval=True, + approver="Records Officer" + ), + # ... 
more items + ] +``` + +### Example 6: API Route with Dependency Injection + +```python +# api/routes/workflows.py +from fastapi import APIRouter, Depends, HTTPException, status +from uuid import UUID +from api.schemas.workflow import ( + WorkflowCreateRequest, + WorkflowResponse, + WorkflowItemUpdateRequest +) +from application.services.workflow_service import WorkflowService +from api.dependencies import get_workflow_service, get_current_user +from domain.exceptions.domain_exceptions import WorkflowNotFoundError +from domain.value_objects.task_status import TaskStatus + +router = APIRouter(prefix="/workflows", tags=["workflows"]) + +@router.post("", response_model=WorkflowResponse, status_code=status.HTTP_201_CREATED) +async def create_workflow( + request: WorkflowCreateRequest, + workflow_service: WorkflowService = Depends(get_workflow_service), + current_user = Depends(get_current_user) +): + """Create a new workflow.""" + try: + workflow = await workflow_service.create_workflow( + project_name=request.project_name, + project_description=request.project_description, + records_officer_email=request.records_officer_email + ) + return WorkflowResponse.from_entity(workflow) + except Exception as e: + # Log details server-side; don't echo internal error text to clients. + raise HTTPException( + status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, + detail="Failed to create workflow" + ) from e + +@router.get("/{workflow_id}", response_model=WorkflowResponse) +async def get_workflow( + workflow_id: UUID, + workflow_service: WorkflowService = Depends(get_workflow_service), + current_user = Depends(get_current_user) +): + """Get workflow by ID.""" + try: + workflow = await workflow_service.get_workflow(workflow_id) + return WorkflowResponse.from_entity(workflow) + except WorkflowNotFoundError: + raise HTTPException( + status_code=status.HTTP_404_NOT_FOUND, + detail="Workflow not found" + ) + +@router.put("/{workflow_id}/items", response_model=WorkflowResponse) +async def update_workflow_item( + workflow_id: UUID, + request: WorkflowItemUpdateRequest, + workflow_service: WorkflowService =
Depends(get_workflow_service), + current_user = Depends(get_current_user) +): + """Update a workflow item.""" + try: + workflow = await workflow_service.update_workflow_item( + workflow_id=workflow_id, + item_id=request.item_id, + status=TaskStatus(request.status), + updated_by=current_user.email, + comment=request.comment + ) + return WorkflowResponse.from_entity(workflow) + except WorkflowNotFoundError: + raise HTTPException( + status_code=status.HTTP_404_NOT_FOUND, + detail="Workflow not found" + ) +``` + +### Example 7: Dependency Injection Setup + +```python +# api/dependencies.py +from fastapi import Depends +from fastapi.security import OAuth2PasswordBearer +from infrastructure.database.connection import get_db_session +from infrastructure.database.repositories.workflow_repository_impl import WorkflowRepository +from application.services.workflow_service import WorkflowService +from infrastructure.external.ollama_client import OllamaClient +from application.services.compliance_service import ComplianceService +from infrastructure.config.settings import settings + +# tokenUrl is an assumed path; point it at the login route in api/routes/auth.py +oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/auth/token") + +# Repository dependencies +async def get_workflow_repository(): + async for session in get_db_session(): + yield WorkflowRepository(session) + +# Service dependencies +def get_workflow_service( + workflow_repo: WorkflowRepository = Depends(get_workflow_repository) +) -> WorkflowService: + return WorkflowService(workflow_repo) + +def get_compliance_service() -> ComplianceService: + ollama_client = OllamaClient( + base_url=settings.ollama_base_url, + model=settings.ollama_model + ) + return ComplianceService(ollama_client) + +# Auth dependencies +async def get_current_user( + token: str = Depends(oauth2_scheme) +): + # JWT validation logic + pass +``` + +### Example 8: Main Application Setup + +```python +# main.py +from fastapi import FastAPI +from fastapi.middleware.cors import CORSMiddleware +from infrastructure.config.settings import settings +from infrastructure.config.logging_config import setup_logging +from api.middleware.error_handler
import setup_exception_handlers +from api.middleware.cors import setup_cors +from api.routes import workflows, documents, compliance, health, auth +from contextlib import asynccontextmanager + +# Setup logging +setup_logging() + +@asynccontextmanager +async def lifespan(app: FastAPI): + """Startup/shutdown hooks (replaces the deprecated @app.on_event).""" + # Startup: initialize database connections and external services + yield + # Shutdown: close database connections and clean up resources + +# Create FastAPI app +app = FastAPI( + title=settings.app_name, + version=settings.app_version, + debug=settings.debug, + lifespan=lifespan +) + +# Setup middleware +setup_cors(app, settings.cors_origins) +setup_exception_handlers(app) + +# Include routers +app.include_router(auth.router) +app.include_router(workflows.router) +app.include_router(documents.router) +app.include_router(compliance.router) +app.include_router(health.router) +``` + +--- + +## Security Improvements + +### 1. Authentication & Authorization + +```python +# infrastructure/security/auth.py +from datetime import datetime, timedelta, timezone +from typing import Optional +from jose import JWTError, jwt +from passlib.context import CryptContext +from infrastructure.config.settings import settings + +pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto") + +def verify_password(plain_password: str, hashed_password: str) -> bool: + """Verify a password against a hash.""" + return pwd_context.verify(plain_password, hashed_password) + +def get_password_hash(password: str) -> str: + """Hash a password.""" + return pwd_context.hash(password) + +def create_access_token(data: dict, expires_delta: Optional[timedelta] = None) -> str: + """Create JWT access token.""" + to_encode = data.copy() + # Timezone-aware UTC timestamps (datetime.utcnow() is deprecated since Python 3.12) + if expires_delta: + expire = datetime.now(timezone.utc) + expires_delta + else: + expire = datetime.now(timezone.utc) + timedelta( + minutes=settings.access_token_expire_minutes + ) + to_encode.update({"exp": expire}) + encoded_jwt = jwt.encode( + to_encode,
settings.secret_key, + algorithm=settings.algorithm + ) + return encoded_jwt +``` + +### 2. CORS Configuration + +```python +# api/middleware/cors.py +from fastapi import FastAPI +from fastapi.middleware.cors import CORSMiddleware +from typing import List + +def setup_cors(app: FastAPI, allowed_origins: List[str]): + """Configure CORS middleware.""" + app.add_middleware( + CORSMiddleware, + allow_origins=allowed_origins, # Specific origins, not "*" + allow_credentials=True, + allow_methods=["GET", "POST", "PUT", "DELETE", "PATCH"], + allow_headers=["Content-Type", "Authorization"], + ) +``` + +### 3. Rate Limiting + +```python +# api/middleware/rate_limit.py +from fastapi import FastAPI, Request +from slowapi import Limiter, _rate_limit_exceeded_handler +from slowapi.util import get_remote_address +from slowapi.errors import RateLimitExceeded + +limiter = Limiter(key_func=get_remote_address) + +def setup_rate_limiting(app: FastAPI) -> None: + """Register the limiter and its 429 handler (helper name is illustrative).""" + app.state.limiter = limiter + app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler) + +# In a route module (e.g. api/routes/workflows.py): slowapi needs an argument +# named `request` of type Request, so the JSON body must use another name. +@router.post("") +@limiter.limit("10/minute") # 10 requests per minute per client IP +async def create_workflow(request: Request, body: WorkflowCreateRequest, ...): + # Implementation + pass +``` + +--- + +## Testing Structure + +```python +# tests/conftest.py +import pytest +from fastapi.testclient import TestClient +from main import app +from infrastructure.database.connection import get_test_db + +@pytest.fixture +def client(): + return TestClient(app) + +@pytest.fixture +def test_db(): + # Setup test database + yield + # Teardown + +# tests/unit/application/services/test_workflow_service.py +import pytest +from typing import Dict +from uuid import UUID +from application.services.workflow_service import WorkflowService +from domain.entities.workflow import Workflow +from domain.value_objects.workflow_phase import WorkflowPhase + +class MockWorkflowRepository: + """In-memory stand-in; create() is the only method create_workflow() calls.""" + + def __init__(self): + self.saved: Dict[UUID, Workflow] = {} + + async def create(self, workflow: Workflow) -> Workflow: + self.saved[workflow.id] = workflow + return workflow + +@pytest.mark.asyncio +async def test_create_workflow(): + mock_repo = MockWorkflowRepository() + service = WorkflowService(mock_repo) + + workflow = await service.create_workflow( + project_name="Test Project", + project_description="Test Description", + records_officer_email="test@example.com" + ) + + assert workflow.project_name == "Test
Project" + assert workflow.current_phase == WorkflowPhase.CONCEPT_DEVELOPMENT +``` + +--- + +## Migration Strategy + +### Phase 1: Foundation (Week 1-2) +1. Create new directory structure +2. Set up configuration management +3. Implement dependency injection +4. Set up database connection + +### Phase 2: Domain Layer (Week 3) +1. Create domain entities +2. Define repository interfaces +3. Implement value objects + +### Phase 3: Infrastructure (Week 4) +1. Implement repository classes +2. Set up external service clients +3. Configure security + +### Phase 4: Application Layer (Week 5) +1. Create service classes +2. Implement use cases +3. Create DTOs + +### Phase 5: API Layer (Week 6) +1. Create route modules +2. Implement middleware +3. Set up error handling + +### Phase 6: Testing & Migration (Week 7-8) +1. Write unit tests +2. Write integration tests +3. Migrate existing endpoints +4. Deploy and monitor + +--- + +## Benefits of This Architecture + +1. **Testability**: Each layer can be tested independently +2. **Maintainability**: Clear separation of concerns +3. **Scalability**: Easy to add new features +4. **Security**: Built-in security at every layer +5. **Flexibility**: Easy to swap implementations (e.g., different databases) +6. **Team Collaboration**: Different teams can work on different layers + +--- + +## Next Steps + +1. Review and approve this architecture +2. Create detailed implementation plan +3. Set up project structure +4. Begin Phase 1 implementation +5. Establish coding standards and review process diff --git a/docs/ARCHITECTURE_SUMMARY.md b/docs/ARCHITECTURE_SUMMARY.md new file mode 100644 index 0000000..993853c --- /dev/null +++ b/docs/ARCHITECTURE_SUMMARY.md @@ -0,0 +1,94 @@ +# Architecture Redesign Summary + +## Quick Overview + +This document provides a quick reference for the architectural improvements proposed for the ProfytAI Compliance Management Platform. + +## Key Improvements + +### 1. 
**Layered Architecture** +- **API Layer**: HTTP request handling, validation, serialization +- **Application Layer**: Business logic orchestration, use cases +- **Domain Layer**: Core entities, business rules, interfaces +- **Infrastructure Layer**: Database, external services, configuration + +### 2. **Dependency Injection** +- Services depend on interfaces, not implementations +- Easy to test with mocks +- Flexible to swap implementations + +### 3. **Configuration Management** +- Type-safe settings with Pydantic +- Environment variable support +- Centralized configuration + +### 4. **Security** +- JWT authentication +- CORS with specific origins +- Rate limiting +- Input validation at every layer + +### 5. **Database Integration** +- Repository pattern +- Neo4j integration ready +- Migration support + +## File Structure Comparison + +### Before (Current) +``` +be0/ +├── main.py (545 lines - everything in one file) +├── src/ +│ ├── compliance_verifier.py +│ └── utils.py +``` + +### After (Proposed) +``` +be0/ +├── main.py (clean entry point) +├── src/ +│ ├── api/ (routes, middleware, schemas) +│ ├── application/ (services, use cases) +│ ├── domain/ (entities, interfaces) +│ ├── infrastructure/ (database, external, config) +│ └── core/ (utilities) +``` + +## Migration Checklist + +- [ ] Create new directory structure +- [ ] Set up configuration management +- [ ] Implement domain entities +- [ ] Create repository interfaces +- [ ] Implement repository classes +- [ ] Create service layer +- [ ] Split routes into modules +- [ ] Add authentication/authorization +- [ ] Implement error handling +- [ ] Add tests +- [ ] Update documentation + +## Benefits + +1. **Maintainability**: Clear structure, easy to find code +2. **Testability**: Each layer can be tested independently +3. **Scalability**: Easy to add new features +4. **Security**: Built-in at every layer +5. **Team Collaboration**: Different teams can work on different layers + +## Next Steps + +1. 
Review `ARCHITECTURE_REDESIGN.md` for detailed design +2. Review code examples in `be0/src/` +3. Plan migration timeline +4. Start with Phase 1 (Foundation) + +## Questions? + +Refer to the detailed `ARCHITECTURE_REDESIGN.md` document for: +- Complete architecture explanation +- Code examples +- Migration strategy +- Best practices diff --git a/docs/FIX_CHAT_ERROR.md b/docs/FIX_CHAT_ERROR.md new file mode 100644 index 0000000..c06f8d8 --- /dev/null +++ b/docs/FIX_CHAT_ERROR.md @@ -0,0 +1,139 @@ +# Fix Chat Assistant 500 Error + +## Issue +Getting 500 Internal Server Error when calling `/api/v1/chat` endpoint. + +## Root Causes + +1. **Model Name Mismatch** ✅ FIXED + - Code was using `gemma3:27b` but entrypoint pulls `gemma3:270M` + - **Fixed**: Updated code to use `gemma3:270M` + +2. **Ollama Not Running** + - Ollama service might not be started in the container + - Network connectivity issues + +3. **Model Not Available** + - Model might not be pulled yet + - Model name incorrect + +## Solutions + +### Solution 1: Restart the Container + +```bash +# Stop and restart the backend container +docker-compose down +docker-compose up -d be0 + +# Wait for Ollama to start (check logs) +docker-compose logs -f be0 +``` + +### Solution 2: Check Ollama Status + +```bash +# Check if container is running +docker ps | grep be0 + +# Check Ollama inside container +docker exec be0 ollama list + +# If Ollama is not running, start it +docker exec be0 ollama serve & +``` + +### Solution 3: Pull the Model + +```bash +# Pull the required model +docker exec be0 ollama pull gemma3:270M + +# Verify it's available +docker exec be0 ollama list | grep gemma3 +``` + +### Solution 4: Test the Health Endpoint + +```bash +# Check health endpoint (includes Ollama status) +curl http://localhost:4402/health + +# Should show: +# { +# "status": "healthy", +# "ollama": { +# "status": "connected", +# "available_models": ["gemma3:270M", ...] 
+# } +# } +``` + +### Solution 5: Check Backend Logs + +```bash +# View recent logs +docker-compose logs be0 | tail -50 + +# View ChatAssistant specific logs +tail -f be0/logs/ChatAssistant.log +``` + +## Quick Fix Commands + +```bash +# 1. Restart everything +docker-compose restart be0 + +# 2. Check Ollama +docker exec be0 ollama list + +# 3. Test health +curl http://localhost:4402/health + +# 4. Test chat endpoint +curl -X POST http://localhost:4402/api/v1/chat \ + -H "Content-Type: application/json" \ + -d '{"message": "Hello"}' +``` + +## What Was Fixed + +1. ✅ Model name changed from `gemma3:27b` to `gemma3:270M` +2. ✅ Added better error handling with specific error messages +3. ✅ Added Ollama connection check on initialization +4. ✅ Added health endpoint with Ollama status +5. ✅ Improved logging for debugging + +## Expected Behavior After Fix + +1. Container starts and Ollama service runs +2. Model `gemma3:270M` is available +3. Health endpoint shows Ollama as "connected" +4. Chat endpoint returns 200 with AI response + +## If Still Not Working + +1. **Check container logs:** + ```bash + docker-compose logs be0 + ``` + +2. **Check if Ollama is accessible:** + ```bash + docker exec be0 curl http://localhost:11434/api/tags + ``` + +3. **Manually start Ollama:** + ```bash + docker exec -d be0 ollama serve + sleep 2 + docker exec be0 ollama list + ``` + +4. **Rebuild container:** + ```bash + docker-compose down + docker-compose build be0 + docker-compose up be0 + ``` diff --git a/docs/HANDOFF.md b/docs/HANDOFF.md new file mode 100644 index 0000000..be9575f --- /dev/null +++ b/docs/HANDOFF.md @@ -0,0 +1,41 @@ +# HANDOFF — SciAgent / ImageHub +_Updated: 2026-06-29 (session-end — Gitea Actions CI/CD pipeline) · branch `main` · **40 commits LOCAL/unpushed** · 🟢 HMW-mode OFF_ + +## TL;DR +- Stood up the repo's **first CI/CD** — **Gitea Actions** on the self-hosted box `103.149.170.102:3000` (Gitea 1.26.2). Previously deploy was manual Docker Compose, **no CI**. 
+- Pipeline `.gitea/workflows/ci-cd.yml` = **backend** (per-file pytest + throwaway Postgres) · **frontend** (typecheck/build/vitest across workspaces) · **deploy** (host-mode `docker compose up -d` on push to main). Local commit `c2e869b`. +- **One hard gate left: NO act_runner is installed** → all runs queue, nothing executes/deploys. User must run `scripts/setup-gitea-runner.sh` on the box (I have no SSH there). + +## Shipped this session — commit `c2e869b` (local only) +- **`.gitea/workflows/ci-cd.yml`** — 3 jobs. backend: `pip install be0/requirements-dev.txt` then **pytest PER FILE** (loop) vs a `postgres:16-alpine` service (per-file avoids asyncpg cross-module event-loop contamination, [[be0-test-harness-reality]]). frontend: node 20, `npm ci`, `npm run typecheck` + `build`, `npm test --workspaces --if-present` (vitest in shared/investigator/publisher). deploy (`runs-on: deploy`, host): clone/reset **persistent `/srv/sciagent`** (NOT ephemeral — prod compose bind-mounts `./assets/minio-data`+`./be0`), write `.env` from secret `PROD_ENV`, `deploy-prod.sh --no-pull` + `check-prod-stack.sh`. +- **`be0/requirements-dev.txt`** — pytest + pytest-asyncio (neither was pinned). +- **`scripts/setup-gitea-runner.sh`** — act_runner 0.2.11 bootstrap (Docker+compose+node+systemd, labels `ci:docker://catthehacker/ubuntu:act-22.04,deploy:host`). ⚠️ runner registration token baked in (already public on Gitea mirror; rotatable). +- **Done via Gitea admin API (keychain user `oneness`, is_admin):** enabled Actions unit · stored secret `PROD_ENV` (valid prod `.env`, `PUBLIC_HOST=103.149.170.102`, fresh hex PG/MinIO pw + b64 JWT, `AUTH_MAIL_LOG_ONLY=1` placeholder) · minted runner token · pushed workflow+reqs to Gitea (workflow `state: active`). +- **Mirror refreshed** to current code: Gitea `main` now a **1212-file clean snapshot** (was 2026-06-14 / 965 files; now incl. all 4 monorepo FEs + the workflow). Leak-checked clean. 
Detail: [[gitea-cicd-pipeline]], [[gitea-mirror-and-tracked-secrets]]. + +## Current state +- Migrations 001…027 · 6 be0 routers · monorepo 4 FEs (`fe0` legacy standalone) + `@ump/shared`. +- Gitea workflow active; **runners online: 0**. PROD_ENV set; SMTP unfilled. +- Verify this session = artifact-level only (bash -n, pip syntax, YAML parse) — **no app code changed**, so BE/FE suites not re-run. + +## Next — P1 (start here) +1. **Install the runner** (user, needs root on the box — I have no SSH): `curl -fsSL http://103.149.170.102:3000/tlam89/sciagent/raw/branch/main/scripts/setup-gitea-runner.sh | sudo bash`. Then ping me → I verify it's Online (API) + watch the first run (backend→frontend→deploy), report PASS/FAIL with logs. +2. **Fill SMTP** in `PROD_ENV` secret (else OTP/reset mail only logs). Give me `SMTP_*` → I update the secret via API. +3. (Decision) fe0 vs frontend_user port role — deferred this session (fe0 NOT deployed; user confirmed it was a slip). + +## Open threads / risks +- 🔴 **NO runner = pipeline does nothing.** This is the blocker for all execution/deploy. +- 🔴 **40 commits LOCAL/unpushed to origin** — push to GitHub origin BLOCKED (history has `.env` secrets + 1.8 GB PII `assets/` → rotate + `git filter-repo` first). Gitea mirror is current; origin is not. Do NOT `git push origin`. +- First deploy = **fresh empty stack** (new Postgres via initdb migrations, empty MinIO) — no dev data carried over (assets/ excluded by design). +- Caught near-miss (documented): `git add -A` + `:(exclude)assets` did NOT exclude → leak-check stopped it pre-push. Reliable mirror method now in [[gitea-mirror-and-tracked-secrets]]. +- CLAUDE.md still STALE (says "no CI"; says migr 014 / 3 routers / `fe0`). 
+ +## Quick commands +- Gitea API (admin): `CRED=$(printf 'protocol=http\nhost=103.149.170.102:3000\n\n'|git credential fill); U=…;P=…` then `curl -u $U:$P http://103.149.170.102:3000/api/v1/repos/tlam89/sciagent/actions/runners` (check online) / `…/actions/tasks` (runs). +- Runner install (on box, root): see P1 #1. +- Re-mint runner token: `curl -s -X POST -u $U:$P http://103.149.170.102:3000/api/v1/repos/tlam89/sciagent/actions/runners/registration-token`. + +## Reality flags +- CI lives on **Gitea** (`103.149.170.102:3000`), NOT GitHub. Push to Gitea = clean orphan snapshot convention (excl `.env`/`assets`/`.claude`/`CLAUDE.md`). Origin (GitHub) push stays blocked. +- **Push ≠ deploy.** Even with the runner up, deploy only fires on push to Gitea `main`. This session = local commit only; nothing deployed, nothing pushed to origin. +- 🟢 HMW-mode OFF. No sub-agents spawned this session (main-agent + API + git only). diff --git a/docs/PDF_TEMPLATE_IMPLEMENTATION.md b/docs/PDF_TEMPLATE_IMPLEMENTATION.md new file mode 100644 index 0000000..675ab41 --- /dev/null +++ b/docs/PDF_TEMPLATE_IMPLEMENTATION.md @@ -0,0 +1,985 @@ +# Implementation Guide — `sang-kien-pdf` + +A step-by-step walkthrough of how the Sáng kiến PDF + DOCX template generators are built. Read this if you want to understand **why** each piece exists, **how** to modify the layout, or **how** to port the same approach to a different government form. + +--- + +## Table of contents + +1. [The problem we're solving](#1-the-problem-were-solving) +2. [Architecture overview](#2-architecture-overview) +3. [Tech stack and rationale](#3-tech-stack-and-rationale) +4. [Project setup](#4-project-setup-from-scratch) +5. 
[Implementing the PDF generator](#5-implementing-the-pdf-generator) + - 5.1 [TypeScript data types](#51-typescript-data-types) + - 5.2 [Font registration](#52-font-registration) + - 5.3 [Shared styles](#53-shared-styles) + - 5.4 [Reusable components](#54-reusable-components) + - 5.5 [Page components](#55-page-components) + - 5.6 [Top-level Document](#56-top-level-document) + - 5.7 [Server-side render helper](#57-server-side-render-helper) +6. [Implementing the DOCX template generator](#6-implementing-the-docx-template-generator) + - 6.1 [The Jinja-in-DOCX strategy](#61-the-jinja-in-docx-strategy) + - 6.2 [The 3-row table loop trick](#62-the-3-row-table-loop-trick) + - 6.3 [Multi-section layout](#63-multi-section-layout) + - 6.4 [Building paragraphs and tables](#64-building-paragraphs-and-tables) +7. [Layout calibration](#7-layout-calibration-matching-the-standard) +8. [Verification workflow](#8-verification-workflow) +9. [Common modifications](#9-common-modifications) +10. [Troubleshooting](#10-troubleshooting) +11. [Porting to a different form](#11-porting-to-a-different-form) + +--- + +## 1. The problem we're solving + +The "Sáng kiến" application is a Vietnamese government form (Đại học Y Dược TP.HCM) that has six sections — a cover page (Trang bìa) plus Mẫu số 01–04 plus Bản cam kết. Every applicant fills out the same skeleton with their own data. + +Two real-world workflows need to be supported: + +1. **Programmatic PDF generation** — a web service receives JSON, returns a printable PDF. No human edits the file before printing. +2. **Word-based filling** — an admin opens a `.docx` template in Word, types into it (or uses `docxtpl`/`Carbone`/etc. to merge JSON), and prints. + +Both outputs must look identical to the official reference document (`Sang_kien_SOP_dong_vat`). The data shape (`data_blank.json`) is fixed by an existing system upstream and must not change. 
+ +The trick is keeping the two generators in sync — same layout, same data fields — while staying within each format's idioms. + +--- + +## 2. Architecture overview + +``` + ┌────────────────────┐ + │ data.json │ ← source of truth (data_blank.json shape) + └──────────┬─────────┘ + │ + ┌────────────────┴────────────────┐ + ▼ ▼ + ┌──────────────────────┐ ┌─────────────────────────┐ + │ React-PDF pipeline │ │ docx + docxtpl path │ + │ │ │ │ + │ data → React tree │ │ build-docx-template.ts │ + │ → PDF buffer │ │ generates .docx with │ + │ │ │ {{ }} placeholders │ + │ │ │ ↓ │ + │ │ │ docxtpl.render(data) │ + │ │ │ → filled .docx │ + └──────────┬───────────┘ └────────────┬────────────┘ + │ │ + ▼ ▼ + filled.pdf filled.docx +``` + +The PDF path uses **runtime composition** — a React component receives data as props and returns a tree of `<Page>`/`<View>`/`<Text>` elements. The renderer turns that into a PDF buffer. + +The DOCX path uses **template-based composition** — a build script (`build-docx-template.ts`) produces a `.docx` file *once*, with placeholder strings like `{{ mau_01.mo_dau }}` baked into the document body. At runtime, `docxtpl` (Python) or any other Jinja-aware OOXML tool reads that `.docx`, finds the placeholders, and replaces them with values from the JSON. + +Both pipelines are written against **the same data shape**: `src/types.ts` defines it, and the DOCX placeholders must use the same field paths. Adding a new field therefore means touching both sides, and because the placeholder strings are not type-checked, keeping them in step with `src/types.ts` is a manual check. + +--- + +## 3. Tech stack and rationale + +| Concern | Choice | Why | +|---|---|---| +| PDF rendering | `@react-pdf/renderer` v4 | Component-based, server- and browser-compatible. Uses Yoga for flexbox layout. Layouts are ordinary React components, so they compose like UI code. | +| Vietnamese font | `@expo-google-fonts/tinos` | Tinos is a metric-equivalent of Times New Roman (Apache 2.0) with the full Latin Extended Additional range — needed for `ư ơ ầ ậ ọ ặ` etc.
The `@expo-google-fonts/*` packages ship actual `.ttf` files (many other font packages ship only `.woff2`, which `@react-pdf/renderer` can't read; it accepts TTF and WOFF). | +| DOCX generation | `docx` v9 (npm) | Object-model API: build paragraphs, tables, sections in TypeScript, then `Packer.toBuffer()` produces a valid `.docx`. Maintained, typed, stable. | +| Templating engine | `docxtpl` (Python) | The most popular Jinja-style DOCX templater. Recognizes `{{ var }}`, `{% if %}`, and crucially `{%tr for %}` for table-row loops. Other engines such as `docx-templates` (JS) and Carbone use their own tag syntax, so the template's tags would need rewriting to use them. | +| TypeScript | 5.4 | Catches type errors at build time and gives autocompletion across all the data fields. | +| Test rendering | LibreOffice (`soffice`) | Used to convert `.docx` → `.pdf` so we can visually diff against the reference document. | + +**Why not a pure HTML-to-PDF approach (Puppeteer)?** It works, but it pulls in a full headless Chromium (a large install and a heavy runtime), and the output can vary with the fonts and browser build on each machine. React-PDF's layout depends only on the fonts you register, so the same input produces the same layout everywhere. + +**Why not just generate the DOCX and convert it to PDF?** That would solve the layout-sync problem but couples PDF generation to a heavy toolchain (LibreOffice). React-PDF runs in pure Node.js and works inside serverless environments. + +--- + +## 4.
Project setup from scratch + +```bash +mkdir sang-kien-pdf && cd sang-kien-pdf +npm init -y + +# Runtime dependencies +npm install @react-pdf/renderer react @expo-google-fonts/tinos docx + +# Dev dependencies +npm install -D typescript ts-node @types/react @types/node +``` + +Create `tsconfig.json`: + +```json +{ + "compilerOptions": { + "target": "ES2020", + "module": "commonjs", + "lib": ["ES2020", "DOM"], + "jsx": "react", + "outDir": "./dist", + "rootDir": "./", + "strict": true, + "esModuleInterop": true, + "skipLibCheck": true, + "forceConsistentCasingInFileNames": true, + "declaration": true, + "declarationMap": true, + "sourceMap": true, + "resolveJsonModule": true, + "moduleResolution": "node" + }, + "include": ["src/**/*", "example/**/*", "tools/**/*"], + "exclude": ["node_modules", "dist"] +} +``` + +The `jsx: "react"` setting selects the classic transform, which compiles JSX to `React.createElement` calls, so every `.tsx` file must `import React from "react"`. + +Add scripts to `package.json`: + +```json +{ + "scripts": { + "build": "tsc", + "generate": "ts-node example/generate-example.ts", + "generate:blank": "ts-node example/generate-example.ts --blank", + "build:docx": "ts-node tools/build-docx-template.ts" + } +} +``` + +--- + +## 5. Implementing the PDF generator + +### 5.1 TypeScript data types + +Start with the data shape. Every field in the JSON gets a strict TypeScript interface in `src/types.ts`. This is the single source of truth — every page component reads it, every change ripples out through the type system.
+ +```ts +// src/types.ts +export interface NgayKy { + ngay: string; + thang: string; + nam: string; +} + +export interface TrangBia { + ten_sang_kien: string; + tac_gia: string; + don_vi: string; + thong_tin_lien_he: string; + nam: string; +} + +export interface Mau01ApplyRow { + tt: string; + ten_to_chuc: string; + dia_chi: string; + linh_vuc: string; +} + +export interface Mau01HieuQua { + loi_ich_kinh_te: string; + hieu_qua_giang_day: string; + // … 8 more fields +} + +export interface Mau01 { + mo_dau: string; + ten_sang_kien: string; + // … + danh_sach_ap_dung: Mau01ApplyRow[]; + tinh_hieu_qua: Mau01HieuQua; + ngay_ky: NgayKy; + // … +} + +// … repeat for Mau02, Mau03, Mau04, BanCamKet + +export interface SangKienData { + trang_bia: TrangBia; + mau_01: Mau01; + mau_02: Mau02; + mau_03: Mau03; + mau_04: Mau04; + ban_cam_ket: BanCamKet; +} +``` + +Two design choices worth calling out: + +**All fields are strings (or string arrays).** Even numbers like "Tỷ lệ %" are strings. The form is for humans, not databases — values get rendered verbatim, and string-only types let users write `"15%"` or `"khoảng 15"` without coercion errors. + +**Array-shaped tables.** `danh_sach_tac_gia` is `Mau02AuthorRow[]`, not a fixed-size tuple. The page components iterate with `.map()`, and the DOCX template uses a `{%tr for %}` loop. Both handle 0, 1, or 100 rows. + +### 5.2 Font registration + +`@react-pdf/renderer` ships with three fonts (Helvetica, Times-Roman, Courier) and **none of them include Vietnamese glyphs**. If you skip this step, characters like `ư ơ ầ ậ` will render as blank space. 
+ +```ts +// src/fonts.ts +import { Font } from "@react-pdf/renderer"; + +let registered = false; + +export function registerFonts(): void { + if (registered) return; + + const regular = require.resolve( + "@expo-google-fonts/tinos/400Regular/Tinos_400Regular.ttf" + ); + const italic = require.resolve( + "@expo-google-fonts/tinos/400Regular_Italic/Tinos_400Regular_Italic.ttf" + ); + const bold = require.resolve( + "@expo-google-fonts/tinos/700Bold/Tinos_700Bold.ttf" + ); + const boldItalic = require.resolve( + "@expo-google-fonts/tinos/700Bold_Italic/Tinos_700Bold_Italic.ttf" + ); + + Font.register({ + family: "TimesVN", + fonts: [ + { src: regular }, + { src: italic, fontStyle: "italic" }, + { src: bold, fontWeight: "bold" }, + { src: boldItalic, fontWeight: "bold", fontStyle: "italic" }, + ], + }); + + Font.registerHyphenationCallback((word) => [word]); + registered = true; +} +``` + +Three things happen here: + +1. **`require.resolve()` finds the TTF on disk**: it returns an absolute path in Node (and under `ts-node`). Browser bundlers treat `require.resolve()` differently, so a browser build should import the `.ttf` files as asset URLs instead. +2. **One family, four variants**: the `fontWeight` and `fontStyle` keys let a `<Text>` styled with `fontWeight: "bold"` resolve to the bold TTF. +3. **Hyphenation callback returns `[word]`**: this disables React-PDF's default English hyphenator, which would chop Vietnamese words at random points. + +The `registered` boolean guards against re-registration if `registerFonts()` is called from multiple entry points. + +### 5.3 Shared styles + +`StyleSheet.create()` in `src/styles.ts` defines reusable style objects.
Three categories matter: + +**Page-level constants.** A4 with ~2.5 cm margins: + +```ts +page: { + fontFamily: FONT, // "TimesVN" + fontSize: 13, // 13pt body + paddingTop: 71, // ~2.5cm = 71pt + paddingBottom: 71, + paddingLeft: 71, + paddingRight: 71, + lineHeight: 1.25, +}, +``` + +**Paragraph variants** for the three contexts that come up: + +```ts +// Indented body text (justified, first-line indent ~1cm) +paragraph: { textAlign: "justify", textIndent: 28, marginBottom: 0 }, + +// Flush-left lines (section labels, inline list items) +paragraphFlush: { textAlign: "justify", marginBottom: 0 }, + +// Section headings (flush-left, with breathing room above) +sectionHead: { textAlign: "justify", marginBottom: 0, marginTop: 4 }, +``` + +The `marginBottom: 0` is deliberate: Vietnamese government documents are visually dense, so spacing appears only between sections, not between adjacent paragraphs. + +**Component primitives** (table, checkbox, signature columns): + +```ts +table: { + flexDirection: "column", + borderWidth: 1, borderColor: "#000", + borderRightWidth: 0, borderBottomWidth: 0, // we draw R+B per-cell + marginVertical: 4, +}, +tableCell: { + borderRightWidth: 1, borderBottomWidth: 1, borderColor: "#000", + padding: 4, +}, +``` + +The table draws only its top and left edges, and every cell draws its own right and bottom edges, so each line is drawn exactly once and there are no double-thickness lines where cells meet. + +**Cover-specific styles** are isolated in their own group because the cover page has unique requirements (page border via `position: absolute`, "Mẫu số 01" badge in the top corner). + +### 5.4 Reusable components + +`src/components.tsx` factors out the patterns that show up on multiple pages: + +**`<Checkbox checked>label</Checkbox>`**: a horizontal row with a bordered square. When `checked` is true, an inner filled `<View>` appears inside the square. We don't use the Unicode `☑` character because Tinos doesn't include it; drawing geometry is font-independent. + +```tsx +export const Checkbox: React.FC = ({ checked, children }) => ( + + + {checked ?
: null} + + {children} + +); +``` + +**Header variants** — three different two-column header patterns appear in the document: + +- `` — "BỘ Y TẾ / ĐẠI HỌC Y DƯỢC" left, "CỘNG HÒA…" right (Mẫu 03/04) +- `` — drops "BỘ Y TẾ", shows the unit name in bold (Mẫu 02) +- `` — only the right column (Bản cam kết) + +Each one uses the same `flexDirection: "row"` layout with two equal columns. The differences are which lines appear. + +**Table primitives.** + +```tsx + + + STT + Họ và tên + {/* … */} + + {data.danh_sach_tac_gia.map((row, i) => ( + + {row.stt} + {row.ho_ten} + {/* … */} + + ))} +
+``` + +The `width` prop is a **percentage** (the cell renders with `width: ${width}%`). Column widths must sum to 100. The `Cell` component automatically wraps string children in `<Text>` so callers can pass either plain text or nested elements. + +**`<DateLine>`** renders the recurring "TP. Hồ Chí Minh, ngày … tháng … năm …" line, with sensible blank-data placeholders (`.....`). + +**The signature-column component** renders one column of a two-column signature block (centered title, italic subtitle, then a 50pt vertical gap before the bold signer's name). + +### 5.5 Page components + +Each section of the form gets its own component file in `src/pages/`. They all follow the same shape: + +```tsx +// src/pages/Mau01.tsx +import React from "react"; +import { Page, View, Text } from "@react-pdf/renderer"; +import { styles } from "../styles"; +import { Mau01 } from "../types"; +import { Table, Row, Cell, DateLine } from "../components"; + +interface Props { + data: Mau01; + donVi: string; // pulled from mau_02.don_vi by the parent +} + +// styles.title and styles.italic are illustrative names; styles.page, +// styles.paragraph and styles.paragraphFlush are the ones defined in §5.3. +export const Mau01Page: React.FC<Props> = ({ data, donVi }) => ( + <Page size="A4" style={styles.page}> + <Text style={styles.title}>BÁO CÁO MÔ TẢ SÁNG KIẾN</Text> + + <Text style={styles.paragraphFlush}> + 1. Mở đầu{" "} + <Text style={styles.italic}> + (Giới thiệu về những vấn đề liên quan đến sáng kiến…): + </Text> + </Text> + <Text style={styles.paragraph}>{data.mo_dau}</Text> + + {/* … rest of the page */} + </Page> +); +``` + +Three patterns recur in every page: + +1. **Static + dynamic mixed in the same `<Text>`.** Section labels like "1. Mở đầu" are fixed, but the italic instructional helper text and the data value next to them aren't. We use nested `<Text>` to apply different styles to different runs in one paragraph (because `<Text>` in React-PDF can contain other `<Text>` nodes, like `<span>` in HTML). + +2. **`{" "}` for explicit whitespace.** JSX collapses whitespace between elements. To preserve a space between a label and an italic helper, we explicitly insert `{" "}`. + +3. **Default-empty rows for tables.** When `data.danh_sach_ap_dung` is empty, we still want one blank row to render so the printed form has a place to write. The pattern: + ```tsx + {(data.danh_sach_ap_dung && data.danh_sach_ap_dung.length > 0 + ?
data.danh_sach_ap_dung + : [{ tt: "", ten_to_chuc: "", dia_chi: "", linh_vuc: "" }] + ).map((row, i) => /* ... */)} + ``` + +**Signature block on Mẫu 01 takes `donVi` as a prop**, not from `data` directly. The reason: the standard layout uses the unit name from Mẫu 02 (`mau_02.don_vi`) on Mẫu 01's signature line. Rather than duplicate the value in the JSON, the parent component (`SangKienDocument`) reads it from `mau_02` and passes it down. + +**Cover page is special.** It uses absolute positioning to put the page border around the entire content area: + +```tsx +// Style names other than styles.coverBorder are illustrative. +<Page size="A4" style={styles.coverPage}> + <Text style={styles.coverBadge}>Mẫu số 01</Text> + <View style={styles.coverBorder} fixed /> + <View style={styles.coverContent}> + {/* header, title, fields, footer */} + </View> +</Page> +``` + +The `fixed` prop on the border `<View>` tells React-PDF to render it on every page in this section (irrelevant here since the cover is one page, but harmless), and `position: absolute` (set in `styles.coverBorder`) makes it overlay the whole page. + +### 5.6 Top-level Document + +`src/SangKienDocument.tsx` composes all six pages: + +```tsx +export const SangKienDocument: React.FC<{ data: SangKienData }> = ({ data }) => { + registerFonts(); + const donVi = data.mau_02.don_vi || data.trang_bia.don_vi; + + // Page component names other than Mau01Page are illustrative. + return ( + <Document> + <CoverPage data={data.trang_bia} /> + <Mau01Page data={data.mau_01} donVi={donVi} /> + <Mau02Page data={data.mau_02} /> + <Mau03Page data={data.mau_03} /> + <Mau04Page data={data.mau_04} /> + <BanCamKetPage data={data.ban_cam_ket} /> + </Document> + ); +}; +``` + +`registerFonts()` is idempotent (the internal `registered` flag guards against duplicate registration), so calling it from the top-level component is safe. + +The `<Document>` element accepts metadata props (`title`, `author`, `subject`, `creator`, `producer`, `keywords`) that are written into the PDF's document properties; many viewers show the title in their title bar. These don't affect rendering.
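The blank-row fallback from §5.5 and the `donVi` fallback above are plain data transformations, so one option is to pull them out of the JSX, where they can be unit-tested without rendering a PDF. A minimal sketch, assuming hypothetical helpers `withBlankRow` and `resolveDonVi` (neither exists in `src/`) and trimmed copies of the relevant types from `src/types.ts`:

```typescript
// Hypothetical helpers, not part of src/; they only illustrate the pattern.

// Trimmed copies of fields from src/types.ts (the real interfaces have more).
interface Mau01ApplyRow {
  tt: string;
  ten_to_chuc: string;
  dia_chi: string;
  linh_vuc: string;
}

interface DonViSource {
  mau_02: { don_vi: string };
  trang_bia: { don_vi: string };
}

// Return the rows if there are any; otherwise a single blank row, so the
// printed form still has an empty line to fill in by hand.
export function withBlankRow<T>(rows: readonly T[] | undefined, blank: T): T[] {
  return rows && rows.length > 0 ? [...rows] : [blank];
}

// Same fallback as SangKienDocument: prefer Mẫu 02's unit name, else the cover's.
// `||` (rather than `??`) so an empty string also falls through.
export function resolveDonVi(data: DonViSource): string {
  return data.mau_02.don_vi || data.trang_bia.don_vi;
}

export const BLANK_APPLY_ROW: Mau01ApplyRow = {
  tt: "",
  ten_to_chuc: "",
  dia_chi: "",
  linh_vuc: "",
};
```

A page would then map over `withBlankRow(data.danh_sach_ap_dung, BLANK_APPLY_ROW)` instead of repeating the inline ternary for every table.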
+ +### 5.7 Server-side render helper + +`src/generate.tsx` wraps the React rendering in a Node-friendly Promise: + +```tsx +import * as fs from "fs"; +import * as path from "path"; +import React from "react"; +import { pdf } from "@react-pdf/renderer"; +import { SangKienDocument } from "./SangKienDocument"; +import { SangKienData } from "./types"; + +export async function renderSangKienPdf(data: SangKienData): Promise<Buffer> { + const instance = pdf(<SangKienDocument data={data} />); + const blob = await instance.toBlob(); + const arrayBuffer = await blob.arrayBuffer(); + return Buffer.from(arrayBuffer); +} + +export async function renderSangKienPdfFromFile( + inputJsonPath: string, + outputPdfPath: string +): Promise<void> { + const data = JSON.parse(fs.readFileSync(inputJsonPath, "utf-8")) as SangKienData; + const buffer = await renderSangKienPdf(data); + fs.mkdirSync(path.dirname(outputPdfPath), { recursive: true }); + fs.writeFileSync(outputPdfPath, buffer); +} +``` + +`pdf(...).toBlob()` is the cleanest async API even on the server — the `Buffer.from(await blob.arrayBuffer())` conversion is one line. + +`example/generate-example.ts` is a thin CLI on top. Because `tsconfig.json` compiles to CommonJS, where top-level `await` is not allowed, the call is wrapped in an async `main()`: + +```ts +import * as path from "path"; +import { renderSangKienPdfFromFile } from "../src/generate"; + +async function main(): Promise<void> { + const useBlank = process.argv.includes("--blank"); + const inputPath = useBlank + ? path.join(__dirname, "data-blank.json") + : path.join(__dirname, "sample-data.json"); + const outputPath = path.join(__dirname, "..", "out", `sang-kien-${useBlank ? "blank" : "filled"}.pdf`); + await renderSangKienPdfFromFile(inputPath, outputPath); +} + +main().catch((err) => { console.error(err); process.exit(1); }); +``` + +--- + +## 6. Implementing the DOCX template generator + +### 6.1 The Jinja-in-DOCX strategy + +`docxtpl` works by storing Jinja-style strings *as ordinary text* inside the DOCX, then doing template expansion at render time. The build script's job is to produce a `.docx` whose visible text reads: + +> **Tên sáng kiến (Tiếng Việt):** {{ trang_bia.ten_sang_kien }} + +When you open this in Word, you literally see those curly braces. When `docxtpl` opens it, it walks the OOXML tree, finds runs containing `{{ ... }}`, and replaces them. + +**The catch: text runs split across formatting changes.** If you write `Tên sáng kiến (Tiếng Việt): {{ trang_bia.ten_sang_kien }}` in one run, that's fine.
But if you bold "Tên sáng kiến" and leave `{{ … }}` regular, Word stores them as **two separate runs**. A naive search for `{{` in the second run works — but if you split a placeholder *inside* the curly braces (`{{ trang_bia.` in one run, `ten_sang_kien }}` in another), `docxtpl` will fail silently. So: + +> **Rule:** every placeholder must live entirely inside one continuous run with one set of formatting. + +The `docx` library makes this easy — when you write `r("{{ mau_01.mo_dau }}")`, that's exactly one `<w:r>` element with one `<w:t>` inside. + +### 6.2 The 3-row table loop trick + +For repeating table rows, `docxtpl` uses a special syntax: `{%tr for item in collection %}` and `{%tr endfor %}`. The `tr` prefix tells the engine "remove the entire `<w:tr>` row containing this tag and use the rows between `for` and `endfor` as the loop body." + +A naive single-row pattern doesn't work: + +``` +[ {%tr for x in items %} {{ x.id }} | {{ x.name }} {%tr endfor %} ] +``` + +A `{%tr %}` tag consumes the whole row it sits in: the row is replaced by the tag, so the `{{ x.id }}` and `{{ x.name }}` cells that were meant to be the loop body are discarded along with it, and there is no row left to repeat. + +The reliable pattern is **three rows**: + +``` +Row 1: | {%tr for item in collection %} | (empty cells) | +Row 2: | {{ item.id }} | {{ item.name }} | ← duplicated per item +Row 3: | {%tr endfor %} | (empty cells) | +``` + +Row 1 and Row 3 get stripped. Row 2 gets repeated for each item. The data row carries the actual `{{ }}` fields. + +In code: + +```ts +const aw = [6, 22, 14, 16, 14, 14, 14]; // column widths + +const emptyRow_aw = (firstText: string) => { + const cells: TableCell[] = []; + for (let i = 0; i < aw.length; i++) { + cells.push(new TableCell({ + borders: allThinBorders, + width: { size: aw[i] * 100, type: WidthType.PERCENTAGE }, + children: [new Paragraph({ children: [r(i === 0 ?
firstText : " ")] })], + })); + } + return cells; +}; + +new Table({ + rows: [ + new TableRow({ children: [/* header cells */] }), + new TableRow({ children: emptyRow_aw("{%tr for item in mau_02.danh_sach_tac_gia %}") }), + new TableRow({ children: [ + dataCell("{{ item.stt }}", aw[0], AlignmentType.CENTER), + dataCell("{{ item.ho_ten }}", aw[1]), + // … 5 more + ]}), + new TableRow({ children: emptyRow_aw("{%tr endfor %}") }), + ], +}); +``` + +The `emptyRow_aw` helper builds a row where the first cell contains the loop tag and the rest are blanks (just `" "`). After `docxtpl` strips it, the visible table has one header row plus one data row per item. + +### 6.3 Multi-section layout + +Word documents are split into **sections**, each with its own page settings — margins, orientation, page borders, headers, footers. The cover page needs: + +- A **page border** (rounded rectangle around the content area) +- A **header** containing "Mẫu số 01" at the top right *outside* the border + +The rest of the document needs: + +- **No** page border +- **No** "Mẫu số 01" header (it's only on the cover) + +In `docx` v9, this is two sections in the same document: + +```ts +new Document({ + sections: [ + { + properties: { + page: { + size: { width: 11906, height: 16838, orientation: PageOrientation.PORTRAIT }, + margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 }, + borders: { + pageBorderTop: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 }, + pageBorderBottom: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 }, + pageBorderLeft: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 }, + pageBorderRight: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 }, + }, + }, + }, + headers: { default: coverHeader }, // contains "Mẫu số 01" + children: buildCoverPage(), + }, + { + properties: { + page: { size: {/*…*/}, margin: {/*…*/} /* no borders */ }, + }, + // Explicit empty header so the cover header doesn't leak 
onto subsequent pages + headers: { default: new Header({ children: [new Paragraph({ children: [r("")] })] }) }, + children: [ + ...buildMau01(), + ...buildMau02(), + ...buildMau03(), + ...buildMau04(), + ...buildBanCamKet(), + ], + }, + ], +}); +``` + +Two gotchas worth noting: + +**Twips, not points.** `docx` uses twips (1/1440 inch). Multiply pt by 20 to get twips: +- A4 = 11906 × 16838 twips +- 1 inch margin = 1440 twips +- 1 cm = 567 twips + +**Headers leak across sections.** If section 2 doesn't define `headers`, it inherits section 1's. We have to provide an explicit empty `Header` to prevent the "Mẫu số 01" text from showing up on every page of the document. + +### 6.4 Building paragraphs and tables + +The build script defines small helper functions to keep the body code readable: + +```ts +const FONT = "Times New Roman"; +const SIZE = 26; // 13pt (docx-js uses half-points) +const SIZE_HEADING = 28; // 14pt + +function r(text: string, opts: { bold?: boolean; italic?: boolean; underline?: boolean; size?: number } = {}) { + return new TextRun({ + text, + font: FONT, + size: opts.size ?? SIZE, + bold: opts.bold, + italics: opts.italic, + underline: opts.underline ? { type: UnderlineType.SINGLE } : undefined, + }); +} + +function bodyP(children: TextRun[], opts: { indent?: boolean } = {}) { + return new Paragraph({ + children, + alignment: AlignmentType.JUSTIFIED, + indent: opts.indent ? { firstLine: 567 } : undefined, + spacing: { before: 0, after: 0, line: 300 }, + }); +} + +function flushP(children: TextRun[], opts: { spaceBefore?: number } = {}) { + return new Paragraph({ + children, + alignment: AlignmentType.JUSTIFIED, + spacing: { before: opts.spaceBefore ?? 0, after: 0, line: 300 }, + }); +} + +function centerP(children: TextRun[], opts: { spaceBefore?: number; spaceAfter?: number } = {}) { + return new Paragraph({ + children, + alignment: AlignmentType.CENTER, + spacing: { before: opts.spaceBefore ?? 0, after: opts.spaceAfter ?? 
0, line: 300 }, + }); +} +``` + +A typical section then reads naturally: + +```ts +out.push(centerP([r("BÁO CÁO MÔ TẢ SÁNG KIẾN", { bold: true, size: SIZE_HEADING })])); + +out.push(flushP([ + r("1. Mở đầu "), + r("(Giới thiệu về những vấn đề liên quan…):", { italic: true }), +])); +out.push(bodyP([r("{{ mau_01.mo_dau }}")], { indent: true })); +``` + +For checkboxes, since the templating engine has to choose which character to render, we embed the choice in the placeholder itself: + +```ts +const checkbox = (cond: string, label: string) => + flushP([ + r(`{% if ${cond} %}`), + r("☑"), + r("{% else %}"), + r("☐"), + r("{% endif %} "), + r(label), + ]); + +out.push(checkbox( + "mau_02.phan_loai.giai_phap_ky_thuat", + "Giải pháp kỹ thuật, quản lý, tác nghiệp, ứng dụng tiến bộ kỹ thuật áp dụng cho Đại học Y Dược TP.HCM" +)); +``` + +After `docxtpl` runs, this paragraph reduces to `☑ Giải pháp kỹ thuật…` or `☐ Giải pháp kỹ thuật…` depending on the boolean. (For DOCX rendering in Word, the `☑/☐` characters work fine because Word falls back to a Unicode-capable font automatically — unlike React-PDF.) + +--- + +## 7. Layout calibration (matching the standard) + +The "Sang_kien_SOP_dong_vat" reference document defines a specific visual style. 
Here's a checklist of the calibrations applied to both generators: + +| Aspect | Rule | Where it lives | +|---|---|---| +| Body font | Times New Roman (or Tinos) 13pt | `styles.page.fontSize`, `r()` `SIZE = 26` | +| Page margins | 2.5 cm all around | `padding: 71` (PDF), `margin: 1440` (DOCX) | +| Body line height | 1.25 | `lineHeight: 1.25` (PDF), `line: 300` (DOCX, 240 = single, 300 ≈ 1.25) | +| First-line indent | ~1 cm on body paragraphs | `textIndent: 28` (PDF), `firstLine: 567` (DOCX) | +| Section numbers (`1.`, `2.`, `4.1`) | **NOT bold**; italic instructions in parens | Use `paragraphFlush` not bold | +| Inter-paragraph spacing | None within a section, small gap before new section | `marginBottom: 0`, `sectionHead.marginTop: 4` | +| Cover page | Page border (rounded rect), "Mẫu số 01" outside top-right | Cover-specific styles, dedicated section in DOCX | +| Cover divider | `=====***=====` (literal) | Hardcoded string | +| Cover info fields | Left-aligned, **bold label**, regular value | `coverField` style | +| Two-column header | "ĐƠN VỊ" or "BỘ Y TẾ" left, "CỘNG HÒA" right | `TopHeaderBoYTe`, `TopHeaderDonVi`, `TopHeaderCongHoa` | +| "Độc lập – Tự do – Hạnh phúc" | Underlined, bold | `underline: true` flag in `r()`/styles | +| Tables | Single thin black border, no shaded header | `borderWidth: 1`, no `backgroundColor` on `tableHeaderCell` | +| Mẫu 02 author table column 7 | Header includes parenthetical italic instruction | Custom `TableCell` with two centered paragraphs | +| Signature block | Two columns: "Xác nhận của lãnh đạo / [đơn vị]" left, "Đại diện nhóm tác giả sáng kiến" right | `` (PDF), borderless 2-cell table (DOCX) | +| Mẫu 03 totals row | TỔNG (cols 1–3 merged) ‖ 100 ‖ blank | `columnSpan: 3` in DOCX, manual width sum in PDF | +| Mẫu 04 evaluation rubric | Two scoring rows + total row at bottom | Static text + `{{ … }}` for nhận xét/điểm | + +When in doubt about a layout decision, open the reference DOCX in Word, click into the relevant 
element, and read its formatting from the ribbon. Mirror those settings in code. + +--- + +## 8. Verification workflow + +Visual diff against the reference is the only reliable way to know you got it right. The flow: + +```bash +# 1. Generate the candidate PDF +npm run generate + +# 2. Convert each page to JPEG +pdftoppm -jpeg -r 100 out/sang-kien-filled.pdf out/page + +# 3. Convert the reference DOCX to PDF and JPEGs the same way +soffice --headless --convert-to pdf reference.docx --outdir ref/ +pdftoppm -jpeg -r 100 ref/reference.pdf ref/ref-page + +# 4. Open them side by side +``` + +For the DOCX generator, add one more step: + +```bash +# Build the template +npm run build:docx + +# Render placeholders WITHOUT filling them — does the layout look right? +soffice --headless --convert-to pdf out/template_application_form.docx --outdir out/ + +# Fill it with sample data and render +python tools/fill-docx.py example/sample-data.json out/sang-kien-filled.docx +soffice --headless --convert-to pdf out/sang-kien-filled.docx --outdir out/ +``` + +Smoke test the DOCX template in Python before declaring victory: + +```python +# tools/test-docx-fill.py +from docxtpl import DocxTemplate +import json + +with open("example/sample-data.json", encoding="utf-8") as f: + data = json.load(f) + +doc = DocxTemplate("out/template_application_form.docx") +doc.render(data) +doc.save("out/template-filled-test.docx") +``` + +If `docxtpl` raises `TemplateSyntaxError: Encountered unknown tag 'endfor'`, you've put a `{%tr for %}` and `{%tr endfor %}` in the same row instead of separate rows. Go re-read [§6.2](#62-the-3-row-table-loop-trick). + +If a `{{ field }}` doesn't get replaced and you can still see the curly braces in the filled output, the placeholder got split across runs by Word's auto-formatting. Build the placeholder with one `r("{{ x }}")` call, not three. + +--- + +## 9. Common modifications + +### Adding a new field + +Say you need to add `mau_01.tong_kinh_phi` (total budget). 
+ +1. **Update `src/types.ts`:** + ```ts + export interface Mau01 { + // … + tong_kinh_phi: string; // new + } + ``` + +2. **Update `example/data-blank.json`** and **`example/sample-data.json`** with the new field. + +3. **Render it in `src/pages/Mau01.tsx`:** + ```tsx + + 7. Tổng kinh phí: {data.tong_kinh_phi} + + ``` + +4. **Add it to the DOCX template generator** in `tools/build-docx-template.ts`: + ```ts + out.push(flushP([r("7. Tổng kinh phí: {{ mau_01.tong_kinh_phi }}")])); + ``` + +5. **Regenerate:** + ```bash + npm run generate + npm run build:docx + ``` + +The TypeScript compiler will yell if you forget to update the page component or miss a field in the JSON. + +### Changing a column width + +Column widths are kept as small integer arrays in the page component (PDF) and the build script (DOCX). They must always sum to 100. + +To widen the "Họ và tên" column on the Mẫu 02 author table from 22% to 28% (and shrink "Nơi công tác" from 16% to 10%): + +In `src/pages/Mau02.tsx`: +```ts +const AUTHOR_WIDTHS = [6, 28, 14, 10, 14, 14, 14] as const; // was [6, 22, 14, 16, …] +``` + +In `tools/build-docx-template.ts` (inside `buildMau02()`): +```ts +const aw = [6, 28, 14, 10, 14, 14, 14]; +``` + +Both numbers must match — there's no shared constant because the PDF widths are percentages of the page width (100% sum) while the DOCX widths happen to use the same convention but go through different code paths. Keeping them in sync is a manual discipline. + +### Adding a new repeating table + +Both the data shape, the page component, and the DOCX template need updates: + +1. **Type:** add `Mau01NewRow[]` to `Mau01`, define `interface Mau01NewRow { … }`. + +2. **PDF page:** mirror the existing pattern in `src/pages/Mau01.tsx`: + ```tsx + + + TT + {/* … */} + + {(data.danh_sach_moi && data.danh_sach_moi.length > 0 + ? data.danh_sach_moi + : [{ tt: "", ... }] + ).map((row, i) => ( + + {row.tt} + {/* … */} + + ))} +
+ ``` + + 3. **DOCX template:** use the 3-row pattern from [§6.2](#62-the-3-row-table-loop-trick): + ```ts + const w = [10, 30, 30, 30]; + const emptyRow = (firstText: string) => /* same helper pattern */; + + new Table({ + rows: [ + new TableRow({ children: [headerCell("TT", w[0]), /* … */] }), + new TableRow({ children: emptyRow("{%tr for item in mau_01.danh_sach_moi %}") }), + new TableRow({ children: [dataCell("{{ item.tt }}", w[0], AlignmentType.CENTER), /* … */] }), + new TableRow({ children: emptyRow("{%tr endfor %}") }), + ], + }); + ``` + +### Switching to your organization's font + +Replace the four TTF paths in `src/fonts.ts`: + +```ts +Font.register({ + family: "TimesVN", + fonts: [ + { src: "/path/to/your/Regular.ttf" }, + { src: "/path/to/your/Italic.ttf", fontStyle: "italic" }, + { src: "/path/to/your/Bold.ttf", fontWeight: "bold" }, + { src: "/path/to/your/BoldItalic.ttf", fontWeight: "bold", fontStyle: "italic" }, + ], +}); +``` + +For the DOCX side, change `const FONT = "Times New Roman"` in `tools/build-docx-template.ts` to the font you want. The `.docx` only names the font (nothing is embedded), so Word substitutes a fallback if that font isn't installed on the reader's machine; prefer common names (Times New Roman, Arial, Calibri). + +--- + +## 10. Troubleshooting + +**PDF renders blank squares where Vietnamese characters should be.** +The font isn't registered or the registered font lacks Vietnamese glyphs. Check that `registerFonts()` is called and that the TTFs at the resolved paths are actually loaded (not 404 / missing). Tinos has the right glyph coverage; many "Times New Roman clones" don't. + +**`Error: Failed to fetch font from https://…`** +You're hitting `@react-pdf/renderer`'s URL-based font loading and your environment can't reach the URL. Switch to local TTFs via `require.resolve()` (already what `src/fonts.ts` does).
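Both font problems above come down to which file actually reaches `Font.register`. A cheap pre-flight check can flag a missing path or a web-font file before any rendering happens. A minimal sketch for Node, assuming hypothetical helpers `sniffFontFile` and `assertTrueTypeFont` (not part of `src/fonts.ts`); it reads only the first four bytes, which hold the sfnt version tag for TrueType/OpenType files and the `wOFF`/`wOF2` signature for WOFF/WOFF2:

```typescript
import * as fs from "fs";

// Hypothetical pre-flight helpers; not part of src/fonts.ts.
export type FontKind = "ttf" | "otf" | "woff" | "woff2" | "missing" | "unknown";

// Identify a font file by its first four bytes.
export function sniffFontFile(filePath: string): FontKind {
  if (!fs.existsSync(filePath)) return "missing";
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(4);
    const bytesRead = fs.readSync(fd, header, 0, 4, 0);
    if (bytesRead < 4) return "unknown";
    const tag = header.toString("latin1");
    // TrueType: 0x00010000 (or "true" on some Apple fonts).
    if (header.readUInt32BE(0) === 0x00010000 || tag === "true") return "ttf";
    if (tag === "OTTO") return "otf"; // OpenType with CFF outlines
    if (tag === "wOFF") return "woff";
    if (tag === "wOF2") return "woff2";
    return "unknown";
  } finally {
    fs.closeSync(fd);
  }
}

// Fail early with a readable message instead of rendering blank glyphs later.
export function assertTrueTypeFont(filePath: string): void {
  const kind = sniffFontFile(filePath);
  if (kind !== "ttf") {
    throw new Error(`Expected a .ttf font at ${filePath}, found: ${kind}`);
  }
}
```

Calling `assertTrueTypeFont(regular)` right after each `require.resolve()` in `registerFonts()` would turn a silent bad render into an immediate error. It checks the container format only, not whether the font actually covers Vietnamese glyphs.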
+ +**`docxtpl` raises `TemplateSyntaxError: Encountered unknown tag 'endfor'`.** +You put the `{%tr for %}` and `{%tr endfor %}` tags in the *same* table row. Re-read [§6.2](#62-the-3-row-table-loop-trick) — they have to be on separate rows. + +**Some `{{ field }}` placeholders aren't being replaced.** +Word split your text run mid-placeholder. Make sure each placeholder is constructed with a single `r("{{ x }}")` call, not split across multiple `r()` calls or assembled from concatenated strings. + +**The DOCX has "Mẫu số 01" appearing on every page, not just the cover.** +The cover-section header is leaking into the next section. Add an explicit empty header to the second section: +```ts +headers: { default: new Header({ children: [new Paragraph({ children: [r("")] })] }) }, +``` + +**Tables overflow the right margin.** +Column width percentages don't sum to exactly 100, or a single cell has too much wide content with no wrap point. Either fix the widths or add `wordBreak: "break-word"` to the cell style. + +**`textIndent` doesn't seem to work in ``.** +React-PDF's `textIndent` only takes effect when the `` *itself* has `display: "block"`-like behavior — i.e. it's a top-level paragraph, not nested inside another ``. If you're nesting, wrap the inner content in a parent `` that has the indent style. + +**The DOCX page border doesn't appear.** +Page borders are a Word feature configured in section properties. Check that you've set all four (`pageBorderTop/Bottom/Left/Right`), with non-zero `size` and a `space` value (24 puts them ~1.7cm from the edge in our setup). LibreOffice and Word may render them slightly differently — Word is the canonical view. + +**Filled DOCX has weird extra empty rows above each table.** +Those are the `{%tr for %}`/`{%tr endfor %}` rows that didn't get stripped — meaning the loop tags ended up in paragraphs *inside* a cell, not as standalone row text. 
Make sure the `firstText` in your `emptyRow_*()` helper is the entire cell content, not appended to other text. + +--- + +## 11. Porting to a different form + +The same pattern works for any structured government form. The migration steps: + +1. **Extract the data model.** Open the reference DOCX, list every blank line and every table column. Each becomes a field in `types.ts`. Repeating sections (lists of authors, lists of attachments) become arrays. + +2. **Identify the sections.** Most forms have a cover page plus N body sections. Each body section becomes a `` component plus a `buildSectionN()` function in the DOCX builder. + +3. **Catalog the visual primitives.** Headers, signature blocks, tables, checkboxes, date lines: write them once in `components.tsx` (PDF) and as helper functions (DOCX), then reuse. + +4. **Calibrate the styles.** Open the reference, measure margins, font, line spacing, and indent. Set them as constants. See [§7](#7-layout-calibration-matching-the-standard). + +5. **Render and diff.** Generate, convert to JPEG, line up against the reference. Iterate until they match. + +6. **Smoke-test the DOCX template** with `docxtpl`. If a placeholder doesn't fill, it's almost always run-splitting; fix it by collapsing the placeholder into one `r()` call. + +The most labor-intensive part is the visual calibration (steps 4–5). Everything else is mechanical translation from "what the form looks like" to "code that produces the same thing."
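During calibration, the rule from "Changing a column width" (the PDF and DOCX arrays must each sum to 100 and stay identical) is easy to break, and the "Tables overflow the right margin" symptom above is the usual result. A minimal sketch of a check that could run in a unit test; `checkColumnWidths` is hypothetical and not exported by the repo, and the two arrays are inline copies of the Mẫu 02 author widths:

```typescript
// Hypothetical consistency check for one table's PDF and DOCX column widths.
// Returns human-readable problems; an empty array means the widths are consistent.
function checkColumnWidths(
  table: string,
  pdfWidths: readonly number[],
  docxWidths: readonly number[],
): string[] {
  const sum = (ws: readonly number[]): number => ws.reduce((acc, w) => acc + w, 0);
  const problems: string[] = [];

  if (sum(pdfWidths) !== 100) {
    problems.push(`${table}: PDF widths sum to ${sum(pdfWidths)}, expected 100`);
  }
  if (sum(docxWidths) !== 100) {
    problems.push(`${table}: DOCX widths sum to ${sum(docxWidths)}, expected 100`);
  }
  const same =
    pdfWidths.length === docxWidths.length &&
    pdfWidths.every((w, i) => w === docxWidths[i]);
  if (!same) {
    problems.push(
      `${table}: PDF [${pdfWidths.join(", ")}] differs from DOCX [${docxWidths.join(", ")}]`,
    );
  }
  return problems;
}

const AUTHOR_WIDTHS = [6, 28, 14, 10, 14, 14, 14] as const; // copy of src/pages/Mau02.tsx
const aw = [6, 28, 14, 10, 14, 14, 14]; // copy of buildMau02() in tools/build-docx-template.ts
console.log(checkColumnWidths("Mẫu 02 authors", AUTHOR_WIDTHS, aw)); // []
```

Using it against the real arrays would require exporting them (today `aw` is local to `buildMau02()`); that refactor is an assumption, not current behavior.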
+ +--- + +## Appendix: file-by-file inventory + +| File | Lines | Purpose | +|---|---:|---| +| `src/types.ts` | 177 | TypeScript interfaces matching `data-blank.json` | +| `src/fonts.ts` | 56 | Tinos font registration | +| `src/styles.ts` | 239 | Shared `StyleSheet.create()` styles | +| `src/components.tsx` | 156 | Reusable ``, ``, ``, header variants | +| `src/pages/CoverPage.tsx` | 64 | Trang bìa (cover page) with page border | +| `src/pages/Mau01.tsx` | 172 | Báo cáo mô tả sáng kiến (initiative description report) | +| `src/pages/Mau02.tsx` | 206 | Đơn đề nghị công nhận sáng kiến (application for initiative recognition) | +| `src/pages/Mau03.tsx` | 82 | Bản xác nhận tỷ lệ đóng góp (contribution-ratio confirmation) | +| `src/pages/Mau04.tsx` | 94 | Phiếu đánh giá sáng kiến (initiative evaluation form) | +| `src/pages/BanCamKet.tsx` | 119 | Bản cam kết (commitment statement) | +| `src/SangKienDocument.tsx` | 43 | Top-level `` composing all pages | +| `src/generate.tsx` | 37 | `renderSangKienPdf(data)` server-side helper | +| `src/index.ts` | 5 | Public API barrel | +| `tools/build-docx-template.ts` | 1301 | Generates the Jinja-style DOCX template | +| `tools/fill-docx.py` | ~30 | CLI to fill a template with JSON data via `docxtpl` | +| `tools/test-docx-fill.py` | ~25 | Smoke test script | +| `example/generate-example.ts` | ~35 | CLI for the PDF pipeline | +| `example/sample-data.json` | n/a | Realistic filled-in example | +| `example/data-blank.json` | n/a | All-empty template instance | + +Total: about **2750 lines** of TypeScript + ~50 lines of Python. The DOCX generator is the largest single file because every static line of body text is an `out.push(flushP([r("…")]))` call, but the pattern is repetitive and easy to skim. diff --git a/docs/PDF_converter.md b/docs/PDF_converter.md new file mode 100644 index 0000000..3aa79d4 --- /dev/null +++ b/docs/PDF_converter.md @@ -0,0 +1,363 @@ +# Specification: Browser-Based DOCX-to-PDF Converter + +**Status:** Ready for implementation +**Audience:** Frontend engineer (React + TypeScript) +**Estimated effort:** 1–2 days for a working component, +1 day for polish and tests + +--- + +## 1.
Overview + +This document specifies a React component, `DocxToPdfViewer`, that accepts a `.docx` file in the browser, renders it on screen with layout fidelity equivalent to Microsoft Word, and produces a downloadable PDF that matches the rendering page-for-page. The component runs entirely in the browser; no document content ever leaves the user's machine. + +The component is intended for use cases where users need to view a Word document and obtain a PDF copy without installing Word, opening a desktop converter, or trusting a third-party cloud service. Typical scenarios include legal forms, application packets, internal templates, and document submission flows where PDF is the required output format. + +## 2. Goals and Non-Goals + +### 2.1 Goals + +The component must preserve the document's page size, margins, fonts (where embedded or system-available), paragraph alignment, tables, inline and floating images, headers, footers, footnotes, bullet and numbered lists, and basic text formatting (bold, italic, underline, color, size). It must correctly render documents containing non-Latin scripts, with Vietnamese diacritics, CJK characters, and right-to-left scripts as concrete test cases. It must work on the current versions of Chromium-based browsers, Firefox, and Safari without server assistance. It must expose a clear TypeScript API and emit lifecycle events suitable for integration into larger applications. + +### 2.2 Non-Goals + +The output PDF is **rasterised**: each page is a JPEG image embedded in a PDF page of matching dimensions. Text in the output is therefore not selectable or searchable. If selectable text is required, the implementer should use a server-side converter (LibreOffice headless, Aspose, or a paid API) instead — this is documented in Section 12. + +The component does not edit, sign, redact, fill forms in, or otherwise modify the source document. It does not support `.doc` (legacy binary format); callers must convert to `.docx` upstream. 
It does not attempt to be a general-purpose Word viewer with comments, track changes, or revision history rendering; only the final accepted state is rendered. + +## 3. System Context + +The pipeline has three stages, executed in order: + +``` +┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ +│ .docx file │ -> │ docx-preview │ -> │ html2canvas │ -> │ jsPDF │ -> Blob +│ (Blob) │ │ (HTML) │ │ (Canvas[]) │ │ (PDF Blob) │ +└─────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ + │ + └─> visible to user as the on-screen preview +``` + +The rendered HTML serves a dual purpose: it is both the on-screen preview shown to the user *and* the source material from which the PDF is rasterised. There is no separate hidden render pass. This is a deliberate architectural choice; see Rule 3 in Section 7. + +## 4. Dependencies + +The implementation requires three runtime dependencies and their type definitions: + +| Package | Version | Purpose | +|---|---|---| +| `docx-preview` | `^0.3.5` | Parses `.docx` and renders to HTML with high layout fidelity. | +| `html2canvas` | `^1.4.1` | Rasterises a DOM subtree to an HTMLCanvasElement. | +| `jspdf` | `^2.5.1` | Assembles canvas images into a multi-page PDF. | + +`docx-preview` has a transitive runtime dependency on `jszip`, which it imports via its package; no direct install is required when bundling with npm. When loading via CDN, `jszip` must be loaded as a separate ` +``` + +For bundled React applications, install via npm; CDN choice does not apply. Verify each library's presence on `window` before the first conversion call and surface a clear error to the user if any failed to load. + +## 8. Error Handling and Edge Cases + +The implementation must handle the following scenarios gracefully: + +**Wrong file type.** When the user drops a `.pdf`, `.txt`, `.doc`, or any non-`.docx` file, the component shows an inline error message and does not enter the rendering stage. 
Validation is by file extension; MIME-type sniffing is unreliable across browsers. + +**Corrupted or malformed `.docx`.** `docx-preview` will throw during `renderAsync` if the file is not a valid OOXML package or contains unparseable XML. The error must be caught, the status set to `"error"`, and the error message surfaced to the user. The component must remain in a state where another file can be selected. + +**Empty document.** A valid `.docx` containing no content will produce an empty wrapper with no `
` elements. The implementation throws an explicit error rather than producing an empty PDF. + +**Images with restrictive CORS.** With `useBase64URL: true`, `docx-preview` inlines embedded images as data URLs and CORS does not apply. If the option is changed to `false`, externally hosted images will taint the canvas and cause `toDataURL` to throw a `SecurityError`. Do not change this option. + +**Very large documents.** Documents with more than ~50 pages may exhaust memory at `scale: 2` because each captured canvas is held in memory before being added to the PDF. For documents this large, the implementation should release each canvas (by setting its reference to null) immediately after `addImage` returns, and consider lowering `renderScale` to 1.5 when page count exceeds a threshold. + +**Mixed page orientations.** Documents that switch from portrait to landscape mid-flow are handled by the per-page dimension calculation in Section 6.4. Do not assume all pages share the first page's dimensions. + +**Rapid file changes.** If the user drops a second file while the first is still converting, the in-flight conversion must be cancelled or its results discarded. The simplest approach is to track an incrementing conversion ID; results from a non-current ID are ignored on completion. Without this guard, a slow first conversion that finishes after the second would overwrite the newer result, and its progress updates would confuse the status display. + +## 9. Performance Considerations + +For a typical 5-page A4 document, end-to-end conversion on mid-range 2024 hardware takes 1.5–3 seconds. The dominant cost is `html2canvas` capture, which scales roughly linearly with page count and quadratically with `renderScale` (the pixel count grows with the square of the scale). The `docx-preview` rendering stage typically takes 100–300 ms regardless of page count. PDF assembly is negligible. + +Memory peaks during the capture loop, which holds each page's canvas until it has been added to the PDF.
At `scale: 2` with US Letter pages (816 × 1056 CSS px at 96 dpi, captured as a 1632 × 2112 px canvas), a single canvas is approximately 14 MB of RGBA data (4 bytes per pixel). A 20-page document can briefly hold ~280 MB before garbage collection if canvases are not released after `addImage`. + +Output PDF file sizes for a 5-page document at default settings are approximately 1.5–3 MB. Lowering `imageQuality` from 0.95 to 0.85 typically reduces output by 30% with no visible degradation; lowering below 0.80 introduces visible JPEG artifacts on text edges. + +## 10. Browser Support + +The component targets the current and one prior major version of Chrome, Edge, Firefox, and Safari. Internet Explorer is not supported. The relevant browser features are: + +- `File` and `FileReader` APIs (universal since 2014) +- `Blob` and `URL.createObjectURL` (universal since 2014) +- Canvas `toDataURL` with JPEG support (universal since 2012) +- ES2020 syntax targets in `tsconfig.json` + +`html2canvas` has known limitations rendering certain CSS features (`mix-blend-mode`, `backdrop-filter`, complex `clip-path`) that may affect documents with heavy graphic design. For Word documents this is rarely relevant; standard business documents do not use these features. + +## 11.
Testing + +Implementations should be verified against the following test corpus: + +| Test document | Asserts | +|---|---| +| Plain prose, 3 pages, A4 | Basic flow; page count and dimensions match | +| Document with one table per page | Tables render with borders and cell shading | +| Mixed portrait and landscape sections | Each PDF page matches its source orientation | +| Document with embedded PNG and JPEG images | Images appear in correct positions | +| Vietnamese-language document with diacritics | All characters render; no missing glyphs | +| Document with header and footer including page numbers | Headers/footers appear on every page | +| Document with bulleted and numbered lists | List markers render with correct indentation | +| 30-page document | Memory does not exceed 500 MB during capture | +| Corrupted .docx (truncated zip) | Component shows error and remains usable | + +Beyond visual diffing of the rendered preview against the source `.docx` opened in Word, the captured PDF should be opened in a separate PDF reader (Acrobat, Preview, or Firefox's built-in viewer) to confirm that page dimensions, count, and rendered content match. Programmatic visual regression testing of the PDF output is beyond the scope of this spec; if needed, it can be implemented by rasterising each PDF page (for example with `pdfjs-dist`) and comparing the images with `pixelmatch`. Text-extraction tools such as `pdf-parse` are of little use here, because the output contains no text objects. + +## 12. Known Limitations and Alternatives + +The text in the output PDF is rasterised and therefore not selectable, searchable, copyable, or screen-readable. Users who need any of these properties (particularly accessibility for visually impaired users) must use a server-side converter that emits real PDF text objects. Recommended alternatives, roughly ordered from free and self-hosted to paid and third-party: + +1. **LibreOffice headless** (`soffice --convert-to pdf`): free, self-hosted, very high fidelity, requires a server with LibreOffice installed (typically Linux). ~1–3 seconds per document. +2.
**Aspose.Words Cloud or self-hosted**: paid, very high fidelity, native PDF text output, requires a license. +3. **CloudConvert, ConvertAPI, or similar SaaS**: paid per document, simple HTTP API, sends document contents to a third party. + +The HTML preview produced by `docx-preview` *is* accessible: screen readers can navigate it, text is selectable, and users can zoom. The component's accessibility story is therefore intact for users who don't need the PDF artifact itself. + +This component cannot edit, sign, redact, or annotate documents. For those features, evaluate `pdf-lib` (PDF mutation) or `docx` (DOCX generation, which is a different package from `docx-preview`). + +## 13. Appendix: Algorithm Pseudocode + +For reference, the complete conversion algorithm in about 20 lines: + +``` +function convert(file, container): + clear container + // docx-preview signature: renderAsync(data, bodyContainer, styleContainer, options) + await renderAsync(file, container, null, { + inWrapper: true, + breakPages: true, + useBase64URL: true, + experimental: true, + renderHeaders: true, renderFooters: true, renderFootnotes: true, + }) + await rAF; await sleep(50) + + // querySelectorAll never returns null, so fall back explicitly on an empty result + pages = container.querySelectorAll("section.docx") + if pages is empty: pages = container.querySelectorAll("section") + if pages is empty: throw + + pdf = new jsPDF using pages[0] dimensions in mm + for each page in pages: + canvas = await html2canvas(page, scale=2, useCORS=true, bg=white) + if not first page: pdf.addPage(page dimensions) + pdf.addImage(canvas.toDataURL("image/jpeg", 0.95), "JPEG", 0, 0, w_mm, h_mm) + + return pdf.output("blob") +``` + +The pseudocode omits error handling, lifecycle management, and progress reporting, all of which are required in the production implementation per Sections 6.6 and 8.
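The "Rapid file changes" rule from Section 8, which the pseudocode leaves out, can be sketched as a small latest-wins guard. This is an illustration under assumptions: `ConversionGuard` and `runLatest` are hypothetical names, and `work` stands in for the real docx-preview → html2canvas → jsPDF pipeline.

```typescript
// Hypothetical latest-wins guard for Section 8 ("Rapid file changes").
class ConversionGuard {
  private currentId = 0;

  /** Starts a new conversion; every earlier conversion becomes stale. */
  begin(): number {
    this.currentId += 1;
    return this.currentId;
  }

  /** True only for the most recently started conversion. */
  isCurrent(id: number): boolean {
    return id === this.currentId;
  }
}

/**
 * Runs `work` and hands its result to `onResult` only if no newer conversion
 * started in the meantime. Resolves to true if the result was delivered.
 * `work` receives its own id so it can skip stale progress updates.
 */
async function runLatest<T>(
  guard: ConversionGuard,
  work: (id: number) => Promise<T>,
  onResult: (value: T) => void,
): Promise<boolean> {
  const id = guard.begin();
  const value = await work(id);
  if (!guard.isCurrent(id)) {
    return false; // a newer file was dropped while this one was converting
  }
  onResult(value);
  return true;
}
```

Progress callbacks inside `work` can apply the same `guard.isCurrent(id)` check before touching the status display. Discarding stale results is used here instead of true cancellation, since this sketch does not assume any of the three libraries accepts an abort signal.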
+ +--- + +*End of specification.* diff --git a/docs/PDF_preview.md b/docs/PDF_preview.md new file mode 100644 index 0000000..f0bd463 --- /dev/null +++ b/docs/PDF_preview.md @@ -0,0 +1,313 @@ + + +# Applicant PDF / report preview — re-implementation guide + +This document describes how PDF and “draft” preview work in the DYD frontend and backend so you can reproduce behavior in another codebase or refactor safely. + +--- + +## 1. Two different things called “PDF” + +| Path | What it is | Layout fidelity | +|------|------------|-------------------| +| **Server PDF** | `GET /api/reports/{reportId}/export/pdf` | Same as **Xuất Word**, then LibreOffice **DOCX → PDF**. Official template layout. | +| **Client “draft” preview** | `PdfExportDialog` + `PrintableReport` + `html2canvas` + `jspdf` | HTML recreation; **not** pixel-equal to Word. Used when server PDF fails (typical: LibreOffice missing). | + +**Canonical document for official exports:** the filled `.docx`. The PDF is a **conversion**, not a separately maintained template. + +Reference implementation: + +- Dialog: [`fe0/src/components/PdfExportDialog.tsx`](../fe0/src/components/PdfExportDialog.tsx) +- Layout: [`fe0/src/components/PrintableReport.tsx`](../fe0/src/components/PrintableReport.tsx) +- Hooks: [`fe0/src/api/hooks.ts`](../fe0/src/api/hooks.ts) +- PDF handler: [`src/Backend/DYD.Application/Features/Reports/ExportReportPdf.cs`](../src/Backend/DYD.Application/Features/Reports/ExportReportPdf.cs) +- LibreOffice wrapper: [`src/Backend/DYD.Application/Common/Export/SofficeConverter.cs`](../src/Backend/DYD.Application/Common/Export/SofficeConverter.cs) + +--- + +## 2. Why server PDF matches DOCX (margins, text, spacing) + +[`ExportReportPdfHandler`](../src/Backend/DYD.Application/Features/Reports/ExportReportPdf.cs) runs: + +1. `ExportReportDocxQuery` → bytes of the filled report `.docx` (identical pipeline to **Xuất Word**). +2. 
`SofficeConverter.WordToPdfAsync(wordBytes)` → writes temp `.docx`, runs **LibreOffice headless** `--convert-to pdf`, reads `input.pdf`. + +So the PDF is **one layout engine pass** over the merged OpenXML file. Field text and structural spacing come from that document; you are not duplicating merge logic for PDF. + +**Caveats:** Missing fonts on the server, subtle differences between LibreOffice and Microsoft Word, and very long unbroken strings can all change line wrapping. For strict WYSIWYG fidelity with desktop Word, compare the output in both viewers. + +**Contrast:** The React `PrintableReport` path **does not** use the Word template; it renders its own HTML. Use it only as a **fallback preview**, not as a legal duplicate of the official form. + +--- + +## 3. User flows (where the preview opens) + +### 3.1 My Reports (`fe0/src/pages/MyReports.tsx`) + +1. User clicks **Xuất PDF** (Export PDF) on a row. +2. App calls `GET /api/reports/{id}/export/pdf` and downloads the blob on success. +3. If the error message contains `LibreOffice` or `soffice`, it alerts and opens `PdfExportDialog` with `{ reportId, initiativeId, reportCode }` from the row (initiative id is set correctly). + +### 3.2 Dashboard Overview: Panel 2 (`fe0/src/pages/DashboardOverview.tsx`) + +1. **Xuất PDF** calls the backend download path only (`handleExportPdfBackend`). +2. On LibreOffice failure, fallback opens `PdfExportDialog` but currently passes **`initiativeId: ''`**. That disables `useInitiative`, so **Section I** of `PrintableReport` may show placeholders. **Fix when re-implementing:** pass `selectedReport.initiativeId`. + +### 3.3 Component names + +There is no symbol `ApplicantPreviewPanel`. The modal title is **“Xem trước PDF”** (“PDF preview”) in `PdfExportDialog`. + +--- + +## 4.
Data loading for the client preview + +`PdfExportDialog` renders `PrintableReport` and passes data from three React Query hooks (all under authenticated `apiFetch`): + +| Hook | Endpoint | Type (TS) | +|------|----------|-----------| +| `useInitiative(id)` | `GET /api/initiatives/{id}` | `InitiativeDetail` | +| `useReport(id)` | `GET /api/reports/{id}` | `Report` | +| `useDocumentsByReport(reportId)` | `GET /api/documents/by-report/{reportId}` | `DocumentListItem[]` | + +Hooks use `enabled: !!id`. Empty `initiativeId` skips initiative fetch. + +`DocumentListItem` includes optional **`content`** (JSON string) and **`summary`**. Each recognition document (Mẫu 01–04) stores form state as JSON in `content`. + +`DocumentType` enum: `Description = 1`, `Application = 2`, `ContributionRatio = 3`, `Evaluation = 4`. + +--- + +## 5. `PrintableReport` structure (fallback HTML) + +### 5.1 Root layout + +- `ref` on a single root `div` captured by `html2canvas`. +- Width **794px** (A4 at 96dpi), padding **40px**, `box-sizing: border-box`. +- Font: **Times New Roman**, 12px, line-height 1.5, black on white. +- Optional prop `schoolName` (default: `Đại học Y Dược TP. HCM`). 
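The 794 px root width is A4's 210 mm at the CSS reference density of 96 px per inch (1 in = 25.4 mm). The same arithmetic gives the A4 height in CSS pixels and the approximate size of the RGBA canvas that `html2canvas` allocates at `scale: 2`. A small sketch of that arithmetic; the helper names are hypothetical and do not exist in `PrintableReport.tsx`:

```typescript
// CSS reference density: 1in = 96px, and 1in = 25.4mm.
const CSS_PX_PER_INCH = 96;
const MM_PER_INCH = 25.4;

function mmToCssPx(mm: number): number {
  return (mm / MM_PER_INCH) * CSS_PX_PER_INCH;
}

function cssPxToMm(px: number): number {
  return (px / CSS_PX_PER_INCH) * MM_PER_INCH;
}

// Approximate bytes of an RGBA canvas (4 bytes per pixel) capturing an element
// of cssWidth x cssHeight CSS px at the given html2canvas `scale`.
function canvasBytes(cssWidth: number, cssHeight: number, scale: number): number {
  return Math.round(cssWidth * scale) * Math.round(cssHeight * scale) * 4;
}

const a4WidthPx = Math.round(mmToCssPx(210));
const a4HeightPx = Math.round(mmToCssPx(297));
console.log(a4WidthPx, a4HeightPx); // 794 1123
console.log(canvasBytes(a4WidthPx, a4HeightPx, 2)); // 14266592 (about 14 MB per A4-sized area)
```

Because `PrintableReport` is a single tall `div` rather than one element per page, its canvas height is roughly the report's scroll height times the scale, so capture memory grows with the length of the whole report.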
+ +### 5.2 Sections and data sources + +| Section | Source | Notes | +|---------|--------|------| +| Header (ministry, school, “BÁO CÁO SÁNG KIẾN”, export date) | Static + `new Date()` vi-VN | | +| Metadata grid (mã BC, mã SK, tiêu đề, đơn vị, trạng thái BC, dates) | `report`, `initiative` | Status via numeric map → Vietnamese label | +| **I.** Nội dung Sáng kiến | `initiative` | `shortSummary`, `description`, `objectives`, `scopeOfApplication`, `expectedOutcomes`, `startDate`/`endDate`, `estimatedBudget` | +| **II.** Kết quả thực tế | `report` | `actualOutcomes`, `actualBudget`, `implementationNotes`, `challenges`, `lessonsLearned` | +| **III.** Checklist 4 tài liệu | `documents` | Fixed order of four `DocumentType` values; codes, `DOCUMENT_STATUS_LABELS`, `approvalDate`; optional bullet list from `summary` | +| **IV.** Mẫu 01 | Parse `content` for `DocumentType.Description` → `DescriptionData` | Only if parsed object is truthy | +| **V.** Mẫu 02 | `DocumentType.Application` → `ApplicationData` | Authors/support staff tables, classification labels | +| **VI.** Mẫu 03 | `DocumentType.ContributionRatio` → `ContributionData` | Participants + total % row | +| **VII.** Mẫu 04 | `DocumentType.Evaluation` → `EvaluationData` | Scores table + total /100 | +| Footer | Static + date | | + +Empty string fields use helper `Paragraph` → gray italic “(chưa có nội dung)”. + +`pageBreakBefore` on IV–VII does **not** create real breaks in the **downloaded** PDF from html2canvas; see section 6. + +--- + +## 6. Download pipeline (`PdfExportDialog` → file) + +1. `await document.fonts.ready` if available (helps Vietnamese glyphs). +2. `html2canvas(ref, { scale: 2, backgroundColor: '#ffffff', useCORS: true, logging: false, windowWidth/Height: element scroll size })`. +3. `canvas.toDataURL('image/png')`. +4. `jsPDF` A4 portrait; image width = page width; height from aspect ratio. +5. 
**Multi-page:** repeatedly `addPage()` and `addImage` with a **negative Y offset** to slice one tall image across pages (`position = heightLeft - imgHeight`). **Limitation:** content can be cut mid-line or mid-table. + +**Output filename:** `{reportCode}_{YYYYMMDD}.pdf` (fallback codes: `report?.code` or `BaoCao`). + +**Dependencies:** `html2canvas`, `jspdf` (see `fe0/package.json`). + +--- + +## 7. JSON shapes (`document.content`) — examples + +Parse with `JSON.parse`; invalid JSON → section omitted (or empty). Shapes are defined in [`PrintableReport.tsx`](../fe0/src/components/PrintableReport.tsx) and should stay aligned with the four tab forms that persist `PUT /api/documents/{id}`. + +### 7.1 Mẫu 01 — `DescriptionData` (`DocumentType.Description`) + +```json +{ + "introduction": "Mở đầu...", + "initiativeName": "Tên sáng kiến", + "applicationField": "Lĩnh vực", + "currentStatus": "Tình trạng giải pháp đã biết", + "purpose": "Mục đích", + "solutionContent": "Nội dung giải pháp", + "implementationSteps": "Các bước thực hiện", + "conditions": "Điều kiện áp dụng", + "trialUnits": [ + { "id": 1, "name": "Đơn vị A", "address": "Địa chỉ", "field": "Lĩnh vực áp dụng" } + ], + "novelty": "Tính mới", + "effectiveness": { + "economic": "...", + "social": "...", + "teaching": "...", + "productivity": "...", + "quality": "...", + "environment": "...", + "safety": "..." + }, + "confidentialInfo": "...", + "submissionDate": "2026-01-15", + "authorName": "..." 
+} +``` + +### 7.2 Mẫu 02 — `ApplicationData` (`DocumentType.Application`) + +```json +{ + "unitName": "Đơn vị chủ quản", + "initiativeName": "Tên SK đề nghị", + "investorName": "Chủ đầu tư", + "applicationField": "Lĩnh vực", + "firstApplyDate": "2025-06-01", + "authors": [ + { + "id": 1, + "name": "Nguyễn Văn A", + "dob": "01/01/1980", + "workplace": "Khoa X", + "title": "PGS", + "qualification": "TS", + "contributionPercent": 60 + } + ], + "initiativeClassification": "technical", + "contentSummary": "Tóm tắt nội dung", + "confidentialInfo": "", + "conditions": "", + "authorEvaluation": "", + "trialEvaluation": "", + "supportStaff": [ + { + "id": 1, + "name": "Trợ lý", + "dob": "", + "workplace": "", + "title": "", + "qualification": "", + "supportContent": "Hỗ trợ hành chính" + } + ], + "submissionDay": 10, + "submissionMonth": 5, + "submissionYear": "2026" +} +``` + +Classification values: `technical` | `research` | `textbook` (mapped to long Vietnamese labels in UI). + +### 7.3 Mẫu 03 — `ContributionData` (`DocumentType.ContributionRatio`) + +```json +{ + "initiativeName": "Tên SK", + "mainAuthor": "Tác giả chính", + "position": "Trưởng khoa — Khoa X", + "representativePercent": 40, + "submissionDate": "2026-05-01", + "participants": [ + { "id": 1, "fullName": "Nguyễn B", "workUnit": "Khoa Y", "contributionPercent": 40 } + ], + "digitalSignatureConfirmed": true +} +``` + +Total row sums `participants[].contributionPercent` (display only; not validated here). + +### 7.4 Mẫu 04 — `EvaluationData` (`DocumentType.Evaluation`) + +```json +{ + "initiativeName": "Tên SK", + "authorName": "Tác giả", + "position": "Chức vụ", + "evaluationDate": "2026-05-11", + "noveltyLevel": "high", + "noveltyScore": 35, + "noveltyComment": "Nhận xét tính mới", + "effectivenessLevel": "medium", + "effectivenessScore": 45, + "effectivenessComment": "Nhận xét hiệu quả", + "conclusion": "Kết luận" +} +``` + +Levels: `high` | `medium` | `low`. 
Printed table shows shortened level text (split on ` (`). + +--- + +## 8. Field inventory (PrintableReport) — quick reference + +### 8.1 `InitiativeDetail` → Section I and header + +| UI label (approx.) | Field on `InitiativeDetail` | +|--------------------|----------------------------| +| Mã Sáng kiến | `code` | +| Tiêu đề SK | `title` | +| Đơn vị chủ trì | `owningUnitName` | +| Mô tả tóm tắt | `shortSummary` | +| Mô tả chi tiết | `description` | +| Mục tiêu | `objectives` | +| Phạm vi áp dụng | `scopeOfApplication` | +| Kết quả dự kiến | `expectedOutcomes` | +| Thời gian áp dụng | `startDate`, `endDate` (ISO) | +| Kinh phí dự toán | `estimatedBudget` | + +### 8.2 `Report` → Section II and header + +| UI label | Field on `Report` | +|----------|-------------------| +| Mã Báo cáo | `code` | +| Trạng thái BC | `status` (numeric → label map) | +| Ngày nộp BC | `submissionDate` | +| Ngày duyệt BC | `approvalDate` | +| Kết quả đạt được | `actualOutcomes` | +| Kinh phí thực tế | `actualBudget` | +| Ghi chú triển khai | `implementationNotes` | +| Khó khăn | `challenges` | +| Bài học | `lessonsLearned` | + +### 8.3 `DocumentListItem` → Section III + +| UI | Field | +|----|--------| +| Loại | `type` + `DOCUMENT_TYPE_LABELS` | +| Mã | `code` | +| Trạng thái | `status` + `DOCUMENT_STATUS_LABELS` | +| Ngày duyệt | `approvalDate` | +| Tóm tắt (bullets) | `summary` | + +### 8.4 JSON document sections + +See section 7 for keys. Every field in `DescriptionData`, `ApplicationData`, `ContributionData`, `EvaluationData` maps 1:1 to labels inside [`PrintableReport.tsx`](../fe0/src/components/PrintableReport.tsx) (search `Paragraph label=` and table headers). + +--- + +## 9. Re-implementation checklist + +- [ ] **Official PDF/Word:** Call the same APIs (`export/docx`, `export/pdf`) so template fidelity stays server-side. 
+- [ ] **Fallback modal:** Fetch initiative + report + documents; render a single scrollable column layout; optional: fix Dashboard Overview `initiativeId` on fallback. +- [ ] **Download:** Match `html2canvas` + `jspdf` multi-page slicing or replace with a service that returns vector PDF (e.g. server-only preview URL). +- [ ] **i18n:** Labels are hardcoded Vietnamese in `PrintableReport`. +- [ ] **Tests:** Fixture objects for `InitiativeDetail`, `Report`, four `DocumentListItem` records with sample `content` JSON; snapshot or visual regression on the root `div` dimensions. + +--- + +## 10. Embedded PDF preview (optional UX) + +If you need an in-app preview that **matches** the official PDF (as in a browser PDF viewer with page count): + +1. `GET /api/reports/{id}/export/pdf` → `blob` → `URL.createObjectURL(blob)`. +2. Use `