Application files: persistence, retrieval by applicationId, and backup notes
This document describes how the running initiative stack stores and loads:
- Evidence attachments (minh chứng 2.1 / 2.2 / kỹ thuật)
- The submitted full-package PDF (đơn + báo cáo from the « Xem lại » flow)
- The filled DOCX / official PDF derived from the Word template
It focuses on what PostgreSQL and MinIO hold. The root file database/schema.sql describes a separate integer applications domain (attachments table with application_id INT); that schema is not wired into be0 today. Production behavior is driven by be0/migrations/*.sql and INITIATIVE_DATABASE_URL.
Implementation planning: The phased backup and storage-hardening plan below is refined against the review in feedback-data-management.md (canonical bytes, storage_kind, SHA verification on pack, streaming ZIP + manifest, indexed IDs, evidence versioning, and sequencing).
Identifiers: what “applicationId” means
The UI and APIs expose a public submission id shaped like sub-{16 hex chars} (see save_submitted_application in be0/src/initiative_db/submissions.py). Internally, persistence is keyed by:
| Concept | Example | Where |
|---|---|---|
| Public applicationId (list/detail) | sub-abc123def4567890 | drafts.payload.submissionRecord.id, API responses |
| Draft / case code | CASE-… or SUB-… | initiatives.case_code, draft_case_id on API rows |
| Initiative primary key | UUID | initiatives.id, MinIO key prefix, application_artifacts.initiative_id |
Resolving a row: get_application_by_id (be0/src/initiative_db/submissions.py) scans submitted initiatives and matches when either:
- _submission_display_id(initiative, submissionRecord) == applicationId, or
- initiative.case_code == applicationId.
So admins can deep-link with sub-… or sometimes CASE-…. For backups, always persist initiatives.id, case_code, and sub-… together.
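The matching rule above can be illustrated with a small in-memory sketch. It is not the code in submissions.py: the SubmissionIdentity class, the find_by_application_id helper, the UUID, and the CASE-0001 code are all made up for illustration. Only the "match on the public id or on the case code" rule comes from this document.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SubmissionIdentity:
    """Hypothetical record holding the three ids a backup should keep together."""
    initiative_id: str  # initiatives.id (UUID)
    case_code: str      # initiatives.case_code
    submission_id: str  # public id shaped like sub-{16 hex chars}


def find_by_application_id(
    rows: Iterable[SubmissionIdentity], application_id: str
) -> Optional[SubmissionIdentity]:
    """Linear scan: accept either the public submission id or the case code."""
    for row in rows:
        if application_id in (row.submission_id, row.case_code):
            return row
    return None


rows = [
    SubmissionIdentity(
        initiative_id="123e4567-e89b-12d3-a456-426614174000",  # made-up UUID
        case_code="CASE-0001",                                  # made-up case code
        submission_id="sub-abc123def4567890",
    )
]
by_public_id = find_by_application_id(rows, "sub-abc123def4567890")
by_case_code = find_by_application_id(rows, "CASE-0001")
```

Both lookups land on the same record, which is why a backup that stores all three identifiers can be found again from either kind of link.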
MinIO
Configured in Docker via S3_* env vars (docker-compose.yml):
| Bucket (env) | Purpose |
|---|---|
| initiative-attachments (S3_BUCKET_ATTACHMENTS) | Evidence uploads for Đơn (research / textbook / technical) |
| initiative-exports (S3_BUCKET_EXPORTS) | Optional copy of the submitted full PDF after successful submit |
| initiative-quarantine (S3_BUCKET_QUARANTINE) | Reserved for quarantine flows (not detailed here) |
Object key layout (be0/src/minio/storage.py):
- Evidence and export artifacts use build_key_for_initiative, which produces keys of the form:
  initiatives/{initiative_uuid_no_hyphens}/{yyyy}/{mm}/{uuid}-{safe_filename}
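To make the key shape concrete, here is a minimal sketch that produces keys in that layout. It is not the real build_key_for_initiative: the function name sketch_key_for_initiative, its parameters, and in particular the filename sanitisation rule are assumptions, since this document only specifies the resulting key format.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Optional


def sketch_key_for_initiative(
    initiative_id: str,
    filename: str,
    now: Optional[datetime] = None,
    object_id: Optional[str] = None,
) -> str:
    """Build initiatives/{uuid_no_hyphens}/{yyyy}/{mm}/{uuid}-{safe_filename}."""
    now = now or datetime.now(timezone.utc)
    object_id = object_id or str(uuid.uuid4())
    # Assumption: every run of characters outside [A-Za-z0-9._-] becomes "_".
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", filename).strip("_") or "file"
    initiative_hex = uuid.UUID(initiative_id).hex  # 32 hex chars, no hyphens
    return f"initiatives/{initiative_hex}/{now:%Y}/{now:%m}/{object_id}-{safe_name}"


example_key = sketch_key_for_initiative(
    "123e4567-e89b-12d3-a456-426614174000",            # made-up initiative UUID
    "minh chung 2.1.pdf",
    now=datetime(2024, 3, 5, tzinfo=timezone.utc),
    object_id="0f8fad5b-d9cb-469f-a165-70867728950e",  # made-up object UUID
)
```

Because the initiative UUID is the first path segment after initiatives/, listing that prefix in either bucket returns every object belonging to one initiative.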
The API uses the internal endpoint for the server (S3_ENDPOINT_URL, e.g. http://minio:9000) and S3_PUBLIC_ENDPOINT_URL for presigned URLs the browser can open (e.g. http://localhost:19000).
Integrity: uploads compute SHA-256 and store it in object metadata and/or Postgres (application_artifacts.sha256).
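As an illustration of the integrity step, a digest can be computed in chunks so large uploads never need to sit fully in memory. This is a sketch only: sha256_of_stream and its chunk size are hypothetical names and values, not the upload code in be0.

```python
import hashlib
import io
from typing import BinaryIO, Tuple


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> Tuple[str, int]:
    """Return (hex digest, byte count) for a binary stream, read chunk by chunk."""
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


# The two values map naturally onto application_artifacts.sha256 and byte_size.
hex_digest, byte_size = sha256_of_stream(io.BytesIO(b"%PDF-1.7 example bytes"))
```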
PostgreSQL (initiative database)
Core tables (be0/migrations/001_initiative_schema.sql, 002_application_storage_extensions.sql, plus review-doc extensions):
initiatives
- id (UUID), case_code (unique text), owner_id, status, submitted_at, etc.
- Submitted applications have status != 'draft' (e.g. submitted).
drafts
- payload JSONB holds the live bundle: tab data, submissionRecord, submissionFile, etc.
After submit, important keys include:
- payload.submissionRecord: metadata including the public id (sub-…)
- payload.submissionFile: e.g. { "url": "/submitted-initiatives/sub-….pdf", "type": "pdf" }
application_artifacts
One row per (initiative_id, role) (002_application_storage_extensions.sql). Planned (Phase 1): add roles for the printable application form binaries (e.g. official_form_docx, official_form_pdf) — distinct from full_pdf (the client-uploaded full hồ sơ PDF).
| role | Meaning |
|---|---|
| full_pdf | Submitted package PDF; storage_uri is either a MinIO key (under the exports bucket) or a relative URL to static files |
| research_evidence | Minh chứng 2.1 (nghiên cứu): research evidence |
| textbook_evidence | Minh chứng 2.2 (giáo trình): textbook evidence |
| technical_evidence | Minh chứng kỹ thuật (nhóm 1): technical evidence (group 1) |
Columns: storage_uri, original_name, mime_type, byte_size, sha256, uploaded_by, uploaded_at, plus review fields for evidence.
application_submit_snapshots
Append-only rows: merged tabs, submit metadata, and full_pdf_uri (today this records the URL passed at submit time, typically /submitted-initiatives/..., not necessarily the MinIO key).
Treat this table as historical audit of the submit request, not as the driver for backup byte locations: application_artifacts (and storage_kind once added) is the operational source of truth (feedback-data-management.md §8).
application_review_documents
Versioned JSON used to regenerate the Word template output:
- official_bieu_mau, template_data, full_bundle (JSONB)
- Tied to initiative_id and case_id
Today: the binary filled DOCX is not stored in MinIO; this table is the only server-side input to regeneration. Target (for a trustworthy admin backup): treat this JSON as supporting data (re-render, analytics, diffing). The canonical bytes for “what the applicant signed off on” for the printable mẫu should be immutable objects in MinIO plus rows in application_artifacts (see Implementation plan — Phase 1).
Other useful tables
- draft_tab_snapshots: history of tab JSON (report/application/contribution)
Backend flows
Evidence upload & download
- POST /api/v1/application-drafts/{case_id}/evidence: multipart upload; stores the object in initiative-attachments; upserts application_artifacts with role research_evidence | textbook_evidence | technical_evidence (be0/main.py).
- GET /api/v1/application-drafts/{case_id}/evidence: returns metadata plus presigned downloadUrl / viewUrl for staff or owner.
case_id is normalized to the initiative’s case_code (e.g. CASE-…).
Submit full PDF
- POST /api/applications/submit: receives PDF + JSON metadata (be0/main.py).
- Always writes the file to SUBMITTED_INITIATIVES_DIR (default: repo assets/submitted-initiatives, or fe0/public/submitted-initiatives in dev), served under /submitted-initiatives/{sub-….pdf}.
- If PostgreSQL is enabled: save_submitted_application updates initiatives / drafts, writes application_submit_snapshots, application_taxonomy, and application_workflow, and calls upsert_artifact_full_pdf.
- MinIO copy: _maybe_upload_submitted_pdf_to_exports_minio uploads the same bytes to initiative-exports and, on success, sets application_artifacts.full_pdf.storage_uri to the object key (not the /submitted-initiatives/... URL). If the MinIO upload fails, the artifact still points at the filesystem URL only; this is slated to become a hard failure once canonical storage is enforced (Phase 2).
Filled DOCX / official PDF (preview; persistence plan)
- POST /api/v1/docx/preview-application-form: renders template_application_form.docx with docxtpl; returns bytes (no DB/MinIO write today).
- POST /api/v1/docx/preview-application-form-pdf: same merge, then LibreOffice conversion to PDF; returns bytes.
The client builds officialBieuMau from draft state; persistReviewDocumentBundle (POST /api/v1/review-documents) saves the JSON bundle to application_review_documents.
Preview endpoints remain useful for staff “what-if” and for regenerating with newer templates. They must not be the only path that feeds the admin backup ZIP once Phase 1 is done — backups should stream stored printable DOCX/PDF bytes unless a legacy row has no stored object (then document explicit fallback or backfill).
Admin detail: presigned full PDF
For GET /api/applications/{application_id}, when full_pdf.storage_uri looks like a MinIO key (not /submitted-initiatives or http), _enrich_application_detail_full_pdf_presign adds files.fullText.viewUrl (presigned GET on initiative-exports).
Frontend
| Concern | Location |
|---|---|
| Submit PDF | fe0/src/components/applicant/submitInitiativePdf.ts → POST /api/applications/submit with FormData + JWT; metadata includes initiativeCaseId (must match Postgres case_code). |
| Draft load/save | fe0/src/components/applicant/applicationDrafts.ts — GET/POST /api/v1/application-drafts/.... |
| DOCX/PDF from template | fe0/src/lib/applicationFormDocxApi.ts → preview endpoints; ApplicationFormDocxPreview.tsx orchestrates save + review bundle persistence. |
| Evidence UI | e.g. ApplicationEvidenceManagePage.tsx — uses GET /api/v1/application-drafts/{caseId}/evidence with presigned URLs. |
| Admin list/detail | Uses GET /api/applications (list) and GET /api/applications/{application_id} (detail, keyed by applicationId); detail exposes draft_case_id for loading drafts/evidence. |
Important: sub-… is the list id; draft/evidence APIs use case_code (CASE-…). The API surfaces draft_case_id on submission rows to bridge the two.
Applicant honesty checkboxes, complete tabs & PDF minh chứng (engineering guide)
Goal: applicants cannot tick the cam kết trung thực checkboxes at the end of Báo cáo, Đơn, and Xác nhận đóng góp until the workflow rules below are satisfied; the UI shows a Sonner toast listing missing items. PDF minh chứng means the classification-specific evidence file for Đơn (research / textbook / technical), stored in MinIO via POST /api/v1/application-drafts/{case_id}/evidence (see Evidence upload & download).
Intended behaviour (product)
| Control | When it may be ticked |
|---|---|
| Báo cáo (InitiativeReportForm) | All required fields on the report tab are non-empty (§1–§6 narrative + hiệu quả fields exposed in the UI). |
| Đơn (InitiativeApplicationForm) | All required Đơn fields are complete and the correct PDF minh chứng slot is filled for the chosen classification (local File, or FileHandle with serverStorageKey after MinIO upload). Sub-forms (bản cam kết / biểu xác nhận) must match the selected nhóm. |
| Xác nhận đóng góp (ContributionConfirmationForm) | Same checks as Đơn and Báo cáo, and the applicant has already ticked honesty on Báo cáo and Đơn. |
| Xem lại — Gửi (ApplicationFormDocxPreview) | Same as contribution gate plus contribution.digitalSignatureConfirmed in the persisted contribution JSON. |
Implementation reference:
- Shared validators + messages: fe0/src/lib/applicantHonestyPrerequisites.ts (collectReportTabHonestyGaps, collectApplicationTabHonestyGaps, collectContributionDigitalSignaturePrerequisiteGaps, collectApplicantSubmitToAdminPrerequisiteGaps, formatApplicantPrerequisiteToastDescription).
- Checkbox handlers toast with toast.error(..., { description }) and do not flip state when prerequisites fail.
Staff / council flows without DraftProvider skip the contribution-tab signature gate (no full draft in context); fields stay readOnly as today.
Frontend (detailed)
- Single source of truth for messages: Keep gap strings in applicantHonestyPrerequisites.ts so DOCX preview and forms stay aligned.
- Evidence PDF: Treat as present if applicantEvidencePdfPresent(file) is true: File with non-zero size, or FileHandle with serverStorageKey (MinIO) or positive size (IndexedDB). Matches hydration in DraftContext after getApplicationEvidence(caseId).
- Contribution tab: Uses draft.report and draft.application from DraftContext; authors/% totals are validated on Đơn; contribution UI mirrors authors when connected to Postgres drafts.
- Review submit: Besides tab JSON, enforce contribution signature flag on the object passed into ApplicationFormDocxPreview (from draftTabs.contribution).
Backend (recommended)
Today, gates are client-side only. For integrity:
- POST /api/applications/submit: Implemented in be0/src/initiative_db/submission_readiness.py, invoked from save_submitted_application before the initiative is marked submitted. Loads merged drafts.payload.tabs (with snapshot fallback), reads application_artifacts for research_evidence / textbook_evidence / technical_evidence (non-empty storage_uri), and validates tab JSON + honesty flags to match the applicant UI. On failure: 400 with detail: { "message": "…", "missing": ["…", …] } (see ApplicationSubmissionNotReadyError handling in be0/main.py). The client maps this in fe0/src/components/applicant/submitInitiativePdf.ts. Partial PDF written on disk is removed when Postgres validation fails.
- POST /api/v1/application-drafts/{case_id}/evidence: Already the canonical upload path; reject non-PDF or oversize files (existing behaviour).
PostgreSQL
- Tab JSON lives under drafts.payload (and/or tab snapshots). Honesty flags are plain booleans: report.honestyConfirmed, application.honestyConfirmed, contribution.digitalSignatureConfirmed. No migration is required for gating unless you add a server-side “submission readiness” snapshot column.
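A server-side readiness check over these flags and the evidence artifacts might look roughly like the sketch below. It is not the code in submission_readiness.py. The helper names, the message text, and the simplification that any one non-empty evidence storage_uri is enough are assumptions; the real rule depends on the chosen classification and on the required tab fields.

```python
from typing import Dict, List, Mapping, Optional

EVIDENCE_ROLES = ("research_evidence", "textbook_evidence", "technical_evidence")
HONESTY_FLAGS = (
    ("report", "honestyConfirmed"),
    ("application", "honestyConfirmed"),
    ("contribution", "digitalSignatureConfirmed"),
)


def collect_missing(
    tabs: Mapping[str, Mapping[str, object]],
    artifact_uris: Mapping[str, Optional[str]],
) -> List[str]:
    """List unmet prerequisites; an empty list means the submit may proceed."""
    missing = [
        f"{tab}.{flag}"
        for tab, flag in HONESTY_FLAGS
        if tabs.get(tab, {}).get(flag) is not True
    ]
    # Assumption: one non-empty storage_uri among the evidence roles suffices.
    if not any(artifact_uris.get(role) for role in EVIDENCE_ROLES):
        missing.append("evidence")
    return missing


def not_ready_detail(missing: List[str]) -> Dict[str, object]:
    """Shape of the 400 response detail; the message text is a placeholder."""
    return {"message": "Application is not ready to submit", "missing": list(missing)}


tabs = {
    "report": {"honestyConfirmed": True},
    "application": {"honestyConfirmed": True},
    "contribution": {"digitalSignatureConfirmed": False},
}
gaps = collect_missing(tabs, {"research_evidence": ""})
```

Here gaps names the unconfirmed signature flag and the absent evidence, which is the kind of list the client would surface in its toast.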
MinIO
- Required PDF for Đơn is stored under initiative-attachments with keys from build_key_for_initiative; metadata is reflected in application_artifacts (research_evidence | textbook_evidence | technical_evidence). Frontend readiness should agree with either the draft file handle (serverStorageKey) or a fresh GET .../evidence bundle (see collectDocxTemplateCompletenessGaps in admin review for a related pattern).
Retrieving everything for one submission (interim checklist)
Until Phases 1–2 are done, a reader resolving applicationId (sub-…) should:
- Postgres: Resolve initiatives + latest drafts (today: get_application_by_id scan; target: indexed submission_public_id, Phase 4).
- Submitted full-package PDF (full_pdf artifact): Read application_artifacts with role = 'full_pdf'. Dispatch on storage_kind once added; until then, avoid relying only on string-prefix heuristics for production backups.
- Evidence: Roles research_evidence, textbook_evidence, technical_evidence → keys in initiative-attachments.
- Printable mẫu DOCX/PDF: After Phase 1, stream from MinIO using new artifact roles; until then see legacy note in Phase 3.
Optional ZIP extras: latest application_review_documents JSON, draft_tab_snapshots, read-only copies of application_submit_snapshots for audit.
Related rationale and risks (regeneration vs backup, polymorphic storage_uri, integrity): feedback-data-management.md.
Implementation plan: admin backup (database + document management)
Goal: admin downloads one ZIP containing all evidence attachments, the submitted full-package PDF, and the printable application DOCX + PDF (mẫu), with verifiable integrity and no reliance on regenerating printable documents at download time (after prerequisites).
Phasing follows the sequencing in feedback-data-management.md §“Suggested order of work”, expanded into concrete schema and API work.
Phase 0 — Decisions & prerequisites
| Item | Action |
|---|---|
| Canonical bytes for printable mẫu | Store immutable DOCX + PDF in MinIO at submit (or immediately pre-submit in the same transaction as finalize), not only JSON. |
| Evidence versioning | Decide: append-only evidence history vs “latest only”. For approvals, prefer versioned or append-only so backup matches what was reviewed (feedback-data-management.md §7). |
| Quarantine bucket | Define behavior if objects exist in initiative-quarantine: include/exclude/fail backup (feedback-data-management.md §11). |
| MinIO operations | Document versioning, lifecycle, retention, DR (suggested spin-off: MINIO_OPERATIONS.md per feedback §9). |
| Dead schema | Move or clearly label database/schema.sql so tooling does not confuse INT application_id with sub-… (feedback-data-management.md §6). |
Phase 1 — Canonical bytes for printable DOCX + PDF (before backup ships)
Problem: Regenerating DOCX/PDF at backup time uses current template, docxtpl, LibreOffice, and fonts — not provably what the applicant saw (feedback-data-management.md §1).
Database
- Extend the application_artifacts.role CHECK (new migration) with two roles, e.g. official_form_docx and official_form_pdf (names TBD; must be distinct from full_pdf, which is the client-uploaded full hồ sơ PDF).
- On successful submit (or single “finalize” step server-side): compute SHA-256 for each file; INSERT/upsert rows with storage_uri = MinIO key, sha256, byte_size, mime_type, original_name, storage_kind = 'minio_exports' (once column exists).
Application logic
- Server: build officialBieuMau from the same snapshot used for submission (bundle already available in draft + review document path), call existing fill_application_form_docx → bytes; call convert_docx_bytes_to_pdf → bytes; upload both to initiative-exports using build_key_for_initiative.
- Do not put LibreOffice on the admin download path after this; optional background verify-only job may re-read objects.
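The render, convert, upload, record sequence could be wired together as in the sketch below. Because the signatures of fill_application_form_docx and convert_docx_bytes_to_pdf are not given here, the sketch takes them as injected callables; persist_official_form, the callable shapes, the file names, and the role names (still TBD) are assumptions.

```python
import hashlib
from typing import Callable, Dict, List

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
PDF_MIME = "application/pdf"


def persist_official_form(
    bundle: dict,
    render_docx: Callable[[dict], bytes],      # stand-in for fill_application_form_docx
    convert_to_pdf: Callable[[bytes], bytes],  # stand-in for convert_docx_bytes_to_pdf
    upload: Callable[[str, bytes, str], str],  # uploads to initiative-exports, returns the key
) -> List[Dict[str, object]]:
    """Render once at submit time and return artifact rows for the stored bytes."""
    docx_bytes = render_docx(bundle)
    pdf_bytes = convert_to_pdf(docx_bytes)
    rows = []
    for role, name, mime, data in (
        ("official_form_docx", "official-form.docx", DOCX_MIME, docx_bytes),
        ("official_form_pdf", "official-form.pdf", PDF_MIME, pdf_bytes),
    ):
        rows.append({
            "role": role,
            "storage_uri": upload(name, data, mime),
            "storage_kind": "minio_exports",
            "original_name": name,
            "mime_type": mime,
            "byte_size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return rows


# Fake collaborators, only to show the call shape.
uploaded: Dict[str, bytes] = {}

def fake_upload(name: str, data: bytes, mime: str) -> str:
    key = f"initiatives/demo/{name}"
    uploaded[key] = data
    return key

artifact_rows = persist_official_form(
    {"title": "demo"},
    render_docx=lambda b: b"DOCX:" + b["title"].encode(),
    convert_to_pdf=lambda d: b"PDF:" + d,
    upload=fake_upload,
)
```

The digest is taken from the exact bytes handed to the uploader, so a later backup can verify the stored object against application_artifacts.sha256 without re-rendering.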
JSON
- Keep saving application_review_documents for re-render/diff; it is not the sole legal snapshot of the printable files once binaries exist.
Gate: Do not release the admin backup endpoint that promises “printable DOCX/PDF” until this phase is done for new submits; for legacy rows without these artifacts, define policy (backfill job vs manifest flag missing_official_form: true).
Phase 2 — Canonical storage for submitted full-package PDF
Problem: full_pdf may point at filesystem-only, MinIO-only, or both; best-effort upload risks silent loss (feedback-data-management.md §2).
Database
- Add storage_kind on application_artifacts (enum/text): e.g. minio_exports, minio_attachments, filesystem, external_url. Backfill from existing storage_uri shape; default new rows explicitly.
- Optionally add content_sha256_verified_at or rely on manifest at backup time only.
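The backfill from storage_uri shape could start from a classifier like this sketch. The prefixes come from the conventions described in this document (keys under initiatives/, static URLs under /submitted-initiatives/, absolute http URLs); the function name, the role-to-bucket mapping, and the choice to raise on unknown shapes are assumptions.

```python
# Roles whose objects live in initiative-exports; official_form_* names are still TBD.
EXPORT_ROLES = {"full_pdf", "official_form_docx", "official_form_pdf"}


def infer_storage_kind(role: str, storage_uri: str) -> str:
    """Guess storage_kind for a legacy row from the shape of its storage_uri."""
    if storage_uri.startswith(("http://", "https://")):
        return "external_url"
    if storage_uri.startswith("/submitted-initiatives/"):
        return "filesystem"
    if storage_uri.startswith("initiatives/"):
        return "minio_exports" if role in EXPORT_ROLES else "minio_attachments"
    # Refuse to guess: an unknown shape should be reviewed by a person.
    raise ValueError(f"unrecognised storage_uri shape for role {role!r}: {storage_uri!r}")


kind = infer_storage_kind("full_pdf", "/submitted-initiatives/sub-abc123def4567890.pdf")
```

Raising instead of defaulting keeps the backfill from silently mislabelling rows, which matters once backups dispatch on storage_kind rather than on string prefixes.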
Application logic
- Make MinIO upload of full_pdf synchronous and required when persistence is enabled: if upload fails, fail submit with retryable error.
- Treat filesystem write as cache for dev/static serving if desired, not sole store.
- Backfill job: filesystem-only historical PDFs → initiative-exports, then update artifact row + storage_kind.
Infrastructure
- Ensure SUBMITTED_INITIATIVES_DIR is on a persistent volume in every environment, or stop relying on it for production.
Phase 3 — Admin backup endpoint + ZIP contract
Authorization: admin-only; audit every request: actor, applicationId, timestamp, outcome, bytes streamed (feedback-data-management.md §10).
Resolution: load initiative by submission_public_id or case_code (indexed) after Phase 4; until then use existing lookup with awareness of scan cost for bulk exports.
Integrity
- While streaming each file into the ZIP, compute SHA-256 and compare to application_artifacts.sha256. On mismatch: fail entire export, log at high severity (feedback-data-management.md §4).
- Optional POST /admin/…/backup/verify (verify-only, no ZIP) for periodic audits.
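One way to hash while streaming is a generator that passes chunks through unchanged and checks the digest once the source is exhausted. This is a sketch under assumptions: ChecksumMismatch and verified_chunks are hypothetical names, and the chunk source stands in for a MinIO object read.

```python
import hashlib
from typing import Iterable, Iterator


class ChecksumMismatch(Exception):
    """Raised when streamed bytes do not match the stored sha256."""


def verified_chunks(chunks: Iterable[bytes], expected_sha256: str) -> Iterator[bytes]:
    """Yield chunks as-is; raise after the last one if the digest differs."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
        yield chunk
    actual = digest.hexdigest()
    if actual != expected_sha256.lower():
        raise ChecksumMismatch(f"expected {expected_sha256}, got {actual}")


stored_sha256 = hashlib.sha256(b"evidence-bytes").hexdigest()
streamed = b"".join(verified_chunks([b"evidence-", b"bytes"], stored_sha256))
```

The mismatch only becomes visible after every byte has been passed on, so the caller has to abort the archive (for example by closing the connection without writing the ZIP end record) rather than hand back a file that looks complete.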
ZIP layout (suggested; ASCII-safe entry names, original names in manifest):
manifest.json
submitted/full-package.pdf
submitted/official-form.docx
submitted/official-form.pdf
evidence/research/{safe-name-or-id}
evidence/textbook/…
evidence/technical/…
metadata/application_review_documents.json # optional
manifest.json (minimum fields): applicationId, case_code, initiative_id, submitted timestamps, owner id, list of files with role, original_name, mime_type, byte_size, stored sha256, verified sha256 (computed during ZIP build), storage_kind.
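A manifest with these minimum fields could be assembled as in the sketch below. The JSON key names (sha256_stored, sha256_verified, zip_path, submitted_at, owner_id), the build_manifest helper, and every sample value are assumptions; only the list of required fields comes from this document.

```python
import json
from typing import Any, Dict, Iterable, Mapping

FILE_FIELDS = ("role", "original_name", "mime_type", "byte_size", "storage_kind")


def build_manifest(
    identity: Mapping[str, Any], files: Iterable[Mapping[str, Any]]
) -> Dict[str, Any]:
    """Combine submission identity and per-file integrity data into one dict."""
    entries = []
    for item in files:
        entry = {field: item[field] for field in FILE_FIELDS}
        entry["zip_path"] = item["zip_path"]                  # ASCII-safe entry name
        entry["sha256_stored"] = item["sha256"]               # application_artifacts.sha256
        entry["sha256_verified"] = item["verified_sha256"]    # computed during ZIP build
        entries.append(entry)
    return {
        "applicationId": identity["applicationId"],
        "case_code": identity["case_code"],
        "initiative_id": identity["initiative_id"],
        "submitted_at": identity["submitted_at"],
        "owner_id": identity["owner_id"],
        "files": entries,
    }


manifest = build_manifest(
    {
        "applicationId": "sub-abc123def4567890",
        "case_code": "CASE-0001",                                 # made-up
        "initiative_id": "123e4567-e89b-12d3-a456-426614174000",  # made-up
        "submitted_at": "2024-03-05T00:00:00Z",                   # made-up
        "owner_id": "owner-1",                                    # made-up
    },
    [
        {
            "role": "full_pdf",
            "original_name": "hồ sơ.pdf",
            "mime_type": "application/pdf",
            "byte_size": 8,
            "storage_kind": "minio_exports",
            "zip_path": "submitted/full-package.pdf",
            "sha256": "0" * 64,           # placeholder digest
            "verified_sha256": "0" * 64,  # placeholder digest
        }
    ],
)
manifest_json = json.dumps(manifest, ensure_ascii=False, indent=2)
```

Keeping the original file name in the manifest while the ZIP entry uses an ASCII-safe path lets unzip tools with weak Unicode support still extract the archive cleanly.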
Transport
- Stream ZIP with a streaming library (e.g. zipstream-ng); do not buffer whole archives in memory.
- Single-initiative: synchronous response acceptable.
- Bulk (date range, many rows): async job → write ZIP to initiative-exports or initiative-backups → presigned URL when ready (avoids proxy timeouts).
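To show what streaming means in practice, the sketch below yields ZIP bytes incrementally. To stay self-contained it uses the standard library zipfile writing to a non-seekable sink instead of zipstream-ng, so it illustrates the idea rather than the suggested library. The names _ChunkSink and stream_zip are hypothetical.

```python
import io
import zipfile
from typing import Iterable, Iterator, List, Tuple


class _ChunkSink:
    """Write-only, non-seekable sink; zipfile then emits data descriptors."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []

    def write(self, data) -> int:
        self._chunks.append(bytes(data))
        return len(data)

    def flush(self) -> None:
        pass

    def drain(self) -> List[bytes]:
        chunks, self._chunks = self._chunks, []
        return chunks


def stream_zip(entries: Iterable[Tuple[str, Iterable[bytes]]]) -> Iterator[bytes]:
    """Yield archive bytes as entries are written; only pending chunks are held."""
    sink = _ChunkSink()
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_STORED) as archive:
        for name, chunks in entries:
            with archive.open(name, mode="w") as dest:
                for chunk in chunks:
                    dest.write(chunk)
                    yield from sink.drain()
            yield from sink.drain()  # data descriptor for this entry
    yield from sink.drain()          # central directory and end record


# Demo only: joining the stream into memory is what production code should avoid.
archive_bytes = b"".join(
    stream_zip([
        ("manifest.json", [b"{}"]),
        ("submitted/full-package.pdf", [b"%PDF-", b"1.7"]),
    ])
)
entry_names = zipfile.ZipFile(io.BytesIO(archive_bytes)).namelist()
```

In a web handler the generator would be handed to the framework's streaming response so each chunk goes to the client as soon as it is produced.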
Sources for each ZIP entry
| Content | Source |
|---|---|
| Full hồ sơ PDF | application_artifacts.full_pdf → MinIO initiative-exports (after Phase 2) |
| Printable DOCX / PDF | official_form_docx / official_form_pdf → initiative-exports |
| Evidence | research_*, textbook_*, technical_* → initiative-attachments |
| Structured snapshot | Optional: latest application_review_documents JSON |
Legacy: If official_form_* missing, either skip with manifest flags or run one-time backfill using frozen template policy — document that backfilled bytes are “as-of backfill date” not original submit date.
Phase 4 — Identifiers & schema hygiene
- Add submission_public_id (unique, indexed) on initiatives, set once at submit; replace linear scan in get_application_by_id with indexed lookup (feedback-data-management.md §5).
- Document resolution: sub-… vs CASE-… explicitly (remove “sometimes” from ops docs).
Phase 5 — Hardening (ongoing)
- MinIO versioning / object lock if compliance requires; off-cluster backup of MinIO; periodic verify-only sweeps (feedback-data-management.md §9, §10, quarter roadmap).
Frontend (admin)
- New “Tải bản sao lưu” (or similar) on application detail: call backup endpoint, handle long downloads (progress if async + poll).
- For async pattern: show job id, link when presigned URL ready.
- Ensure admin audit expectations match backend logging.
Summary
| Layer | Current summary | After plan |
|---|---|---|
| Postgres | Artifacts + polymorphic storage_uri | Explicit storage_kind, optional submission_public_id, new artifact roles for official DOCX/PDF |
| MinIO | Evidence + best-effort full PDF | Required full_pdf + official form binaries on initiative-exports; evidence on initiative-attachments |
| Admin backup | Would require regeneration / fragile dispatch | Streaming ZIP + manifest + verified SHA + audit; optional async for bulk |
This aligns the database and document management system with a backup that admins can trust: stored bytes, verified at pack time, and operationally grounded in explicit storage metadata.