sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00
commit 688fac73e9
1167 changed files with 158244 additions and 0 deletions
@@ -0,0 +1,985 @@
+# Implementation Guide — `sang-kien-pdf`
+
+A step-by-step walkthrough of how the Sáng kiến PDF + DOCX template generators are built. Read this if you want to understand **why** each piece exists, **how** to modify the layout, or **how** to port the same approach to a different government form.
+
+---
+
+## Table of contents
+
+1. [The problem we're solving](#1-the-problem-were-solving)
+2. [Architecture overview](#2-architecture-overview)
+3. [Tech stack and rationale](#3-tech-stack-and-rationale)
+4. [Project setup](#4-project-setup-from-scratch)
+5. [Implementing the PDF generator](#5-implementing-the-pdf-generator)
+   - 5.1 [TypeScript data types](#51-typescript-data-types)
+   - 5.2 [Font registration](#52-font-registration)
+   - 5.3 [Shared styles](#53-shared-styles)
+   - 5.4 [Reusable components](#54-reusable-components)
+   - 5.5 [Page components](#55-page-components)
+   - 5.6 [Top-level Document](#56-top-level-document)
+   - 5.7 [Server-side render helper](#57-server-side-render-helper)
+6. [Implementing the DOCX template generator](#6-implementing-the-docx-template-generator)
+   - 6.1 [The Jinja-in-DOCX strategy](#61-the-jinja-in-docx-strategy)
+   - 6.2 [The 3-row table loop trick](#62-the-3-row-table-loop-trick)
+   - 6.3 [Multi-section layout](#63-multi-section-layout)
+   - 6.4 [Building paragraphs and tables](#64-building-paragraphs-and-tables)
+7. [Layout calibration](#7-layout-calibration-matching-the-standard)
+8. [Verification workflow](#8-verification-workflow)
+9. [Common modifications](#9-common-modifications)
+10. [Troubleshooting](#10-troubleshooting)
+11. [Porting to a different form](#11-porting-to-a-different-form)
+
+---
+
+## 1. The problem we're solving
+
+The "Sáng kiến" application is a Vietnamese government form (Đại học Y Dược TP.HCM) that has six sections — a cover page (Trang bìa) plus Mẫu số 01–04 plus Bản cam kết. Every applicant fills out the same skeleton with their own data.
+
+Two real-world workflows need to be supported:
+
+1. **Programmatic PDF generation** — a web service receives JSON, returns a printable PDF. No human edits the file before printing.
+2. **Word-based filling** — an admin opens a `.docx` template in Word, types into it (or uses `docxtpl`/`Carbone`/etc. to merge JSON), and prints.
+
+Both outputs must look identical to the official reference document (`Sang_kien_SOP_dong_vat`). The data shape (`data_blank.json`) is fixed by an existing system upstream and must not change.
+
+The trick is keeping the two generators in sync — same layout, same data fields — while staying within each format's idioms.
+
+---
+
+## 2. Architecture overview
+
+```
+                           ┌────────────────────┐
+                           │   data.json        │  ← source of truth (data_blank.json shape)
+                           └──────────┬─────────┘
+                                      │
+                     ┌────────────────┴────────────────┐
+                     ▼                                 ▼
+        ┌──────────────────────┐          ┌─────────────────────────┐
+        │  React-PDF pipeline  │          │   docx + docxtpl path   │
+        │                      │          │                         │
+        │  data → React tree   │          │  build-docx-template.ts │
+        │  → PDF buffer        │          │  generates .docx with   │
+        │                      │          │  {{ }} placeholders     │
+        │                      │          │           ↓             │
+        │                      │          │  docxtpl.render(data)   │
+        │                      │          │  → filled .docx         │
+        └──────────┬───────────┘          └────────────┬────────────┘
+                   │                                   │
+                   ▼                                   ▼
+              filled.pdf                          filled.docx
+```
+
+The PDF path uses **runtime composition** — a React component receives data as props and returns a tree of `<Page>`/`<View>`/`<Text>` elements. The renderer turns that into a PDF buffer.
+
+The DOCX path uses **template-based composition** — a build script (`build-docx-template.ts`) produces a `.docx` file *once*, with placeholder strings like `{{ mau_01.mo_dau }}` baked into the document body. At runtime, `docxtpl` (Python) or any other Jinja-aware OOXML tool reads that `.docx`, finds the placeholders, and replaces them with values from the JSON.
+
+Both pipelines consume **the same JSON shape**, described by the TypeScript types in `src/types.ts`. Adding a new field means touching both generators, but the canonical definition of each field name lives in exactly one place: `src/types.ts`.
+
+---
+
+## 3. Tech stack and rationale
+
+| Concern | Choice | Why |
+|---|---|---|
+| PDF rendering | `@react-pdf/renderer` v4 | Component-based, server- and browser-compatible. Uses Yoga for flexbox layout. Same API as React, so layouts compose like UI code. |
+| Vietnamese font | `@expo-google-fonts/tinos` | Tinos is a metric-equivalent of Times New Roman (Apache 2.0) with the full Latin Extended Additional range — needed for `ư ơ ầ ậ ọ ặ` etc. The `@expo-google-fonts/*` packages ship actual `.ttf` files (most other font packages ship `.woff/.woff2`, which `@react-pdf/renderer` can't read). |
+| DOCX generation | `docx` v9 (npm) | Object-model API: build paragraphs, tables, sections in TypeScript, then `Packer.toBuffer()` produces a valid `.docx`. Maintained, typed, stable. |
+| Templating engine | `docxtpl` (Python) | The most popular Jinja-style DOCX templater. Recognizes `{{ var }}`, `{% if %}`, and crucially `{%tr for %}` for table-row loops. Other engines such as `docx-templates` (JS) and Carbone use their own tag syntax, so the tags would need rewriting before the template works there. |
+| TypeScript | 5.4 | Catches type errors at build time and gives autocompletion across all the data fields. |
+| Test rendering | LibreOffice (`soffice`) | Used to convert `.docx` → `.pdf` so we can visually diff against the reference document. |
+
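+**Why not a pure HTML-to-PDF approach (Puppeteer)?** It works, but it pulls in a headless Chromium download, and the output can vary with the Chromium version and the fonts installed on the host. React-PDF embeds the registered fonts and does its own layout, so the same input produces the same layout on every machine.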
+
+**Why not just generate the DOCX and convert it to PDF?** That would solve the layout-sync problem but couples PDF generation to a heavy toolchain (LibreOffice). React-PDF runs in pure Node.js and works inside serverless environments.
+
+---
+
+## 4. Project setup from scratch
+
+```bash
+mkdir sang-kien-pdf && cd sang-kien-pdf
+npm init -y
+
+# Runtime dependencies
+npm install @react-pdf/renderer react @expo-google-fonts/tinos docx
+
+# Dev dependencies
+npm install -D typescript ts-node @types/react @types/node
+```
+
+Create `tsconfig.json`:
+
+```json
+{
+  "compilerOptions": {
+    "target": "ES2020",
+    "module": "commonjs",
+    "lib": ["ES2020", "DOM"],
+    "jsx": "react",
+    "outDir": "./dist",
+    "rootDir": "./",
+    "strict": true,
+    "esModuleInterop": true,
+    "skipLibCheck": true,
+    "forceConsistentCasingInFileNames": true,
+    "declaration": true,
+    "declarationMap": true,
+    "sourceMap": true,
+    "resolveJsonModule": true,
+    "moduleResolution": "node"
+  },
+  "include": ["src/**/*", "example/**/*", "tools/**/*"],
+  "exclude": ["node_modules", "dist"]
+}
+```
+
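+The `jsx: "react"` setting matters: TypeScript compiles JSX to `React.createElement(...)` calls (the classic transform, not the newer automatic one), so every `.tsx` file must have `React` in scope via `import React from "react"`.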
+
+Add scripts to `package.json`:
+
+```json
+{
+  "scripts": {
+    "build": "tsc",
+    "generate": "ts-node example/generate-example.ts",
+    "generate:blank": "ts-node example/generate-example.ts --blank",
+    "build:docx": "ts-node tools/build-docx-template.ts"
+  }
+}
+```
+
+---
+
+## 5. Implementing the PDF generator
+
+### 5.1 TypeScript data types
+
+Start with the data shape. Every field in the JSON gets a strict TypeScript interface in `src/types.ts`. This is the single source of truth — every page component reads it, every change ripples out through the type system.
+
+```ts
+// src/types.ts
+export interface NgayKy {
+  ngay: string;
+  thang: string;
+  nam: string;
+}
+
+export interface TrangBia {
+  ten_sang_kien: string;
+  tac_gia: string;
+  don_vi: string;
+  thong_tin_lien_he: string;
+  nam: string;
+}
+
+export interface Mau01ApplyRow {
+  tt: string;
+  ten_to_chuc: string;
+  dia_chi: string;
+  linh_vuc: string;
+}
+
+export interface Mau01HieuQua {
+  loi_ich_kinh_te: string;
+  hieu_qua_giang_day: string;
+  // … 8 more fields
+}
+
+export interface Mau01 {
+  mo_dau: string;
+  ten_sang_kien: string;
+  // …
+  danh_sach_ap_dung: Mau01ApplyRow[];
+  tinh_hieu_qua: Mau01HieuQua;
+  ngay_ky: NgayKy;
+  // …
+}
+
+// … repeat for Mau02, Mau03, Mau04, BanCamKet
+
+export interface SangKienData {
+  trang_bia: TrangBia;
+  mau_01: Mau01;
+  mau_02: Mau02;
+  mau_03: Mau03;
+  mau_04: Mau04;
+  ban_cam_ket: BanCamKet;
+}
+```
+
+Two design choices worth calling out:
+
+**All fields are strings (or string arrays).** Even numbers like "Tỷ lệ %" are strings. The form is for humans, not databases — values get rendered verbatim, and string-only types let users write `"15%"` or `"khoảng 15"` without coercion errors.
+
+**Array-shaped tables.** `danh_sach_tac_gia` is `Mau02AuthorRow[]`, not a fixed-size tuple. The page components iterate with `.map()`, and the DOCX template uses a `{%tr for %}` loop. Both handle 0, 1, or 100 rows.
+
+### 5.2 Font registration
+
+`@react-pdf/renderer` ships with three fonts (Helvetica, Times-Roman, Courier) and **none of them include Vietnamese glyphs**. If you skip this step, characters like `ư ơ ầ ậ` will render as blank space.
+
+```ts
+// src/fonts.ts
+import { Font } from "@react-pdf/renderer";
+
+let registered = false;
+
+export function registerFonts(): void {
+  if (registered) return;
+
+  const regular = require.resolve(
+    "@expo-google-fonts/tinos/400Regular/Tinos_400Regular.ttf"
+  );
+  const italic = require.resolve(
+    "@expo-google-fonts/tinos/400Regular_Italic/Tinos_400Regular_Italic.ttf"
+  );
+  const bold = require.resolve(
+    "@expo-google-fonts/tinos/700Bold/Tinos_700Bold.ttf"
+  );
+  const boldItalic = require.resolve(
+    "@expo-google-fonts/tinos/700Bold_Italic/Tinos_700Bold_Italic.ttf"
+  );
+
+  Font.register({
+    family: "TimesVN",
+    fonts: [
+      { src: regular },
+      { src: italic, fontStyle: "italic" },
+      { src: bold, fontWeight: "bold" },
+      { src: boldItalic, fontWeight: "bold", fontStyle: "italic" },
+    ],
+  });
+
+  Font.registerHyphenationCallback((word) => [word]);
+  registered = true;
+}
+```
+
+Three things happen here:
+
+1. **`require.resolve()` finds the TTF on disk.** This works in Node. In a browser bundle (Webpack/Vite) the font would have to be imported as an asset URL instead, because `require.resolve()` does not return a fetchable URL there.
+2. **One family, four variants** — `fontWeight` and `fontStyle` keys let `<Text style={{ fontWeight: "bold" }}>` resolve to the bold TTF.
+3. **Hyphenation callback returns `[word]`** — this disables React-PDF's default English hyphenator, which would chop Vietnamese words at random points.
+
+The `registered` boolean guards against re-registration if `registerFonts()` is called from multiple entry points.
+
+### 5.3 Shared styles
+
+`StyleSheet.create()` in `src/styles.ts` defines reusable style objects. Three categories matter:
+
+**Page-level constants.** A4 with ~2.5 cm margins:
+
+```ts
+page: {
+  fontFamily: FONT,         // "TimesVN"
+  fontSize: 13,             // 13pt body
+  paddingTop: 71,           // ~2.5cm = 71pt
+  paddingBottom: 71,
+  paddingLeft: 71,
+  paddingRight: 71,
+  lineHeight: 1.25,
+},
+```
+
+**Paragraph variants** for the three contexts that come up:
+
+```ts
+// Indented body text (justified, first-line indent ~1cm)
+paragraph: { textAlign: "justify", textIndent: 28, marginBottom: 0 },
+
+// Flush-left lines (section labels, inline list items)
+paragraphFlush: { textAlign: "justify", marginBottom: 0 },
+
+// Section headings (flush-left, with breathing room above)
+sectionHead: { textAlign: "justify", marginBottom: 0, marginTop: 4 },
+```
+
+The `marginBottom: 0` is deliberate — Vietnamese government documents are visually dense, so paragraphs only get spacing between sections, not between adjacent lines.
+
+**Component primitives** (table, checkbox, signature columns):
+
+```ts
+table: {
+  flexDirection: "column",
+  borderWidth: 1, borderColor: "#000",
+  borderRightWidth: 0, borderBottomWidth: 0,  // we draw R+B per-cell
+  marginVertical: 4,
+},
+tableCell: {
+  borderRightWidth: 1, borderBottomWidth: 1, borderColor: "#000",
+  padding: 4,
+},
+```
+
+The "outer border drawn on the table, inner borders drawn per-cell" pattern avoids double-thickness lines where cells meet.
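
One way to check the single-thickness claim is to count how often each grid edge gets drawn. The sketch below is a toy model of the border scheme, not React-PDF code; `countEdgeDraws` and the edge-key format are made up for illustration.

```typescript
// Toy model: the table container draws its top and left edges,
// every cell draws only its right and bottom edges.
function countEdgeDraws(rows: number, cols: number): Record<string, number> {
  const counts: Record<string, number> = {};
  const draw = (edge: string): void => {
    counts[edge] = (counts[edge] ?? 0) + 1;
  };

  // Outer container: top edge of the first row, left edge of the first column.
  for (let c = 0; c < cols; c++) draw(`h:0:${c}`);
  for (let r = 0; r < rows; r++) draw(`v:${r}:0`);

  // Each cell: its right edge (vertical) and its bottom edge (horizontal).
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      draw(`v:${r}:${c + 1}`);
      draw(`h:${r + 1}:${c}`);
    }
  }
  return counts;
}

// A 3-row, 7-column grid, like an author table with a header and two data rows.
const edgeCounts = countEdgeDraws(3, 7);
const doubledEdges = Object.keys(edgeCounts).filter((k) => edgeCounts[k] !== 1);
```

In this model `doubledEdges` stays empty: every edge is painted exactly once, which is why no line ends up twice as thick as its neighbours.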
+
+**Cover-specific styles** are isolated in their own group because the cover page has unique requirements (page border via `position: absolute`, "Mẫu số 01" badge in the top corner).
+
+### 5.4 Reusable components
+
+`src/components.tsx` factors out the patterns that show up on multiple pages:
+
+**`<Checkbox checked={boolean}>label</Checkbox>`** — a horizontal row with a bordered square. When `checked`, an inner filled `<View>` appears inside it. We don't use the Unicode `☑` character because Tinos doesn't include it; drawing geometry is font-independent.
+
+```tsx
+export const Checkbox: React.FC<CheckboxProps> = ({ checked, children }) => (
+  <View style={styles.checkboxRow}>
+    <View style={styles.checkboxBox}>
+      {checked ? <View style={styles.checkboxFill} /> : null}
+    </View>
+    <Text style={styles.checkboxLabel}>{children}</Text>
+  </View>
+);
+```
+
+**Header variants** — three different two-column header patterns appear in the document:
+
+- `<TopHeaderBoYTe />` — "BỘ Y TẾ / ĐẠI HỌC Y DƯỢC" left, "CỘNG HÒA…" right (Mẫu 03/04)
+- `<TopHeaderDonVi donVi="..." />` — drops "BỘ Y TẾ", shows the unit name in bold (Mẫu 02)
+- `<TopHeaderCongHoa />` — only the right column (Bản cam kết)
+
+Each one uses the same `flexDirection: "row"` layout with two equal columns. The differences are which lines appear.
+
+**Table primitives.**
+
+```tsx
+<Table columns={[6, 22, 14, 16, 14, 14, 14]}>
+  <Row>
+    <Cell width={6} header align="center">STT</Cell>
+    <Cell width={22} header align="center">Họ và tên</Cell>
+    {/* … */}
+  </Row>
+  {data.danh_sach_tac_gia.map((row, i) => (
+    <Row key={i}>
+      <Cell width={6} align="center">{row.stt}</Cell>
+      <Cell width={22}>{row.ho_ten}</Cell>
+      {/* … */}
+    </Row>
+  ))}
+</Table>
+```
+
+The `width` prop is a **percentage** (the cell renders with `width: ${width}%`). Column widths must sum to 100. The `Cell` component automatically wraps string children in `<Text>` so callers can pass either plain text or nested elements.
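
The sum-to-100 rule is easy to break when editing widths by hand. A minimal sketch of a guard, assuming a hypothetical helper `toPercentWidths` that is not part of the project's `components.tsx`:

```typescript
// Hypothetical helper: validate integer column widths and turn them into
// the percentage strings a cell style would use.
function toPercentWidths(columns: readonly number[]): string[] {
  const total = columns.reduce((sum, w) => sum + w, 0);
  if (total !== 100) {
    throw new Error(`Column widths sum to ${total}, expected 100`);
  }
  return columns.map((w) => `${w}%`);
}

// The author-table widths from the example above.
const authorWidths = toPercentWidths([6, 22, 14, 16, 14, 14, 14]);
```

Calling such a guard once at module load would turn a bad edit into an immediate error instead of a subtly misaligned table.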
+
+**`<DateLine ngay thang nam />`** renders the recurring "TP. Hồ Chí Minh, ngày … tháng … năm …" line, with sensible blank-data placeholders (`.....`).
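
The placeholder behaviour can be sketched as a plain string function. This is an illustration only: `formatDateLine` is a hypothetical name, and the real component renders `<Text>` elements rather than returning a string.

```typescript
interface NgayKy {
  ngay: string;
  thang: string;
  nam: string;
}

const BLANK_PLACEHOLDER = ".....";

// Fall back to dots when a date part is empty, so a blank form still
// prints a line that can be filled in by hand.
function formatDateLine(d: NgayKy): string {
  const part = (value: string): string =>
    value.trim() === "" ? BLANK_PLACEHOLDER : value;
  return `TP. Hồ Chí Minh, ngày ${part(d.ngay)} tháng ${part(d.thang)} năm ${part(d.nam)}`;
}

const blankLine = formatDateLine({ ngay: "", thang: "", nam: "" });
const filledLine = formatDateLine({ ngay: "30", thang: "06", nam: "2026" });
```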
+
+**`<SignatureBlock title subtitle name>`** renders one column of a two-column signature block (centered title, italic subtitle, then a 50pt vertical gap before the bold signer's name).
+
+### 5.5 Page components
+
+Each section of the form gets its own component file in `src/pages/`. They all follow the same shape:
+
+```tsx
+// src/pages/Mau01.tsx
+import React from "react";
+import { Page, View, Text } from "@react-pdf/renderer";
+import { styles } from "../styles";
+import { Mau01 } from "../types";
+import { Table, Row, Cell, DateLine } from "../components";
+
+interface Props {
+  data: Mau01;
+  donVi: string;  // pulled from mau_02.don_vi by the parent
+}
+
+export const Mau01Page: React.FC<Props> = ({ data, donVi }) => (
+  <Page size="A4" style={styles.page}>
+    <Text style={styles.centerTitleLarge}>BÁO CÁO MÔ TẢ SÁNG KIẾN</Text>
+
+    <Text style={styles.paragraphFlush}>
+      1. Mở đầu{" "}
+      <Text style={styles.italic}>
+        (Giới thiệu về những vấn đề liên quan đến sáng kiến…):
+      </Text>
+    </Text>
+    <Text style={styles.paragraph}>{data.mo_dau}</Text>
+
+    {/* … rest of the page */}
+  </Page>
+);
+```
+
+Three patterns recur in every page:
+
+1. **Static + dynamic mixed in the same `<Text>`.** Section labels like "1. Mở đầu" and the italic instructional helper are both fixed text but styled differently, while the data value is dynamic. We use nested `<Text>` to apply different styles to different runs in one paragraph (`<Text>` in React-PDF can contain other `<Text>` nodes, like `<span>` in HTML).
+
+2. **`{" "}` for explicit whitespace.** JSX trims whitespace at line breaks, so the space between a label on one line and an italic helper on the next would be lost. We insert `{" "}` to keep it.
+
+3. **Default-empty rows for tables.** When `data.danh_sach_ap_dung` is empty, we still want one blank row to render so the printed form has a place to write. The pattern:
+   ```tsx
+   {(data.danh_sach_ap_dung && data.danh_sach_ap_dung.length > 0
+     ? data.danh_sach_ap_dung
+     : [{ tt: "", ten_to_chuc: "", dia_chi: "", linh_vuc: "" }]
+   ).map((row, i) => /* ... */)}
+   ```
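
The same fallback can be factored into a small generic helper so every table uses it the same way. A sketch, assuming a hypothetical `rowsOrBlank` function that does not exist in the project as described:

```typescript
interface Mau01ApplyRow {
  tt: string;
  ten_to_chuc: string;
  dia_chi: string;
  linh_vuc: string;
}

// Return the rows unchanged when there are any, otherwise a single blank row
// so the printed table still has one line to write on.
function rowsOrBlank<T>(rows: T[] | null | undefined, blank: T): T[] {
  return rows && rows.length > 0 ? rows : [blank];
}

const BLANK_APPLY_ROW: Mau01ApplyRow = {
  tt: "",
  ten_to_chuc: "",
  dia_chi: "",
  linh_vuc: "",
};

const rowsForEmptyForm = rowsOrBlank<Mau01ApplyRow>([], BLANK_APPLY_ROW);
```

A page component could then write `rowsOrBlank(data.danh_sach_ap_dung, BLANK_APPLY_ROW).map(...)` instead of repeating the ternary.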
+
+**Signature block on Mẫu 01 takes `donVi` as a prop**, not from `data` directly. The reason: the standard layout uses the unit name from Mẫu 02 (`mau_02.don_vi`) on Mẫu 01's signature line. Rather than duplicate the value in the JSON, the parent component (`SangKienDocument`) reads it from `mau_02` and passes it down.
+
+**Cover page is special.** It uses absolute positioning to put the page border around the entire content area:
+
+```tsx
+<Page size="A4" style={styles.page}>
+  <Text style={styles.formNumberOnCover}>Mẫu số 01</Text>
+  <View style={styles.coverBorder} fixed />
+  <View style={styles.coverContent}>
+    {/* header, title, fields, footer */}
+  </View>
+</Page>
+```
+
+`<View fixed>` tells React-PDF to repeat the border on every physical page this `<Page>` produces (irrelevant here since the cover fits on one page, but harmless), and `position: absolute` (set in `styles.coverBorder`) makes it overlay the whole page.
+
+### 5.6 Top-level Document
+
+`src/SangKienDocument.tsx` composes all six pages:
+
+```tsx
+export const SangKienDocument: React.FC<{ data: SangKienData }> = ({ data }) => {
+  registerFonts();
+  const donVi = data.mau_02.don_vi || data.trang_bia.don_vi;
+
+  return (
+    <Document
+      title={data.trang_bia.ten_sang_kien || "Báo cáo mô tả sáng kiến"}
+      author={data.trang_bia.tac_gia}
+    >
+      <CoverPage data={data.trang_bia} />
+      <Mau01Page data={data.mau_01} donVi={donVi} />
+      <Mau02Page data={data.mau_02} />
+      <Mau03Page data={data.mau_03} />
+      <Mau04Page data={data.mau_04} />
+      <BanCamKetPage data={data.ban_cam_ket} />
+    </Document>
+  );
+};
+```
+
+`registerFonts()` is idempotent (the internal `registered` flag guards against duplicate registration), so calling it from the top-level component is safe.
+
+The `<Document>` element accepts metadata that shows up in the PDF's title bar — `title`, `author`, `subject`, `creator`, `producer`, `keywords`. These don't affect rendering, just file properties.
+
+### 5.7 Server-side render helper
+
+`src/generate.tsx` wraps the React rendering in a Node-friendly Promise:
+
+```tsx
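+import React from "react";
+import * as fs from "fs";
+import * as path from "path";
+import { pdf } from "@react-pdf/renderer";
+import { SangKienDocument } from "./SangKienDocument";
+import { SangKienData } from "./types";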
+
+export async function renderSangKienPdf(data: SangKienData): Promise<Buffer> {
+  const instance = pdf(<SangKienDocument data={data} />);
+  const blob = await instance.toBlob();
+  const arrayBuffer = await blob.arrayBuffer();
+  return Buffer.from(arrayBuffer);
+}
+
+export async function renderSangKienPdfFromFile(
+  inputJsonPath: string,
+  outputPdfPath: string
+): Promise<void> {
+  const data = JSON.parse(fs.readFileSync(inputJsonPath, "utf-8")) as SangKienData;
+  const buffer = await renderSangKienPdf(data);
+  fs.mkdirSync(path.dirname(outputPdfPath), { recursive: true });
+  fs.writeFileSync(outputPdfPath, buffer);
+}
+```
+
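+`pdf(...).toBlob()` is used here even on the server, and the `Buffer.from(await blob.arrayBuffer())` conversion is one line. The package also exports `renderToBuffer()` for Node, which returns a `Buffer` directly if you prefer to skip the blob step.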
+
+`example/generate-example.ts` is a thin CLI on top:
+
+```ts
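+import * as path from "path";
+import { renderSangKienPdfFromFile } from "../src/generate";
+
+const useBlank = process.argv.includes("--blank");
+const inputPath = useBlank
+  ? path.join(__dirname, "data-blank.json")
+  : path.join(__dirname, "sample-data.json");
+const outputPath = path.join(__dirname, "..", "out", `sang-kien-${useBlank ? "blank" : "filled"}.pdf`);
+
+// Top-level await is not allowed with "module": "commonjs", so handle the promise explicitly.
+renderSangKienPdfFromFile(inputPath, outputPath).catch((err) => {
+  console.error(err);
+  process.exit(1);
+});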
+```
+
+---
+
+## 6. Implementing the DOCX template generator
+
+### 6.1 The Jinja-in-DOCX strategy
+
+`docxtpl` works by storing Jinja-style strings *as ordinary text* inside the DOCX, then doing template expansion at render time. The build script's job is to produce a `.docx` whose visible text reads:
+
+> **Tên sáng kiến (Tiếng Việt):** {{ trang_bia.ten_sang_kien }}
+
+When you open this in Word, you literally see those curly braces. When `docxtpl` opens it, it walks the OOXML tree, finds runs containing `{{ ... }}`, and replaces them.
+
+**The catch: text runs split across formatting changes.** If you write `Tên sáng kiến (Tiếng Việt): {{ trang_bia.ten_sang_kien }}` in one run, that's fine. But if you bold "Tên sáng kiến" and leave `{{ … }}` regular, Word stores them as **two separate runs**. A naive search for `{{` in the second run works — but if you split a placeholder *inside* the curly braces (`{{ trang_bia.` in one run, `ten_sang_kien }}` in another), `docxtpl` will fail silently. So:
+
+> **Rule:** every placeholder must live entirely inside one continuous run with one set of formatting.
+
+The `docx` library makes this easy — when you write `r("{{ mau_01.mo_dau }}")`, that's exactly one `<w:r>` element with one `<w:t>` inside.
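
To make the rule checkable, the sketch below flags runs whose template delimiters are unbalanced, which is the symptom of a placeholder split across runs. It works on plain strings standing in for run texts; `findSplitRuns` is a hypothetical helper and only a heuristic, not something `docx` or `docxtpl` provides.

```typescript
// Count non-overlapping occurrences of a token in a string.
function countOccurrences(text: string, token: string): number {
  return text.split(token).length - 1;
}

// Return the indices of runs whose {{ }} or {% %} delimiters do not pair up
// within the run itself.
function findSplitRuns(runs: string[]): number[] {
  const bad: number[] = [];
  runs.forEach((text, i) => {
    const varsBalanced =
      countOccurrences(text, "{{") === countOccurrences(text, "}}");
    const tagsBalanced =
      countOccurrences(text, "{%") === countOccurrences(text, "%}");
    if (!varsBalanced || !tagsBalanced) bad.push(i);
  });
  return bad;
}

// Bold label in one run, whole placeholder in the next: fine.
const okRuns = findSplitRuns(["Tên sáng kiến: ", "{{ trang_bia.ten_sang_kien }}"]);

// Placeholder broken in the middle: both halves are flagged.
const brokenRuns = findSplitRuns(["{{ trang_bia.", "ten_sang_kien }}"]);
```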
+
+### 6.2 The 3-row table loop trick
+
+For repeating table rows, `docxtpl` uses a special syntax: `{%tr for item in collection %}` and `{%tr endfor %}`. The `tr` prefix tells the engine "remove the entire `<w:tr>` row containing this tag and use the rows between `for` and `endfor` as the loop body."
+
+A naive single-row pattern doesn't work:
+
+```
+[ {%tr for x in items %} {{ x.id }} | {{ x.name }} {%tr endfor %} ]
+```
+
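+It fails because each `{%tr … %}` tag replaces the **entire row** that contains it. With `for` and `endfor` in the same row, the data cells are discarded along with the tags, and Jinja is left with a loop it cannot parse. The two tags must sit in **separate rows**.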
+
+The reliable pattern is **three rows**:
+
+```
+Row 1: | {%tr for item in collection %} | (empty cells) |
+Row 2: | {{ item.id }} | {{ item.name }} |   ← duplicated per item
+Row 3: | {%tr endfor %} | (empty cells) |
+```
+
+Row 1 and Row 3 get stripped. Row 2 gets repeated for each item. The data row carries the actual `{{ }}` fields.
+
+In code:
+
+```ts
+const aw = [6, 22, 14, 16, 14, 14, 14];  // column widths
+
+const emptyRow_aw = (firstText: string) => {
+  const cells: TableCell[] = [];
+  for (let i = 0; i < aw.length; i++) {
+    cells.push(new TableCell({
+      borders: allThinBorders,
+      width: { size: aw[i] * 100, type: WidthType.PERCENTAGE },
+      children: [new Paragraph({ children: [r(i === 0 ? firstText : " ")] })],
+    }));
+  }
+  return cells;
+};
+
+new Table({
+  rows: [
+    new TableRow({ children: [/* header cells */] }),
+    new TableRow({ children: emptyRow_aw("{%tr for item in mau_02.danh_sach_tac_gia %}") }),
+    new TableRow({ children: [
+      dataCell("{{ item.stt }}", aw[0], AlignmentType.CENTER),
+      dataCell("{{ item.ho_ten }}", aw[1]),
+      // … 5 more
+    ]}),
+    new TableRow({ children: emptyRow_aw("{%tr endfor %}") }),
+  ],
+});
+```
+
+The `emptyRow_aw` helper builds a row where the first cell contains the loop tag and the rest are blanks (just `" "`). After `docxtpl` strips it, the visible table has one header row plus one data row per item.
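
The effect of the three-row pattern can be simulated on plain string arrays. This is a toy model written for this guide, not `docxtpl`'s actual algorithm: `expandRows` and its helpers are hypothetical, and the author names are made-up sample data.

```typescript
type Row = string[];
type Item = Record<string, string>;

// True when any cell of the row carries the given {%tr ... %} tag.
function hasRowTag(row: Row, tag: "for" | "endfor"): boolean {
  return row.some((cell) => cell.indexOf(`{%tr ${tag}`) !== -1);
}

// Replace {{ item.key }} with the matching value (empty string if absent).
function fillCell(cell: string, item: Item): string {
  return cell.replace(
    /\{\{\s*item\.(\w+)\s*\}\}/g,
    (_match: string, key: string) => (item[key] !== undefined ? item[key] : "")
  );
}

function expandRows(rows: Row[], items: Item[]): Row[] {
  const out: Row[] = [];
  let body: Row[] | null = null; // non-null while we are inside a loop
  for (const row of rows) {
    if (hasRowTag(row, "for")) {
      body = []; // the tag row itself is dropped
    } else if (hasRowTag(row, "endfor")) {
      if (body === null) throw new Error("endfor without a matching for");
      for (const item of items) {
        for (const template of body) {
          out.push(template.map((cell) => fillCell(cell, item)));
        }
      }
      body = null; // the endfor row is dropped too
    } else if (body !== null) {
      body.push(row); // loop body, repeated once per item later
    } else {
      out.push(row); // ordinary row, e.g. the header
    }
  }
  return out;
}

const templateRows: Row[] = [
  ["STT", "Họ và tên"],
  ["{%tr for item in mau_02.danh_sach_tac_gia %}", " "],
  ["{{ item.stt }}", "{{ item.ho_ten }}"],
  ["{%tr endfor %}", " "],
];

const filledRows = expandRows(templateRows, [
  { stt: "1", ho_ten: "Nguyễn Văn A" },
  { stt: "2", ho_ten: "Trần Thị B" },
]);
```

With two items, `filledRows` holds the header plus two data rows; with an empty list only the header survives, which matches the "0, 1, or 100 rows" behaviour described earlier.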
+
+### 6.3 Multi-section layout
+
+Word documents are split into **sections**, each with its own page settings — margins, orientation, page borders, headers, footers. The cover page needs:
+
+- A **page border** (rounded rectangle around the content area)
+- A **header** containing "Mẫu số 01" at the top right *outside* the border
+
+The rest of the document needs:
+
+- **No** page border
+- **No** "Mẫu số 01" header (it's only on the cover)
+
+In `docx` v9, this is two sections in the same document:
+
+```ts
+new Document({
+  sections: [
+    {
+      properties: {
+        page: {
+          size: { width: 11906, height: 16838, orientation: PageOrientation.PORTRAIT },
+          margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 },
+          borders: {
+            pageBorderTop:    { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
+            pageBorderBottom: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
+            pageBorderLeft:   { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
+            pageBorderRight:  { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
+          },
+        },
+      },
+      headers: { default: coverHeader },  // contains "Mẫu số 01"
+      children: buildCoverPage(),
+    },
+    {
+      properties: {
+        page: { size: {/*…*/}, margin: {/*…*/} /* no borders */ },
+      },
+      // Explicit empty header so the cover header doesn't leak onto subsequent pages
+      headers: { default: new Header({ children: [new Paragraph({ children: [r("")] })] }) },
+      children: [
+        ...buildMau01(),
+        ...buildMau02(),
+        ...buildMau03(),
+        ...buildMau04(),
+        ...buildBanCamKet(),
+      ],
+    },
+  ],
+});
+```
+
+Two gotchas worth noting:
+
+**Twips, not points.** `docx` uses twips (1/1440 inch). Multiply pt by 20 to get twips:
+- A4 = 11906 × 16838 twips
+- 1 inch margin = 1440 twips
+- 1 cm = 567 twips
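
The conversions above are plain arithmetic, sketched here with hypothetical helper names (`ptToTwips`, `cmToTwips`, `cmToPt`) that are not part of the `docx` API:

```typescript
const TWIPS_PER_INCH = 1440;
const PT_PER_INCH = 72;
const CM_PER_INCH = 2.54;

// 1 pt = 20 twips, because 1440 / 72 = 20.
const ptToTwips = (pt: number): number =>
  Math.round(pt * (TWIPS_PER_INCH / PT_PER_INCH));

// Centimetres to twips, rounded to the nearest whole twip.
const cmToTwips = (cm: number): number =>
  Math.round((cm / CM_PER_INCH) * TWIPS_PER_INCH);

// Centimetres to points, for the React-PDF side.
const cmToPt = (cm: number): number =>
  Math.round((cm / CM_PER_INCH) * PT_PER_INCH);

const a4WidthTwips = cmToTwips(21.0);   // A4 is 21.0 cm wide
const a4HeightTwips = cmToTwips(29.7);  // and 29.7 cm tall
const oneCmTwips = cmToTwips(1);
const marginPt = cmToPt(2.5);
const inchMarginTwips = ptToTwips(72);
```

These reproduce the numbers used in this guide: 11906 × 16838 for A4, 567 for 1 cm, 71 pt for a 2.5 cm margin. Note that the 1440-twip DOCX margin is exactly 1 inch (2.54 cm), so it is marginally wider than the 71 pt padding on the PDF side.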
+
+**Headers leak across sections.** If section 2 doesn't define `headers`, it inherits section 1's. We have to provide an explicit empty `Header` to prevent the "Mẫu số 01" text from showing up on every page of the document.
+
+### 6.4 Building paragraphs and tables
+
+The build script defines small helper functions to keep the body code readable:
+
+```ts
+const FONT = "Times New Roman";
+const SIZE = 26;          // 13pt (docx-js uses half-points)
+const SIZE_HEADING = 28;  // 14pt
+
+function r(text: string, opts: { bold?: boolean; italic?: boolean; underline?: boolean; size?: number } = {}) {
+  return new TextRun({
+    text,
+    font: FONT,
+    size: opts.size ?? SIZE,
+    bold: opts.bold,
+    italics: opts.italic,
+    underline: opts.underline ? { type: UnderlineType.SINGLE } : undefined,
+  });
+}
+
+function bodyP(children: TextRun[], opts: { indent?: boolean } = {}) {
+  return new Paragraph({
+    children,
+    alignment: AlignmentType.JUSTIFIED,
+    indent: opts.indent ? { firstLine: 567 } : undefined,
+    spacing: { before: 0, after: 0, line: 300 },
+  });
+}
+
+function flushP(children: TextRun[], opts: { spaceBefore?: number } = {}) {
+  return new Paragraph({
+    children,
+    alignment: AlignmentType.JUSTIFIED,
+    spacing: { before: opts.spaceBefore ?? 0, after: 0, line: 300 },
+  });
+}
+
+function centerP(children: TextRun[], opts: { spaceBefore?: number; spaceAfter?: number } = {}) {
+  return new Paragraph({
+    children,
+    alignment: AlignmentType.CENTER,
+    spacing: { before: opts.spaceBefore ?? 0, after: opts.spaceAfter ?? 0, line: 300 },
+  });
+}
+```
+
+A typical section then reads naturally:
+
+```ts
+out.push(centerP([r("BÁO CÁO MÔ TẢ SÁNG KIẾN", { bold: true, size: SIZE_HEADING })]));
+
+out.push(flushP([
+  r("1. Mở đầu "),
+  r("(Giới thiệu về những vấn đề liên quan…):", { italic: true }),
+]));
+out.push(bodyP([r("{{ mau_01.mo_dau }}")], { indent: true }));
+```
+
+For checkboxes, since the templating engine has to choose which character to render, we embed the choice in the placeholder itself:
+
+```ts
+const checkbox = (cond: string, label: string) =>
+  flushP([
+    r(`{% if ${cond} %}`),
+    r("☑"),
+    r("{% else %}"),
+    r("☐"),
+    r("{% endif %} "),
+    r(label),
+  ]);
+
+out.push(checkbox(
+  "mau_02.phan_loai.giai_phap_ky_thuat",
+  "Giải pháp kỹ thuật, quản lý, tác nghiệp, ứng dụng tiến bộ kỹ thuật áp dụng cho Đại học Y Dược TP.HCM"
+));
+```
+
+After `docxtpl` runs, this paragraph reduces to `☑ Giải pháp kỹ thuật…` or `☐ Giải pháp kỹ thuật…` depending on the boolean. (For DOCX rendering in Word, the `☑/☐` characters work fine because Word falls back to a Unicode-capable font automatically — unlike React-PDF.)
+
+---
+
+## 7. Layout calibration (matching the standard)
+
+The "Sang_kien_SOP_dong_vat" reference document defines a specific visual style. Here's a checklist of the calibrations applied to both generators:
+
+| Aspect | Rule | Where it lives |
+|---|---|---|
+| Body font | Times New Roman (or Tinos) 13pt | `styles.page.fontSize`, `r()` `SIZE = 26` |
+| Page margins | ~2.5 cm all around | `padding: 71` (PDF, 71 pt ≈ 2.5 cm), `margin: 1440` (DOCX, 1440 twips = 1 in = 2.54 cm) |
+| Body line height | 1.25 | `lineHeight: 1.25` (PDF), `line: 300` (DOCX, 240 = single, 300 ≈ 1.25) |
+| First-line indent | ~1 cm on body paragraphs | `textIndent: 28` (PDF), `firstLine: 567` (DOCX) |
+| Section numbers (`1.`, `2.`, `4.1`) | **NOT bold**; italic instructions in parens | Use `paragraphFlush` not bold |
+| Inter-paragraph spacing | None within a section, small gap before new section | `marginBottom: 0`, `sectionHead.marginTop: 4` |
+| Cover page | Page border (rounded rect), "Mẫu số 01" outside top-right | Cover-specific styles, dedicated section in DOCX |
+| Cover divider | `=====***=====` (literal) | Hardcoded string |
+| Cover info fields | Left-aligned, **bold label**, regular value | `coverField` style |
+| Two-column header | "ĐƠN VỊ" or "BỘ Y TẾ" left, "CỘNG HÒA" right | `TopHeaderBoYTe`, `TopHeaderDonVi`, `TopHeaderCongHoa` |
+| "Độc lập – Tự do – Hạnh phúc" | Underlined, bold | `underline: true` flag in `r()`/styles |
+| Tables | Single thin black border, no shaded header | `borderWidth: 1`, no `backgroundColor` on `tableHeaderCell` |
+| Mẫu 02 author table column 7 | Header includes parenthetical italic instruction | Custom `TableCell` with two centered paragraphs |
+| Signature block | Two columns: "Xác nhận của lãnh đạo / [đơn vị]" left, "Đại diện nhóm tác giả sáng kiến" right | `<View style={signatureRow}>` (PDF), borderless 2-cell table (DOCX) |
+| Mẫu 03 totals row | TỔNG (cols 1–3 merged) ‖ 100 ‖ blank | `columnSpan: 3` in DOCX, manual width sum in PDF |
+| Mẫu 04 evaluation rubric | Two scoring rows + total row at bottom | Static text + `{{ … }}` for nhận xét/điểm |
+
+When in doubt about a layout decision, open the reference DOCX in Word, click into the relevant element, and read its formatting from the ribbon. Mirror those settings in code.
+
+---
+
+## 8. Verification workflow
+
+Visual diff against the reference is the only reliable way to know you got it right. The flow:
+
+```bash
+# 1. Generate the candidate PDF
+npm run generate
+
+# 2. Convert each page to JPEG
+pdftoppm -jpeg -r 100 out/sang-kien-filled.pdf out/page
+
+# 3. Convert the reference DOCX to PDF and JPEGs the same way
+soffice --headless --convert-to pdf reference.docx --outdir ref/
+pdftoppm -jpeg -r 100 ref/reference.pdf ref/ref-page
+
+# 4. Open them side by side
+```
+
+For the DOCX generator, add one more step:
+
+```bash
+# Build the template
+npm run build:docx
+
+# Render placeholders WITHOUT filling them — does the layout look right?
+soffice --headless --convert-to pdf out/template_application_form.docx --outdir out/
+
+# Fill it with sample data and render
+python tools/fill-docx.py example/sample-data.json out/sang-kien-filled.docx
+soffice --headless --convert-to pdf out/sang-kien-filled.docx --outdir out/
+```
+
+Smoke test the DOCX template in Python before declaring victory:
+
+```python
+# tools/test-docx-fill.py
+from docxtpl import DocxTemplate
+import json
+
+with open("example/sample-data.json", encoding="utf-8") as f:
+    data = json.load(f)
+
+doc = DocxTemplate("out/template_application_form.docx")
+doc.render(data)
+doc.save("out/template-filled-test.docx")
+```
+
+If `docxtpl` raises `TemplateSyntaxError: Encountered unknown tag 'endfor'`, you've put a `{%tr for %}` and `{%tr endfor %}` in the same row instead of separate rows. Go re-read [§6.2](#62-the-3-row-table-loop-trick).
+
+If a `{{ field }}` doesn't get replaced and you can still see the curly braces in the filled output, the placeholder got split across runs by Word's auto-formatting. Build the placeholder with one `r("{{ x }}")` call, not three.
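+
+A run-split placeholder is easy to miss by eye, so the check can also be scripted. The sketch below is a hypothetical helper, not part of this project: the name `findLeftoverTags` and its input convention (the raw text of `word/document.xml`, extracted from the filled DOCX beforehand) are assumptions made for illustration.
+
+```typescript
+// Hypothetical helper (not in the repo). Input: the raw XML text of
+// word/document.xml taken from a *filled* DOCX.
+function findLeftoverTags(documentXml: string): string[] {
+  // Removing every XML tag joins the text of adjacent runs, so a placeholder
+  // that was split across runs becomes one contiguous string again.
+  const visibleText = documentXml.replace(/<[^>]+>/g, "");
+  // Match {{ ... }} expressions and {% ... %} statements still in the text.
+  const matches = visibleText.match(/\{\{[\s\S]*?\}\}|\{%[\s\S]*?%\}/g);
+  return matches ?? [];
+}
+
+// A placeholder split across two runs, as it would appear in the XML.
+const splitRuns =
+  "<w:p><w:r><w:t>{{ mau_01.</w:t></w:r><w:r><w:t>tong_kinh_phi }}</w:t></w:r></w:p>";
+console.log(findLeftoverTags(splitRuns)); // [ "{{ mau_01.tong_kinh_phi }}" ]
+```
+
+An empty result means no Jinja tags survived rendering. Because the helper ignores paragraph boundaries, treat any hit as a pointer to inspect, not as proof of a bug.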
+
+---
+
+## 9. Common modifications
+
+### Adding a new field
+
+Say you need to add `mau_01.tong_kinh_phi` (total budget).
+
+1. **Update `src/types.ts`:**
+   ```ts
+   export interface Mau01 {
+     // …
+     tong_kinh_phi: string;  // new
+   }
+   ```
+
+2. **Update `example/data-blank.json`** and **`example/sample-data.json`** with the new field.
+
+3. **Render it in `src/pages/Mau01.tsx`:**
+   ```tsx
+   <Text style={styles.paragraphFlush}>
+     7. Tổng kinh phí: {data.tong_kinh_phi}
+   </Text>
+   ```
+
+4. **Add it to the DOCX template generator** in `tools/build-docx-template.ts`:
+   ```ts
+   out.push(flushP([r("7. Tổng kinh phí: {{ mau_01.tong_kinh_phi }}")]));
+   ```
+
+5. **Regenerate:**
+   ```bash
+   npm run generate
+   npm run build:docx
+   ```
+
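+The TypeScript compiler catches type mismatches wherever the new field is read, and it flags a missing field in the JSON if that JSON is imported and assigned to the typed data. It does not notice a field that is never rendered, so confirm that the new line actually appears in the regenerated PDF and DOCX.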
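+
+A key that exists in one JSON file but not in the other is easy to overlook in step 2. As a rough sketch, assuming both files share the same nesting, a small script could compare their key paths. `keyPaths` and `missingKeys` are hypothetical names, and the inline objects stand in for the parsed contents of `example/data-blank.json` and `example/sample-data.json`.
+
+```typescript
+type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
+
+// Collect dotted key paths such as "mau_01.tong_kinh_phi".
+// Arrays are represented by their first element only (assumption: every row
+// of a repeating table has the same shape).
+function keyPaths(value: Json, prefix: string = ""): string[] {
+  if (Array.isArray(value)) {
+    const first = value[0];
+    return first === undefined ? [] : keyPaths(first, prefix + "[]");
+  }
+  if (value === null || typeof value !== "object") {
+    return []; // primitives have no nested keys
+  }
+  const paths: string[] = [];
+  for (const key of Object.keys(value)) {
+    const path = prefix ? prefix + "." + key : key;
+    paths.push(path, ...keyPaths(value[key], path));
+  }
+  return paths;
+}
+
+// Key paths present in `reference` but absent from `candidate`.
+function missingKeys(reference: Json, candidate: Json): string[] {
+  const present = keyPaths(candidate);
+  return keyPaths(reference).filter((path) => present.indexOf(path) === -1);
+}
+
+// `ten_sang_kien` is an illustrative field name, not taken from the project.
+const blank: Json = { mau_01: { ten_sang_kien: "", tong_kinh_phi: "" } };
+const sample: Json = { mau_01: { ten_sang_kien: "Ví dụ" } };
+console.log(missingKeys(blank, sample)); // [ "mau_01.tong_kinh_phi" ]
+```
+
+An empty array contributes no nested paths, so this comparison only covers repeating tables that have at least one row in both files.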
+
+### Changing a column width
+
+Column widths are kept as small integer arrays in the page component (PDF) and the build script (DOCX). They must always sum to 100.
+
+To widen the "Họ và tên" column on the Mẫu 02 author table from 22% to 28% (and shrink "Nơi công tác" from 16% to 10%):
+
+In `src/pages/Mau02.tsx`:
+```ts
+const AUTHOR_WIDTHS = [6, 28, 14, 10, 14, 14, 14] as const;  // was [6, 22, 14, 16, …]
+```
+
+In `tools/build-docx-template.ts` (inside `buildMau02()`):
+```ts
+const aw = [6, 28, 14, 10, 14, 14, 14];
+```
+
+Both arrays must match. There's no shared constant: the PDF widths are percentages of the page width (100% sum), and the DOCX widths happen to use the same convention but go through different code paths. Keeping them in sync is a manual discipline.
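+
+Since the two arrays are maintained by hand, a small guard can catch drift early. This is a minimal sketch under assumptions: `checkWidths` and `sameWidths` are hypothetical helpers that do not exist in the project, and the two arrays are copied inline here instead of being imported from their real locations.
+
+```typescript
+// Hypothetical guard: throws when a width array does not sum to exactly 100.
+function checkWidths(name: string, widths: readonly number[]): void {
+  const total = widths.reduce((sum, w) => sum + w, 0);
+  if (total !== 100) {
+    throw new Error(`${name}: column widths sum to ${total}, expected 100`);
+  }
+}
+
+// True when both generators use the same widths, column by column.
+function sameWidths(a: readonly number[], b: readonly number[]): boolean {
+  return a.length === b.length && a.every((w, i) => w === b[i]);
+}
+
+// Copies of the Mẫu 02 author-table widths from the two code paths.
+const pdfAuthorWidths = [6, 28, 14, 10, 14, 14, 14] as const;
+const docxAuthorWidths = [6, 28, 14, 10, 14, 14, 14];
+
+checkWidths("Mau02 PDF", pdfAuthorWidths);
+checkWidths("Mau02 DOCX", docxAuthorWidths);
+console.log(sameWidths(pdfAuthorWidths, docxAuthorWidths)); // true
+```
+
+A check like this only helps if both arrays are reachable from one place, for example by exporting them, which is a design change the current code does not make.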
+
+### Adding a new repeating table
+
+The data shape, the page component, and the DOCX template all need updates:
+
+1. **Type:** define `interface Mau01NewRow { … }` and add a `danh_sach_moi: Mau01NewRow[]` field to `Mau01`.
+
+2. **PDF page:** mirror the existing pattern in `src/pages/Mau01.tsx`:
+   ```tsx
+   <Table columns={[10, 30, 30, 30]}>
+     <Row>
+       <Cell width={10} header align="center">TT</Cell>
+       {/* … */}
+     </Row>
+     {/* Fall back to one blank row when the list is missing or empty */}
+     {(data.danh_sach_moi && data.danh_sach_moi.length > 0
+       ? data.danh_sach_moi
+       : [{ tt: "" /* … */ }]
+     ).map((row, i) => (
+       <Row key={i}>
+         <Cell width={10} align="center">{row.tt}</Cell>
+         {/* … */}
+       </Row>
+     ))}
+   </Table>
+   ```
+
+3. **DOCX template:** use the 3-row pattern from [§6.2](#62-the-3-row-table-loop-trick):
+   ```ts
+   const w = [10, 30, 30, 30];
+   const emptyRow = (firstText: string) => /* same helper pattern */;
+
+   new Table({
+     rows: [
+       new TableRow({ children: [headerCell("TT", w[0]), /* … */] }),
+       new TableRow({ children: emptyRow("{%tr for item in mau_01.danh_sach_moi %}") }),
+       new TableRow({ children: [dataCell("{{ item.tt }}", w[0], AlignmentType.CENTER), /* … */] }),
+       new TableRow({ children: emptyRow("{%tr endfor %}") }),
+     ],
+   });
+   ```
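+
+The blank-row fallback in step 2 recurs for every repeating table, so it can be factored out. The sketch below is an assumption, not existing project code: `rowsOrBlank` is a hypothetical helper, and `noi_dung` is an invented column used only to give the row type a second field.
+
+```typescript
+interface Mau01NewRow {
+  tt: string;
+  noi_dung: string; // hypothetical column, for illustration only
+}
+
+// Returns the rows unchanged, or a single blank row when the list is missing
+// or empty, so the table always shows at least one row.
+function rowsOrBlank<T>(rows: readonly T[] | undefined, blank: T): readonly T[] {
+  return rows !== undefined && rows.length > 0 ? rows : [blank];
+}
+
+const blankRow: Mau01NewRow = { tt: "", noi_dung: "" };
+const filled: Mau01NewRow[] = [
+  { tt: "1", noi_dung: "A" },
+  { tt: "2", noi_dung: "B" },
+];
+
+console.log(rowsOrBlank(undefined, blankRow).length); // 1
+console.log(rowsOrBlank(filled, blankRow).length); // 2
+```
+
+In the page component this would read `rowsOrBlank(data.danh_sach_moi, blankRow).map(...)` in place of the inline ternary.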
+
+### Switching to your organization's font
+
+Replace the four TTF paths in `src/fonts.ts`:
+
+```ts
+Font.register({
+  family: "TimesVN",
+  fonts: [
+    { src: "/path/to/your/Regular.ttf" },
+    { src: "/path/to/your/Italic.ttf", fontStyle: "italic" },
+    { src: "/path/to/your/Bold.ttf", fontWeight: "bold" },
+    { src: "/path/to/your/BoldItalic.ttf", fontWeight: "bold", fontStyle: "italic" },
+  ],
+});
+```
+
+For the DOCX side, change `const FONT = "Times New Roman"` in `tools/build-docx-template.ts` to the name of the font you want. The template refers to the font by name, and Word will fall back to a system font if that font isn't installed on the reader's machine, so prefer common names (Times New Roman, Arial, Calibri).
+
+---
+
+## 10. Troubleshooting
+
+**PDF renders blank squares where Vietnamese characters should be.**
+The font isn't registered or the registered font lacks Vietnamese glyphs. Check that `registerFonts()` is called and that the TTFs at the resolved paths are actually loaded (not 404 / missing). Tinos has the right glyph coverage; many "Times New Roman clones" don't.
+
+**`Error: Failed to fetch font from https://…`**
+You're hitting `@react-pdf/renderer`'s URL-based font loading and your environment can't reach the URL. Switch to local TTFs via `require.resolve()` (already what `src/fonts.ts` does).
+
+**`docxtpl` raises `TemplateSyntaxError: Encountered unknown tag 'endfor'`.**
+You put the `{%tr for %}` and `{%tr endfor %}` tags in the *same* table row. Re-read [§6.2](#62-the-3-row-table-loop-trick) — they have to be on separate rows.
+
+**Some `{{ field }}` placeholders aren't being replaced.**
+Word split your text run mid-placeholder. Make sure each placeholder is constructed with a single `r("{{ x }}")` call, not split across multiple `r()` calls or assembled from concatenated strings.
+
+**The DOCX has "Mẫu số 01" appearing on every page, not just the cover.**
+The cover-section header is leaking into the next section. Add an explicit empty header to the second section:
+```ts
+headers: { default: new Header({ children: [new Paragraph({ children: [r("")] })] }) },
+```
+
+**Tables overflow the right margin.**
+Column width percentages don't sum to exactly 100, or a single cell has too much wide content with no wrap point. Either fix the widths or add `wordBreak: "break-word"` to the cell style.
+
+**`textIndent` doesn't seem to work in `<Text>`.**
+React-PDF's `textIndent` only takes effect when the `<Text>` *itself* has `display: "block"`-like behavior, i.e. it's a top-level paragraph and not nested inside another `<Text>`. If you're nesting, put the indent style on the outermost `<Text>`.
+
+**The DOCX page border doesn't appear.**
+Page borders are a Word feature configured in section properties. Check that you've set all four (`pageBorderTop/Bottom/Left/Right`), with non-zero `size` and a `space` value (24 puts them ~1.7cm from the edge in our setup). LibreOffice and Word may render them slightly differently — Word is the canonical view.
+
+**Filled DOCX has weird extra empty rows above each table.**
+Those are the `{%tr for %}`/`{%tr endfor %}` rows that didn't get stripped — meaning the loop tags ended up in paragraphs *inside* a cell, not as standalone row text. Make sure the `firstText` in your `emptyRow_*()` helper is the entire cell content, not appended to other text.
+
+---
+
+## 11. Porting to a different form
+
+The same pattern works for any structured government form. The migration steps:
+
+1. **Extract the data model.** Open the reference DOCX, list every blank line and every table column. Each becomes a field in `types.ts`. Repeating sections (lists of authors, lists of attachments) become arrays.
+
+2. **Identify the sections.** Most forms have a cover page plus N body sections. Each body section becomes a `<Page>` component plus a `buildSectionN()` function in the DOCX builder.
+
+3. **Catalog the visual primitives.** Headers, signature blocks, tables, checkboxes, date lines — write them once in `components.tsx` (PDF) and as helper functions (DOCX), then reuse.
+
+4. **Calibrate the styles.** Open the reference, measure margins, font, line spacing, and indent. Set them as constants. See [§7](#7-layout-calibration-matching-the-standard).
+
+5. **Render and diff.** Generate, convert to JPEG, line up against the reference. Iterate until they match.
+
+6. **Smoke-test the DOCX template** with `docxtpl`. If a placeholder doesn't fill, it's almost always run-splitting — fix by collapsing into one `r()` call.
+
+The most labor-intensive part is the visual calibration (steps 4–5). Everything else is mechanical translation from "what the form looks like" to "code that produces the same thing."
+
+---
+
+## Appendix: file-by-file inventory
+
+| File | Lines | Purpose |
+|---|---:|---|
+| `src/types.ts` | 177 | TypeScript interfaces matching `data-blank.json` |
+| `src/fonts.ts` | 56 | Tinos font registration |
+| `src/styles.ts` | 239 | Shared `StyleSheet.create()` styles |
+| `src/components.tsx` | 156 | Reusable `<Checkbox>`, `<Table>`, `<DateLine>`, header variants |
+| `src/pages/CoverPage.tsx` | 64 | Trang bìa with page border |
+| `src/pages/Mau01.tsx` | 172 | Báo cáo mô tả sáng kiến |
+| `src/pages/Mau02.tsx` | 206 | Đơn đề nghị công nhận sáng kiến |
+| `src/pages/Mau03.tsx` | 82 | Bản xác nhận tỷ lệ đóng góp |
+| `src/pages/Mau04.tsx` | 94 | Phiếu đánh giá sáng kiến |
+| `src/pages/BanCamKet.tsx` | 119 | Bản cam kết |
+| `src/SangKienDocument.tsx` | 43 | Top-level `<Document>` composing all pages |
+| `src/generate.tsx` | 37 | `renderSangKienPdf(data)` server-side helper |
+| `src/index.ts` | 5 | Public API barrel |
+| `tools/build-docx-template.ts` | 1301 | Generates the Jinja-style DOCX template |
+| `tools/fill-docx.py` | ~30 | CLI to fill a template with JSON data via `docxtpl` |
+| `tools/test-docx-fill.py` | ~25 | Smoke test script |
+| `example/generate-example.ts` | ~35 | CLI for the PDF pipeline |
+| `example/sample-data.json` | — | Realistic filled-in example |
+| `example/data-blank.json` | — | All-empty template instance |
+
+Total: about **2750 lines** of TypeScript + ~50 lines of Python. The DOCX generator is the largest single file because every static line of body text is an `out.push(flushP([r("…")]))` call, but the pattern is repetitive and easy to skim.