sciagent code + Gitea Actions CI/CD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thinh Lam
2026-06-30 09:38:30 +07:00
commit 688fac73e9
1167 changed files with 158244 additions and 0 deletions
# Implementation Guide — `sang-kien-pdf`
A step-by-step walkthrough of how the Sáng kiến PDF + DOCX template generators are built. Read this if you want to understand **why** each piece exists, **how** to modify the layout, or **how** to port the same approach to a different government form.
---
## Table of contents
1. [The problem we're solving](#1-the-problem-were-solving)
2. [Architecture overview](#2-architecture-overview)
3. [Tech stack and rationale](#3-tech-stack-and-rationale)
4. [Project setup](#4-project-setup-from-scratch)
5. [Implementing the PDF generator](#5-implementing-the-pdf-generator)
- 5.1 [TypeScript data types](#51-typescript-data-types)
- 5.2 [Font registration](#52-font-registration)
- 5.3 [Shared styles](#53-shared-styles)
- 5.4 [Reusable components](#54-reusable-components)
- 5.5 [Page components](#55-page-components)
- 5.6 [Top-level Document](#56-top-level-document)
- 5.7 [Server-side render helper](#57-server-side-render-helper)
6. [Implementing the DOCX template generator](#6-implementing-the-docx-template-generator)
- 6.1 [The Jinja-in-DOCX strategy](#61-the-jinja-in-docx-strategy)
- 6.2 [The 3-row table loop trick](#62-the-3-row-table-loop-trick)
- 6.3 [Multi-section layout](#63-multi-section-layout)
- 6.4 [Building paragraphs and tables](#64-building-paragraphs-and-tables)
7. [Layout calibration](#7-layout-calibration-matching-the-standard)
8. [Verification workflow](#8-verification-workflow)
9. [Common modifications](#9-common-modifications)
10. [Troubleshooting](#10-troubleshooting)
11. [Porting to a different form](#11-porting-to-a-different-form)
---
## 1. The problem we're solving
The "Sáng kiến" application is a Vietnamese government form (Đại học Y Dược TP.HCM) with six sections: a cover page (Trang bìa), Mẫu số 01 through 04, and Bản cam kết. Every applicant fills out the same skeleton with their own data.
Two real-world workflows need to be supported:
1. **Programmatic PDF generation** — a web service receives JSON, returns a printable PDF. No human edits the file before printing.
2. **Word-based filling** — an admin opens a `.docx` template in Word, types into it (or uses `docxtpl`/`Carbone`/etc. to merge JSON), and prints.
Both outputs must look identical to the official reference document (`Sang_kien_SOP_dong_vat`). The data shape (`data_blank.json`) is fixed by an existing system upstream and must not change.
The trick is keeping the two generators in sync — same layout, same data fields — while staying within each format's idioms.
---
## 2. Architecture overview
```
                 ┌────────────────────┐
                 │     data.json      │  ← source of truth (data_blank.json shape)
                 └──────────┬─────────┘
           ┌────────────────┴────────────────┐
           ▼                                 ▼
┌──────────────────────┐        ┌─────────────────────────┐
│  React-PDF pipeline  │        │  docx + docxtpl path    │
│                      │        │                         │
│  data → React tree   │        │  build-docx-template.ts │
│       → PDF buffer   │        │  generates .docx with   │
│                      │        │  {{ }} placeholders     │
│                      │        │           ↓             │
│                      │        │  docxtpl.render(data)   │
│                      │        │  → filled .docx         │
└──────────┬───────────┘        └────────────┬────────────┘
           │                                 │
           ▼                                 ▼
      filled.pdf                        filled.docx
```
The PDF path uses **runtime composition** — a React component receives data as props and returns a tree of `<Page>`/`<View>`/`<Text>` elements. The renderer turns that into a PDF buffer.
The DOCX path uses **template-based composition** — a build script (`build-docx-template.ts`) produces a `.docx` file *once*, with placeholder strings like `{{ mau_01.mo_dau }}` baked into the document body. At runtime, `docxtpl` (Python) or any other Jinja-aware OOXML tool reads that `.docx`, finds the placeholders, and replaces them with values from the JSON.
Both pipelines read **the same TypeScript types and JSON files**, so adding a new field requires touching both sides — but the field name lives in exactly one place: `src/types.ts`.
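One way to check that the DOCX template still covers every field is to walk the blank JSON and confirm that each leaf path appears somewhere in the template's text. The sketch below is illustrative only: `leafPaths` and `missingPlaceholders` are hypothetical helper names, not part of the project, and it assumes the template's text is already available as a single string. It uses plain substring matching, which is a simplification.

```typescript
// Hypothetical helpers (not part of the project) for checking field coverage.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Collect dotted paths for every leaf value. Arrays are reported as a single
// path because their rows are rendered by a loop, not by direct placeholders.
function leafPaths(value: Json, prefix = ""): string[] {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return prefix ? [prefix] : [];
  }
  const paths: string[] = [];
  for (const key of Object.keys(value)) {
    const child = value[key];
    if (child === undefined) continue;
    paths.push(...leafPaths(child, prefix ? `${prefix}.${key}` : key));
  }
  return paths;
}

// Return the paths that never occur in the template text.
function missingPlaceholders(data: Json, templateText: string): string[] {
  return leafPaths(data).filter((p) => !templateText.includes(p));
}

const blank: Json = {
  trang_bia: { ten_sang_kien: "", tac_gia: "" },
  mau_01: { mo_dau: "", danh_sach_ap_dung: [] },
};
const templateText =
  "{{ trang_bia.ten_sang_kien }} {{ mau_01.mo_dau }} " +
  "{%tr for item in mau_01.danh_sach_ap_dung %}";

console.log(missingPlaceholders(blank, templateText)); // [ 'trang_bia.tac_gia' ]
```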
---
## 3. Tech stack and rationale
| Concern | Choice | Why |
|---|---|---|
| PDF rendering | `@react-pdf/renderer` v4 | Component-based, server- and browser-compatible. Uses Yoga for flexbox layout. Same API as React, so layouts compose like UI code. |
| Vietnamese font | `@expo-google-fonts/tinos` | Tinos is a metric-equivalent of Times New Roman (Apache 2.0) with the full Latin Extended Additional range — needed for `ư ơ ầ ậ ọ ặ` etc. The `@expo-google-fonts/*` packages ship actual `.ttf` files (most other font packages ship `.woff/.woff2`, which `@react-pdf/renderer` can't read). |
| DOCX generation | `docx` v9 (npm) | Object-model API: build paragraphs, tables, sections in TypeScript, then `Packer.toBuffer()` produces a valid `.docx`. Maintained, typed, stable. |
| Templating engine | `docxtpl` (Python) | The most popular Jinja-style DOCX templater. Recognizes `{{ var }}`, `{% if %}`, and crucially `{%tr for %}` for table-row loops. Compatible templates work in `docx-templates` (JS) and Carbone too. |
| TypeScript | 5.4 | Catches type errors at build time and gives autocompletion across all the data fields. |
| Test rendering | LibreOffice (`soffice`) | Used to convert `.docx` → `.pdf` so we can visually diff against the reference document. |
**Why not a pure HTML-to-PDF approach (Puppeteer)?** It works, but bundle size is huge and rendering is non-deterministic across machines. React-PDF gives byte-stable output.
**Why not just generate the DOCX and convert it to PDF?** That would solve the layout-sync problem but couples PDF generation to a heavy toolchain (LibreOffice). React-PDF runs in pure Node.js and works inside serverless environments.
---
## 4. Project setup from scratch
```bash
mkdir sang-kien-pdf && cd sang-kien-pdf
npm init -y
# Runtime dependencies
npm install @react-pdf/renderer react @expo-google-fonts/tinos docx
# Dev dependencies
npm install -D typescript ts-node @types/react @types/node
```
Create `tsconfig.json`:
```json
{
"compilerOptions": {
"target": "ES2020",
"module": "commonjs",
"lib": ["ES2020", "DOM"],
"jsx": "react",
"outDir": "./dist",
"rootDir": "./",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true,
"resolveJsonModule": true,
"moduleResolution": "node"
},
"include": ["src/**/*", "example/**/*", "tools/**/*"],
"exclude": ["node_modules", "dist"]
}
```
The `jsx: "react"` setting matters: it selects the classic JSX transform (`React.createElement`) rather than the newer automatic runtime, so every `.tsx` file needs `React` in scope.
Add scripts to `package.json`:
```json
{
"scripts": {
"build": "tsc",
"generate": "ts-node example/generate-example.ts",
"generate:blank": "ts-node example/generate-example.ts --blank",
"build:docx": "ts-node tools/build-docx-template.ts"
}
}
```
---
## 5. Implementing the PDF generator
### 5.1 TypeScript data types
Start with the data shape. Every field in the JSON gets a strict TypeScript interface in `src/types.ts`. This is the single source of truth — every page component reads it, every change ripples out through the type system.
```ts
// src/types.ts
export interface NgayKy {
ngay: string;
thang: string;
nam: string;
}
export interface TrangBia {
ten_sang_kien: string;
tac_gia: string;
don_vi: string;
thong_tin_lien_he: string;
nam: string;
}
export interface Mau01ApplyRow {
tt: string;
ten_to_chuc: string;
dia_chi: string;
linh_vuc: string;
}
export interface Mau01HieuQua {
loi_ich_kinh_te: string;
hieu_qua_giang_day: string;
// … 8 more fields
}
export interface Mau01 {
mo_dau: string;
ten_sang_kien: string;
// …
danh_sach_ap_dung: Mau01ApplyRow[];
tinh_hieu_qua: Mau01HieuQua;
ngay_ky: NgayKy;
// …
}
// … repeat for Mau02, Mau03, Mau04, BanCamKet
export interface SangKienData {
trang_bia: TrangBia;
mau_01: Mau01;
mau_02: Mau02;
mau_03: Mau03;
mau_04: Mau04;
ban_cam_ket: BanCamKet;
}
```
Two design choices worth calling out:
**All text fields are strings (tables are arrays of string-only rows).** Even numbers like "Tỷ lệ %" are strings. The form is for humans, not databases: values get rendered verbatim, and string-only types let users write `"15%"` or `"khoảng 15"` without coercion errors.
**Array-shaped tables.** `danh_sach_tac_gia` is `Mau02AuthorRow[]`, not a fixed-size tuple. The page components iterate with `.map()`, and the DOCX template uses a `{%tr for %}` loop. Both handle 0, 1, or 100 rows.
### 5.2 Font registration
`@react-pdf/renderer` ships with three fonts (Helvetica, Times-Roman, Courier) and **none of them include Vietnamese glyphs**. If you skip this step, characters like `ư ơ ầ ậ` will render as blank space.
```ts
// src/fonts.ts
import { Font } from "@react-pdf/renderer";
let registered = false;
export function registerFonts(): void {
if (registered) return;
const regular = require.resolve(
"@expo-google-fonts/tinos/400Regular/Tinos_400Regular.ttf"
);
const italic = require.resolve(
"@expo-google-fonts/tinos/400Regular_Italic/Tinos_400Regular_Italic.ttf"
);
const bold = require.resolve(
"@expo-google-fonts/tinos/700Bold/Tinos_700Bold.ttf"
);
const boldItalic = require.resolve(
"@expo-google-fonts/tinos/700Bold_Italic/Tinos_700Bold_Italic.ttf"
);
Font.register({
family: "TimesVN",
fonts: [
{ src: regular },
{ src: italic, fontStyle: "italic" },
{ src: bold, fontWeight: "bold" },
{ src: boldItalic, fontWeight: "bold", fontStyle: "italic" },
],
});
Font.registerHyphenationCallback((word) => [word]);
registered = true;
}
```
Three things happen here:
1. **`require.resolve()` finds the TTF on disk.** This works in Node; bundlers such as Webpack or Vite may need their own asset handling to turn the font path into a URL.
2. **One family, four variants.** The `fontWeight` and `fontStyle` keys let `<Text style={{ fontWeight: "bold" }}>` resolve to the bold TTF.
3. **Hyphenation callback returns `[word]`** — this disables React-PDF's default English hyphenator, which would chop Vietnamese words at random points.
The `registered` boolean guards against re-registration if `registerFonts()` is called from multiple entry points.
### 5.3 Shared styles
`StyleSheet.create()` in `src/styles.ts` defines reusable style objects. Three categories matter:
**Page-level constants.** A4 with ~2.5 cm margins:
```ts
page: {
fontFamily: FONT, // "TimesVN"
fontSize: 13, // 13pt body
paddingTop: 71, // ~2.5cm = 71pt
paddingBottom: 71,
paddingLeft: 71,
paddingRight: 71,
lineHeight: 1.25,
},
```
**Paragraph variants** for the three contexts that come up:
```ts
// Indented body text (justified, first-line indent ~1cm)
paragraph: { textAlign: "justify", textIndent: 28, marginBottom: 0 },
// Flush-left lines (section labels, inline list items)
paragraphFlush: { textAlign: "justify", marginBottom: 0 },
// Section headings (flush-left, with breathing room above)
sectionHead: { textAlign: "justify", marginBottom: 0, marginTop: 4 },
```
The `marginBottom: 0` is deliberate — Vietnamese government documents are visually dense, so paragraphs only get spacing between sections, not between adjacent lines.
**Component primitives** (table, checkbox, signature columns):
```ts
table: {
flexDirection: "column",
borderWidth: 1, borderColor: "#000",
borderRightWidth: 0, borderBottomWidth: 0, // we draw R+B per-cell
marginVertical: 4,
},
tableCell: {
borderRightWidth: 1, borderBottomWidth: 1, borderColor: "#000",
padding: 4,
},
```
The "outer border drawn on the table, inner borders drawn per-cell" pattern avoids double-thickness lines where cells meet.
**Cover-specific styles** are isolated in their own group because the cover page has unique requirements (page border via `position: absolute`, "Mẫu số 01" badge in the top corner).
### 5.4 Reusable components
`src/components.tsx` factors out the patterns that show up on multiple pages:
**`<Checkbox checked={boolean}>label</Checkbox>`** — a horizontal row with a bordered square. When `checked`, an inner filled `<View>` appears inside it. We don't use the Unicode `☑` character because Tinos doesn't include it; drawing geometry is font-independent.
```tsx
export const Checkbox: React.FC<CheckboxProps> = ({ checked, children }) => (
<View style={styles.checkboxRow}>
<View style={styles.checkboxBox}>
{checked ? <View style={styles.checkboxFill} /> : null}
</View>
<Text style={styles.checkboxLabel}>{children}</Text>
</View>
);
```
**Header variants** — three different two-column header patterns appear in the document:
- `<TopHeaderBoYTe />` — "BỘ Y TẾ / ĐẠI HỌC Y DƯỢC" left, "CỘNG HÒA…" right (Mẫu 03/04)
- `<TopHeaderDonVi donVi="..." />` — drops "BỘ Y TẾ", shows the unit name in bold (Mẫu 02)
- `<TopHeaderCongHoa />` — only the right column (Bản cam kết)
Each one uses the same `flexDirection: "row"` layout with two equal columns. The differences are which lines appear.
**Table primitives.**
```tsx
<Table columns={[6, 22, 14, 16, 14, 14, 14]}>
<Row>
<Cell width={6} header align="center">STT</Cell>
<Cell width={22} header align="center">Họ tên</Cell>
{/* … */}
</Row>
{data.danh_sach_tac_gia.map((row, i) => (
<Row key={i}>
<Cell width={6} align="center">{row.stt}</Cell>
<Cell width={22}>{row.ho_ten}</Cell>
{/* … */}
</Row>
))}
</Table>
```
The `width` prop is a **percentage** (the cell renders with `width: ${width}%`). Column widths must sum to 100. The `Cell` component automatically wraps string children in `<Text>` so callers can pass either plain text or nested elements.
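Because nothing in the layout code enforces the sum-to-100 rule, a small guard can catch mistakes early. This is a minimal sketch with a hypothetical helper name (`assertWidthsSumTo100`), assuming widths are whole-number percentages as in the example above.

```typescript
// Hypothetical guard, assuming widths are whole-number percentages.
function assertWidthsSumTo100(widths: readonly number[], label: string): void {
  const total = widths.reduce((sum, w) => sum + w, 0);
  if (total !== 100) {
    throw new Error(`${label}: column widths sum to ${total}, expected 100`);
  }
}

const AUTHOR_WIDTHS = [6, 22, 14, 16, 14, 14, 14] as const;
// Passes silently: 6 + 22 + 14 + 16 + 14 + 14 + 14 = 100
assertWidthsSumTo100(AUTHOR_WIDTHS, "Mẫu 02 author table");
```

Calling it once next to each width array (or in a test) would turn a silently broken table into an immediate error.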
**`<DateLine ngay thang nam />`** renders the recurring "TP. Hồ Chí Minh, ngày … tháng … năm …" line, with sensible blank-data placeholders (`.....`).
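The text behind the date line can be sketched as a pure function. The real component's source is not shown here, so `formatDateLine`, the `BLANK` constant, and the trimming of whitespace are assumptions made for illustration.

```typescript
// Hypothetical sketch of the text behind <DateLine>; the real component may differ.
interface NgayKy {
  ngay: string;
  thang: string;
  nam: string;
}

const BLANK = "....."; // shown when a part of the date is empty

function formatDateLine({ ngay, thang, nam }: NgayKy): string {
  // Treat whitespace-only values as empty (an assumption of this sketch).
  const part = (value: string): string => (value.trim() === "" ? BLANK : value.trim());
  return `TP. Hồ Chí Minh, ngày ${part(ngay)} tháng ${part(thang)} năm ${part(nam)}`;
}

console.log(formatDateLine({ ngay: "", thang: "", nam: "" }));
// TP. Hồ Chí Minh, ngày ..... tháng ..... năm .....
console.log(formatDateLine({ ngay: "30", thang: "6", nam: "2026" }));
// TP. Hồ Chí Minh, ngày 30 tháng 6 năm 2026
```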
**`<SignatureBlock title subtitle name>`** renders one column of a two-column signature block (centered title, italic subtitle, then a 50pt vertical gap before the bold signer's name).
### 5.5 Page components
Each section of the form gets its own component file in `src/pages/`. They all follow the same shape:
```tsx
// src/pages/Mau01.tsx
import React from "react"; // required by the classic JSX transform and React.FC
import { Page, View, Text } from "@react-pdf/renderer";
import { styles } from "../styles";
import { Mau01 } from "../types";
import { Table, Row, Cell, DateLine } from "../components";
interface Props {
data: Mau01;
donVi: string; // pulled from mau_02.don_vi by the parent
}
export const Mau01Page: React.FC<Props> = ({ data, donVi }) => (
  <Page size="A4" style={styles.page}>
    <Text style={styles.centerTitleLarge}>BÁO CÁO MÔ TẢ SÁNG KIẾN</Text>
    <Text style={styles.paragraphFlush}>
      1. Mở đầu{" "}
      <Text style={styles.italic}>
        (Giới thiệu về những vấn đề liên quan đến sáng kiến):
      </Text>
    </Text>
    <Text style={styles.paragraph}>{data.mo_dau}</Text>
    {/* … rest of the page */}
  </Page>
);
```
Three patterns recur in every page:
1. **Static + dynamic mixed in the same `<Text>`.** Section labels like "1. Mở đầu" and their italic helper text are fixed, while the data value that follows is not. We use nested `<Text>` to apply different styles to different runs in one paragraph (a `<Text>` in React-PDF can contain other `<Text>` nodes, like `<span>` in HTML).
2. **`{" "}` for explicit whitespace.** JSX collapses whitespace between elements. To preserve a space between a label and an italic helper, we explicitly insert `{" "}`.
3. **Default-empty rows for tables.** When `data.danh_sach_ap_dung` is empty, we still want one blank row to render so the printed form has a place to write. The pattern:
```tsx
{(data.danh_sach_ap_dung && data.danh_sach_ap_dung.length > 0
? data.danh_sach_ap_dung
: [{ tt: "", ten_to_chuc: "", dia_chi: "", linh_vuc: "" }]
).map((row, i) => /* ... */)}
```
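If the blank-row fallback shows up in several tables, it can be pulled into a small generic helper. The name `rowsOrBlank` is hypothetical and the sketch only covers the selection logic, not the rendering.

```typescript
// Hypothetical helper: fall back to one blank row when the list is empty or missing.
function rowsOrBlank<T>(rows: readonly T[] | undefined | null, blank: T): readonly T[] {
  return rows && rows.length > 0 ? rows : [blank];
}

interface Mau01ApplyRow {
  tt: string;
  ten_to_chuc: string;
  dia_chi: string;
  linh_vuc: string;
}
const BLANK_APPLY_ROW: Mau01ApplyRow = { tt: "", ten_to_chuc: "", dia_chi: "", linh_vuc: "" };

console.log(rowsOrBlank([], BLANK_APPLY_ROW).length);        // 1
console.log(rowsOrBlank(undefined, BLANK_APPLY_ROW).length); // 1
```

A page component could then write `rowsOrBlank(data.danh_sach_ap_dung, BLANK_APPLY_ROW).map(...)` in place of the inline ternary.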
**Signature block on Mẫu 01 takes `donVi` as a prop**, not from `data` directly. The reason: the standard layout uses the unit name from Mẫu 02 (`mau_02.don_vi`) on Mẫu 01's signature line. Rather than duplicate the value in the JSON, the parent component (`SangKienDocument`) reads it from `mau_02` and passes it down.
**Cover page is special.** It uses absolute positioning to put the page border around the entire content area:
```tsx
<Page size="A4" style={styles.page}>
<Text style={styles.formNumberOnCover}>Mẫu số 01</Text>
<View style={styles.coverBorder} fixed />
<View style={styles.coverContent}>
{/* header, title, fields, footer */}
</View>
</Page>
```
`<View fixed>` tells React-PDF to render the border on every page in this section (irrelevant here since the cover is one page, but harmless), and `position: absolute` (set in `styles.coverBorder`) makes it overlay the whole page.
### 5.6 Top-level Document
`src/SangKienDocument.tsx` composes all six pages:
```tsx
export const SangKienDocument: React.FC<{ data: SangKienData }> = ({ data }) => {
registerFonts();
const donVi = data.mau_02.don_vi || data.trang_bia.don_vi;
return (
<Document
title={data.trang_bia.ten_sang_kien || "Báo cáo mô tả sáng kiến"}
author={data.trang_bia.tac_gia}
>
<CoverPage data={data.trang_bia} />
<Mau01Page data={data.mau_01} donVi={donVi} />
<Mau02Page data={data.mau_02} />
<Mau03Page data={data.mau_03} />
<Mau04Page data={data.mau_04} />
<BanCamKetPage data={data.ban_cam_ket} />
</Document>
);
};
```
`registerFonts()` is idempotent (the internal `registered` flag guards against duplicate registration), so calling it from the top-level component is safe.
The `<Document>` element accepts metadata that shows up in the PDF's title bar — `title`, `author`, `subject`, `creator`, `producer`, `keywords`. These don't affect rendering, just file properties.
### 5.7 Server-side render helper
`src/generate.tsx` wraps the React rendering in a Node-friendly Promise:
```tsx
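import * as fs from "fs";
import * as path from "path";
import React from "react";
import { pdf } from "@react-pdf/renderer";
import { SangKienDocument } from "./SangKienDocument";
import { SangKienData } from "./types";

// Render the whole document to an in-memory PDF buffer.
export async function renderSangKienPdf(data: SangKienData): Promise<Buffer> {
  const instance = pdf(<SangKienDocument data={data} />);
  const blob = await instance.toBlob();
  const arrayBuffer = await blob.arrayBuffer();
  return Buffer.from(arrayBuffer);
}

// Read a JSON file, render it, and write the PDF (creating the output folder if needed).
export async function renderSangKienPdfFromFile(
  inputJsonPath: string,
  outputPdfPath: string
): Promise<void> {
  const data = JSON.parse(fs.readFileSync(inputJsonPath, "utf-8")) as SangKienData;
  const buffer = await renderSangKienPdf(data);
  fs.mkdirSync(path.dirname(outputPdfPath), { recursive: true });
  fs.writeFileSync(outputPdfPath, buffer);
}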
```
`pdf(...).toBlob()` is the cleanest async API even on the server — the `Buffer.from(await blob.arrayBuffer())` conversion is one line.
`example/generate-example.ts` is a thin CLI on top:
```ts
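import * as path from "path";
import { renderSangKienPdfFromFile } from "../src/generate";

const useBlank = process.argv.includes("--blank");
const inputPath = useBlank
  ? path.join(__dirname, "data-blank.json")
  : path.join(__dirname, "sample-data.json");
const outputPath = path.join(__dirname, "..", "out", `sang-kien-${useBlank ? "blank" : "filled"}.pdf`);

// Top-level await is unavailable with "module": "commonjs", so chain on the promise.
renderSangKienPdfFromFile(inputPath, outputPath).catch((err) => {
  console.error(err);
  process.exit(1);
});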
```
---
## 6. Implementing the DOCX template generator
### 6.1 The Jinja-in-DOCX strategy
`docxtpl` works by storing Jinja-style strings *as ordinary text* inside the DOCX, then doing template expansion at render time. The build script's job is to produce a `.docx` whose visible text reads:
> **Tên sáng kiến (Tiếng Việt):** {{ trang_bia.ten_sang_kien }}
When you open this in Word, you literally see those curly braces. When `docxtpl` opens it, it walks the OOXML tree, finds runs containing `{{ ... }}`, and replaces them.
**The catch: text runs split across formatting changes.** If you write `Tên sáng kiến (Tiếng Việt): {{ trang_bia.ten_sang_kien }}` in one run, that's fine. But if you bold "Tên sáng kiến" and leave `{{ … }}` regular, Word stores them as **two separate runs**. A naive search for `{{` in the second run works — but if you split a placeholder *inside* the curly braces (`{{ trang_bia.` in one run, `ten_sang_kien }}` in another), `docxtpl` will fail silently. So:
> **Rule:** every placeholder must live entirely inside one continuous run with one set of formatting.
The `docx` library makes this easy — when you write `r("{{ mau_01.mo_dau }}")`, that's exactly one `<w:r>` element with one `<w:t>` inside.
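The one-run rule can be checked mechanically. The following is a rough sketch, not part of the project: `findBrokenRuns` is a hypothetical name, and it assumes you can obtain the text of each run in a paragraph as an array of strings.

```typescript
// Hypothetical checker: given the text of each run in one paragraph, return the
// indexes of runs that contain only part of a Jinja tag.
function findBrokenRuns(runTexts: readonly string[]): number[] {
  const completeTag = /\{\{.*?\}\}|\{%.*?%\}/g;
  const strayDelimiter = /\{\{|\}\}|\{%|%\}/;
  const broken: number[] = [];
  runTexts.forEach((text, index) => {
    // Remove every tag that opens and closes inside this run...
    const remainder = text.replace(completeTag, "");
    // ...then any delimiter left over belongs to a tag that crosses a run boundary.
    if (strayDelimiter.test(remainder)) broken.push(index);
  });
  return broken;
}

console.log(findBrokenRuns(["Tên sáng kiến: ", "{{ trang_bia.ten_sang_kien }}"])); // []
console.log(findBrokenRuns(["{{ trang_bia.", "ten_sang_kien }}"]));                // [ 0, 1 ]
```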
### 6.2 The 3-row table loop trick
For repeating table rows, `docxtpl` uses a special syntax: `{%tr for item in collection %}` and `{%tr endfor %}`. The `tr` prefix tells the engine "remove the entire `<w:tr>` row containing this tag and use the rows between `for` and `endfor` as the loop body."
A naive single-row pattern doesn't work:
```
[ {%tr for x in items %} {{ x.id }} | {{ x.name }} {%tr endfor %} ]
```
This fails because each `{%tr … %}` tag consumes the entire row that contains it. With `for` and `endfor` in the same row, no row is left to serve as the loop body, and Jinja no longer sees a matching `for`/`endfor` pair.
The reliable pattern is **three rows**:
```
Row 1: | {%tr for item in collection %} | (empty cells) |
Row 2: | {{ item.id }} | {{ item.name }} | ← duplicated per item
Row 3: | {%tr endfor %} | (empty cells) |
```
Row 1 and Row 3 get stripped. Row 2 gets repeated for each item. The data row carries the actual `{{ }}` fields.
In code:
```ts
const aw = [6, 22, 14, 16, 14, 14, 14]; // column widths
const emptyRow_aw = (firstText: string) => {
const cells: TableCell[] = [];
for (let i = 0; i < aw.length; i++) {
cells.push(new TableCell({
borders: allThinBorders,
width: { size: aw[i] * 100, type: WidthType.PERCENTAGE },
children: [new Paragraph({ children: [r(i === 0 ? firstText : " ")] })],
}));
}
return cells;
};
new Table({
rows: [
new TableRow({ children: [/* header cells */] }),
new TableRow({ children: emptyRow_aw("{%tr for item in mau_02.danh_sach_tac_gia %}") }),
new TableRow({ children: [
dataCell("{{ item.stt }}", aw[0], AlignmentType.CENTER),
dataCell("{{ item.ho_ten }}", aw[1]),
// … 5 more
]}),
new TableRow({ children: emptyRow_aw("{%tr endfor %}") }),
],
});
```
The `emptyRow_aw` helper builds a row where the first cell contains the loop tag and the rest are blanks (just `" "`). After `docxtpl` strips it, the visible table has one header row plus one data row per item.
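Since every repeating table follows the same three-row shape, the cell text can be generated from a field list. This sketch is illustrative: `buildLoopRows` is a hypothetical helper that only produces strings, leaving the `TableRow`/`TableCell` wrapping to the caller.

```typescript
// Hypothetical helper that produces the cell text for the three template rows.
// It only builds strings; wrapping them in TableRow/TableCell is left to the caller.
function buildLoopRows(
  collection: string,
  fields: readonly string[],
  itemName = "item"
): string[][] {
  const blanks = fields.slice(1).map(() => " ");
  return [
    [`{%tr for ${itemName} in ${collection} %}`, ...blanks], // row 1: loop opener
    fields.map((f) => `{{ ${itemName}.${f} }}`),             // row 2: repeated per item
    ["{%tr endfor %}", ...blanks],                            // row 3: loop closer
  ];
}

const rows = buildLoopRows("mau_02.danh_sach_tac_gia", ["stt", "ho_ten"]);
console.log(rows[1]); // [ '{{ item.stt }}', '{{ item.ho_ten }}' ]
```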
### 6.3 Multi-section layout
Word documents are split into **sections**, each with its own page settings — margins, orientation, page borders, headers, footers. The cover page needs:
- A **page border** (rounded rectangle around the content area)
- A **header** containing "Mẫu số 01" at the top right *outside* the border
The rest of the document needs:
- **No** page border
- **No** "Mẫu số 01" header (it's only on the cover)
In `docx` v9, this is two sections in the same document:
```ts
new Document({
sections: [
{
properties: {
page: {
size: { width: 11906, height: 16838, orientation: PageOrientation.PORTRAIT },
margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 },
borders: {
pageBorderTop: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
pageBorderBottom: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
pageBorderLeft: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
pageBorderRight: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
},
},
},
headers: { default: coverHeader }, // contains "Mẫu số 01"
children: buildCoverPage(),
},
{
properties: {
page: { size: {/*…*/}, margin: {/*…*/} /* no borders */ },
},
// Explicit empty header so the cover header doesn't leak onto subsequent pages
headers: { default: new Header({ children: [new Paragraph({ children: [r("")] })] }) },
children: [
...buildMau01(),
...buildMau02(),
...buildMau03(),
...buildMau04(),
...buildBanCamKet(),
],
},
],
});
```
Two gotchas worth noting:
**Twips, not points.** `docx` uses twips (1/1440 inch). Multiply pt by 20 to get twips:
- A4 = 11906 × 16838 twips
- 1 inch margin = 1440 twips
- 1 cm = 567 twips
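The conversions above, together with the point and half-point units used elsewhere in this guide, can be captured in a few helpers. The function names are illustrative and not part of the project; the conversion factors themselves are standard.

```typescript
// Unit helpers for keeping the PDF (points) and DOCX (twips, half-points) in step.
// The function names are illustrative; the conversion factors are standard.
const PT_PER_INCH = 72;
const TWIPS_PER_INCH = 1440;
const CM_PER_INCH = 2.54;

const cmToPt = (cm: number): number => Math.round((cm / CM_PER_INCH) * PT_PER_INCH);
const cmToTwips = (cm: number): number => Math.round((cm / CM_PER_INCH) * TWIPS_PER_INCH);
const ptToTwips = (pt: number): number => pt * 20;
const ptToHalfPoints = (pt: number): number => pt * 2; // TextRun `size`
const lineHeightToDocx = (multiple: number): number => Math.round(multiple * 240); // 240 = single

console.log(cmToPt(2.5));                    // 71
console.log(cmToTwips(1));                   // 567
console.log(cmToTwips(21), cmToTwips(29.7)); // 11906 16838
console.log(ptToHalfPoints(13));             // 26
console.log(lineHeightToDocx(1.25));         // 300
```

Note that `cmToTwips(2.5)` gives 1417, so the 1440-twip margin used in the section properties is a one-inch margin, slightly wider than 2.5 cm.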
**Headers leak across sections.** If section 2 doesn't define `headers`, it inherits section 1's. We have to provide an explicit empty `Header` to prevent the "Mẫu số 01" text from showing up on every page of the document.
### 6.4 Building paragraphs and tables
The build script defines small helper functions to keep the body code readable:
```ts
const FONT = "Times New Roman";
const SIZE = 26; // 13pt (docx-js uses half-points)
const SIZE_HEADING = 28; // 14pt
function r(text: string, opts: { bold?: boolean; italic?: boolean; underline?: boolean; size?: number } = {}) {
return new TextRun({
text,
font: FONT,
size: opts.size ?? SIZE,
bold: opts.bold,
italics: opts.italic,
underline: opts.underline ? { type: UnderlineType.SINGLE } : undefined,
});
}
function bodyP(children: TextRun[], opts: { indent?: boolean } = {}) {
return new Paragraph({
children,
alignment: AlignmentType.JUSTIFIED,
indent: opts.indent ? { firstLine: 567 } : undefined,
spacing: { before: 0, after: 0, line: 300 },
});
}
function flushP(children: TextRun[], opts: { spaceBefore?: number } = {}) {
return new Paragraph({
children,
alignment: AlignmentType.JUSTIFIED,
spacing: { before: opts.spaceBefore ?? 0, after: 0, line: 300 },
});
}
function centerP(children: TextRun[], opts: { spaceBefore?: number; spaceAfter?: number } = {}) {
return new Paragraph({
children,
alignment: AlignmentType.CENTER,
spacing: { before: opts.spaceBefore ?? 0, after: opts.spaceAfter ?? 0, line: 300 },
});
}
```
A typical section then reads naturally:
```ts
out.push(centerP([r("BÁO CÁO MÔ TẢ SÁNG KIẾN", { bold: true, size: SIZE_HEADING })]));
out.push(flushP([
r("1. Mở đầu "),
r("(Giới thiệu về những vấn đề liên quan…):", { italic: true }),
]));
out.push(bodyP([r("{{ mau_01.mo_dau }}")], { indent: true }));
```
For checkboxes, since the templating engine has to choose which character to render, we embed the choice in the placeholder itself:
```ts
const checkbox = (cond: string, label: string) =>
flushP([
r(`{% if ${cond} %}`),
r("☑"),
r("{% else %}"),
r("☐"),
r("{% endif %} "),
r(label),
]);
out.push(checkbox(
"mau_02.phan_loai.giai_phap_ky_thuat",
"Giải pháp kỹ thuật, quản lý, tác nghiệp, ứng dụng tiến bộ kỹ thuật áp dụng cho Đại học Y Dược TP.HCM"
));
```
After `docxtpl` runs, this paragraph reduces to `☑ Giải pháp kỹ thuật…` or `☐ Giải pháp kỹ thuật…` depending on the boolean. (For DOCX rendering in Word, the `☑/☐` characters work fine because Word falls back to a Unicode-capable font automatically — unlike React-PDF.)
---
## 7. Layout calibration (matching the standard)
The "Sang_kien_SOP_dong_vat" reference document defines a specific visual style. Here's a checklist of the calibrations applied to both generators:
| Aspect | Rule | Where it lives |
|---|---|---|
| Body font | Times New Roman (or Tinos) 13pt | `styles.page.fontSize`, `r()` `SIZE = 26` |
| Page margins | 2.5 cm all around | `padding: 71` (PDF), `margin: 1440` (DOCX) |
| Body line height | 1.25 | `lineHeight: 1.25` (PDF), `line: 300` (DOCX, 240 = single, 300 ≈ 1.25) |
| First-line indent | ~1 cm on body paragraphs | `textIndent: 28` (PDF), `firstLine: 567` (DOCX) |
| Section numbers (`1.`, `2.`, `4.1`) | **NOT bold**; italic instructions in parens | Use `paragraphFlush` not bold |
| Inter-paragraph spacing | None within a section, small gap before new section | `marginBottom: 0`, `sectionHead.marginTop: 4` |
| Cover page | Page border (rounded rect), "Mẫu số 01" outside top-right | Cover-specific styles, dedicated section in DOCX |
| Cover divider | `=====***=====` (literal) | Hardcoded string |
| Cover info fields | Left-aligned, **bold label**, regular value | `coverField` style |
| Two-column header | "ĐƠN VỊ" or "BỘ Y TẾ" left, "CỘNG HÒA" right | `TopHeaderBoYTe`, `TopHeaderDonVi`, `TopHeaderCongHoa` |
| "Độc lập - Tự do - Hạnh phúc" | Underlined, bold | `underline: true` flag in `r()`/styles |
| Tables | Single thin black border, no shaded header | `borderWidth: 1`, no `backgroundColor` on `tableHeaderCell` |
| Mẫu 02 author table column 7 | Header includes parenthetical italic instruction | Custom `TableCell` with two centered paragraphs |
| Signature block | Two columns: "Xác nhận của lãnh đạo / [đơn vị]" left, "Đại diện nhóm tác giả sáng kiến" right | `<View style={signatureRow}>` (PDF), borderless 2-cell table (DOCX) |
| Mẫu 03 totals row | TỔNG (cols 1 to 3 merged) ‖ 100 ‖ blank | `columnSpan: 3` in DOCX, manual width sum in PDF |
| Mẫu 04 evaluation rubric | Two scoring rows + total row at bottom | Static text + `{{ … }}` for nhận xét/điểm |
When in doubt about a layout decision, open the reference DOCX in Word, click into the relevant element, and read its formatting from the ribbon. Mirror those settings in code.
---
## 8. Verification workflow
Visual diff against the reference is the only reliable way to know you got it right. The flow:
```bash
# 1. Generate the candidate PDF
npm run generate
# 2. Convert each page to JPEG
pdftoppm -jpeg -r 100 out/sang-kien-filled.pdf out/page
# 3. Convert the reference DOCX to PDF and JPEGs the same way
soffice --headless --convert-to pdf reference.docx --outdir ref/
pdftoppm -jpeg -r 100 ref/reference.pdf ref/ref-page
# 4. Open them side by side
```
For the DOCX generator, add one more step:
```bash
# Build the template
npm run build:docx
# Render placeholders WITHOUT filling them — does the layout look right?
soffice --headless --convert-to pdf out/template_application_form.docx --outdir out/
# Fill it with sample data and render
python tools/fill-docx.py example/sample-data.json out/sang-kien-filled.docx
soffice --headless --convert-to pdf out/sang-kien-filled.docx --outdir out/
```
Smoke test the DOCX template in Python before declaring victory:
```python
# tools/test-docx-fill.py
from docxtpl import DocxTemplate
import json
with open("example/sample-data.json", encoding="utf-8") as f:
data = json.load(f)
doc = DocxTemplate("out/template_application_form.docx")
doc.render(data)
doc.save("out/template-filled-test.docx")
```
If `docxtpl` raises `TemplateSyntaxError: Encountered unknown tag 'endfor'`, you've put a `{%tr for %}` and `{%tr endfor %}` in the same row instead of separate rows. Go re-read [§6.2](#62-the-3-row-table-loop-trick).
If a `{{ field }}` doesn't get replaced and you can still see the curly braces in the filled output, the placeholder got split across runs by Word's auto-formatting. Build the placeholder with one `r("{{ x }}")` call, not three.
---
## 9. Common modifications
### Adding a new field
Say you need to add `mau_01.tong_kinh_phi` (total budget).
1. **Update `src/types.ts`:**
```ts
export interface Mau01 {
// …
tong_kinh_phi: string; // new
}
```
2. **Update `example/data-blank.json`** and **`example/sample-data.json`** with the new field.
3. **Render it in `src/pages/Mau01.tsx`:**
```tsx
<Text style={styles.paragraphFlush}>
7. Tổng kinh phí: {data.tong_kinh_phi}
</Text>
```
4. **Add it to the DOCX template generator** in `tools/build-docx-template.ts`:
```ts
out.push(flushP([r("7. Tổng kinh phí: {{ mau_01.tong_kinh_phi }}")]));
```
5. **Regenerate:**
```bash
npm run generate
npm run build:docx
```
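The TypeScript compiler flags misspelled field names wherever typed data is read, but it cannot tell that a field was never rendered, and JSON loaded with `JSON.parse(...) as SangKienData` is cast rather than checked. Regenerate and inspect the output after every field change.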
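Because the JSON files are parsed at runtime, a small shape comparison can complement the compiler. The sketch below is an assumption-laden illustration: `missingKeys` is a hypothetical helper that treats one object (for example the blank data) as the reference and reports key paths absent from another.

```typescript
// Hypothetical shape check: list key paths present in `reference` but absent in `candidate`.
type Obj = { [key: string]: unknown };
const isObj = (v: unknown): v is Obj =>
  typeof v === "object" && v !== null && !Array.isArray(v);

function missingKeys(reference: unknown, candidate: unknown, prefix = ""): string[] {
  if (!isObj(reference)) return [];
  const out: string[] = [];
  for (const key of Object.keys(reference)) {
    const keyPath = prefix ? `${prefix}.${key}` : key;
    if (!isObj(candidate) || !(key in candidate)) {
      out.push(keyPath); // the whole branch is missing
    } else {
      out.push(...missingKeys(reference[key], candidate[key], keyPath));
    }
  }
  return out;
}

const blankData = { mau_01: { mo_dau: "", tong_kinh_phi: "" } };
const sampleData = { mau_01: { mo_dau: "Giới thiệu" } };
console.log(missingKeys(blankData, sampleData)); // [ 'mau_01.tong_kinh_phi' ]
```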
### Changing a column width
Column widths are kept as small integer arrays in the page component (PDF) and the build script (DOCX). They must always sum to 100.
To widen the "Họ và tên" column on the Mẫu 02 author table from 22% to 28% (and shrink "Nơi công tác" from 16% to 10%):
In `src/pages/Mau02.tsx`:
```ts
const AUTHOR_WIDTHS = [6, 28, 14, 10, 14, 14, 14] as const; // was [6, 22, 14, 16, …]
```
In `tools/build-docx-template.ts` (inside `buildMau02()`):
```ts
const aw = [6, 28, 14, 10, 14, 14, 14];
```
The two arrays must match. There is no shared constant: both sides express widths as percentages that sum to 100, but they go through different code paths, so keeping them in sync is a manual discipline.
### Adding a new repeating table
The data shape, the page component, and the DOCX template all need updates:
1. **Type:** add `Mau01NewRow[]` to `Mau01`, define `interface Mau01NewRow { … }`.
2. **PDF page:** mirror the existing pattern in `src/pages/Mau01.tsx`:
```tsx
<Table columns={[10, 30, 30, 30]}>
  <Row>
    <Cell width={10} header align="center">TT</Cell>
    {/* … */}
  </Row>
  {/* Fall back to one blank row when the list is missing or empty. */}
  {(data.danh_sach_moi && data.danh_sach_moi.length > 0
    ? data.danh_sach_moi
    : [{ tt: "" /* … */ }]
  ).map((row, i) => (
    <Row key={i}>
      <Cell width={10} align="center">{row.tt}</Cell>
      {/* … */}
    </Row>
  ))}
</Table>
```
3. **DOCX template:** use the 3-row pattern from [§6.2](#62-the-3-row-table-loop-trick):
```ts
const w = [10, 30, 30, 30];
const emptyRow = (firstText: string) => /* same helper pattern */;
new Table({
  rows: [
    new TableRow({ children: [headerCell("TT", w[0]), /* … */] }),
    new TableRow({ children: emptyRow("{%tr for item in mau_01.danh_sach_moi %}") }),
    new TableRow({ children: [dataCell("{{ item.tt }}", w[0], AlignmentType.CENTER), /* … */] }),
    new TableRow({ children: emptyRow("{%tr endfor %}") }),
  ],
});
```
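The blank-row fallback in the PDF page is easy to get subtly wrong when repeated inline for every table. It could be factored into a helper along these lines. This is a sketch under assumptions: `rowsOrBlank`, the `NewRow` shape, and the `noi_dung` field are all hypothetical and stand in for whatever `src/types.ts` actually defines.

```typescript
// Hypothetical row shape for illustration only.
interface NewRow {
  tt: string;
  noi_dung: string;
}

// Returns the rows unchanged, or a single blank row when the list is
// missing or empty, mirroring the inline ternary in the page component.
function rowsOrBlank<T>(rows: readonly T[] | undefined, blank: T): readonly T[] {
  return rows && rows.length > 0 ? rows : [blank];
}

const BLANK_ROW: NewRow = { tt: "", noi_dung: "" };

const filled: NewRow[] = [
  { tt: "1", noi_dung: "A" },
  { tt: "2", noi_dung: "B" },
];

console.log(rowsOrBlank(filled, BLANK_ROW).length); // 2
console.log(rowsOrBlank([], BLANK_ROW).length); // 1
console.log(rowsOrBlank(undefined, BLANK_ROW).length); // 1
```

In the JSX, the expression before `.map(...)` would then shrink to `rowsOrBlank(data.danh_sach_moi, BLANK_ROW)`.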
### Switching to your organization's font
Replace the four TTF paths in `src/fonts.ts`:
```ts
Font.register({
  family: "TimesVN",
  fonts: [
    { src: "/path/to/your/Regular.ttf" },
    { src: "/path/to/your/Italic.ttf", fontStyle: "italic" },
    { src: "/path/to/your/Bold.ttf", fontWeight: "bold" },
    { src: "/path/to/your/BoldItalic.ttf", fontWeight: "bold", fontStyle: "italic" },
  ],
});
```
For the DOCX side, change `const FONT = "Times New Roman"` in `tools/build-docx-template.ts` to the font you want the document to reference. The font is named in the file, not embedded, so Word falls back to a system font if the named font isn't installed on the reader's machine. Prefer common names (Times New Roman, Arial, Calibri).
---
## 10. Troubleshooting
**PDF renders blank squares where Vietnamese characters should be.**
The font isn't registered or the registered font lacks Vietnamese glyphs. Check that `registerFonts()` is called and that the TTFs at the resolved paths are actually loaded (not 404 / missing). Tinos has the right glyph coverage; many "Times New Roman clones" don't.
**`Error: Failed to fetch font from https://…`**
You're hitting `@react-pdf/renderer`'s URL-based font loading and your environment can't reach the URL. Switch to local TTFs via `require.resolve()` (already what `src/fonts.ts` does).
**`docxtpl` raises `TemplateSyntaxError: Encountered unknown tag 'endfor'`.**
You put the `{%tr for %}` and `{%tr endfor %}` tags in the *same* table row. Re-read [§6.2](#62-the-3-row-table-loop-trick) — they have to be on separate rows.
**Some `{{ field }}` placeholders aren't being replaced.**
Word split your text run mid-placeholder. Make sure each placeholder is constructed with a single `r("{{ x }}")` call, not split across multiple `r()` calls or assembled from concatenated strings.
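To locate a split placeholder, it can help to scan the run texts for a tag that opens but does not close within the same run. The sketch below assumes you have already extracted the text of each run (each `<w:t>` element in `word/document.xml`) into an array; that extraction step is not shown, and `findSplitPlaceholders` is a hypothetical helper.

```typescript
// Hypothetical diagnostic: given the text of each run in document order,
// returns the indices of runs that open a Jinja tag without closing it.
// It only catches tags whose opening delimiter ("{{" or "{%") is intact.
function findSplitPlaceholders(runTexts: readonly string[]): number[] {
  const pairs: ReadonlyArray<[string, string]> = [
    ["{{", "}}"],
    ["{%", "%}"],
  ];
  const bad: number[] = [];
  runTexts.forEach((text, index) => {
    for (const [open, close] of pairs) {
      const lastOpen = text.lastIndexOf(open);
      // An opener with no closer after it means the tag continues in a later run.
      if (lastOpen !== -1 && text.indexOf(close, lastOpen + open.length) === -1) {
        bad.push(index);
        break;
      }
    }
  });
  return bad;
}

// A placeholder split across two runs, followed by an intact one.
const runs = ["7. Tổng kinh phí: {{ mau_01.", "tong_kinh_phi }}", "{{ item.tt }}"];
console.log(findSplitPlaceholders(runs)); // reports index 0 only
```

Any index it reports points at the `r()` call (or pair of calls) that needs collapsing into one.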
**The DOCX has "Mẫu số 01" appearing on every page, not just the cover.**
The cover-section header is leaking into the next section. Add an explicit empty header to the second section:
```ts
headers: { default: new Header({ children: [new Paragraph({ children: [r("")] })] }) },
```
**Tables overflow the right margin.**
Column width percentages don't sum to exactly 100, or a single cell has too much wide content with no wrap point. Either fix the widths or add `wordBreak: "break-word"` to the cell style.
**`textIndent` doesn't seem to work in `<Text>`.**
React-PDF's `textIndent` only takes effect when the `<Text>` *itself* has `display: "block"`-like behavior — i.e. it's a top-level paragraph, not nested inside another `<Text>`. If you're nesting, wrap the inner content in a parent `<Text>` that has the indent style.
**The DOCX page border doesn't appear.**
Page borders are a Word feature configured in section properties. Check that you've set all four (`pageBorderTop/Bottom/Left/Right`), with non-zero `size` and a `space` value (24 puts them ~1.7cm from the edge in our setup). LibreOffice and Word may render them slightly differently — Word is the canonical view.
**Filled DOCX has weird extra empty rows above each table.**
Those are the `{%tr for %}`/`{%tr endfor %}` rows that didn't get stripped — meaning the loop tags ended up in paragraphs *inside* a cell, not as standalone row text. Make sure the `firstText` in your `emptyRow_*()` helper is the entire cell content, not appended to other text.
---
## 11. Porting to a different form
The same pattern works for any structured government form. The migration steps:
1. **Extract the data model.** Open the reference DOCX, list every blank line and every table column. Each becomes a field in `types.ts`. Repeating sections (lists of authors, lists of attachments) become arrays.
2. **Identify the sections.** Most forms have a cover page plus N body sections. Each body section becomes a `<Page>` component plus a `buildSectionN()` function in the DOCX builder.
3. **Catalog the visual primitives.** Headers, signature blocks, tables, checkboxes, date lines — write them once in `components.tsx` (PDF) and as helper functions (DOCX), then reuse.
4. **Calibrate the styles.** Open the reference, measure margins, font, line spacing, and indent. Set them as constants. See [§7](#7-layout-calibration-matching-the-standard).
5. **Render and diff.** Generate, convert to JPEG, line up against the reference. Iterate until they match.
6. **Smoke-test the DOCX template** with `docxtpl`. If a placeholder doesn't fill, it's almost always run-splitting — fix by collapsing into one `r()` call.
The most labor-intensive part is the visual calibration (steps 4 and 5). Everything else is mechanical translation from "what the form looks like" to "code that produces the same thing."
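As an illustration of step 1, here is what the extracted data model might look like for an imaginary leave-request form. Every name is hypothetical and chosen only to show the mapping: one blank line becomes one string field, and a repeating table becomes an array of row objects.

```typescript
// All names here are hypothetical and describe an imaginary form.
interface AttachmentRow {
  tt: string; // ordinal shown in the first table column
  ten_tai_lieu: string; // document title
}

interface LeaveRequestForm {
  ho_ten: string; // one blank line on the form becomes one string field
  don_vi: string;
  tu_ngay: string;
  den_ngay: string;
  tai_lieu: AttachmentRow[]; // a repeating table becomes an array of rows
}

// The all-empty instance can double as the blank-form data file.
function blankLeaveRequest(): LeaveRequestForm {
  return { ho_ten: "", don_vi: "", tu_ngay: "", den_ngay: "", tai_lieu: [] };
}

console.log(JSON.stringify(blankLeaveRequest(), null, 2));
```

Keeping every scalar as a string follows the convention used for `tong_kinh_phi` earlier in this guide; whether dates or amounts deserve richer types is a separate design decision.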
---
## Appendix: file-by-file inventory
| File | Lines | Purpose |
|---|---:|---|
| `src/types.ts` | 177 | TypeScript interfaces matching `example/data-blank.json` |
| `src/fonts.ts` | 56 | Tinos font registration |
| `src/styles.ts` | 239 | Shared `StyleSheet.create()` styles |
| `src/components.tsx` | 156 | Reusable `<Checkbox>`, `<Table>`, `<DateLine>`, header variants |
| `src/pages/CoverPage.tsx` | 64 | Trang bìa with page border |
| `src/pages/Mau01.tsx` | 172 | Báo cáo mô tả sáng kiến |
| `src/pages/Mau02.tsx` | 206 | Đơn đề nghị công nhận sáng kiến |
| `src/pages/Mau03.tsx` | 82 | Bản xác nhận tỷ lệ đóng góp |
| `src/pages/Mau04.tsx` | 94 | Phiếu đánh giá sáng kiến |
| `src/pages/BanCamKet.tsx` | 119 | Bản cam kết |
| `src/SangKienDocument.tsx` | 43 | Top-level `<Document>` composing all pages |
| `src/generate.tsx` | 37 | `renderSangKienPdf(data)` server-side helper |
| `src/index.ts` | 5 | Public API barrel |
| `tools/build-docx-template.ts` | 1301 | Generates the Jinja-style DOCX template |
| `tools/fill-docx.py` | ~30 | CLI to fill a template with JSON data via `docxtpl` |
| `tools/test-docx-fill.py` | ~25 | Smoke test script |
| `example/generate-example.ts` | ~35 | CLI for the PDF pipeline |
| `example/sample-data.json` | — | Realistic filled-in example |
| `example/data-blank.json` | — | All-empty template instance |
Total: about **2750 lines** of TypeScript + ~50 lines of Python. The DOCX generator is the largest single file because every static line of body text is an `out.push(flushP([r("…")]))` call, but the pattern is repetitive and easy to skim.