sciagent/docs/PDF_TEMPLATE_IMPLEMENTATION.md
Thinh Lam 688fac73e9
sciagent code + Gitea Actions CI/CD
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 09:38:30 +07:00


Implementation Guide — sang-kien-pdf

A step-by-step walkthrough of how the Sáng kiến PDF + DOCX template generators are built. Read this if you want to understand why each piece exists, how to modify the layout, or how to port the same approach to a different government form.


Table of contents

  1. The problem we're solving
  2. Architecture overview
  3. Tech stack and rationale
  4. Project setup
  5. Implementing the PDF generator
  6. Implementing the DOCX template generator
  7. Layout calibration
  8. Verification workflow
  9. Common modifications
  10. Troubleshooting
  11. Porting to a different form

1. The problem we're solving

The "Sáng kiến" application is a Vietnamese government form (Đại học Y Dược TP.HCM) with six sections: a cover page (Trang bìa), the four forms Mẫu số 01–04, and the Bản cam kết. Every applicant fills out the same skeleton with their own data.

Two real-world workflows need to be supported:

  1. Programmatic PDF generation — a web service receives JSON, returns a printable PDF. No human edits the file before printing.
  2. Word-based filling — an admin opens a .docx template in Word, types into it (or uses docxtpl/Carbone/etc. to merge JSON), and prints.

Both outputs must look identical to the official reference document (Sang_kien_SOP_dong_vat). The data shape (data_blank.json) is fixed by an existing system upstream and must not change.

The trick is keeping the two generators in sync — same layout, same data fields — while staying within each format's idioms.


2. Architecture overview

                           ┌────────────────────┐
                           │   data.json        │  ← source of truth (data_blank.json shape)
                           └──────────┬─────────┘
                                      │
                     ┌────────────────┴────────────────┐
                     ▼                                 ▼
        ┌──────────────────────┐          ┌─────────────────────────┐
        │  React-PDF pipeline  │          │   docx + docxtpl path   │
        │                      │          │                         │
        │  data → React tree   │          │  build-docx-template.ts │
        │  → PDF buffer        │          │  generates .docx with   │
        │                      │          │  {{ }} placeholders     │
        │                      │          │           ↓             │
        │                      │          │  docxtpl.render(data)   │
        │                      │          │  → filled .docx         │
        └──────────┬───────────┘          └────────────┬────────────┘
                   │                                   │
                   ▼                                   ▼
              filled.pdf                          filled.docx

The PDF path uses runtime composition — a React component receives data as props and returns a tree of <Page>/<View>/<Text> elements. The renderer turns that into a PDF buffer.

The DOCX path uses template-based composition — a build script (build-docx-template.ts) produces a .docx file once, with placeholder strings like {{ mau_01.mo_dau }} baked into the document body. At runtime, docxtpl (Python) or any other Jinja-aware OOXML tool reads that .docx, finds the placeholders, and replaces them with values from the JSON.

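Both pipelines consume the same JSON shape, which is declared once, in src/types.ts. Adding a new field still means touching both sides. The PDF page components are type-checked against src/types.ts, but the DOCX placeholders are plain strings that the compiler cannot check.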


3. Tech stack and rationale

| Concern | Choice | Why |
|---|---|---|
| PDF rendering | @react-pdf/renderer v4 | Component-based, server- and browser-compatible. Uses Yoga for flexbox layout. Same API as React, so layouts compose like UI code. |
| Vietnamese font | @expo-google-fonts/tinos | Tinos is a metric-compatible alternative to Times New Roman (Apache 2.0). It covers the full Latin Extended Additional range, which is needed for ư ơ ầ ậ ọ ặ etc. The @expo-google-fonts/* packages ship actual .ttf files. Many other font packages ship only .woff2, which @react-pdf/renderer does not support. |
| DOCX generation | docx v9 (npm) | Object-model API: build paragraphs, tables and sections in TypeScript, then Packer.toBuffer() produces a valid .docx. Maintained, typed, stable. |
| Templating engine | docxtpl (Python) | The most popular Jinja-style DOCX templater. Recognizes {{ var }}, {% if %} and, crucially, {%tr for %} for table-row loops. Other engines such as docx-templates (JS) and Carbone use their own tag syntax, so the template cannot be reused with them without rewriting the tags. |
| Language | TypeScript 5.4 | Catches type errors at build time and gives autocompletion across all the data fields. |
| Test rendering | LibreOffice (soffice) | Converts .docx to .pdf so we can visually diff against the reference document. |

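Why not a pure HTML-to-PDF approach (Puppeteer)? It works, but it ships a full headless Chromium. Its output can also vary with the browser version and with the fonts installed on each machine. React-PDF lays out text with its own engine and uses only the fonts you register, so the same input renders the same way everywhere.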

Why not just generate the DOCX and convert it to PDF? That would solve the layout-sync problem but couples PDF generation to a heavy toolchain (LibreOffice). React-PDF runs in pure Node.js and works inside serverless environments.


4. Project setup from scratch

mkdir sang-kien-pdf && cd sang-kien-pdf
npm init -y

# Runtime dependencies
npm install @react-pdf/renderer react @expo-google-fonts/tinos docx

# Dev dependencies
npm install -D typescript ts-node @types/react @types/node

Create tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "lib": ["ES2020", "DOM"],
    "jsx": "react",
    "outDir": "./dist",
    "rootDir": "./",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "declaration": true,
    "declarationMap": true,
    "sourceMap": true,
    "resolveJsonModule": true,
    "moduleResolution": "node"
  },
  "include": ["src/**/*", "example/**/*", "tools/**/*"],
  "exclude": ["node_modules", "dist"]
}

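The jsx: "react" setting selects the classic JSX transform, which compiles <Page> into React.createElement(Page, …). Every .tsx file therefore needs React in scope (import React from "react"). React-PDF also works with the automatic runtime ("jsx": "react-jsx"), which removes that requirement.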

Add scripts to package.json:

{
  "scripts": {
    "build": "tsc",
    "generate": "ts-node example/generate-example.ts",
    "generate:blank": "ts-node example/generate-example.ts --blank",
    "build:docx": "ts-node tools/build-docx-template.ts"
  }
}

5. Implementing the PDF generator

5.1 TypeScript data types

Start with the data shape. Every field in the JSON gets a strict TypeScript interface in src/types.ts. This is the single source of truth — every page component reads it, every change ripples out through the type system.

// src/types.ts
export interface NgayKy {
  ngay: string;
  thang: string;
  nam: string;
}

export interface TrangBia {
  ten_sang_kien: string;
  tac_gia: string;
  don_vi: string;
  thong_tin_lien_he: string;
  nam: string;
}

export interface Mau01ApplyRow {
  tt: string;
  ten_to_chuc: string;
  dia_chi: string;
  linh_vuc: string;
}

export interface Mau01HieuQua {
  loi_ich_kinh_te: string;
  hieu_qua_giang_day: string;
  // … 8 more fields
}

export interface Mau01 {
  mo_dau: string;
  ten_sang_kien: string;
  // …
  danh_sach_ap_dung: Mau01ApplyRow[];
  tinh_hieu_qua: Mau01HieuQua;
  ngay_ky: NgayKy;
  // …
}

// … repeat for Mau02, Mau03, Mau04, BanCamKet

export interface SangKienData {
  trang_bia: TrangBia;
  mau_01: Mau01;
  mau_02: Mau02;
  mau_03: Mau03;
  mau_04: Mau04;
  ban_cam_ket: BanCamKet;
}

Two design choices worth calling out:

All fields are strings (or string arrays). Even numbers like "Tỷ lệ %" are strings. The form is for humans, not databases — values get rendered verbatim, and string-only types let users write "15%" or "khoảng 15" without coercion errors.

Array-shaped tables. danh_sach_tac_gia is Mau02AuthorRow[], not a fixed-size tuple. The page components iterate with .map(), and the DOCX template uses a {%tr for %} loop. Both handle 0, 1, or 100 rows.
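The all-strings rule is a convention. Data loaded at runtime is only cast to SangKienData, never checked, so a small guard can report any leaf that isn't a string. This is a minimal sketch; findNonStringLeaves is a hypothetical helper, not part of the project.

```typescript
// Hypothetical guard (not part of the original project): walks a parsed JSON
// value and returns the path of every leaf that is not a string.
// Objects and arrays are traversed; numbers, booleans and null are reported.
export function findNonStringLeaves(value: unknown, path = ""): string[] {
  if (typeof value === "string") return [];
  if (Array.isArray(value)) {
    // Array elements get an index suffix, e.g. "mau_02.danh_sach_tac_gia[0]"
    return value.flatMap((item, i) => findNonStringLeaves(item, `${path}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value as Record<string, unknown>).flatMap(([key, child]) =>
      findNonStringLeaves(child, path ? `${path}.${key}` : key)
    );
  }
  // Non-string primitive (or null): report where it was found
  return [path || "(root)"];
}
```

You could call it on the result of JSON.parse(...) before the cast in renderSangKienPdfFromFile and reject the input if the list is non-empty. If some fields are deliberately non-strings (checkbox flags, for instance), filter those paths out of the result.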

5.2 Font registration

@react-pdf/renderer ships with three fonts (Helvetica, Times-Roman, Courier) and none of them include Vietnamese glyphs. If you skip this step, characters like ư ơ ầ ậ will render as blank space.

// src/fonts.ts
import { Font } from "@react-pdf/renderer";

let registered = false;

export function registerFonts(): void {
  if (registered) return;

  const regular = require.resolve(
    "@expo-google-fonts/tinos/400Regular/Tinos_400Regular.ttf"
  );
  const italic = require.resolve(
    "@expo-google-fonts/tinos/400Regular_Italic/Tinos_400Regular_Italic.ttf"
  );
  const bold = require.resolve(
    "@expo-google-fonts/tinos/700Bold/Tinos_700Bold.ttf"
  );
  const boldItalic = require.resolve(
    "@expo-google-fonts/tinos/700Bold_Italic/Tinos_700Bold_Italic.ttf"
  );

  Font.register({
    family: "TimesVN",
    fonts: [
      { src: regular },
      { src: italic, fontStyle: "italic" },
      { src: bold, fontWeight: "bold" },
      { src: boldItalic, fontWeight: "bold", fontStyle: "italic" },
    ],
  });

  Font.registerHyphenationCallback((word) => [word]);
  registered = true;
}

Three things happen here:

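  1. require.resolve() returns the absolute path of each TTF inside node_modules, which is what React-PDF needs when rendering in Node. It does not give you a usable font URL in a browser bundle (Webpack returns a module ID, for example). For client-side rendering, import the font files as assets and pass those URLs to src instead.
  2. One family, four variants. The fontWeight and fontStyle keys let <Text style={{ fontWeight: "bold" }}> resolve to the bold TTF.
  3. The hyphenation callback returns [word]. This disables React-PDF's default English hyphenator, which would otherwise insert hyphens into Vietnamese words using English rules.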

The registered boolean guards against re-registration if registerFonts() is called from multiple entry points.

5.3 Shared styles

StyleSheet.create() in src/styles.ts defines reusable style objects. Three categories matter:

Page-level constants. A4 with ~2.5 cm margins:

page: {
  fontFamily: FONT,         // "TimesVN"
  fontSize: 13,             // 13pt body
  paddingTop: 71,           // ~2.5cm = 71pt
  paddingBottom: 71,
  paddingLeft: 71,
  paddingRight: 71,
  lineHeight: 1.25,
},

Paragraph variants for the three contexts that come up:

// Indented body text (justified, first-line indent ~1cm)
paragraph: { textAlign: "justify", textIndent: 28, marginBottom: 0 },

// Flush-left lines (section labels, inline list items)
paragraphFlush: { textAlign: "justify", marginBottom: 0 },

// Section headings (flush-left, with breathing room above)
sectionHead: { textAlign: "justify", marginBottom: 0, marginTop: 4 },

The marginBottom: 0 is deliberate — Vietnamese government documents are visually dense, so paragraphs only get spacing between sections, not between adjacent lines.

Component primitives (table, checkbox, signature columns):

table: {
  flexDirection: "column",
  borderWidth: 1, borderColor: "#000",
  borderRightWidth: 0, borderBottomWidth: 0,  // we draw R+B per-cell
  marginVertical: 4,
},
tableCell: {
  borderRightWidth: 1, borderBottomWidth: 1, borderColor: "#000",
  padding: 4,
},

The "outer border drawn on the table, inner borders drawn per-cell" pattern avoids double-thickness lines where cells meet.

Cover-specific styles are isolated in their own group because the cover page has unique requirements (page border via position: absolute, "Mẫu số 01" badge in the top corner).

5.4 Reusable components

src/components.tsx factors out the patterns that show up on multiple pages:

<Checkbox checked={boolean}>label</Checkbox> renders a horizontal row with a bordered square. When checked, an inner filled <View> appears inside the square. We don't use the Unicode ☑ character because Tinos doesn't include it; drawing geometry is font-independent.

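import React from "react";
import { View, Text } from "@react-pdf/renderer";
import { styles } from "./styles";
interface CheckboxProps {
  checked: boolean;
  children: React.ReactNode;
}

export const Checkbox: React.FC<CheckboxProps> = ({ checked, children }) => (
  <View style={styles.checkboxRow}>
    <View style={styles.checkboxBox}>
      {checked ? <View style={styles.checkboxFill} /> : null}
    </View>
    <Text style={styles.checkboxLabel}>{children}</Text>
  </View>
);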

Header variants — three different two-column header patterns appear in the document:

  • <TopHeaderBoYTe /> — "BỘ Y TẾ / ĐẠI HỌC Y DƯỢC" left, "CỘNG HÒA…" right (Mẫu 03/04)
  • <TopHeaderDonVi donVi="..." /> — drops "BỘ Y TẾ", shows the unit name in bold (Mẫu 02)
  • <TopHeaderCongHoa /> — only the right column (Bản cam kết)

Each one uses the same flexDirection: "row" layout with two equal columns. The differences are which lines appear.

Table primitives.

<Table columns={[6, 22, 14, 16, 14, 14, 14]}>
  <Row>
    <Cell width={6} header align="center">STT</Cell>
    <Cell width={22} header align="center">Họ và tên</Cell>
    {/* … */}
  </Row>
  {data.danh_sach_tac_gia.map((row, i) => (
    <Row key={i}>
      <Cell width={6} align="center">{row.stt}</Cell>
      <Cell width={22}>{row.ho_ten}</Cell>
      {/* … */}
    </Row>
  ))}
</Table>

The width prop is a percentage (the cell renders with width: ${width}%). Column widths must sum to 100. The Cell component automatically wraps string children in <Text> so callers can pass either plain text or nested elements.
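Nothing in the components shown here enforces the sum-to-100 rule, so a cheap guard can fail fast when a width array is edited. This is a sketch; assertWidthsSum100 is a hypothetical helper. It assumes integer widths like the ones used in this guide, and fractional widths would need a tolerance.

```typescript
// Hypothetical guard (not part of the original components): throws if a
// column-width array does not add up to exactly 100 (percent).
export function assertWidthsSum100(widths: readonly number[], label = "table"): void {
  const total = widths.reduce((sum, w) => sum + w, 0);
  if (total !== 100) {
    throw new Error(`${label}: column widths sum to ${total}, expected 100`);
  }
}
```

It could be called at the top of Table with its columns prop, or once per width constant when the module loads.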

<DateLine ngay thang nam /> renders the recurring "TP. Hồ Chí Minh, ngày … tháng … năm …" line, with sensible blank-data placeholders (.....).
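The DateLine source isn't shown here. As a sketch of the blank-placeholder behaviour it describes, a pure helper could build the line's text. formatDateLine and the exact "....." filler are assumptions based on the description above, and NgayKy is redeclared so the snippet stands alone.

```typescript
// Hypothetical helper (not the actual <DateLine> source): builds the date line
// text and substitutes "....." for any blank part.
interface NgayKy {
  ngay: string;
  thang: string;
  nam: string;
}

const BLANK = ".....";

export function formatDateLine({ ngay, thang, nam }: NgayKy): string {
  // Treat whitespace-only values as blank so the printed form keeps a gap to write in
  const part = (v: string): string => (v.trim() === "" ? BLANK : v.trim());
  return `TP. Hồ Chí Minh, ngày ${part(ngay)} tháng ${part(thang)} năm ${part(nam)}`;
}
```

Keeping the text logic in a plain function lets it be tested without rendering a PDF; the component would then just wrap the result in a styled <Text>.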

<SignatureBlock title subtitle name> renders one column of a two-column signature block (centered title, italic subtitle, then a 50pt vertical gap before the bold signer's name).

5.5 Page components

Each section of the form gets its own component file in src/pages/. They all follow the same shape:

// src/pages/Mau01.tsx
import React from "react";
import { Page, View, Text } from "@react-pdf/renderer";
import { styles } from "../styles";
import { Mau01 } from "../types";
import { Table, Row, Cell, DateLine } from "../components";

interface Props {
  data: Mau01;
  donVi: string;  // pulled from mau_02.don_vi by the parent
}

export const Mau01Page: React.FC<Props> = ({ data, donVi }) => (
  <Page size="A4" style={styles.page}>
    <Text style={styles.centerTitleLarge}>BÁO CÁO MÔ TẢ SÁNG KIẾN</Text>

    <Text style={styles.paragraphFlush}>
      1. Mở đầu{" "}
      <Text style={styles.italic}>
        (Giới thiệu về những vấn đề liên quan đến sáng kiến):
      </Text>
    </Text>
    <Text style={styles.paragraph}>{data.mo_dau}</Text>

    {/* … rest of the page */}
  </Page>
);

Three patterns recur in every page:

  1. Mixed styles in the same <Text>. Section labels like "1. Mở đầu" are plain, while the instructional helper that follows is italic. Nested <Text> elements apply different styles to different runs within one paragraph, because a <Text> in React-PDF can contain other <Text> nodes, much like <span> in HTML.

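  2. {" "} for explicit whitespace. JSX drops whitespace that contains a line break between text and an adjacent element. Without the {" "}, "1. Mở đầu" followed by a newline and the italic <Text> would render as "1. Mở đầu(Giới thiệu…". Inserting {" "} keeps the space.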

  3. Default-empty rows for tables. When data.danh_sach_ap_dung is empty, we still want one blank row to render so the printed form has a place to write. The pattern:

    {(data.danh_sach_ap_dung && data.danh_sach_ap_dung.length > 0
      ? data.danh_sach_ap_dung
      : [{ tt: "", ten_to_chuc: "", dia_chi: "", linh_vuc: "" }]
    ).map((row, i) => /* ... */)}
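The fallback expression in item 3 is repeated for every table. A hypothetical generic helper keeps the pattern in one place. The names rowsOrBlank and BLANK_APPLY_ROW are illustrative, and Mau01ApplyRow is redeclared from src/types.ts so the sketch stands alone.

```typescript
// Hypothetical helper (not part of the original components): returns the data
// rows when there are any, otherwise a single blank row so the printed form
// still has a line to write on.
export function rowsOrBlank<T>(rows: readonly T[] | undefined, blank: T): readonly T[] {
  return rows && rows.length > 0 ? rows : [blank];
}

// Redeclared from src/types.ts so this snippet compiles on its own
interface Mau01ApplyRow {
  tt: string;
  ten_to_chuc: string;
  dia_chi: string;
  linh_vuc: string;
}

export const BLANK_APPLY_ROW: Mau01ApplyRow = { tt: "", ten_to_chuc: "", dia_chi: "", linh_vuc: "" };
```

In the page component the table body would then read rowsOrBlank(data.danh_sach_ap_dung, BLANK_APPLY_ROW).map((row, i) => ...), with one blank-row constant per table type.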
    

Signature block on Mẫu 01 takes donVi as a prop, not from data directly. The reason: the standard layout uses the unit name from Mẫu 02 (mau_02.don_vi) on Mẫu 01's signature line. Rather than duplicate the value in the JSON, the parent component (SangKienDocument) reads it from mau_02 and passes it down. It falls back to trang_bia.don_vi when mau_02.don_vi is empty.

Cover page is special. It uses absolute positioning to put the page border around the entire content area:

<Page size="A4" style={styles.page}>
  <Text style={styles.formNumberOnCover}>Mẫu số 01</Text>
  <View style={styles.coverBorder} fixed />
  <View style={styles.coverContent}>
    {/* header, title, fields, footer */}
  </View>
</Page>

<View fixed> tells React-PDF to repeat the element on every page that this <Page> wraps onto. That is irrelevant here, since the cover fits on one page, but harmless. position: absolute (set in styles.coverBorder) takes the border out of the normal flow so it overlays the page content.

5.6 Top-level Document

src/SangKienDocument.tsx composes all six pages:

export const SangKienDocument: React.FC<{ data: SangKienData }> = ({ data }) => {
  registerFonts();
  const donVi = data.mau_02.don_vi || data.trang_bia.don_vi;

  return (
    <Document
      title={data.trang_bia.ten_sang_kien || "Báo cáo mô tả sáng kiến"}
      author={data.trang_bia.tac_gia}
    >
      <CoverPage data={data.trang_bia} />
      <Mau01Page data={data.mau_01} donVi={donVi} />
      <Mau02Page data={data.mau_02} />
      <Mau03Page data={data.mau_03} />
      <Mau04Page data={data.mau_04} />
      <BanCamKetPage data={data.ban_cam_ket} />
    </Document>
  );
};

registerFonts() is idempotent (the internal registered flag guards against duplicate registration), so calling it from the top-level component is safe.

The <Document> element accepts metadata props (title, author, subject, keywords, creator, producer). These are written into the PDF's document properties, and most viewers show the title in the window or tab title. They don't affect rendering.

5.7 Server-side render helper

src/generate.tsx wraps the React rendering in a Node-friendly Promise:

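import fs from "fs";
import path from "path";
import React from "react";
import { pdf } from "@react-pdf/renderer";
import { SangKienDocument } from "./SangKienDocument";
import { SangKienData } from "./types";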

export async function renderSangKienPdf(data: SangKienData): Promise<Buffer> {
  const instance = pdf(<SangKienDocument data={data} />);
  const blob = await instance.toBlob();
  const arrayBuffer = await blob.arrayBuffer();
  return Buffer.from(arrayBuffer);
}

export async function renderSangKienPdfFromFile(
  inputJsonPath: string,
  outputPdfPath: string
): Promise<void> {
  const data = JSON.parse(fs.readFileSync(inputJsonPath, "utf-8")) as SangKienData;
  const buffer = await renderSangKienPdf(data);
  fs.mkdirSync(path.dirname(outputPdfPath), { recursive: true });
  fs.writeFileSync(outputPdfPath, buffer);
}

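pdf(...).toBlob() works on the server too, provided the runtime has a global Blob (Node 18 and later), and the Buffer.from(await blob.arrayBuffer()) conversion is one line. In Node you can also use renderToBuffer() from @react-pdf/renderer, which resolves to a Buffer directly.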

example/generate-example.ts is a thin CLI on top:

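import path from "path";
import { renderSangKienPdfFromFile } from "../src/generate";

// "module": "commonjs" does not allow top-level await, so the CLI runs inside main()
async function main(): Promise<void> {
  const useBlank = process.argv.includes("--blank");
  const inputPath = path.join(__dirname, useBlank ? "data-blank.json" : "sample-data.json");
  const outputPath = path.join(__dirname, "..", "out", `sang-kien-${useBlank ? "blank" : "filled"}.pdf`);
  await renderSangKienPdfFromFile(inputPath, outputPath);
}

main().catch((err) => { console.error(err); process.exit(1); });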

6. Implementing the DOCX template generator

6.1 The Jinja-in-DOCX strategy

docxtpl works by storing Jinja-style strings as ordinary text inside the DOCX, then doing template expansion at render time. The build script's job is to produce a .docx whose visible text reads:

Tên sáng kiến (Tiếng Việt): {{ trang_bia.ten_sang_kien }}

When you open this in Word, you literally see those curly braces. When docxtpl opens it, it walks the OOXML tree, finds runs containing {{ ... }}, and replaces them.

The catch: Word stores text as runs, and a new run starts wherever formatting changes. If you write Tên sáng kiến (Tiếng Việt): {{ trang_bia.ten_sang_kien }} as one run, that's fine. If you bold "Tên sáng kiến" and leave {{ … }} regular, Word stores two separate runs. That is still fine, because the whole placeholder sits inside the second run.

The problem is a run boundary inside the braces ({{ trang_bia. in one run, ten_sang_kien }} in another). This typically happens after someone edits or reformats the placeholder in Word. docxtpl may then fail to recognize the placeholder, and the braces can survive into the filled output without any error. So:

Rule: every placeholder must live entirely inside one continuous run with one set of formatting.

The docx library makes this easy — when you write r("{{ mau_01.mo_dau }}"), that's exactly one <w:r> element with one <w:t> inside.
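Because docx emits each r() call as its own run, a quick way to enforce the rule after a build is to look for text nodes whose Jinja delimiters don't balance. This sketch assumes you have already extracted word/document.xml from the .docx; a .docx is a ZIP archive, so any unzip tool or ZIP library will do. findSplitPlaceholders is a hypothetical name. The regex approach only handles the plain <w:t> elements docx writes and is not a general OOXML parser.

```typescript
// Hypothetical lint step (not part of the original build): reports <w:t> text
// nodes whose "{{ }}" or "{% %}" delimiters are unbalanced, i.e. a placeholder
// or tag that starts in one run and ends in another.
const PAIRS: ReadonlyArray<[string, string]> = [
  ["{{", "}}"],
  ["{%", "%}"],
];

function count(haystack: string, needle: string): number {
  return haystack.split(needle).length - 1;
}

export function findSplitPlaceholders(documentXml: string): string[] {
  const problems: string[] = [];
  // Matches <w:t> and <w:t xml:space="preserve">, but not <w:tbl>, <w:tr>, <w:tc>
  const textNode = /<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/g;
  for (const match of documentXml.matchAll(textNode)) {
    const text = match[1];
    for (const [open, close] of PAIRS) {
      if (count(text, open) !== count(text, close)) {
        problems.push(text);
        break; // one report per text node is enough
      }
    }
  }
  return problems;
}
```

An empty result means every placeholder and tag sits inside a single run. The checkbox runs in §6.4 ("{% if … %}", "{% else %}", "{% endif %} ") each balance on their own, so they don't trigger false positives.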

6.2 The 3-row table loop trick

For repeating table rows, docxtpl uses a special syntax: {%tr for item in collection %} and {%tr endfor %}. The tr prefix tells the engine "remove the entire <w:tr> row containing this tag and use the rows between for and endfor as the loop body."

A naive single-row pattern doesn't work:

[ {%tr for x in items %} {{ x.id }} | {{ x.name }} {%tr endfor %} ]

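This fails because docxtpl replaces the entire row that holds a {%tr %} tag with just that tag. A single row cannot be the loop opener, the loop body and the loop closer at once. The {{ }} fields are discarded along with the row, and Jinja is left with a broken for/endfor pair.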

The reliable pattern is three rows:

Row 1: | {%tr for item in collection %} | (empty cells) |
Row 2: | {{ item.id }} | {{ item.name }} |   ← duplicated per item
Row 3: | {%tr endfor %} | (empty cells) |

Row 1 and Row 3 get stripped. Row 2 gets repeated for each item. The data row carries the actual {{ }} fields.

In code:

const aw = [6, 22, 14, 16, 14, 14, 14];  // column widths

const emptyRow_aw = (firstText: string) => {
  const cells: TableCell[] = [];
  for (let i = 0; i < aw.length; i++) {
    cells.push(new TableCell({
      borders: allThinBorders,
      width: { size: aw[i] * 100, type: WidthType.PERCENTAGE },
      children: [new Paragraph({ children: [r(i === 0 ? firstText : " ")] })],
    }));
  }
  return cells;
};

new Table({
  rows: [
    new TableRow({ children: [/* header cells */] }),
    new TableRow({ children: emptyRow_aw("{%tr for item in mau_02.danh_sach_tac_gia %}") }),
    new TableRow({ children: [
      dataCell("{{ item.stt }}", aw[0], AlignmentType.CENTER),
      dataCell("{{ item.ho_ten }}", aw[1]),
      // … 5 more
    ]}),
    new TableRow({ children: emptyRow_aw("{%tr endfor %}") }),
  ],
});

The emptyRow_aw helper builds a row where the first cell contains the loop tag and the rest are blanks (just " "). After docxtpl strips it, the visible table has one header row plus one data row per item.
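The three-row pattern is mechanical enough to generate. The sketch below is a hypothetical helper that returns only the cell texts for the three rows. It leaves TableRow/TableCell construction to helpers like emptyRow_aw and dataCell.

```typescript
// Hypothetical helper (not part of the original build script): cell texts for
// docxtpl's three-row loop. Row 1 opens the loop, row 2 is the per-item body,
// row 3 closes it. Unused cells get " ", matching emptyRow_aw above.
export function loopRowTexts(
  collection: string,
  fields: readonly string[],
  itemVar = "item"
): string[][] {
  const tagRow = (first: string): string[] => [first, ...fields.slice(1).map(() => " ")];
  return [
    tagRow(`{%tr for ${itemVar} in ${collection} %}`),
    fields.map((f) => `{{ ${itemVar}.${f} }}`),
    tagRow("{%tr endfor %}"),
  ];
}
```

Each inner array has one entry per column, so the same column-width array can be zipped with it when building the cells.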

6.3 Multi-section layout

Word documents are split into sections, each with its own page settings — margins, orientation, page borders, headers, footers. The cover page needs:

  • A page border (rounded rectangle around the content area)
  • A header containing "Mẫu số 01" at the top right outside the border

The rest of the document needs:

  • No page border
  • No "Mẫu số 01" header (it's only on the cover)

In docx v9, this is two sections in the same document:

new Document({
  sections: [
    {
      properties: {
        page: {
          size: { width: 11906, height: 16838, orientation: PageOrientation.PORTRAIT },
          margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 },
          borders: {
            pageBorderTop:    { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
            pageBorderBottom: { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
            pageBorderLeft:   { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
            pageBorderRight:  { style: BorderStyle.SINGLE, size: 12, color: "000000", space: 24 },
          },
        },
      },
      headers: { default: coverHeader },  // contains "Mẫu số 01"
      children: buildCoverPage(),
    },
    {
      properties: {
        page: { size: {/*…*/}, margin: {/*…*/} /* no borders */ },
      },
      // Explicit empty header so the cover header doesn't leak onto subsequent pages
      headers: { default: new Header({ children: [new Paragraph({ children: [r("")] })] }) },
      children: [
        ...buildMau01(),
        ...buildMau02(),
        ...buildMau03(),
        ...buildMau04(),
        ...buildBanCamKet(),
      ],
    },
  ],
});

Two gotchas worth noting:

Twips, not points. docx uses twips (1/1440 inch). Multiply pt by 20 to get twips:

  • A4 = 11906 × 16838 twips
  • 1 inch margin = 1440 twips
  • 1 cm = 567 twips

Headers leak across sections. If section 2 doesn't define headers, it inherits section 1's. We have to provide an explicit empty Header to prevent the "Mẫu số 01" text from showing up on every page of the document.
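The twips conversions above are easy to get wrong by hand, and so are the half-point font sizes and 240ths-of-a-line spacing used in §6.4. Below is a minimal sketch of conversion helpers. It assumes Word's default "auto" line rule for the line-spacing case, and none of these names exist in the project.

```typescript
// Hypothetical unit helpers (not part of the project) for values docx expects.
const TWIPS_PER_PT = 20; // 1 pt = 20 twips
const TWIPS_PER_INCH = 1440; // 1 inch = 1440 twips = 72 pt
const CM_PER_INCH = 2.54;

/** Points to twips (page size, margins, indents, spacing before/after). */
export const ptToTwips = (pt: number): number => Math.round(pt * TWIPS_PER_PT);

/** Centimetres to twips, e.g. 1 cm -> 567. */
export const cmToTwips = (cm: number): number =>
  Math.round((cm / CM_PER_INCH) * TWIPS_PER_INCH);

/** Centimetres to points, for React-PDF padding (2.5 cm is about 70.9 pt). */
export const cmToPt = (cm: number): number => (cm / CM_PER_INCH) * 72;

/** Font size in points to docx half-points, e.g. 13 pt -> 26. */
export const ptToHalfPoints = (pt: number): number => Math.round(pt * 2);

/** Line-height multiple to docx "auto" line spacing (240 = single). */
export const lineMultipleToDocx = (multiple: number): number => Math.round(multiple * 240);
```

As a worked check, ptToTwips(71) is 1420. The DOCX margin in §6.3 is 1440 twips (exactly 1 inch), so the PDF and DOCX margins differ by 1 pt.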

6.4 Building paragraphs and tables

The build script defines small helper functions to keep the body code readable:

const FONT = "Times New Roman";
const SIZE = 26;          // 13pt (docx-js uses half-points)
const SIZE_HEADING = 28;  // 14pt

function r(text: string, opts: { bold?: boolean; italic?: boolean; underline?: boolean; size?: number } = {}) {
  return new TextRun({
    text,
    font: FONT,
    size: opts.size ?? SIZE,
    bold: opts.bold,
    italics: opts.italic,
    underline: opts.underline ? { type: UnderlineType.SINGLE } : undefined,
  });
}

function bodyP(children: TextRun[], opts: { indent?: boolean } = {}) {
  return new Paragraph({
    children,
    alignment: AlignmentType.JUSTIFIED,
    indent: opts.indent ? { firstLine: 567 } : undefined,
    spacing: { before: 0, after: 0, line: 300 },
  });
}

function flushP(children: TextRun[], opts: { spaceBefore?: number } = {}) {
  return new Paragraph({
    children,
    alignment: AlignmentType.JUSTIFIED,
    spacing: { before: opts.spaceBefore ?? 0, after: 0, line: 300 },
  });
}

function centerP(children: TextRun[], opts: { spaceBefore?: number; spaceAfter?: number } = {}) {
  return new Paragraph({
    children,
    alignment: AlignmentType.CENTER,
    spacing: { before: opts.spaceBefore ?? 0, after: opts.spaceAfter ?? 0, line: 300 },
  });
}

A typical section then reads naturally:

out.push(centerP([r("BÁO CÁO MÔ TẢ SÁNG KIẾN", { bold: true, size: SIZE_HEADING })]));

out.push(flushP([
  r("1. Mở đầu "),
  r("(Giới thiệu về những vấn đề liên quan…):", { italic: true }),
]));
out.push(bodyP([r("{{ mau_01.mo_dau }}")], { indent: true }));

For checkboxes, since the templating engine has to choose which character to render, we embed the choice in the placeholder itself:

const checkbox = (cond: string, label: string) =>
  flushP([
    r(`{% if ${cond} %}`),
    r("☑"),
    r("{% else %}"),
    r("☐"),
    r("{% endif %} "),
    r(label),
  ]);

out.push(checkbox(
  "mau_02.phan_loai.giai_phap_ky_thuat",
  "Giải pháp kỹ thuật, quản lý, tác nghiệp, ứng dụng tiến bộ kỹ thuật áp dụng cho Đại học Y Dược TP.HCM"
));

After docxtpl runs, this paragraph reduces to ☑ Giải pháp kỹ thuật… or ☐ Giải pháp kỹ thuật…, depending on whether the value is truthy. Note that in Jinja any non-empty string, including "false", is truthy. In Word the ☑/☐ characters display fine because Word substitutes a font that has the glyphs when the current font lacks them. React-PDF does not do that.


7. Layout calibration (matching the standard)

The "Sang_kien_SOP_dong_vat" reference document defines a specific visual style. Here's a checklist of the calibrations applied to both generators:

| Aspect | Rule | Where it lives |
|---|---|---|
| Body font | Times New Roman (or Tinos) 13pt | styles.page.fontSize, r() SIZE = 26 |
| Page margins | ~2.5 cm all around | padding: 71 (PDF), margin: 1440 (DOCX; 1440 twips = 1 inch ≈ 2.54 cm) |
| Body line height | 1.25 | lineHeight: 1.25 (PDF), line: 300 (DOCX; 240 = single, so 300 = 1.25) |
| First-line indent | ~1 cm on body paragraphs | textIndent: 28 (PDF), firstLine: 567 (DOCX) |
| Section numbers (1., 2., 4.1) | NOT bold; italic instructions in parens | Use paragraphFlush, not bold |
| Inter-paragraph spacing | None within a section, small gap before a new section | marginBottom: 0, sectionHead.marginTop: 4 |
| Cover page | Page border (rounded rect), "Mẫu số 01" outside top-right | Cover-specific styles, dedicated section in DOCX |
| Cover divider | =====***===== (literal) | Hardcoded string |
| Cover info fields | Left-aligned, bold label, regular value | coverField style |
| Two-column header | "ĐƠN VỊ" or "BỘ Y TẾ" left, "CỘNG HÒA" right | TopHeaderBoYTe, TopHeaderDonVi, TopHeaderCongHoa |
| "Độc lập - Tự do - Hạnh phúc" | Underlined, bold | underline: true flag in r()/styles |
| Tables | Single thin black border, no shaded header | borderWidth: 1, no backgroundColor on tableHeaderCell |
| Mẫu 02 author table, column 7 | Header includes a parenthetical italic instruction | Custom TableCell with two centered paragraphs |
| Signature block | Two columns: "Xác nhận của lãnh đạo / [đơn vị]" left, "Đại diện nhóm tác giả sáng kiến" right | <View style={signatureRow}> (PDF), borderless 2-cell table (DOCX) |
| Mẫu 03 totals row | TỔNG (cols 1–3 merged) ‖ 100 ‖ blank | columnSpan: 3 in DOCX, manual width sum in PDF |
| Mẫu 04 evaluation rubric | Two scoring rows + total row at bottom | Static text + {{ … }} for nhận xét/điểm |

When in doubt about a layout decision, open the reference DOCX in Word, click into the relevant element, and read its formatting from the ribbon. Mirror those settings in code.


8. Verification workflow

Visual diff against the reference is the only reliable way to know you got it right. The flow:

# 1. Generate the candidate PDF
npm run generate

# 2. Convert each page to JPEG
pdftoppm -jpeg -r 100 out/sang-kien-filled.pdf out/page

# 3. Convert the reference DOCX to PDF and JPEGs the same way
soffice --headless --convert-to pdf reference.docx --outdir ref/
pdftoppm -jpeg -r 100 ref/reference.pdf ref/ref-page

# 4. Open them side by side

For the DOCX generator, add one more step:

# Build the template
npm run build:docx

# Render placeholders WITHOUT filling them — does the layout look right?
soffice --headless --convert-to pdf out/template_application_form.docx --outdir out/

# Fill it with sample data and render
python tools/fill-docx.py example/sample-data.json out/sang-kien-filled.docx
soffice --headless --convert-to pdf out/sang-kien-filled.docx --outdir out/

Smoke test the DOCX template in Python before declaring victory:

# tools/test-docx-fill.py
from docxtpl import DocxTemplate
import json

with open("example/sample-data.json", encoding="utf-8") as f:
    data = json.load(f)

doc = DocxTemplate("out/template_application_form.docx")
doc.render(data)
doc.save("out/template-filled-test.docx")

If docxtpl raises TemplateSyntaxError: Encountered unknown tag 'endfor', you've put a {%tr for %} and {%tr endfor %} in the same row instead of separate rows. Go re-read §6.2.

If a {{ field }} doesn't get replaced and you can still see the curly braces in the filled output, the placeholder got split across runs by Word's auto-formatting. Build the placeholder with one r("{{ x }}") call, not three.


9. Common modifications

Adding a new field

Say you need to add mau_01.tong_kinh_phi (total budget).

  1. Update src/types.ts:

    export interface Mau01 {
      // …
      tong_kinh_phi: string;  // new
    }
    
  2. Update example/data-blank.json and example/sample-data.json with the new field.

  3. Render it in src/pages/Mau01.tsx:

    <Text style={styles.paragraphFlush}>
      7. Tổng kinh phí: {data.tong_kinh_phi}
    </Text>
    
  4. Add it to the DOCX template generator in tools/build-docx-template.ts:

    out.push(flushP([r("7. Tổng kinh phí: {{ mau_01.tong_kinh_phi }}")]));
    
  5. Regenerate:

    npm run generate
    npm run build:docx
    

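The TypeScript compiler will flag any reference to a field that isn't declared in src/types.ts. It will not catch a field you forgot to render. Nor will it catch a field missing from the JSON files, because they are loaded with a type assertion rather than validated, or a typo inside a {{ }} placeholder string. Check the regenerated outputs.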
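Placeholder strings in the build script are not type-checked, so a typo such as {{ mau_01.tong_kinh_phii }} only shows up in the filled output. One option is to scan the placeholders and check them against a sample data file. This is a minimal sketch. findUnknownPlaceholders is hypothetical and only checks {{ }} expressions, not {% if %} conditions. It skips loop variables such as item because they exist only inside a {%tr for %} block.

```typescript
// Hypothetical check (not part of the original tooling): extracts every
// {{ a.b.c }} placeholder from template text and returns the dotted paths
// that do not resolve in the sample data.
const PLACEHOLDER = /\{\{\s*([A-Za-z_][\w.]*)\s*\}\}/g;

function resolves(data: unknown, path: string): boolean {
  let current: unknown = data;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || !(key in (current as object))) {
      return false;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return true;
}

export function findUnknownPlaceholders(
  templateText: string,
  data: unknown,
  loopVars: readonly string[] = ["item"]
): string[] {
  const missing = new Set<string>();
  for (const match of templateText.matchAll(PLACEHOLDER)) {
    const path = match[1];
    // Loop variables are bound by {%tr for %} at render time, not in the data
    if (loopVars.includes(path.split(".")[0])) continue;
    if (!resolves(data, path)) missing.add(path);
  }
  return [...missing];
}
```

For instance, you could run it over the source of tools/build-docx-template.ts, where every placeholder appears as a string literal, with example/sample-data.json as the data. An empty result means every non-loop placeholder path resolves.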

Changing a column width

Column widths are kept as small integer arrays in the page component (PDF) and the build script (DOCX). They must always sum to 100.

To widen the "Họ và tên" column on the Mẫu 02 author table from 22% to 28% (and shrink "Nơi công tác" from 16% to 10%):

In src/pages/Mau02.tsx:

const AUTHOR_WIDTHS = [6, 28, 14, 10, 14, 14, 14] as const;  // was [6, 22, 14, 16, …]

In tools/build-docx-template.ts (inside buildMau02()):

const aw = [6, 28, 14, 10, 14, 14, 14];

The two arrays must match, and nothing enforces it. Both use the same convention (percentages of the table width, summing to 100), but they live in separate files and feed different libraries. Keeping them in sync is a manual discipline.
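One way to remove the manual step is a shared constant that both files import. This sketch assumes a new module such as src/layout.ts. The file is hypothetical; today the project keeps the arrays in separate files. tsconfig's include list already covers src/ and tools/.

```typescript
// Hypothetical shared module (e.g. src/layout.ts): a single source for the
// Mẫu 02 author-table column widths, used by both the PDF and DOCX generators.
export const AUTHOR_WIDTHS = [6, 28, 14, 10, 14, 14, 14] as const;

// Fail at module load if an edit breaks the sum-to-100 rule
const authorTotal = AUTHOR_WIDTHS.reduce<number>((sum, w) => sum + w, 0);
if (authorTotal !== 100) {
  throw new Error(`AUTHOR_WIDTHS sums to ${authorTotal}, expected 100`);
}
```

src/pages/Mau02.tsx would then import AUTHOR_WIDTHS from "../layout", and tools/build-docx-template.ts would use const aw = [...AUTHOR_WIDTHS] after importing it from "../src/layout".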

Adding a new repeating table

The data shape, the page component and the DOCX template all need updates:

  1. Type: add a danh_sach_moi: Mau01NewRow[] field to Mau01 and define interface Mau01NewRow { … }.

  2. PDF page: mirror the existing pattern in src/pages/Mau01.tsx:

    <Table columns={[10, 30, 30, 30]}>
      <Row>
        <Cell width={10} header align="center">TT</Cell>
        {/* … */}
      </Row>
      {(data.danh_sach_moi && data.danh_sach_moi.length > 0
        ? data.danh_sach_moi
        : [{ tt: "", ... }]
      ).map((row, i) => (
        <Row key={i}>
          <Cell width={10} align="center">{row.tt}</Cell>
          {/* … */}
        </Row>
      ))}
    </Table>
    
  3. DOCX template: use the 3-row pattern from §6.2:

    const w = [10, 30, 30, 30];
    const emptyRow = (firstText: string) => /* same helper pattern */;
    
    new Table({
      rows: [
        new TableRow({ children: [headerCell("TT", w[0]), /* … */] }),
        new TableRow({ children: emptyRow("{%tr for item in mau_01.danh_sach_moi %}") }),
        new TableRow({ children: [dataCell("{{ item.tt }}", w[0], AlignmentType.CENTER), /* … */] }),
        new TableRow({ children: emptyRow("{%tr endfor %}") }),
      ],
    });
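
Steps 1 and 2 can be sketched in isolation. The columns noi_dung and ghi_chu are hypothetical placeholders for whatever the new table actually holds, and withBlankRow is a hypothetical helper that factors out the ternary from the page component: when the array is empty or missing, the table still renders one blank row.

```ts
// Hypothetical row shape: replace noi_dung / ghi_chu with the real columns.
interface Mau01NewRow {
  tt: string;
  noi_dung: string;
  ghi_chu: string;
}

// Fragment of Mau01 showing only the new field (optional here because the
// page component guards against a missing array).
interface Mau01WithNewTable {
  danh_sach_moi?: Mau01NewRow[];
}

const BLANK_ROW: Mau01NewRow = { tt: "", noi_dung: "", ghi_chu: "" };

// Same logic as the ternary in the page component: render the data rows,
// or a single blank row so the table skeleton still appears in the PDF.
function withBlankRow(rows: Mau01NewRow[] | undefined): Mau01NewRow[] {
  return rows && rows.length > 0 ? rows : [BLANK_ROW];
}

// Example: data loaded from a blank JSON instance.
const blankData: Mau01WithNewTable = { danh_sach_moi: [] };
const rowsToRender = withBlankRow(blankData.danh_sach_moi); // [BLANK_ROW]
```

Keeping the fallback in one place means the page component only has to map over the result.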
    

Switching to your organization's font

Replace the four TTF paths in src/fonts.ts:

```ts
import { Font } from "@react-pdf/renderer";

// Keep the family name unchanged so styles that reference "TimesVN" still resolve.
Font.register({
  family: "TimesVN",
  fonts: [
    { src: "/path/to/your/Regular.ttf" },
    { src: "/path/to/your/Italic.ttf", fontStyle: "italic" },
    { src: "/path/to/your/Bold.ttf", fontWeight: "bold" },
    { src: "/path/to/your/BoldItalic.ttf", fontWeight: "bold", fontStyle: "italic" },
  ],
});
```

For the DOCX side, change const FONT = "Times New Roman" in tools/build-docx-template.ts to the name of the font you want. Setting the name doesn't embed the font: Word falls back to a substitute if the named font isn't installed on the reader's machine, so prefer common names (Times New Roman, Arial, Calibri).


10. Troubleshooting

PDF renders blank squares where Vietnamese characters should be. The font isn't registered or the registered font lacks Vietnamese glyphs. Check that registerFonts() is called and that the TTFs at the resolved paths are actually loaded (not 404 / missing). Tinos has the right glyph coverage; many "Times New Roman clones" don't.

Error: Failed to fetch font from https://… You're hitting @react-pdf/renderer's URL-based font loading and your environment can't reach the URL. Switch to local TTFs located with require.resolve(), which is what src/fonts.ts already does.

docxtpl raises TemplateSyntaxError: Encountered unknown tag 'endfor'. You put the {%tr for %} and {%tr endfor %} tags in the same table row. Re-read §6.2 — they have to be on separate rows.

Some {{ field }} placeholders aren't being replaced. Word split your text run mid-placeholder. Make sure each placeholder is constructed with a single r("{{ x }}") call, not split across multiple r() calls or assembled from concatenated strings.
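
One way to enforce the one-run rule is to lint the strings before they reach r(). This sketch assumes you can collect every string passed to r() into an array (for example through a hypothetical wrapper that records them); findSplitPlaceholders is not part of the project. It flags any string whose {{ / }} or {% / %} delimiters don't pair up, which is the shape a placeholder takes when it is broken across runs.

```ts
// Counts non-overlapping occurrences of needle in text.
function countOf(text: string, needle: string): number {
  return text.split(needle).length - 1;
}

// Hypothetical lint: a placeholder split across r() calls leaves one string
// with an opener but no closer, and another with a closer but no opener.
function findSplitPlaceholders(runTexts: readonly string[]): string[] {
  const pairs: Array<[string, string]> = [["{{", "}}"], ["{%", "%}"]];
  return runTexts.filter((text) =>
    pairs.some(([open, close]) => countOf(text, open) !== countOf(text, close)),
  );
}
```

An empty result means every placeholder in the generator sits in a single run. It cannot see splits that Word introduces later if someone edits the template by hand.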

The DOCX has "Mẫu số 01" appearing on every page, not just the cover. The cover-section header is leaking into the next section. Add an explicit empty header to the second section:

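```ts
// Fragment of the second section's options: an explicit empty default header
// stops Word from carrying the cover header over into this section.
headers: { default: new Header({ children: [new Paragraph({ children: [r("")] })] }) },
```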

Tables overflow the right margin. Either the column width percentages don't sum to exactly 100, or a single cell holds a long string with no wrap point (a URL, a long number). Fix the widths, or add wordBreak: "break-word" to the cell style.

textIndent doesn't seem to work in <Text>. React-PDF applies textIndent to a top-level paragraph <Text>, not to a <Text> nested inside another <Text>. If you're nesting, wrap the inner content in a parent <Text> that carries the indent style.

The DOCX page border doesn't appear. Page borders are a Word feature configured in section properties. Check that you've set all four (pageBorderTop/Bottom/Left/Right), with non-zero size and a space value (24 puts them ~1.7cm from the edge in our setup). LibreOffice and Word may render them slightly differently — Word is the canonical view.

Filled DOCX has weird extra empty rows above each table. Those are the {%tr for %}/{%tr endfor %} rows that didn't get stripped — meaning the loop tags ended up in paragraphs inside a cell, not as standalone row text. Make sure the firstText in your emptyRow_*() helper is the entire cell content, not appended to other text.
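
The two {%tr %} entries above (the TemplateSyntaxError and the leftover empty rows) both come down to what text sits in the loop-tag rows. A hypothetical check such as isStandaloneRowTag (not part of the project) can assert that each firstText passed to the emptyRow helper is exactly one tr tag and nothing else. The regex is a sketch that only accepts the "for … in …" and "endfor" forms used in this guide.

```ts
// Matches a whole-cell docxtpl row tag such as
// "{%tr for item in mau_01.danh_sach_moi %}" or "{%tr endfor %}".
const ROW_TAG = /^\{%tr (for \w+ in [\w.]+|endfor) %\}$/;

// True only if the cell text is a single row tag with nothing before or after it.
function isStandaloneRowTag(firstText: string): boolean {
  return ROW_TAG.test(firstText);
}
```

Text such as "TT {%tr endfor %}", or both tags in one string, fails the check, which is the situation both troubleshooting entries describe.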


11. Porting to a different form

The same pattern works for any structured government form. The migration steps:

  1. Extract the data model. Open the reference DOCX, list every blank line and every table column. Each becomes a field in types.ts. Repeating sections (lists of authors, lists of attachments) become arrays.

  2. Identify the sections. Most forms have a cover page plus N body sections. Each body section becomes a <Page> component plus a buildSectionN() function in the DOCX builder.

  3. Catalog the visual primitives. Headers, signature blocks, tables, checkboxes, date lines — write them once in components.tsx (PDF) and as helper functions (DOCX), then reuse.

  4. Calibrate the styles. Open the reference, measure margins, font, line spacing, and indent. Set them as constants. See §7.

  5. Render and diff. Generate, convert to JPEG, line up against the reference. Iterate until they match.

  6. Smoke-test the DOCX template with docxtpl. If a placeholder doesn't fill, it's almost always run-splitting — fix by collapsing into one r() call.

The most labor-intensive part is the visual calibration (steps 4 and 5). Everything else is mechanical translation from "what the form looks like" to "code that produces the same thing."
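
Step 4 usually means converting measurements taken from the reference into each format's units. Plain numbers in React-PDF styles are points (1/72 inch), while section page margins in the docx package are given in twips (1/20 point, 1440 per inch). The helpers below are a hypothetical sketch of that arithmetic, not code from the project.

```ts
const CM_PER_INCH = 2.54;
const PT_PER_INCH = 72;
const TWIPS_PER_PT = 20;

// Centimetres measured on the reference to points (React-PDF's default unit).
function cmToPt(cm: number): number {
  return (cm / CM_PER_INCH) * PT_PER_INCH;
}

// Centimetres to twips for docx margins; rounded because Word stores integers.
function cmToTwip(cm: number): number {
  return Math.round(cmToPt(cm) * TWIPS_PER_PT);
}
```

For example, cmToTwip(2.54) gives 1440 (one inch), and a 2 cm margin becomes cmToTwip(2) = 1134 twips, or about 56.7 pt in the PDF styles.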


Appendix: file-by-file inventory

| File | Lines | Purpose |
|---|---|---|
| src/types.ts | 177 | TypeScript interfaces matching data_blank.json |
| src/fonts.ts | 56 | Tinos font registration |
| src/styles.ts | 239 | Shared StyleSheet.create() styles |
| src/components.tsx | 156 | Reusable <Checkbox>, <Table>, <DateLine>, header variants |
| src/pages/CoverPage.tsx | 64 | Trang bìa with page border |
| src/pages/Mau01.tsx | 172 | Báo cáo mô tả sáng kiến |
| src/pages/Mau02.tsx | 206 | Đơn đề nghị công nhận sáng kiến |
| src/pages/Mau03.tsx | 82 | Bản xác nhận tỷ lệ đóng góp |
| src/pages/Mau04.tsx | 94 | Phiếu đánh giá sáng kiến |
| src/pages/BanCamKet.tsx | 119 | Bản cam kết |
| src/SangKienDocument.tsx | 43 | Top-level <Document> composing all pages |
| src/generate.tsx | 37 | renderSangKienPdf(data) server-side helper |
| src/index.ts | 5 | Public API barrel |
| tools/build-docx-template.ts | 1301 | Generates the Jinja-style DOCX template |
| tools/fill-docx.py | ~30 | CLI to fill a template with JSON data via docxtpl |
| tools/test-docx-fill.py | ~25 | Smoke test script |
| example/generate-example.ts | ~35 | CLI for the PDF pipeline |
| example/sample-data.json | | Realistic filled-in example |
| example/data-blank.json | | All-empty template instance |

Total: about 2750 lines of TypeScript plus ~50 lines of Python. The DOCX generator is the largest single file because every static line of body text is an out.push(flushP([r("…")])) call, but the pattern is repetitive and easy to skim.