` elements. The implementation throws an explicit error rather than producing an empty PDF.

**Images with restrictive CORS.** With `useBase64URL: true`, `docx-preview` inlines embedded images as data URLs, so CORS does not apply. If the option is changed to `false`, externally hosted images will taint the canvas and cause `toDataURL` to throw a `SecurityError`. Do not change this option.

**Very large documents.** Documents with more than ~50 pages may exhaust memory at `scale: 2`, because each captured canvas stays in memory until it has been added to the PDF and garbage-collected. For documents this large, the implementation should release each canvas (by setting its reference to `null`) immediately after `addImage` returns, and should consider lowering `renderScale` to 1.5 when the page count exceeds a threshold.

**Mixed page orientations.** Documents that switch from portrait to landscape partway through are handled by the per-page dimension calculation in Section 6.4. Do not assume that all pages share the first page's dimensions.

**Rapid file changes.** If the user drops a second file while the first is still converting, the in-flight conversion should be cancelled or its results discarded. The simplest approach is to track an incrementing conversion ID and ignore any result whose ID is no longer current when it completes. This is not strictly required for correctness (the second call will overwrite the first), but it prevents stale progress updates from confusing the status display.

## 9. Performance Considerations

For a typical 5-page A4 document, end-to-end conversion on mid-range 2024 hardware takes 1.5–3 seconds. The dominant cost is the `html2canvas` capture, which scales roughly linearly with page count and quadratically with `renderScale` (both canvas dimensions grow with the scale, so the pixel count grows with its square). The `docx-preview` rendering stage typically takes 100–300 ms regardless of page count. PDF assembly time is negligible.

Memory peaks during the capture loop, which holds one canvas's worth of pixels per page until each canvas is added to the PDF.
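The per-canvas figure follows from the page size in CSS pixels, the capture scale, and 4 bytes per RGBA pixel. A minimal sketch of that arithmetic is below; it assumes 96 CSS px per inch and that `html2canvas` multiplies both canvas dimensions by `scale`. The helper `canvasBytes` and the `PageSizeInches` type are hypothetical illustrations, not part of the component.

```typescript
// Hypothetical helper: estimates the raw RGBA size of one captured page canvas.
// Assumes 96 CSS px per inch and that html2canvas scales both dimensions by `scale`.
const CSS_PX_PER_INCH = 96;
const BYTES_PER_RGBA_PIXEL = 4;

interface PageSizeInches {
  widthIn: number;
  heightIn: number;
}

function canvasBytes(page: PageSizeInches, scale: number): number {
  const widthPx = Math.round(page.widthIn * CSS_PX_PER_INCH * scale);
  const heightPx = Math.round(page.heightIn * CSS_PX_PER_INCH * scale);
  return widthPx * heightPx * BYTES_PER_RGBA_PIXEL;
}

const usLetter: PageSizeInches = { widthIn: 8.5, heightIn: 11 };

// Compare the default scale with the reduced scale suggested for very large documents.
for (const scale of [2, 1.5]) {
  const perPageMB = canvasBytes(usLetter, scale) / 1e6;
  console.log(
    `scale ${scale}: ${perPageMB.toFixed(1)} MB per page, ` +
      `${(perPageMB * 20).toFixed(0)} MB for 20 pages`,
  );
}
// prints:
// scale 2: 13.8 MB per page, 276 MB for 20 pages
// scale 1.5: 7.8 MB per page, 155 MB for 20 pages
```

The ratio between the two rows is (2 / 1.5)² ≈ 1.78, which is the quadratic dependence on `renderScale` noted above.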
At `scale: 2` with US Letter pages (816 × 1056 CSS px), a single canvas is 1632 × 2112 px, or approximately 14 MB of RGBA data. A 20-page document can therefore briefly hold ~275 MB before garbage collection.

Output PDF file sizes for a 5-page document at default settings are approximately 1.5–3 MB. Lowering `imageQuality` from 0.95 to 0.85 typically reduces output size by about 30% with no visible degradation; values below 0.80 introduce visible JPEG artifacts on text edges.

## 10. Browser Support

The component targets the current and previous major versions of Chrome, Edge, Firefox, and Safari. Internet Explorer is not supported. The relevant browser features are:

- `File` and `FileReader` APIs (universal since 2014)
- `Blob` and `URL.createObjectURL` (universal since 2014)
- Canvas `toDataURL` with JPEG support (universal since 2012)
- ES2020 syntax, matching the compilation target in `tsconfig.json`

`html2canvas` has known limitations when rendering certain CSS features, such as `mix-blend-mode`, `backdrop-filter`, and complex `clip-path`, which may affect documents with heavy graphical design. For Word documents this is rarely relevant, because standard business documents do not use these effects.

## 11. Testing

Implementations should be verified against the following test corpus:

| Test document | Asserts |
|---|---|
| Plain prose, 3 pages, A4 | Basic flow; page count and dimensions match |
| Document with one table per page | Tables render with borders and cell shading |
| Mixed portrait and landscape sections | Each PDF page matches its source orientation |
| Document with embedded PNG and JPEG images | Images appear in correct positions |
| Vietnamese-language document with diacritics | All characters render; no missing glyphs |
| Document with header and footer including page numbers | Headers/footers appear on every page |
| Document with bulleted and numbered lists | List markers render with correct indentation |
| 30-page document | Memory does not exceed 500 MB during capture |
| Corrupted `.docx` (truncated zip) | Component shows error and remains usable |

Beyond visually comparing the rendered preview with the source `.docx` opened in Word, open the generated PDF in a separate PDF reader (Acrobat, Preview, or Firefox's built-in viewer) to confirm that page count, page dimensions, and rendered content match. Programmatic visual regression testing of the PDF output is beyond the scope of this spec. If needed, it can be implemented by rasterising each PDF page (for example with `pdfjs-dist`) and comparing the images with `pixelmatch`; `pdf-parse` can check the page count, but it extracts only text and metadata and does not render pages.

## 12. Known Limitations and Alternatives

The text in the output PDF is rasterised and therefore not selectable, searchable, copyable, or screen-readable. Users who need any of these properties, particularly accessibility for visually impaired users, must use a server-side converter that emits real PDF text objects.

Recommended alternatives, in decreasing order of fidelity and increasing order of cost:

1. **LibreOffice headless** (`soffice --convert-to pdf`): free, self-hosted, very high fidelity; requires a server with LibreOffice installed (typically Linux). ~1–3 seconds per document.
2. **Aspose.Words Cloud or self-hosted**: paid, very high fidelity, native PDF text output; requires a license.
3. **CloudConvert, ConvertAPI, or similar SaaS**: paid per document, simple HTTP API; sends document contents to a third party.

The HTML preview produced by `docx-preview` *is* accessible: screen readers can navigate it, text is selectable, and users can zoom. The component's accessibility story is therefore intact for users who don't need the PDF artifact itself.

This component cannot edit, sign, redact, or annotate documents. For those features, evaluate `pdf-lib` (PDF mutation) or `docx` (DOCX generation; a different package from `docx-preview`).

## 13. Appendix: Algorithm Sketch

For reference, the core conversion algorithm as a condensed TypeScript sketch, assuming `docx-preview`, `html2canvas`, and `jspdf` (v2) are installed:

```typescript
import { renderAsync } from "docx-preview";
import html2canvas from "html2canvas";
import { jsPDF } from "jspdf";

// CSS defines 96 px per inch; there are 25.4 mm per inch.
const MM_PER_CSS_PX = 25.4 / 96;

export async function convert(file: File, container: HTMLElement): Promise<Blob> {
  container.innerHTML = "";
  // The third argument is the style container; undefined renders styles into `container`.
  await renderAsync(file, container, undefined, {
    inWrapper: true, breakPages: true, useBase64URL: true, experimental: true,
    renderHeaders: true, renderFooters: true, renderFootnotes: true,
  });
  // Wait one animation frame plus 50 ms so layout settles before capture.
  await new Promise<void>((resolve) => requestAnimationFrame(() => resolve()));
  await new Promise<void>((resolve) => setTimeout(resolve, 50));

  // querySelectorAll returns an empty NodeList (never null), so `a || b` would
  // never fall back; test the length instead.
  let pages = Array.from(container.querySelectorAll<HTMLElement>("section.docx"));
  if (pages.length === 0) pages = Array.from(container.querySelectorAll<HTMLElement>("section"));
  if (pages.length === 0) throw new Error("docx-preview rendered no page sections");

  let pdf: jsPDF | null = null;
  for (const page of pages) {
    const w = page.offsetWidth * MM_PER_CSS_PX;
    const h = page.offsetHeight * MM_PER_CSS_PX;
    // jsPDF swaps the page format to match `orientation`, so set it for every page.
    const orientation = w > h ? "landscape" : "portrait";
    const canvas = await html2canvas(page, { scale: 2, useCORS: true, backgroundColor: "#ffffff" });
    if (pdf === null) pdf = new jsPDF({ orientation, unit: "mm", format: [w, h] });
    else pdf.addPage([w, h], orientation);
    pdf.addImage(canvas.toDataURL("image/jpeg", 0.95), "JPEG", 0, 0, w, h);
  }
  return pdf!.output("blob");
}
```

This sketch omits progress reporting, lifecycle management, and most error handling, all of which are required in the production implementation per Sections 6.6 and 8.

---

*End of specification.*