
How pdf-lib and PDF.js split responsibility for in-browser PDF work

NoSend Tools' PDF suite is built on two libraries with very different jobs: pdf-lib for creation and editing, PDF.js for rendering and parsing. Here is why the split exists, what each library handles, why PDF structure makes this harder than it looks, and why keeping the whole pipeline inside the browser matters for documents that frequently contain personal or legal data.

pdf-lib handles the write side

pdf-lib (https://pdf-lib.js.org) is a pure JavaScript / TypeScript library for creating and modifying PDF files. Because it ships no native extensions and requires no WASM, it runs identically in Node.js and in the browser with a plain import. NoSend Tools uses pdf-lib as the main engine for pdf-merge (combining multiple PDFs), pdf-split (splitting by page range), pdf-extract-pages (pulling out specific pages), pdf-rotate (rotating pages), and pdf-meta-strip (removing metadata).

The core pattern is four steps: load each source with `PDFDocument.load(arrayBuffer)`, copy pages into the destination document with `destDoc.copyPages(srcDoc, [0, 1, 2])` (page indices are zero-based), append each copied page with `destDoc.addPage(page)`, and serialize the result with `destDoc.save()`, which resolves to a Uint8Array. `load`, `copyPages`, and `save` all return Promises, so each call is awaited. Because each page is a first-class object you can reorder and copy individually, operations like "extract only odd pages" or "interleave two PDFs" map naturally onto the API. pdf-lib maintains XRef table consistency and object-reference integrity internally, which makes it difficult to accidentally write a structurally broken output file.
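As a rough illustration of how "extract only odd pages" and "interleave two PDFs" reduce to index arithmetic, here is a minimal sketch. The helper names `oddPageIndices` and `interleaveOrder` are hypothetical, not part of pdf-lib, and NoSend Tools may compute its page orders differently.

```typescript
// Hypothetical helpers: these names are illustrative and not part of pdf-lib.

// Zero-based indices of the odd-numbered pages (1st, 3rd, 5th, ...).
function oddPageIndices(pageCount: number): number[] {
  const indices: number[] = [];
  for (let i = 0; i < pageCount; i += 2) {
    indices.push(i);
  }
  return indices;
}

// One entry per output page: which source document, and which zero-based page.
type PageRef = { source: "a" | "b"; index: number };

// Alternate pages from A and B. Once the shorter document runs out,
// the remaining pages of the longer one follow in their original order.
function interleaveOrder(countA: number, countB: number): PageRef[] {
  const order: PageRef[] = [];
  const longest = Math.max(countA, countB);
  for (let i = 0; i < longest; i++) {
    if (i < countA) order.push({ source: "a", index: i });
    if (i < countB) order.push({ source: "b", index: i });
  }
  return order;
}
```

With pdf-lib, the odd-page list would feed the copy step directly: `const pages = await destDoc.copyPages(srcDoc, oddPageIndices(srcDoc.getPageCount()))`, followed by `destDoc.addPage(page)` for each returned page.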

PDF.js handles the read and render side

PDF.js (https://mozilla.github.io/pdf.js/) is Mozilla's browser-native PDF rendering engine — the same code that has powered Firefox's built-in viewer for years. It excels at text extraction, annotation inspection, and rasterizing pages to a Canvas. NoSend Tools uses PDF.js in pdf-to-jpg and pdf-to-png (converting PDF pages to raster images) and in pdf-extract-text.
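For the text-extraction side, PDF.js returns a page's text as a list of items rather than one string. The sketch below shows one simple way to flatten such a list. `TextItemLike` and `joinTextItems` are hypothetical names, the `str` and `hasEOL` fields are assumed to match the items produced by recent PDF.js versions, and this is not necessarily how pdf-extract-text does it.

```typescript
// Minimal shape of a text item. Real PDF.js items carry more fields
// (position, font, width); only the two used here are modeled.
type TextItemLike = { str: string; hasEOL?: boolean };

// Concatenate items in reading order, adding a newline where an item
// is flagged as ending a line.
function joinTextItems(items: TextItemLike[]): string {
  let text = "";
  for (const item of items) {
    text += item.str;
    if (item.hasEOL) {
      text += "\n";
    }
  }
  return text;
}
```

In use, the items would come from `const { items } = await page.getTextContent()`. Multi-column layouts or rotated text generally need position-aware handling that this sketch does not attempt.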

PDF.js runs its parsing and decoding work in a Web Worker, which keeps the main UI thread responsive while heavy pages are processed; the final drawing onto the Canvas happens on the main thread. The render call is `page.render({ canvasContext, viewport })`, where the viewport comes from `page.getViewport({ scale })` and the scale sets the output resolution. Where pdf-lib rearranges page objects, PDF.js turns the visual appearance of a page into pixels: complementary strengths rather than overlapping ones. Using both libraries instead of picking one covers the full surface of PDF work without either library needing to do something it was not designed for.
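To make the link between viewport scale and output resolution concrete: PDF page sizes are expressed in points of 1/72 inch, so a scale of 1 corresponds to 72 pixels per inch. The following sketch converts a target DPI into a scale and a canvas size. `scaleForDpi` and `pixelSize` are hypothetical helper names, and pages that set a non-default user unit are ignored here.

```typescript
// PDF user space defaults to 72 units (points) per inch.
const POINTS_PER_INCH = 72;

// Viewport scale that yields the requested dots per inch.
function scaleForDpi(dpi: number): number {
  if (!(dpi > 0)) {
    throw new RangeError("dpi must be a positive number");
  }
  return dpi / POINTS_PER_INCH;
}

// Canvas size in whole pixels for a page measured in points.
function pixelSize(
  widthPt: number,
  heightPt: number,
  dpi: number
): { width: number; height: number } {
  return {
    width: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}
```

A US Letter page (612 x 792 points) at 150 DPI comes out at 1275 x 1650 pixels. The scale would then be passed as `page.getViewport({ scale: scaleForDpi(150) })` before calling `page.render`.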

Why PDFs are structurally tricky: incremental updates and object streams

One reason PDF handling is harder than it looks is the incremental update mechanism. Rather than rewriting the file in place, a PDF writer can append changes to the end of the existing byte sequence. Each update adds a new cross-reference (XRef) section and trailer that point to the revised objects; reading the final state means starting from the `startxref` offset at the end of the file and following each trailer's `/Prev` entry back through the earlier XRef sections. For signed PDFs or forms with filled values, failing to process the incremental update chain correctly can drop content or silently invalidate digital signatures.
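The first step of that backwards walk can be sketched in a few lines: scan from the end of the file for the last `startxref` keyword and read the byte offset that follows it. This is a simplified illustration with a hypothetical function name, `findLastStartXref`; it is not how pdf-lib or PDF.js implement their parsers, and a full reader would go on to parse the XRef section at that offset and follow the chain of earlier sections.

```typescript
// PDF whitespace bytes: NUL, tab, line feed, form feed, carriage return, space.
function isPdfWhitespace(byte: number): boolean {
  return (
    byte === 0x00 || byte === 0x09 || byte === 0x0a ||
    byte === 0x0c || byte === 0x0d || byte === 0x20
  );
}

// Return the byte offset recorded after the LAST "startxref" keyword,
// or null if the keyword or its number is missing.
function findLastStartXref(bytes: Uint8Array): number | null {
  const keyword = "startxref";
  for (let start = bytes.length - keyword.length; start >= 0; start--) {
    let matches = true;
    for (let k = 0; k < keyword.length; k++) {
      if (bytes[start + k] !== keyword.charCodeAt(k)) {
        matches = false;
        break;
      }
    }
    if (!matches) continue;

    // Skip whitespace after the keyword, then read ASCII digits.
    let pos = start + keyword.length;
    while (pos < bytes.length && isPdfWhitespace(bytes[pos])) pos++;
    let offset = 0;
    let digits = 0;
    while (pos < bytes.length && bytes[pos] >= 0x30 && bytes[pos] <= 0x39) {
      offset = offset * 10 + (bytes[pos] - 0x30);
      pos++;
      digits++;
    }
    return digits > 0 ? offset : null;
  }
  return null;
}
```

In an incrementally updated file there are several `startxref` entries, one per revision; searching from the end picks the newest one, which is why the scan runs backwards.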

PDF 1.5 introduced object streams, which bundle multiple indirect objects into a single compressed stream to reduce file size. This adds another parsing layer: a reader must decompress the stream and then locate individual objects inside it before it can resolve ordinary object references. pdf-lib handles both of these internally, but badly malformed files — incorrect stream lengths, inconsistent cross-references — can cause it to throw. Each NoSend Tools PDF tool catches those exceptions and surfaces a clear error message rather than producing a silently broken output.
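The catch-and-report pattern described above might look roughly like the following. `ToolResult`, `describeFailure`, and `runPdfTool` are hypothetical names chosen for this sketch, and the wording of the message is an assumption rather than the text NoSend Tools actually shows.

```typescript
// Either the output bytes, or a message suitable for showing to the user.
type ToolResult =
  | { ok: true; bytes: Uint8Array }
  | { ok: false; message: string };

// Turn whatever was thrown into a readable, file-specific message.
function describeFailure(fileName: string, err: unknown): string {
  const detail = err instanceof Error ? err.message : String(err);
  return `Could not process "${fileName}": the PDF appears to be damaged or unsupported (${detail}).`;
}

// Run one PDF operation and never let a parser exception escape:
// the caller gets either valid bytes or an explicit failure.
async function runPdfTool(
  fileName: string,
  operation: () => Promise<Uint8Array>
): Promise<ToolResult> {
  try {
    const bytes = await operation();
    return { ok: true, bytes };
  } catch (err) {
    return { ok: false, message: describeFailure(fileName, err) };
  }
}
```

Returning a tagged result instead of rethrowing means the UI has to handle the failure branch explicitly, so a partially written output can never be offered for download by accident.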

Why decryption needs WASM-compiled qpdf, and why doing it in-browser matters

pdf-lib deliberately does not support decrypting password-protected PDFs. Correctly implementing the full range of PDF encryption schemes — RC4 40-bit, RC4 128-bit, AES-128, AES-256 — is non-trivial, and the library's authors chose to leave it out of scope. For pdf-unlock, NoSend Tools uses a WASM build of qpdf, the C++ command-line PDF transformation tool compiled to WebAssembly via Emscripten. qpdf has over twenty years of history and handles every PDF encryption variant in the specification. Running inside the browser's sandbox, the decryption happens entirely in local memory — neither the password nor the encrypted file bytes travel over the network.
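Because the WASM build is the qpdf command-line tool, driving it mostly comes down to assembling the same arguments you would pass on a terminal. The sketch below builds the argument list for a decrypt run using qpdf's documented `--password` and `--decrypt` options. `buildDecryptArgs` is a hypothetical helper, and how the arguments reach the WASM module depends on how that particular build was compiled.

```typescript
// Build the argument list for: qpdf --password=<pw> --decrypt <in> <out>
// Both paths are assumed to refer to files in the module's in-memory
// file system, not to anything on disk or on a server.
function buildDecryptArgs(
  password: string,
  inputPath: string,
  outputPath: string
): string[] {
  return [`--password=${password}`, "--decrypt", inputPath, outputPath];
}
```

With a typical Emscripten build, and assuming it exposes the virtual file system and `callMain`, the encrypted bytes would be written to the input path, the module invoked with these arguments, and the decrypted bytes read back from the output path, all inside the tab's own memory.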

The sensitivity argument for keeping PDF work in the browser is straightforward: the files people want to unlock, merge, or extract pages from tend to be exactly the documents they would least want to hand to a third-party server. Contracts, medical records, tax filings, and deeds are all documents where even a temporary copy on an operator's infrastructure can be an unacceptable risk. With pdf-lib, PDF.js, and WASM qpdf all called from client-side code, the Network tab in DevTools records zero outgoing requests carrying your file content. The claim "your data stays in your browser" is verifiable at runtime by any user, not just by the operator.