Acknowledgement
Some people criticize the PDF.js project and say the default viewer should be easier to integrate. As a developer who just wants to show a PDF quickly, I partly understand them, but knowing the primary goal of an OSS project helps us understand the trade-offs made by its maintainers.
Everything here is what I learned from the PDF.js project. It may not be fully accurate, but I hope it helps you understand PDF.js better.
Background
PDF.js is an OSS project supported by Mozilla and built with HTML5. Its goal is to create a general-purpose, web standards-based platform for rendering PDFs in the Firefox browser. Many people find it hard to integrate into their own projects, and that is to some extent intentional.
It is not developed as a component or library that you can drop in like most npm packages, because its primary goal is to work well in the Firefox browser, and that goal doesn't match most developers' expectations. This is a trade-off made by the maintainers, and we should understand it.
Issues that discuss why it is not easy to integrate
Introduction
PDF.js uses a web worker for better rendering performance.
A typical web application produces a single bundle, but PDF.js produces at least four: the main, worker, sandbox, and web bundles.
// From https://github.com/mozilla/pdf.js/blob/master/gulpfile.mjs#L1001
function buildGeneric(defines, dir) {
  rimraf.sync(dir);
  return merge([
    createMainBundle(defines).pipe(gulp.dest(dir + "build")),
    createWorkerBundle(defines).pipe(gulp.dest(dir + "build")),
    createSandboxBundle(defines).pipe(gulp.dest(dir + "build")),
    createWebBundle(defines, {
      defaultPreferencesDir: defines.SKIP_BABEL
        ? "generic/"
        : "generic-legacy/",
    }).pipe(gulp.dest(dir + "web")),
    // ...
  ]);
}
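To make the output layout concrete, here is a minimal sketch that only restates the excerpt above: the main, worker, and sandbox bundles go to the build directory and the web bundle goes to the web directory. The names BUNDLE_DIRS and bundleOutputDir, and the "dist/" directory, are made up for this illustration and are not part of the PDF.js build.

```javascript
// Hypothetical helper, not part of PDF.js: it only restates where
// buildGeneric (see the excerpt above) pipes each bundle.
const BUNDLE_DIRS = new Map([
  ["main", "build"],
  ["worker", "build"],
  ["sandbox", "build"],
  ["web", "web"],
]);

// `dir` is expected to end with a slash, as in `dir + "build"` above.
function bundleOutputDir(dir, bundle) {
  if (!BUNDLE_DIRS.has(bundle)) {
    throw new Error(`Unknown bundle: ${bundle}`);
  }
  return dir + BUNDLE_DIRS.get(bundle);
}

console.log(bundleOutputDir("dist/", "worker")); // dist/build
console.log(bundleOutputDir("dist/", "web")); // dist/web
```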
The default viewer uses the web bundle directly, and the web bundle depends on the main and worker bundles. Keep in mind that you must load the main bundle before using the default viewer. Each time a PDF document is opened with the open method, a new worker is created to render it.
The main bundle is built from src/pdf.js, which is its entry point. The worker bundle is built from src/pdf.worker.js, which is its entry point.
The web directory contains the source code of the default viewer. Every module there that depends on the main bundle has to import from the pdfjs-lib package, which is resolved to web/pdfjs.js through webpack's resolve.alias option at build time.
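A minimal sketch of such an alias, assuming a plain webpack configuration object. The function name createAliasConfig, the rootDir parameter, and the path used below are hypothetical; the real PDF.js build assembles its configuration in gulpfile.mjs and differs in detail.

```javascript
// Hypothetical helper: returns a webpack config fragment whose
// resolve.alias maps the "pdfjs-lib" specifier to web/pdfjs.js.
function createAliasConfig(rootDir) {
  return {
    resolve: {
      alias: {
        "pdfjs-lib": `${rootDir}/web/pdfjs.js`,
      },
    },
  };
}

// "/path/to/pdf.js" is a placeholder for a local checkout.
const aliasConfig = createAliasConfig("/path/to/pdf.js");
console.log(aliasConfig.resolve.alias["pdfjs-lib"]);
// /path/to/pdf.js/web/pdfjs.js
```

With an alias like this in place, an import from "pdfjs-lib" inside the viewer code is rewritten by the bundler to point at the shim file instead of a package in node_modules.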
// web/pdfjs.js
// https://github.com/mozilla/pdf.js/blob/master/web/pdfjs.js
if (
  (typeof PDFJSDev === "undefined" || PDFJSDev.test("GENERIC")) &&
  !globalThis.pdfjsLib
) {
  await globalThis.pdfjsLibPromise;
}
const {
  AbortException,
  // ...
} = globalThis.pdfjsLib;
export {
  AbortException,
  // ...
};
As the code above shows, the web bundle needs the main bundle to be loaded first: the exports read from globalThis.pdfjsLib only become available once globalThis.pdfjsLibPromise has resolved.
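The waiting pattern can be tried outside the viewer with a small simulation. This is only a sketch: createFakeMainBundle is a made-up stand-in for the real main bundle, getPdfjsLib is a hypothetical helper, and the fake library object contains a single placeholder export.

```javascript
// Hypothetical stand-in for the main bundle: it publishes
// globalThis.pdfjsLibPromise at once and globalThis.pdfjsLib later.
function createFakeMainBundle() {
  let publish;
  globalThis.pdfjsLibPromise = new Promise((resolve) => {
    publish = resolve;
  });
  // Call the returned function to simulate the bundle finishing its load.
  return function finishLoading() {
    globalThis.pdfjsLib = {
      AbortException: class AbortException extends Error {},
    };
    publish(globalThis.pdfjsLib);
  };
}

// Same idea as web/pdfjs.js: wait only if the library is not there yet.
async function getPdfjsLib() {
  if (!globalThis.pdfjsLib) {
    await globalThis.pdfjsLibPromise;
  }
  return globalThis.pdfjsLib;
}

const finishLoading = createFakeMainBundle();
getPdfjsLib().then((lib) => {
  console.log(typeof lib.AbortException); // function
});
finishLoading();
```

The consumer is started before the fake bundle finishes, which is the situation the check in web/pdfjs.js guards against.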
// An example module, web/alt_text_manager.js, that depends on the main bundle.
//
// From https://github.com/mozilla/pdf.js/blob/master/web/alt_text_manager.js
import { DOMSVGFactory, shadow } from "pdfjs-lib";
class AltTextManager {
  // ...
}
PDFViewerApplication
The global PDFViewerApplication object is the entry point of the default viewer of PDF.js. It glues all the modules together and provides the API of the default viewer.
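As a sketch of how application code might talk to this object, the helper below waits for the viewer to initialize and then opens a document. It assumes a recent PDF.js release in which PDFViewerApplication exposes initializedPromise and open accepts an object with a url property; older releases used a different open signature. The helper name openDocument and the file name are hypothetical, and the app is passed in as a parameter so the snippet can also be exercised with a stub.

```javascript
// Hypothetical helper. `app` is expected to look like the global
// PDFViewerApplication of a recent PDF.js default viewer.
async function openDocument(app, url) {
  // Assumption: the viewer exposes a promise that settles once it is set up.
  await app.initializedPromise;
  // Assumption: open() takes an options object with a `url` property.
  return app.open({ url });
}

// In a page that hosts the default viewer this might be used as:
//   openDocument(window.PDFViewerApplication, "/files/example.pdf");
```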
AppOptions
There are dozens of options in PDF.js, and they currently fall into four kinds. At first you may wonder why there are so many options and what they mean. This document will help you understand them.
Let's go through them one by one!
Option Kinds
- VIEWER
- API
- WORKER
- PREFERENCE
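One way to picture option kinds is as bit flags, so that a single option can carry more than one kind and a whole group can be selected with a mask. The sketch below is an illustration only: the numeric values, the sample option table, and the helper optionsOfKind are assumptions made for this example rather than a copy of the PDF.js source.

```javascript
// Illustrative values only; treat them as assumptions, not PDF.js constants.
const OptionKind = Object.freeze({
  VIEWER: 0x02,
  API: 0x04,
  WORKER: 0x08,
  PREFERENCE: 0x80,
});

// Sample table for illustration; the kinds assigned here are assumptions.
const sampleOptions = {
  defaultUrl: { value: "", kind: OptionKind.VIEWER },
  textLayerMode: { value: 1, kind: OptionKind.VIEWER | OptionKind.PREFERENCE },
  workerSrc: { value: "", kind: OptionKind.WORKER },
};

// Returns the names of all options whose kind includes the given flag.
function optionsOfKind(options, kind) {
  return Object.keys(options).filter(
    (name) => (options[name].kind & kind) !== 0
  );
}

console.log(optionsOfKind(sampleOptions, OptionKind.VIEWER));
// [ 'defaultUrl', 'textLayerMode' ]
```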
Important options
defaultUrl
locale
workerSrc
defaultUrl
- Type: URL | string | Uint8Array
The URL of the PDF file. If you run into a CORS issue when loading a PDF file from a different origin, see the origin match error section on the common pitfalls page.
locale
- Type: string
- Default: en-US
The locale of the viewer. Setting this option makes it easy to switch to a different locale. See all supported locales in the l10n folder.
workerSrc
The URL of the PDF.js web worker bundle.
PDF.js uses a web worker to speed up rendering, so some code has to create and initialize the worker from workerSrc. The option is configured with a default value, but the right value always depends on how you deploy PDF.js. Make sure the worker bundle can actually be downloaded from the workerSrc URL.
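A minimal sketch of overriding these three options. It assumes the options object offers a set(name, value) method, as PDFViewerApplicationOptions does in the default viewer as far as I know; the helper applyViewerOptions and all the values shown are hypothetical placeholders.

```javascript
// Hypothetical helper: copies each override into an AppOptions-like object.
// Assumption: `appOptions.set(name, value)` exists, as on
// PDFViewerApplicationOptions in the default viewer.
function applyViewerOptions(appOptions, overrides) {
  for (const [name, value] of Object.entries(overrides)) {
    appOptions.set(name, value);
  }
}

// Placeholder values; adjust the paths to match your own deployment.
const overrides = {
  defaultUrl: "/files/example.pdf",
  locale: "en-US",
  workerSrc: "/pdfjs/build/pdf.worker.mjs",
};

// In a page hosting the default viewer, the options object is typically
// available once the "webviewerloaded" event has fired:
//   document.addEventListener("webviewerloaded", () => {
//     applyViewerOptions(window.PDFViewerApplicationOptions, overrides);
//   });
```

Passing the options object in as a parameter keeps the helper independent of the global, which also makes it easy to check against a stub.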
Other options
textLayerMode
- Type: integer
- Default: 1
- Values: 0, 1, 2
- Source: web/ui_utils/TextLayerMode
Controls the mode of the text layer.
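Since only three values are accepted, it can be handy to validate the setting before handing it to the viewer. The helper below is a hypothetical sketch that uses just the values and the default listed above; it is not part of PDF.js.

```javascript
// Accepted values and default, as listed above.
const TEXT_LAYER_MODES = [0, 1, 2];
const DEFAULT_TEXT_LAYER_MODE = 1;

// Hypothetical helper: returns `value` if it is an accepted mode,
// otherwise falls back to the default.
function normalizeTextLayerMode(value) {
  return TEXT_LAYER_MODES.includes(value) ? value : DEFAULT_TEXT_LAYER_MODE;
}

console.log(normalizeTextLayerMode(2)); // 2
console.log(normalizeTextLayerMode("2")); // 1 (strings are not accepted)
console.log(normalizeTextLayerMode(7)); // 1
```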
Important Events
documentinit
Emitted after the setInitialView method has been called and the initial view is shown successfully. Once this event is emitted, the viewer is ready to use.
documenterror
Emitted when an error occurs while rendering the document.
pagerendered
Emitted after each page is rendered successfully.
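A minimal sketch of listening for these events. It assumes an event bus with on(name, listener) and off(name, listener) methods, which is how PDFViewerApplication.eventBus behaves in the default viewer as far as I know. The helper watchViewerEvents is hypothetical, and event payloads are passed through to the callbacks untouched.

```javascript
// Hypothetical helper: subscribes the given callbacks to the three events
// described above and returns a function that removes them again.
// Assumption: `eventBus` offers on(name, listener) and off(name, listener).
function watchViewerEvents(eventBus, handlers) {
  const names = ["documentinit", "documenterror", "pagerendered"];
  const registered = [];
  for (const name of names) {
    const listener = handlers[name];
    if (typeof listener === "function") {
      eventBus.on(name, listener);
      registered.push([name, listener]);
    }
  }
  return function stopWatching() {
    for (const [name, listener] of registered) {
      eventBus.off(name, listener);
    }
  };
}

// In a page hosting the default viewer this might be used as:
//   const stop = watchViewerEvents(window.PDFViewerApplication.eventBus, {
//     documentinit: () => console.log("viewer is ready"),
//     documenterror: (evt) => console.error("render failed", evt),
//     pagerendered: (evt) => console.log("page rendered", evt),
//   });
//   // later: stop();
```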