Puppeteer is the default way to make a PDF from HTML in Node. It drives a real Chromium browser, so whatever renders in Chrome renders in your PDF. CSS Grid, Flexbox, web fonts, JavaScript. All of it works.

The basic call is three lines. You launch a browser, load some HTML, call page.pdf(). Done. You can ship that in an afternoon.

Production is where Puppeteer PDF generation gets interesting. The blank pages. The missing backgrounds. The memory that creeps up until your container gets OOM-killed at 2am. The Lambda deploy that fails because Chromium does not fit. This guide is the full path: from the three-line call to a setup that survives real traffic. Every section has working code you can paste.

1. Setup and the basic call

Install Puppeteer. The full package downloads its own Chromium build on npm install, so the first install is slow and heavy.

install.sh

npm install puppeteer

Now the smallest thing that works. Launch a browser, open a page, set the HTML, print to PDF, close the browser.

basic.js

import puppeteer from 'puppeteer'

async function htmlToPdf(html) {
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.setContent(html, { waitUntil: 'networkidle0' })
    const pdf = await page.pdf({ format: 'A4', printBackground: true })
    return pdf // a Buffer (Uint8Array)
  } finally {
    await browser.close()
  }
}

page.pdf() resolves to the raw PDF bytes: a Uint8Array in Puppeteer 23 and later, a Buffer in earlier versions. Write it to disk, stream it as an HTTP response, or push it to object storage. Whatever you do, always close the browser. Wrap the work in try/finally so a thrown error does not leave a Chromium process running forever. Leaked browsers are the single most common cause of "why is my server out of memory."
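
For the HTTP case, here is a minimal sketch. It assumes a plain Node http response object, and sendPdf and its filename parameter are illustrative names, not part of Puppeteer. Viewing the bytes as a Buffer first means the same code works whether page.pdf() handed back a Buffer or a Uint8Array.

```javascript
// Hypothetical helper: send PDF bytes on a Node http.ServerResponse.
// `pdf` may be a Buffer or a Uint8Array depending on the Puppeteer version.
function sendPdf(res, pdf, filename = 'document.pdf') {
  // View the same bytes as a Buffer without copying them.
  const body = Buffer.from(pdf.buffer, pdf.byteOffset, pdf.byteLength)
  res.writeHead(200, {
    'Content-Type': 'application/pdf',
    'Content-Length': body.length,
    'Content-Disposition': `inline; filename="${filename}"`
  })
  res.end(body)
}
```

Swap inline for attachment in the Content-Disposition header if you want the browser to download the file instead of displaying it.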

setContent vs goto

Two ways to get content onto the page. Use setContent(html) when you have an HTML string, like a server-rendered invoice. Use page.goto(url) when you want to render a live URL. They take the same waitUntil options. More on waiting in section 3, because that one option decides whether your PDF is complete or half-drawn.
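
A small sketch of that choice, assuming the caller passes either an html string or a url. The loadSource name and the shape of its source argument are made up for illustration.

```javascript
// Hypothetical helper: pick setContent or goto based on what the caller has.
async function loadSource(page, source, waitUntil = 'networkidle0') {
  if (typeof source.html === 'string') {
    // An HTML string, e.g. a server-rendered invoice.
    await page.setContent(source.html, { waitUntil })
  } else if (typeof source.url === 'string') {
    // A live URL that Chromium fetches itself.
    await page.goto(source.url, { waitUntil })
  } else {
    throw new TypeError('source needs an html string or a url')
  }
}
```

One practical difference: a page filled with setContent sits at about:blank, so relative asset URLs in the HTML string have nothing to resolve against. Use absolute URLs or a base tag in that case.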

2. Every page.pdf() option that matters

This is where you control the output. Most blank-looking or wrong-sized PDFs come from getting these wrong. Here is a fully specced call with the options you will actually reach for.

options.js

const pdf = await page.pdf({
  format: 'A4', // or Letter, Legal, A3... takes priority over width/height
  // width: '210mm', height: '297mm', // custom size; only used when format is omitted
  printBackground: true, // the #1 reason backgrounds vanish
  landscape: false,
  scale: 1, // 0.1 to 2; shrink to fit more on a page
  margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' },
  pageRanges: '', // e.g. '1-3, 5' to print a subset
  preferCSSPageSize: false, // true: let CSS @page size win over format
  displayHeaderFooter: false
})

The ones that bite people:

  • printBackground. It defaults to false. Your colored headers, table stripes, and dark sections all disappear because Chrome strips backgrounds for print by default, exactly like the browser print dialog does. Set it to true and they come back. This is the number one Puppeteer PDF surprise.
  • format vs width/height. Pick one. If you set format, it takes priority and width and height are ignored, so leave format out when you want a custom size. Mixing them and wondering why the page is the wrong size is a classic.
  • margin. Margins here are the printable border. If you also use headers and footers, the margins must leave room for them or the header overlaps your content. See section 4.
  • scale. Useful when a wide table runs off the edge. Drop to 0.8 and it fits. Below about 0.5 text gets unreadable.
  • preferCSSPageSize. Off by default, so format drives the size. Turn it on when you want your CSS @page { size: ... } rule to control the page instead. Do not set both a format and a conflicting @page size and expect a sane result.
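
One way to avoid the sizing conflicts above is to build the options through a helper that accepts exactly one sizing strategy. This is a sketch only: buildPdfOptions and its cssPageSize flag are hypothetical names, and the A4 fallback is an arbitrary choice.

```javascript
// Hypothetical helper: build page.pdf() options with a single sizing strategy.
function buildPdfOptions({ format, width, height, cssPageSize = false, ...rest } = {}) {
  const explicit = width !== undefined || height !== undefined
  const chosen = [format !== undefined, explicit, cssPageSize].filter(Boolean).length
  if (chosen > 1) {
    throw new Error('pick one sizing strategy: format, width/height, or cssPageSize')
  }
  let size
  if (cssPageSize) size = { preferCSSPageSize: true } // CSS @page size decides
  else if (explicit) size = { width, height }         // custom paper size
  else size = { format: format ?? 'A4' }              // named paper format
  // printBackground defaults to true here; callers can override it via rest.
  return { printBackground: true, ...rest, ...size }
}
```

Usage would look like page.pdf(buildPdfOptions({ width: '80mm', height: '200mm' })) for a receipt-sized page. Passing a format together with a width throws instead of silently producing the wrong size.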

3. setContent vs goto, and waiting for content

Your first PDF will probably be blank or half-loaded. The cause is almost always timing. You called page.pdf() before the page finished rendering. Fonts had not loaded. Images were still fetching. A chart had not drawn yet.

Fix it by waiting for the right signal, not by sprinkling arbitrary setTimeout calls.

waitUntil

Both setContent and goto accept waitUntil. For PDFs you almost always want 'networkidle0', which resolves once there have been no open network connections for at least 500ms. That covers fonts, images, and async data. 'load' can fire before late-arriving resources, such as web fonts or script-fetched data, are in place.

waiting.js

await page.setContent(html, { waitUntil: 'networkidle0' })

// wait for a specific element to exist
await page.waitForSelector('#chart svg', { timeout: 10000 })

// wait for all web fonts to finish loading
await page.evaluate(() => document.fonts.ready)

// wait for every image to be decoded, not just requested
await page.evaluate(async () => {
  const imgs = Array.from(document.images)
  await Promise.all(imgs.map(img => img.complete ? 1 : img.decode().catch(() => 1)))
})

const pdf = await page.pdf({ format: 'A4', printBackground: true })

The order matters. Set content first. Then wait for the specific things you know must be present: a selector, fonts, images. Only then print. If you render charts with a client-side library, wait for the selector that the library injects, not just network idle, because the draw can happen after the last request resolves.

One trap to call out: networkidle0 can hang forever if your page keeps a connection open. Long-polling, a websocket, an analytics beacon that retries. The wait never resolves because the network never goes quiet. If that is your page, switch to networkidle2, which allows up to two open connections, or drop waitUntil entirely and wait for an explicit signal instead: a selector, or a flag your code sets on window when rendering is truly done. Explicit beats implicit here. Never reach for a fixed setTimeout(3000) as the fix. It is slow when the page is ready early and flaky when the page is slow, which is the worst of both.
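
The explicit-flag approach can be sketched with page.waitForFunction. The flag name __PDF_READY__ and the printWhenReady helper are assumptions for illustration. Your page code has to set the flag itself, for example after the last chart draws.

```javascript
// Assumed flag name; the page must run `window.__PDF_READY__ = true`
// once rendering is truly done.
const READY_FLAG = '__PDF_READY__'

// Hypothetical helper: wait for the flag, then print.
async function printWhenReady(page, pdfOptions = {}, timeoutMs = 15000) {
  // Polls inside the page until the predicate is truthy.
  // Rejects with a timeout error if the flag never appears.
  await page.waitForFunction(
    flag => window[flag] === true,
    { timeout: timeoutMs },
    READY_FLAG
  )
  return page.pdf({ format: 'A4', printBackground: true, ...pdfOptions })
}
```

Because the wait has its own timeout, a page that never signals fails fast instead of hanging, and a page that is ready early prints immediately.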

If you are weighing this against other rendering engines, the broader HTML to PDF in Node.js guide compares Puppeteer to wkhtmltopdf, jsPDF, pdf-lib, and a managed API side by side.

4. Headers and footers

You want a page number on every page, or a logo at the top, or a date in the corner. Puppeteer does this with template HTML, but the API has sharp edges.

Set displayHeaderFooter: true, then pass headerTemplate and footerTemplate as HTML strings. Inside those strings, special classes get replaced with live values: pageNumber, totalPages, date, title, and url.

header-footer.js

const pdf = await page.pdf({
  format: 'A4',
  printBackground: true,
  displayHeaderFooter: true,
  // leave room: top/bottom margin must exceed the template height
  margin: { top: '25mm', bottom: '20mm', left: '15mm', right: '15mm' },
  headerTemplate: `
    <div style="font-size:9px; width:100%; padding:0 15mm; color:#666;">
      <span class="title"></span>
    </div>`,
  footerTemplate: `
    <div style="font-size:9px; width:100%; padding:0 15mm; color:#666;
      display:flex; justify-content:space-between;">
      <span class="date"></span>
      <span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>
    </div>`
})

The gotchas, all of which have cost someone an hour:

  • Default font size is tiny. If you do not set font-size inline, the template renders at about 6px and looks broken. Always set it explicitly, like font-size:9px.
  • Margins must leave room. The header and footer draw inside the top and bottom margins. If your margin is smaller than the template, the header overlaps your content or gets clipped. Bump the margin until it fits.
  • Styles must be inline. The templates render in an isolated context. They do not see your page CSS or web fonts. Put every style inline in the template HTML.
  • Not empty by default. If you turn on displayHeaderFooter but pass no templates, Chrome falls back to its own default header and footer (url and date), which is probably not what you want. Pass your own, even an empty <div></div> if you want one side blank.

5. Page breaks and print CSS

Multi-page documents need control over where pages split. You do not want a table row sliced in half or a heading stranded at the bottom of a page. CSS handles this, and Puppeteer respects it.

page-breaks.css

/* force a break before each new section */
.section { break-before: page; }

/* never split these across pages */
tr, .card, figure { break-inside: avoid; }

/* keep a heading with the content that follows it */
h2, h3 { break-after: avoid; }

/* repeat table headers on every page */
thead { display: table-header-group; }

/* define the page size in CSS (pairs with preferCSSPageSize) */
@page { size: A4; margin: 20mm; }

A few rules to get this consistent:

  • Use the modern properties. break-before, break-after, and break-inside are the standard names. The old page-break-* aliases still work in Chromium, but use the new ones.
  • Know which media type applies. page.pdf() renders with print CSS media by default, so your @media print blocks apply and the output matches the browser print preview. If you want the PDF to look like the on-screen page instead, call await page.emulateMediaType('screen') before printing.
  • preferCSSPageSize and @page. If you define @page { size: ... } in CSS, set preferCSSPageSize: true in the page.pdf() call so that rule wins. Otherwise the format option overrides it and your CSS size is silently ignored.
  • table-header-group repeats headers. Set thead { display: table-header-group } and a long table reprints its header row on every page. Small thing, big quality bump on reports.
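
To show how the CSS and the page.pdf() call pair up, here is a sketch that wraps a body fragment in a document carrying the print rules, then prints with preferCSSPageSize so the @page rule decides the size. The helper names wrapForPrint and printWithCssPageSize are illustrative.

```javascript
// Print rules taken from the stylesheet above.
const PRINT_CSS = `
  @page { size: A4; margin: 20mm; }
  .section { break-before: page; }
  tr, .card, figure { break-inside: avoid; }
  h2, h3 { break-after: avoid; }
  thead { display: table-header-group; }
`

// Hypothetical helper: wrap a body fragment in a full HTML document.
function wrapForPrint(bodyHtml) {
  return '<!doctype html><html><head><meta charset="utf-8">' +
    `<style>${PRINT_CSS}</style></head><body>${bodyHtml}</body></html>`
}

// Hypothetical helper: no format option is passed, the CSS @page size wins.
async function printWithCssPageSize(page, bodyHtml) {
  await page.setContent(wrapForPrint(bodyHtml), { waitUntil: 'networkidle0' })
  return page.pdf({ preferCSSPageSize: true, printBackground: true })
}
```

Keeping the size and margin in one place, the @page rule, means there is no second source of truth in the JavaScript call to drift out of sync.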

6. Web fonts and images

Custom fonts and lazy images are the two things most likely to be missing from your first PDF. The page captured before they arrived.

Fonts

If you load fonts from Google Fonts or a CDN, the request must finish and the font must be applied before you print. networkidle0 covers the request. document.fonts.ready covers the apply step. Use both. If you can, embed the font as a base64 @font-face data URI in the HTML so there is no network request to wait on at all. That is the most reliable option for serverless, where outbound font fetches can be slow or blocked.
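
A sketch of the data URI approach, assuming you already have the font file's bytes in memory (for example from reading a .woff2 file at startup). fontFaceCss is a hypothetical helper name.

```javascript
// Hypothetical helper: turn font bytes into an inline @font-face rule.
// `fontBytes` is a Buffer or Uint8Array holding the font file contents.
function fontFaceCss(family, fontBytes, { format = 'woff2', weight = 400, style = 'normal' } = {}) {
  const base64 = Buffer.from(fontBytes).toString('base64')
  return `@font-face{font-family:"${family}";font-weight:${weight};font-style:${style};` +
    `src:url(data:font/${format};base64,${base64}) format("${format}");}`
}
```

Drop the returned rule into a style tag in the HTML you pass to setContent. The font then travels with the document, so there is no font request for networkidle0 to wait on. Awaiting document.fonts.ready is still worthwhile.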

Lazy images

Images with loading="lazy" never load if they are below the fold, because in headless print there is no scrolling. Either remove the lazy attribute for the PDF render, or scroll the page to force them in.

fonts-images.js

// force lazy images to load by disabling lazy loading
await page.evaluate(() => {
  document.querySelectorAll('img[loading="lazy"]')
    .forEach(img => { img.loading = 'eager' })
})

// then wait for fonts and decoded images before printing
await page.evaluate(async () => {
  await document.fonts.ready
  await Promise.all(Array.from(document.images)
    .map(img => img.complete ? 1 : img.decode().catch(() => 1)))
})

7. Performance and production

Here is the honest part. The three-line call does not survive production. Launching a fresh Chromium for every request will melt your server under any real load. This section is real ops work, and there is no way around it if you self-host.

Reuse the browser

Launching Chromium takes 1 to 3 seconds and burns memory. Launch once at startup, keep the browser alive, and open a fresh page per request. A new page is cheap. A new browser is not.

browser-pool.js

import puppeteer from 'puppeteer'

let browserPromise

function getBrowser() {
  if (!browserPromise) {
    browserPromise = puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-dev-shm-usage']
    })
  }
  return browserPromise
}

export async function renderPdf(html) {
  const browser = await getBrowser()
  const page = await browser.newPage()
  try {
    await page.setContent(html, { waitUntil: 'networkidle0', timeout: 15000 })
    return await page.pdf({ format: 'A4', printBackground: true })
  } finally {
    await page.close() // close the page, keep the browser
  }
}

That singleton is the floor. A real production setup adds more:

  • Concurrency limits. Each open page uses memory. Ten concurrent renders of a heavy report can blow your RAM budget. Put a queue in front, like p-limit, and cap concurrent pages to a number you have load-tested. Reject or queue past that.
  • Always close pages. Close the page in finally, every time. An unclosed page is a tab that never goes away and leaks memory until the browser dies.
  • Timeouts everywhere. Set a timeout on setContent and a hard ceiling on the whole render. A hung page should fail fast, not hang the queue behind it.
  • Recycle the browser. Long-lived Chromium leaks over time. Restart it every N renders or on a timer. Watch memory and crashes, and relaunch when it gets unhealthy.
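  • Zombie cleanup. Crashed renders can leave orphaned Chromium processes. On a crash, kill the process tree and relaunch. In containers, launch with --disable-dev-shm-usage, because /dev/shm is tiny there and Chromium otherwise crashes mid-render. Add --no-sandbox only when the container cannot support Chromium's sandbox, and only for HTML you trust, since it removes a security boundary.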
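
The concurrency cap and the hard timeout from the list above can be sketched without any dependencies. createLimiter is a minimal stand-in for p-limit, not its actual implementation, and both helper names are illustrative.

```javascript
// Minimal concurrency limiter: at most `maxConcurrent` tasks run at once,
// the rest wait in FIFO order.
function createLimiter(maxConcurrent) {
  let active = 0
  const waiting = []
  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return
    active++
    const { task, resolve, reject } = waiting.shift()
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active--; next() })
  }
  return task => new Promise((resolve, reject) => {
    waiting.push({ task, resolve, reject })
    next()
  })
}

// Hard ceiling on any promise. This only stops waiting; it does not cancel
// the underlying work, so the render must still close its page in finally.
function withTimeout(promise, ms, label = 'render') {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Usage with renderPdf from browser-pool.js (4 and 30000 are placeholders;
// load-test your own numbers):
//   const limit = createLimiter(4)
//   const pdf = await limit(() => withTimeout(renderPdf(html), 30000))
```

Putting the timeout inside the limited task matters. A render that times out frees its slot, so one hung page cannot stall the queue behind it.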

None of this is hard in isolation. Together it is a small service you now own, monitor, and page yourself for. Budget for it.
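
As one illustration of the recycling item, here is a sketch of a manager that retires the browser after a fixed number of renders and relaunches after a disconnect. createBrowserManager, withPage, and the maxRenders default are all made-up names and numbers. The launch function is injected, for example () => puppeteer.launch({ args: [...] }).

```javascript
// Hypothetical manager: one shared browser, replaced after `maxRenders`
// renders or when Chromium disconnects (crash or kill).
function createBrowserManager(launch, { maxRenders = 200 } = {}) {
  let current = null // slot for the browser that accepts new work

  function start() {
    const slot = { renders: 0, active: 0, retiring: false }
    slot.promise = launch().then(browser => {
      // Forget a dead browser so the next request relaunches.
      browser.on('disconnected', () => { if (current === slot) current = null })
      return browser
    })
    // A failed launch must not stay cached forever.
    slot.promise.catch(() => { if (current === slot) current = null })
    return slot
  }

  async function closeIfIdle(slot) {
    if (!slot.retiring || slot.active > 0) return
    try { await (await slot.promise).close() } catch { /* already gone */ }
  }

  // Runs `work(page)` on a fresh page and always closes the page afterwards.
  return async function withPage(work) {
    if (!current) current = start()
    const slot = current
    slot.renders++
    slot.active++
    // Retire this browser: later requests get a new one, in-flight ones finish.
    if (slot.renders >= maxRenders) { slot.retiring = true; current = null }
    let page
    try {
      const browser = await slot.promise
      page = await browser.newPage()
      return await work(page)
    } finally {
      if (page) await page.close().catch(() => {})
      slot.active--
      await closeIfIdle(slot)
    }
  }
}
```

A retired browser closes only after its last in-flight page finishes, so recycling never cuts off a render midway. This sketch does not kill orphaned processes or watch memory. Those still need handling at the process level.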

A note on output and memory

Stream the result when you can. page.pdf() buffers the whole document in memory before returning. For a few pages that is nothing. For a 200-page report with images, that byte array is large, and holding several of them at once across concurrent requests is how you spike memory. The path option writes the file for you, but the call still builds and returns the full byte array, so for big documents use page.createPDFStream() and pipe the chunks to disk or to your storage upload as they arrive. If you do hold the full result, hand it to the upload immediately and let it be garbage collected. Do not stash PDFs in an in-memory cache keyed by request unless you have measured it. The classic OOM is not one giant render, it is ten medium ones landing at the same second.
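
page.createPDFStream() is Puppeteer's streaming counterpart to page.pdf(). The sketch below assumes a Puppeteer version where it resolves to a web ReadableStream of Uint8Array chunks (version 22 and later, as far as I know; older versions returned a Node Readable). streamPdf is a hypothetical helper, and writable is any Node writable stream such as a file stream or an HTTP response.

```javascript
// Hypothetical helper: copy PDF chunks to a Node writable as they are produced.
// Assumes createPDFStream() resolves to a web ReadableStream.
async function streamPdf(page, writable, pdfOptions = {}) {
  const stream = await page.createPDFStream({
    format: 'A4',
    printBackground: true,
    ...pdfOptions
  })
  const reader = stream.getReader()
  let bytes = 0
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      bytes += value.byteLength
      // Respect backpressure: pause until the destination drains.
      if (!writable.write(value)) {
        await new Promise(resolve => writable.once('drain', resolve))
      }
    }
  } finally {
    reader.releaseLock()
  }
  writable.end()
  return bytes // total size, handy for logging
}
```

Only one chunk is held in JavaScript memory at a time, so ten concurrent large reports cost ten chunks rather than ten full documents. Error handling on the destination stream is left out to keep the sketch short.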

8. Serverless: AWS Lambda and Vercel

Serverless seems perfect for PDF generation. Bursty, stateless, pay-per-use. Then you try to deploy and hit a wall. The full Chromium that Puppeteer downloads is around 400MB and does not fit in a Lambda deployment package or most serverless function size limits.

The fix is a stripped-down Chromium built for serverless plus the lightweight puppeteer-core, which ships no browser of its own.

lambda.js

// npm install puppeteer-core @sparticuz/chromium
import chromium from '@sparticuz/chromium'
import puppeteer from 'puppeteer-core'

export async function handler(event) {
  const browser = await puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    // the serverless build is a headless shell; 'shell' needs Puppeteer 22+,
    // use headless: true on older versions
    headless: 'shell'
  })
  try {
    const page = await browser.newPage()
    await page.setContent(event.html, { waitUntil: 'networkidle0' })
    const pdf = await page.pdf({ format: 'A4', printBackground: true })
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/pdf' },
      // wrap in a Buffer first: toString('base64') on a plain Uint8Array
      // does not base64-encode
      body: Buffer.from(pdf).toString('base64'),
      isBase64Encoded: true
    }
  } finally {
    await browser.close()
  }
}

What you sign up for going serverless:

  • Size constraints. Even @sparticuz/chromium is large. On Lambda you usually ship it as a layer, or move to a container image to clear the unzipped size limit. On Vercel, Chromium plus your code can brush against the function bundle ceiling. Check the limits before you commit.
  • Cold starts. A cold Lambda has to load and launch Chromium. That is several seconds added to the first request after a scale-up or idle period. For interactive PDF generation, that latency shows.
  • No browser reuse across invocations. A new container means a new launch. The pooling trick from section 7 only helps within a warm container, so you pay the launch cost more often than on a long-lived server.
  • Memory and time. Give the function generous memory (1536MB or more) and a long timeout. Chromium is hungry and a heavy render is not fast.

9. Common failure modes

The same handful of problems account for most Puppeteer PDF support threads. Here they are with the cause and the fix, tight.

  • Blank page. You printed before the content rendered. Use waitUntil: 'networkidle0', wait for a selector, and await document.fonts.ready before page.pdf().
  • Missing backgrounds and colors. printBackground defaults to false. Set it to true.
  • Missing CSS or fonts. External stylesheets or fonts had not loaded, or the header/footer template does not inherit page CSS. Wait for network idle and fonts ready. Inline styles in header/footer templates. Embed fonts as data URIs when you can.
  • Huge memory and OOM kills. Launch-per-request or leaked pages and browsers. Reuse one browser, close every page in finally, cap concurrency, recycle the browser periodically.
  • Timeouts under load. Too many concurrent renders, no queue, no per-render timeout. Add a concurrency limit and hard timeouts so one slow render does not stall the rest.
  • Crashes in Docker. Tiny /dev/shm and sandbox issues. Launch with --no-sandbox and --disable-dev-shm-usage.

When to stop self-hosting and use an API

Puppeteer is the right call for plenty of cases. Low volume, simple layouts, a side feature behind an export button. Run it yourself and move on.

The math shifts once volume and reliability matter. Sections 7 and 8 are not a one-time setup. They are an ongoing service: pooling, concurrency caps, browser recycling, zombie cleanup, Lambda layers, cold starts, memory dashboards, and the pager that goes off when Chromium wedges under load. Add up the engineering hours against a per-PDF cost. At even modest volume, an hour of your time spent on browser ops costs more than a lot of PDFs.

That is the case for a managed API. PDFBase runs warm Chromium on dedicated infrastructure with pinned browser versions, so you skip everything in sections 7 and 8 entirely. No pooling. No cold starts. No zombie processes. You send HTML, you get a PDF, with the same real Chromium rendering you already trust.

pdfbase-example.js

import PDFBase from 'pdfbase'

const client = new PDFBase('pk_live_...')

const pdf = await client.pdfs.create({
  html: invoiceHtml,
  format: 'A4',
  printBackground: true,
  output: 'url'
})

console.log(pdf.data.url) // signed URL, 24h expiry

Same options you already know: format, margin, printBackground, headers and footers. No browser to babysit. You can also pass a url instead of raw HTML to render a live page, or use output: 'buffer' for the raw bytes.

For the full option set, watermarks, merging, and batch processing, see the PDFBase docs. Or kick the tires first with the free HTML to PDF tool, no code required.

Wrapping up

Puppeteer PDF generation is three lines to start and a real service to run well. The basics are easy. The hard part is everything after the demo: waiting for content, getting backgrounds and fonts to show, controlling page breaks, and keeping Chromium alive and lean under load.

If you are at low volume with simple documents, self-host Puppeteer and enjoy the control. If you are heading toward production volume and do not want to own a browser farm, an API is the cheaper path once you count your own time.

Want to try the managed route? Grab 100 free credits, no credit card. The docs cover generation, templates, watermarks, batch jobs, and the MCP server for AI agents.