Convert URL to PDF in Node.js — Puppeteer, Playwright & API (2026)

You have a URL. You need a PDF. Maybe you're archiving customer-facing dashboards. Maybe you're generating reports from a live web app. Maybe legal needs a "frozen" copy of a terms page every time it changes.

The hard part isn't calling page.pdf(). The hard part is that the page you're capturing is a React SPA that renders a blank white rectangle until JavaScript finishes executing, has a sticky nav that bleeds into your PDF, lazy-loads half its images, and sits behind a login wall.

This guide covers every viable way to capture a live URL as a PDF in Node.js—Puppeteer, Playwright, wkhtmltopdf, and a dedicated API. Real code, real gotchas, honest tradeoffs. If you're looking for HTML string to PDF instead of a live URL, that's a separate guide.

Figure: The URL-to-PDF pipeline. The "Wait" step is where most tools fail on SPAs.

1. Puppeteer

Puppeteer launches headless Chromium, navigates to your URL, waits for the page to render, and prints to PDF. It's the most common approach because it gives you a real browser—CSS Grid, Flexbox, web fonts, JavaScript execution, the whole stack.

Basic URL to PDF

url-to-pdf-puppeteer.js

import puppeteer from 'puppeteer'

async function urlToPdf(url) {
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()

    // Critical: set viewport BEFORE navigating
    await page.setViewport({ width: 1280, height: 720 })

    await page.goto(url, {
      waitUntil: 'networkidle0',
      timeout: 30_000
    })

    return await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' }
    }) // Uint8Array in Puppeteer v22+ (a Buffer in older versions)
  } finally {
    // Close even if navigation or printing throws, so Chromium processes don't leak
    await browser.close()
  }
}

// Top-level await requires an ES module (.mjs or "type": "module")
const pdf = await urlToPdf('https://example.com/dashboard')

The SPA blank page problem

If you're capturing a React, Next.js, Vue, or Angular app, the initial HTML is often just an empty <div id="root"></div>. The actual content doesn't exist until JavaScript downloads, parses, and executes. waitUntil: 'domcontentloaded', and often the default 'load', can resolve before that happens, which gives you a blank page.

Fix: Use waitUntil: 'networkidle0' (waits until zero network requests for 500ms) or, better yet, wait for a specific element: await page.waitForSelector('.dashboard-loaded'). The custom selector approach is the most reliable because networkidle0 can still fire before client-side rendering completes on heavy SPAs.
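Here is a minimal sketch that combines those waits into one helper. It takes a page that is already open, which lets you exercise it without launching a browser. The name captureWhenReady is hypothetical. The default readySelector of [data-ready="true"] is an assumption, so swap in whatever element your app renders last (for example .dashboard-loaded).

```javascript
// Capture a client-rendered page once a "ready" marker appears.
// `page` is a Puppeteer Page; the selector and timeouts are illustrative defaults.
async function captureWhenReady(page, url, {
  readySelector = '[data-ready="true"]',
  navTimeoutMs = 30_000,
  readyTimeoutMs = 10_000
} = {}) {
  // Desktop viewport first, so responsive code sees a desktop width on load.
  await page.setViewport({ width: 1280, height: 720 })

  // networkidle0 gets past the initial bundle and the first API calls...
  await page.goto(url, { waitUntil: 'networkidle0', timeout: navTimeoutMs })

  // ...but only the app knows when rendering is finished, so wait for its marker.
  // If the marker never appears this throws a TimeoutError instead of printing a blank page.
  await page.waitForSelector(readySelector, { timeout: readyTimeoutMs })

  return page.pdf({ format: 'A4', printBackground: true })
}
```

With a real browser, you would call `captureWhenReady(await browser.newPage(), url, { readySelector: '.dashboard-loaded' })` and close the browser afterwards.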

Handling authentication

Capturing a page behind login? You need to inject cookies or headers before navigating. Here's the cookie injection pattern:

puppeteer-auth.js

// `browser` comes from puppeteer.launch(), as in the basic example
const page = await browser.newPage()

// Inject session cookies BEFORE navigation
await page.setCookie({
  name: 'session_token',
  value: 'eyJhbGciOiJIUzI1NiIs...',
  domain: 'app.example.com',
  path: '/',
  httpOnly: true,
  secure: true
})

// Or set custom headers (e.g., Bearer token)
await page.setExtraHTTPHeaders({
  'Authorization': 'Bearer sk_live_...'
})

await page.goto('https://app.example.com/reports/q2', {
  waitUntil: 'networkidle0'
})
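There is one caveat with setExtraHTTPHeaders. Puppeteer sends those headers with every request the page initiates, including third-party scripts, fonts, and analytics beacons, so a bearer token can leak to other hosts. If that matters to you, request interception lets you add the header per request instead. The sketch below assumes the token should only reach the app's own origin. The helper names headersFor and scopeAuthToOrigin are hypothetical.

```javascript
// Return the headers a request should carry: the original ones, plus
// Authorization only when the request targets the trusted origin.
function headersFor(requestUrl, originalHeaders, trustedOrigin, token) {
  let origin
  try {
    origin = new URL(requestUrl).origin // data: URLs report the origin "null"
  } catch {
    return originalHeaders // unparsable URL: never attach credentials
  }
  if (origin !== trustedOrigin) return originalHeaders
  return { ...originalHeaders, authorization: `Bearer ${token}` }
}

// Install on a Puppeteer page before page.goto().
// trustedOrigin looks like 'https://app.example.com' (no trailing slash).
async function scopeAuthToOrigin(page, trustedOrigin, token) {
  await page.setRequestInterception(true)
  page.on('request', (request) => {
    // Every intercepted request must be continued, or the page stalls.
    request.continue({
      headers: headersFor(request.url(), request.headers(), trustedOrigin, token)
    })
  })
}
```

Call `await scopeAuthToOrigin(page, new URL(targetUrl).origin, token)` instead of setExtraHTTPHeaders. Once interception is on, every request passes through your handler, so any other request listeners on the page have to cooperate with it.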

Injecting CSS to clean up the output

Most sites look terrible as PDFs because they have no @media print rules. The sticky nav, cookie banner, chat widget, and floating CTAs all end up in your PDF. Fix this by injecting CSS after the page loads:

inject-print-css.js

// After page loads, inject CSS to hide non-content elements
await page.addStyleTag({
  content: `
    nav, footer, .cookie-banner, .chat-widget,
    .sticky-cta, [data-testid="intercom"] {
      display: none !important;
    }
    body { padding: 0 !important; }
    * { box-shadow: none !important; }
  `
})
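A hide-list only works when you know the selectors ahead of time. Chromium repeats position: fixed elements on every printed page, and that is where the "sticky nav bleeding into the PDF" usually comes from. For third-party pages you can instead switch floating elements back into normal flow. The sketch below does that, assuming you'd rather keep those elements than hide them. It treats sticky elements the same way because they are usually the same kind of header. The function names are hypothetical.

```javascript
// Runs inside the page via page.evaluate(). Switches fixed and sticky elements
// to static so they print once, in document flow, instead of floating.
function unstickFloatingElements() {
  let changed = 0
  for (const el of document.querySelectorAll('body *')) {
    const { position } = getComputedStyle(el)
    if (position === 'fixed' || position === 'sticky') {
      el.style.setProperty('position', 'static', 'important')
      changed++
    }
  }
  return changed // number of elements adjusted, handy for logging
}

// Node side: call after navigation and before page.pdf().
async function unstickBeforePrint(page) {
  return page.evaluate(unstickFloatingElements)
}
```

This combines well with the CSS above. Hide what you never want, such as chat widgets, which would otherwise land wherever they sit in the DOM, and unstick everything else.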

Default (800 x 600): Content wraps as if on a small screen. Sidebars stack vertically. Tables overflow. Responsive breakpoints trigger mobile or tablet layouts. Your PDF looks like a screenshot from 2005.

Correct (1280 x 720+): Full desktop layout. Sidebars stay in place. Tables fit. The page looks like it does when you browse it. Always set the viewport before navigating.

Puppeteer's default viewport is 800x600. This single setting is responsible for more "why does my PDF look wrong" questions than everything else combined.
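If you open several pages, you can set the viewport once at launch. Puppeteer's launch options include defaultViewport, and every page the browser opens inherits it. Here it is as an options object:

```javascript
// Options for puppeteer.launch(): defaultViewport applies to every new page,
// so a page opened without setViewport() can't fall back to 800x600.
const launchOptions = {
  defaultViewport: { width: 1280, height: 720 }
}

// Usage (inside an async function or an ES module):
// const browser = await puppeteer.launch(launchOptions)
// const page = await browser.newPage() // already 1280x720
```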

Dealing with lazy-loaded images and infinite scroll

Pages with lazy-loaded images (almost every modern site) won't load images that are below the fold. You need to scroll the page to trigger the lazy loading before capturing:

scroll-lazy-images.js

// Auto-scroll to trigger lazy loading
await page.evaluate(async () => {
  await new Promise((resolve) => {
    let totalHeight = 0
    const distance = 400
    const timer = setInterval(() => {
      window.scrollBy(0, distance)
      totalHeight += distance
      // scrollHeight is re-read on every tick, so on an endless feed this never stops
      if (totalHeight >= document.body.scrollHeight) {
        clearInterval(timer)
        window.scrollTo(0, 0) // scroll back to top
        resolve()
      }
    }, 200)
  })
})

// Wait for images to actually load after triggering
await page.waitForNetworkIdle({ idleTime: 1000 })

This handles lazy images, but for infinite scroll pages (social feeds, long listings), you'll need to decide how much content to capture. Scroll to a specific element or set a max scroll height—otherwise you'll be scrolling forever and your PDF will be 200 pages.
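The next sketch caps the scroll with a maximum distance. It stops at whichever comes first, the real bottom of the page or maxPx, and re-checks the page height on every pass as new content loads. It assumes Puppeteer, because page.evaluate here forwards extra arguments to the page function. The function names and default values are illustrative, not tuned.

```javascript
// Browser-side: scroll in steps until the bottom is reached or maxPx is scrolled.
// Runs inside page.evaluate(), so it can only use its arguments, not Node.js variables.
async function scrollDown(step, delayMs, maxPx) {
  let scrolled = 0
  while (scrolled < maxPx) {
    window.scrollBy(0, step)
    scrolled += step
    // Give lazy loaders (IntersectionObserver, loading="lazy") time to fire.
    await new Promise((resolve) => setTimeout(resolve, delayMs))
    // Re-read scrollHeight every pass: loaded content can make the page taller.
    if (window.scrollY + window.innerHeight >= document.body.scrollHeight) break
  }
  const reached = window.scrollY
  window.scrollTo(0, 0) // back to the top before printing
  return reached
}

// Node side, for a Puppeteer page.
async function loadLazyContent(page, { step = 400, delayMs = 200, maxPx = 20_000 } = {}) {
  // page.evaluate serializes scrollDown and passes the remaining arguments to it.
  const reached = await page.evaluate(scrollDown, step, delayMs, maxPx)
  // Let the image requests triggered by scrolling finish.
  await page.waitForNetworkIdle({ idleTime: 1000 })
  return reached
}
```

Pick maxPx based on how much of the feed you actually want in the document.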

2. Playwright

Playwright was created by several of Puppeteer's original developers after they moved to Microsoft. It uses the same concept (headless browser, navigate, print) but has a cleaner API, better auto-waiting, and cross-browser automation. PDF output itself is Chromium-only. If you're starting fresh in 2026, Playwright over Puppeteer is the right call.

url-to-pdf-playwright.js

import { chromium } from 'playwright'

async function urlToPdf(url) {
  const browser = await chromium.launch() // page.pdf() only works in headless Chromium
  try {
    const page = await browser.newPage({
      viewport: { width: 1280, height: 720 }
    })

    // Playwright waits for 'load' by default, but we want network idle
    await page.goto(url, {
      waitUntil: 'networkidle',
      timeout: 30_000
    })

    // Optional: wait for a specific element (best for SPAs).
    // Remove this if your app doesn't set data-ready, or it will time out after 10s.
    await page.waitForSelector('[data-ready="true"]', {
      timeout: 10_000
    })

    return await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' }
    })
  } finally {
    await browser.close()
  }
}

Why Playwright over Puppeteer

Better auto-waiting. Playwright's locators and actions wait for elements to be visible, stable, and enabled before interacting, which removes race conditions you often handle by hand in Puppeteer. Less flaky in production.
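Cross-browser automation, Chromium-only PDFs. Playwright drives Chromium, Firefox, and WebKit, which helps if you also screenshot or test pages in Safari's engine. PDF generation is the exception: page.pdf() only works in headless Chromium, so swapping chromium for webkit won't give you a WebKit-rendered PDF.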
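Browser contexts. Playwright's BrowserContext isolates sessions without launching separate browser instances: one browser, with multiple isolated sessions that each have their own cookies, storage, and viewport. Puppeteer has browser contexts too, but Playwright's newContext() accepts viewport and storage state directly. More efficient for batch processing.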
Authentication via storage state. You can save and restore full browser state (cookies + localStorage) using storageState. Log in once, save the state file, reuse it across all PDF captures.
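The batch example below reads auth.json. Here is a sketch of how that file might be produced once with Playwright. The login URL, the form selectors, and the assumption that the login page lives under /login are all placeholders for your app's real login flow. saveLoginState is a hypothetical helper name.

```javascript
// One-time login that saves cookies + localStorage for later contexts.
// Selectors and the '/login' path are placeholders: adjust to your login form.
async function saveLoginState(browser, { loginUrl, email, password, statePath = 'auth.json' }) {
  const context = await browser.newContext()
  try {
    const page = await context.newPage()
    await page.goto(loginUrl)
    await page.fill('input[name="email"]', email)
    await page.fill('input[name="password"]', password)
    await page.click('button[type="submit"]')
    // Wait until the app redirects away from the login page.
    await page.waitForURL((url) => !url.pathname.startsWith('/login'))
    // Writes the JSON file that newContext({ storageState: statePath }) reads back.
    await context.storageState({ path: statePath })
  } finally {
    await context.close()
  }
}
```

Call `await saveLoginState(await chromium.launch(), { loginUrl, email, password })` whenever the session expires. Keep the resulting file out of version control, because it holds live session cookies.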

playwright-batch-auth.js

import { chromium } from 'playwright'

// Batch PDFs with shared auth using browser contexts
const browser = await chromium.launch()
const context = await browser.newContext({
  storageState: 'auth.json', // saved login state
  viewport: { width: 1280, height: 720 }
})

const urls = [
  'https://app.example.com/reports/q1',
  'https://app.example.com/reports/q2',
  'https://app.example.com/reports/q3'
]

for (const url of urls) {
  const page = await context.newPage()
  await page.goto(url, { waitUntil: 'networkidle' })
  // Writes report-q1.pdf, report-q2.pdf, report-q3.pdf
  await page.pdf({ path: `report-${url.split('/').pop()}.pdf` })
  await page.close()
}

await browser.close()
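The loop above renders one report at a time. Every page in a context shares the saved login, so a small worker pool can render several reports at once. The sketch below assumes a concurrency of around 3 suits your machine, since each open page still costs browser memory. mapWithConcurrency and renderToPdf are hypothetical helpers, not Playwright APIs.

```javascript
// Run `worker` over `items` with at most `limit` in flight, preserving order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0
  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane() {
    while (next < items.length) {
      const i = next++ // claimed synchronously, so no two lanes share an index
      results[i] = await worker(items[i], i)
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length))
  await Promise.all(Array.from({ length: laneCount }, () => lane()))
  return results
}

// Render one URL in its own page of the shared, already-authenticated context.
async function renderToPdf(context, url) {
  const page = await context.newPage()
  try {
    await page.goto(url, { waitUntil: 'networkidle' })
    return await page.pdf({ format: 'A4', printBackground: true })
  } finally {
    await page.close() // always release the page, even if navigation fails
  }
}
```

Usage: `const pdfs = await mapWithConcurrency(urls, 3, (url) => renderToPdf(context, url))`. If one capture throws, Promise.all rejects right away and the remaining lanes finish on their own, so wrap the worker if you'd rather collect failures.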

The downsides are the same as Puppeteer: 400MB+ browser binary, high memory usage, cold starts, browser lifecycle ops. The API is nicer, but the infrastructure burden is identical.

3. wkhtmltopdf (deprecated)

wkhtmltopdf uses an old Qt WebKit fork to convert URLs to PDF. It's been around since 2008 and still shows up in legacy systems.

Don't use it for new projects. The upstream repository has been archived. The last release was 2020. It uses a WebKit version so old that CSS Grid and modern Flexbox don't work. JavaScript-heavy SPAs won't render properly. And it has known SSRF vulnerabilities—it can read local files via file:// protocol.

If you're already running it in production, it works. But make a plan to migrate to Puppeteer, Playwright, or an API. Every month you wait makes the migration harder.


4. PDFBase API

Instead of managing a headless browser on your server, you send a URL to an API and get a PDF back. PDFBase runs warm Chromium instances on dedicated infrastructure—no cold starts, no Docker layers, no browser pool management. One HTTP request.

curl

curl -X POST https://api.pdfbase.dev/v1/pdfs \
  -H "Authorization: Bearer pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/reports/q2",
    "format": "A4",
    "viewport": { "width": 1280, "height": 720 },
    "waitUntil": "networkidle0",
    "output": "url"
  }'

Node.js

pdfbase-url-to-pdf.js

import PDFBase from 'pdfbase'

const client = new PDFBase('pk_live_...')

const pdf = await client.pdfs.create({
  url: 'https://app.example.com/reports/q2',
  format: 'A4',
  viewport: { width: 1280, height: 720 },
  waitUntil: 'networkidle0',
  output: 'url'
})

console.log(pdf.data.url) // signed URL, 24h expiry
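If you'd rather not add the SDK dependency, the curl request above translates directly to the fetch built into Node 18+. This is a sketch under one assumption: that the raw JSON response has the same shape the SDK exposes (data.url). Check the API reference before relying on it. createPdfViaRest is a hypothetical helper name.

```javascript
// Same request as the curl example, using Node 18+'s built-in fetch.
// Assumption: the JSON response mirrors the SDK's shape ({ data: { url } }).
async function createPdfViaRest(apiKey, body) {
  const res = await fetch('https://api.pdfbase.dev/v1/pdfs', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(body)
  })
  if (!res.ok) {
    // Include status and body text so failed captures are debuggable.
    throw new Error(`PDFBase request failed: ${res.status} ${await res.text()}`)
  }
  return res.json()
}

// Usage:
// const result = await createPdfViaRest(apiKey, {
//   url: 'https://app.example.com/reports/q2',
//   format: 'A4',
//   viewport: { width: 1280, height: 720 },
//   waitUntil: 'networkidle0',
//   output: 'url'
// })
// console.log(result.data.url)
```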

Auth, cookies, and headers

Capturing pages behind login is where the API approach pays off. No cookie injection dance, no browser state management—just pass the credentials in the request:

pdfbase-auth.js

const pdf = await client.pdfs.create({
  url: 'https://app.example.com/dashboard',
  cookies: [{
    name: 'session_token',
    value: 'eyJhbGciOiJIUzI1NiIs...',
    domain: 'app.example.com'
  }],
  headers: {
    'Authorization': 'Bearer sk_live_...'
  },
  injectCSS: 'nav, footer, .chat-widget { display: none !important; }',
  waitForSelector: '.dashboard-loaded',
  output: 'url'
})

Why the API approach works at scale

No Chromium on your server. Your app stays lightweight. No 400MB Docker layers, no memory spikes from browser instances, no zombie process cleanup.
SPA-ready out of the box. PDFBase executes JavaScript, waits for networkidle0, and supports custom wait selectors. React, Next.js, Vue—all render correctly.
1-2 second response times. Warm browser pools eliminate cold starts. Puppeteer and Playwright take 3-5 seconds on a cold start; the API runs on warm instances.
CSS injection built in. Pass injectCSS to strip navs, footers, and chat widgets without writing browser scripting logic.
Cookies, headers, viewport—all first-class. Everything you'd manually configure in Puppeteer is a parameter in the API request.

Check the PDFBase docs for the full API reference, or try the free URL to PDF tool to test it without writing code.

Comparison

Here's how all four methods stack up across the dimensions that actually matter in production.

	Puppeteer	Playwright	wkhtmltopdf	PDFBase API
SPA Support	Yes (manual waits)	Yes (better auto-wait)	No	Yes (built-in)
Auth / Cookies	Manual injection	storageState + manual	Limited	API parameter
CSS Injection	addStyleTag()	addStyleTag()	--user-style-sheet	injectCSS param
Speed (per page)	3-5s (cold) / 1-2s (warm)	3-5s (cold) / 1-2s (warm)	1-3s	1-2s (warm pool)
Maintenance	High (browser ops)	High (browser ops)	None (unmaintained)	Zero (managed)
Memory Usage	100-300MB per instance	100-300MB per instance	50-100MB	None (remote)
Cross-browser	Chromium only	Chromium only for PDF (Firefox + WebKit for screenshots/tests)	Qt WebKit (ancient)	Chromium

Which Should You Choose?

Skip the analysis paralysis. Match your situation:

One-off script or internal tool, low volume

Use Puppeteer. Write a 20-line script, run it locally, get your PDF. No API key, no cost. Just make sure you set the viewport to 1280x720 and use networkidle0.

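Starting fresh, or you want the nicer API

Use Playwright. You get a better API, better auto-waiting, and browser contexts that make batch captures cheap. Its PDFs still come from Chromium, so don't choose it expecting WebKit-rendered output. If you're writing new URL-to-PDF code in 2026, Playwright is the better default over Puppeteer.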

Production workloads, recurring captures, or you don't want browser ops

Use a PDF API like PDFBase. The engineering time you'd spend on Chromium infrastructure, browser pooling, memory management, and Docker optimization costs more than $0.01 per PDF. One POST request, zero ops.

Already running wkhtmltopdf

Migrate. It's unmaintained, has security issues, and can't render modern web pages. Move to Playwright (self-hosted) or PDFBase (managed). Don't start new projects with it.

Wrapping Up

Converting a URL to PDF sounds simple until you hit SPAs that render blank, auth walls, lazy-loaded images, and viewport misconfigurations. The actual rendering is the easy part. The hard part is making it work reliably on real-world web pages.

If you own the page you're capturing, add a data-ready attribute or a @media print stylesheet. It makes every tool work better. If you're capturing third-party pages, budget time for the edge cases—or let an API handle them for you.
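On the page side, the data-ready marker can be a few lines of code. The sketch below sets the attribute that the Playwright example waits for, once the work your app depends on has settled. It also waits for web fonts to finish loading. The `pending` promises are a hypothetical stand-in for whatever your app awaits before its last data-dependent render. Call the function from the point where your app knows that render has happened.

```javascript
// In the page you own: flag the document as capture-ready once data and fonts settle.
// `root` is usually document.body; `pending` is whatever your app awaits first.
async function markReadyForCapture(root, pending = []) {
  await Promise.all(pending)
  // document.fonts.ready resolves once web fonts finish loading (skipped if unavailable).
  const fonts = globalThis.document?.fonts
  if (fonts) await fonts.ready
  // Matches waitForSelector('[data-ready="true"]') on the capture side.
  root.setAttribute('data-ready', 'true')
}
```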

For more on the HTML-to-PDF side of things (when you have HTML strings, not URLs), see HTML to PDF in Node.js — The Complete Guide.

If you want to try PDFBase, you can grab 100 free credits without a credit card. The docs cover everything from basic URL capture to templates, watermarks, batch processing, and the MCP server for AI agents.
