I Built a Screenshot & Metadata API for $12/month

Every app that displays link previews needs the same thing: fetch a URL, extract its Open Graph tags, maybe grab a screenshot or generate a PDF. And every developer faces the same choice: spin up a headless browser yourself (and deal with memory leaks, zombie processes, and Chromium updates), or pay $50+/month for an enterprise screenshot API.

I wanted a middle ground, so I built one on a $12/month DigitalOcean droplet. Here's the full stack.

What It Does

Four endpoints, one API key:

# Take a screenshot
curl -X POST https://snap.michaelcli.com/api/screenshot \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com", "width": 1280, "height": 720}' \
  --output screenshot.png

# Extract metadata (Open Graph, Twitter Cards, etc.)
curl -X POST https://snap.michaelcli.com/api/metadata \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com"}'

# Generate PDF from URL
curl -X POST https://snap.michaelcli.com/api/pdf \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "A4"}' \
  --output page.pdf

# Extract text content
curl -X POST https://snap.michaelcli.com/api/text \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
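The same calls can be made from Node 22, which ships a global fetch. The sketch below is illustrative: buildSnapRequest and callSnap are hypothetical helper names, and the response handling (raw bytes for screenshot and pdf, JSON for metadata and text) is an assumption drawn from the curl examples, not documented behavior.

```javascript
// Minimal client sketch for the four endpoints above (Node 22, global fetch).
// Assumptions: helper names are hypothetical; screenshot/pdf return raw bytes,
// metadata/text return JSON.
const BASE_URL = 'https://snap.michaelcli.com/api';
const ENDPOINTS = new Set(['screenshot', 'metadata', 'pdf', 'text']);

// Build the URL and fetch() options without sending anything
function buildSnapRequest(endpoint, payload, apiKey) {
  if (!ENDPOINTS.has(endpoint)) {
    throw new Error(`Unknown endpoint: ${endpoint}`);
  }
  return {
    url: `${BASE_URL}/${endpoint}`,
    init: {
      method: 'POST',
      headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

// Send the request and decode the response body
async function callSnap(endpoint, payload, apiKey) {
  const { url, init } = buildSnapRequest(endpoint, payload, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`${endpoint} failed with HTTP ${res.status}`);
  }
  if (endpoint === 'screenshot' || endpoint === 'pdf') {
    return Buffer.from(await res.arrayBuffer()); // binary payload
  }
  return res.json();
}
```

For example, callSnap('metadata', { url: 'https://github.com' }, process.env.SNAP_API_KEY) would resolve to the parsed metadata, where SNAP_API_KEY is a hypothetical environment variable holding your key.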

The Stack

Runtime: Node.js 22 + Express
Browser: Playwright with Chromium (stealth config: realistic UA, locale, timezone)
Database: SQLite via better-sqlite3 (API keys, usage tracking)
Payments: Stripe Checkout with webhook integration
Reverse proxy: nginx with Let's Encrypt SSL
Process management: PM2
Total server code: ~300 lines

Architecture Decisions

Why Playwright over Puppeteer?

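Playwright's networkidle wait strategy (navigation counts as finished once there have been no network connections for at least 500 ms) is more reliable for SPAs. Like Puppeteer, it also generates PDFs natively with page.pdf(), so no extra libraries are needed.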
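A minimal sketch of a PDF render using that wait strategy follows. pdfOptions and renderPdf are hypothetical names, the three-format whitelist and 30-second timeout are illustrative, and browser is assumed to be a shared Playwright Chromium instance (page.pdf() is only supported in Chromium).

```javascript
// Sketch: render a PDF once the page reaches 'networkidle'.
// Assumptions: pdfOptions/renderPdf are hypothetical names; the format
// whitelist and 30 s timeout are illustrative values.
const PDF_FORMATS = new Set(['A4', 'Letter', 'Legal']);

// Validate the request body and build options for page.pdf()
function pdfOptions(body = {}) {
  const format = body.format ?? 'A4';
  if (!PDF_FORMATS.has(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  return { format, printBackground: true };
}

// `browser` is assumed to be a shared Playwright Chromium instance
async function renderPdf(browser, url, body) {
  const options = pdfOptions(body); // fail fast before opening a page
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    // networkidle: no network connections for at least 500 ms
    await page.goto(url, { waitUntil: 'networkidle', timeout: 30_000 });
    return await page.pdf(options); // resolves to a Buffer
  } finally {
    await context.close(); // release the context even if navigation fails
  }
}
```

Validating the format before creating a context means a bad request never costs Chromium any memory.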

Why SQLite over Postgres?

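For a single-server API with low-to-medium traffic, SQLite in WAL mode is faster and simpler: no connection pooling, no separate database process, no network latency. The data lives in one database file (plus the -wal and -shm sidecar files that WAL mode creates), which is easy to back up; better-sqlite3's db.backup() produces a consistent copy while the API keeps running.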
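Here is a sketch of how key and usage tables might be set up with better-sqlite3, including a monthly quota check. The schema (api_keys, usage, api_key, monthly_limit, period, requests) and the function names are hypothetical; the WAL pragma and the better-sqlite3 calls (pragma, exec, prepare, transaction) are the library's own.

```javascript
// Sketch of key/usage storage with better-sqlite3 in WAL mode.
// Hypothetical schema: api_keys(api_key, monthly_limit),
// usage(api_key, period, requests).

// 'YYYY-MM' in UTC, used to bucket usage by calendar month
function monthKey(date = new Date()) {
  return date.toISOString().slice(0, 7);
}

function openDb(path) {
  const Database = require('better-sqlite3'); // loaded only when needed
  const db = new Database(path);
  db.pragma('journal_mode = WAL'); // readers and the writer don't block each other
  db.exec(`
    CREATE TABLE IF NOT EXISTS api_keys (
      api_key TEXT PRIMARY KEY,
      monthly_limit INTEGER NOT NULL
    );
    CREATE TABLE IF NOT EXISTS usage (
      api_key TEXT NOT NULL REFERENCES api_keys(api_key),
      period TEXT NOT NULL,
      requests INTEGER NOT NULL DEFAULT 0,
      PRIMARY KEY (api_key, period)
    );
  `);
  return db;
}

// Check the monthly quota and count the request in one transaction
function consumeRequest(db, apiKey, now = new Date()) {
  const period = monthKey(now);
  const getKey = db.prepare('SELECT monthly_limit FROM api_keys WHERE api_key = ?');
  const getUsage = db.prepare('SELECT requests FROM usage WHERE api_key = ? AND period = ?');
  const bump = db.prepare(`
    INSERT INTO usage (api_key, period, requests) VALUES (?, ?, 1)
    ON CONFLICT (api_key, period) DO UPDATE SET requests = requests + 1
  `);

  return db.transaction(() => {
    const keyRow = getKey.get(apiKey);
    if (!keyRow) return { allowed: false, reason: 'unknown key' };
    const used = getUsage.get(apiKey, period)?.requests ?? 0;
    if (used >= keyRow.monthly_limit) return { allowed: false, reason: 'quota exceeded' };
    bump.run(apiKey, period);
    return { allowed: true, remaining: keyRow.monthly_limit - used - 1 };
  })();
}
```

Wrapping the read and the increment in one db.transaction() makes them atomic, which matters if more than one process ever shares the database file.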

Browser pooling

I keep a single browser instance alive and create new contexts per request. Contexts are isolated (separate cookies, storage) but share the browser process, saving ~200ms per request vs cold launch.

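const { chromium } = require('playwright');

// Cache the launch promise rather than the browser, so requests that arrive
// while Chromium is still starting share one launch instead of racing.
let browserPromise = null;

async function getBrowser() {
  const pending = browserPromise;
  // A failed launch resolves to null here, so we fall through and retry
  const browser = pending && await pending.catch(() => null);
  if (browser && browser.isConnected()) return browser;
  // Relaunch only if no other caller has already replaced the promise
  if (browserPromise === pending) {
    browserPromise = chromium.launch({
      args: ['--no-sandbox', '--disable-dev-shm-usage',
             '--disable-blink-features=AutomationControlled']
    });
  }
  return browserPromise;
}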
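The snippet above covers only the shared browser. The per-request half of the pattern, opening a fresh context and always closing it, might look like the minimal sketch below; withPage is a hypothetical helper name, and the caller is assumed to pass in the browser returned by getBrowser().

```javascript
// Sketch of the per-request context pattern (withPage is hypothetical).
// `browser` is the shared Playwright browser; contextOptions go straight
// to browser.newContext().
async function withPage(browser, contextOptions, fn) {
  // Each context gets its own cookies, storage, and cache
  const context = await browser.newContext(contextOptions);
  try {
    const page = await context.newPage();
    return await fn(page);
  } finally {
    // Close the context (and its pages) even if fn throws; otherwise
    // every failed request would leave memory behind in Chromium
    await context.close();
  }
}
```

For example, a screenshot handler could call withPage(await getBrowser(), { viewport: { width: 1280, height: 720 } }, async (page) => { await page.goto(url, { waitUntil: 'networkidle' }); return page.screenshot(); }), which resolves to a PNG Buffer.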

Security

SSRF protection: URLs are validated; localhost, 127.0.0.1, and private IP ranges are rejected
Rate limiting: Per-key monthly request quotas, enforced via SQLite queries
API key auth: All endpoints require the X-API-Key header
Input validation: Width/height bounds, format whitelist, URL scheme check
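A minimal sketch of the SSRF and viewport checks, using only Node's built-in net and dns modules. isPrivateAddress, assertPublicUrl, and validateViewport are hypothetical names, and the blocked ranges and 320 to 3840 pixel bounds are assumptions rather than the API's actual rules.

```javascript
// SSRF and input checks using only Node built-ins.
// Assumptions: function names, blocked ranges, and viewport bounds are
// illustrative, not the API's actual rules.
const net = require('node:net');
const dns = require('node:dns/promises');

// IPv4 ranges treated as internal: [network address, prefix length]
const BLOCKED_V4 = [
  ['0.0.0.0', 8], ['10.0.0.0', 8], ['100.64.0.0', 10], ['127.0.0.0', 8],
  ['169.254.0.0', 16], ['172.16.0.0', 12], ['192.168.0.0', 16],
];

function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, part) => acc * 256 + Number(part), 0);
}

function inCidr(ip, base, bits) {
  const mask = (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

function isPrivateAddress(ip) {
  if (net.isIPv4(ip)) {
    return BLOCKED_V4.some(([base, bits]) => inCidr(ip, base, bits));
  }
  if (net.isIPv6(ip)) {
    const lower = ip.toLowerCase();
    // IPv4-mapped addresses, dotted (::ffff:127.0.0.1) or hex (::ffff:7f00:1)
    const dotted = lower.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
    if (dotted) return isPrivateAddress(dotted[1]);
    const hex = lower.match(/^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/);
    if (hex) {
      const hi = parseInt(hex[1], 16);
      const lo = parseInt(hex[2], 16);
      return isPrivateAddress(`${hi >> 8}.${hi & 255}.${lo >> 8}.${lo & 255}`);
    }
    return lower === '::' || lower === '::1' ||
      /^f[cd][0-9a-f]{2}:/.test(lower) || // fc00::/7 unique local
      /^fe[89ab][0-9a-f]:/.test(lower);   // fe80::/10 link-local
  }
  return true; // not an IP literal: treat as unsafe
}

// Throws unless the URL is http(s) and every resolved address is public
async function assertPublicUrl(rawUrl) {
  const url = new URL(rawUrl); // throws on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error('Only http and https URLs are allowed');
  }
  const host = url.hostname.replace(/^\[|\]$/g, ''); // strip IPv6 brackets
  const addresses = net.isIP(host)
    ? [{ address: host }]
    : await dns.lookup(host, { all: true });
  if (addresses.some(({ address }) => isPrivateAddress(address))) {
    throw new Error('URL resolves to a private or reserved address');
  }
  return url;
}

function validateViewport({ width = 1280, height = 720 } = {}) {
  const inBounds = (n) => Number.isInteger(n) && n >= 320 && n <= 3840;
  if (!inBounds(width) || !inBounds(height)) {
    throw new Error('width and height must be integers between 320 and 3840');
  }
  return { width, height };
}
```

Two caveats apply to a check like this. Chromium resolves the hostname again on its own, so a DNS-rebinding attacker could return a different address the second time. Redirects and subresources the page loads are also not covered; blocking those means re-validating each request inside the browser, for example in a page.route() handler.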

Pricing

The goal was to be 10x cheaper than enterprise screenshot APIs while staying sustainable:

Free: 50 requests/month, no credit card required
Starter ($5/mo): 500 requests/month
Basic ($15/mo): 2,000 requests/month
Pro ($39/mo): 10,000 requests/month

The server costs $12/month regardless of tier, so break-even is three Starter subscribers ($15/month) or a single Basic or Pro subscriber.
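The break-even arithmetic per tier, as a quick check (gross prices only; payment-processing fees are ignored):

```javascript
// How many subscribers of a single tier cover the $12/month droplet.
// Gross prices only; payment-processing fees are ignored.
const SERVER_COST = 12;
const TIERS = { Starter: 5, Basic: 15, Pro: 39 };

function subscribersToBreakEven(price, cost = SERVER_COST) {
  return Math.ceil(cost / price);
}

for (const [tier, price] of Object.entries(TIERS)) {
  console.log(`${tier}: ${subscribersToBreakEven(price)}`);
}
// Starter: 3
// Basic: 1
// Pro: 1
```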

Try It Free

Live demo on the landing page — no signup needed. Enter any URL and see metadata extraction in real time.



What I Learned

Chromium memory is the bottleneck. On a 2GB droplet, Chromium can easily consume 500MB+. The browser pooling pattern is essential — launching a new browser per request would crash the server under any real load.
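A further safeguard, offered here as a suggestion rather than something this setup is described as doing, is to cap how many pages render at once so a burst of requests can't push Chromium past the droplet's RAM. Below is a minimal promise-based semaphore; the Semaphore class and the limit of 2 are hypothetical choices to tune against real memory measurements.

```javascript
// Minimal promise-based semaphore (hypothetical helper, not from the post).
// Wrap each render in limiter.run(...) so at most `max` run at once.
class Semaphore {
  constructor(max) {
    this.max = max;
    this.active = 0;
    this.waiting = [];
  }

  async acquire() {
    if (this.active < this.max) {
      this.active++;
      return;
    }
    // Wait until release() hands this caller a slot
    await new Promise((resolve) => this.waiting.push(resolve));
  }

  release() {
    const next = this.waiting.shift();
    if (next) {
      next(); // pass the slot straight to the next waiter; active is unchanged
    } else {
      this.active--;
    }
  }

  async run(fn) {
    await this.acquire();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}

// Assumption: 2 concurrent renders on a 2 GB droplet; tune from measurements
const renderLimiter = new Semaphore(2);
```

Each screenshot or PDF job would then go through renderLimiter.run(async () => { ... }), so extra requests queue in Node instead of opening more Chromium pages.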

SQLite is underrated for APIs. In WAL mode, readers don't block the writer and the writer doesn't block readers, while writes are still serialized one at a time. For an API doing mostly reads (rate limit checks) with occasional writes (usage logging), it's perfect. No network latency, zero config, zero maintenance.

Stealth matters. Many sites detect headless browsers and serve different content (or block them entirely). Setting a realistic user agent, locale, and timezone gets past most basic detection. Playwright's context isolation means each request looks like a fresh browser session.
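A sketch of the context options involved. userAgent, locale, timezoneId, and viewport are real browser.newContext() options in Playwright (locale also sets the Accept-Language header); the specific user-agent string, locale, and timezone below are placeholder assumptions, not the values this API uses. Headless Chromium's default user agent contains "HeadlessChrome", one of the easiest signals for a site to check.

```javascript
// Sketch: per-request context options with a realistic fingerprint.
// The UA string, locale, and timezone are placeholder assumptions; keeping
// the UA's Chrome version close to the bundled Chromium avoids mismatches.
const DEFAULT_UA =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36';

function stealthContextOptions({ width = 1280, height = 720 } = {}) {
  return {
    userAgent: DEFAULT_UA,          // replaces the default HeadlessChrome UA
    locale: 'en-US',                // navigator.language and Accept-Language
    timezoneId: 'America/New_York', // timezone reported to page scripts
    viewport: { width, height },
  };
}
```

The result can be passed straight to browser.newContext(stealthContextOptions({ width: 1280, height: 720 })).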

Questions about the architecture? I'd love to hear about alternative approaches. Check out the API or get a free website audit.
