API Reference

The webreadr API is a JSON-over-HTTP service. The base URL is https://webreadr.com. Every request and response body is JSON. Authenticate with a scoped API key issued from your dashboard.

Overview

webreadr follows a fetch-render-return-everything model. The rawHtml field (the post-JavaScript DOM when a browser engine is used) is always returned on success and never transformed. Two additive blocks come alongside it: structured (the JSON-LD, OpenGraph and embedded state the page already ships) and network (XHR/fetch responses captured during render, often the only way to recover data from SPAs). The lossy derived formats (markdown, text, html) are opt-in.

Authentication

Pass your key as a bearer token. Keys look like wr_<prefix>_<secret>. The full secret is shown exactly once, at creation; only its hash and prefix are stored. Revoking a key takes effect immediately (subsequent use returns 401).

Authorization: Bearer YOUR_API_KEY

The key-management endpoints (/v1/keys) and /v1/usage are gated by your logged-in browser session instead, so issue and revoke keys from the dashboard.
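As a minimal sketch of the bearer-token scheme above, the following Python builds (but does not send) an authenticated JSON request using only the standard library. The helper names build_request and key_prefix are hypothetical, and the prefix split assumes the <prefix> part itself contains no underscore.

```python
import json
import urllib.request

BASE_URL = "https://webreadr.com"  # base URL stated in this reference


def build_request(path: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def key_prefix(api_key: str) -> str:
    """Return the wr_<prefix> part of a key shaped like wr_<prefix>_<secret>.

    Assumption: the prefix contains no underscore; the secret may.
    """
    scheme, prefix, _secret = api_key.split("_", 2)
    if scheme != "wr" or not prefix:
        raise ValueError("expected a key shaped like wr_<prefix>_<secret>")
    return f"{scheme}_{prefix}"
```

Logging key_prefix(api_key) rather than the full key keeps the secret out of logs while still identifying which key made a call.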

Rate limits & quotas

Every request is metered per key. The free tier allows 20 requests/minute, 5,000 requests/month and 2 concurrent crawls. Each response carries X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset. Once you exceed the limit, you get 429 rate_limited with a Retry-After header. The limiter fails open: if our counter store is down, requests are allowed rather than rejected.
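A client can use these headers to back off politely. The sketch below is illustrative only: it assumes Retry-After carries a number of seconds (the header can also be an HTTP date, which this code treats as unparseable and replaces with a default), and the function names are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str],
                default: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response is not a 429."""
    if status != 429:
        return None
    try:
        # Assumption: Retry-After is a delay in seconds.
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        # Header missing or not numeric: fall back to a fixed delay.
        return default


def requests_remaining(headers: Mapping[str, str]) -> Optional[int]:
    """Parse X-RateLimit-Remaining, or None when absent or malformed."""
    value = headers.get("X-RateLimit-Remaining")
    return int(value) if value is not None and value.isdigit() else None
```

Note that a plain dict lookup is case-sensitive; HTTP libraries that expose case-insensitive header mappings avoid that pitfall.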

Scrape options

These fields appear in /v1/scrape, in /v1/crawl's scrapeOptions, and in /v1/extract.

Field	Type	Default	Description
formats	string[]	["rawHtml","structured","metadata"]	Any of rawHtml, html, markdown, text, links, metadata, screenshot, structured, network. rawHtml is always returned on success; markdown/text/html are opt-in and lossy.
engine	string	auto	auto walks the ladder for you, or pin a specific strength: level1 (fastest) through level6 (strongest). Higher levels are more resilient against blocking but slower.
onlyMainContent	bool	true	Readability extraction; only affects the lossy formats, never touches rawHtml/structured/network.
timeout	int (s)	45	Per-page timeout.
waitFor	int (ms)	0	Extra settle time (browser engines). Engines already auto-scroll, dismiss overlays and wait for network idle, so you rarely need this.
waitForSelector	string	null	CSS selector to wait for (browser engines).
useProxy	bool	true	Allow the proxied levels when walking the ladder in auto mode.
country	string	us	Preferred country for the proxied levels.
headers	object	{}	Extra request headers.
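To catch typos before a request is sent, the table above can be mirrored client-side. This is a hedged sketch: the defaults and allowed values are copied from the table, but the scrape_options helper is hypothetical and the server remains the authority on what it accepts.

```python
import copy

ALLOWED_FORMATS = {"rawHtml", "html", "markdown", "text", "links",
                   "metadata", "screenshot", "structured", "network"}
ENGINES = {"auto"} | {f"level{n}" for n in range(1, 7)}

# Defaults as listed in the options table.
DEFAULTS = {
    "formats": ["rawHtml", "structured", "metadata"],
    "engine": "auto",
    "onlyMainContent": True,
    "timeout": 45,          # seconds
    "waitFor": 0,           # milliseconds
    "waitForSelector": None,
    "useProxy": True,
    "country": "us",
    "headers": {},
}


def scrape_options(**overrides) -> dict:
    """Merge overrides into the documented defaults, rejecting obvious mistakes."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    options = copy.deepcopy(DEFAULTS)
    options.update(overrides)
    bad_formats = set(options["formats"]) - ALLOWED_FORMATS
    if bad_formats:
        raise ValueError(f"unknown format(s): {sorted(bad_formats)}")
    if options["engine"] not in ENGINES:
        raise ValueError(f"unknown engine: {options['engine']}")
    return options
```

For example, scrape_options(engine="level3", formats=["rawHtml", "markdown"]) returns a full options dict with the remaining fields left at their defaults.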

POST /v1/scrape

Single URL → the fully rendered page. Synchronous.

Request

POST /v1/scrape
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "url": "https://example.com/article",
  "formats": ["rawHtml", "structured", "network", "metadata"],
  "engine": "auto",
  "onlyMainContent": true,
  "timeout": 45
}

Response 200

{
  "success": true,
  "data": {
    "url": "https://example.com/article",
    "finalUrl": "https://example.com/article",
    "rawHtml": "<!doctype html>...",        // post-JS DOM, always present
    "structured": {
      "jsonLd": [{ "@type": "Product", "offers": { "price": "39.97" } }],
      "openGraph": { "og:title": "..." },
      "embeddedState": { "__NEXT_DATA__": { } }
    },
    "network": [
      { "url": "https://api.site.com/products", "status": 200,
        "contentType": "application/json", "body": "{...}" }
    ],
    "metadata": { "title": "...", "description": "...", "canonical": "..." },
    "markdown": null,                          // opt-in, explicitly lossy
    "screenshot": null,
    "engineUsed": "level6",
    "blocked": false,
    "status": 200,
    "elapsed": 4.1
  }
}
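Once a response like the one above is decoded, the structured and network blocks are usually the quickest route to the data. The sketch below works on the "data" object of an already-parsed response; the helper names are hypothetical, and it assumes network bodies arrive as JSON strings and that each JSON-LD item has a plain string "@type".

```python
import json


def json_ld_of_type(data: dict, type_name: str) -> list:
    """Return JSON-LD items whose @type equals type_name."""
    items = (data.get("structured") or {}).get("jsonLd") or []
    return [item for item in items if item.get("@type") == type_name]


def json_network_bodies(data: dict) -> list:
    """Return (url, parsed_body) pairs for captured responses that hold JSON."""
    parsed = []
    for entry in data.get("network") or []:
        if "json" not in (entry.get("contentType") or ""):
            continue
        try:
            parsed.append((entry["url"], json.loads(entry["body"])))
        except (KeyError, TypeError, ValueError):
            # Missing fields or a body that is not valid JSON: skip it.
            continue
    return parsed
```

Both helpers tolerate missing blocks, so they can be called even when structured or network was not requested in formats.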

POST /v1/crawl

Recursively crawl a site. Returns a job id immediately; a worker processes the job asynchronously. Poll GET /v1/crawl/{jobId} for progress and results; the offset and limit query parameters page through the results.

Request

POST /v1/crawl
{
  "url": "https://example.com",
  "maxDepth": 3,
  "limit": 200,
  "includePaths": ["^/blog/"],
  "excludePaths": ["^/tag/", "\\?"],
  "sameDomainOnly": true,
  "scrapeOptions": { "formats": ["structured", "metadata"] }
}

// → { "success": true, "jobId": "c_3f9a1c8e",
//     "statusUrl": "/v1/crawl/c_3f9a1c8e" }
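The includePaths and excludePaths values in the request above look like regular expressions; note that "\\?" in JSON is the regex \? (a literal question mark). The sketch below previews which URLs such patterns would keep. It rests on assumptions this reference does not confirm: that patterns are searched against the path plus query string, and that excludes take precedence over includes. The would_crawl helper is hypothetical.

```python
import re
from urllib.parse import urlsplit


def would_crawl(url: str, include: list, exclude: list) -> bool:
    """Local preview of include/exclude filtering (assumed semantics)."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # Assumption: any matching exclude pattern rejects the URL.
    if any(re.search(pattern, target) for pattern in exclude):
        return False
    # Assumption: an empty include list admits everything not excluded.
    return not include or any(re.search(pattern, target) for pattern in include)
```

With include ["^/blog/"] and exclude ["^/tag/", r"\?"], /blog/post passes, while /blog/post?page=2, /tag/x and /about are filtered out.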

Poll the job

GET /v1/crawl/c_3f9a1c8e?offset=0&limit=50

{
  "jobId": "c_3f9a1c8e",
  "status": "running",      // queued | running | completed | failed
  "total": 200,
  "completed": 47,
  "discovered": 138,
  "data": []
}
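A polling loop over the job endpoint might look like the sketch below. To keep it independent of any HTTP library, the caller supplies fetch_page(offset, limit), a hypothetical function that performs the GET and returns the decoded JSON. It assumes "data" holds the slice of results starting at offset, which this reference implies but does not spell out.

```python
import time
from typing import Callable


def collect_crawl(fetch_page: Callable[[int, int], dict],
                  page_size: int = 50,
                  poll_interval: float = 2.0,
                  sleep: Callable[[float], None] = time.sleep,
                  max_polls: int = 1000) -> list:
    """Poll a crawl job until it completes and return every result."""
    results = []
    for _ in range(max_polls):
        # Ask for the next page, starting after what we already hold.
        page = fetch_page(len(results), page_size)
        batch = page.get("data") or []
        results.extend(batch)
        status = page.get("status")
        if status == "failed":
            raise RuntimeError("crawl job failed")
        # A short page on a completed job means nothing is left to fetch.
        if status == "completed" and len(batch) < page_size:
            return results
        if not batch:
            sleep(poll_interval)  # nothing new yet: wait before polling again
    raise TimeoutError("crawl did not complete within max_polls polls")
```

Passing sleep as a parameter makes the loop easy to test with a fake clock; in normal use the default time.sleep applies.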

POST /v1/map

Fast URL discovery, no full scrape. Sources: sitemap.xml, robots.txt and a shallow link harvest. The optional search filters discovered URLs by keyword.

Request

POST /v1/map
{ "url": "https://example.com", "limit": 5000, "search": "blog" }

// → { "success": true, "count": 3,
//     "links": ["https://example.com/", "https://example.com/blog/x"],
//     "sources": { "sitemap": 2, "links": 1 } }

POST /v1/extract

URL(s) plus a JSON schema and/or natural-language prompt → validated structured JSON. The page is scraped through the ladder, the cleaned content and schema are fed to an LLM (free local model first, cloud fallback), and the output is validated against your schema. If validation fails, the service makes one repair attempt; if that also fails, the raw text is returned with an error flag.

Request

POST /v1/extract
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract the product name, price, and availability.",
  "schema": {
    "type": "object",
    "properties": {
      "name":  { "type": "string" },
      "price": { "type": "number" },
      "inStock": { "type": "boolean" }
    },
    "required": ["name", "price"]
  }
}

// → { "success": true,
//     "data": { "name": "Widget Pro", "price": 49.99, "inStock": true },
//     "meta": { "model": "lmstudio:qwen3-8b", "validated": true } }
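Because a failed validation can still return raw text, it may be worth re-checking the result on the client. The sketch below is a deliberately small checker covering only "required" and top-level "type" keywords; it is not a full JSON Schema validator, and the schema_problems helper is hypothetical.

```python
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def schema_problems(data: dict, schema: dict) -> list:
    """List problems with data under a flat object schema (empty list = OK)."""
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # keyword not covered by this sketch
        value = data[name]
        # bool is a subclass of int in Python, so reject it for "number".
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if not isinstance(value, expected) or is_stray_bool:
            problems.append(f"{name}: expected {type_name}")
    return problems
```

For production use, a complete JSON Schema validator library is the safer choice; this sketch only illustrates the shape of the check.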

POST /v1/keys

Issue a scoped key (requires a logged-in session). An empty scopes list means all scopes. A key scoped ["scrape"] that calls /v1/crawl is rejected with 403 forbidden.

Request

POST /v1/keys        (cookie session required)
{ "name": "production", "scopes": ["scrape", "crawl"] }

// → secret shown ONCE, never again:
// { "success": true,
//   "key": "wr_92f2176eaa_Y_eLXg5q…",
//   "apiKey": { "id": "fb39…", "keyPrefix": "wr_92f2176eaa",
//     "scopes": ["scrape","crawl"], "active": true } }

GET /v1/keys: list your keys (prefix + metadata only).
DELETE /v1/keys/{id}: revoke a key; subsequent use returns 401.
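The scope rule described in this section (an empty list grants everything, otherwise the endpoint's scope must be listed) can be expressed as a short check. This is a sketch under an assumption: that a scope name equals the path segment after /v1/, as the "scrape" and "crawl" examples suggest. Both function names are hypothetical.

```python
def scope_for_path(path: str) -> str:
    """Derive the scope name from a path such as /v1/crawl/c_3f9a1c8e.

    Assumption: the scope is the segment right after /v1/.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "v1":
        raise ValueError(f"not a /v1/ path: {path}")
    return parts[1]


def key_allows(scopes: list, path: str) -> bool:
    """True if a key with these scopes may call the given path."""
    # An empty scopes list means all scopes.
    return not scopes or scope_for_path(path) in scopes
```

Under these assumptions, key_allows(["scrape"], "/v1/crawl") is False, which corresponds to the 403 forbidden case.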

GET /v1/usage

Your usage across all keys: plan limits, monthly quota remaining, and recent requests. Cookie-gated; view it on the usage page.

Errors

All errors share one envelope:

{
  "success": false,
  "error": "blocked",                 // machine code
  "message": "All engines exhausted", // human readable
  "detail": {}
}

Machine codes: invalid_url, bad_options, blocked, timeout, not_found, schema_validation_failed, unauthorized, forbidden, rate_limited, internal.
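Since every error shares this envelope, a client can convert it into one exception type. In the sketch below the WebreadrError class and check function are hypothetical, and the choice of which codes count as retryable (rate_limited, timeout, internal) is an assumption, not something this reference states.

```python
# Assumption: these codes describe transient conditions worth retrying.
RETRYABLE_CODES = {"rate_limited", "timeout", "internal"}


class WebreadrError(Exception):
    """Raised when a response envelope has success set to false."""

    def __init__(self, code: str, message: str, detail=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.detail = detail or {}

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def check(payload: dict) -> dict:
    """Return the payload unchanged on success, raise WebreadrError otherwise."""
    if payload.get("success"):
        return payload
    raise WebreadrError(
        payload.get("error", "internal"),
        payload.get("message", ""),
        payload.get("detail"),
    )
```

Callers can then branch on err.code for specific handling and on err.retryable to decide whether to try again.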
