API Reference

The webreadr API is a JSON-over-HTTP service. The base URL is https://webreadr.com. Every request and response body is JSON. Authenticate with a scoped API key issued from your dashboard.

Overview

webreadr follows a fetch-render-return-everything model. The rawHtml field (the post-JavaScript DOM when a browser engine is used) is always returned on success and is never transformed. Two additive blocks come alongside it: structured (the JSON-LD, OpenGraph and embedded state the page already ships) and network (XHR/fetch responses captured during render, often the only way to recover data from SPAs). The lossy derived formats (html, markdown, text) are opt-in.

Authentication

Pass your key as a bearer token. Keys look like wr_<prefix>_<secret> and the full secret is shown exactly once at creation: only its hash and prefix are stored. Revoking a key takes effect immediately (subsequent use returns 401).

Authorization: Bearer YOUR_API_KEY

The key-management endpoints (/v1/keys) and /v1/usage are gated by your logged-in browser session instead, so issue and revoke keys from the dashboard.
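As a minimal sketch of the bearer scheme described above, the Python below builds (but does not send) an authenticated JSON request using only the standard library. The helper names build_request and key_prefix are illustrative and not part of any webreadr SDK. key_prefix assumes the prefix segment contains no underscore, which matches the wr_<prefix>_<secret> shape shown here.

```python
import json
import urllib.request

BASE_URL = "https://webreadr.com"  # base URL stated in this reference


def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def key_prefix(api_key: str) -> str:
    """Return the non-secret wr_<prefix> part of a key, e.g. for logging.

    Assumption: the prefix has no underscore; the secret may contain some.
    """
    scheme, prefix, _secret = api_key.split("_", 2)
    return f"{scheme}_{prefix}"
```

To send the request, pass the result to urllib.request.urlopen and decode the response body with json.load. Log key_prefix(key) rather than the key itself, since the full secret cannot be recovered after creation.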

Rate limits & quotas

Every request is metered per key. The free tier allows 20 requests/minute, 5,000 requests/month and 2 concurrent crawls. Each response carries X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset. Requests over the limit receive 429 rate_limited with a Retry-After header. The limiter fails open: if our counter store is down, requests are allowed rather than rejected.
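A client can use these headers to back off politely. The sketch below is one way to do that in Python. It assumes Retry-After carries a number of seconds (HTTP also permits a date there, which this sketch does not parse), and it does not interpret X-RateLimit-Reset, whose format is not specified in this reference. The function names are illustrative.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str],
                fallback: float = 1.0) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None if no retry is needed.

    Assumption: Retry-After is expressed in seconds. Anything unparsable
    (including a missing header) falls back to `fallback`.
    """
    if status != 429:
        return None
    try:
        return max(0.0, float(headers.get("Retry-After", "")))
    except ValueError:
        return fallback


def remaining_budget(headers: Mapping[str, str]) -> Optional[int]:
    """Read X-RateLimit-Remaining as an int, or None if absent or malformed."""
    value = headers.get("X-RateLimit-Remaining")
    return int(value) if value is not None and value.isdigit() else None
```

A plain dict lookup is case-sensitive. Header objects returned by Python HTTP clients are usually case-insensitive, so passing those directly is the safer choice.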

Scrape options

These fields appear in /v1/scrape, in /v1/crawl's scrapeOptions, and in /v1/extract.

| Field | Type | Default | Description |
|---|---|---|---|
| formats | string[] | ["rawHtml","structured","metadata"] | Any of rawHtml, html, markdown, text, links, metadata, screenshot, structured, network. rawHtml is always returned on success; markdown/text/html are opt-in and lossy. |
| engine | string | auto | auto walks the ladder for you, or pin a specific strength: level1 (fastest) through level6 (strongest). Higher levels are more resilient against blocking but slower. |
| onlyMainContent | bool | true | Readability extraction; only affects the lossy formats, never touches rawHtml/structured/network. |
| timeout | int (s) | 45 | Per-page timeout. |
| waitFor | int (ms) | 0 | Extra settle time (browser engines). Engines already auto-scroll, dismiss overlays and wait for network idle, so you rarely need this. |
| waitForSelector | string | null | CSS selector to wait for (browser engines). |
| useProxy | bool | true | Allow the proxied levels when walking the ladder in auto mode. |
| country | string | us | Preferred country for the proxied levels. |
| headers | object | {} | Extra request headers. |
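Catching a misspelled option before sending a request can save a round trip. The following is a hedged client-side sketch that checks an options dict against the field names, formats and engine levels listed above. It is not the server's own validation, and whether the server rejects unknown fields is an assumption this sketch does not rely on.

```python
ALLOWED_FORMATS = {
    "rawHtml", "html", "markdown", "text", "links",
    "metadata", "screenshot", "structured", "network",
}
# "auto" plus level1 .. level6, as listed for the engine field.
ALLOWED_ENGINES = {"auto"} | {f"level{n}" for n in range(1, 7)}
KNOWN_FIELDS = {
    "formats", "engine", "onlyMainContent", "timeout", "waitFor",
    "waitForSelector", "useProxy", "country", "headers",
}


def validate_scrape_options(options: dict) -> dict:
    """Return a copy of `options`, or raise ValueError on an obvious mistake."""
    unknown = sorted(set(options) - KNOWN_FIELDS)
    if unknown:
        raise ValueError(f"unknown option(s): {unknown}")
    bad_formats = sorted(set(options.get("formats", [])) - ALLOWED_FORMATS)
    if bad_formats:
        raise ValueError(f"unknown format(s): {bad_formats}")
    engine = options.get("engine", "auto")
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return dict(options)
```

Omitted fields are left out of the result so that the server-side defaults from the table apply.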

POST /v1/scrape

Single URL → the fully rendered page. Synchronous.

Request
POST /v1/scrape
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "url": "https://example.com/article",
  "formats": ["rawHtml", "structured", "network", "metadata"],
  "engine": "auto",
  "onlyMainContent": true,
  "timeout": 45
}

Response 200
{
  "success": true,
  "data": {
    "url": "https://example.com/article",
    "finalUrl": "https://example.com/article",
    "rawHtml": "<!doctype html>...",        // post-JS DOM, always present
    "structured": {
      "jsonLd": [{ "@type": "Product", "offers": { "price": "39.97" } }],
      "openGraph": { "og:title": "..." },
      "embeddedState": { "__NEXT_DATA__": { } }
    },
    "network": [
      { "url": "https://api.site.com/products", "status": 200,
        "contentType": "application/json", "body": "{...}" }
    ],
    "metadata": { "title": "...", "description": "...", "canonical": "..." },
    "markdown": null,                          // opt-in, explicitly lossy
    "screenshot": null,
    "engineUsed": "level6",
    "blocked": false,
    "status": 200,
    "elapsed": 4.1
  }
}
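Because the network block is often where SPA data lives, a common first step is to pull out the JSON payloads it captured. The sketch below does that for the data object of a scrape response. It assumes each entry's body is a JSON-encoded string, as the sample response suggests, and it silently skips entries that do not parse. The function name is illustrative.

```python
import json


def json_network_bodies(data: dict) -> list:
    """Return (url, parsed_body) pairs for captured responses that contain JSON.

    `data` is the "data" object of a /v1/scrape response.
    Assumption: entry["body"] is a JSON-encoded string.
    """
    results = []
    for entry in data.get("network") or []:
        content_type = str(entry.get("contentType", ""))
        if not content_type.startswith("application/json"):
            continue
        try:
            results.append((entry["url"], json.loads(entry["body"])))
        except (KeyError, TypeError, ValueError):
            # Missing field, non-string body, or malformed/truncated JSON.
            continue
    return results
```

The structured block needs no such parsing, since jsonLd, openGraph and embeddedState already arrive as JSON values.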

POST /v1/crawl

Recursively crawl a site. The call returns a job id immediately and a worker processes the job in the background. Poll GET /v1/crawl/{jobId} for progress and results; the offset and limit query parameters page through the results.

Request
POST /v1/crawl
{
  "url": "https://example.com",
  "maxDepth": 3,
  "limit": 200,
  "includePaths": ["^/blog/"],
  "excludePaths": ["^/tag/", "\\?"],
  "sameDomainOnly": true,
  "scrapeOptions": { "formats": ["structured", "metadata"] }
}

// → { "success": true, "jobId": "c_3f9a1c8e",
//     "statusUrl": "/v1/crawl/c_3f9a1c8e" }

Poll the job
GET /v1/crawl/c_3f9a1c8e?offset=0&limit=50

{
  "jobId": "c_3f9a1c8e",
  "status": "running",      // queued | running | completed | failed
  "total": 200,
  "completed": 47,
  "discovered": 138,
  "data": []
}
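The poll-then-page flow can be sketched as two small Python helpers. Here fetch is a hypothetical callable, fetch(job_id, offset, limit), that performs GET /v1/crawl/{jobId}?offset=...&limit=... and returns the decoded JSON. The sketch assumes that data holds the requested page of results and that a short page marks the end; neither is confirmed by this reference.

```python
import time
from typing import Callable, Iterator

TERMINAL_STATES = {"completed", "failed"}  # from the status values listed above


def wait_for_crawl(fetch: Callable[[str, int, int], dict], job_id: str,
                   poll_seconds: float = 2.0, max_polls: int = 300,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal state and return the last job body."""
    for _ in range(max_polls):
        job = fetch(job_id, 0, 1)  # small page: only the status is needed here
        if job.get("status") in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"crawl {job_id} did not finish after {max_polls} polls")


def iter_results(fetch: Callable[[str, int, int], dict], job_id: str,
                 page_size: int = 50) -> Iterator[dict]:
    """Yield every result by walking offset/limit pages.

    Assumption: a page shorter than page_size means there are no more results.
    """
    offset = 0
    while True:
        items = fetch(job_id, offset, page_size).get("data") or []
        yield from items
        offset += len(items)
        if len(items) < page_size:
            break
```

Passing sleep as a parameter keeps the loop testable without real delays. A caller would typically check job["status"] == "completed" before iterating results.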

POST /v1/map

Fast URL discovery without a full scrape. Sources: sitemap.xml, robots.txt and a shallow link harvest. The optional search field filters discovered URLs by keyword.

Request
POST /v1/map
{ "url": "https://example.com", "limit": 5000, "search": "blog" }

// → { "success": true, "count": 3,
//     "links": ["https://example.com/", "https://example.com/blog/x"],
//     "sources": { "sitemap": 2, "links": 1 } }

POST /v1/extract

URL(s) plus a JSON schema and/or natural-language prompt → validated structured JSON. The page is scraped through the ladder, the cleaned content and schema are fed to an LLM (free local model first, cloud fallback), and the output is validated against your schema. If validation fails, one repair retry is attempted; if that also fails, the raw text is returned with an error flag.

Request
POST /v1/extract
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract the product name, price, and availability.",
  "schema": {
    "type": "object",
    "properties": {
      "name":  { "type": "string" },
      "price": { "type": "number" },
      "inStock": { "type": "boolean" }
    },
    "required": ["name", "price"]
  }
}

// → { "success": true,
//     "data": { "name": "Widget Pro", "price": 49.99, "inStock": true },
//     "meta": { "model": "lmstudio:qwen3-8b", "validated": true } }
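The validate, repair once, then fall back flow described for this endpoint can be illustrated with a short Python sketch. This is not the service's implementation. generate is a hypothetical callable standing in for the model call, check covers only a tiny subset of JSON Schema (top-level required fields and primitive types), and the "raw" field name in the failure result is an assumption.

```python
import json

_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool,
             "object": dict, "array": list}


def check(data, schema: dict) -> list:
    """Return a list of problems; empty means `data` passes this minimal check."""
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in data]
    for name, spec in schema.get("properties", {}).items():
        expected = _PY_TYPES.get(spec.get("type"))
        if name not in data or expected is None:
            continue
        value = data[name]
        # bool is a subclass of int in Python, so exclude it for "number".
        is_bool_as_number = spec["type"] == "number" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_number:
            errors.append(f"field {name} is not of type {spec['type']}")
    return errors


def extract_with_repair(generate, schema: dict) -> dict:
    """Call generate(feedback) at most twice: first attempt, then one repair."""
    feedback, raw = None, ""
    for _attempt in range(2):
        raw = generate(feedback)
        try:
            data = json.loads(raw)
        except ValueError:
            feedback = ["output was not valid JSON"]
            continue
        feedback = check(data, schema)
        if not feedback:
            return {"success": True, "data": data}
    return {"success": False, "error": "schema_validation_failed", "raw": raw}
```

Feeding the validation errors back into the second call gives the model a chance to fix exactly what failed, which is the point of a single repair retry.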

POST /v1/keys

Issue a scoped key (requires a logged-in session). An empty scopes list means all scopes. A key scoped to ["scrape"] that calls /v1/crawl is rejected with 403 forbidden.

Request
POST /v1/keys        (cookie session required)
{ "name": "production", "scopes": ["scrape", "crawl"] }

// → secret shown ONCE, never again:
// { "success": true,
//   "key": "wr_92f2176eaa_Y_eLXg5q…",
//   "apiKey": { "id": "fb39…", "keyPrefix": "wr_92f2176eaa",
//     "scopes": ["scrape","crawl"], "active": true } }
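The scope rule (an empty list grants everything, otherwise the endpoint's scope must be listed) is small enough to express directly. In the Python sketch below, deriving the scope name from the path segment after /v1/ is an assumption made for illustration. Only "scrape" and "crawl" appear as scope names in this reference.

```python
def scope_for_path(path: str) -> str:
    """Guess the scope for a path such as /v1/crawl/c_3f9a1c8e -> "crawl".

    Assumption: scope names mirror the first path segment after /v1/.
    """
    parts = path.strip("/").split("/")
    return parts[1] if len(parts) > 1 else ""


def key_allows(scopes: list, path: str) -> bool:
    """True if a key with these scopes may call `path`; empty scopes allow all."""
    return not scopes or scope_for_path(path) in scopes
```

A client can run this check before a call to avoid a predictable 403 forbidden, for example when the same code path is shared by keys with different scopes.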

GET /v1/usage

Your usage across all keys: plan limits, monthly quota remaining, and recent requests. Cookie-gated; view it on the usage page.

Errors

All errors share one envelope:

{
  "success": false,
  "error": "blocked",                 // machine code
  "message": "All engines exhausted", // human readable
  "detail": {}
}

Machine codes: invalid_url, bad_options, blocked, timeout, not_found, schema_validation_failed, unauthorized, forbidden, rate_limited, internal.
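Since every failure shares this envelope, a client can funnel all responses through one function. The Python sketch below turns an error envelope into an exception that carries the machine code. The class name and the choice of which codes count as retryable (rate_limited, timeout, internal) are assumptions for illustration, not documented behavior.

```python
# Assumption: these codes describe transient conditions worth retrying.
RETRYABLE_CODES = {"rate_limited", "timeout", "internal"}


class WebreadrError(Exception):
    """Raised when a response body has "success": false."""

    def __init__(self, code: str, message: str, detail: dict):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.detail = detail

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def unwrap(body: dict) -> dict:
    """Return `body` unchanged on success, otherwise raise WebreadrError."""
    if body.get("success") is True:
        return body
    raise WebreadrError(
        body.get("error", "internal"),
        body.get("message", ""),
        body.get("detail") or {},
    )
```

Branching on error.code rather than on the human-readable message keeps client logic stable if the message wording changes.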
