API Reference
The webreadr API is a JSON-over-HTTP service. The base URL is https://webreadr.com. Every request and response body is JSON. Authenticate with a scoped API key issued from your dashboard.
Overview
webreadr follows a fetch-render-return-everything model. The rawHtml field (the post-JavaScript DOM when a browser engine is used) is always returned and never transformed. Two additive blocks come alongside it: structured (the JSON-LD, OpenGraph and embedded state the page already ships) and network (XHR/fetch responses captured during render, often the only way to recover data from SPAs). The lossy derived formats (markdown, text) are opt-in.
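As an illustration of how the structured and network blocks might be consumed, here is a minimal Python sketch. The field names follow the /v1/scrape example response in this reference; the helper names (`product_prices`, `json_network_bodies`) and the sample body contents are hypothetical.

```python
import json

def product_prices(data: dict) -> list:
    """Collect offer prices from JSON-LD Product entries in `structured`."""
    prices = []
    for item in (data.get("structured") or {}).get("jsonLd") or []:
        offers = item.get("offers")
        # Only the single-object form of "offers" is handled in this sketch.
        if item.get("@type") == "Product" and isinstance(offers, dict):
            if "price" in offers:
                prices.append(offers["price"])
    return prices

def json_network_bodies(data: dict) -> dict:
    """Map each captured URL to its decoded JSON body, skipping other entries."""
    decoded = {}
    for entry in data.get("network") or []:
        if "json" not in (entry.get("contentType") or ""):
            continue
        try:
            decoded[entry["url"]] = json.loads(entry["body"])
        except (KeyError, TypeError, ValueError):
            continue  # missing fields or a body that is not valid JSON
    return decoded

# Made-up sample shaped like the `data` object of a scrape response.
sample = {
    "rawHtml": "<!doctype html>...",
    "structured": {"jsonLd": [{"@type": "Product", "offers": {"price": "39.97"}}]},
    "network": [
        {"url": "https://api.site.com/products", "status": 200,
         "contentType": "application/json", "body": "{\"items\": [1, 2]}"},
    ],
}

print(product_prices(sample))
print(json_network_bodies(sample))
```

Reading structured and network first and falling back to rawHtml only when both are empty usually keeps client code small, since those blocks are already machine-readable.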
Authentication
Pass your key as a bearer token. Keys look like wr_<prefix>_<secret>. The full secret is shown exactly once, at creation; only its hash and prefix are stored. Revoking a key takes effect immediately (subsequent use returns 401).
Authorization: Bearer YOUR_API_KEY
The key-management endpoints (/v1/keys) and /v1/usage are gated by your logged-in browser session instead, so issue and revoke keys from the dashboard.
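A minimal sketch of attaching the bearer token with Python's standard library, assuming the base URL and header format given above. The helper name `build_request` is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "https://webreadr.com"

def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request carrying the API key as a bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("/v1/scrape", {"url": "https://example.com/article"}, "YOUR_API_KEY")
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```

Keeping the key out of source code (for example, reading it from an environment variable) is advisable because the full secret cannot be displayed again after creation.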
Rate limits & quotas
Every request is metered per key. The free tier allows 20 requests/minute, 5,000 requests/month and 2 concurrent crawls. Each response carries X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset. Requests over the limit receive 429 rate_limited with a Retry-After header. The limiter fails open: if our counter store is down, requests are allowed rather than rejected.
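A client can use these headers to back off politely. The sketch below assumes Retry-After carries a number of seconds (HTTP also permits a date form, which falls back to a default here) and does not interpret X-RateLimit-Reset, since its unit is not specified in this reference. The function names are hypothetical.

```python
from typing import Optional

def retry_delay(status: int, headers: dict, default: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None when the request was not limited."""
    if status != 429:
        return None
    try:
        # Assumption: Retry-After is expressed in seconds.
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default  # header missing or in a non-numeric form

def requests_left(headers: dict) -> Optional[int]:
    """Remaining requests in the current window, if the header is present."""
    try:
        return int(headers.get("X-RateLimit-Remaining"))
    except (TypeError, ValueError):
        return None

# Plain dicts are case-sensitive; real HTTP header objects usually are not.
print(retry_delay(429, {"Retry-After": "7"}))
print(requests_left({"X-RateLimit-Remaining": "3"}))
```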
Scrape options
These fields appear in /v1/scrape, in /v1/crawl's scrapeOptions, and in /v1/extract.
| Field | Type | Default | Description |
|---|---|---|---|
| formats | string[] | ["rawHtml","structured","metadata"] | Any of rawHtml, html, markdown, text, links, metadata, screenshot, structured, network. rawHtml is always returned on success; markdown/text/html are opt-in and lossy. |
| engine | string | auto | auto walks the ladder for you, or pin a specific strength: level1 (fastest) through level6 (strongest). Higher levels are more resilient against blocking but slower. |
| onlyMainContent | bool | true | Readability extraction; only affects the lossy formats, never touches rawHtml/structured/network. |
| timeout | int (s) | 45 | Per-page timeout. |
| waitFor | int (ms) | 0 | Extra settle time (browser engines). Engines already auto-scroll, dismiss overlays and wait for network idle, so you rarely need this. |
| waitForSelector | string | null | CSS selector to wait for (browser engines). |
| useProxy | bool | true | Allow the proxied levels when walking the ladder in auto mode. |
| country | string | us | Preferred country for the proxied levels. |
| headers | object | {} | Extra request headers. |
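The table above can be mirrored client-side to catch typos before a request is sent. This is a sketch, not part of any official SDK: the helper `scrape_options` is hypothetical, and the intermediate engine names level2 through level5 are inferred from the "level1 through level6" wording.

```python
import copy

ALLOWED_FORMATS = {"rawHtml", "html", "markdown", "text", "links",
                   "metadata", "screenshot", "structured", "network"}
# Assumption: engines are named level1..level6, plus "auto".
ENGINES = {"auto"} | {f"level{n}" for n in range(1, 7)}

DEFAULTS = {
    "formats": ["rawHtml", "structured", "metadata"],
    "engine": "auto",
    "onlyMainContent": True,
    "timeout": 45,            # seconds
    "waitFor": 0,             # milliseconds
    "waitForSelector": None,
    "useProxy": True,
    "country": "us",
    "headers": {},
}

def scrape_options(**overrides) -> dict:
    """Merge overrides onto the documented defaults and validate the result."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown option(s): {unknown}")
    options = copy.deepcopy(DEFAULTS)  # avoid sharing mutable defaults
    options.update(overrides)
    bad_formats = sorted(set(options["formats"]) - ALLOWED_FORMATS)
    if bad_formats:
        raise ValueError(f"unknown format(s): {bad_formats}")
    if options["engine"] not in ENGINES:
        raise ValueError(f"unknown engine: {options['engine']}")
    return options

opts = scrape_options(formats=["rawHtml", "markdown"], engine="level3",
                      waitForSelector="#price")
```

Sending only the overrides instead of the merged dictionary would work equally well, since the service applies the same defaults; merging locally simply makes the effective request explicit.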
POST /v1/scrape
Single URL → the fully rendered page. Synchronous.
POST /v1/scrape
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
{
"url": "https://example.com/article",
"formats": ["rawHtml", "structured", "network", "metadata"],
"engine": "auto",
"onlyMainContent": true,
"timeout": 45
}
{
"success": true,
"data": {
"url": "https://example.com/article",
"finalUrl": "https://example.com/article",
"rawHtml": "<!doctype html>...", // post-JS DOM, always present
"structured": {
"jsonLd": [{ "@type": "Product", "offers": { "price": "39.97" } }],
"openGraph": { "og:title": "..." },
"embeddedState": { "__NEXT_DATA__": { } }
},
"network": [
{ "url": "https://api.site.com/products", "status": 200,
"contentType": "application/json", "body": "{...}" }
],
"metadata": { "title": "...", "description": "...", "canonical": "..." },
"markdown": null, // opt-in, explicitly lossy
"screenshot": null,
"engineUsed": "level6",
"blocked": false,
"status": 200,
"elapsed": 4.1
}
}
POST /v1/crawl
Recursively crawl a site. Returns a job id immediately; a worker processes the job in the background. Poll GET /v1/crawl/{jobId} for progress and results (the offset and limit query parameters page through results).
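The polling loop can be sketched as below. It assumes that `data` holds the slice of results starting at `offset`, and that a job is finished once its status is completed or failed; neither detail is spelled out in this reference. The `fetch_page` callable is a hypothetical stand-in for the HTTP call, so the example runs offline against canned pages.

```python
import time
from typing import Callable, Tuple

TERMINAL = {"completed", "failed"}

def poll_crawl(fetch_page: Callable[[int, int], dict], page_size: int = 50,
               interval: float = 2.0, sleep=time.sleep) -> Tuple[str, list]:
    """Page through GET /v1/crawl/{jobId} until the job reaches a terminal state."""
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        items = page.get("data") or []
        results.extend(items)
        offset += len(items)
        # A short page from a finished job means nothing is left to read.
        if page.get("status") in TERMINAL and len(items) < page_size:
            return page["status"], results
        if not items:
            sleep(interval)  # nothing new yet; wait before asking again

# Canned responses standing in for three successive polls.
pages = iter([
    {"status": "running", "data": []},
    {"status": "running", "data": [{"url": "a"}, {"url": "b"}]},
    {"status": "completed", "data": [{"url": "c"}]},
])
offsets = []

def fake_fetch(offset: int, limit: int) -> dict:
    offsets.append(offset)
    return next(pages)

status, results = poll_crawl(fake_fetch, page_size=2, sleep=lambda s: None)
```

In real use, `fetch_page` would issue the GET with `?offset=` and `?limit=` and decode the JSON body; injecting it keeps the loop testable without network access.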
POST /v1/crawl
{
"url": "https://example.com",
"maxDepth": 3,
"limit": 200,
"includePaths": ["^/blog/"],
"excludePaths": ["^/tag/", "\\?"],
"sameDomainOnly": true,
"scrapeOptions": { "formats": ["structured", "metadata"] }
}
// → { "success": true, "jobId": "c_3f9a1c8e",
// "statusUrl": "/v1/crawl/c_3f9a1c8e" }
GET /v1/crawl/c_3f9a1c8e?offset=0&limit=50
{
"jobId": "c_3f9a1c8e",
"status": "running", // queued | running | completed | failed
"total": 200,
"completed": 47,
"discovered": 138,
"data": []
}
POST /v1/map
Fast URL discovery without a full scrape. Sources: sitemap.xml, robots.txt and a shallow link harvest. The optional search field filters discovered URLs by keyword.
POST /v1/map
{ "url": "https://example.com", "limit": 5000, "search": "blog" }
// → { "success": true, "count": 3,
// "links": ["https://example.com/", "https://example.com/blog/x"],
// "sources": { "sitemap": 2, "links": 1 } }
POST /v1/extract
URL(s) plus a JSON schema and/or natural-language prompt → validated structured JSON. The page is scraped through the ladder, the cleaned content and schema are fed to an LLM (free local model first, cloud fallback), and the output is validated against your schema. If validation fails, there is one repair retry; if that also fails, the raw text is returned with an error flag.
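The validate, repair once, then fall back flow described above can be illustrated with a small sketch. This is not the service's implementation: `generate` is a hypothetical stand-in for the model call, `validate` checks only a tiny subset of JSON Schema, and the shape of the failure result is assumed.

```python
import json
from typing import Callable, Optional

TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool}

def validate(value, schema: dict) -> list:
    """Check required keys and primitive property types only."""
    if not isinstance(value, dict):
        return ["top-level value is not an object"]
    errors = [f"missing required key: {k}"
              for k in schema.get("required", []) if k not in value]
    for key, spec in schema.get("properties", {}).items():
        expected = TYPE_MAP.get(spec.get("type"))
        if key not in value or expected is None:
            continue
        v = value[key]
        # bool is a subclass of int in Python, so reject it for "number".
        wrong = not isinstance(v, expected) or (
            spec["type"] == "number" and isinstance(v, bool))
        if wrong:
            errors.append(f"{key} is not of type {spec['type']}")
    return errors

def extract(generate: Callable[[Optional[str]], str], schema: dict) -> dict:
    """First attempt plus one repair retry, then return raw text with an error."""
    feedback, raw = None, ""
    for _attempt in range(2):
        raw = generate(feedback)  # feedback is None on the first attempt
        try:
            parsed = json.loads(raw)
        except ValueError:
            feedback = "output was not valid JSON"
            continue
        errors = validate(parsed, schema)
        if not errors:
            return {"success": True, "data": parsed, "validated": True}
        feedback = "; ".join(errors)
    # Assumed failure shape: the error code comes from the Errors section.
    return {"success": False, "error": "schema_validation_failed", "raw": raw}

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"},
                   "inStock": {"type": "boolean"}},
    "required": ["name", "price"],
}
outputs = iter(['{"name": "Widget Pro"}',
                '{"name": "Widget Pro", "price": 49.99, "inStock": true}'])
repaired = extract(lambda feedback: next(outputs), schema)
failed = extract(lambda feedback: "not json", schema)
```

Callers should therefore check the success flag before reading `data`, because a failed extraction still returns a body.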
POST /v1/extract
{
"urls": ["https://example.com/product/123"],
"prompt": "Extract the product name, price, and availability.",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" },
"inStock": { "type": "boolean" }
},
"required": ["name", "price"]
}
}
// → { "success": true,
// "data": { "name": "Widget Pro", "price": 49.99, "inStock": true },
// "meta": { "model": "lmstudio:qwen3-8b", "validated": true } }
POST /v1/keys
Issue a scoped key (requires a logged-in session). An empty scopes list means all scopes. A key scoped ["scrape"] that calls /v1/crawl is rejected with 403 forbidden.
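The scope rule can be expressed in a few lines. This sketch assumes that the scope name for an endpoint is the path segment after /v1/ (so /v1/crawl needs the crawl scope), which matches the example above but is not stated as a general rule; both function names are hypothetical.

```python
def scope_allows(key_scopes: list, required: str) -> bool:
    """An empty scopes list grants every scope; otherwise membership decides."""
    return not key_scopes or required in key_scopes

def status_for(key_scopes: list, path: str) -> int:
    """Return 200 if the key may call `path`, else 403 (forbidden)."""
    # Assumption: "/v1/crawl/c_3f9a1c8e" and "/v1/crawl" both map to "crawl".
    required = path.strip("/").split("/")[1]
    return 200 if scope_allows(key_scopes, required) else 403

print(status_for(["scrape"], "/v1/crawl"))
print(status_for([], "/v1/crawl"))
```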
POST /v1/keys (cookie session required)
{ "name": "production", "scopes": ["scrape", "crawl"] }
// → secret shown ONCE, never again:
// { "success": true,
// "key": "wr_92f2176eaa_Y_eLXg5q…",
// "apiKey": { "id": "fb39…", "keyPrefix": "wr_92f2176eaa",
// "scopes": ["scrape","crawl"], "active": true } }
GET /v1/keys: list your keys (prefix + metadata only).
DELETE /v1/keys/{id}: revoke a key; subsequent use returns 401.
GET /v1/usage
Your usage across all keys: plan limits, monthly quota remaining, and recent requests. Cookie-gated; view it on the usage page.
Errors
All errors share one envelope:
{
"success": false,
"error": "blocked", // machine code
"message": "All engines exhausted", // human readable
"detail": {}
}
Machine codes: invalid_url, bad_options, blocked, timeout, not_found, schema_validation_failed, unauthorized, forbidden, rate_limited, internal.
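Because every error shares one envelope, a client can turn it into a single exception type. The sketch below is one possible approach: the class and function names are hypothetical, and the choice of which codes count as retryable is an assumption, not something this reference defines.

```python
KNOWN_CODES = {"invalid_url", "bad_options", "blocked", "timeout", "not_found",
               "schema_validation_failed", "unauthorized", "forbidden",
               "rate_limited", "internal"}
# Assumption: these codes describe transient conditions worth retrying.
RETRYABLE = {"timeout", "rate_limited", "internal"}

class WebreadrError(Exception):
    """Carries the machine code, human message and detail of an error envelope."""

    def __init__(self, code: str, message: str, detail=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.detail = detail or {}

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE

def unwrap(body: dict) -> dict:
    """Return the body unchanged on success, otherwise raise WebreadrError."""
    if body.get("success") is True:
        return body
    raise WebreadrError(body.get("error", "internal"),
                        body.get("message", ""), body.get("detail"))

envelope = {"success": False, "error": "blocked",
            "message": "All engines exhausted", "detail": {}}
try:
    unwrap(envelope)
except WebreadrError as exc:
    print(exc.code, exc.retryable)
```

Branching on the machine code instead of the message keeps client logic stable if the human-readable wording changes.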