Use this when you have specific URLs and need to retrieve their full HTML content — for content extraction, web scraping, SEO analysis, or change tracking. The Web Fetch API accepts a list of URLs and returns the page title and full HTML content for each. You can fetch up to 10 URLs in a single request. Every request goes to the same endpoint:
POST https://api.crustdata.com/web/enrich/live
The endpoint path is /web/enrich/live (not /web/fetch/live) because it follows the Crustdata convention where “enrich” means adding data to a known identifier — in this case, enriching a URL with its page content.

Request body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `urls` | string[] | Yes | — | URLs to fetch. Min: 1, max: 10. Must include `http://` or `https://`. |
| `solveCloudflare` | boolean | No | `false` | Attempt to bypass Cloudflare protection. |

Response body

The response is an array (not an object) — one entry per URL in your request.
| Field | Type | Description |
| --- | --- | --- |
| `success` | boolean | Whether this URL was fetched successfully. |
| `url` | string? | The URL that was fetched. `null` if the fetch failed. |
| `timestamp` | integer? | Unix timestamp (seconds) when fetched. `null` on failure. |
| `pageTitle` | string? | The `<title>` tag content. `null` on failure. |
| `content` | string? | Full HTML content of the page. `null` on failure. |
Timestamps: Fetch timestamps are in seconds. Search timestamps are in milliseconds. Account for this when comparing timestamps across endpoints.
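As a sketch of that conversion, the values below are illustrative, not real API output:

```javascript
// Fetch timestamps are in seconds; Search timestamps are in milliseconds.
const fetchTimestamp = 1730419200;      // seconds, from a Fetch entry
const searchTimestamp = 1730419200123;  // milliseconds, from a Search result

// Normalize the Search timestamp to seconds before comparing.
const searchSeconds = Math.floor(searchTimestamp / 1000);
const sameMoment = fetchTimestamp === searchSeconds;
```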
Two kinds of failure, two places to check:
  • Request-level errors (400, 401) — the entire request failed. You get an error object, not an array. Caused by missing fields, empty arrays, or bad auth.
  • Per-URL failures within a 200 — individual entries with success: false and null fields. Caused by unreachable URLs, timeouts, or bot protection.
Always check the HTTP status first, then check success for each entry in the array.
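A minimal sketch of that two-step check, operating on an already-parsed response (the helper name `partitionResults` and the sample entries are illustrative, not part of the API):

```javascript
// Split a parsed Fetch response into successful and failed entries.
// A non-200 response carries an error object, not an array, so guard
// on Array.isArray before reading per-entry success flags.
function partitionResults(status, body) {
  if (status !== 200 || !Array.isArray(body)) {
    // Request-level failure (400, 401): body is an error object.
    throw new Error(body?.error?.message ?? `request failed with status ${status}`);
  }
  return {
    ok: body.filter((entry) => entry.success),
    failed: body.filter((entry) => !entry.success),
  };
}

// Example: one success and one per-URL failure inside a 200.
const { ok, failed } = partitionResults(200, [
  { success: true, url: "https://example.com", pageTitle: "Example Domain", content: "<html>…</html>", timestamp: 1730419200 },
  { success: false, url: null, pageTitle: null, content: null, timestamp: null },
]);
```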

Fetch a single URL

The simplest request fetches one URL and returns its HTML content.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": ["https://example.com"]
  }'
The content field in the response contains the full HTML of the fetched page (example responses are trimmed for brevity).
Extract: Parse content using an HTML parser (BeautifulSoup for Python, Cheerio for Node.js) to extract specific elements like text, links, or metadata.
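As a rough illustration of the shape of the data, here is a regex-based extraction of the title and links from a sample `content` value. A regex is fragile for real HTML, so prefer a proper parser like Cheerio in production; the `entry` object below is mock data, not real API output:

```javascript
// Mock entry mirroring the response shape documented above.
const entry = {
  success: true,
  url: "https://example.com",
  content: '<html><head><title>Example Domain</title></head><body><a href="/about">About</a></body></html>',
};

// Pull the <title> text out of the raw HTML.
const titleMatch = entry.content.match(/<title>([^<]*)<\/title>/i);
const title = titleMatch ? titleMatch[1] : null;

// Collect every href from <a> tags.
const links = [...entry.content.matchAll(/<a\s+[^>]*href="([^"]*)"/gi)].map((m) => m[1]);
```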

Fetch multiple URLs

Pass up to 10 URLs to fetch their content in parallel.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": [
      "https://example.com",
      "https://example.org",
      "https://www.crustdata.com"
    ]
  }'
Current platform behavior: The response array order may differ from the request order. Match successful results by their url field, not by array index.
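One way to make lookups order-independent is to index successful entries by their url field. The response data below is a mock for illustration:

```javascript
// Mock response: note the order differs from the request order.
const response = [
  { success: true, url: "https://example.org", pageTitle: "Example Org", content: "<html>…</html>" },
  { success: true, url: "https://example.com", pageTitle: "Example Domain", content: "<html>…</html>" },
];

// Index successful results by URL so lookups don't depend on array order.
const byUrl = new Map(
  response.filter((r) => r.success).map((r) => [r.url, r])
);

const exampleCom = byUrl.get("https://example.com");
```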

Handle partial failures

When some URLs succeed and others fail, the request still returns 200. Failed URLs have success: false with all other fields as null.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": [
      "https://example.com",
      "https://this-domain-does-not-exist-xyz.com"
    ]
  }'

Correlating failures to input URLs

Failed entries have url: null, so you cannot directly identify which input URL failed. To correlate failures:
  1. Track the URLs you sent.
  2. Collect the url values from all successful entries.
  3. Any input URL not in the successful set is the one that failed.
const requestedUrls = ["https://example.com", "https://this-domain-does-not-exist-xyz.com"];

// Collect the URLs that came back successfully.
const successfulUrls = new Set(
  fetchResponse.filter(r => r.success).map(r => r.url)
);

// Anything we requested but did not get back must have failed.
const failedUrls = requestedUrls.filter(url => !successfulUrls.has(url));
// failedUrls = ["https://this-domain-does-not-exist-xyz.com"]
Always check the success field for each entry in the response array. Build your parsing logic to handle both successful and failed entries gracefully.

Bypass Cloudflare protection

Some websites use Cloudflare to block automated requests. Set solveCloudflare: true to attempt to bypass this protection.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": ["https://example.com"],
    "solveCloudflare": true
  }'
Current platform behavior: Cloudflare bypass is not guaranteed. Some sites have additional protections that may still block the request.

Processing fetched content

The content field returns raw HTML. Here are common next steps:
| Task | Approach |
| --- | --- |
| Extract text | Parse the HTML and strip tags (BeautifulSoup, Cheerio, etc.) |
| Extract links | Find all `<a>` tags and their `href` attributes |
| Extract metadata | Parse `<meta>` tags for SEO data (description, og:title, etc.) |
| Detect changes | Fetch periodically and diff the `content` or `pageTitle` fields |
| Resolve relative URLs | Combine relative paths with the base `url` from the response |
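For example, resolving a relative href against the entry's url field can use the standard URL constructor; the paths below are illustrative:

```javascript
// Resolve a relative link found in `content` against the fetched page's URL.
const baseUrl = "https://example.com/blog/post";
const relativeHref = "../images/chart.png";

// new URL(relative, base) applies standard relative-URL resolution.
const absolute = new URL(relativeHref, baseUrl).href;
```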

Error handling

Fetch returns request-level errors for invalid input or auth failures. These are separate from per-URL success: false entries within a 200 response.
{
    "error": {
        "type": "invalid_request",
        "message": "urls: This field is required.",
        "metadata": []
    }
}
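Since a request-level failure returns an object while a 200 returns an array, a quick shape check tells the two apart. The body below mirrors the error object above and is mock data:

```javascript
// A request-level error body (as returned for a 400 or 401).
const errorBody = {
  error: {
    type: "invalid_request",
    message: "urls: This field is required.",
    metadata: [],
  },
};

// A successful request returns an array; an error returns an object
// with an `error` key. Check the shape before iterating.
const isRequestError = !Array.isArray(errorBody) && "error" in errorBody;
```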

Common gotchas

| Mistake | Fix |
| --- | --- |
| Omitting `http://` or `https://` in URLs | All URLs must include the protocol prefix. |
| Sending more than 10 URLs | The API accepts a maximum of 10 URLs per request. Batch larger lists. |
| Assuming response order matches request | Match results by the `url` field, not by array index. |
| Treating a 200 as all-success | A 200 can contain failed entries. Check `success` for each item. |
| Sending an empty `urls` array | Returns 400: `"urls: This list may not be empty."` |
| Expecting JavaScript-rendered content | Current platform behavior: The API fetches server-side HTML. JavaScript-heavy SPAs may return minimal HTML. |
| Comparing Search and Fetch timestamps | Search uses milliseconds, Fetch uses seconds. Divide Search by 1000 to compare. |

Next steps