Use this when you have specific URLs and need to retrieve their full HTML content — for content extraction, web scraping, SEO analysis, or change tracking. The Web Fetch API accepts a list of URLs and returns the page title and full HTML content for each. You can fetch up to 10 URLs in a single request. Every request goes to the same endpoint:
POST https://api.crustdata.com/web/enrich/live
The endpoint path is /web/enrich/live (not /web/fetch/live) because it follows the Crustdata convention where “enrich” means adding data to a known identifier — in this case, enriching a URL with its page content.

Request body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `urls` | string[] | Yes | — | URLs to fetch. Min: 1, max: 10. Must include `http://` or `https://`. |
| `solveCloudflare` | boolean | No | `false` | Attempt to bypass Cloudflare protection. |

Response body

The response is an array (not an object) — one entry per URL in your request.
| Field | Type | Description |
| --- | --- | --- |
| `success` | boolean | Whether this URL was fetched successfully. |
| `url` | string? | The URL that was fetched. `null` if the fetch failed. |
| `timestamp` | integer? | Unix timestamp (seconds) when fetched. `null` on failure. |
| `pageTitle` | string? | The `<title>` tag content. `null` on failure. |
| `content` | string? | Full HTML content of the page. `null` on failure. |
Timestamps: Fetch timestamps are in seconds. Search timestamps are in milliseconds. Account for this when comparing timestamps across endpoints.
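As a sketch of that conversion, the values below are illustrative, not real API output:

```javascript
// Fetch timestamps are in seconds; Search timestamps are in milliseconds.
const fetchTimestamp = 1730419200;      // seconds, from a Fetch entry
const searchTimestamp = 1730419200123;  // milliseconds, from a Search result

// Normalize the Search timestamp to seconds before comparing.
const searchSeconds = Math.floor(searchTimestamp / 1000);
const sameMoment = fetchTimestamp === searchSeconds;
```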
Two kinds of failure, two places to check:
  • Request-level errors (400, 401) — the entire request failed. You get an error object, not an array. Caused by missing fields, empty arrays, or bad auth.
  • Per-URL failures within a 200 — individual entries with success: false and null fields. Caused by unreachable URLs, timeouts, or bot protection.
Always check the HTTP status first, then check success for each entry in the array.
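A minimal sketch of that two-step check, operating on an already-parsed response (the helper name `partitionResults` and the sample entries are illustrative, not part of the API):

```javascript
// Split a parsed Fetch response into successful and failed entries.
// A non-200 response carries an error object, not an array, so guard
// on Array.isArray before reading per-entry success flags.
function partitionResults(status, body) {
  if (status !== 200 || !Array.isArray(body)) {
    // Request-level failure (400, 401): body is an error object.
    throw new Error(body?.error?.message ?? `request failed with status ${status}`);
  }
  return {
    ok: body.filter((entry) => entry.success),
    failed: body.filter((entry) => !entry.success),
  };
}

// Example: one success and one per-URL failure inside a 200.
const { ok, failed } = partitionResults(200, [
  { success: true, url: "https://example.com", pageTitle: "Example Domain", content: "<html>…</html>", timestamp: 1730419200 },
  { success: false, url: null, pageTitle: null, content: null, timestamp: null },
]);
```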

Fetch a single URL

The simplest request fetches one URL and returns its HTML content.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": ["https://example.com"]
  }'
The content field in the response contains the full HTML of the fetched page (example responses are trimmed for brevity).
Extract: Parse content using an HTML parser (BeautifulSoup for Python, Cheerio for Node.js) to extract specific elements like text, links, or metadata.
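As a rough illustration of the shape of the data, here is a regex-based extraction of the title and links from a sample `content` value. A regex is fragile for real HTML, so prefer a proper parser like Cheerio in production; the `entry` object below is mock data, not real API output:

```javascript
// Mock entry mirroring the response shape documented above.
const entry = {
  success: true,
  url: "https://example.com",
  content: '<html><head><title>Example Domain</title></head><body><a href="/about">About</a></body></html>',
};

// Pull the <title> text out of the raw HTML.
const titleMatch = entry.content.match(/<title>([^<]*)<\/title>/i);
const title = titleMatch ? titleMatch[1] : null;

// Collect every href from <a> tags.
const links = [...entry.content.matchAll(/<a\s+[^>]*href="([^"]*)"/gi)].map((m) => m[1]);
```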

Fetch multiple URLs

Pass up to 10 URLs to fetch their content in parallel.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": [
      "https://example.com",
      "https://example.org",
      "https://www.crustdata.com"
    ]
  }'
Current platform behavior: The response array order may differ from the request order. Match successful results by their url field, not by array index.
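One way to make lookups order-independent is to index successful entries by their url field. The response data below is a mock for illustration:

```javascript
// Mock response: note the order differs from the request order.
const response = [
  { success: true, url: "https://example.org", pageTitle: "Example Org", content: "<html>…</html>" },
  { success: true, url: "https://example.com", pageTitle: "Example Domain", content: "<html>…</html>" },
];

// Index successful results by URL so lookups don't depend on array order.
const byUrl = new Map(
  response.filter((r) => r.success).map((r) => [r.url, r])
);

const exampleCom = byUrl.get("https://example.com");
```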

Handle partial failures

When some URLs succeed and others fail, the request still returns 200. Failed URLs have success: false with all other fields as null.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": [
      "https://example.com",
      "https://this-domain-does-not-exist-xyz.com"
    ]
  }'

Correlating failures to input URLs

Failed entries have url: null, so you cannot directly identify which input URL failed. To correlate failures:
  1. Track the URLs you sent.
  2. Collect the url values from all successful entries.
  3. Any input URL not in the successful set is the one that failed.
const requestedUrls = ["https://example.com", "https://this-domain-does-not-exist-xyz.com"];

// Collect the URLs that came back successfully.
const successfulUrls = new Set(
  fetchResponse.filter(r => r.success).map(r => r.url)
);

// Anything we requested but did not get back must have failed.
const failedUrls = requestedUrls.filter(url => !successfulUrls.has(url));
// failedUrls = ["https://this-domain-does-not-exist-xyz.com"]
Always check the success field for each entry in the response array. Build your parsing logic to handle both successful and failed entries gracefully.

Bypass Cloudflare protection

Some websites use Cloudflare to block automated requests. Set solveCloudflare: true to attempt to bypass this protection.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": ["https://example.com"],
    "solveCloudflare": true
  }'
Current platform behavior: Cloudflare bypass is not guaranteed. Some sites have additional protections that may still block the request.

Processing fetched content

The content field returns raw HTML. Here are common next steps:
| Task | Approach |
| --- | --- |
| Extract text | Parse the HTML and strip tags (BeautifulSoup, Cheerio, etc.) |
| Extract links | Find all `<a>` tags and their `href` attributes |
| Extract metadata | Parse `<meta>` tags for SEO data (description, og:title, etc.) |
| Detect changes | Fetch periodically and diff the `content` or `pageTitle` fields |
| Resolve relative URLs | Combine relative paths with the base `url` from the response |
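For example, resolving a relative href against the entry's url field can use the standard URL constructor; the paths below are illustrative:

```javascript
// Resolve a relative link found in `content` against the fetched page's URL.
const baseUrl = "https://example.com/blog/post";
const relativeHref = "../images/chart.png";

// new URL(relative, base) applies standard relative-URL resolution.
const absolute = new URL(relativeHref, baseUrl).href;
```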

Error handling

Fetch returns request-level errors for invalid input or auth failures. These are separate from per-URL success: false entries within a 200 response.
{
    "error": {
        "type": "invalid_request",
        "message": "urls: This field is required.",
        "metadata": []
    }
}
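Since a request-level failure returns an object while a 200 returns an array, a quick shape check tells the two apart. The body below mirrors the error object above and is mock data:

```javascript
// A request-level error body (as returned for a 400 or 401).
const errorBody = {
  error: {
    type: "invalid_request",
    message: "urls: This field is required.",
    metadata: [],
  },
};

// A successful request returns an array; an error returns an object
// with an `error` key. Check the shape before iterating.
const isRequestError = !Array.isArray(errorBody) && "error" in errorBody;
```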

Common gotchas

| Mistake | Fix |
| --- | --- |
| Omitting `http://` or `https://` in URLs | All URLs must include the protocol prefix. |
| Sending more than 10 URLs | The API accepts a maximum of 10 URLs per request. Batch larger lists. |
| Assuming response order matches request | Match results by the `url` field, not by array index. |
| Treating a 200 as all-success | A 200 can contain failed entries. Check `success` for each item. |
| Sending an empty `urls` array | Returns 400: `"urls: This list may not be empty."` |
| Expecting JavaScript-rendered content | Current platform behavior: The API fetches server-side HTML. JavaScript-heavy SPAs may return minimal HTML. |
| Comparing Search and Fetch timestamps | Search uses milliseconds, Fetch uses seconds. Divide Search by 1000 to compare. |

Next steps