Ready-to-copy patterns for Web Fetch. Each example shows a real request, the response, and what to extract.

Fetch multiple URLs

Pass up to 10 URLs to fetch their content in parallel.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": [
      "https://example.com",
      "https://example.org",
      "https://www.crustdata.com"
    ]
  }'
Current platform behavior: The response array order may differ from the request order. Match successful results by their url field, not by array index.
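Since order isn't guaranteed, one option is to index the results by their url field before reading them. A minimal sketch, assuming the parsed response is an array of entries with url, success, and content fields as described above (indexResultsByUrl is a hypothetical helper, not part of the API):

```javascript
// Build a lookup from input URL to its result entry, ignoring array order.
function indexResultsByUrl(results) {
  const byUrl = new Map();
  for (const r of results) {
    // Only successful entries carry a usable url field.
    if (r.success && r.url) byUrl.set(r.url, r);
  }
  return byUrl;
}

// Usage: look up a specific input URL regardless of its position.
const byUrl = indexResultsByUrl([
  { url: "https://example.org", success: true, content: "<html>B</html>" },
  { url: "https://example.com", success: true, content: "<html>A</html>" },
]);
console.log(byUrl.get("https://example.com").content);
```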

Handle partial failures

When some URLs succeed and others fail, the request still returns 200. Entries for failed URLs have success: false, with all other fields set to null.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": [
      "https://example.com",
      "https://this-domain-does-not-exist-xyz.com"
    ]
  }'

Correlating failures to input URLs

Failed entries have url: null, so you cannot directly identify which input URL failed. To correlate failures:
  1. Track the URLs you sent.
  2. Collect the url values from all successful entries.
  3. Any input URL not in the successful set is the one that failed.
const requestedUrls = [
    "https://example.com",
    "https://this-domain-does-not-exist-xyz.com",
];
const successfulUrls = new Set(
    fetchResponse.filter((r) => r.success).map((r) => r.url),
);
const failedUrls = requestedUrls.filter((url) => !successfulUrls.has(url));
// failedUrls = ["https://this-domain-does-not-exist-xyz.com"]
Always check the success field for each entry in the response array. Build your parsing logic to handle both successful and failed entries gracefully.
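Graceful handling can be as simple as branching on the success flag for each entry. A sketch, assuming the response shape described above (processResults is a hypothetical helper):

```javascript
// Process a mixed response: handle each entry based on its success
// flag rather than assuming every fetch worked.
function processResults(results) {
  const pages = [];
  for (const r of results) {
    if (r.success) {
      pages.push({ url: r.url, html: r.content });
    } else {
      // Failed entries have url: null and no content; log and move on.
      console.warn("one requested URL failed to fetch");
    }
  }
  return pages;
}
```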

Bypass Cloudflare protection with human mode

Some websites use Cloudflare to block automated requests. Set human_mode: true to attempt a browser-like fetch path for these pages.
curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": ["https://example.com"],
    "human_mode": true
  }'
Current platform behavior: Cloudflare bypass is not guaranteed. Some sites have additional protections that may still block the request.

Processing fetched content

The content field returns raw HTML. Here are common next steps:

Extract text: parse the HTML and strip tags (BeautifulSoup, Cheerio, etc.)
Extract links: find all <a> tags and read their href attributes
Extract metadata: parse <meta> tags for SEO data (description, og:title, etc.)
Detect changes: fetch periodically and diff the content or title fields
Resolve relative URLs: combine relative paths with the base url from the response
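For the last task, the standard WHATWG URL constructor resolves a relative href against a base URL. A minimal sketch (resolveLink is a hypothetical helper):

```javascript
// Resolve a relative href (e.g. scraped from an <a> tag) against the
// page's base URL using the standard URL constructor.
function resolveLink(href, baseUrl) {
  return new URL(href, baseUrl).href;
}

console.log(resolveLink("/pricing", "https://www.crustdata.com"));
// → "https://www.crustdata.com/pricing"
```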

Next steps