Web Fetch API

The Web Fetch API allows you to fetch the HTML content of webpages given their URLs. This endpoint retrieves the page title and full HTML content for up to 10 URLs in a single request.

Use Cases

  • Content Extraction: Retrieve webpage content for analysis and processing
  • Web Scraping: Fetch HTML from multiple pages efficiently
  • Data Collection: Gather webpage content for research and monitoring
  • Content Monitoring: Track changes on specific webpages over time
  • SEO Analysis: Extract page content and metadata for optimization

Endpoint

POST /screener/web-fetch

Request Parameters

| Parameter | Type  | Description                                      | Required | Default |
| --------- | ----- | ------------------------------------------------ | -------- | ------- |
| urls      | array | Array of URLs to fetch (max 10 URLs per request) | Yes      | -       |

URL Requirements

  • URLs must be properly formatted with a protocol prefix (http:// or https://)
  • Maximum of 10 URLs per request
  • Each URL should be a valid, accessible webpage (see the validation sketch below)
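
Since malformed requests are rejected with a 400, it can help to validate the list client-side before calling the endpoint. A minimal sketch in Python (the function name is illustrative, not part of the API):

import urllib.parse

MAX_URLS = 10  # documented per-request limit

def validate_urls(urls):
    """Raise ValueError if the list would fail the API's documented checks."""
    if not isinstance(urls, list) or not urls:
        raise ValueError("urls must be a non-empty array")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per request")
    for url in urls:
        parsed = urllib.parse.urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"URL must start with http:// or https://: {url}")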

Example Requests

1. Fetch content from a single webpage

curl --request POST \
  --url https://api.crustdata.com/screener/web-fetch \
  --header "Authorization: Token $authToken" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com"]
  }'

2. Fetch content from multiple webpages

curl --request POST \
  --url https://api.crustdata.com/screener/web-fetch \
  --header "Authorization: Token $authToken" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com",
      "https://www.crustdata.com",
      "https://docs.crustdata.com"
    ]
  }'

3. Fetch the maximum number of URLs (10) in a single request

curl --request POST \
  --url https://api.crustdata.com/screener/web-fetch \
  --header "Authorization: Token $authToken" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com",
      "https://example.org",
      "https://example.net",
      "https://www.crustdata.com",
      "https://docs.crustdata.com",
      "https://github.com",
      "https://stackoverflow.com",
      "https://news.ycombinator.com",
      "https://www.producthunt.com",
      "https://techcrunch.com"
    ]
  }'
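
The same request can be made from code. A minimal sketch using Python's requests library, assuming your token is stored in a CRUSTDATA_API_TOKEN environment variable (the variable name is an assumption, not an API requirement):

import os
import requests

# Hypothetical environment variable holding your Crustdata API token.
token = os.environ["CRUSTDATA_API_TOKEN"]

response = requests.post(
    "https://api.crustdata.com/screener/web-fetch",
    headers={
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    },
    json={"urls": ["https://example.com", "https://www.crustdata.com"]},
    timeout=60,  # fetches can take several seconds per page
)
response.raise_for_status()
results = response.json()  # array with one object per URL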

Example Responses

Successful Response (200 OK)

[
  {
    "success": true,
    "url": "https://example.com",
    "timestamp": 1765281552,
    "pageTitle": "Example Domain",
    "content": "<html lang=\"en\"><head><title>Example Domain</title><meta name=\"viewport\" content=\"width=device-width, initial-scale=1\"><style>body{background:#eee;width:60vw;margin:15vh auto;font-family:system-ui,sans-serif}h1{font-size:1.5em}div{opacity:0.8}a:link,a:visited{color:#348}</style></head><body><div><h1>Example Domain</h1><p>This domain is for use in documentation examples without needing permission. Avoid use in operations.</p><p><a href=\"https://iana.org/domains/example\">Learn more</a></p></div>\n</body></html>"
  }
]

Response Fields

| Field     | Type    | Description                                        |
| --------- | ------- | -------------------------------------------------- |
| success   | boolean | Whether the fetch was successful for this URL      |
| url       | string  | The URL that was fetched                           |
| timestamp | integer | Unix timestamp (seconds) when the page was fetched |
| pageTitle | string  | The title of the webpage (from the <title> tag)    |
| content   | string  | The full HTML content of the webpage               |
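
For typed code, the per-URL result can be modeled directly from the fields above. A sketch (this class is illustrative, not an official client type):

from typing import TypedDict

class WebFetchResult(TypedDict, total=False):
    success: bool    # whether the fetch succeeded for this URL
    url: str         # the URL that was fetched
    timestamp: int   # Unix timestamp (seconds) of the fetch
    pageTitle: str   # contents of the page's <title> tag
    content: str     # full HTML of the page
    error: str       # present instead of content when success is false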

Multiple URLs Response

When fetching multiple URLs, the response is an array with one object per URL:

[
  {
    "success": true,
    "url": "https://example.com",
    "timestamp": 1765281552,
    "pageTitle": "Example Domain",
    "content": "<html>...</html>"
  },
  {
    "success": true,
    "url": "https://www.crustdata.com",
    "timestamp": 1765281553,
    "pageTitle": "Crustdata - B2B Data Platform",
    "content": "<html>...</html>"
  }
]

Partial Failure Response (200 OK)

When some URLs succeed and others fail:

[
  {
    "success": true,
    "url": "https://example.com",
    "timestamp": 1765281552,
    "pageTitle": "Example Domain",
    "content": "<html>...</html>"
  },
  {
    "success": false,
    "url": "https://invalid-domain-that-does-not-exist.com",
    "error": "Failed to fetch URL"
  }
]

Common Failure Reasons

  • Invalid or unreachable URL
  • Timeout waiting for page to load
  • Network connectivity issues
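
Because failures are reported per URL rather than for the whole request, it is worth splitting the response before processing. A minimal sketch, assuming results holds the parsed JSON array:

# Split the parsed response array into successes and failures.
succeeded = [r for r in results if r["success"]]
failed = [r for r in results if not r["success"]]

for r in succeeded:
    print(f"{r['url']}: {r['pageTitle']} ({len(r['content'])} bytes of HTML)")
for r in failed:
    print(f"{r['url']} failed: {r.get('error', 'unknown error')}")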

Validation Error Response (400 Bad Request)

{
  "urls": [
    "This field is required."
  ]
}

Common Validation Errors

  • Missing urls parameter
  • Empty urls array
  • More than 10 URLs in the request
  • Invalid URL format (missing http:// or https:// prefix)
  • urls is not an array

Insufficient Credits (402 Payment Required)

{
  "error": "Insufficient credits. Please get in touch with the Crustdata sales team."
}

Service Error Response (500 Internal Server Error)

{
  "error": "Failed to process web fetch request"
}

Common Errors

  • Failed to process web fetch request - The fetch service encountered an internal error
  • Web fetch request timed out - The request took too long to complete
  • Failed to connect to web fetch service - The service is temporarily unavailable

Best Practices

  1. URL Validation: Always ensure URLs are properly formatted with http:// or https:// prefixes
  2. Batch Requests: Group URLs together (up to 10) to minimize API calls
  3. Error Handling: Check the success field for each URL in the response
  4. Content Processing: Be prepared to handle various HTML structures and encodings
  5. Timeout Handling: Implement retry logic for timeouts on critical pages (see the sketch after this list)
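
A minimal retry sketch for practice 5, assuming fetch_urls is your own wrapper around the endpoint that takes a list of up to 10 URLs and returns the parsed response array (the function name, retry count, and backoff values are illustrative choices, not API behavior):

import time

def fetch_with_retry(fetch_urls, urls, attempts=3, backoff=2.0):
    """Retry URLs whose per-result success flag is false, with exponential backoff."""
    pending = list(urls)  # assume at most 10 URLs, per the API limit
    collected = {}
    for attempt in range(attempts):
        if not pending:
            break
        if attempt > 0:
            time.sleep(backoff * 2 ** (attempt - 1))  # wait before retrying
        for result in fetch_urls(pending):
            collected[result["url"]] = result  # latest result wins
        pending = [url for url, r in collected.items() if not r["success"]]
    return list(collected.values())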

Content Processing Tips

Extracting Text from HTML

The content field contains raw HTML. You may want to:

  • Parse HTML using libraries (BeautifulSoup for Python, Cheerio for Node.js), as sketched after this list
  • Extract specific elements (headings, paragraphs, links)
  • Remove scripts and styles for text-only content
  • Handle special characters and encoding properly
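
A minimal sketch of the first three tips using Python and BeautifulSoup (pip install beautifulsoup4), assuming result is one successful object from the response:

from bs4 import BeautifulSoup

soup = BeautifulSoup(result["content"], "html.parser")

# Remove scripts and styles so only visible text remains.
for tag in soup(["script", "style"]):
    tag.decompose()

text = soup.get_text(separator=" ", strip=True)
headings = [h.get_text(strip=True) for h in soup.find_all(["h1", "h2", "h3"])]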

Common Use Cases

  • Metadata Extraction: Parse <meta> tags for SEO data (see the sketch after this list)
  • Link Discovery: Extract all <a> tags for crawling
  • Content Analysis: Process text content for NLP tasks
  • Change Detection: Compare content over time to detect updates
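
Continuing the BeautifulSoup example above, metadata extraction and link discovery are both short passes over the parsed tree (soup is the object from the previous sketch):

# Collect <meta> name/content pairs, e.g. description and keywords.
meta = {
    m["name"]: m.get("content", "")
    for m in soup.find_all("meta")
    if m.get("name")
}

# Collect every href for crawling; these may be relative (see the FAQ below).
links = [a["href"] for a in soup.find_all("a", href=True)]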

FAQ

Q: What's the maximum number of URLs I can fetch in one request? A: You can fetch up to 10 URLs per request.

Q: How long does it take to fetch a page? A: Most pages are fetched within 2-5 seconds. Complex pages with lots of JavaScript may take longer.

Q: Do I need to include the protocol in URLs? A: Yes, all URLs must start with http:// or https://.

Q: What happens if one URL fails in a batch request? A: The successful URLs will return their content, and failed URLs will have "success": false with an error message. You still pay credits for successful fetches.

Q: Will I be charged for failed fetches? A: This depends on the failure type. Network errors and invalid URLs typically don't consume credits, but rate limits and timeouts may.

Q: Can I fetch content from password-protected pages? A: No, this API fetches publicly accessible pages only. Authentication is not supported.

Q: What's the maximum page size that can be fetched? A: There are reasonable limits to prevent abuse, but typical webpages (up to several MB) should work fine.

Q: How do I handle relative URLs in the fetched content? A: You'll need to resolve relative URLs yourself using the base URL from the url field in the response.
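
For example, with Python's standard library, using the url field echoed in the response as the base:

from urllib.parse import urljoin

base = result["url"]                     # e.g. "https://example.com"
absolute = urljoin(base, "/about.html")  # -> "https://example.com/about.html"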

Q: Can I specify custom headers or user agents? A: Custom headers are not currently supported. The API uses a standard user agent.