Web Fetch API
The Web Fetch API allows you to fetch the HTML content of webpages given their URLs. This endpoint retrieves the page title and full HTML content for up to 10 URLs in a single request.
Use Cases
- Content Extraction: Retrieve webpage content for analysis and processing
- Web Scraping: Fetch HTML from multiple pages efficiently
- Data Collection: Gather webpage content for research and monitoring
- Content Monitoring: Track changes on specific webpages over time
- SEO Analysis: Extract page content and metadata for optimization
Endpoint
POST /screener/web-fetch
Request Parameters

| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| urls | array | Array of URLs to fetch (max 10 URLs per request) | Yes | - |
URL Requirements
- URLs must be properly formatted with a protocol prefix (http:// or https://)
- Maximum of 10 URLs per request
- Each URL should be a valid, accessible webpage
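These requirements can be checked client-side before spending a request; a minimal sketch in Python (the `validate_urls` helper is illustrative, not part of the API):

```python
from urllib.parse import urlparse

MAX_URLS = 10  # per-request limit documented above

def validate_urls(urls):
    """Raise ValueError if the batch violates the documented requirements."""
    if not urls:
        raise ValueError("At least one URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"At most {MAX_URLS} URLs per request, got {len(urls)}")
    for url in urls:
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"URL must start with http:// or https://: {url!r}")
    return urls
```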
Example Requests
1. Fetch content from a single webpage
curl --request POST \
--url https://api.crustdata.com/screener/web-fetch \
--header "Authorization: Token $authToken" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://example.com"]
}'
2. Fetch content from multiple webpages
curl --request POST \
--url https://api.crustdata.com/screener/web-fetch \
--header "Authorization: Token $authToken" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com",
"https://www.crustdata.com",
"https://docs.crustdata.com"
]
}'
3. Fetch the maximum number of URLs (10) in a single request
curl --request POST \
--url https://api.crustdata.com/screener/web-fetch \
--header "Authorization: Token $authToken" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com",
"https://example.org",
"https://example.net",
"https://www.crustdata.com",
"https://docs.crustdata.com",
"https://github.com",
"https://stackoverflow.com",
"https://news.ycombinator.com",
"https://www.producthunt.com",
"https://techcrunch.com"
]
}'
Example Responses
Successful Response (200 OK)
[
{
"success": true,
"url": "https://example.com",
"timestamp": 1765281552,
"pageTitle": "Example Domain",
"content": "<html lang=\"en\"><head><title>Example Domain</title><meta name=\"viewport\" content=\"width=device-width, initial-scale=1\"><style>body{background:#eee;width:60vw;margin:15vh auto;font-family:system-ui,sans-serif}h1{font-size:1.5em}div{opacity:0.8}a:link,a:visited{color:#348}</style></head><body><div><h1>Example Domain</h1><p>This domain is for use in documentation examples without needing permission. Avoid use in operations.</p><p><a href=\"https://iana.org/domains/example\">Learn more</a></p></div>\n</body></html>"
}
]
Response Fields

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the fetch was successful for this URL |
| url | string | The URL that was fetched |
| timestamp | integer | Unix timestamp (seconds) when the page was fetched |
| pageTitle | string | The title of the webpage (from the <title> tag) |
| content | string | The full HTML content of the webpage |
Multiple URLs Response
When fetching multiple URLs, the response is an array with one object per URL:
[
{
"success": true,
"url": "https://example.com",
"timestamp": 1765281552,
"pageTitle": "Example Domain",
"content": "<html>...</html>"
},
{
"success": true,
"url": "https://www.crustdata.com",
"timestamp": 1765281553,
"pageTitle": "Crustdata - B2B Data Platform",
"content": "<html>...</html>"
}
]
Partial Failure Response (200 OK)
When some URLs succeed and others fail:
[
{
"success": true,
"url": "https://example.com",
"timestamp": 1765281552,
"pageTitle": "Example Domain",
"content": "<html>...</html>"
},
{
"success": false,
"url": "https://invalid-domain-that-does-not-exist.com",
"error": "Failed to fetch URL"
}
]
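Because each element carries its own success flag, a client can partition a batch response into successes and failures in one pass. A sketch, using the sample payload above rather than a live call:

```python
def split_results(results):
    """Partition web-fetch results into (succeeded, failed) lists by the success flag."""
    succeeded = [r for r in results if r.get("success")]
    failed = [r for r in results if not r.get("success")]
    return succeeded, failed

# Sample response body from the partial-failure example above
results = [
    {"success": True, "url": "https://example.com", "timestamp": 1765281552,
     "pageTitle": "Example Domain", "content": "<html>...</html>"},
    {"success": False, "url": "https://invalid-domain-that-does-not-exist.com",
     "error": "Failed to fetch URL"},
]
ok, bad = split_results(results)
```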
Common Failure Reasons
- Invalid or unreachable URL
- Timeout waiting for page to load
- Network connectivity issues
Insufficient Credits (402 Payment Required)
{
"error": "Insufficient credits. Please get in touch with the Crustdata sales team."
}
Service Error Response (500 Internal Server Error)
{
"error": "Failed to process web fetch request"
}
Common Errors
- Failed to process web fetch request - The fetch service encountered an internal error
- Web fetch request timed out - The request took too long to complete
- Failed to connect to web fetch service - The service is temporarily unavailable
Best Practices
- URL Validation: Always ensure URLs are properly formatted with http:// or https:// prefixes
- Batch Requests: Group URLs together (up to 10) to minimize API calls
- Error Handling: Check the success field for each URL in the response
- Content Processing: Be prepared to handle various HTML structures and encodings
- Timeout Handling: Implement retry logic for timeouts on critical pages
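Retry logic for timeouts can be as simple as exponential backoff around the fetch call. A generic sketch, where the `fetch` callable stands in for whatever HTTP client you use and the delays are illustrative:

```python
import time

def with_retries(fetch, attempts=3, base_delay=1.0):
    """Call fetch(); on failure, sleep base_delay * 2**attempt and retry."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * (2 ** attempt))
```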
Content Processing Tips
Extracting Text from HTML
The content field contains raw HTML. You may want to:
- Parse HTML using libraries (BeautifulSoup for Python, Cheerio for Node.js)
- Extract specific elements (headings, paragraphs, links)
- Remove scripts and styles for text-only content
- Handle special characters and encoding properly
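As one way to do the above without third-party dependencies, Python's standard-library `html.parser` can skip script/style contents and collect the visible text (a minimal sketch; real-world pages may warrant a fuller parser such as BeautifulSoup):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```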
Common Use Cases
- Metadata Extraction: Parse <meta> tags for SEO data
- Link Discovery: Extract all <a> tags for crawling
- Content Analysis: Process text content for NLP tasks
- Change Detection: Compare content over time to detect updates
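Change detection, for example, can fingerprint each page's content and compare across fetches. A sketch using the standard-library `hashlib`; the `previous` dict (URL to fingerprint) is an illustrative store, not part of the API:

```python
import hashlib

def content_fingerprint(html):
    """Stable fingerprint of a page's HTML for change detection."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def detect_changes(previous, results):
    """Compare successful fetch results against {url: fingerprint} from a prior run.

    Returns the URLs whose content changed (or is new) and updates `previous` in place.
    """
    changed = []
    for r in results:
        if not r.get("success"):
            continue  # failed fetches carry no content to compare
        fp = content_fingerprint(r["content"])
        if previous.get(r["url"]) != fp:
            changed.append(r["url"])
        previous[r["url"]] = fp
    return changed
```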
FAQ
Q: What's the maximum number of URLs I can fetch in one request?
A: You can fetch up to 10 URLs per request.
Q: How long does it take to fetch a page?
A: Most pages are fetched within 2-5 seconds. Complex pages with lots of JavaScript may take longer.
Q: Do I need to include the protocol in URLs?
A: Yes, all URLs must start with http:// or https://.
Q: What happens if one URL fails in a batch request?
A: The successful URLs will return their content, and failed URLs will have "success": false with an error message. You still pay credits for successful fetches.
Q: Will I be charged for failed fetches?
A: This depends on the failure type. Network errors and invalid URLs typically don't consume credits, but rate limits and timeouts may.
Q: Can I fetch content from password-protected pages?
A: No, this API fetches publicly accessible pages only. Authentication is not supported.
Q: What's the maximum page size that can be fetched?
A: There are reasonable limits to prevent abuse, but typical webpages (up to several MB) should work fine.
Q: How do I handle relative URLs in the fetched content?
A: You'll need to resolve relative URLs yourself using the base URL from the url field in the response.
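In Python this is a one-liner with the standard-library `urllib.parse.urljoin`, using the response's url field as the base (the example paths are illustrative):

```python
from urllib.parse import urljoin

# Base comes from the `url` field of the fetch result
base = "https://docs.crustdata.com/docs/intro"

# Resolve a relative href found in the fetched HTML against that base
absolute = urljoin(base, "../img/logo.png")
# → "https://docs.crustdata.com/img/logo.png"
```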
Q: Can I specify custom headers or user agents?
A: Custom headers are not currently supported. The API uses a standard user agent.
Related APIs
- Web Search API - Search the web using our search engine