Web Fetch API

The Web Fetch API allows you to fetch the HTML content of webpages given their URLs. This endpoint retrieves the page title and full HTML content for up to 10 URLs in a single request.

Use Cases

  • Content Extraction: Retrieve webpage content for analysis and processing
  • Web Scraping: Fetch HTML from multiple pages efficiently
  • Data Collection: Gather webpage content for research and monitoring
  • Content Monitoring: Track changes on specific webpages over time
  • SEO Analysis: Extract page content and metadata for optimization

Endpoint

POST /screener/web-fetch

Request Parameters

| Parameter | Type  | Description                                      | Required | Default |
| --------- | ----- | ------------------------------------------------ | -------- | ------- |
| urls      | array | Array of URLs to fetch (max 10 URLs per request) | Yes      | -       |

URL Requirements

  • URLs must be properly formatted with a protocol prefix (http:// or https://)
  • Maximum of 10 URLs per request
  • Each URL should be a valid, accessible webpage (see the validation sketch below)
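
Since malformed requests are rejected with a 400, it can help to validate the list client-side before calling the endpoint. A minimal sketch in Python (the function name is illustrative, not part of the API):

import urllib.parse

MAX_URLS = 10  # documented per-request limit

def validate_urls(urls):
    """Raise ValueError if the list would fail the API's documented checks."""
    if not isinstance(urls, list) or not urls:
        raise ValueError("urls must be a non-empty array")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per request")
    for url in urls:
        parsed = urllib.parse.urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"URL must start with http:// or https://: {url}")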

Example Requests

1. Fetch content from a single webpage

curl --request POST \
  --url https://api.crustdata.com/screener/web-fetch \
  --header "Authorization: Token $authToken" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com"]
  }'

2. Fetch content from multiple webpages

curl --request POST \
  --url https://api.crustdata.com/screener/web-fetch \
  --header "Authorization: Token $authToken" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com",
      "https://www.crustdata.com",
      "https://docs.crustdata.com"
    ]
  }'

3. Fetch the maximum number of URLs (10) in a single request

curl --request POST \
  --url https://api.crustdata.com/screener/web-fetch \
  --header "Authorization: Token $authToken" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com",
      "https://example.org",
      "https://example.net",
      "https://www.crustdata.com",
      "https://docs.crustdata.com",
      "https://github.com",
      "https://stackoverflow.com",
      "https://news.ycombinator.com",
      "https://www.producthunt.com",
      "https://techcrunch.com"
    ]
  }'
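
The same request can be made from code. A minimal sketch using Python's requests library, assuming your token is stored in a CRUSTDATA_API_TOKEN environment variable (the variable name is an assumption, not an API requirement):

import os
import requests

# Hypothetical environment variable holding your Crustdata API token.
token = os.environ["CRUSTDATA_API_TOKEN"]

response = requests.post(
    "https://api.crustdata.com/screener/web-fetch",
    headers={
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    },
    json={"urls": ["https://example.com", "https://www.crustdata.com"]},
    timeout=60,  # fetches can take several seconds per page
)
response.raise_for_status()
results = response.json()  # array with one object per URL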

Example Responses

Successful Response (200 OK)

[
  {
    "success": true,
    "url": "https://example.com",
    "timestamp": 1765281552,
    "pageTitle": "Example Domain",
    "content": "<html lang=\"en\"><head><title>Example Domain</title><meta name=\"viewport\" content=\"width=device-width, initial-scale=1\"><style>body{background:#eee;width:60vw;margin:15vh auto;font-family:system-ui,sans-serif}h1{font-size:1.5em}div{opacity:0.8}a:link,a:visited{color:#348}</style></head><body><div><h1>Example Domain</h1><p>This domain is for use in documentation examples without needing permission. Avoid use in operations.</p><p><a href=\"https://iana.org/domains/example\">Learn more</a></p></div>\n</body></html>"
  }
]

Response Fields

| Field     | Type    | Description                                        |
| --------- | ------- | -------------------------------------------------- |
| success   | boolean | Whether the fetch was successful for this URL      |
| url       | string  | The URL that was fetched                           |
| timestamp | integer | Unix timestamp (seconds) when the page was fetched |
| pageTitle | string  | The title of the webpage (from the <title> tag)    |
| content   | string  | The full HTML content of the webpage               |
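
For typed code, the per-URL result can be modeled directly from the fields above. A sketch (this class is illustrative, not an official client type):

from typing import TypedDict

class WebFetchResult(TypedDict, total=False):
    success: bool    # whether the fetch succeeded for this URL
    url: str         # the URL that was fetched
    timestamp: int   # Unix timestamp (seconds) of the fetch
    pageTitle: str   # contents of the page's <title> tag
    content: str     # full HTML of the page
    error: str       # present instead of content when success is false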

Multiple URLs Response

When fetching multiple URLs, the response is an array with one object per URL:

[
  {
    "success": true,
    "url": "https://example.com",
    "timestamp": 1765281552,
    "pageTitle": "Example Domain",
    "content": "<html>...</html>"
  },
  {
    "success": true,
    "url": "https://www.crustdata.com",
    "timestamp": 1765281553,
    "pageTitle": "Crustdata - B2B Data Platform",
    "content": "<html>...</html>"
  }
]

Partial Failure Response (200 OK)

When some URLs succeed and others fail:

[
  {
    "success": true,
    "url": "https://example.com",
    "timestamp": 1765281552,
    "pageTitle": "Example Domain",
    "content": "<html>...</html>"
  },
  {
    "success": false,
    "url": "https://invalid-domain-that-does-not-exist.com",
    "error": "Failed to fetch URL"
  }
]

Common Failure Reasons

  • Invalid or unreachable URL
  • Timeout waiting for page to load
  • Network connectivity issues
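
Because failures are reported per URL rather than for the whole request, it is worth splitting the response before processing. A minimal sketch, assuming results holds the parsed JSON array:

# Split the parsed response array into successes and failures.
succeeded = [r for r in results if r["success"]]
failed = [r for r in results if not r["success"]]

for r in succeeded:
    print(f"{r['url']}: {r['pageTitle']} ({len(r['content'])} bytes of HTML)")
for r in failed:
    print(f"{r['url']} failed: {r.get('error', 'unknown error')}")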

Validation Error Response (400 Bad Request)

{
  "urls": [
    "This field is required."
  ]
}

Common Validation Errors

  • Missing urls parameter
  • Empty urls array
  • More than 10 URLs in the request
  • Invalid URL format (missing http:// or https:// prefix)
  • urls is not an array

Insufficient Credits (402 Payment Required)

{
  "error": "Insufficient credits. Please get in touch with the Crustdata sales team."
}

Service Error Response (500 Internal Server Error)

{
  "error": "Failed to process web fetch request"
}

Common Errors

  • Failed to process web fetch request - The fetch service encountered an internal error
  • Web fetch request timed out - The request took too long to complete
  • Failed to connect to web fetch service - The service is temporarily unavailable

Best Practices

  1. URL Validation: Always ensure URLs are properly formatted with http:// or https:// prefixes
  2. Batch Requests: Group URLs together (up to 10) to minimize API calls
  3. Error Handling: Check the success field for each URL in the response
  4. Content Processing: Be prepared to handle various HTML structures and encodings
  5. Timeout Handling: Implement retry logic for timeouts on critical pages (see the sketch after this list)
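
A minimal retry sketch for practice 5, assuming fetch_urls is your own wrapper around the endpoint that takes a list of up to 10 URLs and returns the parsed response array (the function name, retry count, and backoff values are illustrative choices, not API behavior):

import time

def fetch_with_retry(fetch_urls, urls, attempts=3, backoff=2.0):
    """Retry URLs whose per-result success flag is false, with exponential backoff."""
    pending = list(urls)  # assume at most 10 URLs, per the API limit
    collected = {}
    for attempt in range(attempts):
        if not pending:
            break
        if attempt > 0:
            time.sleep(backoff * 2 ** (attempt - 1))  # wait before retrying
        for result in fetch_urls(pending):
            collected[result["url"]] = result  # latest result wins
        pending = [url for url, r in collected.items() if not r["success"]]
    return list(collected.values())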

Content Processing Tips

Extracting Text from HTML

The content field contains raw HTML. You may want to:

  • Parse HTML using libraries (BeautifulSoup for Python, Cheerio for Node.js), as sketched after this list
  • Extract specific elements (headings, paragraphs, links)
  • Remove scripts and styles for text-only content
  • Handle special characters and encoding properly
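
A minimal sketch of the first three tips using Python and BeautifulSoup (pip install beautifulsoup4), assuming result is one successful object from the response:

from bs4 import BeautifulSoup

soup = BeautifulSoup(result["content"], "html.parser")

# Remove scripts and styles so only visible text remains.
for tag in soup(["script", "style"]):
    tag.decompose()

text = soup.get_text(separator=" ", strip=True)
headings = [h.get_text(strip=True) for h in soup.find_all(["h1", "h2", "h3"])]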

Common Use Cases

  • Metadata Extraction: Parse <meta> tags for SEO data (see the sketch after this list)
  • Link Discovery: Extract all <a> tags for crawling
  • Content Analysis: Process text content for NLP tasks
  • Change Detection: Compare content over time to detect updates
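
Continuing the BeautifulSoup example above, metadata extraction and link discovery are both short passes over the parsed tree (soup is the object from the previous sketch):

# Collect <meta> name/content pairs, e.g. description and keywords.
meta = {
    m["name"]: m.get("content", "")
    for m in soup.find_all("meta")
    if m.get("name")
}

# Collect every href for crawling; these may be relative (see the FAQ below).
links = [a["href"] for a in soup.find_all("a", href=True)]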

FAQ

Q: What's the maximum number of URLs I can fetch in one request? A: You can fetch up to 10 URLs per request.

Q: How long does it take to fetch a page? A: Most pages are fetched within 2-5 seconds. Complex pages with lots of JavaScript may take longer.

Q: Do I need to include the protocol in URLs? A: Yes, all URLs must start with http:// or https://.

Q: What happens if one URL fails in a batch request? A: The successful URLs will return their content, and failed URLs will have "success": false with an error message. You still pay credits for successful fetches.

Q: Will I be charged for failed fetches? A: This depends on the failure type. Network errors and invalid URLs typically don't consume credits, but rate limits and timeouts may.

Q: Can I fetch content from password-protected pages? A: No, this API fetches publicly accessible pages only. Authentication is not supported.

Q: What's the maximum page size that can be fetched? A: There are reasonable limits to prevent abuse, but typical webpages (up to several MB) should work fine.

Q: How do I handle relative URLs in the fetched content? A: You'll need to resolve relative URLs yourself using the base URL from the url field in the response.
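
For example, with Python's standard library, using the url field echoed in the response as the base:

from urllib.parse import urljoin

base = result["url"]                     # e.g. "https://example.com"
absolute = urljoin(base, "/about.html")  # -> "https://example.com/about.html"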

Q: Can I specify custom headers or user agents? A: Custom headers are not currently supported. The API uses a standard user agent.