Web Search recipes - Crustdata Docs

Ready-to-copy patterns for Web Search. Each example shows a real request, the response, and what to extract.

Restrict results to a specific site

Use the site parameter to limit results to a single domain. Useful for finding company pages on LinkedIn, profiles on GitHub, or content on a specific website.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "ADAMSBROWN, LLC",
    "sources": ["web"],
    "site": "linkedin.com/company"
  }'

Extract: The first result URL is typically the best match. For company profile URLs, pass the result to Company Identify for a full profile.

Search with date filtering

Use start_date and end_date (Unix timestamps in seconds) to limit results to a specific time range.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "distributed systems",
    "location": "US",
    "sources": ["web", "news"],
    "site": "example.com",
    "start_date": 1728259200,
    "end_date": 1730937600
  }'

Convert dates to Unix timestamps: October 7, 2024 = 1728259200. You can use any Unix timestamp converter tool.

Multi-page search

Use page to request multiple result pages in a single response. The metadata object tells you which pages succeeded.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "artificial intelligence startups",
    "sources": ["web"],
    "location": "US",
    "page": 3
  }'

Response trimmed for clarity. Pages 1 succeeded, page 2 failed, page 3 was empty.

The response aggregates results across all successful pages into a single results[] array. Check metadata to understand page-level outcomes:

metadata.totalResults — total results available across all sources and pages.
metadata.failedPages — page numbers that returned errors. Retry the request with a smaller page value if needed.
metadata.emptyPages — page numbers that returned no results. You have reached the end of available results — do not retry.

if (response.metadata.failedPages.length > 0) {
    console.log("Failed pages:", response.metadata.failedPages);
}

if (response.metadata.emptyPages.length > 0) {
    console.log(
        "Reached end of results at page",
        Math.min(...response.metadata.emptyPages),
    );
}

Current platform behavior (not guaranteed by the OpenAPI contract): Each page returns approximately 10 results. If metadata.emptyPages contains page numbers, you have reached the end of available results.

Company and profile discovery

Find a company’s website domain

Search for a company by name followed by “website”. The first result URL is typically the company’s website.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "ADAMSBROWN, LLC website",
    "sources": ["web"]
  }'

Extract: results[0].url → https://www.adamsbrowncpa.com/

Do not wrap the company name in quotes — this lets the search engine match partial name variations. If the company name is common, add city and state: "ADAMSBROWN, LLC WICHITA KS website".

Bridge to Company API: Extract the domain from the URL, then pass it to Company Enrich for the full company profile:

const url = new URL("https://www.adamsbrowncpa.com/");
const domain = url.hostname.replace("www.", ""); // "adamsbrowncpa.com"

Company Enrich request body

{ "domains": ["adamsbrowncpa.com"] }

Find a person’s profile URL

Use the site parameter with the profile host (e.g. linkedin.com/in) to find a person’s profile, then enrich via the Person API.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "Jeff Dean Google",
    "sources": ["web"],
    "site": "linkedin.com/in"
  }'

Bridge to Person API: Pass the profile URL to Person Enrich:

{
    "professional_network_profile_urls": [
        "https://www.linkedin.com/in/jeff-dean-8b212555"
    ]
}

Find a developer profile

Use site: "github.com" to search for developer profiles on GitHub.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "Tyler Lambe",
    "sources": ["web"],
    "site": "github.com",
    "location": "US"
  }'

Research and analysis

Search for academic research on a topic

Search for academic articles with date filtering to find papers with citation data and PDF links.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "deep learning",
    "location": "US",
    "sources": ["scholar-articles"],
    "start_date": 1672531200,
    "end_date": 1704067200
  }'

Extract:

citations — citation count to gauge impact.
pdf_url — direct PDF download link (when available).
authors[].profile_url — Author profile link.
metadata — citation string: "Author - Year - Publisher".

Look up an academic researcher’s profile

Search for a researcher by name to get their full academic profile with h-index, citation metrics, and top publications.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "jeff dean",
    "location": "US",
    "sources": ["scholar-author"]
  }'

Extract: citations.all for total impact, h_index.all for research quality, articles[] for top publications.

Get an AI-generated overview of a topic

Use deep research mode for a synthesized answer with source references.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "uv vs pip",
    "location": "US",
    "sources": ["ai"]
  }'

Extract: content for the overview text, references[].url for source verification.

Search news with date filtering

Filter news results to a specific date range by providing start_date and end_date as Unix timestamps in seconds.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "artificial intelligence developments",
    "location": "US",
    "sources": ["news"],
    "start_date": 1728259200,
    "end_date": 1730937600
  }'

start_date and end_date are Unix timestamps in seconds. October 7, 2024 = 1728259200. November 7, 2024 = 1730937600.

Search for recent social media mentions of a topic or person.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "crustdata AI agents",
    "location": "US",
    "sources": ["social"]
  }'

Current platform behavior: Social media results may return an empty results array for some queries depending on availability. Always check results.length before processing.

Mixed-source search with safe parsing

When searching multiple sources, the results[] array contains items with different shapes. Always branch on result.source.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "machine learning infrastructure",
    "location": "US",
    "sources": ["web", "news", "scholar-articles"]
  }'

Safe parsing logic:

const fetchableUrls = [];

for (const result of response.results) {
    switch (result.source) {
        case "web":
        case "news":
        case "social":
            fetchableUrls.push(result.url);
            break;
        case "scholar-articles":
        case "scholar-articles-enriched":
            fetchableUrls.push(result.url);
            break;
        case "scholar-author":
            console.log(`Author: ${result.name} (${result.affiliation})`);
            break;
        case "ai":
            console.log(`AI Overview: ${result.content}`);
            result.references?.forEach((ref) => fetchableUrls.push(ref.url));
            break;
    }
}

Not every search result should go to Fetch. Academic author results are profiles, not content pages. Deep research results provide content inline and use references[].url for source URLs instead of a top-level url.

Search-then-fetch workflows

End-to-end competitive intelligence

Search for competitor news, then fetch the full article content for analysis.

Search for competitor news

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "OpenAI funding 2026",
    "location": "US",
    "sources": ["news", "web"]
  }'

Extract URLs from results[].url.

Select URLs and fetch content

curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "urls": [
      "https://www.reuters.com/technology/openai-funding-2026/",
      "https://techcrunch.com/2026/01/15/openai-funding/"
    ]
  }'

Parse results and handle failures

Check success for each entry. Parse HTML from successful fetches. To identify which URLs failed, compare requested URLs against successful url values.

const successfulUrls = new Set(
  fetchResponse.filter(r => r.success).map(r => r.url)
);
const failedUrls = requestedUrls.filter(u => !successfulUrls.has(u));

Failed entries have url: null, so correlate failures by comparing successful URLs to your input list. See Fetch: correlating failures.

Full pipeline in Python: search → fetch → parse

A complete Python example that searches, filters fetchable URLs by source, fetches content, and handles failures.

import requests

API_KEY = "YOUR_API_KEY"
HEADERS = {
    "authorization": f"Bearer {API_KEY}",
    "content-type": "application/json",
    "x-api-version": "2025-11-01",
}

# Step 1: Search
search_resp = requests.post(
    "https://api.crustdata.com/web/search/live",
    headers=HEADERS,
    json={"query": "OpenAI funding 2026", "sources": ["web", "news"]},
).json()

# Step 2: Extract fetchable URLs
fetchable_urls = []
for result in search_resp["results"]:
    if result["source"] in ("web", "news", "social", "scholar-articles", "scholar-articles-enriched"):
        fetchable_urls.append(result["url"])
    elif result["source"] == "ai":
        for ref in result.get("references", []):
            fetchable_urls.append(ref["url"])

# Step 3: Fetch (max 10 URLs per request)
fetch_resp = requests.post(
    "https://api.crustdata.com/web/enrich/live",
    headers=HEADERS,
    json={"urls": fetchable_urls[:10]},
).json()

# Step 4: Process results and correlate failures
successful_urls = set()
for item in fetch_resp:
    if item["success"]:
        successful_urls.add(item["url"])
        print(f"Fetched: {item['title']} ({len(item['content'])} chars)")

failed_urls = [u for u in fetchable_urls[:10] if u not in successful_urls]
if failed_urls:
    print(f"Failed URLs: {failed_urls}")

AI overview → fetch source references

When using deep research mode, the overview content is inline. To get the full source articles, fetch the URLs from references[].

Search with deep research mode

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{"query": "uv vs pip", "sources": ["ai"], "location": "US"}'

Extract: results[0].references[].url — the source article URLs.

Fetch the reference URLs

curl --request POST \
  --url https://api.crustdata.com/web/enrich/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{"urls": ["https://realpython.com/uv-vs-pip/"]}'

Parse the content from successful entries to read the full source articles.

AI results do not have a top-level url field. Always use references[].url for fetch targets.

Next steps

Source types — full result shapes and field presence for each source.
Web Search — main Search page with request/response and errors.
Web Fetch — fetch the HTML content of URLs returned by search results.

​Restrict results to a specific site

​Search with date filtering

​Multi-page search

​Company and profile discovery

​Find a company’s website domain

​Find a person’s profile URL

​Find a developer profile

​Research and analysis

​Search for academic research on a topic

​Look up an academic researcher’s profile

​Get an AI-generated overview of a topic

​Search news with date filtering

​Search social media posts

​Mixed-source search with safe parsing

​Search-then-fetch workflows

​End-to-end competitive intelligence

​Full pipeline in Python: search → fetch → parse

​AI overview → fetch source references

​Next steps

Restrict results to a specific site

Search with date filtering

Multi-page search

Company and profile discovery

Find a company’s website domain

Find a person’s profile URL

Find a developer profile

Research and analysis

Search for academic research on a topic

Look up an academic researcher’s profile

Get an AI-generated overview of a topic

Search news with date filtering

Search social media posts

Mixed-source search with safe parsing

Search-then-fetch workflows

End-to-end competitive intelligence

Full pipeline in Python: search → fetch → parse

AI overview → fetch source references

Next steps