Web Search

Use this when you want to find web pages, news articles, academic papers, author profiles, AI-generated overviews, or social media posts matching a search query. The Web Search API accepts a query and returns results from one or more source types. The result shape varies by source — always specify sources explicitly when you need predictable parsing. Every request goes to the same endpoint:

POST https://api.crustdata.com/web/search/live

Request body

Parameter	Type	Required	Default	Description
`query`	string	Yes	—	Search query text. Max 5,000 characters. Supports search operators like `site:` and `filetype:`.
`geolocation`	string	No	—	ISO 3166-1 alpha-2 country code for region-specific results (e.g., `"US"`, `"GB"`, `"JP"`).
`sources`	string[]	No	—	Sources to query: `web`, `news`, `scholar-articles`, `scholar-articles-enriched`, `scholar-author`, `ai`, `social`. Current platform behavior: omitting this field searches all sources.
`site`	string	No	—	Restrict results to a domain (e.g., `"linkedin.com/company"`, `"github.com"`). Max 500 characters.
`startDate`	integer	No	—	Unix timestamp (seconds). Only results after this date.
`endDate`	integer	No	—	Unix timestamp (seconds). Only results before this date. Must be > `startDate`.
`numPages`	integer	No	`1`	Number of result pages to return. Minimum: `1`.
`solveCloudflare`	boolean	No	`false`	Current platform behavior: Attempt to bypass Cloudflare protection when fetching result page content. Affects content retrieval, not search discovery itself. Not guaranteed to succeed.
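The limits in the table above can be checked client-side before sending a request. The following is a minimal sketch: `buildSearchBody` is a hypothetical helper (not part of any Crustdata SDK), and it assumes the character limits can be approximated with JavaScript string length.

```javascript
// Sketch: validate inputs against the limits in the request body table.
// buildSearchBody is a hypothetical helper, not part of any official SDK.
function buildSearchBody({ query, sources, site, geolocation, startDate, endDate, numPages } = {}) {
  if (typeof query !== 'string' || query.length === 0) {
    throw new Error('query is required');
  }
  // Assumption: string length is a close enough proxy for the documented character limits.
  if (query.length > 5000) {
    throw new Error('query exceeds 5,000 characters');
  }
  if (site !== undefined && site.length > 500) {
    throw new Error('site exceeds 500 characters');
  }
  if (startDate !== undefined && endDate !== undefined && endDate <= startDate) {
    throw new Error('endDate must be greater than startDate');
  }
  if (numPages !== undefined && (!Number.isInteger(numPages) || numPages < 1)) {
    throw new Error('numPages must be an integer >= 1');
  }
  // Drop undefined fields so optional parameters are omitted rather than sent as null.
  const body = { query, sources, site, geolocation, startDate, endDate, numPages };
  return Object.fromEntries(Object.entries(body).filter(([, value]) => value !== undefined));
}

const exampleBody = buildSearchBody({ query: 'crustdata', sources: ['web'], geolocation: 'US' });
console.log(JSON.stringify(exampleBody));
```

The returned object can be serialized with `JSON.stringify` and sent as the POST body.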

Source capabilities

Current platform behavior — not guaranteed by the OpenAPI contract. Parameter applicability varies by source. This table reflects observed behavior.

Source	Best use case	Fetchable `url`?	`site` effective?	Date filters effective?
`web`	General web search	Yes	Yes	Yes
`news`	News articles	Yes	Yes	Yes
`scholar-articles`	Academic papers	Yes	No	Yes
`scholar-articles-enriched`	Papers + author profiles	Yes	No	Yes
`scholar-author`	Researcher profiles	No	No	No
`ai`	AI-generated summaries	No	No	No
`social`	Social media mentions	Yes	No	No
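One way to use this table is to encode it as data and flag parameters that will be silently ignored. This sketch copies the observed behavior above; `SOURCE_CAPABILITIES` and `ineffectiveParams` are hypothetical names, and the values may change since they are not part of the API contract.

```javascript
// Observed behavior copied from the Source capabilities table; not an API contract.
const SOURCE_CAPABILITIES = {
  'web': { fetchableUrl: true, site: true, dates: true },
  'news': { fetchableUrl: true, site: true, dates: true },
  'scholar-articles': { fetchableUrl: true, site: false, dates: true },
  'scholar-articles-enriched': { fetchableUrl: true, site: false, dates: true },
  'scholar-author': { fetchableUrl: false, site: false, dates: false },
  'ai': { fetchableUrl: false, site: false, dates: false },
  'social': { fetchableUrl: true, site: false, dates: false },
};

// Hypothetical helper: list request parameters the table marks as ineffective.
function ineffectiveParams(body) {
  const warnings = [];
  const hasDates = body.startDate !== undefined || body.endDate !== undefined;
  for (const source of body.sources ?? []) {
    const caps = SOURCE_CAPABILITIES[source];
    if (!caps) continue; // unknown source name: nothing to report
    if (body.site !== undefined && !caps.site) {
      warnings.push(`site has no effect on ${source}`);
    }
    if (hasDates && !caps.dates) {
      warnings.push(`date filters have no effect on ${source}`);
    }
  }
  return warnings;
}

console.log(ineffectiveParams({ query: 'x', sources: ['web', 'social'], site: 'github.com' }));
```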

Response body

Field	Type	Description
`success`	boolean	Whether the search executed successfully.
`query`	string	The query as interpreted by the API (includes `site:` prefix if `site` was set).
`timestamp`	integer	Unix timestamp in milliseconds when the search was performed.
`results`	array	Search results. Shape varies by `source` — see Result shapes by source.
`metadata.totalResults`	integer	Total number of results available across all pages (may exceed the number in the `results` array if you requested fewer pages).
`metadata.failedPages`	array	Page numbers that failed to return results.
`metadata.emptyPages`	array	Page numbers that returned no results.

Timestamps: Search timestamp is in milliseconds. Fetch timestamp is in seconds. Divide Search timestamps by 1000 when comparing across endpoints.
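The unit conversion can be sketched as follows. `searchTimestampToSeconds` is a hypothetical helper name, and the sample value is illustrative.

```javascript
// Search `timestamp` is in milliseconds; convert to whole seconds before comparing
// with second-based values such as Fetch timestamps or startDate/endDate.
function searchTimestampToSeconds(timestampMs) {
  return Math.floor(timestampMs / 1000);
}

// The Date constructor expects milliseconds, so a Search timestamp can be passed directly.
const performedAt = new Date(1730937600000);
console.log(searchTimestampToSeconds(1730937600000)); // 1730937600
console.log(performedAt.toISOString()); // 2024-11-07T00:00:00.000Z
```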

Your first search

The simplest search uses a query with an explicit sources array. Always specify sources for predictable result parsing.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "crustdata",
    "sources": ["web"],
    "geolocation": "US"
  }'

Response trimmed for clarity.

Extract: Each result in results[] contains source, title, url, snippet, and position. Use position for ranking and url for follow-up fetching.
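A minimal sketch of that extraction step, assuming a parsed response object. `sampleResponse` holds illustrative data in the documented web/news shape, and `rankedUrls` is a hypothetical helper.

```javascript
// Illustrative data shaped like the web/news results described above.
const sampleResponse = {
  results: [
    { source: 'web', title: 'B', url: 'https://example.com/b', snippet: '...', position: 2 },
    { source: 'web', title: 'A', url: 'https://example.com/a', snippet: '...', position: 1 },
  ],
};

// Return result URLs ordered by position (1 = top result), without mutating the input.
function rankedUrls(response) {
  return [...response.results]
    .sort((a, b) => a.position - b.position)
    .map((result) => result.url);
}

console.log(rankedUrls(sampleResponse));
```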

Restrict results to a specific site

Use the site parameter to limit results to a single domain. Useful for finding company pages on LinkedIn, profiles on GitHub, or content on a specific website.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "ADAMSBROWN, LLC",
    "sources": ["web"],
    "site": "linkedin.com/company"
  }'

Extract: The first result URL is typically the best match. For company LinkedIn URLs, pass that URL to the Company Identify API for a full profile.

Search with date filtering

Use startDate and endDate (Unix timestamps in seconds) to limit results to a specific time range.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "distributed systems",
    "geolocation": "US",
    "sources": ["web", "news"],
    "site": "example.com",
    "startDate": 1728259200,
    "endDate": 1730937600
  }'

Convert dates to Unix timestamps in seconds: October 7, 2024 00:00 UTC = 1728259200, and November 7, 2024 00:00 UTC = 1730937600. Any Unix timestamp converter will do.
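The conversion can also be done in code. This sketch uses the standard `Date.UTC` function and treats dates as midnight UTC; `utcDateToUnixSeconds` is a hypothetical helper name.

```javascript
// Convert a UTC calendar date to a Unix timestamp in seconds.
// Date.UTC takes a zero-based month, so the 1-based month is shifted by one.
function utcDateToUnixSeconds(year, month, day) {
  return Date.UTC(year, month - 1, day) / 1000;
}

console.log(utcDateToUnixSeconds(2024, 10, 7)); // 1728259200
console.log(utcDateToUnixSeconds(2024, 11, 7)); // 1730937600
```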

Result shapes by source

The results[] array shape depends on the source field of each result. Use this reference when parsing multi-source responses.

web / news

Standard web and news results share the same shape.

Field	Type	Description
`source`	string	`"web"` or `"news"`.
`title`	string	Page title.
`url`	string	Page URL.
`snippet`	string	Text excerpt.
`position`	integer	Result position (1-based).

{
    "source": "web",
    "title": "Crustdata: Real-Time B2B Data Broker via API or Data Feed",
    "url": "https://crustdata.com/",
    "snippet": "Crustdata is a B2B data provider offering real-time company & people datasets.",
    "position": 1
}

scholar-articles

Academic article results include citation data, author information, and optional PDF links.

Field	Type	Description
`source`	string	`"scholar-articles"` or `"scholar-articles-enriched"`.
`title`	string	Article title.
`url`	string	Link to the article.
`snippet`	string	Abstract excerpt.
`metadata`	string	Citation string: `"Author - Year - Publisher"`.
`pdf_url`	string?	Direct PDF link, if available.
`position`	integer	Result position (1-based).
`authors`	array	`[{ name, profile_url, profile_id }]`.
`citations`	integer	Total citation count.

{
    "source": "scholar-articles",
    "title": "Understanding deep learning",
    "url": "https://books.google.com/books?hl=en&lr=lang_en&id=rvyxEAAAQBAJ",
    "snippet": "...to this field understand the principles behind deep learning.",
    "metadata": "SJD Prince - 2023 - books.google.com",
    "pdf_url": null,
    "position": 1,
    "authors": [
        {
            "name": "SJD Prince",
            "profile_url": "https://scholar.google.com/citations?user=fjm67xYAAAAJ&hl=en&oi=sra",
            "profile_id": "fjm67xYAAAAJ"
        }
    ],
    "citations": 618
}

Use scholar-articles-enriched instead of scholar-articles to get richer author profile data. The result shape is the same, with more author fields populated.

scholar-author

Author profile results have a completely different shape: no snippet, position, or title. Instead, you get a full researcher profile.

Field	Type	Description
`source`	string	`"scholar-author"`.
`url`	string	Google Scholar profile URL.
`name`	string	Author full name.
`affiliation`	string	Institutional affiliation.
`website`	string?	Personal or institutional website.
`interests`	array	`[{ title, link }]` — research interests.
`thumbnail`	string?	Profile photo URL.
`citations`	object	`{ all, since_2020 }` — total and recent counts.
`h_index`	object	`{ all, since_2020 }`.
`i10_index`	object	`{ all, since_2020 }`.
`articles`	array	Top publications: `[{ title, url, year, citations, authors, publication }]`.

{
    "source": "scholar-author",
    "url": "https://scholar.google.com/citations?user=NMS69lQAAAAJ&hl=en&oi=ao",
    "name": "Jeff Dean",
    "affiliation": "Google Chief Scientist, Google Research and Google DeepMind",
    "website": "http://research.google.com/people/jeff",
    "interests": [
        { "title": "Distributed systems", "link": "https://scholar.google.com/..." }
    ],
    "citations": { "all": 401624, "since_2020": 231008 },
    "h_index": { "all": 114, "since_2020": 78 },
    "i10_index": { "all": 319, "since_2020": 203 },
    "articles": [
        {
            "title": "MapReduce: simplified data processing on large clusters",
            "url": "https://scholar.google.com/...",
            "year": "2008",
            "citations": "37255",
            "authors": "J Dean, S Ghemawat",
            "publication": "Communications of the ACM 51 (1), 107-113, 2008"
        }
    ]
}
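In the sample above, `year` and `citations` inside `articles[]` appear as strings, while the profile-level counts are numbers. The field table does not state their types, so a defensive sketch might coerce them before doing arithmetic. `normalizeArticle` is a hypothetical helper.

```javascript
// Coerce a value to a number, returning null when it is missing or non-numeric.
function toNumberOrNull(value) {
  if (value === null || value === undefined || value === '') return null;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : null;
}

// Return a copy of an articles[] entry with numeric year and citations.
function normalizeArticle(article) {
  return {
    ...article,
    year: toNumberOrNull(article.year),
    citations: toNumberOrNull(article.citations),
  };
}

console.log(normalizeArticle({ title: 'MapReduce', year: '2008', citations: '37255' }));
```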

ai

AI mode returns a single AI-generated overview with source references. No snippet, position, or standard search fields.

Field	Type	Description
`source`	string	`"ai"`.
`title`	string	Always `"AI Overview"`.
`content`	string	AI-generated overview text.
`references`	array	Source articles: `[{ title, url, snippet }]`.
`images`	array	Embedded images: `[{ url, alt, width, height }]`.

{
    "source": "ai",
    "title": "AI Overview",
    "content": "The primary difference between uv and pip is speed and scope...",
    "references": [
        {
            "title": "uv vs pip: Managing Python Packages and Dependencies",
            "url": "https://realpython.com/uv-vs-pip/",
            "snippet": "When it comes to Python package managers..."
        }
    ],
    "images": []
}
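Because an AI result carries no `url` of its own, its `references` are the natural follow-up targets. A minimal sketch for collecting them, with `aiReferenceUrls` as a hypothetical helper name:

```javascript
// Collect unique reference URLs from every AI overview in a results array.
function aiReferenceUrls(results) {
  const urls = new Set();
  for (const result of results) {
    if (result.source !== 'ai') continue; // other sources have no references[]
    for (const reference of result.references ?? []) {
      if (reference.url) urls.add(reference.url);
    }
  }
  return [...urls];
}

console.log(aiReferenceUrls([
  {
    source: 'ai',
    title: 'AI Overview',
    content: '...',
    references: [{ title: 'uv vs pip', url: 'https://realpython.com/uv-vs-pip/', snippet: '...' }],
    images: [],
  },
]));
```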

social

Social media results use the same shape as web/news results.

Field	Type	Description
`source`	string	`"social"`.
`title`	string	Post or page title.
`url`	string	Post URL.
`snippet`	string	Post excerpt.
`position`	integer	Result position (1-based).

Current platform behavior: Social search results may return empty for some queries depending on availability. Always check results.length before processing.
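The length check might look like the sketch below. `socialResults` is a hypothetical helper, and the response object is illustrative.

```javascript
// Return only social results, tolerating a missing or empty results array.
function socialResults(response) {
  const results = Array.isArray(response.results) ? response.results : [];
  return results.filter((result) => result.source === 'social');
}

const posts = socialResults({ success: true, results: [] });
if (posts.length === 0) {
  console.log('No social results for this query');
}
```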

Result ordering and ranking

Current platform behavior: When querying a single source, position reflects the source’s natural ranking order. When querying multiple sources, results from different sources are interleaved and position may reflect a per-source rank rather than a global rank. metadata.totalResults is the total count across all requested sources and pages.
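Since `position` may be a per-source rank in multi-source responses, one cautious approach is to rank within each source rather than across the whole array. A sketch, with `groupBySource` as a hypothetical helper:

```javascript
// Group results by source and order each group by its own position values.
function groupBySource(results) {
  const groups = new Map();
  for (const result of results) {
    if (!groups.has(result.source)) groups.set(result.source, []);
    groups.get(result.source).push(result);
  }
  for (const group of groups.values()) {
    // Results without a position (ai, scholar-author) compare equal, and
    // Array.prototype.sort is stable, so they keep their original order.
    group.sort((a, b) => (a.position ?? 0) - (b.position ?? 0));
  }
  return groups;
}

const grouped = groupBySource([
  { source: 'news', url: 'https://example.com/n1', position: 1 },
  { source: 'web', url: 'https://example.com/w2', position: 2 },
  { source: 'web', url: 'https://example.com/w1', position: 1 },
]);
console.log(grouped.get('web').map((result) => result.url));
```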

Parsing multi-source responses

When you query multiple sources at once (or omit sources), the results[] array can contain items with different shapes. Always check the source field of each result to determine which fields are available:

for (const result of response.results) {
  switch (result.source) {
    case 'web':
    case 'news':
    case 'social':
      // Standard: title, url, snippet, position
      console.log(result.title, result.url);
      break;
    case 'scholar-articles':
    case 'scholar-articles-enriched':
      // Academic: standard fields + authors, citations, pdf_url, metadata
      console.log(result.title, result.citations, result.authors);
      break;
    case 'scholar-author':
      // Author profile: name, affiliation, h_index, articles[]
      console.log(result.name, result.affiliation, result.h_index);
      break;
    case 'ai':
      // AI overview: content, references[]
      console.log(result.content, result.references);
      break;
  }
}

Multi-page search

Use numPages to request multiple pages of results. The metadata object tells you which pages succeeded.

curl --request POST \
  --url https://api.crustdata.com/web/search/live \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "query": "artificial intelligence startups",
    "sources": ["web"],
    "geolocation": "US",
    "numPages": 3
  }'

Response trimmed for clarity. Page 1 succeeded, page 2 failed, and page 3 was empty.

The response aggregates results across all successful pages into a single results[] array. Check metadata to understand page-level outcomes:

metadata.totalResults — total results available across all sources and pages.
metadata.failedPages — page numbers that returned errors. The request body documents no page-offset parameter, so retry the full request or reduce numPages.
metadata.emptyPages — page numbers that returned no results. You have reached the end of available results — do not retry.

Handling page outcomes:

if (response.metadata.failedPages.length > 0) {
  // Some pages failed — retry the full request or reduce numPages
  console.log('Failed pages:', response.metadata.failedPages);
}

if (response.metadata.emptyPages.length > 0) {
  // No more results available — do not request more pages
  console.log('Reached end of results at page', Math.min(...response.metadata.emptyPages));
}

Current platform behavior (not guaranteed by the OpenAPI contract): Each page returns approximately 10 results. If metadata.emptyPages contains page numbers, you have reached the end of available results.
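The page-level bookkeeping can be sketched as below. It assumes page numbers are 1-based and run from 1 to `numPages`, and it uses the approximate figure of 10 results per page noted above. `summarizePages` is a hypothetical helper.

```javascript
// Summarize page outcomes from the response-level metadata object.
function summarizePages(numPages, metadata) {
  const failed = new Set(metadata.failedPages ?? []);
  const empty = new Set(metadata.emptyPages ?? []);
  const succeeded = [];
  for (let page = 1; page <= numPages; page += 1) {
    if (!failed.has(page) && !empty.has(page)) succeeded.push(page);
  }
  return {
    succeeded,
    shouldRetry: failed.size > 0, // some pages errored
    reachedEnd: empty.size > 0, // no more results available
    approxResults: succeeded.length * 10, // observed page size, not guaranteed
  };
}

console.log(summarizePages(3, { failedPages: [2], emptyPages: [3] }));
```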

Field presence by source

Use this reference to determine which fields are present for each source type.

Naming note: The API uses metadata in two different contexts. The response-level metadata is an object with totalResults, failedPages, and emptyPages. The per-result metadata field (scholar-articles and scholar-articles-enriched only) is a citation string like "Author - Year - Publisher". Always use the full path (response.metadata vs result.metadata) to avoid confusion.
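To illustrate the per-result field, the citation string can be split into its three parts. This sketch assumes the `"Author - Year - Publisher"` layout holds exactly. An author or publisher that itself contains `" - "` would not parse, so the hypothetical `parseCitation` helper returns null in that case rather than guessing.

```javascript
// Split a scholar-articles citation string (result.metadata) into its parts.
function parseCitation(citation) {
  const parts = citation.split(' - ');
  if (parts.length !== 3) return null; // unexpected format: leave it to the caller
  const [author, year, publisher] = parts;
  return { author, year, publisher };
}

console.log(parseCitation('SJD Prince - 2023 - books.google.com'));
```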

Standard fields — present in web, news, social, and scholar-articles / scholar-articles-enriched (abbreviated scholar-articles* in the table):

Field	Sources with this field	Notes
`source`	All sources	Always present
`title`	`web`, `news`, `social`, `scholar-articles*`, `ai`	AI: always `"AI Overview"`
`url`	`web`, `news`, `social`, `scholar-articles*`, `scholar-author`	Scholar-author: profile link
`snippet`	`web`, `news`, `social`, `scholar-articles*`	Absent in `ai`, `scholar-author`
`position`	`web`, `news`, `social`, `scholar-articles*`	Absent in `ai`, `scholar-author`

Scholar article fields — scholar-articles and scholar-articles-enriched only:

Field	Type	Notes
`metadata`	string	Citation string: `"Author - Year - Publisher"`
`pdf_url`	string?	Direct PDF download link — handle outside Web Fetch
`authors`	array	`[{ name, profile_url, profile_id }]`
`citations`	integer	Total citation count

Scholar author fields — scholar-author only:

Field	Type	Notes
`name`	string	Author full name
`affiliation`	string	Institutional affiliation
`website`	string?	Personal or institutional website
`interests`	array	`[{ title, link }]`
`thumbnail`	string?	Profile photo URL
`citations`	object	`{ all, since_2020 }` — different type than scholar-articles
`h_index`	object	`{ all, since_2020 }`
`i10_index`	object	`{ all, since_2020 }`
`articles`	array	`[{ title, url, year, citations, authors, publication }]`
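Because `citations` is an integer on scholar article results but an object on scholar-author results, code that reads it across sources needs a type check. A minimal sketch, with `totalCitations` as a hypothetical helper:

```javascript
// Return a total citation count regardless of which Scholar source produced the result.
function totalCitations(result) {
  const { citations } = result;
  if (typeof citations === 'number') return citations; // scholar-articles*
  if (citations && typeof citations === 'object') return citations.all ?? null; // scholar-author
  return null; // field absent (web, news, social, ai)
}

console.log(totalCitations({ source: 'scholar-articles', citations: 618 }));
console.log(totalCitations({ source: 'scholar-author', citations: { all: 401624, since_2020: 231008 } }));
```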

AI mode fields — ai only:

Field	Type	Notes
`content`	string	AI-generated overview text
`references`	array	`[{ title, url, snippet }]` — fetch these URLs
`images`	array	`[{ url, alt, width, height }]`

For full request/response examples of each source type, see the Web API Examples page.

Error handling

Search returns 400 for invalid requests and 401 for auth failures.

{
    "error": {
        "type": "invalid_request",
        "message": "query: This field is required.",
        "metadata": []
    }
}
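A sketch of turning such a response into a readable message. It assumes the 401 body follows the same `error` envelope as the 400 example above, which this page does not confirm. `describeError` is a hypothetical helper.

```javascript
// Build a one-line description from an HTTP status and a parsed error body.
function describeError(status, body) {
  const error = body && body.error ? body.error : {};
  const type = error.type ?? 'unknown';
  const message = error.message ?? 'no message';
  return `HTTP ${status} ${type}: ${message}`;
}

console.log(describeError(400, {
  error: { type: 'invalid_request', message: 'query: This field is required.', metadata: [] },
}));
```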

Common gotchas

Mistake	Fix
Omitting `sources` and expecting uniform results	Different sources return different fields. Specify `sources` explicitly for predictable parsing.
Using `site` with Scholar, `ai`, or `social` sources	`site` is only effective for `web` and `news` sources. Per the Source capabilities table, it has no effect on Scholar, AI, or social searches.
Expecting `snippet` in AI mode results	AI mode returns `content` and `references` instead of `snippet` and `position`.
Expecting `position` in scholar-author results	Scholar author results don’t have `position` — they have `name`, `affiliation`, `citations`, etc.
Using `startDate` >= `endDate`	`startDate` must be strictly less than `endDate`.

Next steps

Web Fetch — fetch the HTML content of URLs returned by search results.
Web API Examples — ready-to-copy patterns for common workflows.
