> ## Documentation Index
> Fetch the complete documentation index at: https://docs.crustdata.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Web Search examples

> Web Search examples: site filtering, date filtering, multi-page, human mode, academic research, deep research mode, social, and workflow examples.

Ready-to-copy patterns for [Web Search](/web-docs/search/introduction). Each example shows
a real request, the response, and what to extract.

<Snippet file="web-auth-headers.mdx" />

***

## Restrict results to a specific site

Use the `site` parameter to limit results to a single domain. Useful for
finding company pages on LinkedIn, profiles on GitHub, or content on a
specific website.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "ADAMSBROWN, LLC",
      "sources": ["web"],
      "site": "linkedin.com/company"
    }'
  ```

  ```json Response theme={"theme":"vitesse-black"}
  {
      "success": true,
      "query": "site:linkedin.com/company ADAMSBROWN, LLC",
      "timestamp": 1775195371211,
      "results": [
          {
              "source": "web",
              "title": "Adams Brown",
              "url": "https://www.linkedin.com/company/adams-brown-cpa",
              "snippet": "Adams Brown, LLC, a leading CPA and advisory firm, has delivered value-added accounting and advisory services to businesses and their owners since 1945.",
              "position": 1
          }
      ],
      "metadata": {
          "total_results": 10,
          "failed_pages": [],
          "empty_pages": []
      }
  }
  ```
</CodeGroup>

**Extract:** The first result URL is typically the best match. For company
profile URLs, pass the result to
[Company Identify](/company-docs/identify/introduction) for a full profile.

***

## Search with date filtering

Use `start_date` and `end_date` (Unix timestamps in seconds) to limit
results to a specific time range.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "distributed systems",
      "location": "US",
      "sources": ["web", "news"],
      "site": "example.com",
      "start_date": 1728259200,
      "end_date": 1730937600
    }'
  ```
</CodeGroup>

<Tip>
  Convert dates to Unix timestamps: October 7, 2024 = `1728259200`. You can
  use any Unix timestamp converter tool.
</Tip>

***

## Multi-page search

Use `page` to request multiple result pages in a single response. The
`metadata` object tells you which pages succeeded.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "artificial intelligence startups",
      "sources": ["web"],
      "location": "US",
      "page": 3
    }'
  ```

  ```json Response (with page 2 failure) theme={"theme":"vitesse-black"}
  {
      "success": true,
      "query": "artificial intelligence startups",
      "timestamp": 1775195500000,
      "results": [
          {
              "source": "web",
              "title": "AI Startup Landscape 2026",
              "url": "https://example.com/ai-startups",
              "snippet": "...",
              "position": 1
          },
          {
              "source": "web",
              "title": "Top AI Companies to Watch",
              "url": "https://example.com/top-ai",
              "snippet": "...",
              "position": 2
          }
      ],
      "metadata": {
          "total_results": 25,
          "failed_pages": [2],
          "empty_pages": [3]
      }
  }
  ```
</CodeGroup>

<Note>
  Response trimmed for clarity. Pages 1 succeeded, page 2 failed, page 3 was
  empty.
</Note>

The response aggregates results across all successful pages into a single
`results[]` array. Check `metadata` to understand page-level outcomes:

* **`metadata.total_results`** — total results available across all sources and pages.
* **`metadata.failed_pages`** — page numbers that returned errors. Retry the request with a smaller `page` value if needed.
* **`metadata.empty_pages`** — page numbers that returned no results. You have reached the end of available results — **do not retry**.

```javascript theme={"theme":"vitesse-black"}
if (response.metadata.failed_pages.length > 0) {
    console.log("Failed pages:", response.metadata.failed_pages);
}

if (response.metadata.empty_pages.length > 0) {
    console.log(
        "Reached end of results at page",
        Math.min(...response.metadata.empty_pages),
    );
}
```

<Note>
  **Current platform behavior (not guaranteed by the OpenAPI contract):** Each
  page returns approximately 10 results. If `metadata.empty_pages` contains
  page numbers, you have reached the end of available results.
</Note>

***

## Company and profile discovery

### Find a company's website domain

Search for a company by name followed by "website". The first result URL is
typically the company's website.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "ADAMSBROWN, LLC website",
      "sources": ["web"]
    }'
  ```

  ```json Response theme={"theme":"vitesse-black"}
  {
      "success": true,
      "query": "ADAMSBROWN, LLC website",
      "timestamp": 1775195388180,
      "results": [
          {
              "source": "web",
              "title": "Adams Brown",
              "url": "https://www.adamsbrowncpa.com/",
              "snippet": "Adams Brown is a holistic professional services firm with a team of strategic allies.",
              "position": 1
          }
      ],
      "metadata": { "total_results": 10, "failed_pages": [], "empty_pages": [] }
  }
  ```
</CodeGroup>

**Extract:** `results[0].url` → `https://www.adamsbrowncpa.com/`

<Tip>
  Do **not** wrap the company name in quotes — this lets the search engine
  match partial name variations. If the company name is common, add city and
  state: `"ADAMSBROWN, LLC WICHITA KS website"`.
</Tip>

**Bridge to Company API:** Extract the domain from the URL, then pass it to
[Company Enrich](/company-docs/enrichment/introduction) for the full company profile:

```javascript theme={"theme":"vitesse-black"}
const url = new URL("https://www.adamsbrowncpa.com/");
const domain = url.hostname.replace("www.", ""); // "adamsbrowncpa.com"
```

```json Company Enrich request body theme={"theme":"vitesse-black"}
{ "domains": ["adamsbrowncpa.com"] }
```

***

### Find a person's profile URL

Use the `site` parameter with the profile host (e.g. `linkedin.com/in`) to
find a person's profile, then enrich via the Person API.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "Jeff Dean Google",
      "sources": ["web"],
      "site": "linkedin.com/in"
    }'
  ```

  ```json Response theme={"theme":"vitesse-black"}
  {
      "success": true,
      "query": "site:linkedin.com/in Jeff Dean Google",
      "timestamp": 1775195400000,
      "results": [
          {
              "source": "web",
              "title": "Jeff Dean - Google",
              "url": "https://www.linkedin.com/in/jeff-dean-8b212555",
              "snippet": "Chief Scientist at Google DeepMind.",
              "position": 1
          }
      ],
      "metadata": { "total_results": 5, "failed_pages": [], "empty_pages": [] }
  }
  ```
</CodeGroup>

**Bridge to Person API:** Pass the profile URL to
[Person Enrich](/person-docs/enrichment/introduction):

```json theme={"theme":"vitesse-black"}
{
    "professional_network_profile_urls": [
        "https://www.linkedin.com/in/jeff-dean-8b212555"
    ]
}
```

***

### Find a developer profile

Use `site: "github.com"` to search for developer profiles on GitHub.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "Tyler Lambe",
      "sources": ["web"],
      "site": "github.com",
      "location": "US"
    }'
  ```

  ```json Response theme={"theme":"vitesse-black"}
  {
      "success": true,
      "query": "site:github.com Tyler Lambe",
      "timestamp": 1775195572177,
      "results": [
          {
              "source": "web",
              "title": "Tyler Lambe tylambe",
              "url": "https://github.com/tylambe",
              "snippet": "Autodidactic entrepreneur, engineer, educator. tylambe has 9 repositories available.",
              "position": 1
          }
      ],
      "metadata": { "total_results": 10, "failed_pages": [], "empty_pages": [] }
  }
  ```
</CodeGroup>

***

## Research and analysis

### Search for academic research on a topic

Search for academic articles with date filtering to find papers with
citation data and PDF links.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "deep learning",
      "location": "US",
      "sources": ["scholar-articles"],
      "start_date": 1672531200,
      "end_date": 1704067200
    }'
  ```

  ```json Response theme={"theme":"vitesse-black"}
  {
      "success": true,
      "query": "deep learning",
      "timestamp": 1775195398144,
      "results": [
          {
              "source": "scholar-articles",
              "title": "Understanding deep learning",
              "url": "https://books.google.com/books?hl=en&lr=lang_en&id=rvyxEAAAQBAJ",
              "snippet": "...to this field understand the principles behind deep learning.",
              "metadata": "SJD Prince - 2023 - books.google.com",
              "pdf_url": null,
              "position": 1,
              "authors": [
                  {
                      "name": "SJD Prince",
                      "profile_url": "https://scholar.google.com/citations?user=fjm67xYAAAAJ",
                      "profile_id": "fjm67xYAAAAJ"
                  }
              ],
              "citations": 618
          }
      ],
      "metadata": { "total_results": 10, "failed_pages": [], "empty_pages": [] }
  }
  ```
</CodeGroup>

**Extract:**

* `citations` — citation count to gauge impact.
* `pdf_url` — direct PDF download link (when available).
* `authors[].profile_url` — Author profile link.
* `metadata` — citation string: `"Author - Year - Publisher"`.

***

### Look up an academic researcher's profile

Search for a researcher by name to get their full academic profile with
h-index, citation metrics, and top publications.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "jeff dean",
      "location": "US",
      "sources": ["scholar-author"]
    }'
  ```

  ```json Response theme={"theme":"vitesse-black"}
  {
      "success": true,
      "query": "jeff dean",
      "timestamp": 1775195567878,
      "results": [
          {
              "source": "scholar-author",
              "url": "https://scholar.google.com/citations?user=NMS69lQAAAAJ",
              "name": "Jeff Dean",
              "affiliation": "Google Chief Scientist, Google Research and Google DeepMind",
              "citations": { "all": 401624, "since_2020": 231008 },
              "h_index": { "all": 114, "since_2020": 78 },
              "i10_index": { "all": 319, "since_2020": 203 },
              "articles": [
                  {
                      "title": "MapReduce: simplified data processing on large clusters",
                      "year": "2008",
                      "citations": "37255",
                      "authors": "J Dean, S Ghemawat"
                  }
              ]
          }
      ],
      "metadata": { "total_results": 1, "failed_pages": [], "empty_pages": [] }
  }
  ```
</CodeGroup>

**Extract:** `citations.all` for total impact, `h_index.all` for research
quality, `articles[]` for top publications.

***

### Get an AI-generated overview of a topic

Use deep research mode for a synthesized answer with source references.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "uv vs pip",
      "location": "US",
      "sources": ["ai"]
    }'
  ```

  ```json Response theme={"theme":"vitesse-black"}
  {
      "success": true,
      "query": "uv vs pip",
      "timestamp": 1775195563283,
      "results": [
          {
              "source": "ai",
              "title": "AI Overview",
              "content": "The primary difference between uv and pip is speed and scope: uv is a modern, high-performance replacement for pip that prioritizes speed and a unified workflow.",
              "references": [
                  {
                      "title": "uv vs pip: Managing Python Packages and Dependencies",
                      "url": "https://realpython.com/uv-vs-pip/",
                      "snippet": "When it comes to Python package managers..."
                  }
              ],
              "images": []
          }
      ],
      "metadata": { "total_results": 1, "failed_pages": [], "empty_pages": [] }
  }
  ```
</CodeGroup>

**Extract:** `content` for the overview text, `references[].url` for source
verification.

***

### Search news with date filtering

Filter news results to a specific date range by providing `start_date` and
`end_date` as Unix timestamps in seconds.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "artificial intelligence developments",
      "location": "US",
      "sources": ["news"],
      "start_date": 1728259200,
      "end_date": 1730937600
    }'
  ```
</CodeGroup>

<Tip>
  `start_date` and `end_date` are Unix timestamps in **seconds**. October 7,
  2024 = `1728259200`. November 7, 2024 = `1730937600`.
</Tip>

***

### Search social media posts

Search for recent social media mentions of a topic or person.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "crustdata AI agents",
      "location": "US",
      "sources": ["social"]
    }'
  ```
</CodeGroup>

<Note>
  **Current platform behavior:** Social media results may return an empty
  `results` array for some queries depending on availability. Always check
  `results.length` before processing.
</Note>

***

### Mixed-source search with safe parsing

When searching multiple sources, the `results[]` array contains items with
different shapes. Always branch on `result.source`.

<CodeGroup>
  ```bash Request theme={"theme":"vitesse-black"}
  curl --request POST \
    --url https://api.crustdata.com/web/search/live \
    --header 'authorization: Bearer YOUR_API_KEY' \
    --header 'content-type: application/json' \
    --header 'x-api-version: 2025-11-01' \
    --data '{
      "query": "machine learning infrastructure",
      "location": "US",
      "sources": ["web", "news", "scholar-articles"]
    }'
  ```
</CodeGroup>

**Safe parsing logic:**

```javascript theme={"theme":"vitesse-black"}
const fetchableUrls = [];

for (const result of response.results) {
    switch (result.source) {
        case "web":
        case "news":
        case "social":
            fetchableUrls.push(result.url);
            break;
        case "scholar-articles":
        case "scholar-articles-enriched":
            fetchableUrls.push(result.url);
            break;
        case "scholar-author":
            console.log(`Author: ${result.name} (${result.affiliation})`);
            break;
        case "ai":
            console.log(`AI Overview: ${result.content}`);
            result.references?.forEach((ref) => fetchableUrls.push(ref.url));
            break;
    }
}
```

<Note>
  Not every search result should go to Fetch. Academic author results are
  profiles, not content pages. Deep research results provide content inline
  and use `references[].url` for source URLs instead of a top-level `url`.
</Note>

***

## Search-then-fetch workflows

### End-to-end competitive intelligence

Search for competitor news, then fetch the full article content for
analysis.

<Steps>
  <Step title="Search for competitor news">
    ```bash theme={"theme":"vitesse-black"}
    curl --request POST \
      --url https://api.crustdata.com/web/search/live \
      --header 'authorization: Bearer YOUR_API_KEY' \
      --header 'content-type: application/json' \
      --header 'x-api-version: 2025-11-01' \
      --data '{
        "query": "OpenAI funding 2026",
        "location": "US",
        "sources": ["news", "web"]
      }'
    ```

    Extract URLs from `results[].url`.
  </Step>

  <Step title="Select URLs and fetch content">
    ```bash theme={"theme":"vitesse-black"}
    curl --request POST \
      --url https://api.crustdata.com/web/enrich/live \
      --header 'authorization: Bearer YOUR_API_KEY' \
      --header 'content-type: application/json' \
      --header 'x-api-version: 2025-11-01' \
      --data '{
        "urls": [
          "https://www.reuters.com/technology/openai-funding-2026/",
          "https://techcrunch.com/2026/01/15/openai-funding/"
        ]
      }'
    ```
  </Step>

  <Step title="Parse results and handle failures">
    Check `success` for each entry. Parse HTML from successful fetches. To
    identify which URLs failed, compare requested URLs against successful
    `url` values.

    ```javascript theme={"theme":"vitesse-black"}
    const successfulUrls = new Set(
      fetchResponse.filter(r => r.success).map(r => r.url)
    );
    const failedUrls = requestedUrls.filter(u => !successfulUrls.has(u));
    ```

    <Note>
      Failed entries have `url: null`, so correlate failures by comparing
      successful URLs to your input list. See
      [Fetch: correlating failures](/web-docs/fetch/introduction#correlating-failures-to-input-urls).
    </Note>
  </Step>
</Steps>

***

### Full pipeline in Python: search → fetch → parse

A complete Python example that searches, filters fetchable URLs by source,
fetches content, and handles failures.

```python theme={"theme":"vitesse-black"}
import requests

API_KEY = "YOUR_API_KEY"
HEADERS = {
    "authorization": f"Bearer {API_KEY}",
    "content-type": "application/json",
    "x-api-version": "2025-11-01",
}

# Step 1: Search
search_resp = requests.post(
    "https://api.crustdata.com/web/search/live",
    headers=HEADERS,
    json={"query": "OpenAI funding 2026", "sources": ["web", "news"]},
).json()

# Step 2: Extract fetchable URLs
fetchable_urls = []
for result in search_resp["results"]:
    if result["source"] in ("web", "news", "social", "scholar-articles", "scholar-articles-enriched"):
        fetchable_urls.append(result["url"])
    elif result["source"] == "ai":
        for ref in result.get("references", []):
            fetchable_urls.append(ref["url"])

# Step 3: Fetch (max 10 URLs per request)
fetch_resp = requests.post(
    "https://api.crustdata.com/web/enrich/live",
    headers=HEADERS,
    json={"urls": fetchable_urls[:10]},
).json()

# Step 4: Process results and correlate failures
successful_urls = set()
for item in fetch_resp:
    if item["success"]:
        successful_urls.add(item["url"])
        print(f"Fetched: {item['title']} ({len(item['content'])} chars)")

failed_urls = [u for u in fetchable_urls[:10] if u not in successful_urls]
if failed_urls:
    print(f"Failed URLs: {failed_urls}")
```

***

### AI overview → fetch source references

When using deep research mode, the overview `content` is inline. To get the
full source articles, fetch the URLs from `references[]`.

<Steps>
  <Step title="Search with deep research mode">
    ```bash theme={"theme":"vitesse-black"}
    curl --request POST \
      --url https://api.crustdata.com/web/search/live \
      --header 'authorization: Bearer YOUR_API_KEY' \
      --header 'content-type: application/json' \
      --header 'x-api-version: 2025-11-01' \
      --data '{"query": "uv vs pip", "sources": ["ai"], "location": "US"}'
    ```

    **Extract:** `results[0].references[].url` — the source article URLs.
  </Step>

  <Step title="Fetch the reference URLs">
    ```bash theme={"theme":"vitesse-black"}
    curl --request POST \
      --url https://api.crustdata.com/web/enrich/live \
      --header 'authorization: Bearer YOUR_API_KEY' \
      --header 'content-type: application/json' \
      --header 'x-api-version: 2025-11-01' \
      --data '{"urls": ["https://realpython.com/uv-vs-pip/"]}'
    ```

    Parse the `content` from successful entries to read the full source
    articles.
  </Step>
</Steps>

<Note>
  AI results do not have a top-level `url` field. Always use
  `references[].url` for fetch targets.
</Note>

***

## Next steps

* [Sources](/web-docs/search/reference#sources) — full result shapes and field presence for each source.
* [Web Search](/web-docs/search/introduction) — main Search page with request/response and errors.
* [Web Fetch](/web-docs/fetch/introduction) — fetch the HTML content of URLs returned by search results.
