Reference material for Search Jobs: filter grammar and operators, common indexed fields, the full Job catalog, id semantics, aggregation bucket metadata, null behavior, and errors. For worked examples, see Examples. For sorting, pagination, field selection, and aggregations, see Pagination & sorting.
Jobs ID cheat sheet. The Jobs APIs use three id concepts — keep them straight:
  • crustdata_job_id — the Crustdata job identifier. Returned on every Job. Use it as your dedupe key.
  • company.basic_info.crustdata_company_id — the Crustdata company identifier returned on every Job.
  • company.basic_info.company_id (filter alias) — the dot-path used in filters and aggregations.column for indexed Search Jobs. It points to the same integer as company.basic_info.crustdata_company_id. This alias is not sortable; for deterministic pagination, sort on metadata.date_added instead.
When you group_by on company.basic_info.company_id, each bucket also returns metadata.company_name, metadata.company_website_domain, and metadata.linkedin_id for labeling.

Filter grammar

Every filter describes which individual job rows to keep. The API checks each job listing against your filter independently — it never groups or combines rows before filtering. There are two building blocks:
| Building block | What it does |
| --- | --- |
| SearchCondition (leaf) | Tests one field on one job row, e.g. title = "Software Engineer". |
| SearchConditionGroup (node) | Combines conditions with and or or. Groups can nest inside other groups. |
Exact-match AND on the same field always returns zero results. One listing has one title, so (title = "Software Engineer") AND (title = "Account Executive") can never match. This applies to = and in.
All-words operators ((.)) work fine in AND. Because (.) checks for individual words — not a contiguous substring — a query like (title (.) "Software Development") AND (title (.) "Software Engineer") matches any title containing all three words “Software”, “Development”, and “Engineer” (e.g. “Software Development Engineer”).
Need companies hiring for both role X and role Y (two different listings)? Run two separate queries and intersect company ids client-side. See Companies indexing both Software Engineers and Account Executives.
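The two-query approach can be sketched in Python as follows. This is a minimal illustration, assuming the rows from each query have already been fetched into lists of Job dicts shaped like the Search Jobs response; the helper names `company_ids` and `companies_hiring_both` are hypothetical and not part of the API.

```python
def company_ids(jobs):
    """Collect the company ids present in a list of Job dicts."""
    ids = set()
    for job in jobs:
        # company and basic_info can be null or missing, so guard each level.
        basic_info = (job.get("company") or {}).get("basic_info") or {}
        company_id = basic_info.get("crustdata_company_id")
        if company_id is not None:
            ids.add(company_id)
    return ids


def companies_hiring_both(jobs_role_x, jobs_role_y):
    """Return company ids that appear in both result sets."""
    return company_ids(jobs_role_x) & company_ids(jobs_role_y)


# Hand-written sample rows (not real data).
engineers = [
    {"company": {"basic_info": {"crustdata_company_id": 1}}},
    {"company": {"basic_info": {"crustdata_company_id": 2}}},
]
executives = [
    {"company": {"basic_info": {"crustdata_company_id": 2}}},
    {"company": None},
]
both = companies_hiring_both(engineers, executives)
print(sorted(both))
```

The intersection happens entirely client-side, so each query can use its own title filter without running into the exact-match AND limitation.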

Single condition

{
    "filters": {
        "field": "job_details.category",
        "type": "=",
        "value": "Engineering"
    }
}

AND / OR group

{
    "filters": {
        "op": "and",
        "conditions": [
            {
                "field": "company.basic_info.company_id",
                "type": "=",
                "value": 631394
            },
            {
                "field": "job_details.category",
                "type": "=",
                "value": "Engineering"
            },
            {
                "field": "metadata.date_added",
                "type": "=>",
                "value": "2025-01-01"
            }
        ]
    }
}
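If you assemble filters programmatically, two small helpers keep the leaf and node shapes consistent. This is a sketch only: `condition` and `group` are hypothetical helper names, and the nested group assumes that a SearchConditionGroup can sit directly inside another group's conditions array, as the grammar above describes.

```python
import json


def condition(field, op, value):
    """Build a SearchCondition leaf."""
    return {"field": field, "type": op, "value": value}


def group(op, *conditions):
    """Build a SearchConditionGroup node; op must be "and" or "or"."""
    if op not in ("and", "or"):
        raise ValueError(f"op must be 'and' or 'or', got {op!r}")
    return {"op": op, "conditions": list(conditions)}


# Company 631394, category Engineering OR Sales, added on or after 2025-01-01.
body = {
    "filters": group(
        "and",
        condition("company.basic_info.company_id", "=", 631394),
        group(
            "or",
            condition("job_details.category", "=", "Engineering"),
            condition("job_details.category", "=", "Sales"),
        ),
        condition("metadata.date_added", "=>", "2025-01-01"),
    )
}
print(json.dumps(body, indent=2))
```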

Array-field filters and grouping

When you filter on a string-array field like company.basic_info.industries, the condition is satisfied if any element of the array matches. For example:
{ "field": "company.basic_info.industries", "type": "=", "value": "Technology, Information and Internet" }
This matches any company whose industries array contains that exact string. Use (.) to match words within any element.
When you group_by on an array field, each array element becomes its own bucket key. A company in two industries contributes one count to each of the two industry buckets — so the sum of bucket counts can exceed total_count for array columns.
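To see why bucket counts can sum past total_count, the following sketch reproduces the bucketing rule locally. The rows and industry names are made up for illustration, and the counting is a local approximation of the behavior described above, not the server's implementation.

```python
from collections import Counter

# Two hand-written job rows; the first company lists two industries.
jobs = [
    {"industries": ["Software Development", "Financial Services"]},
    {"industries": ["Financial Services"]},
]

# Each array element becomes its own bucket key.
buckets = Counter(industry for job in jobs for industry in job["industries"])
total_count = len(jobs)

print(dict(buckets))
print(sum(buckets.values()), "bucket hits for", total_count, "rows")
```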

Filter operators

Use the table below to pick the right type for each condition. Every operator works on indexed fields only.
| Operator | value shape | Meaning |
| --- | --- | --- |
| = | scalar (string/number/boolean) | Exact match. |
| != | scalar | Not equal. |
| < | scalar (numeric or ISO date) | Less than. |
| =< | scalar (numeric or ISO date) | Less than or equal. Not <=. |
| > | scalar (numeric or ISO date) | Greater than. |
| => | scalar (numeric or ISO date) | Greater than or equal. Not >=. |
| in | array of scalars | Field value is any entry in the array. |
| not_in | array of scalars | Field value is none of the entries in the array. |
| (.) | string | Case-insensitive all-words match. Every word in the query must appear somewhere in the field, but not necessarily next to each other or in the same order. "Software Engineer" matches "Software Engineer", "Software Development Engineer", and "Engineer, Software Systems". A single word like "engineer" also matches "Engineering Manager". Great for broad keyword hunting in job_details.title or content.description. |
| [.] | string | Case-insensitive exact-phrase match. The words must appear contiguously and in order. "Software Engineer" matches "Senior Software Engineer" but not "Software Development Engineer" (extra word in between) and not "Engineer Software" (wrong order). Use [.] when you need precision over recall. |
Operator footguns.
  • Use => for greater-than-or-equal and =< for less-than-or-equal — they are not >= and <=.
  • in and not_in require JSON arrays, not comma-separated strings.
  • is_null / is_not_null are currently not implemented — request the field via fields and filter for null presence client-side.
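A client-side lint can catch these footguns before a request is sent. The sketch below is an assumption-laden convenience, not part of the API: `check_condition` is a hypothetical helper that only knows about the three pitfalls listed above.

```python
COMPARISON_TYPOS = {">=": "=>", "<=": "=<"}
ARRAY_OPERATORS = {"in", "not_in"}
UNIMPLEMENTED = {"is_null", "is_not_null"}


def check_condition(cond):
    """Return a list of problems found in one SearchCondition dict."""
    problems = []
    op = cond.get("type")
    value = cond.get("value")
    if op in COMPARISON_TYPOS:
        problems.append(f"use {COMPARISON_TYPOS[op]!r} instead of {op!r}")
    if op in UNIMPLEMENTED:
        problems.append(f"{op!r} is not implemented; filter nulls client-side")
    if op in ARRAY_OPERATORS and not isinstance(value, list):
        problems.append(f"{op!r} needs a JSON array, got {type(value).__name__}")
    return problems


bad_range = {"field": "metadata.date_added", "type": ">=", "value": "2025-01-01"}
bad_list = {"field": "job_details.category", "type": "in", "value": "Engineering,Sales"}
good = {"field": "job_details.category", "type": "in", "value": ["Engineering", "Sales"]}

for cond in (bad_range, bad_list, good):
    print(cond["type"], check_condition(cond))
```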

Common indexed fields

These are the indexed fields most often used in filters, sorts, and aggregations.column. This table is a summary of the most common paths, not an authoritative catalog. For the deeper field catalog — including id semantics, null handling, and bucket metadata — see the full Field reference below.
Company id filter alias. The filterable field path uses the short alias company.basic_info.company_id, but the response shape returns the same integer at company.basic_info.crustdata_company_id. They point to the same value. See Jobs IDs: a quick map.
| Field | Example |
| --- | --- |
| job_details.title | "Software Engineer" |
| job_details.category | "Engineering", "Sales", "Operations", "Others" |
| job_details.workplace_type | "Remote", "Hybrid", "On-site", "" |
| job_details.reposted_job | true / false |
| job_details.url | "https://www.linkedin.com/jobs/view/4398377738" |
Sending a filter on a non-indexed field returns 400 with Unsupported columns in conditions: ['...']. Sending an unsupported group_by column returns a similar error listing every supported aggregation column.

Field reference

This section covers the return shape, id semantics, aggregation bucket metadata, and the most important indexed field catalogs in one place.

Annotated full Job example

The code fence below uses jsonc because it includes inline // comments for annotation and trailing commas. Strip both the comments and the trailing commas before sending it to a strict JSON parser.
{
    "crustdata_job_id": 41053563, // stable job id (use as dedupe key)
    "job_details": {
        "job_id": 41053563, // mirrors crustdata_job_id
        "title": "Integration Engineer (AUNZ)",
        "category": "Engineering",
        "workplace_type": "",
        "url": "https://www.linkedin.com/jobs/view/4398377738",
        "reposted_job": false,
        "number_of_openings": 1,
    },
    "company": {
        "basic_info": {
            "crustdata_company_id": 631394, // filter with company.basic_info.company_id
            "name": "Stripe",
            "primary_domain": "stripe.com",
            "website": "https://stripe.com",
            "professional_network_id": "2135371",
            "industries": ["Technology, Information and Internet"],
        },
        "locations": {
            "country": "USA",
            "state": "California",
            "city": "South San Francisco",
            "street_address": "354 Oyster Point Blvd, South San Francisco, California, United States",
        },
        "headcount": {
            "total": 7234,
            "range": "5001-10000",
            "largest_headcount_country": "USA",
        },
        "followers": { "count": 1335688 },
        "revenue": {
            "estimated": {
                "lower_bound_usd": 500000000,
                "upper_bound_usd": 1000000000,
            },
            "public_markets": null,
            "acquisition_status": "",
        },
        "funding": {
            "total_investment_usd": 9440247725.0,
            "valuation_usd": 50000000000.0,
            "last_fundraise_date": "2026-03-09T00:00:00",
            "last_round_type": "secondary_market",
            "num_funding_rounds": 23,
            "investors": [
                "Sequoia Capital",
                "Andreessen Horowitz",
                "Founders Fund",
            ],
        },
        "competitors": {
            "websites": ["https://plaid.com", "https://paystack.com"],
        },
    },
    "location": {
        "raw": "Melbourne, Victoria, Australia",
        "city": "Melbourne",
        "district": null,
        "state": "Victoria",
        "country": "Australia",
        "pincode": null,
    },
    "content": {
        "description": "Stripe is a financial infrastructure platform for businesses...",
    },
    "metadata": {
        "date_added": "2026-04-07T11:37:29",
        "date_updated": "2026-04-08T00:00:00",
    },
}
Nulls are normal. Fields and nested objects such as company.revenue.public_markets, location.district, location.pincode, and parts of company.funding can legitimately be null or missing.
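Because any level of a Job can be null or missing, chained lookups such as job["company"]["funding"]["investors"] can raise at runtime. A null-tolerant dot-path reader avoids that. This is a minimal sketch; `get_path` is a hypothetical helper, not something the API or an SDK provides.

```python
def get_path(obj, path, default=None):
    """Walk a dot-path through nested dicts, returning default on null or missing."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if current is None:
            return default
    return current


# Hand-written row with the same null pattern as the annotated example.
job = {
    "company": {"revenue": {"public_markets": None, "acquisition_status": ""}},
    "location": {"city": "Melbourne", "district": None},
}

print(get_path(job, "location.city"))
print(get_path(job, "location.district", "unknown"))
print(get_path(job, "company.revenue.public_markets.stock_symbols", []))
print(get_path(job, "company.funding.investors", []))
```

Blank strings are returned unchanged, so a caller can still tell "" apart from null.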

Jobs IDs: a quick map

| ID | Lives on | Purpose |
| --- | --- | --- |
| crustdata_job_id | Top-level on each Job | Crustdata job identifier. Use it as your dedupe key in your own store. |
| job_details.job_id | Inside Job.job_details | Secondary job identifier. It currently mirrors crustdata_job_id and is kept for backwards compatibility. |
| company.basic_info.crustdata_company_id | Inside Job.company.basic_info | Crustdata company identifier returned on each row. |
| company.basic_info.company_id | Search filter / aggregation path | Indexed alias for the same company identifier. Use this in filters.field and aggregations.column. |

Aggregation bucket metadata

When you group_by on company.basic_info.company_id, each bucket carries a metadata object whose keys use bucket-specific names rather than the Job response dot-paths:
| Bucket metadata key | Equivalent Job value | Notes |
| --- | --- | --- |
| company_name | company.basic_info.name | Plain company name. |
| company_website_domain | company.basic_info.primary_domain | Primary website domain. |
| linkedin_id | company.basic_info.professional_network_id | Public-profile identifier returned only inside aggregation buckets. |
| crustdata_company_id | company.basic_info.crustdata_company_id | Crustdata company id. Defined in the spec as nullable; the bucket key already carries this value. |
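If downstream code expects the Job response naming, the bucket-specific keys can be translated in one place. The sketch below assumes a bucket with key, count, and metadata, as described for group_by on company.basic_info.company_id; `label_bucket` is a hypothetical helper and the sample count is invented.

```python
def label_bucket(bucket):
    """Translate bucket metadata keys into Job-style field names."""
    meta = bucket.get("metadata") or {}
    return {
        "crustdata_company_id": bucket.get("key"),  # the bucket key is the company id
        "name": meta.get("company_name"),
        "primary_domain": meta.get("company_website_domain"),
        "professional_network_id": meta.get("linkedin_id"),
        "job_count": bucket.get("count"),
    }


# Illustrative bucket; the count is made up.
bucket = {
    "key": 631394,
    "count": 12,
    "metadata": {
        "company_name": "Stripe",
        "company_website_domain": "stripe.com",
        "linkedin_id": "2135371",
    },
}
print(label_bucket(bucket))
```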

Job identifiers

| Path | Type | Example |
| --- | --- | --- |
| crustdata_job_id | integer | 41053563 |
| job_details.job_id | integer | 41053563 |

Job details (job_details.*)

| Path | Type | Example |
| --- | --- | --- |
| job_details.title | string | "Software Engineer" |
| job_details.category | string | "Engineering" |
| job_details.workplace_type | string | "Remote", "Hybrid", "On-site", "" |
| job_details.reposted_job | boolean | false |
| job_details.url | string | "https://www.linkedin.com/jobs/view/4398377738" |
| job_details.number_of_openings | integer | 1 |

Company basic info (company.basic_info.*)

| Path | Type | Example |
| --- | --- | --- |
| company.basic_info.company_id | integer | 631394 |
| company.basic_info.crustdata_company_id | integer | 631394 |
| company.basic_info.name | string | "Stripe" |
| company.basic_info.primary_domain | string | "stripe.com" |
| company.basic_info.website | string | "https://stripe.com" |
| company.basic_info.professional_network_id | string | "2135371" |
| company.basic_info.industries | string[] | ["Technology, Information and Internet"] |
company.basic_info.company_id and company.basic_info.crustdata_company_id refer to the same integer. Use the short alias in filters and aggregations.column. The response shape writes the value under crustdata_company_id.

Company firmographics

Headcount (company.headcount.*)

| Path | Type | Example |
| --- | --- | --- |
| company.headcount.total | integer | 14522 |
| company.headcount.range | string | "5001-10000" |
| company.headcount.largest_headcount_country | string | "USA" |

Followers (company.followers.*)

| Path | Type | Example |
| --- | --- | --- |
| company.followers.count | integer | 1335688 |

Revenue (company.revenue.*)

| Path | Type | Example |
| --- | --- | --- |
| company.revenue.estimated.lower_bound_usd | integer | 500000000 |
| company.revenue.estimated.upper_bound_usd | integer | 1000000000 |
| company.revenue.acquisition_status | string | "" |
| company.revenue.public_markets.stock_symbols | string[] | ["STRIPE"] |
| company.revenue.public_markets.fiscal_year_end | string | "" |

Funding (company.funding.*)

| Path | Type | Example |
| --- | --- | --- |
| company.funding.total_investment_usd | number | 9440247725.0 |
| company.funding.valuation_usd | number | 50000000000.0 |
| company.funding.last_fundraise_date | string (ISO 8601) | "2026-03-09T00:00:00" |
| company.funding.last_round_type | string | "secondary_market" |
| company.funding.num_funding_rounds | integer | 23 |
| company.funding.investors | string[] | ["Sequoia Capital"] |

Competitors and company locations

| Path | Type | Example |
| --- | --- | --- |
| company.competitors.websites | string[] | ["https://plaid.com"] |
| company.locations.country | string | "USA" |
| company.locations.state | string | "California" |
| company.locations.city | string | "South San Francisco" |
| company.locations.street_address | string | "354 Oyster Point Blvd, ..." |

Job location (location.*)

| Path | Type | Example |
| --- | --- | --- |
| location.raw | string | "Melbourne, Victoria, Australia" |
| location.city | string | "Melbourne" |
| location.district | string | "Southbank" |
| location.state | string | "Victoria" |
| location.country | string | "Australia" |
| location.pincode | string | "3006" |
Country value normalization. location.country can appear as full names ("United States"), ISO-style short forms ("USA"), and occasional variants ("United States of America"). When filtering by country, either match multiple variants with in or pre-discover the exact indexed values by running a group_by on location.country.
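Both strategies can be sketched as request fragments. The variant list below contains only the spellings mentioned above and is not exhaustive; the "US" lookup key and the `country_filter` helper are hypothetical. The discovery body assumes the group_by request shape used for aggregations in Search Jobs.

```python
# Known spellings taken from the note above; extend after running discovery.
COUNTRY_VARIANTS = {
    "US": ["United States", "USA", "United States of America"],
}


def country_filter(code, field="location.country"):
    """Build an `in` condition that matches every known spelling of a country."""
    return {"field": field, "type": "in", "value": list(COUNTRY_VARIANTS[code])}


# Discovery request: list the indexed country values without returning rows.
discover_countries = {
    "limit": 0,
    "aggregations": [
        {"type": "group_by", "column": "location.country", "agg": "count", "size": 100}
    ],
}

print(country_filter("US"))
```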

Content (content.*)

| Path | Type | Example |
| --- | --- | --- |
| content.description | string | "Stripe is a financial infrastructure..." |
Use the (.) operator on content.description to find listings by technology, skill, or keyword.

Metadata (metadata.*)

| Path | Type | Example |
| --- | --- | --- |
| metadata.date_added | string (ISO 8601) | "2026-04-07T11:37:29" |
| metadata.date_updated | string (ISO 8601) | "2026-04-08T00:00:00" |

Null, blank, and sparse field behavior

Most Job fields are nullable in the spec and can legitimately be absent or empty.
  • Null or missing — the field is not present on a given Job.
  • Blank string "" — the field was present but had no indexable value (common for job_details.workplace_type). Treat blank as “unspecified”, not as the same thing as null.
  • Sparse nested objects: company.funding, company.revenue, and company.competitors are often missing for smaller or private companies.
  • is_null / is_not_null operators are currently not implemented — request the field via fields and filter for null presence client-side.

Errors

| Status | Envelope shape | Meaning |
| --- | --- | --- |
| 400 | { "error": { "type", "message", "metadata" } } | Invalid request: unsupported filter column, unsupported aggregation column, limit out of range, or malformed body. error.type is invalid_request for validation failures and internal_error for unsupported-column checks. |
| 401 | { "message": "..." } (flat, not the error envelope) | Unauthorized: the Authorization header is missing, malformed, or contains an invalid API key. |
| 500 | { "error": { "type", "message", "metadata" } } | Internal server error: retry after a short delay. |
401 uses a different response shape than 400/500. Parse the response based on HTTP status: 401 is a flat { "message": ... }, every other 4xx/5xx is the nested { "error": { "type", "message", "metadata" } } envelope.
{
    "error": {
        "type": "internal_error",
        "message": "Unsupported columns in conditions: ['invalid_field']",
        "metadata": []
    }
}
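A client can normalize the two shapes by branching on the HTTP status. The sketch below assumes the response body has already been decoded into a dict; `parse_error` is a hypothetical helper, and the "unauthorized" and "unknown" labels are local conventions rather than values returned by the API.

```python
def parse_error(status, body):
    """Normalize an error response into a (type, message) tuple."""
    if status == 401:
        # 401 is flat: {"message": "..."}
        return ("unauthorized", body.get("message", ""))
    # Other errors use the nested {"error": {...}} envelope.
    error = body.get("error") or {}
    return (error.get("type", "unknown"), error.get("message", ""))


unsupported = {
    "error": {
        "type": "internal_error",
        "message": "Unsupported columns in conditions: ['invalid_field']",
        "metadata": [],
    }
}
print(parse_error(400, unsupported))
print(parse_error(401, {"message": "..."}))
```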

Pagination & sorting

How to paginate, sort, select fields, and aggregate results in Search Jobs. For worked examples, see Examples. For filter grammar, operators, and the full field catalog, see Reference.
Replace YOUR_API_KEY in each example with your actual API key. All requests require the x-api-version: 2025-11-01 header.

Sorting

sorts is an ordered array. Each item has a field and order ("asc" or "desc"). Sorts apply in array order — the first sort is the primary key, the second breaks ties, and so on.
{
    "sorts": [
        { "field": "metadata.date_added", "order": "desc" },
        { "field": "company.headcount.total", "order": "desc" }
    ]
}
Sort allowlist is narrower than filter allowlist. Sort only works on numeric, date, and a small set of scalar fields. Sorting on text fields like job_details.title, job_details.category, or company.basic_info.name returns Unsupported columns in conditions.

Sortable fields

The following indexed fields are verified sortable:
  • metadata.date_added
  • metadata.date_updated
  • company.headcount.total
  • company.followers.count
  • company.revenue.estimated.lower_bound_usd
  • company.revenue.estimated.upper_bound_usd
  • company.funding.total_investment_usd
  • company.funding.valuation_usd
  • company.funding.last_fundraise_date
  • company.funding.num_funding_rounds
Common sort choices:
  • Newest postings first: { "field": "metadata.date_added", "order": "desc" }
  • Biggest companies first: { "field": "company.headcount.total", "order": "desc" }
  • Most followed companies first: { "field": "company.followers.count", "order": "desc" }
  • Highest-funded companies first: { "field": "company.funding.total_investment_usd", "order": "desc" }
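Since sorting on an unsupported field is rejected by the API, it can help to validate sorts locally first. The sketch below hard-codes the sortable list from this page, so it will go stale if that list changes; `build_sorts` is a hypothetical helper.

```python
SORTABLE_FIELDS = frozenset({
    "metadata.date_added",
    "metadata.date_updated",
    "company.headcount.total",
    "company.followers.count",
    "company.revenue.estimated.lower_bound_usd",
    "company.revenue.estimated.upper_bound_usd",
    "company.funding.total_investment_usd",
    "company.funding.valuation_usd",
    "company.funding.last_fundraise_date",
    "company.funding.num_funding_rounds",
})


def build_sorts(*pairs):
    """Turn (field, order) pairs into a sorts array, rejecting unsupported input."""
    sorts = []
    for field, order in pairs:
        if field not in SORTABLE_FIELDS:
            raise ValueError(f"{field} is not in the sortable list")
        if order not in ("asc", "desc"):
            raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
        sorts.append({"field": field, "order": order})
    return sorts


# Primary key first, tie-breaker second.
sorts = build_sorts(
    ("metadata.date_added", "desc"),
    ("company.headcount.total", "desc"),
)
print(sorts)
```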

Pagination

Pagination is cursor-based. Each response returns a next_cursor (or null when you reach the end). To fetch the next page, resend the original request body with cursor set to the previous next_cursor.
1. Fetch the first page. Omit cursor and set limit to your page size (max 1000).
2. Walk forward. Take next_cursor from the response and pass it back as cursor in the next request. Keep filters, sorts, and fields identical; if you change them, the cursor becomes meaningless.
3. Stop when next_cursor is null. A null cursor means you’ve reached the end of the result set.
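The three steps can be sketched as a generator. Two things here are assumptions, not documented facts: the transport is injected as a `fetch_page` callable so the loop stays independent of any HTTP library, and the rows are read from a response key named "results", which is a placeholder to check against the response schema. The fake pages exist only to make the example runnable.

```python
def iter_jobs(fetch_page, body, limit=100, rows_key="results"):
    """Yield every job row by following next_cursor until it is null.

    rows_key is an assumed name for the array of rows in the response.
    """
    request = dict(body, limit=limit)
    request.pop("cursor", None)  # step 1: the first page has no cursor
    while True:
        response = fetch_page(request)
        yield from response.get(rows_key, [])
        cursor = response.get("next_cursor")
        if cursor is None:  # step 3: null cursor ends the walk
            return
        # step 2: same filters, sorts, and fields; only the cursor changes
        request = dict(request, cursor=cursor)


# Fake two-page backend standing in for the real endpoint.
PAGES = {
    None: {"results": [{"crustdata_job_id": 1}], "next_cursor": "abc"},
    "abc": {"results": [{"crustdata_job_id": 2}], "next_cursor": None},
}


def fake_fetch(request):
    return PAGES[request.get("cursor")]


query = {"filters": {"field": "job_details.category", "type": "=", "value": "Engineering"}}
rows = list(iter_jobs(fake_fetch, query, limit=1))
print([row["crustdata_job_id"] for row in rows])
```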

Consistency between pages

Current platform behavior: best-effort, not strict snapshot. A cursor is consistent with respect to the filter, sort, and field selection you sent on the first page, so the same query will keep paging forward over a coherent result stream. However, because the underlying indexed dataset is continuously updated, new jobs indexed between page requests can cause minor drift in total_count and in the exact position of individual rows. Treat pagination as best-effort, not a strict snapshot.
For bulk exports where every row matters:
  • Constrain your filter to a bounded date window (for example metadata.date_added => 2025-01-01 AND metadata.date_added < 2025-07-01) so newly indexed jobs outside the window do not affect the walk, and
  • Re-run the full walk periodically and diff against the prior snapshot using crustdata_job_id as the dedupe key.
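Both recommendations can be sketched with two small helpers: one builds the bounded window as an AND group, the other diffs two completed walks by crustdata_job_id. The helper names are hypothetical, and the sample snapshots are hand-written.

```python
def window_filter(start, end, field="metadata.date_added"):
    """Build a half-open [start, end) date window as an AND group."""
    return {
        "op": "and",
        "conditions": [
            {"field": field, "type": "=>", "value": start},
            {"field": field, "type": "<", "value": end},
        ],
    }


def diff_snapshots(previous, current):
    """Return (added_ids, removed_ids) between two walks, keyed by crustdata_job_id."""
    prev_ids = {job["crustdata_job_id"] for job in previous}
    curr_ids = {job["crustdata_job_id"] for job in current}
    return sorted(curr_ids - prev_ids), sorted(prev_ids - curr_ids)


first_walk = [{"crustdata_job_id": 1}, {"crustdata_job_id": 2}]
# The second walk saw job 2 twice (drift between pages) and a new job 3.
second_walk = [{"crustdata_job_id": 2}, {"crustdata_job_id": 2}, {"crustdata_job_id": 3}]

added, removed = diff_snapshots(first_walk, second_walk)
print(window_filter("2025-01-01", "2025-07-01"))
print("added:", added, "removed:", removed)
```

Using sets keyed by crustdata_job_id means a row that appears twice because of page drift is counted once.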

Dataset freshness and lifecycle

What the indexed Jobs dataset represents. The Search Jobs dataset is a rolling index of job listings discovered from the web, refreshed on an ongoing basis. Each row has:
  • metadata.date_added — when Crustdata first saw the listing.
  • metadata.date_updated — most recent refresh.
Closed or removed listings are not guaranteed to disappear from the index immediately. To approximate “currently hiring” queries, filter on a recent metadata.date_added or metadata.date_updated window (for example, within the last 30 days) and pair it with the hiring company’s firmographics. For alerting or repeated exports, keep your date windows bounded and dedupe rows with crustdata_job_id.

Date filter semantics

Dates and timezones. When you pass a date-only value like "2025-01-01", the backend interprets it as 2025-01-01T00:00:00 in UTC. Ranges using => are inclusive of the boundary and < is exclusive, so metadata.date_added => "2025-01-01" AND metadata.date_added < "2025-07-01" covers every listing indexed between Jan 1 (inclusive) and Jul 1 (exclusive) in UTC. Pass full timestamps like "2025-01-01T08:00:00" when you need finer precision.
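The boundary behavior can be checked locally with the standard library. This sketch mirrors the semantics described above under one assumption: timestamps carry no UTC offset and are treated as UTC. `parse_bound` and `in_window` are hypothetical helpers for client-side checks, not a description of the backend's implementation.

```python
from datetime import datetime, timezone


def parse_bound(value):
    """Parse a date-only or full timestamp string as a UTC datetime."""
    # "2025-01-01" parses to midnight, matching the date-only rule above.
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def in_window(timestamp, start, end):
    """True when start <= timestamp < end (=> is inclusive, < is exclusive)."""
    return parse_bound(start) <= parse_bound(timestamp) < parse_bound(end)


print(in_window("2025-01-01T00:00:00", "2025-01-01", "2025-07-01"))
print(in_window("2025-06-30T23:59:59", "2025-01-01", "2025-07-01"))
print(in_window("2025-07-01T00:00:00", "2025-01-01", "2025-07-01"))
```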

Fetch page 2

{
    "filters": {
        "field": "company.basic_info.company_id",
        "type": "=",
        "value": 631394
    },
    "fields": ["job_details.title"],
    "limit": 1,
    "cursor": "H4sIANBG1mkC_xXMOQ4CMQxA0auMUk9hx3YScxWERs6CpkBEzFIgxN0J1W-e_se9zra9l9X21V0mx0kjMgvdq4dYPZGgclOTlLEZl2bJG_lcdVSqFa-okDlwo1qbmye39-0YryvGKBIwsQLAPDGCkAS6DXL0wx5L6efzL2MC_P4A250zQYoAAAA="
}

Field selection

Use fields to return only the dot-paths you need. The top-level groups are crustdata_job_id, job_details, company, location, content, metadata. You can request:
  • A whole group: "company" returns every company.* sub-object.
  • A sub-object: "company.basic_info" returns only the basic info block.
  • A single field: "company.basic_info.name" returns just the name.
{
    "fields": [
        "job_details.title",
        "job_details.url",
        "company.basic_info.name",
        "company.basic_info.primary_domain",
        "location.raw",
        "metadata.date_added"
    ]
}
Recommended default field set for most dashboards: ["job_details.title", "job_details.category", "job_details.url", "company.basic_info.name", "company.basic_info.primary_domain", "location.raw", "metadata.date_added"].

Aggregations

Aggregations let you roll up results without returning individual job rows. Set limit: 0 when you only want aggregation output. Two types are supported:
  • count — returns the total number of jobs matching filters.
  • group_by — buckets the results by column and returns per-bucket counts.

AggregationRequest schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string (enum) | Yes | "count" for a simple total, "group_by" to bucket by column. |
| column | string | Required for group_by | Dot-path to group by. Must be in the Groupable fields allowlist. |
| agg | string (enum) | Required for group_by | Sub-aggregation inside each bucket. Currently only "count" is supported. |
| size | integer | No (default 100) | Maximum number of buckets to return. Min 1, max 1000. |
Each AggregationResponseItem echoes type and column, then carries:
  • value (integer) — populated for count aggregations. The total match count.
  • buckets (array) — populated for group_by aggregations. Each bucket has a key, count, and a metadata object whose keys depend on the grouped column. See Aggregation bucket metadata.
You can include multiple aggregations in a single request; the response returns them in aggregations[] in the same order you sent them.
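Reading a mixed aggregations[] array comes down to branching on type. The sketch below assumes the item shape described above (type, column, value or buckets); `summarize_aggregations` is a hypothetical helper and the sample numbers are invented.

```python
def summarize_aggregations(aggregations):
    """Flatten aggregation items into (label, result) pairs, preserving order."""
    summary = []
    for item in aggregations:
        if item.get("type") == "count":
            summary.append(("count", item.get("value")))
        elif item.get("type") == "group_by":
            buckets = item.get("buckets") or []
            counts = {bucket["key"]: bucket["count"] for bucket in buckets}
            summary.append((item.get("column"), counts))
    return summary


# Illustrative response fragment; the numbers are made up.
aggregations = [
    {"type": "count", "value": 42},
    {
        "type": "group_by",
        "column": "job_details.category",
        "buckets": [
            {"key": "Engineering", "count": 30, "metadata": {}},
            {"key": "Sales", "count": 12, "metadata": {}},
        ],
    },
]
for label, result in summarize_aggregations(aggregations):
    print(label, result)
```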

Count all Engineering jobs

curl --request POST \
  --url https://api.crustdata.com/job/search \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "filters": { "field": "job_details.category", "type": "=", "value": "Engineering" },
    "limit": 0,
    "aggregations": [ { "type": "count" } ]
  }'

Top companies indexing “Software Engineer” listings (bounded window)

curl --request POST \
  --url https://api.crustdata.com/job/search \
  --header 'authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --header 'x-api-version: 2025-11-01' \
  --data '{
    "filters": {
      "op": "and",
      "conditions": [
        { "field": "job_details.title",   "type": "=",  "value": "Software Engineer" },
        { "field": "metadata.date_added", "type": "=>", "value": "2025-01-01" },
        { "field": "metadata.date_added", "type": "<",  "value": "2026-01-01" }
      ]
    },
    "limit": 0,
    "aggregations": [
      {
        "type": "group_by",
        "column": "company.basic_info.company_id",
        "agg": "count",
        "size": 5
      }
    ]
  }'

Groupable fields

group_by.column is restricted to the following indexed fields:
  • company.basic_info.company_id
  • company.basic_info.industries
  • company.basic_info.primary_domain
  • company.funding.last_round_type
  • company.headcount.range
  • company.locations.country
  • job_details.category
  • job_details.title
  • job_details.workplace_type
  • location.country
Sending any other column returns 400 with Unsupported aggregation column: '...'. Supported: ....

What’s next

  • Search Jobs — back to the main Search page.
  • Examples — SDR/BDR keyword hunting, mid-market filtering, funding-triggered queries, and aggregations.
  • Pagination & sorting — sorting, pagination, field selection, and aggregations.
  • OpenAPI reference — the formal schema for every request, response, and error.