Reference for Crustdata Jobs Search: common indexed fields, full field catalog, id map, bucket metadata, and errors.
Reference material for Search Jobs: filter
grammar and operators, common indexed fields, the full Job catalog, id
semantics, aggregation bucket metadata, null behavior, and errors. For worked examples, see Examples. For sorting,
pagination, field selection, and aggregations, see
Pagination & sorting.
Jobs ID cheat sheet. The Jobs APIs use three id concepts — keep them straight:
crustdata_job_id — the Crustdata job identifier. Returned on every Job. Use it as your dedupe key.
company.basic_info.crustdata_company_id — the Crustdata company identifier returned on every Job.
company.basic_info.company_id (filter alias): the dot-path used in filters and aggregations.field for indexed Search Jobs. It points to the same integer as company.basic_info.crustdata_company_id. This alias is not sortable; for deterministic pagination, sort on metadata.date_added instead.
When you group_by on company.basic_info.company_id, each bucket also returns metadata.company_name, metadata.company_website_domain, and metadata.linkedin_id for labeling.
Every filter describes which individual job rows to keep. The API checks
each job listing against your filter independently — it never groups or
combines rows before filtering.

There are two building blocks:

| Building block | What it does |
| --- | --- |
| SearchCondition (leaf) | Tests one field on one job row, e.g. title = "Software Engineer". |
| SearchConditionGroup (node) | Combines conditions with `and` or `or`. Groups can nest inside other groups. |
Exact-match AND on the same field always returns zero results. One
listing has one title, so (title = "Software Engineer") AND (title = "Account Executive") can never match. This applies to = and in.
All-words operators ((.)) work fine in AND. Because (.) checks for
individual words — not a contiguous substring — a query like (title (.) "Software Development") AND (title (.) "Software Engineer") matches any
title containing all three words “Software”, “Development”, and “Engineer”
(e.g. “Software Development Engineer”).
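The exact-match pitfall above can be caught before a request is sent. The following is a minimal sketch, not part of the API: it assumes a group is shaped like `{"op": "and", "conditions": [...]}` (the `op` and `conditions` key names are assumptions, since this page only shows the leaf shape), and the helper name `contradictory_fields` is hypothetical. Array fields are skipped because any element may satisfy each condition.

```python
# Assumed group shape: {"op": "and", "conditions": [leaf, leaf, ...]}
# Leaf shape as shown on this page: {"field": ..., "type": ..., "value": ...}

ARRAY_FIELDS = {"company.basic_info.industries"}  # any element may match


def contradictory_fields(group, array_fields=ARRAY_FIELDS):
    """Return scalar fields that an AND group pins to incompatible exact values."""
    if group.get("op") != "and":
        return []
    allowed = {}    # field -> set of exact values still possible
    conflicts = []
    for cond in group.get("conditions", []):
        field = cond.get("field")
        if field is None or field in array_fields:
            continue  # nested group or array field: not checked here
        if cond.get("type") == "=":
            values = {cond["value"]}
        elif cond.get("type") == "in":
            values = set(cond["value"])
        else:
            continue  # (.) and other operators can legitimately repeat
        allowed[field] = allowed[field] & values if field in allowed else values
        if not allowed[field] and field not in conflicts:
            conflicts.append(field)
    return conflicts


bad_group = {
    "op": "and",
    "conditions": [
        {"field": "job_details.title", "type": "=", "value": "Software Engineer"},
        {"field": "job_details.title", "type": "=", "value": "Account Executive"},
    ],
}
print(contradictory_fields(bad_group))
```

If the helper reports a field, switching the group to `or`, or collapsing the values into a single `in` condition, is usually what was intended.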
When you filter on a string-array field like
company.basic_info.industries, the condition is satisfied if any
element of the array matches. For example:
{ "field": "company.basic_info.industries", "type": "=", "value": "Technology, Information and Internet" }
This matches any job whose company industries array contains that
exact string. Use (.) to match words within any element.
Grouping by array fields
When you group_by on an array field, each array element becomes
its own bucket key. A company in two industries contributes one
count to each of the two industry buckets — so the sum of bucket
counts can exceed total_count for array fields.
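To see why bucket counts can add up to more than total_count, the grouping can be imitated locally. This is an illustrative sketch over made-up rows (the three sample jobs and the simplified `industries` key are assumptions, not API output); it only mirrors the counting rule described above.

```python
from collections import Counter

# Made-up sample rows; each job carries its company's industries array.
jobs = [
    {"id": 1, "industries": ["Financial Services", "Technology, Information and Internet"]},
    {"id": 2, "industries": ["Technology, Information and Internet"]},
    {"id": 3, "industries": ["Financial Services"]},
]


def group_by_array(rows, key):
    """Count each row once per distinct element of its array field."""
    counts = Counter()
    for row in rows:
        for element in set(row[key]):
            counts[element] += 1
    return counts


buckets = group_by_array(jobs, "industries")
total_count = len(jobs)

print(dict(buckets))
print(sum(buckets.values()), "bucket hits for", total_count, "jobs")
```

Job 1 lands in both buckets, so the bucket counts sum to 4 while only 3 jobs matched.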
Use the table below to pick the right type for each condition. Every
operator works on indexed fields only.
| Operator | value shape | Meaning |
| --- | --- | --- |
| `=` | scalar (string/number/boolean) | Exact match. |
| `!=` | scalar | Not equal. |
| `<` | scalar (numeric or ISO date) | Less than. |
| `=<` | scalar (numeric or ISO date) | Less than or equal. Not `<=`. |
| `>` | scalar (numeric or ISO date) | Greater than. |
| `=>` | scalar (numeric or ISO date) | Greater than or equal. Not `>=`. |
| `in` | array of scalars | Field value is any entry in the array. |
| `not_in` | array of scalars | Field value is none of the entries in the array. |
| `(.)` | string | Case-insensitive all-words match. Every word in the query must appear somewhere in the field, but not necessarily next to each other or in the same order. "Software Engineer" matches "Software Engineer", "Software Development Engineer", and "Engineer, Software Systems". A single word like "engineer" also matches "Engineering Manager". Great for broad keyword hunting in job_details.title or content.description. |
| `[.]` | string | Case-insensitive exact-phrase match. The words must appear contiguously and in order. "Software Engineer" matches "Senior Software Engineer" but not "Software Development Engineer" (extra word in between) and not "Engineer Software" (wrong order). Use `[.]` when you need precision over recall. |
| `geo_distance` | object (see below) | Geographic radius include. Keeps jobs located within distance of a center point. Only valid on location (or location.raw; both target the same geo point). See Geographic radius filters. |
| `geo_exclude` | object (see below) | Geographic radius exclude. Removes jobs located within distance of a center point. Same value format and field restrictions as geo_distance. |
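The difference between `(.)` and `[.]` is easiest to see on the examples from the table. The sketch below approximates both operators client-side; it is not the server's implementation. The tokenizer and the prefix rule (so that "engineer" matches "Engineering") are assumptions chosen to reproduce the documented examples, and the function names are hypothetical.

```python
import re


def _tokens(text):
    """Lowercase alphanumeric tokens; an assumed stand-in for server tokenization."""
    return re.findall(r"[a-z0-9]+", text.lower())


def all_words_match(query, field_value):
    """Approximate (.): every query word appears, in any order or position."""
    field_tokens = _tokens(field_value)
    return all(
        any(token.startswith(word) for token in field_tokens)
        for word in _tokens(query)
    )


def phrase_match(query, field_value):
    """Approximate [.]: query words appear contiguously and in order."""
    q, f = _tokens(query), _tokens(field_value)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


for title in ["Senior Software Engineer", "Software Development Engineer", "Engineer Software"]:
    print(title, all_words_match("Software Engineer", title), phrase_match("Software Engineer", title))
```

A helper like this can be handy for post-filtering or for unit-testing expectations about a query, but the API response remains the source of truth.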
Operator footguns.
Use => for greater-than-or-equal and =< for less-than-or-equal, not >= and <=.
in and not_in require JSON arrays, not comma-separated strings.
is_null / is_not_null are currently not implemented — request the field via fields and filter for null presence client-side.
geo_distance / geo_exclude only work on location and location.raw. Using them on any other field, or sending a malformed geo value, returns 500 — see Errors.
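These pitfalls are mechanical enough to check before sending a request. The following is a small client-side lint, offered as a sketch: the function name `lint_condition` and its messages are hypothetical, and the operator list is simply copied from the table on this page.

```python
VALID_OPERATORS = {
    "=", "!=", "<", "=<", ">", "=>", "in", "not_in",
    "(.)", "[.]", "geo_distance", "geo_exclude",
}
GEO_OPERATORS = {"geo_distance", "geo_exclude"}
GEO_FIELDS = {"location", "location.raw"}


def lint_condition(cond):
    """Return a list of human-readable problems for one leaf condition."""
    problems = []
    op = cond.get("type")
    if op in (">=", "<="):
        # The API spells these with the equals sign first.
        problems.append(f"use {op[::-1]} instead of {op}")
    elif op in ("is_null", "is_not_null"):
        problems.append(f"{op} is not implemented; filter for nulls client-side")
    elif op not in VALID_OPERATORS:
        problems.append(f"unknown operator {op!r}")
    if op in ("in", "not_in") and not isinstance(cond.get("value"), list):
        problems.append(f"{op} needs a JSON array, not {type(cond.get('value')).__name__}")
    if op in GEO_OPERATORS and cond.get("field") not in GEO_FIELDS:
        problems.append(f"{op} only works on location or location.raw")
    return problems


print(lint_condition({"field": "company.headcount.total", "type": ">=", "value": 100}))
print(lint_condition({"field": "location.country", "type": "in", "value": "USA,Canada"}))
```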
If your center point comes from your own geocoder or a map UI, pass
lat_lng directly — for example { "lat_lng": [37.7749, -122.4194], "distance": 25 } — and the server-side geocoding step is skipped.
These are the indexed fields most often used in filters, sorts, and
aggregations.field. This table is a summary of the most common paths,
not an authoritative catalog. For the deeper field catalog — including id
semantics, null handling, and bucket metadata — see the full
Field reference below.
Company id filter alias. The filterable field path uses the short alias
company.basic_info.company_id, but the response shape returns the same
integer at company.basic_info.crustdata_company_id. They point to the same
value. See Jobs IDs: a quick map.
Job details

| Field | Example |
| --- | --- |
| job_details.title | "Software Engineer" |
| job_details.category | "Engineering", "Sales", "Operations", "Others" |
| job_details.workplace_type | "Remote", "Hybrid", "On-site", "" |
| job_details.reposted_job | true / false |
| job_details.url | "https://www.linkedin.com/jobs/view/4398377738" |

Company basic info

| Field | Example |
| --- | --- |
| company.basic_info.company_id | 631394 |
| company.basic_info.name | "Stripe" |
| company.basic_info.primary_domain | "stripe.com" |
| company.basic_info.professional_network_id | "2135371" |
| company.basic_info.industries | ["Technology, Information and Internet"] |

Company firmographics

| Field | Example |
| --- | --- |
| company.headcount.total | 14522 |
| company.headcount.range | "5001-10000" |
| company.followers.count | 1335688 |
| company.revenue.estimated.lower_bound_usd | 500000000 |

Location

| Field | Example |
| --- | --- |
| location.raw | "Melbourne, Victoria, Australia" |
| location.country | "Australia" |
| location.state | "Victoria" |
| location.district | "Southbank" |
| location.city | "Melbourne" |

For radius queries (“within 25 km of San Francisco”), filter on location
with the geo_distance / geo_exclude operators (see
Geographic radius filters).

Content, metadata, IDs

| Field | Example |
| --- | --- |
| content.description | Full job description text. |
| crustdata_job_id | 41053563 |
| metadata.date_added | "2026-04-07T11:37:29" |
| metadata.date_updated | "2026-04-08T00:00:00" |
Sending a filter on a non-indexed field returns 500 with Unsupported columns in conditions: ['...']. Sending an unsupported group_by field
returns a similar 500 listing every supported aggregation field.
The code fence below uses jsonc because it includes inline // comments
for annotation. Strip the comments before sending it to a strict JSON
parser.
```jsonc
{
  "crustdata_job_id": 41053563, // stable job id (use as dedupe key)
  "job_details": {
    "job_id": 41053563, // mirrors crustdata_job_id
    "title": "Integration Engineer (AUNZ)",
    "category": "Engineering",
    "workplace_type": "",
    "url": "https://www.linkedin.com/jobs/view/4398377738",
    "reposted_job": false,
    "number_of_openings": 1
  },
  "company": {
    "basic_info": {
      "crustdata_company_id": 631394, // filter with company.basic_info.company_id
      "name": "Stripe",
      "primary_domain": "stripe.com",
      "website": "https://stripe.com",
      "professional_network_id": "2135371",
      "industries": ["Technology, Information and Internet"]
    },
    "locations": {
      "country": "USA",
      "state": "California",
      "city": "South San Francisco",
      "street_address": "354 Oyster Point Blvd, South San Francisco, California, United States"
    },
    "headcount": {
      "total": 7234,
      "range": "5001-10000",
      "largest_headcount_country": "USA"
    },
    "followers": { "count": 1335688 },
    "revenue": {
      "estimated": {
        "lower_bound_usd": 500000000,
        "upper_bound_usd": 1000000000
      },
      "public_markets": null,
      "acquisition_status": ""
    },
    "funding": {
      "total_investment_usd": 9440247725.0,
      "valuation_usd": 50000000000.0,
      "last_fundraise_date": "2026-03-09T00:00:00",
      "last_round_type": "secondary_market",
      "num_funding_rounds": 23,
      "investors": [
        "Sequoia Capital",
        "Andreessen Horowitz",
        "Founders Fund"
      ]
    },
    "competitors": {
      "websites": ["https://plaid.com", "https://paystack.com"]
    }
  },
  "location": {
    "raw": "Melbourne, Victoria, Australia", // as advertised on the posting
    "city": "Melbourne", // city/state/country are geocoded from raw
    "district": null,
    "state": "Victoria",
    "country": "Australia", // normalized country name
    "pincode": null
  },
  "content": {
    "description": "Stripe is a financial infrastructure platform for businesses..."
  },
  "metadata": {
    "date_added": "2026-04-07T11:37:29",
    "date_updated": "2026-04-08T00:00:00"
  }
}
```
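If you want to load an annotated sample like the one above in a script, the `//` comments have to go first. Here is one way to do that, as a sketch: the helper name `strip_line_comments` is hypothetical, and it only handles `//` line comments (not `/* */` blocks), taking care not to touch `//` inside string values such as URLs.

```python
import json


def strip_line_comments(text):
    """Remove // comments that sit outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep escaped character verbatim
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            while i < n and text[i] != "\n":
                i += 1  # drop everything up to the end of the line
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = '{"url": "https://stripe.com", // company site\n "crustdata_job_id": 41053563}'
print(json.loads(strip_line_comments(sample)))
```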
Nulls are normal. Nested objects such as revenue.public_markets,
location.district, location.pincode, and parts of company.funding can
legitimately be null or missing.
When you group_by on company.basic_info.company_id, each bucket carries
a metadata object whose keys use bucket-specific names rather than the
Job response dot-paths:
| Bucket metadata key | Equivalent Job value | Notes |
| --- | --- | --- |
| company_name | company.basic_info.name | Plain company name. |
| company_website_domain | company.basic_info.primary_domain | Primary website domain. |
| linkedin_id | company.basic_info.professional_network_id | Public-profile identifier returned only inside aggregation buckets. |
| crustdata_company_id | company.basic_info.crustdata_company_id | Crustdata company id. Defined in the spec as nullable; the bucket key already carries this value. |
company.basic_info.company_id and
company.basic_info.crustdata_company_id refer to the same integer. Use the
short alias in filters and aggregations.field. The response shape
writes the value under crustdata_company_id.
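When bucket output is joined with Job rows, it can help to rename the bucket metadata keys to the Job dot-paths. The sketch below does that for a single bucket; the helper name `label_bucket`, the `job_count` output key, and the sample count are illustrative assumptions, while the key mapping follows the correspondence described above.

```python
BUCKET_TO_JOB_PATH = {
    "company_name": "company.basic_info.name",
    "company_website_domain": "company.basic_info.primary_domain",
    "linkedin_id": "company.basic_info.professional_network_id",
    "crustdata_company_id": "company.basic_info.crustdata_company_id",
}


def label_bucket(bucket):
    """Flatten one company_id bucket into a row keyed by Job dot-paths."""
    metadata = bucket.get("metadata") or {}
    row = {path: metadata.get(key) for key, path in BUCKET_TO_JOB_PATH.items()}
    # The metadata id is nullable, but the bucket key carries the same value.
    if row["company.basic_info.crustdata_company_id"] is None:
        row["company.basic_info.crustdata_company_id"] = bucket["key"]
    row["job_count"] = bucket["count"]
    return row


sample_bucket = {  # made-up bucket in the documented key/count/metadata shape
    "key": 631394,
    "count": 42,
    "metadata": {
        "company_name": "Stripe",
        "company_website_domain": "stripe.com",
        "linkedin_id": "2135371",
        "crustdata_company_id": None,
    },
}
print(label_bucket(sample_bucket))
```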
The city, state, and country fields are derived by geocoding the
raw location string, so they carry normalized place names rather than the
raw text’s wording. city can be an empty string "" when the raw
location resolves to an area broader than a city (for example
"San Francisco Bay Area" geocodes to state: "California" with an
empty city).
Country values are geocoded and normalized. location.country carries
normalized full country names ("United States", "United Kingdom"). A
small share of rows still carries a residual variant such as
"United States of America". When completeness matters, match both with
in, or pre-discover the exact indexed values by running a group_by on
location.country.
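A small helper can keep the variant spellings in one place. This sketch builds an `in` condition that covers both forms mentioned above; the variant table and the function name `country_condition` are assumptions, and the list should be extended with whatever a group_by on location.country actually reveals.

```python
# Known residual variants per normalized country name (extend as discovered).
COUNTRY_VARIANTS = {
    "United States": ["United States", "United States of America"],
}


def country_condition(country):
    """Build a leaf condition matching a country and its known variants."""
    values = COUNTRY_VARIANTS.get(country, [country])
    return {"field": "location.country", "type": "in", "value": values}


print(country_condition("United States"))
print(country_condition("Australia"))
```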
Most Job fields are nullable in the spec and can legitimately be absent
or empty.
Null or missing — the field is not present on a given Job.
Blank string "" — the field was present but had no indexable value (common for job_details.workplace_type). Treat blank as “unspecified”, not as the same thing as null.
Sparse nested objects — company.funding, company.revenue, and company.competitors are often missing for smaller or private companies.
is_null / is_not_null operators are currently not implemented — request the field via fields and filter for null presence client-side.
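Since null checks have to happen client-side, a dot-path lookup that distinguishes the three cases above is useful. This is a minimal sketch; the helper names `get_path` and `presence` and the returned labels are hypothetical.

```python
_MISSING = object()


def get_path(job, dot_path):
    """Walk a dot-path through nested dicts; return _MISSING if any step is absent."""
    current = job
    for part in dot_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def presence(job, dot_path):
    """Classify a field as 'null' (missing or None), 'blank' (""), or 'present'."""
    value = get_path(job, dot_path)
    if value is _MISSING or value is None:
        return "null"
    if value == "":
        return "blank"
    return "present"


job = {
    "job_details": {"title": "Integration Engineer (AUNZ)", "workplace_type": ""},
    "location": {"district": None},
}
for path in ["job_details.title", "job_details.workplace_type",
             "location.district", "company.funding.valuation_usd"]:
    print(path, presence(job, path))
```

Filtering a page of results then reduces to a list comprehension over `presence(job, path) == "null"` or its negation.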
| Status | error.type | Meaning |
| --- | --- | --- |
| | invalid_request | Request body failed validation: limit out of range, a missing required key, or an unrecognized parameter (for example sending column instead of field). |
| 401 | unauthorized | The Authorization header is missing, malformed, or contains an invalid API key. |
| 500 | internal_error | The query could not be executed, including when a filter or aggregation references an unsupported field, when a geo_distance/geo_exclude value is malformed or used on a non-geo field, or when a geo location string cannot be resolved to coordinates. Also covers transient server errors; retry after a short delay. |
Every error — including 401 — uses the same nested envelope:
{ "error": { "type", "message", "metadata" } }. Branch on error.type
rather than string-matching message.
```json
{
  "error": {
    "type": "invalid_request",
    "message": "'limit' must be at most 1000. Got 5000.",
    "metadata": [
      {
        "field": "limit",
        "type": "less_than_equal",
        "message": "'limit' must be at most 1000. Got 5000."
      }
    ]
  }
}
```
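Branching on error.type might look like the sketch below. The function name `describe_error` and the returned advice strings are illustrative assumptions; only the envelope shape and the three type values come from this page. The metadata list is read defensively because its shape for non-validation errors is not documented here.

```python
def describe_error(payload):
    """Map an error envelope to a short next-step hint, keyed on error.type."""
    error = payload.get("error") or {}
    kind = error.get("type")
    if kind == "invalid_request":
        details = error.get("metadata") or []
        fields = [d.get("field") for d in details if isinstance(d, dict)]
        return f"fix request fields: {fields}"
    if kind == "unauthorized":
        return "check the API key"
    if kind == "internal_error":
        return "check field names and geo values, then retry after a short delay"
    return f"unhandled error type: {kind!r}"


payload = {
    "error": {
        "type": "invalid_request",
        "message": "'limit' must be at most 1000. Got 5000.",
        "metadata": [
            {"field": "limit", "type": "less_than_equal",
             "message": "'limit' must be at most 1000. Got 5000."}
        ],
    }
}
print(describe_error(payload))
```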
How to paginate, sort, select fields, and aggregate results in
Search Jobs. For worked examples, see Examples. For filter
grammar, operators, and the full field catalog, see
Reference.
Replace YOUR_API_KEY in each example with your actual API key. All
requests require the x-api-version: 2025-11-01 header.
sorts is an ordered array. Each item has a field and order ("asc"
or "desc"). Sorts apply in array order — the first sort is the primary
key, the second breaks ties, and so on.
Sort allowlist is narrower than filter allowlist. Sort only works on
numeric, date, and a small set of scalar fields. Sorting on text fields like
job_details.title, job_details.category, or company.basic_info.name
returns Unsupported columns in conditions.
Pagination is cursor-based. Each response returns a next_cursor (or
null when you reach the end). To fetch the next page, resend the original
request body with cursor set to the previous next_cursor.
1. Fetch the first page. Omit cursor and set limit to your page size (max 1000).
2. Walk forward. Take next_cursor from the response and pass it back as cursor in the next request. Keep filters, sorts, and fields identical; if you change them, the cursor becomes meaningless.
3. Stop when `next_cursor` is null. A null cursor means you've reached the end of the result set.
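The three steps above can be sketched as a generator. To stay transport-agnostic, the sketch takes a hypothetical `post` callable that sends a request body and returns the parsed JSON response; the endpoint URL, headers, and the name of the key holding job rows are deliberately left out because they are not shown in this section. `fake_post` exists only to demonstrate the loop.

```python
def iter_pages(post, body, limit=1000):
    """Yield response pages until next_cursor is null.

    `post` is a caller-supplied function: request body (dict) -> response (dict).
    """
    request = dict(body, limit=limit)
    request.pop("cursor", None)  # first page: no cursor
    while True:
        page = post(request)
        yield page
        cursor = page.get("next_cursor")
        if cursor is None:
            return  # end of the result set
        # Same filters, sorts, and fields; only the cursor changes.
        request = dict(request, cursor=cursor)


def fake_post(request):
    """Stand-in transport returning two canned pages (illustration only)."""
    canned = {
        None: {"page": 1, "next_cursor": "abc"},
        "abc": {"page": 2, "next_cursor": None},
    }
    return canned[request.get("cursor")]


pages = list(iter_pages(fake_post, {"fields": ["crustdata_job_id"]}, limit=2))
print([p["page"] for p in pages])
```

In real use, `post` would wrap your HTTP client and raise on error envelopes so the walk stops instead of looping on a failed page.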
Best-effort, not strict snapshot. A
cursor is consistent with respect to the filter, sort, and field
selection you sent on the first page, so the same query will keep
paging forward over a coherent result stream. However, because the
underlying indexed dataset is continuously updated, new jobs indexed
between page requests can cause minor drift in total_count and in the
exact position of individual rows. Treat pagination as best-effort,
not a strict snapshot.
For bulk exports where every row matters:
Constrain your filter to a bounded date window (for example
metadata.date_added => 2025-01-01 AND < 2025-07-01) so newly
indexed jobs outside the window do not affect the walk, and
Re-run the full walk periodically and diff against the prior
snapshot using crustdata_job_id as the dedupe key.
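Both recommendations can be sketched in a few lines. The window is expressed as two leaf conditions (lower bound inclusive, upper bound exclusive), and `in_window` re-checks rows client-side. The helper names are hypothetical, and treating timestamps without an offset as UTC is an assumption of this sketch. `diff_snapshots` compares two exports keyed on crustdata_job_id.

```python
from datetime import datetime, timezone

# Leaf conditions for a bounded window; combine them in an AND group.
window_conditions = [
    {"field": "metadata.date_added", "type": "=>", "value": "2025-01-01"},
    {"field": "metadata.date_added", "type": "<", "value": "2025-07-01"},
]


def to_utc(value):
    """Parse an ISO date or timestamp; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(value)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def in_window(date_added, start, end):
    """True when start <= date_added < end (inclusive start, exclusive end)."""
    return to_utc(start) <= to_utc(date_added) < to_utc(end)


def diff_snapshots(previous, current):
    """Compare two exports by crustdata_job_id."""
    prev = {job["crustdata_job_id"]: job for job in previous}
    curr = {job["crustdata_job_id"]: job for job in current}
    return {
        "added": sorted(curr.keys() - prev.keys()),
        "removed": sorted(prev.keys() - curr.keys()),
        "changed": sorted(k for k in curr.keys() & prev.keys() if curr[k] != prev[k]),
    }


old = [{"crustdata_job_id": 1, "title": "A"}, {"crustdata_job_id": 2, "title": "B"}]
new = [{"crustdata_job_id": 2, "title": "B2"}, {"crustdata_job_id": 3, "title": "C"}]
print(diff_snapshots(old, new))
```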
What the indexed Jobs dataset represents. The Search Jobs dataset
is a rolling index of job listings discovered from the web, refreshed
on an ongoing basis. Each row has:
metadata.date_added: when the job was posted.
Closed or removed listings are not guaranteed to disappear from the
index immediately. To approximate “currently hiring” queries, filter on
a recent metadata.date_added window (for example, within the last 30 days) and pair it with the hiring
company’s firmographics. For alerting or repeated exports, keep your
date windows bounded and dedupe rows with crustdata_job_id.
Dates and timezones. When you pass a date-only value like
"2025-01-01", the backend interprets it as 2025-01-01T00:00:00 in
UTC. Ranges using => are inclusive of the boundary and < is
exclusive, so "metadata.date_added" => "2025-01-01" AND < "2025-07-01" covers every listing indexed between Jan 1 (inclusive)
and Jul 1 (exclusive) in UTC. Pass full timestamps like
"2025-01-01T08:00:00" when you need finer precision.
Use fields to return only the dot-paths you need. The top-level groups
are crustdata_job_id, job_details, company, location, content,
metadata. You can request:
A whole group — "company" returns every company.* sub-object.
A sub-object — "company.basic_info" returns only the basic info block.
A single field — "company.basic_info.name" returns just the name.
Recommended default field set for most dashboards:
["job_details.title", "job_details.category", "job_details.url", "company.basic_info.name", "company.basic_info.primary_domain", "location.raw", "metadata.date_added"].
Aggregations let you roll up results without returning individual job rows.
Set limit: 0 when you only want aggregation output. Two types are
supported:
count: returns the total number of jobs matching filters.
group_by: buckets the results by field and returns per-bucket counts.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | | | "count" for a simple total, "group_by" to bucket by field. |
| field | string | Required for group_by | Dot-path to group by. Must be in the Groupable fields allowlist. |
| agg | string (enum) | Required for group_by | Sub-aggregation inside each bucket. Currently only "count" is supported. |
| size | integer | No (default 100) | Maximum number of buckets to return. Min 1, max 1000. |
Each AggregationResponseItem echoes type and field, then carries:
value (integer) — populated for count aggregations. The total match count.
buckets (array) — populated for group_by aggregations. Each bucket has a key, count, and a metadata object whose keys depend on the grouped field. See Aggregation bucket metadata.
You can include multiple aggregations in a single request; the response
returns them in aggregations[] in the same order you sent them.
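Because response items come back in request order, they can be paired with the aggregation specs by position. The sketch below builds a count plus a group_by and reads a response; the helper names are hypothetical, the top-level `aggregations` request key is assumed to mirror the response key, and `sample_response` is made-up data in the documented item shape.

```python
def build_aggregations(group_field, size=100):
    """Return a count aggregation followed by a group_by on group_field."""
    if not 1 <= size <= 1000:
        raise ValueError("size must be between 1 and 1000")
    return [
        {"type": "count"},
        {"type": "group_by", "field": group_field, "agg": "count", "size": size},
    ]


def read_aggregations(requested, response):
    """Pair each requested aggregation with its response item by position."""
    results = {}
    for spec, item in zip(requested, response.get("aggregations", [])):
        if spec["type"] == "count":
            results["count"] = item["value"]
        else:
            results[spec["field"]] = {b["key"]: b["count"] for b in item["buckets"]}
    return results


aggs = build_aggregations("company.basic_info.company_id", size=10)
request_body = {"limit": 0, "aggregations": aggs}  # limit 0: aggregation output only

sample_response = {  # illustrative response, not real API output
    "aggregations": [
        {"type": "count", "value": 3},
        {"type": "group_by", "field": "company.basic_info.company_id",
         "buckets": [
             {"key": 631394, "count": 2, "metadata": {"company_name": "Stripe"}},
             {"key": 1001, "count": 1, "metadata": {"company_name": "Example Co"}},
         ]},
    ]
}
print(read_aggregations(aggs, sample_response))
```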