
Scrape ZipRecruiter Jobs for Aggregator (2026)

Pull ZipRecruiter US jobs using Thirdwatch: hourly and blue-collar coverage, plus recipes for jobs-aggregator builders.

Apr 28, 2026 · 5 min read · 1,155 words

Thirdwatch's ZipRecruiter Scraper returns US hourly + mid-market jobs — title, company, location, salary, job type, description, posted date, apply URL. Built for jobs-aggregator developers, US-recruiter pipelines, hourly/blue-collar talent platforms, and labor-market research targeting non-tech US segments.

Why scrape ZipRecruiter

ZipRecruiter dominates US hourly and mid-market hiring. According to ZipRecruiter's 2024 report, the platform indexes 9M+ active US listings, making it one of the two largest single-source US jobs corpora alongside Indeed. For aggregator-builders targeting non-tech US labor markets (retail, hospitality, healthcare, logistics, blue-collar trades), ZipRecruiter is essential coverage.

The job-to-be-done is structured. A US-jobs aggregator covers 50 metros × 100 keywords = 5,000 queries per refresh. An hourly-jobs platform monitors 50 retail/hospitality categories nationally. A US labor-market research function tracks mid-market hiring shifts on an hourly cadence. A staffing-agency pipeline matches candidates to ZipRecruiter postings by cross-referencing skills and location. All of these reduce to category + city queries plus ZipRecruiter-specific result aggregation.

How does this compare to the alternatives?

Three options for ZipRecruiter data:

Approach	Cost	Reliability	Setup time	Maintenance
ZipRecruiter API (paid)	$5K–$50K/year	Official	Days (approval)	Per-tier license
Indeed scraper (overlap)	Pay per result	Coverage gap on hourly/SMB	5 minutes	Generic US coverage
Thirdwatch ZipRecruiter Scraper	Pay per result	Production-grade anti-bot tooling + Turnstile handling	5 minutes	Thirdwatch tracks ZipRecruiter changes

ZipRecruiter's first-party API is gated behind partner approval. Indeed alone misses ~30% of ZipRecruiter coverage (especially hourly and SMB roles). The ZipRecruiter Scraper actor gives you the raw data at the lowest unit cost of the three options.

How to scrape ZipRecruiter in 4 steps

Step 1: How do I authenticate against Apify?

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: How do I pull a category × city batch?

Pass title + location queries.

import os
from itertools import product

import pandas as pd
import requests

ACTOR = "thirdwatch~ziprecruiter-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

TITLES = ["registered nurse", "warehouse associate",
          "truck driver", "retail manager",
          "restaurant server", "home health aide",
          "delivery driver", "customer service rep"]
CITIES = ["Phoenix, AZ", "Charlotte, NC", "Memphis, TN",
          "Indianapolis, IN", "Columbus, OH"]

# One query object per (title, city) pair: 8 x 5 = 40 queries
queries = [{"title": t, "location": c} for t, c in product(TITLES, CITIES)]

# Synchronous run: starts the actor, waits for it to finish, returns the dataset items as JSON
resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"queries": queries, "maxResults": 50},
    timeout=3600,
)
resp.raise_for_status()  # fail loudly on auth or run errors instead of parsing an error body
df = pd.DataFrame(resp.json())
print(f"{len(df)} jobs across {df.location.nunique()} locations")

8 titles × 5 cities = 40 queries; at 50 results each, that is up to 2,000 records, well within budget for a daily refresh at the actor's pay-per-result pricing.

Step 3: How do I dedupe + filter?

Filter to fresh, salary-disclosed roles.

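# posted_date is relative text such as "2 days ago". Hour/minute-level, "just posted",
# and "today" strings count as 0 days old (assumed formats; adjust to what the actor returns).
posted = df.posted_date.fillna("").str.lower()
days = posted.str.extract(r"(\d+)\s*day", expand=False).astype(float)
df["posted_days_ago"] = days.mask(posted.str.contains(r"hour|minute|just posted|today"), 0.0)
df["has_salary"] = df.salary.notna() & (df.salary != "")

active = df[
    (df.posted_days_ago <= 7)
    & df.has_salary
].drop_duplicates(subset=["apply_url"])

print(f"{len(active)} fresh + salary-disclosed jobs")
print(active[["title", "company_name", "location", "salary"]].head(15))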

The fresh, salary-disclosed cohort is the highest-quality slice for aggregator content: recent roles with pay visible up front.

Step 4: How do I push to a Postgres index?

Upsert on apply_url for cross-snapshot dedup.

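import psycopg2
import psycopg2.extras

# Convert pandas NaN to None so missing fields are written as SQL NULL
rows = active.astype(object).where(active.notna(), None)

# ON CONFLICT requires a UNIQUE constraint (or primary key) on jobs.apply_url
with psycopg2.connect(...) as conn, conn.cursor() as cur:
    psycopg2.extras.execute_values(
        cur,
        """INSERT INTO jobs (apply_url, title, company_name, location,
                              salary, job_type, source, posted_date, scraped_at)
           VALUES %s
           ON CONFLICT (apply_url) DO UPDATE SET
             salary = EXCLUDED.salary,
             scraped_at = now()""",
        [(j["apply_url"], j["title"], j["company_name"], j.get("location"),
          j.get("salary"), j.get("job_type"), "ziprecruiter",
          j.get("posted_date")) for _, j in rows.iterrows()],
        # scraped_at is filled by Postgres now(); passing the string "now()" as a
        # bound parameter would send a literal string instead of calling the function
        template="(%s, %s, %s, %s, %s, %s, %s, %s, now())",
    )
print(f"Upserted {len(active)} ZipRecruiter rows")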

Sample output

A single ZipRecruiter record looks like this. Five rows weigh ~7 KB.

{
  "title": "Warehouse Associate",
  "company_name": "Amazon",
  "location": "Phoenix, AZ 85042",
  "salary": "$18.50 - $22.00 an hour",
  "job_type": "Full-time",
  "description": "Pick + pack orders in our Phoenix fulfillment center...",
  "posted_date": "2 days ago",
  "apply_url": "https://www.ziprecruiter.com/jobs/amazon-/...",
  "remote": false
}

apply_url is the canonical natural key. For hourly roles, salary (when present) reads like "$18.50 - $22.00 an hour"; convert to annual by multiplying by 2,080 (40 hours × 52 weeks) for cross-source benchmarking. posted_date enables freshness filtering.
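A minimal sketch of that annualization, assuming salary strings look like the sample above or like the $X/hr, $X/wk, $X/mo, $X/yr variants: it extracts the dollar amounts with a regex, guesses the pay period from keywords, and multiplies by 2,080, 52, 12, or 1. The names `annualize_salary` and `detect_period` and the keyword lists are illustrative, not part of the actor's output.

```python
import re

# Multipliers that convert one pay period to a year; 2,080 = 40 hours x 52 weeks.
ANNUAL_MULTIPLIER = {"hour": 2080, "week": 52, "month": 12, "year": 1}

# Matches "$18.50", "$4,000", "$55K"; an optional K suffix means thousands.
AMOUNT_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(?:([kK])\b)?")


def detect_period(text: str):
    """Guess the pay period from keywords (assumed wording); the first match wins."""
    t = text.lower()
    if "hour" in t or "/hr" in t:
        return "hour"
    if "week" in t or "/wk" in t:
        return "week"
    if "month" in t or "/mo" in t:
        return "month"
    if "year" in t or "/yr" in t or "annual" in t:
        return "year"
    return None


def annualize_salary(text):
    """Return (annual_min, annual_max) as floats, or None if the string can't be parsed."""
    if not isinstance(text, str) or not text.strip():
        return None
    period = detect_period(text)
    amounts = []
    for number, thousands in AMOUNT_RE.findall(text):
        value = float(number.replace(",", ""))
        if thousands:
            value *= 1000
        amounts.append(value)
    if period is None or not amounts:
        return None
    multiplier = ANNUAL_MULTIPLIER[period]
    return (min(amounts) * multiplier, max(amounts) * multiplier)


print(annualize_salary("$18.50 - $22.00 an hour"))  # (38480.0, 45760.0)
```

Returning None for unparseable strings (for example "Competitive pay") keeps them out of benchmarks instead of silently treating them as zero.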

Common pitfalls

Three things commonly go wrong in ZipRecruiter pipelines. First, anti-bot bypass drift: Turnstile periodically updates its challenge mechanics, and while the actor's Turnstile iframe click pattern is robust, it may need updates; Thirdwatch tracks these changes. Second, salary format variance: pay appears as hourly ($X/hr), weekly ($X/wk), monthly ($X/mo), or annual ($X/yr), so always normalize to annual before benchmark aggregation. Third, re-listing inflation: small and mid-market employers re-post the same role frequently, so smooth velocity calculations with 7-day rolling averages.
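The 7-day smoothing can be sketched with pandas, assuming listings are already deduplicated and you keep one first-seen timestamp per listing (the `first_seen` input and the function name are hypothetical). Days with no new postings are filled with zero before taking the trailing mean, so the window always spans seven calendar days rather than seven days that happen to have data.

```python
import pandas as pd


def smoothed_daily_postings(first_seen: pd.Series, window_days: int = 7) -> pd.Series:
    """Daily count of newly seen listings, smoothed with a trailing rolling mean."""
    daily = (
        pd.to_datetime(first_seen)
        .dt.floor("D")               # bucket timestamps by calendar day
        .value_counts()              # new listings per day
        .sort_index()
        .asfreq("D", fill_value=0)   # insert missing days as zero
    )
    # min_periods=1 lets the first few days report a partial-window average
    return daily.rolling(window=window_days, min_periods=1).mean()
```

Smoothing does not remove re-posts; it only damps the spikes they cause, so pair it with the dedup keys discussed in this section.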

Thirdwatch's actor handles the anti-bot work and proxy rotation so you can focus on the data. Pair ZipRecruiter with the Indeed Scraper for full US coverage and the SimplyHired Scraper for additional aggregator depth.

A fourth subtle issue: ZipRecruiter aggressively cross-posts listings from other sources (especially Indeed and direct ATS feeds), so cross-source dedup is essential. Overlap with Indeed typically runs 25-40%, normalized on (title, company, location, salary_min).

A fifth pattern unique to ZipRecruiter: hourly-rate roles (warehouse, retail, hospitality) cluster heavily around state minimum-wage bands. For accurate per-region wage analysis, segment by state minimum-wage tier rather than treating national medians as comparable.

A sixth pitfall: ZipRecruiter's "estimated salary" feature populates a salary range when the employer didn't disclose one. These are model-derived estimates, not employer-published figures. For employer-truth analysis, filter on salary_disclosure: employer-published rather than including estimated values.

A seventh pattern for production teams is data-pipeline cost optimization. The actor's pricing scales linearly with record volume, so for high-cadence operations (hourly polling on large watchlists) the dominant cost driver is the size of the watchlist rather than the per-record fee. Cost-disciplined teams tier the watchlist (Tier 1 hourly, Tier 2 daily, Tier 3 weekly) rather than running everything at the highest cadence, typically cutting cost by 60-80% with minimal signal loss. Combine tiered cadence with explicit dedup keys and incremental snapshot diffing so storage and downstream compute stay proportional to new signal rather than total watchlist size.
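One way to build that cross-source key is sketched below. It assumes each source's DataFrame has `title`, `company_name`, `location`, and a numeric `salary_min` you derived upstream (salary_min is not a raw actor field), and it strips ZIP codes so "Phoenix, AZ 85042" and "Phoenix, AZ" collide. The helper names and normalization rules are illustrative choices; check them against real overlap before trusting the dedup rate.

```python
import re

import pandas as pd

ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def _norm(value, drop_zip: bool = False) -> str:
    """Lowercase, optionally drop ZIP codes, strip punctuation, collapse whitespace."""
    if pd.isna(value):
        return ""
    text = str(value).lower()
    if drop_zip:
        text = ZIP_RE.sub(" ", text)
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())


def _salary_token(value) -> str:
    """Round salary_min to whole dollars so tiny float differences share a key."""
    return "" if pd.isna(value) else str(int(round(value)))


def cross_source_dedup(frames: list) -> pd.DataFrame:
    """Concatenate per-source frames and keep the first row for each normalized key.

    Earlier frames win ties, so pass the preferred source first.
    """
    combined = pd.concat(frames, ignore_index=True)
    key = (
        combined["title"].map(_norm)
        + "|" + combined["company_name"].map(_norm)
        + "|" + combined["location"].map(lambda v: _norm(v, drop_zip=True))
        + "|" + combined["salary_min"].map(_salary_token)
    )
    return combined.loc[~key.duplicated()].reset_index(drop=True)
```

Usage: `cross_source_dedup([ziprecruiter_df, indeed_df])` keeps the ZipRecruiter copy of any listing that also appears on Indeed.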

An eighth subtle issue: snapshot-storage strategy materially affects long-term economics. Raw JSON snapshots compressed with gzip typically run 4-8x smaller than uncompressed; for multi-year retention, always compress at write-time. Partition storage by date prefix (snapshots/YYYY/MM/DD/) to enable fast date-range queries and incremental processing rather than full-scan re-aggregation. Most production pipelines keep 90 days of raw snapshots at full fidelity + 12 months of derived per-record aggregates + indefinite retention of derived metric time-series — three retention tiers managed separately.
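A sketch of compress-at-write plus date-prefix partitioning, using only the Python standard library. Writing one JSON object per line (JSON Lines) inside the gzip file is a choice made here so snapshots can be streamed back line by line; `write_snapshot` and `read_snapshot` are hypothetical helper names.

```python
import gzip
import json
from datetime import date
from pathlib import Path


def write_snapshot(records: list, root: Path, day: date, name: str = "ziprecruiter") -> Path:
    """Write records as gzip-compressed JSON Lines under root/YYYY/MM/DD/."""
    out_dir = root / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.jsonl.gz"
    # "wt" opens the gzip stream in text mode so we can write str directly
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return path


def read_snapshot(path: Path) -> list:
    """Stream a snapshot back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Called with `root=Path("snapshots")`, a date-range job only has to list the YYYY/MM/DD prefixes it needs instead of scanning every file.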

A ninth pattern unique to research-grade data work: schema validation should run continuously, not just at pipeline build-time. Run a daily validation suite that asserts each scraper returns the expected core fields with non-null rates above 80% (for required fields) and 50% (for optional). Alert on schema breakage same-day so consumers don't degrade silently. Most schema drift on third-party platforms shows up as one or two missing fields rather than total breakage; catch it early.
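The daily check can be as small as the function below, a sketch that assumes each batch is loaded into a pandas DataFrame. The required/optional split is a hypothetical choice based on the fields this actor returns. salary is deliberately left out of the optional list: at roughly 25% disclosure it would trip a 50% threshold every day.

```python
import pandas as pd

# Hypothetical field tiers for this actor's output (see note above about salary).
REQUIRED_FIELDS = ["title", "company_name", "location", "apply_url"]
OPTIONAL_FIELDS = ["job_type", "description", "posted_date"]


def validate_schema(df: pd.DataFrame,
                    required=REQUIRED_FIELDS, optional=OPTIONAL_FIELDS,
                    required_min: float = 0.80, optional_min: float = 0.50) -> list:
    """Return human-readable failures; an empty list means the batch passes."""
    if df.empty:
        return ["empty batch"]
    failures = []
    for fields, threshold in ((required, required_min), (optional, optional_min)):
        for field in fields:
            if field not in df.columns:
                failures.append(f"{field}: missing column")
                continue
            col = df[field]
            # Count None/NaN and blank strings as missing
            present = col.notna() & (col.astype(str).str.strip() != "")
            rate = present.mean()
            if rate < threshold:
                failures.append(f"{field}: non-null rate {rate:.0%} < {threshold:.0%}")
    return failures
```

Wire the returned list into whatever alerting you already run; a non-empty list is the same-day signal that a field has drifted.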

Related use cases

Frequently asked questions

Why ZipRecruiter for jobs aggregation?

ZipRecruiter dominates US hourly + mid-market hiring (retail, hospitality, healthcare, logistics, blue-collar trades) where Indeed and LinkedIn both under-index. According to ZipRecruiter's 2024 report, the platform indexes 9M+ active US listings with strong coverage of small/mid-market employers. For aggregator-builders targeting non-tech US labor markets, ZipRecruiter is essential alongside Indeed.

How does ZipRecruiter handle anti-bot defenses?

ZipRecruiter deploys aggressive anti-bot defenses, including Turnstile challenges. Thirdwatch's actor combines production-grade anti-bot tooling with a Turnstile iframe click at (28,28), which Thirdwatch reports as production-tested with a 100% bypass rate. Failed queries auto-retry with a fresh proxy.

What data does the actor return?

Per job: title, company, location (city + state + zip), salary (when published, ~25% of US listings), job type, description, posted date, apply URL. Per query: results matching the keyword and location filters. Salary publication has improved with state pay-transparency laws (CA, NY, CO, WA require disclosure on certain roles).

How does ZipRecruiter compare to Indeed?

Indeed's US coverage (7M listings) skews toward mid-market and tech roles. ZipRecruiter (9M listings) catches more of the SMB, hourly, and blue-collar roles Indeed misses. For comprehensive US labor-market coverage, run both; non-overlap is typically 30-40%. For tech recruiting, Indeed is primary; for hourly and blue-collar roles, ZipRecruiter is primary.

What's the cost for an aggregator with ZipRecruiter?

Pay-per-result pricing. A 200-keyword daily run at 100 results each (~20K records/day) is the typical aggregator scale; combined with Indeed + LinkedIn for full US-jobs coverage, costs scale linearly with scope but stay materially below paid jobs-API alternatives. Higher-volume tiers reduce per-record cost further for SMB-mid-market aggregators at 50K+ daily records.
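Because cost is linear in record volume, a back-of-envelope estimator only needs queries × results per query × runs per day, summed over cadence tiers. The sketch below is illustrative: `Tier` is a hypothetical helper, the tier sizes are made up, and `price_per_1k_results` is a placeholder you should fill in from the actor's current pricing.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    queries: int            # keyword (or keyword x city) queries in this tier
    results_per_query: int  # maxResults per query
    runs_per_day: float     # 24 = hourly, 1 = daily, 1/7 = weekly


def daily_records(tiers) -> float:
    """Expected records per day across all tiers (cost is linear in this)."""
    return sum(t.queries * t.results_per_query * t.runs_per_day for t in tiers)


def daily_cost(tiers, price_per_1k_results: float) -> float:
    """price_per_1k_results is a placeholder; use the actor's current pricing."""
    return daily_records(tiers) / 1000 * price_per_1k_results


# The 200-keyword x 100-result daily run from the text: ~20K records/day
flat = [Tier("all keywords", queries=200, results_per_query=100, runs_per_day=1)]
print(daily_records(flat))  # 20000

# Illustrative tiered watchlist vs. polling every keyword hourly
tiered = [
    Tier("tier 1", queries=20, results_per_query=100, runs_per_day=24),
    Tier("tier 2", queries=80, results_per_query=100, runs_per_day=1),
    Tier("tier 3", queries=100, results_per_query=100, runs_per_day=1 / 7),
]
all_hourly = [Tier("all hourly", queries=200, results_per_query=100, runs_per_day=24)]
print(daily_records(tiered), daily_records(all_hourly))
```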

Can I track salary trends with ZipRecruiter?

Yes, with caveats. Salary publication on ZipRecruiter is ~25% of listings (lower than Indeed's ~40%). State pay-transparency laws pushed disclosure rates higher in CA/NY/CO/WA. For mid-market salary benchmarks (where ZipRecruiter dominates), 200+ rows per (title × metro) cell produces stable percentile bands. For tech/professional roles, Indeed has better salary depth.
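A sketch of the (title × metro) percentile bands with the 200-row floor, assuming a DataFrame with hypothetical `title`, `metro`, and numeric `annual_salary` columns (for example, the annualized midpoint of each disclosed range). Cells below the floor are dropped rather than reported with unstable percentiles.

```python
import pandas as pd


def salary_bands(df: pd.DataFrame, min_rows: int = 200) -> pd.DataFrame:
    """P25/P50/P75 annual salary per (title, metro) cell with at least min_rows rows."""
    columns = ["title", "metro", "n", "p25", "p50", "p75"]
    disclosed = df.dropna(subset=["annual_salary"])
    # Size of each row's cell, broadcast back onto the rows
    cell_size = disclosed.groupby(["title", "metro"])["annual_salary"].transform("count")
    eligible = disclosed[cell_size >= min_rows]
    if eligible.empty:
        return pd.DataFrame(columns=columns)
    grouped = eligible.groupby(["title", "metro"])["annual_salary"]
    bands = grouped.quantile([0.25, 0.5, 0.75]).unstack()  # one column per quantile
    bands.columns = ["p25", "p50", "p75"]
    bands["n"] = grouped.size()
    return bands.reset_index()[columns]
```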

Find Hourly and Blue-Collar Jobs on ZipRecruiter (2026)
Build a ZipRecruiter Salary Database for Mid-Market (2026)
Scrape Indeed Jobs for a Recruiter Pipeline (2026 Guide)

Try it yourself

100 free credits, no credit card.

About 30 real searches. Add the MCP to Claude or Cursor in two minutes.