The Complete Guide to Scraping Job Boards (2026)

Pick the right Thirdwatch scraper for any jobs use case — LinkedIn, Indeed, Glassdoor, Naukri, Monster, Career Sites and 10 more. Decision tree + cross-source recipes.

Apr 27, 2026 · 6 min read · 1,295 words

Thirdwatch publishes 14 dedicated jobs scrapers covering LinkedIn, Indeed, Glassdoor, Naukri, Monster, ZipRecruiter, SimplyHired, Wellfound, CutShort, RemoteOK, Adzuna, Reed, Career Sites (Greenhouse/Lever ATS), Google Jobs, and Google Search jobs aggregation. This guide is the decision tree for picking the right one (or combination) for your use case — recruiter pipelines, salary benchmarks, hiring-velocity dashboards, talent-market research.

The job-scraping landscape

Job board coverage is fragmented by geography, employer tier, and role type. According to LinkedIn's 2024 Workforce report, the platform indexes 14M+ active listings globally; Indeed's Hiring Lab reports 7M+ US listings. No single board has more than 30% of total US public job postings — which is why production aggregators run 3-5 sources in parallel.

For a recruiter team, the right answer is usually 2-3 sources. For a meta-search aggregator or a labor-economics research function, 5-8. For markets where a single board dominates (India IT services, US healthcare, EU mid-market), one or two specialist sources per geography.

Compare Thirdwatch jobs scrapers

Scraper	Coverage	Approach	Pricing	Best for
LinkedIn Jobs	Global, MNC + product	Pure HTTP guest API	Pay per job	Senior + product hiring
Indeed	US-broad	Production-grade anti-bot tooling	Pay per job	Mid-market + salary
Glassdoor	US/UK	Playwright	Pay per job	Reviews, salary, interviews
Naukri	India dominant	Browser fetch	Pay per job	India IT services
Google Jobs	20+ boards aggregated	Production-grade anti-bot tooling + JSON	Pay per job	Single-query meta-search
Google Search Jobs	SERP-level	Search-engine-friendly proxy	Pay per result	SEO competition research
Monster	US/UK	Production-grade anti-bot tooling	Pay per job	Mid-market non-tech
ZipRecruiter	US	Production-grade anti-bot tooling + Turnstile	Pay per job	Hourly/blue-collar
SimplyHired	US	Playwright	Pay per job	Aggregator coverage
Wellfound	Startup-focused	Production-grade anti-bot tooling	Pay per job	Early-stage tech
CutShort	India startups	TLS-level fingerprinting + JSON-LD	Pay per job	India tech startups
RemoteOK	Remote-only	Public JSON API	Pay per job	Remote-first roles
Adzuna	UK/EU	Lightweight HTTP path	Pay per job	UK/EU mid-market
Reed	UK	HTTP + Next.js data	Pay per job	UK structured data
Career Site Scraper	Direct ATS	HTTP (Lever/Greenhouse APIs)	Pay per job	Greenhouse, Lever direct
LinkedIn Profiles	Candidate side	HTTP + Sec-Fetch	Pay per profile	Candidate enrichment

Decision tree: which scraper for which use case?

"I'm building a US-focused jobs aggregator." Start with Google Jobs (one query covers 20+ boards). Layer LinkedIn Jobs and Indeed for depth on top-priority listings. Add Career Site Scraper for direct-ATS data on high-value employers (Greenhouse, Lever).

"I'm benchmarking salaries across roles." Indeed (US, employer-published) + Naukri (India, Lacs format) + Glassdoor (US, with estimates as fallback). For role × experience × metro percentile bands, target 200+ rows per cell.
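The "200+ rows per cell" rule can be enforced at aggregation time. Here is a minimal sketch using only the standard library. It assumes each row is already a (role, experience, metro, annual_salary) tuple with the salary normalized to an annual figure; the function name, the tuple layout, and the choice of p25/p50/p75 as the bands are illustrative assumptions, not part of any Thirdwatch output schema.

```python
from collections import defaultdict
from statistics import quantiles


def percentile_bands(rows, min_rows=200):
    """Group annual salaries by (role, experience, metro) and return quartile bands.

    Cells with fewer than `min_rows` observations are skipped rather than
    reported with unstable percentiles.
    """
    cells = defaultdict(list)
    for role, experience, metro, annual_salary in rows:
        cells[(role, experience, metro)].append(annual_salary)

    bands = {}
    for cell, salaries in cells.items():
        if len(salaries) < min_rows:
            continue  # too thin to benchmark
        p25, p50, p75 = quantiles(salaries, n=4, method="inclusive")
        bands[cell] = {"n": len(salaries), "p25": p25, "p50": p50, "p75": p75}
    return bands
```

Dropping thin cells outright is the simplest policy; widening the cell (for example, merging adjacent experience bands) is an alternative when too many cells fall below the threshold.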

"I'm tracking hiring velocity at competitor companies." Indeed for mid-market employers + LinkedIn for enterprise. Daily snapshot, dedupe on apply_url, alert on 3x+ delta over 30-day baseline.
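The snapshot, dedupe, and alert steps can be sketched roughly as follows. This assumes each scraped record is a dict with an apply_url field (as in the natural-key notes later in this guide) and that you keep one list of daily new-posting counts per company, oldest first. The function names and data shapes are hypothetical.

```python
from statistics import mean


def count_new_postings(snapshot, seen_urls):
    """Count postings in today's snapshot whose apply_url has not been seen before.

    `seen_urls` is a set that persists across days; it is updated in place.
    """
    new_urls = {job["apply_url"] for job in snapshot} - seen_urls
    seen_urls |= new_urls
    return len(new_urls)


def hiring_spikes(daily_new, baseline_days=30, threshold=3.0):
    """Return {company: ratio} where today's count is >= threshold x the baseline mean.

    `daily_new` maps company -> list of daily new-posting counts, oldest first,
    with today's count as the last element.
    """
    alerts = {}
    for company, counts in daily_new.items():
        if len(counts) < baseline_days + 1:
            continue  # not enough history for a baseline yet
        *history, today = counts
        baseline = mean(history[-baseline_days:])
        if baseline > 0 and today >= threshold * baseline:
            alerts[company] = today / baseline
    return alerts
```

Companies with a zero baseline are skipped here to avoid dividing by zero; you may prefer to alert on them separately, since a first-ever posting burst is itself a signal.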

"I'm building an India-only recruiter pipeline." Naukri (primary, IT services + mid-market) + LinkedIn India (MNC + product) + CutShort (startups). For comp data, layer in AmbitionBox.

"I'm running an ABM pipeline that includes hiring signals." LinkedIn Jobs (filtered by companyName) + Career Site Scraper (direct ATS for top accounts) + LinkedIn Profile Scraper for decision-maker enrichment. Cross-reference hiring spikes with profile-side intent signals.

"I want remote jobs only." RemoteOK (cheapest, JSON API, no proxy needed) + LinkedIn filtered to "Remote". For India remote, add CutShort.

"I need raw labor-market velocity data for research." Google Jobs (cross-source aggregation) for breadth + Indeed for depth + BLS for authoritative reference. Compute 7-day rolling deltas vs prior 28-day average.
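One reading of "7-day rolling deltas vs prior 28-day average" is: take the mean of the latest 7 daily counts and compare it with the mean of the 28 days immediately before that window. The sketch below implements that interpretation; the function name and the fractional-change return value are assumptions for illustration.

```python
from statistics import mean


def rolling_delta(daily_counts, window=7, baseline=28):
    """Fractional change of the latest `window`-day mean vs the prior `baseline`-day mean.

    `daily_counts` is a list of daily posting counts, oldest first.
    Returns None when the baseline mean is zero.
    """
    if len(daily_counts) < window + baseline:
        raise ValueError("need at least window + baseline days of history")
    recent = mean(daily_counts[-window:])
    prior = mean(daily_counts[-(window + baseline):-window])
    if prior == 0:
        return None
    return (recent - prior) / prior
```

For example, 28 days at 100 postings per day followed by 7 days at 120 gives a delta of 0.2, i.e. a 20% rise over the baseline.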

Cross-source recipe: build a 3-source aggregator

import os

import pandas as pd
import requests

TOKEN = os.environ["APIFY_TOKEN"]  # Apify API token, read from the environment


def run(actor, payload, timeout=3600):
    """Run an Apify actor synchronously and return its dataset items as a list."""
    r = requests.post(
        f"https://api.apify.com/v2/acts/{actor}/run-sync-get-dataset-items",
        params={"token": TOKEN}, json=payload, timeout=timeout,
    )
    r.raise_for_status()  # fail loudly instead of parsing an error body as data
    return r.json()


QUERIES = ["software engineer", "data scientist", "product manager"]
LOCS = ["New York", "San Francisco", "Austin"]
# One "<role> <city>" search string per combination, shared by all three actors
SEARCHES = [f"{q} {loc}" for q in QUERIES for loc in LOCS]

linkedin = run("thirdwatch~linkedin-jobs-scraper",
               {"queries": SEARCHES, "maxResults": 100})
indeed = run("thirdwatch~indeed-jobs-scraper",
             {"queries": SEARCHES, "country": "us", "maxResults": 100})
google = run("thirdwatch~google-jobs-scraper",
             {"queries": SEARCHES, "country": "us", "maxResults": 100})

# Stack the three result sets, tagging each row with its source
df = pd.concat([
    pd.DataFrame(linkedin).assign(source="linkedin"),
    pd.DataFrame(indeed).assign(source="indeed"),
    pd.DataFrame(google).assign(source="google_jobs"),
], ignore_index=True)

# Cross-source dedup on canonical 4-tuple
df["title_norm"] = df["title"].str.lower().str.replace(r"[^a-z0-9 ]", "", regex=True)
df["company_norm"] = df["company_name"].str.lower().str.strip()
df["loc_norm"] = df["location"].str.split(",").str[0].str.lower().str.strip()
# First 4-6 digit number in the salary text, with thousands separators removed
# ("$80,000" -> 80000.0); rows without a match become NaN
df["salary_min"] = pd.to_numeric(
    df["salary"].str.replace(",", "", regex=False).str.extract(r"(\d{4,6})", expand=False),
    errors="coerce",
)
df = df.drop_duplicates(subset=["title_norm", "company_norm", "loc_norm", "salary_min"])

print(f"{len(df)} unique jobs across {df['source'].nunique()} sources")
print(df["source"].value_counts())

About 50-60% of Google Jobs records overlap with direct LinkedIn or Indeed rows; the unique 40-50% is the lift from running Google Jobs alongside.

All use-case guides for jobs scrapers

LinkedIn Jobs

LinkedIn Profiles

Indeed

Glassdoor

Naukri

Google Jobs

Career Site Scraper (Greenhouse / Lever ATS)

CutShort, RemoteOK, Monster, SimplyHired

(The full list of 100+ guides is at /blog.)

Common patterns across jobs scrapers

Canonical natural keys. Each source has one stable per-posting key:

LinkedIn / Indeed: apply_url
Naukri: apply_url (job-listing URL)
Google Jobs: (title, company, location) since apply_url varies per source
Career Sites: ATS-job-id within domain
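The per-source keys above can be folded into one helper so that downstream dedup code does not need to branch on source. This is a sketch under assumptions: the source labels match the ones used in the recipe earlier, title / company_name / location follow that recipe's column names, and domain / job_id are hypothetical field names for the Career Sites case.

```python
import re

# Sources whose stable per-posting key is the apply URL
URL_KEYED = {"linkedin", "indeed", "naukri"}


def _norm(text):
    """Lowercase and strip punctuation so cosmetic differences do not split keys."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def natural_key(record, source):
    """Return a hashable natural key for one job record from the given source."""
    if source in URL_KEYED:
        return (source, record["apply_url"])
    if source == "google_jobs":
        # apply_url varies per underlying board, so fall back to the content triple
        return (source, _norm(record["title"]),
                _norm(record["company_name"]), _norm(record["location"]))
    if source == "career_site":
        # ATS job ids are only unique within one employer's domain (hypothetical fields)
        return (source, record["domain"], record["job_id"])
    raise ValueError(f"no natural key defined for source {source!r}")
```

Because the key is a plain tuple, it can go straight into a set or serve as a dict key when merging daily snapshots.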

Re-listing inflation. Companies close and re-post roles within 30-90 days. Smooth velocity calculations with 7-day rolling averages and cross-source dedup on the 4-tuple (title-norm, company-norm, location-norm, salary_min).

Salary normalization. Indeed publishes ranges + units ("$80K-$120K a year", "$25-$35 an hour"). Naukri publishes Lacs ("12-18 Lacs P.A."). LinkedIn shows parsed bands when available. Always extract min/max integers + normalize unit (annual / hourly × 2080 / monthly × 12) before benchmark aggregation.
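As a rough illustration of that normalization, the sketch below parses the three example formats quoted above into annual (min, max) integers. It is deliberately naive: it keys off the words "lac", "hour", and "month", treats a K suffix as thousands, leaves currency untouched (Lacs stay in INR, dollar figures in USD), and assumes the string contains only salary numbers. The function name and return shape are assumptions.

```python
import re

HOURS_PER_YEAR = 2080   # 40 hours x 52 weeks
MONTHS_PER_YEAR = 12
LAKH = 100_000          # 1 Lac = 100,000 (INR)

# A number with an optional decimal part and an optional K suffix
_NUMBER = re.compile(r"(\d+(?:\.\d+)?)(k\b)?")


def normalize_salary(text):
    """Return (min_annual, max_annual) as integers, or None if no number is found."""
    lowered = text.lower().replace(",", "")
    values = []
    for number, k_suffix in _NUMBER.findall(lowered):
        value = float(number)
        if k_suffix:
            value *= 1_000
        values.append(value)
    if not values:
        return None

    # Pick the multiplier that converts the quoted unit to an annual figure
    if "lac" in lowered or "lakh" in lowered:
        factor = LAKH
    elif "hour" in lowered:
        factor = HOURS_PER_YEAR
    elif "month" in lowered:
        factor = MONTHS_PER_YEAR
    else:
        factor = 1
    return round(min(values) * factor), round(max(values) * factor)
```

A single-value posting ("$95K a year") comes back with min equal to max, which keeps the downstream aggregation code uniform.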

Function classification. Title-keyword matching produces stable cohorts: engineering, sales, marketing, product, ops. About 90% of US tech roles classify cleanly via title keywords; for the long tail, fall back to description-keyword matching.
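A title-first classifier with a description fallback might look like the sketch below. The keyword lists are illustrative placeholders, not a tested taxonomy, and the "other" bucket is an assumption for rows that match nothing.

```python
import re

# Illustrative keyword lists; extend or replace these for a real taxonomy
FUNCTION_KEYWORDS = {
    "engineering": ["engineer", "engineering", "developer", "sre", "devops"],
    "sales": ["sales", "account executive", "business development"],
    "marketing": ["marketing", "seo", "content", "brand"],
    "product": ["product manager", "product owner"],
    "ops": ["operations", "supply chain", "logistics"],
}


def _match(text):
    """Return the first function whose keyword appears as a whole word or phrase."""
    lowered = text.lower()
    for function, keywords in FUNCTION_KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(k)}\b", lowered) for k in keywords):
            return function
    return None


def classify_function(title, description=""):
    """Classify by title keywords first; fall back to the description for the long tail."""
    return _match(title) or _match(description) or "other"
```

Dictionary order doubles as precedence here, so a "Sales Engineer" lands in engineering; reorder the dict if your cohorts should resolve the other way.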

Frequently asked questions

Which job board has the best coverage?

Depends on geography and seniority. LinkedIn dominates US/UK/EU senior + product roles. Indeed has the broadest US coverage including mid-market and blue-collar. Naukri is the canonical India source for IT services and mid-market. Google Jobs aggregates 20+ boards into one query — best single discovery surface, but descriptions are sometimes shorter than originals. Most aggregators run Google Jobs for breadth + LinkedIn/Indeed for depth on top results.

Where do I start if I'm building a recruiter pipeline?

Three actors cover 90%+ of recruiter use cases: LinkedIn Jobs for breadth + structured fields + applicant counts; Indeed for mid-market + salary disclosure; Career Site Job Scraper for direct ATS data from Greenhouse/Lever (skips the aggregator markup). All run on pay-per-result pricing with volume tiers. Layer Glassdoor for company-research depth on shortlisted employers.

Which job board reveals salary most often?

Indeed publishes salary on ~40% of US listings (employer-published, regulated by state pay-transparency laws). Naukri publishes on ~25% (Lacs format). Glassdoor estimates salary on most listings (model-derived, with disclosure). LinkedIn rarely publishes salary on the job page itself but exposes parsed bands via the structured data field. For salary benchmarking, combine Indeed (US/UK/EU) + Naukri (India) + Glassdoor (estimates) + LinkedIn (structured).

What's the right strategy for a multi-source aggregator?

Two-tier approach: (1) Google Jobs for primary discovery (one query covers 20+ boards); (2) per-source scrapers for deep enrichment on top-priority listings (Google Jobs descriptions are sometimes truncated). Dedupe across sources on (title-norm, company-norm, location-norm, salary_min) — about 50-60% of Google Jobs rows overlap with direct-source rows; the unique 40-50% is what direct scraping misses.

Are remote jobs handled well?

RemoteOK is the canonical remote-first source. The scraper uses RemoteOK's public JSON API (near-zero infra cost) and is priced per record. LinkedIn and Indeed both expose remote flags in structured fields. For remote-only aggregators, RemoteOK + LinkedIn (filtered to remote) covers 80%+ of US/UK remote tech jobs. For India-remote, layer in CutShort, which has strong remote-startup coverage.

How much does a full jobs aggregator cost monthly?

Depends on scope. Pricing is pay-per-result with volume tiers, so cost scales with the number of records you pull, and the unit price drops at higher tiers. A daily 200-keyword run on LinkedIn + Indeed + Google Jobs surfaces tens of thousands of records per day; mid-market aggregators (50K daily records, mixed sources) stay well within typical data-pipeline budgets. Most pipelines are cost-bound by scope, not unit price.
