The Complete Guide to Scraping Job Boards (2026)

Pick the right Thirdwatch scraper for any jobs use case — LinkedIn, Indeed, Glassdoor, Naukri, Monster, Career Sites and 10 more. Decision tree + cross-source recipes.

Apr 27, 2026 · 6 min read · 1,267 words

Thirdwatch publishes 14 dedicated jobs scrapers covering LinkedIn, Indeed, Glassdoor, Naukri, Monster, ZipRecruiter, SimplyHired, Wellfound, CutShort, RemoteOK, Adzuna, Reed, Career Sites (Greenhouse/Lever ATS), Google Jobs, and Google Search jobs aggregation. This guide is the decision tree for picking the right one (or combination) for your use case — recruiter pipelines, salary benchmarks, hiring-velocity dashboards, talent-market research.

The job-scraping landscape

Job board coverage is fragmented by geography, employer tier, and role type. According to LinkedIn's 2024 Workforce report, the platform indexes 14M+ active listings globally; Indeed's Hiring Lab reports 7M+ US listings. No single board has more than 30% of total US public job postings — which is why production aggregators run 3-5 sources in parallel.

For a recruiter team, the right answer is usually 2-3 sources. For a meta-search aggregator or a labor-economics research function, it is 5-8. For markets where one board dominates (India IT services, US healthcare, EU mid-market), use one or two specialist sources per geography.

Compare Thirdwatch jobs scrapers

| Scraper | Coverage | Approach | Pricing model | Best for |
|---|---|---|---|---|
| LinkedIn Jobs | Global, MNC + product | Pure HTTP guest API | Pay per job | Senior + product hiring |
| Indeed | US-broad | Production-grade anti-bot tooling, stealth | Pay per job | Mid-market + salary |
| Glassdoor | US/UK | Playwright | Pay per job | Reviews, salary, interviews |
| Naukri | India dominant | Browser fetch | Pay per job | India IT services |
| Google Jobs | 20+ boards aggregated | Production-grade anti-bot tooling + JSON | Pay per job | Single-query meta-search |
| Google Search Jobs | SERP-level | Search-engine-friendly proxy | Pay per result | SEO competition research |
| Monster | US/UK | Production-grade anti-bot tooling | Pay per job | Mid-market non-tech |
| ZipRecruiter | US | Production-grade anti-bot tooling + Turnstile | Pay per job | Hourly/blue-collar |
| SimplyHired | US | Playwright | Pay per job | Aggregator coverage |
| Wellfound | Startup-focused | Production-grade anti-bot tooling | Pay per job | Early-stage tech |
| CutShort | India startups | TLS-level fingerprinting + JSON-LD | Pay per job | India tech startups |
| RemoteOK | Remote-only | Public JSON API | Pay per job | Remote-first roles |
| Adzuna | UK/EU | Lightweight HTTP path | Pay per job | UK/EU mid-market |
| Reed | UK | HTTP + Next.js data | Pay per job | UK structured data |
| Career Site Scraper | Direct ATS | HTTP (Lever/Greenhouse APIs) | Pay per job | Greenhouse, Lever direct |
| LinkedIn Profiles | Candidate side | HTTP + Sec-Fetch | Pay per profile | Candidate enrichment |

Decision tree: which scraper for which use case?

"I'm building a US-focused jobs aggregator." Start with Google Jobs (one query covers 20+ boards). Layer LinkedIn Jobs and Indeed for depth on top-priority listings. Add Career Site Scraper for direct-ATS data on high-value employers (Greenhouse, Lever).

"I'm benchmarking salaries across roles." Indeed (US, employer-published) + Naukri (India, Lacs format) + Glassdoor (US, with estimates as fallback). For role × experience × metro percentile bands, target 200+ rows per cell.

"I'm tracking hiring velocity at competitor companies." Indeed for mid-market employers + LinkedIn for enterprise. Daily snapshot, dedupe on apply_url, alert on 3x+ delta over 30-day baseline.

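The velocity alert described above (daily snapshot, dedupe on apply_url, flag a 3x jump over a 30-day baseline) can be sketched in a few lines of stdlib Python. This is a minimal illustration, not the scraper's own logic: the `(snapshot_date, company, apply_url)` row shape and the `hiring_spikes` helper are assumptions made for the example.

```python
from collections import Counter
from datetime import date, timedelta

def hiring_spikes(snapshots, today: date, baseline_days=30, ratio=3.0):
    """snapshots: iterable of (snapshot_date, company, apply_url) rows.

    Returns {company: (new_today, baseline_daily_avg)} for companies whose
    count of postings first seen today is at least `ratio` times their
    average daily count of first-seen postings over the prior `baseline_days`.
    """
    # Dedupe on apply_url: a posting only counts on the day it first appears.
    first_seen = {}
    for day, company, apply_url in snapshots:
        key = (company, apply_url)
        if key not in first_seen or day < first_seen[key]:
            first_seen[key] = day

    window_start = today - timedelta(days=baseline_days)
    today_counts, baseline_counts = Counter(), Counter()
    for (company, _url), day in first_seen.items():
        if day == today:
            today_counts[company] += 1
        elif window_start <= day < today:
            baseline_counts[company] += 1

    spikes = {}
    for company, new_today in today_counts.items():
        baseline_avg = baseline_counts[company] / baseline_days
        # Companies with no baseline history are skipped rather than flagged.
        if baseline_avg > 0 and new_today >= ratio * baseline_avg:
            spikes[company] = (new_today, baseline_avg)
    return spikes
```

Skipping companies with an empty baseline is a design choice for the sketch; a first-ever posting may deserve its own alert type.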
"I'm building an India-only recruiter pipeline." Naukri (primary, IT services + mid-market) + LinkedIn India (MNC + product) + CutShort (startups). For comp data, layer in AmbitionBox.

"I'm running an ABM pipeline that includes hiring signals." LinkedIn Jobs (filtered by companyName) + Career Site Scraper (direct ATS for top accounts) + LinkedIn Profile Scraper for decision-maker enrichment. Cross-reference hiring spikes with profile-side intent signals.

"I want remote jobs only." RemoteOK (cheapest, JSON API, no proxy needed) + LinkedIn filtered to "Remote". For India remote, add CutShort.

"I need raw labor-market velocity data for research." Google Jobs (cross-source aggregation) for breadth + Indeed for depth + BLS for authoritative reference. Compute 7-day rolling deltas vs prior 28-day average.

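For the research case, the 7-day delta against a 28-day average reduces to simple arithmetic on a daily count series. The sketch below assumes "prior 28-day average" means the 28 days immediately before the 7-day window; `rolling_delta` is a hypothetical helper name.

```python
def rolling_delta(daily_counts, short=7, long=28):
    """daily_counts: new-posting counts, oldest first, one value per day.

    Returns (mean of the latest `short` days) / (mean of the `long` days
    immediately before that window) - 1, so 0.25 means +25%.
    Returns None when the baseline mean is zero.
    """
    if len(daily_counts) < short + long:
        raise ValueError(f"need at least {short + long} days of data")
    recent = daily_counts[-short:]
    prior = daily_counts[-(short + long):-short]
    prior_mean = sum(prior) / long
    if prior_mean == 0:
        return None  # a ratio against an empty baseline is undefined
    return (sum(recent) / short) / prior_mean - 1
```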
Cross-source recipe: build a 3-source aggregator

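import os

import pandas as pd
import requests

TOKEN = os.environ["APIFY_TOKEN"]

def run(actor, payload, timeout=3600):
    """Run an actor synchronously and return its dataset items (a list of dicts)."""
    r = requests.post(
        f"https://api.apify.com/v2/acts/{actor}/run-sync-get-dataset-items",
        params={"token": TOKEN}, json=payload, timeout=timeout
    )
    r.raise_for_status()  # surface HTTP errors instead of parsing an error body as job rows
    return r.json()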

QUERIES = ["software engineer", "data scientist", "product manager"]
LOCS = ["New York", "San Francisco", "Austin"]

linkedin = run("thirdwatch~linkedin-jobs-scraper",
               {"queries": [f"{q} {loc}" for q in QUERIES for loc in LOCS],
                "maxResults": 100})
indeed = run("thirdwatch~indeed-jobs-scraper",
             {"queries": [f"{q} {loc}" for q in QUERIES for loc in LOCS],
              "country": "us", "maxResults": 100})
google = run("thirdwatch~google-jobs-scraper",
             {"queries": [f"{q} {loc}" for q in QUERIES for loc in LOCS],
              "country": "us", "maxResults": 100})

df = pd.concat([
    pd.DataFrame(linkedin).assign(source="linkedin"),
    pd.DataFrame(indeed).assign(source="indeed"),
    pd.DataFrame(google).assign(source="google_jobs"),
], ignore_index=True)

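# Cross-source dedup on canonical 4-tuple
df["title_norm"] = df.title.str.lower().str.replace(r"[^a-z0-9 ]", "", regex=True).str.strip()
df["company_norm"] = df.company_name.str.lower().str.strip()
df["loc_norm"] = df.location.str.split(",").str[0].str.lower().str.strip()
# First 4-6 digit number in the salary text, with thousands separators removed
# ("80,000" -> 80000). K-suffixed values such as "$80K" are not matched here.
df["salary_min"] = (
    df.salary.str.replace(",", "", regex=False)
    .str.extract(r"(\d{4,6})", expand=False)
    .astype(float)
)
df = df.drop_duplicates(subset=["title_norm", "company_norm", "loc_norm", "salary_min"])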

print(f"{len(df)} unique jobs across {df.source.nunique()} sources")
print(df.source.value_counts())

About 50-60% of Google Jobs records overlap with direct LinkedIn or Indeed rows; the unique 40-50% is the lift from running Google Jobs alongside.
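To check the overlap figure on your own data rather than take it on faith, measure it before deduplicating. The sketch below is one way to do that, assuming rows are dicts with the same `title`, `company_name`, and `location` fields the recipe uses; it compares on the first three key fields only and leaves salary out, since salary is often missing on one side.

```python
import re

def norm_key(row):
    """Normalized (title, company, city) key, close to the recipe's dedup fields."""
    title = re.sub(r"[^a-z0-9 ]", "", row["title"].lower()).strip()
    company = row["company_name"].lower().strip()
    city = row["location"].split(",")[0].lower().strip()
    return (title, company, city)

def google_overlap(google_rows, direct_rows):
    """Share of Google Jobs rows whose key also appears in the direct-source rows."""
    if not google_rows:
        return 0.0
    direct_keys = {norm_key(r) for r in direct_rows}
    hits = sum(norm_key(r) in direct_keys for r in google_rows)
    return hits / len(google_rows)
```

Called as `google_overlap(google, linkedin + indeed)` on the lists returned by `run`, this gives the overlapping share; one minus that value is the unique lift.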

All use-case guides for jobs scrapers

LinkedIn Jobs

LinkedIn Profiles

Indeed

Glassdoor

Naukri

Google Jobs

Career Site Scraper (Greenhouse / Lever ATS)

CutShort, RemoteOK, Monster, SimplyHired

(Full Wave 1 + Wave 2 list — 100+ guides — at /blog.)

Common patterns across jobs scrapers

Canonical natural keys. Each source has one stable per-posting key:

  • LinkedIn / Indeed: apply_url
  • Naukri: apply_url (job-listing URL)
  • Google Jobs: (title, company, location) since apply_url varies per source
  • Career Sites: ATS-job-id within domain
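The per-source keys listed above can be wrapped in one dispatch function so downstream code never has to know which rule applies. This is a sketch under assumptions: the `source` labels `naukri` and `career_sites` and the `domain` and `ats_job_id` field names are hypothetical, chosen for illustration rather than taken from the scrapers' output schemas.

```python
def natural_key(row):
    """Return a hashable per-posting key using the rule for the row's source."""
    source = row["source"]
    if source in {"linkedin", "indeed", "naukri"}:
        return (source, row["apply_url"])
    if source == "google_jobs":
        # apply_url varies per upstream board, so fall back to the text triple.
        return (
            source,
            row["title"].lower().strip(),
            row["company_name"].lower().strip(),
            row["location"].lower().strip(),
        )
    if source == "career_sites":
        # ATS job ids are only unique within one employer's domain.
        return (source, row["domain"], row["ats_job_id"])
    raise ValueError(f"no natural key rule for source {source!r}")
```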

Re-listing inflation. Companies close and re-post roles within 30-90 days. Smooth velocity calculations with 7-day rolling averages and cross-source dedup on the 4-tuple (title-norm, company-norm, location-norm, salary_min).
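One way to keep re-posts out of velocity counts is to collapse any posting that reappears under the same normalized key within a fixed window. The sketch below assumes a 90-day window (the upper end of the range above) and a hypothetical `(key, posted_date)` row shape, where `key` is the normalized 4-tuple.

```python
from datetime import date, timedelta

def collapse_relistings(postings, window_days=90):
    """postings: iterable of (key, posted_date) pairs.

    Keeps the first posting for each key and drops any later posting that
    appears within `window_days` of the previous posting with the same key.
    Returns the kept pairs in date order.
    """
    last_seen: dict = {}
    kept = []
    for key, posted in sorted(postings, key=lambda p: p[1]):
        previous = last_seen.get(key)
        if previous is None or posted - previous > timedelta(days=window_days):
            kept.append((key, posted))
        # Track every sighting so a chain of re-posts stays collapsed.
        last_seen[key] = posted
    return kept
```

Counting the kept rows per day, then averaging over seven days, gives a velocity series that re-listing churn does not inflate.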

Salary normalization. Indeed publishes ranges + units ("$80K-$120K a year", "$25-$35 an hour"). Naukri publishes Lacs ("12-18 Lacs P.A."). LinkedIn shows parsed bands when available. Before benchmark aggregation, always extract min/max integers and normalize to an annual figure (hourly × 2080, monthly × 12).
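A minimal parser for the three formats quoted above might look like the following. It is a sketch, not the scrapers' own parsing: `normalize_salary` is a hypothetical helper, it handles only the patterns shown in this guide, and it converts Lacs to rupees (1 Lac = 100,000) without converting between currencies.

```python
import re

HOURS_PER_YEAR = 2080  # 40 hours x 52 weeks
LAKH = 100_000         # 1 Lac = 100,000 rupees

_NUMBER = re.compile(r"(\d+(?:\.\d+)?)([kK])?")

def normalize_salary(text):
    """Return (min_annual, max_annual) in the listing's own currency, or None."""
    lowered = text.lower()
    amounts = [
        float(number) * (1000 if k_suffix else 1)
        for number, k_suffix in _NUMBER.findall(text.replace(",", ""))
    ]
    if not amounts:
        return None
    if "lac" in lowered or "lakh" in lowered:
        factor = LAKH            # Naukri-style "12-18 Lacs P.A."
    elif "hour" in lowered:
        factor = HOURS_PER_YEAR  # hourly -> annual
    elif "month" in lowered:
        factor = 12              # monthly -> annual
    else:
        factor = 1               # already annual
    return (round(min(amounts) * factor), round(max(amounts) * factor))
```

Keep a currency column alongside the normalized numbers so rupee and dollar bands never land in the same percentile cell.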

Function classification. Title-keyword matching produces stable cohorts: engineering, sales, marketing, product, ops. About 90% of US tech roles classify cleanly via title keywords; for the long tail, fall back to description-keyword matching.
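The title-first, description-fallback approach can be sketched as a small keyword lookup. The keyword lists below are illustrative assumptions, not the cohort definitions behind the 90% figure, and the first matching function wins, so the order of the dictionary matters.

```python
import re

# Illustrative keyword lists; extend them for your own role mix.
FUNCTION_KEYWORDS = {
    "engineering": ("engineer", "developer", "devops"),
    "sales": ("sales", "account executive", "business development"),
    "marketing": ("marketing", "growth", "content"),
    "product": ("product manager", "product owner", "product designer"),
    "ops": ("operations", "supply chain", "logistics"),
}

def classify(title, description=""):
    """Classify by title keywords first, then fall back to the description."""
    for text in (title, description):
        lowered = text.lower()
        for function, keywords in FUNCTION_KEYWORDS.items():
            # A leading word boundary lets "engineer" also match "engineering".
            if any(re.search(rf"\b{re.escape(kw)}", lowered) for kw in keywords):
                return function
    return "other"
```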

Frequently asked questions

Which job board has the best coverage?

Depends on geography and seniority. LinkedIn dominates US/UK/EU senior + product roles. Indeed has the broadest US coverage including mid-market and blue-collar. Naukri is the canonical India source for IT services and mid-market. Google Jobs aggregates 20+ boards into one query — best single discovery surface, but descriptions are sometimes shorter than originals. Most aggregators run Google Jobs for breadth + LinkedIn/Indeed for depth on top results.

Where do I start if I'm building a recruiter pipeline?

Three actors cover 90%+ of recruiter use cases: LinkedIn Jobs for breadth + structured fields + applicant counts; Indeed for mid-market + salary disclosure; Career Site Job Scraper for direct ATS data from Greenhouse/Lever (skips the aggregator markup). All run on pay-per-result pricing with volume tiers. Layer Glassdoor for company-research depth on shortlisted employers.

Which job board reveals salary most often?

Indeed publishes salary on ~40% of US listings (employer-published, regulated by state pay-transparency laws). Naukri publishes on ~25% (Lacs format). Glassdoor estimates salary on most listings (model-derived, with disclosure). LinkedIn rarely publishes salary on the job page itself but exposes parsed bands via the structured data field. For salary benchmarking, combine Indeed (US/UK/EU) + Naukri (India) + Glassdoor (estimates) + LinkedIn (structured).
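Once salaries from these sources are normalized to annual figures, percentile bands are only trustworthy in cells with enough rows; the decision tree above suggests 200+ per role × experience × metro cell. A minimal stdlib sketch of that floor, assuming rows arrive as `(role, experience, metro, annual_salary)` tuples (a hypothetical layout) and using quartiles as the bands:

```python
from collections import defaultdict
from statistics import quantiles

MIN_ROWS_PER_CELL = 200  # floor suggested earlier in this guide

def percentile_bands(rows, min_rows=MIN_ROWS_PER_CELL):
    """Return p25/p50/p75 for every (role, experience, metro) cell that has
    at least `min_rows` non-missing annual salaries."""
    cells = defaultdict(list)
    for role, experience, metro, salary in rows:
        if salary is not None:
            cells[(role, experience, metro)].append(salary)

    bands = {}
    for cell, salaries in cells.items():
        if len(salaries) < max(min_rows, 2):
            continue  # too thin to publish; collect more rows first
        p25, p50, p75 = quantiles(salaries, n=4, method="inclusive")
        bands[cell] = {"n": len(salaries), "p25": p25, "p50": p50, "p75": p75}
    return bands
```

Cells that fall below the floor are dropped silently here; in a real pipeline it is usually worth logging them so you know which queries to widen.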

What's the right strategy for a multi-source aggregator?

Two-tier approach: (1) Google Jobs for primary discovery (one query covers 20+ boards); (2) per-source scrapers for deep enrichment on top-priority listings (Google Jobs descriptions are sometimes truncated). Dedupe across sources on (title-norm, company-norm, location-norm, salary_min) — about 50-60% of Google Jobs rows overlap with direct-source rows; the unique 40-50% is what direct scraping misses.

Are remote jobs handled well?

RemoteOK is the canonical remote-first source. The scraper uses RemoteOK's public JSON API (near-zero infra cost) and is priced per record. LinkedIn and Indeed both expose remote flags in structured fields. For remote-only aggregators, RemoteOK + LinkedIn (filtered to remote) covers 80%+ of US/UK remote tech jobs. For India-remote, layer in CutShort, which has strong remote-startup coverage.

How much does a full jobs aggregator cost monthly?

Depends on scope. Pricing is pay-per-result with volume tiers, so cost scales with the number of records you pull and the per-record price drops at higher tiers. A daily 200-keyword run on LinkedIn + Indeed + Google Jobs surfaces tens of thousands of records per day; mid-market aggregators (50K daily records, mixed sources) stay well within typical data-pipeline budgets. Most pipelines are cost-bound by scope, not unit price.
