Build Comprehensive US Jobs Coverage with SimplyHired (2026)
Thirdwatch's SimplyHired Scraper at $0.008 per record is the fourth source in a comprehensive US jobs coverage architecture — paired with Indeed, Monster, and ZipRecruiter to cross 90% of unique US postings. This guide is the canonical recipe for achieving full US jobs coverage at $32/day across four boards, with coverage-gap detection and source-priority dedupe.
Why aim for comprehensive US jobs coverage
Coverage matters disproportionately for jobs aggregator products. According to the 2024 State of the Job Market report from Glassdoor Economic Research, US job-seekers visit a median 3.4 boards during an active search and abandon any meta-search that misses listings they find elsewhere. The retention threshold is harsh: a meta-search at 70% coverage loses to one at 90% even if the 90% interface is mediocre. Coverage is the dominant ranking factor for repeat use.
The job-to-be-done is structured. A general-purpose US jobs meta-search needs four sources for >90% coverage. A workforce-analytics SaaS embedded in HR products needs the same comprehensive feed. A salary-research platform needs cross-board sample sizes large enough that any single board's biases don't dominate. All of these reduce to four-source parallel ingestion, dedupe, gap-detection, and source-priority field merging. SimplyHired is the fourth source that closes the long-tail mid-market coverage gap that the bigger three leave open.
How does this compare to the alternatives?
Three options for full US jobs coverage:
| Approach | Cost per 4K jobs × daily × 4 sources | Reliability | Setup time | Maintenance |
|---|---|---|---|---|
| Curated jobs-feed licence (LinkedIn Jobs API, Indeed Hiring Insights) | $50K–$300K/year flat | High | Months | Vendor lock-in, API rate limits |
| Per-source DIY scrapers | Free compute, weeks of dev | Brittle without humanize tuning | 12–20 weeks | You own four stealth layers |
| Thirdwatch 4-actor combo | $32/day at FREE = $960/month | Production-tested across all four | One day | Thirdwatch maintains all four |
The DIY estimate reflects the cumulative time to stabilize Camoufox + DataDome + Cloudflare Turnstile bypass across four boards. The SimplyHired Scraper actor page bundles the fourth board's stealth layer with the same canonical schema as the other three — making the integration a same-day project.
How to achieve full US jobs coverage in 4 steps
Step 1: How do I authenticate against Apify?
Sign in at apify.com (free tier, no credit card), open Settings → Integrations, and copy your personal API token:
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
Step 2: How do I run all four US sources in parallel?
Spawn one async run per source × metro × major role-family.
import os, requests, time, json, pathlib
TOKEN = os.environ["APIFY_TOKEN"]
ACTORS = {
"indeed": "thirdwatch~indeed-jobs-scraper",
"monster": "thirdwatch~monster-jobs-scraper",
"ziprecruiter": "thirdwatch~ziprecruiter-scraper",
"simplyhired": "thirdwatch~simplyhired-jobs-scraper",
}
QUERIES = ["software engineer", "registered nurse",
"accountant", "sales representative", "truck driver"]
METROS = ["New York, NY", "Los Angeles, CA", "Chicago, IL",
"Houston, TX", "Phoenix, AZ", "Atlanta, GA", "Boston, MA"]
run_ids = []
for source, actor in ACTORS.items():
for metro in METROS:
r = requests.post(
f"https://api.apify.com/v2/acts/{actor}/runs",
params={"token": TOKEN},
json={"queries": QUERIES, "location": metro, "maxResults": 200},
)
run_ids.append((source, metro, r.json()["data"]["id"]))
time.sleep(0.5)
results = []
for source, metro, run_id in run_ids:
while True:
s = requests.get(f"https://api.apify.com/v2/actor-runs/{run_id}",
params={"token": TOKEN}).json()["data"]["status"]
if s in ("SUCCEEDED", "FAILED", "ABORTED"):
break
time.sleep(20)
if s == "SUCCEEDED":
items = requests.get(
f"https://api.apify.com/v2/actor-runs/{run_id}/dataset/items",
params={"token": TOKEN}).json()
for j in items:
j["source"] = source
j["metro"] = metro
results.extend(items)
print(f"Total raw jobs: {len(results)}")
4 sources × 7 metros × 5 queries × ~150 jobs = roughly 21,000 raw jobs/day. After dedupe expect 12,000-14,000 unique. Cost: $168 at FREE pricing or $84 at GOLD.
Step 3: How do I dedupe across all four sources with source priority?
Build the canonical 4-tuple key. When a job appears across boards, prefer the source with the most complete data per field.
import pandas as pd, re
def normalise(s):
return re.sub(r"\W+", " ", (s or "").lower()).strip()
df = pd.DataFrame(results)
df["dedupe_key"] = (
df.title.fillna("").apply(normalise) + "|"
+ df.company.fillna("").apply(normalise) + "|"
+ df.location.fillna("").apply(normalise) + "|"
+ df.salary_min.fillna(-1).astype(int).astype(str)
)
# Salary fill rate priority: ZipRecruiter > Monster > SimplyHired > Indeed
SOURCE_PRIORITY = {"ziprecruiter": 0, "monster": 1, "simplyhired": 2, "indeed": 3}
df["priority"] = df.source.map(SOURCE_PRIORITY)
unique = (df.sort_values(["dedupe_key", "priority"])
.drop_duplicates(subset=["dedupe_key"], keep="first"))
print(f"Deduped: {len(df)} → {len(unique)} unique ({len(unique)/len(df):.0%})")
Expect 35-45% overlap to collapse to 55-65% unique listings.
Step 4: How do I detect and report coverage gaps?
Compute per-source contribution and flag redundancies or anomalies.
contribution = (
unique.groupby("source")
.agg(unique_jobs=("dedupe_key", "count"))
.sort_values("unique_jobs", ascending=False)
)
contribution["pct_of_total"] = contribution.unique_jobs / contribution.unique_jobs.sum()
print(contribution)
# Per-pair overlap
for src_a, src_b in [("indeed", "monster"), ("indeed", "ziprecruiter"),
("indeed", "simplyhired"), ("monster", "ziprecruiter"),
("monster", "simplyhired"), ("ziprecruiter", "simplyhired")]:
a_keys = set(df[df.source == src_a].dedupe_key)
b_keys = set(df[df.source == src_b].dedupe_key)
overlap_pct = len(a_keys & b_keys) / max(len(a_keys), 1)
print(f"{src_a} ↔ {src_b}: {overlap_pct:.0%} of {src_a} also on {src_b}")
A pair with >70% overlap suggests one source is mostly redundant. SimplyHired typically shares 50-65% with Indeed (which is expected since SimplyHired syndicates Indeed listings) and 25-40% with Monster and ZipRecruiter.
Sample output
A single SimplyHired record looks like this — same canonical schema as the other three sources, with source distinguishing origin.
{
"title": "Product Manager",
"company": "Salesforce",
"location": "San Francisco, CA",
"salary_text": "$130,000 - $175,000 a year",
"salary_min": 130000,
"salary_max": 175000,
"salary_currency": "USD",
"salary_period": "yearly",
"description": "We're seeking a Product Manager to lead our platform team and shape the roadmap for our developer-facing APIs...",
"posted_date": "2026-04-06",
"source": "simplyhired",
"url": "https://www.simplyhired.com/job/abc123"
}
A typical four-source contribution table for one daily snapshot looks like:
| Source | Unique jobs | % of total | Avg salary fill |
|---|---|---|---|
| Indeed | 5,840 | 41% | 42% |
| ZipRecruiter | 4,120 | 29% | 64% |
| Monster | 2,680 | 19% | 53% |
| SimplyHired | 1,560 | 11% | 51% |
SimplyHired contributing 11% incremental unique listings on top of the other three is the textbook fourth-source value — meaningful but with diminishing returns past this point.
Common pitfalls
Three things break full-coverage US jobs aggregators. Source-overlap masking — when SimplyHired syndicates Indeed listings, it sometimes returns slightly modified URLs that fail dedupe-by-URL but match on the 4-tuple key; trust the 4-tuple key over URL-only dedup. Salary-period mixing across sources — ZipRecruiter publishes more hourly bands than the other three; aggregating without filtering on salary_period produces nonsensical averages. Cost compounding — at $0.008 per job × 4 sources × daily, the unit economics need monetisation that scales with usage; meta-search products with affiliate revenue typically break even at low-thousands of monthly active users, but pure-data subscriptions need higher pricing to cover the four-source baseline.
Thirdwatch's four US-jobs scrapers all use the same Camoufox + humanize stealth-browser architecture and the same canonical schema. The deliberate symmetry is what makes the four-source aggregator a same-day integration rather than a multi-month project. The combined cost (~$960/month at FREE pricing for 14K-job daily ingestion across four boards) is two orders of magnitude below any commercial full-coverage feed licence.
Related use cases
- Scrape SimplyHired jobs for an aggregator
- Find blue-collar jobs with SimplyHired
- Track mid-market US hiring via SimplyHired
- The complete guide to scraping job boards
- All Thirdwatch use-case guides
Frequently asked questions
How much US jobs coverage do you actually need?
Depends on use case. A vertical-specific aggregator (only nursing, only software engineering) covers 70-80% of relevant postings with two sources. A general-purpose meta-search needs four sources to cross 90% coverage. Adding a fifth source (e.g. Glassdoor) past four typically adds <3% incremental unique listings — diminishing returns set in fast.
What does SimplyHired add beyond Indeed, Monster, and ZipRecruiter?
Two things specifically. (1) Long-tail mid-market employers (200-2000 employees, especially regional chains) under-indexed by tech-mainstream boards. (2) Older or smaller US partner-board syndication that doesn't show up on the bigger three. SimplyHired contributes roughly 5-10% incremental unique listings to a 3-board aggregator.
How do I detect coverage gaps in my multi-source dataset?
Three diagnostics: (1) plot daily new-unique-listing counts per source — sources adding consistently below 10% suggest you have a duplicate-source mix and one is redundant. (2) compute per-source overlap matrix; pairs above 70% overlap suggest one of the two could be dropped. (3) sample manually against a known-large employer's career page — if your aggregator is missing >20% of their open roles, you have a real gap not a sampling artifact.
Can SimplyHired serve as a backup when other boards have blocking issues?
Yes. SimplyHired aggregates from many partner boards including Indeed and Monster syndication, so when one of those direct scrapers has a temporary anti-bot regression, SimplyHired typically retains coverage of the same listings (with a 6-12 hour delay). For high-availability aggregator pipelines, treat SimplyHired as a tier-2 fallback when tier-1 sources go offline.
How fresh does the multi-source aggregator need to be?
Six-hourly across all sources is the practical sweet spot. Indeed and Monster are typically refreshed at six-hourly cadence in production aggregators; SimplyHired can run twelve-hourly because its source-syndication adds latency anyway. ZipRecruiter sits in between. The combined dataset captures roughly 90% of US postings within 12 hours of original publication.
What's the practical cost of full US coverage?
Four-source ingestion (Indeed + Monster + ZipRecruiter + SimplyHired) at 1,000 jobs/source/day at FREE pricing = $32/day or ~$960/month. At GOLD volume tiers, the same coverage runs ~$480/month. A general-purpose US jobs meta-search with full coverage breaks even on a few thousand monthly users at typical affiliate-revenue rates.
Run the SimplyHired Scraper on Apify Store — pay-per-job, free to try, no credit card to test.