Build a US Jobs Meta-Search from Monster (2026 Guide)
Thirdwatch's Monster Scraper at $0.008 per job is one of three foundation sources for a US jobs meta-search engine — combine with Indeed Scraper and ZipRecruiter Scraper to cover the bulk of US job postings in a single search interface. This guide is the canonical recipe for building a meta-search on top of Apify's three Camoufox-based jobs actors, with Postgres ingestion, Meilisearch faceted search, and source-priority dedupe.
Why build a US jobs meta-search
US job-search is fragmented across LinkedIn, Indeed, Monster, ZipRecruiter, Glassdoor, and a long tail of niche boards. According to Pew Research's 2024 survey on US job-seeking behaviour, the median US job-seeker visits 3-5 boards during an active search and consolidates results manually. A meta-search interface that returns deduped listings across boards captures real user value — and the unit economics work because Apify's stealth-browser architecture compresses the per-source data cost to under a cent per job.
The job-to-be-done is structured, and several builder profiles converge on it:
- A meta-search builder wants daily ingestion across three or four US sources, dedupe, and a fast search UX.
- A staffing agency wants its internal applicant-search interface to cover the same breadth as the public-facing meta-searches.
- A salary-research platform wants comprehensive cross-board coverage to compute robust median bands.
- A workforce-analytics SaaS targeting HR teams wants to embed cross-board listing search as a feature alongside its primary product.
- A US recruiting agency building a candidate-attraction landing page wants the meta-search as content marketing rather than as the primary product.

All of these reduce to the same pipeline: multi-source pull → dedupe → Postgres or search-engine ingestion. Monster at $0.008 sits in the middle of the cost band; combined with Indeed and ZipRecruiter it produces a complete US dataset that no single board provides on its own.
How does this compare to the alternatives?
Three options for building a US jobs meta-search data layer:
| Approach | Cost (1,000 jobs/day × 3 sources) | Reliability | Setup time | Maintenance |
|---|---|---|---|---|
| Per-source DIY scrapers | Free compute, weeks of dev time | Brittle without humanize tuning | 8–16 weeks | You own three stealth layers |
| Indeed Hiring Insights API + paid feeds | $30K–$200K/year flat | High | Weeks–months | Vendor lock-in |
| Thirdwatch Monster + Indeed + ZipRecruiter | $24/day on the FREE tier ($720/month) | Production-tested across all three | Half a day | Thirdwatch maintains all three |
The DIY estimate reflects what most teams burn before all three boards are stable; Camoufox plus a DataDome bypass is a real engineering project per board. The Monster Scraper actor page and the Indeed and ZipRecruiter pages all use the same canonical schema, which collapses the meta-search build to half a day of integration work plus the search/UI layer on top.
How to build a US jobs meta-search in 4 steps
Step 1: How do I authenticate against Apify?
Sign in at apify.com (free tier, no credit card), open Settings → Integrations, and copy your personal API token:
```bash
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
```
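Before kicking off runs, it can be worth a quick sanity check that the token works. A minimal sketch using only the standard library, assuming the Apify `GET /v2/users/me` endpoint, which returns the account tied to the token:

```python
import json
import os
import urllib.request

def verify_token(token: str) -> str:
    """Return the Apify username the token belongs to, or raise on a bad token."""
    url = f"https://api.apify.com/v2/users/me?token={token}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]["username"]

# Only hits the network when a token is actually set in the environment.
if os.environ.get("APIFY_TOKEN"):
    print(verify_token(os.environ["APIFY_TOKEN"]))
```

A 401 response here means the token was copied wrong; catching that before spawning fifteen parallel runs saves a debugging round-trip.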
Step 2: How do I pull from all three US sources in parallel?
Spawn one async run per source × metro. The three Thirdwatch scrapers all return the same canonical schema.
```python
import os, requests, time

TOKEN = os.environ["APIFY_TOKEN"]

ACTORS = {
    "monster": "thirdwatch~monster-jobs-scraper",
    "indeed": "thirdwatch~indeed-jobs-scraper",
    "ziprecruiter": "thirdwatch~ziprecruiter-scraper",
}
QUERIES = ["software engineer", "registered nurse", "accountant"]
METROS = ["New York, NY", "Los Angeles, CA", "Chicago, IL",
          "Houston, TX", "Phoenix, AZ"]

# Kick off one run per source x metro; all three actors accept the same input schema.
run_ids = []
for source, actor in ACTORS.items():
    for metro in METROS:
        r = requests.post(
            f"https://api.apify.com/v2/acts/{actor}/runs",
            params={"token": TOKEN},
            json={"queries": QUERIES, "location": metro, "maxResults": 200},
        )
        run_ids.append((source, metro, r.json()["data"]["id"]))
        time.sleep(0.5)  # stay well under API rate limits

# Poll each run to completion, then fetch its dataset.
results = []
for source, metro, run_id in run_ids:
    while True:
        status = requests.get(
            f"https://api.apify.com/v2/actor-runs/{run_id}",
            params={"token": TOKEN}).json()["data"]["status"]
        if status in ("SUCCEEDED", "FAILED", "ABORTED"):
            break
        time.sleep(20)
    if status == "SUCCEEDED":
        items = requests.get(
            f"https://api.apify.com/v2/actor-runs/{run_id}/dataset/items",
            params={"token": TOKEN}).json()
        for job in items:
            job["source"] = source  # tag origin board for downstream dedupe and UI
            job["metro"] = metro
        results.extend(items)

print(f"Total raw jobs: {len(results)}")
```
3 sources × 5 metros × 200 jobs = 3,000 raw listings, completing in 25-40 minutes of wall-clock time with parallel runs, at a cost of 3,000 × $0.008 = $24.
Step 3: How do I dedupe with source-priority preference?
Build the canonical 4-tuple key. When the same job appears across boards, prefer the source with the most complete data.
```python
import pandas as pd, re

def normalise(s):
    return re.sub(r"\W+", " ", (s or "").lower()).strip()

df = pd.DataFrame(results)
df["dedupe_key"] = (
    df.title.fillna("").apply(normalise) + "|"
    + df.company.fillna("").apply(normalise) + "|"
    + df.location.fillna("").apply(normalise) + "|"
    + df.salary_min.fillna(-1).astype(int).astype(str)
)

# Salary fill rate: ZipRecruiter > Monster > Indeed
SOURCE_PRIORITY = {"ziprecruiter": 0, "monster": 1, "indeed": 2}
df["priority"] = df.source.map(SOURCE_PRIORITY)

unique = (df.sort_values(["dedupe_key", "priority"])
            .drop_duplicates(subset=["dedupe_key"], keep="first")
            .drop(columns=["priority"]))
print(f"Deduped: {len(df)} → {len(unique)} unique ({len(unique)/len(df):.0%})")
```
Expect 35-45% cross-board overlap, so roughly 55-65% of the raw rows survive as unique listings.
Step 4: How do I serve fast faceted search via Meilisearch?
Push the deduped dataset to Meilisearch with source, salary, and location facets.
```python
import hashlib, os
import meilisearch

client = meilisearch.Client("http://meilisearch:7700", os.environ["MEILI_KEY"])
index = client.index("us_jobs")
index.update_settings({
    "filterableAttributes": ["source", "salary_min", "location", "salary_period"],
    "sortableAttributes": ["salary_max", "posted_date"],
    "searchableAttributes": ["title", "company", "description"],
})

docs = unique.to_dict("records")
for d in docs:
    # Meilisearch document ids only allow alphanumerics, hyphens, and underscores,
    # so hash the raw dedupe_key (which contains spaces and pipes).
    d["id"] = hashlib.md5(d["dedupe_key"].encode()).hexdigest()
index.add_documents(docs, primary_key="id")
print(f"Indexed {len(docs)} jobs in Meilisearch")
```
Pair with a Next.js or Astro frontend; users get sub-100ms typo-tolerant search across thousands of fresh deduped US listings updated daily.
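With the index populated, the frontend combines a text query with filter expressions over the facets configured above. A minimal sketch of a helper that builds the Meilisearch search parameters (the specific filter values are illustrative assumptions about your data); pass the returned dict to `index.search()`:

```python
def faceted_params(source=None, min_salary=None, period="yearly", limit=20):
    """Build a Meilisearch search-options dict with facet filters and salary sort."""
    # Always pin salary_period so hourly and yearly bands never mix in one result set.
    clauses = [f'salary_period = "{period}"']
    if source:
        clauses.append(f'source = "{source}"')
    if min_salary:
        clauses.append(f"salary_min >= {min_salary}")
    return {
        "filter": " AND ".join(clauses),
        "sort": ["salary_max:desc"],
        "limit": limit,
    }

params = faceted_params(source="ziprecruiter", min_salary=70000)
print(params["filter"])
# usage against a live index: index.search("icu nurse", params)
```

Keeping filter construction in one helper also gives you a single place to escape or validate user-supplied facet values.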
Sample output
A single deduped record looks like this — same canonical schema across all three sources, with source distinguishing origin.
```json
{
  "title": "Registered Nurse - ICU",
  "company": "Beth Israel Deaconess Medical Center",
  "location": "Boston, MA",
  "salary_text": "$80,000 - $115,000",
  "salary_min": 80000,
  "salary_max": 115000,
  "salary_currency": "USD",
  "salary_period": "yearly",
  "description": "Beth Israel Deaconess seeks an experienced ICU Registered Nurse...",
  "posted_date": "2026-04-21",
  "source": "monster",
  "url": "https://www.monster.com/job-openings/registered-nurse-icu-boston-ma"
}
```
source lets the meta-search UI offer per-board filtering. dedupe_key (computed downstream, not stored on the record) is the natural key for upserts. salary_min and salary_max are normalised to integer USD across all three sources.
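Using dedupe_key as the natural key for upserts can be sketched as follows. The table name `jobs`, the column list, and the `%(name)s` placeholder style (psycopg2's paramstyle) are assumptions; adjust to your schema and driver:

```python
# Upsert keyed on dedupe_key: re-ingesting the same listing updates salary and
# freshness fields instead of inserting a duplicate row.
UPSERT_SQL = """
INSERT INTO jobs (dedupe_key, title, company, location, salary_min, salary_max,
                  salary_period, source, url, posted_date)
VALUES (%(dedupe_key)s, %(title)s, %(company)s, %(location)s, %(salary_min)s,
        %(salary_max)s, %(salary_period)s, %(source)s, %(url)s, %(posted_date)s)
ON CONFLICT (dedupe_key) DO UPDATE SET
    salary_min  = EXCLUDED.salary_min,
    salary_max  = EXCLUDED.salary_max,
    posted_date = EXCLUDED.posted_date,
    source      = EXCLUDED.source;
"""
# usage with psycopg2:
#   cur.executemany(UPSERT_SQL, unique.to_dict("records"))
```

This requires a unique constraint on `jobs.dedupe_key`; `ON CONFLICT` only fires against a declared constraint or unique index.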
Common pitfalls
Three things break US jobs meta-searches on multi-source data:
- Salary-period mixing across sources. ZipRecruiter publishes more hourly bands than Monster or Indeed. Dedupe keys including salary_min will treat hourly $30 and yearly $30 as different (correctly), but downstream salary-band filters need to filter on salary_period first to avoid mixing the two.
- Posted-date drift. Indeed often returns relative dates ("1 day ago"); for chronological sorting, use your ingestion timestamp rather than posted_date for cross-source consistency.
- Missing source attribution. Most users want to know which board a listing came from before clicking through; surface source in the UI rather than hiding it behind a unified URL, otherwise users don't trust the meta-search.
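The posted-date pitfall can also be softened with a best-effort normaliser before falling back to the ingestion timestamp. A sketch, assuming only two raw formats ("N days ago" and ISO dates); anything else gets the ingestion date:

```python
import re
from datetime import date, timedelta

def normalise_posted(raw, ingested_on=None):
    """Convert relative posted dates to ISO; unknown formats fall back to ingestion date."""
    ingested_on = ingested_on or date.today()
    m = re.match(r"(\d+)\s+day", raw or "")
    if m:  # e.g. "1 day ago", "3 days ago"
        return (ingested_on - timedelta(days=int(m.group(1)))).isoformat()
    if re.match(r"\d{4}-\d{2}-\d{2}$", raw or ""):  # already ISO
        return raw
    return ingested_on.isoformat()  # "Just posted", "30+ days ago", None, ...

print(normalise_posted("1 day ago", date(2026, 4, 22)))  # → 2026-04-21
```

Note that "30+ days ago" deliberately falls through to the ingestion date, since the real posting date is unrecoverable from that string.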
Thirdwatch's three US-jobs scrapers all use the same Camoufox + humanize architecture and the same canonical schema, a deliberate symmetry that makes a meta-search straightforward to build. The combined cost (~$720/month at FREE pricing for 3K-job daily ingestion across three boards) sits well below any commercial meta-search-data subscription. On source attribution, the retention angle is worth stressing: meta-search products that hide the underlying board usually fare worse on long-term retention than ones that surface it clearly, because users want to decide which platform to apply through and which board to bookmark for future searches.
Related use cases
- Scrape Monster jobs for a recruiter pipeline
- Track US job market with Monster data
- Monster vs. Indeed vs. ZipRecruiter — data coverage
- The complete guide to scraping job boards
- All Thirdwatch use-case guides
Frequently asked questions
How much does it cost to run a US jobs meta-search?
Pulling 1,000 jobs/day per source across three sources at FREE pricing = $24/day or ~$720/month. At GOLD volume tiers ($0.004/job each), the same coverage runs $360/month. Most meta-search products monetize via affiliate commissions, ATS partnerships, or premium recruiter tools and break even at this cost basis.
Which sources should I include for full US coverage?
Three core US sources: Indeed (volume + tech-mainstream), Monster (mid-market + healthcare/manufacturing), ZipRecruiter (hourly + retail + gig). Optional fourth: SimplyHired (long-tail mid-market) for bonus coverage. The three core boards capture roughly 80% of unique US postings any given week.
How fresh does a meta-search need to be?
Six-hourly is the sweet spot. Hourly is overkill: most US job postings stay live 14-30 days, so even a daily diff loses little. Six-hourly still catches early-morning recruiter postings within hours, which gives users a meaningful freshness advantage over Indeed-only or LinkedIn-only competitors.
What database and search layer should I use?
For under 100K active listings, Postgres with full-text GIN handles search at sub-100ms. Past 100K, push to Meilisearch or Typesense for typo-tolerant faceted search. Both run on $20-$40/month VMs at this scale.
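The sub-100K Postgres option can be sketched with a generated `tsvector` column and a GIN index over it (table and column names are assumptions matching the sample schema; SQL shown as Python strings for use with any driver):

```python
# Generated tsvector column (Postgres 12+) keeps the search vector in sync
# automatically on insert/update; the GIN index makes @@ lookups fast.
DDL = """
ALTER TABLE jobs ADD COLUMN IF NOT EXISTS fts tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english',
            coalesce(title, '') || ' ' ||
            coalesce(company, '') || ' ' ||
            coalesce(description, ''))
    ) STORED;
CREATE INDEX IF NOT EXISTS jobs_fts_idx ON jobs USING GIN (fts);
"""

# websearch_to_tsquery accepts Google-style syntax ("icu nurse -travel").
QUERY = """
SELECT title, company, location
FROM jobs
WHERE fts @@ websearch_to_tsquery('english', %s)
LIMIT 20;
"""
```

Once listing count or typo tolerance becomes the bottleneck, this column can simply be dropped when migrating to Meilisearch or Typesense.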
How do I dedupe across sources?
Build a 4-tuple key on (title-normalised, company-normalised, location-normalised, salary_min). URLs can't be the key, because cross-source URLs differ even for the same role. The 4-tuple catches 85-90% of cross-source duplicates; the remaining 10-15% of near-matches are usually distinct legitimate listings.
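The key construction from Step 3, isolated for a single record pair (the two sample listings are invented for illustration):

```python
import re

def normalise(s):
    """Lowercase, collapse all non-word runs to single spaces, trim."""
    return re.sub(r"\W+", " ", (s or "").lower()).strip()

def dedupe_key(job):
    return "|".join([
        normalise(job.get("title")),
        normalise(job.get("company")),
        normalise(job.get("location")),
        str(int(job.get("salary_min") or -1)),
    ])

# Same role listed on two boards with cosmetic differences:
monster_row = {"title": "Registered Nurse - ICU", "company": "Acme Health Inc.",
               "location": "Boston, MA", "salary_min": 80000}
zip_row = {"title": "Registered Nurse ICU", "company": "acme health inc",
           "location": "Boston, MA", "salary_min": 80000.0}
assert dedupe_key(monster_row) == dedupe_key(zip_row)
print(dedupe_key(monster_row))  # → registered nurse icu|acme health inc|boston ma|80000
```

Punctuation, casing, and float-vs-int salary differences all collapse, which is exactly what absorbs most cross-board formatting noise.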
Can I prioritise certain sources for specific fields?
Yes. ZipRecruiter has the highest salary fill-rate; Monster has the longest descriptions; Indeed has the freshest postings. Build a source-priority map per field and merge using priority during dedupe. This is the canonical pattern for high-quality multi-source aggregation.
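A sketch of the per-field priority merge, assuming the field-level orderings above (which are themselves heuristics, not guarantees); each dedupe group collapses to one record that takes each field from the best available source:

```python
import pandas as pd

# Per-field source preference; first source in each list wins when it has data.
FIELD_PRIORITY = {
    "salary_min": ["ziprecruiter", "monster", "indeed"],
    "description": ["monster", "indeed", "ziprecruiter"],
    "posted_date": ["indeed", "ziprecruiter", "monster"],
}

def merge_group(group: pd.DataFrame) -> dict:
    merged = group.iloc[0].to_dict()
    for field, order in FIELD_PRIORITY.items():
        rank = {s: i for i, s in enumerate(order)}
        best = (group.assign(_r=group.source.map(rank))
                     .sort_values("_r")[field]
                     .dropna())
        if len(best):
            merged[field] = best.iloc[0]
    return merged

df = pd.DataFrame([
    {"dedupe_key": "k", "source": "indeed", "salary_min": None,
     "description": "short", "posted_date": "2026-04-21"},
    {"dedupe_key": "k", "source": "ziprecruiter", "salary_min": 80000,
     "description": None, "posted_date": "2026-04-20"},
])
merged = [merge_group(g) for _, g in df.groupby("dedupe_key")]
print(merged[0]["salary_min"], merged[0]["posted_date"])
```

Here the merged record takes salary from ZipRecruiter and the fresher posted date from Indeed, rather than discarding one row wholesale as the keep-first dedupe in Step 3 does.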
Run the Monster Scraper on Apify Store — pay-per-job, free to try, no credit card to test.