Build a UK Jobs Aggregator from Reed (2026)
Build a UK jobs aggregator product from Reed.co.uk using Thirdwatch. Multi-source UK pipeline + recipes for job-board builders.

Thirdwatch's Reed Scraper makes UK jobs-aggregator development a structured workflow — multi-source UK pipeline foundation, dedup-validated job feeds, UK-specific filtering, niche-vertical-friendly. Built for UK aggregator builders, niche job-board operators, UK recruiting-platform startups, and UK HR-tech founders.
Why build a UK jobs aggregator from Reed
Reed is the canonical UK employer-direct job source. According to Reed Group's 2024 annual report, the platform indexes 250K+ active UK jobs with deep employer-direct relationships — material foundation for UK aggregator products. For UK aggregator builders + niche job-board operators, Reed provides the canonical UK starting feed.
The job-to-be-done is structured. A UK aggregator startup builds a multi-source job-feed combining Reed + Adzuna + LinkedIn + Indeed UK. A niche-vertical job-board (NHS-only, FinTech-only, GovTech) curates from Reed's deep coverage. A UK HR-tech founder powers customer-facing job-search tools with Reed data. A UK recruiting-platform startup ingests Reed feeds for SDR-pipeline tooling. All reduce to multi-source ingestion + dedup + UK-specific filtering.
How does this compare to the alternatives?
Three options for UK jobs-aggregator data:
| Approach | Cost per UK aggregator-tier daily | Reliability | Setup time | Maintenance |
|---|---|---|---|---|
| Indeed/LinkedIn Apply API | Partnership-required | Official | Weeks-months | Per-partnership tier |
| Manual UK board scraping | Free (manual), time-intensive | Slow | Hours/day | Daily manual work |
| Thirdwatch Reed Scraper | Pay per result | Lightweight HTTP path | 5 minutes | Thirdwatch tracks Reed |
The Reed Scraper actor page gives you raw real-time UK jobs-feed data at materially lower per-record cost.
How to build the aggregator in 4 steps
Step 1: Authenticate
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"Step 2: Daily multi-source UK aggregation
import os, requests, datetime, json, pathlib
REED = "thirdwatch~reed-jobs-scraper"
ADZUNA = "thirdwatch~adzuna-jobs-scraper"
LINKEDIN = "thirdwatch~linkedin-jobs-scraper"
TOKEN = os.environ["APIFY_TOKEN"]
UK_QUERIES = ["software developer", "data scientist", "marketing manager",
"accountant", "nurse", "teacher", "consultant", "engineer"]
# Aggregate from 3 sources
sources = {
"reed": REED,
"adzuna": ADZUNA,
"linkedin": LINKEDIN,
}
ts = datetime.datetime.utcnow().strftime("%Y%m%d")
for source_name, actor_id in sources.items():
resp = requests.post(
f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items",
params={"token": TOKEN},
json={"queries": UK_QUERIES, "country": "uk", "maxResults": 100},
timeout=3600,
)
pathlib.Path(f"snapshots/uk-agg-{source_name}-{ts}.json").write_text(json.dumps(resp.json()))
print(f"{source_name}: {len(resp.json())} listings")Step 3: Cross-source dedup + UK-specific filtering
import pandas as pd
from rapidfuzz import fuzz
reed = pd.DataFrame(json.loads(open(f"snapshots/uk-agg-reed-{ts}.json").read())).assign(source="reed")
adzuna = pd.DataFrame(json.loads(open(f"snapshots/uk-agg-adzuna-{ts}.json").read())).assign(source="adzuna")
linkedin = pd.DataFrame(json.loads(open(f"snapshots/uk-agg-linkedin-{ts}.json").read())).assign(source="linkedin")
combined = pd.concat([reed, adzuna, linkedin], ignore_index=True)
combined["company_norm"] = combined.company.str.lower().str.strip()
combined["title_norm"] = combined.title.str.lower().str.strip()
# Cross-source dedup: same company+title+location across sources = duplicate
combined["dedup_key"] = (
combined.company_norm + "|" +
combined.title_norm + "|" +
combined.location.str.lower().fillna("")
)
deduped = combined.drop_duplicates("dedup_key", keep="first")
print(f"Aggregated {len(combined)} → deduped {len(deduped)} ({len(combined)-len(deduped)} duplicates)")
# UK-specific filtering
deduped["region"] = deduped.location.apply(
lambda l: "London" if "london" in str(l).lower() else "Outside London"
)
deduped["salary_gbp"] = pd.to_numeric(deduped.salary_max, errors="coerce")
print(deduped.groupby("region").agg(
listings=("dedup_key", "count"),
median_salary=("salary_gbp", "median"),
))Step 4: Persist to aggregator database + Slack alerts
import psycopg2
with psycopg2.connect(...) as conn, conn.cursor() as cur:
for _, job in deduped.iterrows():
cur.execute(
"""INSERT INTO uk_jobs
(job_id, source, title, company, location, region, salary_min,
salary_max, salary_currency, contract_type, posted_at,
url, last_scraped)
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s, current_date)
ON CONFLICT (job_id) DO UPDATE SET
last_scraped = current_date""",
(job.job_id, job.source, job.title, job.company, job.location,
job.region, job.salary_min, job.salary_max, job.salary_currency,
job.contract_type, job.posted_at, job.url),
)
print(f"Persisted {len(deduped)} UK jobs to aggregator database")Sample output
{
"job_id": "12345678",
"title": "Senior Software Developer",
"company": "FinTech London Ltd",
"location": "London, EC2",
"salary_min": 70000,
"salary_max": 95000,
"salary_currency": "GBP",
"contract_type": "permanent",
"remote_friendly": true,
"category": "IT Jobs",
"posted_at": "2026-04-22"
}Common pitfalls
Three things go wrong in UK aggregator pipelines. Cross-source dedup fragility — slight title variations ("Senior Software Engineer" vs "Senior Software Developer") prevent dedup matching; for accurate aggregation, use fuzzy-matching with 85%+ similarity threshold rather than exact-match. Salary-currency confusion — Reed shows GBP-only; Adzuna mixes GBP+EUR for Ireland-adjacent roles; LinkedIn shows multi-currency for global postings. Normalize all to GBP via daily FX. Recruitment-agency duplicate-posting — same role posted by 3-5 agencies inflates apparent supply; for accurate inventory, dedupe on (company, title, location) regardless of source.
Thirdwatch's actor uses a lightweight HTTP path so you pay only for the data, not for proxy or compute overhead. Pair Reed with Adzuna Scraper for second-source UK + LinkedIn Jobs Scraper for corporate-direct triangulation. A fourth subtle issue worth flagging: UK-specific niche-verticals (NHS, GovTech, FinTech) require canonical-vertical mapping — Reed's category-tags differ from Adzuna's; for niche-aggregator products, build canonical-category mapping. A fifth pattern unique to UK aggregators: post-Brexit visa-sponsorship dynamics drive 30-40% of UK roles to filter for UK-resident-only; for accurate aggregator UX, surface visa-sponsorship-availability as a top-tier filter. A sixth and final pitfall: UK fiscal-year-start (April 1) drives 30-40% of annual UK hiring activity — for accurate aggregator-traffic projections, factor in fiscal-cycle seasonality.
Operational best practices for production pipelines
Tier the cadence: Tier 1 (active aggregator-feed, 6-hourly), Tier 2 (broader UK coverage, daily), Tier 3 (long-tail roles, weekly). 60-80% cost reduction with negligible signal loss when watchlist is properly tiered.
Snapshot raw payloads with gzip compression. Re-derive dedup + filtering metrics from raw JSON as your dedup-clustering algorithms evolve. Cross-snapshot diff alerts on per-company posting-velocity catch UK-specific hiring-cycle inflection points.
Schema validation. Daily validation suite asserting expected core fields with non-null rates above 80% (required) and 50% (optional). Reed schema occasionally changes during platform UI revisions — catch drift early. A seventh pattern at scale: cross-snapshot diff alerts for material UK posting-volume shifts (>15% week-over-week per source) catch labour-market inflection points before they appear in lagged ONS data. An eighth pattern for cost-controlled aggregators: implement an incremental-diff pipeline that only re-processes records whose hash changed since the previous snapshot. For watchlists where 90%+ of records are unchanged between snapshots, hash-comparison-driven incremental processing reduces downstream-compute by 80-90% while preserving full data fidelity.
A ninth pattern unique to research-grade data work: schema validation should run continuously, not just at pipeline build-time. Run a daily validation suite that asserts each scraper returns the expected core fields with non-null rates above 80% (for required fields) and 50% (for optional). Alert on schema breakage same-day so consumers don't degrade silently.
A tenth pattern around alert-fatigue management: tune alert thresholds quarterly based on actual aggregator-team-action rates. If teams ignore 80%+ of alerts at a given threshold, raise the threshold. If they manually surface signals the alerts missed, lower the threshold.
An eleventh and final pattern at production scale: cross-snapshot diff alerts. Beyond detecting individual changes, build alerts on cross-snapshot field-level diffs — name changes, category re-classifications, status changes. These structural changes precede or follow material events and are leading indicators of organization-level disruption. Persist a structured-diff log alongside aggregate snapshots: for each entity, persist (field, old_value, new_value) tuples per scrape. Surface high-leverage diffs to human reviewers; low-leverage diffs stay in the audit log.
A twelfth pattern: cost attribution per consumer. Tag every API call with a downstream-consumer identifier (team, product, feature) so you can attribute compute spend back to the workflow that drove it. When a downstream consumer's spend exceeds projected budget, you can have a precise conversation with them about the queries driving cost.
Related use cases
Frequently asked questions
Why build a UK jobs aggregator from Reed?
Reed.co.uk is the UK's longest-running employer-direct job board (founded 1995) with 250K+ active UK listings + deepest employer-direct relationships. According to Reed Group's 2024 report, the platform powers many UK aggregators + niche-vertical job-boards. For UK aggregator builders, niche job-board operators, and UK recruiting-platform startups, Reed provides the canonical foundation feed.
What features make a competitive UK jobs aggregator?
Five features: (1) multi-source aggregation (Reed + Adzuna + LinkedIn + Indeed UK); (2) UK-specific filtering (London weighting, tech-corridor, public-sector); (3) salary normalization (annual/hourly/daily mix); (4) remote-work flagging (post-2024 UK); (5) niche-vertical curation (FinTech, NHS, Public Sector). Combined features differentiate from generic aggregators like Indeed UK + Glassdoor.
How fresh do aggregator data feeds need to be?
6-hourly cadence catches new listings within 6 hours — critical for time-sensitive aggregators. Daily cadence sufficient for weekly-curation models. UK tech roles post 14-28 days before fill; daily cadence captures most demand-cycle activity. For premium-tier aggregators, hourly cadence catches breaking job-postings within 1 hour.
How do I dedupe UK jobs across sources?
Cross-source dedup: cluster on (company, title, location, posted_within_3_days). Same role posted on Reed + LinkedIn + Indeed shows ~70-80% duplicate-rate. For accurate aggregator, dedupe via fuzzy-match (token_set_ratio >= 85%) on title + company + location. Maintain canonical company-name mapping to handle 'Acme Ltd' vs 'Acme Limited'.
Can I monetize a UK jobs aggregator legally?
Yes. UK job-data is publicly accessible. Successful UK aggregators ([Adzuna](https://www.adzuna.co.uk/), [Totaljobs](https://www.totaljobs.com/)) reference Reed + Indeed + LinkedIn extensively. For commercial products: (1) attribute Reed as data source; (2) link to Reed job-detail pages for full descriptions; (3) layer your own value-add (better filtering, AI-resume-matching, Slack alerts); (4) monetize via employer-side premium-listings or recruiter-side subscriptions.
How does this compare to Indeed Apply API + LinkedIn Jobs Search API?
[Indeed Apply API](https://www.indeed.com/employers/products/apply): requires partnership negotiation. [LinkedIn Jobs Search API](https://docs.microsoft.com/en-us/linkedin/talent/job-search-api): requires LinkedIn Talent Solutions agreement. Reed: scraping is the practical access path for UK-specific aggregator builders. The actor delivers raw real-time Reed data on pay-per-result pricing without partnership-negotiation overhead.
Related
100 free credits, no credit card.
About 30 real searches. Add the MCP to Claude or Cursor in two minutes.