
Build a UK Jobs Aggregator from Reed (2026)

Build a UK jobs aggregator product from Reed.co.uk using Thirdwatch: a multi-source UK pipeline and recipes for job-board builders.

Apr 28, 2026 · 5 min read · 1,225 words


Thirdwatch's Reed Scraper turns UK jobs-aggregator development into a structured workflow: a foundation for a multi-source UK pipeline, deduplicated job feeds, UK-specific filtering, and support for niche verticals. It is built for UK aggregator builders, niche job-board operators, UK recruiting-platform startups, and UK HR-tech founders.


Why build a UK jobs aggregator from Reed

Reed is the canonical UK employer-direct job source. According to Reed Group's 2024 annual report, the platform indexes 250K+ active UK jobs and has deep employer-direct relationships, which makes it a solid foundation for UK aggregator products. For UK aggregator builders and niche job-board operators, Reed is the natural starting feed.

The job to be done is well defined. A UK aggregator startup builds a multi-source job feed combining Reed, Adzuna, LinkedIn, and Indeed UK. A niche-vertical job board (NHS-only, FinTech-only, GovTech) curates from Reed's deep coverage. A UK HR-tech founder powers customer-facing job-search tools with Reed data. A UK recruiting-platform startup ingests Reed feeds for SDR-pipeline tooling. All of these reduce to multi-source ingestion, deduplication, and UK-specific filtering.

How does this compare to the alternatives?

Three options for UK jobs-aggregator data:

Approach	Cost at daily aggregator volume	Reliability	Setup time	Maintenance
Indeed/LinkedIn Apply API	Partnership-required	Official	Weeks-months	Per-partnership tier
Manual UK board scraping	Free (manual), time-intensive	Slow	Hours/day	Daily manual work
Thirdwatch Reed Scraper	Pay per result	Lightweight HTTP path	5 minutes	Thirdwatch tracks Reed

The Reed Scraper actor delivers raw, real-time UK job-feed data on pay-per-result pricing, without the partnership negotiation the official APIs require or the daily effort of manual collection.

How to build the aggregator in 4 steps

Step 1: Authenticate

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: Daily multi-source UK aggregation

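import os, json, pathlib, datetime
import requests

REED = "thirdwatch~reed-jobs-scraper"
ADZUNA = "thirdwatch~adzuna-jobs-scraper"
LINKEDIN = "thirdwatch~linkedin-jobs-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

UK_QUERIES = ["software developer", "data scientist", "marketing manager",
              "accountant", "nurse", "teacher", "consultant", "engineer"]

# Aggregate from 3 sources (Apify actor IDs use the "username~actor-name" form)
sources = {
    "reed": REED,
    "adzuna": ADZUNA,
    "linkedin": LINKEDIN,
}

snapshot_dir = pathlib.Path("snapshots")
snapshot_dir.mkdir(exist_ok=True)  # write_text() fails if the folder is missing

# Timezone-aware UTC date stamp (datetime.utcnow() is deprecated since Python 3.12)
ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d")
for source_name, actor_id in sources.items():
    # Start the actor, wait for it to finish, and return its dataset items.
    # The input fields follow this guide; check each actor's input schema.
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items",
        params={"token": TOKEN},
        json={"queries": UK_QUERIES, "country": "uk", "maxResults": 100},
        timeout=3600,
    )
    resp.raise_for_status()  # stop on auth errors or failed runs
    items = resp.json()
    (snapshot_dir / f"uk-agg-{source_name}-{ts}.json").write_text(json.dumps(items))
    print(f"{source_name}: {len(items)} listings")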

Step 3: Cross-source dedup + UK-specific filtering

import json, pathlib
import pandas as pd

# ts comes from Step 2, so this step reads the same day's snapshots
def load_snapshot(source_name):
    path = pathlib.Path(f"snapshots/uk-agg-{source_name}-{ts}.json")
    return pd.DataFrame(json.loads(path.read_text())).assign(source=source_name)

def norm(col):
    # fillna before astype(str) so missing values become "" rather than "nan";
    # a NaN left in the key would make drop_duplicates merge unrelated rows
    return col.fillna("").astype(str).str.lower().str.strip()

# Reed is concatenated first, so keep="first" below keeps the Reed copy of a duplicate
combined = pd.concat(
    [load_snapshot(s) for s in ("reed", "adzuna", "linkedin")], ignore_index=True
)
combined["company_norm"] = norm(combined.company)
combined["title_norm"] = norm(combined.title)

# Cross-source dedup: same company+title+location across sources = duplicate
combined["dedup_key"] = (
    combined.company_norm + "|" +
    combined.title_norm + "|" +
    norm(combined.location)
)
deduped = combined.drop_duplicates("dedup_key", keep="first").copy()
print(f"Aggregated {len(combined)} → deduped {len(deduped)} ({len(combined)-len(deduped)} duplicates)")

# UK-specific filtering
deduped["region"] = deduped.location.apply(
    lambda loc: "London" if "london" in str(loc).lower() else "Outside London"
)
# Count a salary only when it is already in GBP; other currencies need FX first
deduped["salary_gbp"] = pd.to_numeric(deduped.salary_max, errors="coerce").where(
    deduped.salary_currency.eq("GBP")
)
print(deduped.groupby("region").agg(
    listings=("dedup_key", "count"),
    median_salary=("salary_gbp", "median"),
))

Step 4: Persist to aggregator database + Slack alerts

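import psycopg2

# NaN/NaT -> None so psycopg2 writes SQL NULL instead of a float 'NaN'
rows = deduped.astype(object).where(deduped.notna(), None)

with psycopg2.connect(...) as conn, conn.cursor() as cur:  # supply your connection details
    for _, job in rows.iterrows():
        # ON CONFLICT (source, job_id) assumes a UNIQUE constraint on that pair:
        # job IDs are only unique within one job board
        cur.execute(
            """INSERT INTO uk_jobs
                  (job_id, source, title, company, location, region, salary_min,
                   salary_max, salary_currency, contract_type, posted_at,
                   url, last_scraped)
               VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s, current_date)
               ON CONFLICT (source, job_id) DO UPDATE SET
                 last_scraped = current_date""",
            (job.job_id, job.source, job.title, job.company, job.location,
             job.region, job.salary_min, job.salary_max, job.salary_currency,
             job.contract_type, job.posted_at, job.url),
        )
# Leaving the "with" block commits the transaction but does not close the connection
conn.close()

print(f"Persisted {len(deduped)} UK jobs to aggregator database")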
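To close the loop on alerts, a run summary can be posted to Slack. A minimal sketch using a Slack incoming webhook, which accepts a JSON body with a `text` field; the `SLACK_WEBHOOK_URL` environment variable, the helper names, and the counts in the example are all hypothetical:

```python
import os

import requests


def build_alert(raw_counts_by_source, total_after_dedup):
    """One-line run summary; sources are sorted so the message is stable."""
    parts = ", ".join(f"{src}: {n}" for src, n in sorted(raw_counts_by_source.items()))
    return f"UK aggregator run: {total_after_dedup} listings after dedup (raw: {parts})"


def post_to_slack(text, webhook_url=None):
    """Send text through a Slack incoming webhook, which takes {"text": ...}."""
    url = webhook_url or os.environ["SLACK_WEBHOOK_URL"]  # hypothetical variable name
    resp = requests.post(url, json={"text": text}, timeout=10)
    resp.raise_for_status()


msg = build_alert({"reed": 640, "adzuna": 512, "linkedin": 300}, total_after_dedup=1100)
print(msg)
# post_to_slack(msg)  # enable once the webhook URL is configured
```

With Step 3's frames, `build_alert(combined.source.value_counts().to_dict(), len(deduped))` builds the same summary from real counts.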

Sample output

{
  "job_id": "12345678",
  "title": "Senior Software Developer",
  "company": "FinTech London Ltd",
  "location": "London, EC2",
  "salary_min": 70000,
  "salary_max": 95000,
  "salary_currency": "GBP",
  "contract_type": "permanent",
  "remote_friendly": true,
  "category": "IT Jobs",
  "posted_at": "2026-04-22"
}

Common pitfalls

Several things go wrong in UK aggregator pipelines.

Cross-source dedup fragility. Slight title variations ("Senior Software Engineer" vs "Senior Software Developer") defeat exact-match dedup. For accurate aggregation, use fuzzy matching with a similarity threshold of 85%+ rather than exact matching.

Salary-currency confusion. Reed shows GBP only; Adzuna mixes GBP and EUR for Ireland-adjacent roles; LinkedIn shows multiple currencies for global postings. Normalize everything to GBP using daily FX rates.

Recruitment-agency duplicate posting. The same role posted by 3-5 agencies inflates apparent supply. For an accurate inventory, dedupe on (company, title, location) regardless of source.
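A minimal sketch of the GBP normalization step, assuming each listing carries `salary_max` and `salary_currency` as in the sample output. The rates are illustrative placeholders, not real quotes; in production you would load the day's rates from an FX provider of your choice:

```python
import pandas as pd

# Illustrative placeholders: GBP value of one unit of each currency.
# Replace with the day's rates from your FX source before using real data.
GBP_PER_UNIT = {"GBP": 1.0, "EUR": 0.85, "USD": 0.75}


def to_gbp(df, amount_col="salary_max", currency_col="salary_currency", rates=None):
    """Convert amount_col to GBP; unparseable amounts or unknown currencies -> NaN."""
    rates = GBP_PER_UNIT if rates is None else rates
    amounts = pd.to_numeric(df[amount_col], errors="coerce")
    return amounts * df[currency_col].map(rates)  # map() gives NaN for missing keys


jobs = pd.DataFrame({
    "salary_max": [95000, 60000, "n/a", 80000],
    "salary_currency": ["GBP", "EUR", "GBP", "CHF"],
})
jobs["salary_max_gbp"] = to_gbp(jobs)
print(jobs)
```

Leaving unknown currencies as NaN, rather than silently treating them as GBP, keeps them out of salary medians until a rate is added.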

Niche-vertical category mapping. UK niche verticals (NHS, GovTech, FinTech) need a canonical category map, because Reed's category tags differ from Adzuna's. For niche-aggregator products, build that mapping explicitly.

Visa sponsorship. Post-Brexit visa-sponsorship dynamics lead 30-40% of UK roles to filter for UK-resident candidates only. For accurate aggregator UX, surface visa-sponsorship availability as a top-tier filter.

Fiscal-year seasonality. The UK fiscal-year start (April 1) drives 30-40% of annual UK hiring activity, so factor fiscal-cycle seasonality into aggregator-traffic projections.

Thirdwatch's actor uses a lightweight HTTP path, so you pay for the data rather than for proxy or compute overhead. Pair Reed with the Adzuna Scraper as a second UK source and the LinkedIn Jobs Scraper for corporate-direct triangulation.
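A minimal sketch of a canonical category map, assuming you key it on (source, raw label). The canonical vertical names and most raw labels below are placeholders (only "IT Jobs" appears in the sample output); build the real table from the category values each source actually returns:

```python
# Placeholder table: (source, lowercased raw category label) -> canonical vertical
CANONICAL_CATEGORY = {
    ("reed", "it jobs"): "Technology",
    ("adzuna", "it jobs"): "Technology",
    ("reed", "health & medicine"): "Healthcare",
    ("adzuna", "healthcare & nursing jobs"): "Healthcare",
}


def canonical_category(source, raw_label, default="Other"):
    """Map a source-specific category label onto one canonical vertical."""
    key = (source.lower(), (raw_label or "").strip().lower())
    return CANONICAL_CATEGORY.get(key, default)


print(canonical_category("reed", "IT Jobs"))                      # → Technology
print(canonical_category("adzuna", "Healthcare & Nursing Jobs"))  # → Healthcare
print(canonical_category("linkedin", None))                       # → Other
```

Unmapped labels fall through to "Other", so counting those per day shows where the table needs new rows.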

Operational best practices for production pipelines

Tier the cadence: Tier 1 (active aggregator feed, every 6 hours), Tier 2 (broader UK coverage, daily), Tier 3 (long-tail roles, weekly). A properly tiered watchlist cuts cost by 60-80% with negligible signal loss.
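A minimal sketch of tier-aware scheduling, assuming you store each query's tier and last run time; `due_queries`, the watchlist, and the timestamps are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Re-scrape interval per tier, matching the cadence described above
TIER_INTERVALS = {
    1: timedelta(hours=6),   # active aggregator feed
    2: timedelta(days=1),    # broader UK coverage
    3: timedelta(weeks=1),   # long-tail roles
}


def due_queries(watchlist, last_run, now):
    """Return queries whose tier interval has elapsed.

    watchlist maps query -> tier; last_run maps query -> datetime of the last
    scrape. Queries that have never run are always due.
    """
    due = []
    for query, tier in watchlist.items():
        previous = last_run.get(query)
        if previous is None or now - previous >= TIER_INTERVALS[tier]:
            due.append(query)
    return due


now = datetime(2026, 4, 28, 12, 0, tzinfo=timezone.utc)
watchlist = {"software developer": 1, "accountant": 2, "museum curator": 3, "nurse": 2}
last_run = {
    "software developer": now - timedelta(hours=7),  # tier 1, 7h ago -> due
    "accountant": now - timedelta(hours=12),         # tier 2, 12h ago -> not due
    "museum curator": now - timedelta(days=8),       # tier 3, 8d ago -> due
}                                                    # "nurse" never ran -> due
print(due_queries(watchlist, last_run, now))
# → ['software developer', 'museum curator', 'nurse']
```

Running this check hourly and scraping only the returned queries is what turns the tier table into actual cost savings.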

Snapshot raw payloads with gzip compression, so you can re-derive dedup and filtering metrics from the raw JSON as your dedup-clustering algorithms evolve. Cross-snapshot diff alerts on per-company posting velocity catch UK-specific hiring-cycle inflection points.
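A minimal sketch of gzip snapshotting with the standard library, so raw payloads stay cheap to keep and can be re-read whenever the dedup logic changes; the helper names and the `.json.gz` naming are assumptions:

```python
import gzip
import json
import pathlib
import tempfile


def write_snapshot(items, path):
    """Write raw actor items as gzip-compressed JSON, creating parent folders."""
    path = pathlib.Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(items, fh)


def read_snapshot(path):
    """Load a gzip-compressed JSON snapshot back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Round trip in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "snapshots" / "uk-agg-reed-20260428.json.gz"
    write_snapshot([{"job_id": "12345678", "title": "Senior Software Developer"}], path)
    print(read_snapshot(path))
```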

Schema validation. Run a daily validation suite asserting the expected core fields, with non-null rates above 80% for required fields and 50% for optional ones. Reed's schema occasionally changes during platform UI revisions, so catch drift early.

Volume-shift alerts. At scale, cross-snapshot diff alerts on material UK posting-volume shifts (more than 15% week over week per source) catch labour-market inflection points before they appear in lagged ONS data.

Incremental processing. For cost-controlled aggregators, build an incremental-diff pipeline that re-processes only records whose hash changed since the previous snapshot. For watchlists where 90%+ of records are unchanged between snapshots, hash-driven incremental processing reduces downstream compute by 80-90% while preserving full data fidelity.
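A minimal sketch of hash-driven incremental processing, assuming each record has a stable `job_id` (in a multi-source table, key on `(source, job_id)` instead); the helper names and sample records are hypothetical:

```python
import hashlib
import json


def record_hash(record):
    """Content hash that ignores key order (sorted keys, compact separators)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(current, previous_hashes, key="job_id"):
    """Return (records that are new or changed, hash map for the next run)."""
    new_hashes, changed = {}, []
    for rec in current:
        digest = record_hash(rec)
        new_hashes[rec[key]] = digest
        if previous_hashes.get(rec[key]) != digest:
            changed.append(rec)
    return changed, new_hashes


yesterday = [{"job_id": "1", "title": "Nurse", "salary_max": 35000},
             {"job_id": "2", "title": "Teacher", "salary_max": 40000}]
_, hashes = changed_records(yesterday, {})

today = [{"job_id": "1", "title": "Nurse", "salary_max": 35000},       # unchanged
         {"job_id": "2", "title": "Teacher", "salary_max": 42000},     # salary changed
         {"job_id": "3", "title": "Accountant", "salary_max": 50000}]  # new
to_process, hashes = changed_records(today, hashes)
print([r["job_id"] for r in to_process])  # → ['2', '3']
```

Exclude volatile fields such as scrape timestamps before hashing, or every record will look changed on every run.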

Run schema validation continuously, not only at pipeline build time, and alert on schema breakage the same day so downstream consumers don't degrade silently.
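A minimal sketch of the daily non-null check with pandas, using the 80% / 50% floors from the text. The required and optional field lists are assumptions based on the sample output; adjust them per actor:

```python
import pandas as pd

# Assumed field lists; the floors (80% required, 50% optional) come from the text
REQUIRED = ["job_id", "title", "company", "location", "url"]
OPTIONAL = ["salary_min", "salary_max", "contract_type"]


def validate_schema(df, required=REQUIRED, optional=OPTIONAL,
                    required_min=0.80, optional_min=0.50):
    """Return human-readable failures; an empty list means the snapshot passed."""
    failures = []
    for fields, floor in ((required, required_min), (optional, optional_min)):
        for field in fields:
            if field not in df.columns:
                failures.append(f"{field}: missing column")
                continue
            rate = df[field].notna().mean() if len(df) else 0.0
            if rate < floor:
                failures.append(f"{field}: non-null rate {rate:.0%} < {floor:.0%}")
    return failures


snapshot = pd.DataFrame({
    "job_id": ["1", "2", "3", "4"],
    "title": ["Nurse", "Teacher", None, "Accountant"],
    "company": ["NHS Trust", "Academy", "Ltd", "LLP"],
    "location": ["Leeds", "London", "Bristol", "Leeds"],
    "url": ["u1", "u2", "u3", "u4"],
    "salary_min": [None, None, None, 30000],
    "salary_max": [None, 45000, 38000, 36000],
    "contract_type": ["permanent"] * 4,
})
print(validate_schema(snapshot))
# → ['title: non-null rate 75% < 80%', 'salary_min: non-null rate 25% < 50%']
```

Wiring a non-empty result into the same alert channel as the pipeline itself gives the same-day breakage signal described above.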

Alert-fatigue management. Tune alert thresholds quarterly based on how often the aggregator team actually acts on alerts. If the team ignores 80%+ of alerts at a given threshold, raise the threshold; if they manually surface signals the alerts missed, lower it.
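The tuning rule is simple enough to write down as a function. A minimal sketch, assuming thresholds are tracked in whole percentage points (for example the 15% week-over-week volume alert) and moved in 5-point steps; the step size and the helper name are hypothetical:

```python
def tune_threshold(threshold_pct, alerts_fired, alerts_actioned, missed_signals,
                   step_pct=5, ignore_ceiling=0.80):
    """Quarterly tuning rule, in whole percentage points.

    Raise the threshold when 80%+ of alerts went unactioned; otherwise lower it
    when the team surfaced signals the alerts missed.
    """
    ignored_rate = 1 - alerts_actioned / alerts_fired if alerts_fired else 0.0
    if ignored_rate >= ignore_ceiling:
        return threshold_pct + step_pct          # too noisy: alert less often
    if missed_signals > 0:
        return max(threshold_pct - step_pct, 0)  # too quiet: alert more often
    return threshold_pct


print(tune_threshold(15, alerts_fired=40, alerts_actioned=4, missed_signals=0))   # → 20
print(tune_threshold(15, alerts_fired=40, alerts_actioned=30, missed_signals=2))  # → 10
```

When both conditions hold, this sketch treats noise as the bigger problem and raises the threshold; reversing that priority is an equally defensible choice.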

Structured field-level diffs. Beyond detecting individual changes, build alerts on cross-snapshot field-level diffs: name changes, category re-classifications, status changes. These structural changes precede or follow material events and are leading indicators of organization-level disruption. Persist a structured-diff log alongside aggregate snapshots, storing (field, old_value, new_value) tuples per entity per scrape. Surface high-leverage diffs to human reviewers and leave low-leverage diffs in the audit log.
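A minimal sketch of the (field, old_value, new_value) log plus the review/audit split, assuming snapshots are dicts keyed by listing ID; which fields count as high leverage is a placeholder choice:

```python
# Placeholder choice: which field changes deserve human review
HIGH_LEVERAGE_FIELDS = {"company", "status"}


def field_diffs(old, new, fields=("title", "company", "category", "status")):
    """Return (entity_id, field, old_value, new_value) for entities in both snapshots."""
    diffs = []
    for entity_id in sorted(old.keys() & new.keys()):
        before, after = old[entity_id], new[entity_id]
        for field in fields:
            if before.get(field) != after.get(field):
                diffs.append((entity_id, field, before.get(field), after.get(field)))
    return diffs


def triage(diffs):
    """Split diffs into (for human review, audit log only)."""
    review = [d for d in diffs if d[1] in HIGH_LEVERAGE_FIELDS]
    audit_only = [d for d in diffs if d[1] not in HIGH_LEVERAGE_FIELDS]
    return review, audit_only


old = {"1": {"title": "Nurse", "company": "NHS Trust", "category": "Healthcare", "status": "open"},
       "2": {"title": "Teacher", "company": "Academy", "category": "Education", "status": "open"}}
new = {"1": {"title": "Staff Nurse", "company": "NHS Trust", "category": "Healthcare", "status": "open"},
       "2": {"title": "Teacher", "company": "Academy", "category": "Education", "status": "closed"}}
review, audit_only = triage(field_diffs(old, new))
print(review)      # → [('2', 'status', 'open', 'closed')]
print(audit_only)  # → [('1', 'title', 'Nurse', 'Staff Nurse')]
```

Listings that appear or disappear between snapshots are not field diffs; track them separately as additions and removals.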

Cost attribution per consumer. Tag every API call with a downstream-consumer identifier (team, product, feature) so you can attribute compute spend to the workflow that drove it. When a consumer's spend exceeds its projected budget, you can have a precise conversation with them about the queries driving the cost.
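A minimal sketch of per-consumer attribution, assuming a flat pay-per-result price that you record locally each time you call an actor. `CostLedger`, the price, the consumer names, and the budgets are hypothetical, and `Decimal` keeps the money arithmetic exact:

```python
from collections import defaultdict
from decimal import Decimal


class CostLedger:
    """Attribute pay-per-result spend to the downstream consumer of each call."""

    def __init__(self, price_per_result):
        self.price_per_result = price_per_result
        self.results = defaultdict(int)  # consumer tag -> results fetched

    def record(self, consumer, n_results):
        # Call this next to every actor run, e.g. with len(items) from Step 2
        self.results[consumer] += n_results

    def spend(self):
        return {c: n * self.price_per_result for c, n in self.results.items()}

    def over_budget(self, budgets):
        """Consumers whose spend exceeds their budget (no budget = unlimited)."""
        return sorted(c for c, s in self.spend().items()
                      if s > budgets.get(c, Decimal("Infinity")))


ledger = CostLedger(price_per_result=Decimal("0.005"))  # placeholder price
ledger.record("search-ui", 4000)
ledger.record("recruiter-alerts", 1200)
ledger.record("search-ui", 2000)
print(ledger.spend())
print(ledger.over_budget({"search-ui": Decimal("25"), "recruiter-alerts": Decimal("10")}))
# → ['search-ui']
```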


Frequently asked questions

Why build a UK jobs aggregator from Reed?

Reed.co.uk is the UK's longest-running employer-direct job board (founded 1995), with 250K+ active UK listings and the deepest employer-direct relationships. According to Reed Group's 2024 report, the platform powers many UK aggregators and niche-vertical job boards. For UK aggregator builders, niche job-board operators, and UK recruiting-platform startups, Reed provides the canonical foundation feed.

What features make a competitive UK jobs aggregator?

Five features: (1) multi-source aggregation (Reed, Adzuna, LinkedIn, Indeed UK); (2) UK-specific filtering (London weighting, tech corridors, public sector); (3) salary normalization across annual, hourly, and daily rates; (4) remote-work flagging for the post-2024 UK market; (5) niche-vertical curation (FinTech, NHS, public sector). Together these features differentiate an aggregator from generic ones such as Indeed UK and Glassdoor.
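A minimal sketch of the annual/hourly/daily normalization, assuming a 37.5-hour week and 260 working days a year. Those conversion factors are assumptions to tune (you may want fewer days for contractor day rates), and `annualise` is a hypothetical helper:

```python
# Assumed conversion factors; tune them to your product's conventions
HOURS_PER_YEAR = 37.5 * 52   # 37.5-hour week x 52 weeks = 1950.0
DAYS_PER_YEAR = 5 * 52       # 260 weekdays, before holidays
PERIOD_FACTORS = {"year": 1, "week": 52, "day": DAYS_PER_YEAR, "hour": HOURS_PER_YEAR}


def annualise(amount, period):
    """Convert a salary quoted per hour, day, week or year to an annual figure."""
    try:
        factor = PERIOD_FACTORS[period.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown salary period: {period!r}") from None
    return amount * factor


print(annualise(500, "day"))     # → 130000
print(annualise(20, "Hour"))     # → 39000.0
print(annualise(45000, "year"))  # → 45000
```

Raising on an unknown period, instead of guessing, keeps mis-parsed pay units out of salary medians.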

How fresh do aggregator data feeds need to be?

A 6-hourly cadence catches new listings within 6 hours, which is critical for time-sensitive aggregators. A daily cadence is sufficient for weekly-curation models: UK tech roles stay posted for 14-28 days before they are filled, so daily scrapes capture most demand-cycle activity. Premium-tier aggregators can run hourly to catch new postings within an hour.

How do I dedupe UK jobs across sources?

Cluster on (company, title, location, posted within 3 days). The same role posted across Reed, LinkedIn, and Indeed produces a duplicate rate of roughly 70-80%. For an accurate aggregator, dedupe with fuzzy matching (token_set_ratio >= 85) on title, company, and location, and maintain a canonical company-name mapping to handle 'Acme Ltd' vs 'Acme Limited'.
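A minimal sketch of the canonical company-name step and the 3-day posting window, assuming ISO dates in `posted_at` as in the sample output. The suffix list is an assumption, titles and locations are compared exactly here (swap in fuzzy title matching for production), and the helper names are hypothetical:

```python
import re
from datetime import date

# Assumed legal-suffix list; extend it as new variants show up in the data
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp"}


def canonical_company(name):
    """Lowercase, strip punctuation and trailing legal suffixes: 'Acme Ltd.' -> 'acme'."""
    tokens = re.sub(r"[^a-z0-9]+", " ", (name or "").lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def same_role(a, b, window_days=3):
    """Cluster test on (company, title, location, posted within window_days)."""
    return (
        canonical_company(a["company"]) == canonical_company(b["company"])
        and a["title"].strip().lower() == b["title"].strip().lower()
        and a["location"].strip().lower() == b["location"].strip().lower()
        and abs((date.fromisoformat(a["posted_at"])
                 - date.fromisoformat(b["posted_at"])).days) <= window_days
    )


reed = {"company": "Acme Ltd", "title": "Data Scientist", "location": "London", "posted_at": "2026-04-22"}
indeed = {"company": "ACME Limited", "title": "Data Scientist", "location": "London", "posted_at": "2026-04-24"}
repost = {**indeed, "posted_at": "2026-04-30"}
print(same_role(reed, indeed))  # → True
print(same_role(reed, repost))  # → False
```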

Can I monetize a UK jobs aggregator legally?

Yes. UK job data is publicly accessible. Successful UK aggregators ([Adzuna](https://www.adzuna.co.uk/), [Totaljobs](https://www.totaljobs.com/)) reference Reed, Indeed, and LinkedIn extensively. For commercial products: (1) attribute Reed as the data source; (2) link to Reed job-detail pages for full descriptions; (3) layer on your own value-add (better filtering, AI resume matching, Slack alerts); (4) monetize via employer-side premium listings or recruiter-side subscriptions.

How does this compare to Indeed Apply API + LinkedIn Jobs Search API?

[Indeed Apply API](https://www.indeed.com/employers/products/apply) requires partnership negotiation. [LinkedIn Jobs Search API](https://docs.microsoft.com/en-us/linkedin/talent/job-search-api) requires a LinkedIn Talent Solutions agreement. For Reed, scraping is the practical access path for UK-specific aggregator builders: the actor delivers raw, real-time Reed data on pay-per-result pricing without partnership-negotiation overhead.

