Build a Talent Pipeline from LinkedIn Candidate Data (2026)

Build a structured recruiter talent pipeline from LinkedIn profile data. Automate sourcing by role, seniority, and skills with Python and the Apify API.

May 26, 2026
Thirdwatch's LinkedIn Candidate Finder lets you build a structured talent pipeline by role, skills, location, and seniority -- no LinkedIn login required. Schedule recurring runs, deduplicate across batches, and push candidates into your ATS or CRM automatically. Built for recruiting teams who need always-on sourcing without manual Google x-ray searches.

Why build a talent pipeline from LinkedIn data

A talent pipeline is only as good as its sourcing layer. According to LinkedIn's Global Talent Trends report, 87% of professionals are open to hearing about new opportunities, but fewer than 30% are actively applying. The implication: recruiters who wait for inbound applications miss the majority of qualified candidates. Proactive sourcing -- systematically finding and tracking passive candidates before a role opens -- is what separates a pipeline from a job board.

The problem is operational. Building a pipeline by hand means pasting boolean strings into Google, copying results into spreadsheets, deduping across weeks of searches, and manually checking for overlap with candidates already in the ATS. According to SHRM's 2025 Talent Acquisition Benchmarks, the average time-to-fill for tech roles is 44 days, with sourcing consuming 40% of that timeline. A senior recruiter running 5 roles across 3 locations spends 10+ hours per week on sourcing mechanics alone. Automating the sourcing layer -- turning role + skills + location into a daily feed of new candidate profiles -- collapses that to minutes. The LinkedIn Candidate Finder is the data collection layer that makes the rest of the pipeline possible.

How does this compare to the alternatives?

Three approaches to building a LinkedIn talent pipeline:

| Approach | Cost | Reliability | Setup time | Maintenance |
| --- | --- | --- | --- | --- |
| Manual Google x-ray + spreadsheet tracking | Free, 10+ hours/week labor | Inconsistent, gaps in coverage | Ongoing manual work | You own dedup and refresh |
| LinkedIn Recruiter + Projects | $100-180+/seat/month | High, native LinkedIn filters | Account provisioning | LinkedIn manages, limited export |
| Thirdwatch LinkedIn Candidate Finder + ATS | Pay per result | Production-tested, API-driven | 30 minutes to automate | Thirdwatch maintains the scraper |

LinkedIn Recruiter locks pipeline data inside LinkedIn's platform -- exporting to your ATS requires manual CSV downloads or expensive ATS integrations. Manual x-ray search works but does not scale past a handful of roles without dedicated sourcing headcount. The LinkedIn Candidate Finder returns structured JSON via API, so it fits directly into your existing data infrastructure and you build once and reuse across every requisition.

How to build a talent pipeline in 4 steps

Step 1: How do I define pipeline segments by role and location?

Each pipeline segment maps to one run configuration. Define the role, required skills, location, and seniority for each open requisition or proactive talent pool.

import os, requests, pandas as pd

ACTOR = "thirdwatch~linkedin-candidate-finder-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

SEGMENTS = [
    {
        "role": "Senior Backend Engineer",
        "skills": ["python", "aws", "microservices"],
        "location": "San Francisco",
        "seniority": "senior",
        "minExperienceYears": 5,
        "maxResults": 50,
    },
    {
        "role": "Data Engineer",
        "skills": ["spark", "airflow", "sql"],
        "location": "New York",
        "seniority": "mid",
        "minExperienceYears": 3,
        "maxExperienceYears": 8,
        "maxResults": 50,
    },
    {
        "role": "Product Manager",
        "skills": ["b2b saas", "analytics"],
        "location": "Austin",
        "maxResults": 30,
    },
]

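all_candidates = []
for seg in SEGMENTS:
    # Run the actor synchronously and get the dataset items back in one call
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
        params={"token": TOKEN},
        json=seg,
        timeout=300,
    )
    resp.raise_for_status()  # fail loudly instead of parsing an error body
    candidates = resp.json()
    for c in candidates:
        # Tag each candidate so downstream routing knows its pipeline
        c["segment"] = seg["role"]
        c["segment_location"] = seg["location"]
    all_candidates.extend(candidates)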

df = pd.DataFrame(all_candidates)
print(f"{len(df)} total candidates across {df.segment.nunique()} segments")

Each segment runs independently. Tag every candidate with the segment name and location so downstream routing knows which pipeline the candidate belongs to.

Step 2: How do I deduplicate candidates across runs and segments?

Candidates may appear in multiple segments (a "Senior Backend Engineer" with Spark skills shows up in both the backend and data engineering pools). Deduplicate on the url field.

# Remove exact duplicates by profile URL, keeping first occurrence
df = df.drop_duplicates(subset=["url"], keep="first")
print(f"{len(df)} unique candidates after dedup")

# Load previously-seen candidates from your master tracking file
try:
    seen = pd.read_csv("pipeline_master.csv")
    seen_urls = set(seen["url"])
except FileNotFoundError:
    seen_urls = set()

new_candidates = df[~df["url"].isin(seen_urls)]
print(f"{len(new_candidates)} net-new candidates this batch")

Persist the master URL set across pipeline refreshes. This prevents duplicate outreach -- the most common source of candidate frustration with recruiter pipelines.
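
The snippet above reads pipeline_master.csv but never writes it. A minimal sketch of the persistence step follows, using only the standard library. The helper name append_to_master and the FIELDS column list are assumptions for illustration, not part of the actor or the Apify API.

```python
import csv
from pathlib import Path

# Assumed column set: the actor's fields plus the tags added by the pipeline script.
FIELDS = ["url", "fullName", "headline", "segment", "segment_location"]


def append_to_master(rows, path="pipeline_master.csv"):
    """Append candidate dicts to the master file, writing a header on first use.

    Returns the number of rows written. Keys outside FIELDS are ignored and
    missing keys are written as empty strings.
    """
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    written = 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        for row in rows:
            writer.writerow(row)
            written += 1
    return written
```

With the DataFrame from the step above, a call such as append_to_master(new_candidates.to_dict("records")) after each batch keeps the file current, so the next run's seen_urls includes everything already sourced.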

Step 3: How do I enrich candidates with full profile data?

The Candidate Finder returns name, headline, and URL. For outreach decisions, you often need full experience, education, and skills. Feed the URLs into the LinkedIn Profile Scraper.

PROFILE_ACTOR = "thirdwatch~linkedin-profile-scraper"

# Enrich only the first 20 net-new candidates to keep costs bounded
urls_to_enrich = new_candidates["url"].tolist()[:20]

resp = requests.post(
    f"https://api.apify.com/v2/acts/{PROFILE_ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"urls": urls_to_enrich},
    timeout=600,
)
resp.raise_for_status()  # surface API errors before building the DataFrame
enriched = pd.DataFrame(resp.json())
print(f"Enriched {len(enriched)} profiles with full experience data")

The two-step workflow -- Candidate Finder for broad sourcing, Profile Scraper for selective enrichment -- keeps costs low. You only pay for full-profile scrapes on candidates who pass your initial headline filter.
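
The headline filter mentioned above is not shown in the enrichment snippet, which simply takes the first 20 URLs. One possible way to rank candidates before enrichment is a plain keyword count against the segment's skills. The function names and the min_hits and limit defaults below are assumptions for illustration, and substring matching is deliberately crude (for example, "sql" also matches "MySQL").

```python
def headline_score(headline, skills):
    """Count how many of the segment's skills appear in the headline (case-insensitive)."""
    text = (headline or "").lower()
    return sum(1 for skill in skills if skill.lower() in text)


def shortlist(candidates, skills, min_hits=2, limit=20):
    """Keep candidates whose headline mentions at least min_hits skills, best first."""
    scored = [(headline_score(c.get("headline"), skills), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= min_hits]
    # sorted() is stable, so candidates with equal scores keep their original order.
    kept = sorted(kept, key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:limit]]


candidates = [
    {"headline": "Senior Backend Engineer at Stripe | Python, AWS, Distributed Systems",
     "url": "https://www.linkedin.com/in/anjali-nair-stripe/"},
    {"headline": "Data Engineer at Airbnb | Spark, Airflow, dbt",
     "url": "https://www.linkedin.com/in/marcus-chen-data/"},
]
top = shortlist(candidates, ["python", "aws", "microservices"])
urls_to_enrich = [c["url"] for c in top]
```

Only the URLs that survive the shortlist would then be passed to the Profile Scraper, which is where the cost saving comes from.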

Step 4: How do I automate the pipeline with scheduled runs?

Set up a recurring Apify schedule for each pipeline segment. New candidates flow in daily without manual intervention.

curl -X POST "https://api.apify.com/v2/schedules?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "pipeline-senior-backend-sf-daily",
    "cronExpression": "0 7 * * 1-5",
    "timezone": "America/Los_Angeles",
    "isEnabled": true,
    "actions": [{
      "type": "RUN_ACTOR",
      "actorId": "thirdwatch~linkedin-candidate-finder-scraper",
      "runInput": {
        "role": "Senior Backend Engineer",
        "skills": ["python", "aws", "microservices"],
        "location": "San Francisco",
        "seniority": "senior",
        "minExperienceYears": 5,
        "maxResults": 50
      }
    }]
  }'

Attach an ACTOR.RUN.SUCCEEDED webhook to your dedup and ATS ingestion endpoint. Every weekday morning, fresh candidates land in the pipeline, deduplicated against your master tracking set, and routed to the right recruiter.
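
On the receiving side, the webhook handler needs to find the dataset the run produced. A minimal sketch, assuming the webhook uses Apify's default payload template (which carries eventType and a resource object describing the run, including defaultDatasetId). The function name is hypothetical, and the HTTP server around it is left to whatever framework your ingestion endpoint already uses.

```python
def dataset_items_url(payload):
    """Return the dataset items URL for a succeeded run, or None for anything else.

    Assumes Apify's default webhook payload shape; adjust the lookups if you
    configured a custom payload template.
    """
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    dataset_id = (payload.get("resource") or {}).get("defaultDatasetId")
    if not dataset_id:
        return None
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format=json"


# Example with a trimmed-down payload (the dataset ID is a placeholder).
example = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "abc123"},
}
print(dataset_items_url(example))
```

The handler would then GET that URL with your API token, run the dedup step against the master set, and forward the net-new candidates to the ATS.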

Sample output

A batch of pipeline candidates with segment tags looks like this:

[
  {
    "fullName": "Anjali Nair",
    "headline": "Senior Backend Engineer at Stripe | Python, AWS, Distributed Systems",
    "url": "https://www.linkedin.com/in/anjali-nair-stripe/",
    "segment": "Senior Backend Engineer",
    "segment_location": "San Francisco"
  },
  {
    "fullName": "Marcus Chen",
    "headline": "Data Engineer at Airbnb | Spark, Airflow, dbt",
    "url": "https://www.linkedin.com/in/marcus-chen-data/",
    "segment": "Data Engineer",
    "segment_location": "New York"
  }
]

The segment and segment_location tags are added by your pipeline script, not by the actor. The actor's native output is fullName, headline, and url. Tag candidates at ingestion time so your ATS can route them to the right requisition or talent pool.

Common pitfalls

Three patterns break production talent pipelines.

No dedup across time -- running the same query daily without maintaining a master URL set means you push the same candidates into your ATS repeatedly. Your recruiters waste time reviewing profiles they already passed on, and candidates get duplicate InMails. Maintain a persistent set of seen URLs and filter every batch.

Enriching too eagerly -- running full profile scrapes on every candidate in a 200-person shortlist is wasteful. Filter by headline relevance first, then enrich only the top tier. The Candidate Finder's headline field is a strong first-pass signal.

Stale segments -- pipeline segments tied to roles that were filled months ago keep generating candidates nobody reviews. Audit active segments quarterly and disable schedules for closed requisitions.

The actor handles query construction, location variant expansion (Bangalore/Bengaluru, NYC/New York, Gurgaon/Gurugram), and structured output so your pipeline logic stays clean. A fourth consideration is segment granularity. Overly broad segments ("Software Engineer, US") return thousands of results with low relevance. Overly narrow segments ("Senior Rust Engineer with WASM experience, Austin") return too few. The sweet spot is a specific role plus 2-3 skills plus a metro area, which typically yields 30-80 candidates per run with high headline relevance.
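
The granularity rule of thumb above (a specific role, 2-3 skills, a metro area) can be checked before a segment is scheduled. The sketch below is a simple lint over the segment dictionaries used in Step 1. The function name and warning wording are assumptions, and it cannot judge whether a role title or location is itself too broad.

```python
def segment_warnings(seg):
    """Return a list of human-readable warnings for one segment config."""
    warnings = []
    skills = seg.get("skills") or []
    if not seg.get("role"):
        warnings.append("missing role")
    if len(skills) < 2:
        warnings.append("fewer than 2 skills: segment is probably too broad")
    elif len(skills) > 3:
        warnings.append("more than 3 skills: segment is probably too narrow")
    if not seg.get("location"):
        warnings.append("missing location: add a metro area")
    return warnings


good = {"role": "Senior Backend Engineer",
        "skills": ["python", "aws", "microservices"],
        "location": "San Francisco"}
broad = {"role": "Software Engineer", "skills": []}

for seg in (good, broad):
    print(seg["role"], segment_warnings(seg))
```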

Frequently asked questions

How often should I refresh a talent pipeline?

Weekly is the standard cadence for active roles. Daily refreshes make sense for urgent hires or high-volume staffing. Use Apify schedules to automate the cadence so new candidates flow in without manual intervention.

Can I run multiple role searches in one pipeline?

Yes. Schedule separate runs per role-location pair and merge the datasets downstream. Each run targets one role and skills combination, giving you clean segments to route into the right pipeline stage.

How do I avoid contacting the same candidate twice?

Deduplicate on the LinkedIn profile URL across all runs. The url field is stable per candidate. Maintain a master set of contacted URLs and filter each new batch against it before outreach.
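
If URLs also arrive from other sources (a spreadsheet export, an ATS record), small differences such as a trailing slash, a www prefix, or tracking parameters can defeat an exact-match dedup. A hedged sketch of a normalization step, using only the standard library. The function name is hypothetical, and lowercasing the path assumes profile slugs are case-insensitive.

```python
from urllib.parse import urlsplit


def normalize_profile_url(url):
    """Reduce a profile URL to a canonical https://host/path key for dedup."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop query string and fragment; strip the trailing slash.
    path = parts.path.rstrip("/").lower()
    return f"https://{host}{path}"


a = normalize_profile_url("https://www.linkedin.com/in/anjali-nair-stripe/")
b = normalize_profile_url("http://linkedin.com/in/Anjali-Nair-Stripe?trk=public_profile")
print(a == b)
```

Applying the same function to both the new batch and the master set before comparing keeps the two sides consistent.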

What ATS integrations work with this data?

Any ATS that accepts CSV import or has a REST API for candidate creation works. Greenhouse, Lever, Ashby, and BambooHR all support bulk CSV upload. For real-time sync, use Apify webhooks to POST new candidates to your ATS API endpoint.

Does the pipeline include candidates who are not actively looking?

Yes. The actor searches publicly indexed profiles regardless of job-seeking status. Most results are passive candidates who have not flagged themselves as open to work, which is exactly the talent pool recruiters want to reach.
