Track LinkedIn Post Engagement for B2B Influencer Research
Score B2B influencers using real LinkedIn post engagement — reactions, comments, and edit history. Python workflow with Thirdwatch's LinkedIn Post Scraper.

Thirdwatch's LinkedIn Post Scraper returns reactions, comments, edit status, and author metadata for any public LinkedIn post — the exact primitives growth and marketing teams need to score B2B influencers, rank thought-leadership content, and validate creator-partnership shortlists. This guide walks through the influencer-research workflow end-to-end: from a watchlist of post URLs to a ranked leaderboard with a defensible engagement score.
Why scrape LinkedIn engagement for influencer research
B2B influencer marketing is now a budget line. According to a 2024 Edelman/LinkedIn B2B Thought Leadership Impact Report, 73% of decision-makers say a piece of thought leadership content gave them a more positive impression of an organization than traditional product-marketing material, and 54% of C-suite buyers say they spend at least one hour a week consuming thought leadership. Translation: the right voices on LinkedIn move pipeline, and the wrong ones drain budget. Marketing teams need a way to separate them.
The honest version of B2B influencer measurement is engagement-weighted reach. Follower counts are inflated by years-old accumulation and vanity follow-backs; reach numbers are LinkedIn-internal and not exposed to outside tools; impressions are a private analytics field. What you can observe from the outside, on every public post, is reactions and comments — and those two numbers, sampled across a creator's recent posting history, give you a defensible influencer score. The job is to pull them at scale across a candidate list of 50-500 creators, then rank.
How does this compare to alternatives?
Three options for sourcing per-post LinkedIn engagement:
| Approach | Reliability | Setup time | Maintenance |
|---|---|---|---|
| Influencer-marketing platform (Modash, Heepsy, etc.) | High, but coverage is curated | Hours (vendor signup, contract) | Vendor-managed |
| Manual copy-paste from LinkedIn | Reliable for tiny lists, doesn't scale | Minutes per post | Manual, painful |
| Thirdwatch LinkedIn Post Scraper | Production-tested, raw data direct from LinkedIn | 5 minutes | Thirdwatch tracks LinkedIn changes |
The platform option is fine when you need 5-10 known creators for a campaign. For research where you are exploring a 200-name candidate list across an industry, the per-post pricing of the actor wins because you only pay for the posts you actually scrape — see the LinkedIn Post Scraper actor page for the live spec.
How to build an influencer engagement score in 6 steps
Step 1: How do I authenticate against Apify?
Sign in at apify.com, copy your API token from Settings → Integrations:
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
Step 2: How do I collect post URLs for a list of creators?
The actor reads single-post URLs. The sourcing step — turning a list of creator handles into a list of recent post URLs — happens outside the actor. The cheapest pattern is Google search restricted to LinkedIn:
# Sketch of your URL sourcing layer. source_recent_posts is a placeholder
# for whatever you already have: a Google search scraper, a profile
# activity scrape, or a curated newsroom list.
def source_recent_posts(handle, max_posts=20):
    return []  # replace with your own sourcing logic

candidate_creators = ["satyanadella", "lara_acosta_", "justinwelsh", "diptiparmar2"]
post_urls = []
for handle in candidate_creators:
    urls = source_recent_posts(handle, max_posts=20)
    post_urls.extend(urls)
print(f"{len(post_urls)} posts queued across {len(candidate_creators)} creators")
You can also feed the actor URNs you already have from a profile-activity scrape: the postUrls field accepts both canonical URLs and bare urn:li:activity:... strings.
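Because the sourcing layer can return the same post as both a URL and a URN, it may help to normalize and deduplicate before sending anything to the actor. The following is a minimal sketch, not part of the actor: `normalize_post_ref` and `dedupe_post_refs` are hypothetical helper names, and the sketch assumes every reference contains a literal `urn:li:activity:<digits>` segment. URL-encoded URNs or other LinkedIn permalink shapes would need extra handling.

```python
import re

# Assumption: the activity ID is the digit run that follows "urn:li:activity:".
_ACTIVITY_RE = re.compile(r"urn:li:activity:(\d+)")


def normalize_post_ref(ref):
    """Return a canonical feed URL for a URL or bare URN, or None if no ID is found."""
    match = _ACTIVITY_RE.search(ref.strip())
    if match is None:
        return None
    return f"https://www.linkedin.com/feed/update/urn:li:activity:{match.group(1)}/"


def dedupe_post_refs(refs):
    """Normalize every reference and drop duplicates, keeping first-seen order."""
    seen = {}
    for ref in refs:
        url = normalize_post_ref(ref)
        if url is not None:
            seen.setdefault(url, None)  # dict keys preserve insertion order
    return list(seen)
```

Running `post_urls = dedupe_post_refs(post_urls)` before the batch call avoids paying twice for the same post.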
Step 3: How do I batch-fetch engagement counts?
Pass the full URL list to the actor with maxPosts set high enough to cover the batch. The actor processes sequentially with internal rate-limiting.
import os, requests, pandas as pd
ACTOR = "thirdwatch~linkedin-post-scraper"
TOKEN = os.environ["APIFY_TOKEN"]
resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"postUrls": post_urls, "maxPosts": 200},
    timeout=900,
)
resp.raise_for_status()  # fail loudly on a non-2xx response
df = pd.DataFrame(resp.json())
print(f"{len(df)} posts pulled, {df.author_name.nunique()} authors")
The per-run cap is 200 posts; for larger candidate lists, chunk the URL list and fire runs in parallel.
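One way to chunk and parallelize is sketched below. It is an illustration under assumptions, not the actor's own client: `chunked` and `fetch_all` are hypothetical helpers, `fetch_batch` stands for any function you write that takes a list of up to 200 URLs and returns a list of result rows (for example, a wrapper around the request in Step 3), and the worker count of 4 is an arbitrary starting point.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size=200):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_all(post_urls, fetch_batch, size=200, workers=4):
    """Run fetch_batch on each chunk in parallel and flatten the rows in input order."""
    batches = chunked(post_urls, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order the batches were submitted
        results = list(pool.map(fetch_batch, batches))
    return [row for batch_rows in results for row in batch_rows]
```

Threads suit this case because each run spends its time waiting on the network. Keep `workers` modest so that a failed batch is cheap to retry.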
Step 4: How do I compute an influencer engagement score?
A defensible B2B score weights comments heavier than reactions because comments cost more effort per unit and signal actual conversation rather than passive thumbs-up.
import numpy as np
df["reactions_count"] = df["reactions_count"].fillna(0).astype(int)
df["comments_count"] = df["comments_count"].fillna(0).astype(int)
# Comments cost ~8x more effort than reactions on LinkedIn — calibrate
# this weight against your industry's baseline if you have one.
COMMENT_WEIGHT = 8
df["engagement"] = df["reactions_count"] + df["comments_count"] * COMMENT_WEIGHT
leaderboard = (
    df.groupby("author_name")
    .agg(posts=("url", "count"),
         median_reactions=("reactions_count", "median"),
         median_comments=("comments_count", "median"),
         median_engagement=("engagement", "median"),
         total_engagement=("engagement", "sum"))
    .sort_values("median_engagement", ascending=False)
    .head(25)
)
print(leaderboard)
Median rather than mean is deliberate: a single viral post can drag a mean up by 10x and fool you into shortlisting a one-hit creator. Median engagement across the last 20-30 posts is the more honest "what does this person normally do" number.
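The effect of one outlier is easy to see on made-up numbers. The figures below are hypothetical (19 ordinary posts and one viral post) and exist only to illustrate why the median is the steadier summary.

```python
from statistics import mean, median

# Hypothetical engagement scores for one creator: 19 ordinary posts, one viral outlier.
engagement = [120] * 19 + [25_000]

avg = mean(engagement)    # pulled far above the typical post by the outlier
mid = median(engagement)  # stays at the typical post

print(f"mean={avg:.0f} median={mid:.0f} ratio={avg / mid:.1f}x")
```

Here the mean lands more than ten times above the median even though 95% of the posts sit at 120.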
Step 5: How do I detect engagement-pod patterns?
A pod is a private group of LinkedIn users who agree to like and comment on each other's posts on demand. Pod activity shows up as suspiciously high reaction velocity with low organic comment depth. You cannot see who reacted (the actor returns counts only, not the identities of the reacting accounts), but you can flag suspicious post shapes:
# Pod-suspicious: very high reaction count, very low comment count.
# Genuine high-reach posts almost always pick up proportional comments.
df["pod_score"] = df["reactions_count"] / (df["comments_count"] + 1)
# Flag posts with >100 reactions and <3 comments — pod-suspicious shape.
suspicious = df[(df["reactions_count"] > 100) & (df["comments_count"] < 3)]
pod_rate_by_author = (
    suspicious.groupby("author_name").size() /
    df.groupby("author_name").size()
).fillna(0).sort_values(ascending=False)  # authors with no flagged posts get 0, not NaN
print(pod_rate_by_author.head(10))
A creator with >30% pod-suspicious posts is worth a manual review before adding them to a partnership shortlist; >50% is usually a hard pass.
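The thresholds above can be turned into an explicit triage rule. This is a sketch with hypothetical function names (`is_pod_suspicious`, `pod_rate`, `pod_verdict`); the cut-offs are the ones stated in this guide and should be tuned to your own data.

```python
def is_pod_suspicious(post):
    """Pod-suspicious shape from this guide: more than 100 reactions, fewer than 3 comments."""
    return post["reactions_count"] > 100 and post["comments_count"] < 3


def pod_rate(posts):
    """Share of a creator's posts that have the pod-suspicious shape."""
    if not posts:
        return 0.0
    return sum(is_pod_suspicious(p) for p in posts) / len(posts)


def pod_verdict(rate, review_at=0.30, reject_at=0.50):
    """Map a pod rate to a shortlist decision."""
    if rate > reject_at:
        return "hard pass"
    if rate > review_at:
        return "manual review"
    return "ok"
```

Treat the verdict as a prompt for a human look, since the counts alone cannot prove pod activity.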
Step 6: How do I track engagement growth over time?
Re-scrape the same posts on a schedule and persist by (url, scraped_at) to get an engagement velocity curve.
import sqlite3
conn = sqlite3.connect("li_engagement.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS snapshots (
        url TEXT, scraped_at TEXT,
        reactions_count INTEGER, comments_count INTEGER,
        PRIMARY KEY(url, scraped_at)
    )
""")
for _, p in df.iterrows():
    conn.execute(
        "INSERT OR REPLACE INTO snapshots VALUES (?, datetime('now'), ?, ?)",
        (p.url, int(p.reactions_count or 0), int(p.comments_count or 0))
    )
conn.commit()
A t+1h, t+24h, t+7d sampling cadence captures the engagement-velocity curve well enough to compare creators on "how fast do their posts heat up" rather than just endpoint totals.
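Once a post has two or more snapshots, velocity is the change in reactions divided by the hours between consecutive samples. The sketch below assumes the `snapshots` table defined above and that `scraped_at` holds the `YYYY-MM-DD HH:MM:SS` text that SQLite's `datetime('now')` produces; `reaction_velocity` is a hypothetical helper name.

```python
import sqlite3
from datetime import datetime

_TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # the text format SQLite's datetime('now') emits


def reaction_velocity(conn, url):
    """Return [(scraped_at, reactions_per_hour), ...] between consecutive snapshots of one post."""
    rows = conn.execute(
        "SELECT scraped_at, reactions_count FROM snapshots "
        "WHERE url = ? ORDER BY scraped_at",
        (url,),
    ).fetchall()
    velocities = []
    for (t0, r0), (t1, r1) in zip(rows, rows[1:]):
        elapsed = datetime.strptime(t1, _TS_FORMAT) - datetime.strptime(t0, _TS_FORMAT)
        hours = elapsed.total_seconds() / 3600
        if hours > 0:  # skip snapshots taken in the same second
            velocities.append((t1, (r1 - r0) / hours))
    return velocities
```

Comparing the first interval across creators gives a rough "how fast does it heat up" figure, while later intervals show how quickly a post cools.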
Sample output
The fields you'll actually use for scoring:
[
  {
    "url": "https://www.linkedin.com/feed/update/urn:li:activity:74440XXXXX0000000/",
    "author_name": "[CREATOR_A]",
    "author_headline": "Founder & CEO at [REDACTED]",
    "author_is_company": false,
    "reactions_count": 1247,
    "comments_count": 89,
    "reposts_count": null,
    "posted_relative": "2d",
    "edited": false
  },
  {
    "url": "https://www.linkedin.com/feed/update/urn:li:activity:74445XXXXX0000000/",
    "author_name": "[CREATOR_B]",
    "author_headline": "VP Marketing at [REDACTED]",
    "author_is_company": false,
    "reactions_count": 612,
    "comments_count": 4,
    "reposts_count": null,
    "posted_relative": "3d",
    "edited": true
  }
]
Creator A has a 14:1 reaction-to-comment ratio, which is healthy for B2B. Creator B has a 153:1 ratio at >600 reactions, which is exactly the pod-suspicious shape worth a manual review before partnering.
Common pitfalls
Four patterns trip up influencer-research pipelines.
Reaction counts include reposts the original author made. When a creator reshares their own old post, reactions on the reshare flow back to the original urn; counts are cumulative, not per-impression.
Edited posts shift counts retroactively. The edited: true field is your flag that the post text and the reactions on it may not match what was originally published; treat edited posts as soft data.
Company page posts ride on follower notifications. Posts with author_is_company: true get a baseline of opt-in audience reach that individual creators don't have; rank companies and individuals separately.
Timestamp precision degrades with age. The posted_relative field is "3d" or "1mo", not an exact timestamp; for velocity analysis on posts older than a week, you need to derive an absolute time from the post_id encoding.
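For the timestamp pitfall, a commonly used community technique reads the first 41 bits of the numeric activity ID as a Unix timestamp in milliseconds. LinkedIn does not document this layout, so treat the sketch below as an assumption to spot-check against posts whose publish time you already know; `activity_id_to_datetime` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone


def activity_id_to_datetime(url_or_urn):
    """Estimate the publish time (UTC) encoded in a LinkedIn activity ID.

    Assumption (community-derived, undocumented): the leading 41 bits of the
    ID's binary form are milliseconds since the Unix epoch.
    """
    match = re.search(r"(\d{10,})", url_or_urn)
    if match is None:
        raise ValueError(f"no activity ID found in {url_or_urn!r}")
    bits = format(int(match.group(1)), "b")  # binary string without the "0b" prefix
    millis = int(bits[:41], 2)
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

If the decoded times disagree with posted_relative on recent posts, the assumption does not hold for your data and the field should not be trusted for velocity work.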
Thirdwatch's actor uses production-grade anti-bot tooling on the LinkedIn embed endpoint and rotates outbound IPs by default, so a 100-post batch typically completes in under three minutes. A fifth note worth flagging: LinkedIn occasionally rate-limits the embed endpoint for high-volume scrapers. The actor handles retries and exponential backoff internally, so this rarely surfaces, but very large pulls (10K+ posts in total) are better split into multiple smaller runs to keep the failure-recovery surface small.
Frequently asked questions
Which engagement metrics does the actor return?
Two per-post metrics: reactions_count (the aggregate of likes, celebrates, supports, loves, insightful, and curious reactions LinkedIn shows below the post) and comments_count (top-level comment count). reposts_count is currently null because LinkedIn does not expose it on the public embed view.
How do I weight reactions vs comments in an influencer score?
Comments are roughly 8-15x rarer than reactions on LinkedIn and demand far more effort, so most B2B benchmarks weight one comment as 5-10 reactions. Pick a weight that matches your goal — pure reach weighting under-weights comments; debate-quality weighting over-weights them. Run both and compare ranks.
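Running both weightings side by side takes only a few lines. The sketch below uses plain dictionaries shaped like the actor's output fields; `rank_authors` is a hypothetical helper, and the two sample creators are invented to show a rank flip.

```python
from statistics import median


def rank_authors(posts, comment_weight):
    """Rank authors by median engagement, where engagement = reactions + weight * comments."""
    scores_by_author = {}
    for post in posts:
        score = post["reactions_count"] + post["comments_count"] * comment_weight
        scores_by_author.setdefault(post["author_name"], []).append(score)
    medians = {author: median(scores) for author, scores in scores_by_author.items()}
    return sorted(medians, key=medians.get, reverse=True)


# Invented sample: A is reach-heavy, B is conversation-heavy.
sample = [
    {"author_name": "A", "reactions_count": 1000, "comments_count": 10},
    {"author_name": "B", "reactions_count": 400, "comments_count": 90},
]
reach_rank = rank_authors(sample, comment_weight=1)
debate_rank = rank_authors(sample, comment_weight=8)
```

Creators whose rank moves sharply between the two lists are the ones where the choice of weight is doing the deciding, so they deserve a closer look.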
Can I detect paid pods and engagement farming?
Partially. The actor returns reactions_count and comments_count but not the identities of the accounts behind them, so you cannot directly verify whether the same accounts comment on every post. What you can detect is suspicious engagement velocity: posts that gain 500+ reactions in under an hour with comments_count near zero are a pod-suspicious signal.
How often should I re-scrape a post to track engagement?
LinkedIn engagement on a typical B2B post is roughly 80% complete within 48 hours and 95% complete within 7 days. Re-scrape at t+1h, t+24h, t+7d for high-effort tracking, or just t+7d for monthly influencer leaderboards. Re-scraping the same URL refreshes reactions_count and comments_count in place.
Does the actor work for company page posts?
Yes. Company page posts use the same /feed/update/urn:li:activity:{id} permalink format and the actor returns author_is_company: true so you can distinguish them from individual creator posts in your influencer ranking. Company posts typically have higher reach but lower comment-to-reaction ratios than individual creators.
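Keeping the two author types on separate leaderboards can be sketched as follows. `split_leaderboards` is a hypothetical helper that works on rows shaped like the actor's output, and the default comment weight of 8 is the one used earlier in this guide.

```python
from statistics import median


def split_leaderboards(posts, comment_weight=8):
    """Return {"individuals": [...], "companies": [...]}, each ranked by median engagement."""
    groups = {"individuals": {}, "companies": {}}
    for post in posts:
        key = "companies" if post.get("author_is_company") else "individuals"
        score = post["reactions_count"] + post["comments_count"] * comment_weight
        groups[key].setdefault(post["author_name"], []).append(score)
    return {
        key: sorted(
            ((author, median(scores)) for author, scores in authors.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        for key, authors in groups.items()
    }
```

Ranking each group on its own keeps a company page's built-in follower reach from crowding individual creators off the shortlist.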
Can I get historical engagement before I started tracking?
Yes — for any post you can find a URL for. LinkedIn's public embed shows current reaction and comment counts; if you only began tracking a creator last month, scraping their last 50 posts gives you the cumulative engagement state today, even on posts published a year ago. You lose the time-series view of how engagement accumulated, but the endpoint snapshot is intact.