
Scrape LinkedIn Posts Without Login — Complete 2026 Guide

Pull public LinkedIn post data — text, author, reactions, comments, media — without logging in. Python recipes using Thirdwatch's LinkedIn Post Scraper.

May 12, 2026 · 6 min read · 1,347 words

Thirdwatch's LinkedIn Post Scraper returns structured data from any public LinkedIn post — full text, author identity, reactions, comments, and attached media — without a login, cookies, or LinkedIn API key. Built for developers, growth engineers, and researchers who need a clean per-post row and do not want to babysit LinkedIn's anti-bot stack. Pass one or more post URLs, get back JSON.

Why scrape LinkedIn posts without an account

LinkedIn is the default professional publishing channel for executives, recruiters, and B2B operators. LinkedIn reported over one billion members globally as of 2024, and its public Feed surfaces tens of millions of posts per day. For developers building competitive intelligence dashboards, executive-monitoring tools, or B2B content datasets, the data is right there — every public post has a stable URL that LinkedIn happily renders to anonymous visitors via its embed view.

The blocker is plumbing, not access. LinkedIn's official Marketing Developer Platform requires partner approval, restricts data to pages you own, and does not expose third-party post analytics at all. Most teams who actually need to read a competitor's post or aggregate an industry's thought leadership are looking at the wrong product. The right primitive is the public embed page itself: a static HTML response containing the post body, author block, and reaction counters in a predictable JSON envelope. The actor is a thin, maintained extraction layer on top of that.

The job-to-be-done is usually narrow: "given this list of post URLs, return one structured row per post." Newsletter writers verifying quote sources, growth analysts ranking competitor posts by engagement, AI teams sampling B2B thought-leadership content for fine-tuning — all reduce to the same input/output contract.

How does this compare to alternatives?

Three realistic ways to get LinkedIn post data into a pipeline:

| Approach | Reliability | Setup time | Maintenance |
|---|---|---|---|
| DIY Python + requests | Brittle; LinkedIn pre-blocks most cloud IPs | 1-2 weeks (anti-bot + parser + IP pool) | Quarterly, when LinkedIn ships markup changes |
| Generic scraping API (proxy-as-service) | Reliable for fetch; you still write the parser | 2-3 days for the parser alone | You own the parser drift |
| Thirdwatch LinkedIn Post Scraper | Production-tested; fields stay stable across markup shifts | 5 minutes | Thirdwatch tracks LinkedIn changes |

For one-off research the DIY path is tempting; the moment your watchlist grows past a few hundred posts, the actor's pay-per-result pricing usually beats engineering time spent on parser regressions. See the LinkedIn Post Scraper actor page for the live spec.

How to scrape LinkedIn posts in 5 steps

Step 1: How do I authenticate against Apify?

Sign in at apify.com (free tier, no credit card required), open Settings → API & Integrations, and copy your personal API token. Every example below assumes the token is in APIFY_TOKEN.

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: How do I scrape a single post by URL?

The actor accepts both canonical share URLs and bare URNs in the postUrls field.

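import os

import requests

ACTOR = "thirdwatch~linkedin-post-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

# Start a run, wait for it to finish, and get the dataset rows back in one call.
resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "postUrls": [
            "https://www.linkedin.com/feed/update/urn:li:activity:7193400851995021312/"
        ],
        "maxPosts": 1,
    },
    timeout=120,
)
resp.raise_for_status()  # fail loudly on a bad token or a failed run
items = resp.json()
if not items:
    raise SystemExit("The actor returned no rows for this URL")
post = items[0]
print(f"{post['author_name']}: {post['reactions_count']} reactions")
print((post.get("text") or "")[:200])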

run-sync-get-dataset-items starts the run, waits for it to finish, and returns the dataset rows directly in the response body, so a single-post pull is one HTTP round trip instead of a start, poll, and fetch sequence.

Step 3: How do I batch-scrape a watchlist of post URLs?

Pass an array. The actor processes posts sequentially with internal rate-limiting, and maxPosts caps how many it will touch in a single run.

import os

import pandas as pd
import requests

ACTOR = "thirdwatch~linkedin-post-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

URLS = [
    "https://www.linkedin.com/feed/update/urn:li:activity:7193400851995021312/",
    "https://www.linkedin.com/feed/update/urn:li:share:7441772941413085184/",
    "urn:li:activity:7444032259956903936",
    # ...up to 200 per run
]

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"postUrls": URLS, "maxPosts": len(URLS)},
    # The sync endpoint waits roughly 300 s for the run; very large batches may need an async run.
    timeout=330,
)
resp.raise_for_status()
df = pd.DataFrame(resp.json())
print(f"Pulled {len(df)} posts from {df.author_name.nunique()} authors")

maxPosts caps at 200 per run; for larger watchlists chunk the URL list and fire multiple runs in parallel via the Apify API.
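For watchlists above the per-run cap, here is a minimal sketch of the chunk-and-fan-out pattern. It uses only the standard library, starts each chunk through Apify's asynchronous POST /v2/acts/{actor}/runs endpoint (the synchronous endpoint stops waiting after roughly 300 seconds), polls GET /v2/actor-runs/{runId} until the run reaches a terminal state, then reads the run's default dataset. The helper names (chunked, run_chunk, scrape_watchlist), the 10-second poll interval, and the default of four workers are illustrative choices, not part of the actor; check your Apify plan's concurrent-run limit before raising the worker count.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ACTOR = "thirdwatch~linkedin-post-scraper"
API = "https://api.apify.com/v2"
MAX_PER_RUN = 200  # per-run cap on maxPosts described in this guide
ACTIVE = {"READY", "RUNNING", "TIMING-OUT", "ABORTING"}  # non-terminal run states


def chunked(items, size=MAX_PER_RUN):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def _call(method, url, body=None):
    """Send one JSON request to the Apify API and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())


def run_chunk(urls, token, poll_seconds=10):
    """Start one async run for a chunk, wait for it to finish, return its rows."""
    run = _call("POST", f"{API}/acts/{ACTOR}/runs?token={token}",
                {"postUrls": urls, "maxPosts": len(urls)})["data"]
    while run["status"] in ACTIVE:
        time.sleep(poll_seconds)
        run = _call("GET", f"{API}/actor-runs/{run['id']}?token={token}")["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"run {run['id']} finished as {run['status']}")
    dataset = run["defaultDatasetId"]
    return _call("GET", f"{API}/datasets/{dataset}/items?token={token}")


def scrape_watchlist(urls, token, workers=4):
    """Fan chunks out over a small thread pool and concatenate the rows."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(lambda chunk: run_chunk(chunk, token), chunked(urls)))
    return [row for batch in batches for row in batch]
```

Usage would be something like rows = scrape_watchlist(URLS, os.environ["APIFY_TOKEN"]). Async runs keep going server-side even if your client disconnects, which is why they suit batches that may outlast a single HTTP request.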

Step 4: How do I persist results and de-dup across runs?

url is the canonical natural key — it survives URL-tracking-parameter variations because the actor strips query strings before storing. Persist to your warehouse keyed on url (or urn if you prefer the LinkedIn-native form):

import json
import sqlite3

conn = sqlite3.connect("linkedin_posts.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS posts (
        url TEXT PRIMARY KEY,
        urn TEXT, author_name TEXT, author_headline TEXT,
        text TEXT, posted_relative TEXT, edited INTEGER,
        reactions_count INTEGER, comments_count INTEGER,
        media_json TEXT, scraped_at TEXT
    )
""")

# Iterate the raw JSON rows (plain Python types) rather than df.iterrows(),
# whose integer cells can come back as numpy.int64, which sqlite3 cannot bind.
for p in resp.json():
    conn.execute("""
        INSERT OR REPLACE INTO posts VALUES
        (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))
    """, (
        p["url"], p.get("urn"), p.get("author_name"), p.get("author_headline"),
        p.get("text"), p.get("posted_relative"), int(bool(p.get("edited"))),
        p.get("reactions_count"), p.get("comments_count"),
        json.dumps(p.get("media") or []),
    ))

conn.commit()

INSERT OR REPLACE keyed on url gives you idempotent re-runs — re-scraping the same post just refreshes reaction and comment counts.
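One design note: SQLite's INSERT OR REPLACE deletes the conflicting row and inserts a fresh one, so every column is rewritten, including scraped_at. If you want to remember when a post was first seen while still refreshing its counters, an ON CONFLICT upsert (SQLite 3.24 or later) updates the existing row in place. A minimal sketch with a trimmed schema; first_scraped_at, last_scraped_at, and the upsert helper are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path in practice
conn.execute("""
    CREATE TABLE posts (
        url TEXT PRIMARY KEY,
        reactions_count INTEGER,
        comments_count INTEGER,
        first_scraped_at TEXT,
        last_scraped_at TEXT
    )
""")

# Insert new posts; for known URLs refresh the counters and last_scraped_at
# but leave first_scraped_at untouched.
UPSERT = """
    INSERT INTO posts (url, reactions_count, comments_count,
                       first_scraped_at, last_scraped_at)
    VALUES (:url, :reactions_count, :comments_count, :scraped_at, :scraped_at)
    ON CONFLICT(url) DO UPDATE SET
        reactions_count = excluded.reactions_count,
        comments_count  = excluded.comments_count,
        last_scraped_at = excluded.last_scraped_at
"""


def upsert(conn, row, scraped_at):
    """Upsert one actor row (a dict); extra keys in `row` are ignored."""
    conn.execute(UPSERT, {**row, "scraped_at": scraped_at})


url = "https://www.linkedin.com/feed/update/urn:li:activity:1/"
upsert(conn, {"url": url, "reactions_count": 27, "comments_count": 5}, "2026-05-01")
upsert(conn, {"url": url, "reactions_count": 31, "comments_count": 6}, "2026-05-08")
print(conn.execute("SELECT * FROM posts").fetchall())
# [('https://www.linkedin.com/feed/update/urn:li:activity:1/', 31, 6, '2026-05-01', '2026-05-08')]
```

The second call keeps the original first_scraped_at and only moves the counters and last_scraped_at forward, which is the slowly-changing-dimension behavior the pitfalls section recommends for edited posts.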

Step 5: How do I schedule recurring pulls?

For a moving watchlist (e.g. all posts from the last week by a list of competitor execs), pair a URL-sourcing step with this actor on Apify's scheduler. The sourcing step can be a separate scrape of profile activity, a Google search for site:linkedin.com/posts {company}, or a manually curated list of URNs.

# Starts one asynchronous run; call this from cron, or configure an Apify schedule with the same input.
curl -X POST "https://api.apify.com/v2/acts/thirdwatch~linkedin-post-scraper/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"postUrls": ["urn:li:activity:7193400851995021312"], "maxPosts": 50}'

A daily schedule at 06:00 UTC is enough for most monitoring loops; for breaking-news handles, hourly works.

Sample output

A single post record looks like this. Identifying fields have been redacted to placeholders.

{
  "url": "https://www.linkedin.com/feed/update/urn:li:activity:74440XXXXX0000000/",
  "urn": "urn:li:activity:74440XXXXX0000000",
  "post_id": "74440XXXXX0000000",
  "text": "LinkedIn is currently chock-a-block with AI hype, and most of it is recycled. Here are the three patterns I'd actually bet on...",
  "posted_relative": "1mo",
  "edited": true,
  "author_name": "[REDACTED]",
  "author_headline": "Content Marketer & Strategist",
  "author_profile_url": "https://in.linkedin.com/in/redacted-handle",
  "author_profile_image": "https://media.licdn.com/dms/image/...",
  "author_is_company": false,
  "reactions_count": 27,
  "comments_count": 5,
  "reposts_count": null,
  "media": [
    {"type": "image", "url": "https://media.licdn.com/dms/image/.../carousel-1.jpg", "thumbnail": ""}
  ]
}

url is your natural primary key; it dedups cleanly across re-runs because the actor strips tracking parameters. text contains the full body, not the truncated og:description LinkedIn exposes in HTML head tags — that distinction matters for thought-leadership analysis where the value sits in the back half of the post. reposts_count is null by design: LinkedIn does not expose it on the public embed and the actor refuses to guess.
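If you load rows into typed code, a small record type makes the nullable fields explicit. A minimal sketch, assuming the 15 keys shown in the sample above; the class name LinkedInPost and the choices to coerce edited and author_is_company to bool and to default a missing media value to an empty list are illustrative, not part of the actor's output contract.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class LinkedInPost:
    """Typed view of one actor row; field names mirror the sample output."""
    url: Optional[str]
    urn: Optional[str]
    post_id: Optional[str]
    text: Optional[str]
    posted_relative: Optional[str]
    edited: bool
    author_name: Optional[str]
    author_headline: Optional[str]
    author_profile_url: Optional[str]
    author_profile_image: Optional[str]
    author_is_company: bool
    reactions_count: Optional[int]
    comments_count: Optional[int]
    reposts_count: Optional[int]  # null on the public embed, so usually None
    media: list = field(default_factory=list)

    @classmethod
    def from_row(cls, row: dict) -> "LinkedInPost":
        """Build from a raw JSON row: unknown keys are ignored, missing ones become None."""
        kwargs = {f.name: row.get(f.name) for f in fields(cls)}
        kwargs["edited"] = bool(kwargs["edited"])
        kwargs["author_is_company"] = bool(kwargs["author_is_company"])
        kwargs["media"] = kwargs["media"] or []  # text-only posts: empty list
        return cls(**kwargs)
```

Ignoring unknown keys means a new field added by the actor will not break older loaders; you simply will not see it until you add it to the class.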

Common pitfalls

Three things go wrong in production LinkedIn-post pipelines:

- Relative timestamps drift. posted_relative is what LinkedIn renders ("1mo", "3d"), not an absolute time. If you persist the string and re-scrape later, the same post that reads "1mo" today will read "2mo" a month from now. Always store scraped_at, and either derive an absolute time from post_id or keep the relative string only as a snapshot value.
- Edited posts mutate. When a post carries the edited: true marker, text and reactions_count can change between scrapes; treat the post as a slowly-changing dimension, not a fact row.
- Media entries vary in number. Carousels and video posts have multiple media entries, and text-only posts have none. media is always an array, so iterating it is safe, but indexing media[0] without a length check breaks on text-only posts.
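A minimal sketch that guards against the media pitfall by exploding each post into one row per media item, while keeping text-only posts as a single row with empty media columns. It assumes each media entry is a dict with the type and url keys shown in the sample output; media_rows and the media_index, media_type, and media_url column names are hypothetical.

```python
def media_rows(post: dict) -> list:
    """Return one flat row per media item; text-only posts yield one media-less row."""
    media = post.get("media") or []  # treat a missing or null value like an empty list
    if not media:
        return [{"url": post["url"], "media_index": None,
                 "media_type": None, "media_url": None}]
    return [
        {
            "url": post["url"],
            "media_index": i,          # position within a carousel
            "media_type": item.get("type"),
            "media_url": item.get("url"),
        }
        for i, item in enumerate(media)
    ]
```

Keeping a placeholder row for text-only posts means a join back to the posts table never silently drops them.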

Thirdwatch's actor uses production-grade anti-bot tooling on the LinkedIn embed endpoint and rotates outbound IPs by default. That matters because LinkedIn pre-blocks a large fraction of public cloud IP ranges, so DIY scrapers running from AWS or GCP typically see >50% failure rates on cold pulls. Two more edge cases are worth knowing. Very long posts (LinkedIn caps post text at 3,000 characters) sometimes render with a "see more" toggle on the public embed, but the full body is always present in the page source, so the actor never serves truncated text. And posts deleted by the author or hidden by LinkedIn moderation return a 404; treat that as a legitimate disappearance rather than a data error.
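To record disappearances as data rather than errors, you can reconcile the inputs you sent against the rows that came back. A minimal sketch, assuming (not confirmed by the actor documentation) that a deleted or hidden post simply yields no dataset row, and that each returned url carries the same numeric ID as the input URL or URN; post_key and missing_posts are hypothetical helpers.

```python
import re

# Numeric ID after an "activity" or "share" marker, in URN (":") or slug ("-") form.
ID_RE = re.compile(r"(?:activity|share)[:-](\d+)")


def post_key(url_or_urn: str):
    """Extract the numeric post ID from a URL or URN, or None if absent."""
    m = ID_RE.search(url_or_urn)
    return m.group(1) if m else None


def missing_posts(requested, rows):
    """Return the requested inputs whose ID does not appear in any returned row."""
    returned = {post_key(r["url"]) for r in rows}
    return [u for u in requested if post_key(u) not in returned]
```

Writing the result to a "gone" table with the scrape date gives you a clean history of takedowns instead of a pile of failed-run alerts.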

Frequently asked questions

Do I need a LinkedIn account to use the actor?

No. The LinkedIn Post Scraper reads only the public embed view of each post — the same page LinkedIn serves to logged-out visitors. You do not need a LinkedIn account, cookies, an OAuth token, or a developer key. You only need an Apify API token to call the actor.

Which LinkedIn post URL formats does the actor accept?

Four formats: the canonical /posts/{user}_{slug}-activity-{id}-{hash} share URL, the /feed/update/urn:li:activity:{id} permalink, /feed/update/urn:li:share:{id} for older posts, and bare URNs like urn:li:activity:{id}. Tracking query parameters are stripped automatically before the fetch.
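The actor strips tracking parameters itself, but normalizing the watchlist on your side lets you de-duplicate it before you pay for a run. A minimal sketch that maps the four formats above to a bare URN; to_urn and dedupe_watchlist are hypothetical helpers, and the regex assumes the numeric ID follows a literal activity or share marker as in the examples in this guide (other URN kinds are not handled).

```python
import re
from urllib.parse import urlsplit

# Matches "activity:123", "share:123" (URN style) and "-activity-123-" (share-URL slug style).
POST_ID_RE = re.compile(r"(activity|share)[:-](\d+)")


def to_urn(value: str):
    """Map a post URL or bare URN to 'urn:li:<kind>:<id>', or None if unrecognised."""
    value = value.strip()
    path = urlsplit(value).path or value  # drops ?query and #fragment
    m = POST_ID_RE.search(path)
    return f"urn:li:{m.group(1)}:{m.group(2)}" if m else None


def dedupe_watchlist(values):
    """Keep the first occurrence of each post, preserving order; drop unparseable inputs."""
    seen, out = set(), []
    for v in values:
        urn = to_urn(v)
        if urn and urn not in seen:
            seen.add(urn)
            out.append(urn)
    return out
```

Because the actor accepts bare URNs, the deduplicated list can be passed straight into postUrls.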

What fields are returned per post?

Up to 15 fields: url, urn, post_id, text (full body, not the truncated og:description), posted_relative, edited, author_name, author_headline, author_profile_url, author_profile_image, author_is_company, reactions_count, comments_count, reposts_count (currently null), and a media array with image and video URLs.

Can I get comment text or full timestamps?

Not from this actor. Comment bodies require a logged-in session and are intentionally out of scope. The posted_relative field returns LinkedIn's rendered string (1mo, 3d, 5h) rather than an ISO timestamp; if you need the exact UTC time, decode it from post_id, whose upper bits encode the creation time in Unix milliseconds.
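A minimal sketch of that decoding, assuming the widely used convention that shifting the numeric activity ID right by 22 bits (the "first 41 bits" rule used by public LinkedIn date-extractor tools) yields the creation time in Unix milliseconds. This is reverse-engineered behavior rather than a documented LinkedIn API, so spot-check it against a few posts whose dates you know; post_id_to_datetime is a hypothetical helper name.

```python
from datetime import datetime, timezone


def post_id_to_datetime(post_id) -> datetime:
    """Decode a LinkedIn activity ID into an approximate UTC creation time.

    Assumption: the bits above the lowest 22 hold Unix epoch milliseconds.
    """
    millis = int(post_id) >> 22
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


print(post_id_to_datetime("7193400851995021312").date())  # prints a date in May 2024
```

Storing this decoded value next to the posted_relative snapshot gives you a stable absolute timestamp that does not drift between scrapes.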

How does this compare to writing my own LinkedIn scraper?

LinkedIn changes its public embed markup every few quarters and pre-blocks many cloud and datacenter IP ranges. A DIY scraper means weeks of anti-bot work plus ongoing maintenance whenever the markup shifts. The actor is production-tested, transparently priced per result, and handles the IP and parsing problems for you.

What is the actor not good for?

It is a single-post extractor. It does not search LinkedIn by keyword, paginate a profile's full post history, or hydrate comments. For those, you need a separate sourcing step that produces post URLs — typically Google search, a curated newsroom feed, or a LinkedIn profile scraper that surfaces recent activity URNs.
