Scrape YouTube Transcripts for Content Research (2026 Guide)

Extract YouTube video transcripts and captions at scale using Thirdwatch's YouTube Transcripts Scraper. Full text, timestamps, multi-language. No API key.

May 26, 2026 · 6 min read · 1,294 words

Thirdwatch's YouTube Transcripts Scraper extracts transcripts and closed captions from YouTube videos at scale. It returns the full transcript text, per-segment timestamps, all available language tracks, and the caption type (manual vs auto-generated). No API key is required. It is built for content researchers, RAG pipeline builders, SEO strategists, and anyone who needs to turn video into searchable text programmatically.

Why scrape YouTube transcripts for content research

YouTube is the second-largest search engine globally, with over 800 million videos and 500+ hours of new content uploaded every minute. The vast majority of that knowledge is locked inside video — invisible to text search, unsearchable in spreadsheets, and impossible to analyze at scale without transcription.

Content researchers need transcript text for competitive analysis, topic mapping, keyword extraction, and audience sentiment studies. A product marketing team tracking how competitors position their features on YouTube cannot watch 200 videos manually. An SEO strategist analyzing what topics a niche covers on video cannot ctrl-F a video timeline. A research team building a corpus of expert interviews needs machine-readable text, not MP4 files. The blocker in every case: YouTube's official Data API captions endpoint requires OAuth and channel ownership, so there is no official route to transcripts of videos you do not own. Thirdwatch's scraper, which runs as an actor on Apify, fills that gap with a hosted, pay-per-result service that returns structured transcript data from any public video.

How does this compare to the alternatives?

Three paths exist for extracting YouTube transcripts programmatically:

Approach	Reliability	Setup time	Maintenance
DIY with youtube-transcript-api (Python)	Works but you manage proxy, consent cookies, retries	2-4 hours	You fix breakage
Official YouTube Data API (captions endpoint)	Only your own channel's videos (OAuth required)	1-2 hours	Google maintains
Thirdwatch YouTube Transcripts Scraper	Production-tested, handles errors + fallback	5 minutes	Thirdwatch maintains

The open-source youtube-transcript-api library works for small batches of 10-50 videos, but at scale you need proxy rotation, consent cookie management (the EU GDPR consent banner blocks unauthenticated requests from European IPs), and retry logic for rate limits and region locks. The official YouTube Data API requires OAuth consent and only returns captions for channels you own — useless for competitive research. The Thirdwatch actor is a hosted drop-in that handles all of this and returns structured error records when videos lack captions.
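To make the DIY maintenance cost concrete, here is a minimal sketch of the retry layer a self-managed pipeline would need. The fetch argument is a hypothetical placeholder for whatever function you use to download one transcript; it is not part of any library's API, and the delay values are illustrative assumptions rather than tuned recommendations.

```python
import random
import time


def fetch_with_retries(fetch, video_id, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(video_id), retrying failures with exponential backoff and jitter.

    `fetch` is a hypothetical callable supplied by you; `sleep` is injectable
    so the helper can be exercised without real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(video_id)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last error
            # Wait base_delay, then 2x, 4x, ... plus up to 25% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.25))
```

Proxy rotation and consent-cookie handling would sit inside fetch itself, which is where most of the ongoing breakage tends to happen.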

How to scrape YouTube transcripts in 4 steps

Step 1: How do I set up my Apify API token?

Sign up at apify.com (free tier available, no credit card required). Navigate to Settings, then Integrations, and copy your personal API token. All examples below assume it is stored in the APIFY_TOKEN environment variable:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: How do I extract transcripts from a list of videos?

Pass video URLs in videoUrls and set your preferred language with languageCode. The actor accepts youtube.com/watch?v=, youtu.be/, youtube.com/shorts/, and youtube.com/embed/ formats.

import os

import requests

ACTOR = "thirdwatch~youtube-transcripts-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "videoUrls": [
            "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
            "https://www.youtube.com/watch?v=9bZkp7q19f0",
            "https://www.youtube.com/watch?v=kJQP7kiw5Fk"
        ],
        "languageCode": "en",
        "preferManual": True,
        "includeTimestamps": True,
        "maxResults": 10,
    },
    timeout=300,
)
resp.raise_for_status()  # surface HTTP errors (bad token, bad input) before parsing
transcripts = resp.json()
print(f"Got {len(transcripts)} transcripts")
for t in transcripts:
    if "error" not in t:
        print(f"  {t['video_id']}: {t['segment_count']} segments, "
              f"{t['total_duration_seconds']:.0f}s, "
              f"lang={t['language_code']}")

Three URLs in, structured transcript records out. Videos without captions return a record with an error field instead of crashing your pipeline.
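If your source list mixes the four URL formats the actor accepts, it can help to normalize them to video IDs first, for example to deduplicate inputs before a run. The sketch below is a client-side helper of our own, not the actor's parsing logic; the function name is hypothetical and it only covers the four URL shapes listed above.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if the shape is not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Treat www. and m. hosts the same as the bare domain.
    if host.startswith("www."):
        host = host[4:]
    elif host.startswith("m."):
        host = host[2:]
    parts = [p for p in parsed.path.split("/") if p]
    if host == "youtu.be":
        return parts[0] if parts else None
    if host == "youtube.com":
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
            return parts[1]
    return None


urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
]
# Both URLs point at the same video, so one ID remains after deduplication.
unique_ids = sorted({extract_video_id(u) for u in urls})
```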

Step 3: How do I get transcripts using video IDs instead of URLs?

If you already have video IDs from the YouTube Scraper or another source, pass them directly via videoIds to skip URL parsing:

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "videoIds": ["dQw4w9WgXcQ", "9bZkp7q19f0", "kJQP7kiw5Fk"],
        "languageCode": "en",
        "preferManual": True,
        "includeTimestamps": False,
        "maxResults": 10,
    },
    timeout=300,
)
for t in resp.json():
    if "error" not in t:
        word_count = len(t["transcript_text"].split())
        print(f"  {t['video_id']}: {word_count} words")

Setting includeTimestamps to False drops the segments array and returns only the joined transcript_text — smaller payloads, ideal for feeding into a vector store or LLM.
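Before transcript_text goes into a vector store, it usually needs to be split into chunks. The following is a minimal sketch of word-based chunking with overlap; the function name and the chunk_size and overlap defaults are illustrative assumptions, and a real pipeline may prefer token-based or sentence-aware splitting.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping windows of whitespace-separated words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.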

Step 4: How do I get multilingual transcripts or auto-translations?

Set languageCode to any ISO 639-1 code. Enable includeAutoTranslate to fall back to YouTube's machine translation when the requested language is not natively available:

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "videoUrls": ["https://www.youtube.com/watch?v=9bZkp7q19f0"],
        "languageCode": "ja",
        "includeAutoTranslate": True,
        "preferManual": True,
        "includeTimestamps": True,
        "maxResults": 5,
    },
    timeout=300,
)
for t in resp.json():
    print(f"  Language: {t.get('language_name')}, "
          f"Auto-translated: {t.get('auto_translated')}")

The available_languages array in each record lists every caption track the video offers, so you can audit language coverage before requesting specific translations. This is particularly valuable for multilingual content research — you can identify which channels invest in manual translations versus relying on auto-generated captions.
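As a rough illustration of such an audit, the sketch below splits a record's caption tracks into manual and auto-generated language codes. It assumes each available_languages entry carries code and is_auto_generated keys, as in the sample output in this guide; the function name is our own.

```python
def summarize_tracks(record):
    """Group a record's caption tracks into manual and auto-generated language codes."""
    manual, auto = set(), set()
    for track in record.get("available_languages", []):
        target = auto if track.get("is_auto_generated") else manual
        target.add(track["code"])
    return {"manual": sorted(manual), "auto_generated": sorted(auto)}


record = {
    "available_languages": [
        {"code": "en", "name": "English", "is_auto_generated": False},
        {"code": "es", "name": "Spanish", "is_auto_generated": False},
        {"code": "en", "name": "English (auto-generated)", "is_auto_generated": True},
    ]
}
summary = summarize_tracks(record)
```

Running this across a channel's videos gives a quick view of which languages have human-written tracks.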

Sample output

A single transcript record looks like this. The transcript_text field is the full joined text ready for RAG ingestion, while segments provides per-line timing for subtitle overlays or chapter generation.

{
    "video_id": "dQw4w9WgXcQ",
    "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "language_code": "en",
    "language_name": "English",
    "is_auto_generated": false,
    "auto_translated": false,
    "available_languages": [
        {"code": "en", "name": "English", "is_auto_generated": false},
        {"code": "es", "name": "Spanish", "is_auto_generated": false},
        {"code": "en", "name": "English (auto-generated)", "is_auto_generated": true}
    ],
    "transcript_text": "We're no strangers to love You know the rules and so do I ...",
    "segment_count": 58,
    "total_duration_seconds": 212.48,
    "segments": [
        {"text": "We're no strangers to love", "start": 18.8, "duration": 7.0},
        {"text": "You know the rules and so do I", "start": 25.8, "duration": 3.5},
        {"text": "A full commitment's what I'm thinking of", "start": 29.3, "duration": 3.7}
    ],
    "data_source": "youtube_timedtext"
}

Key fields: is_auto_generated tells you whether the captions are human-written or machine-generated — critical for research that demands accuracy. available_languages lets you audit what tracks exist before requesting specific ones. total_duration_seconds helps filter by video length (skip 15-second clips, keep 20-minute deep dives).
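These fields combine into a simple research filter. The sketch below keeps only records that have a transcript, meet a minimum length, and (by default) come from manual captions. The function name and the 300-second default are illustrative assumptions, not actor settings.

```python
def keep_for_research(record, min_seconds=300, allow_auto=False):
    """Return True if a transcript record passes basic quality and length checks."""
    if "error" in record:
        return False  # no transcript was returned for this video
    if record.get("is_auto_generated") and not allow_auto:
        return False  # skip machine-generated captions unless explicitly allowed
    return record.get("total_duration_seconds", 0) >= min_seconds


records = [
    {"video_id": "a", "is_auto_generated": False, "total_duration_seconds": 1200.0},
    {"video_id": "b", "is_auto_generated": True, "total_duration_seconds": 1500.0},
    {"video_id": "c", "is_auto_generated": False, "total_duration_seconds": 15.0},
]
selected = [r["video_id"] for r in records if keep_for_research(r)]
```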

Common pitfalls

Three issues trip up production transcript pipelines.

Missing captions: not every YouTube video has captions enabled. Music videos, very new uploads, and some comedy content disable captions entirely. The actor returns a structured record with the error code no_captions_available rather than failing silently; filter these in your post-processing.

Auto-generated quality: YouTube's auto-captions are decent for clear speech but degrade sharply for accents, overlapping speakers, and domain-specific jargon. Leave preferManual set to true (the default) and treat is_auto_generated as a quality signal in downstream analysis.

Region and age restrictions: some videos are locked by country or require login for age verification. These return region_locked or age_restricted_or_login_required error codes.

Thirdwatch's actor handles rate limiting with automatic backoff, rotates proxies for reliability, and returns structured error records for every failure mode so your pipeline degrades gracefully instead of crashing.
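On the consuming side, one way to handle these records is to separate usable transcripts from failures and tally the failures by code. This is a minimal sketch that assumes a failed record carries its code in the error field, as the examples in this guide do; the function name is hypothetical.

```python
from collections import Counter


def split_results(records):
    """Separate usable transcripts from error records and count errors by code."""
    ok = [r for r in records if "error" not in r]
    failures = Counter(r["error"] for r in records if "error" in r)
    return ok, failures


records = [
    {"video_id": "a", "transcript_text": "hello world"},
    {"video_id": "b", "error": "no_captions_available"},
    {"video_id": "c", "error": "region_locked"},
    {"video_id": "d", "error": "no_captions_available"},
]
ok, failures = split_results(records)
```

Logging the tally per run makes it easy to spot when one failure mode suddenly dominates.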

Two additional considerations apply to production transcript pipelines.

Transcript length as a content signal: the total_duration_seconds and segment_count fields together reveal content density. A 20-minute video with 180 segments averages one caption segment every 6.7 seconds, which indicates fast-paced spoken content. A 20-minute video with 60 segments averages one segment every 20 seconds, indicating slower delivery with pauses, visuals, or music. For content research, filtering by segment density helps separate talking-head tutorials from music-heavy vlogs without watching the video.

Corpus deduplication across channels: popular topics produce near-identical transcripts across multiple channels (especially for news recaps, product reviews, and tutorial content). Before running NLP analysis on your corpus, deduplicate by computing pairwise cosine similarity on transcript text embeddings and removing near-duplicates above a 0.92 threshold. This prevents topic frequency distortion in keyword extraction and topic modeling downstream.
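Both considerations can be sketched in a few lines of plain Python. The helpers below are illustrative and the function names are our own: the first computes seconds per caption segment, and the second does a greedy near-duplicate pass over embeddings you have already computed with a model of your choice (no embedding model is assumed here). The greedy pass keeps the first transcript seen from each near-duplicate group, which is one simple policy among several.

```python
import math


def seconds_per_segment(record):
    """Average number of seconds covered by each caption segment."""
    return record["total_duration_seconds"] / record["segment_count"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(embeddings, threshold=0.92):
    """Return the video IDs kept after greedy near-duplicate removal.

    `embeddings` maps video ID -> vector; a video is dropped when its
    similarity to any already-kept video exceeds `threshold`.
    """
    kept = []
    for video_id, vector in embeddings.items():
        if all(cosine_similarity(vector, embeddings[k]) <= threshold for k in kept):
            kept.append(video_id)
    return kept
```

The pairwise comparison is quadratic in corpus size, so for large corpora an approximate nearest-neighbor index would be the more practical choice.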

Frequently asked questions

Can I scrape transcripts from YouTube Shorts?

Yes. YouTube Shorts use the same caption system as regular videos. Pass any youtube.com/shorts/ URL and the actor returns the transcript identically to a standard watch page, including timestamps and language tracks.

Do I need a YouTube API key to extract transcripts?

No. The actor uses YouTube's public caption endpoints directly. No OAuth, no Google Cloud project, no API key. You only need an Apify account token to run the actor.

What happens if a video has no captions?

The actor returns a structured error record with the video ID and error code no_captions_available. Your pipeline can filter or skip these cleanly without crashing.

Can I get transcripts in languages other than English?

Yes. Set the languageCode input to any ISO 639-1 code. The actor returns captions in that language if the video has them. Enable includeAutoTranslate for YouTube's machine-translated fallback.

How accurate are auto-generated captions?

YouTube's auto-captions are roughly 85-95% accurate for clear English speech. Accuracy drops for accents, music, and technical jargon. Set preferManual to true to prioritize human-written captions when available.
