Track IMDb Cast and Genre Trends Over Time (2026 Guide)
Analyze cast popularity, genre shifts, and director output using IMDb data. Python pipeline with Thirdwatch's IMDb Scraper for entertainment trend research.

Thirdwatch's IMDb Scraper returns cast lists, genres, director, ratings, votes, year, and runtime for any movie or TV show on IMDb. Use it to track cast popularity over time, analyze genre shifts across decades, and map director output patterns. No login, no API key. Built for entertainment researchers, talent agencies, content strategists, and data journalists covering the film industry.
Why track cast and genre trends on IMDb
The entertainment industry runs on pattern recognition. Which actors are consistently attached to high-rated projects? Which genres are growing in audience share? Which directors maintain quality across a large body of work? These questions drive casting decisions, greenlighting, content acquisition, and marketing budget allocation.
According to a 2024 PwC Global Entertainment and Media Outlook report, the global filmed entertainment market exceeds $100 billion annually, with streaming platforms now accounting for more than half of that spend. Platforms competing for subscribers need data-driven content strategies — and the raw material for those strategies is structured title metadata: who acted in what, when, in which genre, and how audiences rated it.
IMDb is the canonical source. Its cast credits are the most comprehensive public record of who worked on which project. Its genre taxonomy is standardized. Its ratings represent the largest public audience-sentiment dataset in entertainment. The Thirdwatch IMDb Scraper makes this data programmatically accessible for trend analysis.
How does this compare to the alternatives?
| Approach | Cast data | Genre taxonomy | Year range | Limitations |
|---|---|---|---|---|
| IMDb Non-Commercial Datasets | Names + roles (separate files) | Yes | Full catalog | No plot, no poster, requires join across 7 TSV files |
| OMDb API (free) | Limited at free tier | Yes | Full catalog | 1,000 calls/day, no bulk mode |
| Wikipedia scraping | Inconsistent formatting | No standard taxonomy | Variable | Parsing nightmare, no structured API |
| Thirdwatch IMDb Scraper | Top cast per title | Yes | Full catalog | 100 titles per run, top-billed cast only |
For trend analysis across hundreds of titles, the bulk datasets require downloading gigabytes and joining seven TSV files to reconstruct what the IMDb Scraper returns as a single JSON record. OMDb's daily cap makes it impractical for batch work. The scraper returns cast, genres, director, year, and rating in one flat record — ready for pandas.
How to track cast and genre trends in 4 steps
Step 1: How do I collect titles for trend analysis?
Build a dataset by searching for titles across the time periods you want to analyze. Include year-specific queries to ensure temporal coverage.
import os, requests, pandas as pd, time
ACTOR = "thirdwatch~imdb-scraper"
TOKEN = os.environ["APIFY_TOKEN"]
YEAR_QUERIES = [
"best movies 2020", "best movies 2021", "best movies 2022",
"best movies 2023", "best movies 2024", "best movies 2025",
"best TV shows 2023", "best TV shows 2024", "best TV shows 2025",
]
all_titles = []
for query in YEAR_QUERIES:
resp = requests.post(
f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
params={"token": TOKEN},
json={
"searchQuery": query,
"maxResults": 15,
},
timeout=300,
)
all_titles.extend(resp.json())
time.sleep(2)
df = pd.DataFrame(all_titles)
df = df.drop_duplicates(subset=["url"])
print(f"{len(df)} unique titles spanning {df['year'].nunique()} years")Nine queries at 15 results each yields roughly 100-130 unique titles after dedup, covering a six-year window for movies and three years for TV.
Step 2: How do I analyze which actors appear in the highest-rated projects?
Explode the cast array and compute per-actor statistics across your dataset.
# Explode cast into one row per actor-title pair
df["cast_list"] = df["cast"].apply(lambda c: c if isinstance(c, list) else [])
cast_df = df.explode("cast_list")
cast_df = cast_df[cast_df["cast_list"].str.len() > 0]
actor_stats = cast_df.groupby("cast_list").agg(
title_count=("title", "count"),
avg_rating=("rating", "mean"),
total_votes=("votes", "sum"),
genres=("genres", lambda x: [g for sublist in x for g in (sublist if isinstance(sublist, list) else [])]),
).reset_index()
actor_stats.columns = ["actor_name", "title_count", "avg_rating", "total_votes", "all_genres"]
# Filter to actors with 3+ titles for statistical significance
prolific = actor_stats[actor_stats["title_count"] >= 3].sort_values(
"avg_rating", ascending=False
)
print(prolific[["actor_name", "title_count", "avg_rating", "total_votes"]].head(15))This surfaces actors who consistently appear in well-rated projects — the "quality signal" that talent agencies and casting directors care about. An actor with five titles averaging 8.0+ is a different proposition than one with five titles averaging 6.2.
Step 3: How do I track genre popularity shifts over time?
Explode genres and compute per-year genre counts and average ratings.
df["genre_list"] = df["genres"].apply(lambda g: g if isinstance(g, list) else [])
genre_df = df.explode("genre_list")
genre_df["year_int"] = pd.to_numeric(genre_df["year"], errors="coerce")
genre_df = genre_df.dropna(subset=["year_int", "genre_list"])
genre_trends = genre_df.groupby(["year_int", "genre_list"]).agg(
count=("title", "count"),
avg_rating=("rating", "mean"),
avg_votes=("votes", "mean"),
).reset_index()
# Pivot to see genre share per year
genre_pivot = genre_trends.pivot_table(
index="year_int", columns="genre_list", values="count", fill_value=0
)
genre_share = genre_pivot.div(genre_pivot.sum(axis=1), axis=0) * 100
print("Genre share (%) by year:")
print(genre_share.round(1))Rising genre share (e.g., Sci-Fi going from 8% to 15% over three years) signals audience appetite shifts. Declining share with stable ratings means fewer titles but maintained quality — a niche opportunity. Declining share with falling ratings means the genre is losing both supply and audience interest.
Step 4: How do I map director output and consistency?
Group by director to identify who is most prolific and who maintains the highest quality floor.
director_stats = df.groupby("director").agg(
title_count=("title", "count"),
avg_rating=("rating", "mean"),
min_rating=("rating", "min"),
max_rating=("rating", "max"),
total_votes=("votes", "sum"),
years=("year", lambda x: sorted(x.unique().tolist())),
).reset_index()
director_stats["rating_range"] = director_stats["max_rating"] - director_stats["min_rating"]
# Prolific + consistent directors (3+ titles, small rating range)
consistent = director_stats[
(director_stats["title_count"] >= 3)
& (director_stats["rating_range"] <= 1.5)
].sort_values("avg_rating", ascending=False)
print("Most consistent prolific directors:")
print(consistent[["director", "title_count", "avg_rating",
"min_rating", "max_rating", "rating_range"]].head(10))A director with 4 titles, 8.1 average, and a 0.8 rating range is remarkably consistent. A director with 4 titles, 7.0 average, and a 3.5 range is volatile — high ceiling but unreliable. Content buyers use this distinction when evaluating attached talent for greenlighting decisions.
Sample output
Two records showing the fields relevant for trend analysis:
[
{
"title": "Dune: Part Two",
"year": "2024",
"rating": 8.5,
"votes": 650000,
"genres": ["Action", "Adventure", "Drama"],
"director": "Denis Villeneuve",
"cast": ["Timothee Chalamet", "Zendaya", "Austin Butler"],
"plot": "Paul Atreides unites with the Fremen to seek revenge against the conspirators...",
"runtime": 166,
"content_rating": "PG-13",
"url": "https://www.imdb.com/title/tt15239678/"
},
{
"title": "The Bear",
"year": "2022",
"rating": 8.6,
"votes": 120000,
"genres": ["Comedy", "Drama"],
"director": "Christopher Storer",
"cast": ["Jeremy Allen White", "Ebon Moss-Bachrach", "Ayo Edebiri"],
"plot": "A young chef from the fine dining world returns to Chicago to run...",
"runtime": null,
"content_rating": "TV-MA",
"url": "https://www.imdb.com/title/tt14452776/"
}
]The cast array is ordered by billing — the first name is the lead. For co-star analysis, pair the first two names in each cast list. genres is a clean array for multi-genre titles. year combined with rating and genres gives you the basic time-series grain for trend analysis. Note that runtime may be null for TV series — filter by content_rating prefix ("TV-" vs. MPAA codes) to separate movies from shows in your analysis.
Common pitfalls
Three issues affect cast and genre trend analysis. Cast list truncation — the actor returns top-billed cast members, not the full credits. Supporting actors and cameos are not included. For deep cast-network analysis, you may need to supplement with the IMDb full credits page or the Non-Commercial Datasets' name.basics.tsv. Genre multi-tagging — most titles have 2-3 genres, which inflates genre counts when you explode the array. A title tagged "Action, Adventure, Sci-Fi" counts once in each genre. Normalize by dividing each genre's count by the title's total genre count if you need a proportional share. Year ambiguity for TV shows — the year field for a TV series reflects the premiere year, not the run period. A show that premiered in 2019 and ran through 2024 shows as "2019" in the data. For temporal analysis of TV, you may need to manually assign date ranges.
Thirdwatch's actor handles the extraction and returns consistent JSON across movies and TV shows, so your trend pipeline stays clean.
Related use cases
- Scrape IMDb movies and TV ratings for data research
- Build a movie recommendation dataset with IMDb
- Monitor IMDb ratings for entertainment industry
- Build a YouTube content trend pipeline
- Track Pinterest board velocity for niche discovery
- The complete guide to scraping media and entertainment data
- All Thirdwatch use-case guides
Frequently asked questions
Does the actor return box-office revenue data?
No. IMDb displays box-office data on title pages but the actor extracts ratings, cast, genres, plot, runtime, and content rating. For box-office figures, pair IMDb metadata with a dedicated box-office source like Box Office Mojo or The Numbers.
How far back can I analyze cast and genre trends?
As far back as IMDb has data — titles from the 1920s onward are indexed. Older titles may have incomplete cast or runtime fields, but genres and ratings are typically present.
Can I track which actors appear together most often?
Yes. The cast array per title lets you build a co-occurrence matrix. Actors who appear in the same cast list across multiple titles form strong co-star clusters — useful for casting analytics and talent network analysis.
Does the actor support filtering by year or genre?
The actor searches IMDb by query string or accepts direct URLs. To filter by year, include the year in your searchQuery (e.g., 'best movies 2024') or curate IMDb URLs from IMDb's own advanced search and pass them via the urls field.
Related
100 free credits, no credit card.
About 30 real searches. Add the MCP to Claude or Cursor in two minutes.