Scrape TripAdvisor Hotels and Attractions (2026)

Pull TripAdvisor hotels + attractions + restaurants using Thirdwatch. Reviews + ratings + photos for travel research.

Apr 28, 2026 · 5 min read · 1,233 words

Thirdwatch's TripAdvisor Scraper returns hotels + restaurants + attractions data — name, address, rating, review count, ranking, photos, amenities, reviews. Built for travel-research teams, hospitality reputation-monitoring, attraction-discovery products, and travel-content publishing.

Why scrape TripAdvisor for travel research

TripAdvisor is the largest travel-review platform globally. According to TripAdvisor's 2024 annual report, the platform serves 460M+ monthly travelers across 8M+ businesses + attractions, with the deepest travel-review corpus on the public web (1B+ reviews). For travel-research teams, hospitality competitive analysis, and attraction-discovery products, TripAdvisor is materially deeper than Booking.com (booking-focused) or Google Maps (general-purpose).

The job-to-be-done is structured. A travel-content publisher mines TripAdvisor for editorial city-guides + attraction roundups. A hospitality reputation-monitoring function tracks per-hotel rating + rank drift weekly across competitor sets. An attraction-discovery product surfaces top-ranked attractions per city for travel-app users. A hospitality-investment research function studies per-market hotel + restaurant density × rating distributions. All reduce to city + category queries + per-business detail extraction.

How does this compare to the alternatives?

Three options for TripAdvisor data:

| Approach | Cost | Reliability | Setup time | Maintenance |
|---|---|---|---|---|
| TripAdvisor Content API | $25K+/year (partnership) | Official | Weeks (approval) | Strict TOS |
| Reputation.com / Birdeye | $5K–$50K/year per seat | Multi-platform | Days | Vendor contract |
| Thirdwatch TripAdvisor Scraper | Pay per result | Production-grade anti-bot handling | 5 minutes | Thirdwatch tracks TripAdvisor changes |

TripAdvisor's Content API is gated behind $25K+ partnerships. The Thirdwatch TripAdvisor Scraper returns the raw data on pay-per-result pricing, the lowest unit cost of the three options.

How to scrape TripAdvisor in 4 steps

Step 1: How do I authenticate against Apify?

Sign in at apify.com (free tier, no credit card), open Settings → Integrations, and copy your personal API token:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: How do I pull a city + category batch?

Pass one query per city × category pair.

import os, requests, pandas as pd
from itertools import product

ACTOR = "thirdwatch~tripadvisor-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

CITIES = ["Paris", "Tokyo", "Bali", "New York", "Barcelona"]
CATEGORIES = ["hotels", "restaurants", "attractions"]

# One query per city × category pair (5 × 3 = 15 queries)
queries = [{"city": c, "category": cat} for c, cat in product(CITIES, CATEGORIES)]

# Synchronous run: starts the actor and returns its dataset items in one call
resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"queries": queries, "maxResults": 100},
    timeout=3600,
)
resp.raise_for_status()  # fail loudly on auth errors or a run that outlives the sync time limit
df = pd.DataFrame(resp.json())
print(f"{len(df)} businesses across {df.city.nunique()} cities × {df.category.nunique()} categories")

5 cities × 3 categories = 15 queries × 100 results = up to 1,500 records — well within budget for a weekly refresh at the actor's pay-per-result pricing.
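Per Apify's API reference, synchronous endpoints such as run-sync-get-dataset-items wait at most 300 seconds for a run to finish. If a batch outlives that, one option is Apify's asynchronous flow: start a run, poll its status, then read the run's default dataset. A minimal sketch, assuming the same actor input as above; `run_actor_async`, `is_terminal`, and the 30-second polling interval are illustrative choices, not part of Thirdwatch's documentation.

```python
import time

import requests

API = "https://api.apify.com/v2"
ACTOR = "thirdwatch~tripadvisor-scraper"

# Apify run statuses after which a run will not change again
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATUSES


def run_actor_async(run_input: dict, token: str, poll_seconds: int = 30) -> list:
    """Start an actor run, poll until it finishes, and return its dataset items."""
    auth = {"token": token}

    # 1. Start the run; the response wraps run metadata in a "data" object
    start = requests.post(f"{API}/acts/{ACTOR}/runs", params=auth, json=run_input, timeout=60)
    start.raise_for_status()
    run = start.json()["data"]

    # 2. Poll the run until it reaches a terminal status
    while not is_terminal(run["status"]):
        time.sleep(poll_seconds)
        status = requests.get(f"{API}/actor-runs/{run['id']}", params=auth, timeout=60)
        status.raise_for_status()
        run = status.json()["data"]

    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"run {run['id']} finished with status {run['status']}")

    # 3. Read everything the run wrote to its default dataset
    items = requests.get(
        f"{API}/datasets/{run['defaultDatasetId']}/items",
        params=auth,
        timeout=300,
    )
    items.raise_for_status()
    return items.json()
```

Usage with the Step 2 variables: `df = pd.DataFrame(run_actor_async({"queries": queries, "maxResults": 100}, TOKEN))`. The request timeouts stay short because no single HTTP call has to outlast the run.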

Step 3: How do I filter by quality + rank?

Filter to top-ranked per-city per-category cohorts.

df["rating"] = pd.to_numeric(df.rating, errors="coerce")
df["review_count"] = pd.to_numeric(df.review_count, errors="coerce")
df["city_rank"] = pd.to_numeric(df.city_rank, errors="coerce")

quality = df[
    (df.rating >= 4.5)
    & (df.review_count >= 200)
    & (df.city_rank <= 50)  # top 50 in city per category
].sort_values(["city", "category", "city_rank"])

print(f"{len(quality)} top-ranked, well-reviewed businesses")
print(quality[["name", "city", "category", "city_rank", "rating", "review_count"]].head(20))

Top-50 per city × per category cohort is the canonical "best of" content seed — used by travel-content publishers + tourism boards globally.
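The rating and review-count filters can leave some city × category lists with a handful of rows and others with dozens. If a fixed-size list (say, a top 10 per city and category) suits the content format better, one option is to keep the N best-ranked survivors per group. A minimal pandas sketch; `top_n_per_cohort` and the sample rows are hypothetical.

```python
import pandas as pd


def top_n_per_cohort(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Keep the n best-ranked rows per (city, category); a lower city_rank is better."""
    ranked = df.dropna(subset=["city_rank"]).sort_values(["city", "category", "city_rank"])
    # groupby(...).head(n) keeps the first n rows of each group in the current sort order
    return ranked.groupby(["city", "category"], sort=False).head(n).reset_index(drop=True)


# Hypothetical rows: the Tokyo row has no rank and is dropped
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "city": ["Paris", "Paris", "Paris", "Tokyo"],
    "category": ["hotels", "hotels", "hotels", "hotels"],
    "city_rank": [3, 1, 2, None],
})
top2 = top_n_per_cohort(sample, n=2)
print(top2[["name", "city", "city_rank"]])
```

Applied to `quality` from the step above, `top_n_per_cohort(quality, n=10)` would produce a top-10 list per city and category.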

Step 4: How do I track rank drift over time?

Persist (business_id, city, category, rank, snapshot_date) tuples.

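import datetime, pathlib

snap_dir = pathlib.Path("snapshots")
snap_dir.mkdir(parents=True, exist_ok=True)
ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d")
out = snap_dir / f"tripadvisor-{ts}.json"

# Earlier snapshots, oldest first (YYYYMMDD file names sort chronologically)
earlier = sorted(p for p in snap_dir.glob("tripadvisor-*.json") if p.name < out.name)

cols = ["business_id", "name", "city", "category", "city_rank", "rating", "review_count"]
df[cols].to_json(out, orient="records")

# Compare to the most recent earlier snapshot (last week's on a weekly cadence)
if earlier:
    prev = pd.read_json(earlier[-1], orient="records", dtype={"business_id": str})
    combined = df[cols].merge(prev[["business_id", "city_rank"]],
                              on="business_id", suffixes=("", "_prev"))
    # Positive delta = the rank number grew = the business fell in the list
    combined["rank_delta"] = combined.city_rank - combined.city_rank_prev

    drops = combined[combined.rank_delta >= 10].sort_values("rank_delta", ascending=False)
    print(f"{len(drops)} businesses dropped 10+ ranks since {earlier[-1].name}")
    print(drops[["name", "city", "city_rank_prev", "city_rank", "rank_delta"]].head(15))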

Rank drops of 10+ positions in 7 days warrant investigation — either real reputation events or coordinated review attacks.

Sample output

A single TripAdvisor business record looks like this. Five rows weigh ~12 KB.

{
  "business_id": "d12345-Hotel_Le_Bristol_Paris",
  "name": "Hotel Le Bristol Paris",
  "category": "Hotel",
  "address": "112 rue du Faubourg Saint-Honoré, 75008 Paris, France",
  "city": "Paris",
  "country": "France",
  "city_rank": 7,
  "city_rank_total": 1834,
  "rating": 4.8,
  "review_count": 2450,
  "price_band": "$$$$",
  "amenities": ["Pool", "Spa", "Pet-friendly", "Restaurant", "24h Concierge"],
  "lat": 48.8716,
  "lng": 2.3175,
  "photos": ["https://media-cdn.tripadvisor.com/..."],
  "url": "https://www.tripadvisor.com/Hotel_Review-..."
}

city_rank ("ranked #7 of 1,834 hotels in Paris") is TripAdvisor's killer per-city positioning signal. price_band ($-$$$$) enables market-segment filtering. category distinguishes hotels vs restaurants vs attractions for analysis cohort segmentation.
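Converting price_band to a number makes segment filters straightforward. This sketch assumes price_band is a single run of dollar signs, as in the sample record; any other format (a range such as "$$ - $$$", or a missing value) maps to NaN rather than being guessed. The segment labels are hypothetical.

```python
import pandas as pd

# Hypothetical labels for the four price levels
SEGMENTS = {1: "budget", 2: "mid-range", 3: "upscale", 4: "luxury"}


def price_level(price_band: pd.Series) -> pd.Series:
    """Map "$".."$$$$" to 1..4; anything that is not only dollar signs becomes NaN."""
    only_dollars = price_band.str.fullmatch(r"\$+", na=False)
    return price_band.where(only_dollars).str.len()


bands = pd.Series(["$", "$$$$", "$$ - $$$", None])
levels = price_level(bands)
segments = levels.map(SEGMENTS)
print(pd.DataFrame({"price_band": bands, "level": levels, "segment": segments}))
```

On the scraped data, `df[price_level(df.price_band) >= 3]` would keep only upscale and luxury listings.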

Common pitfalls

Several things go wrong in TripAdvisor pipelines.

Reviewer-language variance. TripAdvisor reviews appear in 30+ languages; for English-only sentiment analysis, filter by review_language: "en" (about 40-60% of reviews, depending on city).

Owner-response bias. Businesses with engaged owner-response programs see ratings 0.2-0.4 stars higher than non-responders; for an accurate quality assessment, supplement the star rating with an owner-response-rate metric.

Rank volatility in low-volume cities. Small cities (under 100 listed businesses per category) see noisier rank shifts. Apply a minimum-volume threshold (200+ businesses) before treating rank drift as signal.
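The language and volume pitfalls translate into simple pre-filters. A minimal sketch, assuming review rows carry the review_language field mentioned above and that the volume threshold is checked against city_rank_total (the "of 1,834" in the sample record); the function names and sample rows are hypothetical.

```python
import pandas as pd

MIN_LIST_SIZE = 200  # minimum ranking-list size before rank drift counts as signal


def english_only(reviews: pd.DataFrame) -> pd.DataFrame:
    """Keep reviews tagged as English for English-only sentiment models."""
    return reviews[reviews["review_language"] == "en"]


def drift_eligible(businesses: pd.DataFrame, min_size: int = MIN_LIST_SIZE) -> pd.DataFrame:
    """Drop businesses whose ranking list is too small for rank shifts to be meaningful."""
    return businesses[businesses["city_rank_total"] >= min_size]


reviews = pd.DataFrame({"review_language": ["en", "fr", "en", "ja"], "text": ["a", "b", "c", "d"]})
businesses = pd.DataFrame({
    "name": ["Big-city hotel", "Small-town hotel"],
    "city_rank_total": [1834, 24],
})
print(len(english_only(reviews)), "English reviews")
print(drift_eligible(businesses)["name"].tolist())
```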

Thirdwatch's actor handles the anti-bot work and proxy rotation so you can focus on the data. Pair TripAdvisor with Booking.com Scraper for OTA-pricing depth and Google Maps Scraper for general business context.

Three more subtle issues are worth flagging.

Travelers' Choice inflation. TripAdvisor's "Travelers' Choice" award badging materially inflates rating stability: award winners see 50% lower rating volatility than peers with similar review volume, because award status creates a self-reinforcing positive-review bias. For accurate competitive research, normalize ratings by award tier rather than treating all 4.5+ ratings as equivalent.

Density-dependent ranks. Per-city ranking depends heavily on per-category business density: being #7 of 1,834 Paris hotels is meaningfully different from #7 of 24 in a small-town hotel list. For cross-city ranking research, normalize to a percentile rank within the city × category total.

Moderation lag. TripAdvisor moderates reviews more aggressively than Google Maps; about 8-12% of submitted reviews are removed for policy violations within 30 days. Apparent rating "improvements" can therefore lag actual sentiment by a moderation cycle. Cross-reference with same-period booking-volume data before interpreting them.
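One way to apply the percentile normalization: divide city_rank by city_rank_total, both present in the sample record, so that lower values mean closer to the top of the list. This is a sketch of that single ratio; it does not adjust for award tier or moderation lag, and the sample rows are hypothetical.

```python
import pandas as pd


def rank_percentile(df: pd.DataFrame) -> pd.Series:
    """Share of the city × category list at or above this business (lower is better)."""
    return df["city_rank"] / df["city_rank_total"]


ranks = pd.DataFrame({
    "name": ["Hotel in Paris", "Hotel in a small town"],
    "city_rank": [7, 7],
    "city_rank_total": [1834, 24],
})
ranks["rank_pct"] = rank_percentile(ranks)
print(ranks)
```

Both hotels are #7, but the Paris hotel sits in roughly the top 0.4% of its list and the small-town hotel in roughly the top 29%.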

Operational best practices for production pipelines

Tier the cadence to match signal half-life. Hotel and restaurant ratings drift slowly, so daily polling is over-frequent. Split the watchlist into Tier 1 (active reputation monitoring, weekly), Tier 2 (broader competitor set, monthly), and Tier 3 (long-tail research, quarterly). Tiering typically cuts scrape cost by 60-80% with negligible signal loss.
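A tiered schedule reduces to a per-tier interval and a "due" check. This sketch approximates "quarterly" as 91 days and runs the cost arithmetic on a hypothetical 2,000-business watchlist; the tier split is illustrative, not measured.

```python
import datetime as dt

# Polling interval per tier in days; "quarterly" is approximated as 91 days
TIER_INTERVAL_DAYS = {1: 7, 2: 30, 3: 91}


def is_due(tier: int, last_scraped: dt.date, today: dt.date) -> bool:
    """True when a business's tier interval has elapsed since its last scrape."""
    return (today - last_scraped).days >= TIER_INTERVAL_DAYS[tier]


def scrapes_per_year(tier_counts: dict) -> float:
    """Expected scrapes per year for {tier: number_of_businesses}."""
    return sum(n * 365 / TIER_INTERVAL_DAYS[tier] for tier, n in tier_counts.items())


# Hypothetical watchlist of 2,000 businesses: tiered vs. everything weekly
tiered = scrapes_per_year({1: 300, 2: 700, 3: 1000})
all_weekly = scrapes_per_year({1: 2000})
reduction = 1 - tiered / all_weekly
print(f"{tiered:,.0f} vs {all_weekly:,.0f} scrapes/year ({reduction:.0%} fewer)")
print(is_due(1, dt.date(2026, 4, 21), dt.date(2026, 4, 28)))
```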

Snapshot raw payloads alongside derived fields. Pipeline cost is dominated by scrape volume, not storage. Persisting raw JSON snapshots lets you re-derive metrics without re-scraping when sentiment models or category classifiers evolve. Compress with gzip at write time (4-8x size reduction). A common retention policy is 90 days of raw snapshots, 12 months of derived per-record aggregates, and indefinite retention of derived metric time series.
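The raw-snapshot tier of that policy can be sketched with the standard library alone: gzip-compressed JSON on write, and a pruning pass that deletes raw files past 90 days. The raw-YYYYMMDD.json.gz file naming is a hypothetical convention, and the derived aggregates are out of scope here.

```python
import datetime as dt
import gzip
import json
import pathlib
import tempfile


def write_raw_snapshot(records: list, path: pathlib.Path) -> None:
    """Write raw records as gzip-compressed JSON."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False)


def read_raw_snapshot(path: pathlib.Path) -> list:
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def prune_raw(snap_dir: pathlib.Path, today: dt.date, keep_days: int = 90) -> list:
    """Delete raw-YYYYMMDD.json.gz files older than keep_days; return the deleted names."""
    removed = []
    for p in sorted(snap_dir.glob("raw-*.json.gz")):
        day = dt.datetime.strptime(p.name[4:12], "%Y%m%d").date()
        if (today - day).days > keep_days:
            p.unlink()
            removed.append(p.name)
    return removed


# Demo in a throwaway directory
snap_dir = pathlib.Path(tempfile.mkdtemp())
record = {"business_id": "d12345-Hotel_Le_Bristol_Paris", "address": "112 rue du Faubourg Saint-Honoré"}
write_raw_snapshot([record], snap_dir / "raw-20260101.json.gz")
write_raw_snapshot([record], snap_dir / "raw-20260428.json.gz")
restored = read_raw_snapshot(snap_dir / "raw-20260428.json.gz")
removed = prune_raw(snap_dir, today=dt.date(2026, 4, 28))
print(restored == [record], removed)
```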

Validate field coverage daily. Run a validation suite that asserts each scraper returns its expected core fields with non-null rates above 80% for required fields and 50% for optional ones. TripAdvisor's page structure changes occasionally; catch schema drift early, before downstream consumers degrade silently.

Alert on cross-snapshot diffs. Beyond detecting individual rating drops, build alerts on field-level diffs between snapshots: owner-response status changes, category re-classifications, name changes, ownership transfers. These structural changes precede or follow material brand events (acquisitions, rebrands, regulatory issues) and are leading indicators of category-level disruption. Persist a structured diff log alongside aggregate snapshots: for each business and each scrape, persist (field, old_value, new_value) tuples. Surface high-leverage diffs (name changes, category re-classifications, owner-response policy shifts) to human reviewers; low-leverage diffs (single-review additions, minor count updates) stay in the audit log.
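Both checks fit in a few lines of pandas. A minimal sketch: the required/optional field split and the HIGH_LEVERAGE set are hypothetical choices based on the sample record (an owner-response field would join HIGH_LEVERAGE if the scrape returns one), and the demo rows are invented.

```python
import pandas as pd

# Hypothetical split of the sample record's fields into required and optional
REQUIRED = ["business_id", "name", "city", "rating", "review_count", "city_rank"]
OPTIONAL = ["price_band", "amenities", "photos", "lat", "lng"]
# Diffs on these fields go to human review; everything else stays in the audit log
HIGH_LEVERAGE = {"name", "category"}


def validate(df: pd.DataFrame, req_min: float = 0.8, opt_min: float = 0.5) -> list:
    """Return one message per field whose non-null rate is below its floor."""
    if df.empty:
        return ["snapshot is empty"]
    problems = []
    for fields, floor in ((REQUIRED, req_min), (OPTIONAL, opt_min)):
        for col in fields:
            rate = df[col].notna().mean() if col in df.columns else 0.0
            if rate < floor:
                problems.append(f"{col}: non-null rate {rate:.0%} below {floor:.0%}")
    return problems


def field_diffs(prev: dict, curr: dict, fields) -> list:
    """(field, old_value, new_value) tuples for every field that changed."""
    return [(f, prev.get(f), curr.get(f)) for f in fields if prev.get(f) != curr.get(f)]


snapshot = pd.DataFrame({
    "business_id": ["d1", "d2"], "name": ["Hotel A", "Hotel B"], "city": ["Paris", "Paris"],
    "rating": [None, None], "review_count": [120, 80], "city_rank": [4, 9],
    "price_band": ["$$$", None], "amenities": [["Spa"], []], "photos": [[], []],
    "lat": [48.87, 48.86], "lng": [2.31, 2.35],
})
problems = validate(snapshot)

prev = {"business_id": "d1", "name": "Hotel A", "category": "Hotel", "review_count": 120}
curr = {"business_id": "d1", "name": "Hotel A Paris", "category": "Hotel", "review_count": 121}
diffs = field_diffs(prev, curr, ["name", "category", "review_count"])
escalate = [d for d in diffs if d[0] in HIGH_LEVERAGE]
print(problems)
print(escalate)
```

Here the all-null rating column fails validation, the name change is escalated, and the review-count bump stays in the audit log.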

Frequently asked questions

Why TripAdvisor for travel research?

TripAdvisor is the world's largest travel-review platform — 1B+ reviews across 8M+ businesses (hotels, restaurants, attractions, activities). According to TripAdvisor's 2024 report, the platform reaches 460M+ monthly travelers globally with deeper review-text richness than Booking.com or Google Maps for travel-specific decisions. For travel-content research, hospitality competitive analysis, and attraction-discovery products, TripAdvisor is essential.

What data does the actor return?

Per business: name, address, city, country, category (hotel/restaurant/attraction), rating, review count, ranking within city, photos, amenities, hours, price band, lat/lng. Per review (when scraped separately): rating, title, text, reviewer name + country, review date, helpful-count, owner-response. Roughly 90% or more of active TripAdvisor businesses have comprehensive metadata.

How does TripAdvisor handle anti-bot defenses?

TripAdvisor uses aggressive anti-bot protection, including custom defenses. Thirdwatch's actor uses production-grade anti-bot handling with stealth-browser bypass and is production-tested at sustained weekly volumes with a 90%+ success rate. Sustained polling rate: 50-100 detail pages per hour per proxy IP.
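Those per-IP rates make crawl time easy to estimate. A back-of-envelope helper, assuming pages are spread evenly across IPs; the IP count is a hypothetical input, since the actor manages its own proxies, and the 1,500-page figure assumes every record in the Step 2 batch needs its own detail page.

```python
def crawl_hours(pages: int, ips: int, pages_per_hour_per_ip: float) -> float:
    """Wall-clock hours to fetch `pages` detail pages spread evenly over `ips` proxy IPs."""
    return pages / (ips * pages_per_hour_per_ip)


# 1,500 detail pages over 5 IPs at the low and high ends of the 50-100 pages/hour range
slow = crawl_hours(1500, ips=5, pages_per_hour_per_ip=50)
fast = crawl_hours(1500, ips=5, pages_per_hour_per_ip=100)
print(f"{fast:.1f}-{slow:.1f} hours")
```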

Can I track hotel rankings within a city?

Yes. TripAdvisor maintains per-city per-category rankings (`#3 of 240 hotels in Paris`). Snapshot weekly + persist (city, category, hotel, rank) tuples; alert on rank-shift events. A hotel moving from #5 to #25 in 4 weeks correlates with material rating drops or coordinated negative-review events. For hospitality reputation tracking, rank-drift is the canonical alert signal.
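The #5-to-#25 example suggests a windowed alert: compare each business's latest rank with its rank four weekly snapshots earlier. A sketch, assuming snapshots are stored in long format (business_id, snapshot_date, city_rank) and taken on the same weekday each week, so the window's start date matches exactly; the 20-position threshold mirrors the example and is not a recommended value.

```python
import pandas as pd


def rank_shift_over_window(history: pd.DataFrame, weeks: int = 4, threshold: int = 20) -> pd.DataFrame:
    """Flag businesses whose city_rank worsened by >= threshold over the last `weeks` weeks."""
    h = history.assign(snapshot_date=pd.to_datetime(history["snapshot_date"]))
    latest_date = h["snapshot_date"].max()
    start_date = latest_date - pd.Timedelta(weeks=weeks)

    start = h[h["snapshot_date"] == start_date][["business_id", "city_rank"]]
    end = h[h["snapshot_date"] == latest_date][["business_id", "city_rank"]]
    merged = start.merge(end, on="business_id", suffixes=("_start", "_end"))

    # Positive shift = the rank number grew = the business fell in the list
    merged["rank_shift"] = merged["city_rank_end"] - merged["city_rank_start"]
    return merged[merged["rank_shift"] >= threshold].sort_values("rank_shift", ascending=False)


# Hypothetical history: d1 falls from #5 to #25, d2 barely moves
history = pd.DataFrame({
    "business_id": ["d1", "d2", "d1", "d2"],
    "snapshot_date": ["2026-03-31", "2026-03-31", "2026-04-28", "2026-04-28"],
    "city_rank": [5, 10, 25, 12],
})
alerts = rank_shift_over_window(history)
print(alerts)
```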

How fresh do TripAdvisor signals need to be?

For active hospitality reputation-monitoring, weekly cadence catches rating + rank drift. For competitive-research benchmarking, monthly is sufficient. For longitudinal trajectory analysis, quarterly snapshots produce stable trend data. Most active TripAdvisor businesses see 5-20 new reviews per month; daily cadence is over-frequent for most use cases.

How does this compare to TripAdvisor's Content API?

TripAdvisor's Content API is gated behind enterprise partnership ($25K+/year minimums). The actor delivers similar coverage at competitive pay-per-result pricing without partnership gatekeeping. For ad-revenue-driven products requiring TripAdvisor's official content + branding, the API path is required. For research + monitoring (no branding requirements), the actor is materially cheaper.
