Scrape TripAdvisor Hotels and Attractions (2026)

Thirdwatch's TripAdvisor Scraper returns hotels + restaurants + attractions data at $0.008 per record — name, address, rating, review count, ranking, photos, amenities, reviews. Built for travel-research teams, hospitality reputation-monitoring, attraction-discovery products, and travel-content publishing.

Why scrape TripAdvisor for travel research

TripAdvisor is the largest travel-review platform globally. According to TripAdvisor's 2024 Annual report, the platform serves 460M+ monthly travelers across 8M+ businesses + attractions with the deepest travel-review corpus on the public web (1B+ reviews). For travel-research teams, hospitality competitive analysis, and attraction-discovery products, TripAdvisor is materially deeper than Booking (booking-focused) or Google Maps (general-purpose).

The job-to-be-done is structured. A travel-content publisher mines TripAdvisor for editorial city-guides + attraction roundups. A hospitality reputation-monitoring function tracks per-hotel rating + rank drift weekly across competitor sets. An attraction-discovery product surfaces top-ranked attractions per city for travel-app users. A hospitality-investment research function studies per-market hotel + restaurant density × rating distributions. All reduce to city + category queries + per-business detail extraction.

How does this compare to the alternatives?

Three options for TripAdvisor data:

Approach Cost per 10K records Reliability Setup time Maintenance
TripAdvisor Content API $25K+/year (partnership) Official Weeks (approval) Strict TOS
Reputation.com / Birdeye $5K–$50K/year per seat Multi-platform Days Vendor contract
Thirdwatch TripAdvisor Scraper $80 ($0.008 × 10K) Camoufox + residential proxy 5 minutes Thirdwatch tracks TripAdvisor changes

TripAdvisor's Content API is gated behind $25K+ partnerships. The TripAdvisor Scraper actor page gives you raw data at the lowest unit cost.

How to scrape TripAdvisor in 4 steps

Step 1: How do I authenticate against Apify?

Sign in at apify.com (free tier, no credit card), open Settings → Integrations, and copy your personal API token:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: How do I pull a city + category batch?

Pass city + category queries.

import os, requests, pandas as pd
from itertools import product

ACTOR = "thirdwatch~tripadvisor-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

CITIES = ["Paris", "Tokyo", "Bali", "New York", "Barcelona"]
CATEGORIES = ["hotels", "restaurants", "attractions"]

queries = [{"city": c, "category": cat} for c, cat in product(CITIES, CATEGORIES)]

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"queries": queries, "maxResults": 100},
    timeout=3600,
)
df = pd.DataFrame(resp.json())
print(f"{len(df)} businesses across {df.city.nunique()} cities × {df.category.nunique()} categories")

5 cities × 3 categories = 15 queries × 100 results = up to 1,500 records, costing $12.

Step 3: How do I filter by quality + rank?

Filter to top-ranked per-city per-category cohorts.

df["rating"] = pd.to_numeric(df.rating, errors="coerce")
df["review_count"] = pd.to_numeric(df.review_count, errors="coerce")
df["city_rank"] = pd.to_numeric(df.city_rank, errors="coerce")

quality = df[
    (df.rating >= 4.5)
    & (df.review_count >= 200)
    & (df.city_rank <= 50)  # top 50 in city per category
].sort_values(["city", "category", "city_rank"])

print(f"{len(quality)} top-ranked, well-reviewed businesses")
print(quality[["name", "city", "category", "city_rank", "rating", "review_count"]].head(20))

Top-50 per city × per category cohort is the canonical "best of" content seed — used by travel-content publishers + tourism boards globally.

Step 4: How do I track rank drift over time?

Persist (business_id, city, category, rank, snapshot_date) tuples.

import datetime, pathlib, json

ts = datetime.datetime.utcnow().strftime("%Y%m%d")
out = pathlib.Path(f"snapshots/tripadvisor-{ts}.json")
out.parent.mkdir(parents=True, exist_ok=True)
df[["business_id", "name", "city", "category", "city_rank",
    "rating", "review_count"]].to_json(out, orient="records")

# Compare to last week
prev = pd.read_json("snapshots/tripadvisor-20260421.json", orient="records")
combined = df.merge(prev, on="business_id", suffixes=("", "_prev"))
combined["rank_delta"] = combined.city_rank - combined.city_rank_prev

drops = combined[combined.rank_delta >= 10].sort_values("rank_delta", ascending=False)
print(f"{len(drops)} businesses dropped 10+ ranks over 7 days")
print(drops[["name", "city", "city_rank_prev", "city_rank", "rank_delta"]].head(15))

Rank drops of 10+ positions in 7 days warrant investigation — either real reputation events or coordinated review attacks.

Sample output

A single TripAdvisor business record looks like this. Five rows weigh ~12 KB.

{
  "business_id": "d12345-Hotel_Le_Bristol_Paris",
  "name": "Hotel Le Bristol Paris",
  "category": "Hotel",
  "address": "112 rue du Faubourg Saint-Honoré, 75008 Paris, France",
  "city": "Paris",
  "country": "France",
  "city_rank": 7,
  "city_rank_total": 1834,
  "rating": 4.8,
  "review_count": 2450,
  "price_band": "$$$$",
  "amenities": ["Pool", "Spa", "Pet-friendly", "Restaurant", "24h Concierge"],
  "lat": 48.8716,
  "lng": 2.3175,
  "photos": ["https://media-cdn.tripadvisor.com/..."],
  "url": "https://www.tripadvisor.com/Hotel_Review-..."
}

city_rank ("ranked #7 of 1,834 hotels in Paris") is TripAdvisor's killer per-city positioning signal. price_band ($-$$$$) enables market-segment filtering. category distinguishes hotels vs restaurants vs attractions for analysis cohort segmentation.

Common pitfalls

Three things go wrong in TripAdvisor pipelines. Reviewer-language variance — TripAdvisor reviews appear in 30+ languages; for English-only sentiment analysis, filter by review_language: "en" (about 40-60% of reviews depending on city). Owner-response bias — businesses with engaged owner-response programs see ratings 0.2-0.4 stars higher than non-responders; for accurate quality assessment, supplement star-rating with response-rate metric. Rank-volatility for low-volume cities — small cities (under 100 listed businesses per category) see noisier rank-shifts. Apply minimum-city-volume threshold (200+ businesses) before treating rank drift as signal.

Thirdwatch's actor uses Camoufox + residential proxy at $3.50/1K, ~56% margin. Pair TripAdvisor with Booking.com Scraper for OTA-pricing depth and Google Maps Scraper for general business context. A fourth subtle issue worth flagging: TripAdvisor's "Travelers' Choice" award badging materially inflates rating-stability — award-winners see rating-volatility 50% lower than peers despite similar review-volume, because award-status creates self-reinforcing positive-review bias. For accurate competitive research, normalize ratings by award-tier rather than treating all 4.5+ ratings as equivalent. A fifth pattern unique to TripAdvisor: per-city ranking depends heavily on per-category business density — being #7 of 1,834 in Paris hotels is meaningfully different from #7 of 24 in a small-town hotels list. For cross-city ranking-research, normalize by percentile-rank within city × category total. A sixth and final pitfall: TripAdvisor moderates reviews more aggressively than Google Maps — about 8-12% of submitted reviews are removed for policy violations within 30 days. Apparent rating "improvements" can lag actual sentiment by moderation cycle. Cross-reference with same-period booking-volume data for interpretation.

Operational best practices for production pipelines

Tier the cadence to match signal half-life. Hotel/restaurant rating drift moves slowly — daily polling is over-frequent. Tier the watchlist into Tier 1 (active reputation-monitoring, weekly), Tier 2 (broader competitor set, monthly), Tier 3 (long-tail research, quarterly). Typical 60-80% cost reduction with negligible signal loss.

Snapshot raw payloads alongside derived fields. Pipeline cost is dominated by scrape volume, not storage. Persisting raw JSON snapshots lets you re-derive metrics without re-scraping when sentiment models or category-classifiers evolve. Compress with gzip at write-time (4-8x size reduction). Most production pipelines run: 90 days of raw snapshots + 12 months of derived per-record aggregates + indefinite retention of derived metric time-series.

Run a daily validation suite that asserts each scraper returns expected core fields with non-null rates above 80% (required) and 50% (optional). TripAdvisor schema changes occasionally — catch drift early before downstream consumers degrade silently. A seventh and final operational pattern: cross-snapshot diff alerts. Beyond detecting individual rating drops, build alerts on cross-snapshot field-level diffs — owner-response status changes, category re-classifications, name changes, ownership transfers. These structural changes precede or follow material brand events (acquisitions, rebrands, regulatory issues) and are leading indicators of category-level disruption. Persist a structured-diff log alongside aggregate snapshots: for each business, for each scrape, persist (field, old_value, new_value) tuples. Surface high-leverage diffs (name changes, category re-classifications, owner-response policy shifts) to human reviewers; low-leverage diffs (single-review additions, minor count updates) stay in the audit log.

Related use cases

Frequently asked questions

Why TripAdvisor for travel research?

TripAdvisor is the world's largest travel-review platform — 1B+ reviews across 8M+ businesses (hotels, restaurants, attractions, activities). According to TripAdvisor's 2024 report, the platform reaches 460M+ monthly travelers globally with deeper review-text richness than Booking.com or Google Maps for travel-specific decisions. For travel-content research, hospitality competitive analysis, and attraction-discovery products, TripAdvisor is essential.

What data does the actor return?

Per business: name, address, city, country, category (hotel/restaurant/attraction), rating, review count, ranking within city, photos, amenities, hours, price band, lat/lng. Per review (when scraped separately): rating, title, text, reviewer name + country, review date, helpful-count, owner-response. About 90%+ of active TripAdvisor businesses have comprehensive metadata.

How does TripAdvisor handle anti-bot defenses?

TripAdvisor uses aggressive anti-bot (DataDome variants + custom). Thirdwatch's actor uses Camoufox + residential proxy with stealth-browser bypass. Production-tested at sustained weekly volumes with 90%+ success rate. Sustained polling rate: 50-100 detail-pages per hour per proxy IP.

Can I track hotel rankings within a city?

Yes. TripAdvisor maintains per-city per-category rankings (#3 of 240 hotels in Paris). Snapshot weekly + persist (city, category, hotel, rank) tuples; alert on rank-shift events. A hotel moving from #5 to #25 in 4 weeks correlates with material rating drops or coordinated negative-review events. For hospitality reputation tracking, rank-drift is the canonical alert signal.

How fresh do TripAdvisor signals need to be?

For active hospitality reputation-monitoring, weekly cadence catches rating + rank drift. For competitive-research benchmarking, monthly is sufficient. For longitudinal trajectory analysis, quarterly snapshots produce stable trend data. Most active TripAdvisor businesses see 5-20 new reviews per month; daily cadence is over-frequent for most use cases.

How does this compare to TripAdvisor's Content API?

TripAdvisor's Content API is gated behind enterprise partnership ($25K+/year minimums). The actor delivers similar coverage at $0.008/record without partnership gatekeeping. For ad-revenue-driven products requiring TripAdvisor's official content + branding, the API path is required. For research + monitoring (no branding requirements), the actor is materially cheaper.

Run the TripAdvisor Scraper on Apify Store — pay-per-record, free to try, no credit card to test.