Build a Hotel Price Comparison Database With Trip.com Data

Build a queryable hotel price comparison database using Trip.com / Ctrip data. Structured nightly rates, star class, ratings, and coordinates for 20+ cities.

May 26, 2026 · 6 min read · 1,392 words

Thirdwatch's Ctrip / Trip.com Hotel Scraper feeds a hotel price comparison database with structured nightly rates, star class, guest ratings, coordinates, and amenities from Trip.com across 20+ global cities. It is built for developers creating travel comparison products, founders prototyping hotel meta-search, and data engineers feeding pricing pipelines. Pay-per-result pricing keeps data acquisition costs proportional to your actual query volume.

Why build a hotel price database from Trip.com

Trip.com (Ctrip's international platform) lists over 1.2 million hotel properties globally, with particularly deep inventory in Asia-Pacific markets that Booking.com and Expedia underserve. For developers building hotel comparison products, Trip.com fills a critical coverage gap: a meta-search engine without Trip.com data misses the majority of bookable inventory in China, Southeast Asia, and Japan.

The builder's job-to-be-done is specific. A startup founder prototyping a hotel meta-search MVP needs structured pricing data from multiple OTAs without negotiating enterprise API contracts. A data engineer at a travel company needs to backfill their pricing warehouse with Asia-Pacific rates. A developer building a travel chatbot needs real hotel prices to ground LLM responses in actual availability. A freelance developer building a Telegram hotel-deal bot needs a reliable data source that returns clean JSON without parsing headaches.

All of these start with the same technical requirement: a programmatic way to query Trip.com by city and dates, receive structured JSON with price, stars, rating, coordinates, and amenities, and load it into a database. The scraper's output schema maps directly to a normalized hotel pricing table.

How does this compare to the alternatives?

Three paths to building a hotel price database:

| Approach | Reliability | Setup time | Maintenance |
| --- | --- | --- | --- |
| DIY Python + Playwright against Trip.com | Low: Trip.com uses dynamic session tokens and anti-bot scripts that break within days | 3-6 weeks to handle all edge cases | Weekly fixes as defences rotate |
| Generic scraping API (Bright Data, ScraperAPI) | Medium: handles rendering but returns raw HTML you must parse yourself | 1-2 weeks including parser development | You own the parsing layer |
| Thirdwatch Trip.com Scraper | High: returns structured JSON with 20+ fields per hotel | Under 1 hour to first database load | Thirdwatch maintains extraction logic |

For a comparison database, the critical factor is schema stability. DIY approaches break whenever Trip.com changes its client-side rendering. The Thirdwatch actor returns a consistent schema regardless of upstream changes, so your database ingestion pipeline stays stable.

How to build a hotel price comparison database in 5 steps

Step 1: How do I set up the Apify client?

Install the Python client and configure your API token. Sign up at apify.com for a free-tier account:

pip install apify-client
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Then create the client in Python:

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

Step 2: How do I define my city and date matrix?

Build a configuration that covers your comparison footprint. Each city-date combination becomes one actor run:

CITIES = [
    {"city": "Tokyo", "cityId": 228},
    {"city": "Singapore", "cityId": 73},
    {"city": "Bangkok"},
    {"city": "Paris", "cityId": 187},
    {"city": "Dubai"},
    {"city": "London"},
    {"city": "New York"},
]

DATE_PAIRS = [
    ("2026-07-01", "2026-07-03"),
    ("2026-08-15", "2026-08-17"),
    ("2026-11-01", "2026-11-03"),
]
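Because each city-date combination becomes one actor run, it can help to enumerate the matrix up front so you know how many runs a refresh will trigger. The following is a minimal sketch using a shortened copy of the configuration above; `build_search_matrix` is a hypothetical helper name, not part of the Apify client.

```python
from itertools import product

# Shortened copy of the configuration above, repeated so the sketch runs on its own.
CITIES = [
    {"city": "Tokyo", "cityId": 228},
    {"city": "Singapore", "cityId": 73},
    {"city": "Bangkok"},
]
DATE_PAIRS = [
    ("2026-07-01", "2026-07-03"),
    ("2026-08-15", "2026-08-17"),
]

def build_search_matrix(cities, date_pairs):
    """Return one (city_cfg, check_in, check_out) tuple per planned actor run."""
    return [(city, ci, co) for city, (ci, co) in product(cities, date_pairs)]

matrix = build_search_matrix(CITIES, DATE_PAIRS)
print(f"{len(matrix)} actor runs planned")
```

The run count grows multiplicatively with cities and date pairs, which matters under pay-per-result pricing.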

Step 3: How do I run searches and collect structured data?

Iterate over your matrix, call the actor, and accumulate results with snapshot metadata:

import datetime

all_hotels = []
snapshot_date = datetime.date.today().isoformat()

for city_cfg in CITIES:
    for check_in, check_out in DATE_PAIRS:
        run_input = {
            "city": city_cfg["city"],
            "checkIn": check_in,
            "checkOut": check_out,
            "guests": 2,
            "rooms": 1,
            "maxResults": 50,
        }
        if "cityId" in city_cfg:
            run_input["cityId"] = city_cfg["cityId"]

        run = client.actor("thirdwatch/ctrip-hotels-scraper").call(run_input=run_input)
        items = client.dataset(run["defaultDatasetId"]).list_items().items

        for item in items:
            item["snapshot_date"] = snapshot_date
            all_hotels.append(item)

print(f"Collected {len(all_hotels)} hotel records across {len(CITIES)} cities")

Step 4: How do I load the data into a database?

Transform the flat JSON records into a normalized schema. Here is a SQLite example that works for prototyping; swap for PostgreSQL or BigQuery in production:

import sqlite3
import json

conn = sqlite3.connect("hotel_prices.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS hotels (
        hotel_id INTEGER PRIMARY KEY,
        hotel_name TEXT,
        stars INTEGER,
        city TEXT,
        district TEXT,
        address TEXT,
        latitude REAL,
        longitude REAL,
        amenities TEXT,
        url TEXT
    )
""")
conn.execute("""
    CREATE TABLE IF NOT EXISTS price_observations (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        hotel_id INTEGER,
        checkin_date TEXT,
        checkout_date TEXT,
        price REAL,
        original_price REAL,
        currency TEXT,
        rating REAL,
        reviews_count INTEGER,
        room_types TEXT,
        snapshot_date TEXT,
        FOREIGN KEY (hotel_id) REFERENCES hotels(hotel_id)
    )
""")

for h in all_hotels:
    conn.execute("""
        INSERT OR REPLACE INTO hotels
        (hotel_id, hotel_name, stars, city, district, address,
         latitude, longitude, amenities, url)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        h.get("hotel_id"), h.get("hotel_name"), h.get("stars"),
        h.get("city"), h.get("district"), h.get("address"),
        h.get("latitude"), h.get("longitude"),
        json.dumps(h.get("amenities", [])), h.get("url")
    ))
    conn.execute("""
        INSERT INTO price_observations
        (hotel_id, checkin_date, checkout_date, price, original_price,
         currency, rating, reviews_count, room_types, snapshot_date)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        h.get("hotel_id"), h.get("checkin_date"), h.get("checkout_date"),
        h.get("price"), h.get("original_price"), h.get("currency"),
        h.get("rating"), h.get("reviews_count"),
        json.dumps(h.get("room_types", [])), h.get("snapshot_date")
    ))

conn.commit()
print(f"Loaded {len(all_hotels)} observations into hotel_prices.db")
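Re-running the loader for the same snapshot would append duplicate rows to price_observations, because that table has no uniqueness constraint. One possible guard, sketched here against an in-memory database with a trimmed column set, is a unique index combined with INSERT OR IGNORE. The index name `idx_obs_unique` and the choice of key columns are assumptions, not part of the schema above.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("""
    CREATE TABLE price_observations (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        hotel_id INTEGER,
        checkin_date TEXT,
        checkout_date TEXT,
        price REAL,
        snapshot_date TEXT
    )
""")
# Assumed natural key: one observation per hotel, stay dates, and snapshot day.
demo.execute("""
    CREATE UNIQUE INDEX idx_obs_unique
    ON price_observations (hotel_id, checkin_date, checkout_date, snapshot_date)
""")

row = (693241, "2026-07-01", "2026-07-03", 485.0, "2026-05-26")
for _ in range(2):  # simulate loading the same snapshot twice
    demo.execute("""
        INSERT OR IGNORE INTO price_observations
        (hotel_id, checkin_date, checkout_date, price, snapshot_date)
        VALUES (?, ?, ?, ?, ?)
    """, row)
demo.commit()

obs_count = demo.execute("SELECT COUNT(*) FROM price_observations").fetchone()[0]
print(f"{obs_count} row(s) stored after two identical loads")
```

If you snapshot more than once per day, the key would need a finer-grained timestamp than a date.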

Step 5: How do I query the comparison database?

Run analytical queries to find price differentials, discount patterns, and competitive positioning:

# Cheapest 5-star hotels by city for a specific check-in
cursor = conn.execute("""
    SELECT h.city, h.hotel_name, p.price, p.original_price, h.stars
    FROM price_observations p
    JOIN hotels h ON p.hotel_id = h.hotel_id
    WHERE h.stars = 5 AND p.checkin_date = '2026-07-01'
    ORDER BY h.city, p.price ASC
""")
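for city, name, price, original_price, stars in cursor:
    discount = ""
    # original_price is NULL when Trip.com shows no crossed-out price
    if original_price and original_price > price:
        pct_off = (original_price - price) / original_price * 100
        discount = f" (was ${original_price:.0f}, {pct_off:.0f}% off)"
    print(f"{city}: {name} - ${price:.0f}{discount}")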
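Once more than one snapshot is loaded, the same fact table can surface price changes between snapshots. The sketch below uses an in-memory table with a trimmed column set and made-up follow-up prices, and relies on the LAG window function, which assumes SQLite 3.25 or newer.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE price_observations (
        hotel_id INTEGER, checkin_date TEXT, checkout_date TEXT,
        price REAL, snapshot_date TEXT
    )
""")
# Illustrative rows only: the second-day prices are invented for the demo.
db.executemany("INSERT INTO price_observations VALUES (?, ?, ?, ?, ?)", [
    (693241, "2026-07-01", "2026-07-03", 485.0, "2026-05-26"),
    (693241, "2026-07-01", "2026-07-03", 460.0, "2026-05-27"),
    (2175843, "2026-07-01", "2026-07-03", 89.0, "2026-05-26"),
    (2175843, "2026-07-01", "2026-07-03", 89.0, "2026-05-27"),
])

# Compare each observation with the previous snapshot for the same hotel and stay.
changes = db.execute("""
    SELECT hotel_id, snapshot_date, prev_price, price
    FROM (
        SELECT hotel_id, snapshot_date, price,
               LAG(price) OVER (
                   PARTITION BY hotel_id, checkin_date, checkout_date
                   ORDER BY snapshot_date
               ) AS prev_price
        FROM price_observations
    )
    WHERE prev_price IS NOT NULL AND price <> prev_price
""").fetchall()

for hotel_id, snap, prev_price, price in changes:
    print(f"hotel {hotel_id}: {prev_price:.0f} -> {price:.0f} on {snap}")
```

ISO-formatted date strings sort chronologically, so ordering by snapshot_date as text is sufficient here.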

Sample output

Each hotel record contains the fields you need for a comparison database — pricing, quality signals, and geographic coordinates:

[
  {
    "hotel_name": "The Ritz-Carlton Tokyo",
    "hotel_id": 693241,
    "url": "https://www.trip.com/hotels/detail/?hotelId=693241",
    "price": 485,
    "currency": "USD",
    "original_price": 610,
    "rating": 4.9,
    "rating_label": "Excellent",
    "reviews_count": 3850,
    "stars": 5,
    "address": "Tokyo Midtown, 9-7-1 Akasaka, Minato-ku",
    "city": "Tokyo",
    "district": "Roppongi",
    "latitude": 35.6657,
    "longitude": 139.7314,
    "image_url": "https://ak-d.tripcdn.com/images/hotel/693241/exterior.jpg",
    "amenities": ["Pool", "Spa", "Fitness Center", "Restaurant", "Bar", "Free WiFi"],
    "room_types": ["Deluxe Room", "Club Room", "Suite"],
    "tags": ["Member Price", "Free Cancellation"],
    "distance_from_center": "2.1 km from city center",
    "checkin_date": "2026-07-01",
    "checkout_date": "2026-07-03",
    "search_city": "Tokyo",
    "source": "api"
  },
  {
    "hotel_name": "Hotel Gracery Shinjuku",
    "hotel_id": 2175843,
    "url": "https://www.trip.com/hotels/detail/?hotelId=2175843",
    "price": 89,
    "currency": "USD",
    "original_price": null,
    "rating": 4.4,
    "rating_label": "Very Good",
    "reviews_count": 6120,
    "stars": 3,
    "address": "1-19-1 Kabukicho, Shinjuku-ku",
    "city": "Tokyo",
    "district": "Shinjuku",
    "latitude": 35.6942,
    "longitude": 139.7014,
    "image_url": "https://ak-d.tripcdn.com/images/hotel/2175843/exterior.jpg",
    "amenities": ["Restaurant", "Free WiFi", "Laundry"],
    "room_types": ["Standard Double"],
    "tags": [],
    "distance_from_center": "1.5 km from city center",
    "checkin_date": "2026-07-01",
    "checkout_date": "2026-07-03",
    "search_city": "Tokyo",
    "source": "api"
  }
]

The hotel_id field is your primary key for deduplication across snapshots. original_price versus price reveals discount depth. latitude and longitude enable distance-based queries and map visualizations.
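As one illustration of a distance-based query, the haversine formula gives the great-circle distance between two coordinate pairs. This is a generic sketch rather than anything the scraper provides; `haversine_km` is a hypothetical helper, and the coordinates are taken from the two sample records above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

ritz = (35.6657, 139.7314)      # The Ritz-Carlton Tokyo
gracery = (35.6942, 139.7014)   # Hotel Gracery Shinjuku
d = haversine_km(*ritz, *gracery)
print(f"{d:.1f} km apart")
```

For large tables, a bounding-box filter on latitude and longitude in SQL before applying the exact formula in Python keeps the candidate set small.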

Common pitfalls

Deduplication across snapshots. The same hotel appears in every snapshot. Use hotel_id as your dedup key in the hotels dimension table and insert new rows only into the price_observations fact table. Without this, your database inflates with duplicate hotel metadata on every load.

City ID reliability. The city field accepts free text, but Trip.com's internal resolution can be ambiguous. "Madrid" resolves correctly; "San Jose" might not. For production pipelines that run unattended, always pass cityId alongside city. The README lists verified IDs for major cities (Tokyo: 228, Singapore: 73, Paris: 187, Beijing: 1, Shanghai: 2).
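A small pre-flight check can flag configuration entries that rely on free-text resolution before an unattended run starts. This is a sketch; `cities_missing_id` is a hypothetical helper, and the list mirrors the CITIES configuration from Step 2.

```python
CITIES = [
    {"city": "Tokyo", "cityId": 228},
    {"city": "Singapore", "cityId": 73},
    {"city": "Bangkok"},
    {"city": "Paris", "cityId": 187},
    {"city": "Dubai"},
    {"city": "London"},
    {"city": "New York"},
]

def cities_missing_id(cities):
    """Return the names of configured cities that have no explicit cityId."""
    return [c["city"] for c in cities if "cityId" not in c]

missing = cities_missing_id(CITIES)
if missing:
    print("No cityId set for: " + ", ".join(missing))
```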

Currency normalization. Trip.com defaults to USD, but if your comparison database includes data from other OTAs that report in local currencies, you need a normalization layer. Store the raw currency and price fields, then apply exchange rates in a separate transformation step.
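A normalization step along these lines might look like the sketch below. The rate table holds placeholder values for illustration only, and `to_usd` is a hypothetical helper; in practice the rates would come from whatever exchange-rate source you trust, keyed by snapshot date.

```python
# Placeholder rates (USD per 1 unit of currency). Illustrative values, not real quotes.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "JPY": 0.0065}

def to_usd(price, currency, rates=RATES_TO_USD):
    """Convert a raw price to USD, leaving the stored raw fields untouched."""
    if price is None:
        return None  # no price observed, nothing to convert
    if currency not in rates:
        raise KeyError(f"No exchange rate configured for {currency!r}")
    return round(price * rates[currency], 2)

print(to_usd(485, "USD"))
print(to_usd(100, "EUR"))
```

Raising on an unknown currency, rather than silently passing the raw value through, keeps mixed-currency rows out of comparisons.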

Null handling in original_price. Not every hotel has a crossed-out price. When original_price is null, the hotel is not running a visible discount. Your discount-depth calculations should handle this gracefully rather than dividing by zero or treating null as zero.
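One way to make that handling explicit is a small function that returns None whenever no discount can be computed. This is a sketch; `discount_pct` is a hypothetical name, and treating an original_price at or below price as "no discount" is an assumption.

```python
def discount_pct(price, original_price):
    """Return the discount as a percentage of original_price, or None if not applicable."""
    if price is None or original_price is None:
        return None  # no crossed-out price shown, so no visible discount
    if original_price <= 0 or original_price <= price:
        return None  # guards against division by zero and non-discounts
    return (original_price - price) / original_price * 100

# Values from the two sample records above.
ritz_discount = discount_pct(485, 610)
gracery_discount = discount_pct(89, None)
print(ritz_discount, gracery_discount)
```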

The Thirdwatch actor handles Trip.com's anti-bot defences and session management, so your database ingestion pipeline can treat the actor as a stable JSON API.

Frequently asked questions

Can I build a real-time hotel price comparison from Trip.com data?

Near-real-time, not true real-time. Each actor run takes 1-3 minutes per city and returns live prices from Trip.com at execution time. For a comparison product, schedule runs every 6-12 hours across your target cities. The structured output with hotel_id enables clean deduplication and price-change detection between snapshots.

What database schema works best for Trip.com hotel data?

A star schema with a hotels dimension table (hotel_id, hotel_name, stars, city, latitude, longitude, amenities) and a price_observations fact table (hotel_id, checkin_date, checkout_date, price, original_price, currency, snapshot_date), as created in Step 4. This separates slowly changing hotel metadata from rapidly changing price observations and enables efficient time-series queries.
