
Build an India Restaurant Database with Zomato Data (2026)

Build a structured India restaurant database from Zomato across 20 cities. Cuisine tags, cost bands, ratings, and delivery data with a Python pipeline.

May 26, 2026 · 6 min read · 1,327 words

Thirdwatch's Zomato Scraper lets you build a comprehensive India restaurant database across 20 cities. Each record includes name, cuisine array, dual ratings (delivery and dining), cost for two, delivery time, address, and listing URL. Search by dish, cuisine, or city-wide browse. Built for investors, food-tech companies, and researchers who need structured, city-scale India restaurant supply data.

Why build an India restaurant database from Zomato

India's food services market is valued at over $60 billion and growing at 10-12% annually, according to the National Restaurant Association of India (NRAI) 2024 report. Yet structured, machine-readable data on individual restaurants remains surprisingly scarce. Government databases cover compliance registrations, not consumer-facing attributes like ratings, delivery availability, or cuisine mix.

Zomato is the closest thing to a census of Indian restaurants. With 300,000+ active partners and presence in 800+ cities, it captures the broadest cross-section of the market — from street food stalls to fine dining. According to RedSeer's 2025 India foodtech report, the online food delivery market in India crossed $8 billion in GMV in 2025, with Zomato commanding roughly half. For a PE fund evaluating cloud kitchen roll-ups, a food-tech startup sizing its addressable market, or an academic studying urbanization patterns through food infrastructure, a Zomato-sourced database is the foundation layer. The challenge is extraction at scale: 20 cities, thousands of restaurants per city, structured fields per listing. The actor solves the extraction problem; you bring the analysis.

How does this compare to the alternatives?

Three paths to building an India restaurant database:

Approach | Coverage | Setup time | Data freshness | Pricing model
FSSAI/NRAI public registers | Compliance-only fields | Weeks of scraping + cleaning | Annual updates | Free but minimal fields
Manual Zomato browsing + copy-paste | Limited by human effort | Months for 20 cities | Stale by completion | Free but not scalable
Thirdwatch Zomato Scraper | 20 cities, 22 fields per listing | 5 minutes | Live at run time | Pay per restaurant returned

Government registers give you license numbers and addresses but not ratings, cuisines, or delivery data. Manual collection does not scale past a single city. The Zomato Scraper returns 22 structured fields per restaurant, live at run time, across all 20 supported cities.

How to build an India restaurant database in 4 steps

Step 1: How do I authenticate and set up the pipeline?

Create a free account at apify.com, copy your API token from Settings, and store it as an environment variable:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
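A missing or empty token otherwise only shows up as a failed request several steps later. The helper below is a minimal sketch of an early check; `load_apify_token` is a hypothetical name introduced here, not part of the actor or the Apify client.

```python
import os


def load_apify_token(env=os.environ):
    """Return the Apify token from the environment, or raise a clear error.

    `env` is injectable so the check can be exercised without touching
    the real environment.
    """
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "APIFY_TOKEN is not set; export it before running the pipeline"
        )
    return token
```

Calling `TOKEN = load_apify_token()` at the top of the pipeline fails fast with a readable message instead of a `KeyError`.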

Step 2: How do I pull all restaurants across multiple cities?

Loop through the 20 supported cities and pull restaurants for each. Use maxResults to control depth per city.

import os, requests, pandas as pd, time

ACTOR = "thirdwatch~zomato-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

CITIES = [
    "bangalore", "mumbai", "delhi", "hyderabad", "chennai",
    "kolkata", "pune", "ahmedabad", "jaipur", "lucknow",
    "chandigarh", "kochi", "goa", "indore", "coimbatore",
    "nagpur", "vizag", "bhopal", "gurgaon", "noida",
]

all_restaurants = []

for city in CITIES:
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
        params={"token": TOKEN},
        json={
            "city": city,
            "maxResults": 200,
            "deliveryOnly": False,
        },
        timeout=3600,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error payload
    restaurants = resp.json()
    all_restaurants.extend(restaurants)
    print(f"{city}: {len(restaurants)} restaurants")
    time.sleep(2)

df = pd.DataFrame(all_restaurants)
print(f"Total: {len(df)} restaurants across {df.city.nunique()} cities")

Setting deliveryOnly to false includes dine-out-only restaurants, giving the broadest coverage for a supply-side database. Each city run returns up to 200 restaurants; increase maxResults for deeper coverage.
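A single failed or malformed run should not sink a 20-city build. The sketch below wraps the per-city call in a retry loop and records which cities failed. It assumes you supply `run_city`, a hypothetical function of your own that performs the `requests.post` call for one city and returns the parsed list; the retry count and backoff values are illustrative defaults, not actor recommendations.

```python
import time


def collect_cities(cities, run_city, max_attempts=3, backoff_seconds=5,
                   sleep=time.sleep):
    """Call run_city(city) for each city and return (records, failed).

    run_city is a caller-supplied function (hypothetical here) that returns
    a list of restaurant dicts. Cities that still fail after max_attempts
    are reported in `failed` as (city, error message) pairs.
    """
    records, failed = [], []
    for city in cities:
        for attempt in range(1, max_attempts + 1):
            try:
                items = run_city(city)
                # An error payload is typically a dict, not a list of records.
                if not isinstance(items, list):
                    raise ValueError(
                        f"unexpected response type: {type(items).__name__}"
                    )
                records.extend(items)
                break
            except Exception as exc:  # broad on purpose: this is a sketch
                if attempt == max_attempts:
                    failed.append((city, str(exc)))
                else:
                    sleep(backoff_seconds * attempt)  # linear backoff
    return records, failed
```

Re-running only the cities listed in `failed` avoids paying again for cities that already returned data.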

Step 3: How do I enrich the database with cuisine and cost analysis?

The raw data contains cuisine arrays and cost strings. Parse these into analytical columns.

# Explode cuisines for multi-label analysis
cuisine_df = df.explode("cuisine")
cuisine_counts = cuisine_df.groupby(["city", "cuisine"]).size().reset_index(name="count")
top_cuisines = cuisine_counts.sort_values("count", ascending=False).groupby("city").head(5)
print(top_cuisines)

# Parse cost_for_two to numeric
import re

def parse_cost(cost_str):
    if not cost_str:
        return None
    match = re.search(r"\d+", str(cost_str).replace(",", ""))  # commas are stripped first, so digits alone suffice
    return int(match.group()) if match else None

df["cost_numeric"] = df["cost_for_two"].apply(parse_cost)

# Cost distribution by city
cost_summary = df.groupby("city")["cost_numeric"].describe()
print(cost_summary[["count", "mean", "50%", "min", "max"]])

This gives you two analytical layers: cuisine penetration by city (which cuisines dominate which markets) and cost-for-two distributions (affordable vs premium market composition). Both are core inputs for market sizing and site selection.
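To turn the numeric cost into the affordable-versus-premium split described above, one option is to bucket it into bands. This is a minimal sketch: the band edges and labels are assumptions chosen for illustration, not thresholds used by Zomato or the actor, so tune them to your own segmentation.

```python
from bisect import bisect_right

# Assumed band edges in rupees (cost for two); adjust to your market view.
BAND_EDGES = [400, 800, 1500]
BAND_LABELS = ["budget", "mid", "premium", "luxury"]


def cost_band(cost_numeric):
    """Map a numeric cost-for-two to a band label.

    Returns None for missing values, including the NaN that pandas
    substitutes for None in numeric columns.
    """
    if cost_numeric is None or cost_numeric != cost_numeric:  # NaN != NaN
        return None
    # bisect_right puts a value equal to an edge into the higher band.
    return BAND_LABELS[bisect_right(BAND_EDGES, cost_numeric)]
```

With the DataFrame from the previous step, `df["cost_band"] = df["cost_numeric"].apply(cost_band)` followed by `df.groupby(["city", "cost_band"]).size()` gives the band mix per city.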

Step 4: How do I deduplicate and store the database for ongoing updates?

Use restaurant_id as the primary key. On each refresh, upsert new data and track changes over time.

import sqlite3
from datetime import date

conn = sqlite3.connect("india_restaurants.db")

df["scrape_date"] = date.today().isoformat()
df["cuisine_str"] = df["cuisine"].apply(lambda x: "|".join(x) if isinstance(x, list) else "")

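# Stage this run's rows. The list-valued cuisine column is dropped because
# sqlite3 cannot bind Python lists; cuisine_str carries the same data.
df.drop(columns=["cuisine"]).to_sql("restaurants_staging", conn, if_exists="replace", index=False)

# Create the main table on first run, upsert on subsequent runs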
conn.execute("""
    CREATE TABLE IF NOT EXISTS restaurants (
        restaurant_id TEXT PRIMARY KEY,
        name TEXT, city TEXT, location TEXT, address TEXT,
        cuisine_str TEXT, cost_for_two TEXT, cost_numeric INTEGER,
        rating REAL, delivery_rating REAL, dining_rating REAL,
        delivery_time TEXT, is_serviceable INTEGER,
        has_online_ordering INTEGER, url TEXT,
        first_seen TEXT, last_updated TEXT
    )
""")

conn.execute("""
    INSERT OR REPLACE INTO restaurants
    SELECT restaurant_id, name, city, location, address,
           cuisine_str, cost_for_two, cost_numeric,
           rating, delivery_rating, dining_rating,
           delivery_time, is_serviceable, has_online_ordering, url,
           COALESCE(
               (SELECT first_seen FROM restaurants r2
                WHERE r2.restaurant_id = restaurants_staging.restaurant_id),
               scrape_date
           ) as first_seen,
           scrape_date as last_updated
    FROM restaurants_staging
""")
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM restaurants").fetchone()[0]
print(f"Database now contains {count} restaurants")

The first_seen and last_updated timestamps let you track restaurant churn -- new openings and closures -- which is one of the highest-value signals for F&B market analysis. A restaurant that appeared last week in Koramangala, Bangalore with a 4.5 delivery rating is likely a new opening worth monitoring. One that disappeared after three months with a declining rating signals a closure. These longitudinal signals are invisible in single-snapshot datasets.
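The churn signals described above can be read straight from the two date columns. The sketch below assumes the `restaurants` table from this step and ISO-formatted date strings, which compare correctly as text; `churn_report` and its cutoff parameters are hypothetical names introduced for illustration.

```python
import sqlite3


def churn_report(conn, new_since, stale_before):
    """Return (new_openings, possibly_closed) from the restaurants table.

    new_since / stale_before are ISO date strings (YYYY-MM-DD).
    Each result row is (restaurant_id, name, city).
    """
    new_openings = conn.execute(
        "SELECT restaurant_id, name, city FROM restaurants "
        "WHERE first_seen >= ? ORDER BY first_seen, restaurant_id",
        (new_since,),
    ).fetchall()
    # Rows that no recent run has refreshed are closure candidates only.
    possibly_closed = conn.execute(
        "SELECT restaurant_id, name, city FROM restaurants "
        "WHERE last_updated < ? ORDER BY last_updated, restaurant_id",
        (stale_before,),
    ).fetchall()
    return new_openings, possibly_closed
```

Treat the second list as candidates rather than confirmed closures: a restaurant can also drop out of a run because it fell outside maxResults for that city.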

Sample output

Two records from a multi-city database build:

[
  {
    "name": "Meghana Foods",
    "cuisine": ["Biryani", "Andhra", "North Indian", "Chinese"],
    "rating": 4.2,
    "delivery_rating": 4.2,
    "dining_rating": 4.2,
    "cost_for_two": "₹1,000 for two",
    "delivery_time": "29 min",
    "location": "St. Marks Road, Bangalore",
    "city": "bangalore",
    "restaurant_id": "19282473",
    "is_promoted": false,
    "has_online_ordering": true
  },
  {
    "name": "Bademiya",
    "cuisine": ["Mughlai", "Kebab", "North Indian"],
    "rating": 4.0,
    "delivery_rating": 3.8,
    "dining_rating": 4.3,
    "cost_for_two": "₹800 for two",
    "delivery_time": "35 min",
    "location": "Colaba, Mumbai",
    "city": "mumbai",
    "restaurant_id": "38471",
    "is_promoted": false,
    "has_online_ordering": true
  }
]

Notice the rating divergence on Bademiya: dining_rating of 4.3 vs delivery_rating of 3.8. This 0.5-point gap is a concrete data point for delivery experience analysis — the kind of signal that only emerges at database scale across thousands of restaurants.
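The same comparison can be run over every record. The sketch below computes the dining-minus-delivery gap and lists the restaurants where it is largest; the 0.3 threshold is an arbitrary assumption, and the function names are hypothetical.

```python
def rating_gap(record):
    """Return dining_rating minus delivery_rating, or None if either is missing."""
    dining = record.get("dining_rating")
    delivery = record.get("delivery_rating")
    if dining is None or delivery is None:
        return None
    return round(dining - delivery, 2)


def largest_gaps(records, min_gap=0.3):
    """Return (name, gap) pairs with |gap| >= min_gap, largest gap first."""
    gaps = []
    for record in records:
        gap = rating_gap(record)
        if gap is not None and abs(gap) >= min_gap:
            gaps.append((record.get("name"), gap))
    return sorted(gaps, key=lambda item: abs(item[1]), reverse=True)
```

On the two sample records, only Bademiya clears the threshold; Meghana Foods has identical ratings and is filtered out.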

Common pitfalls

Three things trip up database builders. Incomplete city coverage: leaving queries empty returns the broadest set of restaurants per city, but Zomato's default sort may under-represent certain neighborhoods. Running additional queries for specific cuisines or dishes surfaces restaurants that a blank browse misses. Cost string parsing: cost_for_two is a formatted string like "₹1,000 for two" and requires regex extraction for numeric analysis; do not cast directly to int. Dedup across data sources: if combining Zomato with Swiggy data, deduplicate on (name_normalized, locality) rather than any single ID, since restaurant IDs are platform-specific.

A fourth issue is rating interpretation. Zomato returns separate delivery_rating and dining_rating fields, not just a single composite score. Restaurants with high dining ratings may have mediocre delivery ratings due to packaging or cold food complaints. For delivery-focused analytics, always filter on delivery_rating; for dine-out recommendations, use dining_rating. The composite rating field is Zomato's own weighted average but may not match your use case. The is_veg field indicates whether a restaurant is marked as pure vegetarian on Zomato, which is relevant for filtering in Indian market contexts where roughly 30% of the population follows a vegetarian diet.
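For the cross-source deduplication pitfall above, the key has to survive differences in casing, punctuation, and spacing. The following is a minimal sketch of a normalized (name, locality) key. The field names default to the Zomato sample (`name`, `location`); the Swiggy field names are unknown here, so pass the ones your data actually uses. Exact-key matching will still miss spelling variants, which would need fuzzy matching on top.

```python
import re
import unicodedata


def normalize(text):
    """Casefold, unify Unicode forms, and collapse punctuation and whitespace."""
    text = unicodedata.normalize("NFKC", text or "").casefold()
    text = re.sub(r"[^\w]+", " ", text)  # punctuation and spaces -> one space
    return " ".join(text.split())


def dedup_key(name, locality):
    """Build the platform-independent (name_normalized, locality) key."""
    return (normalize(name), normalize(locality))


def dedup_records(records, name_field="name", locality_field="location"):
    """Keep the first record seen for each normalized key.

    Put the preferred source first in `records`, since later duplicates
    are dropped.
    """
    seen, unique = set(), []
    for record in records:
        key = dedup_key(record.get(name_field), record.get(locality_field))
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique
```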

Thirdwatch's actor handles the extraction complexity -- request pacing and geo-routing -- so you can focus on building and maintaining the database layer.

Frequently asked questions

How many restaurants can I pull from Zomato across all 20 cities?

Each city can return up to 5,000 restaurants per run. Across 20 cities with targeted queries, you can build a database of tens of thousands of restaurants. Run multiple queries per city to maximize coverage.

Can I combine Zomato data with Swiggy for a more complete database?

Yes. Deduplicate on restaurant name plus locality, then merge fields. Zomato provides separate delivery and dining ratings that Swiggy does not, while Swiggy may list restaurants not on Zomato. The combination gives the most complete India F&B picture.

How do I keep the database fresh over time?

Schedule daily or weekly runs via Apify schedules. Use restaurant_id as the primary key and upsert new data on each refresh. Ratings, delivery times, and cost-for-two change frequently and should be tracked as time series.
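Because an upsert keeps only the latest values, tracking these fields over time needs a separate history table. The sketch below appends one row per restaurant per scrape date; the `restaurant_snapshots` table name and `record_snapshot` function are assumptions introduced here, and the field names follow the sample output.

```python
import sqlite3


def record_snapshot(conn, records, scrape_date):
    """Store one row per (restaurant_id, scrape_date) for time-series analysis.

    Re-running on the same date overwrites that date's row rather than
    duplicating it.
    """
    conn.execute("""
        CREATE TABLE IF NOT EXISTS restaurant_snapshots (
            restaurant_id TEXT NOT NULL,
            scrape_date TEXT NOT NULL,
            rating REAL, delivery_rating REAL, dining_rating REAL,
            delivery_time TEXT, cost_for_two TEXT,
            PRIMARY KEY (restaurant_id, scrape_date)
        )
    """)
    conn.executemany(
        "INSERT OR REPLACE INTO restaurant_snapshots VALUES (?,?,?,?,?,?,?)",
        [
            (
                r["restaurant_id"], scrape_date,
                r.get("rating"), r.get("delivery_rating"), r.get("dining_rating"),
                r.get("delivery_time"), r.get("cost_for_two"),
            )
            for r in records
        ],
    )
    conn.commit()
```

A query ordered by scrape_date for a single restaurant_id then gives its rating and delivery-time history.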

Does the data include restaurant coordinates or latitude/longitude?

The actor returns full text address and locality but not GPS coordinates. For geocoding, pair the address field with a geocoding API like Google Maps or combine with the Thirdwatch Google Maps Scraper for coordinate-level data.
