
Scrape Tata CLiQ Luxury and Premium Products (2026 Guide)

Extract Tata CLiQ luxury and premium catalogue data with Thirdwatch's Tata CLiQ Scraper — structured fields for research, ops, and premium pricing workflows.

May 12, 2026 · 5 min read · 1,203 words

Thirdwatch's Tata CLiQ Scraper extracts premium and luxury catalogue data from Tata CLiQ (product name, brand, price, MRP, discount, rating, review count, image, and URL) as clean, typed JSON. It is built for researchers studying India's premium-segment e-commerce, ops teams pricing private-label assortments against an authorised-luxury baseline, and analysts tracking Tata Digital's positioning against Reliance and Amazon Luxury Stores.

TL;DR

Tata CLiQ is the only large Indian marketplace with a dedicated authorised-luxury arm carrying international labels. Scraping it gives a different catalogue lens than Flipkart, Amazon India, or Myntra: premium-skewed, fewer SKUs, higher average price points. The actor returns nine structured fields per product: name, brand, price, MRP, discount, rating, review count, image, and URL. Inputs are search queries plus an optional category from a fixed enum. The default of 30 results per query is enough for a sanity check; scale up once the schema fits your pipeline.

Why scrape Tata CLiQ for premium product research

Tata Digital, the digital arm of the Tata Group (valued at more than $300 billion), has invested heavily in turning Tata CLiQ into India's go-to premium destination. The parent group reported $128 billion in 2023 group-level revenue per Tata.com's published financials, and Tata CLiQ Luxury, the authorised-luxury sub-platform, sits in a category that Bain & Company's India Luxury Report 2024 sizes at roughly $8 billion and projects to grow at a high-double-digit CAGR through 2030.

The catalogue composition is the differentiator. Where Flipkart and Amazon India bias toward mass-market value SKUs, Tata CLiQ leans premium and authorised: Tag Heuer and Tissot in watches, Canali and Diesel in apparel, Coach and Furla in bags, and a curated beauty assortment with Estée Lauder, La Mer, and Jo Malone. For a researcher mapping India's premium consumer segment, Tata CLiQ is closer to a representative sample than the mass-market platforms.

The actor returns the catalogue as structured records. Build a 5,000-SKU premium watchlist, compare prices against authorised European boutique pricing, track which brands deepen their India distribution quarter over quarter, or feed a private-label pricing engine that wants a premium reference rather than a Flipkart median. The use cases share the same data layer — clean, typed Tata CLiQ records.

How does this compare to alternatives?

Three options for getting Tata CLiQ catalogue data into your pipeline:

| Approach | Reliability | Setup time | Maintenance |
|---|---|---|---|
| DIY scraper (Playwright + selectors) | Brittle; site refactors break extraction monthly | Two to five engineer-days | Continuous |
| Paid Indian retail data SaaS (DataWeave, BrandIQ) | Production-grade, includes dashboards | One to three weeks plus contract | Vendor lock-in; six-figure annual minimums common |
| Thirdwatch Tata CLiQ Scraper | Production-tested, fields stable | Under an hour to integrate | Maintained by Thirdwatch |

DIY is fine for a one-shot research pull. Paid SaaS makes sense if you want a managed dashboard and have the budget. The actor sits in between: pay per result, no contract, and structured data you can pipe straight into pandas, BigQuery, or a notebook.

How to scrape Tata CLiQ for premium products in 4 steps

Step 1: How do I get an Apify API token?

Sign up at apify.com (free tier, no card needed) and copy your token from Settings → Integrations. Every example below assumes it is exported in your shell:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: How do I run the actor against a luxury query?

Trigger a synchronous run with watch queries, narrowed to the watches category. The call blocks until the dataset is ready and returns the items directly. Apify's run-sync endpoints give up after 300 seconds, so keep synchronous runs small.

import os
import requests

TOKEN = os.environ["APIFY_TOKEN"]

resp = requests.post(
    "https://api.apify.com/v2/acts/thirdwatch~tatacliq-scraper/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "queries": ["tag heuer watch", "tissot men"],  # watch queries only, to match category
        "category": "watches",
        "sortBy": "popularity",
        "maxResults": 50,
    },
    timeout=300,
)
resp.raise_for_status()
products = resp.json()
print(f"Fetched {len(products)} products")
for p in products[:3]:
    print(p["brand"], "—", p["product_name"][:60], "₹", p["price"])

The category enum is fixed: watches, jewellery, clothing, footwear, bags-luggage, beauty, electronics, appliances, and a handful more. Pass "all" to search the whole catalogue. sortBy accepts relevance, popularity, priceAsc, priceDesc, discount, newest, and rating.
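Because the inputs are a small set of named fields with fixed value lists, it can help to assemble and check the payload in one place before spending credits on a run. The sketch below is a hypothetical helper (build_payload is not part of the actor); it uses the field names from the examples in this guide, enforces the sortBy values listed above, and only optionally checks category against the subset of values this guide names, since the real enum is longer.

```python
# Hypothetical helper: assemble and sanity-check the JSON input for the
# Tata CLiQ actor. Field names follow the examples in this guide.

# sortBy values listed in this guide.
SORT_OPTIONS = {"relevance", "popularity", "priceAsc", "priceDesc",
                "discount", "newest", "rating"}

# Category values named in this guide. The real enum has more values, so
# this subset is only enforced when strict_category=True.
KNOWN_CATEGORIES = {"all", "watches", "jewellery", "clothing", "footwear",
                    "bags-luggage", "beauty", "electronics", "appliances",
                    "home-kitchen"}


def build_payload(queries=None, category="all", sort_by=None,
                  max_results=30, min_price=None, strict_category=False):
    """Return a dict to send as the JSON body of a run request."""
    # Drop blank or whitespace-only queries.
    cleaned = [q.strip() for q in (queries or []) if q.strip()]
    if sort_by is not None and sort_by not in SORT_OPTIONS:
        raise ValueError(f"unsupported sortBy: {sort_by!r}")
    if strict_category and category not in KNOWN_CATEGORIES:
        raise ValueError(f"category not in the known subset: {category!r}")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")

    payload = {"queries": cleaned, "category": category,
               "maxResults": max_results}
    # Only send optional fields that were actually set.
    if sort_by is not None:
        payload["sortBy"] = sort_by
    if min_price is not None:
        payload["minPrice"] = int(min_price)
    return payload
```

For example, json=build_payload(["tag heuer watch"], category="watches", sort_by="popularity", max_results=50) can stand in for the hand-written dict in the requests.post call.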

Step 3: How do I run a full category browse without queries?

Leave queries empty and set a category — the actor browses that category instead of running a keyword search. Pair with minPrice to filter the luxury slice.

resp = requests.post(
    "https://api.apify.com/v2/acts/thirdwatch~tatacliq-scraper/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "queries": [],
        "category": "bags-luggage",
        "minPrice": 15000,
        "sortBy": "priceDesc",
        "maxResults": 200,
    },
    timeout=600,
)
resp.raise_for_status()
luxury_bags = resp.json()
print(f"Premium bags above ₹15,000: {len(luxury_bags)}")

This is the cleanest way to pull a premium-only slice of a category. minPrice is enforced server-side; you do not need to filter again in Python.
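A large browse like the 200-result pull above can run past the 300-second limit on Apify's run-sync endpoints, at which point the API returns HTTP 408 instead of items. A minimal sketch of the asynchronous alternative follows: start the run, poll the run record, then download the default dataset. It uses Apify's generic v2 REST endpoints; the function name, the polling interval, the overall deadline, and the injectable session parameter (there so the flow can be tested offline) are illustrative choices, not part of the actor.

```python
import time

APIFY_API = "https://api.apify.com/v2"


def run_and_fetch(actor_id, payload, token, session=None,
                  poll_every=10.0, max_wait=1800.0):
    """Start an actor run, poll until it finishes, then return its items.

    session: any object with requests-style .post()/.get() methods; defaults
    to requests.Session().
    """
    if session is None:
        import requests  # imported lazily so tests can pass a fake session
        session = requests.Session()
    auth = {"token": token}

    # 1. Start the run asynchronously; "data" carries the run id and status.
    resp = session.post(f"{APIFY_API}/acts/{actor_id}/runs",
                        params=auth, json=payload, timeout=60)
    resp.raise_for_status()
    run = resp.json()["data"]

    # 2. Poll the run record until it leaves the READY/RUNNING states.
    deadline = time.monotonic() + max_wait
    while run["status"] in ("READY", "RUNNING"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"run {run['id']} still {run['status']}")
        time.sleep(poll_every)
        resp = session.get(f"{APIFY_API}/actor-runs/{run['id']}",
                           params=auth, timeout=60)
        resp.raise_for_status()
        run = resp.json()["data"]

    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"run {run['id']} finished as {run['status']}")

    # 3. Download the items from the run's default dataset.
    resp = session.get(f"{APIFY_API}/datasets/{run['defaultDatasetId']}/items",
                       params={**auth, "format": "json", "clean": "true"},
                       timeout=120)
    resp.raise_for_status()
    return resp.json()
```

Called as run_and_fetch("thirdwatch~tatacliq-scraper", {...}, TOKEN) with the same input dict as Step 3, it should return the same list of product records, just without the synchronous time cap.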

Step 4: How do I shape it into a research dataset?

Convert to a pandas DataFrame, derive a discount baseline, and write Parquet for downstream analysis.

import datetime
import pathlib

import pandas as pd  # to_parquet also needs pyarrow or fastparquet installed

df = pd.DataFrame(luxury_bags)
df["discount_inr"] = df["original_price"] - df["price"]
df["scraped_at"] = datetime.datetime.now(datetime.timezone.utc).isoformat()  # utcnow() is deprecated since Python 3.12

# Brand-level summary
summary = (df.groupby("brand")
             .agg(skus=("product_name", "count"),
                  median_price=("price", "median"),
                  median_discount_pct=("discount_percent", "median"),
                  median_rating=("rating", "median"))
             .sort_values("median_price", ascending=False))
print(summary.head(20))

out = pathlib.Path("data/tatacliq")
out.mkdir(parents=True, exist_ok=True)
df.to_parquet(out / f"premium-bags-{datetime.date.today()}.parquet", index=False)

The summary above is the typical first slice — which brands carry the most SKUs, what their typical premium price point is, how aggressively discounted they are. Pair with a similar pull from Myntra for a fashion-only premium-vs-mass comparison.

Sample output

Two product records from a run look like this:

[
  {
    "product_name": "Canali Slim Fit Wool Blazer",
    "brand": "Canali",
    "price": 145000,
    "original_price": 175000,
    "discount_percent": 17,
    "rating": 4.6,
    "rating_count": 23,
    "image": "https://img.tatacliq.com/images/i20/.../MP000000023456789_w_l.jpg",
    "url": "https://www.tatacliq.com/canali-slim-fit-wool-blazer/p-mp000000023456789"
  },
  {
    "product_name": "Tag Heuer Carrera Calibre 5 Automatic 41mm",
    "brand": "Tag Heuer",
    "price": 218500,
    "original_price": 230000,
    "discount_percent": 5,
    "rating": 4.8,
    "rating_count": 41,
    "image": "https://img.tatacliq.com/images/i19/.../MP000000019876543_w_l.jpg",
    "url": "https://www.tatacliq.com/tag-heuer-carrera-calibre-5/p-mp000000019876543"
  }
]

price and original_price are integers in INR. discount_percent is the percent shown on the product card. rating is on a 5-point scale (null when unrated). rating_count is the review tally backing the rating. url is the canonical product detail page if you need to follow up with additional scrapes.
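Since the fields are typed and related, a cheap per-record check can catch schema drift before it reaches a dashboard. The sketch below is a hypothetical validator (check_record is not part of the actor). It assumes discount_percent is roughly (original_price − price) / original_price × 100 as rounded by the site, and allows a one-point tolerance because the exact rounding rule is not documented here; with the sample records above, both pass.

```python
def check_record(rec, tolerance=1.0):
    """Return a list of problems in one product record (empty if it looks fine).

    Assumes discount_percent is roughly (original_price - price) /
    original_price * 100; tolerance absorbs the site's unknown rounding rule.
    """
    problems = []
    for field in ("price", "original_price", "discount_percent", "rating_count"):
        value = rec.get(field)
        # bool is a subclass of int, so reject it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{field} is not an int: {value!r}")

    rating = rec.get("rating")  # None when the product is unrated
    if rating is not None and not (isinstance(rating, (int, float)) and 0 <= rating <= 5):
        problems.append(f"rating outside the 0-5 scale: {rating!r}")

    # Cross-check the shown discount only when all three numbers are usable.
    price, mrp, shown = rec.get("price"), rec.get("original_price"), rec.get("discount_percent")
    if all(isinstance(v, int) and not isinstance(v, bool) for v in (price, mrp, shown)) and mrp > 0:
        implied = (mrp - price) / mrp * 100
        if abs(implied - shown) > tolerance:
            problems.append(f"discount_percent {shown} vs implied {implied:.1f}")
    return problems
```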

Common pitfalls

Three things to watch for.

Premium SKUs have low review counts. A Canali blazer with 23 reviews is healthy for that price tier, so do not treat a low rating_count as a quality red flag the way you would on a mass-market platform.

Discount percentages compress at the very top end. Luxury items often show single-digit or zero discounts because authorised inventory is rarely deep-discounted. Do not exclude rows with discount_percent < 10 when filtering for "premium"; you will throw out most of the catalogue.

Brand names are not always canonical. Tata CLiQ uses both Tag Heuer and TAG HEUER depending on the listing source, so normalise to lowercase before grouping.
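The brand and discount pitfalls are both easy to handle before any grouping. A minimal stdlib sketch, with hypothetical helper names: normalise each brand string (trim, collapse whitespace, casefold, which is a slightly stronger lowercase), count SKUs on the normalised key, and select the premium slice by price alone so low-discount rows survive. In pandas the equivalent normalisation is df["brand"].str.strip().str.lower() before the groupby.

```python
import re
from collections import Counter


def normalise_brand(brand):
    """Trim, collapse internal whitespace, and casefold a brand string."""
    return re.sub(r"\s+", " ", (brand or "").strip()).casefold()


def brand_sku_counts(records):
    """Count SKUs per normalised brand so 'TAG HEUER' and 'Tag Heuer' merge."""
    return Counter(normalise_brand(r.get("brand")) for r in records)


def premium_slice(records, min_price):
    """Select premium rows by price alone; low discount_percent rows are kept."""
    return [r for r in records if (r.get("price") or 0) >= min_price]
```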

The actor itself handles Tata CLiQ's site-level access controls and product-card variation internally; Thirdwatch maintains the extraction recipe so you do not have to chase selector changes. If a run returns no items, retry once; transient blocks usually clear on the next attempt.
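The retry-once advice can be wrapped around any of the calls in this guide. A minimal sketch, assuming fetch is any zero-argument callable that returns a list of items (for example a lambda around the requests.post call from Step 2); the 30-second default delay is an arbitrary illustrative choice, not a documented recommendation.

```python
import time


def fetch_with_retry(fetch, retries=1, delay=30.0):
    """Call fetch(); while it returns an empty list, wait and try again.

    Makes at most 1 + retries calls and returns the last result, which may
    still be empty if every attempt came back empty.
    """
    items = fetch()
    attempt = 0
    while not items and attempt < retries:
        attempt += 1
        time.sleep(delay)
        items = fetch()
    return items
```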


Frequently asked questions

Why scrape Tata CLiQ instead of Flipkart or Amazon India?

Tata CLiQ is India's premium-skewed marketplace. It is the only large Indian platform with a dedicated luxury arm (Tata CLiQ Luxury) carrying authorised inventory from international labels like Canali, Coach, Furla, and Diesel. For premium-segment research, Flipkart and Amazon India underrepresent this catalogue — Tata CLiQ is the better lens.

What fields does the actor return?

Each product record has product_name, brand, price (current), original_price (MRP), discount_percent, rating, rating_count, image, and url. Prices, discounts, and review counts come typed as integers and rating as a number, so you can do arithmetic and filtering without parsing rupee strings or stripping commas.

Can I narrow results to a specific category?

Yes. The category input is an enum of Tata CLiQ's top-level paths, including clothing, footwear, bags-luggage, watches, jewellery, beauty, electronics, and home-kitchen. Combine a category with a search query for a narrow slice, for example category "watches" with queries ["tag heuer"].

How fresh is the data?

The actor fetches live results at request time. There is no cache layer between the run and Tata CLiQ — what you get back is what the site is currently showing to an India visitor. Schedule a daily or weekly run for trend research; run on-demand for spot checks.
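For scheduled trend research, the useful output is usually the difference between two runs. A minimal sketch, assuming each run's records are kept as a list of dicts and that url is a stable key for a product (diff_snapshots is a hypothetical helper):

```python
def diff_snapshots(old, new):
    """Compare two runs keyed by product url.

    Returns (added, removed, price_changes), where price_changes maps each
    url present in both runs to (old_price, new_price) when the price moved.
    """
    old_by_url = {r["url"]: r for r in old}
    new_by_url = {r["url"]: r for r in new}
    added = sorted(new_by_url.keys() - old_by_url.keys())
    removed = sorted(old_by_url.keys() - new_by_url.keys())
    price_changes = {
        url: (old_by_url[url]["price"], new_by_url[url]["price"])
        for url in sorted(new_by_url.keys() & old_by_url.keys())
        if old_by_url[url]["price"] != new_by_url[url]["price"]
    }
    return added, removed, price_changes
```

Treat "removed" with care: because each run is capped by maxResults and ordered by sortBy, a product can drop out of the results window without being delisted.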

What is a reasonable maxResults for a first run?

Start with 30. That gives one to two pages of results per query, enough to confirm the schema fits your downstream pipeline and that your search and category combination returns sensible products. Scale to a few hundred per query once the recipe is validated.
