Scrape Nykaa Products for Indian Beauty Market Research Data

Pull Nykaa product search and category pages with Thirdwatch — product name, brand, price, MRP, discount, rating, reviews. Python recipes for analysts.

May 12, 2026 · 5 min read · 1,233 words

Thirdwatch's Nykaa Scraper returns Nykaa beauty product listings — product_name, brand, price, original_price (MRP), discount_percent, rating, rating_count, image_url, category, in_stock, and a stable SKU. Built for beauty analysts sizing the Indian BPC market, brand teams tracking retail price compliance, indie founders benchmarking their pricing against the category, and researchers building longitudinal Indian beauty datasets.

▶ Skip the setup: run this as a ready-to-go task on Apify, pre-loaded with the exact configuration from this guide. No code required.

Why scrape Nykaa for India beauty market research

Nykaa is India's largest dedicated beauty and personal-care marketplace. Per the company's FY24 annual report (NSE: NYKAA), the BPC vertical processed ~INR 11,300 crore in GMV across roughly 6,800 brands and 4.8K SKUs in just makeup. The 2021 IPO valued the company at over $7B at listing, and beauty contributes a majority of that GMV. For the Indian beauty intelligence problem — category sizing, brand share, prestige-vs-mass price laddering, indie-brand discovery — Nykaa is the single most data-rich primary source.

The job-to-be-done is structured. A market researcher sizing the India serum category wants every serum SKU on Nykaa with brand, price, and review counts to estimate revenue distribution. A brand strategist at L'Oreal monitors pricing for its own Maybelline line against rivals MAC, Lakme, and Sugar in real time. An indie founder launching a clean-beauty SKU benchmarks against the 200 most-reviewed face washes to set price and positioning. A trend analyst tracks newly indexed brands week-over-week to identify rising labels before they hit mainstream press. All of these reduce to keyword or category pulls with structured product rows.

How does this compare to alternatives?

Three paths exist for getting Nykaa product data into a research pipeline:

Approach	Reliability	Setup time	Maintenance
Nykaa public API	None — no public catalog API exists	N/A	N/A
Manual Nykaa browsing + spreadsheet	Low; doesn't scale beyond ~50 SKUs	Continuous analyst time	Painful
Thirdwatch Nykaa Scraper	Production-tested against Nykaa's production-grade anti-bot tooling	5 minutes	Thirdwatch tracks Nykaa changes

Nykaa publishes no public catalog API. Affiliate feeds (via vCommission or similar) are coupon-led and exclude the long tail of indie brands. The Nykaa Scraper actor page gives you the public catalog at transparent per-result pricing — no application process, no quota gate.

How to scrape Nykaa for beauty market research in 5 steps

Step 1: How do I authenticate against Apify?

Sign in at apify.com (free tier, no credit card), open Settings → Integrations, copy your personal API token, and export it:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
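Before running the Python recipes, it can help to fail fast when the token is missing rather than hit a KeyError mid-script. The helper below is a minimal sketch; get_apify_token is a hypothetical name introduced here, not part of any Apify library.

```python
import os


def get_apify_token(env=os.environ):
    """Return the Apify token from the environment, or raise a clear error."""
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "APIFY_TOKEN is not set; export it before running the recipes"
        )
    return token
```

Call get_apify_token() once at the top of a script and pass the result wherever the recipes use TOKEN.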

Step 2: How do I pull a full category for sizing research?

Leave queries empty and set category to the leaf category you want to size. The actor will browse the category index. maxResults caps the per-query haul; for a research-scale sweep, 500-1000 is reasonable.

import os, requests, json, datetime, pathlib

ACTOR = "thirdwatch~nykaa-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "queries": [],  # empty: browse the category index instead of searching
        "category": "serum",
        "sortBy": "popularity",
        "maxResults": 500,
    },
    timeout=900,
)
resp.raise_for_status()  # fail loudly instead of saving an error payload
records = resp.json()

today = datetime.date.today().isoformat()
out = pathlib.Path(f"snapshots/nykaa-serum-{today}.json")
out.parent.mkdir(parents=True, exist_ok=True)  # create snapshots/ on first run
out.write_text(json.dumps(records))
print(f"{today}: pulled {len(records)} serum SKUs")

The category enum supports both top-level (makeup, skin, hair, bath-body, fragrance, mom-baby, men) and leaf categories (lipstick, foundation, serum, face-wash, shampoo, conditioner, eye-makeup, nail, moisturizer).
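A typo in the category value is an easy way to waste a run, so one option is to validate the input locally first. The sketch below assumes the enum contains exactly the values listed above (the actor may accept more); build_input is a hypothetical helper, and the input keys are the ones used in the request in Step 2.

```python
# Category values as listed in this guide; the live enum may include others.
TOP_LEVEL = {"makeup", "skin", "hair", "bath-body", "fragrance", "mom-baby", "men"}
LEAF = {
    "lipstick", "foundation", "serum", "face-wash", "shampoo",
    "conditioner", "eye-makeup", "nail", "moisturizer",
}


def build_input(category, queries=(), sort_by="popularity", max_results=500):
    """Build the actor input dict, rejecting categories not listed above."""
    if category not in TOP_LEVEL | LEAF:
        raise ValueError(f"unknown category: {category!r}")
    return {
        "queries": list(queries),
        "category": category,
        "sortBy": sort_by,
        "maxResults": max_results,
    }
```

build_input("serum") reproduces the Step 2 payload, while build_input("serums") raises before any request is sent.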

Step 3: How do I run a multi-keyword sweep for cross-brand comparison?

For brand-vs-brand research, supply your competitor set as queries.

BRANDS = ["maybelline", "lakme", "sugar cosmetics", "mac cosmetics",
          "huda beauty", "charlotte tilbury", "nykaa cosmetics"]

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "queries": BRANDS,
        "category": "makeup",
        "sortBy": "popularity",
        "maxResults": 200,
    },
    timeout=900,
)
resp.raise_for_status()  # surface HTTP errors before parsing
records = resp.json()
print(f"{len(records)} SKUs across {len(BRANDS)} brands")

Each brand returns up to 200 popularity-sorted makeup SKUs — enough to characterize a brand's Nykaa catalog without paginating the long tail.
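After a multi-keyword sweep it is worth checking that every brand actually returned rows. The sketch below counts records per source_query; it assumes the field echoes the query string exactly as submitted, which matches the sample output in this guide but is not guaranteed. coverage_by_query is a hypothetical helper.

```python
from collections import Counter


def coverage_by_query(records, queries):
    """Return {query: number of records whose source_query matches it}."""
    counts = Counter(r.get("source_query", "") for r in records)
    return {q: counts.get(q, 0) for q in queries}


# Tiny illustrative input, not real scraper output.
sample = [
    {"sku": "1", "source_query": "maybelline"},
    {"sku": "2", "source_query": "maybelline"},
    {"sku": "3", "source_query": "lakme"},
]
coverage = coverage_by_query(sample, ["maybelline", "lakme", "huda beauty"])
empty = [q for q, n in coverage.items() if n == 0]
print(coverage, "no results for:", empty)
```

A brand with zero rows usually points to a spelling mismatch in the query rather than a missing catalog.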

Step 4: How do I analyze price laddering and discount depth by brand?

Pandas does the rest. Compute median price by brand, share-of-shelf, and average discount.

import pandas as pd

df = pd.DataFrame(records)
df = df.dropna(subset=["brand", "price"])

# Price ladder by brand
ladder = df.groupby("brand").agg(
    sku_count=("sku", "nunique"),
    median_price=("price", "median"),
    median_mrp=("original_price", "median"),
    avg_discount=("discount_percent", "mean"),
    avg_rating=("rating", "mean"),
    total_reviews=("rating_count", "sum"),
).sort_values("total_reviews", ascending=False)

print(ladder.head(20))

# Prestige vs mass split (MRP cutoff at INR 1500 is a useful heuristic)
df["tier"] = (df.original_price.fillna(df.price) >= 1500).map(
    {True: "prestige", False: "mass"}
)
print(df.groupby("tier").price.describe())

rating_count is the proxy for sales volume on Nykaa — unlike GMV, it's public, monotone, and rarely faked. Sort by rating_count for an honest popularity ranking.
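If rating_count is the popularity proxy, a brand's share of all reviews in the pull is a rough share-of-attention figure. The function below is a minimal stdlib sketch of that calculation; review_share is a hypothetical name, and the result is only as good as the proxy itself.

```python
from collections import defaultdict


def review_share(records):
    """Return {brand: fraction of total rating_count}, largest share first."""
    totals = defaultdict(int)
    for r in records:
        brand = r.get("brand")
        if brand:
            totals[brand] += r.get("rating_count") or 0
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {brand: count / grand_total for brand, count in ranked}


# Counts taken from the sample records in this guide.
shares = review_share([
    {"brand": "Minimalist", "rating_count": 38217},
    {"brand": "The Ordinary", "rating_count": 12044},
])
for brand, share in shares.items():
    print(f"{brand}: {share:.1%}")
```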

Step 5: How do I dedupe across snapshots and track new SKU launches?

sku and product_id are stable identifiers, ideal as join keys.

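import glob, json, pathlib
import pandas as pd

frames = []
for f in sorted(glob.glob("snapshots/nykaa-serum-*.json")):
    # Filenames end in an ISO date (nykaa-serum-YYYY-MM-DD), so take the last
    # 10 characters; splitting on "-" would return only the day.
    date = pathlib.Path(f).stem[-10:]
    for j in json.loads(pathlib.Path(f).read_text()):
        frames.append({"date": date, **j})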

ts = pd.DataFrame(frames).dropna(subset=["sku"])
ts["date"] = pd.to_datetime(ts["date"])
first_seen = ts.groupby("sku").date.min().reset_index().rename(columns={"date": "first_seen"})

# New launches this week
this_week = pd.Timestamp.today() - pd.Timedelta(days=7)
new_skus = first_seen[first_seen.first_seen >= this_week]
print(f"{len(new_skus)} new serum SKUs indexed in the last 7 days")
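The deduplication half of this step can be sketched without pandas: keep the newest row per SKU, then flag SKUs that are absent from the most recent snapshot. This assumes each row carries an ISO-format date string as built above; latest_per_sku and missing_from_latest are hypothetical helper names.

```python
def latest_per_sku(rows):
    """Keep the newest row per SKU; ISO date strings sort chronologically."""
    latest = {}
    for row in rows:
        sku = row.get("sku")
        if sku is None:
            continue
        if sku not in latest or row["date"] > latest[sku]["date"]:
            latest[sku] = row
    return latest


def missing_from_latest(rows):
    """Return SKUs whose newest row predates the most recent snapshot date."""
    latest = latest_per_sku(rows)
    if not latest:
        return set()
    newest = max(r["date"] for r in latest.values())
    return {sku for sku, r in latest.items() if r["date"] < newest}


# Illustrative rows, not real scraper output.
rows = [
    {"sku": "A", "date": "2026-05-01", "price": 599},
    {"sku": "A", "date": "2026-05-08", "price": 549},
    {"sku": "B", "date": "2026-05-01", "price": 950},
]
print(latest_per_sku(rows)["A"]["price"], missing_from_latest(rows))
```

Rows for SKUs that drop out stay in the history, so a later relisting can still be matched on the same key.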

Sample output

Two Nykaa serum records look like this. Twenty rows of this shape weigh ~10 KB.

[
  {
    "sku": "8904245700001",
    "product_id": "8904245700001",
    "product_name": "Minimalist 10% Niacinamide Face Serum",
    "brand": "Minimalist",
    "price": 599,
    "original_price": 699,
    "discount_percent": 14.31,
    "currency": "INR",
    "rating": 4.4,
    "rating_count": 38217,
    "image_url": "https://images-static.nykaa.com/media/.../niacinamide.jpg",
    "url": "https://www.nykaa.com/minimalist-10-niacinamide-face-serum/p/4567890",
    "category": "Face Serum",
    "in_stock": true,
    "source_query": "minimalist"
  },
  {
    "sku": "8901030713590",
    "product_id": "8901030713590",
    "product_name": "The Ordinary Hyaluronic Acid 2% + B5",
    "brand": "The Ordinary",
    "price": 950,
    "original_price": null,
    "discount_percent": 0,
    "currency": "INR",
    "rating": 4.6,
    "rating_count": 12044,
    "url": "https://www.nykaa.com/the-ordinary-hyaluronic-acid/p/2345678",
    "in_stock": true,
    "source_query": ""
  }
]

sku and product_id are duplicate fields by design — sku is the more idiomatic join key for analytics warehouses, product_id matches Nykaa's internal naming. rating_count is the most reliable proxy for sales volume since GMV isn't disclosed at the SKU level. original_price: null indicates a full-price SKU (no strike-through). in_stock reflects the listing-page badge.
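Because original_price can be null, discount math needs an explicit fallback. The sketch below treats a null MRP as "selling at MRP" and recomputes the discount from price and MRP; effective_mrp and discount_percent are hypothetical helper names, and the two inputs are taken from the sample records above.

```python
def effective_mrp(record):
    """Use original_price when present, otherwise fall back to price."""
    mrp = record.get("original_price")
    return record["price"] if mrp is None else mrp


def discount_percent(record):
    """Discount off MRP as a percentage, rounded to two decimals."""
    mrp = effective_mrp(record)
    if not mrp:
        return 0.0
    return round((mrp - record["price"]) / mrp * 100, 2)


discounted = {"price": 599, "original_price": 699}
full_price = {"price": 950, "original_price": None}
print(discount_percent(discounted))  # 14.31, matching the sample record
print(discount_percent(full_price))  # 0.0
```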

Common pitfalls

Three things go wrong in production Nykaa pipelines. Brand name normalization — the same brand appears as "MAC", "M.A.C.", and "Mac Cosmetics" across listings; build a lookup table or normalize to lowercase-no-punctuation before grouping. MRP truthing — Nykaa's original_price is the brand-declared MRP, not necessarily a recent shelf price; for true discount-depth research, build your own rolling baseline rather than trusting the strike-through. Indie brand churn — small indie brands cycle on and off Nykaa quickly; a SKU disappearing between snapshots usually means delisted-not-deleted, so retain historical rows rather than overwriting.
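The brand-normalization pitfall can be handled with a small function: lowercase, strip punctuation, then apply a lookup table for aliases that punctuation-stripping cannot catch. This is a minimal sketch; the ALIASES table is a hypothetical starting point that you would extend from your own data.

```python
import re

# Hypothetical alias table: maps a cleaned name to the canonical brand key.
ALIASES = {"mac cosmetics": "mac"}


def normalize_brand(name):
    """Lowercase, drop punctuation, collapse whitespace, then apply aliases."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return ALIASES.get(key, key)


variants = ["MAC", "M.A.C.", "Mac Cosmetics"]
print({v: normalize_brand(v) for v in variants})
```

Apply the function to the brand column before any groupby so all three spellings land in one row.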

A fourth subtle issue: Nykaa's category taxonomy mixes top-level departments (makeup) and leaf categories (lipstick) under one enum. For tree-structured research, fetch by leaf category and treat the top-level enums as fallback "everything-else" buckets.
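For tree-structured research, one approach is to fetch by leaf category and roll the counts up to departments afterwards. The mapping below is an assumption made for illustration (for example, that nail sits under makeup); check it against Nykaa's own navigation before relying on it. rollup_by_department is a hypothetical helper keyed on the category value sent in the request, not the category field in each record.

```python
from collections import Counter

# Assumed leaf-to-department mapping; verify against Nykaa's navigation.
PARENT = {
    "lipstick": "makeup", "foundation": "makeup",
    "eye-makeup": "makeup", "nail": "makeup",
    "serum": "skin", "face-wash": "skin", "moisturizer": "skin",
    "shampoo": "hair", "conditioner": "hair",
}


def rollup_by_department(leaf_counts):
    """Sum per-category SKU counts into departments; unmapped keys pass through."""
    totals = Counter()
    for category, count in leaf_counts.items():
        totals[PARENT.get(category, category)] += count
    return dict(totals)


# Illustrative counts, not real catalog sizes.
print(rollup_by_department({"lipstick": 120, "foundation": 80, "serum": 200, "fragrance": 50}))
```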

Thirdwatch's actor handles Nykaa's production-grade anti-bot tooling by intercepting the page's embedded JSON for the listing payload, falling back to DOM extraction when the JSON shape shifts. The pure-pull architecture means a 500-SKU category sweep finishes in roughly two minutes — small enough to run nightly across 20+ categories. Pair Nykaa with our Myntra Scraper for the beauty overlap on Myntra (Tira, Maybelline, Lakme) and AJIO Scraper for AJIO Luxe's growing beauty section.

Related use cases

Frequently asked questions

What fields are returned per product?

Fifteen fields: product_name, brand, price, original_price (MRP), discount_percent, currency (INR), rating, rating_count, image_url, url, category, in_stock, sku, product_id, and source_query. All prices are integers in INR, ratings are floats on a 5-point scale, and the SKU works as a stable join key across snapshots.

How is Nykaa different from Amazon India or Flipkart for beauty?

Nykaa is India's largest dedicated beauty and personal-care marketplace, carrying ~6,800 brands as of FY24 disclosures including indie and prestige brands (Charlotte Tilbury, Huda Beauty, Dior, La Mer) that horizontal marketplaces do not stock. For category-specific beauty research, Nykaa coverage is materially deeper than Amazon India or Flipkart.

Can I browse a Nykaa category without a search keyword?

Yes. Leave the queries array empty and set the category enum (for example skin, hair, fragrance, or a leaf category like lipstick). The actor will browse the category index instead of running a keyword search. Combining a query and a category constrains the search to that category.

Does the actor return MRP and discount for all products?

MRP and discount_percent populate when Nykaa displays a strike-through original price. Many products, especially full-priced prestige SKUs, ship at MRP, in which case original_price is null and discount_percent is 0. Treat a null original_price as a structural signal rather than a missing value.

How fresh is the pricing data?

Each run pulls live from nykaa.com at request time. Beauty prices change less frequently than electronics, but sale events (Pink Friday, Hot Pink Sale, Beauty Bonanza) trigger 30-50 percent moves. Daily snapshots are sufficient for steady-state monitoring; switch to hourly during named sale windows.
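The daily-versus-hourly rule can be expressed as a small scheduling helper. The sketch below is illustrative only: the dates in SALE_WINDOWS are placeholders, not real sale dates, and snapshot_interval_hours is a hypothetical function you would call from your own scheduler.

```python
import datetime

# Placeholder window for illustration; replace with the dates Nykaa announces.
SALE_WINDOWS = [
    (datetime.date(2026, 11, 20), datetime.date(2026, 11, 30)),
]


def snapshot_interval_hours(day, sale_windows=SALE_WINDOWS):
    """Return 1 (hourly) inside a sale window, otherwise 24 (daily)."""
    in_sale = any(start <= day <= end for start, end in sale_windows)
    return 1 if in_sale else 24


print(snapshot_interval_hours(datetime.date(2026, 11, 25)))  # inside the placeholder window
print(snapshot_interval_hours(datetime.date(2026, 6, 1)))    # steady state
```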

Track Nykaa Beauty Brand Pricing for Competitive Brand Recon
Monitor Nykaa Bestsellers and New India Beauty Brand Drops
Build an India Beauty Trend Pipeline With Nykaa Scraper Data
