Build an India Baby-Care Market Database With FirstCry Data

Build an India baby-care market intelligence database using Thirdwatch's FirstCry Scraper. Postgres schema, daily ETL ingest, dashboards and SQL rollups.

May 12, 2026

Thirdwatch's FirstCry Scraper is the data spine for an India baby-care market database: product, brand, price, MRP, discount, rating_count, image and URL across every FirstCry category, refreshed on whatever cadence your research workflow needs. It is built for researchers, consultancies, brand teams and founders who need a defensible India parenting-economy dataset rather than a one-off CSV.

Why build an India baby-care market database with FirstCry

India's baby-care category is one of the most attractive online retail segments in the country, and one of the most under-instrumented. According to the India Brand Equity Foundation's ecommerce report and category trackers like RedSeer and Praxis Global Alliance, the India baby-and-kids retail market is on track to cross USD 30 billion by 2027, with online penetration accelerating fastest in Tier 2 and Tier 3 cities. Brainbees Solutions (FirstCry's parent) reported in its 2024 IPO prospectus on BSE India that the platform had more than 9 million active customers and 75 million app downloads at listing — the largest pure-play India baby-care platform by an order of magnitude.

The job-to-be-done is structured. A research consultancy builds a quarterly India baby-care category report for institutional clients. A brand team runs a private market-share dashboard tracking FirstCry, Flipkart and Amazon India together. A D2C founder maintains a competitive baby-skincare cube refreshed nightly. A pricing consultant runs index reports for baby-formula clients. All of them need the same substrate — clean, repeatable FirstCry product-and-price data over time, normalized into a queryable schema.

How does this compare to alternatives?

Three options for building the underlying dataset:

Approach	Reliability	Setup time	Maintenance
Build your own FirstCry scraper end-to-end	Mixed — breaks when FirstCry changes	Weeks to months	Continuous engineering
Outsource to a custom-scraping vendor	Mixed — vendor SLA risk	Days to weeks	Per-vendor contracting
Thirdwatch FirstCry Scraper	Production-tested with production-grade anti-bot tooling	5 minutes	Thirdwatch tracks FirstCry changes

Self-built scrapers absorb engineering time that should go into the database, the dashboards and the analysis. The FirstCry Scraper actor page gives you the public catalog at transparent per-result pricing — you focus on the database, we keep the scrape working.

How to build the database in 4 steps

Step 1: How do I set up Apify and Postgres?

Sign in at apify.com, open Settings → Integrations, and copy your personal API token. Stand up a Postgres instance (managed or self-hosted) for the data:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
export DATABASE_URL="postgresql://user:pass@host:5432/babycare"

Step 2: How should I shape the schema?

Three core tables, plus derived rollups created later. Keep the raw price_snapshots table append-only — it's your audit trail.

CREATE TABLE IF NOT EXISTS product_master (
    url            TEXT PRIMARY KEY,
    name           TEXT,
    brand          TEXT,
    category       TEXT,
    image_url      TEXT,
    first_seen     DATE NOT NULL,
    last_seen      DATE NOT NULL
);

CREATE TABLE IF NOT EXISTS price_snapshots (
    url            TEXT NOT NULL,
    snapshot_date  DATE NOT NULL,
    price_inr      INTEGER,
    mrp_inr        INTEGER,
    discount_pct   NUMERIC(5,2),
    rating_count   INTEGER,
    PRIMARY KEY (url, snapshot_date)
);

CREATE TABLE IF NOT EXISTS brand_master (
    brand_raw      TEXT PRIMARY KEY,
    brand_norm     TEXT NOT NULL
);

CREATE INDEX IF NOT EXISTS price_snapshots_date  ON price_snapshots (snapshot_date);
CREATE INDEX IF NOT EXISTS product_master_brand  ON product_master (brand);
CREATE INDEX IF NOT EXISTS product_master_cat    ON product_master (category);

The URL is the natural key throughout. FirstCry encodes a product ID into the URL slug, so the URL stays stable across snapshots even when the product name or image changes.
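The URL doubles as the primary key in both tables, so it pays to canonicalize it before every insert. Otherwise a tracking parameter, a trailing slash or an http/https difference can split one SKU into two time series. The sketch below is minimal and assumes query strings and fragments are noise rather than part of a FirstCry product's identity. Check that assumption against real URLs before relying on it. `canonical_url` is a hypothetical helper, not a field the actor returns.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(raw: str) -> str:
    """Normalize a product URL so it is safe to use as a natural key.

    Assumption: query strings and fragments are tracking noise, not identity.
    """
    parts = urlsplit(raw.strip())
    host = parts.netloc.lower()           # hostnames are case-insensitive
    path = parts.path.rstrip("/") or "/"  # "/x/" and "/x" are the same page
    # Force https and drop query + fragment.
    return urlunsplit(("https", host, path, "", ""))
```

In the ingest script, this would wrap `row.get("url")` before the URL is used as a key. Paths stay case-sensitive on purpose, since only the host part of a URL is case-insensitive.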

Step 3: How do I run the daily ingest?

Sweep every category, normalize brand, upsert into product_master, append a price snapshot row per URL.

import os, re, datetime, psycopg2, requests
from psycopg2.extras import execute_values

ACTOR = "thirdwatch~firstcry-scraper"
TOKEN = os.environ["APIFY_TOKEN"]
conn  = psycopg2.connect(os.environ["DATABASE_URL"])
today = datetime.date.today()

CATEGORIES = [
    "diapers", "baby-skincare", "baby-feeding", "baby-toys",
    "baby-clothing", "baby-gear-strollers",
    "kids-clothing", "kids-footwear", "kids-toys",
    "school-supplies", "mom-care",
]

def parse_int(s):
    # Accept 1299, "1299", "Rs. 1,299" or "₹1,299.00". Take the first number,
    # drop thousands separators, and round away paise instead of gluing the
    # decimals onto the rupee value.
    m = re.search(r"\d[\d,]*(?:\.\d+)?", "" if s is None else str(s))
    return round(float(m.group().replace(",", ""))) if m else None

def normalize_brand(b):
    # Trim, title-case and drop a trailing "India" / "Inc" / "Ltd" suffix.
    if not b:
        return None
    return re.sub(r"\s+(india|inc\.?|ltd\.?)$", "", str(b).strip().title(), flags=re.I)

# Keyed by URL: a SKU listed in two categories (say baby-toys and kids-toys)
# must appear only once per batch, because Postgres rejects an
# INSERT ... ON CONFLICT DO UPDATE that touches the same row twice.
# The first category that returns a URL wins.
products, snapshots = {}, {}
for cat in CATEGORIES:
    r = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
        params={"token": TOKEN},
        json={"queries": [], "category": cat,
              "sortBy": "popularity", "maxResults": 300},
        timeout=900,
    )
    r.raise_for_status()  # stop on API errors instead of parsing an error body as rows
    for row in r.json():
        url = row.get("url")
        if not url or url in products:
            continue
        price = parse_int(row.get("price"))
        mrp   = parse_int(row.get("original_price"))
        disc  = round((mrp - price) / mrp * 100, 2) if mrp and price and mrp >= price else None
        products[url] = (
            url, row.get("product_name"), normalize_brand(row.get("brand")),
            cat, row.get("image_url"), today, today,
        )
        snapshots[url] = (url, today, price, mrp, disc, parse_int(row.get("rating_count")))

with conn, conn.cursor() as cur:  # commits on success, rolls back on error
    execute_values(cur, """
        INSERT INTO product_master (url, name, brand, category, image_url, first_seen, last_seen)
        VALUES %s
        ON CONFLICT (url) DO UPDATE SET
            name = EXCLUDED.name,
            brand = COALESCE(EXCLUDED.brand, product_master.brand),
            image_url = COALESCE(EXCLUDED.image_url, product_master.image_url),
            last_seen = EXCLUDED.last_seen
    """, list(products.values()))
    execute_values(cur, """
        INSERT INTO price_snapshots (url, snapshot_date, price_inr, mrp_inr, discount_pct, rating_count)
        VALUES %s
        ON CONFLICT (url, snapshot_date) DO UPDATE SET
            price_inr = EXCLUDED.price_inr,
            mrp_inr = EXCLUDED.mrp_inr,
            discount_pct = EXCLUDED.discount_pct,
            rating_count = EXCLUDED.rating_count
    """, list(snapshots.values()))
conn.close()

print(f"{today}: {len(products)} product upserts, {len(snapshots)} snapshots")

A daily run across 11 categories at maxResults 300 yields at most 3,300 rows per snapshot (11 × 300). There are somewhat fewer unique SKUs when categories overlap. That is a representative India baby-care cross-section.
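The ingest above normalizes brands with a regex only; it never reads the brand_master table from Step 2. One way to wire it in is sketched below. It assumes brand_master holds hand-curated brand_raw-to-brand_norm pairs. The lookup checks the curated mapping first and falls back to the regex rule only when no mapping exists. `resolve_brand` and the sample mappings are hypothetical.

```python
import re

def normalize_brand(b):
    # Same rule as the ingest script: trim, title-case, drop a trailing
    # "India" / "Inc" / "Ltd" suffix.
    if not b:
        return None
    return re.sub(r"\s+(india|inc\.?|ltd\.?)$", "", str(b).strip().title(), flags=re.I)

def resolve_brand(raw, brand_map):
    """Map a scraped brand string to its canonical name.

    brand_map mirrors the brand_master table as {brand_raw: brand_norm}.
    Curated entries win; the regex rule is only a fallback.
    """
    if not raw:
        return None
    key = str(raw).strip()
    if key in brand_map:
        return brand_map[key]
    return normalize_brand(key)

# Hypothetical curated entries, e.g. co-branded or variant listings.
brand_map = {"MamyPoko Pants": "MamyPoko", "Pampers Baby Dry": "Pampers"}
print(resolve_brand("Pampers India", brand_map))   # Pampers (regex fallback)
print(resolve_brand("MamyPoko Pants", brand_map))  # MamyPoko (curated mapping)
```

With psycopg2, the mapping loads with `cur.execute("SELECT brand_raw, brand_norm FROM brand_master")` followed by `brand_map = dict(cur.fetchall())`. Curated entries take priority because a regex cannot know that two differently spelled strings are the same brand.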

Step 4: How do I derive useful metrics?

Build category and brand rollups as SQL views that your dashboard or report queries hit. Refresh nightly via materialized views if your dataset is large.

CREATE MATERIALIZED VIEW IF NOT EXISTS category_daily AS
SELECT
    pm.category,
    ps.snapshot_date,
    COUNT(*) AS sku_count,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ps.price_inr) AS median_price,
    AVG(ps.discount_pct) AS avg_discount_pct,
    SUM(ps.rating_count) AS total_ratings
FROM price_snapshots ps
JOIN product_master  pm USING (url)
GROUP BY pm.category, ps.snapshot_date;

-- REFRESH ... CONCURRENTLY requires a unique index covering every row.
CREATE UNIQUE INDEX IF NOT EXISTS category_daily_key ON category_daily (category, snapshot_date);

CREATE MATERIALIZED VIEW IF NOT EXISTS brand_daily AS
SELECT
    pm.brand,
    pm.category,
    ps.snapshot_date,
    COUNT(*) AS sku_count,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ps.price_inr) AS median_price,
    AVG(ps.discount_pct) AS avg_discount_pct,
    SUM(ps.rating_count) AS total_ratings
FROM price_snapshots ps
JOIN product_master  pm USING (url)
WHERE pm.brand IS NOT NULL
GROUP BY pm.brand, pm.category, ps.snapshot_date;

CREATE UNIQUE INDEX IF NOT EXISTS brand_daily_key ON brand_daily (brand, category, snapshot_date);

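category_daily is your headline category index: median price, average discount and total ratings (a popularity proxy) per category per day. brand_daily is the same axis sliced by brand, and it is the spine of brand-level dashboards. Refresh both nightly with REFRESH MATERIALIZED VIEW CONCURRENTLY category_daily; and REFRESH MATERIALIZED VIEW CONCURRENTLY brand_daily;. CONCURRENTLY keeps the views readable during the refresh. Postgres only allows it on a materialized view with a unique index covering all rows, such as (category, snapshot_date) for category_daily and (brand, category, snapshot_date) for brand_daily.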
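total_ratings sums lifetime counts, so it tracks catalog age as much as current demand. The recent-demand signal is each SKU's day-over-day change in rating_count. The sketch below is minimal and works over one SKU's snapshot history. Clamping negative changes to zero is an assumption: a drop usually means removed reviews, not negative demand. `rating_deltas` is a hypothetical helper.

```python
from datetime import date

def rating_deltas(snapshots):
    """Day-over-day rating_count change for one SKU.

    snapshots: iterable of (snapshot_date, rating_count) pairs, in any order.
    Days with a missing count are skipped, so a delta after a gap covers the
    whole gap. Negative changes (e.g. removed reviews) are clamped to 0.
    """
    known = sorted((d, c) for d, c in snapshots if c is not None)
    return [(day, max(cur - prev, 0))
            for (_, prev), (day, cur) in zip(known, known[1:])]

history = [
    (date(2026, 5, 12), 4825),
    (date(2026, 5, 10), 4790),
    (date(2026, 5, 11), None),  # missed scrape
    (date(2026, 5, 13), 4840),
]
print(rating_deltas(history))
```

The same calculation maps onto SQL with `LAG(rating_count) OVER (PARTITION BY url ORDER BY snapshot_date)` if you prefer to keep it inside a view.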

Sample output

A few rows from price_snapshots joined with product_master look like this. Five rows weigh roughly 4 KB.

[
  {
    "url": "https://www.firstcry.com/pampers/.../product-detail",
    "name": "Pampers Premium Care Pant Style Diapers M - 76",
    "brand": "Pampers",
    "category": "diapers",
    "snapshot_date": "2026-05-12",
    "price_inr": 1299,
    "mrp_inr": 1599,
    "discount_pct": 18.76,
    "rating_count": 4825
  },
  {
    "url": "https://www.firstcry.com/mamaearth/.../product-detail",
    "name": "Mamaearth Mineral Sunscreen for Babies 100ml",
    "brand": "Mamaearth",
    "category": "baby-skincare",
    "snapshot_date": "2026-05-12",
    "price_inr": 349,
    "mrp_inr": 399,
    "discount_pct": 12.53,
    "rating_count": 3187
  }
]

url joins product_master and price_snapshots. snapshot_date gives you the time axis. price_inr, mrp_inr, discount_pct and rating_count are the four numeric series most India baby-care dashboards spend their time on.
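discount_pct is computed at ingest from price_inr and mrp_inr rather than scraped. That means any stored row can be re-checked against its own price columns. The sketch below mirrors the ingest formula and checks it against the two sample rows. `discount_pct` here is a standalone helper, not part of the actor.

```python
def discount_pct(price_inr, mrp_inr):
    """(MRP - price) / MRP as a percentage, rounded to 2 places.

    Mirrors the ingest rule: None when either value is missing or the
    price is above MRP (which usually signals a parsing problem).
    """
    if not price_inr or not mrp_inr or mrp_inr < price_inr:
        return None
    return round((mrp_inr - price_inr) / mrp_inr * 100, 2)

samples = [
    {"name": "Pampers Premium Care Pant Style Diapers M - 76",
     "price_inr": 1299, "mrp_inr": 1599, "discount_pct": 18.76},
    {"name": "Mamaearth Mineral Sunscreen for Babies 100ml",
     "price_inr": 349, "mrp_inr": 399, "discount_pct": 12.53},
]
for row in samples:
    assert discount_pct(row["price_inr"], row["mrp_inr"]) == row["discount_pct"], row["name"]
```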

Common pitfalls

Four things go wrong when building a baby-care database from FirstCry.

Brand-name fragmentation: the same brand appears as "Pampers", "Pampers India" and occasionally with a co-brand prefix. Without a brand_master normalization table you'll undercount brand share by 5-15 percent.

URL drift on relisted SKUs: FirstCry occasionally retires a URL and relists the same product at a new one, so the old time series ends and a new one starts. Build a successor_url mapping table for the small fraction of SKUs where this matters.

Snapshot-vs-event confusion: a price-snapshot pipeline that runs once a day can miss a flash sale that opens and closes within hours. For full deal coverage, layer an event-window job at a 30-minute cadence on top of the daily baseline rather than running everything hourly.

Lifetime rating counts: rating_count is a strong popularity proxy, but it accumulates over a SKU's lifetime. For a recent demand signal, compute the daily delta in rating_count and weight your dashboards on the delta rather than the absolute value.

Thirdwatch's actor uses production-grade anti-bot tooling under the hood and runs under India-residential network conditions, so its catalog requests look like a real shopper's. The pure-HTTP architecture keeps daily ingest jobs fast and predictable. That matters when the database sits behind a scheduled pipeline rather than a one-off analyst session. Layer FirstCry with our Flipkart Scraper and Amazon Scraper to build a three-source India baby-care intelligence stack, and add our Myntra Scraper for kids-fashion adjacency.
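The successor_url mapping from the URL-drift pitfall can stay small. The sketch below assumes a hypothetical two-column successor_url table (old_url, new_url) loaded into a dict. It follows each URL's chain to its newest URL, then re-keys snapshot rows so the old and new series roll up under one key. `latest_url` and `stitch` are hypothetical helpers.

```python
def latest_url(url, successor):
    """Follow old_url -> new_url links until reaching the newest URL.

    successor mirrors a hypothetical successor_url table as {old_url: new_url}.
    A seen-set guards against a mapping that accidentally loops.
    """
    seen = {url}
    while url in successor:
        url = successor[url]
        if url in seen:
            raise ValueError(f"cycle in successor_url mapping at {url!r}")
        seen.add(url)
    return url

def stitch(rows, successor):
    """Re-key (url, snapshot_date, price_inr) rows onto each SKU's newest URL."""
    return [(latest_url(url, successor), day, price) for url, day, price in rows]

# Placeholder keys standing in for a retired URL and its relistings.
successor = {"url-v1": "url-v2", "url-v2": "url-v3"}
rows = [("url-v1", "2026-04-30", 1299), ("url-v3", "2026-05-12", 1249)]
print(stitch(rows, successor))
```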

Related use cases

Frequently asked questions

What is an India baby-care market database?

A structured, queryable store of FirstCry (and optionally Flipkart and Amazon India) product, price and popularity data over time. The schema typically holds a product master, daily price snapshots, a brand master, and derived aggregates by category and brand. It is the data substrate for India baby-care research, consulting and brand-team dashboards.

Why FirstCry as the anchor source?

FirstCry is India's largest baby and kids ecommerce platform with a catalog depth no general marketplace matches. For India baby-care category coverage, FirstCry data is the single best anchor — Flipkart and Amazon India layer in as complements, not replacements. Most India baby-care databases start with FirstCry and add marketplace coverage second.

What schema works for this kind of database?

Three core tables: product_master (url-keyed, with brand, category, name, first_seen, last_seen); price_snapshots (url + snapshot_date, holding price, mrp, discount_pct, rating_count); brand_master (brand normalization). Optional: derived aggregates (category_daily, brand_daily) refreshed nightly for fast dashboard queries.

How often should I refresh the database?

Daily ingest is the right baseline for steady-state India baby-care research. During announced FirstCry sale events (Birthday Bash, festive, end-of-season), bump the relevant categories to hourly for the sale window. Most teams run a single daily job and an event-window job that turns on and off around the calendar.
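The daily-plus-event-window pattern comes down to one decision per category: which refresh interval applies right now. The sketch below assumes sale windows are kept by hand as (start, end, categories) tuples. `refresh_interval` and the window below are hypothetical. The hourly default follows this answer; pass a 30-minute timedelta for the tighter cadence the pitfalls section recommends for flash sales.

```python
from datetime import datetime, timedelta

DAILY = timedelta(days=1)

def refresh_interval(category, now, sale_windows, event_interval=timedelta(hours=1)):
    """How often `category` should be scraped at time `now`.

    sale_windows: list of (start, end, categories) tuples, maintained by hand
    from announced sale dates. Inside a matching window the event interval
    applies; everywhere else the daily baseline does.
    """
    for start, end, cats in sale_windows:
        if start <= now < end and category in cats:
            return event_interval
    return DAILY

# Placeholder window, not FirstCry's real sale calendar.
windows = [
    (datetime(2026, 6, 1), datetime(2026, 6, 4), {"diapers", "baby-toys"}),
]
print(refresh_interval("diapers", datetime(2026, 6, 2, 9, 0), windows))   # 1:00:00
print(refresh_interval("mom-care", datetime(2026, 6, 2, 9, 0), windows))  # 1 day, 0:00:00
```

A scheduler that wakes every few minutes could call this per category and launch a run whenever the interval has elapsed since that category's last snapshot.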

What can I do with the database that I can't do with one-off scrapes?

Track brand momentum over months, compute category price indices, surface SKUs whose rating_count growth is outpacing the category, watch MRP drift, build comparative competitive heatmaps. Anything that requires a time-series rather than a snapshot — which is most of the interesting research and consulting work.
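MRP drift is one of the simplest time-series checks the snapshot table enables: compare each SKU's mrp_inr with its previous known value. The sketch below is minimal and works over one SKU's history. `mrp_changes` is a hypothetical helper, and skipping missing MRPs rather than flagging them is a design assumption.

```python
from datetime import date

def mrp_changes(history):
    """Flag MRP revisions for one SKU.

    history: iterable of (snapshot_date, mrp_inr) pairs, in any order.
    Returns [(snapshot_date, old_mrp, new_mrp, pct_change)] for every snapshot
    whose MRP differs from the previous known MRP. Missing (or zero) MRPs are
    skipped rather than treated as a change.
    """
    known = sorted((d, m) for d, m in history if m)
    changes = []
    for (_, old), (day, new) in zip(known, known[1:]):
        if new != old:
            changes.append((day, old, new, round((new - old) / old * 100, 2)))
    return changes

mrp_history = [
    (date(2026, 5, 1), 1599),
    (date(2026, 5, 2), None),
    (date(2026, 5, 3), 1599),
    (date(2026, 5, 4), 1699),
]
print(mrp_changes(mrp_history))
```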

Can I commercialize derived data?

You can build internal dashboards, research reports and consulting deliverables on top of derived metrics. For commercial redistribution of raw rows, talk to your legal counsel about platform terms of service and Indian competition-law considerations. Most India market-intelligence shops sell derived insights and retain raw rows internally.

Scrape FirstCry Products for India Baby Care Research 2026
Track FirstCry Pricing on Baby and Kids Products (2026)
Monitor FirstCry Deals and Bestsellers: 2026 India Playbook
