Build an eBay India Product Database for E-Commerce Research
Build a structured eBay India product database with prices, sellers, conditions, and shipping data. Python ETL recipes using the Thirdwatch Apify actor.

Thirdwatch's eBay India Scraper returns structured product data from eBay India — SKUs, prices, original prices, discount percentages, sellers, conditions, listing formats, and shipping costs. Built for developers and e-commerce founders who need a queryable product database from eBay India's cross-border marketplace without reverse-engineering the site's anti-bot protections.
Why build an eBay India product database
eBay India is the primary cross-border marketplace for Indian buyers purchasing from international sellers. Unlike Flipkart or Amazon India, eBay India exposes listing conditions (new, refurbished, used), auction dynamics, multi-currency pricing, and seller geographies in its search results. According to Statista's India e-commerce report, cross-border e-commerce in India grew 28% year-over-year in 2025, driven by consumer demand for products unavailable on domestic platforms.
For e-commerce founders, a structured eBay India database unlocks three things. First, product discovery — identifying items with consistent demand that you could source domestically or import directly. Second, pricing intelligence — understanding the price floor for imported goods that compete with your catalog. Third, gap analysis — finding categories where eBay India has supply but domestic platforms do not, signaling unmet demand.
Building this database manually is impractical. eBay India's search results are paginated, listing formats vary between auctions and fixed-price, and the site's bot detection blocks standard scraping tools. An automated pipeline that extracts, normalizes, and stores the data is the only scalable path.
How does this compare to the alternatives?
Three approaches to building an eBay India product database:
| Approach | Reliability | Setup time | Maintenance |
|---|---|---|---|
| DIY Python scraper | Low; blocked by anti-bot within hours | 3-7 days engineering + debugging | High; selector changes monthly |
| Generic scraping API (ScraperAPI, Bright Data) | Medium; eBay India-specific parsing often missing | Hours to days | Vendor-managed but schema inconsistent |
| Thirdwatch eBay India Scraper | Production-grade; anti-bot handled | 5 minutes to first data | Thirdwatch maintains selectors |
The DIY path is a time sink. eBay India's bot detection blocks standard HTTP libraries and most headless browsers, which means you spend more time on anti-bot engineering than on the actual data pipeline. Generic scraping APIs often support eBay US but return raw HTML for the India domain, leaving you to write your own parser. The eBay India Scraper returns normalized JSON with 19 typed fields, ready for database ingestion.
How to build an eBay India product database in 5 steps
Step 1: How do I authenticate and install dependencies?
Sign up at apify.com for a free API token. Install the Python client library.
```bash
# requests is used by the scheduling call in Step 5
pip install apify-client psycopg2-binary requests
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
```
Step 2: How do I define my product categories and run the scraper?
Map your product categories to eBay India search queries. The queries input accepts an array of search strings, and maxResults controls how many listings per query.
```python
import os

from apify_client import ApifyClient

# Read the token exported in Step 1 instead of hardcoding it in source.
client = ApifyClient(os.environ["APIFY_TOKEN"])

PRODUCT_CATEGORIES = {
    "electronics": ["bluetooth earbuds", "wireless mouse", "usb hub"],
    "watches": ["automatic watch", "smartwatch band", "vintage casio"],
    "accessories": ["leather wallet men", "sunglasses polarized", "laptop sleeve"],
}

all_items = []
for category, queries in PRODUCT_CATEGORIES.items():
    # One actor run per category; each run covers every query in the list.
    run = client.actor("thirdwatch/ebay-india-scraper").call(
        run_input={
            "queries": queries,
            "maxResults": 100,
        }
    )
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    for item in items:
        # Tag each record with our own research category label.
        item["research_category"] = category
    all_items.extend(items)
    print(f"{category}: {len(items)} products collected")

print(f"\nTotal: {len(all_items)} products across {len(PRODUCT_CATEGORIES)} categories")
```
Step 3: How do I normalize and deduplicate the data?
eBay listings can appear under multiple search queries. Use the sku field as a natural primary key for deduplication.
```python
seen_skus = set()
unique_items = []
for item in all_items:
    sku = item.get("sku")
    if sku and sku not in seen_skus:
        seen_skus.add(sku)
        # Normalize price to float
        item["price"] = float(item["price"]) if item.get("price") else None
        item["original_price"] = float(item["original_price"]) if item.get("original_price") else None
        unique_items.append(item)

print(f"Deduplicated: {len(all_items)} -> {len(unique_items)} unique SKUs")
```
Step 4: How do I load the data into PostgreSQL?
Create a products table and upsert records by SKU. This pattern supports incremental updates from scheduled runs.
```python
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/ebay_india")
cur = conn.cursor()

# Create the table on first run; later runs reuse it.
cur.execute("""
    CREATE TABLE IF NOT EXISTS ebay_india_products (
        sku TEXT PRIMARY KEY,
        product_name TEXT,
        seller TEXT,
        price NUMERIC,
        original_price NUMERIC,
        discount_percent NUMERIC,
        currency TEXT,
        shipping TEXT,
        condition TEXT,
        listing_format TEXT,
        image_url TEXT,
        url TEXT,
        source_query TEXT,
        research_category TEXT,
        scraped_at TIMESTAMP DEFAULT NOW()
    )
""")

# Upsert by SKU: new listings are inserted, known listings get fresh
# price, discount, and shipping values plus a new scraped_at timestamp.
for item in unique_items:
    cur.execute("""
        INSERT INTO ebay_india_products
            (sku, product_name, seller, price, original_price, discount_percent,
             currency, shipping, condition, listing_format, image_url, url,
             source_query, research_category)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
        ON CONFLICT (sku) DO UPDATE SET
            price = EXCLUDED.price,
            original_price = EXCLUDED.original_price,
            discount_percent = EXCLUDED.discount_percent,
            shipping = EXCLUDED.shipping,
            scraped_at = NOW()
    """, (
        item.get("sku"), item.get("product_name"), item.get("seller"),
        item.get("price"), item.get("original_price"), item.get("discount_percent"),
        item.get("currency"), item.get("shipping"), item.get("condition"),
        item.get("listing_format"), item.get("image_url"), item.get("url"),
        item.get("source_query"), item.get("research_category"),
    ))

conn.commit()
cur.close()
conn.close()
print(f"Upserted {len(unique_items)} records into ebay_india_products")
```
Step 5: How do I schedule recurring updates?
Use Apify's scheduling feature or a cron job to keep your database current. Weekly refreshes are sufficient for catalog building; daily for active pricing.
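```python
# Schedule via Apify API
import json
import os

import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]  # exported in Step 1

response = requests.post(
    "https://api.apify.com/v2/schedules",
    headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
    json={
        "name": "ebay-india-weekly-refresh",
        "cronExpression": "0 6 * * 1",  # Every Monday at 6 AM UTC
        "timezone": "UTC",
        "isEnabled": True,
        "actions": [{
            "type": "RUN_ACTOR",
            # The API reference shows an actor ID here; if the
            # username/name form is rejected, use the actor's ID instead.
            "actorId": "thirdwatch/ebay-india-scraper",
            # The schedules API takes the actor input as a serialized body.
            "runInput": {
                "body": json.dumps({
                    "queries": ["bluetooth earbuds", "wireless mouse", "automatic watch"],
                    "maxResults": 100,
                }),
                "contentType": "application/json; charset=utf-8",
            },
        }],
    },
    timeout=30,
)
response.raise_for_status()  # fail loudly if the schedule was not created
```
Sample output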
Each product record contains 19 structured fields. Here is a representative sample:
```json
[
  {
    "sku": "326148792510",
    "product_id": "326148792510",
    "product_name": "Sony WF-1000XM5 Wireless Earbuds Noise Cancelling Imported",
    "brand": "",
    "seller": "audio_imports_IN",
    "price": 14999.0,
    "original_price": 19990.0,
    "discount_percent": 24.96,
    "currency": "INR",
    "shipping": "Free shipping",
    "condition": "New",
    "listing_format": "Buy It Now",
    "rating": null,
    "rating_count": null,
    "image_url": "https://i.ebayimg.com/images/g/example/s-l225.jpg",
    "url": "https://www.ebay.in/itm/326148792510",
    "category": "",
    "in_stock": null,
    "source_query": "bluetooth earbuds"
  },
  {
    "sku": "285647391082",
    "product_id": "285647391082",
    "product_name": "Logitech MX Master 3S Wireless Mouse Graphite",
    "brand": "",
    "seller": "peripherals_world",
    "price": 6450.0,
    "original_price": 8995.0,
    "discount_percent": 28.29,
    "currency": "INR",
    "shipping": "INR 299.00 shipping",
    "condition": "New",
    "listing_format": "Buy It Now",
    "rating": null,
    "rating_count": null,
    "image_url": "https://i.ebayimg.com/images/g/example/s-l225.jpg",
    "url": "https://www.ebay.in/itm/285647391082",
    "category": "",
    "in_stock": null,
    "source_query": "wireless mouse"
  }
]
```
The sku and product_id fields serve as unique identifiers for deduplication and primary keys. The source_query field maps each result to the search term that produced it, which is essential for category-level aggregation in multi-query runs.
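As a rough illustration of that category-level aggregation, the sketch below groups records by source_query and currency and reports a count and median price for each group. The helper name summarize_by_query is hypothetical, and the two records are trimmed copies of the sample above.

```python
from collections import defaultdict
from statistics import median


def summarize_by_query(items):
    """Group records by (source_query, currency); report count and median price."""
    groups = defaultdict(list)
    for item in items:
        price = item.get("price")
        if price is None:
            continue  # skip listings without a parsed price
        key = (item.get("source_query"), item.get("currency"))
        groups[key].append(float(price))
    return {
        key: {"count": len(prices), "median_price": median(prices)}
        for key, prices in groups.items()
    }


sample = [
    {"sku": "326148792510", "price": 14999.0, "currency": "INR",
     "source_query": "bluetooth earbuds"},
    {"sku": "285647391082", "price": 6450.0, "currency": "INR",
     "source_query": "wireless mouse"},
]
summary = summarize_by_query(sample)
```

Keying on currency as well as query keeps prices in different currencies out of the same median.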
Common pitfalls
Schema drift is your biggest maintenance risk. eBay updates its search results DOM every few months, which breaks hardcoded CSS selectors in DIY scrapers. If you build a pipeline on your own parser, budget for monthly maintenance. The Thirdwatch actor handles selector updates so your database pipeline stays stable.
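Even with a managed actor, a cheap guard at ingestion can surface drift early. The sketch below compares each raw record against the field names shown in this article's sample output. The list is an assumption taken from that sample rather than from an official schema, and check_schema is a hypothetical helper. Run it before adding your own keys such as research_category.

```python
# Field names copied from the sample output in this article (assumed, not official).
EXPECTED_FIELDS = {
    "sku", "product_id", "product_name", "brand", "seller", "price",
    "original_price", "discount_percent", "currency", "shipping",
    "condition", "listing_format", "rating", "rating_count", "image_url",
    "url", "category", "in_stock", "source_query",
}


def check_schema(item):
    """Return (missing, unexpected) field names for one raw record."""
    keys = set(item)
    return sorted(EXPECTED_FIELDS - keys), sorted(keys - EXPECTED_FIELDS)


# Simulate drift: "seller" has vanished and an unknown "title" key has appeared.
record = dict.fromkeys(EXPECTED_FIELDS - {"seller"})
record["title"] = "Renamed field"
missing, unexpected = check_schema(record)
```

Logging or alerting on non-empty results is usually enough; rejecting the whole batch is a stricter option.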
Multi-currency pricing requires normalization. eBay India has both domestic and cross-border sellers, so a query for "wireless mouse" may return products priced in INR, USD, and GBP in the same result set. Always check the currency field before aggregating prices; averaging across unconverted currencies produces meaningless numbers. Converting at a fixed rate is not enough either: according to the Reserve Bank of India's reference rates, INR-USD fluctuations of 2-3% per month are common, so cross-currency comparisons need rates refreshed daily.
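A minimal sketch of that normalization step follows. The rates in INR_PER_UNIT are placeholder numbers for illustration only, not real reference rates; in practice you would load the day's rates from a source you trust. The helper price_in_inr is hypothetical.

```python
# Placeholder rates (INR per 1 unit of currency). NOT real market rates:
# replace with a daily reference rate before using the results.
INR_PER_UNIT = {"INR": 1.0, "USD": 83.0, "GBP": 105.0}


def price_in_inr(item, rates=INR_PER_UNIT):
    """Convert a record's price to INR, or return None if it cannot be converted."""
    price = item.get("price")
    currency = item.get("currency")
    if price is None or currency not in rates:
        return None  # unknown currency: leave it out of aggregates instead of guessing
    return round(float(price) * rates[currency], 2)


converted = price_in_inr({"price": 10.0, "currency": "USD"})
unknown = price_in_inr({"price": 10.0, "currency": "EUR"})
```

Returning None for unknown currencies keeps unconverted values out of averages, at the cost of dropping those rows from the aggregate.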
Deduplication across queries is non-trivial. The same eBay listing can rank for multiple search terms. Without deduplication on the sku field, your database will contain duplicate rows with different source_query values. Use upsert logic (INSERT ON CONFLICT) rather than blind inserts to handle this cleanly.
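If you want to keep the information about which searches surfaced a listing, one option is to merge by SKU while collecting every query that returned it. This is a sketch only: source_queries is a hypothetical field added here, not part of the actor's output, and merge_by_sku is an illustrative helper.

```python
def merge_by_sku(items):
    """Keep one record per SKU and collect every query that returned it."""
    merged = {}
    for item in items:
        sku = item.get("sku")
        if not sku:
            continue  # records without a SKU cannot be deduplicated
        if sku not in merged:
            record = dict(item)  # copy so the input list is left untouched
            record["source_queries"] = []  # hypothetical field, not in the actor output
            merged[sku] = record
        query = item.get("source_query")
        if query and query not in merged[sku]["source_queries"]:
            merged[sku]["source_queries"].append(query)
    return list(merged.values())


rows = merge_by_sku([
    {"sku": "285647391082", "source_query": "wireless mouse"},
    {"sku": "285647391082", "source_query": "logitech mouse"},
    {"sku": "326148792510", "source_query": "bluetooth earbuds"},
])
```

Storing the result would need an extra column (for example a text array in PostgreSQL) alongside the table from Step 4.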
Listings disappear without notice. eBay sellers frequently remove, relist, or modify their products. A database built from a single snapshot becomes stale within days. Schedule weekly refreshes at minimum, and track listing survival by comparing SKU sets across snapshots.
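Comparing SKU sets across snapshots can be sketched with plain set operations. The function name compare_snapshots and the example SKUs are illustrative; in a real pipeline the two inputs would come from consecutive scheduled runs.

```python
def compare_snapshots(previous_skus, current_skus):
    """Report which SKUs vanished, which are new, and the share that survived."""
    previous, current = set(previous_skus), set(current_skus)
    survived = previous & current
    return {
        "removed": sorted(previous - current),
        "new": sorted(current - previous),
        # Survival rate is undefined when there is no previous snapshot.
        "survival_rate": len(survived) / len(previous) if previous else None,
    }


report = compare_snapshots(
    ["sku-a", "sku-b", "sku-c", "sku-d"],  # last week's snapshot (made-up SKUs)
    ["sku-b", "sku-c", "sku-d", "sku-e"],  # this week's snapshot
)
```

A falling survival rate for a category is a hint that weekly refreshes are too slow for it.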
Related use cases
- Scrape eBay India products for market research — Research-oriented analysis of eBay India's cross-border marketplace dynamics.
- Monitor eBay India prices for competitive intelligence — Track price movements and discount trends over time.
- Find eBay India seller trends for arbitrage — Identify pricing gaps between eBay India and domestic platforms.
- Build an Amazon product research tool — Compare with the largest domestic marketplace.
- Guide to scraping e-commerce data — Broader strategies for e-commerce data extraction across platforms.
Frequently asked questions
What output format does the eBay India Scraper produce?
The actor outputs JSON records to an Apify dataset. Each record contains 19 typed fields including sku, product_name, price, original_price, discount_percent, seller, condition, listing_format, shipping, currency, image_url, and url. You can export via the Apify API in JSON, CSV, or Excel format.
Can I integrate eBay India product data into my existing database?
Yes. The actor returns structured JSON with a stable schema. Use the Apify API or the apify-client Python library to pull dataset items directly into your ETL pipeline, then upsert by sku into PostgreSQL, MongoDB, or any database. The sku field serves as a natural primary key for deduplication.
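The same upsert-by-sku pattern carries over to other databases. As one illustration, here is a sketch using Python's built-in sqlite3 module with an in-memory database and a reduced set of columns. It assumes SQLite 3.24 or newer, which is when ON CONFLICT ... DO UPDATE was introduced, and the helper upsert_products is hypothetical.

```python
import sqlite3

COLUMNS = ("sku", "product_name", "price", "currency")  # reduced column set for brevity


def upsert_products(conn, items):
    """Insert new SKUs and refresh existing ones in a single statement per row."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS ebay_india_products (
            sku TEXT PRIMARY KEY,
            product_name TEXT,
            price REAL,
            currency TEXT
        )
    """)
    conn.executemany("""
        INSERT INTO ebay_india_products (sku, product_name, price, currency)
        VALUES (:sku, :product_name, :price, :currency)
        ON CONFLICT (sku) DO UPDATE SET
            product_name = excluded.product_name,
            price = excluded.price,
            currency = excluded.currency
    """, [{k: item.get(k) for k in COLUMNS} for item in items])
    conn.commit()


conn = sqlite3.connect(":memory:")
first = {"sku": "326148792510", "product_name": "Sony WF-1000XM5",
         "price": 14999.0, "currency": "INR"}
upsert_products(conn, [first])
upsert_products(conn, [{**first, "price": 13999.0}])  # same SKU, new price
rows = conn.execute("SELECT sku, price FROM ebay_india_products").fetchall()
```

After both calls the table holds a single row for the SKU with the latest price, which is the behavior you want from scheduled refreshes.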