Build an Amazon Product Research Tool (2026 Guide)
Build a Jungle Scout / Helium 10 alternative using Thirdwatch's Amazon Scraper. BSR + price + rating + niche-opportunity scoring.

Thirdwatch's Amazon Scraper gives product-research SaaS builders, e-commerce operators, and retail-arbitrage teams a structured Amazon data layer — title, price, BSR, rating, reviews_count, monthly-sold, image, brand, category. Build a Jungle Scout / Helium 10 alternative with custom scoring tuned to your niche.
Why build an Amazon product-research tool
Amazon product research is a $200M+ SaaS market. According to Marketplace Pulse's 2024 Amazon Data report, more than 2 million active third-party sellers use product-research tools to identify niches, validate demand, and price competitively. The two market leaders (Jungle Scout, Helium 10) charge $50-$200/month per seat for a workflow that's fundamentally about ingesting Amazon's product, BSR, and rating data and computing opportunity scores. For platform builders and high-volume operators, owning that data layer directly is materially more flexible.
The job-to-be-done is structured. A product-research SaaS founder wants to ingest 500K+ Amazon listings weekly and compute proprietary opportunity scores. An e-commerce operator wants to evaluate 50 candidate niches per quarter for new-product launches. A retail-arbitrage team wants daily BSR snapshots on a 5K-product watchlist. An Amazon agency wants per-client niche-opportunity dashboards. All reduce to category-level or keyword-level scrapes + BSR + rating filters + custom scoring.
How does this compare to the alternatives?
Three options for Amazon product-research data:
| Approach | Cost per 1,000 records | Reliability | Setup time | Maintenance |
|---|---|---|---|---|
| Jungle Scout / Helium 10 (SaaS) | $50–$200/month per seat | High, with built-in scoring | Hours | Per-seat license |
| Amazon Product Advertising API | Free w/ affiliate, gated | Limited fields | Days (API approval) | Strict TOS limits |
| Thirdwatch Amazon Scraper | Pay per result | Production-tested with proxy | 5 minutes | Thirdwatch tracks Amazon changes |
Jungle Scout and Helium 10 are the SaaS market leaders but their pricing assumes individual-seat consumption. Amazon's official Product Advertising API is gated behind affiliate program approval and limits fields. The Amazon Scraper actor page gives you the raw data at the lowest unit cost — and you compose the scoring algorithm.
How to build a product-research tool in 4 steps
Step 1: How do I authenticate against Apify?
Sign in at apify.com (free tier, no credit card), open Settings → Integrations, and copy your personal API token. Every example below assumes the token is in APIFY_TOKEN:
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"Step 2: How do I pull a category-level batch?
Pass keyword queries that map to the niche category you're researching.
import os, requests, pandas as pd
ACTOR = "thirdwatch~amazon-scraper"
TOKEN = os.environ["APIFY_TOKEN"]
NICHES = ["silicone baking mat", "garlic press stainless steel",
"knife sharpener", "vegetable spiralizer",
"kitchen scale digital", "coffee scale",
"salad spinner", "mandoline slicer"]
resp = requests.post(
f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
params={"token": TOKEN},
json={"queries": NICHES, "country": "us", "maxResults": 100},
timeout=3600,
)
df = pd.DataFrame(resp.json())
print(f"{len(df)} products across {len(NICHES)} niches")8 niches × 100 products = up to 800 records.
Step 3: How do I score niche opportunities?
Composite score: high reviews + mediocre rating + competitive price band.
df["rating"] = pd.to_numeric(df.rating, errors="coerce")
df["reviews_count"] = pd.to_numeric(df.reviews_count, errors="coerce")
df["price"] = pd.to_numeric(df.price.str.replace("$", "").str.replace(",", ""), errors="coerce")
niche_scores = (
df.groupby("searchString")
.agg(
product_count=("title", "count"),
median_price=("price", "median"),
median_rating=("rating", "median"),
median_reviews=("reviews_count", "median"),
top_brand_share=("brand", lambda x: x.value_counts(normalize=True).iloc[0] if len(x) > 0 else 0),
)
.assign(
opportunity_score=lambda d: (
(d.product_count.between(50, 500)) # not too crowded
* (d.median_rating.between(3.8, 4.3)) # quality gap
* (d.median_reviews >= 200) # real demand
* (d.top_brand_share < 0.4) # no monopoly
).astype(int)
)
.sort_values("opportunity_score", ascending=False)
)
print(niche_scores)The four-condition opportunity filter surfaces niches where (1) competition is tractable, (2) existing quality is mediocre, (3) demand is real, and (4) no one brand dominates.
Step 4: How do I track BSR drift over time for shortlisted products?
Persist (asin, snapshot_date, bsr) for ongoing tracking.
import psycopg2
shortlist = df[df.bsr.notna() & (df.reviews_count >= 100)].copy()
shortlist["bsr"] = pd.to_numeric(shortlist.bsr, errors="coerce")
with psycopg2.connect(...) as conn, conn.cursor() as cur:
for _, p in shortlist.iterrows():
cur.execute(
"""INSERT INTO bsr_history (asin, title, bsr, price, snapshot_date)
VALUES (%s, %s, %s, %s, current_date)
ON CONFLICT (asin, snapshot_date) DO NOTHING""",
(p.asin, p.title, int(p.bsr), p.price)
)
print(f"Persisted BSR snapshot for {len(shortlist)} products")Daily BSR snapshots over 4-8 weeks build the time-series needed for sales-velocity estimation and seasonality detection.
Sample output
A single Amazon product record looks like this. Five rows weigh ~6 KB.
{
"asin": "B07ABC1234",
"title": "Premium Silicone Baking Mat - 16x12 inch Set of 2",
"price": "$14.99",
"rating": 4.6,
"reviews_count": 12450,
"bsr": 234,
"category": "Kitchen & Dining > Bakeware > Baking Mats",
"brand": "Acme Kitchen Co",
"image_url": "https://m.media-amazon.com/images/I/...",
"in_stock": true,
"prime_eligible": true,
"monthly_sold": "10K+ bought in past month"
}asin is Amazon's canonical natural key — stable across price, title, and image changes. bsr is the headline ranking metric; lower = better. monthly_sold is the new-in-2024 field exposing approximate units sold (Amazon previously only inferred this from BSR). prime_eligible: true indicates Amazon-fulfilled inventory which converts at higher rates than third-party-fulfilled. category provides hierarchical category context for cross-niche normalization.
Common pitfalls
Three things go wrong in product-research pipelines. BSR fluctuation noise — BSR can swing 20-30% within a single day based on hourly purchase volume; for stable analysis, average BSR across 5+ daily snapshots before treating as a niche signal. Sponsored-product contamination — Amazon's search results mix organic and sponsored listings; the actor returns both, but for organic-niche analysis filter out rows where is_sponsored: true. Brand-name normalization — sellers often spell their brand inconsistently across listings ("Acme Kitchen Co" vs "ACME Kitchen" vs "Acme"); for accurate top-brand-share computation, normalize via case-folding and a controlled vocabulary.
Thirdwatch's actor handles proxy rotation so you can focus on the data. Pair Amazon with Flipkart Scraper for cross-marketplace pricing intelligence in India and AliExpress Scraper for sourcing-cost research. A fourth subtle issue worth flagging: Amazon's "X bought in past month" field is bucketed (50+, 100+, 500+, 1K+, 5K+, 10K+) rather than exact, so for fitted BSR-to-sales-volume regressions use bucket-midpoints rather than treating the threshold as the value. A fifth pattern unique to product-research work: certain seasonal niches (holiday gifts, summer outdoor) show 5-10x BSR volatility around Q4; for niche-opportunity scoring, exclude rows scraped between October-January from your year-round opportunity model and treat seasonal-window data as a separate scoring dimension. A sixth and final pitfall: Amazon's search ranking sometimes injects 1-2 unrelated products into category searches (e.g., a knife in a "garlic press" search); for tight category aggregation, post-filter on category_path containing the expected category token before computing per-niche statistics.
Related use cases
Frequently asked questions
What signals does product research need?
Five core signals: BSR (Best Seller Rank, lower is better — top 1000 in a category indicates real volume), monthly sold (Amazon now exposes 'X bought in past month'), price, rating, and reviews_count. Cross-tabulate all five to score niche opportunities — a product with low BSR (high sales), 4.0-4.5 rating, and only 200-500 reviews is the canonical 'underserved-by-quality' opportunity.
How does this compare to Jungle Scout or Helium 10?
Jungle Scout and Helium 10 charge $50-$200/month per seat for product research with BSR-to-sales-volume estimation and trend tracking. Their estimation models are proprietary and reasonably accurate. The actor gives you raw Amazon data — you build the estimation model yourself. For high-volume operators or platform builders, raw data is more flexible; for individual sellers doing one-off research, the SaaS tools win on UX.
How accurate is BSR-to-sales conversion?
BSR moves predictably with sales velocity: top 100 in a major category = 10K+ units/month; top 1000 = 1K-3K units/month; top 10K = 100-500 units/month. The conversion factor varies by category (Books vs Electronics vs Home & Kitchen) and by season. For your own conversion model, scrape BSR + 'X bought in past month' for 200+ products per category and fit a regression.
What categories are best for niche research?
Home & Kitchen, Sports & Outdoors, Pet Supplies, and Beauty have the highest density of low-competition niches with strong demand. Electronics is competitive but high-margin; Books has thin margins but predictable demand. Avoid Apparel (size-fit complexity) and Health (regulatory complexity) for first-time operators. Best results: subcategories 3+ levels deep where Amazon's search returns under 1,000 competing products.
How do I detect underserved-by-quality niches?
An 'underserved-by-quality' niche has 200-500 products, weighted-average rating 3.8-4.2, and at least 10 products with 1000+ reviews — meaning customers buy these products but rate them mediocre. The opportunity: enter with a quality-improved variant. Filter your category snapshot by these thresholds and rank by review-count-to-rating ratio (high reviews + mediocre rating = strong demand for better).
How fresh does product research data need to be?
BSR moves hourly within a category but the structural niche signals (price band, top-3 dominant brands, average review count) are stable on weekly cadence. For seasonal niches (holiday gifts, summer outdoor gear), refresh during peak season at daily cadence; for evergreen categories (basic kitchen tools, pet supplies), monthly refresh is sufficient. Tag scrape timestamps for delta tracking.
Related
100 free credits, no credit card.
About 30 real searches. Add the MCP to Claude or Cursor in two minutes.