Scrape Etsy Products for Handmade Market Research (2026)

Pull Etsy listings with shop, price, rating, image, and URL using Thirdwatch's Etsy Scraper for handmade and vintage market research at scale in 2026.

May 12, 2026 · 5 min read · 1,224 words

Thirdwatch's Etsy Scraper returns Etsy listings with shop name, price, rating, image, and URL across all 17 top-level categories. Built for handmade-market researchers, gifting-platform teams, craft-category analysts, and vintage-marketplace operators.

TL;DR

Etsy is the largest handmade and vintage marketplace on the consumer internet, with 90M+ active buyers and 8M+ active sellers. Most product data lives outside Etsy's gated Open API, which makes general-purpose listing research hard for anyone without shop-level credentials. Thirdwatch's Etsy Scraper takes search queries, categories, or trending mode and returns structured listings: product name, shop, price, rating, image, and URL. This guide walks through running a market-research sweep across categories, parsing the output, and turning it into a category baseline ready for analysis.

Why scrape Etsy for handmade market research

Etsy is the canonical surface for handmade, vintage, and craft commerce. According to Etsy's 2024 Annual Report, the marketplace had 90M+ active buyers and processed $13B+ in annual GMS across 8M+ active sellers, with listing prices clustered in the $15-$50 band. No other consumer marketplace concentrates this volume of long-tail, maker-economy product data in one place.

The job-to-be-done is structured. A consumer-brand researcher scopes a craft category quarterly to size the addressable market. A gifting platform ingests trending Etsy listings for editorial features. A wedding-supply marketplace baselines pricing for 50 popular wedding categories before launching a competitor product. A craft-trends analyst builds a season-over-season comparison of jewelry, home-living, and bath-beauty listings. All of these reduce to the same primitive: keyword or category queries plus per-listing field extraction, run at a cadence that matches the half-life of the signal.

What makes Etsy harder than Amazon or Walmart is that the official API does not surface its data cleanly for research-only use cases, and the public site uses production-grade anti-bot tooling. The Etsy Scraper actor handles that surface and returns clean structured data per listing.

How does this compare to alternatives?

| Approach | Reliability | Setup time | Maintenance | Auth required |
| --- | --- | --- | --- | --- |
| Etsy Open API v3 | Official | Days (OAuth + approval) | Strict TOS | Yes |
| DIY scraper | Brittle | 1-2 weeks | High (anti-bot drift) | No |
| Thirdwatch Etsy Scraper | Production-tested | 5 minutes | Thirdwatch tracks Etsy changes | Apify token only |

Etsy's Open API v3 requires an approved app, OAuth credentials, and in practice shop-level access for most useful endpoints. A DIY scraper means owning anti-bot engineering forever. The Thirdwatch actor sits in the middle: public listing data delivered on transparent per-result pricing with no approval workflow.

How to scrape Etsy in 5 steps

Step 1: How do I authenticate against Apify?

Sign in at apify.com (free tier, no credit card), open Settings → Integrations, and copy your personal API token:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
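
The later steps read the token from the environment, so a missing variable only surfaces as a bare KeyError. A minimal sketch of a fail-fast check follows; require_token is a hypothetical helper name, not part of Apify's tooling, and the example passes an explicit mapping rather than touching the real environment.

```python
import os


def require_token(env=None, var="APIFY_TOKEN"):
    """Return the token stored under `var`, or raise a readable error."""
    env = os.environ if env is None else env
    token = (env.get(var) or "").strip()
    if not token:
        raise RuntimeError(f"{var} is not set; export it before running the sweep")
    return token


# Demonstration with an explicit mapping instead of os.environ.
print(require_token({"APIFY_TOKEN": "apify_api_xxxxxxxxxxxxxxxx"}))
```

In a script, TOKEN = require_token() could stand in for the direct os.environ lookup.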

Step 2: How do I run a category-baseline sweep?

For market research, the most useful starting point is a category sweep. Leave queries empty and pass a category so the actor browses the category landing page directly.

import os

import pandas as pd
import requests

ACTOR = "thirdwatch~etsy-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

CATEGORIES = [
    "jewelry", "home-living", "wedding",
    "art-collectibles", "craft-supplies",
    "bags-purses", "vintage", "bath-beauty",
]

records = []
for cat in CATEGORIES:
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
        params={"token": TOKEN},
        json={
            "queries": [],
            "category": cat,
            "sortBy": "bestReviewed",
            "maxResults": 60,
        },
        timeout=900,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    rows = resp.json()
    for r in rows:
        r["seed_category"] = cat
    records.extend(rows)

df = pd.DataFrame(records)
print(f"{len(df)} listings across {len(CATEGORIES)} categories")

Eight categories × 60 results = up to 480 listings: a clean cross-category baseline in a few minutes of runtime.
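
The 480 figure is an upper bound. If the same listing can surface under more than one category (an assumption here, not something the actor is documented to do), the combined frame would count it twice. A small pandas-free sketch that keeps the first record per listing URL; dedupe_by_url is a hypothetical helper name.

```python
def dedupe_by_url(records):
    """Keep the first record seen for each listing URL, preserving order.

    Records without a "url" value are dropped, since they cannot be keyed.
    """
    seen = set()
    unique = []
    for rec in records:
        url = rec.get("url")
        if url is None or url in seen:
            continue
        seen.add(url)
        unique.append(rec)
    return unique


sample = [
    {"url": "https://www.etsy.com/listing/1", "seed_category": "jewelry"},
    {"url": "https://www.etsy.com/listing/2", "seed_category": "wedding"},
    {"url": "https://www.etsy.com/listing/1", "seed_category": "wedding"},
]
print(len(dedupe_by_url(sample)))
```

Because the first occurrence wins, the seed_category of a duplicated listing reflects whichever category was swept first.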

Step 3: How do I run a focused keyword sweep within a category?

For more targeted research (e.g. "personalized birthstone rings under $30"), combine queries with category, minPrice, maxPrice, and sortBy.

KEYWORDS = [
    "personalized birthstone ring",
    "minimalist gold ring",
    "stackable silver ring",
    "promise ring couple",
]

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "queries": KEYWORDS,
        "category": "jewelry",
        "subcategorySlug": "jewelry/rings",
        "minPrice": 10,
        "maxPrice": 50,
        "sortBy": "bestReviewed",
        "maxResults": 40,
    },
    timeout=900,
)
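resp.raise_for_status()  # surface HTTP errors before parsing
# Use a separate name so the Step 2 frame (with seed_category) stays available for Step 4
keyword_df = pd.DataFrame(resp.json())
print(keyword_df[["product_name", "shop", "price", "currency", "rating", "url"]].head(10))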

The subcategorySlug field lets you walk Etsy's category tree deeper (e.g. jewelry/rings, home-living/candles-holders). Pair it with sortBy=bestReviewed to surface socially validated listings rather than only the newest ones.
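
Since several inputs have to agree with each other, it can help to assemble the request body in one place. The sketch below uses only the input names shown in this guide; build_payload is a hypothetical helper, and the check that a subcategory slug starts with its category is an assumption inferred from the two example slugs, not a documented rule.

```python
def build_payload(queries, category, subcategory_slug=None,
                  min_price=None, max_price=None,
                  sort_by="bestReviewed", max_results=40):
    """Assemble an actor input dict and reject obviously inconsistent combinations."""
    # Assumption: slugs look like "<category>/<child>", as in jewelry/rings.
    if subcategory_slug is not None and not subcategory_slug.startswith(category + "/"):
        raise ValueError(f"{subcategory_slug!r} does not sit under {category!r}")
    if min_price is not None and max_price is not None and min_price > max_price:
        raise ValueError("min_price must not exceed max_price")

    payload = {
        "queries": list(queries),
        "category": category,
        "sortBy": sort_by,
        "maxResults": max_results,
    }
    # Only send optional inputs that were actually provided.
    if subcategory_slug is not None:
        payload["subcategorySlug"] = subcategory_slug
    if min_price is not None:
        payload["minPrice"] = min_price
    if max_price is not None:
        payload["maxPrice"] = max_price
    return payload


print(build_payload(["minimalist gold ring"], "jewelry", "jewelry/rings", 10, 50))
```

The returned dict can be passed as the json argument of the same requests.post call used above.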

Step 4: How do I parse the output into a research-ready frame?

Cast prices, normalize shop names, and compute a simple quality cohort.

df["price"] = pd.to_numeric(df.price, errors="coerce")
df["rating"] = pd.to_numeric(df.rating, errors="coerce")
df["shop"] = df.shop.fillna("").str.strip()
df["discount_percent"] = pd.to_numeric(df.discount_percent, errors="coerce").fillna(0)

# Quality cohort: rated 4.5+ on listings that show a rating
quality = df[df.rating >= 4.5].copy()

# Category pricing baseline
baseline = (
    df.groupby("seed_category")
      .agg(median_price=("price", "median"),
           p10=("price", lambda s: s.quantile(0.10)),
           p90=("price", lambda s: s.quantile(0.90)),
           shop_count=("shop", "nunique"),
           listings=("url", "count"))
      .round(2)
)
print(baseline)

median_price per category is the cleanest single number for a research deck. p10 / p90 mark the budget and artisan tiers. shop_count / listings is a rough seller-concentration metric — high ratio means a fragmented maker economy, low ratio means a few power sellers.
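
To make the concentration ratio concrete, here is a pandas-free sketch on invented records: one category where every listing comes from a different shop, and one where a single shop owns all of them. The shop names and categories are illustrative only, and shop_listing_ratio is a hypothetical helper.

```python
from collections import defaultdict


def shop_listing_ratio(records, key="seed_category"):
    """Return unique shops divided by listings for each category."""
    shops = defaultdict(set)
    listings = defaultdict(int)
    for rec in records:
        cat = rec[key]
        listings[cat] += 1
        if rec.get("shop"):
            shops[cat].add(rec["shop"])
    return {cat: len(shops[cat]) / listings[cat] for cat in listings}


# Invented data: four shops with one listing each vs. one shop with four listings.
fragmented = [{"seed_category": "jewelry", "shop": f"Shop{i}"} for i in range(4)]
concentrated = [{"seed_category": "vintage", "shop": "BigSeller"} for _ in range(4)]
print(shop_listing_ratio(fragmented + concentrated))
```

A ratio near 1.0 means almost every listing comes from a different shop; a ratio near 0 means a few shops account for most of the sample.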

Step 5: How do I pull Etsy's trending products?

For trend research, set trending: true and the actor will browse Etsy's featured/trending page.

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"queries": [], "trending": True, "maxResults": 80},
    timeout=900,
)
resp.raise_for_status()  # surface HTTP errors before parsing
trending = pd.DataFrame(resp.json())
print(trending[["product_name", "shop", "price", "url"]].head(20))
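
One simple way to read a trending pull is to look for shops that appear more than once. This is only a sketch of one possible analysis, run on invented rows; repeat_trending_shops is a hypothetical helper and the threshold of two listings is arbitrary.

```python
from collections import Counter


def repeat_trending_shops(rows, min_listings=2):
    """Return (shop, count) pairs for shops with at least `min_listings` rows."""
    counts = Counter(r["shop"] for r in rows if r.get("shop"))
    return [(shop, n) for shop, n in counts.most_common() if n >= min_listings]


# Invented rows standing in for trending output.
rows = [{"shop": s} for s in ["A", "B", "A", "C", "A", "B"]]
print(repeat_trending_shops(rows))
```

With the DataFrame from the step above, trending.to_dict("records") would produce rows in the shape this function expects.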

Sample output

Etsy listing records look like this; two are shown. Three rows weigh about 3 KB.

[
  {
    "product_id": "1452893477",
    "product_name": "Personalized Birthstone Ring, Custom Name Ring, Gold Filled",
    "shop": "RubyAndOakStudio",
    "price": 28.5,
    "original_price": 57.0,
    "discount_percent": 50.0,
    "currency": "USD",
    "rating": 4.9,
    "rating_count": null,
    "image_url": "https://i.etsystatic.com/14938474/r/il/abcd/...",
    "url": "https://www.etsy.com/listing/1452893477/personalized-birthstone-ring-custom-name",
    "source_query": "personalized birthstone ring"
  },
  {
    "product_id": "1378239014",
    "product_name": "Minimalist 14k Gold Stacking Ring Set of 3",
    "shop": "AuriaJewelryCo",
    "price": 32.0,
    "original_price": null,
    "discount_percent": 0,
    "currency": "USD",
    "rating": 4.8,
    "image_url": "https://i.etsystatic.com/27361902/r/il/efgh/...",
    "url": "https://www.etsy.com/listing/1378239014/minimalist-14k-gold-stacking-ring",
    "source_query": "minimalist gold ring"
  }
]

shop is the closest analog to a brand in Etsy's seller-driven model. discount_percent is computed from original_price and price when both are present — useful for identifying listings actively running a sale.
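
The relationship between the three price fields can be checked by hand. The sketch below reproduces the first sample record's numbers; rounding to one decimal and returning 0.0 when original_price is missing are assumptions chosen to match the sample, not the actor's confirmed logic.

```python
def discount_percent(price, original_price):
    """Percentage saved relative to original_price, or 0.0 if it cannot be computed."""
    if price is None or original_price is None or original_price <= 0:
        return 0.0
    return round((original_price - price) / original_price * 100, 1)


print(discount_percent(28.5, 57.0))  # first sample record -> 50.0
print(discount_percent(32.0, None))  # no original price -> 0.0
```

Recomputing the field this way is a cheap sanity check before filtering a dataset down to listings that are on sale.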

Common pitfalls

Four things go wrong in Etsy research pipelines.

First, currency-display variance: Etsy shows listings in the viewer's local currency by default. The actor captures the parsed currency symbol so you can filter to USD listings before computing baselines.

Second, rating availability: newer listings and lightly trafficked shops often show no rating; treat rating as optional and use notna() filters before computing category averages.

Third, shop-name collisions: two shops can have very similar names; always key on shop plus listing URL, never on shop name alone.

Fourth, category-vs-keyword scope confusion: passing both a query and a category searches Etsy's full graph with the category words prepended; if you want strict in-category browsing, leave the query empty and rely on category plus subcategorySlug.
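
The first three pitfalls can be guarded against with a few small helpers. The sketch below is pandas-free and runs on invented records; the function names are hypothetical, and it assumes the currency field holds a code such as "USD", as in the sample output.

```python
from statistics import mean


def usd_only(records):
    """Pitfall 1: keep only listings priced in USD."""
    return [r for r in records if r.get("currency") == "USD"]


def mean_rating(records):
    """Pitfall 2: average only over listings that actually show a rating."""
    rated = [r["rating"] for r in records if r.get("rating") is not None]
    return mean(rated) if rated else None


def listing_key(record):
    """Pitfall 3: identify a listing by shop plus URL, never by shop name alone."""
    return (record.get("shop", ""), record.get("url", ""))


records = [
    {"shop": "OakStudio", "url": "u1", "currency": "USD", "rating": 5.0},
    {"shop": "OakStudio", "url": "u2", "currency": "USD", "rating": None},
    {"shop": "Oak Studio", "url": "u3", "currency": "EUR", "rating": 4.0},
]
usd = usd_only(records)
print(len(usd), mean_rating(usd), len({listing_key(r) for r in records}))
```

The same three checks map onto pandas as a boolean mask on currency, a notna() filter on rating, and a two-column key for drop_duplicates.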

Thirdwatch's actor handles the production-grade anti-bot surface, page rendering, listing-card extraction, and the homepage-warmup behaviour Etsy expects. You pay per result and the actor tracks Etsy's listing-DOM changes so your pipeline does not break the next time Etsy ships a redesign.

Frequently asked questions

Why scrape Etsy for market research?

Etsy is the canonical surface for handmade, vintage, and craft commerce. According to Etsy's 2024 Annual Report, the marketplace has 90M+ active buyers and processes $13B+ in annual GMS. For consumer researchers, craft-category analysts, and gifting platforms, Etsy is the leading source of long-tail handmade product data unavailable on Amazon or mass marketplaces.

What fields does the Etsy Scraper return?

Per listing: product_name, shop (seller), price, original_price, discount_percent, currency, rating, image_url, url, and source_query. Categories like jewelry, home-living, wedding, and vintage are supported via the category input. The actor extracts data from rendered Etsy listing cards using the page's embedded structure.

Can I research a specific Etsy category without keywords?

Yes. Leave queries empty and pass category=jewelry (or any of 17 top-level categories) plus an optional subcategorySlug like 'jewelry/rings'. The actor will browse the category landing page directly. This is the cleanest way to baseline an entire vertical without bias from keyword choice.

How does this compare to the Etsy Open API?

Etsy's Open API v3 requires OAuth, app approval, and shop-level access for most useful endpoints. Listing-search is rate-limited and gated. Thirdwatch's actor delivers public listing data on transparent per-result pricing with no approval workflow, which is the right shape for cross-shop market research.

How fresh does handmade research data need to be?

Etsy's product graph moves slowly compared to fast-fashion or electronics. Weekly cadence covers new-listing discovery and trend tracking. For wedding and holiday Q4 research, run a 3-day cadence inside the seasonal window. For long-term category baselines, monthly snapshots accumulated over a year give the strongest trend signal.
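
Once two snapshots of the same category exist, the simplest trend number is the change in median price between them. A minimal sketch on invented prices; median_price_change is a hypothetical helper, and it assumes both snapshots are non-empty and already filtered to one currency.

```python
from statistics import median


def median_price_change(old_prices, new_prices):
    """Percent change in median price from the older snapshot to the newer one."""
    old_median = median(old_prices)
    new_median = median(new_prices)
    return (new_median - old_median) / old_median * 100


# Invented prices standing in for two monthly snapshots of one category.
january = [12.0, 20.0, 45.0]
february = [14.0, 25.0, 48.0]
print(median_price_change(january, february))
```

Comparing medians rather than means keeps a handful of very expensive listings from dominating the trend.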

Can I filter Etsy results by price range?

Yes. Pass minPrice and maxPrice (USD integers) and the actor appends Etsy's min and max query parameters. Combine with sortBy=priceAsc or priceDesc to walk a price band cleanly. This is how researchers isolate the under-$25 gift segment from the $200+ artisan-jewelry tier in a single category.
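
Walking several price bands in one category amounts to generating one input per band. The sketch below builds those inputs from a list of band edges, using only input names that appear in this guide; price_band_payloads is a hypothetical helper and the edge values are illustrative.

```python
def price_band_payloads(category, edges, max_results=40):
    """Build one actor input per consecutive pair of price edges."""
    payloads = []
    for low, high in zip(edges, edges[1:]):
        payloads.append({
            "queries": [],
            "category": category,
            "minPrice": low,
            "maxPrice": high,
            "sortBy": "priceAsc",
            "maxResults": max_results,
        })
    return payloads


for payload in price_band_payloads("jewelry", [5, 25, 100, 200]):
    print(payload["minPrice"], payload["maxPrice"])
```

Adjacent bands share an edge, so a listing priced exactly at a boundary may come back in both bands (an assumption about how the filter treats its endpoints); deduplicating on listing URL afterwards avoids double counting.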
