Scrape Craigslist Listings for Market Research (2026 Guide)
Pull Craigslist jobs, housing, and car listings across any US city. Get title, price, location, and full description as structured JSON via the Apify API.

Thirdwatch's Craigslist Scraper extracts structured listing data from any Craigslist city and category — jobs, apartments, cars, services, furniture, and more. Returns title, price, location, neighborhood, full description, posting date, and direct URL. Built for market researchers, real estate analysts, used-car dealers, and anyone who needs Craigslist data in a structured format without writing a custom parser.
Why scrape Craigslist for market research
Craigslist remains one of the largest classified-ad platforms in the United States. According to SimilarWeb traffic data, the site draws over 250 million monthly visits across hundreds of US metro areas. It covers job postings, rental listings, used vehicles, services, and general merchandise — a dataset that reflects real local-market pricing with minimal intermediary distortion.
The research problem is access. Craigslist has no public API. Its RSS feeds cap at 100 items per category with no search filtering, no price fields, and no neighborhood data. Manual copy-paste across 400+ city subdomains is not viable for any systematic analysis. Market researchers tracking rental price trends across 10 metros, used-car dealers monitoring competitor pricing, and job-market analysts benchmarking local wages all need the same thing: structured, machine-readable Craigslist data pulled on a schedule. The actor is that extraction layer.
How does this compare to the alternatives?
Three paths to getting Craigslist data into a research pipeline:
| Approach | Reliability | Setup time | Maintenance |
|---|---|---|---|
| DIY Python + BeautifulSoup | Brittle; Craigslist changes layout periodically | 2-4 days | You own the parser and rate-limit logic |
| Craigslist RSS feeds | Limited to 100 items, no price/neighborhood | 30 minutes | Minimal, but data is incomplete |
| Thirdwatch Craigslist Scraper | Maintained against layout changes | 5 minutes | Thirdwatch tracks Craigslist changes |
The DIY route works until Craigslist changes its HTML structure or tightens rate limits, and both happen without notice. RSS feeds are free but structurally limited: no keyword search within categories, no price extraction, no neighborhood field. The Craigslist Scraper returns up to 8 structured fields per listing out of the box (the seven shown in the sample output, plus company on job listings) and handles pagination automatically.
How to scrape Craigslist listings in 4 steps
Step 1: How do I set up my Apify token?
Sign up at apify.com (free tier, no credit card required). Navigate to Settings, then Integrations, and copy your personal API token. Every example below assumes it lives in APIFY_TOKEN:
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
Step 2: How do I pull listings by city, category, and keyword?
Pass a Craigslist city subdomain, a category code, and an optional query string. The actor returns structured records for each listing.
import os

import pandas as pd
import requests

ACTOR = "thirdwatch~craigslist-scraper"
TOKEN = os.environ["APIFY_TOKEN"]

# Run the actor synchronously and get the dataset items back in one call.
resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={
        "city": "sfbay",
        "category": "apa",
        "query": "2br renovated",
        "maxResults": 50,
    },
    timeout=300,
)
resp.raise_for_status()  # stop on HTTP errors instead of parsing an error body
df = pd.DataFrame(resp.json())
print(f"{len(df)} listings in SF Bay apartments matching '2br renovated'")

Common category codes: sof (software jobs), jjj (all jobs), apa (apartments/housing for rent), cta (cars/trucks by owner), gms (general merchandise for sale), bik (bikes), fuo (furniture by owner), ret (retail/wholesale jobs), tra (skilled trades), hea (healthcare jobs). Each category's code appears in the URL path when you browse that category on Craigslist. You can also check the Craigslist site map for all available city subdomains.
Step 3: How do I compare listings across multiple cities?
Run separate requests per city and merge the results. This example compares apartment prices in San Francisco, New York, and Chicago:
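# Reuses ACTOR, TOKEN, requests, and pd from Step 2.
METROS = [
    {"city": "sfbay", "label": "SF Bay"},
    {"city": "newyork", "label": "New York"},
    {"city": "chicago", "label": "Chicago"},
]

frames = []
for metro in METROS:
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
        params={"token": TOKEN},
        json={
            "city": metro["city"],
            "category": "apa",
            "maxResults": 100,
        },
        timeout=300,
    )
    resp.raise_for_status()
    data = resp.json()
    for item in data:
        item["metro"] = metro["label"]  # tag each record with its metro
    frames.append(pd.DataFrame(data))

all_listings = pd.concat(frames, ignore_index=True)
# price is a raw string such as "$2,200"; convert it before computing statistics.
all_listings["price_usd"] = pd.to_numeric(
    all_listings["price"].astype(str).str.replace(r"[^\d.]", "", regex=True),
    errors="coerce",
)
print(all_listings.groupby("metro")["price_usd"].describe())

Step 4: How do I schedule recurring pulls for ongoing research?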
Set up an Apify schedule to run the actor daily or weekly. Each completed run can fire a webhook that you route to a database, Google Sheet, or analytics pipeline.
curl -X POST "https://api.apify.com/v2/schedules?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "craigslist-sfbay-apartments-daily",
    "cronExpression": "0 8 * * *",
    "timezone": "America/Los_Angeles",
    "isEnabled": true,
    "actions": [{
      "type": "RUN_ACTOR",
      "actorId": "thirdwatch~craigslist-scraper",
      "runInput": {
        "city": "sfbay",
        "category": "apa",
        "maxResults": 100
      }
    }]
  }'

Add an ACTOR.RUN.SUCCEEDED webhook to push results into your data warehouse automatically. Webhooks fire within seconds of run completion, so your research pipeline stays current without polling the API for run status.
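On the receiving end of that webhook, the handler has to turn the notification into a request for the run's results. The following is a minimal sketch, not the actor's own tooling: `dataset_items_url` is a hypothetical helper, and it assumes the webhook uses Apify's default payload, where the run object sits under `resource` and carries a `defaultDatasetId`. Check the payload your webhook actually sends before relying on these keys.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(payload: dict, token: str) -> Optional[str]:
    """Build the dataset-items URL for a succeeded run, or None for anything else."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    # Assumption: default payload shape, with the run object under "resource".
    dataset_id = (payload.get("resource") or {}).get("defaultDatasetId")
    if not dataset_id:
        return None
    query = urlencode({"token": token, "format": "json"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```

A handler would pass the parsed webhook body to this function and then fetch the returned URL, for example with `requests.get(url, timeout=60).json()`, before writing the rows to the warehouse.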
Sample output
A single record from the dataset for a San Francisco apartment listing. Five rows of this shape weigh under 5 KB.
{
  "title": "2BR Apartment in Downtown - Renovated",
  "price": "$2,200",
  "location": "San Francisco",
  "neighborhood": "Downtown / Financial District",
  "description": "Beautiful renovated 2-bedroom apartment in the heart of downtown. Hardwood floors, in-unit laundry, rooftop access. Pet-friendly. Available June 1.",
  "posted_date": "2026-05-20",
  "url": "https://sfbay.craigslist.org/sfc/apa/d/san-francisco-2br-apartment/7812345678.html"
}

price is the raw string from the listing, including the currency symbol. neighborhood is the poster's self-reported sub-location, which is useful for geo analysis but not standardized. description is the full text body, typically 100-500 words. posted_date is in ISO format. url is a direct link to the live listing on Craigslist. For job listings, the actor also extracts company when the listing's structured data includes employer information.
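Because price is a raw string, it needs converting before any aggregation. The sketch below is one way to do that under stated assumptions: `parse_price` is a hypothetical helper (not part of the actor), and it only targets the shapes mentioned in this guide, such as "$2,200", "$14/hr", and blank values. It returns the unit separately so hourly and flat prices are not averaged together by accident.

```python
import re
from typing import Optional, Tuple

# First dollar amount in the string, e.g. "$2,200" or "$14.50".
_AMOUNT = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)")
# Optional unit after a slash, e.g. "/hr" or "/month".
_UNIT = re.compile(r"/\s*([a-zA-Z]+)")


def parse_price(raw: Optional[str]) -> Tuple[Optional[float], Optional[str]]:
    """Return (amount, unit) from a raw price string; unit is None for flat prices."""
    if not raw or not raw.strip():
        return None, None
    amount_match = _AMOUNT.search(raw)
    if amount_match is None:
        return None, None
    amount = float(amount_match.group(1).replace(",", ""))
    unit_match = _UNIT.search(raw)
    unit = unit_match.group(1).lower() if unit_match else None
    return amount, unit
```

Applied to a DataFrame column, `df["price"].map(parse_price)` yields tuples that can be split into two columns and filtered by unit before computing medians.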
Common pitfalls
Three issues trip up Craigslist research pipelines.
Price parsing: price arrives as a formatted string ("$2,200", "$14/hr", sometimes blank). Strip the currency symbol and convert to numeric before aggregating, and watch for hourly vs monthly vs flat-price mixing in job listings.
Listing expiration: Craigslist posts expire after 7-45 days depending on category. If you store URLs for later enrichment, some will 404 within weeks. Capture the full record at scrape time.
Neighborhood inconsistency: neighborhood is free-text entered by the poster. "Downtown SF", "FiDi", and "Financial District" all refer to the same area. Normalize with a lookup table or fuzzy matching before spatial analysis.
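The neighborhood cleanup can be sketched with a lookup table plus a fuzzy fallback from Python's standard library. Everything here is illustrative: `ALIASES` and `normalize_neighborhood` are hypothetical names, the alias table only holds the three example spellings from this guide, and the 0.8 similarity cutoff is an arbitrary starting point to tune against your own data.

```python
from difflib import get_close_matches

# Hypothetical alias table; build yours from the values seen in your own data.
ALIASES = {
    "downtown sf": "Financial District",
    "fidi": "Financial District",
    "financial district": "Financial District",
}


def normalize_neighborhood(raw, aliases=ALIASES, cutoff=0.8):
    """Map a free-text neighborhood to a canonical name, or None if unknown."""
    if not raw:
        return None
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    if key in aliases:
        return aliases[key]
    # Fall back to fuzzy matching against known aliases to catch typos.
    close = get_close_matches(key, list(aliases), n=1, cutoff=cutoff)
    return aliases[close[0]] if close else None
```

Returning None for unknown values, rather than guessing, keeps unmatched neighborhoods visible so they can be reviewed and added to the table.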
The actor handles pagination, rate limiting, and Craigslist's HTML parsing so you get clean structured records. For large multi-city sweeps, run each city as a separate Apify run and merge downstream. If you need to enrich Craigslist location data with coordinates and reviews, pair it with Google Maps Scraper for complete local-market intelligence.
Downstream enrichment and integration
Craigslist data becomes more powerful when combined with other sources. Housing researchers can cross-reference apartment listings with Google Maps data to add neighborhood walkability scores, nearby transit stations, and business density metrics. Job-market analysts can compare Craigslist wage data against Indeed job listings or LinkedIn job posts to understand how classified-ad compensation stacks up against formal job-board salaries in the same metro.
For used-vehicle research, Craigslist car listings paired with eBay marketplace data reveal cross-platform pricing gaps that arbitrage dealers exploit. The key integration pattern is consistent: pull Craigslist data via the actor, normalize fields (especially price and location), and join against structured datasets from other platforms on shared dimensions like metro area, category, and price range.
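The normalize-then-join pattern can be sketched in plain Python. This is an illustration under assumptions rather than a fixed recipe: the price bands are arbitrary, `price_usd` is a hypothetical numeric field you would derive from the raw price string, and the rows for the second platform are whatever structured records you already hold for it.

```python
from collections import defaultdict
from statistics import median

# Arbitrary illustrative bands as (low, high, label); adjust to your category.
BANDS = [(0, 5000, "<5k"), (5000, 10000, "5k-10k"), (10000, 20000, "10k-20k")]


def price_band(price):
    """Return the label of the band containing price, or "20k+" above the last band."""
    for low, high, label in BANDS:
        if low <= price < high:
            return label
    return "20k+"


def median_by_key(rows):
    """Median price_usd per (metro, category, price band)."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["metro"], row["category"], price_band(row["price_usd"]))
        groups[key].append(row["price_usd"])
    return {key: median(values) for key, values in groups.items()}


def price_gaps(craigslist_rows, other_rows):
    """Other-platform median minus Craigslist median, for keys present in both."""
    ours = median_by_key(craigslist_rows)
    theirs = median_by_key(other_rows)
    return {key: theirs[key] - ours[key] for key in ours.keys() & theirs.keys()}
```

Only keys present on both sides survive the join, so a metro or band covered by just one platform drops out instead of producing a misleading gap.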
All of these workflows start with the same structured JSON output from the Craigslist Scraper, routed through a scheduling layer and into your analytics pipeline or database. The consistent field schema across all Craigslist categories means one ingestion pipeline handles jobs, apartments, cars, and merchandise without category-specific parsing logic on your end.
Frequently asked questions
Is it legal to scrape Craigslist?
Craigslist publishes listings on the open web. The actor accesses only publicly visible data — titles, prices, locations, descriptions — without logging in. Always comply with your jurisdiction's data-use laws and Craigslist's terms of service before building a production pipeline.
How do I find the right Craigslist city code?
Look at the subdomain in a Craigslist URL. For example, sfbay.craigslist.org uses the code sfbay, newyork.craigslist.org uses newyork, and losangeles.craigslist.org uses losangeles. Pass that string as the city input parameter.
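If you already have a list of Craigslist URLs, the city code can be pulled out programmatically. A small sketch using only the standard library follows; `city_code` is a hypothetical helper name, and it assumes city pages live on a single-label subdomain of craigslist.org, as in the examples above.

```python
from typing import Optional
from urllib.parse import urlparse


def city_code(url: str) -> Optional[str]:
    """Return the Craigslist city subdomain from a URL, or None if there is none."""
    host = (urlparse(url).hostname or "").lower()
    suffix = ".craigslist.org"
    if not host.endswith(suffix):
        return None
    sub = host[: -len(suffix)]
    # Reject nested subdomains and the generic "www" host.
    if not sub or "." in sub or sub == "www":
        return None
    return sub
```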
Which categories does the actor support?
Every Craigslist category code works. Common ones include sof for software jobs, jjj for all jobs, apa for apartments, cta for cars and trucks, and gms for general merchandise. The code appears in the URL path when you browse Craigslist.
How fresh is the data?
The actor pulls listings live from Craigslist at run time. Results reflect whatever is currently posted. Craigslist listings expire and get flagged, so running on a schedule captures data before it disappears.
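When runs are scheduled, the same listing shows up in several snapshots until it expires. One way to keep a single record per listing while tracking how long it stayed live is to index by url, as in this sketch. `merge_snapshot` and the `first_seen` / `last_seen` keys are hypothetical names, and treating url as the unique key is an assumption.

```python
from datetime import date


def merge_snapshot(seen: dict, listings: list, run_date: date) -> dict:
    """Update a URL-keyed index with one run's listings, tracking first and last sighting."""
    for item in listings:
        url = item.get("url")
        if not url:
            continue  # skip records without a usable key
        if url in seen:
            seen[url]["last_seen"] = run_date
            seen[url]["record"] = item  # keep the most recent version of the listing
        else:
            seen[url] = {"first_seen": run_date, "last_seen": run_date, "record": item}
    return seen
```

The difference between `last_seen` and `first_seen` gives a rough lower bound on how long a listing stayed up, limited by how often the schedule runs.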
Can I scrape multiple cities in one run?
Each run targets one city. To cover multiple metros, trigger separate runs per city — either sequentially in a script or via parallel Apify schedules. The output datasets can be merged downstream in Python or a database.