
Scrape SEC EDGAR Filings for Financial Research (2026 Guide)

Extract SEC EDGAR filings, 10-K/10-Q financials, and company data at scale using Thirdwatch's SEC EDGAR Scraper. Step-by-step research workflow inside.

May 26, 2026 · 5 min read · 1,162 words

Thirdwatch's SEC EDGAR Scraper extracts US public company filings with structured financials -- revenue, net income, EPS, total assets, and historical trends -- directly from EDGAR's XBRL-tagged documents. Pay per result, no terminal subscription. Built for equity researchers, academic finance teams, compliance analysts, and anyone who needs filing-level data without a Bloomberg login.

Why use SEC EDGAR for financial research

The SEC's EDGAR database holds over 21 million documents, including 10-Ks, 10-Qs, 8-Ks, proxy statements, and insider transactions, with electronic filings dating back to 1993 (electronic filing became mandatory for all domestic registrants in 1996). According to the SEC's 2025 annual report, EDGAR processes more than 3,000 new filings per business day, which makes it the most comprehensive public source of US corporate disclosure data.

The research problem is access, not availability. EDGAR's full-text search returns raw HTML documents that require manual parsing to extract structured financial data. A researcher comparing gross margins across 50 semiconductor companies needs to open 50 separate 10-K filings, locate the income statement in each, and transcribe the numbers into a spreadsheet. The Thirdwatch actor automates this entire pipeline -- pass in company names or tickers, specify the filing type, and receive structured JSON with revenue, net income, total assets, EPS, operating income, R&D expense, and multi-year history fields already extracted from XBRL tags.

This matters for three research workflows: cross-sectional financial comparison (comparing metrics across companies at a point in time), longitudinal trend analysis (tracking one company's financials over quarters or years), and event-driven research (detecting restatements, material events, or earnings surprises from 8-K filings).

How does this compare to the alternatives?

Four options for structured SEC filing data:

| Approach | Coverage | Structured financials | Setup time | Ongoing cost |
|---|---|---|---|---|
| Manual EDGAR full-text search | Complete but unstructured | No; you must parse HTML/XBRL yourself | Minutes | Free, but hours of manual work |
| SEC EDGAR XBRL API (direct) | XBRL filings only (2009 onward) | Yes, but requires taxonomy mapping | Days to weeks | Free, but high engineering cost |
| Commercial terminal (Bloomberg, Refinitiv) | Complete, plus global | Yes | Weeks | Enterprise contract |
| Thirdwatch SEC EDGAR Scraper | All US filings (structured financials for XBRL-tagged filings only) | Yes: revenue, net income, EPS, assets, history | Minutes | Pay per result |

The direct XBRL API is free but requires mapping hundreds of GAAP taxonomy concepts to human-readable field names, which is a non-trivial engineering project. The SEC EDGAR Scraper handles that mapping and returns clean, researcher-ready JSON.
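To show what that mapping involves, here is a minimal sketch of the do-it-yourself route. It assumes you have already downloaded a company's JSON from the SEC's companyfacts endpoint (for Apple, https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json), where values sit under facts → us-gaap → concept → units → USD. The CONCEPT_CANDIDATES table and both helper functions are hypothetical and deliberately incomplete; a production mapping needs far more candidates per field.

```python
from datetime import date
from typing import Optional

# Hypothetical, deliberately incomplete mapping from readable names to
# us-gaap concepts; real coverage needs many more candidates per field.
CONCEPT_CANDIDATES = {
    "revenue": [
        "Revenues",
        "RevenueFromContractWithCustomerExcludingAssessedTax",
        "SalesRevenueNet",
    ],
    "net_income": ["NetIncomeLoss"],
    "total_assets": ["Assets"],
}


def _is_full_year(fact: dict) -> bool:
    """Instant facts (no start date) pass; duration facts must span about a year."""
    if "start" not in fact:
        return True
    span = date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])
    return span.days >= 350  # drops quarterly values reported inside a 10-K


def latest_annual_value(companyfacts: dict, field: str) -> Optional[float]:
    """Return the latest full-year 10-K USD value across all candidate concepts."""
    us_gaap = companyfacts.get("facts", {}).get("us-gaap", {})
    annual = []
    for concept in CONCEPT_CANDIDATES[field]:
        facts = us_gaap.get(concept, {}).get("units", {}).get("USD", [])
        annual += [f for f in facts if f.get("form") == "10-K" and _is_full_year(f)]
    if not annual:
        return None  # none of the candidate concepts was tagged
    # ISO-8601 end dates sort chronologically as strings
    return max(annual, key=lambda f: f["end"])["val"]
```

The sketch scans every candidate concept rather than stopping at the first one found, because many filers switched from Revenues or SalesRevenueNet to RevenueFromContractWithCustomerExcludingAssessedTax after adopting ASC 606; stopping early could return a stale figure.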

How to scrape SEC EDGAR filings in 5 steps

Step 1: How do I set up my Apify API token?

Create a free account at apify.com, navigate to Settings, then Integrations, and copy your personal API token. Store it as an environment variable:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
pip install apify-client

Step 2: How do I extract 10-K filings for a list of companies?

Pass company names or ticker symbols as queries, set filingType to target specific report types, and use includeFinancials to get structured income-statement and balance-sheet data.

import os

from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

run = client.actor("thirdwatch/sec-edgar-scraper").call(run_input={
    "queries": ["Apple", "Microsoft", "NVDA", "GOOGL", "META"],
    "maxResults": 5,
    "filingType": "10-K",
    "includeFinancials": True,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
for item in items:
    print(f"{item['company_name']} ({item['ticker']}) -- "
          f"Revenue: {item.get('revenue')}, Net Income: {item.get('net_income')}")
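Revenue and net income come back as whole-dollar integers, and a field can be missing when a company did not tag it, so the loop above prints long raw numbers or None. A small formatting helper keeps the output readable; fmt_usd is a hypothetical helper, not part of the actor.

```python
def fmt_usd(value) -> str:
    """Format a dollar amount as $X.XB / $X.XM, or 'n/a' when missing."""
    if value is None:
        return "n/a"
    sign = "-" if value < 0 else ""  # keep losses readable, e.g. -$2.5M
    value = abs(value)
    if value >= 1e9:
        return f"{sign}${value / 1e9:,.1f}B"
    if value >= 1e6:
        return f"{sign}${value / 1e6:,.1f}M"
    return f"{sign}${value:,.0f}"


# Usage inside the Step 2 loop:
# print(f"{item['company_name']} -- Revenue: {fmt_usd(item.get('revenue'))}")
```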

Step 3: How do I filter filings by date range?

Use dateFrom and dateTo to narrow results to a specific fiscal period. This is essential for quarterly research windows or event-driven analysis.

run = client.actor("thirdwatch/sec-edgar-scraper").call(run_input={
    "queries": ["Tesla", "Rivian", "Lucid Motors"],
    "filingType": "10-Q",
    "dateFrom": "2025-01-01",
    "dateTo": "2025-12-31",
    "includeFinancials": True,
    "maxResults": 10,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
for item in items:
    print(f"{item['company_name']} | {item['filing_type']} | "
          f"Period: {item['period_of_report']} | Filed: {item['filed_date']}")
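For quarterly research it usually helps to regroup the flat result list by company and order each group by period_of_report, so each company's quarters read chronologically. A minimal sketch that relies only on the company_name and period_of_report fields shown in this guide; quarters_by_company is a hypothetical helper.

```python
from collections import defaultdict


def quarters_by_company(items: list[dict]) -> dict[str, list[dict]]:
    """Group filings by company, each list ordered from oldest to newest period."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["company_name"]].append(item)
    for filings in grouped.values():
        # ISO-8601 date strings sort chronologically as plain strings
        filings.sort(key=lambda f: f["period_of_report"])
    return dict(grouped)


# Usage with the Step 3 results:
# for company, filings in quarters_by_company(items).items():
#     print(company, [f["period_of_report"] for f in filings])
```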

Step 4: How do I build a cross-sectional financial comparison?

Pull 10-K filings for an entire sector and compute comparative metrics.

import pandas as pd

SEMICONDUCTORS = ["NVDA", "AMD", "INTC", "AVGO", "QCOM",
                  "TXN", "MU", "MRVL", "AMAT", "LRCX"]

run = client.actor("thirdwatch/sec-edgar-scraper").call(run_input={
    "queries": SEMICONDUCTORS,
    "filingType": "10-K",
    "dateFrom": "2025-01-01",
    "includeFinancials": True,
    "maxResults": 1,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
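df = pd.DataFrame(items)
# Coerce financial columns to numbers so missing values become NaN instead of raising
num_cols = ["revenue", "gross_profit", "rd_expense", "net_income", "stockholders_equity"]
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")
df["gross_margin"] = df["gross_profit"] / df["revenue"]
df["rd_intensity"] = df["rd_expense"] / df["revenue"]
df["roe"] = df["net_income"] / df["stockholders_equity"]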

print(df[["company_name", "ticker", "revenue", "gross_margin",
          "rd_intensity", "roe"]].sort_values("revenue", ascending=False))
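Each ratio is a simple quotient: gross margin = gross_profit / revenue, R&D intensity = rd_expense / revenue, and ROE = net_income / stockholders_equity (using period-end equity rather than average equity, which is a simplification). For a single record, a dependency-free version that returns None instead of failing on a missing or zero denominator might look like this; safe_ratio and record_ratios are hypothetical helpers.

```python
from typing import Optional


def safe_ratio(numerator, denominator) -> Optional[float]:
    """Divide, returning None when a value is missing or the denominator is zero."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator


def record_ratios(record: dict) -> dict:
    """Compute the Step 4 ratios for one scraper record."""
    return {
        "gross_margin": safe_ratio(record.get("gross_profit"), record.get("revenue")),
        "rd_intensity": safe_ratio(record.get("rd_expense"), record.get("revenue")),
        "roe": safe_ratio(record.get("net_income"), record.get("stockholders_equity")),
    }
```

Applied to the NVIDIA figures in the sample output, this gives a gross margin of about 75.0%, R&D intensity of about 9.9%, and ROE of about 110.6%.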

Step 5: How do I track multi-year revenue trends?

The revenue_history and net_income_history fields contain year-over-year data extracted from XBRL comparative statements.

run = client.actor("thirdwatch/sec-edgar-scraper").call(run_input={
    "queries": ["AAPL"],
    "filingType": "10-K",
    "includeFinancials": True,
    "maxResults": 1,
})

item = client.dataset(run["defaultDatasetId"]).list_items().items[0]
for year_data in item.get("revenue_history", []):
    print(f"  {year_data['period']}: {year_data['value']:,.0f}")
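Year-over-year growth follows directly from the history list: growth = value for a period / value for the preceding period - 1. A minimal sketch, assuming periods are labelled like the sample output (FY2024, FY2025, ...), so sorting the labels as strings puts them in chronological order; yoy_growth is a hypothetical helper.

```python
def yoy_growth(history: list[dict]) -> dict[str, float]:
    """Map each period to its growth rate over the preceding period."""
    # Labels like "FY2024" < "FY2025" sort chronologically as strings
    ordered = sorted(history, key=lambda h: h["period"])
    growth = {}
    for prev, curr in zip(ordered, ordered[1:]):
        # Skip pairs with a missing value or a zero base
        if prev.get("value") and curr.get("value") is not None:
            growth[curr["period"]] = curr["value"] / prev["value"] - 1
    return growth


# Usage with the Step 5 record:
# for period, g in yoy_growth(item.get("revenue_history", [])).items():
#     print(f"  {period}: {g:.1%}")
```

Applied to the NVIDIA revenue_history in the sample output, it returns about 1.259 for FY2025 and 1.142 for FY2026, i.e. roughly 126% and 114% growth.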

Sample output

Two records from a 10-K extraction run. Each record weighs approximately 2 KB.

[
  {
    "company_name": "NVIDIA Corporation",
    "cik": "1045810",
    "ticker": "NVDA",
    "filing_type": "10-K",
    "filed_date": "2026-02-26",
    "period_of_report": "2026-01-26",
    "url": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=1045810",
    "revenue": 130497000000,
    "net_income": 72880000000,
    "total_assets": 112198000000,
    "eps": 2.94,
    "operating_income": 81447000000,
    "gross_profit": 97862000000,
    "rd_expense": 12893000000,
    "stockholders_equity": 65899000000,
    "cash": 8495000000,
    "revenue_history": [
      {"period": "FY2026", "value": 130497000000},
      {"period": "FY2025", "value": 60922000000},
      {"period": "FY2024", "value": 26974000000}
    ],
    "net_income_history": [
      {"period": "FY2026", "value": 72880000000},
      {"period": "FY2025", "value": 29760000000},
      {"period": "FY2024", "value": 12285000000}
    ]
  },
  {
    "company_name": "Apple Inc.",
    "cik": "320193",
    "ticker": "AAPL",
    "filing_type": "10-K",
    "filed_date": "2025-11-01",
    "period_of_report": "2025-09-27",
    "url": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193",
    "revenue": 391035000000,
    "net_income": 93736000000,
    "total_assets": 364980000000,
    "eps": 6.08,
    "operating_income": 118658000000,
    "gross_profit": 170782000000,
    "rd_expense": 31370000000,
    "stockholders_equity": 56950000000,
    "cash": 29943000000,
    "revenue_history": [
      {"period": "FY2025", "value": 391035000000},
      {"period": "FY2024", "value": 383285000000},
      {"period": "FY2023", "value": 383933000000}
    ],
    "net_income_history": [
      {"period": "FY2025", "value": 93736000000},
      {"period": "FY2024", "value": 93736000000},
      {"period": "FY2023", "value": 96995000000}
    ]
  }
]

revenue, net_income, and total_assets are the primary quantitative fields. revenue_history and net_income_history enable trend analysis without pulling multiple filings. cik is the canonical SEC identifier for cross-referencing with other EDGAR datasets.
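When cross-referencing, note that the SEC's JSON APIs on data.sec.gov expect the CIK zero-padded to ten digits (for example, CIK0000320193 for Apple). The sketch below builds the submissions and companyfacts URLs from the cik field; edgar_api_urls is a hypothetical helper, and any request to these endpoints should send a descriptive User-Agent header, per the SEC's fair-access guidance.

```python
def edgar_api_urls(cik: str) -> dict[str, str]:
    """Build data.sec.gov URLs from a raw CIK such as '320193'."""
    padded = str(int(cik)).zfill(10)  # normalise any existing padding to 10 digits
    return {
        "submissions": f"https://data.sec.gov/submissions/CIK{padded}.json",
        "companyfacts": f"https://data.sec.gov/api/xbrl/companyfacts/CIK{padded}.json",
    }
```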

Common pitfalls

Three issues derail SEC EDGAR research projects.

Fiscal year misalignment. Not all companies end their fiscal year on December 31. Apple's fiscal year ends in late September; Nvidia's ends in late January. When comparing annual figures across companies, always check period_of_report rather than assuming calendar-year alignment.

XBRL coverage gaps. XBRL tagging was phased in from 2009 (the largest filers) through 2011 (smaller reporting companies), so structured financial extraction returns nulls for older documents. For earlier periods, you will need to parse the raw HTML or text filing.

Amended filings. Companies file 10-K/A and 10-Q/A amendments that can restate or replace parts of the original. If you want the most current figures, keep only the latest filing per company per period rather than taking all results at face value. If you are building a point-in-time dataset for backtesting, first drop filings made after your as-of date so later amendments do not leak into earlier snapshots.
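The "latest filing per company per period" rule can be applied client-side. A minimal sketch, assuming each record carries cik, filing_type, period_of_report, and filed_date as in the sample output, and that amendments come back with filing_type values such as "10-K/A" (an assumption; check what the actor actually returns). The optional as_of cutoff is a hypothetical parameter for point-in-time snapshots.

```python
from typing import Optional


def latest_per_period(items: list[dict], as_of: Optional[str] = None) -> list[dict]:
    """Keep the most recently filed record per (cik, base form, period).

    If as_of (an ISO date string) is given, filings made after it are ignored,
    which keeps later amendments out of a point-in-time snapshot.
    """
    latest = {}
    for item in items:
        if as_of is not None and item["filed_date"] > as_of:
            continue
        # Treat "10-K/A" as an amendment of "10-K" (assumed label format)
        base_form = item["filing_type"].removesuffix("/A")
        key = (item["cik"], base_form, item["period_of_report"])
        # ISO dates compare chronologically as strings; the later filing wins
        if key not in latest or item["filed_date"] > latest[key]["filed_date"]:
            latest[key] = item
    return list(latest.values())
```

One caveat: some 10-K/A filings amend only part of the report (for example, Part III), so check that the record you keep still carries the financial fields you need.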

The actor handles XBRL taxonomy mapping and CIK-to-ticker resolution automatically. For large-scale research across hundreds of companies, batch your queries into groups of 20-30 to keep individual run times manageable. Pair SEC filings with our Google News Scraper to correlate filing dates with media coverage for event-study research.
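Batching is plain list chunking. The sketch below splits a ticker list into groups of 25 (inside the suggested 20-30 range) and yields one run_input per batch, reusing the input fields from the steps above; batched_inputs is a hypothetical helper, and passing each input to the actor is left to the caller.

```python
from typing import Iterator


def batched_inputs(tickers: list[str], batch_size: int = 25, **extra) -> Iterator[dict]:
    """Yield one actor run_input per batch of tickers, merged with extra fields."""
    for start in range(0, len(tickers), batch_size):
        yield {"queries": tickers[start:start + batch_size], **extra}


# Usage with the client from Step 2:
# for run_input in batched_inputs(tickers, filingType="10-K", includeFinancials=True):
#     run = client.actor("thirdwatch/sec-edgar-scraper").call(run_input=run_input)
```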


Frequently asked questions

Why scrape SEC EDGAR instead of using a commercial financial data provider?

SEC EDGAR is the authoritative, zero-cost primary source for all US public company filings. Commercial providers like Bloomberg, Refinitiv, and S&P Capital IQ license and repackage this same data under five- to six-figure annual contracts. For researchers who need filing-level detail (full 10-K text, exhibit cross-references, amendment history), EDGAR is the only source that preserves the original document structure. The Thirdwatch actor extracts structured financials directly from XBRL-tagged filings, giving you revenue, net income, EPS, and balance-sheet fields without a terminal subscription.

How current is the data from SEC EDGAR?

EDGAR indexes new filings within minutes of acceptance. Filing deadlines depend on filer size: 10-K annual reports are due 60 to 90 days after fiscal year-end, 10-Q quarterlies 40 to 45 days after quarter-end, and 8-K current reports within 4 business days of a material event. The actor pulls live from EDGAR at request time, so each run returns the latest filings available at that moment. For time-sensitive research (earnings surprises, restatements, executive departures), schedule daily runs filtered to 8-K filings.
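A daily 8-K sweep mainly needs a short date window in the run input. The sketch below builds that input, reusing the filingType, dateFrom, and dateTo fields from Step 3 and assuming both bounds are inclusive and apply to the filing date (not confirmed here). daily_8k_input and its maxResults cap are hypothetical, and the scheduling itself (an Apify schedule or cron job) is outside the snippet.

```python
from datetime import date, timedelta
from typing import Optional


def daily_8k_input(queries: list[str], today: Optional[date] = None) -> dict:
    """Build a run_input for 8-K filings since the previous business day."""
    today = today or date.today()
    # On Mondays, reach back to Friday so the weekend gap is covered
    # (federal holidays are not handled in this sketch).
    lookback = 3 if today.weekday() == 0 else 1
    return {
        "queries": queries,
        "filingType": "8-K",
        "dateFrom": (today - timedelta(days=lookback)).isoformat(),
        "dateTo": (today - timedelta(days=1)).isoformat(),
        "maxResults": 50,  # illustrative cap; tune to your watchlist
    }
```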
