
Extract SEC EDGAR Financials for Stock Screening (2026)

Build a programmable stock screener from SEC EDGAR financials. Filter by margins, growth, and R&D intensity using Thirdwatch SEC EDGAR Scraper. Python.

May 26, 2026 · 6 min read · 1,412 words


Thirdwatch's SEC EDGAR Scraper extracts structured financials from EDGAR filings -- revenue, net income, EPS, gross profit, R&D expense, total assets, stockholders' equity, cash, and multi-year history -- so you can build programmable stock screeners that filter on any metric or composite score. Pay per result, no terminal subscription. Built for quant researchers, retail investors building systematic strategies, and developers creating screening tools.

Why build a stock screener from SEC EDGAR data

Free stock screeners (Yahoo Finance, Finviz, TradingView) offer pre-built filters but constrain you to their metric definitions, update cadence, and screening logic. A researcher testing whether companies with R&D intensity above 20% and revenue growth above 30% outperform the market cannot express that compound filter on most free tools. According to S&P Global's 2025 quantitative investing survey, 68% of systematic investment strategies now incorporate fundamental data from SEC filings as a primary signal layer -- yet most individual researchers still screen through GUIs that cannot handle multi-factor composite queries.

The bottleneck is structured access. EDGAR contains the authoritative numbers but returns them as HTML and XBRL documents. The Thirdwatch actor extracts revenue, net_income, eps, gross_profit, operating_income, rd_expense, total_assets, stockholders_equity, cash, revenue_history, and net_income_history as typed JSON fields. From there, any screening criterion becomes a DataFrame filter: gross margin above 60%, revenue CAGR above 25% over three years, R&D-to-revenue above 15%, net-cash positive, EPS growth quarter over quarter.

This approach unlocks three capabilities that GUI screeners cannot match: custom composite scoring (weight multiple financial signals into a single rank), backtest-ready datasets (historical filings produce time-series screening results), and pipeline integration (feed screener output into portfolio construction, alert systems, or ML models).

How does this compare to the alternatives?

Four approaches to financial stock screening:

Approach	Custom metrics	Historical screening	API access	Update latency
Free screeners (Yahoo, Finviz)	Limited preset filters	No	Unofficial/fragile	Days to weeks after filing
Premium screeners (Koyfin, Tikr)	Moderate	Limited	Some	1-3 days after filing
Data terminal (Bloomberg, FactSet)	Full	Full	Yes	Same day
Thirdwatch SEC EDGAR Scraper	Full -- any computed metric	Full -- historical filings available	Yes -- Apify API	Minutes after EDGAR acceptance

The SEC EDGAR Scraper gives you terminal-grade data flexibility at pay-per-result pricing, without a terminal subscription.

How to build a stock screener in 5 steps

Step 1: How do I set up the screening environment?

Install dependencies and configure your token.

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"
pip install apify-client pandas numpy

Step 2: How do I extract financials for a broad universe?

Pull the latest 10-K filings for a sector or index constituent list.

from apify_client import ApifyClient
import os, pandas as pd, numpy as np

client = ApifyClient(os.environ["APIFY_TOKEN"])

# Large-cap tech universe for screening
UNIVERSE = [
    "AAPL", "MSFT", "NVDA", "GOOGL", "META", "AMZN", "TSLA",
    "CRM", "ORCL", "ADBE", "INTC", "AMD", "AVGO", "QCOM",
    "NOW", "SNOW", "DDOG", "NET", "PANW", "CRWD",
    "ZS", "MDB", "TEAM", "SHOP", "XYZ",  # Block, Inc. trades as XYZ (formerly SQ)
]

all_items = []
for i in range(0, len(UNIVERSE), 5):
    batch = UNIVERSE[i:i+5]
    run = client.actor("thirdwatch/sec-edgar-scraper").call(run_input={
        "queries": batch,
        "filingType": "10-K",
        "includeFinancials": True,
        "maxResults": 1,
    })
    items = client.dataset(run["defaultDatasetId"]).list_items().items
    all_items.extend(items)

df = pd.DataFrame(all_items)
print(f"Screening universe: {len(df)} companies")
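
Before computing ratios, it can help to confirm that every ticker in the universe actually came back. The sketch below assumes each record carries a `ticker` field, as in the sample output; `find_missing` is a hypothetical helper and the records are stand-ins rather than real actor output.

```python
import pandas as pd

def find_missing(universe, records):
    """Return tickers from `universe` that have no matching record."""
    df = pd.DataFrame(records)
    # An empty result set produces a DataFrame with no columns at all
    returned = set(df["ticker"].dropna()) if "ticker" in df.columns else set()
    return [t for t in universe if t not in returned]

# Stand-in records; a real run would pass all_items from the loop above
sample_records = [{"ticker": "AAPL"}, {"ticker": "MSFT"}]
missing = find_missing(["AAPL", "MSFT", "NVDA"], sample_records)
print(missing)
```

Any ticker this reports is worth re-querying or dropping explicitly, so that a silent gap does not shrink the screening universe.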

Step 3: How do I compute screening metrics?

Derive the financial ratios that form your screening criteria.

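# Core profitability metrics
df["gross_margin"] = df["gross_profit"] / df["revenue"]
df["operating_margin"] = df["operating_income"] / df["revenue"]
df["net_margin"] = df["net_income"] / df["revenue"]
df["rd_intensity"] = df["rd_expense"] / df["revenue"]
df["roe"] = df["net_income"] / df["stockholders_equity"]

# Growth from revenue_history (assumed newest-first, as in the sample output)
def compute_cagr(history, years=2):
    # A missing history arrives as NaN rather than a list, so check the type first
    if not isinstance(history, list) or len(history) < years + 1:
        return None
    end_val = history[0]["value"]
    start_val = history[years]["value"]
    # CAGR is undefined for missing values, a non-positive base, or a negative end value
    if end_val is None or start_val is None or start_val <= 0 or end_val < 0:
        return None
    return (end_val / start_val) ** (1 / years) - 1

df["revenue_cagr_2y"] = df["revenue_history"].apply(lambda h: compute_cagr(h, 2))

# Balance sheet strength
# Cash minus total liabilities (total_assets - stockholders_equity). This is a
# conservative stand-in for net cash, because the extracted fields do not break out debt.
df["net_cash"] = df["cash"] - (df["total_assets"] - df["stockholders_equity"])
# Cash as a fraction of annual revenue (not the cash / current-liabilities ratio)
df["cash_ratio"] = df["cash"] / df["revenue"]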
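
Ratios like these can mislead when the denominator is zero or negative. A company with negative equity and a net loss, for example, shows a positive ROE. One possible guard is sketched below: `safe_ratio` is a hypothetical helper, not part of the scraper, and the figures are made up.

```python
import pandas as pd

def safe_ratio(numerator, denominator):
    """Divide two Series, returning NaN wherever the denominator is not positive."""
    return numerator / denominator.where(denominator > 0)

# Made-up figures: the second company has negative equity and a net loss
demo = pd.DataFrame({
    "net_income": [50.0, -20.0],
    "stockholders_equity": [200.0, -100.0],
})
demo["roe_naive"] = demo["net_income"] / demo["stockholders_equity"]
demo["roe_safe"] = safe_ratio(demo["net_income"], demo["stockholders_equity"])
print(demo)
```

Because comparisons against NaN evaluate to False, rows masked this way drop out of any threshold filter instead of passing it by accident.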

Step 4: How do I apply multi-factor screening filters?

Define your screening criteria and filter the universe.

# Screen: high-growth, high-margin, R&D-intensive tech companies
screen = df[
    (df["revenue_cagr_2y"] > 0.20) &          # >20% 2-year revenue CAGR
    (df["gross_margin"] > 0.60) &               # >60% gross margin
    (df["rd_intensity"] > 0.15) &               # >15% R&D intensity
    (df["operating_margin"] > 0) &              # Positive operating income
    (df["cash"] > 1e9)                          # >$1B cash
].copy()

# Composite score: weighted rank across metrics
for col in ["revenue_cagr_2y", "gross_margin", "operating_margin", "rd_intensity"]:
    screen[f"{col}_rank"] = screen[col].rank(ascending=True, pct=True)

screen["composite_score"] = (
    screen["revenue_cagr_2y_rank"] * 0.35 +
    screen["gross_margin_rank"] * 0.25 +
    screen["operating_margin_rank"] * 0.25 +
    screen["rd_intensity_rank"] * 0.15
)

result = screen[["company_name", "ticker", "revenue", "revenue_cagr_2y",
                  "gross_margin", "operating_margin", "rd_intensity",
                  "composite_score"]].sort_values("composite_score", ascending=False)
print(result.to_string(index=False))

Step 5: How do I add quarterly momentum signals?

Layer in recent 10-Q filings to detect quarter-over-quarter acceleration or deceleration.

# Pull latest 10-Q for the screened companies
screened_tickers = screen["ticker"].tolist()

run = client.actor("thirdwatch/sec-edgar-scraper").call(run_input={
    "queries": screened_tickers,
    "filingType": "10-Q",
    "includeFinancials": True,
    "maxResults": 2,  # Last 2 quarters
})

q_items = client.dataset(run["defaultDatasetId"]).list_items().items
q_df = pd.DataFrame(q_items)

# Quarter-over-quarter revenue momentum
for ticker in screened_tickers:
    quarters = q_df[q_df["ticker"] == ticker].sort_values("period_of_report", ascending=False)
    if len(quarters) >= 2:
        curr_rev = quarters.iloc[0]["revenue"]
        prev_rev = quarters.iloc[1]["revenue"]
        if prev_rev and prev_rev > 0:
            qoq = (curr_rev / prev_rev - 1) * 100
            print(f"{ticker}: Q/Q revenue change {qoq:+.1f}%")

Sample output

Three records from a screening extraction. Each record weighs approximately 2 KB.

[
  {
    "company_name": "CrowdStrike Holdings, Inc.",
    "cik": "1535527",
    "ticker": "CRWD",
    "filing_type": "10-K",
    "filed_date": "2026-03-10",
    "period_of_report": "2026-01-31",
    "url": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=1535527",
    "revenue": 4440000000,
    "net_income": 620000000,
    "total_assets": 8930000000,
    "eps": 2.53,
    "operating_income": 480000000,
    "gross_profit": 3330000000,
    "rd_expense": 888000000,
    "stockholders_equity": 4120000000,
    "cash": 3210000000,
    "revenue_history": [
      {"period": "FY2026", "value": 4440000000},
      {"period": "FY2025", "value": 3954000000},
      {"period": "FY2024", "value": 3055000000}
    ],
    "net_income_history": [
      {"period": "FY2026", "value": 620000000},
      {"period": "FY2025", "value": 89000000},
      {"period": "FY2024", "value": -183000000}
    ]
  },
  {
    "company_name": "Palo Alto Networks, Inc.",
    "cik": "1327567",
    "ticker": "PANW",
    "filing_type": "10-K",
    "filed_date": "2025-09-12",
    "period_of_report": "2025-07-31",
    "url": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=1327567",
    "revenue": 9120000000,
    "net_income": 1850000000,
    "total_assets": 19300000000,
    "eps": 5.41,
    "operating_income": 1690000000,
    "gross_profit": 6840000000,
    "rd_expense": 1824000000,
    "stockholders_equity": 5670000000,
    "cash": 2890000000,
    "revenue_history": [
      {"period": "FY2025", "value": 9120000000},
      {"period": "FY2024", "value": 8004000000},
      {"period": "FY2023", "value": 6893000000}
    ],
    "net_income_history": [
      {"period": "FY2025", "value": 1850000000},
      {"period": "FY2024", "value": 467000000},
      {"period": "FY2023", "value": 440000000}
    ]
  },
  {
    "company_name": "MongoDB, Inc.",
    "cik": "1441816",
    "ticker": "MDB",
    "filing_type": "10-K",
    "filed_date": "2026-03-14",
    "period_of_report": "2026-01-31",
    "url": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=1441816",
    "revenue": 2280000000,
    "net_income": 180000000,
    "total_assets": 4560000000,
    "eps": 2.46,
    "operating_income": 120000000,
    "gross_profit": 1710000000,
    "rd_expense": 638000000,
    "stockholders_equity": 2340000000,
    "cash": 1920000000,
    "revenue_history": [
      {"period": "FY2026", "value": 2280000000},
      {"period": "FY2025", "value": 1921000000},
      {"period": "FY2024", "value": 1683000000}
    ],
    "net_income_history": [
      {"period": "FY2026", "value": 180000000},
      {"period": "FY2025", "value": -125000000},
      {"period": "FY2024", "value": -175000000}
    ]
  }
]

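All three records show multi-year revenue growth, 75% gross margins, and R&D intensity of 20% or more. Against the Step 4 thresholds, only CrowdStrike clears the 20% two-year revenue CAGR cutoff (about 20.6%). Palo Alto Networks (about 15.0%) and MongoDB (about 16.4%) pass the margin, R&D, and cash filters but would need a lower growth threshold. net_income_history shows the profitability inflection point: CrowdStrike turned profitable in FY2025 and MongoDB in FY2026, so a "newly profitable" screen run on the latest filings would flag MongoDB.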
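
The growth and profitability figures can be checked directly from the sample histories. The snippet below is a minimal sketch that copies the values from the three records above; `cagr` and `first_profitable_period` are illustrative helpers, and the newest-first ordering is taken from the sample output.

```python
# Revenue and net income histories copied from the sample records (newest first)
SAMPLE = {
    "CRWD": {"periods": ["FY2026", "FY2025", "FY2024"],
             "revenue": [4440e6, 3954e6, 3055e6],
             "net_income": [620e6, 89e6, -183e6]},
    "PANW": {"periods": ["FY2025", "FY2024", "FY2023"],
             "revenue": [9120e6, 8004e6, 6893e6],
             "net_income": [1850e6, 467e6, 440e6]},
    "MDB": {"periods": ["FY2026", "FY2025", "FY2024"],
            "revenue": [2280e6, 1921e6, 1683e6],
            "net_income": [180e6, -125e6, -175e6]},
}

def cagr(values, years=2):
    """Compound annual growth rate from a newest-first list of values."""
    return (values[0] / values[years]) ** (1 / years) - 1

def first_profitable_period(periods, net_income):
    """Earliest period with positive income directly after a loss, else None."""
    # Walk from oldest to newest so the first loss-to-profit flip is returned
    for i in range(len(net_income) - 2, -1, -1):
        if net_income[i] > 0 and net_income[i + 1] <= 0:
            return periods[i]
    return None

for ticker, rec in SAMPLE.items():
    growth = cagr(rec["revenue"])
    flip = first_profitable_period(rec["periods"], rec["net_income"])
    print(f"{ticker}: 2y CAGR {growth:.1%}, above 20%: {growth > 0.20}, "
          f"turned profitable: {flip}")
```

With only three years of history per record, a flip that happened before the oldest period cannot be detected, which is why Palo Alto Networks returns None here.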

Common pitfalls

Four issues that produce false screening signals:

Survivorship bias: screening only current constituents of an index misses companies that were removed due to decline. For backtesting, you need historical index membership lists.

One-time items distort margins: asset write-downs, restructuring charges, and litigation settlements appear in operating_income and net_income. A company showing a sudden margin collapse may have taken a one-time charge rather than experiencing structural decline. Check filing_type for 8-K filings around the same date.

Stale filings in the screen: a company with a January fiscal year-end has a 10-K from March while a December fiscal year-end company has one from February. If you screen in May, the January-FY company has 4-month-old data while the December-FY company has 5-month-old data. Always check period_of_report freshness.

Revenue recognition differences: SaaS companies recognizing revenue ratably versus upfront produce different margin profiles even at similar business economics. Comparing gross margins across different revenue recognition methods without adjustment creates false signals.
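
The period_of_report freshness check mentioned under stale filings can be sketched in a few lines. `filing_age_days` and `flag_stale` are hypothetical helpers, and the 180-day cutoff is an arbitrary assumption to tune for your own screen. The two dates come from the sample output.

```python
from datetime import date

def filing_age_days(period_of_report, as_of):
    """Days between the fiscal period end (ISO date string) and the screening date."""
    return (as_of - date.fromisoformat(period_of_report)).days

def flag_stale(records, as_of, max_age_days=180):
    """Return tickers whose period_of_report is older than max_age_days."""
    return [r["ticker"] for r in records
            if filing_age_days(r["period_of_report"], as_of) > max_age_days]

records = [
    {"ticker": "CRWD", "period_of_report": "2026-01-31"},
    {"ticker": "PANW", "period_of_report": "2025-07-31"},
]
stale = flag_stale(records, as_of=date(2026, 5, 26))
print(stale)
```

Flagged companies are candidates for a 10-Q refresh rather than automatic exclusion, since their annual data is old but not wrong.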

For production screeners, normalize all metrics to trailing-twelve-month (TTM) values built from the latest 10-K plus the 10-Q filings that follow it, rather than relying on annual 10-K data alone. Companies file 10-Qs only for their first three fiscal quarters, so fourth-quarter figures have to be derived from the 10-K. Pair with our Google News Scraper to add sentiment signals, or combine with Amazon Scraper product data for consumer-facing companies where product review velocity correlates with revenue acceleration. See also scraping India government tenders for another compliance-data extraction pattern.
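
One common way to compute a TTM figure is to take the last full fiscal year, add the current year-to-date value, and subtract the same year-to-date value from a year earlier. The sketch below assumes you have year-to-date figures available. The article does not state whether the scraper's 10-Q revenue field is a single quarter or year-to-date, so verify that before wiring it in. The numbers are made up.

```python
def ttm(annual, ytd_current, ytd_prior):
    """Trailing twelve months: last fiscal year + current YTD - prior-year YTD."""
    return annual + ytd_current - ytd_prior

# Made-up figures in $M: FY revenue 1,000; six-month YTD of 600 vs 480 a year earlier
ttm_revenue = ttm(annual=1000, ytd_current=600, ytd_prior=480)
print(ttm_revenue)  # 1120
```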

Frequently asked questions

Why use SEC EDGAR data for stock screening instead of Yahoo Finance or Finviz?

Free screeners like Yahoo Finance and Finviz derive their financial data from SEC filings but apply their own normalization, which can lag filing dates by days to weeks and may miss restated figures. Building directly from EDGAR gives you same-day access to filed numbers, full control over which metrics to screen on (including R&D expense and stockholders' equity that some screeners omit), and the ability to construct custom composite scores. The tradeoff is engineering effort -- which the Thirdwatch actor eliminates by returning structured fields from XBRL filings.

How do I screen across different fiscal year-end dates?

Use the period_of_report field to normalize screening windows. For a calendar Q4 2025 screen, filter for period_of_report between 2025-10-01 and 2026-01-31 to capture companies with October, November, December, and January fiscal year-ends. This ensures you compare the most recent complete fiscal period for each company rather than mixing stale and fresh data.
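
As a minimal sketch of that window filter, assuming period_of_report arrives as an ISO date string as in the sample output (`in_screening_window` is a hypothetical helper):

```python
import pandas as pd

def in_screening_window(df, start, end):
    """Keep rows whose period_of_report falls within [start, end], inclusive."""
    periods = pd.to_datetime(df["period_of_report"])
    return df[periods.between(pd.Timestamp(start), pd.Timestamp(end))]

# Tickers and period dates taken from the sample output
filings = pd.DataFrame({
    "ticker": ["CRWD", "PANW", "MDB"],
    "period_of_report": ["2026-01-31", "2025-07-31", "2026-01-31"],
})
window = in_screening_window(filings, "2025-10-01", "2026-01-31")
print(window["ticker"].tolist())
```

Palo Alto Networks drops out because its July fiscal year-end falls outside the window, so it would need its own window or a 10-Q based update.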

Related use cases

Scrape SEC EDGAR Filings for Financial Research (2026 Guide)
Build a SEC Filing Database for Investment Analysis (Python)
Monitor SEC EDGAR 10-K Filings for Competitive Analysis
