Track India Company Incorporations by State With MCA Data

Analyze Indian company incorporations by state and year using MCA registry data. CIN-based extraction yields regional business formation trend insights.

May 26, 2026 · 6 min read · 1,405 words

Thirdwatch's MCA India Scraper returns structured company master data with state, incorporation year, NIC industry code, and company class extracted from every CIN. Researchers and analysts use this to map Indian company incorporations by geography, industry sector, and entity type -- turning raw registry data into regional business formation trends. Pass a list of CINs, get back full company profiles including registration status, authorized capital, directors, and RoC jurisdiction. No MCA login required.

Why track India company incorporations by state

India's company incorporation patterns reveal where economic activity is concentrating. According to the MCA Annual Report 2024-25, Maharashtra, Karnataka, and Delhi NCR account for over 45% of all new company registrations, but the growth rates in states like Telangana, Gujarat, and Rajasthan have outpaced the traditional hubs in recent years.

This data matters for three audiences. Policy researchers tracking the impact of state-level ease-of-doing-business reforms need incorporation trends as a leading indicator. Venture capital firms use state-level incorporation velocity as a proxy for startup ecosystem maturity. Economic development agencies benchmark their state against peer states to evaluate the effectiveness of investment incentives and regulatory simplification.

The CIN format itself encodes geographic data -- counting characters from 1, the 2-letter state code at positions 7-8 identifies the state of registration. Combined with the 4-digit incorporation year at positions 9-12, every CIN is a geographic-temporal data point even without a full registry lookup. The Thirdwatch actor enriches this with the complete company profile: status, capital, directors, and RoC jurisdiction.
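
As a minimal sketch of that fixed-width layout, the helper below splits a 21-character CIN into its parts using plain string slicing. The `CinParts` and `parse_cin` names are hypothetical and not part of the actor; the offsets follow the structure described in this article (status letter, 5-digit NIC code, state code, year, company class, registration number).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CinParts:
    listing_status: str       # "L" or "U", the first character
    nic_code: str             # 5-digit industry code
    state_code: str           # 2-letter state code
    year: int                 # 4-digit incorporation year
    company_class: str        # 3-letter class, e.g. "PLC" or "PTC"
    registration_number: str  # 6-digit registration number


def parse_cin(cin: str) -> CinParts:
    """Split a 21-character CIN into its fixed-width components."""
    cin = cin.strip().upper()
    if len(cin) != 21:
        raise ValueError(f"expected 21 characters, got {len(cin)}: {cin!r}")
    return CinParts(
        listing_status=cin[0],
        nic_code=cin[1:6],
        state_code=cin[6:8],      # characters 7-8 when counting from 1
        year=int(cin[8:12]),      # characters 9-12 when counting from 1
        company_class=cin[12:15],
        registration_number=cin[15:21],
    )


parts = parse_cin("L17110MH1973PLC019786")
print(parts.state_code, parts.year)  # MH 1973
```

Python slices are 0-indexed, so `cin[6:8]` corresponds to positions 7-8 in the 1-indexed description above.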

How does this compare to the alternatives?

| Approach | Reliability | Setup time | Maintenance | Geographic granularity |
| --- | --- | --- | --- | --- |
| MCA annual reports (PDF) | Authoritative but aggregated | Hours (manual extraction) | Annual refresh only | State-level totals only |
| data.gov.in CSV dumps | Partial, stale | Hours (download + clean) | Irregular updates | Company-level, but incomplete fields |
| RBI/DPIIT publications | Authoritative for FDI-linked | Hours | Quarterly | State + sector for FDI only |
| Thirdwatch MCA India Scraper | High, per-company detail | 5 minutes | Thirdwatch tracks changes | Company-level with full master data |

The MCA India Scraper provides company-level granularity that aggregate reports cannot match -- every record includes the specific state, year, industry code, and entity type.

How to track incorporations by state in 5 steps

Step 1: How do I source CINs for state-level analysis?

The actor requires CINs as input. For state-level research, source CIN lists from publicly available datasets:

# Option 1: data.gov.in company lists (download CSV, extract CIN column)
# Option 2: zaubacorp.com allows filtering by state + year
# Option 3: Stock exchange filings for listed companies (BSE/NSE)
# Option 4: If you have a partial dataset, the CIN structure itself
#           encodes state -- characters 7-8 (counting from 1) are the state code

# Example: CINs from Maharashtra (MH), Karnataka (KA), Delhi (DL)
sample_cins = [
    "L17110MH1973PLC019786",   # MH - Reliance
    "L72910KA1981PLC046065",   # KA - Wipro
    "U74140DL2008PTC179845",   # DL - Example
    "L65910MH2000PLC129408",   # MH - ICICI Bank
    "U72200KA2004PTC035289",   # KA - Infosys BPM
]

For comprehensive state analysis, you need a representative CIN sample per state. The CIN state code mapping covers all Indian states and union territories.
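
Before spending credits, it can help to check how evenly a CIN list covers the states you care about. The sketch below counts CINs per state code straight from the string and flags thinly sampled states. The `state_coverage` helper and the `min_per_state` threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter


def state_coverage(cins, min_per_state=2):
    """Count CINs per state code and list states below the sample threshold."""
    # Characters 7-8 (1-indexed) hold the state code; skip malformed lengths.
    counts = Counter(cin[6:8] for cin in cins if len(cin) == 21)
    thin = sorted(state for state, n in counts.items() if n < min_per_state)
    return counts, thin


cins = [
    "L17110MH1973PLC019786",
    "L72910KA1981PLC046065",
    "U74140DL2008PTC179845",
    "L65910MH2000PLC129408",
    "U72200KA2004PTC035289",
]

counts, thin = state_coverage(cins)
print(dict(counts))
print("Under-sampled states:", thin)
```

What counts as a representative sample depends on the research question, so treat the threshold as a placeholder rather than a recommendation.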

Step 2: How do I extract and enrich the CIN data?

Pass CINs to the actor and receive full company profiles:

from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("apify_api_xxxxxxxxxxxxxxxx")

run = client.actor("thirdwatch/mca-india-scraper").call(
    run_input={
        "queries": sample_cins,
        "maxResults": 500,
        "includeDirectors": False,  # Not needed for geographic analysis
    }
)

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
df = pd.DataFrame(items)
print(f"Retrieved {len(df)} companies across {df['state'].nunique()} states")

Setting includeDirectors to False speeds up extraction when you only need company-level geographic data.
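
For larger CIN lists, a small cleaning pass before calling the actor avoids paying for duplicates or malformed input. The sketch below is an assumption-laden helper, not part of the actor: `prepare_batches` is a hypothetical name, the regular expression only reflects the CIN layout described in this article, and the batch size of 500 simply mirrors the `maxResults` value above. Whether the actor needs batching at all is not confirmed here.

```python
import re

# Layout assumed from the article: L/U, 5 digits, 2 letters, 4 digits, 3 letters, 6 digits.
CIN_PATTERN = re.compile(r"^[LU]\d{5}[A-Z]{2}\d{4}[A-Z]{3}\d{6}$")


def prepare_batches(raw_cins, batch_size=500):
    """Normalize, validate, and de-duplicate CINs, then split them into batches."""
    seen = set()
    clean = []
    rejected = []
    for raw in raw_cins:
        cin = raw.strip().upper()
        if not CIN_PATTERN.match(cin):
            rejected.append(raw)   # keep the original string for review
        elif cin not in seen:
            seen.add(cin)
            clean.append(cin)      # first occurrence wins, order preserved
    batches = [clean[i:i + batch_size] for i in range(0, len(clean), batch_size)]
    return batches, rejected


batches, rejected = prepare_batches(
    [" l17110mh1973plc019786", "L17110MH1973PLC019786", "BAD", "U72200KA2004PTC035289"],
    batch_size=1,
)
print(batches)
print(rejected)
```

Each inner list can then be passed as the `queries` value of a separate run.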

Step 3: How do I analyze incorporations by state and year?

The actor returns state, state_code, and incorporation_year as structured fields:

# State-wise incorporation count
state_summary = (
    df.groupby(["state_code", "state"])
    .agg(
        total_companies=("cin", "count"),
        listed_count=("listing_status", lambda x: (x == "Listed").sum()),
        active_count=("status", lambda x: (x == "Active").sum()),
        avg_incorporation_year=("incorporation_year", lambda x: pd.to_numeric(x, errors="coerce").mean()),
    )
    .sort_values("total_companies", ascending=False)
    .reset_index()
)

print("State-wise company distribution:")
print(state_summary.to_string(index=False))

# Year-wise incorporation trend
year_trend = (
    df.assign(year=pd.to_numeric(df["incorporation_year"], errors="coerce"))
    .groupby("year")
    .agg(incorporations=("cin", "count"))
    .reset_index()
    .sort_values("year")
)

print("\nIncorporation trend by year:")
print(year_trend.tail(20).to_string(index=False))

The incorporation_year field is derived directly from the CIN structure, so it is always populated even when the full registry lookup returns partial data.
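
Because state and year can be read from the CIN alone, a rough growth comparison is possible even before the registry lookup finishes. The following sketch counts CINs per state and year and computes year-over-year change for one state. The helper names are hypothetical, and the CINs in the example are made-up placeholders used only to exercise the arithmetic, not real companies.

```python
from collections import Counter


def yearly_counts(cins):
    """Count CINs per (state_code, year) using the fixed CIN offsets."""
    return Counter((cin[6:8], int(cin[8:12])) for cin in cins)


def yoy_growth(counts, state):
    """Return {year: percent change vs the previous calendar year} for one state."""
    years = sorted(year for (s, year) in counts if s == state)
    growth = {}
    for year in years:
        prev = counts.get((state, year - 1), 0)
        if prev:  # skip years with no prior-year baseline
            growth[year] = round((counts[(state, year)] - prev) / prev * 100, 1)
    return growth


# Synthetic placeholder CINs (not real registrations).
toy_cins = [
    "U62011TG2021PTC000001",
    "U62011TG2021PTC000002",
    "U62011TG2022PTC000003",
    "U62011TG2022PTC000004",
    "U62011TG2022PTC000005",
]

toy_counts = yearly_counts(toy_cins)
toy_growth = yoy_growth(toy_counts, "TG")
print(toy_growth)  # {2022: 50.0}
```

Growth figures computed this way describe the sample, so they are only as representative as the CIN list that produced them.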

Step 4: How do I segment by industry and company type?

The NIC industry code and company class are embedded in every CIN:

# Industry segmentation using NIC code
# First 2 digits of nic_industry_code are the NIC division
# (labels below assume NIC-2008 numbering; older CINs may carry codes from earlier NIC revisions)
nic_sections = {
    "01": "Agriculture", "10": "Food Products", "13": "Textiles",
    "26": "Electronics", "29": "Motor Vehicles", "41": "Construction",
    "46": "Wholesale Trade", "47": "Retail Trade", "49": "Transport",
    "58": "Publishing", "62": "IT Services", "64": "Financial Services",
    "68": "Real Estate", "70": "Management Consulting", "72": "R&D",
    "85": "Education", "86": "Healthcare",
}

df["nic_section"] = df["nic_industry_code"].astype(str).str.zfill(5).str[:2]  # zfill restores leading zeros if codes were read as integers
df["industry"] = df["nic_section"].map(nic_sections).fillna("Other")

industry_by_state = (
    df.groupby(["state", "industry"])
    .size()
    .reset_index(name="count")
    .sort_values(["state", "count"], ascending=[True, False])
)

print("Top industries per state:")
for state in df["state"].unique():
    top = industry_by_state[industry_by_state["state"] == state].head(3)
    print(f"\n  {state}:")
    for _, row in top.iterrows():
        print(f"    {row['industry']}: {row['count']}")

# Company type distribution
type_dist = df["company_class"].value_counts()
print("\nCompany type distribution:")
print(f"  PLC (Public Limited): {type_dist.get('PLC', 0)}")
print(f"  PTC (Private Limited): {type_dist.get('PTC', 0)}")
print(f"  OPC (One Person): {type_dist.get('OPC', 0)}")

Maharashtra is commonly assumed to be dominated by financial services, Karnataka by IT services, and Gujarat by manufacturing -- this breakdown lets you confirm or challenge those assumptions about state-level economic specialization.

Step 5: How do I visualize state-level trends?

Build a state-level summary and export it for a visualization tool:

# Prepare a state-level summary for visualization or export
state_analysis = (
    df.groupby(["state_code", "state"])
    .agg(
        companies=("cin", "count"),
        active_pct=("status", lambda x: round((x == "Active").mean() * 100, 1)),
        listed_pct=("listing_status", lambda x: round((x == "Listed").mean() * 100, 1)),
        median_year=("incorporation_year", lambda x: pd.to_numeric(x, errors="coerce").median()),  # left as float: int() would raise if every year failed to parse (NaN)
        top_industry=("nic_industry_code", lambda x: x.astype(str).str[:2].mode().iloc[0] if len(x) > 0 else "N/A"),
    )
    .sort_values("companies", ascending=False)
    .reset_index()
)

# Export as JSON for dashboard consumption
state_analysis.to_json("state_incorporation_analysis.json", orient="records", indent=2)
print(f"Exported analysis for {len(state_analysis)} states")
print(state_analysis.head(10).to_string(index=False))

The JSON export feeds directly into any dashboard tool -- Metabase, Tableau, or a custom D3.js visualization. The median_year field reveals whether a state's company base skews toward legacy firms or recent startups.
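
One way to read the median year is to bucket states against a cutoff. The sketch below does this for records shaped like the export above. The `label_skew` helper, the 2010 cutoff, and the two sample values are all hypothetical and only illustrate the idea.

```python
import json


def label_skew(records, cutoff_year=2010):
    """Label each state 'recent' or 'legacy' by comparing median_year to a cutoff."""
    return {
        r["state_code"]: "recent" if r["median_year"] >= cutoff_year else "legacy"
        for r in records
    }


# Illustrative values only, in the same shape as the exported records.
payload = json.loads(
    '[{"state_code": "MH", "median_year": 1986},'
    ' {"state_code": "KA", "median_year": 2012}]'
)

labels = label_skew(payload)
print(labels)  # {'MH': 'legacy', 'KA': 'recent'}
```

The right cutoff depends on what "recent" means for the study, so it is left as a parameter.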

Sample output

Geographic fields from the actor output:

[
  {
    "cin": "L17110MH1973PLC019786",
    "company_name": "Reliance Industries Limited",
    "status": "Active",
    "state": "Maharashtra",
    "state_code": "MH",
    "incorporation_year": "1973",
    "nic_industry_code": "17110",
    "listing_status": "Listed",
    "company_class": "PLC",
    "company_class_description": "Public Limited Company",
    "registration_number": "019786",
    "roc": "RoC-Mumbai",
    "query": "L17110MH1973PLC019786"
  },
  {
    "cin": "U72200KA2004PTC035289",
    "company_name": "Infosys BPM Limited",
    "status": "Active",
    "state": "Karnataka",
    "state_code": "KA",
    "incorporation_year": "2004",
    "nic_industry_code": "72200",
    "listing_status": "Unlisted",
    "company_class": "PTC",
    "company_class_description": "Private Limited Company",
    "registration_number": "035289",
    "roc": "RoC-Bangalore",
    "query": "U72200KA2004PTC035289"
  }
]

The state_code, incorporation_year, nic_industry_code, listing_status, and company_class fields are always populated because they are parsed directly from the CIN structure. Even when the upstream registry lookup returns partial data, these geographic and temporal fields are guaranteed.
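
Since these fields are parsed from the CIN, a quick consistency check can confirm that a record agrees with its own identifier. The sketch below compares each CIN-derived field in a record against the matching slice of the CIN. The `cin_mismatches` helper is hypothetical, and the `L` to `Listed` and `U` to `Unlisted` mapping is inferred from the sample output rather than from actor documentation.

```python
def cin_mismatches(record):
    """Return {field: (actual, expected)} for CIN-derived fields that disagree with the CIN."""
    cin = record["cin"]
    expected = {
        "listing_status": {"L": "Listed", "U": "Unlisted"}.get(cin[0]),
        "nic_industry_code": cin[1:6],
        "state_code": cin[6:8],
        "incorporation_year": cin[8:12],   # the sample output stores the year as a string
        "company_class": cin[12:15],
        "registration_number": cin[15:21],
    }
    return {
        field: (record.get(field), value)
        for field, value in expected.items()
        if record.get(field) != value
    }


record = {
    "cin": "L17110MH1973PLC019786",
    "state_code": "MH",
    "incorporation_year": "1973",
    "nic_industry_code": "17110",
    "listing_status": "Listed",
    "company_class": "PLC",
    "registration_number": "019786",
}

print(cin_mismatches(record))  # {}
```

An empty result means the record is internally consistent; any entry points at a field worth inspecting before aggregation.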

Common pitfalls

Researchers run into three issues with state-level incorporation analysis. Registration state vs operating state -- a company's CIN state code reflects where it was registered with the Registrar of Companies, not necessarily where it operates. Many companies register in Maharashtra or Delhi for regulatory convenience but operate primarily in other states. For operational geography, supplement with the registered_address field or cross-reference with GST registration data, which reveals place of supply.

Survivorship bias in active companies -- filtering to only Active companies underestimates total incorporation activity. States with aggressive compliance enforcement (like Maharashtra's RoC-Mumbai) have higher struck-off rates, which makes their active count appear lower relative to total registrations. For accurate incorporation trend analysis, include all status categories and report each one separately.
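
Reporting every status side by side keeps struck-off companies visible. The sketch below computes the share of each status per state without filtering anything out. The `status_shares` helper is hypothetical, and the toy records, including the "Strike Off" label, are assumptions, since only the "Active" value appears in the sample output.

```python
from collections import Counter, defaultdict


def status_shares(records):
    """Return {state_code: {status: percent of that state's records}}."""
    by_state = defaultdict(Counter)
    for r in records:
        by_state[r["state_code"]][r["status"]] += 1
    shares = {}
    for state, counts in by_state.items():
        total = sum(counts.values())
        shares[state] = {
            status: round(n / total * 100, 1) for status, n in counts.items()
        }
    return shares


# Toy records; status labels other than "Active" are assumed.
toy_records = [
    {"state_code": "MH", "status": "Active"},
    {"state_code": "MH", "status": "Active"},
    {"state_code": "MH", "status": "Active"},
    {"state_code": "MH", "status": "Strike Off"},
    {"state_code": "KA", "status": "Active"},
]

shares = status_shares(toy_records)
print(shares["MH"])  # {'Active': 75.0, 'Strike Off': 25.0}
```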

NIC code granularity -- the 5-digit NIC code provides fine-grained industry classification, but many researchers only need the 2-digit division level (e.g., 62 for IT services). The full 5-digit code (e.g., 62011 for custom software development vs 62099 for other IT services) is useful for sub-sector analysis but creates sparsity in small samples.
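
One way to handle that sparsity is to keep 5-digit codes only where the sample is large enough and fall back to the 2-digit prefix elsewhere. The sketch below is a minimal illustration of that idea. The `rollup_sparse` name and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def rollup_sparse(codes, min_count=3):
    """Keep 5-digit codes with enough records; fold the rest into their 2-digit prefix."""
    full = Counter(codes)
    rolled = Counter()
    for code, n in full.items():
        key = code if n >= min_count else code[:2]
        rolled[key] += n
    return rolled


codes = ["62011", "62011", "62011", "62099", "62020", "47110"]
rolled = rollup_sparse(codes)
print(dict(rolled))  # {'62011': 3, '62': 2, '47': 1}
```

The result mixes two levels of detail, so label the 2-digit rows clearly (for example as "62, other") when presenting them next to 5-digit rows.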

Thirdwatch handles the registry access so you can focus on the analysis. For companies that appear in your dataset, pair with the IBBI Insolvency actor to cross-reference against insolvency proceedings.

Frequently asked questions

Can I filter incorporations by industry using the actor?

Yes. Every CIN contains a 5-digit NIC (National Industrial Classification) code at positions 2-6. The actor extracts this as nic_industry_code. Group by this field to segment incorporations by industry -- 62 for IT services, 47 for retail trade, 64 for financial services, and so on.

How do I get CINs for companies incorporated in a specific state?

The actor requires CINs as input. Source state-specific CIN lists from data.gov.in company datasets, RoC filings published by state registrars, or business directories like zaubacorp.com that allow filtering by state and year of incorporation.
