Scrape MCA India Company Data for Due Diligence (2026 Guide)

Pull structured MCA company master data, directors, and capital for KYB due diligence workflows using the Thirdwatch actor. CIN lookup to JSON in minutes.

May 26, 2026 · 5 min read · 1,203 words

Thirdwatch's MCA India Scraper returns structured company master data from India's Ministry of Corporate Affairs registry. Pass CINs, get JSON with company status, directors with DIN, authorized and paid-up capital, registered address, RoC jurisdiction, incorporation date, and PAN. Built for compliance teams, credit risk analysts, and researchers running KYB due diligence on Indian companies.

Why scrape MCA India for due diligence

Every vendor onboarding, credit assessment, and M&A screening on an Indian company starts with the same question: is this entity real, active, and who controls it? The Ministry of Corporate Affairs registry is the canonical source. According to the MCA Annual Report 2024-25, India has over 2.5 million active registered companies, with more than 150,000 new incorporations each year.

The problem is access. MCA21 requires a paid subscription. Third-party APIs from providers like Tofler, Signzy, or Zoho gate director data behind premium plans starting at several thousand rupees per month. Manual lookups on mca.gov.in are slow, CAPTCHA-gated, and return unstructured HTML that needs manual parsing for each company.

For compliance teams running batches of 50-500 company verifications per week, the manual approach breaks down immediately. Due diligence packs for investment committees need structured, machine-readable records that flow directly into risk scoring models. The Thirdwatch MCA India actor returns that structured JSON from a CIN lookup, no subscription required, and feeds directly into CRM or compliance systems via API.

How does this compare to the alternatives?

Three paths to MCA company master data:

| Approach | Reliability | Setup time | Maintenance | Scale |
|---|---|---|---|---|
| Manual MCA21 portal lookup | CAPTCHA-gated, slow | Immediate (with subscription) | Per-lookup manual effort | Does not scale past 10/day |
| Paid API (Tofler, Signzy, Zoho) | High | Days (contract + integration) | Vendor manages | High, but subscription-gated |
| Thirdwatch MCA India Scraper | High | 5 minutes | Thirdwatch tracks MCA changes | Pay per result, no contract |

The MCA India Scraper gives you structured company data at pay-per-result pricing with no subscription lock-in.

How to scrape MCA India company data for due diligence in 5 steps

Step 1: How do I get my Apify API token?

Sign in at apify.com (free tier available, no credit card required). Navigate to Settings, then Integrations, and copy your personal API token:

export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: How do I look up companies by CIN?

Pass an array of 21-character CINs. CIN lookups are exact-match and highly reliable. You can find a company's CIN at mca.gov.in or zaubacorp.com.

import os

import pandas as pd
from apify_client import ApifyClient

# Read the token exported in Step 1 instead of hard-coding it in the script
client = ApifyClient(os.environ["APIFY_TOKEN"])

run = client.actor("thirdwatch/mca-india-scraper").call(
    run_input={
        "queries": [
            "L17110MH1973PLC019786",   # Reliance Industries
            "L45200MH1945PLC004520",   # Tata Consultancy Services
            "U72200KA2004PTC035289",   # Infosys BPM
        ],
        "maxResults": 10,
        "includeDirectors": True,
    }
)

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
df = pd.DataFrame(items)
print(f"{len(df)} companies retrieved")

Three CINs, structured JSON back in under a minute. Each record includes company name, status, capital, directors, and 15+ additional fields.
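
Because CIN lookups are exact-match, a mistyped CIN costs a run and returns nothing useful. A minimal sketch of a pre-flight check follows, assuming the standard 21-character CIN layout (listing flag, 5-digit industry code, 2-letter state code, 4-digit year, 3-letter company class, 6-digit registration number). The `parse_cin` helper is illustrative and not part of the actor.

```python
import re
from typing import Optional

# Assumed CIN layout: L/U flag, industry code, state, year, class, registration number.
CIN_PATTERN = re.compile(
    r"^(?P<listing>[LU])(?P<industry>\d{5})(?P<state>[A-Z]{2})"
    r"(?P<year>\d{4})(?P<company_class>[A-Z]{3})(?P<registration>\d{6})$"
)

def parse_cin(cin: str) -> Optional[dict]:
    """Return the CIN's components, or None if the string is malformed."""
    match = CIN_PATTERN.match(cin.strip().upper())
    return match.groupdict() if match else None

candidates = [
    "L17110MH1973PLC019786",
    "U72200KA2004PTC35289",  # deliberately one digit short
]
valid = [c for c in candidates if parse_cin(c)]
rejected = [c for c in candidates if not parse_cin(c)]
print(f"{len(valid)} valid, {len(rejected)} rejected: {rejected}")
```

Only the `valid` list would then go into the actor's `queries` input. The check is purely syntactic: it confirms the shape of a CIN, not that the company exists.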

Step 3: How do I flag inactive or struck-off companies?

Due diligence requires checking company status. Filter for non-active entities:

flagged = df[df["status"] != "Active"]
print(f"{len(flagged)} companies with non-Active status:")
for _, row in flagged.iterrows():
    print(f"  {row['cin']} - {row['company_name']}: {row['status']}")

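# Check for companies with very low paid-up capital.
# Capital arrives as text such as "13,532.5 Cr", so apply the unit instead of
# stripping it: 1 Cr = 10,000,000 rupees. The "lakh" branch is an assumption
# about other units the actor may emit.
capital_text = df["paid_up_capital"].astype(str).str.replace(",", "").str.lower()
amount = pd.to_numeric(capital_text.str.extract(r"([\d.]+)")[0], errors="coerce")
unit = capital_text.map(lambda s: 1e7 if "cr" in s else 1e5 if "lakh" in s else 1)
df["paid_up_numeric"] = amount * unit
low_capital = df[df["paid_up_numeric"] < 100000]
print(f"\n{len(low_capital)} companies with paid-up capital under 1 lakh")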

Status values include Active, Struck Off, Under Liquidation, Dormant, and Under Process of Striking Off. Any non-Active status is a red flag in vendor onboarding.
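
A binary Active/non-Active filter treats a dormant company the same as one already struck off. A rough sketch of a tiered triage follows. The tier names and the mapping are hypothetical policy choices, not something the actor returns, and the second and third CINs are made-up placeholders.

```python
# Hypothetical mapping from the status values listed above to review tiers.
STATUS_TIERS = {
    "Active": "pass",
    "Dormant": "review",
    "Under Process of Striking Off": "escalate",
    "Under Liquidation": "escalate",
    "Struck Off": "reject",
}

def triage(status) -> str:
    """Map a status to a tier; unknown or missing values go to manual review."""
    return STATUS_TIERS.get(str(status).strip(), "review")

records = [
    {"cin": "L17110MH1973PLC019786", "status": "Active"},
    {"cin": "U00000XX2000PTC000000", "status": "Struck Off"},  # placeholder CIN
    {"cin": "U00000XX2001PTC000001", "status": None},          # placeholder CIN
]
tiers = {r["cin"]: triage(r["status"]) for r in records}
for cin, tier in tiers.items():
    print(f"{cin}: {tier}")
```

Defaulting unknown statuses to "review" means a status string the mapping has not seen never passes silently.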

Step 4: How do I export results for a diligence pack?

Structure the output for compliance review:

diligence_report = df[[
    "cin", "company_name", "status", "incorporation_date",
    "authorized_capital", "paid_up_capital", "registered_address",
    "roc", "listing_status", "state", "nic_industry_code"
]].copy()

# to_excel needs an Excel engine such as openpyxl (pip install openpyxl)
diligence_report.to_excel("mca_diligence_pack.xlsx", index=False)
print("Exported to mca_diligence_pack.xlsx")

# Directors summary
for _, row in df.iterrows():
    directors = row.get("directors")
    # Rows without director data surface in pandas as NaN rather than a list
    if isinstance(directors, list) and directors:
        print(f"\n{row['company_name']} ({row['cin']}):")
        for d in directors:
            print(f"  {d.get('name', 'N/A')} - {d.get('designation', 'N/A')} (DIN: {d.get('din', 'N/A')})")

The Excel export drops directly into a compliance team's review workflow. Director DIN numbers enable cross-referencing against disqualified-director lists.
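
The DIN cross-reference can be sketched as a set lookup. This example assumes you already hold a list of disqualified DINs from a source of your own; the actor does not supply one. The records, names, and DINs below are invented placeholders, and `match_disqualified` is an illustrative helper.

```python
def normalize_din(din) -> str:
    """DINs are 8-digit identifiers; restore leading zeros lost in spreadsheets."""
    return str(din).strip().zfill(8)

def match_disqualified(records, disqualified_dins):
    """Return (cin, director name, DIN) for every director on the blocked list."""
    blocked = {normalize_din(d) for d in disqualified_dins}
    hits = []
    for record in records:
        for director in record.get("directors") or []:
            din = director.get("din")
            if din and normalize_din(din) in blocked:
                hits.append((record["cin"], director.get("name"), normalize_din(din)))
    return hits

# Invented placeholder data, shaped like the actor's output
records = [
    {"cin": "U00000XX2020PTC000001", "directors": [
        {"name": "Example Director A", "din": "01234567"},
        {"name": "Example Director B", "din": "07654321"},
    ]},
    {"cin": "U00000XX2021PTC000002", "directors": None},
]
disqualified = [1234567]  # leading zero lost, as often happens in Excel
hits = match_disqualified(records, disqualified)
print(hits)
```

Normalizing both sides to 8 digits matters because a DIN stored as a number loses its leading zeros and would otherwise never match.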

Step 5: How do I schedule recurring compliance checks?

For ongoing vendor monitoring, schedule weekly runs:

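import json

# Schedule actions carry the actor input as a JSON string in runInput.body
actor_input = {
    "queries": [
        "L17110MH1973PLC019786",
        "L45200MH1945PLC004520",
        "U72200KA2004PTC035289",
    ],
    "maxResults": 100,
    "includeDirectors": True,
}

schedule = client.schedules().create(
    name="weekly-vendor-mca-check",
    cron_expression="0 6 * * 1",  # Every Monday at 6 AM
    is_enabled=True,
    is_exclusive=True,  # do not start a run while the previous one is still going
    actions=[{
        "type": "RUN_ACTOR",
        # If the API rejects the username/name form, use the actor ID from Apify Console
        "actorId": "thirdwatch/mca-india-scraper",
        "runInput": {
            "body": json.dumps(actor_input),
            "contentType": "application/json; charset=utf-8",
        },
    }],
)
print(f"Schedule created: {schedule['id']}")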

Weekly checks catch status changes (Active to Struck Off), director resignations, and capital modifications before they surface in downstream credit events.
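
Catching those changes means comparing this week's dataset with last week's. A minimal sketch of that comparison follows, assuming each run's items are kept as a list of dicts keyed by `cin`. The `diff_snapshots` helper and the placeholder records are illustrative.

```python
WATCHED_FIELDS = ("status", "authorized_capital", "paid_up_capital")

def director_dins(record) -> set:
    """Collect the DINs present on a record, ignoring missing values."""
    return {d["din"] for d in record.get("directors") or [] if d.get("din")}

def diff_snapshots(previous, current):
    """Compare two lists of records by CIN and list what changed."""
    before = {r["cin"]: r for r in previous}
    changes = []
    for record in current:
        old = before.get(record["cin"])
        if old is None:
            continue  # newly added vendor, nothing to compare against
        for field in WATCHED_FIELDS:
            if old.get(field) != record.get(field):
                changes.append((record["cin"], field, old.get(field), record.get(field)))
        departed = director_dins(old) - director_dins(record)
        if departed:
            changes.append((record["cin"], "directors_removed", sorted(departed), None))
    return changes

# Invented placeholder snapshots
last_week = [{"cin": "U00000XX2020PTC000001", "status": "Active",
              "paid_up_capital": "1 Cr",
              "directors": [{"din": "01234567"}, {"din": "07654321"}]}]
this_week = [{"cin": "U00000XX2020PTC000001", "status": "Struck Off",
              "paid_up_capital": "1 Cr",
              "directors": [{"din": "01234567"}]}]
changes = diff_snapshots(last_week, this_week)
for change in changes:
    print(change)
```

Each tuple holds the CIN, the field that changed, and the old and new values, which is enough to drive an alert or a ticket.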

Sample output

Two sample MCA company records look like this:

[
  {
    "cin": "L17110MH1973PLC019786",
    "company_name": "Reliance Industries Limited",
    "status": "Active",
    "incorporation_date": "08 May, 1973",
    "company_type": "Public Limited",
    "authorized_capital": "50,000 Cr",
    "paid_up_capital": "13,532.5 Cr",
    "registered_address": "3rd Floor, Maker Chambers IV, 222, Nariman Point, Mumbai - 400021",
    "email": "investor_relations@ril.com",
    "pan": "AAACR5055K",
    "roc": "RoC-Mumbai",
    "listing_status": "Listed",
    "state": "Maharashtra",
    "state_code": "MH",
    "incorporation_year": "1973",
    "nic_industry_code": "17110",
    "registration_number": "019786",
    "directors": [
      {"name": "Mukesh Dhirubhai Ambani", "din": "00001695", "designation": "Managing Director"}
    ],
    "data_source": "mca-public",
    "query": "L17110MH1973PLC019786"
  },
  {
    "cin": "U72200KA2004PTC035289",
    "company_name": "Infosys BPM Limited",
    "status": "Active",
    "incorporation_date": "04 Aug, 2004",
    "authorized_capital": "500 Cr",
    "paid_up_capital": "338.2 Cr",
    "roc": "RoC-Bangalore",
    "listing_status": "Unlisted",
    "state": "Karnataka",
    "state_code": "KA",
    "incorporation_year": "2004",
    "nic_industry_code": "72200",
    "registration_number": "035289",
    "directors": [
      {"name": "Anantha Radhakrishnan", "din": "07453711", "designation": "Director"}
    ],
    "data_source": "mca-public",
    "query": "U72200KA2004PTC035289"
  }
]

Key fields for due diligence: status (Active vs Struck Off), directors with DIN for identity verification, authorized_capital and paid_up_capital for financial standing, and roc for jurisdiction mapping. The query field maps each result back to your input CIN.
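
Since `query` echoes the input, it can be used to spot CINs that came back empty. A small sketch follows, assuming `items` is the list of dataset records and `queries` is the list you submitted. The `reconcile` helper is hypothetical.

```python
def reconcile(queries, items):
    """Return (missing, mismatched) input CINs based on each item's query field."""
    by_query = {item.get("query"): item for item in items}
    # Inputs for which no record came back at all
    missing = [q for q in queries if q not in by_query]
    # Inputs whose returned record carries a different CIN than requested
    mismatched = [q for q in queries
                  if q in by_query and by_query[q].get("cin") != q]
    return missing, mismatched

queries = ["L17110MH1973PLC019786", "U72200KA2004PTC035289"]
items = [{"cin": "L17110MH1973PLC019786", "query": "L17110MH1973PLC019786"}]
missing, mismatched = reconcile(queries, items)
print("No record returned for:", missing)
```

A CIN in `missing` deserves a manual check before it is treated as a non-existent entity, since an empty result can also come from a typo in the input.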

Common pitfalls

Three issues surface repeatedly in MCA data pipelines.

CIN vs company name lookups: always prefer CINs. Company-name searches are approximate and may return incorrect matches for common names like "National Trading Company" or "ABC Industries." Find the CIN first at mca.gov.in or zaubacorp.com, then pass it to the actor for an exact match.

Stale filings: some companies, particularly older ones or those in semi-dormant states, have not filed annual returns in years. The actor returns whatever MCA has published, which means last_agm_date or last_balance_sheet_date may be several years old. Treat missing or stale filing dates as a risk signal, not a data quality issue.
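
One way to turn filing age into a risk signal is sketched below. It assumes `last_agm_date` uses the same "08 May, 1973" format as `incorporation_date` in the sample output, which this guide does not confirm, and the 18-month threshold is an arbitrary illustration.

```python
from datetime import date, datetime

DATE_FORMAT = "%d %b, %Y"  # assumed to match the incorporation_date format
MAX_AGE_DAYS = 548         # hypothetical threshold, roughly 18 months

def filing_is_stale(value, today: date) -> bool:
    """Treat missing, unparseable, or old filing dates as stale."""
    if not value:
        return True
    try:
        filed = datetime.strptime(str(value).strip(), DATE_FORMAT).date()
    except ValueError:
        return True  # unknown format: flag for manual review
    return (today - filed).days > MAX_AGE_DAYS

as_of = date(2026, 5, 26)
for sample in ["30 Sep, 2025", "30 Sep, 2019", None]:
    print(sample, "->", "stale" if filing_is_stale(sample, as_of) else "current")
```

Passing `today` explicitly keeps the check reproducible when a diligence pack is regenerated later.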

Capital interpretation: authorized capital is the maximum share capital the company is permitted to issue; paid-up capital is what shareholders have actually paid in. A large gap between the two is normal for growth-stage companies but can be a red flag for shell entities. Always examine both fields together.
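
Examining the two fields together can be reduced to a single ratio. The sketch below parses the capital strings from the sample output ("50,000 Cr", "13,532.5 Cr") and computes paid-up as a share of authorized capital. Only the "Cr" unit appears in the samples; the lakh handling is an assumption, and what counts as a worrying ratio is left to your own policy.

```python
import re

def to_rupees(text):
    """Parse strings like '13,532.5 Cr' into rupees; return None if unparseable."""
    match = re.match(r"\s*([\d,]*\.?\d+)\s*([A-Za-z]*)", str(text))
    if not match:
        return None
    amount = float(match.group(1).replace(",", ""))
    unit = match.group(2).lower()
    if unit.startswith("cr"):   # crore = 10,000,000
        return amount * 1e7
    if unit.startswith("la"):   # lakh/lac = 100,000 (assumed unit)
        return amount * 1e5
    return amount

def capital_utilization(record):
    """Paid-up capital as a fraction of authorized capital, or None."""
    authorized = to_rupees(record.get("authorized_capital"))
    paid_up = to_rupees(record.get("paid_up_capital"))
    if not authorized or paid_up is None:
        return None
    return paid_up / authorized

record = {"authorized_capital": "50,000 Cr", "paid_up_capital": "13,532.5 Cr"}
ratio = capital_utilization(record)
print(f"Paid-up is {ratio:.1%} of authorized capital")
```

A very low ratio alongside a very large authorized figure is the pattern worth a second look.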

Thirdwatch's actor handles the registry access and proxy rotation so you can focus on the analysis. Pair MCA data with IBBI Insolvency India for bankruptcy checks and GST Verification for tax compliance status.

Frequently asked questions

Does the actor require an MCA21 subscription?

No. The actor uses publicly disclosed company master data. You do not need an MCA21 subscription, paid API key, or government portal login. Pass a CIN and receive structured JSON with master data, directors, capital, status, and registered address.

What fields does the actor return for each company?

Each record includes CIN, company name, status, incorporation date, authorized and paid-up capital, registered address, email, PAN, RoC jurisdiction, directors with DIN and designation, listing status, NIC industry code, and state. Around 20 structured fields per company.
