Build an eProcure Tender Database for Your Sales Pipeline

Build a searchable tender database from India's eProcure CPPP portal. Feed government contract data into your sales pipeline with structured extraction.

May 26, 2026 · 6 min read · 1,399 words

Thirdwatch's India Government Tenders Scraper extracts structured tender data from India's Central Public Procurement Portal into clean JSON, ready to feed a sales pipeline database. Pull tender IDs, reference numbers, organizations, departments, deadlines, and detail links with pay-per-result pricing. Built for developers and growth engineers building government sales intelligence tools or integrating procurement data into CRM workflows.

Why build a tender database from eProcure

India's government procurement market is massive. According to the World Bank's India procurement assessment, central and state government procurement spending exceeds $500 billion annually, with a growing share moving to electronic platforms like CPPP (eprocure.gov.in). For any B2G (business-to-government) company, this spending represents the largest addressable market in India -- larger than any single private-sector vertical.

The problem is discovery. eProcure's search interface is built for one-off lookups, not systematic pipeline building. There is no API, no export function, and no way to set up saved searches with notifications. A sales team targeting government IT contracts needs to check the portal daily across multiple keywords and departments. A growth engineer building a government market intelligence product needs historical tender data in a queryable format. A channel partner selling to system integrators needs to surface relevant subcontracting opportunities before prime contractors lock in their teams.

All of these use cases require the same foundation: a structured, deduplicated, continuously updated database of eProcure tender records. The scraper provides the data extraction layer. This guide covers how to build the pipeline from extraction through storage to sales workflow integration.

How does this compare to the alternatives?

Approach	Data freshness	Schema control	Integration flexibility	Maintenance
Manual portal checking + spreadsheet	Hours behind, human-dependent	None, ad hoc columns	Copy-paste only	Daily human effort
Tender aggregation SaaS (TenderTiger, BidAssist)	Near real-time	Locked to vendor schema	Limited API, vendor-dependent	Subscription renewal
In-house web scraper	Real-time on each run	Full control	Full control	Portal changes break scraper
Thirdwatch eProcure Scraper + your database	Real-time on each run	Full control of downstream schema	Any database, any CRM	Thirdwatch maintains extraction

Building on top of the India Government Tenders Scraper gives you schema control and integration flexibility without the maintenance burden of keeping up with eProcure's DOM changes.

How to build the tender database pipeline in 6 steps

Step 1: How do I set up the project?

Install dependencies and set your Apify token.

pip install apify-client psycopg2-binary
export APIFY_TOKEN="apify_api_xxxxxxxxxxxxxxxx"

Step 2: How do I define the database schema?

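Create a PostgreSQL table that maps to the scraper's output fields. tender_id is the natural key for deduplication. The date columns are TEXT because the scraper returns dates as display strings (see the sample output); cast them to timestamp whenever you compare or sort on them.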

CREATE TABLE IF NOT EXISTS tenders (
    tender_id TEXT PRIMARY KEY,
    tender_title TEXT NOT NULL,
    tender_reference_number TEXT,
    organization TEXT,
    department TEXT,
    published_date TEXT,
    bid_submission_deadline TEXT,
    tender_opening_date TEXT,
    detail_href TEXT,
    first_seen_at TIMESTAMP DEFAULT NOW(),
    last_seen_at TIMESTAMP DEFAULT NOW(),
    pipeline_status TEXT DEFAULT 'new'
);

CREATE INDEX idx_tenders_org ON tenders(organization);
CREATE INDEX idx_tenders_dept ON tenders(department);
CREATE INDEX idx_tenders_deadline ON tenders(bid_submission_deadline);
CREATE INDEX idx_tenders_status ON tenders(pipeline_status);

Step 3: How do I extract tenders for multiple sales verticals?

Run the scraper with keyword arrays that cover your target verticals. Use fetchDetails to get complete metadata for pipeline qualification.

from apify_client import ApifyClient
import os

client = ApifyClient(os.environ["APIFY_TOKEN"])

VERTICALS = {
    "IT Services": {
        "queries": ["IT services", "software development", "cloud computing",
                     "cybersecurity", "data centre"],
        "maxResults": 200,
        "fetchDetails": True,
    },
    "Infrastructure": {
        "queries": ["road construction", "bridge construction", "smart city"],
        "maxResults": 150,
        "organization": "",
        "fetchDetails": True,
    },
    "Medical Equipment": {
        "queries": ["medical equipment", "hospital supplies", "diagnostic instruments"],
        "maxResults": 100,
        "organization": "All India Institute of Medical Sciences",
        "fetchDetails": True,
    },
}

all_tenders = []
for vertical_name, config in VERTICALS.items():
    run = client.actor("thirdwatch/india-government-tenders-scraper").call(
        run_input=config
    )
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    for item in items:
        item["_vertical"] = vertical_name
    all_tenders.extend(items)
    print(f"{vertical_name}: {len(items)} tenders")

# Deduplicate by tender_id
seen = set()
unique_tenders = []
for t in all_tenders:
    if t["tender_id"] not in seen:
        seen.add(t["tender_id"])
        unique_tenders.append(t)

print(f"\nTotal unique tenders: {len(unique_tenders)}")

Step 4: How do I upsert tenders into the database?

Upsert by tender_id to handle amended tenders and deadline extensions without creating duplicates.

import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/tenders_db")
cur = conn.cursor()

for t in unique_tenders:
    cur.execute("""
        INSERT INTO tenders (tender_id, tender_title, tender_reference_number,
                             organization, department, published_date,
                             bid_submission_deadline, tender_opening_date,
                             detail_href, first_seen_at, last_seen_at)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, NOW(), NOW())
        ON CONFLICT (tender_id) DO UPDATE SET
            bid_submission_deadline = EXCLUDED.bid_submission_deadline,
            tender_opening_date = EXCLUDED.tender_opening_date,
            last_seen_at = NOW()
    """, (
        t["tender_id"], t["tender_title"], t.get("tender_reference_number"),
        t.get("organization"), t.get("department"), t.get("published_date"),
        t.get("bid_submission_deadline"), t.get("tender_opening_date"),
        t.get("detail_href"),
    ))

conn.commit()
cur.close()
conn.close()
print(f"Upserted {len(unique_tenders)} records")
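
Because the schema stores dates as text, a small normalization step before the upsert can make those columns sort chronologically. The sketch below assumes deadlines always arrive in the format shown in the sample output (for example 18-Jun-2026 05:00 PM); the function name and the fallback to None are illustrative choices, not part of the scraper. Month and AM/PM parsing depend on an English locale.

```python
from datetime import datetime
from typing import Optional

# Assumed portal format, inferred from the sample output: "18-Jun-2026 05:00 PM"
PORTAL_FORMAT = "%d-%b-%Y %I:%M %p"


def normalize_deadline(raw: Optional[str]) -> Optional[str]:
    """Return 'YYYY-MM-DD HH:MM:SS', or None if the value is missing or unparseable."""
    if not raw:
        return None
    try:
        parsed = datetime.strptime(raw.strip(), PORTAL_FORMAT)
    except ValueError:
        # Unknown format: store NULL rather than a string that cannot be cast later
        return None
    return parsed.isoformat(sep=" ")
```

Passing normalize_deadline(t.get("bid_submission_deadline")) into the upsert instead of the raw value keeps the column as text while making plain string ordering match date ordering.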

Step 5: How do I build a pipeline qualification view?

Create a SQL view that surfaces actionable tenders for your sales team -- open deadlines, sorted by urgency.

CREATE VIEW pipeline_active AS
SELECT
    tender_id,
    tender_title,
    organization,
    department,
    bid_submission_deadline,
    pipeline_status,
    CASE
        WHEN bid_submission_deadline::timestamp < NOW() + INTERVAL '3 days' THEN 'urgent'
        WHEN bid_submission_deadline::timestamp < NOW() + INTERVAL '7 days' THEN 'upcoming'
        ELSE 'open'
    END AS urgency
FROM tenders
WHERE pipeline_status IN ('new', 'qualified', 'preparing')
  AND bid_submission_deadline::timestamp > NOW()
-- Sort on the cast value: the column is TEXT, so a plain sort would be alphabetical
ORDER BY bid_submission_deadline::timestamp ASC;
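
If you also need the urgency label outside the database (for example in an alerting script), the view's CASE logic can be mirrored in Python. This is a minimal sketch; the function name is hypothetical and it assumes the deadline has already been parsed into a datetime.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_urgency(deadline: datetime, now: datetime) -> Optional[str]:
    """Mirror the view's buckets; None means the deadline has already passed."""
    if deadline <= now:
        return None  # the view filters these rows out
    remaining = deadline - now
    if remaining < timedelta(days=3):
        return "urgent"
    if remaining < timedelta(days=7):
        return "upcoming"
    return "open"
```

Taking now as a parameter instead of calling datetime.now() inside the function makes the thresholds easy to unit test.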

Step 6: How do I schedule daily pipeline updates?

Automate extraction and loading with an Apify schedule plus a cron job for the database sync.

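import json

# Schedule the scraper on Apify
schedule = client.schedules().create(
    name="daily-tender-pipeline",
    cron_expression="0 2 * * *",  # 2:00 AM daily
    timezone="Asia/Kolkata",      # without a timezone the cron is evaluated in UTC, not IST
    is_enabled=True,
    is_exclusive=True,            # do not start a run while the previous one is still going
    actions=[{
        "type": "RUN_ACTOR",
        # If the name form is rejected, use the Actor's ID from the Apify Console
        "actorId": "thirdwatch/india-government-tenders-scraper",
        "runInput": {
            # The schedule API takes the input as a serialized body plus a content type
            "body": json.dumps({
                "queries": ["IT services", "software development", "cloud computing",
                            "cybersecurity", "data centre", "road construction",
                            "medical equipment"],
                "maxResults": 300,
                "fetchDetails": True,
            }),
            "contentType": "application/json; charset=utf-8",
        },
    }],
)
print(f"Schedule created: {schedule['id']}")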

Use a webhook or a downstream cron job to pull completed run datasets into your PostgreSQL instance.
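
For the cron-job route, one option is to read the dataset of the Actor's most recent successful run. The sketch below assumes an ApifyClient instance is passed in and relies on the client's last_run(...).dataset() chain; the function name and the ACTOR_ID constant are illustrative.

```python
from typing import Any, Dict, List

ACTOR_ID = "thirdwatch/india-government-tenders-scraper"


def fetch_latest_items(client: Any, actor_id: str = ACTOR_ID) -> List[Dict[str, Any]]:
    """Return all dataset items from the actor's most recent successful run."""
    # Filtering on SUCCEEDED skips runs that failed or are still in progress
    last_run = client.actor(actor_id).last_run(status="SUCCEEDED")
    return list(last_run.dataset().iterate_items())


# Usage (requires apify-client and a valid token):
#   from apify_client import ApifyClient
#   items = fetch_latest_items(ApifyClient(os.environ["APIFY_TOKEN"]))
```

The returned list can go through the same deduplication and upsert loop used in Steps 3 and 4. Schedule the cron job late enough after the Apify schedule that the run has had time to finish.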

Sample output

Three records from a single run targeting IT services tenders. Each record weighs approximately 1.2 KB.

[
  {
    "tender_title": "Development of Integrated Dashboard for National Data Analytics Platform",
    "tender_id": "2026_MEITY_776543_1",
    "tender_reference_number": "MeitY/NDP/2026/DASH-034",
    "organization": "Ministry of Electronics and Information Technology",
    "department": "National e-Governance Division",
    "published_date": "2026-05-21",
    "bid_submission_deadline": "18-Jun-2026 05:00 PM",
    "tender_opening_date": "19-Jun-2026 11:00 AM",
    "detail_href": "https://eprocure.gov.in/eprocure/app?page=FrontEndTendersByOrganisation&service=page"
  },
  {
    "tender_title": "Supply of Firewall and Network Security Appliances for CERT-In",
    "tender_id": "2026_CERTIN_554321_1",
    "tender_reference_number": "CERT-In/PROC/2026/SEC-017",
    "organization": "Indian Computer Emergency Response Team",
    "department": "Ministry of Electronics and Information Technology",
    "published_date": "2026-05-19",
    "bid_submission_deadline": "12-Jun-2026 03:00 PM",
    "tender_opening_date": "13-Jun-2026 10:00 AM",
    "detail_href": "https://eprocure.gov.in/eprocure/app?page=FrontEndTendersByOrganisation&service=page"
  },
  {
    "tender_title": "Cloud Hosting Services for Passport Seva Portal Migration",
    "tender_id": "2026_MEA_998877_1",
    "tender_reference_number": "MEA/CPV/2026/CLOUD-008",
    "organization": "Ministry of External Affairs",
    "department": "Consular Passport and Visa Division",
    "published_date": "2026-05-23",
    "bid_submission_deadline": "25-Jun-2026 02:00 PM",
    "tender_opening_date": "26-Jun-2026 11:00 AM",
    "detail_href": "https://eprocure.gov.in/eprocure/app?page=FrontEndTendersByOrganisation&service=page"
  }
]

tender_id is your deduplication key across runs. organization and department enable routing to the right sales vertical. bid_submission_deadline drives urgency scoring. detail_href links to the full RFP document for bid preparation.

Common pitfalls

Three failure modes are common when building tender pipeline databases. India's GeM (Government e-Marketplace) processed over INR 4.4 lakh crore in procurement in FY2024-25, but CPPP remains the primary portal for works and services contracts.

Schema drift from amended tenders: eProcure allows organizations to amend tenders after publication, changing deadlines, scope, or eligibility criteria. If you only insert new records and never update existing ones, your pipeline shows stale deadlines. Always upsert on tender_id and track last_seen_at to detect amendments.

Fiscal year-end volume spikes: India's fiscal year ends March 31. Departments rush to spend allocated budgets in Q4 (January-March), publishing 2-3x the normal tender volume. Your pipeline and alerting system need to handle this surge without drowning your sales team in noise. Increase filtering strictness during Q4.

Missing qualification context: the scraper provides metadata but not the full tender document. A tender title alone is insufficient for bid/no-bid decisions. Always enable fetchDetails and use the detail_href to download the complete RFP before committing resources to bid preparation.
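
On the amendment pitfall, last_seen_at only tells you a tender was seen again, not what changed. One way to surface the change is to compare incoming records against stored rows before the upsert overwrites them. This is a sketch under assumptions: the function name, the tracked field list, and the shape of the stored mapping (tender_id to row dict) are illustrative.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Fields whose changes matter to the sales team (assumed list)
TRACKED_FIELDS = ("bid_submission_deadline", "tender_opening_date")

Change = Tuple[str, str, Optional[str], Optional[str]]


def detect_amendments(
    stored: Dict[str, Dict[str, Any]],
    incoming: Iterable[Dict[str, Any]],
) -> List[Change]:
    """Return (tender_id, field, old_value, new_value) for each changed tracked field."""
    changes: List[Change] = []
    for record in incoming:
        previous = stored.get(record.get("tender_id"))
        if previous is None:
            continue  # first sighting: a new tender, not an amendment
        for field in TRACKED_FIELDS:
            old, new = previous.get(field), record.get(field)
            # Ignore fields missing from the fresh record to avoid false alarms
            if new is not None and old != new:
                changes.append((record["tender_id"], field, old, new))
    return changes
```

Run it between loading the current rows and executing the upsert, then route the resulting changes to whoever owns the affected bids.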

Build a scoring model on top of the structured fields: weight organization by your historical win rate, weight department by deal size, and penalize tenders with deadlines under 10 days. This turns raw extraction into qualified pipeline.
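
A minimal sketch of such a scoring model follows. Every number in it is a placeholder: the win rates, deal-size weights, default weight, penalty factor, and base threshold are hypothetical and should come from your own CRM history. Multiplying the two weights is one possible design; a weighted sum would work as well.

```python
from datetime import datetime, timedelta
from typing import Any, Dict

# Hypothetical weights; replace with figures from your own win/loss data
ORG_WIN_RATE = {"Ministry of Electronics and Information Technology": 0.30}
DEPT_DEAL_SIZE = {"National e-Governance Division": 0.8}
DEFAULT_WEIGHT = 0.1           # used for organizations/departments with no history
SHORT_DEADLINE = timedelta(days=10)
SHORT_DEADLINE_PENALTY = 0.5   # halve the score when little time is left


def score_tender(tender: Dict[str, Any], deadline: datetime, now: datetime) -> float:
    """Combine organization and department weights, penalizing short deadlines."""
    org_weight = ORG_WIN_RATE.get(tender.get("organization"), DEFAULT_WEIGHT)
    dept_weight = DEPT_DEAL_SIZE.get(tender.get("department"), DEFAULT_WEIGHT)
    score = org_weight * dept_weight
    if deadline - now < SHORT_DEADLINE:
        score *= SHORT_DEADLINE_PENALTY
    return score


def min_score(now: datetime, base: float = 0.05) -> float:
    """Raise the qualification threshold during January-March (fiscal Q4)."""
    return base * 2 if now.month in (1, 2, 3) else base
```

A tender qualifies when score_tender(...) is at least min_score(now), which tightens the filter during the fiscal year-end surge.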

Frequently asked questions

How do I avoid duplicate tender records in my database?

Use tender_id as your primary deduplication key. Each tender on eProcure has a unique identifier that persists even when the tender is amended or its deadline is extended. On each scrape run, upsert records by tender_id rather than inserting blindly. This lets you track field-level changes (deadline extensions, amended documents) without creating duplicate rows.

What volume of tenders should I expect from a daily scrape?

A broad keyword search like 'IT services' returns 200-500 active tenders on any given day. Narrowing with the organization filter reduces this to 10-50 per department. For a sales pipeline covering 5 keywords across 3 departments, expect 50-300 unique tenders per daily run after deduplication. Volume varies by season -- Q4 (January-March, India's fiscal year-end) sees 2-3x the normal volume as departments rush to spend allocated budgets.
