
How to Build an In-House Web Scraping Proxy Checker in Python

Industry Insights | Luke Cage | 2 min read
Table of Contents
  1. Why check proxy lists before running scrapers?
  2. Building the Multithreaded Proxy Checker

Why check proxy lists before running scrapers?

When you load bulk proxy lists purchased from various marketplaces, some of the IPs are likely to be offline, slow, or blacklisted. If you feed these raw lists directly into your production scrapers, every dead proxy costs a full connection timeout before the scraper can move on. To keep your scrapers efficient, run a validation script before each crawl that filters out dead nodes.

In this guide, we walk through how to build an in-house proxy checker in Python. We will write a multithreaded script that tests proxy latency, checks anonymity headers, and outputs a clean, working list. Sourcing proxies from premium providers like 5-proxy.com and running tests on VPS nodes at vpsrated.com/proxy can help keep network routes fast.

Building the Multithreaded Proxy Checker

Step 1: Set up the Checker dependencies

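We will use the third-party requests library (installed with pip install requests) to send the test requests and Python's built-in concurrent.futures module to run the checks in parallel, which is much faster than checking proxies sequentially. For example, with a 3-second timeout, 100 dead proxies can take about 300 seconds to check one at a time, but roughly 30 seconds with 10 worker threads.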

import requests
from concurrent.futures import ThreadPoolExecutor

# Sample proxy list
proxies_list = [
    "http://user:pass@proxy1.com:8000",
    "http://user:pass@proxy2.com:8000"
]

Step 2: Implement the Validation Logic

We will send a lightweight request to a public IP verification endpoint. If the request returns a 200 status within our timeout window, the proxy is active, and we measure the latency.

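def check_proxy(proxy):
    """Return (proxy, latency_in_seconds) if the proxy responds, else None."""
    # Route both plain HTTP and HTTPS traffic through the same proxy
    proxies = {
        "http": proxy,
        "https": proxy
    }
    try:
        response = requests.get("https://api.ipify.org?format=json", proxies=proxies, timeout=3)
        if response.status_code == 200:
            # elapsed covers the time from sending the request until the response headers are parsed
            latency = response.elapsed.total_seconds()
            print(f"Proxy active: {proxy} (Latency: {latency:.2f}s)")
            return proxy, latency
    except requests.exceptions.RequestException:
        # Timeouts, refused connections, and proxy errors all mark the proxy as dead
        pass
    return None

# Check all proxies in parallel; list() collects the results from the lazy iterator returned by map()
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(check_proxy, proxies_list))

# Keep only the proxies that passed, fastest first
working = sorted((r for r in results if r is not None), key=lambda r: r[1])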
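
The introduction promises a clean, working list as the final output. A minimal sketch of that last step might look like the following. The function name save_working_proxies, the output file name, and the max_latency cutoff are all illustrative assumptions, not part of the original script. The function accepts the (proxy, latency) tuples and None values that check_proxy returns.

```python
def save_working_proxies(results, path="working_proxies.txt", max_latency=2.0):
    """Write responsive proxies to a text file, fastest first.

    `results` is any iterable of (proxy, latency) tuples or None,
    as returned by check_proxy. The path and the latency cutoff are
    hypothetical defaults chosen for illustration.
    """
    # Drop failed checks (None) and proxies slower than the cutoff
    working = [r for r in results if r is not None and r[1] <= max_latency]
    # Sort by measured latency so the fastest proxies come first
    working.sort(key=lambda item: item[1])
    with open(path, "w", encoding="utf-8") as handle:
        for proxy, _latency in working:
            handle.write(f"{proxy}\n")
    return working
```

Calling save_working_proxies(results) after the thread pool finishes would leave one proxy URL per line in the output file, ready to be loaded by a scraper.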

Step 3: Anonymity Header Verification

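Beyond latency, a premium proxy must not leak your real IP address. Anonymity checks look for headers such as X-Forwarded-For or Via, which some proxies add to the request before forwarding it to the target server. If these headers are present, the target server can see that you are using a proxy, and X-Forwarded-For may also expose your original IP. This check only works against a plain HTTP endpoint, because over HTTPS a standard forward proxy tunnels the encrypted traffic and cannot modify its headers. To check latency and anonymity values out-of-the-box, you can also run your lists through our free web-based Bulk Proxy Checker.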
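
The header check described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation. ECHO_URL is a hypothetical placeholder for a plain-HTTP endpoint that returns the request headers it received as a flat JSON object. The function names and the "transparent", "anonymous", and "elite" labels follow common proxy-industry usage but are also assumptions. real_ip is your own public address, which must be a non-empty string obtained without a proxy.

```python
# Headers that commonly reveal a proxy hop or the original client address
REVEALING_HEADERS = ("x-forwarded-for", "via", "forwarded", "x-real-ip")

# Hypothetical placeholder: replace with a plain-HTTP endpoint you control
# that echoes the received request headers back as a flat JSON object
ECHO_URL = "http://echo.example.com/headers"


def classify_anonymity(echoed_headers, real_ip):
    """Classify a proxy from the headers the target server received."""
    # Header names are case-insensitive, so normalise them first
    lowered = {name.lower(): str(value) for name, value in echoed_headers.items()}
    found = {name: lowered[name] for name in REVEALING_HEADERS if name in lowered}
    if any(real_ip in value for value in found.values()):
        return "transparent"  # the real IP reached the target server
    if found:
        return "anonymous"    # proxy use is visible, but the real IP is not
    return "elite"            # no revealing headers were seen


def check_anonymity(proxy, real_ip, timeout=3):
    """Fetch the echo endpoint through the proxy; return a label or None."""
    # Imported here so classify_anonymity can be used without requests installed
    import requests

    proxies = {"http": proxy, "https": proxy}
    try:
        response = requests.get(ECHO_URL, proxies=proxies, timeout=timeout)
        response.raise_for_status()
        return classify_anonymity(response.json(), real_ip)
    except (requests.exceptions.RequestException, ValueError):
        # Network failure, HTTP error status, or a non-JSON response body
        return None
```

Keeping the classification separate from the network call makes the rule easy to test with hand-written header dictionaries, and lets you swap in a different echo endpoint without touching the logic.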

Author / Editor
Luke Cage

Expert researcher and writer focusing on secure web scraping architectures, dynamic proxy networks, and consumer data privacy controls.
