Concurrency Without Chaos: Managing Async Workloads in Node.js and Python
While working on large-scale data scraping and machine learning pipelines, I ran into the real-world challenges of concurrency in both Node.js and Python. In this post, I'll walk through how each runtime handles concurrency, how an unbounded Promise.all can backfire, and the tools I used to safely control concurrent execution in production workloads.
🧵 What Is Concurrency, and Why Does It Matter?
Concurrency is the ability to make progress on multiple tasks at the same time. It doesn't mean they all execute simultaneously (that's parallelism); it means we don't wait for one task to finish before starting the next. In high-throughput applications, like scraping hundreds of product pages or generating ML embeddings, concurrency is the only way to scale efficiently.
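To make the difference concrete, here's a toy sketch (the fake fetch_page and 1-second sleep are stand-ins for real network calls, not code from my pipeline). Run sequentially, three 1-second calls take about 3 seconds; run concurrently, the waits overlap and the whole thing finishes in about 1 second.

import asyncio
import time

async def fetch_page(url):
    await asyncio.sleep(1)  # stand-in for a 1-second network call
    return f"<html for {url}>"

async def main():
    urls = ["a.example", "b.example", "c.example"]

    start = time.perf_counter()
    for url in urls:  # sequential: each call waits for the previous one (~3s)
        await fetch_page(url)
    print(f"sequential: {time.perf_counter() - start:.1f}s")

    start = time.perf_counter()
    await asyncio.gather(*(fetch_page(u) for u in urls))  # concurrent (~1s)
    print(f"concurrent: {time.perf_counter() - start:.1f}s")

asyncio.run(main())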
⚙️ Concurrency in Python with asyncio
While scraping e-commerce product listings in Python, I used the asyncio module to run multiple scraping tasks concurrently. Here's a simplified version of what I did:
import asyncio

async def scrape_and_store_async(domain):
    scraper = ScraperFactory.get_scraper(domain["platform"], domain["url"])
    # scraper.scrape is a blocking call, so offload it to a worker thread
    scraped_data = await asyncio.to_thread(scraper.scrape)
    return {domain["url"]: scraped_data}

async def start_crawler(domains):
    tasks = [scrape_and_store_async(domain) for domain in domains]
    results = await asyncio.gather(*tasks)
    # Merge and save results
✅ Highlights:
- asyncio.to_thread() offloads the blocking scrape() call to a worker thread, so one slow page never stalls the event loop.
- asyncio.gather() awaits all tasks concurrently and returns their results in the same order as the input.
- Each task returns a {url: data} dict, which makes merging the results at the end trivial.
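One caveat before moving on: asyncio.gather() itself is unbounded, so pointing it at thousands of domains launches thousands of tasks at once. Here's a minimal sketch of how I'd cap that with asyncio.Semaphore; the limit of 5 is an illustrative assumption, and it reuses scrape_and_store_async from above:

import asyncio

async def start_crawler_bounded(domains, max_concurrency=5):
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_scrape(domain):
        async with semaphore:  # at most max_concurrency tasks run past this line
            return await scrape_and_store_async(domain)

    # return_exceptions=True returns failures in the results list
    # instead of raising the first one
    return await asyncio.gather(
        *(bounded_scrape(d) for d in domains),
        return_exceptions=True,
    )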
🚀 Concurrency in Node.js with Controlled Promise.all
While building a tool to generate ML embeddings for thousands of documents with BigQuery, I reached for Node.js and Promise.all. But raw Promise.all can be dangerous. Every promise starts running the moment it's created, so mapping thousands of documents straight into queries fires them all at once: that can blow past API rate limits and spike memory, and since Promise.all rejects on the first failure, one bad document can cost you the entire batch.
To fix this, I used the p-limit library to cap how many tasks run at once:
import pLimit from "p-limit";
import { BigQuery } from "@google-cloud/bigquery";

const bigQueryClient = new BigQuery();
const limit = pLimit(3); // at most 3 tasks in flight at once

const tasks = formattedDocs.map((doc) =>
  limit(async () => {
    try {
      const [rows] = await bigQueryClient.query({
        query: generateEmbeddingQuery(doc.content),
        location,
      });
      return {
        status: "fulfilled",
        value: {
          id: doc.id,
          embeddings: rows[0].ml_generate_embedding_result,
        },
      };
    } catch (error) {
      // Catch per task so one failed query can't reject the whole batch
      return {
        status: "rejected",
        reason: { id: doc.id, error },
      };
    }
  })
);

const results = await Promise.all(tasks);
✅ Benefits:
- At most 3 queries hit BigQuery at a time, so we stay within quota and keep memory flat.
- Every task resolves with a {status, value/reason} object (the same shape Promise.allSettled produces), so one failed document never rejects the whole batch.
- Failures carry the doc.id, which makes retries and debugging straightforward.
⚔️ Node.js vs Python: Concurrency Comparison
| Feature | Node.js | Python (asyncio) |
| --- | --- | --- |
| Core concurrency model | Event loop + Promises | Event loop + coroutines |
| Thread offloading | Requires worker_threads or libraries | asyncio.to_thread() |
| Parallel I/O | Yes | Yes |
| Built-in throttling | No (Promise.all is unbounded) | No (gather() is unbounded) |
| Control libraries | p-limit, PromisePool | aiolimiter, custom semaphores |
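For rate-based throttling on the Python side, aiolimiter's AsyncLimiter works much like p-limit but bounds requests per time window rather than tasks in flight. A minimal sketch, reusing the toy fetch_page from earlier (the 10-per-second rate is an illustrative assumption, not a recommendation):

import asyncio
from aiolimiter import AsyncLimiter

limiter = AsyncLimiter(10, 1)  # at most 10 acquisitions per 1-second window

async def rate_limited_fetch(url):
    async with limiter:  # waits here whenever the current window is full
        return await fetch_page(url)

async def crawl(urls):
    return await asyncio.gather(*(rate_limited_fetch(u) for u in urls))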
💡 Key Takeaways
- Concurrency is not parallelism: in both runtimes, a single event loop overlaps many I/O waits.
- Neither Promise.all nor asyncio.gather() bounds concurrency; unbounded fan-out is how they backfire under load.
- Throttle explicitly: p-limit in Node.js, semaphores or aiolimiter in Python.
- Handle errors per task, so a single bad document or page never sinks the whole batch.
🔚 Final Thoughts
If you're working on web scraping, ML pipelines, or any other I/O-heavy workload, mastering concurrency is a must. Both Node.js and Python offer robust tools; the key is understanding when, and how much, concurrency is safe.
Thanks for reading! Would love to hear how you've handled concurrency in your own projects 🚀