Concurrency Without Chaos: Managing Async Workloads in Node.js and Python

When working on large-scale data scraping and machine learning pipelines, I encountered the real-world challenges of concurrency in both Node.js and Python. In this post, I’ll walk you through how each handles concurrency, how Promise.all can backfire, and the tools I used to safely control concurrent execution in production workloads.


🧵 What Is Concurrency, and Why Does It Matter?

Concurrency refers to the ability to manage multiple tasks at the same time. It doesn’t mean they all run in parallel (that’s parallelism); it means we don’t wait for one task to finish before starting another. In high-throughput applications, like scraping hundreds of product pages or generating ML embeddings, most of the time is spent waiting on the network, so overlapping those waits is what lets the workload scale.
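A minimal asyncio sketch of that idea, where asyncio.sleep() stands in for a network call (fetch() and the task names are illustrative, not taken from the original pipeline). All three tasks start before any of them finishes, because each one hands control back to the event loop while it waits:

```python
import asyncio

events: list[str] = []

async def fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for a network request: record start, wait, record end.
    events.append(f"start {name}")
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    events.append(f"end {name}")
    return name

async def main() -> list[str]:
    # gather() runs all three coroutines concurrently on a single thread.
    return await asyncio.gather(fetch("a", 0.03), fetch("b", 0.02), fetch("c", 0.01))

results = asyncio.run(main())
print(results)  # ['a', 'b', 'c'] (gather keeps input order)
print(events)   # every "start" appears before the first "end"
```

The whole run takes roughly as long as the slowest task rather than the sum of all three.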


⚙️ Concurrency in Python with asyncio

While scraping e-commerce product listings using Python, I used the asyncio module to run multiple scraping tasks concurrently. Here's a simplified version of how I did it:

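import asyncio

# ScraperFactory is the project's own factory that returns a platform-specific scraper.
async def scrape_and_store_async(domain):
    scraper = ScraperFactory.get_scraper(domain["platform"], domain["url"])
    # scraper.scrape() is synchronous, so run it in a worker thread
    scraped_data = await asyncio.to_thread(scraper.scrape)
    return {domain["url"]: scraped_data}

async def start_crawler(domains):
    tasks = [scrape_and_store_async(domain) for domain in domains]
    results = await asyncio.gather(*tasks)
    # Merge the per-domain dicts into one {url: data} mapping, then save it
    merged = {}
    for result in results:
        merged.update(result)
    return merged  # storage code omitted in this simplified version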

✅ Highlights:

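  • I used asyncio.to_thread() to run the scraper's blocking, synchronous code in a worker thread so it wouldn't block the event loop. This suits blocking I/O such as synchronous HTTP requests; genuinely CPU-bound Python code gains little from threads because of the GIL, and a process pool fits it better.
  • asyncio.gather() ran all tasks concurrently and returned their results in input order.
  • This pattern made scraping fast and scalable, but gather() places no limit on how many tasks start at once. asyncio.to_thread() runs on the event loop's default thread pool, which caps worker threads (min(32, os.cpu_count() + 4) by default), so the real risks are a huge backlog of queued work and flooding the target sites with requests.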
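One way to bound this, sketched here as an assumption rather than what the original crawler did, is to guard each task with an asyncio.Semaphore so only a fixed number run at a time. blocking_scrape() and the example.com URLs are hypothetical stand-ins for the real scraper; MAX_CONCURRENT = 3 simply mirrors the p-limit setting used in the Node.js example.

```python
import asyncio
import time

MAX_CONCURRENT = 3  # assumed cap; tune to what the target sites and your machine tolerate

in_flight = 0
peak = 0

def blocking_scrape(url: str) -> str:
    # Hypothetical stand-in for a synchronous scraper.scrape() call.
    time.sleep(0.01)
    return f"<html for {url}>"

async def scrape_bounded(url: str, sem: asyncio.Semaphore) -> dict[str, str]:
    global in_flight, peak
    async with sem:  # at most MAX_CONCURRENT coroutines get past this point at once
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            data = await asyncio.to_thread(blocking_scrape, url)
        finally:
            in_flight -= 1
    return {url: data}

async def start_crawler_bounded(urls: list[str]) -> dict[str, str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)  # created inside the running event loop
    results = await asyncio.gather(*(scrape_bounded(u, sem) for u in urls))
    merged: dict[str, str] = {}
    for result in results:
        merged.update(result)
    return merged

crawled = asyncio.run(start_crawler_bounded([f"https://example.com/p/{i}" for i in range(10)]))
print(len(crawled), peak)  # peak never exceeds MAX_CONCURRENT
```

The counters are only touched on the event loop thread, so no lock is needed around them.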


🚀 Concurrency in Node.js with Controlled Promise.all

While building a tool to generate ML embeddings for thousands of documents using BigQuery, I used Node.js with Promise.all. But raw Promise.all can be dangerous:

  • It doesn't throttle anything: by the time Promise.all receives the promises, every task has already started.
  • If you're generating 1,000 embeddings, that means 1,000 queries in flight at once!
  • That leads to rate limits, memory issues, and sometimes crashes.
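Python's asyncio.gather() has the same trap. In this minimal sketch, fake_query() is a hypothetical stand-in for one embedding query. Handing 1,000 coroutines to gather() with no limit puts all 1,000 in flight at the same moment:

```python
import asyncio

in_flight = 0
peak = 0

async def fake_query(doc_id: int) -> int:
    # Hypothetical stand-in for one embedding query.
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulated network latency
    in_flight -= 1
    return doc_id

async def main(n: int) -> list[int]:
    # No limit: gather() wraps every coroutine in a task and starts them all.
    return await asyncio.gather(*(fake_query(i) for i in range(n)))

results = asyncio.run(main(1_000))
print(peak)  # 1000
```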

To fix this, I used the p-limit library to limit the concurrency:

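// Top-level await below assumes this file runs as an ES module (p-limit is ESM-only).
import pLimit from "p-limit";
import { BigQuery } from "@google-cloud/bigquery";

const bigQueryClient = new BigQuery();
// formattedDocs, location, and generateEmbeddingQuery() come from the rest of the tool.

const limit = pLimit(3); // Max 3 tasks at once

// limit() starts each task only when one of the 3 slots is free.
const tasks = formattedDocs.map((doc) =>
  limit(async () => {
    try {
      const [rows] = await bigQueryClient.query({
        query: generateEmbeddingQuery(doc.content),
        location,
      });
      // Settle manually, mirroring Promise.allSettled's { status, value } shape
      return {
        status: "fulfilled",
        value: {
          id: doc.id,
          embeddings: rows[0].ml_generate_embedding_result,
        },
      };
    } catch (error) {
      // Catching here keeps one failed query from rejecting the whole Promise.all
      return {
        status: "rejected",
        reason: { id: doc.id, error },
      };
    }
  })
);

const results = await Promise.all(tasks);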

✅ Benefits:

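  • Prevents system overload by running only 3 queries at a time.
  • Mirrors Promise.allSettled: each task catches its own error and returns a { status, value } or { status, reason } object, so one failed query doesn't reject the whole Promise.all.
  • Failures carry the document id, so they can be logged or retried individually instead of aborting the batch.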
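For comparison, Python's asyncio.gather() has a similar settle-everything mode through return_exceptions=True: exceptions come back as values in the results list instead of propagating out of gather(). A minimal sketch, where embed() is a hypothetical embedding call that fails for one document:

```python
import asyncio

async def embed(doc_id: str) -> list[float]:
    # Hypothetical embedding call; fails for one document to simulate an API error.
    await asyncio.sleep(0)
    if doc_id == "doc-2":
        raise RuntimeError("quota exceeded")
    return [0.1, 0.2]

async def embed_all(doc_ids: list[str]):
    results = await asyncio.gather(*(embed(d) for d in doc_ids), return_exceptions=True)
    fulfilled, rejected = {}, {}
    # gather() preserves input order, so results line up with doc_ids.
    for doc_id, result in zip(doc_ids, results):
        if isinstance(result, BaseException):
            rejected[doc_id] = result
        else:
            fulfilled[doc_id] = result
    return fulfilled, rejected

ok, failed = asyncio.run(embed_all(["doc-1", "doc-2", "doc-3"]))
print(sorted(ok))    # ['doc-1', 'doc-3']
print(list(failed))  # ['doc-2']
```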


⚔️ Node.js vs Python: Concurrency Comparison

| Feature | Node.js | Python (asyncio) |
|---|---|---|
| Core Concurrency Model | Event loop + Promises | Event loop + Coroutines |
| Thread Offloading | Requires worker_threads or libs | asyncio.to_thread() |
| Parallel I/O | Yes | Yes |
| Built-in Throttling | No (Promise.all is unbounded) | No (gather() is unbounded) |
| Control Libraries | p-limit, PromisePool | aiolimiter, custom semaphores |


💡 Key Takeaways

  • Concurrency is powerful but can easily overwhelm your resources.
  • Use bounded concurrency: never blindly run Promise.all or asyncio.gather on inputs of unknown or unbounded size.
  • Use tools like p-limit or PromisePool in Node.js, and asyncio.Semaphore or aiolimiter in Python.
  • Always log, handle failures gracefully, and plan for rate limits.
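For the rate-limit point, one common pattern is retrying with exponential backoff and jitter. This is a minimal sketch rather than code from the original projects: RateLimitError and flaky_call() are hypothetical, and real clients usually signal rate limiting with an HTTP 429 status or a library-specific exception.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Hypothetical error raised when an API rejects a request for being over quota."""

async def with_backoff(make_call, retries: int = 5, base_delay: float = 0.1):
    # make_call is a zero-argument async function; calling it creates a fresh coroutine per attempt.
    for attempt in range(retries):
        try:
            return await make_call()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error so it can be logged
            # Exponential backoff with jitter: roughly base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)

attempts = 0

async def flaky_call() -> str:
    # Hypothetical call that is rate-limited twice before succeeding.
    global attempts
    attempts += 1
    if attempts < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

result = asyncio.run(with_backoff(flaky_call, base_delay=0.01))
print(result, attempts)  # ok 3
```

Backoff pairs naturally with a concurrency cap: the cap limits how many requests are in flight, while backoff slows down retries once the API starts pushing back.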


🔚 Final Thoughts

If you're working on web scraping, ML pipelines, or any I/O-heavy workload—mastering concurrency is a must. Both Node.js and Python offer robust tools, but the key is understanding when and how much concurrency is safe.

Thanks for reading! Would love to hear how you've handled concurrency in your own projects 🚀

More articles by Lakshya Tiwari
