Making sense of multithreading and concurrency with Java

Disclaimer: This is not an expert article. I'm just sharing my experience with a concept I personally found interesting and genuinely enjoyed. If you notice anything inaccurate or incorrect, please let me know.

Introduction

I was first introduced to multithreading back in college, in operating systems and Java programming courses. After grasping the theory behind it, I felt like I knew the concept, until I ran into a real engineering requirement in the software industry myself.

After reading a few resources on the internet, I was still not convinced and my understanding remained shallow. So, to feed my curiosity, I decided to experiment with it using a small prototype.

A thread is the smallest unit of execution within a process; the CPU schedules threads to handle the tasks submitted to it. Consider the example of counting the prime numbers in a given range. When you execute this function (a notably computational task), your system allots the required resources to it, which include memory (RAM), I/O, and processing power. Processing power is allotted in terms of threads, which handle the submitted work. A thread performs the computation, returns the result if any, and frees up for further computations as programmed.

You have the power to spin up as many threads as you want (in theory), but practical limitations show up based on the number of CPU cores available. We will talk about this in a bit.
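A quick way to see that ceiling on your own machine is to ask the JVM how many logical cores it can use. This is a minimal standalone sketch, nothing framework-specific:

```java
// Query the number of logical cores visible to the JVM. This is the
// practical ceiling for truly parallel, CPU-bound threads.
public class CoreCount {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical cores available: " + cores);
    }
}
```

Note that this returns logical processors (hyper-threads count), which may be higher than the physical core count.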

Implementation

We'll create a system that returns the number of primes in a given range as the response. Here we have a simple REST API in Spring Boot. Let's wrap the response in a CompletableFuture and attach a callback, so the HTTP threads aren't blocked while the computation runs and the endpoint can be served asynchronously.

@GetMapping("/countPrimes")
    public CompletableFuture<ResponseEntity<Integer>> getCountofPrimes(@RequestParam Integer a, @RequestParam Integer b) {
        return service.computePrimesInRange(a, b)
                .thenApply(ResponseEntity::ok);
    }        

/countPrimes takes in two request parameters, a and b, which define the range within which the application should count prime numbers. Then we have the service layer. We'll use a customExecutor to manage the computation thread pool and supply tasks to the available threads:

    private final Executor customExecutor;

    PrototypeService(@Qualifier("customExecutor") Executor executor){
        this.customExecutor = executor;
    }

    public CompletableFuture<Integer> computePrimesInRange(int start, int end){
        return CompletableFuture.supplyAsync(()-> countPrimesInRange(start, end), customExecutor);
    }

    public Integer countPrimesInRange(int start, int end) {
        int count = 0;
        for(int curr = start; curr <= end; curr++){
            if(isPrime(curr)){
                count++;
            }
        }
        return count;
    }        
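The isPrime helper isn't shown above; a plain trial-division version (my own sketch, not necessarily the exact one used in the prototype) would look like this, together with the same counting loop:

```java
public class PrimeCheck {

    // Trial division up to sqrt(n): O(sqrt(n)) per call, which keeps
    // countPrimesInRange genuinely CPU-bound for large ranges.
    public static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;       // 2 is the only even prime
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Same loop as the service method above.
    public static int countPrimesInRange(int start, int end) {
        int count = 0;
        for (int curr = start; curr <= end; curr++) {
            if (isPrime(curr)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Matches the screenshot below: 78498 primes between 1 and 1M.
        System.out.println(countPrimesInRange(1, 1_000_000));
    }
}
```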

A simple endpoint is exposed that takes in two values and returns the number of primes in the range. Nothing fancy. Let's run it locally and test out the endpoint.

[Screenshot] Count of primes between 1 and 1M = 78498

Let's see the response time,

[Screenshot] Processing time for 4 synchronous hits

That's an average response time of 65 ms for a single request to our app. In other words, if the host machine is idle, not handling any other traffic at that moment, and receives one request, it returns the response in about 65 ms.

Now, let's assume the app becomes highly popular and we start seeing traffic of around 100 TPS. What happens then? Do all the requests receive a response within 65 ms? To find out, let's simulate this traffic on our local system. I'm going to use Apache JMeter (a free and open-source performance testing tool) for this.

We are going to start off with 100 hits/s handled by 1 thread and gradually increase the number of threads to see the difference in response time. This experiment was run on a system with 8 cores, i.e., true parallel processing can only be achieved for up to 8 simultaneously running tasks.

Our custom executor looks like:

@Bean(name = "customExecutor")
    public Executor taskExecutor(){
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(1); // threads kept alive in the pool
        executor.setMaxPoolSize(1); // upper bound on pool size
        executor.setQueueCapacity(90); // max tasks that can wait in the queue
        executor.setThreadNamePrefix("AsyncThread-");
        executor.initialize();
        return executor;
    }
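Under the hood, Spring's ThreadPoolTaskExecutor wraps a plain java.util.concurrent.ThreadPoolExecutor. An equivalent sketch with the same sizing (the class and method names here are mine) also shows what happens once the pool and queue are both full: further submissions are rejected with a RejectedExecutionException.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ExecutorSketch {

    // Submit `tasks` slow jobs to a 1-thread pool with a 90-slot queue
    // and count how many are rejected because pool + queue are full.
    public static int countRejections(int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1,                          // core and max pool size, as in the bean
                60, TimeUnit.SECONDS,          // idle keep-alive (moot when core == max)
                new ArrayBlockingQueue<>(90)); // bounded queue, capacity 90

        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                executor.execute(() -> {
                    // Simulate a task that far outlasts the submission loop.
                    try { Thread.sleep(50); } catch (InterruptedException ignored) { }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // 1 running + 90 queued tasks already accepted
            }
        }
        executor.shutdownNow();
        return rejected;
    }

    public static void main(String[] args) {
        // 100 near-simultaneous submissions: 91 accepted, the rest rejected.
        System.out.println("Rejected: " + countRejections(100));
    }
}
```

This is also what the queueCapacity of 90 means for the experiment below: at 100 hits/s, roughly 91 requests fit (1 running plus 90 queued) and the remainder would fail fast rather than wait.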

Results

The execution results below show the average, min, and max response times for a sample of 400 requests, i.e., four runs at 100 hits/s each. The number of threads in our pool is configured in the custom executor through executor.setCorePoolSize(n) and executor.setMaxPoolSize(n), where n = total threads in the pool.

[Screenshots] JMeter results for pools of 1 through 8 thread(s)
[Chart] Response time vs number of threads


Key Observations

  • Performance improvement: Average response time decreases significantly as threads increase from 1 to 7 (2454 ms → 311 ms), indicating better CPU utilization and concurrency handling.
  • Gradual decrease in max response time: The max response time is highest with 1 thread because the last queued request executes only after all the other 99 requests have been processed; with a single thread as the only resource, there is no room for parallel processing. As we increase the number of threads, the max response time decreases, indicating parallel execution.
  • Saturation point: At 8 threads, the average response time increases slightly (340 ms), implying diminishing or stagnant returns. Pushing the thread count even higher would, in practice, raise response times further. This is the limitation I mentioned earlier, and there are several reasons for it: CPU contention (more threads than available cores), context switching (the overhead of switching between threads and preserving their state), resource contention, and more.
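These observations can be reproduced without JMeter by timing a batch of CPU-bound tasks against pools of different sizes. This is a rough, self-contained sketch of that idea (the class and numbers are mine; absolute timings will vary by machine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSizeBenchmark {

    // CPU-bound unit of work: count primes up to `limit` by trial division.
    public static int countPrimes(int limit) {
        int count = 0;
        for (int n = 2; n <= limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) { prime = false; break; }
            }
            if (prime) count++;
        }
        return count;
    }

    // Run `tasks` identical jobs on a fixed pool of `threads` threads
    // and return the total wall-clock time in milliseconds.
    public static long timeWithPool(int threads, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            futures.add(pool.submit(() -> countPrimes(200_000)));
        }
        for (Future<Integer> f : futures) {
            f.get(); // block until every task has finished
        }
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Expect the wall-clock time to shrink as threads approach the core
        // count, and to flatten (or worsen) once threads exceed it.
        System.out.printf("1 thread : %d ms%n", timeWithPool(1, 16));
        System.out.printf("%d threads: %d ms%n", cores, timeWithPool(cores, 16));
    }
}
```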

Hope you enjoyed the read and took away some learning.
