A Java program executes as one or more sequences of instructions called threads. By default, a program
runs in a single “main” thread, which uses only one CPU core at a time. To take advantage of
modern multi-core processors, we can create additional Thread objects
to perform work in parallel.
However, a standard Thread executes a Runnable, whose run() method has a void return
type. If we need a parallel task to compute and return a value, we can use a
FutureTask together with a Callable.
1. The Core Components
To get a result from a parallel thread, you need to coordinate three different objects. Think of these as the three layers of a parallel task:
- The Task Logic (Callable<V>): This is an interface type for defining the code you want to run. Unlike a standard Runnable, the Callable’s call() method has a return value of type V. In practice, you define this using a lambda expression:
  - A block of code: () -> { /* your logic */ return result; }
  - A method call: () -> computeSum(myArray)
- The Result Container (FutureTask<V>): Since a thread takes time to finish, you need a “placeholder” to hold the result while the work is happening. The FutureTask wraps your logic and provides the .get() method to retrieve the result once it’s ready.
- The Execution Engine (Thread): The FutureTask itself doesn’t start anything; it only describes the work and holds its eventual result. You must hand that task to a Thread object and call .start() to actually begin execution on a separate thread, which the operating system can schedule on another CPU core.
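The three layers above can be seen working together in a small sketch. The class name FutureTaskLifecycle and the helper squareInThread are hypothetical names chosen for illustration. The isDone() check is there only to show that wrapping the logic in a FutureTask does not run it; nothing executes until a Thread is started.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class FutureTaskLifecycle {
    // Hypothetical helper: squares n on a separate thread and returns the result.
    static int squareInThread(int n) {
        // Layer 1: the task logic, a Callable that returns a value.
        Callable<Integer> logic = () -> n * n;

        // Layer 2: the result container wrapping that logic.
        FutureTask<Integer> future = new FutureTask<>(logic);

        // Wrapping alone does not execute anything.
        if (future.isDone()) {
            throw new IllegalStateException("task ran before any thread was started");
        }

        // Layer 3: the execution engine.
        new Thread(future).start();

        try {
            return future.get(); // blocks until the task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(squareInThread(12)); // prints 144
    }
}
```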
Template
The following boilerplate demonstrates how to define a single task, execute it
in a new thread, and retrieve the result. The main items to define yourself
are the ResultType, the task logic, and how you choose to divide work and
assign it to different threads (repeating parts of this pattern for each).

```java
// 1. Define the task logic
Callable<ResultType> taskLogic = () -> {
    // compute something...
    return /* result */;
};

// 2. Wrap the logic in a FutureTask
FutureTask<ResultType> future = new FutureTask<>(taskLogic);

// 3. Start the thread
new Thread(future).start();

// 4. Perform other work in the main thread, including possibly launching other threads...

// 5. Retrieve the result (.get() blocks (waits) until the thread completes)
ResultType result = future.get();
```
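Because .get() declares two checked exceptions, the last step of the template usually needs some error handling. The sketch below is one possible way to handle them, not the only one. The class name GetWithErrorHandling and the helper divideInThread are hypothetical. It shows that an exception thrown inside the task arrives wrapped in an ExecutionException, with the original exception available through getCause().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class GetWithErrorHandling {
    // Hypothetical helper: divides a by b on another thread and describes the outcome.
    static String divideInThread(int a, int b) {
        Callable<Integer> taskLogic = () -> a / b; // throws ArithmeticException if b == 0
        FutureTask<Integer> future = new FutureTask<>(taskLogic);
        new Thread(future).start();

        try {
            return "result: " + future.get();
        } catch (ExecutionException e) {
            // The task's own exception is carried as the cause.
            return "task failed: " + e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            // The waiting thread was interrupted; restore the flag for callers.
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(divideInThread(10, 2)); // prints "result: 5"
        System.out.println(divideInThread(1, 0));  // prints "task failed: ArithmeticException"
    }
}
```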
2. Parallelizing with Array Slicing
A common strategy for parallelizing a large computation is to divide the data into n roughly equal-sized segments (slices) and assign each slice to a separate thread. The last slice may be shorter when the data length is not a multiple of n.
The Algorithm
- Determine the number of threads to use (e.g., based on available CPU cores).
- Calculate the size of each slice.
- Create and start a FutureTask for each slice.
- Iterate through the tasks and call .get() to collect and aggregate the partial results.
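The slice-size step is easiest to check with concrete numbers. The following sketch computes half-open [lo, hi) bounds by rounding the slice size up. The class name SliceBounds and the method slices are hypothetical, and clamping lo to the array length is an extra safeguard assumed here for the case where there are more threads than elements.

```java
class SliceBounds {
    // Hypothetical helper: returns one {lo, hi} pair per thread, with hi exclusive.
    static int[][] slices(int length, int numThreads) {
        // Round up so that numThreads slices always cover the whole range.
        int sliceSize = (int) Math.ceil((double) length / numThreads);
        int[][] bounds = new int[numThreads][2];
        for (int i = 0; i < numThreads; i++) {
            int lo = Math.min(i * sliceSize, length);  // clamp: extra threads get empty slices
            int hi = Math.min(lo + sliceSize, length); // the last slice may be shorter
            bounds[i][0] = lo;
            bounds[i][1] = hi;
        }
        return bounds;
    }

    public static void main(String[] args) {
        // 10 elements over 4 threads: slice size is ceil(10 / 4) = 3.
        for (int[] b : slices(10, 4)) {
            System.out.println("[" + b[0] + ", " + b[1] + ")");
        }
        // prints [0, 3), [3, 6), [6, 9), [9, 10) on separate lines
    }
}
```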
3. Complete Example: Parallel Array Sum
This example sums a large array by splitting it into slices based on the number of available CPU cores.
3.1 The Sequential Version (for reference)

```java
public static long sequentialSum(long[] arr) {
    long sum = 0;
    for (long v : arr) {
        sum += v;
    }
    return sum;
}
```
3.2 The Parallel Version

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelSum {
    // We declare 'throws Exception' because future.get() can throw InterruptedException
    // (if the waiting thread is interrupted) and ExecutionException (if the task logic itself fails).
    public static long sum(long[] arr, int numThreads) throws Exception {
        int length = arr.length;
        // Calculate slice size, rounding up to ensure we cover the entire array
        int sliceSize = (int) Math.ceil((double) length / numThreads);
        List<FutureTask<Long>> tasks = new ArrayList<>();

        // FIRST LOOP: Spawn and start all threads
        for (int i = 0; i < numThreads; i++) {
            int lo = i * sliceSize;
            int hi = Math.min(lo + sliceSize, length);

            // Define the task for this slice using a lambda (Callable<Long>)
            Callable<Long> sliceTask = () -> {
                long partialSum = 0;
                for (int j = lo; j < hi; j++) {
                    partialSum += arr[j];
                }
                return partialSum;
            };

            // Wrap logic in FutureTask and add to our list to track it
            FutureTask<Long> future = new FutureTask<>(sliceTask);
            tasks.add(future);

            // Hand the task to a Thread and start it immediately
            new Thread(future).start();
        }

        // SECOND LOOP: Aggregate results
        long totalSum = 0;
        for (FutureTask<Long> task : tasks) {
            // .get() blocks until this specific thread finishes its work
            totalSum += task.get();
        }
        return totalSum;
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[10_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;

        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Summing with " + cores + " threads.");
        long result = sum(data, cores);
        System.out.println("Result: " + result);
    }
}
```
Question: In sum(), what would happen if we called future.get() inside the first loop immediately after starting each thread, instead of using a second, separate loop to aggregate the results?
4. The Rules: What to Do and What to Avoid
Creating a new Thread for every task is feasible for a small, fixed number
of threads, but it does not scale well to thousands of tasks.
✅ DO
Divide work into coarse-grained chunks. Calculate the number of available CPU cores and divide your data into exactly that many slices to maximize throughput.
Start all threads before waiting on any of them. Always use two separate loops when managing multiple threads manually: one to spawn and .start() all of them, and a second loop to .get() their results.
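The effect of the two-loop rule can be observed by counting how many tasks are running at the same moment. The sketch below is an illustration under assumptions: the class StartThenGet and the method maxConcurrency are hypothetical, and Thread.sleep(50) stands in for real computation. When .get() is called inside the spawning loop, each thread finishes before the next one starts, so the peak is always 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

class StartThenGet {
    // Hypothetical helper: runs n short tasks and returns the largest number
    // of tasks that were observed running at the same time.
    static int maxConcurrency(int n, boolean getInsideFirstLoop) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Callable<Integer> work = () -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest count seen
            Thread.sleep(50);                      // stand-in for real computation
            running.decrementAndGet();
            return now;
        };

        List<FutureTask<Integer>> tasks = new ArrayList<>();
        try {
            // First loop: spawn and start.
            for (int i = 0; i < n; i++) {
                FutureTask<Integer> future = new FutureTask<>(work);
                tasks.add(future);
                new Thread(future).start();
                if (getInsideFirstLoop) {
                    future.get(); // anti-pattern: waits before the next thread starts
                }
            }
            // Second loop: wait for every task (returns at once if already done).
            for (FutureTask<Integer> task : tasks) {
                task.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrency(4, true));  // prints 1: the work is serialized
        System.out.println(maxConcurrency(4, false)); // typically 4, depending on scheduling
    }
}
```

With the separate second loop, the observed peak depends on the scheduler and is not guaranteed, which is why only the serialized result is stated as exact.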
❌ DON’T
Don’t create a new Thread for every tiny task. Threads have significant overhead. Each thread reserves its own stack (often about 1 MB by default, depending on the JVM and platform), and spawning thousands can exhaust memory or hit OS thread limits, which typically surfaces as an OutOfMemoryError.
Don’t greatly exceed the number of available cores. For CPU-bound work such as the array sum, if the number of threads greatly exceeds the number of CPU cores, the CPU spends excessive time swapping threads in and out of execution (“context switching”) rather than performing useful work.
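One way to keep the thread count bounded is to cap it by both the processor count and the amount of data. This is a sketch of one possible policy, not a rule from the text: the class ThreadCount and the method chooseThreadCount are hypothetical names. Note that Runtime.availableProcessors() reports the processors available to the JVM, which on many machines counts logical rather than physical cores.

```java
class ThreadCount {
    // Hypothetical helper: never more threads than processors or elements,
    // and always at least one so that an empty array is still handled.
    static int chooseThreadCount(int dataLength) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(cores, dataLength));
    }

    public static void main(String[] args) {
        System.out.println(chooseThreadCount(3));          // at most 3
        System.out.println(chooseThreadCount(10_000_000)); // the processor count
    }
}
```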
Summary
- Threads allow for parallel execution, but manual management carries significant overhead.
- Callable<V> defines a task that returns a value of type V.
- FutureTask<V> acts as a placeholder for the result and can be executed by a Thread.
- task.get() retrieves the result and blocks the current thread until the result is available.
- Avoid spawning too many threads. Prefer a small, fixed number of threads to avoid resource exhaustion and context-switching overhead.
Next Steps
In production code, threads are rarely managed manually. For more robust and efficient thread management, explore these frameworks:
- ExecutorService: The standard API for managing reusable thread pools.
- ForkJoinPool: A specialized pool optimized for recursive, divide-and-conquer algorithms.
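As a preview of the first option, the array sum can be written against a fixed-size thread pool. This is a minimal sketch, not a tuned implementation: the class name PooledSum is hypothetical and numThreads is assumed to be at least 1. The pool creates and reuses the threads, submit() returns a Future that plays the role FutureTask played earlier, and shutdown() releases the threads when the work is done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledSum {
    // Hypothetical helper: sums arr using a pool of numThreads worker threads.
    static long sum(long[] arr, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            int sliceSize = (int) Math.ceil((double) arr.length / numThreads);
            List<Future<Long>> parts = new ArrayList<>();

            // First loop: submit every slice before waiting on any result.
            for (int i = 0; i < numThreads; i++) {
                int lo = Math.min(i * sliceSize, arr.length);
                int hi = Math.min(lo + sliceSize, arr.length);
                Callable<Long> sliceTask = () -> {
                    long partial = 0;
                    for (int j = lo; j < hi; j++) {
                        partial += arr[j];
                    }
                    return partial;
                };
                parts.add(pool.submit(sliceTask));
            }

            // Second loop: collect the partial sums.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data, 4)); // prints 500500
    }
}
```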