A Java program executes as one or more sequences of instructions called threads. By default, a program runs in a single “main” thread, which can use only one CPU core at a time. To take advantage of modern multi-core processors, we can create additional Thread objects to perform work in parallel.

However, a standard Thread executes a Runnable, whose run() method has a void return type. If we need a parallel task to compute and return a value, we can use a FutureTask together with a Callable.
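
To make the difference concrete, here is a minimal sketch contrasting the two interfaces. The class name RunnableVsCallable and the helper adder are made up for illustration. Both lambdas are invoked directly on the main thread, so no parallelism is involved yet.

```java
import java.util.concurrent.Callable;

class RunnableVsCallable {

    // Hypothetical helper: builds a Callable that returns a + b when called.
    static Callable<Integer> adder(int a, int b) {
        return () -> a + b;
    }

    public static void main(String[] args) throws Exception {
        // Runnable.run() returns void, so it can only produce side effects.
        Runnable printer = () -> System.out.println("2 + 3 = " + (2 + 3));
        printer.run();

        // Callable.call() returns a value (and is allowed to throw a checked exception).
        Callable<Integer> task = adder(2, 3);
        int sum = task.call();
        System.out.println("call() returned " + sum);
    }
}
```

Calling run() or call() directly like this executes the code in the current thread. The rest of this guide covers how to hand the Callable to another thread and still get its return value back.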

1. The Core Components

To get a result from a parallel thread, you need to coordinate three different objects. Think of these as the three layers of a parallel task:

  1. The Task Logic (Callable<V>): This is an interface type for defining the code you want to run. Unlike a standard Runnable, the Callable’s call() method has a return value of type V. In practice, you define this using a lambda expression:
    • A block of code: () -> { /* your logic */ return result; }
    • A method call: () -> computeSum(myArray)
  2. The Result Container (FutureTask<V>): Since a thread takes time to finish, you need a “placeholder” to hold the result while the work is happening. The FutureTask wraps your logic and provides the .get() method to retrieve the result once it’s ready.
  3. The Execution Engine (Thread): A FutureTask does not start running on its own. It implements Runnable, so you hand it to a Thread object and call .start() to begin executing it in a separate thread, which the operating system can then schedule on another CPU core.

Template

The following boilerplate demonstrates how to define a single task, execute it in a new thread, and retrieve the result. The main items to define yourself are the ResultType, the task logic, and how you choose to divide work and assign it to different threads (repeating parts of this pattern for each).

// 1. Define the task logic
Callable<ResultType> taskLogic = () -> {
    // compute something...
    return /* result */;
};

// 2. Wrap the logic in a FutureTask
FutureTask<ResultType> future = new FutureTask<>(taskLogic);

// 3. Start the thread
new Thread(future).start();

// 4. Perform other work in the main thread, including possibly launching other threads...

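// 5. Retrieve the result. .get() blocks (waits) until the task completes and can throw
//    InterruptedException or ExecutionException, so the enclosing method must handle or declare them.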
ResultType result = future.get();
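
As one possible way to fill in the template, the sketch below uses Long as the ResultType and a made-up helper, sumUpTo, as the task logic. The class and method names are illustrative assumptions, not part of any library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

class TemplateDemo {

    // Hypothetical stand-in for "compute something": adds the integers 1..n.
    static long sumUpTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Follows the template: define the logic, wrap it, start a thread, then get the result.
    static long runInBackground(int n) throws Exception {
        Callable<Long> taskLogic = () -> sumUpTo(n);
        FutureTask<Long> future = new FutureTask<>(taskLogic);
        new Thread(future).start();

        // The calling thread is free to do other work here.

        return future.get(); // waits until the background thread has finished
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInBackground(1_000)); // prints 500500
    }
}
```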

2. Parallelizing with Array Slicing

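A common strategy for parallelizing a large computation is to divide the data into n roughly equal-sized segments (slices) and assign each slice to a separate thread. When the data size is not an exact multiple of n, the last slice is simply shorter than the others.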

The Algorithm

  1. Determine the number of threads to use (e.g., based on available CPU cores).
  2. Calculate the size of each slice.
  3. Create and start a FutureTask for each slice.
  4. Iterate through the tasks and call .get() to collect and aggregate the partial results.
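
Step 2 is where off-by-one mistakes tend to creep in, so it can help to look at the index arithmetic on its own. The sketch below computes half-open [lo, hi) bounds for each slice using rounded-up division. The class name SliceBounds and the method bounds are hypothetical, and clamping lo to the array length is an extra safety choice made here so that surplus slices come out empty rather than inverted.

```java
class SliceBounds {

    // Returns {lo, hi} for slice i (hi is exclusive) when an array of the given
    // length is cut into numSlices pieces.
    static int[] bounds(int length, int numSlices, int i) {
        // Round up so that numSlices slices are enough to cover the whole array.
        int sliceSize = (int) Math.ceil((double) length / numSlices);
        int lo = Math.min(i * sliceSize, length);
        int hi = Math.min(lo + sliceSize, length);
        return new int[] { lo, hi };
    }

    public static void main(String[] args) {
        int length = 10;
        int numSlices = 4;
        for (int i = 0; i < numSlices; i++) {
            int[] b = bounds(length, numSlices, i);
            System.out.println("slice " + i + ": [" + b[0] + ", " + b[1] + ")");
        }
        // Prints:
        // slice 0: [0, 3)
        // slice 1: [3, 6)
        // slice 2: [6, 9)
        // slice 3: [9, 10)
    }
}
```

With 10 elements and 4 slices the slice size rounds up to 3, so the final slice holds only the one remaining element.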

3. Complete Example: Parallel Array Sum

This example sums a large array by splitting it into slices based on the number of available CPU cores.

3.1 The Sequential Version (for reference)

public static long sequentialSum(long[] arr) {
    long sum = 0;
    for (long v : arr) {
        sum += v;
    }
    return sum;
}

3.2 The Parallel Version

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelSum {

    // We declare 'throws Exception' because future.get() can throw InterruptedException 
    // (if the thread is interrupted) and ExecutionException (if the task logic itself fails).
    public static long sum(long[] arr, int numThreads) throws Exception {
        int length = arr.length;
        // Calculate slice size, rounding up to ensure we cover the entire array
        int sliceSize = (int) Math.ceil((double) length / numThreads);

        List<FutureTask<Long>> tasks = new ArrayList<>();

        // FIRST LOOP: Spawn and start all threads
        for (int i = 0; i < numThreads; i++) {
            int lo = i * sliceSize;
            int hi = Math.min(lo + sliceSize, length);

            // Define the task for this slice using a lambda (Callable<Long>)
            Callable<Long> sliceTask = () -> {
                long partialSum = 0;
                for (int j = lo; j < hi; j++) {
                    partialSum += arr[j];
                }
                return partialSum;
            };

            // Wrap logic in FutureTask and add to our list to track it
            FutureTask<Long> future = new FutureTask<>(sliceTask);
            tasks.add(future);

            // Hand the task to a Thread and start it immediately
            new Thread(future).start();
        }

        // SECOND LOOP: Aggregate results
        long totalSum = 0;
        for (FutureTask<Long> task : tasks) {
            // .get() blocks until this specific thread finishes its work
            totalSum += task.get();
        }

        return totalSum;
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[10_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;

        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Summing with " + cores + " threads.");

        long result = sum(data, cores);
        System.out.println("Result: " + result);
    }
}
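
The comment on sum() mentions that get() can throw an ExecutionException when the task logic itself fails. The following sketch shows what that might look like in isolation: a task that always throws, and a caller that unwraps the original exception with getCause(). The class name FailingTaskDemo and the message "bad slice" are invented for the illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class FailingTaskDemo {

    // Runs a task that always fails and returns the message of the underlying exception.
    static String causeMessage() throws InterruptedException {
        Callable<Long> failing = () -> {
            throw new IllegalStateException("bad slice");
        };
        FutureTask<Long> future = new FutureTask<>(failing);
        new Thread(future).start();

        try {
            future.get();
            return "no failure";
        } catch (ExecutionException e) {
            // The exception thrown inside call() is wrapped; getCause() recovers it.
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Task failed: " + causeMessage()); // prints "Task failed: bad slice"
    }
}
```

The exception does not surface in the worker thread's caller at start() time. It is stored in the FutureTask and rethrown, wrapped, only when get() is called.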

Question: In sum(), what would happen if we called future.get() inside the first loop immediately after starting each thread, instead of using a second, separate loop to aggregate the results?


4. The Rules: What to Do and What to Avoid

Creating a new Thread for every task is feasible for a small, fixed number of threads, but it does not scale well to thousands of tasks.

DO

Divide work into coarse-grained chunks. For CPU-bound work, a good starting point is to query the number of available CPU cores and divide your data into that many slices, so that every core has one substantial piece of work.

Start all threads before waiting on any of them. Always use two separate loops when managing multiple threads manually: one to spawn and .start() all of them, and a second loop to .get() their results.
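
One way to observe the effect of this rule is to count how many tasks are running at the same moment. The sketch below is an illustration under assumptions: the class StartThenWaitDemo, the method peakConcurrency, and the 200 ms sleep standing in for real work are all made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

class StartThenWaitDemo {

    // Runs numTasks sleeping tasks and reports the largest number that overlapped in time.
    static int peakConcurrency(int numTasks, boolean waitInsideFirstLoop) throws Exception {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<FutureTask<Integer>> tasks = new ArrayList<>();

        for (int i = 0; i < numTasks; i++) {
            Callable<Integer> work = () -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest count seen
                Thread.sleep(200);                     // stand-in for real work
                running.decrementAndGet();
                return now;
            };
            FutureTask<Integer> task = new FutureTask<>(work);
            tasks.add(task);
            new Thread(task).start();

            if (waitInsideFirstLoop) {
                task.get(); // blocks here, so the next thread is not even created yet
            }
        }

        // Second loop: wait for everything (returns at once for tasks already finished).
        for (FutureTask<Integer> task : tasks) {
            task.get();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("get() inside first loop, peak: " + peakConcurrency(4, true));
        System.out.println("two separate loops, peak:      " + peakConcurrency(4, false));
    }
}
```

When get() is called inside the first loop, the peak is always 1 because each task must finish before the next thread is started. With two separate loops the tasks overlap, and the peak is typically equal to the number of tasks.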

DON’T

Don’t create a new Thread for every tiny task. Threads have significant overhead. Each thread reserves its own stack (often around 1 MB by default, depending on the JVM and platform), and spawning thousands can exhaust memory or hit OS thread limits, in which case thread creation fails with an OutOfMemoryError.

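Don’t greatly exceed the number of available cores. For CPU-bound work, if the number of threads is much larger than the number of cores, the CPU spends excessive time swapping threads in and out of execution (“context switching”) rather than performing useful work. Note that availableProcessors() reports the logical processors visible to the JVM, which may differ from the physical core count.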


Summary

  • Threads allow for parallel execution, but manual management carries significant overhead.
  • Callable<V> defines a task that returns a value of type V.
  • FutureTask<V> acts as a placeholder for the result and can be executed by a Thread.
  • task.get() retrieves the result and blocks the current thread until the result is available.
  • Avoid spawning too many threads. Prefer a small, fixed number of threads to avoid resource exhaustion and context-switching overhead.

Next Steps

In production code, threads are rarely managed manually. For more robust and efficient thread management, explore these frameworks:

  • ExecutorService: The standard API for managing reusable thread pools.
  • ForkJoinPool: A specialized pool optimized for recursive, divide-and-conquer algorithms.
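
As a taste of the first option, here is a rough sketch of how the array sum from section 3 might look with an ExecutorService. It is one possible arrangement rather than a definitive implementation: the class name PooledSum is made up, and a fixed-size pool from Executors.newFixedThreadPool is just one of several pool types available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledSum {

    static long sum(long[] arr, int numThreads) throws Exception {
        // The pool owns the threads; we only submit tasks to it.
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            int sliceSize = (int) Math.ceil((double) arr.length / numThreads);
            List<Future<Long>> futures = new ArrayList<>();

            // Submit one task per slice; submit() returns a Future right away.
            for (int i = 0; i < numThreads; i++) {
                int lo = Math.min(i * sliceSize, arr.length);
                int hi = Math.min(lo + sliceSize, arr.length);
                Callable<Long> sliceTask = () -> {
                    long partial = 0;
                    for (int j = lo; j < hi; j++) {
                        partial += arr[j];
                    }
                    return partial;
                };
                futures.add(pool.submit(sliceTask));
            }

            // Collect the partial sums once every task has been submitted.
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown(); // let the pool's threads exit once their work is done
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[100];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data, 4)); // prints 5050
    }
}
```

The structure mirrors the manual version: a first loop that hands out all the work and a second that gathers results. The difference is that the pool creates and reuses the threads, so the same pool could serve many more tasks than it has threads.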