In parallel programming on a shared-memory system, threads let you do multiple things at once, and you may have written programs that split work across a fixed number of threads. But what happens when the work itself has a recursive, tree-shaped structure — like merge sort, tree traversal, or computing a sum over a huge array? Manually dividing that work across threads can be painful and error-prone.
Java’s ForkJoinPool (java.util.concurrent.ForkJoinPool
[docs])
solves exactly this problem. It is a special thread pool designed for the
divide-and-conquer pattern: split a problem into smaller sub-problems,
solve them in parallel, and combine the results. The pool manages all the
scheduling for you, and it uses a technique called work stealing to keep
every thread busy.
Contents
1. The Core Idea: Fork and Join¶
The name comes from two operations:
| Operation | Meaning |
|---|---|
| fork | Asynchronously submit a sub-task to the pool |
| join | Wait for a sub-task to finish and get its result |
A straightforward way to use them is to fork both halves of a problem, then join both results:
solve(problem): if problem is small enough: solve it directly # base case else: left = fork(solve(left half)) # spawn left sub-task right = fork(solve(right half)) # spawn right sub-task # wait for both, then combine the results left_result = left.join() right_result = right.join() return combine(left_result, right_result)
The tasks spawned by fork() run in parallel, and the current thread waits until both
are done before combining results.
However, there is an important optimization: compute one half yourself. After forking both sub-tasks, the current thread has nothing to do but wait. That’s a wasted opportunity. Instead, you can fork only one sub-task and do the other half’s work yourself in the current thread, running in parallel with the forked task:
solve(problem): if problem is small enough: solve it directly # base case else: left = fork(solve(left half)) # spawn left sub-task right_result = solve(right half) # do right half yourself left_result = left.join() # wait only for left return combine(left_result, right_result)
This avoids creating an extra task object and keeps the current thread productive. But be careful: don’t join the left task until after you have completed the right half.
Question: Can you see what would go wrong if you joined the left task first?
2. The Class Hierarchy¶
Everything lives in java.util.concurrent. First, there are some important
classes used for setting up tasks.
ForkJoinTask<V> ← abstract base class for all tasks ├── RecursiveTask<V> ← use when your task returns a value └── RecursiveAction ← use when your task returns nothing (void)
You will never extend ForkJoinTask directly. Instead, you subclass one of the two concrete
classes and override a single method called compute(), which is where your logic lives.
RecursiveTask<V> is for tasks that compute and return a value — a sum, a
sorted array, a search result. The type parameter V is the return type. When
you call join() on a RecursiveTask, you get back a value of that type.
RecursiveAction is for tasks that do their work by side effect and return
nothing — applying a transform to every element of an array, filling a buffer,
updating a shared data structure. There is no type parameter, compute()
returns void, and join() will just wait for completion without returning a
value.
The choice is usually clear: if the caller needs a result back, use
RecursiveTask. If the task just mutates data that the caller already has a
reference to, use RecursiveAction.
Template¶
Here is a skeleton you can fill in for a RecursiveTask. The structure is
always the same — only the ResultType, the base case logic, and the way you
combine results change.
class MyTask extends RecursiveTask<ResultType> { // your fields here MyTask(/* your parameters */) { // store fields } @Override protected ResultType compute() { if (/* problem is small enough */) { // BASE CASE: solve directly and return the result return /* direct result */; } // RECURSIVE CASE: split, fork both halves, join both results MyTask left = new MyTask(/* left half */); MyTask right = new MyTask(/* right half */); left.fork(); right.fork(); ResultType leftResult = left.join(); ResultType rightResult = right.join(); return /* combine leftResult and rightResult */; } }
Once you have a task, you hand it to ForkJoinPool.commonPool() using invoke(), which submits
the root task and blocks (waits) until it finishes, returning the result:
MyTask task = new MyTask(...); ResultType result = ForkJoinPool.commonPool().invoke(task);
The common pool is a JVM-wide shared pool that sizes itself to the number of available processors. You don’t need to create, configure, or shut it down — it’s managed by the JVM.
A note on this template: it uses the simpler fork+fork+join+join approach to keep the structure clear. Section 3 introduces the one-fork-one-compute optimization once you have a working example in front of you.
3. A Complete Example: Parallel Sum with RecursiveTask
¶
Let’s compute the sum of a large array of long values in parallel.
3.1 The Sequential Version (for reference)¶
long sequentialSum(long[] arr, int lo, int hi) { long sum = 0; for (int i = lo; i < hi; i++) sum += arr[i]; return sum; }
3.2 The Parallel Version¶
Starting with the straightforward fork+fork+join+join approach:
import java.util.concurrent.RecursiveTask; public class SumTask extends RecursiveTask<Long> { private static final int THRESHOLD = 10_000; // tune this! private final long[] arr; private final int lo, hi; // work on arr[lo..hi) public SumTask(long[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; } @Override protected Long compute() { int length = hi - lo; // Base case: array slice is small enough to sum sequentially if (length <= THRESHOLD) { long sum = 0; for (int i = lo; i < hi; i++) sum += arr[i]; return sum; } // Recursive case: fork both halves, then join both int mid = lo + length / 2; SumTask leftTask = new SumTask(arr, lo, mid); SumTask rightTask = new SumTask(arr, mid, hi); leftTask.fork(); // submit left half asynchronously rightTask.fork(); // submit right half asynchronously long leftResult = leftTask.join(); // wait for left half long rightResult = rightTask.join(); // wait for right half return leftResult + rightResult; } }
This works correctly, but there is one inefficiency: after forking both
sub-tasks, the current thread sits idle waiting — it could be doing useful work
instead. Here is an optimized compute() method, replacing the one above, that
forks only the left half and runs the right half directly in the current
thread:
@Override protected Long compute() { int length = hi - lo; if (length <= THRESHOLD) { long sum = 0; for (int i = lo; i < hi; i++) sum += arr[i]; return sum; } int mid = lo + length / 2; SumTask leftTask = new SumTask(arr, lo, mid); leftTask.fork(); // submit left asynchronously SumTask rightTask = new SumTask(arr, mid, hi); long rightResult = rightTask.compute(); // not forking: run in current thread long leftResult = leftTask.join(); // now collect left result return leftResult + rightResult; }
The right half runs in the current thread at the same time the left half runs in a pool thread, and
join() is only called after the current thread has already finished its share of the work. The
rest of the tutorial uses this optimized pattern.
3.3 Running It¶
// All classes used in this tutorial are in java.util.concurrent — // in a real project, replace the wildcard with specific imports. import java.util.concurrent.*; public class Main { public static void main(String[] args) { long[] data = new long[10_000_000]; for (int i = 0; i < data.length; i++) { data[i] = i + 1; } SumTask task = new SumTask(data, 0, data.length); long result = ForkJoinPool.commonPool().invoke(task); System.out.println("Sum: " + result); // Expected output: Sum: 50000005000000 } }
4. The Rules: What to Do and What to Avoid¶
✅ DO¶
Always have a base case. Without one, your recursion will either stack overflow or create so many tiny tasks that scheduling overhead dominates.
Prefer computing one branch yourself. Once you’re comfortable with the basics, apply the
one-fork-one-compute optimization shown in Section 3.2: fork one sub-task and call compute()
directly on the other in the current thread, then join() only after your own work is done.
Tune your threshold. The threshold controls granularity. Too small → too many tiny tasks and excessive overhead. Too large → you don’t get enough parallelism. A common heuristic is to aim for sub-tasks that take at least a few microseconds of work. Benchmark with your real data.
❌ DON’T¶
Don’t block inside a task. ForkJoinPool assumes its tasks are non-blocking. If you call
Thread.sleep(), lock a mutex, or do blocking I/O inside compute(), you can starve the pool of
threads and severely hurt performance.
Don’t share mutable state between tasks without synchronization. The same rules about memory visibility and race conditions apply here. Prefer designs where each task owns its own slice of data, like the examples above.
5. Work Stealing — Why ForkJoinPool Is Fast¶
Traditional thread pools have a single shared work queue. All threads contend for that queue, which becomes a bottleneck.
ForkJoinPool gives each worker thread its own double-ended queue (deque). When a thread forks a sub-task, it pushes it onto the bottom of its own deque and continues executing the current task. When the thread needs new work — because the current task completes or blocks on a join() — it pops from the bottom of its own deque. Because tasks are forked recursively, the bottom always holds the most recently created, finest-grained sub-problems; the top holds the oldest, largest ones.
When a thread runs out of work, rather than waiting, it steals a task from the top of another thread’s deque. This is exactly the right task to steal: it’s a large, early sub-problem that represents a substantial amount of remaining work, so the thief stays productively busy while the original thread continues picking off its own small tasks from the bottom.
Thread A deque: [big-task | medium-task | small-task | newest-task]
↑ ↑
Thread B steals Thread A pops
from this end from this end
This makes ForkJoinPool nearly self-balancing, with very little contention on the queues.
6. When Not to Use ForkJoinPool¶
ForkJoinPool is not always the right tool:
| Situation | Better alternative |
|---|---|
| A small array (thousands of elements or fewer) | Sequential code — parallelism overhead dominates |
| Tasks that block (I/O, locks, network) | A regular ThreadPoolExecutor with a larger thread pool |
| Fully independent tasks with no recursive structure |
ExecutorService with Callable tasks |
You just want parallel map/filter/reduce
|
Java parallel streams (which use the common ForkJoinPool internally) |
ForkJoinPool shines when: the problem is CPU-bound, the work has a recursive, divide-and-conquer structure, and the dataset is large enough that splitting overhead is worthwhile.
Summary¶
- ForkJoinPool is a thread pool optimized for recursive, divide-and-conquer algorithms.
- Tasks extend
RecursiveTask<V>(returns a value) orRecursiveAction(modifies data in place, returns nothing). - The key pattern is: fork one sub-task, compute the other yourself, then join the forked one.
- Work stealing keeps all threads busy without central queue contention.
- Always choose a threshold carefully — too small hurts performance, too large limits parallelism.
- Avoid blocking inside tasks.