Introduction to writing custom collectors in Java 8
Collectors
class, e.g. collect(toList())
, toSet()
or maybe something more fancy like counting()
or groupingBy()
. Not many of us actually bother to look how collectors are defined and implemented. Let's start from analysing what Collector<T, A, R>
really is and how it works.Collector<T, A, R>
works as a "sink" for streams - stream pushes items (one after another) into a collector, which should produce some "collected" value in the end. Most of the time it means building a collection (like toList()
) by accumulating elements or reducing stream into something smaller (e.g. counting()
collector that barely counts elements). Every collector accepts items of type T
and produces aggregated (accumulated) value of type R
(e.g. R = List<T>
). Generic type A
simply defines the type of intermediate mutable data structure that we are going to use to accumulate items of type T
in the meantime. Type A
can, but doesn't have to be the same as R
- in simple words the mutable data structure that we use to collect items from input Stream<T>
can be different than the actual output collection/value. That being said, every collector must implement the following methods:interface Collector<T,A,R> {
Supplier<A> supplier()
BiConsumer<A,T> acumulator()
BinaryOperator<A> combiner()
Function<A,R> finisher()
Set<Characteristics> characteristics()
}
supplier()
returns a function that creates an instance of accumulator - mutable data structure that we will use to accumulate input elements of typeT
.accumulator()
returns a function that will take accumulator and one item of typeT
, mutating accumulator.combiner()
is used to join two accumulators together into one. It is used when collector is executed in parallel, splitting inputStream<T>
and collecting parts independently first.finisher()
takes an accumulatorA
and turns it into a result value, e.g. collection, of typeR
. All of this sounds quite abstract, so let's do a simple example.
ImmutableSet<T>
from Guava. However creating one is very simple. Remember that in order to iteratively build ImmutableSet
we use ImmutableSet.Builder<T>
- this is going to be our accumulator.import com.google.common.collect.ImmutableSet;First of all look carefully at generic types. Our
public class ImmutableSetCollector<T>
implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {
@Override
public Supplier<ImmutableSet.Builder<T>> supplier() {
return ImmutableSet::builder;
}
@Override
public BiConsumer<ImmutableSet.Builder<T>, T> accumulator() {
return (builder, t) -> builder.add(t);
}
@Override
public BinaryOperator<ImmutableSet.Builder<T>> combiner() {
return (left, right) -> {
left.addAll(right.build());
return left;
};
}
@Override
public Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher() {
return ImmutableSet.Builder::build;
}
@Override
public Set<Characteristics> characteristics() {
return EnumSet.of(Characteristics.UNORDERED);
}
}
ImmutableSetCollector
takes input elements of type T
, so it works for any Stream<T>
. In the end it will produce ImmutableSet<T>
- as expected. ImmutableSet.Builder<T>
is going to be our intermediate data structure. supplier()
returns a function that creates newImmutableSet.Builder<T>
. If you are not that familiar with lambdas in Java 8,ImmutableSet::builder
is a shorthand for() -> ImmutableSet.builder()
.accumulator()
returns a function that takesbuilder
and one element of typeT
. It simply adds said element to the builder.combiner()
returns a function that will accept two builders and turn them into one by adding all elements from one of them into the other - and returning the latter. Finallyfinisher()
returns a function that will turnImmutableSet.Builder<T>
intoImmutableSet<T>
. Again this is a shorthand syntax for:builder -> builder.build()
.- Last but not least,
characteristics()
informs JDK what capabilities our collector has. For example ifImmutableSet.Builder<T>
was thread-safe (it isn't), we could sayCharacteristics.CONCURRENT
as well.
collect()
: final ImmutableSet<Integer> set = ArraysHowever creating new instance is slightly verbose so I suggest creating static factory method, similar to what JDK does:
.asList(1, 2, 3, 4)
.stream()
.collect(new ImmutableSetCollector<>());
public class ImmutableSetCollector<T> implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {From now on we can take full advantage of our custom collector by simply typing:
//...
public static <T> Collector<T, ?, ImmutableSet<T>> toImmutableSet() {
return new ImmutableSetCollector<>();
}
}
collect(toImmutableSet())
. In the second part we will learn how to write more complex and useful collectors.Check out the second part of this article: Grouping, sampling and batching - custom collectors in Java 8.
Update
@akarazniewicz pointed out that collectors are just verbose implementation of folding. With my love and hate relationship with folds, I have to comment on that. Collectors in Java 8 are basically object-oriented encapsulation of the most complex type of fold found in Scala, namelyGenTraversableOnce.aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
. aggregate()
is like fold()
, but requires extra combop
to combine two accumulators (of type B
) into one. Comparing this to collectors, parameter z
comes from a supplier()
, seqop()
reduction operation is an accumulator()
and combop
is a combiner()
. In pseudo-code we can write:finisher(
seq.aggregate(collector.supplier())
(collector.accumulator(), collector.combiner()))
GenTraversableOnce.aggregate()
is used when concurrent reduction is possible - just like with collectors.Tags: java 8, scala