Turning recursive file system traversal into Stream
FindFirst
, FindNext
and FindClose
functions. First I came up with a procedure printing contents of a given directory. You can imagine how proud I was to discover I can actually call that procedure from itself to traverse file system recursively. Well, I didn't know the term recursion back then, but it worked. Similar code in Java would look something like this:public void printFilesRecursively(final File folder) {Didn't know
for (final File entry : listFilesIn(folder)) {
if (entry.isDirectory()) {
printFilesRecursively(entry);
} else {
System.out.println(entry.getAbsolutePath());
}
}
}
private File[] listFilesIn(File folder) {
final File[] files = folder.listFiles();
return files != null ? files : new File[]{};
}
File.listFiles()
can return null
, did ya? That's how it signals I/O errors, like if IOException
never existed. But that's not the point. System.out.println()
is rarely what we need, thus this method is neither reusable nor composable. It is probably the best counterexample of Open/Closed principle. I can imagine several use cases for recursive traversal of file system:- Getting a complete list of all files for display purposes
- Looking for all files matching given pattern/property (also check out
File.list(FilenameFilter)
) - Searching for one particular file
- Processing every single file, e.g. sending it over network
final File home = new File(FileUtils.getUserDirectoryPath());Remember that
final Stream<Path> files = Files.list(home.toPath());
files.forEach(System.out::println);
Files.list(Path)
(new in Java 8) does not look into subdirectories - we'll fix that later. The most important lesson here is: Files.list()
returns a Stream<Path>
- a value that we can pass around, compose, map, filter, etc. It's extremely flexible, e.g. it's fairly simple to count how many files I have in a directory per extension:import org.apache.commons.io.FilenameUtils;OK, just another API, you might say. But it becomes really interesting once we need to go deeper, recursively traversing subdirectories. One amazing feature of streams is that you can combine them with each other in various ways. Old Scala saying "flatMap that shit" is applicable here as well, check out this recursive Java 8 code:
//...
final File home = new File(FileUtils.getUserDirectoryPath());
final Stream<Path> files = Files.list(home.toPath());
final Map<String, List<Path>> byExtension = files
.filter(path -> !path.toFile().isDirectory())
.collect(groupingBy(path -> getExt(path)));
byExtension.
forEach((extension, matchingFiles) ->
System.out.println(
extension + "\t" + matchingFiles.size()));
//...
private String getExt(Path path) {
return FilenameUtils.getExtension(path.toString()).toLowerCase();
}
//WARNING: doesn't compile, yet:
private static Stream<Path> filesInDir(Path dir) {
return Files.list(dir)
.flatMap(path ->
path.toFile().isDirectory() ?
filesInDir(path) :
singletonList(path).stream());
}
Stream<Path>
lazily produced by filesInDir()
contains all files within directory including subdirectories. You can use it as any other stream by calling map()
, filter()
, anyMatch()
, findFirst()
, etc. But how does it really work? flatMap()
is similar to map()
but while map()
is a straightforward 1:1 transformation, flatMap()
allows replacing single entry in input Stream
with multiple entries. If we had used map()
, we would have end up with Stream<Stream<Path>>
(or maybe Stream<List<Path>>
). But flatMap()
flattens this structure, in a way exploding inner entries. Let's see a simple example. Imagine Files.list()
returned two files and one directory. For files flatMap()
receives a one-element stream with that file. We can't simply return that file, we have to wrap it, but essentially this is no-operation. It gets way more interesting for a directory. In that case we call filesInDir()
recursively. As a result we get a stream of contents of that directory, which we inject into our outer stream.Code above is short, sweet and... doesn't compile. These pesky checked exceptions again. Here is a fixed code, wrapping checked exceptions for sanity:
public static Stream<Path> filesInDir(Path dir) {Unfortunately this quite elegant code is not lazy enough.
return listFiles(dir)
.flatMap(path ->
path.toFile().isDirectory() ?
filesInDir(path) :
singletonList(path).stream());
}
private static Stream<Path> listFiles(Path dir) {
try {
return Files.list(dir);
} catch (IOException e) {
throw Throwables.propagate(e);
}
}
flatMap()
evaluates eagerly, thus it always traverses all subdirectories, even if we barely ask for first file. You can try with my tiny LazySeq
library that tries to provide even lazier abstraction, similar to streams in Scala or lazy-seq
in Clojure. But even standard JDK 8 solution might be really helpful and simplify your code significantly.Tags: functional programming, java 8