sbt 1.3.0 introduces the Glob type, which can be used to specify a file system
query. The design is inspired by shell globs. Glob has only one public method,
matches(java.nio.file.Path), which checks whether a path matches the glob
pattern.
Globs can be constructed explicitly or with a DSL that uses the / operator to
extend queries. All of the examples use java.nio.file.Path, but java.io.File
may also be used.
The simplest Glob represents a single path. Explicitly create a single-path glob with:
import java.nio.file.Paths
import sbt.nio.file.Glob
val glob = Glob(Paths.get("foo/bar"))
println(glob.matches(Paths.get("foo"))) // prints false
println(glob.matches(Paths.get("foo/bar"))) // prints true
println(glob.matches(Paths.get("foo/bar/baz"))) // prints false
It can also be created using the glob DSL with:
val glob = Paths.get("foo/bar").toGlob
There are two special glob objects:
1) AnyPath (aliased by *) matches any path with just one name component
2) RecursiveGlob (aliased by **) matches all paths
Using AnyPath, we can explicitly construct a glob that matches all children of
a directory:
val path = Paths.get("/foo/bar")
val children = Glob(path, AnyPath)
println(children.matches(path)) // prints false
println(children.matches(path.resolve("baz"))) // prints true
println(children.matches(path.resolve("baz").resolve("buzz"))) // prints false
Using the dsl, the above becomes:
val children = Paths.get("/foo/bar").toGlob / AnyPath
val dslChildren = Paths.get("/foo/bar").toGlob / *
// these two definitions have identical results
Recursive globs are similar:
val path = Paths.get("/foo/bar")
val allDescendants = Glob(path, RecursiveGlob)
println(allDescendants.matches(path)) // prints false
println(allDescendants.matches(path.resolve("baz"))) // prints true
println(allDescendants.matches(path.resolve("baz").resolve("buzz"))) // prints true
or, using the DSL:
val allDescendants = Paths.get("/foo/bar").toGlob / **
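The difference between AnyPath and RecursiveGlob can be captured in a small toy model. The sketch below is hypothetical: Component, Literal, AnyOne, AnyDepth and ToyGlob are illustrative names, not sbt classes, and the model only aims to reproduce the example results above. In particular, it assumes that ** must consume at least one name component, which is why the base path itself is rejected.

```scala
import java.nio.file.{Path, Paths}

// Toy model of component-wise glob matching; NOT sbt's implementation.
sealed trait Component
final case class Literal(name: String) extends Component // one exact name
case object AnyOne extends Component                     // plays the role of AnyPath (*)
case object AnyDepth extends Component                   // plays the role of RecursiveGlob (**)

final case class ToyGlob(base: Path, components: List[Component]) {
  def matches(path: Path): Boolean =
    path.startsWith(base) && {
      val rel = base.relativize(path)
      // Relativizing the base against itself yields an empty path ("").
      val names =
        if (rel.toString.isEmpty) Nil
        else (0 until rel.getNameCount).map(i => rel.getName(i).toString).toList
      ToyGlob.matchNames(components, names)
    }
}

object ToyGlob {
  def matchNames(cs: List[Component], names: List[String]): Boolean =
    (cs, names) match {
      case (Nil, Nil) => true
      case (Nil, _)   => false
      case (AnyDepth :: rest, _) =>
        // Assumption of this model: ** consumes one or more names.
        (1 to names.length).exists(n => matchNames(rest, names.drop(n)))
      case (_, Nil)                           => false
      case (AnyOne :: rest, _ :: tail)        => matchNames(rest, tail)
      case (Literal(n) :: rest, name :: tail) => n == name && matchNames(rest, tail)
    }
}
```

For example, ToyGlob(Paths.get("/foo/bar"), List(AnyOne)) accepts /foo/bar/baz but rejects both /foo/bar and /foo/bar/baz/buzz, mirroring the children glob.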
Globs may also be constructed using path names. The following three globs are equivalent:
val pathGlob = Paths.get("foo").resolve("bar").toGlob
val glob = Glob("foo/bar")
val altGlob = Glob("foo") / "bar"
When parsing glob paths, any / characters are automatically converted to \
on Windows.
Globs can apply name filters at each path level. For example,
val scalaSources = Paths.get("/foo/bar").toGlob / ** / "src" / "*.scala"
specifies all descendants of /foo/bar that have the .scala extension and
whose parent directory is named src.
More advanced queries are also possible:
val scalaAndJavaSources =
Paths.get("/foo/bar").toGlob / ** / "src" / "*.{scala,java}"
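Because the Glob APIs use the same glob syntax as the JDK's PathMatcher, the patterns above can be tried out with java.nio.file alone. This sketch uses FileSystems.getDefault.getPathMatcher rather than sbt's Glob; a JDK matcher is applied to the whole path string, so the base directory is written into the pattern. The object name JdkGlobDemo is hypothetical.

```scala
import java.nio.file.{FileSystems, PathMatcher, Paths}

object JdkGlobDemo {
  // Builds a JDK glob matcher; this is java.nio.file, not sbt.
  def matcher(glob: String): PathMatcher =
    FileSystems.getDefault.getPathMatcher("glob:" + glob)

  // Scala sources whose parent directory is named src, anywhere below /foo/bar.
  val scalaInSrc: PathMatcher = matcher("/foo/bar/**/src/*.scala")
  // The {scala,java} group accepts either extension.
  val jvmInSrc: PathMatcher = matcher("/foo/bar/**/src/*.{scala,java}")
}
```

In the JDK syntax, * never crosses a directory boundary while ** does, so /foo/bar/x/src/y/A.scala is rejected by scalaInSrc because its parent is y, not src.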
The AnyPath special glob can be used to control the depth of the query. For
example, the glob
val twoDeep = Glob("/foo/bar") / * / * / *
matches any path exactly three name components below /foo/bar, that is, any
path with exactly two intermediate directories. For example, /foo/bar/a/b/c.txt
would be accepted but not /foo/bar/a/b or /foo/bar/a/b/c/d.txt.
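The same depth restriction can be checked with a JDK glob or by counting name components directly. Both checks in this sketch are illustrations (DepthDemo and isThreeBelow are hypothetical names), not sbt's implementation.

```scala
import java.nio.file.{FileSystems, Path, PathMatcher, Paths}

object DepthDemo {
  // Each * matches exactly one name component, so three of them fix the depth.
  val twoDeep: PathMatcher =
    FileSystems.getDefault.getPathMatcher("glob:/foo/bar/*/*/*")

  // Hand-written equivalent: the path must lie under base and be exactly
  // three name components below it.
  def isThreeBelow(base: Path, path: Path): Boolean =
    path.startsWith(base) && base.relativize(path).getNameCount == 3
}
```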
The Glob APIs use glob syntax (see PathMatcher for details). Regular
expressions can be used instead:
val digitGlob = Glob("/foo/bar") / """.*-\d{2,3}[.]txt""".r
digitGlob.matches(Paths.get("/foo/bar").resolve("foo-1.txt")) // false
digitGlob.matches(Paths.get("/foo/bar").resolve("foo-23.txt")) // true
digitGlob.matches(Paths.get("/foo/bar").resolve("foo-123.txt")) // true
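The regex above is applied to a single name component. The sketch below shows how the pattern behaves on its own using scala.util.matching.Regex; it assumes each component has to match in full, and the object name DigitRegexDemo is hypothetical. The triple-quoted string keeps \d a regex escape rather than an invalid Scala string escape.

```scala
import scala.util.matching.Regex

object DigitRegexDemo {
  // Two or three digits after the last '-', followed by ".txt".
  val digits: Regex = """.*-\d{2,3}[.]txt""".r

  // Whole-string match against a single file name (not a substring search).
  def matchesName(name: String): Boolean = digits.pattern.matcher(name).matches()
}
```

With this pattern, "foo-23.txt" and "foo-123.txt" are accepted, while "foo-1.txt" and "foo-1234.txt" are rejected.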
It is possible to specify multiple path components in the regex:
val multiRegex = Glob("/foo/bar") / """baz-\d/.*/foo.txt""".r
multiRegex.matches(Paths.get("/foo/bar/baz-1/buzz/foo.txt")) // true
multiRegex.matches(Paths.get("/foo/bar/baz-12/buzz/foo.txt")) // false
Recursive globs cannot be expressed using regex syntax because ** is not valid
in a regex and paths are matched component-wise (so "foo/.*/foo.txt" is actually
split into three regular expressions {"foo", ".*", "foo.txt"} for matching
purposes). To make the multiRegex from above recursive, one could write:
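The component-wise splitting can be modeled with a small helper. ComponentRegex is a hypothetical name and this is not sbt's matching code; it simply splits the regex on / and requires each piece to match the corresponding name component in full.

```scala
import java.nio.file.{Path, Paths}

object ComponentRegex {
  // Split a multi-component regex on "/" and match piece by piece.
  def matches(regex: String, relative: Path): Boolean = {
    val parts = regex.split("/").toList
    val names =
      (0 until relative.getNameCount).map(i => relative.getName(i).toString).toList
    // A regex piece can only ever cover one name, so the counts must agree.
    parts.length == names.length &&
      parts.zip(names).forall { case (r, name) => r.r.pattern.matcher(name).matches() }
  }
}
```

With this helper, """baz-\d/.*/foo.txt""" accepts baz-1/buzz/foo.txt but rejects baz-1/fizz/buzz/foo.txt: the .* piece covers exactly one component, which is why ** is needed for arbitrary depth.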
val multiRegex = Glob("/foo/bar") / """baz-\d""".r / ** / "foo.txt"
multiRegex.matches(Paths.get("/foo/bar/baz-1/buzz/foo.txt")) // true
multiRegex.matches(Paths.get("/foo/bar/baz-1/fizz/buzz/foo.txt")) // true
In regex syntax, \ is an escape character and cannot be used as a path
separator. If the regex covers multiple path components, / must be used as the
path separator, even on Windows:
val multiRegex = Glob("/foo/bar") / """baz-\d/foo\.txt""".r
val validRegex = Glob("/foo/bar") / "baz/Foo[.].txt".r
// throws java.util.regex.PatternSyntaxException because \F is not a valid
// regex construct
val invalidRegex = Glob("/foo/bar") / """baz\Foo[.].txt""".r
Querying the file system for the files that match one or more Glob patterns is
done via the sbt.nio.file.FileTreeView trait. It provides two methods:
def list(glob: Glob): Seq[(Path, FileAttributes)]
def list(globs: Seq[Glob]): Seq[(Path, FileAttributes)]
that can be used to retrieve all of the paths matching the provided patterns.
val scalaSources: Glob = ** / "*.scala"
val regularSources: Glob = "/foo/src/main/scala" / scalaSources
val scala212Sources: Glob = "/foo/src/main/scala-2.12" / scalaSources
val sources: Seq[Path] = FileTreeView.default.list(regularSources).map(_._1)
val allSources: Seq[Path] =
FileTreeView.default.list(Seq(regularSources, scala212Sources)).map(_._1)
In the variant that takes Seq[Glob] as input, sbt aggregates all of the globs
so that it lists any given directory on the file system at most once. It returns
all of the paths that match any of the Glob patterns in the input Seq[Glob].
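The idea behind the Seq[Glob] variant can be sketched with plain java.nio.file: walk the tree once and test each visited path against every pattern, rather than walking once per pattern. This is a simplified, hypothetical illustration (SinglePassList is not an sbt API); sbt's actual aggregation may do more, for example limiting the traversal to the directories the globs can reach, whereas this sketch simply walks everything under a single root.

```scala
import java.nio.file.{FileSystems, Files, Path, PathMatcher}
import scala.collection.mutable.ListBuffer

object SinglePassList {
  def list(root: Path, globs: Seq[String]): Seq[Path] = {
    val matchers: Seq[PathMatcher] =
      globs.map(g => FileSystems.getDefault.getPathMatcher("glob:" + g))
    val out = ListBuffer.empty[Path]
    val stream = Files.walk(root) // one traversal for all patterns
    try {
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        // Keep the path if any of the patterns accepts it.
        if (matchers.exists(_.matches(p))) out += p
      }
    } finally stream.close()
    out.toList
  }
}
```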
The FileTreeView trait is parameterized by a type, T, that is always
(java.nio.file.Path, sbt.nio.file.FileAttributes) in sbt. The FileAttributes
trait provides access to the following properties:
isDirectory — returns true if the Path represents a directory.
isRegularFile — returns true if the Path represents a regular file. This
should usually be the inverse of isDirectory.
isSymbolicLink — returns true if the Path is a symbolic link. The
default FileTreeView implementation always follows symbolic links. If the
symbolic link targets a regular file, both isSymbolicLink and isRegularFile
will be true. Similarly, if the link targets a directory, both isSymbolicLink
and isDirectory will be true. If the link is broken, isSymbolicLink will be
true but both isDirectory and isRegularFile will be false.
FileTreeView always provides the attributes because checking the type of a
file requires a system call, which can be slow. All of the major desktop
operating systems provide APIs for listing a directory that return both the
file names and the file node types. This allows sbt to provide this information
without making an extra system call, which we can use to filter paths
efficiently:
// No additional io is performed in the call to attributes.isRegularFile
val scalaSourcePaths =
FileTreeView.default.list(Glob("/foo/src/main/scala/**/*.scala")).collect {
case (path, attributes) if attributes.isRegularFile => path
}
In addition to the list methods described above, there are two additional
overloads that take an sbt.nio.file.PathFilter argument:
def list(glob: Glob, filter: PathFilter): Seq[(Path, FileAttributes)]
def list(globs: Seq[Glob], filter: PathFilter): Seq[(Path, FileAttributes)]
The PathFilter has a single abstract method:
def accept(path: Path, attributes: FileAttributes): Boolean
It can be used to further filter the query specified by the glob patterns:
val regularFileFilter: PathFilter = (_, a) => a.isRegularFile
val scalaSourceFiles =
FileTreeView.default.list(Glob("/foo/bar/src/main/scala/**/*.scala"), regularFileFilter)
A Glob may be used as a PathFilter:
val filter: PathFilter = ** / "*include*"
val scalaSourceFiles =
FileTreeView.default.list(Glob("/foo/bar/src/main/scala/**/*.scala"), filter)
Instances of PathFilter can be negated with the ! unary operator:
import java.nio.file.Files
import scala.util.Try
val hiddenFileFilter: PathFilter = (p, _) => Try(Files.isHidden(p)).getOrElse(false)
val notHiddenFileFilter: PathFilter = !hiddenFileFilter
They can be combined with the && operator:
val regularFileFilter: PathFilter = (_, a) => a.isRegularFile
val notHiddenFileFilter: PathFilter = (p, _) => !Try(Files.isHidden(p)).getOrElse(false)
val andFilter = regularFileFilter && notHiddenFileFilter
val scalaSources =
FileTreeView.default.list(Glob("/foo/bar/src/main/scala/**/*.scala"), andFilter)
They can be combined with the || operator:
val scalaSources: PathFilter = ** / "*.scala"
val javaSources: PathFilter = ** / "*.java"
val jvmSourceFilter = scalaSources || javaSources
val jvmSourceFiles =
FileTreeView.default.list(Glob("/foo/bar/src/**"), jvmSourceFilter)
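The three combinators can be modeled with a trait that has a single abstract method, which is also what lets plain lambdas be written as PathFilter values in the examples above. The sketch below is a hypothetical stand-in (Filter and Filters are not sbt names) that reduces the attributes argument to an isDirectory flag so it compiles without sbt on the classpath.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical stand-in for sbt.nio.file.PathFilter.
trait Filter {
  def accept(path: Path, isDirectory: Boolean): Boolean

  // Inside these lambdas, accept refers to this filter's own accept method.
  def unary_! : Filter = (p, d) => !accept(p, d)
  def &&(that: Filter): Filter = (p, d) => accept(p, d) && that.accept(p, d)
  def ||(that: Filter): Filter = (p, d) => accept(p, d) || that.accept(p, d)
}

object Filters {
  val isDirectory: Filter = (_, d) => d
  def hasExtension(ext: String): Filter = (p, _) => p.getFileName.toString.endsWith(ext)
  // Non-directory paths ending in .scala or .java.
  val jvmSources: Filter = !isDirectory && (hasExtension(".scala") || hasExtension(".java"))
}
```

Each combinator just returns a new filter that delegates to its operands, so filters compose without evaluating anything until accept is called.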
There is also an implicit conversion from String to PathFilter that converts
the String to a Glob and converts the Glob to a PathFilter:
val regularFileFilter: PathFilter = (p, a) => a.isRegularFile
val regularScalaFiles: PathFilter = regularFileFilter && "**/*.scala"
In addition to the ad-hoc filters, there are some commonly used filters that are available in the default sbt scope:
sbt.io.HiddenFileFilter — accepts any file that is hidden according to
Files.isHidden. On POSIX systems, this just checks whether the name starts with
a dot (.), while on Windows it needs to perform I/O to extract the dos:hidden
attribute.
sbt.io.RegularFileFilter — equivalent to (_, a: FileAttributes) =>
a.isRegularFile
sbt.io.DirectoryFilter — equivalent to (_, a: FileAttributes) =>
a.isDirectory
There is also a converter from sbt.io.FileFilter to sbt.nio.file.PathFilter
that can be invoked by calling toNio on the sbt.io.FileFilter instance:
val excludeFilter: sbt.io.FileFilter = HiddenFileFilter || DirectoryFilter
val excludePathFilter: sbt.nio.file.PathFilter = excludeFilter.toNio
The HiddenFileFilter, RegularFileFilter and DirectoryFilter inherit both
sbt.io.FileFilter and sbt.nio.file.PathFilter. They typically can be treated
like a PathFilter:
val regularScalaFiles: PathFilter = RegularFileFilter && (** / "*.scala")
This will not work when the implicit conversion from String to PathFilter is
required:
val regularScalaFiles = RegularFileFilter && "**/*.scala"
// won't compile because it gets interpreted as
// (RegularFileFilter: sbt.io.FileFilter).&&(("**/*.scala"): sbt.io.NameFilter)
In these situations, use toNio:
val regularScalaFiles = RegularFileFilter.toNio && "**/*.scala"
It is important to note that the semantics of Glob are different from those of
NameFilter. When using sbt.io.FileFilter, to filter files ending with the
.scala extension, one would write:
val scalaFilter: NameFilter = "*.scala"
An equivalent PathFilter is written
val scalaFilter: PathFilter = "**/*.scala"
The glob represented by "*.scala" only matches a path with a single name
component ending in .scala. In general, when converting an sbt.io.NameFilter to
an sbt.nio.file.PathFilter, it will be necessary to add a "**/" prefix.
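The difference can be reproduced with the JDK's glob matcher, which, like a Glob, is applied to the whole path. In this sketch, the NameFilter-style check examines only the file name. The object NameVsPath is hypothetical and does not use sbt.

```scala
import java.nio.file.{FileSystems, Path, PathMatcher, Paths}

object NameVsPath {
  private val fs = FileSystems.getDefault
  // Matches a path consisting of a single name component ending in .scala.
  val singleComponent: PathMatcher = fs.getPathMatcher("glob:*.scala")
  // Matches a .scala file at any depth.
  val anyDepth: PathMatcher = fs.getPathMatcher("glob:**/*.scala")

  // NameFilter-style check: only the last name component is examined.
  def nameMatches(path: Path): Boolean = singleComponent.matches(path.getFileName)
}
```

Applied to /foo/src/A.scala, singleComponent rejects the full path, while nameMatches and anyDepth both accept it.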
In addition to FileTreeView.list, there is also FileTreeView.iterator. The
latter may be used to reduce memory pressure:
// Prints every path (files and directories) under the file system root
FileTreeView.default.iterator(Glob("/**")).foreach { case (p, _) => println(p) }
In the context of sbt, the type parameter, T, is always (java.nio.file.Path,
sbt.nio.file.FileAttributes). An implementation of FileTreeView is provided in sbt with the fileTreeView
key:
fileTreeView.value.list(baseDirectory.value / ** / "*.txt")
The FileTreeView[+T] trait has a single abstract method:
def list(path: Path): Seq[T]
sbt only provides implementations of FileTreeView[(Path, FileAttributes)]. In
this context, the list method should return the (Path, FileAttributes) pairs
for all of the direct children of the input path.
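The single-abstract-method design can be illustrated outside sbt. In this sketch, TreeView is a hypothetical trait with one list method, TreeView.nio implements it with java.nio.file, and find is a recursive query written only in terms of list, so it works with any implementation, in the same spirit as sbt providing its Glob-based queries on top of that one method. Unlike sbt's native view, this implementation reads attributes with a separate call per child.

```scala
import java.nio.file.{Files, Path, PathMatcher}
import java.nio.file.attribute.BasicFileAttributes
import scala.collection.mutable.ListBuffer

// Hypothetical mini FileTreeView: lists the direct children of a directory.
trait TreeView {
  def list(dir: Path): Seq[(Path, BasicFileAttributes)]
}

object TreeView {
  // java.nio.file implementation. readAttributes follows symbolic links, so a
  // broken link would throw here; cycles are not handled either.
  val nio: TreeView = dir => {
    val out = ListBuffer.empty[(Path, BasicFileAttributes)]
    val stream = Files.newDirectoryStream(dir)
    try {
      val it = stream.iterator()
      while (it.hasNext) {
        val child = it.next()
        out += (child -> Files.readAttributes(child, classOf[BasicFileAttributes]))
      }
    } finally stream.close()
    out.toList
  }

  // Depth-first search that relies only on TreeView.list.
  def find(view: TreeView, root: Path, matcher: PathMatcher): Seq[Path] =
    view.list(root).flatMap { case (path, attrs) =>
      val here = if (matcher.matches(path)) Seq(path) else Seq.empty
      if (attrs.isDirectory) here ++ find(view, path, matcher) else here
    }
}
```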
There are two implementations of FileTreeView[(Path, FileAttributes)]
provided by sbt:
1. FileTreeView.native: uses a native JNI library to efficiently
extract the file names and attributes from the file system without performing
additional I/O. Native implementations are available for 64-bit FreeBSD, Linux,
macOS and Windows. If no native implementation is available, it falls back to a
java.nio.file based implementation.
2. FileTreeView.nio: uses APIs in java.nio.file to implement
FileTreeView
The FileTreeView.default method returns FileTreeView.native.
The list and iterator methods that take Glob or Seq[Glob] as arguments
are provided as extension methods to FileTreeView[(Path, FileAttributes)].
Since any implementation of FileTreeView[(Path, FileAttributes)] automatically
receives these extensions, it is easy to write an alternative implementation
that will still correctly work with Glob and Seq[Glob]:
import scala.collection.mutable
val listedDirectories = mutable.Set.empty[Path]
val trackingView: FileTreeView[(Path, FileAttributes)] = path => {
val results = FileTreeView.default.list(path)
listedDirectories += path
results
}
val scalaSources =
trackingView.list(Glob("/foo/bar/src/main/scala/**/*.scala")).map(_._1)
println(listedDirectories) // prints all of the directories traversed by list
sbt has long had the PathFinder API, which provides a DSL for collecting files. While there is overlap, Globs are a less powerful abstraction than PathFinder, which makes them more suitable for optimization. Globs describe the what, but not the how, of a query. PathFinders combine the what and the how, which makes them more difficult to optimize. For example, the following sbt snippet:
val paths = fileTreeView.value.list(Seq(
baseDirectory.value / ** / "*.scala",
baseDirectory.value / ** / "*.java")).map(_._1)
will only traverse the file system once to collect all of the scala and java sources in the project. By contrast,
val paths =
(baseDirectory.value ** "*.scala" +++
baseDirectory.value ** "*.java").get()
will make two passes over the file system and will therefore take roughly twice as long to run as the Glob version.
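The cost argument can be made concrete by counting directory visits. This sketch uses java.nio.file's walkFileTree, not sbt's Glob or PathFinder, and PassCounter is a hypothetical name. It answers two patterns either in one walk or in one walk per pattern and records how many directories each approach opens.

```scala
import java.nio.file.{FileSystems, FileVisitResult, Files, Path, PathMatcher, SimpleFileVisitor}
import java.nio.file.attribute.BasicFileAttributes
import scala.collection.mutable.ListBuffer

object PassCounter {
  final case class Result(paths: Seq[Path], directoriesVisited: Int)

  // One traversal of root that tests every file against all matchers.
  def walk(root: Path, matchers: Seq[PathMatcher]): Result = {
    val found = ListBuffer.empty[Path]
    var dirs = 0
    Files.walkFileTree(root, new SimpleFileVisitor[Path] {
      override def preVisitDirectory(dir: Path, attrs: BasicFileAttributes): FileVisitResult = {
        dirs += 1 // count every directory this traversal opens
        FileVisitResult.CONTINUE
      }
      override def visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult = {
        if (matchers.exists(_.matches(file))) found += file
        FileVisitResult.CONTINUE
      }
    })
    Result(found.toList, dirs)
  }

  def glob(pattern: String): PathMatcher =
    FileSystems.getDefault.getPathMatcher("glob:" + pattern)
}
```

On a tree containing only a root directory and one src subdirectory, the combined walk opens two directories while the two separate walks open four in total; actual time savings on a real project also depend on operating system caching.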