090p2tour  

Introduction

This is the second preview of new features in 0.9.x. The feature this time is the new task engine. The main change is that tasks now produce a value. Another change is that cross-building is at the task level and can be done for more variables than just the Scala version.

I believe the model behind tasks to be simple, but powerful. Once you understand the model, using tasks should be straightforward and without surprises. I am interested in making this documentation clear and thorough to minimize the time it takes to understand the model. Your feedback is appreciated.

Setup

You can follow along by setting up a project as in 090p1tour, having your project definition extend SingleProject (a trait for testing out tasks) and import from sbt and sbt.std:

import sbt._
import std._

class TaskTest extends SingleProject
{
  ...
}

Remember to run the appropriate 'load' after each change for it to take effect. If you have put your project definition in project/build:

> load -src project/build/*.scala -name TaskTest

All of the task examples are available at http://gist.github.com/556888.

Basics

Task[T] represents a computation that produces a value of type T. To create a new task that has no inputs, use the task method. The argument to the task method is the code to run when the task is executed.

  lazy val hello: Task[Unit] =
    task { println("Hi!") }

  lazy val three: Task[Int] =
    task { 3 }

Note: the explicit type annotations in the examples are not necessary unless otherwise indicated. You can see the result of a task by running it with the show task. For example:

> show hello three
Hi!
hello: ()
three: 3

To declare a dependency on another task, use dependsOn. For example:

  lazy val goodbye: Task[Unit] = task { println("Goodbye") } dependsOn(hello)

To use the value from another task, use map:

  lazy val four: Task[Int] = three map { (t: Int) => t + 1 }

Or, a bit more concisely:

  lazy val four = three map { _ + 1 }

The next fundamental method is flatMap. It takes the result of one Task and uses it to select another Task, whose result becomes the result of the whole. For example, the following uses the output of the rand task to determine whether four or three should execute.

  lazy val rand: Task[Boolean] = task { math.random > 0.5 }
  lazy val num: Task[Int] = rand flatMap { (b: Boolean) => if(b) four else three }

If rand produces true, the four task is run and its result is the result of num. Otherwise, the three task runs and its result is used for the result of num. Note that the three task runs in both cases: directly when rand produces false, and indirectly when rand produces true, because four uses its result.
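As a rough mental model of map and flatMap, here is a sketch in plain Scala. It is not sbt's implementation: SimpleTask, threeRuns, and the choose parameter (a deterministic replacement for rand) are hypothetical names, and the sketch ignores scheduling and anything else the real engine does.

```scala
// Simplified stand-in for Task[T]; this is NOT sbt's implementation.
object BasicTaskSketch {
  // A task here is just a deferred computation that can be run on demand.
  final class SimpleTask[T](body: () => T) {
    def run(): T = body()
    // map: run this task, then transform its value.
    def map[U](f: T => U): SimpleTask[U] =
      new SimpleTask(() => f(run()))
    // flatMap: run this task, use its value to pick the next task, then run that one.
    def flatMap[U](f: T => SimpleTask[U]): SimpleTask[U] =
      new SimpleTask(() => f(run()).run())
  }

  def task[T](body: => T): SimpleTask[T] = new SimpleTask(() => body)

  // Counts how many times `three` executes.
  var threeRuns = 0

  val three: SimpleTask[Int] = task { threeRuns += 1; 3 }
  val four: SimpleTask[Int] = three.map(_ + 1)

  // `choose` stands in for the random Boolean so the outcome is predictable.
  def num(choose: Boolean): SimpleTask[Int] =
    task { choose }.flatMap(b => if (b) four else three)
}
```

In this model, BasicTaskSketch.num(true).run() gives 4 and BasicTaskSketch.num(false).run() gives 3, and three executes once in each case.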

Parallel

Those were the basic serial methods[1]. There are also versions of map and flatMap that operate on multiple input tasks[2]. They do not impose an ordering on the input tasks, so you might refer to them as parallel map and flatMap. For example:

  lazy val par: Task[Int] = (three, four) map { (t: Int, f: Int) => t * f }

This means "create a new Task that takes the results of three and four and multiplies them together". Note that for the parallel map/flatMap, explicit parameter types are required for the function literal[3]. In order to abstract over arity, the native data structures involved are heterogeneous lists. Implicit conversions translate the above syntax to:

  lazy val par: Task[Int] = three :^: four :^: KNil map { case t :+: f :+: HNil => t * f }

Because of this, the compiler cannot infer the type parameters for the standard function literal. These implicit conversions are available for arity 2 and 3. For arbitrary arity or if you want type inference for the mapping function, you need to use the heterogeneous list approach.
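The difference between a parallel map and nested flatMaps can be illustrated with the standard library's Future as an analogy. This is only a sketch under that assumption: Future is not Task, and map2 and sequenced are hypothetical helper names, not part of sbt.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Analogy only: scala.concurrent.Future stands in for Task.
object ParallelMapSketch {
  // Parallel style: both inputs already exist and neither waits for the other.
  def map2[A, B, C](a: Future[A], b: Future[B])(f: (A, B) => C): Future[C] =
    a.zip(b).map { case (x, y) => f(x, y) }

  // Nested flatMap style for comparison: `b` is only created after `a` completes,
  // so an ordering is imposed on the inputs.
  def sequenced[A, B, C](a: => Future[A], b: => Future[B])(f: (A, B) => C): Future[C] =
    a.flatMap(x => b.map(y => f(x, y)))

  def three: Future[Int] = Future(3)
  def four: Future[Int] = three.map(_ + 1)

  def par: Int = Await.result(map2(three, four)(_ * _), 5.seconds)
  def seq: Int = Await.result(sequenced(three, four)(_ * _), 5.seconds)
}
```

Both styles produce 12 here. The point is only that map2 leaves the inputs free to run concurrently, while sequenced does not.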

There are also methods to apply a function across a sequence in parallel, to combine the results of a sequence of tasks, or to reduce the results of a sequence of tasks.

  lazy val forkReduce: Task[Int] = (0 to 1000).fork( _ * 3).reduce { _ + _ }
  lazy val forkJoin: Task[Seq[Int]] = (0 to 1000).fork(_ * 3).join

(This is just an example. The overhead is too high for this to be faster than a direct computation.)
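To make the intent of fork, join, and reduce concrete, here is a sketch that again uses Future as a stand-in for Task. The helper names mirror the methods above, but their signatures are assumptions made for this illustration and are not sbt's API.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Analogy only: scala.concurrent.Future stands in for Task.
object ForkJoinSketch {
  // fork: one independent computation per element.
  def fork[A, B](xs: Seq[A])(f: A => B): Seq[Future[B]] =
    xs.map(x => Future(f(x)))

  // join: collect all results, keeping the input order.
  def join[B](ts: Seq[Future[B]]): Future[Seq[B]] =
    Future.sequence(ts)

  // reduce: combine all results with a binary operation.
  def reduce[B](ts: Seq[Future[B]])(op: (B, B) => B): Future[B] =
    join(ts).map(_.reduce(op))

  def forkReduce: Int =
    Await.result(reduce(fork(0 to 1000)(_ * 3))(_ + _), 10.seconds)

  def forkJoin: Seq[Int] =
    Await.result(join(fork(0 to 1000)(_ * 3)), 10.seconds)
}
```

The reduced value is 3 * (0 + 1 + ... + 1000) = 3 * 500500 = 1501500, and the joined sequence has 1001 elements ending in 3000.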

Error Handling

If a Task fails by throwing an Exception, tasks that need the failing task's result also fail. There are methods that are similar to Scala's catch/finally that modify this behavior.

task mapFailure f only succeeds if task fails, in which case the error handling function f is evaluated for the failure. Failure is represented by the Incomplete class. The main purpose of Incomplete is to track multiple sources of failure, since tasks form a graph. A simple example that prints the failing stack trace:

  lazy val fail: Task[Int] = task { error("A failure.") }
  lazy val catchLike: Task[Unit] = fail mapFailure { (t: Incomplete) => t.printStackTrace }

This just prints the stack trace of the top-level failure, but more useful error reporting would need to report the full graph of failures.

Like an exception in normal code, the failure propagates through the dependency graph. If all dependent tasks "catch" the failure, propagation of the failure ceases. As with catch, you must rethrow the exception if you want the catching function to fail as well. There is also flatMapFailure, which allows returning a new Task in the same way that flatMap does when the input succeeds. Both methods are also available in parallel. The parallel mapFailure/flatMapFailure succeeds if any input fails and provides a Seq[Incomplete] to the error handling function.

To always run some code regardless of the success of a task, use andFinally. This works like finally. The provided code is run and the result of the initial task, either a successful value or an exception, is propagated.

  lazy val alwaysRunA: Task[Int] = fail andFinally { println("Finally 1") }
  lazy val alwaysRunB: Task[Int] = four andFinally { println("Finally 2") }

alwaysRunA will print "Finally 1" and then propagate the exception thrown by fail. alwaysRunB will print "Finally 2" and then return the result of four, which is 4.
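The behavior of mapFailure and andFinally can be sketched with scala.util.Try. This is a simplified model, not sbt's code: SimpleTask is hypothetical, a plain Throwable stands in for Incomplete, and a log buffer replaces println so the effect can be inspected.

```scala
import scala.util.{Failure, Success, Try}

// Simplified model; a Throwable stands in for sbt's Incomplete.
object FailureSketch {
  final class SimpleTask[T](body: () => T) {
    // Running captures the outcome instead of throwing.
    def run(): Try[T] = Try(body())

    // Succeeds only if this task fails; f receives the cause.
    def mapFailure[U](f: Throwable => U): SimpleTask[U] =
      new SimpleTask(() =>
        run() match {
          case Failure(e) => f(e)
          case Success(_) => throw new IllegalStateException("expected a failure")
        })

    // Always runs `fin`, then propagates the original value or exception.
    def andFinally(fin: => Unit): SimpleTask[T] =
      new SimpleTask(() => try run().get finally fin)
  }

  def task[T](body: => T): SimpleTask[T] = new SimpleTask(() => body)

  // Records side effects in place of println.
  val log = scala.collection.mutable.ListBuffer.empty[String]

  val fail: SimpleTask[Int] = task[Int] { throw new RuntimeException("A failure.") }
  val four: SimpleTask[Int] = task { 4 }

  val catchLike: SimpleTask[String] = fail.mapFailure(e => "caught: " + e.getMessage)
  val alwaysRunA: SimpleTask[Int] = fail.andFinally { log += "Finally 1"; () }
  val alwaysRunB: SimpleTask[Int] = four.andFinally { log += "Finally 2"; () }
}
```

In this model, catchLike succeeds because fail fails, alwaysRunA still fails after recording "Finally 1", and alwaysRunB yields 4 after recording "Finally 2". Applying mapFailure to four produces a failing task.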

To chain tasks and ignore their outputs, use && or ||. For &&, if a fails, a && b fails and b is not run (unless, of course, another task requires b to run). Otherwise, the result of a is ignored and the final result is that of b.

  // succeeds because 'hello' succeeds.  The result is that of the final task, which is `three` here.
  lazy val allA: Task[Int] = hello && three

  // fails because 'fail' fails
  lazy val allB: Task[Unit] = fail && hello

For a || b, if a fails, the result of a || b is that of b. If a succeeds, the result is that of a and b is not run. Each of the following returns 4.

  lazy val orA: Task[Int] = fail || four
  lazy val orB: Task[Int] = four || fail
  lazy val orC: Task[Int] = four || three

Actually, ||, &&, map*, flatMap*, andFinally, reduce, and fork (that is, everything mentioned so far except task, dependsOn and join), are implemented in terms of two primitive functions (four, counting the parallel versions separately). Instead of requiring success and mapping on the resulting value or requiring failure and mapping on the cause, these always run and map on Result[T]. Result[T] is isomorphic to Either[Incomplete, T] and is used instead of Either for type inference[4].

The primitive functions are mapR and flatMapR, where the R is for Result[5].

  lazy val fullA: Task[Int] = four mapR {
    case Inc(i) => 0
    case Value(v) => v
  }

Inc[T] is like Left[Incomplete, T] and Value[T] is like Right[Incomplete, T].
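To illustrate how the other methods can be built from mapR and flatMapR, here is a sketch in plain Scala. It is an assumption-laden model rather than sbt's source: Incomplete is reduced to a message, Inc has no type parameter here, and SimpleTask, TaskFailed, failed, and constant are hypothetical names.

```scala
// Simplified model of Result[T] and the two primitives; NOT sbt's implementation.
object ResultSketch {
  // Stand-in for sbt's Incomplete; here it only carries a message.
  final case class Incomplete(message: String)

  sealed trait Result[+T]
  final case class Inc(cause: Incomplete) extends Result[Nothing]
  final case class Value[+T](value: T) extends Result[T]

  // Hypothetical exception used inside this model to signal task failure.
  final class TaskFailed(val reason: Incomplete) extends RuntimeException(reason.message)

  final class SimpleTask[T](body: () => T) {
    // Running never throws; failure is reported as Inc.
    def run(): Result[T] =
      try Value(body())
      catch {
        case f: TaskFailed => Inc(f.reason)
        case e: Exception  => Inc(Incomplete(Option(e.getMessage).getOrElse(e.toString)))
      }

    // Unwrap a successful value or fail with the recorded cause.
    private def get(): T = run() match {
      case Value(v) => v
      case Inc(i)   => throw new TaskFailed(i)
    }

    // Primitive 1: always runs and maps on the Result.
    def mapR[U](f: Result[T] => U): SimpleTask[U] =
      new SimpleTask(() => f(run()))

    // Primitive 2: always runs and picks the next task from the Result.
    def flatMapR[U](f: Result[T] => SimpleTask[U]): SimpleTask[U] =
      new SimpleTask(() => f(run()).get())

    // Everything below is derived from the two primitives.
    def map[U](f: T => U): SimpleTask[U] = mapR {
      case Value(v) => f(v)
      case Inc(i)   => throw new TaskFailed(i)
    }

    def mapFailure[U](f: Incomplete => U): SimpleTask[U] = mapR {
      case Inc(i)   => f(i)
      case Value(_) => throw new TaskFailed(Incomplete("expected a failure"))
    }

    def &&[U](next: SimpleTask[U]): SimpleTask[U] = flatMapR {
      case Value(_) => next
      case Inc(i)   => failed[U](i)
    }

    def ||(alt: SimpleTask[T]): SimpleTask[T] = flatMapR {
      case Value(v) => constant(v)
      case Inc(_)   => alt
    }
  }

  def task[T](body: => T): SimpleTask[T] = new SimpleTask(() => body)
  def constant[T](v: T): SimpleTask[T] = new SimpleTask(() => v)
  def failed[U](i: Incomplete): SimpleTask[U] = new SimpleTask[U](() => throw new TaskFailed(i))

  val three: SimpleTask[Int] = task { 3 }
  val four: SimpleTask[Int] = task { 4 }
  val fail: SimpleTask[Int] = task[Int] { throw new RuntimeException("A failure.") }

  val fullA: SimpleTask[Int] = four.mapR {
    case Inc(_)   => 0
    case Value(v) => v
  }
  val orA: SimpleTask[Int] = fail || four
  val orB: SimpleTask[Int] = four || fail
  val allB: SimpleTask[Int] = fail && three
}
```

In this model fullA, orA, and orB all produce Value(4), while allB produces an Inc carrying the original failure.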

Cross-building

Cross-building (running tasks against multiple configurations) is built into the task model. A task produces a result for a specific configuration. You define a configuration variable by declaring the type of value and a label for that value.

For example, the following declares two configuration variables. ScalaVersion ranges over Strings, while Number ranges over Ints. In the default sbt model, ScalaVersion will likely be predefined and most users won't have to worry about this part.

  val ScalaVersion = AttributeKey[String]("scala-version")
  val Number = AttributeKey[Int]("num")

You define the values to cross-build against with the cross task, which accepts the configuration and the values. For example:

  lazy val scalaVersion: Task[String] = cross(ScalaVersion)("2.7.7", "2.8.0")
  lazy val number: Task[Int] = cross(Number)(1,2,3)

Note that the result types are not Seq[String] and Seq[Int]. This means that cross-building is not present in the type of a task. Cross-building is handled mostly transparently after this point, with some restrictions.

The idea is that we use the methods presented above and the task system takes care of applying them across each configuration.

  lazy val printVersion: Task[Unit] = scalaVersion map println
  lazy val printNumber: Task[Unit] = number map println

  lazy val printBoth: Task[Unit] = (scalaVersion, number) map { (v: String, n: Int) => println(n + ": " + v) }

The task printBoth does a parallel map on scalaVersion and number. These have different configurations, though. scalaVersion produces a value for ScalaVersion values "2.7.7" and "2.8.0" and number produces a value for Number values 1, 2, and 3. The parallel map does a cross product of its inputs. So, we now have a task printBoth that ranges over these configurations:

  AttributeMap(Number -> 1, ScalaVersion -> "2.7.7")
  AttributeMap(Number -> 2, ScalaVersion -> "2.7.7")
  AttributeMap(Number -> 3, ScalaVersion -> "2.7.7")
  AttributeMap(Number -> 1, ScalaVersion -> "2.8.0")
  AttributeMap(Number -> 2, ScalaVersion -> "2.8.0")
  AttributeMap(Number -> 3, ScalaVersion -> "2.8.0")

Each configuration is represented by an AttributeMap, which is a map that ensures that a key of type AttributeKey[T] is mapped to a value of type T. If we then map on printBoth, we will cross build against each of these six configurations.

If you do something like:

  lazy val numA: Task[Int] = cross(Number)(1,2,3)
  lazy val numB: Task[Int] = cross(Number)(4,5,6)
  (numA, numB) map { ...}

sbt will complain that there are no compatible configurations. There is no result for numA for Number values 4,5,6. Similarly, numB has no result for 1,2,3. Therefore, this constraint cannot be satisfied. I don't expect this to come up much, since I don't know when you would declare different values to cross-build like this, except in error. Most people will probably just use the cross-building built into sbt by default. If anyone starts doing more interesting cross-builds, more details on this can be documented.

To pull the result of all cross-builds of a Task into an explicit value, use merge. This takes the tasks for all configurations and puts their results into Seq[(AttributeMap, T)]. For example:

  lazy val stringBoth: Task[String] = (scalaVersion, number) map { (v: String, n: Int) => n + ": " + v }
  lazy val merged: Task[Seq[(AttributeMap, String)]] = stringBoth.merge

The result of merged is a Seq that contains a mapping from each configuration (AttributeMap) to the result (here, String).
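The cross product, the compatibility constraint, and merge described above can be modeled in a few lines of plain Scala. This sketch makes several assumptions: a Map[String, Any] stands in for AttributeMap (losing the typed link between AttributeKey[T] and its value), a cross-built task is represented as one result per configuration, and Config, Crossed, compatible, and map2 are hypothetical names.

```scala
// Simplified model of cross-building; NOT sbt's implementation.
object CrossBuildSketch {
  // Stand-in for AttributeMap, keyed by the variable's label.
  type Config = Map[String, Any]

  // A cross-built task, modeled as one result per configuration.
  type Crossed[T] = Map[Config, T]

  def cross[T](label: String)(values: T*): Crossed[T] =
    values.map(v => (Map[String, Any](label -> v), v)).toMap

  // Two configurations are compatible when they agree on every shared variable.
  def compatible(a: Config, b: Config): Boolean =
    a.keySet.intersect(b.keySet).forall(k => a(k) == b(k))

  // Parallel map: combine every compatible pair of input configurations.
  def map2[A, B, C](ta: Crossed[A], tb: Crossed[B])(f: (A, B) => C): Crossed[C] = {
    val combined = for {
      (ca, a) <- ta.toSeq
      (cb, b) <- tb.toSeq
      if compatible(ca, cb)
    } yield (ca ++ cb, f(a, b))
    combined.toMap
  }

  // merge: expose the per-configuration results as an explicit value.
  def merge[T](t: Crossed[T]): Seq[(Config, T)] = t.toSeq

  val scalaVersion: Crossed[String] = cross("scala-version")("2.7.7", "2.8.0")
  val number: Crossed[Int] = cross("num")(1, 2, 3)

  val stringBoth: Crossed[String] = map2(scalaVersion, number)((v, n) => s"$n: $v")
  val merged: Seq[(Config, String)] = merge(stringBoth)

  // Disjoint values for the same variable leave no compatible configuration.
  val numA: Crossed[Int] = cross("num")(1, 2, 3)
  val numB: Crossed[Int] = cross("num")(4, 5, 6)
  val incompatible: Crossed[Int] = map2(numA, numB)(_ + _)
}
```

Here stringBoth has six entries, one per combination of Number and ScalaVersion, and incompatible is empty, which corresponds to the case where sbt reports that there are no compatible configurations.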

As a final note on cross-building, it is done at construction time. This places a limitation on returning cross-built tasks in a flatMap. You cannot introduce new configurations in a flatMap. Consider the following examples:

  lazy val onBoth = stringBoth flatMap { #1 }
  lazy val onNum = number flatMap { #2 }
  lazy val onHello = hello flatMap { #3 }

  • For #1, you can return a task that is not cross-built or a task that is cross-built against Number, ScalaVersion, or both. This is because stringBoth is built against configurations with both variables Number and ScalaVersion.
  • For #2, you can return a task cross-built against Number or a non-cross-built task because number is cross-built for Number.
  • For #3, you can only return a task that is not cross-built against anything because hello is not cross-built.
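One way to read the three cases above is as a subset rule: the task returned from a flatMap may only vary over configuration variables that the input task already varies over. The sketch below encodes that reading. The function allowedInFlatMap and the label sets are hypothetical and are not part of sbt.

```scala
// Hypothetical encoding of the rule; NOT how sbt checks it.
object FlatMapConfigSketch {
  // The returned task's variables must be a subset of the input task's variables.
  def allowedInFlatMap(inputVars: Set[String], returnedVars: Set[String]): Boolean =
    returnedVars.subsetOf(inputVars)

  val stringBothVars: Set[String] = Set("num", "scala-version") // stringBoth
  val numberVars: Set[String] = Set("num")                      // number
  val helloVars: Set[String] = Set.empty[String]                // hello is not cross-built
}
```

For example, allowedInFlatMap(numberVars, Set("scala-version")) is false, matching case #2, while allowedInFlatMap(stringBothVars, Set("num")) is true, matching case #1.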

Next part

  • Streams: a task I/O and logging system
  • Integration with processes: piping and redirection between tasks and processes
  • Controlling multi-project aggregation
  • Access to command line input

Notes

  1. You might notice that they give rise to a monad. See also scalaz's Promise.
  2. These are related to liftA2, liftA3, ... This usage is preferred to nested flatMaps (as in for comprehensions) because no ordering is imposed on the inputs and so they can be executed in parallel.
  3. This is a rough edge. Sorry.
  4. http://lampsvn.epfl.ch/trac/scala/ticket/2712
  5. Suggestions for better names welcome.
  6. Conceptually, a task is the combination of three monads: one for Task[_], one for Either[Incomplete, _], and one for Map[AttributeMap, _]. If it were fully represented in the type system, the type of a task might look like: Task[Map[AttributeMap, Either[Incomplete, T]]].