090p3tour  

Introduction

The second preview introduced the new task engine for 0.9.x. This preview expands on some features built on top of this new task model. These are:

  • Access to command line input
  • Streams: a task I/O and logging system
  • Integration with processes: piping and redirection between tasks and processes

These provided sufficient material for one post, so multi-project improvements will be covered in a follow-up. As before, type annotations are optional unless otherwise indicated. The context for all of the example code is:

   import sbt._
   import std._
   import java.io._
   import Process._

   class Test(val info: ProjectInfo) extends TestProject
   {
      ...
   }

To follow along:

  1. Set up sbt 0.9 as indicated in the first preview
  2. Create a new project (type 'xsbt' and enter 's' for scratch)
  3. Save the project definition template above in .sbt/Test.scala (.sbt is an experiment) or project/build/Test.scala
  4. Run interactive mode ('xsbt shell')
  5. Run the 'loadp' command to load the project definition initially and to reload it after changes.
For example:

$ xsbt
Project does not exist, create new project? (y/N/s) s
Getting Scala 2.8.0 ...
:: retrieving :: org.scala-tools.sbt#boot-scala
        confs: [default]
        2 artifacts copied, 0 already retrieved (14484kB/83ms)
Getting org.scala-tools.sbt sbt_2.8.0 0.9.0-SNAPSHOT ...
:: retrieving :: org.scala-tools.sbt#boot-app
        confs: [default]
        29 artifacts copied, 0 already retrieved (3726kB/36ms)
$ xsbt shell
> loadp

Command line input

The command line is available as the result of the input task, which has type Task[Input]. The Input interface looks like:

trait Input {
   // the full, unparsed command line
   val line: String
   // the command name
   val name: String
   // everything after the command name
   val arguments: String
   // 'arguments' split around whitespace, no escaping supported
   val splitArgs: Seq[String]
}

line is the full input String; the other members are conveniences derived from it[1].
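
As a rough illustration only (this is not sbt's actual implementation), the convenience fields can be derived from line like this. The command name is simplified here to an initial alphanumeric run; per the note below, the real parser also accepts an initial run of symbolic characters.

```scala
// Illustrative sketch, not sbt's implementation: deriving the Input
// convenience fields from the raw command line.
object InputSketch {
  final case class SimpleInput(line: String) {
    // the command name: initial run of alphanumeric characters
    val name: String = line.takeWhile(_.isLetterOrDigit)
    // everything after the command name
    val arguments: String = line.drop(name.length).trim
    // 'arguments' split around whitespace, no escaping supported
    val splitArgs: Seq[String] =
      if (arguments.isEmpty) Seq.empty else arguments.split("""\s+""").toSeq
  }

  def main(args: Array[String]): Unit = {
    val in = SimpleInput("hiAll A B")
    println(in.name)       // hiAll
    println(in.arguments)  // A B
  }
}
```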

Grab the Input instance from the input task using the functions shown in the previous article, such as map:

   lazy val hi =
      input map { (in: Input) =>
         println("Hi " + in.arguments)
      }

   lazy val hiAll =
      input map { (in: Input) =>
         in.splitArgs foreach( (arg: String) => println("Hi " + arg) )
      }

Sample usage of these tasks is:

 > hi A B
 Hi A B

 > hiAll A B
 Hi A
 Hi B

Because the input task is just a task, any task has access to the command line. It remains to be seen whether this will be useful or confusing in practice. However, it does loosen some restrictions imposed on method tasks, the predecessor of this approach: a method task could not be a dependency of other tasks, and only the method task on the current project would be executed. Both of these restrictions are lifted, as demonstrated later.

Streams

There is a new system for logging that I call Streams (like Input/OutputStreams, not Scala's lazy list Stream type). It has several advantages, which will be demonstrated here. Similar to the command line input task, there is a task called streams that provides a TaskStreams instance. Unlike the input task, the provided TaskStreams is specific to the retrieving task[2]. This TaskStreams provides access to managed logging and persisted text and binary input and output streams. Any I/O streams used by the task are automatically closed when the task finishes executing[3].

The TaskStreams interface looks like:

sealed trait TaskStreams
{
   def log: Logger = log(default)
   def log(sid: String): Logger

   def text(sid: String = default): PrintWriter
   def binary(sid: String = default): BufferedOutputStream

   def readText(a: Task[_], sid: String = default): Task[BufferedReader]
   def readBinary(a: Task[_], sid: String = default): Task[BufferedInputStream]

   val default = "out"
}

Logging

We'll start by looking at logging. We grab a TaskStreams instance and log something to the default Logger.

   lazy val logDemo = streams map { (s: TaskStreams) =>
      s.log.info("Testing...")
   }

The first benefit is that the default Logger not only logs to the screen, but also to a file. The pre-defined last command lets us view the previous output.

  > logDemo
    [info] Testing...
  > last logDemo
    [info] Testing...

This is useful when multiple tasks produce a lot of output. Additionally, different logging levels can be set for printing to the screen and for persisting the messages: the screen level could be quiet while the persisted level stays verbose, so that detail is available afterward when you need it. Note that last is not special, merely predefined; you could write an equivalent command yourself, although how to do so is not shown here.
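
The screen/persisted split can be sketched with a toy logger (hypothetical, not sbt's Logger API): one threshold controls what reaches the console, and a second, more verbose one controls what is persisted for later replay.

```scala
// Toy sketch of split logging levels; not sbt's Logger API.
object SplitLogSketch {
  object Level extends Enumeration { val Debug, Info, Warn = Value }

  // screenLevel can be quiet while persistLevel stays verbose
  final class SplitLogger(screenLevel: Level.Value, persistLevel: Level.Value) {
    private val persisted = new StringBuilder
    def log(level: Level.Value, msg: String): Unit = {
      if (level >= screenLevel) println("[" + level + "] " + msg)
      if (level >= persistLevel) persisted.append("[" + level + "] " + msg + "\n")
    }
    // what a 'last'-style command would replay
    def replay: String = persisted.toString
  }

  def main(args: Array[String]): Unit = {
    import Level._
    val log = new SplitLogger(screenLevel = Warn, persistLevel = Debug)
    log.log(Debug, "detail")  // persisted only, not printed
    log.log(Warn, "problem")  // printed and persisted
    print(log.replay)
  }
}
```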

The Streams model allows multiple streams, each labeled with a String identifier. You can send logging to other streams by using the log method that accepts a stream id (sid).

   lazy val logExtra = streams map { (s: TaskStreams) =>
      val log: Logger = s.log("extra")
      log.info("Testing extra...")
   }

If you run 'last logExtra', nothing is printed to the screen because 'logExtra' didn't send anything to the default stream. So, the 'last' command is currently simple, but it could be modified to show a particular stream instead of the default output stream.

Text and Binary I/O

Basics

In addition to logging, a task can send arbitrary text and binary data to a stream.

   lazy val echo =
      (streams, input) map { (s: TaskStreams, in: Input) =>
         val out: PrintWriter = s.text()
         out.println(in.arguments)
      }

   lazy val hello =
      (streams, input) map { (s: TaskStreams, in: Input) =>
         val out: PrintWriter = s.text()
         in.splitArgs foreach { arg => out.println("Hi " + arg) }
      }

text() uses the default stream. text("id") would use the stream labeled id.

Example usage:

 > echo some input
 > hello some input
 > last echo
 some input
 > last hello
 Hi some
 Hi input

The data sent to a stream can then be read from other tasks. The following reads the output stream of the echo task just defined and sends it to standard output.

   lazy val goodbye = echo text { (in: BufferedReader) =>
      println("Goodbye " + in.readLine())
   }

Example output:

> goodbye for now
Goodbye for now

One thing to note here is that goodbye is not processing the command line. goodbye depends on echo, which is the one processing the arguments. To see this, use 'last':

> last goodbye
> last echo
for now

As mentioned in the command line input section, I'm not sure if this is useful or confusing in practice. Finally, if echo sent its output to a different stream, you would need to pass that stream id to text.
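
A minimal sketch of the persistence mechanism described above (hypothetical, not sbt's code): each (task, stream id) pair maps to a backing file, writes go through a PrintWriter, and because each read opens the file fresh, any number of tasks can read the same stream.

```scala
import java.io._

// Hypothetical sketch of a file-backed stream store; not sbt's code.
// Each (task name, stream id) pair is backed by a file on disk.
object StreamStoreSketch {
  private val dir = {
    val d = File.createTempFile("streams", "")
    d.delete(); d.mkdirs(); d
  }
  private def backing(task: String, sid: String) = new File(dir, task + "." + sid)

  // open the persisted text stream of a task for writing
  def text(task: String, sid: String = "out"): PrintWriter =
    new PrintWriter(new FileWriter(backing(task, sid)))

  // each call opens a fresh reader, so streams are not consumed
  def readText(task: String, sid: String = "out"): BufferedReader =
    new BufferedReader(new FileReader(backing(task, sid)))

  def main(args: Array[String]): Unit = {
    val out = text("echo")
    out.println("some input")
    out.close()
    println("Goodbye " + readText("echo").readLine())  // Goodbye some input
    println("Again " + readText("echo").readLine())    // Again some input
  }
}
```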

Streams and Process API

There is some integration between the Process API and Streams. You can pipe between Tasks and Processes and the output of a Process can be persisted like a Task. Note that there is an overhead to this compared to piping between two real Processes due to reading/writing from an intermediate backing File.

   // the exit code of the 'grep' process is the result of the 'grep' task.
   lazy val grep: Task[Int] =
      hello #| "grep me"

   // 'lines' is a method of type Task[Seq[String]]
   //  that gets the lines of output from a task, here from 'grep'
   lazy val grepLines =
      grep.lines map( (ls: Seq[String]) => println(ls) )

   lazy val grepToFile = grep #> new File("test")

Example output:

 > grep you me someone
 > last grep
 Hi someone
 Hi me
 > grepLines you me someone
 List(Hi me, Hi someone)
 > grepToFile you me someone
 $ cat test
 Hi me
 Hi someone
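
The mechanism above can be emulated with the standard library's scala.sys.process (a descendant of sbt's Process API): the task's output is persisted to a backing file, and the external process reads its standard input from that file. This sketch assumes a Unix-like grep is available on the PATH.

```scala
import java.io.{File, PrintWriter}
import scala.sys.process._

// Sketch of piping a persisted task stream into a process.
// Assumes a 'grep' executable on the PATH.
object ProcessPipeSketch {
  def grepStream(lines: Seq[String], pattern: String): String = {
    // simulate a task persisting its output stream to a backing file
    val backing = File.createTempFile("stream", ".txt")
    val out = new PrintWriter(backing)
    lines.foreach(out.println)
    out.close()
    // redirect the backing file into the external process's stdin
    (Seq("grep", pattern) #< backing).!!
  }

  def main(args: Array[String]): Unit =
    print(grepStream(Seq("Hi you", "Hi me", "Hi someone"), "me"))
}
```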

Streams and SBinary

As a final example, let's store some binary data using SBinary. This library is now distributed with sbt and is available for use in project definitions (it is used by sbt itself for caching data).

      import sbinary._
      import JavaIO._
      import DefaultProtocol._

   lazy val write = streams map { (s: TaskStreams) =>
      // define some data
      val myData = (3, "asdf")
      // get the stream to write to
      val out: BufferedOutputStream = s.binary("data")
      // write the data to the stream using SBinary
      Operations.write(out, myData)
   }

   // get the 'data' stream for the 'write' task
   lazy val read = write.binary("data") { (stream: BufferedInputStream) =>
      // read in the data using SBinary
      val (i, st) = Operations.read[(Int,String)]( stream )
      // show it
      println("i: " + i)
      println("st: " + st)
   }

Example output:

> read                     
i: 3
st: asdf

As with a task's result, a task's streams can be read by multiple tasks (a stream is not consumed by the first task to read from it), so you could have several read-like tasks in the same execution.

For the last example with SBinary, write could have just returned (3, "asdf") as its result. Also, in the examples shown so far, reading a task's streams forces that task to run first. Because the stream is persisted, this is not a strict requirement. The task could be a 'soft' dependency, where it would only run if some other task needs it to. This is how 'last' is implemented. However, because 'flatMap' was included in the task model, it is difficult to properly integrate soft dependencies and so they are not really ready for general use. Another pending addition is access to the underlying File. This would allow Streams to be used as the backing for caches.
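
For comparison, a round trip like the SBinary one above can be written with only the JDK's DataOutputStream and DataInputStream. This is a stand-in sketch of the same idea, not what sbt or SBinary do internally:

```scala
import java.io._

// Stand-in sketch using only the JDK: write an (Int, String) pair to
// a binary stream and read it back, mirroring the SBinary example.
object BinaryRoundTrip {
  def write(out: OutputStream, data: (Int, String)): Unit = {
    val d = new DataOutputStream(out)
    d.writeInt(data._1)
    d.writeUTF(data._2)
    d.flush()
  }

  def read(in: InputStream): (Int, String) = {
    val d = new DataInputStream(in)
    (d.readInt(), d.readUTF())
  }

  def main(args: Array[String]): Unit = {
    val buffer = new ByteArrayOutputStream
    write(buffer, (3, "asdf"))
    val (i, st) = read(new ByteArrayInputStream(buffer.toByteArray))
    println("i: " + i)    // i: 3
    println("st: " + st)  // st: asdf
  }
}
```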

Notes

  1. The command name is the initial, contiguous run of either symbolic characters or alphanumeric characters (not a mix of the two).
  2. This is accomplished by a transformation on tasks that depend on streams. Several features of the task engine are implemented this way.
  3. This means you can't pass around a reference to the InputStream/Logger/OutputStream, although that would be considered poor practice anyway.