Modules

A PipeT module is a software component that performs a specific task. In most cases, a module consumes input and produces output. In addition, parameters may affect the module's behavior.

Each module has a module identifier, or URL. Examples of valid module URLs are:

  • sh:cat
  • sh:ls
  • pipeline:myPipeline
  • http://localhost/pipet/myModule
  • jar:/home/fred/modules/myModule.jar
  • jar:http://localhost/pipet/myModule.jar

The first part of the URL (before the first colon) indicates the protocol; the rest is the location of the module. How the location is interpreted depends on the protocol.
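As a minimal sketch of this split, assuming the protocol ends at the first colon (which the jar:http://... example above suggests), the following Python helper separates a module URL into its two parts. The name split_module_url is hypothetical and not part of PipeT.

```python
def split_module_url(url: str) -> tuple[str, str]:
    """Split a module URL into (protocol, location) at the first colon."""
    protocol, colon, location = url.partition(":")
    if not colon or not protocol:
        raise ValueError(f"not a module URL: {url!r}")
    return protocol, location


if __name__ == "__main__":
    for url in (
        "sh:cat",
        "pipeline:myPipeline",
        "jar:http://localhost/pipet/myModule.jar",
    ):
        print(split_module_url(url))
```

Splitting only at the first colon keeps a nested location such as http://localhost/pipet/myModule.jar intact for the jar protocol to interpret.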

Input and output

Modules communicate via input and output streams. These are binary data streams, so they can carry anything that is serializable. It is up to the receiving module to make sense of the data.

A module has a fixed number of input and output streams, and each of them has a name and a type. For example, a module PDF2Text which transforms PDF into text could have the following I/O:

  • one input stream of type "application/pdf", named "pdf"
  • one output stream of type "text/plain", named "txt"

Names should be unique and descriptive, indicating the role each stream plays for the module. Types should follow the MIME type system where possible.
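The PDF2Text example can be written down as a small data structure. This is only an illustrative sketch: StreamSpec and ModuleIO are hypothetical names, not PipeT classes, and the check assumes that stream names must be unique across a module's inputs and outputs together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSpec:
    name: str       # role of the stream, e.g. "pdf"
    mime_type: str  # MIME type, e.g. "application/pdf"


@dataclass(frozen=True)
class ModuleIO:
    inputs: tuple[StreamSpec, ...]
    outputs: tuple[StreamSpec, ...]

    def __post_init__(self) -> None:
        # Assumption: names are unique across inputs and outputs together.
        names = [spec.name for spec in self.inputs + self.outputs]
        if len(names) != len(set(names)):
            raise ValueError(f"stream names must be unique: {names}")


# The I/O of the PDF2Text module described above.
PDF2TEXT_IO = ModuleIO(
    inputs=(StreamSpec("pdf", "application/pdf"),),
    outputs=(StreamSpec("txt", "text/plain"),),
)
```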

Module protocols

By default, PipeT knows four types of modules:

  • sh: shell command
  • jar: a JAR (Java archive) file which contains a PipeT module class
  • http: a remote module via HTTP
  • pipeline: a preconfigured pipeline of multiple modules
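One way to picture the four default protocols is as a lookup table keyed by the part of the URL before the first colon. The sketch below is hypothetical (DEFAULT_PROTOCOLS and module_kind are not PipeT names) and says nothing about how PipeT registers protocols internally.

```python
# Hypothetical table of the four default protocols listed above.
DEFAULT_PROTOCOLS = {
    "sh": "shell command",
    "jar": "JAR file containing a PipeT module class",
    "http": "remote module via HTTP",
    "pipeline": "preconfigured pipeline of multiple modules",
}


def module_kind(url: str) -> str:
    """Return a description of the module type that a URL refers to."""
    protocol = url.partition(":")[0]
    try:
        return DEFAULT_PROTOCOLS[protocol]
    except KeyError:
        raise ValueError(f"unknown module protocol: {protocol!r}") from None
```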

Shell command modules

The location part of a shell command module's URL is the command itself. For example, sh:cat is the URL of the module which calls the shell command cat.

A shell command module always has exactly one input stream and one output stream, named stdin and stdout. They are linked to the process's standard input and standard output, respectively.
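The behavior described above can be approximated in a few lines of Python. This is a sketch, not PipeT's implementation: run_shell_module is a hypothetical helper, it assumes a POSIX system where the command (here cat) is on the PATH, and it splits the location into arguments with shlex rather than passing it to a shell.

```python
import shlex
import subprocess


def run_shell_module(url: str, stdin_data: bytes) -> bytes:
    """Run the command named in an sh: URL, feeding stdin and returning stdout."""
    protocol, _, command = url.partition(":")
    if protocol != "sh" or not command:
        raise ValueError(f"not a shell command module URL: {url!r}")
    # Assumption: the location is a plain command line that shlex can split.
    result = subprocess.run(
        shlex.split(command),
        input=stdin_data,        # becomes the process's standard input
        stdout=subprocess.PIPE,  # captured as the module's "stdout" stream
        check=True,              # raise if the command exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    print(run_shell_module("sh:cat", b"hello\n"))
```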

See also: creating a shell command module.

JAR modules

JAR modules offer the most flexible interface available. See: using the Java module API.

Remote modules

A remote module is identified by a URL starting with http:. The module can be accessed from an ordinary browser or from PipeT. For details about the protocol, see: module RPC.

Pipelines

A pipeline is a set of linked modules. Pipelines are configured within a pipeline setup. Given a pipeline setup, you can use any of its pipelines as a module by specifying the name of the pipeline as the location. For example, if a pipeline named myPipeline is configured in a pipeline setup, it can be used as a module via the URL pipeline:myPipeline.
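To illustrate the idea of a pipeline acting as a module, the sketch below makes two simplifying assumptions: every module has one input and one output stream, and the modules are linked in a straight line. All names (Module, link, PIPELINE_SETUP, resolve_pipeline) are hypothetical; the actual pipeline setup format is not described here.

```python
from typing import Callable, Dict, List

# Simplification: a module maps one binary input to one binary output.
Module = Callable[[bytes], bytes]


def link(modules: List[Module]) -> Module:
    """Chain modules so that each output feeds the next module's input."""
    def pipeline(data: bytes) -> bytes:
        for module in modules:
            data = module(data)
        return data
    return pipeline


# Hypothetical pipeline setup: pipeline name -> linked modules.
PIPELINE_SETUP: Dict[str, Module] = {
    "myPipeline": link([bytes.upper, bytes.strip]),
}


def resolve_pipeline(url: str) -> Module:
    """Look up the pipeline named in a pipeline: URL and return it as a module."""
    protocol, _, name = url.partition(":")
    if protocol != "pipeline":
        raise ValueError(f"not a pipeline URL: {url!r}")
    try:
        return PIPELINE_SETUP[name]
    except KeyError:
        raise ValueError(f"no pipeline named {name!r} in the setup") from None
```

Because the linked pipeline has the same shape as a single module, the caller does not need to know whether pipeline:myPipeline wraps one module or several.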