Interface Pipeline


public interface Pipeline
Models a distributed computation job using an analogy with a system of interconnected water pipes. The basic element is a stage which can be attached to one or more other stages. A stage accepts the data coming from its upstream stages, transforms it, and directs the resulting data to its downstream stages.

The Pipeline object is a container of all the stages defined on a pipeline: the source stages obtained directly from it by calling readFrom(BatchSource) as well as all the stages attached (directly or indirectly) to them.

Note that there is no simple one-to-one correspondence between pipeline stages and Core API's DAG vertices. Some stages map to several vertices (e.g., grouping and co-grouping are implemented as a cascade of two vertices) and some stages may be merged with others into a single vertex (e.g., a cascade of map/filter/flatMap stages can be fused into one vertex).

Since:
3.0
  • Method Summary

    Modifier and Type Method Description
    static Pipeline create()
    Creates a new, empty pipeline.
    <T> BatchStage<T> readFrom​(BatchSource<? extends T> source)
    Returns a pipeline stage that represents a bounded (batch) data source.
    <T> StreamSourceStage<T> readFrom​(StreamSource<? extends T> source)
    Returns a pipeline stage that represents an unbounded data source (i.e., an event stream).
    DAG toDag()
    Transforms the pipeline into a Jet DAG, which can be submitted for execution to a Jet instance.
    String toDotString()
    Returns a DOT format (graphviz) representation of the Pipeline.
    <T> SinkStage writeTo​(Sink<? super T> sink, GeneralStage<? extends T> stage0, GeneralStage<? extends T> stage1, GeneralStage<? extends T>... moreStages)
    Attaches the supplied sink to two or more pipeline stages.
  • Method Details

    • create

      @Nonnull static Pipeline create()
      Creates a new, empty pipeline.
      Since:
      3.0
    • readFrom

      @Nonnull <T> BatchStage<T> readFrom​(@Nonnull BatchSource<? extends T> source)
      Returns a pipeline stage that represents a bounded (batch) data source. It has no upstream stages and emits the data (typically coming from an outside source) to its downstream stages.
      Type Parameters:
      T - the type of source data items
      Parameters:
      source - the definition of the source from which the stage reads data
    • readFrom

      @Nonnull <T> StreamSourceStage<T> readFrom​(@Nonnull StreamSource<? extends T> source)
      Returns a pipeline stage that represents an unbounded data source (i.e., an event stream). It has no upstream stages and emits the data (typically coming from an outside source) to its downstream stages.
      Type Parameters:
      T - the type of source data items
      Parameters:
      source - the definition of the source from which the stage reads data
    • writeTo

      @Nonnull <T> SinkStage writeTo​(@Nonnull Sink<? super T> sink, @Nonnull GeneralStage<? extends T> stage0, @Nonnull GeneralStage<? extends T> stage1, @Nonnull GeneralStage<? extends T>... moreStages)
      Attaches the supplied sink to two or more pipeline stages. Returns the SinkStage representing the sink. You need this method when you want to drain more than one stage to the same sink. In the typical case you'll use GeneralStage.writeTo(Sink) instead.
      Type Parameters:
      T - the type of data being drained to the sink
    • toDag

      @Nonnull DAG toDag()
      Transforms the pipeline into a Jet DAG, which can be submitted for execution to a Jet instance.
    • toDotString

      @Nonnull String toDotString()
      Returns a DOT format (graphviz) representation of the Pipeline.