Interface Stage

All Known Subinterfaces:
BatchStage<T>, GeneralStage<T>, SinkStage, StreamStage<T>

public interface Stage
The basic element of a Jet pipeline, represents a computation step. It accepts input from its upstream stages (if any) and passes its output to its downstream stages (if any). Jet differentiates between batch stages that represent finite data sets (batches) and stream stages that represent infinite data streams. Some operations only make sense on a batch stage and vice versa.

To build a pipeline, start with pipeline.readFrom() to get the initial stage and then use its methods to attach further downstream stages. Terminate the pipeline by calling stage.writeTo(sink), which will attach a SinkStage.

Since:
3.0
  • Method Summary

    Modifier and Type Method Description
    Pipeline getPipeline()
    Returns the Pipeline this stage belongs to.
    String name()
    Returns the name of this stage.
    Stage setLocalParallelism​(int localParallelism)
    Sets the preferred local parallelism (number of processors per Jet cluster member) this stage will configure its DAG vertices with.
    Stage setName​(String name)
    Overrides the default name of the stage with the name you choose and returns the stage.
  • Method Details

    • getPipeline

      Pipeline getPipeline()
      Returns the Pipeline this stage belongs to.
    • setLocalParallelism

      @Nonnull Stage setLocalParallelism​(int localParallelism)
      Sets the preferred local parallelism (number of processors per Jet cluster member) this stage will configure its DAG vertices with. Jet always uses the same number of processors on each member, so the total parallelism automatically increases if another member joins the cluster.

      While most stages are backed by 1 vertex, there are exceptions. If a stage uses two vertices, each of them will have the given local parallelism, so in total there will be twice as many processors per member.

      The default value is -1 and it signals to Jet to figure out a default value. Jet will determine the vertex's local parallelism during job initialization from the global default and the processor meta-supplier's preferred value.

      Returns:
      this stage
    • setName

      @Nonnull Stage setName​(@Nonnull String name)
      Overrides the default name of the stage with the name you choose and returns the stage. This can be useful for debugging purposes, to better distinguish pipeline stages in the diagnostic output.
      Parameters:
      name - the stage name
      Returns:
      this stage
    • name

      Returns the name of this stage. It's used in diagnostic output.