Package com.hazelcast.jet.pipeline

The Pipeline API is Jet's high-level API to build and execute distributed computation jobs. It models the computation using an analogy with a system of interconnected water pipes. The data flows from the pipeline's sources to its sinks. Pipes can bifurcate and merge, but there can't be any closed loops (cycles).

The basic element is a pipeline stage, which can be attached to one or more other stages, both in the upstream and the downstream direction. A stage accepts the data coming from its upstream stages, transforms it, and directs the resulting data to its downstream stages.
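
For example, the following sketch (the IList names "lines", "errors", and "other" are illustrative assumptions, and the API names follow Jet 3.x) builds one source stage that feeds two downstream stages, so the pipe bifurcates without forming a cycle:

    import com.hazelcast.jet.pipeline.BatchStage;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    // One upstream stage attached to two downstream stages: the pipe
    // bifurcates, but the overall shape stays acyclic.
    Pipeline p = Pipeline.create();
    BatchStage<String> lines = p.drawFrom(Sources.<String>list("lines"));
    lines.filter(line -> line.startsWith("ERROR")).drainTo(Sinks.list("errors"));
    lines.filter(line -> !line.startsWith("ERROR")).drainTo(Sinks.list("other"));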

Kinds of transformation performed by pipeline stages

Basic

Basic transformations have a single upstream stage and statelessly transform individual items in it. Examples are map, filter, and flatMap.
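
As a minimal sketch (Jet 3.x API names, with assumed IList names "lines" and "words"), the three basic transforms compose like this:

    import com.hazelcast.jet.Traversers;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    Pipeline p = Pipeline.create();
    p.drawFrom(Sources.<String>list("lines"))
     .map(String::toLowerCase)                                       // stateless one-to-one transform
     .filter(line -> !line.isEmpty())                                // keep only items that pass the predicate
     .flatMap(line -> Traversers.traverseArray(line.split("\\W+")))  // one-to-many transform
     .drainTo(Sinks.list("words"));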

Grouping and aggregation

The aggregate*() transformations perform an aggregate operation on a set of items. You can call stage.groupingKey() to group the items by a key, and then Jet will aggregate each group separately. For stream stages you must also specify a stage.window(), which transforms the infinite stream into a series of finite windows. If you specify more than one input stage for the aggregation (using stage.aggregate2(), stage.aggregate3(), or stage.aggregateBuilder()), the data from all the streams will be combined into the aggregation result. The AggregateOperation you supply must define a separate accumulate primitive for each contributing stream. Refer to its Javadoc for further details.
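
The following sketch shows a keyed aggregation, first over a batch stage and then over a stream stage with a tumbling window. The source/sink names, the Trade type, and its getTicker() accessor are assumptions for illustration (Jet 3.x API names):

    import java.util.Map.Entry;

    import com.hazelcast.jet.aggregate.AggregateOperations;
    import com.hazelcast.jet.pipeline.JournalInitialPosition;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;
    import com.hazelcast.jet.pipeline.WindowDefinition;

    Pipeline p = Pipeline.create();

    // Batch: count the occurrences of each distinct word in an IList.
    p.drawFrom(Sources.<String>list("words"))
     .groupingKey(word -> word)                        // group the items by a key
     .aggregate(AggregateOperations.counting())        // Jet aggregates each group separately
     .drainTo(Sinks.map("wordCounts"));                // one Entry<String, Long> per key

    // Stream: count trades per ticker in tumbling one-second windows,
    // reading from an IMap's event journal (assumed map name "trades").
    p.drawFrom(Sources.<String, Trade>mapJournal("trades", JournalInitialPosition.START_FROM_OLDEST))
     .withIngestionTimestamps()                        // timestamps are needed before windows can close
     .map(Entry::getValue)
     .groupingKey(Trade::getTicker)
     .window(WindowDefinition.tumbling(1_000))         // series of finite 1-second windows
     .aggregate(AggregateOperations.counting())
     .drainTo(Sinks.logger());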

Hash-join

Hash-join is a special kind of joining transform, tailored to the use case of data enrichment. It is an asymmetric join that joins one or more enriching stages to the primary stage. The source for an enriching stage is typically a key-value store (such as a Hazelcast IMap). It must be a batch stage, and each of its items must have a distinct join key. The primary stage, on the other hand, may be either a batch or a stream stage and may contain duplicate keys.

For each of the enriching stages there is a separate pair of functions to extract the joining key on both sides. For example, a Trade can be joined with both a Broker on trade.getBrokerId() == broker.getId() and a Product on trade.getProductId() == product.getId(), and all this can happen in a single hash-join transform.
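
A sketch of that enrichment with the Jet 3.x hash-join API; the Trade, Broker, and Product types, their accessors, and the assumption that the "brokers" and "products" IMaps are keyed by broker id and product id are all illustrative:

    import java.util.Map.Entry;

    import com.hazelcast.jet.datamodel.Tuple3;
    import com.hazelcast.jet.pipeline.BatchStage;
    import com.hazelcast.jet.pipeline.JoinClause;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    Pipeline p = Pipeline.create();
    BatchStage<Trade> trades = p.drawFrom(Sources.<Trade>list("trades"));
    BatchStage<Entry<Long, Broker>> brokers = p.drawFrom(Sources.<Long, Broker>map("brokers"));
    BatchStage<Entry<Long, Product>> products = p.drawFrom(Sources.<Long, Product>map("products"));

    // One hash-join transform joining the primary "trades" stage with two
    // enriching stages: trade.getBrokerId() matches the brokers map key and
    // trade.getProductId() matches the products map key.
    BatchStage<Tuple3<Trade, Broker, Product>> enriched = trades.hashJoin2(
            brokers, JoinClause.joinMapEntries(Trade::getBrokerId),
            products, JoinClause.joinMapEntries(Trade::getProductId),
            Tuple3::tuple3);
    enriched.drainTo(Sinks.list("enrichedTrades"));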

The implementation of the hash-join transform is optimized for throughput: each computing member keeps a local copy of all the enriching data, stored in hash tables (hence the name). The enriching streams are consumed in full before any data is ingested from the primary stream.

Copyright © 2019 Hazelcast, Inc. All rights reserved.