public interface Processor
Processor
on each cluster member to do the work of a given vertex. The
vertex's localParallelism
property controls the number of
processors per member.
The processor is a single-threaded processing unit that performs the computation needed to transform zero or more input data streams into zero or more output streams. Each input/output stream corresponds to an edge on the vertex. The correspondence between a stream and an edge is established via the edge's ordinal.
The special case of zero input streams applies to a source vertex, which gets its data from the environment. The special case of zero output streams applies to a sink vertex, which pushes its data to the environment.
The processor accepts input from instances of Inbox
and pushes
its output to an instance of Outbox
.
See the isCooperative()
for important restrictions to how the
processor should work.
Modifier and Type | Interface and Description |
---|---|
static interface |
Processor.Context
Context passed to the processor in the
init() call. |
Modifier and Type | Method and Description |
---|---|
default void |
close()
Called as the last method in the processor lifecycle.
|
default boolean |
complete()
Called after all the inbound edges' streams are exhausted.
|
default boolean |
completeEdge(int ordinal)
Called after the edge input with the supplied
ordinal is
exhausted. |
default boolean |
finishSnapshotRestore()
Called after a job was restarted from a snapshot and the processor
has consumed all the snapshot data.
|
default void |
init(Outbox outbox,
Processor.Context context)
Initializes this processor with the outbox that the processing methods
must use to deposit their output items.
|
default boolean |
isCooperative()
Tells whether this processor is able to participate in cooperative
multithreading.
|
default void |
process(int ordinal,
Inbox inbox)
Called with a batch of items retrieved from an inbound edge's stream.
|
default void |
restoreFromSnapshot(Inbox inbox)
Called when a batch of items is received during the "restore from
snapshot" operation.
|
default boolean |
saveToSnapshot()
Stores its snapshotted state by adding items to the outbox's snapshot bucket.
|
default boolean |
tryProcess()
This method will be called periodically and only when the current batch
of items in the inbox has been exhausted.
|
default boolean |
tryProcessWatermark(Watermark watermark)
Tries to process the supplied watermark.
|
default boolean isCooperative()
A cooperative processor should also not attempt any blocking operations,
such as I/O operations, waiting for locks/semaphores or sleep
operations. Violations of this rule will manifest as less than 100% CPU
usage under maximum load (note that this is possible for other reasons too,
for example if the network is the bottleneck or if parking time is too high).
The processor must also return as soon as the outbox rejects an item
(that is when the offer()
method returns
false
).
If this processor declares itself cooperative, it will share a thread with other cooperative processors. Otherwise it will run in a dedicated Java thread.
Jet prefers cooperative processors because they result in a greater overall throughput. A processor should be non-cooperative only if it involves blocking operations, which would cause all other processors on the same shared thread to starve.
Processor instances of a single vertex are allowed to return different values, but a single processor instance must always return the same value.
The default implementation returns true
.
default void init(@Nonnull Outbox outbox, @Nonnull Processor.Context context) throws Exception
process(int, Inbox)
and complete()
).
The default implementation does nothing.
Exception
default void process(int ordinal, @Nonnull Inbox inbox)
If the method returns with items still present in the inbox, it will be called again before proceeding to call any other methods. There is at least one item in the inbox when this method is called.
The default implementation throws an exception. It is suitable for source processors.
ordinal
- ordinal of the inbound edgeinbox
- the inbox containing the pending itemsdefault boolean tryProcessWatermark(@Nonnull Watermark watermark)
The implementation may choose to process only partially and return
false
, in which case it will be called again later with the same
timestamp
before any other processing method is called. Before
the method returns true
, it should emit the watermark to the
downstream processors.
The default implementation throws an exception. For any non-sink
processor you must provide an implementation that at least forwards the
watermark. A sink processor may simply return true
.
watermark
- watermark to be processedtrue
if this watermark has now been processed,
false
otherwise.default boolean tryProcess()
If the call returns false
, it will be called again before
proceeding to call any other method. Default implementation returns
true
.
If this method tried to offer to the outbox and the offer call returned false, this method must also return false and retry to offer in the next call.
default boolean completeEdge(int ordinal)
ordinal
is
exhausted. If it returns false
, it will be called again before
proceeding to call any other method.
If this method tried to offer to the outbox and the offer call returned false, this method must also return false and retry the offer in the next call.
true
if the processor is now done completing the edge,
false
otherwise.default boolean complete()
false
, it will be invoked again until it returns true
.
For example, a streaming source processor will return false
forever. Unlike other methods which guarantee that no other method is
called until they return true
, saveToSnapshot()
can be
called even though this method returned false
. If you returned
because Outbox.offer()
returned false
, make sure to
first offer the pending item to the outbox in saveToSnapshot()
before continuing to offer to
snapshot.
After this method is called, no other processing methods are called on
this processor, except for saveToSnapshot()
.
Non-cooperative processors are required to return from this method from time to time to give the system a chance to check for snapshot requests and job cancellation. The time the processor spends in this method affects the latency of snapshots and job cancellations.
true
if the completing step is now done, false
otherwise.default boolean saveToSnapshot()
false
, it will be called again before proceeding to call
any other method.
This method will only be called after a call to process()
returns with an empty inbox. After all the input is
exhausted, it is also called between complete()
calls. Once
complete()
returns true
, this method won't be called
anymore.
The default implementation takes no action and returns true
.
default void restoreFromSnapshot(@Nonnull Inbox inbox)
Map.Entry
. May emit items to the outbox.
If it returns with items still present in the inbox, it will be called again before proceeding to call any other methods. It is never called with an empty inbox.
The default implementation throws an exception.
default boolean finishSnapshotRestore()
If it returns false
, it will be called again before proceeding
to call any other methods.
If this method tried to offer to the outbox and the offer call returned false, this method must also return false and retry the offer in the next call.
The default implementation takes no action and returns true
.
default void close() throws Exception
ProcessorSupplier.close(java.lang.Throwable)
is called on this member. The method might get
called even if init(com.hazelcast.jet.core.Outbox, com.hazelcast.jet.core.Processor.Context)
method was not yet called.
The method will be called right after complete()
returns true
, that is before the job is finished. The job might still be
running other processors.
If this method throws an exception, it is logged but it won't be reported as a job failure or cause the job to fail.
Exception
Copyright © 2019 Hazelcast, Inc.. All rights reserved.