Interface StreamSourceStage<T>

Type Parameters:
T - the type of items coming out of this stage

public interface StreamSourceStage<T>
A source stage in a distributed computation pipeline that will observe an unbounded amount of data (i.e., an event stream). Timestamp handling, a prerequisite for attaching data processing stages, is not yet defined in this step. Call one of the methods on this instance to declare whether and how the data source will assign timestamps to events.
  • Method Details

    • withoutTimestamps

      StreamStage<T> withoutTimestamps()
      Declares that the source will not assign any timestamp to the events it emits. You can add them later using addTimestamps, but the behavior is different — see the note there.
    • withIngestionTimestamps

      default StreamStage<T> withIngestionTimestamps()
      Declares that the source will assign the time of ingestion as the event timestamp. It will call System.currentTimeMillis() at the moment it observes an event from the data source and assign it as the event timestamp.

      Note: when snapshotting is enabled to achieve fault tolerance, after a restart Jet replays all the events that were already processed since the last snapshot. These events will then get different timestamps. If you want your job to be fault-tolerant, the events in the stream must have a stable timestamp associated with them. The source may natively provide such timestamps (the withNativeTimestamps(long) option). If that is not appropriate, the events should carry their own timestamp as a part of their data and you can use withTimestamps(timestampFn, allowedLag to extract it.

      Note 2: if the system time goes back (such as when adjusting it), newer events will get older timestamps and might be dropped as late, because the allowed lag is 0.

    • withNativeTimestamps

      StreamStage<T> withNativeTimestamps​(long allowedLag)
      Declares that the stream will use the source's native timestamps. This is typically the message timestamp that the external system assigns as event's metadata.

      If there's no notion of native timestamps in the source, this method will throw a JetException.

      allowedLag - the allowed lag of a given event's timestamp behind the top timestamp value observed so far
    • withTimestamps

      StreamStage<T> withTimestamps​(@Nonnull ToLongFunctionEx<? super T> timestampFn, long allowedLag)
      Declares that the source will extract timestamps from the stream items.
      timestampFn - a function that returns the timestamp for each item, typically in milliseconds
      allowedLag - the allowed lag of a given event's timestamp behind the top timestamp value observed so far. The time unit is the same as the unit used by timestampFn