Class AvroSourceBuilder<D>

Type Parameters:
D - the type of the datum read by datumReaderSupplier

public final class AvroSourceBuilder<D>
extends Object
Builder for an Avro file source that reads records from Avro files in a directory (but not its subdirectories) and emits the output objects created by mapOutputFn.
  • Method Details

    • glob

      public AvroSourceBuilder<D> glob(@Nonnull String glob)
      Sets the globbing mask; see FileSystem.getPathMatcher() for the syntax. The default value is "*", which matches all files.
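      To illustrate how a globbing mask selects files, here is a minimal sketch that uses only the JDK's FileSystem.getPathMatcher() with the "glob:" syntax. The class name GlobMaskDemo and the file names are hypothetical, and the sketch assumes the mask is matched against the file name alone; it does not call the Jet API.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper, not part of Jet: shows how a glob mask filters file names.
final class GlobMaskDemo {

    // Returns true if fileName matches the glob mask under the JDK's "glob:" syntax.
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        System.out.println(matches("*", "users.avro"));            // true
        System.out.println(matches("*.avro", "users.avro"));       // true
        System.out.println(matches("*.avro", "users.json"));       // false
        System.out.println(matches("part-?.avro", "part-1.avro")); // true
    }
}
```

      A mask such as "*.avro" is a reasonable choice when the directory also holds files in other formats.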
    • sharedFileSystem

      public AvroSourceBuilder<D> sharedFileSystem(boolean sharedFileSystem)
      Sets whether the files are in shared storage visible to all members. The default value is false.

      If sharedFileSystem is true, Jet will assume that all members see the same files, and the members will split the work so that each one reads a part of the files. If sharedFileSystem is false, each member will read all files in the directory, assuming they are local.
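      The difference between the two modes can be sketched with a toy assignment function. The round-robin split by file index below is an assumption made purely for illustration; the text above does not say how Jet divides the files among members. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Jet's implementation: which files a given member reads.
final class SharedFileSystemDemo {

    // memberIndex is zero-based; memberCount is the number of cluster members.
    static List<String> filesForMember(List<String> files, int memberIndex,
                                       int memberCount, boolean sharedFileSystem) {
        if (!sharedFileSystem) {
            // Files are assumed local, so every member reads everything it sees.
            return new ArrayList<>(files);
        }
        // Assumed round-robin split: each file is read by exactly one member.
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            if (i % memberCount == memberIndex) {
                assigned.add(files.get(i));
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.avro", "b.avro", "c.avro", "d.avro", "e.avro");
        System.out.println(filesForMember(files, 0, 2, true));  // [a.avro, c.avro, e.avro]
        System.out.println(filesForMember(files, 1, 2, true));  // [b.avro, d.avro]
        System.out.println(filesForMember(files, 1, 2, false)); // all five files
    }
}
```

      The sketch also shows why the flag matters: leaving it at false when the directory is in fact shared would make every member read every file, so each record would be emitted once per member.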

    • build

      public <T> BatchSource<T> build(@Nonnull BiFunctionEx<String, ? super D, T> mapOutputFn)
      Builds a custom Avro file BatchSource with the supplied components and the output function mapOutputFn.

      The source does not save any state to the snapshot. If the job is restarted, it will re-emit all entries.

      Any IOException will cause the job to fail. The files must not change while being read; if they do, the behavior is unspecified.

      The default local parallelism for this processor is 4 (or available CPU count if it is less than 4).
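      The rule for the default local parallelism amounts to taking the smaller of 4 and the CPU count. A minimal sketch follows; the class and method names are hypothetical, and the use of Runtime.availableProcessors() as the CPU count is an assumption.

```java
// Hypothetical helper, not part of Jet: computes the default described above.
final class LocalParallelismDemo {

    // Returns 4, or the CPU count if fewer than 4 CPUs are available.
    static int defaultLocalParallelism(int availableCpus) {
        return Math.min(4, availableCpus);
    }

    public static void main(String[] args) {
        System.out.println(defaultLocalParallelism(2));  // 2
        System.out.println(defaultLocalParallelism(16)); // 4
        // Assumption: the CPU count is what the JVM reports for this machine.
        System.out.println(defaultLocalParallelism(Runtime.getRuntime().availableProcessors()));
    }
}
```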

      Type Parameters:
      T - the type of the items the source emits
      Parameters:
      mapOutputFn - the function that creates an output object from each record. It receives the file name and the record read by datumReader as parameters.
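      As a rough illustration of the contract of mapOutputFn, the sketch below applies a function to each (file name, record) pair and collects the results. It uses java.util.function.BiFunction as a stand-in for BiFunctionEx and plain strings as stand-ins for Avro records; the class name, the emitAll helper, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical model of the source's output step, not Jet's implementation.
final class MapOutputFnDemo {

    // Calls mapOutputFn once per record, passing the file name and the record.
    static <D, T> List<T> emitAll(Map<String, List<D>> recordsByFile,
                                  BiFunction<String, ? super D, T> mapOutputFn) {
        List<T> out = new ArrayList<>();
        for (Map.Entry<String, List<D>> entry : recordsByFile.entrySet()) {
            for (D record : entry.getValue()) {
                out.add(mapOutputFn.apply(entry.getKey(), record));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("a.avro", Arrays.asList("alice", "bob"));
        files.put("b.avro", Arrays.asList("carol"));

        // Tag every record with the name of the file it came from.
        List<String> out = emitAll(files, (fileName, record) -> fileName + ":" + record);
        System.out.println(out); // [a.avro:alice, a.avro:bob, b.avro:carol]
    }
}
```

      Receiving the file name alongside the record is useful when the file name itself carries information, such as a date or a partition key.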
    • build

      public BatchSource<D> build()
      Convenience for build(BiFunctionEx). The source emits the records read by datumReader downstream without any transformation.
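      Emitting records without any transformation presumably corresponds to an output function that ignores the file name and returns the record unchanged. A minimal sketch of such a function, with java.util.function.BiFunction standing in for BiFunctionEx and hypothetical class and method names:

```java
import java.util.function.BiFunction;

// Hypothetical illustration, not Jet's implementation.
final class IdentityOutputDemo {

    // An output function that drops the file name and passes the record through.
    static <D> BiFunction<String, D, D> passThrough() {
        return (fileName, record) -> record;
    }

    public static void main(String[] args) {
        BiFunction<String, String, String> fn = passThrough();
        System.out.println(fn.apply("users.avro", "alice")); // alice
    }
}
```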