public final class S3Sources extends Object
Modifier and Type | Method and Description |
---|---|
static <T> BatchSource<T> |
s3(List<String> bucketNames,
String prefix,
Charset charset,
SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier,
BiFunctionEx<String,String,? extends T> mapFn)
Creates an AWS S3
BatchSource which lists all the objects in the
bucket-list using given prefix , reads them line by line,
transforms each line to the desired output object using given mapFn and emits them to downstream. |
static BatchSource<String> |
s3(List<String> bucketNames,
String prefix,
SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier)
Convenience for
s3(List, String, Charset, SupplierEx, BiFunctionEx) . |
@Nonnull public static BatchSource<String> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier)
s3(List, String, Charset, SupplierEx, BiFunctionEx)
.
Emits lines to downstream without any transformation and uses StandardCharsets.UTF_8
.@Nonnull public static <T> BatchSource<T> s3(@Nonnull List<String> bucketNames, @Nullable String prefix, @Nonnull Charset charset, @Nonnull SupplierEx<? extends software.amazon.awssdk.services.s3.S3Client> clientSupplier, @Nonnull BiFunctionEx<String,String,? extends T> mapFn)
BatchSource
which lists all the objects in the
bucket-list using given prefix
, reads them line by line,
transforms each line to the desired output object using given mapFn
and emits them to downstream.
The source does not save any state to snapshot. If the job is restarted, it will re-emit all entries.
The default local parallelism for this processor is 2.
Here is an example which reads the objects from a single bucket with applying the given prefix.
Pipeline p = Pipeline.create();
BatchStage<String> srcStage = p.drawFrom(S3Sources.s3(
Arrays.asList("bucket1", "bucket2"),
"prefix",
StandardCharsets.UTF_8,
() -> S3Client.create(),
(filename, line) -> line
));
T
- the type of the items the source emitsbucketNames
- list of bucket-namesprefix
- the prefix to filter the objects. Optional, passing
null
will list all objects.clientSupplier
- function which returns the s3 client to use
one client per processor instance is usedmapFn
- the function which creates output object from each
line. Gets the object name and line as parametersCopyright © 2019 Hazelcast, Inc.. All rights reserved.