- Type Parameters:
T - type of item the partitioner accepts
- All Superinterfaces:
Serializable
- All Known Implementing Classes:
Partitioner.Default
- Functional Interface:
- This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.
@FunctionalInterface public interface Partitioner<T> extends Serializable
Encapsulates the logic associated with a DAG edge that decides on the partition ID of an item traveling over it. The partition ID determines which cluster member and which instance of Processor on that member an item will be forwarded to.
Jet's partitioning piggybacks on Hazelcast partitioning. Standard Hazelcast protocols are used to distribute partition ownership over the members of the cluster. However, if a DAG edge is configured as non-distributed, then on each member there will be some destination processor responsible for any given partition.
Nested Class Summary
Partitioner.Default - Partitioner which applies the default Hazelcast partitioning strategy.
Field Summary
HASH_CODE - Partitioner which calls Object.hashCode() and coerces it with the modulo operation into the allowed range of partition IDs.
Method Summary
defaultPartitioner() - Returns a partitioner which applies the default Hazelcast partitioning.
getPartition(T item, int partitionCount) - Returns the partition ID of the given item.
init(DefaultPartitionStrategy strat) - Callback that injects Hazelcast's default partitioning strategy into this partitioner so it can be consulted by the getPartition(T, int) method.
HASH_CODE
static final Partitioner<Object> HASH_CODE
Partitioner which calls Object.hashCode() and coerces it with the modulo operation into the allowed range of partition IDs. The primary reason to prefer this over the default is performance, and it is a safe choice on local edges.
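The hash-code-and-modulo idea described above can be sketched as follows. This is only an illustration under stated assumptions: `SketchPartitioner` and `HashCodeSketch` are hypothetical stand-ins declared locally so the example compiles without Jet on the classpath, and the use of `Math.floorMod` is an assumption about how a negative hash code could be folded into range, not a statement about Jet's actual implementation.

```java
import java.io.Serializable;

// Hypothetical stand-in for the Partitioner interface described on this page.
interface SketchPartitioner<T> extends Serializable {
    int getPartition(T item, int partitionCount);
}

final class HashCodeSketch {
    // Assumed behavior: fold hashCode() into the range [0, partitionCount).
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static final SketchPartitioner<Object> HASH_CODE_LIKE =
            (item, partitionCount) -> Math.floorMod(item.hashCode(), partitionCount);

    public static void main(String[] args) {
        int partitionCount = 271; // Hazelcast's default partition count
        System.out.println(HASH_CODE_LIKE.getPartition("some-key", partitionCount));
    }
}
```

Because the interface has a single abstract method, a lambda is enough to define such a partitioner.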
WARNING: this is a dangerous strategy to use on distributed edges. Care must be taken to ensure that the produced hash code remains stable across serialization-deserialization cycles as well as across all JVM processes. Consider a hashCode() method that is correct with respect to its contract, but not with respect to the stricter contract given above. Take the following scenario:
- there are two Jet cluster members;
- there is a DAG vertex;
- on each member there is a processor for this vertex;
- each processor emits an item;
- these two items have equal partitioning keys;
- nevertheless, on each member they get a different hash code;
- they are routed to different processors, which breaks the promise that all items with the same partition key go to the same processor.
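One common source of the scenario above is an enum used as the partitioning key: `Enum.hashCode()` is identity-based, so its value may differ from one JVM process to another. A minimal sketch of one possible workaround is to hash the constant's name instead, since `String.hashCode()` is specified in terms of the string's characters. The class, enum, and method names below are hypothetical and exist only for illustration.

```java
final class StableEnumKeySketch {
    // Hypothetical partitioning key type.
    enum Color { RED, GREEN, BLUE }

    // Enum.hashCode() is identity-based and may differ between JVM processes.
    // String.hashCode() is computed from the characters, so hashing name()
    // yields the same value on every member.
    static int stablePartition(Color key, int partitionCount) {
        return Math.floorMod(key.name().hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        System.out.println(stablePartition(Color.RED, 271));
    }
}
```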
init
default void init(@Nonnull DefaultPartitionStrategy strat)
Callback that injects Hazelcast's default partitioning strategy into this partitioner so it can be consulted by the getPartition(T, int) method.
Instances of the Partitioner type are created in the user's code, but the Hazelcast partitioning strategy only becomes available after the partitioner is deserialized on each target member. This method resolves that lifecycle mismatch.
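The lifecycle described above might look roughly like this in a custom partitioner. The sketch uses hypothetical local stand-ins (`StrategySketch` for the injected strategy, `InitPartitionerSketch` for this interface) so that it compiles on its own; the single-method shape of the strategy is an assumption. The point is that the strategy is held in a transient field, left unset when the partitioner is created and serialized, and populated by `init` on the target member.

```java
import java.io.Serializable;
import java.util.Map;

// Hypothetical stand-in for the injected default partitioning strategy.
interface StrategySketch {
    int getPartition(Object key);
}

// Hypothetical stand-in for the Partitioner interface with its init callback.
interface InitPartitionerSketch<T> extends Serializable {
    default void init(StrategySketch strat) { }
    int getPartition(T item, int partitionCount);
}

// Partitions map entries by their key, delegating to the injected strategy.
final class KeyedPartitionerSketch implements InitPartitionerSketch<Map.Entry<String, Integer>> {
    // Transient: not part of the serialized form, set via init() after deserialization.
    private transient StrategySketch strategy;

    @Override
    public void init(StrategySketch strat) {
        this.strategy = strat;
    }

    @Override
    public int getPartition(Map.Entry<String, Integer> item, int partitionCount) {
        // Only the key takes part in partitioning; the value is ignored.
        return strategy.getPartition(item.getKey());
    }
}
```

Entries with equal keys then land in the same partition regardless of their values, provided the injected strategy is itself deterministic.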
getPartition
int getPartition(T item, int partitionCount)
Returns the partition ID of the given item.
partitionCount - the total number of partitions in use by the underlying Hazelcast instance
defaultPartitioner
static Partitioner<Object> defaultPartitioner()
Returns a partitioner which applies the default Hazelcast partitioning. It will serialize the item using Hazelcast Serialization, then apply Hazelcast's MurmurHash-based algorithm to retrieve the partition ID. This is quite a bit of work, but has stable results across all JVM processes, making it a safe default.
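To illustrate only the shape of the serialize-then-hash approach, the sketch below substitutes plain Java serialization for Hazelcast Serialization and `Arrays.hashCode` for the MurmurHash-based algorithm. Both substitutions are assumptions made to keep the example self-contained, so the partition IDs it produces will not match those of a real Hazelcast cluster. `SerializeThenHashSketch` and `partitionOf` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class SerializeThenHashSketch {
    // Hashes the serialized bytes of the item rather than calling item.hashCode(),
    // so the result depends only on the item's serialized form.
    static int partitionOf(Serializable item, int partitionCount) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(item); // stand-in for Hazelcast Serialization
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Stand-in for the MurmurHash-based algorithm.
        return Math.floorMod(Arrays.hashCode(bytes.toByteArray()), partitionCount);
    }

    public static void main(String[] args) {
        System.out.println(partitionOf("some-key", 271));
    }
}
```

The extra serialization step is where the cost mentioned above comes from: every item is turned into bytes before its partition can be computed.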