Interface Partitioner<T>

Type Parameters:
T - type of item the partitioner accepts
All Superinterfaces:
Serializable
All Known Implementing Classes:
Partitioner.Default
Functional Interface:
This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.

@FunctionalInterface
public interface Partitioner<T>
extends Serializable
Encapsulates the logic associated with a DAG edge that decides on the partition ID of an item traveling over it. The partition ID determines which cluster member and which instance of Processor on that member an item will be forwarded to.

Jet's partitioning piggybacks on Hazelcast partitioning. Standard Hazelcast protocols are used to distribute partition ownership over the members of the cluster. However, if a DAG edge is configured as non-distributed, then on each member there will be some destination processor responsible for any given partition.

Since:
3.0
  • Nested Class Summary

    Nested Classes 
    Modifier and Type Interface Description
    static class  Partitioner.Default
    Partitioner which applies the default Hazelcast partitioning strategy.
  • Field Summary

    Fields 
    Modifier and Type Field Description
    static Partitioner<Object> HASH_CODE
    Partitioner which calls Object.hashCode() and coerces it with the modulo operation into the allowed range of partition IDs.
  • Method Summary

    Modifier and Type Method Description
    static Partitioner<Object> defaultPartitioner()
    Returns a partitioner which applies the default Hazelcast partitioning.
    int getPartition​(T item, int partitionCount)
    Returns the partition ID of the given item.
    default void init​(DefaultPartitionStrategy strat)
    Callback that injects the Hazelcast's default partitioning strategy into this partitioner so it can be consulted by the getPartition(Object, int) method.
  • Field Details

    • HASH_CODE

      static final Partitioner<Object> HASH_CODE
      Partitioner which calls Object.hashCode() and coerces it with the modulo operation into the allowed range of partition IDs. The primary reason to prefer this over the default is performance and it's a safe choice on local edges.

      WARNING: this is a dangerous strategy to use on distributed edges. Care must be taken to ensure that the produced hashcode remains stable across serialization-deserialization cycles as well as across all JVM processes. Consider a hashCode() method that is correct with respect to its contract, but not with respect to the stricter contract given above. Take the following scenario:

      1. there are two Jet cluster members;
      2. there is a DAG vertex;
      3. on each member there is a processor for this vertex;
      4. each processor emits an item;
      5. these two items have equal partitioning keys;
      6. nevertheless, on each member they get a different hashcode;
      7. they are routed to different processors, thus failing on the promise that all items with the same partition key go to the same processor.
  • Method Details

    • init

      default void init​(@Nonnull DefaultPartitionStrategy strat)
      Callback that injects the Hazelcast's default partitioning strategy into this partitioner so it can be consulted by the getPartition(Object, int) method.

      The creation of instances of the Partitioner type is done in user's code, but the Hazelcast partitioning strategy only becomes available after the partitioner is deserialized on each target member. This method solves the lifecycle mismatch.

    • getPartition

      int getPartition​(@Nonnull T item, int partitionCount)
      Returns the partition ID of the given item.
      Parameters:
      partitionCount - the total number of partitions in use by the underlying Hazelcast instance
    • defaultPartitioner

      static Partitioner<Object> defaultPartitioner()
      Returns a partitioner which applies the default Hazelcast partitioning. It will serialize the item using Hazelcast Serialization, then apply Hazelcast's MurmurHash-based algorithm to retrieve the partition ID. This is quite a bit of work, but has stable results across all JVM processes, making it a safe default.