Class DAG

All Implemented Interfaces:
DataSerializable, IdentifiedDataSerializable, Iterable<Vertex>

public class DAG
extends Object
implements IdentifiedDataSerializable, Iterable<Vertex>

Describes a computation to be performed by the Jet computation engine. A vertex represents a unit of data processing, and an edge represents the path along which data travels to the next vertex.

The work of a single vertex is parallelized and distributed: each member runs several instances of the Processor type corresponding to that vertex. Whenever possible, each instance should be tasked with only a slice of the total data. A partitioning strategy can be employed to ensure that the data sent to each vertex is collated by a partitioning key.
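The idea of collating data by a partitioning key can be illustrated with a minimal, self-contained sketch. It does not use the Jet API: the class PartitioningSketch and its methods instanceFor and slices are hypothetical names, and hashing the key modulo the instance count is only one assumed strategy, not necessarily the one Jet applies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the Jet API.
final class PartitioningSketch {

    // Maps a partitioning key to one of instanceCount processor instances.
    // Math.floorMod keeps the index non-negative even for negative hash codes.
    static int instanceFor(Object key, int instanceCount) {
        return Math.floorMod(key.hashCode(), instanceCount);
    }

    // Splits the items into one slice per processor instance,
    // using each item itself as its partitioning key.
    static List<List<String>> slices(List<String> items, int instanceCount) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < instanceCount; i++) {
            result.add(new ArrayList<>());
        }
        for (String item : items) {
            result.get(instanceFor(item, instanceCount)).add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("jet", "dag", "jet", "edge", "dag", "vertex");
        List<List<String>> slices = slices(words, 3);
        for (int i = 0; i < slices.size(); i++) {
            System.out.println("instance " + i + " -> " + slices.get(i));
        }
    }
}
```

Because the instance index depends only on the key, all items with the same key end up in the same slice, which is the property a downstream aggregation by key would rely on.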

There are three basic kinds of vertices:

  1. source with just outbound edges;
  2. processor with both inbound and outbound edges;
  3. sink with just inbound edges.

Data travels from sources to sinks and is transformed and reshaped as it passes through the processors.
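The three kinds of vertices listed above follow directly from the directions of the edges attached to each vertex. The following toy model sketches that classification. It is not the Jet API: VertexKindSketch, Kind, edge and kindOf are hypothetical names, and the vertex names "read", "tokenize" and "write" are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a DAG; none of these names come from the Jet API.
final class VertexKindSketch {

    enum Kind { SOURCE, PROCESSOR, SINK }

    // Each edge is stored as a {from, to} pair of vertex names.
    private final List<String[]> edges = new ArrayList<>();

    // Adds a directed edge and returns this object so calls can be chained.
    VertexKindSketch edge(String from, String to) {
        edges.add(new String[] {from, to});
        return this;
    }

    // Classifies a vertex by the directions of the edges that touch it.
    Kind kindOf(String vertex) {
        boolean hasInbound = false;
        boolean hasOutbound = false;
        for (String[] edge : edges) {
            if (edge[0].equals(vertex)) hasOutbound = true;
            if (edge[1].equals(vertex)) hasInbound = true;
        }
        if (hasInbound && hasOutbound) return Kind.PROCESSOR;
        if (hasOutbound) return Kind.SOURCE;
        if (hasInbound) return Kind.SINK;
        throw new IllegalArgumentException("Vertex has no edges: " + vertex);
    }

    public static void main(String[] args) {
        VertexKindSketch dag = new VertexKindSketch()
                .edge("read", "tokenize")
                .edge("tokenize", "write");
        for (String v : List.of("read", "tokenize", "write")) {
            System.out.println(v + " is a " + dag.kindOf(v));
        }
    }
}
```

In this sketch a vertex without any edges is rejected, since it fits none of the three kinds described above.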