Job Management · Hazelcast Jet

Once a Jet job is submitted, it has its own lifecycle on the cluster, independent of the client that submitted it. Jet offers several ways to manage a job after it has been submitted.

Submitting Jobs

You can submit a job to the cluster by packaging it as a JAR and using the jet submit command:

$ bin/jet submit -n hello-world examples/hello-world.jar arg1 arg2
Submitting JAR 'examples/hello-world.jar' with arguments [arg1, arg2]
Using job name 'hello-world'

For a full guide on submitting jobs, see the relevant section of the Programming Guide.

Listing Jobs

You can use the Jet command line to get a list of all running jobs in the cluster:

$ bin/jet list-jobs
ID                  STATUS             SUBMISSION TIME         NAME
0401-9f77-b9c0-0001 RUNNING            2020-03-07T15:59:49.234 hello-world

To also list jobs that have finished, including failed ones, specify the -a parameter:

$ bin/jet list-jobs -a
ID                  STATUS             SUBMISSION TIME         NAME
0402-de9d-35c0-0001 RUNNING            2020-03-08T15:14:11.439 hello-world-v2
0402-de21-7f00-0001 FAILED             2020-03-08T15:12:04.893 hello-world
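If you drive the CLI from scripts or tooling, you may want to process the list-jobs table programmatically. The Java sketch below parses output in the format shown above and filters rows by status. It is a minimal sketch under stated assumptions: JobRow and ListJobsParser are hypothetical names, not part of Jet; it assumes the ID, status and timestamp columns contain no whitespace; and the CLI's text format is not guaranteed to stay stable across Jet versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one row of "jet list-jobs" output (not a Jet class).
final class JobRow {
    final String id;
    final String status;
    final String submissionTime;
    final String name;

    JobRow(String id, String status, String submissionTime, String name) {
        this.id = id;
        this.status = status;
        this.submissionTime = submissionTime;
        this.name = name;
    }
}

// Hypothetical parser for the list-jobs table. Assumes the ID, STATUS and
// SUBMISSION TIME values contain no whitespace; any remaining columns are
// joined back into the job name (runs of spaces collapse to one).
final class ListJobsParser {
    static List<JobRow> parse(String output) {
        List<JobRow> rows = new ArrayList<>();
        for (String line : output.split("\\R")) {
            String trimmed = line.trim();
            // Skip blank lines and the header row.
            if (trimmed.isEmpty() || trimmed.startsWith("ID ")) {
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 4) {
                continue; // not a job row in the expected format
            }
            String name = String.join(" ", Arrays.copyOfRange(cols, 3, cols.length));
            rows.add(new JobRow(cols[0], cols[1], cols[2], name));
        }
        return rows;
    }

    // Names of all jobs whose status matches exactly, e.g. "RUNNING".
    static List<String> namesWithStatus(List<JobRow> rows, String status) {
        List<String> names = new ArrayList<>();
        for (JobRow row : rows) {
            if (row.status.equals(status)) {
                names.add(row.name);
            }
        }
        return names;
    }
}
```

For the -a output above, namesWithStatus(rows, "FAILED") returns a list containing only hello-world.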

Cancelling Jobs

A streaming Jet job will run indefinitely until cancelled. You can cancel a job as follows:

bin/jet cancel <job_name_or_id>

$ bin/jet cancel hello-world
Cancelling job id=0402-de21-7f00-0001, name=hello-world, submissionTime=2020-03-08T15:12:04.893
Job cancelled.

Once a job is cancelled, its snapshot is lost and the job can't be resumed. Cancelled jobs are reported with the FAILED status. Only batch jobs can complete successfully.
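The rules above can be summarized as a small state machine. The following Java sketch is a hypothetical, simplified model of the statuses named on this page; it is not Jet's implementation and leaves out intermediate statuses. It encodes that cancelling moves a job to FAILED and discards its snapshot, that a failed job cannot be resumed, and that only a batch job can reach COMPLETED. For completeness it also includes the suspend and resume rules from the Suspending and Resuming section.

```java
// Hypothetical, simplified model of the job statuses described on this page.
// It is not Jet's implementation and omits intermediate statuses.
final class JobModel {
    enum Status { RUNNING, SUSPENDED, COMPLETED, FAILED }

    private final boolean streaming;
    private final boolean hasProcessingGuarantee;
    private Status status = Status.RUNNING;
    private boolean hasSnapshot = false;

    JobModel(boolean streaming, boolean hasProcessingGuarantee) {
        this.streaming = streaming;
        this.hasProcessingGuarantee = hasProcessingGuarantee;
    }

    Status status() { return status; }

    boolean hasSnapshot() { return hasSnapshot; }

    // Cancelling ends the job with FAILED status and discards its snapshot.
    void cancel() {
        if (status == Status.COMPLETED || status == Status.FAILED) {
            throw new IllegalStateException("job already finished: " + status);
        }
        status = Status.FAILED;
        hasSnapshot = false;
    }

    // Only a batch job can finish successfully.
    void complete() {
        if (streaming) {
            throw new IllegalStateException("a streaming job runs until cancelled");
        }
        require(Status.RUNNING);
        status = Status.COMPLETED;
    }

    // Suspending needs a processing guarantee; a snapshot is taken first.
    void suspend() {
        if (!hasProcessingGuarantee) {
            throw new IllegalStateException("suspend requires a processing guarantee");
        }
        require(Status.RUNNING);
        hasSnapshot = true;
        status = Status.SUSPENDED;
    }

    // Resuming starts the job again from the snapshot taken during suspend.
    void resume() {
        require(Status.SUSPENDED);
        status = Status.RUNNING;
    }

    private void require(Status expected) {
        if (status != expected) {
            throw new IllegalStateException("job is " + status + ", expected " + expected);
        }
    }
}
```

In this model, calling resume() on a cancelled job throws an IllegalStateException, mirroring the statement that a cancelled job can't be resumed.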

Auto-scaling

By default, Jet jobs auto-scale when a node is added to or removed from the cluster. Jet scales a job by restarting it after the cluster membership changes. You can find an in-depth explanation of Jet's fault-tolerance design in the Architecture section.

When auto-scaling is off and a new node is added to the cluster, the job keeps running on the previous nodes without restarting, but it does not use the new node.

What exactly happens when a node joins or leaves depends on whether the job is configured with a processing guarantee and whether auto-scaling is enabled. The table below shows how the job behaves during a cluster change for each combination of these two settings.

Auto-Scaling Setting	Processing Guarantee Setting	Member Added	Member Removed
enabled (default)	any setting	restart after delay	restart immediately
disabled	none	keep job running on old members	fail job
disabled	at-least-once or exactly-once	keep job running on old members	suspend job
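One way to read the table is as a pure function of the two settings. The Java sketch below encodes it that way. Guarantee, Reaction and ScalingPolicy are hypothetical names chosen for illustration (Guarantee simply mirrors the three processing-guarantee settings), and the code only restates the table; it does not reproduce Jet's internal logic.

```java
// Hypothetical encoding of the auto-scaling behavior table; names are illustrative only.
enum Guarantee { NONE, AT_LEAST_ONCE, EXACTLY_ONCE }

enum Reaction {
    RESTART_AFTER_DELAY,
    RESTART_IMMEDIATELY,
    KEEP_RUNNING_ON_OLD_MEMBERS,
    FAIL_JOB,
    SUSPEND_JOB
}

final class ScalingPolicy {
    // Reaction when a member joins. The guarantee does not matter here
    // ("any setting" in the table), but is kept for a uniform signature.
    static Reaction onMemberAdded(boolean autoScaling, Guarantee guarantee) {
        return autoScaling ? Reaction.RESTART_AFTER_DELAY : Reaction.KEEP_RUNNING_ON_OLD_MEMBERS;
    }

    // Reaction when a member leaves.
    static Reaction onMemberRemoved(boolean autoScaling, Guarantee guarantee) {
        if (autoScaling) {
            return Reaction.RESTART_IMMEDIATELY;
        }
        // Per the table: with a processing guarantee the job is suspended, otherwise it fails.
        return guarantee == Guarantee.NONE ? Reaction.FAIL_JOB : Reaction.SUSPEND_JOB;
    }
}
```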

Suspending and Resuming

Jet supports manually suspending and resuming streaming jobs in a fault-tolerant way. The job must be configured with a processing guarantee (at-least-once or exactly-once).

When a job is suspended, all of its metadata is kept in the cluster. During the suspend operation, Jet takes a snapshot of the job's computational state; on resume, the job is gracefully restarted from that snapshot.
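To make the snapshot mechanism concrete, the Java sketch below models a toy job whose only state is a count of processed events. suspend() copies the count into a snapshot before tearing down the live state, and resume() restores it, so no progress is lost. CountingJob is a hypothetical, single-process illustration; real Jet snapshots are distributed across the cluster and cover the state of every processor.

```java
// Hypothetical toy job showing how a snapshot carries state across suspend/resume.
final class CountingJob {
    private long count;             // live computational state
    private Long snapshot;          // last snapshot, or null if none was taken
    private boolean running = true;

    // Consumes one event and returns true. While suspended nothing is consumed;
    // with a replayable source the event would remain there until resume.
    boolean onEvent() {
        if (!running) {
            return false;
        }
        count++;
        return true;
    }

    // Takes a snapshot of the state, then stops processing.
    void suspend() {
        if (!running) {
            throw new IllegalStateException("job is already suspended");
        }
        snapshot = count;
        running = false;
        count = 0; // live state is torn down; only the snapshot survives
    }

    // Restarts processing from the snapshot taken during suspend.
    void resume() {
        if (running || snapshot == null) {
            throw new IllegalStateException("job is not suspended");
        }
        count = snapshot;
        running = true;
    }

    long count() {
        return count;
    }
}
```

After three events, a suspend and a resume, count() returns 3 again and keeps increasing from there.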

Suspending and resuming is useful, for example, when you need to perform maintenance on a data source or sink without cancelling the job and losing its state.

Use the jet command line to suspend or resume a job:

bin/jet suspend <job_name_or_id>

$ bin/jet suspend hello-world
Suspending job id=0401-9f77-b9c0-0001, name=hello-world, submissionTime=2020-03-07T15:59:49.234...
Job suspended.

bin/jet resume <job_name_or_id>

$ bin/jet resume hello-world
Resuming job id=0401-9f77-b9c0-0001, name=hello-world, submissionTime=2020-03-07T15:59:49.234...
Job resumed.

Restarting

You can also restart a job in a single atomic action instead of suspending and then resuming it. This is useful when you want finer-grained control over when a job is scaled. For example, if auto-scaling is off and you are adding three nodes to the cluster, you can restart the jobs manually at the point you choose, such as after all three nodes have joined, so that they use all of the new nodes:

bin/jet restart <job_name_or_id>
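The following Java sketch is a hypothetical, single-process model of the scenario above: with auto-scaling off, the job keeps running on the members it started with while new members join, and only a manual restart makes it run on the current membership. ManualScalingJob is an illustrative name, not a Jet class, and the model ignores how Jet actually redistributes work across members.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a job running with auto-scaling disabled.
final class ManualScalingJob {
    private final List<String> clusterMembers;  // current cluster membership
    private List<String> executingMembers;      // members the job currently runs on
    private int restarts = 0;

    ManualScalingJob(List<String> initialMembers) {
        this.clusterMembers = new ArrayList<>(initialMembers);
        this.executingMembers = new ArrayList<>(initialMembers);
    }

    // With auto-scaling off, a joining member does not trigger a restart.
    void memberAdded(String member) {
        clusterMembers.add(member);
    }

    // A manual restart makes the job run on every current member.
    void restart() {
        executingMembers = new ArrayList<>(clusterMembers);
        restarts++;
    }

    List<String> executingMembers() {
        return Collections.unmodifiableList(executingMembers);
    }

    int restarts() {
        return restarts;
    }
}
```

Starting from two members, adding three more and then calling restart() once leaves the job running on all five members after a single restart. With auto-scaling enabled, members joining one at a time could instead trigger several restarts, which is the situation a deliberately timed manual restart avoids.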