Class PythonServiceConfig

java.lang.Object
com.hazelcast.jet.python.PythonServiceConfig
All Implemented Interfaces:
Serializable

public class PythonServiceConfig
extends Object
implements Serializable
Configuration object for the Python service factory, used in a mapUsingPython stage.

Hazelcast Jet expects you to have a Python project in a local directory. It must contain the definition of a transform_list() function that receives a list of strings and returns a list of strings of the same size, with a one-to-one mapping between input and output elements. Here's a simple example of a function that transforms every input string by prepending "echo-" to it:


 def transform_list(input_list):
     return ["echo-%s" % i for i in input_list]
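
The one-to-one contract means the function has to return exactly one output string for every input string, even when an individual item cannot be processed. Here is a minimal sketch of one way to keep the sizes aligned. It assumes you want to square integers and emit a placeholder for bad input; the helper square_or_error and the "error: ..." format are hypothetical and not part of Jet:

```python
def square_or_error(item):
    # Hypothetical per-item logic: parse the string as an integer and square it.
    try:
        value = int(item)
    except ValueError:
        # Emit a placeholder instead of dropping the item, so the output
        # list stays aligned with the input list.
        return "error: not an integer: %s" % item
    return str(value * value)


def transform_list(input_list):
    # One output string per input string, in the same order.
    return [square_or_error(item) for item in input_list]
```

Filtering out the bad items inside the function would shorten the output list and break the mapping between input and output elements.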
 
If you have a very simple setup with everything in a single Python file, you can use setHandlerFile(java.lang.String). Let's say you saved the above Python code to a file named echo.py. You can use it from Jet like this:

 StreamStage<String> inputStage = createInputStage();
 StreamStage<String> outputStage = inputStage.apply(
         mapUsingPython(new PythonServiceConfig()
                 .setHandlerFile("path/to/echo.py")));
 
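In more complex setups, you can tell Jet the location of your project directory with setBaseDir(java.lang.String) and the name of the Python module containing transform_list() with setHandlerModule(java.lang.String). You can also use a different name for the function by calling setHandlerFunction(java.lang.String).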
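
As an illustration of a handler that lives in its own module under a different function name, here is a minimal sketch. The module name word_stats and the function name count_words are hypothetical; the config would have to name the project directory, the module, and the function for Jet to find it:

```python
# word_stats.py (hypothetical module inside the project directory)

def count_words(input_list):
    # Same contract as transform_list(): one output string per input string.
    # str.split() with no arguments splits on runs of whitespace.
    return [str(len(line.split())) for line in input_list]
```

Only the name changes; the function still receives a list of strings and returns a list of strings of the same size.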

Jet uploads the entire directory to the cluster, creates one or more Python processes on each member, and sends the pipeline data through your function. The number of processes is controlled by the local parallelism of the Python mapping stage.

Jet recognizes these special files in the base directory:

  • requirements.txt is assumed to list the dependencies of your Python code. Jet automatically installs them into a job-local virtual environment. To speed up job initialization, you can pre-install the modules in the global Python environment on the Jet servers: Jet reuses the globally installed modules and installs only the missing ones.
  • init.sh is assumed to be a Bash script that Jet will run when initializing the job.
  • cleanup.sh is assumed to be a Bash script that Jet will run when completing the job.

Regardless of local parallelism, the init and cleanup scripts run only once per cluster member. They run within the context of the job-local virtual Python environment.
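
Because Jet uploads the whole base directory, it can help to check the layout locally before submitting a job. The following is a sketch of such a pre-flight check, not something Jet provides: check_project is a hypothetical helper that loads a top-level handler module from the directory, verifies that the handler function is callable, and reports which of the special files are present.

```python
import importlib.util
import os

# The special file names listed above.
SPECIAL_FILES = ("requirements.txt", "init.sh", "cleanup.sh")


def check_project(base_dir, module_name, function_name="transform_list"):
    """Return the special files found in base_dir; raise if the handler is missing."""
    module_path = os.path.join(base_dir, module_name + ".py")
    if not os.path.isfile(module_path):
        raise FileNotFoundError(module_path)
    # Load the module from its file path without modifying sys.path.
    # Note that this executes the module's top-level code.
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not callable(getattr(module, function_name, None)):
        raise AttributeError("%s.%s is not callable" % (module_name, function_name))
    return [name for name in SPECIAL_FILES
            if os.path.isfile(os.path.join(base_dir, name))]
```

For example, check_project("path/to/project", "echo") would load path/to/project/echo.py and look for transform_list() in it. The check only covers modules that sit directly in the base directory.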

To use this stage in a Hazelcast Jet cluster, Python must be installed on every cluster member. Jet supports Python versions 3.5-3.7. If the code depends on non-standard Python modules, either these must be pre-installed on the member machines, or the machines must have access to the public internet so that Jet can download and install them. A third option is to write an init.sh script that installs the dependencies some other way. In that case, do not name your dependency list requirements.txt, because Jet processes a file with that name automatically.
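
The version requirement can be verified on a member machine with a few lines of Python. This is a minimal sketch that assumes the 3.5-3.7 range stated above; is_supported is a hypothetical helper, not part of Jet:

```python
import sys

# Supported range as stated in the text: Python 3.5 through 3.7.
MIN_VERSION = (3, 5)
MAX_VERSION = (3, 7)


def is_supported(version_info=None):
    # Compare only (major, minor); the patch level is ignored.
    # Defaults to the interpreter running this code.
    major_minor = tuple((version_info or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION
```

Calling is_supported() with no argument checks the running interpreter; passing a tuple such as (3, 6, 9) checks a specific version.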

The Python mapping stage produces log output at the FINE level under the com.hazelcast.jet.python log category. This includes all the output from launched subprocesses.

Since:
4.0
See Also:
Serialized Form