Python pipeline steps

You can integrate Python into a Konduit Serving instance by defining PythonConfig objects and adding them as steps to a PythonPipelineStep.

Konduit Serving uses JavaCPP Presets to execute Python scripts using the CPython API. This allows you to build custom Konduit Serving pipeline steps by writing Python scripts to be run within a Konduit Serving Java process.

`PythonConfig`

Define the configuration for a PythonStep.

In Python, a PythonConfig takes the following parameters:

python_path: Optional. The search path for Python modules, as defined by sys.path. See the Python modules section for details.
python_code_path: Optional. Specify the location of your Python script.
python_code: Optional. A string that contains Python commands.
python_inputs: A dictionary with input names as keys and corresponding value types as values. Value types should be specified as one of the following strings: "INT", "STR", "FLOAT", "BOOL", "NDARRAY".
python_outputs: A dictionary with output names as keys and corresponding value types as values. Value types should be specified as per python_inputs.
extra_inputs: potential extra input variables. Specify value types as per python_inputs.
return_all_inputs: Boolean. Whether or not to return all inputs in addition to outputs.
setup_and_run: Boolean. Whether or not to use the setup-and-run schematics. Defaults to False.
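Because the value types are plain strings, a typo in python_inputs or python_outputs is easy to miss. The following is a minimal sketch of a pre-flight check; check_value_types and SUPPORTED_TYPES are hypothetical names and not part of the Konduit Serving API, and the check assumes the type strings must match the upper-case spellings listed above exactly.

```python
# Hypothetical helper, NOT part of the Konduit Serving API.
# Assumption: type strings must exactly match the upper-case names documented above.
SUPPORTED_TYPES = {"INT", "STR", "FLOAT", "BOOL", "NDARRAY"}


def check_value_types(mapping):
    """Return the sorted names whose declared type is not a supported type string."""
    return sorted(
        name for name, value_type in mapping.items()
        if value_type not in SUPPORTED_TYPES
    )


python_inputs = {"x": "NDARRAY", "threshold": "FLOAT"}
python_outputs = {"y": "ndarray"}  # lower case on purpose, so it gets flagged

print(check_value_types(python_inputs))   # []
print(check_value_types(python_outputs))  # ['y']
```

Running a check like this before building a PythonConfig surfaces misspelled types early, rather than when the server starts.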

In Java, a PythonConfig object is defined using the Lombok builder API, which takes the following parameters:

pythonPath: Optional. The search path for Python modules, as defined by sys.path. See the Python modules section for details.
pythonCodePath: Optional. Specify the location of your Python script.
pythonCode: Optional. Takes a String that contains Python commands.
pythonInput: Specify the name of the input and the type of value, as defined by the name() method of a PythonVariables.Type enumeration, namely: "INT", "STR", "FLOAT", "BOOL", "NDARRAY", "LIST".
pythonOutput: Specify the name of the output and the type of value as per pythonInput.

NumPy array subclasses and some NumPy array data types are not supported.

Unsupported NumPy array data types are as follows: np.uint8, np.uint16, np.uint32, np.uint64, np.uintp, np.complex64, np.complex128, np.int8, np.int16, np.bool, np.byte, np.ubyte, np.ushort, np.uintc, np.uint, np.ulonglong, np.half, np.csingle, np.cdouble, np.clongdouble.

For output type NDARRAY, convert your output to a regular NumPy array and supported data type using np.array()/np.ndarray() and/or the astype() method. Also, ensure that the output is a NumPy array and not a NumPy scalar: see the documentation for np.isscalar for details.
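The conversion described above can be sketched in plain NumPy. The helper name to_plain_ndarray is hypothetical, and the choice of np.float32 as the target assumes it is supported because it does not appear in the unsupported list. Promoting scalars to a one-element array with np.atleast_1d is also an assumption about what a safe output shape looks like.

```python
import numpy as np


def to_plain_ndarray(value, dtype=np.float32):
    """Coerce value to a base-class ndarray with at least one dimension.

    Hypothetical helper; float32 is assumed to be a supported dtype.
    """
    arr = np.asarray(value)    # returns a base ndarray, dropping subclasses
    arr = np.atleast_1d(arr)   # NumPy scalars and 0-d arrays become shape (1,)
    return arr.astype(dtype)   # cast away unsupported dtypes such as uint8


pixels = np.array([0, 128, 255], dtype=np.uint8)  # unsupported dtype
total = pixels.sum()                              # a NumPy scalar, not an array

y = to_plain_ndarray(pixels)
t = to_plain_ndarray(total)

print(type(y).__name__, y.dtype, y.shape)
print(np.isscalar(total), np.isscalar(t))
```

In a pipeline script, the last line before returning an NDARRAY output could apply the same coercion to the output variable.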

Example 1: Using a Python script

python_config = PythonConfig(
    python_code_path="scripts/loadimage.py", 
    python_inputs={"x": "STR"}, 
    python_outputs={"y": "STR"}
)

String pythonCodePath = new ClassPathResource("scripts/loadimage.py") 
    .getFile() 
    .getAbsolutePath();

PythonConfig pythonConfig = PythonConfig.builder() 
    .pythonCodePath(pythonCodePath) 
    .pythonInput("x", PythonVariables.Type.STR.name()) 
    .pythonOutput("y", PythonVariables.Type.STR.name()) 
    .build();

Example 2: Specifying a custom Python path

python_config = PythonConfig(
    python_path=pythonPath,
    python_code="y = x + 2", 
    python_inputs={"x": "NDARRAY"}, 
    python_outputs={"y": "NDARRAY"}
)

PythonConfig pythonConfig = PythonConfig.builder()        
    .pythonPath(pythonPath)        
    .pythonCode("y = x + 2")        
    .pythonInput("x", PythonVariables.Type.NDARRAY.name())        
    .pythonOutput("y", PythonVariables.Type.NDARRAY.name())        
    .build();

`PythonStep`

For most use cases, PythonStep can be set up as follows:

python_step = PythonStep().step(python_config)

Note that the default name for each step is default. You will need this name when specifying your data inputs via the .predict() method of the Client class.

The step method of the PythonStep class is used to define the Python configuration.

PythonStep pythonStep = new PythonStep()
    .step(pythonConfig);

Finally, the PythonStep object can be passed to an InferenceConfiguration object, which is used to configure the Konduit Serving instance:

inference_config = InferenceConfiguration(
    serving_config=serving_config,
    pipeline_steps=[python_step]
)

InferenceConfiguration config = InferenceConfiguration.builder()
        .pipelineStep(pythonStep)
        .build();

To test a PythonStep without starting a Konduit Serving instance, the output of the Python pipeline step can be retrieved as a Writable[][] object. Call the getRunner() method on the PythonStep object to get the runner for the configuration, then call the transform() method, which applies the transformations defined by the PythonConfig object. For example:

Writable[][] output = pythonStep.getRunner().transform(imagePath);

Some models may require the server to transform more than one set of inputs. For instance, to serve object detection models, annotations and images may have to be transformed in a single PythonStep. This requires a unique name to be specified for each PythonConfig:

python_config_1 = PythonConfig(
    python_path=pythonPath, 
    python_code="y = x + 2", 
    python_inputs={"x": "INT"}, 
    python_outputs={"y": "INT"}
)

python_config_2 = PythonConfig(
    python_path=pythonPath, 
    python_code="b = a + 3", 
    python_inputs={"a": "INT"}, 
    python_outputs={"b": "INT"}
)

python_step = (PythonStep()
    .step("stepOne", python_config_1)
    .step("stepTwo", python_config_2))

PythonConfig pythonConfig1 = PythonConfig.builder()
        .pythonPath(pythonPath)
        .pythonCode("y = x + 2")
        .pythonInput("x", PythonVariables.Type.INT.name())
        .pythonOutput("y", PythonVariables.Type.INT.name())
        .build();

PythonConfig pythonConfig2 = PythonConfig.builder()
        .pythonPath(pythonPath)
        .pythonCode("b = a + 3")
        .pythonInput("a", PythonVariables.Type.INT.name())
        .pythonOutput("b", PythonVariables.Type.INT.name())
        .build();

PythonPipelineStep pythonPipelineStep = new PythonPipelineStep()
        .step("stepOne", pythonConfig1)
        .step("stepTwo", pythonConfig2);

Writable[][] output = pythonPipelineStep
        .getRunner()
        .transform(
                new Object[] {3}, 
                new Object[] {3}
        );

System.out.println(Arrays.deepToString(output));
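To make the relationship between python_code, input names and output names concrete, here is a rough plain-Python simulation of the two named configurations above. It does not use Konduit Serving and is not how Konduit executes scripts (that happens through the CPython API inside the Java process); run_snippet is a hypothetical helper that relies only on the built-in exec.

```python
def run_snippet(code, inputs, output_names):
    """Run `code` with `inputs` bound as variables and return the named outputs.

    Hypothetical stand-in for a Python step; it only mimics the name binding.
    """
    namespace = dict(inputs)   # input names become variables visible to the code
    exec(code, {}, namespace)  # assignments made by the code land in `namespace`
    return {name: namespace[name] for name in output_names}


# Mirrors stepOne and stepTwo above: (python_code, inputs, output names)
steps = {
    "stepOne": ("y = x + 2", {"x": 3}, ["y"]),
    "stepTwo": ("b = a + 3", {"a": 3}, ["b"]),
}

results = {name: run_snippet(*spec) for name, spec in steps.items()}
print(results)  # {'stepOne': {'y': 5}, 'stepTwo': {'b': 6}}
```

Each configuration reads only its own declared inputs and exposes only its declared outputs, which is why each one needs a unique name within the step.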

YAML configuration

Python steps can take any argument that can be passed to PythonConfig. The following is a basic example of specifying a Python step in a YAML configuration:

steps: 
  python_step: 
    type: PYTHON
    python_code_path: simple.py

type: specify this as PYTHON.
python_code: Python code specified directly in your YAML file. For multi-line Python code, YAML literal block scalars may be helpful.
python_code_path: specify the path of a Python .py script.
python_inputs: name-value pairs specifying the data types for each of the inputs referenced in the script. Data types should be one of the following: INT, STR, FLOAT, BOOL, NDARRAY.
python_outputs: name-value pairs specifying the data types for each of the outputs referenced in the script. Data types should be one of the following: INT, STR, FLOAT, BOOL, NDARRAY.
extra_inputs: potential extra input variables. Specify value types as per python_inputs.
return_all_inputs: Boolean. Whether or not to return all inputs in addition to output.
setup_and_run: Boolean. Whether or not to use the setup-and-run schematics. Defaults to False.
python_path: location of the Python modules. Generally, if your script only requires NumPy, setting a custom python_path is not necessary. Refer to the Python modules documentation on setting a custom Python path with additional modules.
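As an illustration of the literal block scalar approach for python_code, the sketch below builds a YAML configuration as text from Python. The step name python_step and the key layout follow the basic example above. Writing python_inputs and python_outputs as nested name/type mappings is an assumption based on the "name-value pairs" description, and the embedded script is only a placeholder.

```python
import textwrap

# Placeholder multi-line script to embed in the YAML file.
python_code = "import numpy as np\ny = np.asarray(x) + 2\n"

# "python_code: |" starts a YAML literal block scalar: the indented lines that
# follow are kept verbatim, including their line breaks.
yaml_text = (
    "steps:\n"
    "  python_step:\n"
    "    type: PYTHON\n"
    "    python_code: |\n"
    + textwrap.indent(python_code, " " * 6)
    + "    python_inputs:\n"   # assumed layout: one "name: TYPE" pair per line
    "      x: NDARRAY\n"
    "    python_outputs:\n"
    "      y: NDARRAY\n"
)

print(yaml_text)
```

The script lines must be indented further than the python_code key, which is what textwrap.indent takes care of here.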

The names referenced in python_inputs and python_outputs correspond with inputColumnNames and outputColumnNames. Modifying python_inputs and python_outputs does not modify the input and output name of the step. input_names and output_names are arguments to PythonStep which cannot be accessed through the YAML configuration, and default to the name default.

Python modules and the `pythonPath` argument

If the pythonPath is not specified, you will still be able to import modules cached for NumPy by JavaCPP Presets in your Python script(s).

In Java, you can find the location of the default modules by printing

Arrays.toString(cachePackages())

where cachePackages is a static method, imported as follows:

import static org.bytedeco.numpy.presets.numpy.cachePackages;

If you require additional modules, you can set a custom pythonPath by running the following command in your Python environment and setting the output as your pythonPath:

import os 
from konduit.utils import default_python_path

work_dir = os.path.abspath('.')
default_python_path(work_dir)

Custom pythonPath follows the format defined by sys.path. The first element is the location of the script used to invoke the Python interpreter, and the remaining elements specify where Python should search for modules.

To list the modules that you can access, run help("modules") in your Python interpreter.
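As a non-interactive alternative to help("modules"), the standard library can be used to inspect the search path and the top-level modules found on it. This is a sketch using only sys, os and pkgutil. Joining the entries into a single string with os.pathsep is an assumption about how a path might be passed along; this page does not state the exact format Konduit expects.

```python
import os
import pkgutil
import sys

# sys.path is the list of locations Python searches for modules, in order.
for entry in sys.path:
    print(entry)

# Assumption: a single path string, joined with the platform separator
# (":" on Linux and macOS, ";" on Windows). Empty entries are skipped.
candidate_path = os.pathsep.join(p for p in sys.path if p)

# Top-level modules and packages discoverable on sys.path.
# Built-in modules compiled into the interpreter (such as sys) are not listed.
top_level = sorted({module.name for module in pkgutil.iter_modules()})
print(len(top_level), "importable top-level modules found")
```

Checking that a required package name appears in top_level is a quick way to confirm that the environment you plan to point pythonPath at actually contains it.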


Last updated 5 years ago
