# Python pipeline steps

Konduit Serving uses [JavaCPP Presets](https://github.com/bytedeco/javacpp-presets/) to execute Python scripts using the [CPython API](https://github.com/bytedeco/javacpp-presets/tree/master/cpython). This allows you to build custom Konduit Serving pipeline steps by writing Python scripts to be run within a Konduit Serving Java process.&#x20;

![](/files/-LwGXhXoQGKWShF0F-vJ)

## **`PythonConfig`**

Define the configuration for a `PythonStep`.

{% tabs %}
{% tab title="Python" %}
A `PythonConfig` takes on the following parameters:

* `python_path`: Optional. The search path for Python modules, as defined by [`sys.path`](https://docs.python.org/3/library/sys.html#sys.path). See the [Python modules](/0.1.0-snapshot/steps/python.md#pythonpath) section for details.&#x20;
* `python_code_path`: Optional. Specify the location of your Python script.
* `python_code`: Optional. A string that contains Python commands.
* `python_inputs`: A dictionary with input names as keys and corresponding value types as values. Value types should be specified as [one of the following strings](https://github.com/KonduitAI/konduit-serving/blob/71260366719840bcc2fd58698cebe471267de4bb/python/konduit/inference.py#L156-L162): `"INT"`, `"STR"`, `"FLOAT"`, `"BOOL"`, `"NDARRAY"`.
* `python_outputs`: A dictionary with output names as keys and corresponding value types as values. Values types should be specified as per `python_inputs`.&#x20;
* `extra_inputs`: potential extra input variables. Specify value types as per `python_inputs`.&#x20;
* `return_all_inputs`: Boolean. Whether or not to return all inputs in addition to outputs.
* `setup_and_run`: Boolean. Whether or not to use the setup-and-run schematics. Defaults to False.
  {% endtab %}

{% tab title="Java" %}
In Java, we can define a `PythonConfig` object using the Lombok builder API.&#x20;

A `PythonConfig` takes on the following parameters:

* `pythonPath`: Optional. The search path for Python modules, as defined by [`sys.path`](https://docs.python.org/3/library/sys.html#sys.path). See the [Python modules](/0.1.0-snapshot/steps/python.md#pythonpath) section for details.&#x20;
* `pythonCodePath`: Optional. Specify the location of your Python script.
* `pythonCode`: Optional. Takes a String that contains Python commands.
* `pythonInput`: Specify the name of the input and the type of value, as defined by the `name()` method of a `PythonVariables.Type` enumeration, namely: `"INT"`, `"STR"`, `"FLOAT"`, `"BOOL"`, `"NDARRAY"`, `"LIST"`.&#x20;
* `pythonOutput`: Specify the name of the output and the type of value as per `pythonInput`.&#x20;
  {% endtab %}
  {% endtabs %}

{% hint style="warning" %}
NumPy array subclasses and some NumPy array data types are not supported.&#x20;

Unsupported NumPy array data types are as follows:`np.uint8`, `np.uint16`, `np.uint32`, `np.uint64`, `np.uintp`, `np.complex64`, `np.complex128`, `np.int8`, `np.int16`, `np.bool`, `np.byte`, `np.ubyte`, `np.ushort`, `np.uintc`, `np.uint`, `np.ulonglong`, `np.half`, `np.csingle`, `np.cdouble`, `np.clongdouble`.&#x20;

For output type NDARRAY, convert your output to a regular NumPy array and supported data type using `np.array()/np.ndarray()` and/or the `astype()` method. Also, ensure that the output is a NumPy **array** and not a NumPy scalar: see the documentation for [`np.isscalar`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.isscalar.html) for details.&#x20;
{% endhint %}

#### &#x20;**Example 1: Using Python script**&#x20;

{% tabs %}
{% tab title="Python" %}

```python
python_config = PythonConfig(
    python_code_path="scripts/loadimage.py", 
    python_inputs={"x": "STR"}, 
    python_outputs={"y": "STR"}
)
```

{% endtab %}

{% tab title="Java" %}

```java
String pythonCodePath = new ClassPathResource("scripts/loadimage.py") 
    .getFile() 
    .getAbsolutePath();

PythonConfig pythonConfig = PythonConfig.builder() 
    .pythonCodePath(pythonCodePath) 
    .pythonInput("x", PythonVariables.Type.STR.name()) 
    .pythonOutput("y", PythonVariables.Type.STR.name()) 
    .build();
```

{% endtab %}
{% endtabs %}

#### Example 2: Specifying a custom Python path

{% tabs %}
{% tab title="Python" %}

```java
python_config = PythonConfig(
    python_path=pythonPath,
    python_code="y = x + 2", 
    python_inputs={"x": "NDARRAY"}, 
    python_outputs={"y": "NDARRAY"}
)
```

{% endtab %}

{% tab title="Java" %}

```java
PythonConfig pythonConfig = PythonConfig.builder()        
    .pythonPath(pythonPath)        
    .pythonCode("y = x + 2")        
    .pythonInput("x", PythonVariables.Type.NDARRAY.name())        
    .pythonOutput("y", PythonVariables.Type.NDARRAY.name())        
    .build();
```

{% endtab %}
{% endtabs %}

## `PythonStep`

For most use cases,`PythonStep` can be set up as follows:

{% tabs %}
{% tab title="Python" %}

```java
python_step = PythonStep().step(python_config)
```

Note that by default, the default name for each step is `default`. You will need this when specifying your data inputs via the `.predict()` method of the `Client` class.&#x20;
{% endtab %}

{% tab title="Java" %}
The  `step` method of the`PythonStep` class is used to define the Python configuration.&#x20;

```java
PythonStep pythonStep = new PythonStep()
    .step(pythonConfig);
```

{% endtab %}
{% endtabs %}

Finally, the `PythonStep` object can be passed to an `InferenceConfiguration` object, which is used to configure the Konduit Serving instance:

{% tabs %}
{% tab title="Python" %}

```java
inference_config = InferenceConfiguration(
    serving_config=serving_config,
    pipeline_steps=[python_step]
)
```

{% endtab %}

{% tab title="Java" %}

```java
InferenceConfiguration config = InferenceConfiguration.builder()
        .pipelineStep(pythonStep)
        .build();
```

To test a `PythonStep` without starting a Konduit Serving instance, the output of the Python pipeline step can be retrieved as a `Writable[][]` object. Apply the`getRunner()` method to the `PythonStep` object, which gets the runner for the configuration, followed by the `transform()` method, which applies the transformations defined by the `PythonConfig` object. For example,&#x20;

```java
Writable[][] output = pythonStep.getRunner().transform(imagePath);
```

{% endtab %}
{% endtabs %}

Some models may require the server to transform more than one set of inputs. For instance, to serve object detection models, annotations and images may have to be transformed in a single `PythonStep`. This requires a unique name to be specified for each `PythonConfig`:

{% tabs %}
{% tab title="Python" %}

```java
python_config_1 = PythonConfig(
    python_path=pythonPath, 
    python_code="y = x + 2", 
    python_inputs={"x": "INT"}, 
    python_outputs={"y": "INT"}
)

python_config_2 = PythonConfig(
    python_path=pythonPath, 
    python_code="b = a + 3", 
    python_inputs={"a": "INT"}, 
    python_outputs={"b": "INT"}
)

python_step = (PythonStep()
    .step("stepOne", python_config_1)
    .step("stepTwo", python_config_2))
```

{% endtab %}

{% tab title="Java" %}

```java
PythonConfig pythonConfig1 = PythonConfig.builder()
        .pythonPath(pythonPath)
        .pythonCode("y = x + 2")
        .pythonInput("x", PythonVariables.Type.INT.name())
        .pythonOutput("y", PythonVariables.Type.INT.name())
        .build();

PythonConfig pythonConfig2 = PythonConfig.builder()
        .pythonPath(pythonPath)
        .pythonCode("b = a + 3")
        .pythonInput("a", PythonVariables.Type.INT.name())
        .pythonOutput("b", PythonVariables.Type.INT.name())
        .build();

PythonPipelineStep pythonPipelineStep = new PythonPipelineStep()
        .step("stepOne", pythonConfig1)
        .step("stepTwo", pythonConfig2);

Writable[][] output = pythonPipelineStep
        .getRunner()
        .transform(
                new Object[] {3}, 
                new Object[] {3}
        );

System.out.println(Arrays.deepToString(output));
```

{% endtab %}
{% endtabs %}

## YAML configuration

Python steps can take any argument that can be passed to `PythonConfig`.The following is a basic example of specifying a Python step in a YAML configuration:&#x20;

```yaml
steps: 
  python_step: 
    type: PYTHON
    python_code_path: simple.py
```

* `type`: specify this as `PYTHON`.
* `python_code`: if you want to specify your Python code directly in your YAML file. The following [documentation](http://blogs.perl.org/users/tinita/2018/03/strings-in-yaml---to-quote-or-not-to-quote.html) may be helpful for specifying multi-line Python code, specifically the section on literal block scalars.
* `python_code_path`: specify the path of a Python `.py` script.&#x20;
* `python_inputs`: name-value pairs specifying the data types for each of the inputs referenced in the script. Data types should be one of the following: `INT`, `STR`, `FLOAT`, `BOOL`, `NDARRAY`.
* `python_outputs`: name-value pairs specifying the data types for each of the outputs referenced in the script. Data types should be one of the following: `INT`, `STR`, `FLOAT`, `BOOL`, `NDARRAY`.
* `extra_inputs`: potential extra input variables. Specify value types as per `python_inputs`.&#x20;
* `return_all_inputs`: Boolean. Whether or not to return all inputs in addition to output.
* `setup_and_run`: Boolean. Whether or not to use the setup-and-run schematics. Defaults to `False`.
* `python_path`: location of the Python modules. Generally, if your script only requires NumPy, setting a custom `python_path` is not necessary. Refer to the [Python modules](https://serving.oss.konduit.ai/python#python-modules-and-the-pythonpath-argument) documentation on setting a custom Python path with additional modules.&#x20;

The names referenced in `python_inputs` and `python_outputs` correspond with `inputColumnNames` and `outputColumnNames`. Modifying `python_inputs` and `python_outputs` does not modify the input and output name of the step. `input_names` and `output_names` are arguments to `PythonStep` which cannot be accessed through the YAML configuration, and default to the name `default`.

## Python modules and the `pythonPath` argument&#x20;

If the `pythonPath` is not specified, you will still be able to import modules cached for NumPy by [JavaCPP Presets](https://github.com/bytedeco/javacpp-presets/) in your Python script(s).&#x20;

In Java, you can find the location of the default modules by printing&#x20;

```java
Arrays.toString(cachePackages())
```

where `cachePackages` is imported as a static variable:&#x20;

```java
import static org.bytedeco.numpy.presets.numpy.cachePackages;
```

If you require additional modules, you can set a custom`pythonPath` by running the following command in your Python environment and setting the output as your `pythonPath`:

```java
import os 
from konduit.utils import default_python_path

work_dir = os.path.abspath('.')
default_python_path(work_dir)
```

Custom `pythonPath` follows the format defined by `sys.path.`The first element is the location of the script used to invoke the Python interpreter, and the remaining elements specify where Python should search for modules.&#x20;

{% hint style="info" %}
To list the modules that you can access, run `help("modules")` in your Python interpreter.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://serving.konduit.ai/0.1.0-snapshot/steps/python.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
