Keras model loading functionality in Konduit Serving converts Keras models to Deeplearning4J models. As a result, Keras models containing operations not supported in Deeplearning4J cannot be served in Konduit Serving. See issue 8348.
Overview
Konduit Serving works by defining a series of steps. These include operations such as
Pre- or post-processing steps
One or more machine learning models
Transforming the output in a way that can be understood by humans
If deploying your model requires neither pre- nor post-processing, only one step - a machine learning model - is required. This configuration is defined using a single ModelStep.
Before running this notebook, run the build_jar.py script or the konduit init command. Refer to the Building from source page for details.
Configure the step
Define the Keras configuration as a ModelConfig object.
model_config_type: This argument requires a ModelConfigType object. Specify model_type as KERAS, and model_loading_path to point to the location of Keras weights saved in the HDF5 file format.
For the ModelStep object, the following parameters are specified:
model_config: pass the ModelConfig object here.
parallel_inference_config: specify the number of workers to run in parallel. Here, we specify workers=1.
input_names: names for the input nodes.
output_names: names for the output nodes.
If you configure the step in a YAML file instead, the equivalent keys are:
type: specify this as KERAS.
model_loading_path: location of the model weights.
input_names, output_names: names for the input and output nodes, as lists.
Input and output names can be obtained by visualizing the graph in Netron.
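The step configuration described above can be sketched as follows. The model path and the node names are placeholders for illustration; substitute the path to your own HDF5 file and the names obtained from Netron.

```python
from konduit import ModelConfig, ModelConfigType, ModelStep, ParallelInferenceConfig

# Assumed file path and node names; replace with your own values.
keras_config = ModelConfig(
    model_config_type=ModelConfigType(
        model_type="KERAS",
        model_loading_path="model.h5",  # Keras weights saved in HDF5 format
    )
)

keras_step = ModelStep(
    model_config=keras_config,
    parallel_inference_config=ParallelInferenceConfig(workers=1),
    input_names=["input"],
    output_names=["output"],
)
```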
Configure the server
In the ServingConfig, specify a port number.
Pass the ServingConfig to Server, along with the steps as a Python list. In this case, there is a single step: keras_step.
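A minimal server configuration might look like this; the port number is an example, and keras_step is the ModelStep defined earlier.

```python
from konduit import ServingConfig
from konduit.server import Server

# Example port; choose any free port on your machine.
serving_config = ServingConfig(http_port=1337)

# Steps are passed as a Python list; here the list holds a single step.
server = Server(serving_config=serving_config, steps=[keras_step])
```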
Start the server
Use the .start() method:
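Assuming `server` is the Server object configured above:

```python
# Starts the Konduit Serving instance with the configured steps.
server.start()
```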
Configure the client
To configure the client, create a Client object with the port argument.
Note that you should create the Client object after the Server has started, so that Client can inherit the Server's attributes.
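A sketch of creating the client directly, assuming the server was configured with port 1337 as in the example above:

```python
from konduit.client import Client

# The port must match the one set in the ServingConfig.
client = Client(port=1337)
```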
Add the following to your YAML file:
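A minimal client section might look like the fragment below; the port value is an assumed example and must match your serving configuration.

```yaml
client:
    port: 1337
```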
Use client_from_file to create a Client object in Python:
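For example, assuming the configuration above is saved in a file named konduit.yaml (the file name is an assumption for illustration):

```python
from konduit.load import client_from_file

# Reads the client section of the YAML file and builds a Client object.
client = client_from_file("konduit.yaml")
```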
Inference
NDARRAY inputs to a ModelStep must include a leading batch-size dimension. For a batch containing a single observation, add this dimension with numpy.expand_dims().
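For instance, a single two-dimensional observation can be given a leading batch dimension like this (the 28x28 shape is an assumed example, e.g. one MNIST image):

```python
import numpy as np

# A single observation with an assumed shape of (28, 28).
observation = np.random.rand(28, 28)

# Add a leading batch dimension: (28, 28) -> (1, 28, 28).
batch = np.expand_dims(observation, axis=0)
print(batch.shape)  # (1, 28, 28)
```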