BERT
This notebook illustrates a simple client-server interaction to perform inference on a TensorFlow model using the Python SDK for Konduit Serving.
This page documents two ways to create Konduit Serving configurations with the Python SDK:
Using Python to create a configuration, and
Writing the configuration as a YAML file, then serving it using the Python SDK.
These approaches are documented in separate tabs throughout this page. For example, the following code block shows the imports for each approach in separate tabs:
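As a rough sketch, the Python configuration approach uses imports along these lines; the exact class and module names follow the konduit Python package used in the Konduit Serving examples, so verify them against your installed SDK version:

```python
import numpy as np

from konduit import (ModelConfigType, ModelStep, ParallelInferenceConfig,
                     ServingConfig, TensorDataTypesConfig, TensorFlowConfig)
from konduit.client import Client
from konduit.server import Server

# For the YAML approach, the configuration file is typically loaded with
# helpers along these lines (verify against your SDK version):
# from konduit.load import server_from_file, client_from_file
```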
Konduit Serving works by defining a series of steps. These include operations such as:
Pre- or post-processing steps
One or more machine learning models
Transforming the output in a way that can be understood by humans
If deploying your model does not require pre- or post-processing, only one step - a machine learning model - is required. This configuration is defined using a single ModelStep.
Before running this notebook, run the build_jar.py script or the konduit init command. Refer to the relevant documentation page for details.
Start by downloading the model weights to the data folder.
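A minimal download sketch might look like the following; the URL and file path here are placeholders, not the actual hosted location of the BERT weights:

```python
import os
import urllib.request

# Hypothetical destination path and URL: substitute the actual location of the
# frozen BERT model referenced by this notebook.
dl_path = "../data/bert/bert_frozen.pb"
if not os.path.isfile(dl_path):
    os.makedirs(os.path.dirname(dl_path), exist_ok=True)
    urllib.request.urlretrieve("https://example.com/bert_frozen.pb", dl_path)
```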
Define the TensorFlow configuration as a TensorFlowConfig object.

model_config_type: This argument requires a ModelConfigType object. Specify model_type as TENSORFLOW, and model_loading_path to point to the location of TensorFlow weights saved in the PB file format.
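A sketch of this configuration; the input node names IDs, mask and segmentIds are illustrative placeholders, and the data types are assumed to be INT32 for this example:

```python
# Placeholder input node names; replace with the input names of your graph.
input_names = ["IDs", "mask", "segmentIds"]
input_data_types = {name: "INT32" for name in input_names}

tensorflow_config = TensorFlowConfig(
    model_config_type=ModelConfigType(
        model_type="TENSORFLOW",
        model_loading_path=dl_path,  # path to the frozen .pb file
    ),
    tensor_data_types_config=TensorDataTypesConfig(
        input_data_types=input_data_types
    ),
)
```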
Now that we have a TensorFlowConfig defined, we can define a ModelStep. The following parameters are specified:

model_config: pass the TensorFlowConfig object here.
parallel_inference_config: specify the number of workers to run in parallel. Here, we specify workers=1.
input_names: names for the input data.
output_names: names for the output data.
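Putting these parameters together, the ModelStep might be defined as follows; the output name output/Softmax is a placeholder, so use the output node name of your own graph:

```python
# Placeholder output node name; replace with the output name of your graph.
output_names = ["output/Softmax"]

tf_step = ModelStep(
    model_config=tensorflow_config,
    parallel_inference_config=ParallelInferenceConfig(workers=1),
    input_names=input_names,
    output_names=output_names,
)
```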
Specify the following in the ServingConfig:

http_port: select a random port.
input_data_format, output_data_format: specify input and output data formats as strings.
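For example, a ServingConfig along these lines; the port is chosen at random, and NUMPY is assumed as the data format for this example:

```python
import random

port = random.randint(1000, 65535)  # select a random port

serving_config = ServingConfig(
    http_port=port,
    input_data_format="NUMPY",
    output_data_format="NUMPY",
)
```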
The ServingConfig has to be passed to Server in addition to the steps as a Python list. In this case, there is a single step: tf_step.
By default, Server() looks for the Konduit Serving JAR konduit.jar in the directory the script is run in. To change this default, use the jar_path argument.
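A sketch of the server definition, assuming konduit.jar sits in the working directory:

```python
server = Server(
    serving_config=serving_config,
    steps=[tf_step],  # a single ModelStep in this example
    # jar_path="path/to/konduit.jar",  # uncomment to point to a JAR elsewhere
)
```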
Start the server:
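```python
server.start()
```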
To configure the client, create a Client object specifying the port number:
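For instance, reusing the port chosen above; depending on your SDK version, additional Client arguments such as the input and output data formats may also be required:

```python
client = Client(port=port)
```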
NDARRAY inputs to ModelSteps must be specified with a preceding batchSize dimension. For batches with a single observation, this can be done by using numpy.expand_dims() to add an additional dimension to your array, as shown in the sketch below.
Load some sample data from NumPy files. Note that these are NumPy arrays, each with shape (4, 128):
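A sketch of loading the arrays, adding the leading batch dimension and requesting a prediction; the .npy file names and input names are placeholders for this example:

```python
# Hypothetical file names; substitute the sample .npy files shipped with the
# notebook. np.expand_dims adds the leading batchSize dimension.
data_input = {
    "IDs": np.expand_dims(np.load("../data/input-0.npy"), axis=0),
    "mask": np.expand_dims(np.load("../data/input-1.npy"), axis=0),
    "segmentIds": np.expand_dims(np.load("../data/input-4.npy"), axis=0),
}

prediction = client.predict(data_input)
print(prediction)

server.stop()  # shut down the server when done
```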
The configuration is stored as a dictionary. Note that the configuration can be converted to a dictionary using the as_dict() method:
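For example, assuming the Server object exposes the combined configuration via a config attribute (check the attribute name in your SDK version):

```python
print(server.config.as_dict())
```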
tensor_data_types_config: The TensorFlowConfig object requires a dictionary input_data_types. Its keys should represent column names, and the values should represent data types as strings, e.g. "INT32". See the documentation for a list of supported data types.
input_data_types: maps each of the input_names to the corresponding data type. The values should represent data types as strings, e.g. INT32. See the documentation for a list of supported data types.
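For the three placeholder input names used in this example, that mapping might look like:

```python
input_data_types = {
    "IDs": "INT32",
    "mask": "INT32",
    "segmentIds": "INT32",
}
```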
In TensorFlow, you can find the names of your input and output nodes by iterating through model.inputs and model.outputs respectively and printing the .op.name attribute of each. For more details, refer to the TensorFlow documentation.
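A quick sketch of inspecting node names on a loaded model; the variable model is assumed to be a Keras model built in TensorFlow 1.x graph mode, where each input and output tensor exposes an op attribute:

```python
for tensor in model.inputs:
    print(tensor.op.name)   # input node names

for tensor in model.outputs:
    print(tensor.op.name)   # output node names
```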