MNIST
This notebook illustrates a simple client-server interaction to perform inference on a TensorFlow model using the Python SDK for Konduit Serving.
This tutorial is split into three parts:
Freezing models
Configuration
Running the server
from konduit import ParallelInferenceConfig, ServingConfig, ModelConfigType, TensorFlowConfig
from konduit import TensorDataTypesConfig, ModelStep, InferenceConfiguration
from konduit.server import Server
from konduit.client import Client

import tensorflow as tf

if tf.__version__[0] == "1":
    from tensorflow import keras
elif tf.__version__[0] == "2":
    import tensorflow.compat.v1 as tf
    from tensorflow.compat.v1 import keras
else:
    print("No valid TensorFlow version detected")

from keras.layers import Flatten, Dense, Dropout, Lambda
from keras.models import Sequential
from keras.datasets import mnist
from PIL import Image
import numpy as np
import imageio
import os
import matplotlib.pyplot as plt
import pandas as pd
Using TensorFlow backend.
tensorflow_version = tf.__version__
print(tensorflow_version)
2.0.0
Creating frozen models (TensorFlow 1.x)
In TensorFlow 1.x, "frozen" models can be exported in the TensorFlow Graph format. For deployment, we only need information about the graph and checkpoint variables. Freezing a model allows you to discard information that is not required for deploying your model.
TensorFlow 2.0 introduces the SavedModel format as the universal format for saving models. Even though the deployable protobuf (PB) files have the same file extension as frozen TensorFlow Graph files, SavedModel protobuf files are not currently supported in Konduit Serving. A workaround for TensorFlow 2.0 is to adapt the code from this tutorial to create TensorFlow Graph protobufs for your use case, or to save your models as Keras HDF5 files and serve them as Keras models (refer to the Keras tutorial for details).
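As a sketch of the second workaround, saving to HDF5 is a single call. The model below is a small stand-in used purely for illustration, not the model trained later in this notebook:
import tensorflow as tf

# Stand-in model, only to illustrate the Keras HDF5 workaround
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Save in Keras HDF5 format; serve this file as a Keras model
# (see the Keras tutorial) instead of a frozen TensorFlow Graph
model.save("mnist_keras.h5")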
The following code is adapted from tf-import-examples in the deeplearning4j-examples repository.
In the following code, we build a model using TensorFlow's Keras API and save it as a TensorFlow Graph. The architecture is adapted from the following Kaggle kernel: https://inclass.kaggle.com/charel/learn-by-example-neural-networks-hello-world/notebook.
# Load data
train_data, test_data = mnist.load_data()
x_train, y_train = train_data
x_test, y_test = test_data
# Normalize
x_train = x_train / 255.0
x_test = x_test / 255.0
def get_model(training=False):
    inputs = keras.layers.Input(shape=(28, 28), name="input_layer")
    x = keras.layers.Flatten()(inputs)
    x = keras.layers.Dense(200, activation="relu")(x)
    x = keras.layers.Dense(100, activation="relu")(x)
    x = keras.layers.Dense(60, activation="relu")(x)
    x = keras.layers.Dense(30, activation="relu")(x)
    outputs = keras.layers.Dense(10, activation="softmax", name="output_layer")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        optimizer='sgd',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    if training:
        # Print the input and output node names needed later for the serving configuration
        print(model.inputs[0].op.name)
        print(model.outputs[0].op.name)
    return model
def train():
    with tf.Session() as sess:
        keras.backend.set_session(sess)
        model = get_model(True)
        model.fit(x_train, y_train, epochs=8)
        weights = model.get_weights()
    return weights
def save(weights):
    # Save the model as a frozen TensorFlow Graph protobuf
    keras.backend.clear_session()
    with tf.Session() as sess:
        keras.backend.set_session(sess)
        model = get_model(False)
        model.set_weights(weights)
        model.evaluate(x_test, y_test)
        output_node_name = model.output.name.split(':')[0]
        output_graph_def = tf.graph_util.convert_variables_to_constants(
            sess,
            sess.graph.as_graph_def(),
            [output_node_name]
        )
        with tf.gfile.GFile(
            name=f"../data/mnist/mnist_{tensorflow_version}.pb",
            mode="wb"
        ) as f:
            f.write(output_graph_def.SerializeToString())

weights = train()
save(weights)
WARNING:tensorflow:From C:\Users\Skymind AI Berhad\AppData\Local\Continuum\miniconda3\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
input_layer
output_layer/Softmax
Train on 60000 samples
Epoch 1/8
60000/60000 [==============================] - 3s 53us/sample - loss: 0.6992 - accuracy: 0.7869
Epoch 2/8
60000/60000 [==============================] - 3s 52us/sample - loss: 0.2445 - accuracy: 0.9294
Epoch 3/8
60000/60000 [==============================] - 3s 50us/sample - loss: 0.1782 - accuracy: 0.9481
Epoch 4/8
60000/60000 [==============================] - 3s 52us/sample - loss: 0.1425 - accuracy: 0.9588
Epoch 5/8
60000/60000 [==============================] - 3s 51us/sample - loss: 0.1180 - accuracy: 0.9651
Epoch 6/8
60000/60000 [==============================] - 3s 49us/sample - loss: 0.1020 - accuracy: 0.9699
Epoch 7/8
60000/60000 [==============================] - 3s 48us/sample - loss: 0.0882 - accuracy: 0.9741
Epoch 8/8
60000/60000 [==============================] - 3s 47us/sample - loss: 0.0778 - accuracy: 0.9771
10000/10000 [==============================] - 0s 31us/sample - loss: 0.1160 - accuracy: 0.9640
WARNING:tensorflow:From <ipython-input-3-3e44ee2b8acb>:54: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From C:\Users\Skymind AI Berhad\AppData\Local\Continuum\miniconda3\lib\site-packages\tensorflow_core\python\framework\graph_util_impl.py:275: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
INFO:tensorflow:Froze 10 variables.
INFO:tensorflow:Converted 10 variables to const ops.
Overview
Konduit Serving works by defining a series of steps. These include operations such as:
Pre- or post-processing steps
One or more machine learning models
Transforming the output in a way that can be understood by humans
If deploying your model requires neither pre- nor post-processing, only one step, a machine learning model, is required. This configuration is defined using a single ModelStep.
Before running this notebook, run the build_jar.py script or the konduit init command. Refer to the Building from source page for details.
Configure the step
Define the TensorFlow configuration as a TensorFlowConfig object.
tensor_data_types_config: The TensorFlowConfig object requires a dictionary input_data_types. Its keys should represent column names, and the values should represent data types as strings, e.g. "INT32". See here for a list of supported data types.
model_config_type: This argument requires a ModelConfigType object. Specify model_type as TENSORFLOW, and model_loading_path to point to the location of TensorFlow weights saved in the PB file format.
# Input name and data type, matching the model's input_layer node
input_data_types = {'input_layer': 'FLOAT'}

tensorflow_config = TensorFlowConfig(
    tensor_data_types_config=TensorDataTypesConfig(
        input_data_types=input_data_types
    ),
    model_config_type=ModelConfigType(
        model_type='TENSORFLOW',
        model_loading_path=os.path.abspath(
            f'../data/mnist/mnist_{tensorflow_version}.pb'
        )
    )
)
tensorflow_config.as_dict()
{'@type': 'TensorFlowConfig',
'tensorDataTypesConfig': {'@type': 'TensorDataTypesConfig',
'inputDataTypes': {'input_layer': 'FLOAT'}},
'modelConfigType': {'@type': 'ModelConfigType',
'modelType': 'TENSORFLOW',
'modelLoadingPath': 'C:\\Users\\Skymind AI Berhad\\Documents\\konduit-serving-examples\\data\\mnist\\mnist_2.0.0.pb'}}
Now that we have a TensorFlowConfig defined, we can define a ModelStep. The following parameters are specified:
model_config: pass the TensorFlowConfig object here.
parallel_inference_config: specify the number of workers to run in parallel. Here, we specify workers=1.
input_names: names for the input data.
output_names: names for the output data.
# Input and output node names printed when the model was trained
input_names = ['input_layer']
output_names = ['output_layer/Softmax']

tf_step = ModelStep(
    model_config=tensorflow_config,
    parallel_inference_config=ParallelInferenceConfig(workers=1),
    input_names=input_names,
    output_names=output_names
)
Configure the server
Specify the following:
http_port: select a random port.
input_data_format, output_data_format: specify input and output data formats as strings.
port = np.random.randint(1000, 65535)
serving_config = ServingConfig(
    http_port=port,
    input_data_format='NUMPY',
    output_data_format='NUMPY'
)
The ServingConfig has to be passed to Server in addition to the steps as a Python list. In this case, there is a single step: tf_step.
server = Server(
    serving_config=serving_config,
    steps=[tf_step]
)
By default, Server() looks for the Konduit Serving JAR konduit.jar in the directory the script is run in. To change this default, use the jar_path argument.
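For example, a minimal sketch; the JAR path below is illustrative, not a real location:
# Illustrative only: supply a custom JAR location via jar_path
server = Server(
    serving_config=serving_config,
    steps=[tf_step],
    jar_path='C:/konduit/konduit.jar'  # hypothetical path
)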
Start the server
Start the server:
server.start()
Starting server.....
Server has started successfully.
<subprocess.Popen at 0x1d6794ffd68>
Configure the client
To configure the client, create a Client object by specifying the port number:
client = Client(port=port)
The Client's attributes will be obtained from the Server.
Inference
NDARRAY inputs to ModelSteps must be specified with a preceding batchSize dimension. For batches with a single observation, this can be done by using numpy.expand_dims() to add an additional dimension to your array, as sketched below.
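As a quick numpy-only illustration (dummy data, not part of the serving API), adding the leading dimension turns one 28 x 28 image into a batch of one:
import numpy as np

# One 28 x 28 image (dummy data for illustration)
img = np.zeros((28, 28), dtype=np.float32)

# Add a leading batchSize dimension: shape (28, 28) -> (1, 28, 28)
batch = np.expand_dims(img, axis=0)
print(batch.shape)  # (1, 28, 28)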
We obtain test images from the test set defined by keras.datasets.
for img in x_test[0:3]:
    plt.imshow(img)
    predicted = client.predict(
        data_input={'input_layer': np.expand_dims(img.reshape(28, 28), axis=0)}
    )
    plt.show()
    print(dict(zip(np.arange(10), predicted[0].round(3))))
(plot of test image)
{0: 0.0, 1: 0.0, 2: 0.001, 3: 0.001, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.998, 8: 0.0, 9: 0.0}
(plot of test image)
{0: 0.0, 1: 0.0, 2: 0.998, 3: 0.002, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}
(plot of test image)
{0: 0.0, 1: 0.986, 2: 0.005, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.005, 8: 0.003, 9: 0.0}
Batch prediction
To predict in batches, the data_input dictionary has to be specified differently for batches of images in NDARRAY format. To input a batch of observations, ensure that your inputs are in the NCHW format: number of observations, channels (optional if single channel), height and width.
An example is as follows:
predicted = client.predict(
    data_input={'input_layer': x_test[0:3].reshape(3, 28, 28)}
)
server.stop()
We compare the predicted probabilities and the corresponding labels:
pd.DataFrame(predicted).round(3)
     0      1      2      3    4    5    6      7      8    9
0  0.0  0.000  0.001  0.001  0.0  0.0  0.0  0.998  0.000  0.0
1  0.0  0.000  0.998  0.002  0.0  0.0  0.0  0.000  0.000  0.0
2  0.0  0.986  0.005  0.000  0.0  0.0  0.0  0.005  0.003  0.0
y_test[0:3]
array([7, 2, 1], dtype=uint8)
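As a quick illustrative check, assuming predicted is the (3, 10) array returned above, the highest-probability class in each row can be compared with these labels:
# Highest-probability class per row; should match y_test[0:3]
print(np.argmax(predicted, axis=1))  # [7 2 1]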
Note that the full configuration of the server can be converted to a dictionary using the as_dict() method:
server.config.as_dict()
{'@type': 'InferenceConfiguration',
'steps': [{'@type': 'ModelStep',
'inputNames': ['input_layer'],
'outputNames': ['output_layer/Softmax'],
'modelConfig': {'@type': 'TensorFlowConfig',
'tensorDataTypesConfig': {'@type': 'TensorDataTypesConfig',
'inputDataTypes': {'input_layer': 'FLOAT'}},
'modelConfigType': {'@type': 'ModelConfigType',
'modelType': 'TENSORFLOW',
'modelLoadingPath': 'C:\\Users\\Skymind AI Berhad\\Documents\\konduit-serving-examples\\data\\mnist\\mnist_2.0.0.pb'}},
'parallelInferenceConfig': {'@type': 'ParallelInferenceConfig',
'workers': 1}}],
'servingConfig': {'@type': 'ServingConfig',
'httpPort': 4776,
'inputDataFormat': 'NUMPY',
'outputDataFormat': 'NUMPY',
'logTimings': True}}