This page provides a Java example of deploying a model built in Python using the Open Neural Network Exchange (ONNX) format. ONNX is an open format that is also supported by other deep learning frameworks such as TensorFlow and PyTorch. In this example, an ONNX model trained on the Iris dataset is deployed to the server.
Let's start in the main function by getting the trained model.
// get the file of the trained model
Train.ModelTrainResult modelTrainResult = Train.onnxIrisModel();
Create a default inference configuration.
//a default Inference Configuration
InferenceConfiguration inferenceConfiguration = new InferenceConfiguration();
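By default the server binds to localhost and, judging from the getActualPort() call later in this example, picks a free port automatically. As a hedged sketch (not part of the original example, and assuming InferenceConfiguration exposes fluent host(...) and port(...) setters), these defaults could be overridden:

// hypothetical sketch: override the default host and port,
// assuming fluent host(...)/port(...) setters on InferenceConfiguration
inferenceConfiguration
        .host("0.0.0.0")  // listen on all interfaces
        .port(8080);      // fixed port instead of an automatically chosen one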
We'll need to add an ONNXStep to the pipeline and specify the following:
modelUri : the path to the model file
inputNames : the names of the model's input layers
outputNames : the names of the model's output layers
// include the pipeline step in the inference configuration
inferenceConfiguration.pipeline(SequencePipeline.builder()
        .add(new ONNXStep() // add an ONNXStep to the pipeline
                .modelUri(modelTrainResult.modelPath())
                .inputNames(modelTrainResult.inputNames())
                .outputNames(modelTrainResult.outputNames())
        ).build()
);
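Before deploying, it can be handy to inspect the assembled configuration. Here is a rough sketch, assuming InferenceConfiguration provides a toJson() serialization helper (not shown in the original example):

// hypothetical sketch: print the assembled configuration for inspection,
// assuming a toJson() helper exists on InferenceConfiguration
System.out.println(inferenceConfiguration.toJson());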
Deploy the server
Let's deploy the model to the server by calling DeployKonduitServing with the configuration created above. A callback handler is used so that we respond only after the server deployment has either succeeded or failed.
// deploy the model to the server
DeployKonduitServing.deploy(new VertxOptions(), new DeploymentOptions(),
        inferenceConfiguration,
        handler -> {
            if (handler.succeeded()) { // if the server is successfully running
                // get the result of the deployment
                InferenceDeploymentResult inferenceDeploymentResult = handler.result();
                int runningPort = inferenceDeploymentResult.getActualPort();
                String deploymentId = inferenceDeploymentResult.getDeploymentId();

                System.out.format("The server is running on port %s with deployment id of %s%n",
                        runningPort, deploymentId);

                try {
                    String result = Unirest.post(String.format("http://localhost:%s/predict", runningPort))
                            .header("Content-Type", "application/json")
                            .header("Accept", "application/json")
                            .body(new JSONObject().put("input",
                                    new JSONArray().put(Arrays.asList(1.0, 1.0, 1.0, 1.0)))
                            )
                            .asString().getBody();

                    System.out.format("Result from server : %s%n", result);
                    System.exit(0);
                } catch (UnirestException e) {
                    e.printStackTrace();
                    System.exit(1);
                }
            } else { // if the server failed to run
                System.out.println(handler.cause().getMessage());
                System.exit(1);
            }
        });
Note that we send only a single test input array in this example, just enough to demonstrate the model's deployment in Konduit-Serving. After execution, a successful server deployment produces the output text below.
The server is running on port 44301 with deployment id of 775bfbd3-2d18-435b-86c6-e9fbe7303cad
Result from server : {
"output" : [ [ 0.035723433, 0.27029678, 0.69397974 ] ]
}
Process finished with exit code 0
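The output array holds one probability per Iris class. As a rough client-side sketch (not part of the original example, and assuming the {"output": [[p0, p1, p2]]} response shape shown above), the predicted class index could be extracted like this:

// hypothetical sketch: pick the class with the highest probability from the
// "output" array of the JSON response shown above
JSONArray probabilities = new JSONObject(result).getJSONArray("output").getJSONArray(0);
int predictedClass = 0;
for (int i = 1; i < probabilities.length(); i++) {
    if (probabilities.getDouble(i) > probabilities.getDouble(predictedClass)) {
        predictedClass = i;
    }
}
System.out.format("Predicted Iris class index: %d%n", predictedClass);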
The complete inference configuration in YAML format is as follows.