Keras (TensorFlow 2.0)
This page illustrates a simple client-server interaction to perform inference on a Keras LSTM model using the Java SDK for Konduit Serving.
import ai.konduit.serving.InferenceConfiguration;
import ai.konduit.serving.config.ParallelInferenceConfig;
import ai.konduit.serving.config.ServingConfig;
import ai.konduit.serving.configprovider.KonduitServingMain;
import ai.konduit.serving.configprovider.KonduitServingMainArgs;
import ai.konduit.serving.model.ModelConfig;
import ai.konduit.serving.model.ModelConfigType;
import ai.konduit.serving.pipeline.step.ModelStep;
import ai.konduit.serving.verticles.inference.InferenceVerticle;
import com.mashape.unirest.http.Unirest;
import com.mashape.unirest.http.exceptions.UnirestException;
import org.apache.commons.io.FileUtils;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.io.ClassPathResource;
import org.nd4j.serde.binary.BinarySerde;
import java.io.File;
import java.nio.charset.Charset;
Saving models in Keras HDF5 (.h5) format
Models can be saved in Python with the .save() method. Refer to the TensorFlow documentation for Keras for details. The saved model is then loaded in Java.
Overview
Konduit Serving works by defining a series of steps. These include operations such as
Pre- or post-processing steps
One or more machine learning models
Transforming the output in a way that can be understood by humans
If deploying your model does not require pre- or post-processing, only one step - a machine learning model - is required. This configuration is defined using a single ModelStep.
Configure the step
Define the Keras configuration as a ModelConfig object.
modelConfigType: This argument requires a ModelConfigType object. Specify modelType as ModelConfig.ModelType.KERAS, and modelLoadingPath to point to the location of the Keras weights saved in the HDF5 file format.
For the ModelStep object, the following parameters are specified:
modelConfig: pass the ModelConfig object here
parallelInferenceConfig: specify the number of workers to run in parallel. Here, we specify workers = 1.
inputName, outputName: names for the input and output nodes, as lists
String kerasmodelfilePath = new ClassPathResource("data/keras/embedding_lstm_tensorflow_2.h5").getFile().getAbsolutePath();
ModelConfig kerasModelConfig = ModelConfig.builder()
.modelConfigType(ModelConfigType.builder()
.modelLoadingPath(kerasmodelfilePath.toString())
.modelType(ModelConfig.ModelType.KERAS).build())
.build();
ModelStep kerasmodelStep = ModelStep.builder()
.modelConfig(kerasModelConfig)
.inputName("input")
.outputName("lstm_1")
.parallelInferenceConfig(ParallelInferenceConfig.builder().workers(1).build())
.build();
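If you are unsure which node names to pass to inputName and outputName, one way to check is to inspect the saved model directly. The following sketch is an addition to the original example: it assumes DL4J's Keras import module is available on the classpath (Konduit Serving uses it internally) and that the model was saved as a Keras Sequential model.
// Sketch (assumption): inspect the layer names of the saved Keras model with DL4J's Keras import.
// The printed summary lists each layer's name and output shape, which can be matched
// against the inputName/outputName values used in the ModelStep.
org.deeplearning4j.nn.multilayer.MultiLayerNetwork imported =
        org.deeplearning4j.nn.modelimport.keras.KerasModelImport
                .importKerasSequentialModelAndWeights(kerasmodelfilePath);
System.out.println(imported.summary());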
Configure the server
In the ServingConfig, specify any port number that is not reserved.
// Util.randInt is a helper from the examples project; it returns a random integer in the given range
int port = Util.randInt(1000, 65535);
ServingConfig servingConfig = ServingConfig.builder().httpPort(port).build();
The ServingConfig has to be passed to the InferenceConfiguration in addition to the steps as a list. In this case, there is a single step: kerasmodelStep.
InferenceConfiguration inferenceConfiguration = InferenceConfiguration.builder()
.servingConfig(servingConfig)
.step(kerasmodelStep)
.build();
The inferenceConfiguration is stored as a JSON file. Configure KonduitServingMainArgs with the path of the saved config.json file as configPath, along with the other necessary server configuration arguments.
File configFile = new File("config.json");
FileUtils.write(configFile, inferenceConfiguration.toJson(), Charset.defaultCharset());
KonduitServingMainArgs args1 = KonduitServingMainArgs.builder()
.configStoreType("file").ha(false)
.multiThreaded(false).configPort(port)
.verticleClassName(InferenceVerticle.class.getName())
.configPath(configFile.getAbsolutePath())
.build();
Start the server by calling KonduitServingMain with the configuration defined in KonduitServingMainArgs, using a callback function (as shown in the code in the Inference section below).
Inference
NDARRAY inputs to a ModelStep must be specified with a shape.
To configure the client, set the URL used to connect to the server, using the same port number specified in the server configuration.
An onSuccess callback function is implemented so that the client request is posted and the HttpResponse retrieved only after the KonduitServingMain server has started successfully.
INDArray arr = Nd4j.create(new float[]{1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f}, 1, 10);
File file = new File("src/main/resources/data/test-input.zip");
System.out.println(file.getAbsolutePath());
BinarySerde.writeArrayToDisk(arr, file);
KonduitServingMain.builder()
.onSuccess(() -> {
try {
String response = Unirest.post(String.format("http://localhost:%s/raw/nd4j", port))
.field("input", file)
.asString().getBody();
System.out.print(response);
System.exit(0);
} catch (UnirestException e) {
e.printStackTrace();
System.exit(0);
}
})
.build()
.runMain(args1.toArgs());
Confirm the output
After executing the above, confirm that the server has started successfully by checking for the following output:
Jan 07, 2020 2:31:37 PM ai.konduit.serving.configprovider.KonduitServingMain
INFO: Deployed verticle ai.konduit.serving.verticles.inference.InferenceVerticle
The output of the program is as follows:
System.out.print(response)
{
"lstm_1" : {
"batchId" : "61c32b20-20c1-42d9-909e-40519304f2ac",
"ndArray" : {
"dataType" : "FLOAT",
"shape" : [ 1, 6, 10 ],
"data" : [ -0.0022880966, -0.0042849067, -0.005983479, -0.0073982426, -0.008555864, -0.009488584, -0.010229964, -0.010812048, -0.011263946,
-0.011611057, -0.0027362786, -0.005198746, -0.0072902446, -0.009000374, -0.010361125, -0.011421581, -0.012234278, -0.012848378, -0.013306688,
-0.013644846, 8.9187745E-4, 0.0012898755, 0.0014061421, 0.0013690761, 0.0012552886, 0.00110957, 9.573803E-4, 8.1235886E-4, 6.812691E-4, 5.6666887E-4,
-0.0029521661, -0.0049125706, -0.0062154937, -0.0070830043, -0.007662085, -0.008049844, -0.008310454, -0.008486291, -0.00860548, -0.008686648, -2.41272E-4,
-2.1998871E-4, -3.9213814E-5, 2.2318505E-4, 5.1378313E-4, 7.9911196E-4, 0.0010602918, 0.0012885022, 0.0014812968, 0.00164004, 0.0029523545, 0.0050065047,
0.006426977, 0.007400234, 0.008058914, 0.008497588, 0.008783782, 0.00896551, 0.009076695, 0.009141108 ]
}
  }
}
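The response body is a plain JSON string. The following sketch is an addition to the original example, showing one way to read the output values of the "lstm_1" node back out of the response; it assumes Jackson (com.fasterxml.jackson.databind) is available on the classpath.
// Sketch (assumption): parse the JSON response and collect the "data" values of the "lstm_1" output.
com.fasterxml.jackson.databind.JsonNode root =
        new com.fasterxml.jackson.databind.ObjectMapper().readTree(response);
com.fasterxml.jackson.databind.JsonNode data = root.get("lstm_1").get("ndArray").get("data");
float[] values = new float[data.size()];
for (int i = 0; i < values.length; i++) {
    values[i] = (float) data.get(i).asDouble();
}
System.out.println("Received " + values.length + " output values");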
The complete inference configuration in JSON format is as follows:
System.out.println(inferenceConfiguration.toJson());
{
"memMapConfig" : null,
"servingConfig" : {
"httpPort" : 62969,
"listenHost" : "localhost",
"logTimings" : false,
"metricTypes" : [ "CLASS_LOADER", "JVM_MEMORY", "JVM_GC", "PROCESSOR", "JVM_THREAD", "LOGGING_METRICS", "NATIVE" ],
"outputDataFormat" : "JSON",
"uploadsDirectory" : "file-uploads/"
},
"steps" : [ {
"@type" : "ModelStep",
"inputColumnNames" : { },
"inputNames" : [ "input" ],
"inputSchemas" : { },
"modelConfig" : {
"@type" : "ModelConfig",
"modelConfigType" : {
"modelLoadingPath" : "C:\\konduit-serving-examples\\java\\target\\classes\\data\\keras\\embedding_lstm_tensorflow_2.h5",
"modelType" : "KERAS"
},
"tensorDataTypesConfig" : null
},
"normalizationConfig" : null,
"outputColumnNames" : { },
"outputNames" : [ "lstm_1" ],
"outputSchemas" : { },
"parallelInferenceConfig" : {
"batchLimit" : 32,
"inferenceMode" : "BATCHED",
"maxTrainEpochs" : 1,
"queueLimit" : 64,
"vertxConfigJson" : null,
"workers" : 1
}
} ]
}