DataVec
Konduit Serving supports data transformations defined by the DataVec vectorization and ETL library.
A reference Java project, including a Maven pom.xml that declares its dependencies, is provided in the examples repository (https://github.com/KonduitAI/konduit-serving-examples). If you use the IntelliJ IDEA IDE, open the java folder as a Maven project and run the main method of the InferenceModelStepDataVec class.
Data transformations with DataVec
Schema (source)
A Schema specifies the structure of your data. In DataVec, a TransformProcess requires the Schema of the data to be specified. Both the Schema and TransformProcess classes come with a helper Builder class, which is useful for organizing code and avoiding complex constructors.
Schema objects are assembled with Schema.Builder, which has a number of methods that define different data types for columns: addColumnsString(), addColumnInteger(), addColumnLong(), addColumnFloat(), addColumnDouble() and addColumnCategorical().
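To show the builder style described above, here is a minimal plain-Java sketch. MiniSchema is a hypothetical stand-in written for illustration, not DataVec's Schema class: it only records column names and types in insertion order, and it omits categorical columns (which in DataVec also take a list of allowed states). In real code you would use DataVec's own Schema.Builder, for example new Schema.Builder().addColumnString("first").build().

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT DataVec's Schema class.
// It mimics the builder style: chain addColumnXxx() calls, then call build().
final class MiniSchema {
    enum ColumnType { STRING, INTEGER, LONG, FLOAT, DOUBLE }

    private final Map<String, ColumnType> columns;

    private MiniSchema(Map<String, ColumnType> columns) {
        // Defensive copy, exposed read-only and in insertion order.
        this.columns = Collections.unmodifiableMap(new LinkedHashMap<>(columns));
    }

    Map<String, ColumnType> columns() {
        return columns;
    }

    static final class Builder {
        // LinkedHashMap keeps columns in the order they were added.
        private final Map<String, ColumnType> columns = new LinkedHashMap<>();

        Builder addColumnsString(String... names) {
            for (String name : names) {
                add(name, ColumnType.STRING);
            }
            return this;
        }

        Builder addColumnInteger(String name) { return add(name, ColumnType.INTEGER); }
        Builder addColumnLong(String name) { return add(name, ColumnType.LONG); }
        Builder addColumnFloat(String name) { return add(name, ColumnType.FLOAT); }
        Builder addColumnDouble(String name) { return add(name, ColumnType.DOUBLE); }

        // Rejects duplicate column names so mistakes surface at build time.
        private Builder add(String name, ColumnType type) {
            if (columns.putIfAbsent(name, type) != null) {
                throw new IllegalArgumentException("Duplicate column: " + name);
            }
            return this;
        }

        MiniSchema build() {
            return new MiniSchema(columns);
        }
    }

    public static void main(String[] args) {
        MiniSchema schema = new MiniSchema.Builder()
                .addColumnsString("first")
                .addColumnInteger("count")
                .build();
        System.out.println(schema.columns()); // {first=STRING, count=INTEGER}
    }
}
```

The builder collects settings step by step and only creates the immutable object in build(), which is the same reason DataVec favors builders over long constructors.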
TransformProcess (source)
TransformProcess provides a number of methods to manipulate your data. The following methods are available in the DataVec API:
Reduce the number of rows: filter()
General data transformations: replaceStringTransform(), replaceMapTransform()
Type casting: stringToTimeTransform(), transform(), categoricalToInteger()
Combining/reducing the values in each column: reduce()
String operations: appendStringColumnTransform(), toLowerCase(), toUpperCase(), stringRemoveWhitespaceTransform(), replaceStringTransform(), stringMapTransform()
Column selection/renaming: removeColumns(), removeAllColumnsExceptFor(), renameColumn()
One-hot encoding: categoricalToOneHot(), integerToOneHot()
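A TransformProcess applies its operations to each row in the order they were added. The plain-Java sketch below models that idea with a hypothetical MiniTransformProcess class (not DataVec's API) that supports four of the operations listed above. As with DataVec's condition-based filtering, filter() here removes the rows for which the condition is true.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in, NOT DataVec's TransformProcess: rows are column-name -> value
// maps, and each step either returns a (possibly modified) row or drops it.
final class MiniTransformProcess {
    // One pipeline stage: returns the row to keep, or Optional.empty() to drop it.
    private interface Step {
        Optional<Map<String, String>> apply(Map<String, String> row);
    }

    private final List<Step> steps;

    private MiniTransformProcess(List<Step> steps) {
        this.steps = List.copyOf(steps);
    }

    List<Map<String, String>> execute(List<Map<String, String>> rows) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            // Copy each row so the caller's data is never mutated.
            Map<String, String> copy = new LinkedHashMap<>(row);
            Optional<Map<String, String>> current = Optional.of(copy);
            for (Step step : steps) {
                current = current.flatMap(step::apply); // stays empty once a row is dropped
            }
            current.ifPresent(out::add);
        }
        return out;
    }

    static final class Builder {
        private final List<Step> steps = new ArrayList<>();

        // Removes rows for which the condition is true.
        Builder filter(Predicate<Map<String, String>> removeIf) {
            steps.add(row -> removeIf.test(row) ? Optional.empty() : Optional.of(row));
            return this;
        }

        Builder appendStringColumnTransform(String column, String toAppend) {
            steps.add(row -> {
                row.put(column, row.get(column) + toAppend);
                return Optional.of(row);
            });
            return this;
        }

        Builder renameColumn(String oldName, String newName) {
            steps.add(row -> {
                Map<String, String> renamed = new LinkedHashMap<>();
                row.forEach((k, v) -> renamed.put(k.equals(oldName) ? newName : k, v));
                return Optional.of(renamed);
            });
            return this;
        }

        Builder removeColumns(String... names) {
            steps.add(row -> {
                for (String name : names) {
                    row.remove(name);
                }
                return Optional.of(row);
            });
            return this;
        }

        MiniTransformProcess build() {
            return new MiniTransformProcess(steps);
        }
    }

    public static void main(String[] args) {
        MiniTransformProcess tp = new MiniTransformProcess.Builder()
                .filter(row -> row.get("first").isEmpty())
                .appendStringColumnTransform("first", "two")
                .removeColumns("unused")
                .build();
        List<Map<String, String>> rows = List.of(
                Map.of("first", "one", "unused", "x"),
                Map.of("first", "", "unused", "y"));
        System.out.println(tp.execute(rows)); // [{first=onetwo}]
    }
}
```

Because steps run in sequence, later operations see the output of earlier ones: a column renamed in one step must be referred to by its new name afterwards.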
In this short example, we append the string two to the end of each value in the string column first. As an initial step, define the input and output Schema, each with a single string column named first.
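The effect of this example on the data can be shown without any library. The sketch below is a hypothetical plain-Java illustration, not DataVec code; in DataVec the same operation would likely be declared with new TransformProcess.Builder(schema).appendStringColumnTransform("first", "two").build(). Appending text changes the values but not the column's name or type, which is why the input and output Schema can be identical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java illustration of appending "two" to column "first"; no DataVec involved.
final class AppendTwoExample {
    static final String SUFFIX = "two";

    // Input and output are both the String column "first"; only the values change.
    static List<String> appendSuffix(List<String> firstColumn) {
        return firstColumn.stream()
                .map(value -> value + SUFFIX)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(appendSuffix(List.of("one", "value"))); // [onetwo, valuetwo]
    }
}
```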
Configure the step
The TransformProcess can now be defined in the Konduit Serving configuration with a TransformProcessStep. Here, we configure the inputs and outputs, defining the schema, column names and data types, and declare the TransformProcess using the .transformProcess() method.
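The inputs and outputs of the step pair each column name with a data type. The sketch below models that pairing in plain Java; StepIoSketch, ColumnSpec and columns() are hypothetical names for illustration, not Konduit Serving's API. It assumes names and types are supplied as two parallel arrays and checks that they line up one-to-one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a step's input/output columns; not Konduit Serving's API.
final class StepIoSketch {
    record ColumnSpec(String name, String type) {}

    // Pairs column names with their declared types; the arrays must have equal length.
    static List<ColumnSpec> columns(String[] names, String[] types) {
        if (names.length != types.length) {
            throw new IllegalArgumentException(
                    "Got " + names.length + " column names but " + types.length + " types");
        }
        List<ColumnSpec> specs = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            specs.add(new ColumnSpec(names[i], types[i]));
        }
        return List.copyOf(specs);
    }

    public static void main(String[] args) {
        // Input and output both describe the single String column "first".
        List<ColumnSpec> input = columns(new String[] {"first"}, new String[] {"String"});
        List<ColumnSpec> output = columns(new String[] {"first"}, new String[] {"String"});
        System.out.println(input.equals(output)); // true
    }
}
```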
Note that Schema data types are not defined in the same way as PythonStep data types. See the source for a complete list of supported Schema data types:
NDArray
String
Boolean
Categorical
Float
Double
Integer
Long
Bytes
You should define the Schema data types in TransformProcessStep() as strings.
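Since the data types are given as strings, a typo only shows up when the configuration is used. The sketch below checks type strings against the list of supported types given above before deployment. SchemaTypeCheck is a hypothetical helper, and it assumes an exact, case-sensitive match, which you should confirm against the Konduit Serving source.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical pre-deployment check; assumes type names must match exactly (case-sensitive).
final class SchemaTypeCheck {
    // The supported Schema data types listed in this section.
    static final Set<String> SUPPORTED = Set.of(
            "NDArray", "String", "Boolean", "Categorical",
            "Float", "Double", "Integer", "Long", "Bytes");

    // Returns the entries that are not supported, in their original order.
    static List<String> unsupported(List<String> types) {
        return types.stream()
                .filter(type -> !SUPPORTED.contains(type))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(unsupported(List.of("String", "Double"))); // []
        System.out.println(unsupported(List.of("string", "Int")));    // [string, Int]
    }
}
```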
Configure the server
Configure the server using ServingConfig, and define the port with the httpPort argument.
The inferenceConfiguration is stored as a JSON file.
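To make the "port plus JSON file" idea concrete, the sketch below validates an HTTP port and writes a small JSON document to disk with the standard library. ConfigFileSketch, the port 3000 and the JSON field names are hypothetical placeholders rather than the exact Konduit Serving format; in real code, let the library serialize its own configuration objects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; the JSON field names are placeholders, not Konduit Serving's schema.
final class ConfigFileSketch {
    // Builds a minimal JSON string after checking the port is a valid TCP port number.
    static String toJson(int httpPort) {
        if (httpPort < 1 || httpPort > 65535) {
            throw new IllegalArgumentException("Port out of range: " + httpPort);
        }
        return "{\n  \"servingConfig\": { \"httpPort\": " + httpPort + " },\n  \"steps\": []\n}\n";
    }

    // Writes the JSON as UTF-8, replacing any existing file, and returns the path.
    static Path save(String json, Path file) throws IOException {
        return Files.writeString(file, json, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path path = save(toJson(3000), Path.of("config.json"));
        System.out.println("Wrote " + path.toAbsolutePath());
    }
}
```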
Inference
The Client should be configured to match the Konduit Serving instance. As this example runs on a local computer, the server is located at host 'http://localhost' and at the port given by port. Finally, we run the Konduit Serving instance with the path of the saved config.json file as configPath, along with the other necessary server configuration arguments. Recall that the TransformProcessStep() appends the string two to the strings in the column first.
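The client side boils down to sending an HTTP POST to the host and port the server listens on. The sketch below builds such a request with the JDK's java.net.http API without sending it. The endpoint path /predict, the port 3000 and the JSON body shape are hypothetical; check the Konduit Serving documentation for the actual route and payload format.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client sketch; the endpoint path and body format are assumptions.
final class ClientRequestSketch {
    static final String PATH = "/predict"; // hypothetical route

    // Builds (but does not send) a JSON POST request to host:port + PATH.
    static HttpRequest buildRequest(String host, int port, String jsonBody) {
        URI uri = URI.create(host + ":" + port + PATH);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest("http://localhost", 3000, "{\"first\": \"value\"}");
        System.out.println(req.method() + " " + req.uri()); // POST http://localhost:3000/predict
        // Sending it would look like:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```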
A callback function, onSuccess, is implemented so that the client request is posted and the HttpResponse retrieved only after the KonduitServingMain server has started successfully.
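The "only after a successful start" pattern can be expressed with the JDK's CompletableFuture. In the sketch below, serverStarted is a hypothetical stand-in for whatever signal the server gives when deployment finishes, and the supplier plays the role of onSuccess. The callback is skipped if startup fails, and the returned future then completes exceptionally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch of an onSuccess-style callback using CompletableFuture.
final class OnSuccessSketch {
    // Runs onSuccess only if serverStarted completes normally; failures propagate instead.
    static CompletableFuture<String> whenStarted(CompletableFuture<Void> serverStarted,
                                                 Supplier<String> onSuccess) {
        return serverStarted.thenApply(ignored -> onSuccess.get());
    }

    public static void main(String[] args) {
        CompletableFuture<Void> started = new CompletableFuture<>();
        CompletableFuture<String> response = whenStarted(started, () -> "response from POST");
        System.out.println(response.isDone()); // false: the server has not started yet
        started.complete(null);                // simulate a successful start
        System.out.println(response.join());   // response from POST
    }
}
```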
Confirm the output
After executing the above, check for the following output text to confirm that the server started successfully:
The output of the program is as follows:
The complete inference configuration in JSON format is as follows: