# DataVec

```java
import ai.konduit.serving.InferenceConfiguration;
import ai.konduit.serving.config.ServingConfig;
import ai.konduit.serving.configprovider.KonduitServingMain;
import ai.konduit.serving.pipeline.step.TransformProcessStep;
import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.JsonNode;
import com.mashape.unirest.http.Unirest;
import com.mashape.unirest.http.exceptions.UnirestException;
import org.apache.commons.io.FileUtils;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
```

{% hint style="info" %}
A reference Java project is provided in the Example repository ( <https://github.com/KonduitAI/konduit-serving-examples> ) with a Maven pom.xml dependencies file. If using the IntelliJ IDEA IDE, open the java folder as a Maven project and run the main function of InferenceModelStepDataVec the class.
{% endhint %}

## Data transformations with DataVec

### **Schema (**[**source**](https://github.com/deeplearning4j/DataVec/blob/master/datavec-api/src/main/java/org/datavec/api/transform/schema/Schema.java)**)**

A `Schema` specifies the structure of your data. In DataVec, a `TransformProcess` requires the `Schema` of the data to be specified. Both schema and transform process classes come with a helper Builder class which are useful for organizing code and avoiding complex constructors.

`Schema` objects have a number of methods that define different data types for columns: `addColumnsString()`, `addColumnInteger()`, `addColumnLong()`, `addColumnFloat()`, `addColumnDouble()` and `addColumnCategorical()`.

### **TransformProcess (**[**source**](https://github.com/deeplearning4j/DataVec/blob/master/datavec-api/src/main/java/org/datavec/api/transform/TransformProcess.java)**)**

`TransformProcess` provides a number of methods to manipulate your data. The following methods are available in the Datavec API:

* Reduce the number of rows: `filter()`
* General data transformations: `replaceStringTransform()`, `replaceMapTransform()`,
* Type casting: `stringToTimeTransform()`, `transform()`, `categoricalToInteger()`,
* Combining/reducing the values in each column: `reduce()`
* String operations: `appendStringColumnTransform()`, `toLowerCase()`, `toUpperCase()`, `stringRemoveWhitespaceTransform()`, `replaceStringTransform()`, `stringMapTransform()`
* Column selection/renaming: `removeColumns()`, `removeAllColumnsExceptFor()`, `renameColumn()`
* One-hot encoding: `categoricalToOneHot()`, `integerToOneHot()`

In this short example, we append the string `two` to the end of values in the string column `first`. As an initial step, define the input and output Schema with string column:

```java
Schema inputSchema = new Schema.Builder()
    .addColumnString("first")
    .build();

Schema outputSchema = new Schema.Builder()
    .addColumnString("first")
    .build();

TransformProcess transformProcess = new TransformProcess.Builder(inputSchema).
    appendStringColumnTransform("first", "two").build();
```

## Configure the step

The `TransformProcess` can now be defined in the Konduit Serving configuration with a `TransformProcessStep`. Here, we

* **configure the inputs and outputs**: the schema, column names and data types should be defined here.
* **declare the `TransformProcess`** using the `.transformProcess()` method.

Note that `Schema` data types are not defined in the same way as `PythonStep` data types. See the [source](https://github.com/KonduitAI/konduit-serving/blob/78851701004ebb3dbf079889d46b79a9db8fac60/konduit-serving-api/src/main/java/ai/konduit/serving/util/SchemaTypeUtils.java#L154-L195) for a complete list of supported Schema data types:

* `NDArray`
* `String`
* `Boolean`
* `Categorical`
* `Float`
* `Double`
* `Integer`
* `Long`
* `Bytes`

You should define the Schema data types in `TransformProcessStep()` as strings.

```java
TransformProcessStep transformProcessStep = new TransformProcessStep(transformProcess, outputSchema);
```

## Configure the server

Configure the Server using `ServingConfig` to define the port using the `httpPort` argument.

```java
int port = Util.randInt(1000, 65535);

ServingConfig servingConfig = ServingConfig.builder()
    .httpPort(port)
    .build();

InferenceConfiguration inferenceConfiguration = InferenceConfiguration.builder()
    .step(transformProcessStep).servingConfig(servingConfig).build();
```

The `inferenceConfiguration` is stored as a JSON File.

```java
File configFile = new File("config.json");
FileUtils.write(configFile, inferenceConfiguration.toJson(), Charset.defaultCharset());
```

## Inference

The `Client` should be configured to match the Konduit Serving instance. As this example is run on a local computer, the server is located at host `'http://localhost'` and port `port`. And Finally, we run the Konduit Serving instance with the saved **config.json** file path as `configPath` and other necessary server configuration arguments.. Recall that the `TransformProcessStep()` appends a string `two` to strings in the column `first`.

A Callback Function onSuccess is implemented in order to post the Client request and get the HttpResponse, only after the successful run of the KonduitServingMain Server.

```java
KonduitServingMain.builder()
    .onSuccess(() -> {
        try {
            HttpResponse<JsonNode> response = Unirest.post(String.format("http://localhost:%s/raw/json", port))
                    .header("Content-Type", "application/json")
                    .body("{\"first\" :\"value\"}").asJson();

            System.out.println(response.getBody().toString());
            System.exit(0);
        } catch (UnirestException e) {
            e.printStackTrace();
            System.exit(0);
        }
    })
    .build()
    .runMain("--configPath", configFile.getAbsolutePath());
```

## Confirm the output

After executing the above, in order to confirm the successful start of the Server, check for the below output text:

```
Jan 08, 2020 1:36:01 PM ai.konduit.serving.configprovider.KonduitServingMain
INFO: Deployed verticle ai.konduit.serving.verticles.inference.InferenceVerticle
```

The Output of the program is as follows:

```java
System.out.println(response.getBody().toString());
```

```
{"first":"valuetwo"}
```

The complete inference configuration in JSON format is as follows:

```java
System.out.println(inferenceConfiguration.toJson());
```

```
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
{
  "memMapConfig" : null,
  "servingConfig" : {
    "httpPort" : 15614,
    "listenHost" : "localhost",
    "logTimings" : false,
    "metricTypes" : [ "CLASS_LOADER", "JVM_MEMORY", "JVM_GC", "PROCESSOR", "JVM_THREAD", "LOGGING_METRICS", "NATIVE" ],
    "outputDataFormat" : "JSON",
    "uploadsDirectory" : "file-uploads/"
  },
  "steps" : [ {
    "@type" : "TransformProcessStep",
    "inputColumnNames" : {
      "default" : [ "first" ]
    },
    "inputNames" : [ "default" ],
    "inputSchemas" : {
      "default" : [ "String" ]
    },
    "outputColumnNames" : {
      "default" : [ "first" ]
    },
    "outputNames" : [ "default" ],
    "outputSchemas" : {
      "default" : [ "String" ]
    },
    "transformProcesses" : {
      "default" : {
        "actionList" : [ {
          "transform" : {
            "@class" : "org.datavec.api.transform.transform.string.AppendStringColumnTransform",
            "columnName" : "first",
            "toAppend" : "two"
          }
        } ],
        "initialSchema" : {
          "@class" : "org.datavec.api.transform.schema.Schema",
          "columns" : [ {
            "@class" : "org.datavec.api.transform.metadata.StringMetaData",
            "name" : "first"
          } ]
        }
      }
    }
  } ]
}
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://serving.konduit.ai/0.1.0-snapshot/examples/java/datavec_java.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
