Konduit-Serving is a framework focused on deploying machine learning pipelines to production.
Konduit-Serving provides building blocks for developers to write their own production machine learning pipelines, from pre-processing to model serving, exposable as a simple REST API. It also allows embedding custom Python or Java code for pre/post-processing and custom models. It primarily targets server and edge deployments using REST and gRPC endpoints. Pipelines are defined and deployed using JSON/YAML or a command-line interface.
The core abstraction is an idea called a pipeline step. As part of using a machine learning model in a deployment scenario, a pipeline step performs a task such as:
1. Pre-processing: transforming raw input into the format the model expects
2. Running one or more machine learning models
3. Post-processing: transforming the output into a form that can be understood by humans, such as labels in a classification example
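To make the idea concrete, here is a toy sketch of a pipeline as an ordered chain of steps. This is plain Python for illustration only, not the Konduit-Serving API; the step functions and the threshold are invented for the example:

```python
# Toy illustration (not the Konduit-Serving API): a pipeline is an
# ordered sequence of steps, each a function from input to output.

def preprocess(image):
    # pre-processing: scale raw pixel values into [0, 1]
    return [p / 255.0 for p in image]

def model(features):
    # stand-in "model": score is just the sum of the features
    return sum(features)

def postprocess(score):
    # post-processing: map the raw score to a human-readable label
    return "positive" if score > 1.0 else "negative"

def run_pipeline(steps, data):
    # each step's output becomes the next step's input
    for step in steps:
        data = step(data)
    return data

label = run_pipeline([preprocess, model, postprocess], [128, 255, 64])
print(label)  # -> positive
```

The point is only the shape of the abstraction: serving a model means composing pre-processing, inference and post-processing into one callable unit.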
TensorflowStep, KerasStep and Dl4jStep perform inference on TensorFlow, Keras and Deeplearning4j (DL4J) models, respectively. Similarly, Konduit-Serving defines multiple steps for setting up pre- and post-processing in a machine learning pipeline.
A custom pipeline step can be built using a PythonStep. This allows you to embed pre/post-processing steps into your machine learning pipeline, or to serve models built in frameworks that do not have built-in model steps, such as scikit-learn and PyTorch.
Konduit-Serving builds on several popular, performant libraries; the major ones are Vert.x and Deeplearning4j.
Vert.x is an open-source, reactive, polyglot software development toolkit with strong support for Java. Konduit-Serving uses Vert.x to build and manage the CLI and web servers, and to implement data transmission backends over multiple protocols such as gRPC, MQTT and HTTP. It also implements a Kafka client setup and can link with a Kafka backend server.
Vert.x also allows Konduit-Serving deployments to scale efficiently across multiple nodes. Although this capability has not yet been tested thoroughly, Konduit-Serving supports it. The following stack shows how the different components work together.
Konduit-Serving runs DL4J through an interface-agnostic architecture: Konduit-Serving is not aware of which backend it is running. When compiling Konduit-Serving, you can use the command line to build a specific spin of Konduit-Serving optimized for a particular use case.
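The interface-agnostic idea can be sketched as follows. These are hypothetical Python classes for illustration, not Konduit-Serving's actual types: serving code depends only on an abstract backend, and the concrete backend is chosen when a particular spin of the artifact is assembled:

```python
# Toy sketch of an interface-agnostic design (hypothetical classes,
# not Konduit-Serving's actual types). Serving code depends only on
# an abstract backend; the concrete backend is picked at build time.
from abc import ABC, abstractmethod

class InferenceBackend(ABC):
    @abstractmethod
    def run(self, inputs):
        """Run inference on a batch of inputs."""

class CpuBackend(InferenceBackend):
    def run(self, inputs):
        return [x * 2 for x in inputs]

class AcceleratorBackend(InferenceBackend):
    # same contract, different hardware (e.g. GPU or Aurora)
    def run(self, inputs):
        return [x * 2 for x in inputs]

def serve(backend, inputs):
    # serving code never inspects which backend it was handed
    return backend.run(inputs)

print(serve(CpuBackend(), [1, 2, 3]))          # -> [2, 4, 6]
print(serve(AcceleratorBackend(), [1, 2, 3]))  # -> [2, 4, 6]
```

Because callers only see the abstract contract, the same pipeline can be tested against one backend and deployed against another without code changes.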
We have also added support for running DL4J on Aurora from Java, i.e. running Aurora math workloads from Java via compiled libnd4j C++ built with NCC. We have managed to get DL4J compiled on Aurora and running via JavaCPP. An up-to-date link can be found here.
A build of Konduit-Serving that runs applications on Aurora is pending, but possible based on the work done so far.
In production, a particular spin of Konduit-Serving can be built for a given operating system and hardware, allowing testing on one platform but deployment on another.
The principal way a user would use Aurora with Konduit-Serving and DL4J is the SameDiff framework. Aurora is accessible transparently simply by adding a jar file containing the Aurora backend; GPUs are accessible in the same way.
The SameDiff pipeline step is the component that makes Aurora accessible to Konduit-Serving.
SameDiffStep configuration looks roughly like this from Java (the model path and output name below are placeholders; exact method names may differ slightly between versions):

    Pipeline p = SequencePipeline.builder()
            .add(new SameDiffStep().modelUri("file:///path/to/model.fb").outputNames(Collections.singletonList("output")))
            .build();
The configuration above represents a standalone model, specified as a file. A SameDiff model is saved as a FlatBuffers file: a descriptor containing the graph layout and the associated weights for each ndarray embedded in the SameDiff graph.
To feed input variables into a graph, graph-based frameworks (including TensorFlow and PyTorch) rely on a concept called placeholders. A placeholder is a stub element in the graph that is replaced by an actual input variable at execution time. A SameDiffStep allows you to specify the input and output placeholders, making different outputs of the graph accessible by name. All pipeline steps that use a graph-based framework require these input and output names.
Sometimes these inputs and outputs can be inferred automatically, but it is generally a good idea to specify them explicitly. Specifying fewer outputs lets a user request only the outputs they need rather than everything the graph produces.
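The placeholder mechanism can be sketched as follows. This is a toy Python model of the idea, not SameDiff's actual API; the graph, placeholder name and output names are invented for the example:

```python
# Toy model of graph placeholders (not SameDiff's actual API).
# Inputs are bound to named placeholders, and the caller requests
# outputs by name, receiving back only what was asked for.
def run_graph(placeholders, output_names):
    x = placeholders["x"]  # placeholder "x" replaced by the real input
    all_outputs = {
        "sum": sum(x),
        "max": max(x),
    }
    # return only the requested outputs, accessible by name
    return {name: all_outputs[name] for name in output_names}

print(run_graph({"x": [3, 1, 2]}, ["sum"]))  # -> {'sum': 6}
```

Requesting only `"sum"` means the caller gets exactly one named output, mirroring how specifying fewer output names on a graph step narrows what the step returns.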
The following main model pipeline step execution frameworks are supported by Konduit-Serving:
1. TensorFlow
2. Keras
3. Deeplearning4j (DL4J)
4. ONNX, plus other frameworks whose models can be converted to ONNX
5. PMML (experimental): the Predictive Model Markup Language, an interchange format for most traditional ML models such as random forests, plus other frameworks that support PMML conversion
6. Custom models, through custom Python or Java code