Monitoring with Grafana

Prometheus and Grafana can be used for displaying metrics to assist with troubleshooting production systems.

Concepts

Konduit Serving metrics endpoint

For monitoring, the REST API of a Konduit Serving instance exposes a /metrics endpoint that returns metrics in the Prometheus format.

By default, metrics returned by the metrics endpoint include:

  • average CPU load;

  • memory use;

  • I/O wait time;

  • GPU metrics for each device: device-to-device bandwidth, device-to-host bandwidth, current load and currently available memory; and

  • CPU metrics: current load and currently available memory.

The metrics above are implemented by the NativeMetrics class. The metrics endpoint also returns Micrometer JVM and system metrics via the ClassLoaderMetrics, JvmMemoryMetrics, JvmGcMetrics, ProcessorMetrics and JvmThreadMetrics binders. See the Micrometer documentation for descriptions of these classes. Error, warning, info, debug and trace counts are monitored using Micrometer's LogbackMetrics binder.
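
The endpoint can be queried directly before wiring up Prometheus. The following is a minimal sketch, assuming the Konduit Serving instance from this guide is listening on localhost:1337; it fetches /metrics with Python's standard library and prints a few metric families (exact metric names may vary across Konduit Serving and Micrometer versions).

# Sketch: inspect the Prometheus-format metrics exposed by a Konduit Serving
# instance. Assumes the instance from this guide runs on localhost:1337.
from urllib.request import urlopen

with urlopen("http://localhost:1337/metrics") as response:
    body = response.read().decode("utf-8")

# Print a few metric families of interest; exact names depend on the
# Konduit Serving and Micrometer versions in use.
prefixes = ("system_cpu", "process_cpu", "jvm_memory", "logback_events")
for line in body.splitlines():
    if line.startswith(prefixes):
        print(line)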

Prometheus

Prometheus is a widely used time series database for tracking system metrics when debugging production systems. These include common metrics used to troubleshoot problems with production applications, such as:

  • Out of memory

  • Latency

For machine learning, we may include other metrics to help with debugging, such as:

  • Compute time for a neural network

  • ETL creation time (how long it takes to convert raw data to a minibatch or NumPy ndarray)

Prometheus works by pulling data from the specified sources. A Prometheus instance is configured with a YAML file such as the following:

# Global configurations
global:
  scrape_interval:     5s # Set the scrape interval to every 5 seconds.
  evaluation_interval: 5s # Evaluate rules every 5 seconds.
scrape_configs:
  - job_name: 'scrape'
    static_configs:
    - targets: [ 'localhost:1337']

This YAML file contains a global configuration and a scrape_configs section; see Prometheus's configuration documentation for details. The main component to configure is targets, which specifies the sources to pull data from. A Konduit Serving instance exposes metrics for Prometheus to scrape at http://<hostname>:<port>/metrics.
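
As a quick sanity check, you can parse the scrape configuration and confirm that every target actually serves a /metrics endpoint. This is a sketch rather than official tooling; it assumes PyYAML is installed and that your scrape configuration is saved as prometheus.yml (adjust the filename to your setup).

# Sketch: read the scrape targets from a Prometheus configuration file and
# check that each one serves /metrics. Assumes PyYAML is installed and the
# configuration is saved as prometheus.yml.
from urllib.request import urlopen

import yaml

with open("prometheus.yml") as f:
    config = yaml.safe_load(f)

for job in config.get("scrape_configs", []):
    for static in job.get("static_configs", []):
        for target in static.get("targets", []):
            url = f"http://{target}/metrics"
            try:
                with urlopen(url, timeout=5) as response:
                    print(f"{url}: HTTP {response.status}")
            except OSError as exc:
                print(f"{url}: unreachable ({exc})")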

Grafana

Grafana is a dashboard system for pulling data from different sources and displaying it in real time; it can be used to visualize output from Prometheus. Grafana allows you to declare a dashboard as a JSON file. An imported Grafana dashboard will show some pre-configured metrics, and you can always extend or add more metrics in the Grafana GUI and re-export the configuration.
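
Because a dashboard is plain JSON, it can also be imported programmatically instead of through the web UI. The following is an illustrative sketch using Grafana's HTTP API; it assumes Grafana is running on localhost:3000 with the default admin/admin credentials and that the dashboard definition (such as the dashboard.json used later in this guide) sits in the current directory.

# Sketch: import a dashboard JSON file through Grafana's HTTP API rather than
# the web UI. Assumes Grafana on localhost:3000 with default admin/admin
# credentials and a dashboard.json file in the current directory.
import base64
import json
from urllib.request import Request, urlopen

with open("dashboard.json") as f:
    dashboard = json.load(f)

# Depending on how the dashboard was exported, you may need to remove its "id"
# field or unwrap a top-level "dashboard" key before posting.
payload = json.dumps({"dashboard": dashboard, "overwrite": True}).encode("utf-8")
auth = base64.b64encode(b"admin:admin").decode("ascii")

request = Request(
    "http://localhost:3000/api/dashboards/db",
    data=payload,
    headers={"Content-Type": "application/json", "Authorization": f"Basic {auth}"},
)
with urlopen(request) as response:
    print(json.loads(response.read()))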

Installation

  • Konduit Serving: Follow the installation steps to build a Konduit Serving JAR file and install the konduit Python module.

  • Prometheus: Download a precompiled Prometheus binary for your OS architecture and unzip it to a location on your local drive.

  • Grafana: Install Grafana from Grafana's Downloads page. See the Grafana installation documentation for platform-specific instructions.

Usage

The following instructions assume that you're in the monitoring/quickstart directory of the KonduitAI/konduit-serving-examples repository.

Start Konduit server

In this folder, run the following in a command line:

konduit serve --config ../../yaml/simple.yaml

This creates a local Konduit Serving instance on port 1337 using the YAML configuration file simple.yaml.
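
The instance may take a few seconds to come up. If you want to script this walkthrough, a small sketch (assuming the instance listens on port 1337, as in this example) can poll the /metrics endpoint until it responds:

# Sketch: wait for the Konduit Serving instance started above to become ready
# by polling its /metrics endpoint. Assumes port 1337, as used in this example.
import time
from urllib.request import urlopen

for _ in range(30):
    try:
        with urlopen("http://localhost:1337/metrics", timeout=2) as response:
            print(f"Server is up (HTTP {response.status})")
            break
    except OSError:
        time.sleep(2)
else:
    print("Server did not respond within 60 seconds")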

Start Prometheus server

In this example, we use Prometheus to monitor the Konduit Serving instance.

Copy the prometheus.yml file in this directory to the location of your Prometheus binary. Then, run:

./prometheus --config.file=prometheus.yml

Omit the ./ if you're running Prometheus from cmd.exe; the ./ prefix is required in PowerShell.

By default, Prometheus runs on port 9090.
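
To confirm that Prometheus is scraping the Konduit Serving target, you can query its HTTP API for the built-in up metric. The sketch below assumes Prometheus is running on its default port 9090; a value of "1" means the target was scraped successfully.

# Sketch: check scrape status via the Prometheus HTTP API. Assumes Prometheus
# on its default port 9090.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

query = urlencode({"query": "up"})
with urlopen(f"http://localhost:9090/api/v1/query?{query}") as response:
    result = json.loads(response.read())

# Each entry describes one scrape target; the second element of "value" is
# "1" if the target is up and "0" otherwise.
for series in result["data"]["result"]:
    print(series["metric"].get("instance"), series["value"][1])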

Start Grafana server

In this example, we use Grafana, which provides a dashboard to visualize data from the Prometheus instance.

See the Grafana installation instructions for your platform (Windows, macOS, Ubuntu / Debian, CentOS / Red Hat) for instructions on starting a Grafana service or, optionally, having Grafana initialize on startup. If you use the Windows installer to install Grafana, NSSM will run Grafana automatically at startup, and there is no need to initialize the Grafana server instance.

In your browser, open localhost:3000. Log in with the username admin and password admin.

Next, add a Prometheus data source. Click on Add Data Source > Prometheus, then insert the HTTP URL http://localhost:9090 on the following page.
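
If you prefer to script this step, the data source can also be registered through Grafana's HTTP API. This is a sketch, assuming Grafana on localhost:3000 with the default admin/admin credentials and Prometheus on localhost:9090.

# Sketch: register the Prometheus data source through Grafana's HTTP API
# instead of the web UI. Assumes default admin/admin credentials.
import base64
import json
from urllib.request import Request, urlopen

payload = json.dumps({
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://localhost:9090",
    "access": "proxy",
}).encode("utf-8")
auth = base64.b64encode(b"admin:admin").decode("ascii")

request = Request(
    "http://localhost:3000/api/datasources",
    data=payload,
    headers={"Content-Type": "application/json", "Authorization": f"Basic {auth}"},
)
with urlopen(request) as response:
    print(json.loads(response.read()))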

On the bar on the left, mouse over the + button, then click Import. Copy and paste the JSON in dashboard.json into the import page, then click the Load button.

On the next page, enter a name for your dashboard (such as Pipeline Metrics), then click the Import button.

Your Grafana dashboard will render on the next page. This dashboard contains metrics for system load and memory as well as timings for performing inference and ETL.

Obtaining a prediction

Use the predict-numpy command:

konduit predict-numpy --config ../../yaml/simple.yaml --numpy_data ../../data/simple/input_arr.npy
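
The command above reads the NumPy array stored at ../../data/simple/input_arr.npy. If you want to test with your own data, an .npy file can be produced as in the sketch below; the array's shape and dtype must match whatever the pipeline defined in simple.yaml expects, and the values shown are placeholders.

# Sketch: produce a NumPy input file for predict-numpy. The shape and dtype
# must match the pipeline defined in simple.yaml; the values below are
# placeholders, not the example's real input.
import numpy as np

arr = np.array([[1.0, 2.0]], dtype=np.float32)  # placeholder data
np.save("input_arr.npy", arr)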

Stop server

Remember to stop the Konduit Serving instance with

konduit stop-server --config ../../yaml/simple.yaml

References

Grafana support for Prometheus: https://prometheus.io/docs/visualization/grafana/