Grafana Plugin (grafana/v1-alpha)
The Grafana plugin is an optional plugin that scaffolds Grafana dashboards for the default metrics exported by projects that use controller-runtime.
When to use it?
- Use it when you want to view in Grafana the metrics that your controllers export and that Prometheus collects.
How to use it?
Prerequisites:
- Your project must use controller-runtime to expose the default controller metrics, and Prometheus must collect them.
- Access to Prometheus.
- Prometheus should have an endpoint exposed. (For prometheus-operator, this is similar to: http://prometheus-k8s.monitoring.svc:9090)
- The endpoint is ready to become, or already is, the data source of your Grafana. See Add a data source
- Access to Grafana. Make sure you have:
- Dashboard edit permission
- Prometheus Data source
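Before importing any dashboard, it can help to confirm that Prometheus is really collecting the controller metrics. The following is a minimal sketch, not part of the plugin, that runs an instant query through the Prometheus HTTP API (`/api/v1/query`) and counts the returned series. The `PROMETHEUS_URL` environment variable and the helper names are assumptions made for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"
)

// queryURL builds an instant-query URL for the Prometheus HTTP API.
func queryURL(base, promQL string) string {
	params := url.Values{"query": {promQL}}
	return strings.TrimRight(base, "/") + "/api/v1/query?" + params.Encode()
}

// seriesCount decodes an instant-query response and returns how many
// series it contains. Only the fields needed here are modelled.
func seriesCount(body []byte) (int, error) {
	var resp struct {
		Status string `json:"status"`
		Data   struct {
			Result []json.RawMessage `json:"result"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, fmt.Errorf("decode response: %w", err)
	}
	if resp.Status != "success" {
		return 0, fmt.Errorf("prometheus returned status %q", resp.Status)
	}
	return len(resp.Data.Result), nil
}

func main() {
	// PROMETHEUS_URL is a hypothetical variable, for example
	// http://prometheus-k8s.monitoring.svc:9090 for prometheus-operator.
	base := os.Getenv("PROMETHEUS_URL")
	if base == "" {
		fmt.Println("set PROMETHEUS_URL to run the check")
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	res, err := client.Get(queryURL(base, "controller_runtime_reconcile_total"))
	if err != nil {
		fmt.Println("cannot reach Prometheus:", err)
		return
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		fmt.Println("cannot read response:", err)
		return
	}
	n, err := seriesCount(body)
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Printf("Prometheus returned %d series for controller_runtime_reconcile_total\n", n)
}
```

A result of zero series usually means the controller's metrics endpoint is not being scraped yet, in which case the dashboards would stay empty.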
Basic Usage
The Grafana plugin is attached to the init subcommand and the edit subcommand:
# Initialize a new project with grafana plugin
kubebuilder init --plugins grafana.kubebuilder.io/v1-alpha
# Enable grafana plugin to an existing project
kubebuilder edit --plugins grafana.kubebuilder.io/v1-alpha
The plugin will create a new directory and scaffold the JSON files under it (i.e. grafana/controller-runtime-metrics.json).
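As a quick sanity check after scaffolding, the generated file can be parsed and its panels listed before importing it into Grafana. This is a sketch only: it assumes the file follows the usual Grafana dashboard JSON model, with a top-level `title` and a `panels` array, and the function name `inspectDashboard` is made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// inspectDashboard parses dashboard JSON and returns the dashboard title
// and the titles of its top-level panels. Panels nested inside rows are
// not visited in this sketch.
func inspectDashboard(data []byte) (string, []string, error) {
	var d struct {
		Title  string `json:"title"`
		Panels []struct {
			Title string `json:"title"`
		} `json:"panels"`
	}
	if err := json.Unmarshal(data, &d); err != nil {
		return "", nil, fmt.Errorf("not valid dashboard JSON: %w", err)
	}
	titles := make([]string, 0, len(d.Panels))
	for _, p := range d.Panels {
		titles = append(titles, p.Title)
	}
	return d.Title, titles, nil
}

func main() {
	// Path scaffolded by the plugin, relative to the project root.
	data, err := os.ReadFile("grafana/controller-runtime-metrics.json")
	if err != nil {
		fmt.Println("cannot read scaffolded dashboard:", err)
		return
	}
	title, panels, err := inspectDashboard(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("dashboard %q has %d top-level panels\n", title, len(panels))
	for _, p := range panels {
		fmt.Println(" -", p)
	}
}
```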
Showcase:
See an example of how to use the plugin in your project:
Now, let’s check how to use the Grafana dashboards:
- Copy the JSON file
- Visit <your-grafana-url>/dashboard/import to import a new dashboard.
- Paste the JSON content into Import via panel json, then press the Load button
- Select the data source for Prometheus metrics
- Once the JSON is imported in Grafana, the dashboard is ready.
Grafana Dashboard
Controller Runtime Reconciliation total & errors
- Metrics:
- controller_runtime_reconcile_total
- controller_runtime_reconcile_errors_total
- Query:
- sum(rate(controller_runtime_reconcile_total{job="$job"}[5m])) by (instance, pod)
- sum(rate(controller_runtime_reconcile_errors_total{job="$job"}[5m])) by (instance, pod)
- Description:
- Per-second rate of total reconciliation as measured over the last 5 minutes
- Per-second rate of reconciliation errors as measured over the last 5 minutes
- Sample:
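To make "per-second rate as measured over the last 5 minutes" concrete, here is a simplified sketch of what rate() computes for a counter: the total increase across the window divided by the elapsed seconds, with a drop in value treated as a counter reset. It deliberately ignores the extrapolation that Prometheus applies at the window boundaries, so real results can differ slightly. The `sample` type and the numbers are made up for illustration.

```go
package main

import "fmt"

// sample is one scrape of a counter: a timestamp in seconds and a value.
type sample struct {
	t float64
	v float64
}

// simpleRate returns the average per-second increase over the samples.
// A drop in value is treated as a counter reset (for example a restart),
// so the counter is assumed to have started again from zero.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			delta = samples[i].v // counter reset
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Hypothetical scrapes of controller_runtime_reconcile_total over
	// 5 minutes, with one restart after the third scrape.
	window := []sample{{0, 100}, {60, 130}, {120, 160}, {180, 10}, {240, 40}, {300, 70}}
	fmt.Printf("%.2f reconciliations/s\n", simpleRate(window))
}
```

In the panel query, sum(...) by (instance, pod) then adds these per-series rates together for each instance and pod.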
Controller CPU & Memory Usage
- Metrics:
- process_cpu_seconds_total
- process_resident_memory_bytes
- Query:
- rate(process_cpu_seconds_total{job="$job", namespace="$namespace", pod="$pod"}[5m]) * 100
- process_resident_memory_bytes{job="$job", namespace="$namespace", pod="$pod"}
- Description:
- Per-second rate of CPU usage as measured over the last 5 minutes
- Allocated Memory for the running controller
- Sample:
Seconds of P50/90/99 Items Stay in Work Queue
- Metrics:
- workqueue_queue_duration_seconds_bucket
- Query:
- histogram_quantile(0.50, sum(rate(workqueue_queue_duration_seconds_bucket{job="$job", namespace="$namespace"}[5m])) by (instance, name, le))
- Description:
- Seconds an item stays in workqueue before being requested.
- Sample:
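The P50/90/99 values in this panel are estimates derived from cumulative histogram buckets. The sketch below illustrates the idea behind histogram_quantile: find the bucket that contains the requested rank, then interpolate linearly inside it. It is a simplified illustration, not the Prometheus implementation, and the bucket values are invented.

```go
package main

import (
	"fmt"
	"math"
)

// bucket is one cumulative histogram bucket: count observations were
// less than or equal to le.
type bucket struct {
	le    float64
	count float64
}

// exampleBuckets is made-up data; the last bucket must be +Inf.
var exampleBuckets = []bucket{{0.1, 10}, {0.5, 60}, {1, 90}, {math.Inf(1), 100}}

// quantile estimates the q-quantile (0 <= q <= 1) from cumulative
// buckets sorted by upper bound.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 || q < 0 || q > 1 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	// Find the first bucket whose cumulative count reaches the rank.
	i := 0
	for i < len(buckets)-1 && buckets[i].count < rank {
		i++
	}
	if i == len(buckets)-1 {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		return buckets[len(buckets)-2].le
	}
	lower, below := 0.0, 0.0
	if i > 0 {
		lower, below = buckets[i-1].le, buckets[i-1].count
	}
	inBucket := buckets[i].count - below
	if inBucket == 0 {
		return buckets[i].le
	}
	// Linear interpolation between the bucket's lower and upper bound.
	return lower + (buckets[i].le-lower)*(rank-below)/inBucket
}

func main() {
	for _, q := range []float64{0.50, 0.90, 0.99} {
		fmt.Printf("P%.0f = %.2fs\n", q*100, quantile(q, exampleBuckets))
	}
}
```

Because of the interpolation, the accuracy of each percentile depends on how finely the bucket boundaries are spaced around it.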
Seconds of P50/90/99 Items Processed in Work Queue
- Metrics:
- workqueue_work_duration_seconds_bucket
- Query:
- histogram_quantile(0.50, sum(rate(workqueue_work_duration_seconds_bucket{job="$job", namespace="$namespace"}[5m])) by (instance, name, le))
- Description:
- Seconds that processing an item from the workqueue takes.
- Sample:
Add Rate in Work Queue
- Metrics:
- workqueue_adds_total
- Query:
- sum(rate(workqueue_adds_total{job="$job", namespace="$namespace"}[5m])) by (instance, name)
- Description:
- Per-second rate of items added to work queue
- Sample:
Retries Rate in Work Queue
- Metrics:
- workqueue_retries_total
- Query:
- sum(rate(workqueue_retries_total{job="$job", namespace="$namespace"}[5m])) by (instance, name)
- Description:
- Per-second rate of retries handled by workqueue
- Sample:
Number of Workers in Use
- Metrics:
- controller_runtime_active_workers
- Query:
- controller_runtime_active_workers{job="$job", namespace="$namespace"}
- Description:
- The number of active controller workers
- Sample:
WorkQueue Depth
- Metrics:
- workqueue_depth
- Query:
- workqueue_depth{job="$job", namespace="$namespace"}
- Description:
- Current depth of workqueue
- Sample:
Unfinished Seconds
- Metrics:
- workqueue_unfinished_work_seconds
- Query:
- rate(workqueue_unfinished_work_seconds{job="$job", namespace="$namespace"}[5m])
- Description:
- Seconds of work that is in progress and has not yet been observed by work_duration.
- Sample:
Visualize Custom Metrics
The Grafana plugin supports scaffolding manifests for custom metrics.
Generate Config Template
When the plugin is triggered for the first time, grafana/custom-metrics/config.yaml is generated.
---
customMetrics:
# - metric: # Raw custom metric (required)
# type: # Metric type: counter/gauge/histogram (required)
# expr: # Prom_ql for the metric (optional)
# unit: # Unit of measurement, examples: s,none,bytes,percent,etc. (optional)
Add Custom Metrics to Config
You can enter multiple custom metrics in the file. For each element, you need to specify the metric and its type.
The Grafana plugin can automatically generate expr for visualization.
Alternatively, you can provide expr and the plugin will use the specified one directly.
---
customMetrics:
  - metric: memcached_operator_reconcile_total # Raw custom metric (required)
    type: counter # Metric type: counter/gauge/histogram (required)
    unit: none
  - metric: memcached_operator_reconcile_time_seconds_bucket
    type: histogram
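To give a feel for what an automatically generated expr could look like, the sketch below derives a PromQL expression from a metric name and type. The mapping is hypothetical: it is modelled on the query patterns of the default dashboard described earlier and is not taken from the plugin's implementation, so the expressions the plugin actually generates may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultExpr returns a PromQL expression for a metric of the given type.
// HYPOTHETICAL mapping for illustration only; the plugin's real output
// may differ.
func defaultExpr(metric, metricType string) (string, error) {
	selector := metric + `{job="$job", namespace="$namespace"}`
	switch strings.ToLower(metricType) {
	case "counter":
		// Counters only grow, so chart their per-second rate.
		return "sum(rate(" + selector + "[5m])) by (instance, pod)", nil
	case "histogram":
		// Histograms are summarised as a percentile over their buckets.
		return "histogram_quantile(0.90, sum(rate(" + selector + "[5m])) by (instance, le))", nil
	case "gauge":
		// Gauges can be charted as they are.
		return selector, nil
	default:
		return "", fmt.Errorf("unsupported metric type %q", metricType)
	}
}

func main() {
	metrics := []struct{ name, typ string }{
		{"memcached_operator_reconcile_total", "counter"},
		{"memcached_operator_reconcile_time_seconds_bucket", "histogram"},
	}
	for _, m := range metrics {
		expr, err := defaultExpr(m.name, m.typ)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s\n", m.name, expr)
	}
}
```

If an expression like these does not suit a metric, setting expr in config.yaml bypasses any generated default.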
Scaffold Manifest
Once config.yaml is configured, you can run kubebuilder edit --plugins grafana.kubebuilder.io/v1-alpha again.
This time, the plugin will generate grafana/custom-metrics/custom-metrics-dashboard.json, which can be imported into the Grafana UI.
Showcase:
See an example of how to visualize your custom metrics:
Subcommands
The Grafana plugin implements the following subcommands:
- edit ($ kubebuilder edit [OPTIONS])
- init ($ kubebuilder init [OPTIONS])
Affected files
The following scaffolds will be created or updated by this plugin:
grafana/*.json
Further resources
- Check out the video showing how the plugin works
- Check out the video showing how the custom metrics feature works
- Refer to a sample of servicemonitor provided by the kustomize plugin
- Check the plugin implementation
- Grafana docs on importing a JSON file
- The usage of servicemonitor by Prometheus Operator