Monitoring MQTT broker with Prometheus and Grafana

2020-06-09

IoT MQTT broker server - EMQ X provides plugin emqx_statsd for exporting the operating metrics of EMQ X, and status data of Erlang VM to the third-party monitoring system such as Prometheus. It can collect the metrics related to the Linux server through Prometheus own node-exporter to implement server + EMQ X overall operation and maintenance monitoring.

This article provides the building process based on Prometheus and Grafana for EMQ X MQTT broker operation and maintenance monitoring solution.

Installation and preparation

Docker pull

docker pull prom/node-exporter
docker pull prom/prometheus
docker pull prom/pushgateway

Running node-exporter

It is optional, used to collect server indicators such as CPU, memory, network, etc. If users use Docker to install, need to map the status files responded by the target server.

docker run -d -p 9100:9100 \
  -v "/proc:/host/proc:ro" \
  -v "/sys:/host/sys:ro" \
  -v "/:/rootfs:ro" \
  --net="host" \
  prom/node-exporter

Running pushgateway

pushgateway is used to receive metrics pushing from EMQ X, needs to ensure EMQ X can access.

docker run -d -p 9091:9091 prom/pushgateway

Running Prometheus

Specify the configuration file and listening port to run Prometheus:

docker run -p 9090:9090 \
    -v $PWD/prometheus.yaml:/etc/prometheus/prometheus.yaml \
    -d prom/prometheus \
    --config.file=/etc/prometheus/prometheus.yaml

The example of prometheus configuration file prometheus.yaml is as follows. Please refer Prometheus documentation for detailed meanings:

# prometheus.yaml
global:
  scrape_interval:     10s # the default grasping time
  evaluation_interval: 10s # evaluate rules once every 10 seconds

# The following data will be generated by default in every time series of local, mainly used for union query, remote storage and Alertmanager 
  external_labels:
      monitor: 'emqx-monitor'

# Load rules, regularly evaluate rule according to evaluation_interval
rule_files:
  # - "first.rules"
  # - "second.rules"
  - "/etc/prometheus/rules/*.rules"

# Configuration of pull data
scrape_configs:
  # Means that every time series in this configuration will automatically add this label {job_name:"prometheus"}
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['127.0.0.1:9090']

    # Server physical machine monitoring
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      # node-exporter fill in according to the actual situation
      - targets: ['192.168.6.11:9100']
        labels:
          instance: wivwiv-local


  # EMQ X Pushgateway monitoring
  - job_name: 'pushgateway'
    scrape_interval: 5s
    honor_labels: true
    static_configs:
      # pushgateway fill in according to the actual situation
      - targets: ['192.168.6.11:9091']

Enable plugin EMQ X statsd

Open etc/emqx_statsd.conf to confirm the following configuration:

## pushgateway address
statsd.push.gateway.server = http://127.0.0.1:9091
## the cycle of data collection/push, measured in millisecond
statsd.interval = 15000

Enable plugin:

./bin/emqx_ctl load plugins emqx_statsd

Check the effect

You can view whether the component runs successfully through this command docker ps -a. After waiting for multiple push cycles, you can open http://localhost:9090 Prometheus control panel to view collected data.

Prometheus only provides simple chart data displays. If you need more delicate visual displays, please use it with Grafana.

image20191205152623657.png

Integrate Grafana

Grafana is an open source and universal measurement analysis and visual displays tool. It can display custom reports, charts, etc through data sources(such as various kinds of databases and open source components).

Running Grafana

Pull and Running Grafana mirroring through Docker:

docker run -d --name=grafana -p 3000:3000 grafana/grafana

After running successfully , you can open the Dashboard console through visiting http://127.0.0.1:3000.

Configure Prometheus data source

Adding data source in Grafana, then select Prometheus and fill in the correct address to finish adding data.

image20200507160650116.png

Import Grafana template data

The plugin emqx_statsd provides the template files of Grafana and Dashboard. These templates include the display of most of EMQ X monitoring data. Users can directly import to Grafana for displaying the icon of EMQ X monitoring status.

The template files located in emqx_statsd/grafana_template. Part of chart data may be displayed incorrectly, because of the difference of EMQ X version. Please adjust it manually.

Click button Upload.json file, then select the corresponding folder and data source after importing.

image20200507161318909.png

Display the effect

After finishing building and running a full set of systems for a while, the data collected by Prometheus will be displayed in Grafana. The default display effect of the template is as follows:

  • EMQ Dashboard: includes the historical statistics of connection, message, topic and throughput.
  • EMQ: includes the historical statistics for the number of client, subscriptions, topics, messages, packet, and other business information.
  • ErlangVM: the number of every EMQ X node Erlang virtual machine process/thread, the historical statistics of ETS/Mnesia database usage.

If you have other requirements, can refer to the attachment 「all metrics of EMQX-STATSD」and use Grafana to display diagram data.

image20200507162019439.png

image20200507161948328.png

image20200507161914310.png

Alarm management

Prometheus and Grafana support the metrics alarm function. After configuring alarm rules, the server will constantly evaluate the set rule and current indicator data. The server will send a notification when rule conditions are met.

The length of the article is limited, please check subsequent articles for alarm configuration and practice.

Attached are all metrics of EMQX-STATSD

EMQ X MQTT broker push metrics data through Prometheus push gateway, supports the following indicator items.

# TYPE erlang_vm_ets_limit gauge
erlang_vm_ets_limit 256000
# TYPE erlang_vm_logical_processors gauge
erlang_vm_logical_processors 4
# TYPE erlang_vm_logical_processors_available gauge
erlang_vm_logical_processors_available NaN
# TYPE erlang_vm_logical_processors_online gauge
erlang_vm_logical_processors_online 4
# TYPE erlang_vm_port_count gauge
erlang_vm_port_count 16
# TYPE erlang_vm_port_limit gauge
erlang_vm_port_limit 1048576
# TYPE erlang_vm_process_count gauge
erlang_vm_process_count 320
# TYPE erlang_vm_process_limit gauge
erlang_vm_process_limit 2097152
# TYPE erlang_vm_schedulers gauge
erlang_vm_schedulers 4
# TYPE erlang_vm_schedulers_online gauge
erlang_vm_schedulers_online 4
# TYPE erlang_vm_smp_support untyped
erlang_vm_smp_support 1
# TYPE erlang_vm_threads untyped
erlang_vm_threads 1
# TYPE erlang_vm_thread_pool_size gauge
erlang_vm_thread_pool_size 4
# TYPE erlang_vm_time_correction untyped
erlang_vm_time_correction 1
# TYPE erlang_vm_statistics_context_switches counter
erlang_vm_statistics_context_switches 20767
# TYPE erlang_vm_statistics_garbage_collection_number_of_gcs counter
erlang_vm_statistics_garbage_collection_number_of_gcs 3924
# TYPE erlang_vm_statistics_garbage_collection_words_reclaimed counter
erlang_vm_statistics_garbage_collection_words_reclaimed 6751048
# TYPE erlang_vm_statistics_garbage_collection_bytes_reclaimed counter
erlang_vm_statistics_garbage_collection_bytes_reclaimed 54008384
# TYPE erlang_vm_statistics_bytes_received_total counter
erlang_vm_statistics_bytes_received_total 23332
# TYPE erlang_vm_statistics_bytes_output_total counter
erlang_vm_statistics_bytes_output_total 21266
# TYPE erlang_vm_statistics_reductions_total counter
erlang_vm_statistics_reductions_total 18413181
# TYPE erlang_vm_statistics_run_queues_length_total gauge
erlang_vm_statistics_run_queues_length_total 0
# TYPE erlang_vm_statistics_runtime_milliseconds counter
erlang_vm_statistics_runtime_milliseconds 1782
# TYPE erlang_vm_statistics_wallclock_time_milliseconds counter
erlang_vm_statistics_wallclock_time_milliseconds 68277
# TYPE erlang_vm_memory_atom_bytes_total gauge
erlang_vm_memory_atom_bytes_total{usage="used"} 1507142
erlang_vm_memory_atom_bytes_total{usage="free"} 18787
# TYPE erlang_vm_memory_bytes_total gauge
erlang_vm_memory_bytes_total{kind="system"} 63949544
erlang_vm_memory_bytes_total{kind="processes"} 45457848
# TYPE erlang_vm_dets_tables gauge
erlang_vm_dets_tables 0
# TYPE erlang_vm_ets_tables gauge
erlang_vm_ets_tables 115
# TYPE erlang_vm_memory_processes_bytes_total gauge
erlang_vm_memory_processes_bytes_total{usage="used"} 45457696
erlang_vm_memory_processes_bytes_total{usage="free"} 152
# TYPE erlang_vm_memory_system_bytes_total gauge
erlang_vm_memory_system_bytes_total{usage="atom"} 1525929
erlang_vm_memory_system_bytes_total{usage="binary"} 104504
erlang_vm_memory_system_bytes_total{usage="code"} 26779999
erlang_vm_memory_system_bytes_total{usage="ets"} 7685312
erlang_vm_memory_system_bytes_total{usage="other"} 27853800
# TYPE erlang_mnesia_held_locks gauge
erlang_mnesia_held_locks 0
# TYPE erlang_mnesia_lock_queue gauge
erlang_mnesia_lock_queue 0
# TYPE erlang_mnesia_transaction_participants gauge
erlang_mnesia_transaction_participants 0
# TYPE erlang_mnesia_transaction_coordinators gauge
erlang_mnesia_transaction_coordinators 0
# TYPE erlang_mnesia_failed_transactions counter
erlang_mnesia_failed_transactions 21
# TYPE erlang_mnesia_committed_transactions counter
erlang_mnesia_committed_transactions 128
# TYPE erlang_mnesia_logged_transactions counter
erlang_mnesia_logged_transactions 3
# TYPE erlang_mnesia_restarted_transactions counter
erlang_mnesia_restarted_transactions 0
# TYPE emqx_connections_count gauge
emqx_connections_count 0
# TYPE emqx_connections_max gauge
emqx_connections_max 0
# TYPE emqx_sessions_count gauge
emqx_sessions_count 0
# TYPE emqx_sessions_max gauge
emqx_sessions_max 0
# TYPE emqx_topics_count gauge
emqx_topics_count 0
# TYPE emqx_topics_max gauge
emqx_topics_max 0
# TYPE emqx_suboptions_count gauge
emqx_suboptions_count 0
# TYPE emqx_suboptions_max gauge
emqx_suboptions_max 0
# TYPE emqx_subscribers_count gauge
emqx_subscribers_count 0
# TYPE emqx_subscribers_max gauge
emqx_subscribers_max 0
# TYPE emqx_subscriptions_count gauge
emqx_subscriptions_count 0
# TYPE emqx_subscriptions_max gauge
emqx_subscriptions_max 0
# TYPE emqx_subscriptions_shared_count gauge
emqx_subscriptions_shared_count 0
# TYPE emqx_subscriptions_shared_max gauge
emqx_subscriptions_shared_max 0
# TYPE emqx_routes_count gauge
emqx_routes_count 0
# TYPE emqx_routes_max gauge
emqx_routes_max 0
# TYPE emqx_retained_count gauge
emqx_retained_count 3
# TYPE emqx_retained_max gauge
emqx_retained_max 3
# TYPE emqx_vm_cpu_use gauge
emqx_vm_cpu_use 12.029950083194677
# TYPE emqx_vm_cpu_idle gauge
emqx_vm_cpu_idle 87.97004991680532
# TYPE emqx_vm_run_queue gauge
emqx_vm_run_queue 1
# TYPE emqx_vm_process_messages_in_queues gauge
emqx_vm_process_messages_in_queues 0
# TYPE emqx_bytes_received counter
emqx_bytes_received 0
# TYPE emqx_bytes_sent counter
emqx_bytes_sent 0
# TYPE emqx_packets_received counter
emqx_packets_received 0
# TYPE emqx_packets_sent counter
emqx_packets_sent 0
# TYPE emqx_packets_connect counter
emqx_packets_connect 0
# TYPE emqx_packets_connack_sent counter
emqx_packets_connack_sent 0
# TYPE emqx_packets_connack_error counter
emqx_packets_connack_error 0
# TYPE emqx_packets_connack_auth_error counter
emqx_packets_connack_auth_error 0
# TYPE emqx_packets_publish_received counter
emqx_packets_publish_received 0
# TYPE emqx_packets_publish_sent counter
emqx_packets_publish_sent 0
# TYPE emqx_packets_publish_inuse counter
emqx_packets_publish_inuse 0
# TYPE emqx_packets_publish_error counter
emqx_packets_publish_error 0
# TYPE emqx_packets_publish_auth_error counter
emqx_packets_publish_auth_error 0
# TYPE emqx_packets_publish_dropped counter
emqx_packets_publish_dropped 0
# TYPE emqx_packets_puback_received counter
emqx_packets_puback_received 0
# TYPE emqx_packets_puback_sent counter
emqx_packets_puback_sent 0
# TYPE emqx_packets_puback_inuse counter
emqx_packets_puback_inuse 0
# TYPE emqx_packets_puback_missed counter
emqx_packets_puback_missed 0
# TYPE emqx_packets_pubrec_received counter
emqx_packets_pubrec_received 0
# TYPE emqx_packets_pubrec_sent counter
emqx_packets_pubrec_sent 0
# TYPE emqx_packets_pubrec_inuse counter
emqx_packets_pubrec_inuse 0
# TYPE emqx_packets_pubrec_missed counter
emqx_packets_pubrec_missed 0
# TYPE emqx_packets_pubrel_received counter
emqx_packets_pubrel_received 0
# TYPE emqx_packets_pubrel_sent counter
emqx_packets_pubrel_sent 0
# TYPE emqx_packets_pubrel_missed counter
emqx_packets_pubrel_missed 0
# TYPE emqx_packets_pubcomp_received counter
emqx_packets_pubcomp_received 0
# TYPE emqx_packets_pubcomp_sent counter
emqx_packets_pubcomp_sent 0
# TYPE emqx_packets_pubcomp_inuse counter
emqx_packets_pubcomp_inuse 0
# TYPE emqx_packets_pubcomp_missed counter
emqx_packets_pubcomp_missed 0
# TYPE emqx_packets_subscribe_received counter
emqx_packets_subscribe_received 0
# TYPE emqx_packets_subscribe_error counter
emqx_packets_subscribe_error 0
# TYPE emqx_packets_subscribe_auth_error counter
emqx_packets_subscribe_auth_error 0
# TYPE emqx_packets_suback_sent counter
emqx_packets_suback_sent 0
# TYPE emqx_packets_unsubscribe_received counter
emqx_packets_unsubscribe_received 0
# TYPE emqx_packets_unsubscribe_error counter
emqx_packets_unsubscribe_error 0
# TYPE emqx_packets_unsuback_sent counter
emqx_packets_unsuback_sent 0
# TYPE emqx_packets_pingreq_received counter
emqx_packets_pingreq_received 0
# TYPE emqx_packets_pingresp_sent counter
emqx_packets_pingresp_sent 0
# TYPE emqx_packets_disconnect_received counter
emqx_packets_disconnect_received 0
# TYPE emqx_packets_disconnect_sent counter
emqx_packets_disconnect_sent 0
# TYPE emqx_packets_auth_received counter
emqx_packets_auth_received 0
# TYPE emqx_packets_auth_sent counter
emqx_packets_auth_sent 0
# TYPE emqx_messages_received counter
emqx_messages_received 0
# TYPE emqx_messages_sent counter
emqx_messages_sent 0
# TYPE emqx_messages_qos0_received counter
emqx_messages_qos0_received 0
# TYPE emqx_messages_qos0_sent counter
emqx_messages_qos0_sent 0
# TYPE emqx_messages_qos1_received counter
emqx_messages_qos1_received 0
# TYPE emqx_messages_qos1_sent counter
emqx_messages_qos1_sent 0
# TYPE emqx_messages_qos2_received counter
emqx_messages_qos2_received 0
# TYPE emqx_messages_qos2_sent counter
emqx_messages_qos2_sent 0
# TYPE emqx_messages_publish counter
emqx_messages_publish 0
# TYPE emqx_messages_dropped counter
emqx_messages_dropped 0
# TYPE emqx_messages_dropped_expired counter
emqx_messages_dropped_expired 0
# TYPE emqx_messages_dropped_no_subscribers counter
emqx_messages_dropped_no_subscribers 0
# TYPE emqx_messages_forward counter
emqx_messages_forward 0
# TYPE emqx_messages_retained counter
emqx_messages_retained 2
# TYPE emqx_messages_delayed counter
emqx_messages_delayed 0
# TYPE emqx_messages_delivered counter
emqx_messages_delivered 0
# TYPE emqx_messages_acked counter
emqx_messages_acked 0
# TYPE emqx_delivery_dropped counter
emqx_delivery_dropped 0
# TYPE emqx_delivery_dropped_no_local counter
emqx_delivery_dropped_no_local 0
# TYPE emqx_delivery_dropped_too_large counter
emqx_delivery_dropped_too_large 0
# TYPE emqx_delivery_dropped_qos0_msg counter
emqx_delivery_dropped_qos0_msg 0
# TYPE emqx_delivery_dropped_queue_full counter
emqx_delivery_dropped_queue_full 0
# TYPE emqx_delivery_dropped_expired counter
emqx_delivery_dropped_expired 0
# TYPE emqx_client_connected counter
emqx_client_connected 0
# TYPE emqx_client_authenticate counter
emqx_client_authenticate 0
# TYPE emqx_client_auth_anonymous counter
emqx_client_auth_anonymous 0
# TYPE emqx_client_check_acl counter
emqx_client_check_acl 0
# TYPE emqx_client_subscribe counter
emqx_client_subscribe 0
# TYPE emqx_client_unsubscribe counter
emqx_client_unsubscribe 0
# TYPE emqx_client_disconnected counter
emqx_client_disconnected 0
# TYPE emqx_session_created counter
emqx_session_created 0
# TYPE emqx_session_resumed counter
emqx_session_resumed 0
# TYPE emqx_session_takeovered counter
emqx_session_takeovered 0
# TYPE emqx_session_discarded counter
emqx_session_discarded 0
# TYPE emqx_session_terminated counter
emqx_session_terminated 0