
Monitoring Apache Ignite Cluster With Grafana (Part 1)

Apache Ignite runs on the JVM and is not a set-it-and-forget-it system. Like other distributed systems, it requires monitoring so that you can act in time. Apache Ignite provides a web application named Ignite Web Console to manage and monitor the cluster, but it is not enough for system monitoring. You can also use JConsole or VisualVM to monitor an individual Ignite node or a small number of nodes. Monitoring an Ignite cluster of more than 5 nodes with VisualVM or JConsole, however, is unrealistic and time-consuming. In addition, JMX does not provide any historical data, so it is not recommended for production environments. Nowadays, many tools are available for system monitoring.
In this article, we cover Grafana for monitoring Ignite clusters and provide step-by-step instructions to install and configure the entire technology stack.
Grafana is an open source graphical tool for querying, visualizing, and alerting on your metrics. It brings your metrics together and lets you create graphs and dashboards based on data from various sources. You can also use Grafana to display data from other monitoring systems such as Zabbix. It is lightweight, easy to install, easy to configure, and it looks beautiful.
Before we dive into the details, let’s discuss the concept of monitoring large-scale production environments. Figure 1 illustrates a high-level overview of how the monitoring system looks in production environments.
Figure 1.
In the above figure, data such as OS metrics, log files, and application metrics are gathered from various hosts through different protocols, such as JMX and SNMP, into a single time-series database. Next, the gathered data is displayed on a dashboard for real-time monitoring. However, a monitoring system can be complicated and can vary between environments.
Portions of this article were taken from the book The Apache Ignite book. If it got you interested, check out the rest of the book for more helpful information.
Let’s start at the bottom of the monitoring chain and work our way up. To avoid a complete lesson on monitoring, we will cover only the basics, along with the most common checks that should be done as they relate to Ignite and its operation. The data we plan to use for monitoring are:
  • Ignite node Java Heap.
  • Ignite cluster topology version.
  • Number of server or client nodes in the cluster.
  • Ignite node total uptime.
The technology stack we use for monitoring the Ignite cluster comprises three components: InfluxDB, Grafana, and jmxtrans. The high-level architecture of our monitoring system is shown in the figure below.
Figure 2
Ignite nodes do not send the MBeans metrics to InfluxDB directly. We use jmxtrans, which collects the JMX metrics and sends them to InfluxDB. Jmxtrans is lightweight and runs as a daemon to collect the server metrics. InfluxDB is an open-source time series database developed by InfluxData. It is written in Go and optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring and application metrics.
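To make the data flow more concrete, here is a minimal sketch of what a single metrics record could look like when it reaches InfluxDB. InfluxDB 1.x accepts points in its text-based line protocol (measurement, optional tags, fields, timestamp). In our setup jmxtrans takes care of the formatting, so this code is purely illustrative. The measurement name `ignite_node`, the tag `host`, and the field names are hypothetical and were chosen only to mirror the four metrics listed above.

```python
def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces (tag keys, tag values, field keys)."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def escape_measurement(text: str) -> str:
    """Measurement names only need commas and spaces escaped."""
    return text.replace(",", "\\,").replace(" ", "\\ ")


def format_field(value) -> str:
    """Render a field value: boolean, integer (suffix i), float, or quoted string."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp):
    """Build one line-protocol record: measurement,tags fields timestamp."""
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp}"


# Hypothetical names and values, for illustration only.
line = to_line(
    "ignite_node",
    {"host": "node1"},
    {
        "heap_used": 268435456,
        "topology_version": 5,
        "server_nodes": 3,
        "uptime_ms": 120000,
    },
    1546300800,  # timestamp in seconds
)
print(line)
# ignite_node,host=node1 heap_used=268435456i,topology_version=5i,server_nodes=3i,uptime_ms=120000i 1546300800
```

Keeping the host as a tag and the numbers as fields is one possible layout; tags are indexed by InfluxDB, which makes per-node filtering in a dashboard straightforward.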
Next, we install and configure InfluxDB, Grafana, and jmxtrans to collect metrics from the Ignite cluster. We also compose a custom dashboard in Grafana that monitors Ignite cluster resources.
Prerequisites. To follow the instructions for configuring the monitoring infrastructure, you need the following:
Name      Version
OS        MacOS, Windows, *nix
InfluxDB  1.7.1
Grafana   5.4.0
jmxtrans  271-SNAPSHOT
Step 1. The data store for all the metrics from the Ignite cluster will be InfluxDB, so let’s install it first. I am using MacOS, so I will use Homebrew to install InfluxDB. For other operating systems such as Windows or Linux, please visit the InfluxDB website and follow the installation instructions there.
brew install influxdb
After completing the installation process, launch the database by using the following command:
influxd -config /usr/local/etc/influxdb.conf
By default, InfluxDB runs on http://localhost:8086 and provides a REST API for manipulating the database objects. InfluxDB also provides a command line tool named influx to interact with the database. Execute the influx command in another console; it starts the CLI and automatically connects to the local InfluxDB instance. The output should look as follows:
influx
Connected to http://localhost:8086 version v1.7.1
InfluxDB shell version: v1.7.1
Enter an InfluxQL query
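Besides the CLI, the REST API on port 8086 can be called directly. The sketch below assumes the InfluxDB 1.x `/query` endpoint with its `q` (statement) and `db` (database) parameters. The helper names `query_url` and `run_query` are ours and not part of any library. `run_query` only works while influxd is running locally, so it is defined here but not called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8086"  # default address from this article


def query_url(statement, db=None, base_url=BASE_URL):
    """Build the URL for an InfluxQL statement against the /query endpoint."""
    params = {"q": statement}
    if db is not None:
        params["db"] = db
    return f"{base_url}/query?{urlencode(params)}"


def run_query(statement, db=None, base_url=BASE_URL):
    """Send a read-only statement and return the decoded JSON response.

    Needs a running InfluxDB instance; not executed in this sketch.
    """
    with urlopen(query_url(statement, db, base_url)) as resp:
        return json.load(resp)


print(query_url("SHOW DATABASES"))
# http://localhost:8086/query?q=SHOW+DATABASES
```

A GET request is fine for read-only statements such as SHOW DATABASES. For statements that change the database, we would rather stay with the influx CLI as shown in this article.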
A fresh install of InfluxDB has no user-created databases, so let’s create one to store the Ignite metrics. Enter the following Influx Query Language (a.k.a. InfluxQL) statement to create the database.
create database ignitesdb
Now that the ignitesdb database is created, we can use the SHOW DATABASES statement to display all the existing databases.
show databases
name: databases
name
----
_internal
ignitesdb
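The same listing is available in JSON form over the HTTP API. The sketch below extracts the database names from such a response. The sample document follows the response layout of the InfluxDB 1.x `/query` endpoint (results, series, columns, values) as we understand it and is hand-written here, not captured from a live server. The function name `database_names` is hypothetical.

```python
import json


def database_names(response):
    """Collect the values of the 'name' column from a SHOW DATABASES response."""
    names = []
    for result in response.get("results", []):
        for series in result.get("series", []):
            idx = series["columns"].index("name")
            names.extend(row[idx] for row in series["values"])
    return names


# Hand-written sample mirroring the CLI output above.
sample = json.loads(
    '{"results":[{"statement_id":0,"series":[{"name":"databases",'
    '"columns":["name"],"values":[["_internal"],["ignitesdb"]]}]}]}'
)
print(database_names(sample))
# ['_internal', 'ignitesdb']
```

A check like `"ignitesdb" in database_names(...)` could serve as a simple smoke test before pointing a collector at the database.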
Note that the _internal database is created and used by InfluxDB to store internal runtime metrics. To insert into or query a database, use the USE <db-name> statement, which sets the database for all subsequent requests in the session. For example:
USE ignitesdb
Using database ignitesdb
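Over the HTTP API there is no session, so the counterpart of USE is the `db` parameter that accompanies each request. As a rough sketch, assuming the InfluxDB 1.x `/write` endpoint with its `db` and `precision` parameters, a write request for the new database could be prepared like this. The function name `write_request` and the sample point are hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def write_request(db, lines, precision="s", base_url="http://localhost:8086"):
    """Prepare a POST to /write carrying line-protocol records, one per line."""
    url = f"{base_url}/write?{urlencode({'db': db, 'precision': precision})}"
    body = "\n".join(lines).encode("utf-8")
    return Request(url, data=body, method="POST")


# Hypothetical point; measurement and field names are illustrative only.
req = write_request("ignitesdb", ["ignite_node,host=node1 server_nodes=3i 1546300800"])
print(req.get_method(), req.full_url)
# POST http://localhost:8086/write?db=ignitesdb&precision=s
```

Passing the prepared request to `urllib.request.urlopen` would send it to a running instance; InfluxDB 1.x is expected to answer a successful write with HTTP 204 and an empty body.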
That's enough for now. In the next part of this article, we will install and configure Grafana and jmxtrans to monitor the Ignite cluster. Stay tuned!
