This week The Apache Ignite Book became one of the top books on Leanpub


Monitoring Apache Ignite Cluster With Grafana (Part 1)

Apache Ignite is built on the JVM and is not a set-it-and-forget-it system. Like other distributed systems, it requires monitoring so that you can act on problems in time. Apache Ignite provides a web application named Ignite Web Console to manage and monitor the cluster, but it is not enough for system monitoring. You can also use JConsole or VisualVM to monitor an individual Ignite node or a small number of Ignite nodes. Monitoring an Ignite cluster of more than 5 nodes with VisualVM or JConsole is unrealistic and time-consuming. Also, JMX does not provide any historical data, so these tools are not recommended for production environments. Nowadays, there are a lot of tools available for system monitoring. The most famous of them are:
In this article, we cover Grafana for monitoring Ignite clusters and provide step-by-step instructions to install and configure the entire technology stack.
Grafana is an open source graphical tool for querying, visualizing, and alerting on your metrics. It brings your metrics together and lets you create graphs and dashboards based on data from various sources. You can also use Grafana to display data from other monitoring systems such as Zabbix. It is lightweight, easy to install, easy to configure, and it looks beautiful.
Before we dive into the details, let’s discuss the concept of monitoring large-scale production environments. Figure 1 illustrates a high-level overview of how the monitoring system looks in production environments.
Figure 1.
In the above figure, data such as OS metrics, log files, and application metrics are gathered from various hosts through different protocols, such as JMX and SNMP, into a single time-series database. Next, all the gathered data is displayed on a dashboard for real-time monitoring. However, a monitoring system can be complicated and vary between environments.
Portions of this article were taken from The Apache Ignite Book. If it got you interested, check out the rest of the book for more helpful information.
Let’s start at the bottom of the monitoring chain and work our way up. To avoid a complete lesson on monitoring, we will only cover the basics, along with the most common checks that should be done as they relate to Ignite and its operation. The data we are planning to use for monitoring are:
  • Ignite node Java Heap.
  • Ignite cluster topology version.
  • Number of server or client nodes in the cluster.
  • Ignite node total uptime.
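To make the list above more concrete, here is a minimal sketch of how two of these values (heap usage and uptime) can be read inside any JVM through the standard platform MXBeans. It uses only the JDK's java.lang.management API; the class name JvmMetricsProbe is made up for this illustration. The topology version and node counts come from Ignite's own MBeans and are not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: reads JVM-level metrics from the platform MXBeans. */
final class JvmMetricsProbe {

    /** Heap currently used by this JVM, in bytes. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Maximum heap size in bytes, or -1 if the JVM does not define one. */
    static long heapMaxBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    /** Time since this JVM started, in milliseconds. */
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %d bytes, heap max: %d bytes, uptime: %d ms%n",
                heapUsedBytes(), heapMaxBytes(), uptimeMillis());
    }
}
```

The same values are what JConsole or VisualVM display for a single node; the rest of the article is about collecting them continuously and keeping their history.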
The technology stack we use for monitoring the Ignite cluster comprises three components: InfluxDB, Grafana, and jmxtrans. The high-level architecture of our monitoring system is shown in the figure below.
Figure 2
Ignite nodes do not send the MBean metrics to InfluxDB directly. We use jmxtrans, which collects the JMX metrics and sends them to InfluxDB. Jmxtrans is lightweight and runs as a daemon to collect the server metrics. InfluxDB is an open-source time series database developed by InfluxData. It is written in Go and optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring and application metrics.
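A collector such as jmxtrans addresses each metric by MBean object name and attribute rather than through typed Java interfaces. The sketch below shows that idea against the local JVM using only the JDK's javax.management API. It is an illustration of name-based JMX polling, not how jmxtrans is implemented or configured, and the class name JmxPollSketch is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

/** Hypothetical sketch of polling one JMX value by object name and attribute. */
final class JmxPollSketch {

    /** Reads one numeric item (for example "used") from a composite attribute. */
    static long readCompositeLong(MBeanServer server, String objectName,
                                  String attribute, String key) {
        try {
            Object value = server.getAttribute(new ObjectName(objectName), attribute);
            // Through the MBeanServer, MXBean attributes such as HeapMemoryUsage
            // are exposed as open-type CompositeData.
            CompositeData data = (CompositeData) value;
            return ((Number) data.get(key)).longValue();
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read " + objectName + "#" + attribute, e);
        }
    }

    /** Convenience call: used heap of the local JVM, in bytes. */
    static long localHeapUsedBytes() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return readCompositeLong(server, "java.lang:type=Memory", "HeapMemoryUsage", "used");
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + localHeapUsedBytes());
    }
}
```

A real collector does the same lookup over a remote JMX connection and on a schedule, then forwards each reading to the time series database.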
Next, we install and configure InfluxDB, Grafana, and jmxtrans to collect metrics from the Ignite cluster. We also compose a custom dashboard in Grafana that monitors Ignite cluster resources.
Prerequisites. To follow the instructions for configuring the monitoring infrastructure, you need the following:
Name       Version
OS         MacOS, Windows, *nix
InfluxDB   1.7.1
Grafana    5.4.0
jmxtrans   271-SNAPSHOT
Step 1. The data store for all the metrics from the Ignite cluster will be InfluxDB, so let’s install it first. I am using MacOS, so I will use Homebrew to install InfluxDB. For other operating systems such as Windows or Linux, please visit the InfluxDB website and follow the installation instructions.
brew install influxdb
After completing the installation process, launch the database by using the following command:
influxd -config /usr/local/etc/influxdb.conf
By default, InfluxDB runs on http://localhost:8086 and provides a REST API for manipulating the database objects. InfluxDB also provides a command line tool named influx to interact with the database. Execute the influx shell script in another console; it starts the CLI and automatically connects to the local InfluxDB instance. The output should look as follows:
Connected to http://localhost:8086 version v1.7.1
InfluxDB shell version: v1.7.1
Enter an InfluxQL query
A fresh install of InfluxDB has no database, so let’s create one to store the Ignite metrics. Enter the following Influx Query Language (a.k.a. InfluxQL) statement to create the database.
create database ignitesdb
Now that the ignitesdb database is created, we can use the SHOW DATABASES statement to display all the existing databases.
show databases
name: databases
name
_internal
ignitesdb
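The same InfluxQL statements can also be sent through the HTTP API mentioned earlier. In InfluxDB 1.x, queries go to the /query endpoint with the statement in the q parameter, and statements that change the database (such as CREATE DATABASE) need to be sent as POST requests. The sketch below only builds the request URI and does not send anything; the class name InfluxQueryUri is hypothetical, and the base URL is assumed to have no trailing slash.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds the InfluxDB 1.x /query URI for an InfluxQL statement. */
final class InfluxQueryUri {

    /** baseUrl is assumed to look like "http://localhost:8086" (no trailing slash). */
    static URI queryUri(String baseUrl, String influxQl) {
        // Form-encode the statement so spaces and punctuation are safe in a query string.
        String encoded = URLEncoder.encode(influxQl, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/query?q=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(queryUri("http://localhost:8086", "SHOW DATABASES"));
        System.out.println(queryUri("http://localhost:8086", "CREATE DATABASE ignitesdb"));
    }
}
```

Any HTTP client can then issue the request against the resulting URI.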
Note that the _internal database is created and used by InfluxDB to store internal runtime metrics. To insert into or query a database, use the USE <db-name> statement, which automatically sets the database for all future requests. For example:
USE ignitesdb
Using database ignitesdb
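Once a database is selected, data points are written in the InfluxDB line protocol: a measurement name, optional comma-separated tags, one or more fields, and an optional timestamp in nanoseconds. The sketch below formats one heap reading in that shape. The measurement name ignite_heap, the host tag, and the used field are assumptions chosen for this illustration, not names that jmxtrans or Ignite produce.

```java
/** Hypothetical helper: formats a heap reading as one InfluxDB line-protocol record. */
final class LineProtocol {

    /** Escapes the characters that are special inside tag values: comma, equals, space. */
    static String escapeTag(String value) {
        return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    /** Builds "ignite_heap,host=<host> used=<bytes>i <timestamp>". */
    static String heapPoint(String host, long usedBytes, long timestampNanos) {
        // The trailing "i" marks the field value as an integer.
        return "ignite_heap,host=" + escapeTag(host)
                + " used=" + usedBytes + "i "
                + timestampNanos;
    }

    public static void main(String[] args) {
        long nowNanos = System.currentTimeMillis() * 1_000_000L;
        System.out.println(heapPoint("node 1", 1024L, nowNanos));
    }
}
```

In InfluxDB 1.x, records like this are posted to the /write endpoint with the target database in the db parameter, for example /write?db=ignitesdb.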
That's enough for now. In the next part of this article, we will install and configure Grafana and jmxtrans to monitor the Ignite cluster. Stay tuned!


8 things every developer should know about the Apache Ignite caching

Any technology, no matter how advanced it is, will not be able to solve your problems if you implement it improperly. Caching, especially distributed caching, can only accelerate your application when it is used and configured properly. From this point of view, Apache Ignite is no different, and there are a few steps to consider before using it in a production environment. In this article, we describe various techniques that can help you plan for and properly use Apache Ignite as a cutting-edge caching technology.

  • Do proper capacity planning before using an Ignite cluster. Work out on paper the size of the cache, the number of CPUs, and how many JVMs will be required. Let’s assume that you are using Hibernate as an ORM in 10 application servers and wish to use Ignite as an L2 cache. Calculate the total memory usage and the number of Ignite nodes you need to maintain your SLA. An incorrect number of Ignite nodes can become a bottleneck for your entire application. Please use the official Apache Ignite documentation when preparing a system capacity plan.
  • Select the best deployment option. You can run Ignite embedded in the application or as a separate cluster. Each option has pros and cons. When Ignite is running in the same JVM as the application (embedded mode), the network roundtrip for getting data from the cache is minimal. However, in this case, Ignite shares the JVM resources with the application, which can impact application performance. Moreover, in embedded mode, if the application dies, the Ignite node also fails. On the other hand, when the Ignite node is running in a separate JVM, there is always some network overhead for fetching data from the cluster. So, if you have a web application with a small memory footprint, you can consider running the Ignite node in the same JVM.
  • Use on-heap caching for maximum performance. By default, Ignite uses Java off-heap memory for storing cache entries. When data is stored off-heap, there is always some overhead for serializing and deserializing it. To reduce the latency and get maximum performance, you can use on-heap caching. You should also take into account that the Java heap size is limited and that on-heap caching adds GC (garbage collection) overhead. Therefore, consider using on-heap caching when the cache is small and bounded in size and the cache entries rarely change.
  • Use atomic cache mode whenever possible. If you do not need strong data consistency, consider using the atomic mode. In atomic mode, each DML operation either succeeds or fails on its own, and neither read nor write operations lock the data. This mode gives better performance than the transactional mode. An example of an atomic cache configuration is shown below.

    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <property name="name" value="testCache" />
            <property name="atomicityMode" value="ATOMIC" />
        </bean>
    </property>

  • Disable unnecessary internal event notifications. Ignite has a rich event system to notify users/nodes about various events, including cache modification, eviction, compaction, topology changes, and a lot more. Since thousands of events per second can be generated, this creates an additional load on the system, which can lead to significant performance degradation. Therefore, it is highly recommended to enable only those events that your application logic requires.

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Enable events that you need and leave others disabled -->
        <property name="includeEventTypes">
            <list>
                <util:constant static-field=""/>
                <util:constant static-field=""/>
                <util:constant static-field=""/>
            </list>
        </property>
    </bean>

  • Turn off backup copies. If you are using a PARTITIONED cache and data loss is not critical for you, consider disabling backups for that cache. When backups are enabled, the Ignite cache engine maintains a remote copy of each entry, which requires network exchanges. To turn off backups, use the following cache configuration:

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="cacheConfiguration">
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <!-- Set cache mode. -->
                <property name="cacheMode" value="PARTITIONED"/>
                <!-- Set number of backups to 0 -->
                <property name="backups" value="0"/>
            </bean>
        </property>
    </bean>
  • Synchronize the requests for the same key. Let's explain with an example. Assume your application has to handle 5000 requests per second, and most of them request the same key. All the threads follow the same logic: if there is no value for the key in the cache, query the database. In the end, every thread goes to the database and updates the value for the key in the cache. As a result, the application spends more time than if there were no cache at all. This is one of the common reasons why an application slows down when you are using a cache.

    However, the solution to this problem is simple: synchronize the requests for the same key. From version 2.1, Apache Ignite supports the Spring @Cacheable annotation with the sync attribute, which ensures that a single thread computes the cache value. To achieve this, add the sync attribute as follows:

    @Cacheable(value = "exchangerate", sync = true)
    public String getExchangerate(String region) {
        // ... load the exchange rate for the region from the database ...
    }
  • Turn off or tune durable memory. Since version 2.1, Apache Ignite has its own persistence implementation. Unfortunately, persistence slows down the system, and the write-ahead log (WAL) slows it down even more. If you do not need data durability, you can disable the WAL or turn off WAL archiving. Starting from version 2.4, Apache Ignite makes it possible to disable the WAL without restarting the entire cluster, as shown below:


    By the way, you can also tune the WAL logging level according to your requirements. By default, the WAL runs in DEFAULT mode, which guarantees the highest level of data durability. You can change the WAL to one of the following levels:

    1. LOG_ONLY.
    3. NONE.
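The tip about synchronizing requests for the same key can be illustrated without Ignite or Spring. The sketch below is a plain JDK example built on ConcurrentHashMap.computeIfAbsent, which applies the loading function at most once per absent key while other callers for that key wait. It is not how Ignite or the sync attribute is implemented; the class SingleLoaderCache and its loader function (standing in for the database query) are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache in which only one thread loads the value for a missing key. */
final class SingleLoaderCache<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final Function<K, V> loader; // stands in for the database query

    SingleLoaderCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent is atomic per key: the loader runs at most once for an
        // absent key, and concurrent callers for that key wait for the result.
        return store.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }

    /** Demo: many threads request one key; returns how many times the loader ran. */
    static int loadsForConcurrentRequests(int threadCount) {
        SingleLoaderCache<String, String> cache =
                new SingleLoaderCache<>(region -> "rate-for-" + region);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> cache.get("EU"));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cache.loadCount();
    }
}
```

Without this per-key coordination, every thread that misses the cache would run the loader itself, which is exactly the slowdown described in that tip.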

Caching gives enormous performance benefits, saves unnecessary network roundtrips, and reduces CPU costs. Many believe that caching is an easy way to make everything faster and cooler. However, as practice shows, incorrect use of caching most often makes things worse. Caching is a mechanism that gives a performance boost only when you use it correctly. So, remember this before implementing it in your project, and take measurements before and after in all related cases.

Don't hesitate to leave your comments or ideas if you have any. Portions of this article were taken from The Apache Ignite Book. If it got you interested, check out the rest of the book for more helpful information.