
Apache Ignite deep dive, SQL engine

Apache Ignite is an open source, memory-centric distributed database, caching, and computing platform. From the beginning, it was designed as an in-memory data grid for developing high-performance software systems. So its core architecture is slightly different from that of the traditional NoSQL databases that simplify building modern applications with a flexible data model, high availability, and high scalability.
Moreover, to design an application properly with any database or framework, you must understand the architecture of the database or framework itself. With a better understanding of the system, you can solve different problems in your enterprise architecture landscape, select a database or framework that is appropriate for your application, and get the maximum benefit from it. In this article, we explore the Apache Ignite SQL engine.
Under the hood, Apache Ignite uses the H2 database to execute SQL queries over Ignite caches. H2 is a fast in-memory SQL database written in pure Java that can run in embedded or server mode. Every Apache Ignite node runs one instance of H2 in embedded mode, which means the H2 instance runs inside the same process as the Ignite node. H2 starts along with the Ignite node and stops whenever the node dies or is forced to stop.
Portions of this article were taken from the book The Apache Ignite book. If it got you interested, check out the rest of the book for more helpful information.
H2 also ships with the H2 web console, a standalone application that includes its own web server. The console lets you access the H2 SQL database through a browser interface and inspect the internal structure of tables and indexes. You can also use it to analyze how the database executes a query, e.g., whether indexes are used or whether the database performed an expensive full scan. This feature is crucial for optimizing query performance.
To start the H2 web console, set an environment variable in the command console and then run the ignite.sh (or ignite.bat) script. Set the variable as follows:
export IGNITE_H2_DEBUG_CONSOLE=test
Start the Ignite node from the same console as shown below:
$IGNITE_HOME/bin/ignite.sh
Figure 1.
Copy the URL from the terminal and open it in your favorite browser. The database welcome page should appear, as shown in the following screenshot.
Figure 2.
Tip: If the H2 web console started successfully, the web page should open automatically in your default web browser. If you want to connect to the H2 web console from another computer, you need to provide the IP address and port of the server, for example, http://192.168.1.35:56723
As mentioned before, in the H2 web console we can administer the H2 database, run SQL queries, and much more. Whenever we create a table in Ignite (through SQLLINE or the Ignite SQL Java API), the table's metadata is created and displayed in the H2 database. However, the data and the indexes are always stored in the Ignite caches, and queries against them are executed through the H2 database engine. Let's create the DEPT table from the previous chapter again through SQLLINE and see what happens in the H2 database. Run the following command to start the SQLLINE command console and connect to the Ignite cluster:
./sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/


Next, create the DEPT table and insert a few rows. Now, we have a table named DEPT with some example data in Ignite.
Figure 3.
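The CREATE TABLE script itself is not reproduced in this article. As a minimal sketch, assuming a DEPT table with the three columns discussed below (deptno as the primary key, plus dname and loc), the following shell snippet writes the statements to a hypothetical file named dept.sql. The column types and the sample rows are assumptions made for illustration, not the original script.

```shell
#!/bin/sh
# Write a hypothetical DDL/DML script for the DEPT table.
# Column types and sample rows are assumptions, not taken from the article.
cat > dept.sql <<'EOF'
CREATE TABLE dept (deptno INT PRIMARY KEY, dname VARCHAR, loc VARCHAR);
INSERT INTO dept (deptno, dname, loc) VALUES (10, 'ACCOUNTING', 'NEW YORK');
INSERT INTO dept (deptno, dname, loc) VALUES (20, 'RESEARCH', 'DALLAS');
INSERT INTO dept (deptno, dname, loc) VALUES (30, 'SALES', 'CHICAGO');
EOF
echo "wrote dept.sql"
```

From the SQLLINE prompt, the file can then be loaded with the !run command (for example, !run dept.sql), or the statements can be typed in one at a time.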
Let’s go back to the H2 web console and refresh the H2 database objects panel (there is a small refresh button on the menu bar at the upper left of the page). A new table named DEPT should appear in the object panel. Expand the DEPT table, and you should see the structure shown in figure 4.
Figure 4.
From the above screenshot, we can see that the H2 DEPT table contains three extra columns named _KEY, _VAL, and _VER. As a key-value data store, Apache Ignite always stores cache keys and values in the _KEY and _VAL fields. The _KEY and _VAL columns of the H2 DEPT table correspond to the internal _KEY and _VAL fields of the Ignite cache. The _VER field holds the Ignite topology version and the node order. Run the following query to unravel the mystery.
select _key, _val, _ver from dept;

After you run the SQL query in the H2 web console SQL statement panel, you will see the following output.
Figure 5.
From the preceding figure, you can see that every _KEY field holds the actual department number, while the rest of the information, such as the department name and the location, is held by the _VAL field. The deptno column of the H2 DEPT table maps to the _KEY field of the cache, the dname column maps to the dname attribute of the _VAL field, and the loc column maps to the loc attribute of the _VAL field.
If you expand the indexes node for the DEPT table in the H2 web console object panel, you will find three different indexes created by Ignite: _key_PK, _key_PK_proxy, and _key_PK_hash.
_key_PK is the B-tree primary key index.
_key_PK_hash is the H2 primary key hash index. It is an in-memory hash index and is usually faster than a regular index.
_key_PK_proxy is a proxy index (not a primary index) that delegates calls to the underlying regular index.
With the H2 web console, we can do more than investigate the internal structure of a table; for example, we can view the execution plan of a query. Let’s run the following SQL query in the SQL statement panel:
explain select * from dept d where d.deptno = 10;
The preceding SQL statement should produce the query plan shown in the following screenshot.
Figure 6.
From the above screenshot, we can see that the H2 SQL engine used the _key_PK_proxy index to query the DEPT table.
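For contrast, it can help to look at a predicate that the primary key indexes cannot serve. The following sketch writes a hypothetical explain_dept.sql file with two EXPLAIN statements: the primary key lookup used above and a filter on loc. It assumes that the DEPT table has no secondary index on loc and that a row with the made-up value 'NEW YORK' exists. Under those assumptions, the second plan would be expected to show a scan rather than a primary key index, though the exact plan text depends on the Ignite and H2 versions.

```shell
#!/bin/sh
# Write two EXPLAIN statements to a hypothetical file for comparison.
# The loc value is an assumed sample value, not taken from the article.
cat > explain_dept.sql <<'EOF'
EXPLAIN SELECT * FROM dept d WHERE d.deptno = 10;
EXPLAIN SELECT * FROM dept d WHERE d.loc = 'NEW YORK';
EOF
echo "wrote explain_dept.sql"
```

The statements can be pasted into the H2 web console SQL statement panel one at a time. Creating a secondary index on loc (for example, with a CREATE INDEX statement through SQLLINE) and re-running the second statement is one way to check whether the optimizer switches to the new index.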
Note that the H2 database uses a cost-based optimizer. For simple queries and queries of medium complexity (fewer than 7 tables in the join clause), the expected cost (running time) of all possible plans is calculated, and the plan with the lowest cost is used.
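The selection rule itself is easy to picture. The toy sketch below is not H2's optimizer: the candidate plans and their cost figures are made up purely for illustration. It only shows the idea of estimating every candidate and keeping the cheapest one.

```shell
#!/bin/sh
# Hypothetical candidates, one per line: "<estimated cost> <access path>".
# The cost figures are invented for illustration, not real H2 estimates.
candidates='1000 full table scan
12 _key_PK_proxy index lookup'

# Sort numerically by the leading cost field and keep the cheapest candidate.
best=$(printf '%s\n' "$candidates" | sort -n | head -n 1)

# Strip the leading cost field so only the access path remains.
chosen=${best#* }
echo "chosen plan: $chosen"
```

With these made-up numbers, the index lookup wins because its estimated cost is lower than that of the full scan.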
That’s enough for now. We will explain much more about the Ignite architecture in subsequent blog posts. Stay tuned!
