Apache Ignite deep dive, SQL engine

Apache Ignite is an open-source, memory-centric distributed database, caching and computing platform. From the beginning, it was designed as an in-memory data grid for developing high-performance software systems. As a result, its core architecture is slightly different from that of traditional NoSQL databases. This design can simplify building modern applications with a flexible data model, and it makes high availability and high scalability simpler to achieve.

To design an application properly with any database or framework, you must understand the architecture of that database or framework. A better understanding of the system helps you solve different problems in your enterprise architecture landscape, select a database or framework that is appropriate for your application, and get the maximum benefit from it. In this article, we are going to explore the Apache Ignite SQL engine.

Under the hood, Apache Ignite uses the H2 database to execute SQL queries over Ignite caches. H2 is a high-speed, in-memory SQL database written in pure Java, and it can run in either embedded or server mode. Every Apache Ignite node runs one instance of H2 in embedded mode, which means the H2 instance runs inside the same process as the Ignite node. H2 starts along with the Ignite node and stops whenever the node dies or is forced to stop.

Portions of this article were taken from The Apache Ignite Book. If this article interests you, check out the rest of the book for more helpful information.

H2 also supplies the H2 web console, a standalone application with its own web server that lets you access the H2 SQL database through a browser interface. The web console shows the internal structure of tables and indexes. With it, you can also analyze how the database executes a query, e.g., whether indexes are used or whether the database has done an expensive full scan. This feature is crucial for optimizing query performance.

To start the H2 web console, set an environment variable in a terminal and then run the ignite.sh (ignite.bat on Windows) script. Set the variable as follows:

export IGNITE_H2_DEBUG_CONSOLE=test

Then start the Ignite node from the same terminal as shown below:

$IGNITE_HOME/bin/ignite.sh

Figure 1.

Copy the URL from the terminal and open it in your favorite browser. The database welcome page should pop up, as shown in the following screenshot.

Figure 2.

Tip: If the H2 web console started successfully, the web page should open automatically in your default web browser. If you want to connect to the H2 web console from another computer, you need to provide the IP address and port of the server, for example, http://192.168.1.35:56723

As mentioned before, the H2 web console lets us administer the H2 database, run SQL queries and much more. Whenever we create a table in Ignite (through SQLLINE or the Ignite SQL Java API), the table's metadata is created in and displayed by the H2 database. However, the data, as well as the indexes, are always stored in Ignite caches, and queries over them are executed through the H2 database engine. Let's create the DEPT table from the previous chapter again through SQLLINE and see what happens in the H2 database. Run the following command to start the SQLLINE console and connect to the Ignite cluster:

./sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/

Next, create the DEPT table and insert a few rows. We now have a table named DEPT with some example data in Ignite.

Figure 3.

Let's get back to the H2 web console and refresh the H2 database objects panel (there is a small refresh button on the menu bar in the upper left of the page). A new table named DEPT should appear in the objects panel. Expand the DEPT table, and you should see something like figure 4.

Figure 4.

The screenshot shows that the H2 DEPT table contains 3 extra columns named _KEY, _VAL and _VER. As a key-value data store, Apache Ignite always stores cache keys and values as the _KEY and _VAL fields, and the _KEY and _VAL columns of the H2 DEPT table correspond to these internal fields of the cache. The _VER field holds a version made up of the Ignite topology version and the node order. Run the following query to unravel the mystery.

select _key, _val, _ver from dept;

After you run the SQL query in the H2 web console SQL statement panel, you will see the following output.

Figure 5.

From the preceding figure, you can see that every _KEY field holds the actual department number, while the rest of the information, such as the department name and location, is held by the _VAL field. The deptno column of the H2 DEPT table maps to the _KEY field of the cache, the dname column maps to the dname attribute of the _VAL field, and the loc column maps to the loc attribute of the _VAL field.
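The row-to-key-value mapping can be modeled in plain Java to make it concrete. This is a minimal sketch, not Ignite's implementation: it assumes a LinkedHashMap stands in for the cache, and the DeptKeyValueSketch, DeptValue and DeptRow types and the sample rows are hypothetical. Inserting a row becomes a put of deptno (the _KEY) with a value object holding dname and loc (the _VAL), and a SELECT projects the columns back out of the key and the value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the DEPT table as a key-value cache (not Ignite code).
class DeptKeyValueSketch {
    // Hypothetical value object: what the cache keeps in _VAL for one DEPT row.
    record DeptValue(String dname, String loc) {}

    // Hypothetical SQL-facing view of one DEPT row.
    record DeptRow(long deptno, String dname, String loc) {}

    // Stand-in for the cache: deptno is the key (_KEY), the rest is the value (_VAL).
    private final Map<Long, DeptValue> cache = new LinkedHashMap<>();

    // INSERT INTO dept (deptno, dname, loc) ... conceptually becomes one key-value put.
    void insert(long deptno, String dname, String loc) {
        // Like an SQL INSERT, refuse to overwrite an existing primary key.
        if (cache.putIfAbsent(deptno, new DeptValue(dname, loc)) != null) {
            throw new IllegalStateException("Duplicate key: " + deptno);
        }
    }

    // SELECT deptno, dname, loc FROM dept: project the columns out of _KEY and _VAL.
    List<DeptRow> selectAll() {
        List<DeptRow> rows = new ArrayList<>();
        for (Map.Entry<Long, DeptValue> e : cache.entrySet()) {
            DeptValue val = e.getValue();
            rows.add(new DeptRow(e.getKey(), val.dname(), val.loc()));
        }
        return rows;
    }

    public static void main(String[] args) {
        DeptKeyValueSketch dept = new DeptKeyValueSketch();
        dept.insert(10, "ACCOUNTING", "NEW YORK"); // sample data, made up
        dept.insert(20, "RESEARCH", "DALLAS");
        dept.selectAll().forEach(System.out::println);
    }
}
```

In this model, the SQL columns are only a view: deptno is read from the key, while dname and loc are read from attributes of the value object, which mirrors what the _KEY and _VAL columns reveal in the H2 console.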

If you expand the indexes node of the DEPT table in the H2 web console objects panel, you should find three different indexes created by Ignite: _key_PK, _key_PK_proxy and _key_PK_hash.

_key_PK is the B-tree primary key index.

_key_PK_hash is the H2 primary key hash index. It is an in-memory hash index and is usually faster than a regular B-tree index for exact-match lookups.

_key_PK_proxy is a proxy index (not a primary index) that delegates calls to the underlying regular index.
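To make the difference between the three index types concrete, here is a minimal Java sketch. It is only an analogy, not Ignite's or H2's index code: a TreeMap stands in for the B-tree index, a HashMap for the hash index, and a forwarding wrapper for the proxy index. The KeyIndex interface, the class names and the sample rows are hypothetical, and each index maps deptno directly to dname to keep things short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical analogy for the three primary key indexes (not Ignite/H2 code).
class KeyIndexSketch {
    // Hypothetical common interface: look up a row (here just the dname) by primary key.
    interface KeyIndex {
        String find(long key);
    }

    // Analogy for _key_PK: an ordered (tree) index that also supports range scans.
    static final class SortedIndex implements KeyIndex {
        final TreeMap<Long, String> tree = new TreeMap<>();

        public String find(long key) {
            return tree.get(key);
        }

        // Range scan over [from, to], both bounds inclusive.
        NavigableMap<Long, String> range(long from, long to) {
            return tree.subMap(from, true, to, true);
        }
    }

    // Analogy for _key_PK_hash: average O(1) equality lookups, but no key ordering.
    static final class HashIndex implements KeyIndex {
        final Map<Long, String> table = new HashMap<>();

        public String find(long key) {
            return table.get(key);
        }
    }

    // Analogy for _key_PK_proxy: stores nothing itself and forwards every call.
    static final class ProxyIndex implements KeyIndex {
        private final KeyIndex target;

        ProxyIndex(KeyIndex target) {
            this.target = target;
        }

        public String find(long key) {
            return target.find(key);
        }
    }

    public static void main(String[] args) {
        // Sample rows, made up for illustration.
        Map<Long, String> depts = Map.of(10L, "ACCOUNTING", 20L, "RESEARCH", 30L, "SALES");
        SortedIndex btree = new SortedIndex();
        HashIndex hash = new HashIndex();
        btree.tree.putAll(depts);
        hash.table.putAll(depts);
        KeyIndex proxy = new ProxyIndex(btree);

        System.out.println(hash.find(10));       // equality lookup through the hash index
        System.out.println(proxy.find(20));      // equality lookup forwarded to the tree index
        System.out.println(btree.range(10, 20)); // range scan, only possible on the ordered index
    }
}
```

The ordered structure can answer range predicates such as deptno BETWEEN 10 AND 20, which a hash structure cannot, while the hash structure offers constant-time equality lookups on average. The proxy adds no storage of its own; it only exposes an existing index through another entry point.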

Besides inspecting the internal structure of a table, the H2 web console also lets us view a query's execution plan. Run the following SQL query in the SQL statement panel:

explain select * from dept d where d.deptno = 10;

The preceding SQL statement should return a query plan like the one in the following screenshot.

Figure 6.

The screenshot shows that the H2 SQL engine used the _key_PK_proxy index to query the DEPT table.

Note that H2 uses a cost-based optimizer, where cost means expected running time. For simple queries and queries of medium complexity (fewer than 7 tables in the join), it calculates the expected cost of every possible plan and uses the plan with the lowest cost.
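The idea of exhaustive, cost-based plan selection can be illustrated with a toy Java sketch. It is not H2's optimizer: the cost model (the sum of intermediate result sizes in a left-deep join), the fixed join selectivity of 0.01, and the JoinOrderSketch, Table and Plan types are all assumptions made for illustration. The sketch only enumerates every join order for fewer than 7 tables and keeps the cheapest one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy exhaustive join-order search; the cost model is an assumption, not H2's.
class JoinOrderSketch {
    // Hypothetical table statistics: a name and an estimated row count.
    record Table(String name, double rows) {}

    // A candidate join order together with its estimated cost.
    record Plan(List<Table> order, double cost) {}

    // Assumed, made-up selectivity applied to every join predicate.
    static final double SELECTIVITY = 0.01;

    // Toy cost: sum of the estimated sizes of all intermediate results in a left-deep join.
    static double cost(List<Table> order) {
        double size = order.get(0).rows();
        double total = 0;
        for (int i = 1; i < order.size(); i++) {
            size = size * order.get(i).rows() * SELECTIVITY;
            total += size;
        }
        return total;
    }

    // Exhaustive search: evaluate every permutation and keep the cheapest one.
    static Plan bestPlan(List<Table> tables) {
        if (tables.isEmpty() || tables.size() >= 7) {
            throw new IllegalArgumentException("this sketch only enumerates 1 to 6 tables");
        }
        Plan[] best = {null};
        permute(new ArrayList<>(), new ArrayList<>(tables), best);
        return best[0];
    }

    private static void permute(List<Table> prefix, List<Table> remaining, Plan[] best) {
        if (remaining.isEmpty()) {
            double c = cost(prefix);
            if (best[0] == null || c < best[0].cost()) {
                best[0] = new Plan(List.copyOf(prefix), c);
            }
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            Table t = remaining.remove(i); // choose the next table to join
            prefix.add(t);
            permute(prefix, remaining, best);
            prefix.remove(prefix.size() - 1); // undo the choice
            remaining.add(i, t);
        }
    }

    public static void main(String[] args) {
        List<Table> tables = List.of(
                new Table("A", 1000), new Table("B", 10), new Table("C", 100));
        Plan plan = bestPlan(tables);
        System.out.println(plan.order() + " cost=" + plan.cost());
    }
}
```

With the sample row counts A=1000, B=10 and C=100, the search joins the two small tables first and A last, because that keeps the first intermediate result small. The number of permutations grows factorially with the number of tables, which is why exhaustive evaluation is only practical for small joins.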

That's enough for now. We will explain much more about the Ignite architecture in a subsequent blog post. Stay tuned!
