Portions of this article were taken from The Apache Ignite Book. If it got you interested, check out the rest of the book for more helpful information. The new book also covers Spark DataFrames.

Apache Ignite offers several ways to improve a Spark job's performance: Ignite RDD, which represents an Ignite cache as a Spark RDD abstraction, and Ignite IGFS, an in-memory file system that can be transparently plugged into Spark deployments. Ignite RDD makes it easy to share state in memory between different Spark jobs or applications. With Ignite's in-memory shared RDDs, any Spark job can put data into an Ignite cache that other Spark jobs can access later. Ignite RDD is implemented as a view over the Ignite distributed cache, which can be deployed either within the Spark job execution process or on a Spark worker. Before we move on to more advanced topics, let's have a look at the history of Spark and the kinds of problems that Ignite RDDs can solve. ...
A journey for designing high-performance software based on NoSQL, BigData and Microservices