My workspace

Optimizing Spark Job Performance With Apache Ignite (Part 1)

Portions of this article were taken from The Apache Ignite Book. If it got you interested, check out the rest of the book for more helpful information. The Spark DataFrame is covered in the new book. Apache Ignite offers several ways to improve a Spark job's performance, including Ignite RDD, which represents an Ignite cache as a Spark RDD abstraction, and Ignite IGFS, an in-memory file system that can be transparently plugged into Spark deployments. Ignite RDD makes it easy to share state in memory between different Spark jobs or applications. With Ignite's in-memory shared RDDs, any Spark job can put data into an Ignite cache that other Spark jobs can access later. Ignite RDD is implemented as a view over the Ignite distributed cache, which can be deployed either within the Spark job execution process or on a Spark worker. Before we move on to more advanced topics, let's look at the history of Spark and the kinds of problems that Ignite RDDs can solve. ...
