My workspace

Real time data processing with Cassandra, Part 2

The last few months I was busy with our new telecommunication project to develop MNP (Mobile number portability) for the Russian Federation. Now in Russia anybody can change their telephone operator without changing the number, it's a another history for another blog. Today I have found a few hours to keep my promise. In this blog, I will try to describe how to configure and manage spark pseudo cluster with shark for real time data processing. In the previous blog I will show how to use hive with Hadoop to process data from Cassandra. For whom, who doesn't familiar, Spark is execution engines that supports cyclic data flow and in-memory computing, in the otherhand Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users. I am going to use following open source projects to configure and run the spark + shark cluster : 1) Scala-2.10.3 2) Spark-0.9.0-incubating-bin-hadoop1 3) Shark-0.9.0...

My workspace

Search This Blog

Posts

Real time data processing with Cassandra, Part 2