

Showing posts from December, 2012

Real-time data processing with Storm: ETL from Oracle to Cassandra

For the last couple of months we have been using Apache Hadoop MapReduce batch processing to analyze huge amounts of data. We also have a few legacy products for which switching to Cassandra, a BigTable-style database, is not an option. Several of them use Oracle Database as their primary storage. Our requirement is to extract the data from the RDBMS, parse the payload, and load it into Cassandra for aggregation. For this I decided to use Storm for real-time data processing.

Our use case is as follows:
1) A Storm spout connects to the Oracle Database and polls a particular table at regular intervals.
2) A Storm bolt parses the data as needed and emits it to a Storm-Cassandra bolt, which stores each row in Cassandra.

Here are some fragments of the project code. First, I created a JDBC connector class. The class contains a few class (static) variables, which goes against the Storm design. Storm may run spouts and bolts in separate worker JVMs, where static state is not shared. Since I need only one spout as input, though, this is good enough for me.

package storm.contrib.jdbc;

import org.slf4j.Logger;
import