News

This week, The Apache Ignite book became one of the top books on Leanpub.

Saturday

Elasticsearch with Cassandra data

Sooner or later, every enterprise application needs full-text search over its content. Solr and Elasticsearch, both based on Lucene, are among the best candidates for building enterprise search. Elasticsearch has become very popular thanks to its simplicity, but out of the box it doesn't support importing data from a Cassandra cluster. However, Elasticsearch provides rivers: a river is a pluggable service running within the Elasticsearch cluster that pulls data (or is pushed data), which is then indexed into the cluster. After a little searching I found a cassandra-river project on GitHub from eBay; unfortunately, the project was legacy and only supported Cassandra 1.2.x. With a little effort I rewrote the project using the DataStax Cassandra driver. Here you can find the project; it now supports the following features:
1) Cron scheduling;
2) Reading Cassandra rows through paging;
3) Based on the DataStax Java driver 2.0.
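To make the paging feature concrete, here is a minimal, self-contained sketch of reading rows page by page and turning each page into an Elasticsearch bulk request body. It is not the plugin's code: an in-memory list stands in for the Cassandra table, and `PagedBulkSketch`, `pages`, `toBulkBody`, and the `user_id`/`name` columns are hypothetical. With the DataStax Java driver 2.0 the page size would instead be set on the statement (`Statement.setFetchSize`), and the driver fetches the next page as you iterate the `ResultSet`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PagedBulkSketch {

    // Split the source rows into pages of at most batchSize rows,
    // standing in for the driver fetching one page at a time.
    static List<List<Map<String, String>>> pages(List<Map<String, String>> rows, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<Map<String, String>>> result = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            result.add(new ArrayList<>(rows.subList(from, to)));
        }
        return result;
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only
    // (a real implementation should use a JSON library).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build an Elasticsearch 1.x bulk body: for each row, an "index" action line
    // followed by the document source line, each terminated by '\n'.
    // Assumes every row has a non-null value in idColumn.
    static String toBulkBody(List<Map<String, String>> page, String index, String type, String idColumn) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> row : page) {
            sb.append("{\"index\":{\"_index\":").append(quote(index))
              .append(",\"_type\":").append(quote(type))
              .append(",\"_id\":").append(quote(row.get(idColumn))).append("}}\n");
            StringBuilder doc = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<String, String> e : row.entrySet()) {
                if (!first) doc.append(',');
                doc.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
                first = false;
            }
            sb.append(doc).append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Five hypothetical rows standing in for a Cassandra table.
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("user_id", "u" + i);
            row.put("name", "user " + i);
            rows.add(row);
        }
        // Toy page size of 2: produces three pages (2 + 2 + 1 rows).
        for (List<Map<String, String>> page : pages(rows, 2)) {
            System.out.print(toBulkBody(page, "prodinfo", "product", "user_id"));
        }
    }
}
```

Each page becomes a complete bulk body (an `index` action line followed by the document source, both newline-terminated), which is the shape the Elasticsearch 1.x `_bulk` endpoint expects. The river configuration in this post uses a `batch_size` of 20000, which presumably plays the role of the page size here.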

For a quick installation, download the project from GitHub and build it with Maven:
mvn clean install

This will create the river plugin archive target/releases/cassandra-river-1.0-SNAPSHOT.zip. To install the river plugin, you can use the plugin command-line utility.
From the elasticsearch_home/bin directory, run the following command:
./plugin --url file:/PATH/cassandra-river-1.0-SNAPSHOT.zip --install cassandra-river
Now start (or restart) Elasticsearch and initialize the river with the following command:
curl -XPUT 'http://HOST:PORT/_river/cassandra-river/_meta' -d '{
    "type" : "cassandra",
    "cassandra" : {
        "cluster_name" : "Test Cluster",
        "keyspace" : "nortpole",
        "column_family" : "users",
        "batch_size" : 20000,
        "hosts" : "localhost",
        "dcName" : "DC",
        "cron"  : "0/60 * * * * ?"
    },
    "index" : {
        "index" : "prodinfo",
        "type" : "product"
    }
}'
It should start pulling data from your Cassandra cluster.
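The `cron` value `0/60 * * * * ?` appears to use Quartz-style syntax: six fields starting with seconds, with `?` meaning "no specific value" in the day-of-week field. An increment `start/step` selects `start`, `start + step`, and so on up to the field's maximum, so in the seconds field (0 to 59) `0/60` matches only second 0, and under this rule the expression is equivalent to `0 * * * * ?`, firing once a minute. The hypothetical helper below sketches just that one rule, not a full cron parser, and is not taken from the plugin.

```java
import java.util.ArrayList;
import java.util.List;

class CronIncrement {

    // Expand a Quartz-style increment field "start/step" over [min, max].
    // Only the "start/step" form (with "*" as start) is handled;
    // ranges, lists and names are not.
    static List<Integer> expand(String field, int min, int max) {
        String[] parts = field.split("/");
        if (parts.length != 2) throw new IllegalArgumentException("expected start/step: " + field);
        int start = parts[0].equals("*") ? min : Integer.parseInt(parts[0]);
        int step = Integer.parseInt(parts[1]);
        if (step <= 0 || start < min || start > max) throw new IllegalArgumentException("bad field: " + field);
        List<Integer> values = new ArrayList<>();
        for (int v = start; v <= max; v += step) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        // Seconds field of "0/60 * * * * ?": only second 0 of each minute.
        System.out.println(expand("0/60", 0, 59)); // [0]
        // For comparison, "0/15" in the seconds field fires four times a minute.
        System.out.println(expand("0/15", 0, 59)); // [0, 15, 30, 45]
    }
}
```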
To remove the plugin, use:
./plugin --remove cassandra-river

If you have installed the Elasticsearch head plugin, you can search the indexed data from its web interface.
Improvement plan:
1) Add unit tests
2) Update the index in ES
3) Index newly added rows in ES by date
4) Add multi-table support
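One common way to approach the third item, indexing newly added rows by date, is a high-water mark: remember the newest timestamp already indexed and, on each scheduled run, pick up only rows written after it. The sketch below illustrates that idea over in-memory rows; `IncrementalSync`, `Row`, and the `updatedAt` field are hypothetical names, not part of the plugin or of Cassandra, and against a real table the filter would have to be expressed as a query the data model supports.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class IncrementalSync {

    // Hypothetical row shape: a key plus the time the row was written.
    static final class Row {
        final String id;
        final Instant updatedAt;
        Row(String id, Instant updatedAt) { this.id = id; this.updatedAt = updatedAt; }
    }

    // Newest timestamp already sent to Elasticsearch; EPOCH means "nothing indexed yet".
    private Instant highWaterMark = Instant.EPOCH;

    // Return the rows written strictly after the high-water mark and advance
    // the mark to the newest timestamp seen, so the next run skips them.
    List<Row> nextBatch(List<Row> table) {
        List<Row> fresh = new ArrayList<>();
        Instant newest = highWaterMark;
        for (Row r : table) {
            if (r.updatedAt.isAfter(highWaterMark)) {
                fresh.add(r);
                if (r.updatedAt.isAfter(newest)) newest = r.updatedAt;
            }
        }
        highWaterMark = newest;
        return fresh;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("a", Instant.parse("2015-01-01T00:00:00Z")));
        table.add(new Row("b", Instant.parse("2015-01-02T00:00:00Z")));
        IncrementalSync sync = new IncrementalSync();
        System.out.println(sync.nextBatch(table).size()); // 2: first run picks up everything
        table.add(new Row("c", Instant.parse("2015-01-03T00:00:00Z")));
        System.out.println(sync.nextBatch(table).size()); // 1: only "c" is new
    }
}
```

Because the comparison is strict, a row written later but carrying exactly the mark's timestamp would be skipped. A production version would need to handle such ties, for example by re-reading a small overlap window and relying on Elasticsearch overwriting documents that share the same `_id`.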


8 comments:

Hema Siddaramaiah said...

Thanks for the nice tutorial. Why do I always receive the error message below when I try to run it:

NoClassSettingsException[Failed to load class with value [cassandra]]; nested: ClassNotFoundException[cassandra]

Hema Siddaramaiah said...

Thanks for the nice tutorial. I always receive the error message below when I try to run the project:

NoClassSettingsException[Failed to load class with value [cassandra]]; nested: ClassNotFoundException[cassandra]

Asit KAUSHIK said...

Hi,
I am able to get column families with simple data types, but I get errors when my table has a uuid, list, map, or set column. The error is "invalid uuid <>".
So it fails for column families that have uuid, map, list, and set data types.
Did you face this, or am I doing something wrong here?
Regards
Asit

Bhavik Kaushik Mumma Papa's 'NANU' said...

Hema,
This error means that your river plugin is not functional (the river class could not be loaded).
If you have multiple nodes, you have to install the plugin on all of the nodes.
Hope this helps.
Also, regarding my issue, new check-ins have been made that handle the above data types.

Regards
Asit

Mario Rojas Marconi said...

Thanks. I have a problem building the plugin: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project cassandra-river: Compilation failure: Compilation failure:
[ERROR] /media/DATA/Download/DB/NoSQL/Cassandra/elasticsearch-cassandra-river/src/main/java/com/blu/es/plugin/river/CassandraRiverPlugin.java:[15,5] error: annotations are not supported in -source 1.3

I have Java 8.

Thanks

Shamim Bhuiyan said...

Hello Mario!
I have added the Maven compiler plugin, which fixed the problem.

Ashwini said...

Thanks for the tutorial.

Does the plugin have hard dependencies on the Elasticsearch and Cassandra versions specified in the pom.xml file? For example, does the latest cassandra-river-1.0.4-SNAPSHOT.zip work with Elasticsearch 1.4.* and Cassandra 2.1.*?

Shamim Bhuiyan said...

Actually, I am not sure about the latest Cassandra driver version.