Saturday

Elasticsearch with Cassandra data

Sooner or later every enterprise application needs full text search with their content. Slor, elasticsearch based on lucene are one the best candidate for developying enterprise search. Elasticsearch got very popularity with its simplicity, but out of box it dosen't support importing data from Cassandra cluster. However Elasticsearch provides river, a river is a pluggable service running within elasticsearch cluster pulling data (or being pushed with data) that is then indexed into the cluster. With a few search i have found a cassandra-river on github from ebay, unfortunatley, project was legeacy and only support Cassandra version 1.2*. With a few effort i rewrite the project with data stax cassandra driver. Here you can find the project, now it support the following features:
1) Cron scheduling;
2) Reading Cassandra rows through Paging;
3) Based on DataStax java driver 2.0;

For quick installation, download the project from the Github. Build with maven:
mvn clean install

it will create river plugin in the folder target/releases/cassandra-river-1.0-SNAPSHOT.zip. To installation the river plugin you could use plugin command line utility.
from the elasticsearch_home/bin directory run the follwing command:
./plugin --url file:/PATH/cassandra-river-1.0-SNAPSHOT.zip --install cassandra-river
now you can start the elasticsearch or and initilize the river with following command:
curl -XPUT 'http://HOST:PORT/_river/cassandra-river/_meta' -d '{
    "type" : "cassandra",
    "cassandra" : {
        "cluster_name" : "Test Cluster",
        "keyspace" : "nortpole",
        "column_family" : "users",
        "batch_size" : 20000,
        "hosts" : "localhost",
        "dcName" : "DC",
        "cron"  : "0/60 * * * * ?"
    },
    "index" : {
        "index" : "prodinfo",
        "type" : "product"
    }
}'
it should start pulling data from your Cassandra cluster.
For remove plugin use:
./plugin --remove cassandra-river

If you have installed elasticsearch _head plugin, you can search as follows:
Improvments plan:
1) Add unit Tests
2) Update index in ES
3) Add newly added rows in ES by date
4) Add multi tables support


Post a Comment