Skip to main content

Elasticsearch with Cassandra data

Sooner or later every enterprise application needs full text search with their content. Slor, elasticsearch based on lucene are one the best candidate for developying enterprise search. Elasticsearch got very popularity with its simplicity, but out of box it dosen't support importing data from Cassandra cluster. However Elasticsearch provides river, a river is a pluggable service running within elasticsearch cluster pulling data (or being pushed with data) that is then indexed into the cluster. With a few search i have found a cassandra-river on github from ebay, unfortunatley, project was legeacy and only support Cassandra version 1.2*. With a few effort i rewrite the project with data stax cassandra driver. Here you can find the project, now it support the following features:
1) Cron scheduling;
2) Reading Cassandra rows through Paging;
3) Based on DataStax java driver 2.0;

For quick installation, download the project from the Github. Build with maven:
mvn clean install

it will create river plugin in the folder target/releases/cassandra-river-1.0-SNAPSHOT.zip. To installation the river plugin you could use plugin command line utility.
from the elasticsearch_home/bin directory run the follwing command:
./plugin --url file:/PATH/cassandra-river-1.0-SNAPSHOT.zip --install cassandra-river
now you can start the elasticsearch or and initilize the river with following command:
curl -XPUT 'http://HOST:PORT/_river/cassandra-river/_meta' -d '{
    "type" : "cassandra",
    "cassandra" : {
        "cluster_name" : "Test Cluster",
        "keyspace" : "nortpole",
        "column_family" : "users",
        "batch_size" : 20000,
        "hosts" : "localhost",
        "dcName" : "DC",
        "cron"  : "0/60 * * * * ?"
    },
    "index" : {
        "index" : "prodinfo",
        "type" : "product"
    }
}'
it should start pulling data from your Cassandra cluster.
For remove plugin use:
./plugin --remove cassandra-river

If you have installed elasticsearch _head plugin, you can search as follows:
Improvments plan:
1) Add unit Tests
2) Update index in ES
3) Add newly added rows in ES by date
4) Add multi tables support


Comments

Popular posts from this blog

Apache Ignite with Spring Data

See more details on The Apache Ignite Book . Spring Data  provides a unified and easy way to access the different kinds of persistence store, both relational database systems, and NoSQL data stores. It is on top of JPA, adding another layer of abstraction and defining a standard-based design to support persistence Layer in a Spring context. Apache Ignite  IgniteRepository  implements Spring Data CrudRepository interface and extends basic capabilities of the  CrudRepository , which in turns supports: Basic CRUD operations on a repository for a specific type. Access to the Apache Ignite SQL grid via Spring Data API. With Spring Data's repositories, you only need to write an interface with finder methods to query the objects. All the CRUD method for manipulating the objects will be delivered automatically. As an example: @RepositoryConfig(cacheName = "DogCache") public interface DogRepository extends IgniteRepository<Dog, Long> { List<Dog>

The Apache Ignite Native persistence, a brief overview

Portions of this article were taken from the book  The Apache Ignite book . If it got you interested, check out the rest of the book for more helpful information. In-memory approaches can achieve blazing speed by putting the working set of the data into the system memory. When all data is kept in memory, the need to deal with issues arising from the use of traditional spinning disks disappears. This means, for instance, there is no need to maintain additional cache copies of data and manage synchronization between them. But there is also a downside to this approach because the data is in memory only, it will not survive if the whole cluster gets terminated. Therefore, this types of data stores are not considered persistence at all. In this blog post, I will do an effort to explore the Apache Ignite new native persistence feature and provide a clear, understandable picture how the Apache Ignite native persistence works.  In most cases, you can’t (should not) store the whole data

Book review: High Performance in-memory computing with Apache Ignite by Sadruddin Md

A new title  The Apache Ignite book  has been released including Ignite 2.6 and above. Read the full book review by Sadruddin Md .