Jai’s Weblog – Tech, Security & Fun…

Tech, Security & Fun…

  • Jaibeer Malik

    Jaibeer Malik
  • View Jaibeer Malik's profile on LinkedIn
  • Subscribe

  • Feedburner

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 32 other followers

  • Archives

  • Categories

  • Stats

    • 414,349
  • Live Traffic

  • Advertisements

Archive for the ‘ElasticSearch’ Category

Oozie: Scheduling Coordinator/Bundle jobs for Hive partitioning and ElasticSearch indexing

Posted by Jai on May 28, 2014

This post covers to use Oozie to schedule Hive add partition every hour with the help of Coordinator jobs and to automatically update the ElasticSearch data served to customer based on nightly jobs using Bundle jobs functionality. The automated procedure using oozie jobs will help to update the statistical data used on website to display product views count and top search query string.

In continuation to the previous posts on

As described in earlier posts, the hive partitioning strategy is added based on current time and accordingly the elasticsearch indexing based on analytic data also. We will cover here to automate the process using Oozie to add hive partition once data is available in hadoop directory.


Oozie  is a workflow scheduler system to manage Apache Hadoop jobs.


Read the rest of this entry »


Posted in ElasticSearch, Hadoop, Hive, Java, Oozie | Tagged: , , , | 2 Comments »

ElasticSearch-Hadoop: Indexing product views count and customer top search query from Hadoop to ElasticSearch

Posted by Jai on May 22, 2014

This post covers to use ElasticSearch-Hadoop to read data from Hadoop system and index that in ElasticSearch. The functionality it covers is to index product views count and top search query per customer in last n number of days. The analyzed data can further be used on website to display customer recently viewed, product views count and top search query string.

In continuation to the previous posts on

we already have customer search clicks data gathered using Flume and stored in Hadoop HDFS and ElasticSearch, and how to analyze same data using Hive and generate statistical data. Here we will further see how to use the analyzed data to enhance customer experience on website and make it relevant for the end customers.

Recently Viewed Items

We already have covered in first part, how we can use flume ElasticSearch sink to index the recently viewed items directory to ElasticSearch instance and the data can be used to display real time clicked items for the customer.


Elasticsearch for Apache Hadoop  allows Hadoop jobs to interact with ElasticSearch with small library and easy setup.

elasticsearch-hadoop-hive, allows to access ElasticSearch using Hive. As shared in previous post, we have product views count and also customer top search query data extracted in Hive tables. We will read and index the same data to ElasticSearch so that it can be used for display purpose on website.

Read the rest of this entry »

Posted in ElasticSearch, Hadoop, Java, Spring Data | Tagged: , , , | 3 Comments »

Flume: Gathering customer product search clicks data using Apache Flume

Posted by Jai on May 19, 2014

This post covers to use Apache flume to gather customer product search clicks and store the information using hadoop and elasticsearch sinks. The data may consist of different product search events like filtering based on different facets, sorting information, pagination information and further the products viewed and some of the products marked as favorite by the customers. In later posts we will analyze data further to use the same information for display and analytic.

Product Search Functionality

Any eCommerce platform offers different products to customers and search functionality is one of the basics of that. Allowing user for guided navigation using different facets/filters or free text search for the content is trivial of the any of existing search functionality.


Consider a similar scenario where customer can search for a product and allows us to capture the product search behavior with following information,

Read the rest of this entry »

Posted in ElasticSearch, Flume, Hadoop, Java | Tagged: , , | 5 Comments »

Customer product search clicks analytics using big data

Posted by Jai on May 14, 2014

The application demonstrate to setup customer product search clicks analytics using big data Hadoop, Hive, Pig, Oozie, ElasticSearch, Akka, Spring Data etc.

Github Repository

URL: https://github.com/jaibeermalik/searchanalytics-bigdata

Analyzing Search Clicks Data Using Flume, Hadoop, Hive, Pig, Oozie, ElasticSearch, Akka, Spring Data.

Repository contains unit/integration test cases to generate analytics based on clicks events related to the product search on any e-commerce website.


Getting Started

The project is maven project and can be build with Eclipse. Check pom dependencies for relevant version of earch application. It uses cloudera hadoop distribution version 2.3.0-cdh5.0.0.


The scenario covered in the application for the search analytics using big data is as follow,
Read the rest of this entry »

Posted in Akka, ElasticSearch, Flume, Hadoop, Hive, Java, Oozie, Pig, Spring, Spring Data | Tagged: , , , , , , , , | 5 Comments »

ElasticSearch: Indexing setup using Akka tutorial

Posted by Jai on March 21, 2014

Find ElasticSearch tutorial on github using Akka with test cases. The tutorial covers the search indexing setup using Akka.


elasticsearch-akka repository uploaded on github explaining the ElasticSearch indexing setup using Akka Actors usage with test cases.

URL: https://github.com/jaibeermalik/elasticsearch-akka


Setup Index

Read the rest of this entry »

Posted in Akka, ElasticSearch, Git, Java, Scala | Tagged: , , , , | Leave a Comment »

%d bloggers like this: