Jai’s Weblog – Tech, Security & Fun…

  • Jaibeer Malik

Archive for the ‘Spring Data’ Category

HBase: Generating search click events statistics for customer behavior

Posted by Jai on July 9, 2014


In this post we explore using HBase to store customer search click event data, and then use that data to derive customer behavior information from search query strings and facet filter clicks. We cover MiniHBaseCluster, HBase schema design, and integration with Flume using HBaseSink to store JSON data.

In the previous posts we explored how to store search click event data in Hadoop and how to query it using different technologies. Here we use HBase to achieve the same, covering:

  •  HBase mini cluster setup
  •  HBase template using Spring Data
  •  HBase Schema Design
  •  Flume Integration using HBaseSink
  •  HBaseJsonSerializer to serialize JSON data
  •  Query the top 10 search query strings in the last hour
  •  Query the top 10 search facet filters in the last hour
  •  Get recent search query strings for a customer in the last 30 days
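
To make the "top 10 in the last hour" queries concrete, here is a minimal plain-Java sketch of the aggregation itself, run over an in-memory list rather than HBase. The ClickEvent class, its fields and the topQueries method are hypothetical names used only for illustration; they are not the project's schema and not part of the HBase API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopSearchQueries {

    /** Hypothetical minimal click event: when it happened and what was searched. */
    static final class ClickEvent {
        final long timestampMillis;
        final String queryString;

        ClickEvent(long timestampMillis, String queryString) {
            this.timestampMillis = timestampMillis;
            this.queryString = queryString;
        }
    }

    /**
     * Returns up to 'limit' query strings seen in the window
     * [nowMillis - windowMillis, nowMillis], most frequent first.
     * Ties are broken alphabetically so the result is deterministic.
     */
    static List<String> topQueries(List<ClickEvent> events, long nowMillis,
                                   long windowMillis, int limit) {
        long from = nowMillis - windowMillis;

        // Count occurrences of each query string inside the time window.
        Map<String, Long> counts = events.stream()
                .filter(e -> e.timestampMillis >= from && e.timestampMillis <= nowMillis)
                .collect(Collectors.groupingBy(e -> e.queryString, Collectors.counting()));

        Comparator<Map.Entry<String, Long>> byCountDescThenQuery =
                Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey());

        return counts.entrySet().stream()
                .sorted(byCountDescThenQuery)
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long now = 10_000_000L;   // fixed "current time" so the run is repeatable
        long hour = 3_600_000L;   // one hour in milliseconds
        List<ClickEvent> events = Arrays.asList(
                new ClickEvent(now - 1_000L, "laptop"),
                new ClickEvent(now - 2_000L, "phone"),
                new ClickEvent(now - 3_000L, "laptop"),
                new ClickEvent(now - hour - 1L, "tablet")); // just outside the window
        System.out.println(topQueries(events, now, hour, 10)); // prints [laptop, phone]
    }
}
```

The same shape applies to facet filters: swap the query string for the facet value. In HBase the time window would typically be expressed through the row key or a time range on the scan, which depends on the schema design covered in the full post.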

[Image: searchanalytics-hbase-flume]

Read the rest of this entry »

Posted in Architecture, Flume, Hadoop, HBase, Java, Spring Data | 1 Comment

ElasticSearch-Hadoop: Indexing product views count and customer top search query from Hadoop to ElasticSearch

Posted by Jai on May 22, 2014


This post covers using ElasticSearch-Hadoop to read data from Hadoop and index it in ElasticSearch. The functionality covered is indexing the product views count and the top search query per customer over the last n days. The analyzed data can then be used on the website to display a customer's recently viewed items, product views counts and top search query strings.

Continuing from the previous posts, we already have customer search click data gathered using Flume and stored in Hadoop HDFS and ElasticSearch, and we have seen how to analyze that data using Hive to generate statistical data. Here we look at how to use the analyzed data to enhance the customer experience on the website and make it more relevant for end customers.

Recently Viewed Items

In the first part we already covered how to use the Flume ElasticSearch sink to index recently viewed items directly into an ElasticSearch instance, so that the data can be used to display a customer's clicked items in real time.

ElasticSearch-Hadoop

Elasticsearch for Apache Hadoop allows Hadoop jobs to interact with ElasticSearch through a small library and with easy setup.

elasticsearch-hadoop-hive allows access to ElasticSearch from Hive. As shared in the previous post, we have the product views count and the customer top search query data extracted into Hive tables. We will read that data and index it into ElasticSearch so that it can be used for display purposes on the website.

[Image: elasticsearch-hadoop-hive]
Read the rest of this entry »

Posted in ElasticSearch, Hadoop, Java, Spring Data | 4 Comments

Hive: Query customer top search query and product views count using Apache Hive

Posted by Jai on May 20, 2014


This post covers using Apache Hive to query the search click data stored in Hadoop. We work through examples that generate each customer's top search query and statistics on total product views.

Continuing from the previous posts, we already have customer search click data gathered using Flume in Hadoop HDFS.

Here we look at how to use Hive to query that stored data.

Hive

Hive allows us to query big data using HiveQL, a SQL-like language.
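
As a rough illustration of what a "customer top search query" aggregation computes (group by customer and query string, count, then keep the most frequent query per customer), here is a plain-Java sketch over an in-memory list. It is not HiveQL and not the project's code; the SearchEvent class, its fields and the topQueryPerCustomer method are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CustomerTopQuery {

    /** Hypothetical minimal search event: who searched and for what. */
    static final class SearchEvent {
        final String customerId;
        final String queryString;

        SearchEvent(String customerId, String queryString) {
            this.customerId = customerId;
            this.queryString = queryString;
        }
    }

    /**
     * Maps each customer id to that customer's most frequent query string.
     * On a tie, the alphabetically first query wins (TreeMap iteration order
     * combined with the strict '>' comparison below).
     */
    static Map<String, String> topQueryPerCustomer(List<SearchEvent> events) {
        // customerId -> (queryString -> count)
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (SearchEvent e : events) {
            counts.computeIfAbsent(e.customerId, k -> new TreeMap<>())
                  .merge(e.queryString, 1, Integer::sum);
        }

        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> perCustomer : counts.entrySet()) {
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> q : perCustomer.getValue().entrySet()) {
                if (q.getValue() > bestCount) {
                    best = q.getKey();
                    bestCount = q.getValue();
                }
            }
            result.put(perCustomer.getKey(), best);
        }
        return result;
    }

    public static void main(String[] args) {
        List<SearchEvent> events = Arrays.asList(
                new SearchEvent("c1", "shoes"),
                new SearchEvent("c1", "hat"),
                new SearchEvent("c1", "shoes"),
                new SearchEvent("c2", "tv"));
        System.out.println(topQueryPerCustomer(events)); // prints {c1=shoes, c2=tv}
    }
}
```

In Hive the same result would come from a grouped count followed by a per-customer ranking, with the work distributed across the cluster instead of held in memory.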

[Image: hive-query-search-events]

Hadoop Data

As shared in the last post, the search click data is stored in Hadoop under paths of the form “/searchevents/2014/05/15/16/”, with a separate directory created for each hour.

Files are created as:

hdfs://localhost.localdomain:54321/searchevents/2014/05/06/16/searchevents.1399386809864
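
The hourly directory layout can be sketched in plain Java as a function from an event timestamp to a path. This is only an illustration of the year/month/day/hour convention, not the Flume configuration used in the project. The class and method names are hypothetical, and the UTC+2 offset is an assumption, picked because it places the timestamp from the file name above (1399386809864) in the /2014/05/06/16/ directory.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class HourlyEventPath {

    // Directory pattern: year/month/day/hour (24-hour clock).
    private static final DateTimeFormatter HOUR_DIR =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    /** Builds the per-hour directory for an event time, in the given time zone. */
    static String hourlyDirectory(long epochMillis, ZoneId zone) {
        String hourPart = HOUR_DIR.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
        return "/searchevents/" + hourPart + "/";
    }

    public static void main(String[] args) {
        // Timestamp taken from the example file name; UTC+2 is an assumed zone.
        long fileTimestamp = 1399386809864L;
        System.out.println(hourlyDirectory(fileTimestamp, ZoneOffset.ofHours(2)));
        // prints /searchevents/2014/05/06/16/
    }
}
```

Partitioning by hour like this means a query for a given hour or day only has to read the matching directories.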

Read the rest of this entry »

Posted in Hadoop, Hive, Java, Spring, Spring Data | 4 Comments

Customer product search clicks analytics using big data

Posted by Jai on May 14, 2014


The application demonstrates how to set up customer product search click analytics using big data technologies: Hadoop, Hive, Pig, Oozie, ElasticSearch, Akka, Spring Data, etc.

Github Repository

URL: https://github.com/jaibeermalik/searchanalytics-bigdata

Analyzing Search Clicks Data Using Flume, Hadoop, Hive, Pig, Oozie, ElasticSearch, Akka, Spring Data.

The repository contains unit/integration test cases that generate analytics from click events related to product search on an e-commerce website.

[Image: bigdata-tech-analytics]

Getting Started

The project is a Maven project and can be built with Eclipse. Check the pom dependencies for the relevant version of each application. It uses the Cloudera Hadoop distribution, version 2.3.0-cdh5.0.0.

Functionality

The scenario covered by the application for search analytics using big data is as follows:
Read the rest of this entry »

Posted in Akka, ElasticSearch, Flume, Hadoop, Hive, Java, Oozie, Pig, Spring, Spring Data | 6 Comments