Archive for the ‘Flume’ Category

HBase: Generating search click events statistics for customer behavior

Posted by Jai on July 9, 2014

In this post we explore using HBase to store customer search click event data, and using that data to derive customer behavior information from search query strings and facet filter clicks. We cover MiniHBaseCluster, HBase schema design, and integration with Flume through HBaseSink to store JSON data.

This post continues the series from the previous posts:

Customer product search clicks analytics using big data,
Flume: Gathering customer product search clicks data using Apache Flume,
Hive: Query customer top search query and product views count using Apache Hive,
ElasticSearch-Hadoop: Indexing product views count and customer top search query from Hadoop to ElasticSearch,
Oozie: Scheduling Coordinator/Bundle jobs for Hive partitioning and ElasticSearch indexing,
Spark: Real time analytics for big data for top search queries and top product views

In those posts we stored search click event data in Hadoop and queried it with different technologies. Here we use HBase to achieve the same, covering:

HBase mini cluster setup
HBase template using Spring Data
HBase Schema Design
Flume Integration using HBaseSink
HBaseJsonSerializer to serialize json data
Query the top 10 search query strings in the last hour
Query the top 10 search facet filters in the last hour
Get recent search query strings for a customer in the last 30 days
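
The last item in the list, fetching a customer's recent search queries, depends mostly on row key design, because HBase keeps rows sorted by the bytes of their keys. The following is only a minimal sketch under stated assumptions: the row key layout (customer id plus a reversed timestamp) and the names `ClickEventRowKey`, `rowKey` and `scanStopKey` are hypothetical illustrations of a common HBase pattern, not necessarily the schema used in the full post.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: builds row keys so that a customer's newest events sort first.
final class ClickEventRowKey {
    private ClickEventRowKey() {}

    // Key layout (an assumption): <customerId>-<Long.MAX_VALUE - timestamp>, zero-padded
    // to 19 digits so that lexicographic order matches numeric order.
    static String rowKey(String customerId, long timestampMillis) {
        if (timestampMillis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        long reversed = Long.MAX_VALUE - timestampMillis;
        return customerId + "-" + String.format(Locale.ROOT, "%019d", reversed);
    }

    // Stop key for a scan that starts at rowKey(customerId, nowMillis) and should
    // cover only the last `days` days of events for that customer.
    static String scanStopKey(String customerId, long nowMillis, int days) {
        long oldest = Math.max(0L, nowMillis - TimeUnit.DAYS.toMillis(days));
        return rowKey(customerId, oldest);
    }
}
```

With keys built this way, a scan from `rowKey(customerId, now)` up to `scanStopKey(customerId, now, 30)` would return the customer's events from the last 30 days, newest first, without reading older rows.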

Read the rest of this entry »

Posted in Architecture, Flume, Hadoop, HBase, Java, Spring Data | Tagged: Flume, HBase, HBaseSink | 1 Comment »

Spark: Real time analytics for big data for top search queries and top product views

Posted by Jai on June 4, 2014

Because Hadoop is a batch processing framework, getting real-time analytics from big data with it is hard. Apache Spark addresses this by providing distributed computation with events processed in a streaming fashion. In this post we explore Spark's streaming capability to process Flume event data and generate the top search query strings or the top product views of the last hour.

Continuing from the previous posts in this series: so far we have used Hadoop's batch capabilities to process large amounts of data. Depending on your data, batch processing introduces noticeable latency. This is where Spark comes into the picture. Here we explore Spark's streaming capability to get real-time analytics that can be displayed on the website or used for monitoring.

Spark

Apache Spark “is a fast and general engine for large-scale data processing.”

Functionality

As shared in the examples above, we have customer search click data available. A Flume setup is in place to process the data and store it in Hadoop for later processing. Now take a scenario where you want to display real-time customer behavior on the website, showing what other customers are doing:

What are other customers searching for?
Other customers are also searching for…
Top search query strings on the website in the last hour
What are other customers viewing?
Other customers are also viewing products…
Top product views in the last hour
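
The "top search query strings in the last hour" item boils down to counting events inside a time window and ranking the counts. The sketch below shows that logic in plain Java over an in-memory list; it is a stand-in for what a windowed streaming job computes, not the Spark code from the full post, and the names `TopQueriesWindow`, `Click` and `topQueries` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration of a "top N in the last window" query.
final class TopQueriesWindow {
    private TopQueriesWindow() {}

    // One search click: the query string and when it happened.
    static final class Click {
        final String query;
        final long timestampMillis;

        Click(String query, long timestampMillis) {
            this.query = query;
            this.timestampMillis = timestampMillis;
        }
    }

    // Counts queries whose timestamp lies in (now - window, now] and returns the
    // `limit` most frequent ones, highest count first, ties broken alphabetically.
    static List<Map.Entry<String, Long>> topQueries(
            List<Click> clicks, long nowMillis, long windowMillis, int limit) {
        long cutoff = nowMillis - windowMillis;
        Map<String, Long> counts = clicks.stream()
                .filter(c -> c.timestampMillis > cutoff && c.timestampMillis <= nowMillis)
                .collect(Collectors.groupingBy(c -> c.query, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Calling `topQueries(clicks, System.currentTimeMillis(), 3_600_000L, 10)` would give the top 10 queries of the last hour. A streaming engine performs the same count continuously over a sliding window instead of rescanning a list.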

Read the rest of this entry »

Posted in Flume, Hadoop, Java, Spark | Tagged: Apache Spark, Flume, Hadoop, Java, Spark, Spark Streaming | 2 Comments »

Flume: Gathering customer product search clicks data using Apache Flume

Posted by Jai on May 19, 2014

This post covers using Apache Flume to gather customer product search clicks and store the information using Hadoop and ElasticSearch sinks. The data may consist of different product search events, such as filtering on different facets, sorting information, and pagination information, as well as the products viewed and the products marked as favorites by customers. In later posts we will analyze the data further and use the same information for display and analytics.
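
To make the kind of event described above concrete, here is a minimal sketch of one search click record serialized to JSON, which is the sort of payload a Flume source could carry as an event body. The class name `SearchClickEvent` and all field names are assumptions chosen for illustration; the actual event class in the full post may differ.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical search click event; field names are illustrative only.
final class SearchClickEvent {
    final String customerId;
    final String queryString;
    final Map<String, String> facetFilters; // facet name -> selected value
    final String sortOrder;
    final int pageNumber;
    final String clickedProductId;          // null when no product was viewed
    final boolean favourite;
    final long timestampMillis;

    SearchClickEvent(String customerId, String queryString, Map<String, String> facetFilters,
                     String sortOrder, int pageNumber, String clickedProductId,
                     boolean favourite, long timestampMillis) {
        this.customerId = customerId;
        this.queryString = queryString;
        this.facetFilters = new TreeMap<>(facetFilters); // sorted copy for stable output
        this.sortOrder = sortOrder;
        this.pageNumber = pageNumber;
        this.clickedProductId = clickedProductId;
        this.favourite = favourite;
        this.timestampMillis = timestampMillis;
    }

    // Hand-rolled JSON to keep the sketch dependency-free; a real project would
    // normally use a JSON library instead.
    String toJson() {
        String filters = facetFilters.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        return "{\"customerId\":" + quote(customerId)
                + ",\"queryString\":" + quote(queryString)
                + ",\"facetFilters\":" + filters
                + ",\"sortOrder\":" + quote(sortOrder)
                + ",\"pageNumber\":" + pageNumber
                + ",\"clickedProductId\":" + (clickedProductId == null ? "null" : quote(clickedProductId))
                + ",\"favourite\":" + favourite
                + ",\"timestampMillis\":" + timestampMillis
                + "}";
    }

    // Escapes only backslashes and double quotes; control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

The UTF-8 bytes of `toJson()` could then be sent as the body of one event per click.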

Product Search Functionality

Any eCommerce platform offers different products to customers, and search functionality is one of its basics. Guided navigation using different facets/filters and free-text search over the content are standard parts of any existing search functionality.

SearchQueryInstruction

Consider a similar scenario where a customer can search for a product and the platform allows us to capture the product search behavior with the following information,

Read the rest of this entry »

Posted in ElasticSearch, Flume, Hadoop, Java | Tagged: ElasticSearch, Flume, Hadoop | 6 Comments »

Customer product search clicks analytics using big data

Posted by Jai on May 14, 2014

The application demonstrates how to set up customer product search click analytics using big data technologies: Hadoop, Hive, Pig, Oozie, ElasticSearch, Akka, Spring Data, etc.

Github Repository

URL: https://github.com/jaibeermalik/searchanalytics-bigdata

Analyzing Search Clicks Data Using Flume, Hadoop, Hive, Pig, Oozie, ElasticSearch, Akka, Spring Data.

Repository contains unit/integration test cases to generate analytics based on clicks events related to the product search on any e-commerce website.

Getting Started

The project is a Maven project and can be built with Eclipse. Check the pom dependencies for the relevant version of each application. It uses the Cloudera Hadoop distribution version 2.3.0-cdh5.0.0.

Functionality

The scenario covered in the application for search analytics using big data is as follows,
Read the rest of this entry »

Posted in Akka, ElasticSearch, Flume, Hadoop, Hive, Java, Oozie, Pig, Spring, Spring Data | Tagged: Akka, Big Data, ElasticSearch, Flume, Hadoop, Hive, Oozie, Pig, Spring Data | 6 Comments »


Jai’s Weblog – Tech, Security & Fun…


Jaibeer Malik
