Jai’s Weblog – Tech, Security & Fun…

  • Jaibeer Malik

Archive for the ‘Architecture’ Category

HBase: Generating search click events statistics for customer behavior

Posted by Jai on July 9, 2014

In this post we will use HBase to store customer search click event data and use that data to derive customer behavior information from search query strings and facet filter clicks. We will cover using MiniHBaseCluster, HBase schema design, and integration with Flume using HBaseSink to store JSON data.

In the previous posts we explored storing search click event data in Hadoop and querying it with different technologies. Here we will use HBase to achieve the same:

  • HBase mini cluster setup
  • HBase template using Spring Data
  • HBase schema design
  • Flume integration using HBaseSink
  • HBaseJsonSerializer to serialize JSON data
  • Query the top 10 search query strings in the last hour
  • Query the top 10 search facet filters in the last hour
  • Get the recent search query strings for a customer in the last 30 days
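
To make the "top 10 in the last hour" queries concrete, here is a minimal sketch of the aggregation logic in plain Java. It deliberately does not use the HBase client: the `ClickEvent` class, its field names and the `topQueries` helper are assumptions for illustration, not the schema or code from the full post. In an HBase-backed version, the time filter would typically become a scan over a time range.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SearchClickStats {

    /** Hypothetical click event; the fields are assumptions, not the post's schema. */
    static final class ClickEvent {
        final String customerId;
        final String queryString;
        final Instant timestamp;

        ClickEvent(String customerId, String queryString, Instant timestamp) {
            this.customerId = customerId;
            this.queryString = queryString;
            this.timestamp = timestamp;
        }
    }

    /** Returns the most frequent query strings among events inside the time window. */
    static List<String> topQueries(List<ClickEvent> events, Instant now,
                                   Duration window, int limit) {
        Instant cutoff = now.minus(window);
        // Count occurrences of each query string, ignoring events older than the cutoff.
        Map<String, Long> counts = events.stream()
                .filter(e -> !e.timestamp.isBefore(cutoff))
                .collect(Collectors.groupingBy(e -> e.queryString, Collectors.counting()));
        return counts.entrySet().stream()
                // Highest count first; ties are broken alphabetically for a stable result.
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Calling `SearchClickStats.topQueries(events, Instant.now(), Duration.ofHours(1), 10)` would give the top 10 query strings for the last hour; the same shape works for facet filters by grouping on a different field.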


Read the rest of this entry »


Posted in Architecture, Flume, Hadoop, HBase, Java, Spring Data | 1 Comment »

Svn2Git: Migrating repository from Subversion to Git

Posted by Jai on October 23, 2013

Choosing an efficient version control system has always been a challenge, depending on whether you need local, centralized or distributed version control. Git, a distributed version control system, meets these needs painlessly and has been around long enough to build a proven track record. In this post we will cover the steps to migrate from an existing version control system such as SVN to Git.

Why Git

Some of the features that make Git stand out:

  • Fixes the pitfalls of SVN and builds on the lessons learned from it
  • Dramatic increase in operation speed (diff, merge, view history etc.)
  • Easy, cheap and efficient branch operations
  • Full history tree available offline
  • Distributed, peer-to-peer model
  • Git’s repositories are much smaller than Subversion’s
  • Git branches carry their entire history
  • Git provides better auditing of branch and merge events
  • Git’s repository file formats are simple, so repair is easy and corruption is rare.
  • Backing up Subversion repositories centrally is potentially simpler, since in Git you can choose to distribute folders within a repository
  • Git repository clones act as full repository backups
  • Walking through versions is simpler in Subversion because it uses sequential revision numbers (1, 2, 3, ...); Git uses unpredictable SHA-1 hashes. Walking backwards in Git is easy using the “^” syntax, but there is no easy way to walk forward.
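
To show why Git identifiers look unpredictable compared with sequential revision numbers, the sketch below computes the SHA-1 object ID that Git assigns to a file's content (a "blob"). Git hashes a short header, `blob <size>` followed by a NUL byte, and then the raw content. Commit IDs are derived the same way from commit objects; a blob is used here only because it is the simplest case. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class GitObjectId {

    /** Computes the SHA-1 object ID Git assigns to a blob with the given content. */
    static String blobId(String content) {
        byte[] body = content.getBytes(StandardCharsets.UTF_8);
        // Git hashes the header "blob <size>\0" followed by the raw content bytes.
        byte[] header = ("blob " + body.length + "\0").getBytes(StandardCharsets.UTF_8);
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(header);
            sha1.update(body);
            // Render the 20-byte digest as the familiar 40-character hex string.
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }
}
```

Because the ID depends only on content, any change produces a completely different hash, which is why there is no natural "next" revision to walk forward to.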

Read the rest of this entry »

Posted in Architecture, Git, Tools | 1 Comment »

ElasticSearch: Faceted Search for Hierarchical data

Posted by Jai on March 19, 2013

Faceted search is navigational search that lets a business clearly define the properties or characteristics of its product catalog and guide users to relevant products with minimum effort. Most search solutions support this functionality nowadays. In this post we will cover how to implement faceted search for hierarchical data with ElasticSearch, using a flattened data approach, for a typical eCommerce platform.

Search Scenarios/Business Example:

The earlier post, Data Modeling approach for search content and tagging, explains the characteristics of a typical eCommerce platform serving hierarchical data, in terms of categorization and sub-categorization of data.

Take the example of such a typical eCommerce platform, where your site needs to display navigational browsing of hierarchical data backed by a search solution. For example, you need to display products like Books, Clothes etc. Each product has its own specific characteristics and can be categorized into different categories and sub categories.

Hierarchical Data:

In business terms, the hierarchical data represents the taxonomy of your data: the way you characterize the product catalog in the form of category types, categories and sub categories.
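
As a rough illustration of what flattening such a taxonomy can look like, the sketch below expands a category path into one facet value per level, so that each level can be indexed and counted as an ordinary flat field. This is one common approach and an assumption on our part; the excerpt does not show the actual mapping used in the full post, and the class name, method name and separator are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CategoryFlattener {

    /**
     * Expands a category path such as ["Books", "Fiction", "Thriller"] into one
     * facet value per level: "Books", "Books/Fiction", "Books/Fiction/Thriller".
     */
    static List<String> flatten(List<String> path, String separator) {
        List<String> facets = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String level : path) {
            // Join levels with the separator, skipping it before the first level.
            if (prefix.length() > 0) {
                prefix.append(separator);
            }
            prefix.append(level);
            facets.add(prefix.toString());
        }
        return facets;
    }
}
```

A product tagged with all three values is counted under "Books" as well as under the deeper levels, which is what lets a facet count at any level of the hierarchy include its sub categories.
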
Read the rest of this entry »

Posted in Architecture, ElasticSearch, Java | Leave a Comment »

Data Modeling approach for search content and tagging

Posted by Jai on March 13, 2013

For an effective search solution, converting unstructured data into a structured format is very important for a successful business. The process includes understanding the user requirements, analyzing large volumes of unstructured data for different formats and system-specific properties, and enhancing the result over time. In this article we will discuss the data modelling part, taking the example of a typical eCommerce platform, including the business process and the relational database format used to tag search content in a structured form.

eCommerce Platform for search:

A typical eCommerce site has:

  • Product catalog with unlimited categories and sub categories
  • Featured/hot products on the home page
  • Product search facility
  • Search results page with an advanced search option at the top, which users can use to refine their search
  • Option to compare products on the search listing page
  • Product listing page with links to the product detail pages

Read the rest of this entry »

Posted in Architecture, Database, ElasticSearch | 1 Comment »

Choosing the right search solution for your site

Posted by Jai on March 12, 2013

An effective search solution is the first step in providing relevant data to the end users of any eCommerce site. The right combination of marketing rules and the corresponding technical capabilities of the solution brings you closer to your target customers. In this article we will cover some of the characteristics you need to focus on while choosing a search solution for your site.

Your data

The requirements for a search solution vary from organization to organization and business to business. Each organization or domain has different variations of data. Some deal with small amounts of end-user-facing data and some have huge amounts of internal data. The common case for eCommerce sites is to store product data and make it available to end customers, giving them the flexibility to search for the right content.

Search Solution Selection

Search Solutions

Nowadays there are quite a few search solutions available on the market to choose from, ranging from open source to commercial products.

Read the rest of this entry »

Posted in Architecture, ElasticSearch, Tools | 1 Comment »
