Jai’s Weblog – Tech, Security & Fun…

  • Jaibeer Malik


ElasticSearch: Faceted Search for Hierarchical data

Posted by Jai on March 19, 2013

Faceted search is navigational search: it lets a business clearly define the properties or characteristics of its product catalog and guides users to relevant products with minimal effort. Most search solutions support it nowadays. This post covers how to implement faceted search for hierarchical data, using a flattened data approach with ElasticSearch, for a typical eCommerce platform.

Search Scenarios/Business Example:

An earlier post, Data Modeling approach for search content and tagging, explains the characteristics of a typical eCommerce platform that serves hierarchical data, in terms of categorization and sub categorization of that data.

Take such a typical eCommerce platform, where the site needs to display navigational browsing of your hierarchical data on top of a search solution. For example, you need to display products like books, clothes etc. Each product has its own specific characteristics and can be categorized into different categories and sub categories.

Hierarchical Data:

In business terms, the hierarchical data represents the taxonomy of your data: the way you characterize the product catalog through category types, categories and sub categories.

tagging-product

Taking the data modelling example from the earlier post, your data is defined in different product categories and sub categories.

The focus here is on making these hierarchical categories available to end users through navigational search.

Faceted Search

Faceted search is a navigational approach to accessing your data by applying multiple filters. Facets are the properties of a product that classify it into different categories.

A typical example of faceted search is LinkedIn Faceted Search.

Some of the obvious added value of faceted search:

  • enables guided navigation
  • is very intuitive
  • comparatively efficient
  • very precise
  • improves user experience
  • very quick with no wait time

Faceted Search site Designing

Have a look at the article Designing Faceted Search site, which will help you achieve a better navigational experience for end users. It covers the overall navigational views of a site with different examples:

  • Site layout
  • Site menu
  • Selection state
  • Data Refresh approach

To better understand the facet behavior for your data, refer to the article Designing Faceted Search Part II, which covers the different options available for your facet data. The options for representing your data:

  • As single select, using links
  • As multi select, using checkboxes
  • As palettes, for choosing colors etc.

Solr Multi level Faceting

Solr, an open source search engine, provides faceted navigation; see the Solr Faceting Overview.

To implement faceted search using Solr, have a look at the articles Faceted Search with Solr, Complex Solr Faceting and Solr Faceting.

To implement hierarchical faceting using Solr,

One of the simplest approaches to hierarchical faceting is to convert the hierarchical data to flattened data, which is then served by the search engine.

ElasticSearch Facets

ElasticSearch, an open source search engine, provides an easy and complete Java API to build facets.

Have a look at the complete list of facets offered by elasticsearch: Facets

Elasticsearch offers several built-in facet types, to name a few:

  • Terms Facet: returns the N most frequent terms.
  • Range Facet: lets you specify a set of ranges and get both the number of docs (count) that fall within each range and aggregated data, based either on the same field or on another field.
  • Histogram Facet: works with numeric data by building a histogram across intervals of the field values.
  • Statistical Facet: computes statistical data on a numeric field.

We will use the Terms Facet here for the faceted search.

Have a look at Solr Vs ElasticSearch Faceting for a comparative analysis.

Search Flattened data:

Based on the above example, let’s say we have the hierarchical data of a product catalog, designed around the product taxonomy and different categorizations.

Most product catalogs start with the major category types available for the products. For example, the products you sell fall under some top-level category type such as Book/Games/Toys/Clothes/Computer etc.

The product is then further categorized into sub categories. E.g. an Apple Macbook will fall under the hierarchical categorization Computer->Laptops->Apple Macbook pro.
It will also belong to other categories such as Brand->Apple, Memory->250GB/500GB, Color->White/Silver etc.

Let’s say your data model based on taxonomy for your product catalog is,

tag-product-cat

Product Category type: major product catalogs, Books/Games/Computer/Toys
Product Catalog Categories: hierarchical data to further categorize a product in a sub category (eg. Apple Macbook pro).

We need to flatten the above data to serve it to the search engine in flattened format.

Let’s say that for categorization we have an array of the categories applicable to the product. We need to divide the product categories by type and by level.


array: catdata[]
field: cat_<type>_<level>

//eg. The field values for category Apple Macbook pro are
//cat_computer_1: "Computer", cat_computer_2: "Laptops", cat_computer_3: "Apple Macbook pro"

catdata[
{
cat_computer_1: "Computer"
},
{
cat_computer_2: "Laptops"
},
{
cat_computer_3: "Apple Macbook pro"
}
]
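The flattening step can be sketched in plain Java, independent of any search engine. This is a minimal illustration that assumes the field naming shown above (cat_<type>_<level>); the class CategoryFlattener and its methods are hypothetical names, not part of the ElasticSearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of any ElasticSearch API.
final class CategoryFlattener {

    // Builds the field name for one level, e.g. ("computer", 2) -> "cat_computer_2".
    static String fieldName(String catType, int level) {
        return "cat_" + catType + "_" + level;
    }

    // Turns a root-to-leaf category path into one single-entry map per level.
    // Levels start at 1, so a product tagged with a child category also
    // carries every parent category of that child.
    static List<Map<String, String>> flatten(String catType, List<String> path) {
        List<Map<String, String>> catdata = new ArrayList<>();
        for (int i = 0; i < path.size(); i++) {
            catdata.add(Collections.singletonMap(fieldName(catType, i + 1), path.get(i)));
        }
        return catdata;
    }

    public static void main(String[] args) {
        List<String> path = Arrays.asList("Computer", "Laptops", "Apple Macbook pro");
        for (Map<String, String> entry : flatten("computer", path)) {
            System.out.println(entry);
        }
    }
}
```

Each map in the returned list corresponds to one object of the catdata array, so the list can be written to the document one entry at a time.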

ElasticSearch Java API examples

We will see how to use the ElasticSearch Java API to create the flattened data and retrieve facets for the above fields.

The earlier post Getting started with ElasticSearch will help you get started with elasticsearch, create an index and index a document.

Add hierarchical data field mapping

Add the field mapping for the document to the search engine with the additional field below.

//builder is the XContentBuilder used to build the document mapping.
//For each category type eg. "computer", "books", "toys" etc.
String[] catTypes = new String[]{"computer", "books", "toys"};
//restricts the hierarchical data to this number of levels.
int supportedLevel = 4;
builder.startObject("catdata")
        .startObject("properties");
//Add the leveled field mappings for each cat type, eg. cat_computer_1
for (String catType : catTypes)
{
    for (int i = 1; i <= supportedLevel; i++)
    {
        builder.startObject("cat_" + catType + "_" + i)
                .field("type", "multi_field")
                .startObject("fields")
                    .startObject("textsearch")
                        .field("type", "string")
                        .field("store", "yes")
                        .field("analyzer", "custom1")
                    .endObject()
                    .startObject("facet")
                        .field("type", "string")
                        .field("store", "yes")
                        .field("index", "not_analyzed")
                    .endObject()
                .endObject()
            .endObject();
    }
}
builder.endObject()
        .endObject();

Index data for the field. Let’s say the product is tagged with multiple categories, eg. Apple Macbook pro, and multiple colors. The JSON object formed is,

 

catdata[ 
{ cat_computer_1: "Computer" }, 
{ cat_computer_2: "Laptops" }, 
{ cat_computer_3: "Apple Macbook pro" } , 
{ cat_color_1: "White" } , 
{ cat_color_1: "Silver" } ] 

The flattened data shows that if a product is tagged with a child category in the hierarchy, it is automatically tagged with the parent categories as well. Whether to tag the product directly with the parent or only with child categories depends on business requirements. You can change the implementation as per your requirements.

Add data to document

 
//For each category find the level in your data model
//eg. Computer->Laptops->Apple Mac pro, Computer->Tablets->Ipad 2
//eg. also for each type tag parent cat along with child cat
int level = getLevelForCat(cat); //will return int level in hierarchical data
String catName = getCatName(cat); //will return cat name, eg. Ipad 2
//start array field
builder.startArray("catdata");
//For loop, for all the child and parent cat
//Add single cat data, eg. cat_computer_3: "Ipad 2"
builder.startObject()
        .field("cat_" + catType + "_" + level, catName)
        .endObject();
builder.endArray();

Index the document, and update document catdata field values.

Retrieve Facets

Based on above data we need to retrieve the facet values.

//prepare search request
SearchRequestBuilder requestBuilder = client.prepareSearch(indexName).setTypes(types).setFrom(from).setSize(size);

//Let's say we only need top level facets first
for (String catType : catTypes)
{
    //Add facet to the search request, named after the field eg. cat_computer_1
    TermsFacetBuilder termsFacetBuilder = new TermsFacetBuilder("cat_" + catType + "_" + 1);
    termsFacetBuilder.field("catdata.cat_" + catType + "_" + 1);
    termsFacetBuilder.order(ComparatorType.TERM);
    termsFacetBuilder.size(100);
    requestBuilder.addFacet(termsFacetBuilder);
}

//Execute search
SearchResponse searchResponse = requestBuilder.execute().actionGet();

//Parse search Response
Facets facets = searchResponse.getFacets();
for (Facet facet : facets)
{
    TermsFacet termsFacet = (TermsFacet) facet;
    System.out.println("Facet: " + termsFacet.getName());
    for (TermsFacet.Entry entry : termsFacet.entries())
    {
        String term = entry.getTerm();
        int count = entry.getCount();
        System.out.println(term + ":" + count);
    }
}

The above facet request returns the following top level facets.
Facet: cat_computer_1
Computer: 1
Facet: cat_color_1
White: 1
Silver: 1
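To make the counts above easier to follow, here is a small in-memory sketch of what a terms facet computes over flattened documents: for one field, the number of documents per distinct value. It is only an illustration under the assumption that a document is modelled as its catdata list; TermsFacetSketch is a hypothetical class and not how ElasticSearch implements facets.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory illustration; not the ElasticSearch implementation.
final class TermsFacetSketch {

    // Counts, per distinct value of `field`, how many documents carry that value.
    // A document is modelled as its flattened catdata list of single-entry maps.
    static Map<String, Integer> termCounts(List<List<Map<String, String>>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps terms sorted alphabetically
        for (List<Map<String, String>> doc : docs) {
            Set<String> seenInDoc = new HashSet<>(); // count each value once per document
            for (Map<String, String> entry : doc) {
                String value = entry.get(field);
                if (value != null && seenInDoc.add(value)) {
                    counts.merge(value, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> macbook = Arrays.asList(
                Collections.singletonMap("cat_computer_1", "Computer"),
                Collections.singletonMap("cat_computer_2", "Laptops"),
                Collections.singletonMap("cat_color_1", "White"),
                Collections.singletonMap("cat_color_1", "Silver"));
        List<List<Map<String, String>>> docs = Collections.singletonList(macbook);
        System.out.println("cat_computer_1 " + termCounts(docs, "cat_computer_1"));
        System.out.println("cat_color_1 " + termCounts(docs, "cat_color_1"));
    }
}
```

Because the single product carries two values for cat_color_1, it contributes a count of 1 to both White and Silver, which is why multi-valued fields can make facet counts add up to more than the number of documents.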

Filter on selected facets

The next part is to filter on the facet results. Let’s say we search for documents with the color facet White.


//Get query builder
QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();

//Add filter on the query based on filtered query
AndFilterBuilder andFilterBuilder = FilterBuilders.andFilter();
andFilterBuilder.add(FilterBuilders.termFilter("cat_color_1", "White"));
requestBuilder.setQuery(QueryBuilders.filteredQuery(queryBuilder, andFilterBuilder));

//Execute search
SearchResponse searchResponse = requestBuilder.execute().actionGet();

//parse search results and response.

Retrieve next level facet based on selected facet

The next part is to retrieve the second level of facets based on the user’s selection among the top level facets.

//prepare search request
SearchRequestBuilder requestBuilder = client.prepareSearch(indexName).setTypes(types).setFrom(from).setSize(size);

//Top level facets for the cat types that have not been selected yet
//eg. catTypesFirstLevel for "color" and "books"
for (String catType : catTypesFirstLevel)
{
    //Add facet to the search request
    TermsFacetBuilder termsFacetBuilder = new TermsFacetBuilder("cat_" + catType + "_" + 1);
    termsFacetBuilder.field("catdata.cat_" + catType + "_" + 1);
    termsFacetBuilder.order(ComparatorType.TERM);
    termsFacetBuilder.size(100);
    requestBuilder.addFacet(termsFacetBuilder);
}

//Let's say second level for selected cat type
//eg. second level for "computer"
for (String catType : catTypesSecondLevel)
{
    //Add facet to the search request
    TermsFacetBuilder termsFacetBuilder = new TermsFacetBuilder("cat_" + catType + "_" + 2);
    termsFacetBuilder.field("catdata.cat_" + catType + "_" + 2);
    termsFacetBuilder.order(ComparatorType.TERM);
    termsFacetBuilder.size(100);
    requestBuilder.addFacet(termsFacetBuilder);
}

//Get query builder
QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();

//Add filter on the query based on filtered query
AndFilterBuilder andFilterBuilder = FilterBuilders.andFilter();
andFilterBuilder.add(FilterBuilders.termFilter("cat_computer_1", "Computer"));
requestBuilder.setQuery(QueryBuilders.filteredQuery(queryBuilder, andFilterBuilder));

//Execute search
SearchResponse searchResponse = requestBuilder.execute().actionGet();

//Parse search Response
Facets facets = searchResponse.getFacets();
for (Facet facet : facets)
{
    TermsFacet termsFacet = (TermsFacet) facet;
    System.out.println("Facet: " + termsFacet.getName());
    for (TermsFacet.Entry entry : termsFacet.entries())
    {
        String term = entry.getTerm();
        int count = entry.getCount();
        System.out.println(term + ":" + count);
    }
}

The above facet request returns the following facets: the second level for the selected category and the top level for the other category types.
Facet: cat_computer_2
Laptops: 1
Facet: cat_color_1
White: 1
Silver: 1

Flattening data has different pros and cons, and they depend very much on how you implement and use it to best suit your requirements. Because the flattened data in the search engine represents characteristics and properties attached to a document, the above approach does not enforce a strict hierarchical representation on the search engine side.

It is the combination of flattening, UI representation (single click or multi select of facets) and the quality of the tagged hierarchical data that will represent the hierarchy best.

Additional Features

The above example gives you a basic representation of flattened data for your hierarchical data. There are multiple additional requirements in the real world, some of which are:

Sequenced/Ordered Data

Data usually has a sequenced representation, and the ordering of hierarchical categories needs to be controlled by the business rather than by the default ordering of a search solution. ElasticSearch terms facets let you sort the data alphabetically by term or by term count.

Let’s control the sequencing of the generated facets based on the sequence order defined in the hierarchical data for each category.

//add additional field configuration value under the multi field
        .startObject("seqfacet")
            .field("type", "string")
            .field("store", "yes")
            .field("index", "not_analyzed")
        .endObject()

//update field value based on numerical value of the sequence.
//sequence is the business-defined position of the category (assumed to be available from your data model).
// eg. cat_computer_1_seq = "00_Computer"
builder.startArray("catdata");
builder.startObject()
        .field("cat_" + catType + "_" + level, catName)
        .field("cat_" + catType + "_" + level + "_seq", String.format("%02d_%s", sequence, catName))
        .endObject();
builder.endArray();

//Update terms facet building on sequenced field, it will be sorted on sequenced field.
TermsFacetBuilder termsFacetBuilder = new TermsFacetBuilder("cat_" + catType + "_" + 1);
termsFacetBuilder.field("catdata.cat_" + catType + "_" + 1 + "_seq");
termsFacetBuilder.order(ComparatorType.TERM);
termsFacetBuilder.size(100);
requestBuilder.addFacet(termsFacetBuilder);

//Parse search Response and facets
Facets facets = searchResponse.getFacets();
for (Facet facet : facets)
{
    TermsFacet termsFacet = (TermsFacet) facet;
    System.out.println("Facet: " + termsFacet.getName());
    for (TermsFacet.Entry entry : termsFacet.entries())
    {
        //strip the sequence prefix, eg. "00_Computer" -> "Computer"
        String term = entry.getTerm().substring(entry.getTerm().indexOf("_") + 1);
        int count = entry.getCount();
        System.out.println(term + ":" + count);
    }
}
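The prefix trick works because term ordering is a plain string comparison, so a zero-padded number at the front decides the order. A minimal sketch in plain Java, assuming a two-digit prefix (up to 100 entries per level) and a business-defined sequence number per category; SequencedTerms and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class SequencedTerms {

    // Prefixes a category name with a zero-padded sequence, e.g. (2, "Books") -> "02_Books".
    static String encode(int sequence, String catName) {
        return String.format(Locale.ROOT, "%02d_%s", sequence, catName);
    }

    // Strips everything up to and including the first underscore again.
    static String decode(String term) {
        return term.substring(term.indexOf('_') + 1);
    }

    // Sorts the encoded terms as strings (as a term-ordered facet would)
    // and returns the display names in that order.
    static List<String> displayOrder(List<String> encodedTerms) {
        List<String> sorted = new ArrayList<>(encodedTerms);
        Collections.sort(sorted);
        List<String> names = new ArrayList<>();
        for (String term : sorted) {
            names.add(decode(term));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(encode(2, "Books"), encode(0, "Computer"), encode(1, "Toys"));
        System.out.println(displayOrder(terms));
    }
}
```

Without the padding, "10_..." would sort before "2_...", which is why the prefix needs a fixed width.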

Case insensitive filtering for facets

To enable case insensitive filtering for the facets, the term comparison has to happen on the lower (or upper) case form of the terms. This matters, for example, when you build your facets dynamically and still want to allow case insensitive filtering.

To achieve case insensitive filtering, generate the facets from one field and do the filtering on a different field.

//add additional field configuration value under the multi field
//this field will store case insensitive value, lower case value for the facet term
//You can control it either using analyzers or through code implementation.
        .startObject("facetfilter")
            .field("type", "string")
            .field("store", "yes")
            .field("index", "not_analyzed")
        .endObject()

//update field value with the lower case value of the cat name.
// eg. cat_computer_1.facetfilter = "computer"
//sequence is the business-defined position of the category (assumed to be available from your data model).
builder.startArray("catdata");
builder.startObject()
        .field("cat_" + catType + "_" + level, catName)
        .field("cat_" + catType + "_" + level + ".facetfilter", catName.toLowerCase())
        .field("cat_" + catType + "_" + level + "_seq", String.format("%02d_%s", sequence, catName))
        .endObject();
builder.endArray();

//Add filter on the query based on filtered query
// eg. cat_computer_1.facetfilter = "computer" is mapped with lower case value of "Computer"
AndFilterBuilder andFilterBuilder = FilterBuilders.andFilter();
andFilterBuilder.add(FilterBuilders.termFilter("cat_computer_1.facetfilter", "Computer".toLowerCase()));
requestBuilder.setQuery(QueryBuilders.filteredQuery(queryBuilder, andFilterBuilder));

//Execute search
SearchResponse searchResponse = requestBuilder.execute().actionGet();
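The important detail is that the value stored at index time and the value used in the filter go through exactly the same normalization. A minimal sketch, assuming the normalization is done in code rather than by an analyzer; FacetFilterValue is a hypothetical helper. It uses Locale.ROOT so the result does not depend on the JVM default locale.

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
final class FacetFilterValue {

    // Lower-cases with a fixed locale so that indexing and filtering agree
    // regardless of the default locale of the JVM they run on.
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // True if the user input selects the value stored in the filter field.
    static boolean matches(String indexedFilterValue, String userInput) {
        return indexedFilterValue.equals(normalize(userInput));
    }

    public static void main(String[] args) {
        String indexed = normalize("Computer"); // value written to the facetfilter field
        System.out.println(matches(indexed, "COMPUTER"));
        System.out.println(matches(indexed, "Laptops"));
    }
}
```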

Text search

To enable text search on the faceted fields, use the multi field nature of the leveled field. Configure your own analyzer, tokenizer etc. for the field and add the field to the query fields so that it can be queried.

//Add analyzer settings
builder.startObject("analysis")
    .startObject("analyzer")
        .startObject("customtextsearch")
            .field("type", "custom")
            .field("tokenizer", "whitespace")
            //add further token filters to the array as required
            .field("filter", new String[]{"lowercase"})
            .field("char_filter", "html_strip")
        .endObject()
    .endObject()
.endObject();

//Add field configuration to the multi field, in for loop for all the cat fields
.startObject("cat_computer_1")
    .field("type", "string")
    .field("store", "yes")
    .field("analyzer", "customtextsearch")
.endObject()

//add cat fields for text search, with a boost of 1.5
QueryStringQueryBuilder queryStringQueryBuilder = QueryBuilders.queryString(queryString);
queryStringQueryBuilder.field("catdata.cat_computer_1", 1.5f);

Term suggestion

Auto suggestion of relevant terms based on your facet navigation data is very common functionality across sites nowadays. You want to suggest terms to the end user as the user starts typing. Displaying the available facet navigation terms or categories together with the product count for each is very commonly used.

Auto suggestion can easily be achieved on top of the above configuration of the same category fields. Add further field configurations to the multi field, depending on how you would like to analyze your data.

//Add analyzer settings
builder.startObject("analysis")
    .startObject("analyzer")
        .startObject("customkeywordanalyzer")
            .field("type", "custom")
            .field("tokenizer", "keyword")
            .field("filter", new String[]{"lowercase"})
        .endObject()
    .endObject()
.endObject();

//Add field configuration to the multi field
.startObject("autosuggestion")
    .field("type", "string")
    .field("store", "yes")
    .field("analyzer", "customkeywordanalyzer")
.endObject()

//request facets only (size 0) over the autosuggestion fields
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(indexName).setTypes(documentTypes).setSize(0);
searchRequestBuilder.setQuery(QueryBuilders.matchAllQuery());
String[] fieldsArray = new String[]{"catdata.cat_computer_1.autosuggestion",
        "catdata.cat_computer_2.autosuggestion",
        "catdata.cat_color_1.autosuggestion"};
TermsFacetBuilder suggestionFacetBuilder = FacetBuilders.termsFacet("autosuggestionterms");
suggestionFacetBuilder.fields(fieldsArray);
suggestionFacetBuilder.order(autoSuggestionSortOrder);
suggestionFacetBuilder.size(size);

//term containing; the indexed terms are lower cased by the analyzer, so lower case the input too.
suggestionFacetBuilder.regex(".*" + queryString.toLowerCase() + ".*");
searchRequestBuilder.addFacet(suggestionFacetBuilder);

//Execute request
SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();

//Only facets are returned
//Parse facets results and suggested terms along with count is returned.
Facets facets = searchResponse.getFacets();
for (Facet facet : facets)
{
    TermsFacet termsFacet = (TermsFacet) facet;
    System.out.println("Facet: " + termsFacet.getName());
    for (TermsFacet.Entry entry : termsFacet.entries())
    {
        String term = entry.getTerm();
        int count = entry.getCount();
        System.out.println(term + ":" + count);
    }
}

The terms returned will match your facets or categories, like “computer”, “laptops”, “white”, “silver” etc.
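The "term containing" regex can be tried out locally before sending it to the search engine. The sketch below applies the same kind of pattern to a map of terms and counts using java.util.regex. SuggestionRegex is a hypothetical helper; quoting the user input with Pattern.quote is an addition of this sketch, and whether the quoted form is accepted by the search engine's regex dialect is an assumption to verify.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class SuggestionRegex {

    // Builds a "contains" pattern for the lower-cased input.
    // Pattern.quote keeps characters such as '.' or '(' literal.
    static String containsRegex(String userInput) {
        return ".*" + Pattern.quote(userInput.toLowerCase(Locale.ROOT)) + ".*";
    }

    // Keeps only the terms (with their counts) that match the pattern,
    // preserving the order of the input map.
    static Map<String, Integer> suggest(Map<String, Integer> termCounts, String userInput) {
        Pattern pattern = Pattern.compile(containsRegex(userInput));
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> terms = new LinkedHashMap<>();
        terms.put("computer", 1);
        terms.put("laptops", 1);
        terms.put("white", 1);
        terms.put("silver", 1);
        System.out.println(suggest(terms, "Lap"));
    }
}
```

A leading ".*" means the pattern cannot use the start of the term to narrow the candidates, so for large term sets a prefix-only pattern may be worth considering.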

Internationalization (i18n)

Most sites offer content in different locales, based on localization settings, to target local customers. Mapping your index and other content to the required locale is also one of the basic requirements for a search engine to serve data in different locales.

This can be achieved in different ways using elasticsearch. One option is to flatten your data so that it is available in different locales: the same document stores the information for every locale and, based on the user locale, you query only that locale’s data. eg. “cat_computer_1.en_EN” will serve you English content only, if present.

Another option is to have a separate index for each locale and, based on the user locale, target the index for that locale to serve the content. eg. “index_en_EN” will serve only English content.

Both ways work; which approach suits better depends on your data and requirements. Given elasticsearch’s ability to handle multiple indices, the preferred approach would be separate locale based indices, which allow locale based analyzers for text and are easier and more efficient to handle.
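The two options differ only in where the locale ends up in the name that the query targets. A minimal sketch of both naming schemes, assuming the patterns from the examples above; LocaleNaming and its methods are hypothetical names.

```java
// Hypothetical helper for illustration only.
final class LocaleNaming {

    // Option 1: one document holds all locales, one sub field per locale,
    // e.g. ("cat_computer_1", "en_EN") -> "cat_computer_1.en_EN".
    static String localizedField(String field, String locale) {
        return field + "." + locale;
    }

    // Option 2: one index per locale,
    // e.g. ("index", "en_EN") -> "index_en_EN".
    static String localizedIndex(String baseIndex, String locale) {
        return baseIndex + "_" + locale;
    }

    public static void main(String[] args) {
        String userLocale = "en_EN";
        System.out.println(localizedField("cat_computer_1", userLocale));
        System.out.println(localizedIndex("index", userLocale));
    }
}
```

With the second option the field names stay the same for every locale, so the facet and filter code shown earlier only needs a different index name per request.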
