Data Modeling approach for search content and tagging
Posted by Jai on March 13, 2013
For an effective search solution, converting unstructured data into a structured format is essential to business success. The process includes understanding user requirements, analyzing large volumes of unstructured data in different formats along with system-specific properties, and enhancing that data over time. In this article we discuss the data modeling part, using a typical eCommerce platform as the example and covering both the business process and a relational database format for tagging search content in a structured way.
eCommerce Platform for search:
A typical eCommerce site offers the following features:
- Product catalog with unlimited categories and sub-categories.
- Featured/hot products on the home page.
- Product search facility.
- Search results page with an advanced search option at the top, which lets the user refine the search.
- Option to compare products on the search listing page.
- Product listing page with links to product detail pages.
The online product catalog in an eCommerce platform typically has the following properties and attributes:
- Unlimited number of products
- Up-sell products by displaying substitute products on the product page
- Cross-sell products by displaying product accessories on the product page
- Organize the online catalog by category and/or brand
- Assign products to an unlimited number of sub-categories
- Promotional prices applied over a pre-determined period
- Display real quantities, fixed quantities or a generic message
- URL, title, keywords and meta description tag generator
Search Solution compatible data:
Given the typical eCommerce site and product catalog described above, some of the attributes relevant to the search solution are:
- Categorization and sub-categorization of the content, required for navigational search.
- Product descriptions with relevant information, used for free text search.
- Specific keywords and metadata added to products, used to boost particular products.
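As a rough illustration, the three kinds of search data listed above can be grouped into one record per product. The class and field names below (`SearchProduct`, `category_path`, `keywords`, `meta`) are hypothetical and only sketch the idea, assuming Python 3.9+.

```python
from dataclasses import dataclass, field


@dataclass
class SearchProduct:
    """Hypothetical record grouping the data a search solution needs."""
    sku: str
    # Navigational search: category path from top level down to the leaf
    category_path: list[str]
    # Free text search: human-readable description
    description: str
    # Boosting/SEO: extra keywords and meta data
    keywords: list[str] = field(default_factory=list)
    meta: dict[str, str] = field(default_factory=dict)

    def navigation_facets(self) -> list[str]:
        """Every prefix of the category path, e.g. 'Clothing > Shirts'."""
        return [" > ".join(self.category_path[: i + 1])
                for i in range(len(self.category_path))]


shirt = SearchProduct(
    sku="SKU-001",
    category_path=["Clothing", "Shirts", "Formal"],
    description="Slim-fit cotton formal shirt",
    keywords=["office", "cotton"],
)
print(shirt.navigation_facets())
```

Emitting every prefix of the path lets a user who clicks "Clothing" still see the formal shirt, which is how navigational search usually behaves.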
Current Unstructured data:
In most organizations, the bulk of the data usually remains in unstructured form.
Retrieving search-relevant information from all of this unstructured data is quite cumbersome. The first step should be to understand the unstructured content in light of end-user and business requirements.
Depending on your situation, choose the approach that fits best: use commercial products to help you map the data, or start enriching the content yourself.
Process of converting unstructured data to structured data
In the end, the end customer matters most to the business, so understanding end-user behavior is essential. While analyzing the data, focus on these important aspects of the site offering:
- How users approach the site
- Which product categorization is relevant to users
- How to differentiate between target groups
Categorization of products
Most products sold on any eCommerce site fall under some top-level category, such as clothing, food, books, gadgets, mobiles or computers. Top-level categorization helps direct the user to relevant products.
Extracting a relevant top-level categorization for the various products from unstructured data is also a demanding and labor-intensive process for the business.
Most online products also fall under a sub-category or a brand. Extracting sub-categories is very important: you don’t want sub-categorization so deep that users need too many clicks, but at the same time you want to let users filter search results finely.
Sub-categorization is tricky and varies from product to product and from business to business.
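One way to keep the "not too many clicks" concern measurable is to store categories as a tree (each row pointing to its parent) and flag branches that grow too deep. The sketch below assumes a hypothetical list of `(category_id, parent_id, name)` rows and an assumed limit of three levels; the right limit depends on your business.

```python
# Hypothetical category rows: (category_id, parent_id, name); parent None = top level
CATEGORIES = [
    (1, None, "Clothing"),
    (2, 1, "Men"),
    (3, 2, "Shirts"),
    (4, 3, "Formal"),
    (5, None, "Books"),
]


def depth(category_id, parents):
    """Number of levels from the top-level category down to this one."""
    level = 1
    while parents[category_id] is not None:
        category_id = parents[category_id]
        level += 1
    return level


def too_deep(categories, max_depth=3):
    """Names of categories nested deeper than max_depth (an assumed limit)."""
    parents = {cid: pid for cid, pid, _ in categories}
    return [name for cid, _, name in categories if depth(cid, parents) > max_depth]


print(too_deep(CATEGORIES))
```

Here "Formal" sits four levels down (Clothing > Men > Shirts > Formal), so it would be reported as a candidate for flattening or for turning into a filter attribute instead.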
Gathering product properties
You know your business: the value you provide to the end customer and what users come to your site to buy. You already have data that can be further classified into the different attributes and properties of each product.
Separate out the key properties of the products you currently offer, i.e. the typical selling points of each product. For example, for clothing the typical attributes are Color, Occasion, Brand, Material, Size, etc.
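A simple way to make these key properties explicit is a per-category attribute schema that each product's data can be checked against. The Clothing attributes come from the example above; the schema dictionary and the `check_attributes` helper are hypothetical.

```python
# Hypothetical per-category attribute schema (Clothing attributes from the text)
ATTRIBUTE_SCHEMA = {
    "Clothing": {"color", "occasion", "brand", "material", "size"},
}


def check_attributes(category, attributes):
    """Keep attributes the schema knows about and list the ones still missing."""
    allowed = ATTRIBUTE_SCHEMA.get(category, set())
    known = {k: v for k, v in attributes.items() if k in allowed}
    missing = sorted(allowed - set(attributes))
    return known, missing


known, missing = check_attributes(
    "Clothing", {"color": "blue", "size": "M", "warranty": "1y"})
print(known)
print(missing)
```

The `missing` list is a convenient to-do list for enrichment: products without a brand or material cannot be filtered on those facets.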
Each product also needs the flexibility to carry additional information, for example:
- Tags/Keywords: used for separate categorization, SEO improvements and free text search.
- Meta Information: used for SEO, to define the meta information for a product.
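Free-form tags tend to arrive with inconsistent casing and duplicates, so a small normalization step before storing them helps both search and SEO. The sketch below, with hypothetical helper names, normalizes tags and renders them together with a title and description as HTML head tags, escaping values with Python's standard `html.escape`.

```python
import html


def normalize_tags(raw_tags):
    """Trim, lower-case and de-duplicate tags, keeping first-seen order."""
    tags = []
    for raw in raw_tags:
        tag = raw.strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def meta_tags(title, description, keywords):
    """Render title, description and keywords as HTML head tags (escaped)."""
    return "\n".join([
        f"<title>{html.escape(title)}</title>",
        f'<meta name="description" content="{html.escape(description)}">',
        f'<meta name="keywords" content="{html.escape(", ".join(keywords))}">',
    ])


tags = normalize_tags([" Cotton", "cotton", "Office ", ""])
print(meta_tags("Blue Shirt", "Cotton & linen shirt", tags))
```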
Text content for products
Describing each product with relevant content is essential for free text search, since it lets end users find products by text. Each product description should match that product's specific properties.
Some product properties are calculated from dynamically configured system properties, for example the price that applies on a particular day or at a particular time, or whether the product is available at a given moment.
Separate out these system-configured properties, which need to be calculated dynamically at indexing time or at query time.
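To illustrate keeping a dynamic property separate from the static product data, the sketch below stores a base price plus an optional promotional window and computes the effective price for a given moment, which could be the indexing time or the query time. The `Pricing` class is hypothetical; availability could be handled in the same way.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass
class Pricing:
    """Hypothetical pricing record with an optional promotional window."""
    base_price: Decimal
    promo_price: Optional[Decimal] = None
    promo_start: Optional[datetime] = None  # inclusive
    promo_end: Optional[datetime] = None    # exclusive

    def effective_price(self, at: datetime) -> Decimal:
        """Return the price valid at `at` (index time or query time)."""
        in_promo = (
            self.promo_price is not None
            and self.promo_start is not None
            and self.promo_end is not None
            and self.promo_start <= at < self.promo_end
        )
        return self.promo_price if in_promo else self.base_price


p = Pricing(Decimal("100.00"), Decimal("80.00"),
            datetime(2013, 3, 1), datetime(2013, 4, 1))
print(p.effective_price(datetime(2013, 3, 13)))
print(p.effective_price(datetime(2013, 4, 2)))
```

If prices are computed at indexing time, the index must be refreshed when a promotion starts or ends; computing them at query time avoids that at the cost of extra work per request.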
Content Analytics and Business Intelligence (BI):
Based on how customers respond to different navigational and text search content, you will keep enhancing your product categorization and tagging over time.
Analyze thoroughly what customers are trying to find, which content sells most, and which content is served for which search criteria.
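A minimal starting point for this analysis is a search log. The sketch below assumes a hypothetical log format of `(query, result_count, purchased_sku)` and counts the most frequent queries, queries that return nothing (a strong hint that tagging is missing), and products bought after a search.

```python
from collections import Counter

# Hypothetical search log entries: (query, result_count, purchased_sku or None)
SEARCH_LOG = [
    ("blue shirt", 12, "SKU-001"),
    ("blue shirt", 12, None),
    ("linen trousers", 0, None),
    ("cotton shirt", 5, "SKU-001"),
    ("linen trousers", 0, None),
]


def summarize(log):
    """Count all queries, zero-result queries and purchases after search."""
    top_queries = Counter(query for query, _, _ in log)
    zero_results = Counter(query for query, count, _ in log if count == 0)
    best_sellers = Counter(sku for _, _, sku in log if sku)
    return top_queries, zero_results, best_sellers


top_queries, zero_results, best_sellers = summarize(SEARCH_LOG)
print(zero_results.most_common())
print(best_sellers.most_common())
```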
Data Modelling Examples
Several scenarios are possible, depending on your situation:
- If you already have a lot of unstructured content, you can develop a custom solution to retrieve the information relevant to search.
- If you already have unstructured content, commercial tools can help you retrieve that information automatically, customized to suit your business needs.
- If you are just starting to enrich your system with more relevant product information, the example below will give you some direction.
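For the custom-solution route, even a simple dictionary-based extractor can pull structured attributes out of free-text descriptions. This is only a sketch: the vocabularies and the size pattern are hypothetical, and a real solution would need richer dictionaries or proper text analysis.

```python
import re

# Hypothetical vocabularies; in practice these come from your attribute schema
VOCABULARY = {
    "color": {"red", "blue", "black", "white"},
    "material": {"cotton", "linen", "wool", "silk"},
}
# Matches e.g. "size: m", "Size XL", "size - L"
SIZE_PATTERN = re.compile(r"\bsize\s*[:\-]?\s*(XS|S|M|L|XL|XXL)\b", re.IGNORECASE)


def extract_attributes(text):
    """Return attribute -> sorted list of values found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    found = {}
    for attribute, values in VOCABULARY.items():
        matches = sorted(words & values)
        if matches:
            found[attribute] = matches
    size = SIZE_PATTERN.search(text)
    if size:
        found["size"] = [size.group(1).upper()]
    return found


print(extract_attributes("Classic blue cotton shirt, size: m, machine washable"))
```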
Take the example of a typical eCommerce site served from a relational database system. We want the system to store product categorization, descriptions and other meta information, with multilingual support.
As shown in the image above, you can enrich each product with categories, a detailed description, and SEO and tag information, which will later be used to serve search results.
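Since the original diagram is not reproduced here, the following is a hypothetical relational schema along the same lines, created in an in-memory SQLite database with Python's standard `sqlite3` module. Table and column names are assumptions: categories form a tree through `parent_id`, products link to any number of categories, and translatable text (names, descriptions, meta descriptions, tags) lives in tables keyed by language.

```python
import sqlite3

SCHEMA = """
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    parent_id   INTEGER REFERENCES category(category_id)  -- NULL = top level
);
CREATE TABLE category_text (
    category_id INTEGER REFERENCES category(category_id),
    lang        TEXT NOT NULL,
    name        TEXT NOT NULL,
    PRIMARY KEY (category_id, lang)
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    sku        TEXT UNIQUE NOT NULL,
    brand      TEXT
);
CREATE TABLE product_category (          -- many-to-many: unlimited sub-categories
    product_id  INTEGER REFERENCES product(product_id),
    category_id INTEGER REFERENCES category(category_id),
    PRIMARY KEY (product_id, category_id)
);
CREATE TABLE product_attribute (         -- open-ended properties: color, size, ...
    product_id INTEGER REFERENCES product(product_id),
    name       TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (product_id, name, value)
);
CREATE TABLE product_text (              -- per-language description and SEO meta
    product_id       INTEGER REFERENCES product(product_id),
    lang             TEXT NOT NULL,
    title            TEXT NOT NULL,
    description      TEXT,
    meta_description TEXT,
    PRIMARY KEY (product_id, lang)
);
CREATE TABLE product_tag (               -- per-language tags/keywords
    product_id INTEGER REFERENCES product(product_id),
    lang       TEXT NOT NULL,
    tag        TEXT NOT NULL,
    PRIMARY KEY (product_id, lang, tag)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Keeping translatable text in separate `*_text` tables means adding a language only adds rows, not columns.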
Preparing search document
Once the content is tagged and enriched, the next step is to make the data search compatible.
Part of this step is converting the data into the form search solutions usually expect, which is flattened (denormalized) data.
We will cover in detail, in later posts, how to make your tagged products ready for any search solution.
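As a small preview of flattening, the sketch below denormalizes one product in one language into a single dictionary that could be sent to a search engine. The in-memory rows are hypothetical stand-ins for relational tables; the category path is expanded so that every ancestor category becomes a searchable value.

```python
# Hypothetical normalized rows, shaped like the relational tables of a catalog
PRODUCTS = {1: {"sku": "SKU-001", "brand": "Acme"}}
CATEGORY_PARENT = {1: None, 2: 1, 3: 2}            # category_id -> parent_id
CATEGORY_NAME = {(1, "en"): "Clothing", (2, "en"): "Shirts", (3, "en"): "Formal"}
PRODUCT_CATEGORY = [(1, 3)]                         # (product_id, category_id)
PRODUCT_TEXT = {(1, "en"): {"title": "Formal cotton shirt",
                            "description": "Slim-fit formal shirt in cotton"}}
PRODUCT_TAG = [(1, "en", "office"), (1, "en", "cotton")]  # (product_id, lang, tag)


def category_path(category_id, lang):
    """Walk parent links up to the root and return names top-down."""
    names = []
    while category_id is not None:
        names.append(CATEGORY_NAME[(category_id, lang)])
        category_id = CATEGORY_PARENT[category_id]
    return names[::-1]


def flatten(product_id, lang):
    """Denormalize one product in one language into a single search document."""
    product = PRODUCTS[product_id]
    text = PRODUCT_TEXT[(product_id, lang)]
    categories = []
    for pid, category_id in PRODUCT_CATEGORY:
        if pid == product_id:
            for name in category_path(category_id, lang):
                if name not in categories:  # ancestors shared by several paths
                    categories.append(name)
    return {
        "id": f"{product['sku']}_{lang}",  # one document per product per language
        "lang": lang,
        "brand": product["brand"],
        "title": text["title"],
        "description": text["description"],
        "categories": categories,
        "tags": sorted(tag for pid, tag_lang, tag in PRODUCT_TAG
                       if pid == product_id and tag_lang == lang),
    }


print(flatten(1, "en"))
```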