PolyVista's Patent Pending Solution
PolyVista's approach to text mining rests on three fundamental processes:

1. Clustering: The first process creates hierarchical text clusters from data fields that consist mainly of free-form text.
2. Integration: The second process integrates these clusters (text-based dimensions) into an otherwise standard multi-dimensional database (MS Analysis Services), providing the all-important link between structured and unstructured data.
3. Analysis: The third process analyzes these new text-cluster dimensions using both automated methods (discovery algorithms) and standard, manual interactive OLAP methods.

Without step 3 (the ability to analyze the structured and unstructured dimensions effectively), text mining remains largely academic and offers far less value.

1. Hierarchical Text Clusters
Text data is supplied as delimited ASCII files, MS Access tables, or MS SQL Server tables. The data is processed into a keyword frequency distribution that accounts for stemming, synonyms, and stop words. Using keyword frequency as the basis, hierarchical text clustering is performed with a PolyVista proprietary parallel processing method: an iterative K-means technique that produces a hierarchy of text clusters organized in parent-child relationships.

The clustering process is largely data-driven, though we expose a number of tuning parameters that influence it. These include the number of clusters at a given level, the number of terms describing each cluster, and the degree to which a given word vector must "fit" a cluster. The terms describing each cluster are ordered solely by their observed frequency rank. Although K-means is one of the fastest clustering techniques, PolyVista uses a unique (patent-pending) parallel processing method to handle the millions of text records that many large call center applications receive daily.
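The preprocessing step described above (a keyword frequency distribution that accounts for stemming, synonyms, and stop words) can be sketched in plain Python as follows. The stop-word list, the synonym map, and the crude suffix-stripping stemmer are hypothetical stand-ins chosen for illustration; a production system would use curated, domain-specific lists and a proper stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would load a curated list.
STOP_WORDS = {"a", "an", "the", "my", "is", "to", "and", "of", "on", "when", "i"}

# Hypothetical synonym map. Keys and values are *stems*, because synonyms
# are applied after stemming so that "laptop" and "laptops" both match.
SYNONYMS = {"laptop": "computer", "pc": "computer", "hang": "freez"}

def naive_stem(word: str) -> str:
    """Crude suffix stripping that stands in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing "e" so that "freeze" and "freezes" share the stem "freez".
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

def normalize_tokens(text: str) -> list[str]:
    """Lowercase and tokenize, drop stop words, stem, then map synonyms."""
    result = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            continue
        stem = naive_stem(token)
        result.append(SYNONYMS.get(stem, stem))
    return result

def keyword_frequencies(texts: list[str]) -> Counter:
    """Corpus-wide keyword frequency distribution over normalized terms."""
    freq: Counter = Counter()
    for text in texts:
        freq.update(normalize_tokens(text))
    return freq

freq = keyword_frequencies(["My laptop freezes", "The PC hangs", "Screen freezing"])
# freq["freez"] == 3 and freq["computer"] == 2
```

Applying synonyms after stemming is a design choice that keeps the synonym map small, at the cost of writing its entries as stems rather than dictionary words.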

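A minimal, single-process sketch of hierarchical (recursive) K-means over term-frequency vectors follows. It is not PolyVista's proprietary parallel method. It assumes cosine similarity, a deterministic farthest-first initialization, and hypothetical tuning parameters that mirror the ones named in the text: `k` (clusters per level), `n_terms` (terms describing each cluster, ordered by frequency), and `min_fit` (how well a vector must fit its cluster to move down a level). Records that fit poorly stay attached to the parent node, which is one way a ragged hierarchy can arise.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

Vector = dict[str, float]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def mean_vector(vectors: list[Vector]) -> Vector:
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors: list[Vector], k: int, iterations: int = 10):
    # Farthest-first initialization: start with the first vector, then keep
    # adding the vector least similar to every center chosen so far.
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centers)))
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar center.
        assignment = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        # Update step: each non-empty cluster's center becomes its members' mean.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centers[c] = mean_vector(members)
    return assignment, centers

@dataclass
class ClusterNode:
    label: list[str]                                            # top terms, most frequent first
    members: list[int] = field(default_factory=list)            # records attached directly here
    children: list["ClusterNode"] = field(default_factory=list)

def top_terms(vectors: list[Vector], n_terms: int) -> list[str]:
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return [t for t, _ in total.most_common(n_terms)]

def build_hierarchy(vectors, indices, k=2, n_terms=3, min_fit=0.3,
                    min_size=4, depth=0, max_depth=3) -> ClusterNode:
    docs = [vectors[i] for i in indices]
    node = ClusterNode(label=top_terms(docs, n_terms))
    if len(indices) < max(min_size, k) or depth >= max_depth:
        node.members = list(indices)          # leaf: too small or deep enough
        return node
    assignment, centers = kmeans(docs, k)
    groups: list[list[int]] = [[] for _ in range(k)]
    for idx, vec, c in zip(indices, docs, assignment):
        if cosine(vec, centers[c]) >= min_fit:
            groups[c].append(idx)
        else:
            node.members.append(idx)          # poor fit: stays at this level
    for group in groups:
        if group:
            node.children.append(build_hierarchy(vectors, group, k, n_terms, min_fit,
                                                 min_size, depth + 1, max_depth))
    return node

# Hypothetical term-frequency vectors for six support records.
docs = [
    {"printer": 2, "jam": 1}, {"printer": 1, "jam": 2}, {"printer": 1, "toner": 1},
    {"password": 2, "reset": 1}, {"password": 1, "login": 1}, {"login": 2, "reset": 1},
]
root = build_hierarchy(docs, list(range(len(docs))), k=2, max_depth=1)
# root.children[0].members == [0, 1, 2]; root.children[1].members == [3, 4, 5]
```

Each level's K-means run is independent of its siblings, which is what makes this kind of recursion a natural candidate for parallel processing; the sketch itself runs sequentially.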

2. Integrating Structured and Unstructured Data
The second step in our text mining solution creates a cube (multi-dimensional database) that incorporates this new text-based information. The text clusters are represented internally as a "ragged" dimension (a hierarchical dimension whose branches have different depths) in the cube. The finished cube integrates standard structured data with its corresponding unstructured data elements. This approach lets analysts, for the first time, understand text field data both in a parent-child relationship (a summarized, hierarchical view) and in a multi-dimensional context.
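To make the idea of a ragged, parent-child text dimension concrete, here is a small in-memory sketch. It is not the Analysis Services API; the cluster names, the `region` attribute, and the fact rows are hypothetical. Each fact row carries structured attributes plus the key of the text cluster it belongs to, and every row is counted at its own node and at all of that node's ancestors, so branches of different depths still aggregate correctly.

```python
from collections import Counter

# Hypothetical ragged parent-child dimension: node -> parent (None marks the root).
# "Paper Jam" sits three levels below the root, while "Login" sits only one.
PARENT = {
    "All Problems": None,
    "Hardware": "All Problems",
    "Printer": "Hardware",
    "Paper Jam": "Printer",
    "Toner": "Printer",
    "Login": "All Problems",
}

# Hypothetical fact rows: structured attributes plus the text-cluster key.
FACTS = [
    {"region": "East", "cluster": "Paper Jam"},
    {"region": "East", "cluster": "Toner"},
    {"region": "West", "cluster": "Paper Jam"},
    {"region": "West", "cluster": "Login"},
    {"region": "West", "cluster": "Hardware"},  # record attached to an internal node
]

def ancestors(node):
    """Yield the node itself and every ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT[node]

def rollup(facts, **filters):
    """Count facts at every level of the text dimension, optionally sliced
    by structured attributes such as region="West"."""
    counts = Counter()
    for row in facts:
        if all(row[key] == value for key, value in filters.items()):
            counts.update(ancestors(row["cluster"]))
    return counts

totals = rollup(FACTS)               # totals["All Problems"] == 5, totals["Hardware"] == 4
west = rollup(FACTS, region="West")  # west["Hardware"] == 2, west["Toner"] == 0
```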


3. Automated Analysis Techniques
In addition to the technology for processing and constructing large text-based dimensions, PolyVista has developed new algorithms and visualizations suited specifically to text-based feature discovery. Automating the discovery process is crucial for efficiently surfacing new business insight in complex multi-dimensional cubes. Our existing discovery algorithms can be applied effectively to any text dimension, and we have also added a new algorithm to our Discovery suite. The Difference algorithm finds and ranks differences (deltas) between any two user-definable sets in terms of one or more selectable dimensions. For example, an analyst could rank the deltas between the calls received yesterday (set 1) and those received over the previous 7 days (set 2) in terms of a text-cluster dimension such as Problem Area.
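The internals of the Difference algorithm are not described here, so the following is only a simple stand-in that shows what ranking deltas between two user-defined sets can look like, using the yesterday-versus-previous-7-days example. Counts per Problem Area are converted to average calls per day (because the two sets span different numbers of days) and then ranked by absolute change. The field names, dates, and call data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def rank_deltas(calls, set1_days, set2_days, dimension="problem_area"):
    """Rank members of `dimension` by the change in average daily call volume
    from set 2 to set 1 (positive means more calls per day in set 1)."""
    c1 = Counter(c[dimension] for c in calls if c["day"] in set1_days)
    c2 = Counter(c[dimension] for c in calls if c["day"] in set2_days)
    # Normalize to calls per day, since the two sets can cover different spans.
    deltas = {m: c1[m] / len(set1_days) - c2[m] / len(set2_days) for m in set(c1) | set(c2)}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical data: yesterday shows a burst of Paper Jam calls.
yesterday = date(2024, 1, 8)
previous_week = {yesterday - timedelta(days=i) for i in range(1, 8)}
calls = [{"day": yesterday, "problem_area": "Paper Jam"}] * 6
calls.append({"day": yesterday, "problem_area": "Login"})
for d in previous_week:
    calls += [{"day": d, "problem_area": "Paper Jam"}] + [{"day": d, "problem_area": "Login"}] * 2

ranked = rank_deltas(calls, {yesterday}, previous_week)
# ranked == [("Paper Jam", 5.0), ("Login", -1.0)]
```

Per-day normalization is one reasonable choice; comparing each member's share of total volume, or testing the change for statistical significance, would be equally valid ways to define a "delta".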
Example
A call or contact center is a good example of deriving value from integrating text and structured data. Consider a chat-based support system in which customers initiate a real-time "electronic dialog" (chat) with a customer support agent. For each chat session, the dialog between agent and customer is recorded and stored. The structured elements of this record describe the customer and the agent (who, what, where, when, etc.). The unstructured data is the verbatim (free-form text) customer/agent dialog itself.

Clustering these text fields and merging them with their related structured elements delivers business value that includes the following:

Problem Identification - Timely and accurate problem identification is critical for support agents as well as for product engineers and service managers. In a typical support system, the agent is responsible for identifying the problem and classifying its type. With a very simple classification scheme, the odds of a correct classification are good. Unfortunately, a simple scheme may lack the detail needed to identify important new problem trends or root causes. A more complex scheme, on the other hand, offers much richer analysis potential, but its complexity invites errors and creative shortcuts that negate its analytic value. Text clustering provides an automated way to assist in problem identification and classification: it is consistent, generally unbiased, and not prone to taking shortcuts.

Early Warning - With accelerating product lifecycles, the value of analysis is closely tied to how quickly it delivers actionable results. A problem discovered and corrected in the first weeks of a product's life is far less costly than one discovered several weeks later. Integrating structured and unstructured data makes it possible to analyze these "problem" clusters across multiple dimensions such as time, manufacturing location, component supplier, or product family. This capability enables analysts to identify root causes quickly and take immediate corrective action.
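The two benefits above can be sketched together under stated assumptions. Automated classification is shown as nearest-centroid assignment with cosine similarity (a common technique, swapped in here for illustration rather than taken from PolyVista's product), and early warning as a simple week-over-week spike check per component supplier. The centroids, supplier names, week numbers, chat texts, and the `min_fit`, `factor`, and `min_count` thresholds are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical cluster centroids (term weights), assumed to come from an earlier clustering run.
CENTROIDS = {
    "Printer/Paper Jam": {"printer": 1.0, "paper": 0.8, "jam": 0.9},
    "Account/Login": {"password": 1.0, "login": 0.9, "reset": 0.6},
}

def term_vector(text):
    """Raw term counts; a real pipeline would also stem and drop stop words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def classify(text, min_fit=0.2):
    """Assign text to the most similar centroid, or None if nothing fits well enough."""
    vec = term_vector(text)
    label, score = max(((name, cosine(vec, c)) for name, c in CENTROIDS.items()),
                       key=lambda pair: pair[1])
    return label if score >= min_fit else None

def flag_spikes(records, cluster, prev_week, cur_week, factor=2.0, min_count=3):
    """Suppliers whose count for `cluster` in cur_week is at least min_count
    and at least `factor` times their count in prev_week."""
    counts = Counter((r["supplier"], r["week"]) for r in records if r["cluster"] == cluster)
    suppliers = sorted({r["supplier"] for r in records})
    return [s for s in suppliers
            if counts[(s, cur_week)] >= min_count
            and counts[(s, cur_week)] >= factor * max(counts[(s, prev_week)], 1)]

# Hypothetical chat records: (component supplier, week, verbatim dialog).
chats = (
    [("A", 1, "Printer shows a paper jam")]
    + [("A", 2, "Paper jam in the printer again")] * 5
    + [("B", w, "Printer paper jam") for w in (1, 1, 2, 2)]
    + [("A", 2, "Cannot reset my password")] * 4
)
records = [{"supplier": s, "week": w, "cluster": classify(text)} for s, w, text in chats]
spikes = flag_spikes(records, "Printer/Paper Jam", prev_week=1, cur_week=2)
# spikes == ["A"]: supplier A's paper-jam chats jumped from 1 to 5 in a week
```

Returning `None` below `min_fit` routes poorly matching dialogs to human review instead of forcing them into the closest category, which is one way to avoid the misclassification problem described above.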
All Rights Reserved by PolyVista, Inc.® If you have any comments, please e-mail the webmaster.
Site Designed and Maintained by Oliver Stephenson