StatSoft (n.d.) defines data mining as an “analytic process designed to explore data (usually large amounts of data – typically business or market related – also known as “big data”) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data.” They further state that “the ultimate goal of data mining is prediction”.
Figure 1.0 illustrates the multiple disciplines involved in data mining (GDi Techno Solutions, 2012).
As mentioned in Brookshear et al. (2012, pp. 414-415), the different types of data mining are as follows:
- Class description
- Class discrimination
- Cluster Analysis
- Association Analysis
- Outlier Analysis
- Sequential Pattern Analysis
Class description, also known as characterization, is a data mining technique that, according to Brookshear et al. (2012), “deals with identifying properties that characterize a given group of data items”. It produces a descriptive summary of the characteristics of a group, such as a group of customers.
An Example: This pattern shows the characteristics of customers who spend more than, say, USD 500 a year at CPJ Market’s online store. The result can be a general profile of those customers, such as age, marital status, employment status or credit rating.
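The customer profile described above can be sketched in a few lines of Python. This is only an illustration: the records, the field layout and the function name `describe_high_spenders` are hypothetical, and only the USD 500 cut-off comes from the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records: (age, marital_status, yearly_spend_usd)
customers = [
    (34, "married", 720.0),
    (27, "single", 150.0),
    (45, "married", 910.0),
    (52, "single", 560.0),
    (23, "single", 80.0),
]

SPEND_THRESHOLD = 500.0  # the USD 500 cut-off from the example


def describe_high_spenders(records, threshold):
    """Summarize the customers whose yearly spend exceeds the threshold."""
    group = [r for r in records if r[2] > threshold]
    if not group:
        return {"count": 0}
    return {
        "count": len(group),
        "mean_age": mean(r[0] for r in group),
        # Most frequent marital status within the group
        "marital_status": Counter(r[1] for r in group).most_common(1)[0][0],
    }


profile = describe_high_spenders(customers, SPEND_THRESHOLD)
print(profile)
```

A real system would summarize many more attributes, but the idea is the same: select one group and report the properties that characterize it.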
Class discrimination is the comparison or contrasting of two groups: “class discrimination deals with identifying properties that divide two groups.” (Brookshear et al., 2012, p. 414). According to Zaiane (1999), “The techniques used for data discrimination are very similar to the techniques used for data characterization with the exception that data discrimination results include comparative measures.”
An Example: This pattern can be used to compare the general characteristics of customers who bought complete albums on iTunes last year against those who bought fewer than 3 tracks from an album.
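As a rough sketch of such a comparison, the snippet below contrasts the mean age of two groups of buyers. The data, the choice of age as the compared attribute and the function name `compare_groups` are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical buyers: (age, bought_full_album)
buyers = [
    (41, True),
    (38, True),
    (19, False),
    (22, False),
    (45, True),
    (25, False),
]


def compare_groups(records):
    """Return the mean age of each group and the difference between them."""
    album_ages = [age for age, full_album in records if full_album]
    track_ages = [age for age, full_album in records if not full_album]
    album_mean = mean(album_ages)
    track_mean = mean(track_ages)
    return {
        "album_buyers": album_mean,
        "track_buyers": track_mean,
        "difference": album_mean - track_mean,  # the comparative measure
    }


result = compare_groups(buyers)
print(result)
```

The `difference` entry is the kind of comparative measure that separates discrimination from plain characterization.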
Cluster analysis, also known as ‘unsupervised classification’, is similar to classification in that it organizes the data into classes. Unlike classification, however, the class labels are not known in advance, so it is up to the algorithm to discover these classes.
An Example: Cluster analysis can be performed on Wal-Mart customer data in order to identify homogeneous subpopulations of customers.
Figure 1.1 illustrates a cluster analysis pattern (Smart, 2013).
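To make the idea of discovering unlabeled groups concrete, here is a minimal one-dimensional k-means sketch. K-means is just one common clustering algorithm chosen for illustration; the spending figures, the starting centers and the function name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then update."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical yearly spend per customer; no class labels are supplied
spend = [20, 25, 30, 200, 210, 220]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers, clusters)
```

No labels such as "low spender" or "high spender" are given to the algorithm; the two groups emerge from the data alone.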
From a sales and marketing perspective, association analysis determines which items are frequently sold together within the same transaction or time period.
Example: If a customer buys a fish tank, there is a 50% chance that he or she will also buy an air pump. This pattern is used most often by online stores; a prime example is Amazon and its techniques for up-selling items.
Figure 1.2 illustrates an association analysis pattern (Olsen, 2013).
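The "50% chance" in the fish tank example corresponds to what association analysis usually calls the confidence of a rule. A minimal sketch, using made-up baskets and hypothetical function names, might look like this:

```python
# Hypothetical shopping baskets, one set of items per transaction
transactions = [
    {"fish tank", "air pump", "gravel"},
    {"fish tank", "fish food"},
    {"fish tank", "air pump"},
    {"fish tank"},
    {"fish food", "gravel"},
]


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count_containing(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return count_containing(antecedent | consequent, baskets) / count_containing(
        antecedent, baskets
    )


rule_confidence = confidence({"fish tank"}, {"air pump"}, transactions)
rule_support = support({"fish tank", "air pump"}, transactions)
print(rule_confidence, rule_support)
```

In this toy data, four baskets contain a fish tank and two of those also contain an air pump, which gives the 50% figure.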
Outliers, also known as ‘exceptions or surprises’, are data elements that cannot be grouped in a given class or cluster.
An Example: A well-known use of this is in detecting fraudulent use of credit cards through sudden changes in a customer’s buying patterns, especially when the volume of a customer’s purchases increases very suddenly.
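One simple way to flag such a sudden change is to measure how far each purchase lies from the customer's typical amount. The sketch below uses the modified z-score based on the median absolute deviation, which is one of several possible techniques; the purchase amounts, the 3.5 threshold and the function name `find_outliers` are assumptions for illustration.

```python
from statistics import median


def find_outliers(amounts, threshold=3.5):
    """Flag amounts whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # no spread to measure against
    return [x for x in amounts if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical card purchases in USD; the last one is unusually large
purchases = [40, 55, 38, 60, 45, 52, 48, 900]
suspicious = find_outliers(purchases)
print(suspicious)
```

A production fraud system would combine many signals, but the principle is the same: a value that does not fit the customer's usual group of purchases is singled out for review.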
Sequential Pattern Analysis
Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. This focuses on “characterizing, comparing, classifying or clustering of time-related data.” (Zaiane, 1999).
Example: This pattern can be used to predict future trends in stock market prices, which informs the decision on whether or not to invest in a given stock. It can be used by various financial or investment companies.
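As a very rough illustration of modelling a trend in time-related data, the sketch below smooths a price series with a simple moving average and reports its direction. This is not a forecasting model; the prices, the window size and the function name `moving_average` are hypothetical.

```python
def moving_average(prices, window):
    """Average of each run of `window` consecutive prices."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


# Hypothetical daily closing prices for one stock
prices = [10, 11, 12, 11, 13, 14, 15]
smoothed = moving_average(prices, window=3)

# Compare the latest smoothed value with the earliest one to describe the trend
if smoothed[-1] > smoothed[0]:
    trend = "up"
elif smoothed[-1] < smoothed[0]:
    trend = "down"
else:
    trend = "flat"

print(smoothed, trend)
```

Smoothing removes day-to-day noise so that the underlying regularity in the series is easier to see.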
Database types, the data mining patterns found within them, and why
| Database Type | Data Mining Patterns | Why |
|---|---|---|
| Transactional | Outlier, Association, Class discrimination, Class description, and Cluster | This database contains transactional data that could highlight each of these patterns. |
| Time-Series | Sequential Pattern and Cluster | This database contains stock exchange data and price movements over a period of time. |
| Sequence | Sequential Pattern, Class discrimination, Association, Class description, and Cluster | This type of database contains information about customer shopping sequences or browsing history. |
| Multimedia | Class discrimination, Class description, and Cluster | A primary example would be Netflix, as these patterns could help improve UX and increase sales. |
| Legacy | Class discrimination, Association, Class description, Cluster, and Sequential Pattern | As the name suggests, this contains historical information that can span numerous patterns. This is a grouping of all the major databases. |
In conclusion, data mining helps organizations make informed decisions based on the patterns of interest to them. Not every pattern will suit every organization, but those that do will provide the most value or become a source of vital information. For marketing companies or supermarkets, this can help increase sales and revenue and improve the management of inventory restocking and the supply chain. Data mining has become an important facet of the information age, and more research is being done to improve its usefulness.
Brookshear, J. G., Smith, D. and Brylow, D. (2012) Computer Science: An Overview, 11th Edition. Reading, MA: Pearson (Addison-Wesley).
GDi Techno Solutions (2012) Data mining – GDi Techno Solutions, Available at: http://www.slideshare.net/gditechnosolutions/data-mining-gdi-techno-solutions (Accessed: May 8, 2016).
Olsen, J. (2013) Shopping for KPIs: Market Basket Analysis for Web Analytics Data, Available at: https://blogs.adobe.com/digitalmarketing/analytics/shopping-for-kpis-market-basket-analysis-for-web-analytics-data/ (Accessed: May 8, 2016).
StatSoft (n.d.) What is Data Mining (Predictive Analytics, Big Data), Available at: http://www.statsoft.com/Textbook/Data-Mining-Techniques#mining (Accessed: May 8, 2016).
Smart, F. (2013) Cluster Analysis, Available at: http://www.econometricsbysimulation.com/2013/09/cluster-analysis.html (Accessed: May 8, 2016).
Zaiane, O. R. (1999) Chapter I: Introduction to Data Mining, Available at: https://webdocs.cs.ualberta.ca/~zaiane/courses/cmput690/notes/Chapter1/ (Accessed: May 8, 2016).