Data Mining Methods

Models used in data mining are examined under two main headings: predictive and descriptive [1].

Predictive models aim to build a model from data whose results are known, and then to use that model to estimate the results for data sets whose results are unknown. Descriptive models identify patterns in existing data that can be used to guide decision making.
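To make the predictive idea concrete, here is a minimal sketch that fits a straight line to records with known results and then estimates results for new records. The data values and the names `fit_line` and `predict` are made up for illustration, and least-squares regression is only one of many possible predictive models.

```python
# Minimal sketch of a predictive model: least-squares line with one input.
# The data and the helper names are illustrative assumptions.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimize squared error on known results."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate the result for an input whose result is unknown."""
    slope, intercept = model
    return slope * x + intercept


# Records with known results (input, outcome); here exactly y = 2x + 1.
known_x = [1.0, 2.0, 3.0, 4.0]
known_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(known_x, known_y)

# Records with unknown results: estimate them with the fitted model.
unknown_x = [5.0, 6.0]
estimates = [predict(model, x) for x in unknown_x]
print(estimates)  # [11.0, 13.0]
```

The same two-step shape (fit on known results, then estimate unknown ones) applies to the other predictive methods named in this post; only the model inside `fit_line` would change.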


Figure: Methods used in data mining [2]

Some authors instead divide data mining methods into two main categories: supervised and unsupervised. The term supervised is used when there is a well-defined, precise target. The term unsupervised is used when the desired result has no specific definition or is uncertain.

Supervised and unsupervised are opposites of each other. When the two kinds of method are evaluated in terms of the whole process:

  • Unsupervised methods are mostly used to understand, recognize, and explore the data, and to give an idea of which methods should be applied next.
  • Supervised methods are used to extract information and conclusions from the data. For this reason, it is important for the accuracy and validity of the findings to confirm any information or result obtained by an unsupervised method with a supervised method, if possible.
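One possible reading of the two points above, as a rough sketch: first find groups without any labels (k-means), then treat those groups as targets and check them with a supervised step (leave-one-out nearest-centroid classification). The data, the starting centroids, and all function names here are illustrative assumptions, not a method taken from the cited sources.

```python
# Sketch: unsupervised discovery followed by a supervised check.
# All data and helper names are illustrative assumptions.

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Plain k-means from given starting centroids; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centroids.append(mean_point(members) if members else centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def leave_one_out_accuracy(points, labels):
    """Supervised check: predict each point's label from all the other points."""
    hits = 0
    for i, (p, true_label) in enumerate(zip(points, labels)):
        rest = [(q, lab) for j, (q, lab) in enumerate(zip(points, labels)) if j != i]
        centroids = {
            label: mean_point([q for q, lab in rest if lab == label])
            for label in {lab for _, lab in rest}
        }
        predicted = min(centroids, key=lambda lab: dist2(p, centroids[lab]))
        hits += predicted == true_label
    return hits / len(points)


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

# Unsupervised step: no labels are given, only two starting centroids.
labels, centroids = kmeans(points, [points[0], points[3]])

# Supervised step: the discovered groups become the target to predict.
accuracy = leave_one_out_accuracy(points, labels)
print(labels, accuracy)  # [0, 0, 0, 1, 1, 1] 1.0
```

High agreement only suggests that the discovered groups are well separated in this toy data; on real data a held-out sample or independent labels would give a more convincing confirmation.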

Examples of unsupervised methods include Factor Analysis, Principal Component Analysis, Hierarchical Clustering, K-Nearest Neighbor (more commonly used as a supervised classifier), K-Means Clustering, Two-Step Clustering, Kohonen Networks (Self-Organizing Maps), Anomaly Detection, and Feature Selection algorithms [3].

Examples of supervised methods include Chi-Square Automatic Interaction Detector (CHAID), Exhaustive CHAID (E-CHAID), Classification and Regression Tree (CRT), Quick, Unbiased, Efficient Statistical Tree (QUEST), C5.0, Artificial Neural Networks, Linear Regression Analysis, Logistic Regression models, Association Rules, Generalized Rule Induction (GRI), Apriori, and CARMA algorithms [3].

New methods and algorithms are added to the many already used in data mining almost every day. Some are mainly statistical methods, classical techniques that have been in use for decades. Others are usually also grounded in statistics, but are mostly next-generation methods supported by machine learning and artificial intelligence.

Data mining models are basically divided into three groups according to the functions they perform:

1- Classification and Regression,
2- Clustering,
3- Association Rules.
Classification and regression models are predictive; clustering and association rule models are descriptive.
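As a small illustration of the descriptive side, the sketch below computes the two standard measures behind association rules, support and confidence, on a handful of toy shopping baskets. The baskets and function names are invented for the example. Algorithms such as Apriori use a minimum support threshold to prune candidate itemsets; this sketch only computes the measures for one rule.

```python
# Sketch of association rule measures on toy baskets (illustrative data).

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule under test: {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2 of 3 bread baskets
print(rule_support, round(rule_confidence, 3))  # 0.5 0.667
```

The rule describes a pattern already present in the data rather than predicting a target value, which is why association rule models are counted as descriptive here.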

You can find descriptions of these models in the next post …

[1] Zhong, N. and Zhou, L., “Methodologies for knowledge discovery and data mining”, Third Pacific-Asia Conference (PAKDD-99), Beijing, China (1999).

[2] Kaya, H. and Köymen, K., “Veri madenciliği kavramı ve uygulama alanları”, Doğu Anadolu Bölgesi Araştırmaları, 159–164 (2008).

[3] Albayrak, A.S. and Akbulut, R., “Sermaye yapısını belirleyen faktörler: İMKB sanayi ve hizmet sektörlerinde işlem gören işletmeler üzerine bir inceleme”, Dumlupınar Üniversitesi Sosyal Bilimler Dergisi, 22: 22 (2008).
