Four Stages Of Data Mining


A. Data Mining Framework
The general framework of Knowledge Discovery and Data Mining consists of four main stages:

1. Data gathering:
This stage consists of gathering all available information on students. The factors that can affect students’ performance must be identified and collected from the different sources of available data, and all the information must then be integrated into a single dataset.
2. Data pre-processing:
At this stage the dataset is prepared so that data mining techniques can be applied to it. Traditional pre-processing methods such as data cleaning, transformation of variables, and data partitioning have to be applied.
3. Data mining:
Different data mining algorithms are applied to the dataset. The …
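
The pre-processing methods named in stage 2 (data cleaning, transformation of variables, and data partitioning) can be sketched roughly as follows. This is only a minimal illustration in plain Python: the student records, the field names (`attendance`, `avg_score`, `dropout`), the discretization thresholds, and the 60/40 split are all assumptions made for the example and do not come from the framework described above.

```python
import random

# Hypothetical student records; field names and values are illustrative only.
raw_records = [
    {"attendance": 0.95, "avg_score": 8.1, "dropout": "no"},
    {"attendance": 0.40, "avg_score": 3.2, "dropout": "yes"},
    {"attendance": None, "avg_score": 6.0, "dropout": "no"},  # missing value
    {"attendance": 0.70, "avg_score": 5.5, "dropout": "no"},
    {"attendance": 0.30, "avg_score": 2.0, "dropout": "yes"},
    {"attendance": 0.85, "avg_score": 7.4, "dropout": "no"},
]


def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def discretize_score(score):
    """Transformation: map a numeric score to a label (thresholds are assumed)."""
    if score < 5.0:
        return "low"
    if score < 7.0:
        return "medium"
    return "high"


def transform(records):
    """Apply the score discretization to every record, leaving other fields alone."""
    return [{**r, "avg_score": discretize_score(r["avg_score"])} for r in records]


def partition(records, test_fraction=0.4, seed=0):
    """Partitioning: shuffle a copy, then split it into training and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


prepared = transform(clean(raw_records))
train_set, test_set = partition(prepared)
print(len(train_set), "training records,", len(test_set), "test records")
```

Dropping incomplete records is the simplest cleaning strategy; replacing missing values with an estimate would be an equally valid choice, depending on how much data is available.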

In [1], white-box classification techniques are used to predict dropouts. Decision trees, rule induction algorithms, and evolutionary algorithms are the main “white box” classification techniques. White-box classification algorithms produce models that can explain their predictions at a higher level of abstraction by means of IF-THEN rules. A decision tree is a set of conditions organized in a hierarchical structure. An instance is classified by following the path of satisfied conditions from the root of the tree until a leaf is reached; the leaf corresponds to a class label. Rule induction algorithms usually employ a specific-to-general approach, in which the obtained rules are generalized until a satisfactory description of each class is …
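
The way a decision tree classifies an instance, and the way each root-to-leaf path reads as an IF-THEN rule, can be sketched as follows. The tree here is written by hand rather than learned from data, and its attributes, thresholds, and class labels are hypothetical; it is not the model obtained in [1].

```python
# A hand-written toy tree; attributes, thresholds, and labels are hypothetical.
# Internal nodes test one condition; leaves carry a class label.
tree = {
    "attribute": "attendance", "threshold": 0.5,
    "below": {"label": "dropout"},
    "at_or_above": {
        "attribute": "avg_score", "threshold": 5.0,
        "below": {"label": "dropout"},
        "at_or_above": {"label": "continue"},
    },
}


def classify(node, instance):
    """Follow the satisfied conditions from the root until a leaf is reached."""
    while "label" not in node:
        value = instance[node["attribute"]]
        node = node["below"] if value < node["threshold"] else node["at_or_above"]
    return node["label"]


def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if "label" in node:
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {node['label']}"]
    attr, thr = node["attribute"], node["threshold"]
    return (
        to_rules(node["below"], conditions + (f"{attr} < {thr}",))
        + to_rules(node["at_or_above"], conditions + (f"{attr} >= {thr}",))
    )


student = {"attendance": 0.8, "avg_score": 4.2}
print(classify(tree, student))
for rule in to_rules(tree):
    print(rule)
```

Because every prediction corresponds to one readable rule, a model of this kind can be inspected directly, which is what the term “white box” refers to.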

Genetic Programming (GP) is a machine learning technique that optimizes a population of computer programs according to a fitness function, which measures each program’s ability to perform a given computational task. Genetic programming has been applied successfully to various complex optimization, search, and classification problems. The Interpretable Classification Rule Mining (ICRM) algorithm employs Grammar-based Genetic Programming (G3P) to evolve rule-based classifiers and has been shown to perform well in other classification domains. G3P is a variant of genetic programming in which a grammar is defined and the evolutionary process ensures that every individual generated is legal with respect to that grammar. The algorithm iterates to find the best rules for the different classes. A context-free grammar specifies which relational operators are allowed to appear in the antecedents of the rules and which attribute must appear in the consequent, that is, the class. The use of a grammar provides expressiveness, flexibility, and the ability to restrict the search space when searching for rules. Implementing constraints through a grammar is a very natural way to express the syntax of rules. The rules must be constructed