Evaluation of Data Mining Classification Models

Abdalla M EL-HABIL, Mohammed El-Ghareeb


Abstract: This paper identifies and evaluates data mining algorithms that are commonly used for supervised classification. Decision tree, neural network, Support Vector Machine (SVM), Naive Bayes, and k-Nearest Neighbor classifiers are first evaluated in a simulation study and then applied to three datasets to predict the class membership of binary (2-class) and multi-class categorical dependent variables. The datasets differ in size (relatively large and relatively small), in the type of predictors (ordinal, numeric, and categorical), and in the number of classes of the categorical dependent variable. Classification performance was estimated with hold-out and 10-fold cross-validation and compared empirically in terms of overall classification accuracy. We found some differences in accuracy among the classifiers under both validation methods when classifying a binary dependent variable in a relatively large dataset, a 3-class dependent variable in a relatively small dataset, and a 7-class dependent variable in a relatively small dataset. The SVM classifier gave the highest average classification accuracy under both validation methods across these datasets. Overall, we conclude that SVM, neural networks, and k-Nearest Neighbor gave the highest average classification accuracy, and that 10-fold cross-validation increased the classifiers' accuracies. These results approximately match those of the simulation study.

Key words: Data mining classification - Decision tree - Neural networks - Support Vector Machine (SVM) - Naive Bayes - k-Nearest Neighbor - Hold-out validation - 10-fold cross-validation - Bootstrapping - Confusion matrix
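The two validation schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the synthetic two-class dataset, the choice of a k-Nearest Neighbor classifier with k = 3, the 30% hold-out fraction, and every function name below (make_data, knn_predict, holdout_accuracy, kfold_accuracy) are assumptions made for illustration only. It uses the Python standard library alone.

```python
import random


def make_data(n_per_class=60, seed=0):
    """Synthetic 2-class data: two Gaussian blobs in 2-D (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            x = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((x, label))
    rng.shuffle(data)  # mix the classes before splitting
    return data


def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - x[0]) ** 2 + (row[0][1] - x[1]) ** 2,
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train, test, k=3):
    """Overall classification accuracy: share of test rows predicted correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def holdout_accuracy(data, test_fraction=0.3, k=3):
    """Hold-out: one split, the first test_fraction of rows is the test set."""
    n_test = int(len(data) * test_fraction)
    return accuracy(data[n_test:], data[:n_test], k)


def kfold_accuracy(data, folds=10, k=3):
    """k-fold cross-validation: each fold is the test set once; scores are averaged."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        scores.append(accuracy(train, test, k))
    return sum(scores) / folds


data = make_data()
print("hold-out accuracy:", holdout_accuracy(data))
print("10-fold CV accuracy:", kfold_accuracy(data))
```

In the hold-out scheme each observation is used either for training or for testing, whereas in 10-fold cross-validation every observation is tested exactly once and the reported accuracy is the mean over the ten folds.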


Copyright (c) 2015 IUG Journal for Natural and Engineering Studies