Abstract - X-Validation is used to perform a cross-validation process. The input given to it is split into a number of subsets determined by the number of validations parameter. The inner subprocesses are applied that many times, each time using one subset as the input of the Testing subprocess and the remaining subsets as the input of the Training subprocess. The Training subprocess must return a model, which is usually trained on its input. The Testing subprocess must return a Performance Vector, which is generated by applying the model and measuring its performance. Objects can be passed from the Training to the Testing subprocess through the dedicated ports. The performance calculated by this estimation scheme is only an estimate of the performance achieved by the model built on the complete delivered data set, not an exact calculation. In this paper, we compare different classification algorithms in RapidMiner and conclude which gives the most accurate result for the particular data set.
Index Terms - RapidMiner, X-Validation, K-NN, Default Model, Decision Tree, ID3.
I. INTRODUCTION
Over the last few years Data Mining has become more and more popular. It has become especially popular in the fields of fraud analysis and healthcare, as it reduces costs in both time and money. Classification consists of predicting a certain outcome based on a given input. In order to predict the outcome, the algorithm processes a training set containing a set of attributes and the respective outcome, usually called the goal or prediction attribute. In RapidMiner, each process must contain exactly one root operator; this operator provides a set of parameters that are of global relevance to the process, such as logging and initialization of the random number generator.
II. CROSS-VALIDATION
The X-Validation operator performs a cross-validation to estimate the statistical performance of a learning operator, usually on unseen data. It is mainly used to estimate how accurately a model (learnt by a particular learning operator) will perform in practice. X-Validation is a nested operator with two subprocesses: a training subprocess, used for training a model, and a testing subprocess, in which the trained model is applied. The performance of the model is also measured during the testing phase.
The given input data set is partitioned into k subsets of equal size. Of the k subsets, a single subset is retained as the testing data set (i.e. the input of the testing subprocess), and the remaining k-1 subsets are used as the training data set (i.e. the input of the training subprocess). This process is repeated k times, with each of the k subsets used exactly once as the testing data. The k results from the k iterations can then be averaged to produce a single estimate. The value of k can be adjusted using the number of validations parameter.
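This split-train-test-average scheme can be sketched in a few lines of plain Python. This is an illustrative sketch only, not RapidMiner's implementation; the `train` and `test` callables stand in for the training and testing subprocesses:

```python
import random
from statistics import mean

def cross_validate(examples, train, test, k=10, seed=0):
    """Estimate performance by k-fold cross-validation.

    `train(subset)` must return a model; `test(model, subset)` must
    return a performance number such as accuracy.
    """
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k subsets of (nearly) equal size
    scores = []
    for i in range(k):
        testing = folds[i]                  # one subset held out for testing
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(training)
        scores.append(test(model, testing))
    return mean(scores)                     # average of the k results
```

Here the `k` argument plays the role of the number of validations parameter.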
If we test a model on an independent set of data, most of the time it does not perform as well on that testing data as it did on the data used to generate it. This is called 'over-fitting'. The Cross-Validation operator estimates the fit of a model to hypothetical testing data, which is especially useful when a separate testing data set is not available.
Fig. 1. X-Validation on dataset 'Golf'
In Fig. 1 we drag the dataset 'Golf' and the 'X-Validation' operator into the process. We then double-click on the Validation operator and calculate the performance for different classification algorithms.
III. TRAINING AND TESTING MODELS
A. K-NN - This operator generates a k-Nearest Neighbor model from the input ExampleSet. The model can be a classification or a regression model, depending on the input ExampleSet. The k-Nearest Neighbor algorithm is based on learning by analogy, that is, by comparing a given test example with training examples that are similar to it. The training examples are described by n attributes, so each example represents a point in an n-dimensional space, and all of the training examples are stored in this n-dimensional pattern space. When given an unknown example, the k-nearest neighbor algorithm searches the pattern space for the k training examples that are closest to it. These k training examples are the k "nearest neighbors" of the unknown example. "Closeness" is defined by a distance metric, for example Euclidean distance or Manhattan distance.
The basic k-Nearest Neighbor algorithm is composed of two steps: Find the k training examples that are closest to the unseen example. Take the most commonly occurring classification for these k examples (or, in the case of regression, take the average of these k label values).
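The two steps above can be sketched in stdlib Python. This is an illustration only; the list-of-pairs data layout is an assumption of the sketch, not RapidMiner's ExampleSet:

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `training` is a list of (point, label) pairs; 'closeness' here is
    Euclidean distance, but any distance metric could be substituted.
    """
    neighbors = sorted(training, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]  # most common label among the k
```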
Fig. 2. Training and testing for K-NN
Figure 2 shows the training and testing phase for the K-NN classification algorithm. On the left side we have the training phase and on the right side the testing phase. In the training phase, we drag in the K-NN and Apply Model operators; the Apply Model operator applies an already learnt or trained model to an ExampleSet. In the testing phase, we drag in the Performance operator. We then run the process and get the output shown in Fig. 3.
Fig. 3. Performance of K-NN using X-Validation
B. Default Model - The Default Model operator generates a model that predicts the specified default value for the label in all examples. The method used for generating the default value can be selected through the method parameter: the default value can be the median, mode or average of the label values. For a numeric label, a constant default value can be specified through the constant parameter. For a nominal label, the values of an attribute can be used as predictions; the attribute is selected through the attribute parameter. This operator should not be used for 'actual' prediction tasks, but it can be used for comparing the results of 'actual' learning schemes with guessing.
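The behaviour of such a constant predictor is easy to sketch in stdlib Python (illustrative only; the method names mirror the parameter values described above):

```python
from collections import Counter
from statistics import mean, median

def default_model(labels, method="mode"):
    """Build a model that predicts one default value for every example.

    `method` is "mode" (most frequent value, for nominal labels) or
    "average"/"median" (for numeric labels).
    """
    if method == "mode":
        default = Counter(labels).most_common(1)[0][0]
    elif method == "average":
        default = mean(labels)
    elif method == "median":
        default = median(labels)
    else:
        raise ValueError(f"unknown method: {method}")
    return lambda example: default  # ignores the example entirely
```

Because the prediction ignores the example, such a model is only a baseline for comparing 'actual' learners against guessing.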
Fig. 4. Training and Testing for Default Model
Figure 4 shows the training and testing phase for the Default Model. In the training phase we drag in the 'Default Model' and 'Apply Model' operators, and in the testing phase the 'Performance' operator. We then run the process and get the output shown in Fig. 5.
Fig. 5. Performance of Default Model using X-Validation
C. Decision Tree - This operator learns a decision tree, and it can be applied on both nominal and numerical data sets. The resulting tree can be used for classifying unseen examples. The related Decision Stump operator generates a decision tree with only one single split; it can be very efficient when boosted with operators like AdaBoost. The examples of the given ExampleSet have several attributes, and every example belongs to a class (like yes or no). The leaf nodes of a decision tree carry the class names, whereas a non-leaf node is a decision node: an attribute test with each branch (to another subtree) being a possible value of that attribute.
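The one-split decision stump mentioned above can be sketched in stdlib Python. This is an assumption-laden illustration, not the operator's implementation: examples are dicts of nominal attributes, and the split attribute is chosen by training accuracy:

```python
from collections import Counter, defaultdict

def decision_stump(examples, attributes):
    """Learn a one-split decision tree (a 'decision stump') on nominal data.

    `examples` is a list of (attribute_dict, label). For each candidate
    attribute, the stump predicts the majority label of each branch; the
    attribute whose stump classifies the training data best is kept.
    """
    def stump_for(attr):
        branches = defaultdict(list)
        for features, label in examples:
            branches[features[attr]].append(label)
        rule = {value: Counter(lbls).most_common(1)[0][0]
                for value, lbls in branches.items()}
        correct = sum(rule[f[attr]] == lbl for f, lbl in examples)
        return rule, correct

    best = max(attributes, key=lambda a: stump_for(a)[1])
    rule, _ = stump_for(best)
    # fall back to the overall majority label for unseen attribute values
    fallback = Counter(lbl for _, lbl in examples).most_common(1)[0][0]
    return lambda features: rule.get(features[best], fallback)
```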
Fig. 6. Training and Testing for Decision Tree
Figure 6 shows the training and testing phase for the Decision Tree. In the training phase we drag in the 'Decision Tree' and 'Apply Model' operators, and in the testing phase the 'Performance' operator. We then run the process and get the output shown in Fig. 7.
Fig. 7. Performance of Decision Tree using X-Validation
D. ID3 - This operator learns an unpruned decision tree from nominal data for classification. It works similarly to Quinlan's ID3. ID3 (Iterative Dichotomiser 3), invented by Ross Quinlan, builds a decision tree from a fixed set of examples; the resulting tree is used to classify future samples. The examples of the given ExampleSet have several attributes, and every example belongs to a class (like yes or no). ID3 uses a feature selection heuristic to decide which attribute goes into each decision node; the heuristic can be selected by the criterion parameter.
Some major benefits of ID3 are:
• Understandable prediction rules are created from the training data.
• It builds a short tree in relatively little time.
• It only needs to test enough attributes until all data is classified.
• Finding leaf nodes enables test data to be pruned, reducing the number of tests.
ID3 may also have some disadvantages, e.g.:
• If a small sample is tested, the data may be over-fitted or over-classified.
• Only one attribute at a time can be tested for making a decision.
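The feature selection heuristic most commonly associated with ID3 is information gain. A minimal stdlib-Python sketch of that criterion (illustrative only; the dict-based example layout is an assumption of the sketch):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attr):
    """Reduction in label entropy obtained by splitting on `attr`.

    `examples` is a list of (attribute_dict, label); ID3 places the
    attribute with the highest gain at the decision node.
    """
    labels = [lbl for _, lbl in examples]
    branch_sizes = Counter(f[attr] for f, _ in examples)
    remainder = sum(
        (count / len(examples))
        * entropy([lbl for f, lbl in examples if f[attr] == value])
        for value, count in branch_sizes.items())
    return entropy(labels) - remainder
```

A perfectly predictive attribute yields a gain equal to the full label entropy, while an uninformative one yields a gain of zero.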
Fig. 8. Training and Testing for ID3
Figure 8 shows the training and testing phase for ID3. In the training phase we drag in the 'ID3', 'Discretize' and 'Apply Model' operators, and in the testing phase the 'Performance' operator. The Discretize operator discretizes the selected numerical attributes into user-specified classes; the selected numerical attributes are changed to either nominal or ordinal attributes. We then run the process and get the output shown in Fig. 9.
Fig. 9. Performance of ID3 using X-Validation
E. Perceptron - The perceptron is a type of artificial neural network invented in 1957 by Frank Rosenblatt. It is the simplest kind of feed-forward neural network: a linear classifier. The single-layer perceptron is simply a linear classifier that is efficiently trained by a simple update rule: for all wrongly classified data points, the weight vector is either increased or decreased by the corresponding example values. An artificial neural network (ANN), usually called a neural network (NN), is a mathematical or computational model that is inspired by the structure and functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation (the central connectionist principle is that mental phenomena can be described by interconnected networks of simple and often uniform units). In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Modern neural networks are usually used to model complex relationships between inputs and outputs or to find patterns in data. A feed-forward neural network is an ANN in which connections between the units do not form a directed cycle: information moves in only one direction, forward, from the input nodes to the hidden nodes (if any) and then to the output nodes. There are no cycles (or loops) in the network. If you want to use a more sophisticated neural net, please use the Neural Net operator.
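The update rule described above ("increase or decrease the weight vector by the example values of wrongly classified points") can be sketched in stdlib Python; labels in {-1, +1} and the tuple-based data layout are assumptions of this sketch, not the operator's interface:

```python
def train_perceptron(examples, epochs=20, rate=1.0):
    """Train a single-layer perceptron with the simple update rule.

    `examples` is a list of (feature_vector, label) with labels in
    {-1, +1}; each misclassified point shifts the weight vector by the
    point's values, scaled by the learning rate and the label's sign.
    """
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in examples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # wrongly classified data point
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
    return lambda x: 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1
```

On linearly separable data the rule is guaranteed to converge; otherwise it stops after the fixed number of epochs.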
Fig. 10. Training and Testing for Perceptron
Figure 10 shows the training and testing phase for the Perceptron. In the training phase we drag in the 'Perceptron', 'Nominal to Numerical' and 'Apply Model' operators, and in the testing phase the 'Performance' operator. The Nominal to Numerical operator changes the type of non-numeric attributes to a numeric type; it not only changes the type of the selected attributes but also maps all their values to numeric values. Binary attribute values are mapped to 0 and 1, and numeric attributes of the input ExampleSet remain unchanged. The operator provides three modes for conversion from nominal to numeric, selected by the coding type parameter. We then run the process and get the output shown in Fig. 11.
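One such conversion mode, dummy coding, can be sketched in a few lines (illustrative only; the sorted column order is a choice of this sketch, not necessarily the operator's behaviour):

```python
def dummy_coding(values):
    """Map one nominal attribute to 0/1 indicator columns.

    Each distinct value becomes its own numeric column; every row gets
    a 1 in the column of its value and 0 elsewhere.
    """
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return rows, columns
```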
Fig. 11. Performance of Perceptron using X-Validation
F. Naive Bayes - The Naive Bayes operator generates a Naive Bayes classification model. A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. Put simply, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class (i.e. attribute) is unrelated to the presence (or absence) of any other feature. For example, if a fruit is round, red and about 4 inches in diameter, it may be considered an apple. Even if these features depend on each other or on the existence of other features, a Naive Bayes classifier considers all of these properties to contribute independently to the probability that the fruit is an apple. The advantage of the Naive Bayes classifier is that it requires only a small amount of training data to estimate the means and variances of the variables necessary for classification. Because the variables are assumed to be independent, only the variances of the variables for each label need to be determined, not the entire covariance matrix.
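For nominal attributes, the independence assumption lets the class score factor into a prior times a product of per-attribute probabilities. A stdlib-Python sketch (illustrative only; the Laplace smoothing and dict-based layout are choices of this sketch, not the operator's implementation):

```python
from collections import Counter, defaultdict

def naive_bayes(examples):
    """Fit a Naive Bayes classifier on nominal data.

    `examples` is a list of (attribute_dict, label); each attribute is
    assumed independent of the others given the class.
    """
    class_counts = Counter(lbl for _, lbl in examples)
    value_counts = defaultdict(Counter)  # (label, attr) -> value counts
    for features, label in examples:
        for attr, value in features.items():
            value_counts[(label, attr)][value] += 1

    def classify(features):
        def score(label):
            prob = class_counts[label] / len(examples)  # prior P(class)
            for attr, value in features.items():        # times each P(value | class)
                counts = value_counts[(label, attr)]
                # Laplace smoothing so unseen values do not zero the product
                prob *= (counts[value] + 1) / (class_counts[label] + len(counts) + 1)
            return prob
        return max(class_counts, key=score)

    return classify
```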
Fig. 12. Training and Testing for Naive Bayes
Figure 12 shows the training and testing phase for Naive Bayes. In the training phase we drag in the 'Naive Bayes' and 'Apply Model' operators, and in the testing phase the 'Performance' operator. We then run the process and get the output shown in Fig. 13.
Fig. 13. Performance of Naive Bayes using X-Validation
IV. RESULT AND ANALYSIS
We have applied X-Validation to the classification algorithms K-NN, Default Model, Decision Tree, ID3, Perceptron and Naive Bayes and calculated the accuracy for each. As we can see in the figures, the accuracy for K-NN is 15.00%, for the Default Model 35.00%, for the Decision Tree 25.00%, for ID3 20.00%, for the Perceptron 35.00% and for Naive Bayes 35.00%. We can therefore conclude that the Default Model, Perceptron and Naive Bayes achieve the best accuracy of all the classification algorithms on this data set.
V. CONCLUSION AND FUTURE SCOPE
In this way we can calculate and compare the performance of classification algorithms, as well as other algorithms, in RapidMiner. Here we have taken a small dataset and analyzed the result. In future work, we will take a larger dataset, calculate the accuracy for each algorithm and analyze the results accordingly.
REFERENCES
[1] Margaret H. Dunham, Data Mining: Introductory and Advanced Topics.
[2] M. Wurst, "The word vector tool and the RapidMiner text plug-in," http://wvtool.sf.net, accessed July 31, 2007.
[3] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2001, pp. 70-181.
[4] J. R. Quinlan, "Induction of Decision Trees," Machine Learning, 1986.
[5] R. Agrawal, T. Imielinski and A. Swami, "Database mining: A performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6.
[6] F. J. Provost and P. Domingos, "Tree induction for probability-based ranking," Machine Learning, 2003.
[7] J. M. Keller, M. R. Gray and J. A. Givens Jr., "A fuzzy k-nearest neighbor algorithm," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-15, no. 4.
[8] DING Xiang-wu and WANG Bin, "An Improved Pre-pruning Algorithm Based on ID3," Jisuanji Yu Xiandaihua, vol. 9.
[9] DAN Xiao-rong, CHEN Xuan-shu and LIU De-wei, "Research and Improvement on the Decision Tree Classification Algorithm of Data Mining," Software Guide, vol. 8.
[10] HUANG Ai-hui and CHEN Xiang-tao, "An improved ID3 Algorithm of Decision Trees," CSE, vol. 31.