Hindawi Publishing Corporation
Computational and Mathematical Methods in Medicine
Volume 2013, Article ID 487179, 8 pages
http://dx.doi.org/10.1155/2013/487179

Research Article

Determination of Fetal State from Cardiotocogram Using LS-SVM with Particle Swarm Optimization and Binary Decision Tree

Ersen Yılmaz and Çağlar Kılıkçıer

Electrical-Electronic Engineering Department, Uludag University, 16059 Gorukle, Bursa, Turkey

Correspondence should be addressed to Ersen Yılmaz; ersen@uludag.edu.tr

Received 26 June 2013; Accepted 6 September 2013

Academic Editor: Damien R. Hall

Copyright © 2013 E. Yılmaz and Ç. Kılıkçıer. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We use a least squares support vector machine (LS-SVM) utilizing a binary decision tree for classification of the cardiotocogram to determine the fetal state. The parameters of the LS-SVM are optimized by particle swarm optimization. The robustness of the method is examined by running 10-fold cross-validation. The performance of the method is evaluated in terms of overall classification accuracy. Additionally, receiver operating characteristic analysis and cobweb representation are presented in order to analyze and visualize the performance of the method. Experimental results demonstrate that the proposed method achieves a classification accuracy of 91.62%.

1. Introduction

There is a growing tendency to use clinical decision support systems in medical diagnosis. These systems help to optimize medical decisions, improve medical treatments, and reduce financial costs [1, 2]. A large number of medical diagnosis procedures can be converted into intelligent data classification tasks. These classification tasks can be categorized as two-class tasks and multiclass tasks. The first type separates the data between only two classes, while the second type involves the classification of the data into more than two classes [3].

Cardiotocography was introduced into obstetrics practice in the early 1970s, and since then it has been used worldwide for antepartum (before delivery) and intrapartum (during delivery) fetal monitoring. A cardiotocogram (CTG) is a recording of two distinct signals, fetal heart rate (FHR) and uterine activity (UA) [4]. It is used for determining the fetal state during both pregnancy and delivery. The aim of CTG monitoring is to identify babies who may be short of oxygen (hypoxic), so that further assessments of fetal condition may be performed or the baby may be delivered by caesarean section or natural birth [5]. The visual evaluation of the CTG not only requires time but also depends on the knowledge and clinical experience of obstetricians.

A clinical decision support system eliminates the inconsistency of visual evaluation. Several classification tools have been proposed for developing such a system [4, 6–10]. One of these tools is the support vector machine (SVM), which is used in [4, 8, 10]. In [4, 8], SVM is used for FHR signal classification with two classes, normal or at risk. The risk of metabolic acidosis for newborns is predicted from the FHR signal in [4], while antepartum FHR signals are classified in [8]. In [10], a medical decision support system based on SVM and a genetic algorithm (GA) is presented for the evaluation of fetal well-being from CTG recordings as normal or pathologic.

In [6], an approach based on hidden Markov models (HMM) is presented for automatic classification of FHR signals belonging to hypoxic and normal newborns. In [7], an ANBLIR (Artificial Neural Network Based on Logical Interpretation of fuzzy if-then Rules) system is used to evaluate the risk of low fetal birth weight as normal or abnormal using CTG signals recorded during pregnancy. In [9], an adaptive neurofuzzy inference system (ANFIS) is proposed for the prediction of fetal state from CTG recordings as normal or pathologic.

The SVM was developed for two-class tasks, but classification problems generally involve multiple classes. Several methods based on the binary decision tree (BDT) have been proposed in the literature to extend binary SVMs to multiclass problems, for example, [11, 12].

LS-SVM is a modified version of SVM in a least squares sense [13]. LS-SVM overcomes the higher computational load of SVM because it solves a set of linear equations, whereas SVM solves a quadratic programming problem.

The choice of an appropriate kernel function and of the model parameters (including kernel parameters) is crucial for SVM-based methods, since it directly influences the classification performance. The most common kernel functions used in the literature are polynomial, Gaussian radial basis, exponential radial basis, and sigmoid.

Performance evaluation of classifiers is a fundamental step for determining the best classifier or the best set of parameters for a classifier [14]. In general, the overall classification accuracy is a natural way to measure the performance of classifiers. The classifier predicts the class of each data point in the data set; a correct prediction is counted as a success and a wrong one as an error. The overall classification accuracy is computed as the ratio of the number of successes to the total number of data points to be classified.

For many classification problems, especially in medical diagnosis, the overall classification accuracy alone is not adequate because in general not all errors have the same consequences. Wrong diagnoses can cause different costs and dangers depending on which kind of mistake has been made [15]. Therefore, in such situations, receiver operating characteristic (ROC) analysis is usually performed in addition to overall classification accuracy [16].

In this paper, we use LS-SVM utilizing a BDT for classification of the CTG data to determine the fetal state as normal, suspect, or pathologic. The Gaussian radial basis function is chosen as the kernel of the LS-SVM, and the model parameters, which are the penalty factor and the width of the Gaussian kernel, are optimized by using particle swarm optimization (PSO). The robustness of the proposed method, LS-SVM-PSO-BDT, is examined with 10-fold cross-validation (10-fold CV) on the CTG data set taken from the UCI Machine Learning Repository. The performance of the method is evaluated in terms of overall classification accuracy. Additionally, ROC analysis and cobweb representation are presented in order to analyze and visualize the performance of the method.

2. Support Vector Machine (SVM)

SVM is a powerful supervised learning algorithm based on statistical learning theory that has been widely used for solving a wide range of data classification problems since it was first introduced by Boser et al. [17]. SVM builds a hyperplane separating the data points into two different classes with a maximum margin.

Given a training set of $N$ data points $(x_i, y_i)$, $x_i \in R^p$, and $y_i \in \{-1, +1\}$, where $x_i$ is a data point and $y_i$ is the corresponding class label, SVM requires the solution of the following primal optimization problem:

$$\min_{w, b, \xi} J(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \tag{1}$$

$$\text{subject to } y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \quad i = 1, \ldots, N,$$

where $w$ is the normal vector to the hyperplane, $b$ is the bias or offset scalar, $\xi_i$ are the slack parameters which are used to allow soft margins, $C$ is the penalty parameter which controls the trade-off between minimizing the error and maximizing the margin, and $\varphi(x_i)$ is a nonlinear mapping from the input space to the higher dimensional feature space [4, 8, 13, 17, 18].

The corresponding dual problem of (1) is given by

$$\max_{\alpha} J(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{2}$$

$$\text{subject to } \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad \forall i,$$

where $\alpha_i$ are Lagrange multipliers and the term $K(x_i, x_j)$ is a kernel function representing the inner product of two vectors in the feature space, that is, $\varphi^T(x_i)\varphi(x_j)$. The kernel function must satisfy the well-known Mercer's condition. The data points for which $\alpha_i > 0$ are called support vectors, which construct the following decision function [4, 8, 13, 17, 18]:

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} y_i \alpha_i K(x, x_i) + b \right), \tag{3}$$

where $b = -(1/2) \sum_{i=1}^{N} y_i \alpha_i \left( K(x_+, x_i) + K(x_-, x_i) \right)$, and $x_+$ and $x_-$ are two arbitrary support vectors from different classes $y_i \in \{-1, +1\}$ [17].

3. Least Squares SVM (LS-SVM)

LS-SVM was originally proposed by Suykens and Vandewalle as a modification of the SVM regression formulation [13]. The idea behind the modification is to transform the problem from a quadratic programming problem into the solution of a set of linear equations.

The optimization problem is modified as follows:

$$\min_{w, b, e} J_{LS}(w, e) = \frac{1}{2}\|w\|^2 + \frac{1}{2} \gamma \sum_{i=1}^{N} e_i^2 \tag{4}$$

$$\text{subject to } y_i \left( w^T \varphi(x_i) + b \right) = 1 - e_i, \quad i = 1, \ldots, N,$$

where $\gamma$ and $e_i$ are similar to the penalty parameter $C$ and the slack variable $\xi_i$ of SVM, respectively. Two modifications can be seen in (4): first, the inequality constraints are replaced by equality constraints, and second, a squared loss function is taken for $e_i$. These modifications significantly simplify the problem [19].

To solve the optimization problem in (4), the Lagrangian function is defined as

$$L_{LS}(w, b, e; \alpha) = J_{LS}(w, e) - \sum_{i=1}^{N} \alpha_i \left\{ y_i \left[ w^T \varphi(x_i) + b \right] - 1 + e_i \right\}, \tag{5}$$

where $\alpha_i$ are Lagrange multipliers, which can be positive or negative due to the equality constraints. According to the optimality conditions, we get

$$\begin{aligned}
\frac{\partial L_{LS}}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i), \\
\frac{\partial L_{LS}}{\partial b} = 0 &\;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0, \\
\frac{\partial L_{LS}}{\partial e_i} = 0 &\;\Rightarrow\; \alpha_i = \gamma e_i, \quad i = 1, \ldots, N, \\
\frac{\partial L_{LS}}{\partial \alpha_i} = 0 &\;\Rightarrow\; y_i \left[ w^T \varphi(x_i) + b \right] - 1 + e_i = 0, \quad i = 1, \ldots, N.
\end{aligned} \tag{6}$$

Defining $Z = [\varphi(x_1)^T y_1; \ldots; \varphi(x_N)^T y_N]$, $Y = [y_1; \ldots; y_N]$, $I = [1; \ldots; 1]$, $e = [e_1; \ldots; e_N]$, and $\alpha = [\alpha_1; \ldots; \alpha_N]$, and eliminating $w$ and $e$, a linear Karush-Kuhn-Tucker system is obtained as in (7) [13]:

$$\begin{bmatrix} 0 & -Y^T \\ Y & \Omega + \gamma^{-1} I_N \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ I \end{bmatrix}, \tag{7}$$

where $I_N$ denotes the $N \times N$ identity matrix (as distinct from the vector of ones $I$), $\Omega = Z Z^T$, and Mercer's condition can be applied to the matrix $\Omega$:

$$\Omega_{i,j} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j), \quad i, j = 1, \ldots, N. \tag{8}$$

The LS-SVM classifier takes the form in (9), which is similar to the SVM case in (3), and is found by solving the linear set of equations in (7):

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} y_i \alpha_i K(x, x_i) + b \right). \tag{9}$$

4. Particle Swarm Optimization (PSO)

PSO is a swarm intelligence based optimization method proposed by Kennedy and Eberhart, inspired by the social behavior of bird flocking and fish schooling [20]. In PSO, the procedure begins with an initialization step in which a population (swarm) of possible solutions (particles) is chosen in the search space; the swarm then searches for the optimum solution by updating the particles over generations.

The particles are updated iteratively by using the following equations:

$$\begin{aligned}
V_i^{k+1} &= \omega V_i^k + c_1 r_1^k \left( P_i^k - \lambda_i^k \right) + c_2 r_2^k \left( G^k - \lambda_i^k \right), \\
\lambda_i^{k+1} &= \lambda_i^k + V_i^{k+1},
\end{aligned} \tag{10}$$

where $\lambda_i = [\lambda_{i1}, \ldots, \lambda_{iM}]$ and $V_i = [V_{i1}, \ldots, V_{iM}]$ are the current position and the velocity of the $i$th particle in $M$-dimensional space, and $G = [G_1, \ldots, G_M]$ and $P_i = [P_{i1}, \ldots, P_{iM}]$ are the best position of the swarm and the best position of the $i$th particle, respectively.

The value of the inertia weight $\omega$ is a trade-off between global search and local search. A bigger inertia weight allows the particles to explore new areas of the search space (global search), while a smaller value lets the particles move within the current search area for fine tuning (local search). The cognitive and social learning factors $c_1$ and $c_2$ are positive constants, and $r_1$ and $r_2$ are random numbers in the range [0, 1] [20, 21].

5. Binary Decision Tree (BDT)

A BDT architecture for classification of data sets with $R$ classes requires $R - 1$ classifiers. The architecture for classification of a data set with $R$ classes is shown in Figure 1. There is a classifier at each node in the tree to make a binary decision.

Figure 1: BDT architecture for classification of data set with $R$ classes. (Tree diagram: the root classifier splits the classes $1, 2, \ldots, R$ into $1, \ldots, R/2$ and $(R/2) + 1, \ldots, R$; a classifier at each lower node halves its subset again, down to the single classes $1, 2, \ldots, R - 1, R$ at the leaves.)

6. Cross-Validation (CV)

CV is one of the most commonly used statistical methods for evaluating and comparing learning algorithms by separating the data set into two sets, training and testing. In CV, the training and testing sets must cross over in successive rounds, so that each data point has a chance of being validated against [22].

The general form of CV is $k$-fold CV, in which the data set is divided into $k$ groups of (almost) equal size and $k$ iterations are made. In each iteration step, one of the $k$ groups is used for testing and the remaining $k - 1$ groups are used for training.

7. ROC Analysis

ROC analysis has been used as a standard tool for the design, optimization, and evaluation of two-class classifiers [23]. In ROC analysis with two classes, the notation given in Table 1 is used for the confusion matrix [24].

Table 1: Confusion matrix.

| Predicted \ Actual | Positive | Negative |
|---|---|---|
| Positive | TP (true positive) | FP (false positive) |
| Negative | FN (false negative) | TN (true negative) |

ROC analysis investigates and employs the relationship between sensitivity and specificity of two-class classifiers while the decision threshold varies [25]. Sensitivity is the true positive rate while specificity is the true negative rate, and they are defined as TP/(TP+FN) and TN/(TN+FP), respectively [24].

A ROC curve represents the performance of a classifier in a two-dimensional graph, and conventionally the true positive rate is plotted against the false positive rate [25]. Detailed information about ROC analysis can be found in [23–28].

The extension of ROC analysis to more than two classes has been studied extensively in the literature [15, 23, 27, 29, 30]. For $R$ classes, the confusion matrix is an $R \times R$ matrix whose diagonal entries contain the $R$ correct classifications while its off-diagonal entries contain the $R^2 - R$ possible errors. Therefore, generating ROC curves for visualizing the performance of a classifier becomes difficult as the number of classes increases; for example, a six-dimensional space is required for three classes. Recently, the cobweb representation has been used to visualize the performance of classifiers as a multiclass version of ROC analysis [30].

8. Cobweb Representation

The cobweb representation is generated by using the misclassification ratios of the confusion ratio matrix, which is the column-normalized version of the confusion matrix. Let us consider a chance classification with $R$ classes. The confusion ratio matrix has $R^2 - R$ misclassification rates, which are all equal to $1/R$. Misclassification rates of $1/R$ mean that, when confronted with a data point from one of the classes, the classifier treats it as having the same chance of being from any of the $R$ classes. A polygon with $R^2 - R$ equal sides can be formed to map the misclassification rates of the confusion ratio matrix. This polygon (the chance polygon) is used to compare the performance of any classifier with the chance classifier in terms of misclassification rates. Any polygon within the chance polygon shows a better performance than chance. For a chance classification with three classes, the misclassification rates are (0.33, 0.33, 0.33, 0.33, 0.33, 0.33), and the chance polygon becomes a hexagon as given in Figure 2 [30, 31].

9. CTG Data Set

The CTG data set used in this study is taken from the UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/datasets/Cardiotocography] (last accessed: June, 2013), and the details can be found in [32]. This data set has 2126 data points from three classes representing the fetal state as normal, suspect, or pathologic. All data points have 21 features, and these features are listed in Table 2.

10. Proposed LS-SVM-PSO-BDT Method

The proposed LS-SVM-PSO-BDT method for fetal state determination is described in this section. Its architecture is given in Figure 3.
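To make the LS-SVM training step concrete, the following is a minimal sketch of how the linear system (7) with the entries (8) could be assembled and solved, and how the classifier (9) could then be evaluated with a Gaussian kernel. It is an illustration only, not the authors' MATLAB code: the function names `rbf_kernel`, `solve_linear`, `lssvm_train`, and `lssvm_predict`, the use of plain Gaussian elimination, and the toy data in the usage note are all assumptions made for this example.

```python
import math


def rbf_kernel(x, z, sigma2):
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma2)); sigma2 is assumed to be sigma squared."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))


def solve_linear(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting (illustrative helper)."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix [A | rhs]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - s) / M[r][r]
    return u


def lssvm_train(X, y, gamma, sigma2):
    """Assemble and solve system (7); returns the bias b and the multipliers alpha."""
    n = len(X)
    A = [[0.0] + [-float(yi) for yi in y]]  # first block row: [0, -Y^T]
    rhs = [0.0]
    for i in range(n):
        row = [float(y[i])]  # first block column: Y
        for j in range(n):
            omega_ij = y[i] * y[j] * rbf_kernel(X[i], X[j], sigma2)  # entry (8)
            row.append(omega_ij + (1.0 / gamma if i == j else 0.0))  # Omega + I / gamma
        A.append(row)
        rhs.append(1.0)  # vector of ones on the right-hand side
    u = solve_linear(A, rhs)
    return u[0], u[1:]


def lssvm_predict(x, X, y, b, alpha, sigma2):
    """Decision function (9): sign of the kernel expansion plus the bias."""
    s = b + sum(yi * ai * rbf_kernel(x, xi, sigma2) for xi, yi, ai in zip(X, y, alpha))
    return 1 if s >= 0 else -1
```

As a usage example under the same assumptions, two made-up clusters such as `X = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]` with labels `y = [-1, -1, -1, 1, 1, 1]` can be passed to `lssvm_train(X, y, gamma=10.0, sigma2=1.0)`, and new points are then labelled with `lssvm_predict`. For data sets of the size used in the paper, a library linear solver would normally replace the hand-written elimination.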
Figure 2: Misclassification cobweb for a chance classification with three classes. (Radar plot of misclassification rates, true state → decision state, with six axes: Class1 → Class2, Class1 → Class3, Class2 → Class1, Class2 → Class3, Class3 → Class1, and Class3 → Class2; radial scale from 0 to 0.35; the plotted hexagon is the chance polygon.)

Table 2: Features used for determining the fetal state.

| Feature | Description |
|---|---|
| LB | FHR baseline (beats per minute) |
| AC | Number of accelerations per second |
| FM | Number of fetal movements per second |
| UC | Number of uterine contractions per second |
| DL | Number of light decelerations per second |
| DS | Number of severe decelerations per second |
| DP | Number of prolonged decelerations per second |
| ASTV | Percentage of time with abnormal short term variability |
| MSTV | Mean value of short term variability |
| ALTV | Percentage of time with abnormal long term variability |
| MLTV | Mean value of long term variability |
| Width | Width of FHR histogram |
| Min | Minimum (low frequency) of FHR histogram |
| Max | Maximum (high frequency) of FHR histogram |
| Nmax | Number of histogram peaks |
| Nzeros | Number of histogram zeros |
| Mode | Histogram mode |
| Mean | Histogram mean |
| Median | Histogram median |
| Variance | Histogram variance |
| Tendency | Histogram tendency |

Figure 3: The proposed method's architecture. (Tree diagram: the root node holds the normal, suspect, and pathologic data; LS-SVM 1, tuned by PSO, separates normal from suspect and pathologic; LS-SVM 2, tuned by PSO, separates suspect from pathologic.)

There are two nodes in the BDT because the CTG data has three classes. A Gaussian radial basis function, given in (11), is chosen as the kernel function of the LS-SVMs:

$$K(x, x_i) = \exp\left( -\frac{1}{2\sigma^2} \|x - x_i\|_2^2 \right), \tag{11}$$

where $\sigma$ is the width of the kernel.

The LS-SVM parameters, namely the penalty factor $\gamma$ and the kernel width parameter $\sigma^2$, are optimized by using PSO.

The training procedure of the method is summarized in the following sequential steps.

Step 1. Training data points are put into the root node and divided into two groups, PS (pathologic and suspect) and Nr (normal).

Step 2. LS-SVM 1 is trained on the data points in the root node to classify the data points as PS or Nr. Meanwhile, the LS-SVM 1 parameters are optimized by using PSO.

Step 3. LS-SVM 2 is trained on the data points in the subnode PS to classify the data points as P (pathologic) or S (suspect). Meanwhile, the LS-SVM 2 parameters are optimized by using PSO.

In the first step, pathologic and suspect data points are combined in one group, instead of combining normal and suspect data points, in order to minimize the risk of making decisions that cause abnormalities in babies.

11. Experimental Results and Discussions

The proposed method LS-SVM-PSO-BDT is used for the classification of the CTG data set taken from the UCI Machine Learning Repository.

In order to validate the robustness of the method, a 10-fold CV procedure is performed. The entire data set is randomly divided into ten subsets of approximately equal size while keeping the proportion of data points from different classes in each subset roughly the same as that in the whole data set. In each fold, one subset is left out for testing, and the union of the remaining nine subsets is used for training. Thus, after ten folds, each subset has been used once for testing. The final result is the average result of these ten folds.

In the experiment, the parameters for LS-SVM-PSO-BDT are set as follows. Twenty-five particles are used in the PSOs. The initial values of the 25 particles for the penalty factor $\gamma$ and the kernel width parameter $\sigma^2$ are chosen on the intervals $\gamma, \sigma^2 \in [2^{-4}, 2^{12}]$. The inertia weight and the cognitive and social learning factors of the PSOs are chosen as $\omega = 0.75$, $c_1 = 2$, and $c_2 = 2$. The codes for the proposed method have been developed in MATLAB [33], without using any toolbox. The classification accuracies for the ten folds are reported in Table 3.

Table 3: Classification accuracy for each fold.

| Fold-1 | Fold-2 | Fold-3 | Fold-4 | Fold-5 | Fold-6 | Fold-7 | Fold-8 | Fold-9 | Fold-10 |
|---|---|---|---|---|---|---|---|---|---|
| 89.67% | 94.84% | 91.08% | 94.84% | 92.49% | 91.55% | 88.27% | 90.14% | 92.96% | 90.14% |

The overall classification accuracy of LS-SVM-PSO-BDT, which is the average accuracy of the ten folds, is obtained as 91.62%.

There have been similar works focusing on the classification of CTG data in the literature [4, 6–10]. A direct comparison of the methods in these works with the proposed method is not possible because they all address two-class tasks and, additionally, the properties of the CTG data sets used in [4, 6–8] are different. Nevertheless, based on the overall classification accuracy, a comparison of the proposed method with the methods used in the abovementioned works is provided in Table 4.

Table 4: Comparison of LS-SVM-PSO-BDT with the existing methods in similar works.

| Method | Maximum classification accuracy | Number of classes | Number of data points |
|---|---|---|---|
| LS-SVM-PSO-BDT | 91.62% | 3 | 2126 |
| SVM, Krupa et al., 2011 [8] | 81.50% | 2 | 129 |
| SVM, Georgoulas et al., 2006 [4] | 81.25% | 2 | 80 |
| Hidden Markov models, Georgoulas et al., 2004 [6] | 83.00% | 2 | 36 |
| ANBLIR system, Czabanski et al., 2010 [7] | 97.50% | 2 | 685 |
| ANFIS, Ocak and Ertunc, 2012 [9] | 97.15% | 2 | 1831 |
| SVM and GA, Ocak, 2013 [10] | 99.30% (specificity), 100% (sensitivity) | 2 | 1831 |

Although the number of classes and the number of data points in the CTG data set used in our work are larger than those in the abovementioned works, LS-SVM-PSO-BDT achieves a classification accuracy of 91.62%.

In addition to overall classification accuracy, ROC methodology is used to analyze the performance of the method in more detail. Therefore, a confusion matrix is created to analyze the classification results; it is given in Table 5. This table shows the number of correctly and incorrectly classified data points from the CTG data.

Table 5: Confusion matrix of LS-SVM-PSO-BDT.

| Predicted \ Actual | Normal | Suspect | Pathologic |
|---|---|---|---|
| Normal | 1604 | 70 | 12 |
| Suspect | 38 | 208 | 29 |
| Pathologic | 13 | 17 | 135 |
| Total | 1655 | 295 | 176 |

In order to visualize the performance of the proposed method, a cobweb representation is presented. The cobweb representation is generated by using the misclassification ratios from the confusion ratio matrix, which is the column-normalized version of the confusion matrix. The confusion ratio matrix of the proposed method is given in Table 6. Diagonal entries of the confusion ratio matrix show the correct classification ratios while its off-diagonal entries show the misclassification ratios. From Table 6, 96.90% of normal data points, 70.50% of suspect data points, and 76.70% of pathologic data points are correctly classified as normal, suspect, and pathologic, respectively.

Table 6: Confusion ratio matrix of LS-SVM-PSO-BDT.

| Predicted \ Actual | Normal | Suspect | Pathologic |
|---|---|---|---|
| Normal | 0.969 | 0.237 | 0.068 |
| Suspect | 0.023 | 0.705 | 0.165 |
| Pathologic | 0.008 | 0.058 | 0.767 |

The cobweb representation of the proposed method is given in Figure 4. It can be seen from Figure 4 that the misclassification ratios of LS-SVM-PSO-BDT are smaller than those of the chance classifier.

Figure 4: Misclassification cobweb for LS-SVM-PSO-BDT. (Radar plot of misclassification rates, true state → decision state, with six axes: Normal → suspect, Normal → pathologic, Suspect → normal, Suspect → pathologic, Pathologic → normal, and Pathologic → suspect; radial scale from 0 to 0.35; two polygons are shown, the chance polygon and LS-SVM-PSO-BDT.)

12. Conclusions

In this work, we use LS-SVM utilizing a BDT for classification of the CTG data to determine the fetal state as normal, suspect, or pathologic. The Gaussian radial basis function is chosen as the kernel of the LS-SVM, and the model parameters, which are the penalty factor and the width of the Gaussian kernel, are optimized by using PSO. The robustness of LS-SVM-PSO-BDT is examined by running 10-fold CV. The performance of the proposed method is evaluated in terms of overall classification accuracy. According to the empirical results, the proposed LS-SVM-PSO-BDT method achieves an overall classification accuracy of 91.62%.

Additionally, ROC methodology is used to analyze the performance of the method in more detail. The correct classification and misclassification ratios of the method with respect to each individual class are presented. 96.90% of normal data points, 70.50% of suspect data points, and 76.70% of pathologic data points are correctly classified as normal, suspect, and pathologic, respectively. In order to visualize the performance of the method, a cobweb representation is presented. This representation indicates that the misclassification ratios of the proposed method are smaller than those of the chance classifier. The empirical results show that the proposed method can help obstetricians to make more accurate decisions in determining the fetal state.

Acknowledgment

The authors would like to thank the UCI Repository of Machine Learning Databases for being a valuable resource: Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

References

[1] A. G. Floares, “Using computational intelligence to develop intelligent clinical decision support systems,” Computational Intelligence Methods for Bioinformatics and Biostatistics, Springer, vol. 6160, pp. 266–275, 2010.
[2] E. Yılmaz, “An expert system based on Fisher score and LS-SVM for cardiac arrhythmia diagnosis,” Computational and Mathematical Methods in Medicine, vol. 2013, Article ID 849674, 6 pages, 2013.
[3] A. Tsakonas, G. Dounias, J. Jantzen, H. Axer, B. Bjerregaard, and D. G. Von Keyserlingk, “Evolving rule-based systems in two medical domains using genetic programming,” Artificial Intelligence in Medicine, vol. 32, no. 3, pp. 195–216, 2004.
[4] G. Georgoulas, C. D. Stylios, and P. P. Groumpos, “Predicting the risk of metabolic acidosis for newborns based on fetal heart rate signal classification using support vector machines,” IEEE Transactions on Biomedical Engineering, vol. 53, no. 5, pp. 875–884, 2006.
[5] Z. Alfirevic, D. Devane, and G. M. Gyte, “Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour,” Cochrane Database of Systematic Reviews, vol. 3, Article ID CD006066, 2006.
[6] G. G. Georgoulas, C. D. Stylios, G. Nokas, and P. P. Groumpos, “Classification of fetal heart rate during labour using hidden Markov models,” in Proceedings of IEEE International Joint Conference on Neural Networks, pp. 2471–2475, Budapest, Hungary, July 2004.
[7] R. Czabanski, M. Jezewski, J. Wrobel, J. Jezewski, and H. Horoba, “Predicting the risk of low-fetal birth weight from cardiotocographic signals using ANBLIR system with deterministic annealing and ε-insensitive learning,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 4, pp. 1062–1074, 2010.
[8] N. Krupa, M. A. MA, E. Zahedi, S. Ahmed, and F. M. Hassan, “Antepartum fetal heart rate feature extraction and classification using empirical mode decomposition and support vector machine,” BioMedical Engineering Online, vol. 10, article 6, 2011.
[9] H. Ocak and H. M. Ertunc, “Prediction of fetal state from the cardiotocogram recordings using adaptive neuro-fuzzy inference systems,” Neural Computing and Applications, 2012.
[10] H. Ocak, “A medical decision support system based on support vector machines and the genetic algorithm for the evaluation of fetal well-being,” Journal of Medical Systems, vol. 37, no. 2, p. 9913, 2013.
[11] B. Fei and J. Liu, “Binary tree of SVM: a new fast multiclass training and classification algorithm,” IEEE Transactions on Neural Networks, vol. 17, no. 3, pp. 696–704, 2006.
[12] G. Madzarov, D. Gjorgjevikj, and I. Chorbev, “A multi-class SVM classifier utilizing binary decision tree,” Informatica, vol. 33, no. 2, pp. 233–242, 2009.
[13] J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.
[14] F. Klawonn, F. Höppner, and S. May, “An alternative to ROC and AUC analysis of classifiers,” Advances in Intelligent Data Analysis X, Springer, vol. 7014, pp. 210–221, 2011.
[15] C. Ferri, J. Hernández-Orallo, and M. A. Salido, “Volume under the ROC surface for multi-class problems,” in Proceedings of the 14th European Conference on Machine Learning, pp. 108–120, Barcelona, Spain, September 2003.
[16] F. Provost and T. Fawcett, “Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 43–48, Newport Beach, Calif, USA, 1997.
[17] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “Training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152, Pittsburgh, Pa, USA, July 1992.
[18] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[19] D. Tao, X. Li, X. Wu, W. Hu, and S. J. Maybank, “Supervised tensor learning,” Knowledge and Information Systems, vol. 13, pp. 1–42, 2007.
[20] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948, Perth, Australia, December 1995.
[21] Y. Shi and R. Eberhart, “A modified particle swarm optimizer,” in Proceedings of IEEE World Congress on Computational Intelligence, pp. 69–73, Anchorage, Alaska, USA, May 1998.
[22] P. Refaeilzadeh, L. Tang, and H. Liu, “Cross-validation,” in Encyclopedia of Database Systems, L. Liu and M. T. Özsu, Eds., pp. 532–538, Springer, New York, NY, USA, 2009.
[23] T. C. W. Landgrebe and R. P. W. Duin, “Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 5, pp. 810–822, 2008.
[24] F. Provost and R. Kohavi, “Guest editors’ introduction: on applied research in machine learning,” Machine Learning, vol. 30, no. 2-3, pp. 127–132, 1998.
[25] P. A. Flach, “ROC analysis,” in Encyclopedia of Machine Learning, C. Sammut and G. I. Webb, Eds., pp. 869–875, Springer, New York, NY, USA, 2010.
[26] C. E. Metz, “Basic principles of ROC analysis,” Seminars in Nuclear Medicine, vol. 8, no. 4, pp. 283–298, 1978.
[27] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.
[28] J. A. Swets, R. M. Dawes, and J. Monahan, “Better decisions through science,” Scientific American, vol. 283, no. 4, pp. 82–87, 2000.
[29] A. Srinivasan, “Note on the location of optimal classifiers in N dimensional ROC space,” Tech. Rep. PRG-TR-2-99, Computing Laboratory, Oxford University, 1999.
[30] B. Diri and S. Albayrak, “Visualization and analysis of classifiers performance in multi-class medical data,” Expert Systems with Applications, vol. 34, no. 1, pp. 628–634, 2008.
[31] A. C. Patel and M. K. Markey, “Comparison of three-class classification performance metrics: a case study in breast cancer CAD,” in Medical Imaging: Image Perception, Observer Performance, and Technology Assessment, pp. 581–589, San Diego, Calif, USA, February 2005.
[32] D. Ayres-de-Campos, J. Bernardes, A. Garrido, J. Marques-de-Sa, and L. Pereira-Leite, “SisPorto 2.0: a program for automated analysis of cardiotocograms,” Journal of Maternal-Fetal and Neonatal Medicine, vol. 9, no. 5, pp. 311–318, 2000.
[33] MATLAB Version 7.13.0, The MathWorks, Natick, Mass, USA, 2011.
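As a cross-check of the step from Table 5 to Table 6, the sketch below column-normalizes the confusion matrix reported in Table 5 and extracts the six misclassification ratios that are plotted on the cobweb. The counts are copied from Table 5; the helper names `column_normalize` and `misclassification_rates` are hypothetical and are not taken from the authors' code.

```python
# Counts from Table 5: rows are predicted classes, columns are actual classes,
# both in the order (normal, suspect, pathologic).
confusion = [
    [1604, 70, 12],
    [38, 208, 29],
    [13, 17, 135],
]


def column_normalize(matrix):
    """Divide every entry by its column total, i.e. by the number of actual points of that class."""
    n_cols = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cols)]
    return [[row[j] / totals[j] for j in range(n_cols)] for row in matrix]


def misclassification_rates(ratio):
    """Collect the off-diagonal entries (R^2 - R values), column by column."""
    size = len(ratio)
    return [ratio[i][j] for j in range(size) for i in range(size) if i != j]


ratio = column_normalize(confusion)
rates = misclassification_rates(ratio)

# Chance level 1/R for R classes; a cobweb polygon lies inside the chance
# polygon when every misclassification ratio is below this value.
chance = 1.0 / len(confusion)
inside_chance_polygon = all(r < chance for r in rates)

for row in ratio:
    print("  ".join(f"{value:.3f}" for value in row))
print("inside chance polygon:", inside_chance_polygon)
```

With the Table 5 counts, every off-diagonal ratio stays below 1/3, which is consistent with the statement that the polygon of LS-SVM-PSO-BDT lies within the chance polygon.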
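The particle update in (10) can also be sketched in a few lines. The example below uses the settings reported in the experiments (25 particles, ω = 0.75, c1 = c2 = 2), but everything else is an assumption made for illustration: the function name `pso_minimize`, zero initial velocities, clipping positions to the search bounds, one pair of random numbers per particle and iteration, updating the swarm best as soon as a particle improves on it, the number of iterations, and the toy cost function. The paper does not specify the fitness function used to tune γ and σ², so a simple quadratic stands in for it here, with the search carried out over base-2 exponents in [-4, 12].

```python
import random


def pso_minimize(cost, bounds, n_particles=25, n_iter=50,
                 omega=0.75, c1=2.0, c2=2.0, seed=0):
    """Minimize cost over the box given by bounds using the update rule (10)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds; zero initial velocities (assumption).
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position of each particle, P_i
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # best position of the swarm, G
    history = [gbest_cost]
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
            for d in range(dim):
                vel[i][d] = (omega * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Position update, clipped to the search interval (assumption).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history


def toy_cost(p):
    """Hypothetical stand-in for a validation error; minimum at exponents (3, 1)."""
    return (p[0] - 3.0) ** 2 + (p[1] - 1.0) ** 2


# Search over log2(gamma) and log2(sigma^2) in [-4, 12] (assumed parameterization).
best, best_cost, history = pso_minimize(toy_cost, [(-4.0, 12.0), (-4.0, 12.0)])
```

In an actual tuning run, `toy_cost` would be replaced by a function that trains an LS-SVM with the candidate (γ, σ²) and returns a validation error, and the best exponents would be mapped back through powers of two. The recorded `history` is non-increasing by construction because the swarm best is only replaced by a strictly better position.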