Hindawi Publishing Corporation
Computational and Mathematical Methods in Medicine
Volume 2013, Article ID 487179, 8 pages
http://dx.doi.org/10.1155/2013/487179
Research Article
Determination of Fetal State from Cardiotocogram
Using LS-SVM with Particle Swarm Optimization and
Binary Decision Tree
Ersen Yılmaz and Çağlar Kılıkçıer
Electrical-Electronic Engineering Department, Uludag University, 16059 Gorukle, Bursa, Turkey
Correspondence should be addressed to Ersen Yılmaz; ersen@uludag.edu.tr
Received 26 June 2013; Accepted 6 September 2013
Academic Editor: Damien R. Hall
Copyright © 2013 E. Yılmaz and Ç. Kılıkçıer. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
We use a least squares support vector machine (LS-SVM) utilizing a binary decision tree to classify cardiotocograms and
determine the fetal state. The parameters of the LS-SVM are optimized by particle swarm optimization. The robustness of the method
is examined by running 10-fold cross-validation. The performance of the method is evaluated in terms of overall classification
accuracy. Additionally, receiver operating characteristic analysis and cobweb representation are presented in order to analyze
and visualize the performance of the method. Experimental results show that the proposed method achieves an overall
classification accuracy of 91.62%.
1. Introduction

There is a growing tendency to use clinical decision support systems in medical diagnosis. These systems help to optimize medical decisions, improve medical treatments, and reduce financial costs [1, 2]. A large number of medical diagnosis procedures can be converted into intelligent data classification tasks. These classification tasks can be categorized as two-class tasks and multiclass tasks. The first type separates the data between only two classes, while the second type involves the classification of the data into more than two classes [3].

Cardiotocography was introduced into obstetric practice in the early 1970s, and since then it has been used worldwide for antepartum (before delivery) and intrapartum (during delivery) fetal monitoring. A cardiotocogram (CTG) is a recording of two distinct signals, fetal heart rate (FHR) and uterine activity (UA) [4]. It is used for determining the fetal state during both pregnancy and delivery. The aim of CTG monitoring is to identify babies who may be short of oxygen (hypoxic), so that further assessments of the fetal condition may be performed or the baby may be delivered by caesarean section or natural birth [5]. The visual evaluation of the CTG not only requires time but also depends on the knowledge and clinical experience of obstetricians.

A clinical decision support system eliminates the inconsistency of visual evaluation. Several classification tools have been proposed for developing such a system [4, 6–10].

One of these tools is the support vector machine (SVM), which is used in [4, 8, 10]. In [4, 8], SVM is used for FHR signal classification with two classes, normal or at risk. The risk of metabolic acidosis for newborns is predicted from the FHR signal in [4], while antepartum FHR signals are classified in [8]. In [10], a medical decision support system based on SVM and a genetic algorithm (GA) is presented for the evaluation of fetal well-being from CTG recordings as normal or pathologic.

In [6], an approach based on hidden Markov models (HMM) is presented for automatic classification of FHR signals belonging to hypoxic and normal newborns. In [7], an ANBLIR (Artificial Neural Network Based on Logical Interpretation of fuzzy if-then Rules) system is used to evaluate the risk of low fetal birth weight as normal or abnormal using CTG signals recorded during pregnancy.

In [9], an adaptive neurofuzzy inference system (ANFIS) is proposed for the prediction of the fetal state from CTG recordings as normal or pathologic.
SVM was developed for two-class tasks, but classification problems generally involve multiclass tasks. Several methods based on a binary decision tree (BDT) have been proposed in the literature to extend binary SVMs to multiclass problems, for example, [11, 12].

LS-SVM is a modified version of SVM in a least squares sense [13]. LS-SVM reduces the computational load of SVM because it solves a set of linear equations, whereas SVM solves a quadratic programming problem.

The choice of an appropriate kernel function and of the model parameters (including kernel parameters) is crucial for SVM-based methods, and it directly influences the classification performance. The most common kernel functions used in the literature are the polynomial, Gaussian radial basis, exponential radial basis, and sigmoid kernels.

Performance evaluation of classifiers is a fundamental step for determining the best classifier or the best set of parameters for a classifier [14]. In general, the overall classification accuracy is a natural way to measure the performance of classifiers. The classifier predicts the class of each data point in the data set; if the prediction is correct it is counted as a success, and if it is wrong it is counted as an error. The overall classification accuracy is computed as the ratio of the number of successes to the total number of data points classified.

For many classification problems, especially in medical diagnosis, the overall classification accuracy alone is not adequate because, in general, not all errors have the same consequences. Wrong diagnoses can cause different costs and dangers depending on which kind of mistake has been made [15]. Therefore, in such situations, receiver operating characteristic (ROC) analysis is usually performed in addition to overall classification accuracy [16].

In this paper, we use LS-SVM utilizing a BDT for classification of the CTG data to determine the fetal state as normal, suspect, or pathologic. A Gaussian radial basis function is chosen as the kernel of LS-SVM, and the model parameters, which are the penalty factor and the width of the Gaussian kernel, are optimized by using particle swarm optimization (PSO). The robustness of the proposed method, LS-SVM-PSO-BDT, is examined with 10-fold cross-validation (10-fold CV) on the CTG data set taken from the UCI Machine Learning Repository. The performance of the method is evaluated in terms of overall classification accuracy. Additionally, ROC analysis and cobweb representation are presented in order to analyze and visualize the performance of the method.

2. Support Vector Machine (SVM)

SVM is a powerful supervised learning algorithm based on statistical learning theory that has been widely used for solving a wide range of data classification problems since it was first introduced by Boser et al. [17]. SVM builds a hyperplane separating the data points into two different classes with a maximum margin.

Given a training set of $N$ data points $\{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in R^p$ and $y_i \in \{-1, +1\}$, where $x_i$ is a data point and $y_i$ is the corresponding class label, SVM requires the solution of the following primal optimization problem:

$$
\min_{w, b, \xi} J(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, N, \tag{1}
$$

where $w$ is the normal vector to the hyperplane, $b$ is the bias or offset scalar, $\xi_i$ are the slack variables which are used to allow soft margins, $C$ is the penalty parameter which controls the trade-off between minimizing the error and maximizing the margin, and $\varphi(x_i)$ is a nonlinear mapping from the input space to the higher dimensional feature space [4, 8, 13, 17, 18].

The corresponding dual problem of (1) is given by

$$
\max_{\alpha} J(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad \forall i, \tag{2}
$$

where $\alpha_i$ are Lagrange multipliers and the term $K(x_i, x_j)$ is a kernel function representing the inner product of two vectors in the feature space, that is, $\varphi(x_i)^T \varphi(x_j)$. The kernel function must satisfy the well-known Mercer's condition. The data points for which $\alpha_i > 0$ are called support vectors, which construct the following decision function [4, 8, 13, 17, 18]:

$$
f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} y_i \alpha_i K(x, x_i) + b \right), \tag{3}
$$

where $b = -\frac{1}{2} \sum_{i=1}^{N} y_i \alpha_i \left( K(x_i, x_+) + K(x_i, x_-) \right)$, and $x_+$ and $x_-$ are two arbitrary support vectors from the two different classes $y_i \in \{-1, +1\}$ [17].

3. Least Squares SVM (LS-SVM)

LS-SVM was originally proposed by Suykens and Vandewalle as a modification of the SVM regression formulation [13]. The idea behind the modification is to transform the problem from a quadratic programming problem into the solution of a set of linear equations.

The optimization problem is modified as follows:

$$
\min_{w, b, e} J_{LS}(w, e) = \frac{1}{2}\|w\|^2 + \frac{1}{2} \gamma \sum_{i=1}^{N} e_i^2
\quad \text{subject to} \quad y_i \left( w^T \varphi(x_i) + b \right) = 1 - e_i, \quad i = 1, \ldots, N, \tag{4}
$$

where $\gamma$ and $e_i$ are similar to the penalty parameter $C$ and the slack variable $\xi_i$ of SVM, respectively. Two modifications are made in (4): first, the inequality constraints are replaced by equality constraints, and second, a squared loss function is taken for $e_i$. These modifications significantly simplify the problem [19].
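To make the difference between (1) and (4) concrete, the following minimal Python sketch evaluates both objectives for one fixed linear model. It is an illustration only: it assumes a linear feature map, $\varphi(x) = x$, made-up one-dimensional data, and hypothetical function names that do not come from the paper's MATLAB code. For (1), the slack is taken as the smallest feasible value, $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$.

```python
def margins(w, b, X, y):
    """Return y_i * (w . x_i + b) for every training point (linear feature map assumed)."""
    return [yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b) for xi, yi in zip(X, y)]


def svm_objective(w, b, X, y, C):
    """Objective (1): only points inside or beyond the wrong side of the margin are penalized."""
    slack = [max(0.0, 1.0 - m) for m in margins(w, b, X, y)]
    return 0.5 * sum(wk * wk for wk in w) + C * sum(slack)


def lssvm_objective(w, b, X, y, gamma):
    """Objective (4): e_i = 1 - margin_i is fixed by the equality constraint and then squared."""
    errors = [1.0 - m for m in margins(w, b, X, y)]
    return 0.5 * sum(wk * wk for wk in w) + 0.5 * gamma * sum(e * e for e in errors)


# Made-up one-dimensional data and a fixed model, purely for illustration.
X = [[2.0], [0.5], [-1.0]]
y = [1, 1, -1]
w, b = [1.0], 0.0

svm_value = svm_objective(w, b, X, y, C=1.0)            # 0.5 + 1.0 * 0.5
lssvm_value = lssvm_objective(w, b, X, y, gamma=1.0)    # 0.5 + 0.5 * 1.25
```

In this toy case the first point lies well beyond the margin, with $y_i(w x_i + b) = 2$. It contributes nothing to (1) but adds $e_i^2 = 1$ to (4), because the equality constraint penalizes deviation from the margin in both directions.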
To solve the optimization problem in (4), the Lagrangian function is defined as

$$
L_{LS}(w, b, e; \alpha) = J_{LS}(w, e) - \sum_{i=1}^{N} \alpha_i \left\{ y_i \left[ w^T \varphi(x_i) + b \right] - 1 + e_i \right\}, \tag{5}
$$

where $\alpha_i$ are Lagrange multipliers, which can be positive or negative due to the equality constraints. According to the optimality conditions, we get

$$
\begin{aligned}
\frac{\partial L_{LS}}{\partial w} &= 0 \;\rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i), \\
\frac{\partial L_{LS}}{\partial b} &= 0 \;\rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0, \\
\frac{\partial L_{LS}}{\partial e_i} &= 0 \;\rightarrow\; \alpha_i = \gamma e_i, \quad i = 1, \ldots, N, \\
\frac{\partial L_{LS}}{\partial \alpha_i} &= 0 \;\rightarrow\; y_i \left[ w^T \varphi(x_i) + b \right] - 1 + e_i = 0, \quad i = 1, \ldots, N.
\end{aligned} \tag{6}
$$

Defining $Z = [\varphi(x_1)^T y_1; \ldots; \varphi(x_N)^T y_N]$, $Y = [y_1; \ldots; y_N]$, $\vec{1} = [1; \ldots; 1]$, $e = [e_1; \ldots; e_N]$, and $\alpha = [\alpha_1; \ldots; \alpha_N]$, and eliminating $w$ and $e$, a linear Karush-Kuhn-Tucker system is obtained as in (7) [13]:

$$
\begin{bmatrix} 0 & -Y^T \\ Y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ \vec{1} \end{bmatrix}, \tag{7}
$$

where $I$ is the identity matrix, $\Omega = Z Z^T$, and Mercer's condition can be applied to the matrix $\Omega$:

$$
\Omega_{i,j} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j), \quad i, j = 1, \ldots, N. \tag{8}
$$

The LS-SVM classifier takes the form in (9), which is similar to the SVM case in (3); $\alpha$ and $b$ are found by solving the linear set of equations in (7):

$$
f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} y_i \alpha_i K(x, x_i) + b \right). \tag{9}
$$

4. Particle Swarm Optimization (PSO)

PSO is a swarm intelligence based optimization method proposed by Kennedy and Eberhart, inspired by the social behavior of bird flocking and fish schooling [20]. In PSO, the procedure begins with an initialization step in which a population (swarm) of possible solutions (particles) is chosen in the search space; the swarm then searches for the optimum solution by updating the particles over generations.

The particles are updated iteratively by using the following equations:

$$
\begin{aligned}
V_i^{k+1} &= \omega V_i^{k} + c_1 r_1^{k} \left( P_i^{k} - \lambda_i^{k} \right) + c_2 r_2^{k} \left( G^{k} - \lambda_i^{k} \right), \\
\lambda_i^{k+1} &= \lambda_i^{k} + V_i^{k+1},
\end{aligned} \tag{10}
$$

where $\lambda_i = [\lambda_{i1}, \ldots, \lambda_{iM}]$ and $V_i = [V_{i1}, \ldots, V_{iM}]$ are the current position and the velocity of the $i$th particle in $M$-dimensional space, and $G = [G_1, \ldots, G_M]$ and $P_i = [P_{i1}, \ldots, P_{iM}]$ are the best position of the swarm and the best position of the $i$th particle, respectively.

The value of the inertia weight $\omega$ is a trade-off between global search and local search. A larger inertia weight allows the particles to search new areas in the search space (global search), while a smaller value lets the particles move within the current search area for fine-tuning (local search). The cognitive and social learning factors $c_1$ and $c_2$ are positive constants, and $r_1$ and $r_2$ are random numbers in the range [0, 1] [20, 21].

5. Binary Decision Tree (BDT)

The BDT architecture for classification of data sets with $R$ classes requires $R - 1$ classifiers. The architecture for classification of a data set with $R$ classes is shown in Figure 1. There is a classifier at each node in the tree to make a binary decision.

6. Cross-Validation (CV)

CV is one of the most commonly used statistical methods for evaluating and comparing learning algorithms by separating the data set into two sets, one for training and one for testing. In CV, the training and testing sets must cross over in successive rounds, so that each data point has a chance of being validated against [22].

The general form of CV is $k$-fold CV, in which the data set is divided into $k$ groups of (almost) equal size and $k$ iterations are made. In each iteration, one of the $k$ groups is used for testing and the remaining $k - 1$ groups are used for training.

7. ROC Analysis

ROC analysis has been used as a standard tool for the design, optimization, and evaluation of two-class classifiers [23]. In ROC analysis with two classes, the notation given in Table 1 is used for the confusion matrix [24].

Table 1: Confusion matrix.

Predicted    Actual positive        Actual negative
Positive     TP (true positive)     FP (false positive)
Negative     FN (false negative)    TN (true negative)

ROC analysis investigates and employs the relationship between sensitivity and specificity of two-class classifiers while the decision threshold varies [25]. Sensitivity is the true
positive rate while specificity is the true negative rate, and they are defined as TP/(TP + FN) and TN/(TN + FP), respectively [24].

Figure 1: BDT architecture for classification of data set with $R$ classes. (The root classifier splits classes 1, 2, ..., R into the groups 1, ..., R/2 and (R/2) + 1, ..., R; the classifiers below halve each group again, down to single classes 1, 2, ..., R − 1, R at the leaves.)

An ROC curve represents the performance of a classifier in a two-dimensional graph, and conventionally the true positive rate is plotted against the false positive rate [25]. Detailed information about ROC analysis can be found in [23–28].

The extension of ROC analysis to more than two classes has been studied extensively in the literature [15, 23, 27, 29, 30]. For $R$ classes, the confusion matrix is an $R \times R$ matrix such that its diagonal entries contain the $R$ correct classifications while its off-diagonal entries contain the $R^2 - R$ possible errors. Therefore, generating ROC curves for visualizing the performance of a classifier becomes difficult as the number of classes increases; for example, a six-dimensional space is required for three classes. Recently, the cobweb representation has been used to visualize the performance of classifiers as a multiclass version of ROC analysis [30].

8. Cobweb Representation

The cobweb representation is generated by using the misclassification ratios of the confusion ratio matrix, which is the column-normalized version of the confusion matrix. Let us consider a chance classification with $R$ classes. The confusion ratio matrix has $R^2 - R$ misclassification rates, which are all equal to $1/R$. The misclassification rates of $1/R$ show that, when confronted with a data point from one of the classes, the classifier assigns it to any of the $R$ classes with the same probability. A polygon with $R^2 - R$ equal sides can be formed to map the misclassification rates of the confusion ratio matrix. This polygon (the chance polygon) is used to compare the performance of any classifier with the chance classifier in terms of misclassification rates. Any polygon within the chance polygon shows a better performance than chance performance. For a chance classification with three classes, the misclassification rates are (0.33, 0.33, 0.33, 0.33, 0.33, 0.33), and the chance polygon becomes a hexagon, as shown in Figure 2 [30, 31].

9. CTG Data Set

The CTG data set used in this study is taken from the UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/datasets/Cardiotocography] (last accessed: June 2013), and the details can be found in [32]. This data set has 2126 data points from three classes representing the fetal state as normal, suspect, or pathologic. All data points have 21 features, and these features are listed in Table 2.

10. Proposed LS-SVM-PSO-BDT Method

The proposed LS-SVM-PSO-BDT method for fetal state determination is described in this section. Its architecture is given in Figure 3.
Figure 2: Misclassification cobweb for a chance classification with three classes. (Six axes, one per true state → decision state pair: Class1 → Class2, Class1 → Class3, Class2 → Class1, Class2 → Class3, Class3 → Class1, Class3 → Class2; radial scale from 0 to 0.35; the chance polygon is shown.)

Table 2: Features used for determining the fetal state.

Feature     Description
LB          FHR baseline (beats per minute)
AC          Number of accelerations per second
FM          Number of fetal movements per second
UC          Number of uterine contractions per second
DL          Number of light decelerations per second
DS          Number of severe decelerations per second
DP          Number of prolonged decelerations per second
ASTV        Percentage of time with abnormal short term variability
MSTV        Mean value of short term variability
ALTV        Percentage of time with abnormal long term variability
MLTV        Mean value of long term variability
Width       Width of FHR histogram
Min         Minimum (low frequency) of FHR histogram
Max         Maximum (high frequency) of FHR histogram
Nmax        Number of histogram peaks
Nzeros      Number of histogram zeros
Mode        Histogram mode
Mean        Histogram mean
Median      Histogram median
Variance    Histogram variance
Tendency    Histogram tendency

Figure 3: The proposed method's architecture. (LS-SVM 1, tuned by PSO, splits the normal, suspect, and pathologic data into normal versus suspect and pathologic; LS-SVM 2, tuned by PSO, splits the latter group into suspect and pathologic.)

There are two nodes in the BDT because the CTG data has three classes. A Gaussian radial basis function, given in (11), is chosen as the kernel function of the LS-SVMs:

$$
K(x, x_i) = \exp\left( -\frac{1}{2\sigma^2} \|x - x_i\|^2 \right), \tag{11}
$$

where $\sigma^2$ is the width of the kernel.

The LS-SVM parameters, the penalty factor $\gamma$ and the kernel width $\sigma^2$, are optimized by using PSO.

The training procedure of the method is summarized in the following sequential steps.

Step 1. Training data points are put into the root node and divided into two groups, PS (pathologic and suspect) and Nr (normal).

Step 2. LS-SVM 1 is trained on the data points in the root node to classify the data points as PS or Nr. Meanwhile, the LS-SVM 1 parameters are optimized by using PSO.

Step 3. LS-SVM 2 is trained on the data points in the subnode PS to classify the data points as P (pathologic) or S (suspect). Meanwhile, the LS-SVM 2 parameters are optimized by using PSO.

In the first step, we combine pathologic and suspect data points in one group, instead of combining normal and suspect data points, in order to minimize the risk of making decisions that cause abnormalities in babies.

11. Experimental Results and Discussions

The proposed method LS-SVM-PSO-BDT is used for the classification of the CTG data set taken from the UCI Machine Learning Repository.

In order to validate the robustness of the method, a 10-fold CV procedure is performed. The entire data set is randomly divided into ten subsets of approximately equal size while keeping the proportion of data points from different classes in each subset roughly the same as that in the whole data set. In each fold, one subset is left out for testing, and the union
of the remaining nine sets is used for training. Thus, after ten folds, each subset has been used once for testing. The final result is the average result of these ten folds.

In the experiment, the parameters for LS-SVM-PSO-BDT are set as follows. Twenty-five particles are used in the PSOs. The initial values of the 25 particles for the penalty factor $\gamma$ and the kernel width $\sigma^2$ are chosen in the interval $\gamma, \sigma^2 \in [2^{-4}, 2^{12}]$. The inertia weight and the cognitive and social learning factors of the PSOs are chosen as $\omega = 0.75$, $c_1 = 2$, and $c_2 = 2$. The codes for the proposed method have been developed in MATLAB [33], without using any toolbox. The classification accuracies for the ten folds are reported in Table 3.

Table 3: Classification accuracy for each fold.

Fold-1    Fold-2    Fold-3    Fold-4    Fold-5    Fold-6    Fold-7    Fold-8    Fold-9    Fold-10
89.67%    94.84%    91.08%    94.84%    92.49%    91.55%    88.27%    90.14%    92.96%    90.14%

The overall classification accuracy of LS-SVM-PSO-BDT, which is the average accuracy of the ten folds, is obtained as 91.62%.

There have been similar works focusing on the classification of the CTG data in the literature [4, 6–10]. It is not possible to make a direct comparison of the methods in these works with the proposed method because they all address two-class tasks and, additionally, the properties of the CTG data sets used in [4, 6–8] are different. Nevertheless, based on the overall classification accuracy, a comparison of the proposed method with the methods used in the abovementioned works is provided in Table 4.

Table 4: Comparison of LS-SVM-PSO-BDT with the existing methods in similar works.

Method                                                 Maximum classification accuracy             Number of classes    Number of data points
LS-SVM-PSO-BDT                                         91.62%                                      3                    2126
SVM, Krupa et al., 2011 [8]                            81.50%                                      2                    129
SVM, Georgoulas et al., 2006 [4]                       81.25%                                      2                    80
Hidden Markov models, Georgoulas et al., 2004 [6]      83.00%                                      2                    36
ANBLIR system, Czabanski et al., 2010 [7]              97.50%                                      2                    685
ANFIS, Ocak and Ertunc, 2012 [9]                       97.15%                                      2                    1831
SVM and GA, Ocak, 2013 [10]                            99.30% (specificity), 100% (sensitivity)    2                    1831

Although the number of classes and the number of data points in the CTG data set used in our work are larger than those in the abovementioned works, LS-SVM-PSO-BDT achieves an overall classification accuracy of 91.62%.

In addition to overall classification accuracy, ROC methodology is used to analyze the performance of the method in more detail. Therefore, a confusion matrix is created to analyze the classification results, which is given in Table 5. This table shows the number of correctly and incorrectly classified data points from the CTG data.

Table 5: Confusion matrix of LS-SVM-PSO-BDT.

Predicted     Actual normal    Actual suspect    Actual pathologic
Normal        1604             70                12
Suspect       38               208               29
Pathologic    13               17                135
Total         1655             295               176

In order to visualize the performance of the proposed method, a cobweb representation is presented. The cobweb representation is generated by using the misclassification ratios from the confusion ratio matrix, which is the column-normalized version of the confusion matrix. The confusion ratio matrix of the proposed method is given in Table 6. Diagonal entries of the confusion ratio matrix show the correct classification ratios while its off-diagonal entries show the misclassification ratios. From Table 6, 96.90% of normal data points, 70.50% of suspect data points, and 76.70% of pathologic data points are correctly classified as normal, suspect, and pathologic, respectively.

Table 6: Confusion ratio matrix of LS-SVM-PSO-BDT.

Predicted     Actual normal    Actual suspect    Actual pathologic
Normal        0.969            0.237             0.068
Suspect       0.023            0.705             0.165
Pathologic    0.008            0.058             0.767

The cobweb representation of the proposed method is given in Figure 4. It can be seen from Figure 4 that the misclassification ratios of LS-SVM-PSO-BDT are smaller than those of the chance classifier.

12. Conclusions

In this work, we use LS-SVM utilizing a BDT for classification of the CTG data to determine the fetal state as normal, suspect, or pathologic. A Gaussian radial basis function is chosen as the kernel of LS-SVM, and the model parameters, which are the penalty factor and the width of the Gaussian kernel, are optimized by using PSO. The robustness of LS-SVM-PSO-BDT is examined by running 10-fold CV. The performance of the proposed method is evaluated in terms of overall classification accuracy. According to the empirical results, the proposed LS-SVM-PSO-BDT method achieves an overall classification accuracy of 91.62%.

Additionally, ROC methodology is used to analyze the performance of the method in more detail. The correct
classification and misclassification ratios of the method with respect to each individual class are presented. 96.90% of normal data points, 70.50% of suspect data points, and 76.70% of pathologic data points are correctly classified as normal, suspect, and pathologic, respectively. In order to visualize the performance of the method, a cobweb representation is presented. This representation indicates that the misclassification ratios of the proposed method are smaller than those of the chance classifier. Empirical results show that the proposed method can help obstetricians to make more accurate decisions in determining the fetal state.

Figure 4: Misclassification cobweb for LS-SVM-PSO-BDT. (Six axes, one per true state → decision state pair: normal → suspect, normal → pathologic, suspect → normal, suspect → pathologic, pathologic → normal, pathologic → suspect; radial scale from 0 to 0.35; the chance polygon and the LS-SVM-PSO-BDT polygon are shown.)

Acknowledgment

The authors would like to thank the UCI Repository of Machine Learning Databases for being a valuable resource: Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

References

[1] A. G. Floares, “Using computational intelligence to develop intelligent clinical decision support systems,” Computational Intelligence Methods for Bioinformatics and Biostatistics, Springer, vol. 6160, pp. 266–275, 2010.
[2] E. Yılmaz, “An expert system based on Fisher score and LS-SVM for cardiac arrhythmia diagnosis,” Computational and Mathematical Methods in Medicine, vol. 2013, Article ID 849674, 6 pages, 2013.
[3] A. Tsakonas, G. Dounias, J. Jantzen, H. Axer, B. Bjerregaard, and D. G. Von Keyserlingk, “Evolving rule-based systems in two medical domains using genetic programming,” Artificial Intelligence in Medicine, vol. 32, no. 3, pp. 195–216, 2004.
[4] G. Georgoulas, C. D. Stylios, and P. P. Groumpos, “Predicting the risk of metabolic acidosis for newborns based on fetal heart rate signal classification using support vector machines,” IEEE Transactions on Biomedical Engineering, vol. 53, no. 5, pp. 875–884, 2006.
[5] Z. Alfirevic, D. Devane, and G. M. Gyte, “Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour,” Cochrane Database of Systematic Reviews, vol. 3, Article ID CD006066, 2006.
[6] G. G. Georgoulas, C. D. Stylios, G. Nokas, and P. P. Groumpos, “Classification of fetal heart rate during labour using hidden Markov models,” in Proceedings of IEEE International Joint Conference on Neural Networks, pp. 2471–2475, Budapest, Hungary, July 2004.
[7] R. Czabanski, M. Jezewski, J. Wrobel, J. Jezewski, and H. Horoba, “Predicting the risk of low-fetal birth weight from cardiotocographic signals using ANBLIR system with deterministic annealing and ε-insensitive learning,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 4, pp. 1062–1074, 2010.
[8] N. Krupa, M. A. MA, E. Zahedi, S. Ahmed, and F. M. Hassan, “Antepartum fetal heart rate feature extraction and classification using empirical mode decomposition and support vector machine,” BioMedical Engineering Online, vol. 10, article 6, 2011.
[9] H. Ocak and H. M. Ertunc, “Prediction of fetal state from the cardiotocogram recordings using adaptive neuro-fuzzy inference systems,” Neural Computing and Applications, 2012.
[10] H. Ocak, “A medical decision support system based on support vector machines and the genetic algorithm for the evaluation of fetal well-being,” Journal of Medical Systems, vol. 37, no. 2, p. 9913, 2013.
[11] B. Fei and J. Liu, “Binary tree of SVM: a new fast multiclass training and classification algorithm,” IEEE Transactions on Neural Networks, vol. 17, no. 3, pp. 696–704, 2006.
[12] G. Madzarov, D. Gjorgjevikj, and I. Chorbev, “A multi-class SVM classifier utilizing binary decision tree,” Informatica, vol. 33, no. 2, pp. 233–242, 2009.
[13] J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.
[14] F. Klawonn, F. Höppner, and S. May, “An alternative to ROC and AUC analysis of classifiers,” Advances in Intelligent Data Analysis X, Springer, vol. 7014, pp. 210–221, 2011.
[15] C. Ferri, J. Hernández-Orallo, and M. A. Salido, “Volume under the ROC surface for multi-class problems,” in Proceedings of the 14th European Conference on Machine Learning, pp. 108–120, Barcelona, Spain, September 2003.
[16] F. Provost and T. Fawcett, “Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 43–48, Newport Beach, Calif, USA, 1997.
[17] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “Training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152, Pittsburgh, Pa, USA, July 1992.
[18] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[19] D. Tao, X. Li, X. Wu, W. Hu, and S. J. Maybank, “Supervised tensor learning,” Knowledge and Information Systems, vol. 13, pp. 1–42, 2007.
[20] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948, Perth, Australia, December 1995.
[21] Y. Shi and R. Eberhart, “A modified particle swarm optimizer,” in Proceedings of IEEE World Congress on Computational Intelligence, pp. 69–73, Anchorage, Alaska, USA, May 1998.
[22] P. Refaeilzadeh, L. Tang, and H. Liu, “Cross-validation,” in Encyclopedia of Database Systems, L. Liu and M. T. Özsu, Eds., pp. 532–538, Springer, New York, NY, USA, 2009.
[23] T. C. W. Landgrebe and R. P. W. Duin, “Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 5, pp. 810–822, 2008.
[24] F. Provost and R. Kohavi, “Guest editors’ introduction: on applied research in machine learning,” Machine Learning, vol. 30, no. 2-3, pp. 127–132, 1998.
[25] P. A. Flach, “ROC analysis,” in Encyclopedia of Machine Learning, C. Sammut and G. I. Webb, Eds., pp. 869–875, Springer, New York, NY, USA, 2010.
[26] C. E. Metz, “Basic principles of ROC analysis,” Seminars in Nuclear Medicine, vol. 8, no. 4, pp. 283–298, 1978.
[27] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.
[28] J. A. Swets, R. M. Dawes, and J. Monahan, “Better decisions through science,” Scientific American, vol. 283, no. 4, pp. 82–87, 2000.
[29] A. Srinivasan, “Note on the location of optimal classifiers in N dimensional ROC space,” Tech. Rep. PRG-TR-2-99, Computing Laboratory, Oxford University, 1999.
[30] B. Diri and S. Albayrak, “Visualization and analysis of classifiers performance in multi-class medical data,” Expert Systems with Applications, vol. 34, no. 1, pp. 628–634, 2008.
[31] A. C. Patel and M. K. Markey, “Comparison of three-class classification performance metrics: a case study in breast cancer CAD,” in Medical Imaging: Image Perception, Observer Performance, and Technology Assessment, pp. 581–589, San Diego, Calif, USA, February 2005.
[32] D. Ayres-de-Campos, J. Bernardes, A. Garrido, J. Marques-de-Sa, and L. Pereira-Leite, “SisPorto 2.0: a program for automated analysis of cardiotocograms,” Journal of Maternal-Fetal and Neonatal Medicine, vol. 9, no. 5, pp. 311–318, 2000.
[33] MATLAB Version 7.13.0, The MathWorks, Natick, Mass, USA, 2011.