Turkish Journal of Electrical Engineering and Computer Sciences Volume 24 Number 5 Article 60 1-1-2016 Speech steganalysis based on the delay vector variance method OSMAN HİLMİ KOÇAL EMRAH YÜRÜKLÜ ERDOĞAN DİLAVEROĞLU Follow this and additional works at: https://journals.tubitak.gov.tr/elektrik Part of the Computer Engineering Commons, Computer Sciences Commons, and the Electrical and Computer Engineering Commons Recommended Citation KOÇAL, OSMAN HİLMİ; YÜRÜKLÜ, EMRAH; and DİLAVEROĞLU, ERDOĞAN (2016) "Speech steganalysis based on the delay vector variance method," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 24: No. 5, Article 60. https://doi.org/10.3906/elk-1411-167 Available at: https://journals.tubitak.gov.tr/elektrik/vol24/iss5/60 This Article is brought to you for free and open access by TÜBİTAK Academic Journals. It has been accepted for inclusion in Turkish Journal of Electrical Engineering and Computer Sciences by an authorized editor of TÜBİTAK Academic Journals. For more information, please contact academic.publications@tubitak.gov.tr. Turkish Journal of Electrical Engineering & Computer Sciences Turk J Elec Eng & Comp Sci (2016) 24: 4129 – 4141 http :// journa l s . tub i tak .gov . t r/e lektr ik/ ⃝c TÜBİTAK Research Article doi:10.3906/elk-1411-167 Speech steganalysis based on the delay vector variance method Osman Hilmi KOÇAL1, Emrah YÜRÜKLÜ2,∗, Erdoğan DİLAVEROĞLU3 1Department of Computer Engineering, Yalova University, Yalova, Turkey 2Department of Electrical and Electronics Engineering, Bursa Orhangazi University, Bursa, Turkey 3Department of Electrical and Electronics Engineering, Uludağ University, Bursa, Turkey Received: 24.11.2014 • Accepted/Published Online: 15.07.2015 • Final Version: 20.06.2016 Abstract: This study investigates the use of delay vector variance-based features for steganalysis of recorded speech. Because data hidden within a speech signal distort the properties of the original speech signal, we designed a new audio steganalyzer that utilizes delay vector variance (DVV) features based on surrogate data in order to detect the existence of hidden data. The proposed DVV features are evaluated individually and together with other chaotic-type features. The performance of the proposed steganalyzer method is also discussed with a focus on the effect of different hiding capacities. The results of the study show that using the proposed DVV features alone or in cooperation with other features helps in designing a distinctive audio steganalyzer, as cooperation with other chaotic-type features provides higher performances for stego and cover objects. Key words: Steganography, steganalysis, speech, chaos, false neighbors, Lyapunov exponent, surrogate data, delay vector variance 1. Introduction Steganography is the science of covert communication that is applied by hiding secret messages in digital signals. To achieve secure and undetectable communication, a stego signal (used to conceal a secret message) should be indistinguishable from a cover signal, which does not contain any secret message. Analyzing the detection of stego and cover signals is called steganalysis. The set of techniques used for steganalysis is called a steganalyzer. ‘Embedding’ is a term generally used in the steganalysis literature to express hidden information. Em- bedding, however, has another meaning in chaos theory. To prevent possible misunderstandings, ‘embedding’ is used in this paper as it is used in chaos theory, not in steganalysis. Existing methods of audio steganalysis target various data-hiding techniques. The work in [1] focused on least significant bit-based steganography, whereas the work in [2] addressed the steganalysis of the MP3stega algorithm. The steganalysis of the Hide4pgp algorithm was explored in [3]. Spread spectrum watermarking and stochastic modulation steganography were considered in [4]. Watermarking and steganographic data- hiding methods for time and frequency domains were studied in [5]. An essential reference for the basics of steganography can be found in [6]. See [7] for a brief summary of steganalysis approaches. Chaotic feature-based audio steganalysis was investigated in [8] and [9] and found useful for distinguishing stego from cover signals. Chaotic phenomena in speech are still the subject of various research studies. Though many of the reported approaches assume the linearity and stationarity of audio, theoretical and experimental ∗Correspondence: emrah.yuruklu@bou.edu.tr 4129 KOÇAL et al./Turk J Elec Eng & Comp Sci evidence for the existence of chaotic phenomena in speech signals also exists, which cannot be covered by linear modeling [8–10]. The delay vector variance (DVV) method, which is based on surrogate time series [11], is one technique that determines the level of nonlinearity in signals [12]. Assuming that hiding data leads to additive distortion of a speech signal, it is anticipated that this process will change the nonlinear and chaotic structure of the speech signal, and consequently the DVV and chaotic-based features. A change in chaotic structure also means a change in nonlinear structure, which manifests as a deviation from the numerical values of the DVVs of the cover signal and therefore enables the detection of the possible existence of a hidden signal. Using DVV-based features was proposed for the first time in [9]. Performance results in [9], which was studied only one special test case, showed that DVV-based features are very promising for speech steganalysis even though they were used alone, i.e. not combined with any other chaotic-based features. In this paper, research cases are extended and the performance results of [9] are improved with a combination of DVV-based features when compared to the authors’ previously proposed chaotic features [8]. Section 2 of this paper provides an overview of DVV analysis for nonlinear signals. Section 3 shows the application of DVV analysis to steganalysis and DVV feature extraction, and also shows how hiding data changes DVV-based features. Feature selection and the results of the experiments are given in Section 3. Conclusions are drawn in Section 4. 2. DVV method To assess the presence of nonlinearity in time series, the ‘surrogate time series’ method is a widely used technique offered by Theiler et al. [12]. A surrogate time series is produced with the same magnitude and a similar Fourier phase to transform the original time series. Although there are different approaches to producing surrogate time series, the most widely used is the iterative amplitude adjusted Fourier transform (iAAFT), which was proposed by Schreiber and Schmitz [13]. In this paper, the iAAFT approach is used to produce surrogate time series. For detailed information, please refer to [13]. By comparing the characteristics of the original and the surrogate time series, the level of nonlinearity in the time series can be anticipated. Metrics based on surrogate times are defined for performing such a comparison. 2.1. Nonlinearity measurements Besides the DVV method, there are two other approaches used to predict the nonlinearity of the time series. These are third-order autocovariance and asymmetry due to time reversal [14]. Third-order autocovariance is a higher-order extension of the traditional autocovariance, and can be given by: tC3(τ) = ⟨xnxn−τxn−2τ ⟩ (1) Here, τ is the time lag. Given that the time series is accepted as time-reversible, if the probabilistic properties do not change with regards to time reversal, invariance for the probabilistic features can be measured by using the following equation: ⟨ ⟩ 3 tREV (τ) = (xn − xn−τ ) (2) In [13,14], it was shown that with the combination of DVV, these two methods are very helpful for two-tailed tests of nonlinearity. 4130 KOÇAL et al./Turk J Elec Eng & Comp Sci 2.2. DVV method The DVV method was first proposed in 2003 [14]. It is a generic predictor of nonlinearity that uses phase spaces of time series. For an embedding dimension DE , a set of delay vectors x(n) are generated with time lag τ : x(n) = [xn−DEτ , ..., xn−τ ] (3) The definition of proper values and the effects of selecting improper values for the embedding dimension DE and time delay τ were extensively described in [8,11]. Figure 1 illustrates a sample of the embedding operation of a real audio time series into a 3D phase space. 4 10 1 4 0.5 10 1 0.5 0 0 –0.5 – 0.5 –1 –1.5 –1 1 0 1 4 – 0 .5 10 1 0 4 10 –1.5 –0. 5 – 0 50 100 150 200 250 300 350 x (n) 2 –1.0 –1.5 x (n+10) n Figure 1. Phase space of a real speech segment for T = 10 and DE = 3. To determine a specific embedding dimension, DE , the DVV method computes the mean target variance, σ∗2 , for all sets of Ωn . Here, every Ωn is a group that consists of delay vectors that are a certain distance from x(n). The distance is varied in relation to the distribution of pairwise distances between delay vectors [14]. The DVV method can be summarized as follows: ∑N 1 N σ 2 n(rd) σ∗2(rd) = n=1 (4) σ2x • For the given embedding dimension DE , the mean, µd , and the standard deviation, σd , are computed over all pairwise Euclidean distances between delay vectors. • For the given embedding dimension, DE , andΩn(rd) sets are generated according to the following formula: Ωn(rd) = {x(i)| ∥x(n)− x(i)∥ ≤ rd} ; i= 1, 2,.., N . Here, N is the sample count in the time series of x, and rd is the particular distance that must be chosen from the interval [µd − −ndσd ;µd + ndσd ], where nd is a parameter that controls the span over which to compute the DVV plot. • For the given embedding dimension DE , the variance of every set of Ω 2n(rd), σn(rd) is computed. The average value of all sets, Ωn(rd), is normalized by the variance of the time series σ 2 x . At the end, the measure of unpredictability is computed by: 4131 x (n) x (n+20) KOÇAL et al./Turk J Elec Eng & Comp Sci Unless one set, Ωn(rd), contains more than 30 delay vectors, it is not taken into account when performing the computations [14]. As a result of standardizing the distance axis, the DVV plots are easily interpreted. Figure 2 shows the DVV plots of four benchmark signals to further explain DVV. According to the plots, the Henon Map signal (A) is the most predictable signal, while colored noise (C) is the most unpredictable. Note that all the x-axes are standardized as distances to make the comparison easier [15]. A B 1 .0 1 .0 *2 *2 0.5 0.5 0.0 | | | | | | | 0.0 | | | | | | | –3 –2 –1 0 1 2 3 –3 –2 –1 0 1 2 3 standardized distance standardized distance C D 1 .0 1 .0 *2 *2 0.5 0.5 0.0 | | | | | | | 0.0 | | | | | | | –3 –2 –1 0 1 2 3 –3 –2 –1 0 1 2 3 standardized distance standardized distance Figure 2. Solid curves represent the DVV plots for (a) the Henon Map, (b) Mackey–Glass, (c) colored noise, and (d) the laser time series. Average DVV plots computed over 99 surrogates are shown as dashed curves [15]. 3. A new steganalyzer feature set using the DVV method By evaluating Figure 2, the nonlinear dynamics of the signal become easier to interpret. However, this approach is not enough, given that objective decisions are not used when creating evaluations. For this reason, Schreiber 4132 | | | | | | | | | | | | KOÇAL et al./Turk J Elec Eng & Comp Sci and Schmitz created the method described below, which produces numerical results [13]: 2 n surr≥ −1 (5) α 1. The number of surrogate data, n surr, is decided according to the defined limit of significance,α (generally 0.05): 2. The n surr surrogate time series is produced from the original time series s1 , s2 ,. . . sn surr . 3. DVV curves of the original and surrogate time series are constructed. 4. Statistically analyze DVV curves by using one-sided or two-sided tests to find out their coherence with the original data by using various methods. For the fourth step, we decided to use RMSE values for measuring the difference between the DVV curves of the original and the surrogate signal. For n surr surrogate time series, the RMSE vector is constituted as RMSE = {RMSE 1 , RMSE 2 ,. . . , RMSE n surr } . For using DVV-based features for speech steganalysis, we have designed a feature set based on calculated RMSE values of surrogate time series. The proposed feature vector, which must be constituted for every original time series (i.e. audio record), consists of four elements—mean, variance, skewness, and kurtosis—which are values that are calculated over RMSE values [9]: Fsurr = {mean (RMSE) var (RMSE) ske (RMSE) kur (RMSE) (6) In the Ωn set, the square value of the neighborhood distances between all delayed vector couples is: ∑DE 2 d(x(n), x(m))2 = ∥x(m)− x(n)∥ = (x(n+ k)− x(m+ k))2 (7) k=1 The mean, µd , and variance values, σ 2 d , in the Ωn set are[: ] [ ] µ = E[d] σ2 2 = E (d− µ ) = E d2 − µ2d d d d (8) x(m)∈Ωn x(m)∈Ωn Assuming that hiding data in cover signals means adding zero-mean and σ2ε varianced white noise with a magnitude of ε(n), if we reorganize Eq. (7) for both cover and stego signal, then: ∑DE dC(x(n), x(m)) 2 = d2C = (x(n+ k)− x(m+ k))2 (9) k=1 ∑DE dS(x(n), x(m)) 2 = d2S = ([x(n+ k) + ε(n+ k)]− [x(m+ k) + ε(m+ k)])2 (10) k=1 If we expand the term of d2S , which is used for stego signals, then: D∑E d2S = {x2(n+ k) + 2x(n+ k)ε(n+ k) + ε2(n+ k) + x2(m+ k) + 2x(m+ k)ε(m+ k) k=1 +ε2(m+ k)− 2 [x(n+ k)x(m+ k) + x(n+ k)ε(m+ k) +ε(n+ k)x(m+ k) + ε(n+ k)ε(m+ k)]} (11) 4133 KOÇAL et al./Turk J Elec Eng & Comp Sci D∑E If we put d2 = x2C (n+ k) + 2x(n+ k)x(m+ k) + x 2(m+ k) into Eq. (11), then: k=1 D∑E { d2 = d2S C + 2x(n+ k)ε(n+ k) + ε 2(n+ k) + 2x(m+ k)ε(m+ k) + ε2(m+ k) k=1 (12) −2 [x(n+ k)ε(m+ k) + ε(n+ k)x(m+ k) +ε(n+ k)ε(m+ k)]} As long as all the pairs of ⟨ε(m+k), ε(n+k)⟩ , ⟨ε(m+k), ε(m+k)⟩ , ⟨ε(m+k), x(m+k)⟩ , ⟨ε(m+k)x(n+k)⟩ , ⟨ε(n + k), x(m + k)⟩ ve ⟨ε(n + k), x(n + k)⟩, are independent and identically distributed, the mean values of Eq. (12) can be simplified as: [ ] [ ] [ ] E d2S = E d 2 2 C +DE . E ε (n+ k) +D 2 E .E[ε (m+ k)] (13) x(n)∈Ωn x(n)∈Ωn Because ε(n) is zero-mean, and σ2ε is varianced, white noise is calculated as: [ ] [ ] E d2S = E d 2 2 C +DE .2σε (14) x(n)∈Ωn x(n)∈Ωn The variance value used in DVV analysis and seen in Eq. (8) can be defined for cover and stego signals as follows: [ ] 2 2(σd)C = E d 2 C − (µd)C (15) x(m)∈Ωn [ ] (σ2 2 d)S = E d 2 S − (µd)∈ S (16) x(m) Ωn Using Eqs. 414) and (15), the term (σ2d)S can be written as: 2 2 2 2 (σd) = (σd) + (µd)C +DE .2σ 2 ε − (µd)S (17)S C 2 2 With the assumption of (µd)C ≈ (µd)S , Eq. (17) can be simplified as: 2 2 (σd) 2 S = (σd)C +DE .2σε (18) Such results show that the variance values of stego signals used in DVV analysis are always higher than those of the cover signal. The first three elements of the proposed Fsurr feature vector can be seen in Figure 3 and are calculated over 2000 stego and cover signals using DSSS [16] and stochastic modulation [17] steganography techniques. Here the n surr value is 20, and the difference between the feature values of the cover and stego signals is demonstrated in Figure 3. 4134 KOÇAL et al./Turk J Elec Eng & Comp Sci 12 12 10 10 cover signal 8 stego signal 8 6 6 (a) (b) 4 4 2 cover signal 2 stego signal 0 0 4 40 0 3 30 0 4 6 2 20 5 0 2 10 0 var (RMSE) 1 0 var (RMSE) – 2 0 5 0 4 mean (RMSE) 0 – mean (RMSE) – Figure 3. The first three elements of Fsurr of 2000 cover and stego signals, which are constituted with (a) steganographic techniques DSSS and (b) stochastic modulation. nsurr is 20 for Fsurr . 4. Experimental results Tests were performed using nine different methods of hiding data. Five of these methods used watermarking techniques, whereas the other four used steganographic techniques. Watermarking methods were used to extend the case to the widest range of possibility, although a stego signal can be deliberately changed before the signal is received [18]. The watermarking techniques used were direct-sequence spread spectrum (DSSS), frequency hop- ping with spread spectrum (FHSS), echo hiding (ECHO) [16], stochastic modulation (STOMOD) [17], and DCT- based watermarking (COX) [19]. The four steganographic methods used were Steganos (www.steganos.com), MP3Stego (www.petitcolas.net/fabien/steganography/mp3stego), Steghide (http://steghide.sourceforge.net), and Hide4Pgp (www.heinz-repp.onlinehome.de/Hide4PGP.htm). Note that stochastic modulation can be used for steganography; in this study, however, using STOMOD for watermarking is taken into account. These methods were selected due to their popularity, free availability, and wide usage in related works. The individual performance of steganography and watermarking methods is used to determine whether the proposed feature set is useful for audio steganalysis or not. 4.1. Dataset The database used for the test scenario is a subset of the TIMIT speech database (https://catalog.ldc.upenn.edu/ LDC93S1), which is widely known in the evaluation of automatic speech recognition systems. The TIMIT database comprises over 6300 utterances from 630 male and female speakers, sampled at 16 kHz. Two thousand speech segments, ignoring dialects and male–female differences, were randomly selected from the TIMIT database for experiments in this study. For every data-hiding method studied, a stego signal subset was constituted that contained hidden data in utterances using the data-hiding method. 4.2. Data hiding We concealed messages into excerpts using nine data-hiding methods. The procedure for concealing messages was randomly selected; half of the set used stego and cover signals for training, while the other half was used for testing. An objective distortion measure, the signal-to-watermark ratio (SWR), was used to define the payload level in the watermarking methods: ∑ x(n)2 SWR = ∑ , (19) (x(n)− y(n))2 4135 skew (RMSE) skew (RMSE) KOÇAL et al./Turk J Elec Eng & Comp Sci where x(n) and y(n) signify cover and watermarked signals, respectively. For steganographic methods, the performances of three different data-hiding rates of each individual method were evaluated and found to be at 100% and 50% of maximum allowed capacity. 4.3. Feature set The feature set proposed in this study has four elements, all of which are based on DVV values, as described in Eq. (6). This new feature set was used for all steganalysis trials, unless otherwise stated. In producing the DVV- based feature vector, the DVV MATLAB Toolbox was utilized (http://www.commsp.ee.ic.ac.uk/∼mandic/dvv. htm). To determine if DVV-based features can be used to improve the performance of other chaotic-based steganalyzer feature sets already investigated in the literature, a unified feature set was constituted and tested. The proposed chaotic-based feature set [8] is used for this purpose, which consists of 22 elements and is defined below: Fchaotic(DE) = {FFNF (DE)|DE = 3, 4, 5, 6, 7} ∪ λ { i| i = 1, 2, 3, 4, 5, 6, 7} (20) Here, FFNF (DE) is the feature set of the false neighbor fraction (FNF), which is based on the statistical values of the FNF method [20], while λi is i . The Lyapunov exponent (LE) value of the signal is as shown in [21]. Using these chaotic features in audio steganalysis is very advantageous, as described in [8]. FFNF (DE) is described as follows: FFNF (DE) = [FNF,mean (dD (s(n), s(m))) , RMS (dD (s(n), s(m)))] (21)E E Here, the three elements of the feature vector are the fraction of false neighbors, the average size of the neighborhood, and the root mean-squared size of the neighborhood. The FNF and LE values of the signals are calculated with TISEAN [22]. By assembling chaotic features with proposed DVV-based features, a unified feature set, which comprises 26 elements, can be defined as follows: F (DE) = Fsurr ∪ {FFNF (DE)|DE = 3, 4, 5, 6, 7} ∪ λ { i| i = 1, 2, 3, 4, 5, 6, 7} (22) 4.4. Feature selection and classifiers To achieve the highest detection rate in order to evaluate redundant features, a sequential forward float- ing search method (SFFS) [23] was coupled with a support vector machine (SVM) classification method (http://sourceforge.net/projects/svm/). A radial basis function with gamma equal to 4 was used in the SVM as a SVM function. 4.5. Simulation results To assess the validity of the proposed feature set, several tests were carried out that considered the effect of payload. The performance of the DVV-based steganalyzer is provided in Table 1 for the TIMIT dataset. As a performance metric, results are given as percent of the performance, missed detection count (MISS), and false alarm count (FA). In all simulations, half of the dataset was used for training the SVM classifier and the remaining half was used to test the designed steganalyzer with a trained classifier. In the TIMIT dataset, it is observed that SWR values greater than 38 dB for DSSS, 34 dB for FHSS, 20 dB for Cox, and 18 dB for ECHO become noticeable, namely audible. For this reason, 20 dB, 30 dB, and 40 dB were selected as payloads. It is shown that a 30-dB payload causes performance to drop to 55% with the 4136 KOÇAL et al./Turk J Elec Eng & Comp Sci Table 1. SVM-based steganalysis performance for different data-hiding capacities with DVV features. Watermarking methods Steganography methods Method SWR MISS FA % Method Capacity MISS FA % 20 dB 0/1000 0/1000 100.0 usage DSSS 30 dB 1/1000 2/1000 99.9 50% 142/1000 166/1000 84.5 STEGA 40 dB 5/1000 6/1000 99.5 100% 64/1000 73/1000 93.2 20 dB 2/1000 0/1000 99.9 50% 42/1000 176/1000 89.1 HIDE4PGP FHSS 30 dB 5/1000 6/1000 99.5 100% 33/1000 84/1000 94.2 40 dB 16/1000 13/1000 98.0 50% 121/1000 154/1000 86.3 STEGHIDE 20 dB 98/1000 236/1000 83.3 100% 48/1000 83/1000 93.5 ECHO 30 dB 176/1000 395/1000 71.5 50% 287/1000 167/1000 77.3 MP3 40 dB 293/100 489/1000 60.9 100% 203/1000 124/1000 83.7 20 dB 285/1000 212/1000 75.2 COX 30 dB 452/1000 457/1000 54.6 40 dB 478/1000 485/1000 51.9 20 dB 13/1000 8/1000 99.0 STOMOD 30 dB 53/1000 34/1000 95.7 40 dB 171/1000 98/1000 89.8 Cox method; note, however, that this level of payload is audible. For steganographic methods, the performance of each method was evaluated individually for two different data-hiding rates: 100% and 50% of maximum allowed capacity. There was no audibility concern for steganographic methods, as the maximum data-hiding rate used in the tests is equal to the maximum allowed capacity—the audibility threshold. The performance of the steganalyzer decreases as payload (in terms of SWR or capacity usage) also decreases. This is because a higher payload causes more corruption to the audio data, which makes detecting hidden data much simpler. Curves of the receiver-operating characteristics (ROCs) for watermarking and steganography methods are illustrated in Figure 4. The calculated conditions for producing ROC curves hold the same configuration used in Table 1—namely, the same database and the same SVM classifier. ROC curves provide performance Figure 4. ROC curves for (a) watermarking and (b) steganographic methods obtained using the SVM classifier. The dataset is a random sample of 2000 speeches from the TIMIT database. The payload is 20 dB for watermarking methods and 100% capacity usage for steganographic methods. 4137 KOÇAL et al./Turk J Elec Eng & Comp Sci Table 2. SVM-based steganalysis performance for different data-hiding capacities with chaotic features. Watermarking methods Steganography methods Method SWR MISS FA % Method Capacity MISS FA % 20dB 0/1000 0/1000 100.0 usage DSSS 30 dB 0/1000 0/1000 100.0 50% 105/1000 117/1000 88.9STEGA 40 dB 2/1000 2/1000 99.8 100% 32/1000 84/1000 94.2 20 dB 0/1000 0/1000 100.0 50% 32/1000 143/1000 91.3 HIDE4PGP FHSS 30 dB 1/1000 3/1000 99.8 100% 29/1000 55/1000 95.8 40 dB 13/1000 9/1000 99.0 50% 44/1000 65/1000 94.6 STEGHIDE 20 dB 74/1000 148/1000 88.9 100% 19/1000 32/1000 97.5 ECHO 30 dB 132/1000 295/1000 78.7 50% 239/1000 127/1000 81.7MP3 40 dB 212/100 437/1000 67.6 100% 154/1000 114/1000 86.6 20 dB 227/1000 245/1000 76.4 COX 30 dB 435/1000 442/1000 56.2 40 dB 471/1000 451/1000 53.9 20 dB 7/1000 5/1000 99.4 STOMOD 30 dB 40/1000 22/1000 96.9 40 dB 108/1000 61/1000 91.5 results if the threshold of the decision is changed. Figure 4 illustrates that DVV-based features are powerful even with biased thresholds. The unified feature set, as described in Eq. (22), is also evaluated to investigate the unification of DVV- based features with other proposed, chaotic-type steganalyzers. The performance results of the steganalyzer with unified features can be seen in Table 2. The steganalyzer used DVV-based features only, showing that using a unified set of features provides higher performance results. Table 3 gives the comparative results of using chaotic and DVV-based features individually and together. Using the support of DVV-based features, chaotic-based features are more discriminative than when they are alone. Comparing DVV-based and chaotic-type features shows that even though DVV-based features use very limited information from the audio data (namely, RMSE values of the original and surrogate DVV curves), the performance results are very promising. Since chaotic-type features were selected from a variety of chaos measurement methods [8], high performance results should be expected. However, the DVV-based steganalyzer achieved slightly higher performance results for some steganalysis conditions. Table 2 shows that Table 3. Comparative performance results of different feature-based steganalyzers for different data-hiding capacities. Watermarking methods Steganography methods Method SWR CHAOTIC DVV CHAOTIC Method Capacity CHAOTIC DVV CHAOTIC + DVV usage + DVV 20 dB 100.0 100.0 100.0 50% 88.2 84.5 88.9 STEGA DSSS 30 dB 100.0 99.9 100.0 100% 92.6 93.2 94.2 40 dB 99.8 99.5 99.8 50% 90.8 89.1 91.3 HIDE4PGP 20 dB 100.0 99.9 100.0 100% 93.8 94.2 95.8 FHSS 30 dB 99.8 99.5 99.8 50% 88.8 86.3 94.6 STEGHIDE 40 dB 98.8 98.0 99.0 100% 94.4 93.5 97.5 20 dB 87.8 83.3 88.9 50% 81.0 77.3 81.7 MP3 ECHO 30 dB 78.5 71.5 78.7 100% 85.5 83.7 86.6 40 dB 66.4 60.9 67.6 20 dB 72.2 75.2 76.4 COX 30 dB 55.4 54.6 56.2 40 dB 52.3 51.9 53.9 20 dB 99.3 99.0 99.4 STOMOD 30 dB 96.6 95.7 96.9 40 dB 90.3 89.8 91.5 4138 KOÇAL et al./Turk J Elec Eng & Comp Sci with the support of DVV-based features, the performance of a chaotic-based steganalyzer provides even better performance. When comparing recently proposed audio steganalysis approaches, the proposal of Geetha et al. [23], which uses Hausdorff distance and higher-order statistics, is analyzed. In Geetha et al.’s study, the selected classifier is the J48 decision tree algorithm, and a database of 200 audio samples was used for steganalysis. For our study, the SVM is selected as a classifier, and 2000 randomly selected audio samples from the TIMIT database were used. Even though the database and classifier used in this study differ from those of Geetha et al.’s study, information about the performance of the proposed steganalyzers is still provided. For a fair comparison, tests should be performed using the same database and detection algorithms; however, Table 4 shows that the proposed DVV-based steganalyzer provides better performance for the DSSS and Steghide methods and poorer performance for the ECHO method. Table 4. Comparative performance results of different feature-based steganalyzers with different database and classifiers. Method Capacity usage Hausdorff DVV Chaotic + DVV DSSS 10% (SWR: 33 dB) 78.6 99.7 99.9 Echo 10% (SWR: 26 dB) 88.4 78.8 83.4 Steghide 10% 83.2 78.7 86.4 5. Conclusion We have proposed a new feature vector based on DVV analysis of nonlinear data. We have constituted a new numerical feature vector by using DVV analysis, which is normally used for detecting the nonlinearity level of signals. Tests for the proposed steganalyzer were carried out by using the TIMIT database. Compared to recently proposed similar steganalyzer studies [3,5,8,24–26], DVV-based features used as speech steganalyzers show very promising results, specifically for the DSSS, FHSS, STOMOD, Hide4PGP, and Steghide methods. Moreover, tests were repeated for a new steganalyzer that is composed of a combination of DVV-based and chaotic-based features. A number of test results for different embedding capacities and different decision thresholds (ROC curves) are also presented. Use of DVV-based features in cooperation with other chaotic features improves the performance further. This combined steganalyzer provides the advantages of both DVV- based and chaotic features, and it can be used for almost all speech steganography methods proposed in the literature. Moreover, comparative results are obtained from our proposed steganalyzers and those of Geetha et al. [24] for the same usage capacities used in [24]. Test results show that for the Steghide and Hide4pgp steganography methods, our combined DVV-based and chaotic-based steganalyzer has the most distinctive feature set among the published high-performance steganalyzers [3,5,8,9,24–26]. Given the performance results of this study, it can easily be said that DVV-based features can be used for speech steganalysis as well as nonlinearity detection. In fact, the combined DVV- and chaotic-based steganalyzer is the best tool for distinguishing speech data from secret data hidden by the Steghide and Hide4pgp steganography methods. For the special case of mp3 audio files, recent studies have shown that with specific features, a more than 95% detection rate is achievable with SVM classifiers, even at low hiding capacity rates such as 20% [27,28]. The performance of the proposed steganalyzer can be increased further by using side information specific to the mp3 files. As long as chaotic-type and DVV-based steganalyzers fall short of this level of achievement, 4139 KOÇAL et al./Turk J Elec Eng & Comp Sci the proposed DVV and chaotic-type features should be preferred for non-mp3 samples, namely coded but not compressed audio files such as .wav files. Acknowledgments This work was supported in part by TÜBİTAK (Project No. 104E056). The authors would like to thank H Özer of the TÜBİTAK UEKAE Speech Group for providing the database of speech tracks used in the experiments. They would also like to thank the anonymous reviewers for their constructive comments that greatly improved this paper. References [1] Westfeld AP. Detecting low embedding rates. In: IH 2002 International Workshop; 7–9 October 2002; Noordwijk- erhout, the Netherlands. Berlin, Germany: Springer-Verlag, 2003. pp. 324-339. [2] Westfeld AP, Pfitzmann A. Attacks on steganographic systems. In: IH 1999 International Workshop; 29 September– 1 October 1999; Dresden, Germany. Heidelberg, Germany: Springer-Verlag, 1999. pp. 61-66. [3] Johnson MK, Lyu S, Farid H. Steganalysis of recorded speech. In: SPIE 2005 Symposium on Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents; 16–20 January 2005; San Jose, CA, USA. Bellingham, WA, USA: SPIE. pp. 664-672. [4] Altun O, Sharma G, Çelik M, Sterling M, Titlebaum E, Bocko M. Morphological steganalysis of audio signals and the principle of diminishing marginal distortions. In: IEEE 2005 International Conference on Acoustics, Speech, and Signal Processing; 18–23 March 2005; Philadelphia, PA, USA. New York, NY, USA: IEEE. pp. 21-24. [5] Özer H, Avcıbaş İ, Sankur B, Memon N. Steganalysis of audio based on audio quality metrics. In: SPIE 2003 Conference on Security and Watermarking of Multimedia Contents; 20 January 2003; Santa Clara, CA, USA. Bellingham, WA, USA: SPIE. pp. 55-66. [6] Djebbar F, Ayad B, Meraim KA, Hamam H. Comparative study of digital audio steganography techniques. EURASIP J Audio SPEE 2012; 25: 1-16. [7] Meghanathan N, Nayak L. Steganalysis algorithms for detecting the hidden information in image, audio and video cover media. International Journal of Network Security & Its Application 2010; 2: 43-55. [8] Koçal OH, Yürüklü E, Avcıbaş İ. Chaotic-type features for speech steganalysis. IEEE T Inf Foren Sec 2008; 3: 651-661. [9] Koçal OH, Yürüklü E, Dilaveroğlu E. A new approach for speech audio steganalysis using delay vector variance method. J Fac Eng Arch 2014; 19: 27-36. [10] Kokkinos I, Maragos P. Nonlinear speech analysis using models for chaotic systems. IEEE T Speech Audi P 2005; 13: 1098-1109. [11] Banbrook M, McLaughlin S. Speech characterization and synthesis by nonlinear methods. IEEE T Speech Audi P 1999; 7: 1-17. [12] Theiler J, Eubank S, Longtin A, Galdrikian B, Farmer JD. Testing for nonlinearity in time series: the method of surrogate data. Physica D 1992; 58: 77-94. [13] Schreiber T, Schmitz A. Discrimination power of measures for nonlinearity in a time series. Phys Rev E 1997; 55: 5443-5447. [14] Gautama T, Mandic DP, Van Hulle MM. Indications of nonlinear structures in brain electrical activity. Phys Rev E 2003; 67: 046204.1-046204.5. [15] Gautama T, Mandic DP, Van Hulle MM. A novel method for determining the nature of time series. IEEE T Bio-Med Eng 2004; 51: 728-736. 4140 KOÇAL et al./Turk J Elec Eng & Comp Sci [16] Bender W, Gruhl D, Morimoto N, Lu A. Techniques for data hiding. IBM Syst J 1996; 35: 313-336. [17] Fridrich J, Goljan M. Digital image steganography using stochastic modulation. In: SPIE 2003 Conference on Security and Watermarking of Multimedia Contents; 20 January 2003; Santa Clara, CA, USA. Bellingham, WA, USA: SPIE. pp. 191-202. [18] Simmons GJ. Prisoners’ problem and the subliminal channel. In: Chaum D, editor. Advances in Cryptology. New York, NY, USA: Plenum Press, 1984. pp. 51-67. [19] Cox IJ, Kilian J, Leighton FT, Shamoon T. Secure spread spectrum watermarking for multimedia. IEEE T Image Process 1997; 6: 1673-1687. [20] Kennel MB, Abarbanel HDI. False neighbors and false strands: a reliable minimum embedding dimension algorithm. Phys Rev E 2002; 66: 026209. [21] Hilborn R. Chaos and Nonlinear Dynamics. 2nd ed. Oxford, UK: Oxford University Press, 2000. [22] Hegger R, Kantz H, Schreiber T. Practical implementation of nonlinear time series methods: the TISEAN package. Chaos 1999; 9: 413-435. [23] Pudil P, Novovicova J, Kittler J. Floating search methods in feature selection. Pattern Recogn Lett 1994; 15: 1119-1125. [24] Geetha S, Ishwarya N, Kamaraj N. Audio steganalysis with Hausdorff distance higher order statistics using a rule based decision tree paradigm. Expert Syst Appl 2010; 37: 7469-7482. [25] Avcıbaş İ. Audio steganalysis with content-independent distortion measures. IEEE Signal Proc Let 2006; 13: 92-95. [26] Özer H, Sankur B, Memon N, Avcıbaş İ. Detection of audio covert channels using statistical footprints of hidden messages. Digit Signal Process 2006; 16: 389-401. [27] Qiao M, Sung AH, Liu Q. MP3 audio steganalysis. Inform Sciences 2013; 31: 123-134. [28] Yu X, Wang R, Yan D, Zhu J. MP3 Audio steganalysis using calibrated side information feature. J Comput Inf Syst 2012; 8: 4241-4248. 4141