Features Based on Statistical Moments of Wavelet Characteristic Functions

It is well known that, if the image grayscale values or the wavelet coefficient values are treated as a random variable, the (normalized) histogram of a digital image or of a wavelet subband is essentially its probability mass function (pmf). Furthermore, if each component of the histogram is multiplied by a correspondingly shifted unit impulse, we obtain the probability density function (pdf). The characteristic function (CF) and the pdf (here, the histogram) can then be regarded as a Fourier transform pair (with the sign of the exponent reversed). Denote the histogram by $h(x_j)$ and the characteristic function by $H(f_k)$, where $j, k = 0, 1, \ldots, N-1$. Then $h(x_j)$ and $H(f_k)$ form a discrete Fourier transform (DFT) pair.
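To make the histogram/CF relation concrete, here is a minimal Python sketch (standard library only). It builds an N-bin histogram from toy integer values, computes its DFT, and checks that the inverse DFT returns the histogram. The unit-width binning, the toy values, and the helper names `histogram`, `dft`, and `idft` are illustrative assumptions. The forward DFT here uses the negative exponent, which is the sign opposite to the CF's, so the two differ only in that sign.

```python
import cmath


def histogram(values, lo, n_bins):
    """Count integer values into unit-width bins lo, lo+1, ..., lo+n_bins-1."""
    h = [0] * n_bins
    for v in values:
        idx = int(v) - lo
        if 0 <= idx < n_bins:          # values outside the range are ignored
            h[idx] += 1
    return h


def dft(seq):
    """Forward DFT: H(f_k) = sum_j h(x_j) * exp(-2*pi*i*j*k/N), k = 0..N-1."""
    n = len(seq)
    return [sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT: h(x_j) = (1/N) * sum_k H(f_k) * exp(+2*pi*i*j*k/N)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


values = [3, 4, 4, 5, 5, 5, 6, 6, 7]          # toy grayscale values
h = histogram(values, lo=0, n_bins=8)        # [0, 0, 0, 1, 2, 3, 2, 1]
H = dft(h)                                   # CF samples (opposite exponent sign)
recovered = [round(z.real) for z in idft(H)]
print(recovered == h)                        # True: h and H form a DFT pair
```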

In addition, because of the decorrelation capability of the wavelet transform, the coefficients of different subbands at the same level are approximately independent of one another. Therefore, the features generated from different wavelet subbands at the same level are also approximately independent of one another.

Motivated by these considerations, we propose to use the statistical moments of the characteristic functions of wavelet subbands as features for steganalysis. The $n$-th statistical moment of a characteristic function, $M_n$, is defined as follows.

$$M_n = \frac{\sum_{k=1}^{N/2} f_k^{\,n}\,\lvert H(f_k)\rvert}{\sum_{k=1}^{N/2} \lvert H(f_k)\rvert}$$

where $\lvert H(f_k)\rvert$ is the magnitude of the CF at frequency $f_k$. Note that using the magnitude of the CF makes this definition of moments different from the conventional one. By Fourier transform theory, since the histogram is real-valued, the magnitude of the CF, $\lvert H(f_k)\rvert$, is even symmetric, while the CF's phase angle is odd symmetric. Therefore, only about half of the points (here $k = 1, \ldots, N/2$) need to be used in the moment calculation for steganalysis.
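The moment definition $M_n = \sum_{k=1}^{N/2} f_k^{\,n} \lvert H(f_k)\rvert \,/\, \sum_{k=1}^{N/2} \lvert H(f_k)\rvert$ can be sketched in Python as below. This is a direct O(N^2) illustration, not our implementation. The normalized frequency $f_k = k/N$ and the helper names are assumptions. The first check confirms the even symmetry of the CF magnitude that justifies summing over only half of the points.

```python
import cmath


def cf_magnitude(hist):
    """|H(f_k)| for k = 0..N-1, with H the DFT of the histogram."""
    n = len(hist)
    return [abs(sum(hist[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)))
            for k in range(n)]


def cf_moments(hist, orders=(1, 2, 3)):
    """M_n = sum_k f_k^n |H(f_k)| / sum_k |H(f_k)| over k = 1..N/2.

    Assumes f_k = k / N (normalized frequency) and a histogram whose CF is
    not identically zero on k = 1..N/2.
    """
    n = len(hist)
    mags = cf_magnitude(hist)
    half = range(1, n // 2 + 1)                  # only about half of the points
    denom = sum(mags[k] for k in half)
    return [sum((k / n) ** p * mags[k] for k in half) / denom for p in orders]


hist = [0, 0, 0, 1, 2, 3, 2, 1]
mags = cf_magnitude(hist)
# Real-valued histogram => |H(f_k)| == |H(f_{N-k})| (even symmetry).
print(all(abs(mags[k] - mags[len(mags) - k]) < 1e-9 for k in range(1, len(mags))))  # True
print([round(m, 4) for m in cf_moments(hist)])
```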

Prediction-Error Image

In steganalysis, we care only about the distortion caused by data hiding. This distortion may be rather weak and hence masked by other types of noise, including noise due to the peculiar features of the image itself. To enhance the noise introduced by data hiding, we propose to predict each pixel grayscale value of the test image from its neighboring pixels' grayscale values. A prediction-error image is then obtained by subtracting the predicted image from the test image. Because the hidden data are usually unrelated to the cover media, the prediction-error image is expected to remove much of the information other than that caused by data hiding, which makes steganalysis more efficient. In other words, the prediction-error image is used to erase the image content. The prediction algorithm is expressed as follows:

$$\hat{x} = \begin{cases} \max(a, b), & \text{if } c \le \min(a, b) \\ \min(a, b), & \text{if } c \ge \max(a, b) \\ a + b - c, & \text{otherwise} \end{cases}$$

where $a$, $b$, and $c$ form the context of the pixel $x$ under consideration, and $\hat{x}$ is the predicted value of $x$. The locations of $a$, $b$, and $c$ relative to $x$ are illustrated below.

    x  b
    a  c
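The following Python sketch applies the prediction rule ($\hat{x} = \max(a, b)$ if $c \le \min(a, b)$; $\hat{x} = \min(a, b)$ if $c \ge \max(a, b)$; $\hat{x} = a + b - c$ otherwise) and forms the prediction-error image $x - \hat{x}$. It assumes that the layout places b immediately to the right of x, a immediately below it, and c diagonally below-right. Pixels in the last row and column lack a full context, and this sketch simply drops them; that choice is an assumption. The toy example shows a flat region being predicted exactly, while a single +1 change leaves a visible trace.

```python
def med_predict(a, b, c):
    """Predict x from its context a, b, c."""
    if c <= min(a, b):
        return max(a, b)
    if c >= max(a, b):
        return min(a, b)
    return a + b - c


def prediction_error(img):
    """Prediction-error image x - x_hat for a 2-D list of grayscale values.

    Assumed context: a = pixel below x, b = pixel right of x, c = below-right.
    The last row and column (no full context) are dropped in this sketch.
    """
    rows, cols = len(img), len(img[0])
    return [[img[r][col] - med_predict(img[r + 1][col], img[r][col + 1], img[r + 1][col + 1])
             for col in range(cols - 1)]
            for r in range(rows - 1)]


flat = [[100] * 4 for _ in range(4)]
print(prediction_error(flat))       # all zeros: the content is fully predicted
stego = [row[:] for row in flat]
stego[1][1] += 1                    # simulate a +1 change from embedding
stego_err = prediction_error(stego)
print(stego_err)                    # [[0, -1, 0], [-1, 1, 0], [0, 0, 0]]
```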

78-D Feature Vector

 

In our work, a test image is decomposed with a three-level Haar wavelet transform. Each level has four subbands (LL, LH, HL, and HH), giving 12 subbands in total. If the original image is regarded as the level-0 LL subband (LL0), we have a total of 13 subbands. For each subband, the first three moments of the characteristic function are computed, resulting in a set of 39 features. Similarly, another set of 39 features is generated from the prediction-error image. Thus, a 78-D feature vector is produced for each test image.
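A compact Python sketch of the 78-D feature assembly follows. It is an illustration under several assumptions that the text does not fix:

- an orthonormal 2-D Haar step, where each 2x2 block is mapped to sums and differences divided by 2;
- a particular LH/HL naming;
- histograms built by rounding coefficients to integers, with one bin per integer;
- normalized frequencies $f_k = k/N$;
- a zero-vector fallback when a histogram has no points with k = 1, ..., N/2.

The prediction-error image is passed in as an argument, and the demo uses a random stand-in for it. Image sides divisible by 8 avoid truncation across the three levels.

```python
import cmath
import random


def haar_level(x):
    """One orthonormal 2-D Haar step on a 2-D list; returns (LL, LH, HL, HH)."""
    rows, cols = len(x) // 2, len(x[0]) // 2

    def band(sr, sc):
        # sr / sc: sign applied to the lower row / right column of each 2x2 block
        return [[(x[2 * i][2 * j] + sc * x[2 * i][2 * j + 1]
                  + sr * x[2 * i + 1][2 * j] + sr * sc * x[2 * i + 1][2 * j + 1]) / 2
                 for j in range(cols)] for i in range(rows)]

    return band(1, 1), band(-1, 1), band(1, -1), band(-1, -1)


def decompose(img, levels=3):
    """13 subbands for 3 levels: the image itself as LL0, then LL/LH/HL/HH per level."""
    subbands = {"LL0": img}
    ll = img
    for lev in range(1, levels + 1):
        ll, lh, hl, hh = haar_level(ll)
        subbands.update({f"LL{lev}": ll, f"LH{lev}": lh, f"HL{lev}": hl, f"HH{lev}": hh})
    return subbands


def cf_moments(band, orders=(1, 2, 3)):
    """CF moments M_1..M_3 of a subband histogram (binning and f_k = k/N are assumptions)."""
    vals = [round(v) for row in band for v in row]
    lo = min(vals)
    hist = [0] * (max(vals) - lo + 1)
    for v in vals:
        hist[v - lo] += 1
    n = len(hist)
    mags = [abs(sum(hist[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)))
            for k in range(1, n // 2 + 1)]
    denom = sum(mags)
    if denom == 0:                                   # degenerate histogram: sketch fallback
        return [0.0] * len(orders)
    return [sum(((k + 1) / n) ** p * m for k, m in enumerate(mags)) / denom for p in orders]


def moment_features(img):
    """39 features: three CF moments for each of the 13 subbands."""
    return [m for band in decompose(img).values() for m in cf_moments(band)]


def feature_vector_78(img, pred_err):
    """39 features of the test image followed by 39 of its prediction-error image."""
    return moment_features(img) + moment_features(pred_err)


rng = random.Random(0)
img = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
stand_in = [[rng.randrange(-3, 4) for _ in range(16)] for _ in range(16)]  # placeholder error image
print(len(decompose(img)), len(moment_features(img)), len(feature_vector_78(img, stand_in)))  # 13 39 78
```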

Our experimental study has shown that using more than three levels of wavelet decomposition, or more than the first three moments, does not further improve steganalysis performance but does increase computational complexity. Hence, 78-D feature vectors are used in our proposed steganalysis system.

Neural Network Classifier

In addition to feature extraction, the design of the classifier is another key element in steganalysis. The classifier affects both the classification success rate and the computational complexity, and hence the implementation speed, so it plays an important role in steganalysis.

1. In our experiments, a neural network classifier is adopted.

2. A Bayes classifier is also used for steganalysis in our work. Because it trains quickly, the Bayes classifier is the one provided in our demonstration.

Bayes Classifier

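A Bayes classifier under a Gaussian assumption is adopted to steganalyze test images. Each test image is represented by a 78-D feature vector denoted by $X_i$, where $i$ is the index of the test image. The notations $\omega_1$ and $\omega_2$ denote the class of original images and the class of stego-images, respectively. Assume that both image classes obey Gaussian distributions. The mean vectors of $\omega_1$ and $\omega_2$ are denoted by $\mu_1$ and $\mu_2$, and their covariance matrices by $\Sigma_1$ and $\Sigma_2$, respectively. The Bayes classifier can be stated as follows.

A. Maximum posterior decision:

$$X_i \in \begin{cases} \omega_1, & \text{if } P(\omega_1 \mid X_i) > P(\omega_2 \mid X_i) \\ \omega_2, & \text{otherwise} \end{cases}$$

where

$$P(\omega_k \mid X_i) = \frac{P(\omega_k)\, p(X_i \mid \omega_k)}{p(X_i)}, \qquad k = 1, 2,$$

and

$$p(X_i \mid \omega_k) = N(\mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} \lvert\Sigma_k\rvert^{1/2}} \exp\!\left(-\tfrac{1}{2}(X_i - \mu_k)^T \Sigma_k^{-1} (X_i - \mu_k)\right),$$

where $N$ stands for the normal (Gaussian) distribution and $d = 78$ is the feature dimension.

B. Decision function:

$$X_i \in \begin{cases} \omega_1, & \text{if } h(X_i) < \ln \dfrac{P(\omega_1)}{P(\omega_2)} \\ \omega_2, & \text{otherwise} \end{cases}$$

where

$$h(X_i) = -\ln\frac{p(X_i \mid \omega_1)}{p(X_i \mid \omega_2)} = \tfrac{1}{2}(X_i - \mu_1)^T \Sigma_1^{-1} (X_i - \mu_1) - \tfrac{1}{2}(X_i - \mu_2)^T \Sigma_2^{-1} (X_i - \mu_2) + \tfrac{1}{2} \ln \frac{\lvert\Sigma_1\rvert}{\lvert\Sigma_2\rvert}.$$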
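The two-class rule (assign X to $\omega_1$ when $P(\omega_1)p(X \mid \omega_1) > P(\omega_2)p(X \mid \omega_2)$, with Gaussian class-conditional densities) can be sketched in pure Python as follows. The decision is evaluated in log form through Cholesky factors of the class covariance matrices. Several parts of this illustration are assumptions:

- the small ridge added to each covariance diagonal;
- the priors estimated from the training class sizes;
- all helper names.

The toy 2-D data stand in for the 78-D feature vectors.

```python
import math
import random


def mean_cov(samples, ridge=1e-6):
    """Sample mean and unbiased covariance; a small ridge keeps the matrix positive definite."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def neg_log_density(x, mu, low):
    """-ln N(x; mu, L L^T) without the constant (d/2) ln(2*pi), which cancels in h(X)."""
    z = []
    for i in range(len(x)):                    # forward substitution: L z = x - mu
        z.append((x[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return 0.5 * sum(v * v for v in z) + sum(math.log(low[i][i]) for i in range(len(x)))


class GaussianBayes:
    """Two-class Gaussian Bayes classifier: label 1 = cover (omega_1), 2 = stego (omega_2)."""

    def fit(self, cover, stego):
        self.mu1, cov1 = mean_cov(cover)
        self.mu2, cov2 = mean_cov(stego)
        self.low1, self.low2 = cholesky(cov1), cholesky(cov2)
        # ln(P(omega_1) / P(omega_2)), with priors taken from the training class sizes.
        self.threshold = math.log(len(cover) / len(stego))
        return self

    def h(self, x):
        """h(X) = -ln p(X|omega_1) + ln p(X|omega_2)."""
        return neg_log_density(x, self.mu1, self.low1) - neg_log_density(x, self.mu2, self.low2)

    def predict(self, x):
        return 1 if self.h(x) < self.threshold else 2


rng = random.Random(1)
cover = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
stego = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(200)]
clf = GaussianBayes().fit(cover, stego)
print(clf.predict([0.0, 0.0]), clf.predict([3.0, 3.0]))  # 1 2
```

Working in the log domain with a Cholesky factor avoids forming an explicit inverse. It also gives $\tfrac{1}{2}\ln\lvert\Sigma\rvert$ directly as the sum of the logarithms of the factor's diagonal.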

Experimental Results on CorelDRAW Image Database

To evaluate the performance of the proposed steganalysis system, we use all 1096 sample images contained on CD #3 of the CorelDRAW Version 10.0 software. They include pictures of Nature, Ocean, Food, Animals, Architecture, Places, Leisure, Misc., and so on. The following five typical data hiding methods are used in the experiments:

- Cox et al.'s non-blind spread spectrum (SS);
- Piva et al.'s blind SS;
- Huang and Shi's 8×8 block SS;
- a generic QIM (0.1 bpp, i.e., bits per pixel);
- a generic LSB (0.3 bpp; both the pixel positions used for embedding and the bits to be embedded are randomly selected).

For each image in the CorelDRAW image database, five stego-images are generated, one with each of these five methods. For all data hiding methods, different random signals are embedded into different images, which makes the evaluation of the proposed system more general.

First, we evaluate the system against each of the five data hiding methods, one at a time. In these experiments, we randomly select approximately 5/6 of the 1096 CorelDRAW images and the corresponding stego-images (specifically, 896 of the 1096 image pairs) for training, following the common practice in the automatic recognition of Arabic numerals. The remaining images and their corresponding stego-images (specifically, the remaining 200 image pairs) are used for testing. The detection rate is defined as the ratio of the number of correctly classified images to the total number of test images. For reliability, the reported detection rates are averaged over 10 such randomly conducted experiments. The average detection rates are listed in Table 1.

Next, we combine the five data hiding methods to evaluate the blind steganalysis ability of the proposed system. As above, we start with 1096 image 6-tuples. Each 6-tuple consists of an original image and the five stego-images generated from it by the five data hiding methods. We then randomly select 896 6-tuples for training and use the remaining 200 6-tuples for testing. The detection rates, averaged over 20 runs, are listed in Table 1.
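The evaluation protocol of the two paragraphs above can be sketched as follows. Splitting at the level of image groups (a cover image together with all of its stego versions) keeps the training and testing sets disjoint in image content. `train_and_predict` is a hypothetical callback standing in for either classifier. The number of runs is a parameter because the text reports 10-run averages for single methods and 20-run averages for the combined case.

```python
import random


def _flatten(groups, idx):
    """Expand the selected groups into feature vectors and labels (0 = cover, 1 = stego)."""
    xs, ys = [], []
    for g in idx:
        for pos, feat in enumerate(groups[g]):
            xs.append(feat)
            ys.append(0 if pos == 0 else 1)    # element 0 of each group is the cover
    return xs, ys


def averaged_detection_rate(groups, train_and_predict, n_test=200, runs=10, seed=0):
    """Average detection rate over repeated random group-level train/test splits.

    groups: one tuple per image (a pair for one method, a 6-tuple for five methods).
    train_and_predict: hypothetical callback(train_x, train_y, test_x) -> labels.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(runs):
        order = list(range(len(groups)))
        rng.shuffle(order)
        test_x, test_y = _flatten(groups, order[:n_test])
        train_x, train_y = _flatten(groups, order[n_test:])
        pred = train_and_predict(train_x, train_y, test_x)
        # Detection rate: correctly classified test images / all test images.
        rates.append(sum(p == t for p, t in zip(pred, test_y)) / len(test_y))
    return sum(rates) / len(rates)


def threshold_classifier(train_x, train_y, test_x):
    """Hypothetical stand-in classifier: label 1 (stego) when feature 0 exceeds 0.5."""
    return [1 if x[0] > 0.5 else 0 for x in test_x]


pairs = [([0.0], [1.0]) for _ in range(1096)]   # toy cover/stego feature pairs
print(averaged_detection_rate(pairs, threshold_classifier))  # 1.0
```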

Table 1. Testing results.

Data hiding method          | Detection rate
----------------------------|---------------
Cox et al.'s SS             | 98.1%
Piva et al.'s SS            | 98.7%
Huang and Shi's block SS    | 98.8%
Generic QIM (0.1 bpp)       | 99.0%
Generic LSB (0.3 bpp)       | 98.9%
5 methods combined          | 98.7%

 

Thirdly, to further evaluate our system, we test a data hiding method that was not used in the training process. We apply Hide4PGP to 200 randomly selected CorelDRAW images, and the detection rate is 99.5%.

Fourthly, to evaluate the effectiveness of the moments generated from the prediction-error image, we repeat the evaluation described above, using the Bayes classifier for computational simplicity. The evaluation is run separately on the first 39 features (generated from the test images) and on the second 39 features (obtained from the prediction-error images). Table 2 contains the comparison results, which demonstrate the effectiveness of using prediction-error images. For every data hiding method, and for the five methods combined, the detection rate obtained with the prediction-error features is higher than that obtained with the test-image features (over the 1096 CorelDRAW images). This is as expected from the analysis above.

Table 2. Effectiveness comparison of features from original images and features from prediction-error images (detection rate).

Data hiding method          | 39-D (test image) | 39-D (prediction-error image)
----------------------------|-------------------|------------------------------
Cox et al.'s SS             | 96.2%             | 96.6%
Piva et al.'s SS            | 95.2%             | 98.8%
Huang and Shi's block SS    | 95.4%             | 97.9%
Generic QIM (0.1 bpp)       | 97.9%             | 98.7%
Generic LSB (0.3 bpp)       | 94.5%             | 98.7%
5 methods combined          | 94.9%             | 98.4%

 

Fifthly, we examine the effectiveness of the CF moments of the wavelet subbands, and in particular of the moments of the original test image and of all the LL subbands (i.e., LLi, i = 0, 1, 2, 3). To do so, we apply each of the first 39 of the 78 proposed features alone to the steganalysis of Cox et al.'s non-blind spread spectrum data hiding method. We again use the Bayes classifier for computational simplicity. The detection rates are listed in Table 3.
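The single-feature experiment can be sketched as follows. For each feature index, fit a one-dimensional Gaussian to each class on the training data, label each test image by the larger class likelihood, and record the detection rate. Equal class priors, the variance floor, and the synthetic data (one informative feature and one uninformative feature) are assumptions of this sketch, not our data.

```python
import math
import random


def gauss_logpdf(x, mu, var):
    """Log density of a 1-D normal distribution N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def fit_1d(values):
    """Sample mean and unbiased variance, floored to avoid a zero variance."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)
    return mu, max(var, 1e-12)


def single_feature_rates(train_cover, train_stego, test_cover, test_stego):
    """Detection rate (%) of a 1-D Gaussian Bayes classifier on each feature alone (equal priors)."""
    rates = []
    for f in range(len(train_cover[0])):
        cover_fit = fit_1d([v[f] for v in train_cover])
        stego_fit = fit_1d([v[f] for v in train_stego])

        def is_stego(x):
            return gauss_logpdf(x, *stego_fit) > gauss_logpdf(x, *cover_fit)

        correct = (sum(not is_stego(v[f]) for v in test_cover)
                   + sum(is_stego(v[f]) for v in test_stego))
        rates.append(100.0 * correct / (len(test_cover) + len(test_stego)))
    return rates


rng = random.Random(2)


def draw(shift, n):
    """Toy 2-D vectors: feature 0 shifts between classes, feature 1 does not."""
    return [[rng.gauss(shift, 1), rng.gauss(0, 1)] for _ in range(n)]


rates = single_feature_rates(draw(0, 500), draw(4, 500), draw(0, 500), draw(4, 500))
print([round(r, 1) for r in rates])   # feature 0 far above chance, feature 1 close to 50
```

With balanced test classes, a rate near 50% means the feature alone is close to chance for this two-class test.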

Table 3. Detection rate (%) obtained by applying each of the first 39 proposed features alone.

Subband | 1st CF moment | 2nd CF moment | 3rd CF moment
--------|---------------|---------------|--------------
LL0     | 63.4          | 65.8          | 54.0
LL1     | 63.8          | 62.9          | 54.9
LH1     | 54.7          | 54.9          | 54.7
HL1     | 54.4          | 54.9          | 54.6
HH1     | 55.2          | 54.9          | 55.2
LL2     | 64.2          | 56.2          | 53.6
LH2     | 55.5          | 55.4          | 55.3
HL2     | 55.8          | 55.6          | 55.6
HH2     | 51.7          | 52.3          | 53.0
LL3     | 56.6          | 52.9          | 54.2
LH3     | 62.3          | 61.6          | 59.9
HL3     | 61.5          | 59.2          | 55.9
HH3     | 51.6          | 52.1          | 51.3

 

Finally, the effectiveness of the neural network is evaluated. We conduct experiments with our proposed 78-D feature vectors using the Bayes classifier and the neural network, respectively, for the five data hiding methods individually and jointly. Table 4 lists the detection rates for Cox et al.'s SS data hiding method and for the combined test. Compared with the Bayes classifier, the proposed neural network increases the detection rate by about 3 to 4 percentage points: from 95.2% to 98.1% for Cox et al.'s SS, and from 94.6% to 98.7% for the five methods combined.

Table 4. Comparison of neural network with Bayes classifier (detection rate).

Data hiding method   | Bayes classifier | Neural network
---------------------|------------------|---------------
Cox et al.'s SS      | 95.2%            | 98.1%
5 methods combined   | 94.6%            | 98.7%

  

Discussions and Observations

Above, we have proposed the framework of a novel general steganalysis system. Our findings are summarized below.

a) Statistical moments of wavelet characteristic functions (CFs) are proposed for steganalysis for the first time. Our theoretical analysis and experimental work show that the moments of wavelet CFs reflect the differentiation property of the associated histograms, and hence are sensitive to the changes caused by the data hiding process.

b) Prediction-error images enhance the changes caused by data hiding by reducing the effect of the diversity of natural images.

c) An artificial neural network performs better in steganalysis than the Bayes classifier, owing to its powerful learning capability.

d) Our steganalysis approach, which combines multiple data hiding methods, points to a promising path toward blind and practically powerful steganalysis. If the system is trained on a larger image database and with more typical data hiding algorithms, it is expected to detect stego-images generated by more data hiding algorithms, possibly including some algorithms that were not used in training.

e) Our experiments are conducted on a large number of images, which is necessary for reliable steganalysis evaluation.

f) Our proposed steganalysis system has demonstrated a significant performance improvement over prior art.


* This research is jointly conducted by NJIT and Professor G.R. Xuan's group at Tongji University, Shanghai, China.