Steganography Analysis


JPEG IMAGE STEGANALYSIS USING MACHINE LEARNING

Abstract: This project deals with the detection of steganographic content. Steganography is a method complementary to cryptography that hides a coded message inside pictures, audio or video. Hiding a message is important, but revealing such content is even more important in order to prevent its use by criminals. This project applies supervised machine learning to detect the presence of steganographic content embedded in images by programs such as Steghide.

Keywords: Steganography, Stego-images, Cover-images, Steganalysis.

I. INTRODUCTION

Steganography is the process of hiding secret information within an ordinary message. Steganography can be applied to any type of medium. The steganogram …

Sample Extraction

The second step is sample extraction, i.e., extracting features from both the original and the stego images. The feature that can be extracted and used to solve our problem is the Huffman coding. To extract this kind of information from images, the program JPEGSnoop can be used, which is able to report extended information for image, video and text files. JPEGSnoop is able to extract information such as:

• Quality of the image
• EXIF information
• RGB histogram
• Tables of Huffman’s coding

Huffman coding was designed by David Huffman in 1952. It has two properties: it is a code of minimal average length, and it is a prefix code and is therefore uniquely decodable. The disadvantage is that the probability distribution of the occurrence of each symbol must be known. Sample Huffman coding tables of a clear and a coded picture are given in Table 1 and Table 2.

Table 1: Huffman’s Coding – Clear Picture

Bits   DC, Class0   DC, Class1   AC, Class0   AC, Class1
1               0            0            0            0
2              82          537       111597        41239
3            2811          494        39917        30606
4             886          602        46384        31571
5             837          542        30163        18650
6             724          475         5825         7639
7             547          293        14139          724
8             213          112         6943         3479
9              44           17         2526          842
10              0            0         2580          352
11              0            0          658            …
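The two properties of Huffman coding mentioned above, and the kind of per-bit-length counts listed in Table 1, can be illustrated with a minimal sketch. It builds a textbook Huffman code from known symbol frequencies and then counts how many coded symbols use a code of each length. This is only an illustration: the function names `huffman_code_lengths` and `length_histogram` and the sample frequencies are hypothetical, the sketch does not reproduce how JPEGSnoop gathers its statistics, and it does not enforce the 16-bit limit that JPEG places on code lengths.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be written out.
        return {symbol: 1 for symbol in freqs}
    tie = count()  # unique tie-breaker so the heap never compares the dicts
    heap = [(weight, next(tie), {symbol: 0}) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def length_histogram(freqs, lengths, max_bits=16):
    """Count how many coded symbols use a code of each length 1..max_bits.

    Assumes no code is longer than max_bits (not enforced by this sketch).
    """
    hist = [0] * max_bits
    for symbol, weight in freqs.items():
        hist[lengths[symbol] - 1] += weight
    return hist


# Hypothetical symbol frequencies, for illustration only.
freqs = {"a": 8, "b": 4, "c": 2, "d": 1}
lengths = huffman_code_lengths(freqs)
hist = length_histogram(freqs, lengths)

print(lengths)   # code length per symbol
print(hist[:4])  # symbols coded with 1-, 2-, 3- and 4-bit codes
```

Frequent symbols receive short codes and rare symbols long ones, so any change to the symbol statistics of an image, such as embedding a message, can shift these counts. The need to pass `freqs` up front reflects the disadvantage noted above: the symbol distribution must be known before the code can be built.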

Training Sets

For training it is necessary to define suitable training sets. We used photos from the ground truth image database [11]. A secret message is inserted into each photo in this group using the program Steghide. The message is unique in every image because it is produced by a random string generator. The Huffman coding data from JPEGSnoop is transferred to the training set: the four columns are written out one after another, which creates a vector. Examples of clear and coded inputs in a training set are shown in Figure 3 and Figure 4.

{0,82,2811,886,837,724,547,213,44,0,0,0,0,0,0,0,0,537,494,602,542,475,293,112,17,0,0,0,0,0,0,0,0,111597,38817,46384,30163,5825,14139,6943,2526,2580,658,206,0,0,32,947,0,41239,30606,31571,18650,7639,724,3479,842,352,150,54,0,7,11,27}
Figure 3: Example of clear input in training set

{0,240,2734,853,811,715,535,212,44,0,0,0,0,0,0,0,0,534,497,603,542,474,293,112,17,0,0,0,0,0,0,0,0,111447,39851,46280,30122,5796,14067,6953,2498,2569,621,179,0,0,15,681,0,41366,30474,31522,18612,7645,716,3524,847,357,158,54,0,7,11,28}
Figure 4: Example of coded input in training set

As the numbers show, there is a difference between the two inputs. Figure 5 and Figure 6 show examples of two pictures, without and with a secret message inside. At first glance there is no