We have not confirmed this experimentally; so far we have only formed a hypothesis (see Figure 13). This deviation is too small to be seen in the ROC curves without high magnification. Therefore we only show the results for the AES-encrypted file centroid. Figure 15 shows the results, where four clusters of algorithms are visible:

1. BFD, and BFD and RoC, at 11% false positives.

Figure 14: The results for the Windows PE files centroid for algorithms scoring above 50% detection rate and below 50% false positives rate.
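The ROC curves plot detection rate against false positives rate as the decision threshold is varied. A minimal Python sketch of how such points could be produced, assuming (hypothetically) that every fragment has already been given a distance to the centroid and is categorised as the centroid's file type when that distance is at or below a threshold. The function `roc_points` and the sample distances are illustrative only, not the method used to produce the figures.

```python
from typing import List, Tuple


def roc_points(
    pos_dist: List[float], neg_dist: List[float]
) -> List[Tuple[float, float]]:
    """Return (false positives rate, detection rate) pairs, one per threshold.

    pos_dist: distances to the centroid for fragments of the centroid's type.
    neg_dist: distances for fragments of all other types.
    Hypothetical rule: a fragment is a hit when distance <= threshold.
    """
    thresholds = sorted(set(pos_dist) | set(neg_dist))
    points = [(0.0, 0.0)]  # a threshold below every distance accepts nothing
    for t in thresholds:
        detection_rate = sum(d <= t for d in pos_dist) / len(pos_dist)
        false_pos_rate = sum(d <= t for d in neg_dist) / len(neg_dist)
        points.append((false_pos_rate, detection_rate))
    return points


# Hypothetical distances, for illustration only.
pts = roc_points([1.0, 2.0, 3.0, 7.0], [4.0, 5.0, 6.0, 8.0])
for fpr, dr in pts:
    print(f"FP rate {fpr:.2f}  detection rate {dr:.2f}")
```

Each distinct distance is tried as a threshold, so the curve is traced from (0, 0) to (1, 1) without choosing a step size.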

Figure 22: A contour plot for a Zip file centroid of 2-grams. Although the centroid is almost evenly distributed, it has clearly visible patterns.

6 Algorithms

One way to represent the performance of a categorisation algorithm is to use a confusion matrix. In our case the matrix shows the number of fragments categorised as each file type when a specific centroid is used. Hence the rows represent the different centroids, i.e. the actual classes, and the columns represent the categorisations made by the algorithms, i.e. the predicted classes.
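As a minimal sketch of this layout, the Python code below builds a confusion matrix from (actual, predicted) pairs, with one row per actual class (centroid) and one column per predicted class, and reads a per-class detection rate and false positives rate from it. The function names, class names and sample pairs are hypothetical.

```python
from typing import Dict, List, Tuple


def confusion_matrix(
    pairs: List[Tuple[str, str]], classes: List[str]
) -> List[List[int]]:
    """Rows = actual class (centroid), columns = predicted class."""
    index: Dict[str, int] = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in pairs:
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def rates(matrix: List[List[int]], k: int) -> Tuple[float, float]:
    """Detection rate and false positives rate for class k."""
    tp = matrix[k][k]
    row_total = sum(matrix[k])                # fragments actually of class k
    fp = sum(row[k] for row in matrix) - tp   # other fragments predicted as k
    neg_total = sum(map(sum, matrix)) - row_total
    return tp / row_total, fp / neg_total


# Hypothetical categorisation results, for illustration only.
classes = ["jpg", "zip", "exe"]
pairs = [
    ("jpg", "jpg"), ("jpg", "jpg"), ("jpg", "zip"),
    ("zip", "zip"), ("zip", "jpg"),
    ("exe", "exe"), ("exe", "exe"), ("exe", "zip"),
]
m = confusion_matrix(pairs, classes)
for name, row in zip(classes, m):
    print(name, row)
print(rates(m, 0))
```

The diagonal holds the correct categorisations; reading down a column shows which other types are confused with that column's type.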

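The DC values are coded as the difference between two consecutive DC components. This is the first MCU after a restart marker, where the DC predictor is reset to zero; hence the DC value is given as 3 − 0. For the AC coefficients, the Huffman codes contain the number of consecutive zeros and the number of bits needed to encode the following non-zero value.

[...] to increase the precision when creating file type models.

2 Correct decoding

An obvious way of checking whether two fragments can be joined into a larger JPEG fragment is to look at the sequence of RST markers. The restart markers RST0 to RST7 are inserted in cyclic order, so for two consecutive fragments, i and i + 1, that both contain RST markers, we require that

$$\mathrm{RST}^{\mathrm{First}}_{i+1} = (\mathrm{RST}^{\mathrm{Last}}_{i} + 1) \bmod 8$$

where $\mathrm{RST}^{\mathrm{First}}_{i+1}$ is the number of the first RST marker in fragment i + 1 and $\mathrm{RST}^{\mathrm{Last}}_{i}$ is the number of the last RST marker in fragment i.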
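The two JPEG details above can be sketched in Python. `dc_from_differences` recovers DC components from the coded differences, assuming the differences start right after a restart marker so the predictor begins at zero. `rst_sequence_ok` extracts the restart marker numbers (byte pairs 0xFF 0xD0 to 0xFF 0xD7) from two fragments and applies the mod-8 test. All function names and sample bytes are hypothetical, and the sketch ignores fragments that split a marker across their boundary.

```python
from itertools import accumulate
from typing import List, Optional


def dc_from_differences(diffs: List[int]) -> List[int]:
    """DC components from coded differences; predictor starts at 0 after RST."""
    return list(accumulate(diffs))  # running sum: component = previous + diff


def rst_numbers(fragment: bytes) -> List[int]:
    """Restart marker numbers (0-7), in order, found in a fragment of scan data."""
    return [
        fragment[i + 1] - 0xD0
        for i in range(len(fragment) - 1)
        if fragment[i] == 0xFF and 0xD0 <= fragment[i + 1] <= 0xD7
    ]


def rst_sequence_ok(frag_i: bytes, frag_next: bytes) -> Optional[bool]:
    """Check RST_first(i + 1) == (RST_last(i) + 1) mod 8.

    Returns None when either fragment lacks RST markers, since the
    test cannot be applied then.
    """
    a, b = rst_numbers(frag_i), rst_numbers(frag_next)
    if not a or not b:
        return None
    return b[0] == (a[-1] + 1) % 8


# Hypothetical fragments, for illustration only.
frag_a = bytes([0x12, 0xFF, 0xD6, 0x34, 0xFF, 0xD7, 0x56])  # RST6, RST7
frag_b = bytes([0x78, 0xFF, 0xD0, 0x9A])                    # RST0
print(dc_from_differences([3, -1, 2]))
print(rst_sequence_ok(frag_a, frag_b))
print(rst_sequence_ok(frag_b, frag_a))
```

Because byte stuffing places 0x00 after every 0xFF data byte in entropy-coded data, a 0xFF followed by 0xD0 to 0xD7 can only be a restart marker, which is what makes this byte-level scan workable.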