To Compress Or Not To Compress, Part II


A little while ago I did some testing on NTFS compression with HD Tune and SiSoftware Sandra. The results seemed to favor using NTFS compression, but didn't really answer the question of which file types would benefit and which would not. The tests gave a slight edge to the compressed drive for reading, but I didn't feel that one benchmark was conclusive enough, and synthetic benchmarks are not a reliable indicator of real-world performance. So I decided to run a test of my own making using some real files, and see if I could get some better numbers.

For this test, I chose 3 groups of files:

  • AVI Video File
    • Uncompressed size: 700MiB
    • Compressed size: 697MiB
    • Compression ratio: 0.996
  • Documents Folder
    • Uncompressed size: 112MiB
    • Compressed size: 75MiB
    • Compression ratio: 0.666
  • Program Files Folder
    • Uncompressed size: 210MiB
    • Compressed size: 186MiB
    • Compression ratio: 0.844
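
To make the ratios above easier to read, here is a small sketch that converts each one into the share of disk space saved (1 minus the ratio). The dictionary and variable names are my own, chosen just for illustration; only the ratios come from the list above.

```python
# Compression ratios as listed above (compressed size / uncompressed size).
ratios = {
    "AVI Video File": 0.996,
    "Documents Folder": 0.666,
    "Program Files Folder": 0.844,
}

# Space saved, in percent, is simply (1 - ratio) * 100.
savings = {name: (1 - ratio) * 100 for name, ratio in ratios.items()}

for name, percent in savings.items():
    print(f"{name}: about {percent:.1f}% saved")
```

This works out to roughly 0.4% for the video file, 33.4% for Documents, and 15.6% for Program Files.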

The AVI file represents general "media" files (AVIs, MP3s, JPGs, etc.). All of those files are already compressed using their own algorithms, so additional compression from NTFS is expected to be negligible. The Documents folder contains various Word documents, Excel files, text files, etc., and the Program Files folder represents general program files, containing .EXE, .DLL, and other program files.
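
The reason already-compressed files gain so little can be illustrated with a quick experiment. This is only a sketch: it uses Python's zlib (DEFLATE) as a stand-in, which is not the algorithm NTFS uses, and the sample text is made up for the demonstration. The idea is simply that compressing the output of a compressor a second time yields almost nothing.

```python
import random
import zlib

# Build some compressible, text-like sample data (a made-up stand-in for a document).
rng = random.Random(0)
vocabulary = ["disk", "file", "drive", "compress", "read", "write", "ntfs", "usb",
              "video", "document", "program", "cache", "time", "copy", "test", "data"]
text = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")

once = zlib.compress(text)   # first pass: plain text shrinks a lot
twice = zlib.compress(once)  # second pass: the input is already compressed

first_ratio = len(once) / len(text)
second_ratio = len(twice) / len(once)
print(f"first pass ratio:  {first_ratio:.3f}")
print(f"second pass ratio: {second_ratio:.3f}")
```

The first ratio should come out well below 1, while the second should sit very close to 1, which mirrors what the AVI file shows in the list above.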

For each set of files, I did four operations:

  • Copy from local drive to USB attached uncompressed drive
  • Copy from local drive to USB attached compressed drive
  • Copy from USB attached uncompressed drive to local drive
  • Copy from USB attached compressed drive to local drive

In all cases the local drive was uncompressed, USB is 2.0, and the USB attached drive is a regular disk drive, not a flash drive.

The test I performed was pretty simple, and was done on a Windows Vista SP1 64-bit system with the help of Cygwin. The general process was:

sync; sync; time (cp -a source destination; sync; sync)

which means: flush the disk cache twice, copy the files, then flush the disk cache twice again. 'time' will report how long it took to run the commands inside the ()'s. I did this at least 3 times for each of the file sets and tests above, rebooting the computer in between each test to flush any other memory caches. I then averaged the times reported for each trial, and those averages are reported here.
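
For anyone who would rather script this than type it by hand, the same procedure could be sketched in Python roughly as follows. This is an illustration only, not what I actually ran: the helper names (`flush_disk_cache`, `timed_copy`, `average_copy_time`) are invented for this sketch, and it does not reboot between trials, so later trials may be served partly from cache.

```python
import os
import shutil
import statistics
import tempfile
import time


def flush_disk_cache():
    """Ask the OS to flush pending writes; os.sync() is missing on native Windows."""
    if hasattr(os, "sync"):
        os.sync()


def timed_copy(source, destination):
    """Return the wall-clock seconds needed to copy the tree source to destination."""
    flush_disk_cache()
    start = time.perf_counter()
    shutil.copytree(source, destination)  # destination must not exist yet
    flush_disk_cache()  # the final flush is part of the measured time
    return time.perf_counter() - start


def average_copy_time(source, destination_root, trials=3):
    """Copy source into a fresh subfolder per trial and average the timings."""
    timings = []
    for trial in range(trials):
        destination = os.path.join(destination_root, f"trial_{trial}")
        timings.append(timed_copy(source, destination))
    return statistics.mean(timings)


if __name__ == "__main__":
    # Tiny self-contained demo using temporary folders instead of real drives.
    with tempfile.TemporaryDirectory() as source, tempfile.TemporaryDirectory() as target:
        with open(os.path.join(source, "sample.txt"), "w") as handle:
            handle.write("example data\n" * 1000)
        print(f"average: {average_copy_time(source, target):.3f} s")
```

To mimic the real test, `source` and `destination_root` would point at the local drive and the USB drive, with a reboot between runs.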

Results

The results were interesting, and a little different from what I was expecting based on my previous tests.

Reading

This first test compares the amount of time it took to copy the files from the USB drive to the local drive, and shows the differences when the USB drive is compressed vs. uncompressed.

Reading the AVI Video file from the compressed drive adds almost 2 seconds because of the extra work that must be done to uncompress the large video file, even though it didn't benefit from the compression.

When looking at the "Documents" time, we can see that it takes over 10 seconds longer to copy when the data is compressed. I was hoping this test would support my theory that reading compressed data is faster because there is less data on the bus, but that doesn't seem to be the case here.

Finally, looking at the "Program Files", we can see that it took 1 second longer to copy data from the compressed drive. Again, I was hoping to see the compressed drive perform better, but this is about science and not about hope!

Writing

This test compares the amount of time it took to copy the files from the local drive to the USB drive, and shows the differences when the USB drive is compressed vs. uncompressed.

The AVI file took more than twice as long to copy, as the CPU futilely tries to compress the whole video file, which can't be compressed any more than it already is. When copying "Documents", compression added about 4 seconds to the time, and for "Program Files" almost 6 seconds. It's clear that if you are doing anything that involves a lot of writing to the disk, compression will probably slow you down a good bit, especially if you're writing video files.

CPU Time

One of the nice things about the 'time' command is that it can tell you how much time a process spent doing different things. The results above report the actual "wall clock" time that you would see if you were using a stopwatch to time how long each operation took. 'time' also tells you how much time each process spent using the CPU.
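
The difference between wall-clock time and CPU time can be seen in a small sketch like the one below. It is not part of the test itself, and the two tasks are made-up stand-ins: one just waits (like a process waiting on a slow disk), the other keeps the CPU busy (like compression work). `time.process_time()` counts user plus system CPU time for the current process, which is roughly what the 'user' and 'sys' lines of 'time' add up to.

```python
import time


def measure(task):
    """Run task() and return (wall_seconds, cpu_seconds) for the current process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def waiting_task():
    time.sleep(0.2)  # stands in for waiting on a slow disk


def computing_task():
    sum(i * i for i in range(2_000_000))  # stands in for compression work


wait_wall, wait_cpu = measure(waiting_task)
busy_wall, busy_cpu = measure(computing_task)

print(f"waiting:   wall {wait_wall:.3f} s, cpu {wait_cpu:.3f} s")
print(f"computing: wall {busy_wall:.3f} s, cpu {busy_cpu:.3f} s")
```

The waiting task should show plenty of wall-clock time but almost no CPU time, while the computing task should show the two much closer together.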

The following two charts compare the average amount of time each test spent using the CPU. The more time spent using the processor, the more processing power was needed to complete the copy. In most cases, the CPU is sitting idle, so you won't notice CPU performance differences when using compression. If, however, your CPU is running at full blast all the time (video games, video compression, mathematical modeling, etc.), then you may notice a small slowdown in that other application.

I think the Read test here is interesting. At first glance it seems that using compression causes less CPU load in some cases. I was unsure what to make of this result until I considered something I hadn't mentioned before: the number of files in each test. The AVI test is a single file, while Program Files has 348 files and Documents contains 1892. The number of files appears to affect the CPU load when combined with compression. This is something that I will look into in the future.
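
To get a feel for how different the three sets are, here is a quick back-of-the-envelope calculation of the average file size in each, using the uncompressed sizes and file counts given in this post. The variable names are my own, and this says nothing by itself about why the CPU numbers differ; it only shows how much smaller the typical file is in the Documents set.

```python
KIB_PER_MIB = 1024

# (uncompressed size in MiB, number of files), taken from the figures in this post.
file_sets = {
    "AVI Video File": (700, 1),
    "Documents Folder": (112, 1892),
    "Program Files Folder": (210, 348),
}

# Average size per file in KiB for each set.
average_kib = {
    name: size_mib * KIB_PER_MIB / count
    for name, (size_mib, count) in file_sets.items()
}

for name, kib in average_kib.items():
    print(f"{name}: about {kib:.1f} KiB per file")
```

That works out to about 60.6 KiB per file for Documents and about 617.9 KiB per file for Program Files, against a single 700 MiB video file.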

The Write test is directly in line with what you'd expect. Compressing data takes more processing power, and as a result the process spends more time using the CPU.

Compression

Finally, the questions of performance and time are really not relevant unless you take a look at the benefits provided by using compression.

For multimedia files, NTFS compression provides almost no benefit, and slows down the system when accessing those files (though only slightly). For Program Files, however, you can gain about 15% more disk space by using compression, and since they are almost always accessed for reading and not for writing, the very small performance difference may be worth the additional disk space gained.

Finally, for Document files, around 33% of the space can be reclaimed when using compression, but that comes at the price of performance. Since documents are frequently written to, the additional time spent is probably not worth it. Also, documents frequently occupy relatively little space on a disk, compared to all of the other files, and are often the most accessed. The amount of space gained would most likely be a drop in the bucket, and the slower performance of those files would almost certainly be noticed by the user in the form of slower opening and saving of documents.

Conclusion

This test was by no means exhaustive, but I think I was able to capture a good cross-section of the files people have on their systems and what sort of performance to expect. As for answering the question: "Should I use NTFS compression?", the answer is a resounding: "It depends."

The results above should at least help to answer the question for a particular application. Disk space is cheap, and if you have enough of it, no compression at all is the way to go. On laptops where disks are often smaller, compression can benefit Program Files where it can save some of that valuable disk space (for me it's about 1.5GiB). Document files may be compressed more, but the performance impact is big enough to be noticeable, and documents that might benefit from compression are likely to occupy a very small portion of the disk. Even the high compression ratios may save only a few megabytes.

As for the previously-run benchmarks, these new results just don't sync up. I believe the results here are more indicative of true performance, since benchmarks don't take into account the differences in real-world data, and don't reflect real usage scenarios. Using files gives a much better idea of the performance to expect, and also eliminates some of the issues that Sandra probably ran into. My theory on the Sandra results is that the data that was used may have been uniform, which would make it easy to cache for the compression driver.

In any case, these tests should provide more data, and move the question further down the road to resolution.


Compressed not faster?

I was told that a compressed drive would read/write faster given enough CPU to do the math. Most new PCs, and especially laptops, have ample CPU cycles and are severely disk I/O bound when it comes to maximum performance. A compressed disk would move more data per pass because it's compressed, and so would actually speed up I/O. These results suggest this isn't true at all. Interesting... confused...

It is worth pointing out that the type of document you used also affects the amount of compression you get.

Office 2007+ documents (docx, xlsx, pptx, etc.) are already ZIP compressed, so they can't be compressed further by any meaningful amount.

Pre-Office 2007 documents (doc, xls, ppt, etc.), on the other hand, show good compression rates.