Compression Test
Sun Review June 17, 2000

If you've downloaded anything from the Internet, you already know that almost every file you'll ever download is compressed, or "zipped". Whenever you see the three-letter extension .zip, you know you're dealing with a compressed file. Compression not only reduces download times, but also lets you download all the necessary program files (such as help files and drivers) as a single file, or archive. Zipping files on your hard drive can also save space. But many people are unclear about how zipping really works. Let's examine what happens when you compress a file using WinZip or another file compression utility such as PKZip or PowerZip.

When you use a compression program to zip a file, the resulting file or archive is smaller and saves you disk space. When you unzip the file, the file is restored to its original condition without any data loss, giving you an exact replica of the original file. Believe it or not, the compression technology used in WinZip and other commonly used compression programs is relatively simple. These programs look for redundant or repetitive strings of data in the file. These repetitive strings are then replaced with "tokens", or smaller strings that signify the bits that have been replaced.
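To see this round trip for yourself, here is a minimal sketch in Python using the standard zlib module, which implements the DEFLATE method commonly used inside .zip archives. The sample text and variable names are made up for illustration, and the exact compressed size depends on the data you feed in.

```python
import zlib

# Hypothetical sample data: one sentence repeated many times, so it is
# full of the redundant strings a compressor looks for.
original = b"The quick brown fox jumps over the lazy dog. " * 50

compressed = zlib.compress(original)    # the "zip" step
restored = zlib.decompress(compressed)  # the "unzip" step

print(len(original), "bytes before,", len(compressed), "bytes after")
print("Exact replica:", restored == original)
```

Because the sample is so repetitive, the compressed version should come out far smaller than the original, and the restored data matches it byte for byte.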

Files contain data in the form of binary code consisting of 0's and 1's. An example string of 1's and 0's might look like this: 010111111110101. Notice that the number 1 is repeated 8 times in the middle of this string. A compression program might convert that string into 010 !81 0101 (the spaces are shown only for readability and are not stored), with the "!81" serving as a token for the eight 1's. The number of characters in this string has thus been reduced from 15 to 10, which means the size of this string has been reduced by about 33 percent.
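The token idea in this example can be sketched in a few lines of Python. This is a toy run-length scheme written only to reproduce the example string; it is not the algorithm WinZip actually uses. The function name encode_runs, the min_run threshold and the cap of nine repeats per token (so the count always fits in one digit) are assumptions made for the illustration.

```python
def encode_runs(bits: str, min_run: int = 4) -> str:
    """Replace runs of a repeated character with a token '!<count><char>'."""
    out = []
    i = 0
    while i < len(bits):
        j = i
        # Find the end of the current run, capped at 9 repeats so the
        # count is always a single digit (an assumption of this sketch).
        while j < len(bits) and bits[j] == bits[i] and j - i < 9:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(f"!{run}{bits[i]}")  # a 3-character token
        else:
            out.append(bits[i] * run)       # too short to be worth a token
        i = j
    return "".join(out)


original = "010111111110101"
compressed = encode_runs(original)
print(compressed)                              # 010!810101
print(len(original), "->", len(compressed))    # 15 -> 10
```

A token takes three characters, so the sketch only bothers with runs of four or more; replacing a shorter run would save nothing.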

When it's time to unzip this compressed string, our program knows that it should plug in eight 1's wherever it finds "!81". As a result, not a single bit is lost, and the string is restored to its original state. This type of compression is known as "lossless", and is especially useful for text files, databases and certain kinds of images. Text files are among the easiest to compress because they draw on a small set of characters and repeat the same words and letter combinations over and over. You can typically expect a compression rate of 50% or more for most text files. Databases may also contain many repetitive numbers or values, and can usually be squeezed down significantly in size. Images with large areas of the same colour may be reduced by an even greater percentage.
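The "plug the 1's back in" step can be sketched in Python as well. As before, this is a toy decoder for the "!81" style of token used in the example, not WinZip's real method; the function name decode_runs and the single-digit count are assumptions of the sketch.

```python
def decode_runs(text: str) -> str:
    """Expand every token '!<count><char>' back into the repeated character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "!":
            count = int(text[i + 1])      # single-digit repeat count
            out.append(text[i + 2] * count)
            i += 3                        # skip past the whole token
        else:
            out.append(text[i])           # ordinary character, copy as-is
            i += 1
    return "".join(out)


restored = decode_runs("010!810101")
print(restored)                           # 010111111110101
print(restored == "010111111110101")      # True
```

Every character of the original comes back, which is exactly what "lossless" means.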

Lossless compression is not the only way to reduce file size. On typical files, lossless techniques rarely achieve compression ratios much beyond 4:1, although the result depends heavily on how repetitive the data is. So-called "lossy compression" techniques use more complicated algorithms and can achieve ratios of 300:1 or better. Often used for multimedia files, lossy compression takes advantage of the fact that audio and video files contain information that is not easily perceived by the human ear or eye. A program using lossy compression looks for data in the original file that can be eliminated without you noticing. The drawback of lossy compression is that the decompressed file is not an exact replica of the original.
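To make the trade-off concrete, here is a deliberately simple Python sketch of one lossy idea: storing each value only to the nearest coarse step. Real audio and video formats are far more sophisticated than this, and the sample values and the step size of 8 are invented for the illustration.

```python
# Hypothetical brightness or audio samples in the range 0-255.
samples = [12, 13, 15, 14, 200, 203, 198, 201]
step = 8  # assumed coarseness: larger steps discard more detail

# "Compress": keep only which step each sample falls into.
quantized = [s // step for s in samples]

# "Decompress": rebuild each sample from the middle of its step.
restored = [q * step + step // 2 for q in quantized]

print(quantized)             # [1, 1, 1, 1, 25, 25, 24, 25]
print(restored)              # [12, 12, 12, 12, 204, 204, 196, 204]
print(restored == samples)   # False: some detail is gone for good
```

The quantized list is full of repeated values, so it compresses much better afterwards, but the restored samples are only close to the originals (within 4 of each value here), never guaranteed identical.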

Stay tuned! Next week's column: Just zip it! How to compress and decompress files using WinZip.


© 2000 Ingenius Webdesign