Project 27 Compress Files
“How do I compress and uncompress files?”
This project shows you how to compress (or zip) files. It covers the commands gzip, gunzip, bzip2, bunzip2, zip, and unzip.
Compress and Uncompress
Zipping is a cross-platform way to compress files. The compression is lossless, meaning that the original file can be reconstructed verbatim from the compressed file. Specialized compression techniques, such as the JPEG image compression format, are lossy, meaning that some information from the original image is lost in the compressed image.
Two compression formats are in widespread use:
- The Deflate method, which builds on the original Lempel-Ziv coding (LZ77) and is implemented by the commands zip and unzip and by the GNU commands gzip and gunzip. We’ll concentrate on the GNU commands in this project.
- The newer Burrows-Wheeler algorithm, implemented by the commands bzip2 and bunzip2. It generally produces smaller files than LZ77-based compression, though it usually takes longer to run.
Many files can be compressed into a single archive file with command zip or the Unix “tape archiver” tar, which is covered in Project 28.
Let’s simply compress and uncompress a file to demonstrate gzip and gunzip.
$ ls -lh
-rw-r--r-- 1 saruman saruman 1M ... list-all.txt
$ gzip list-all.txt
$ ls -lh
-rw-r--r-- 1 saruman saruman 282K ... list-all.txt.gz
You’ll notice three things: The compressed file is considerably smaller than the original, it has replaced the original, and it sports the extension .gz.
Now let’s uncompress the file (extension .gz is assumed if not given).
$ gunzip list-all.txt
$ ls -lh
-rw-r--r-- 1 saruman saruman 1M ... list-all.txt
You may want to keep the original file when, for example, you compress a file to email it. Use option -c, which sends the compressed data to standard output, and redirect standard output to an appropriately named file.
$ gzip -c list-all.txt > list-all.txt.gz
$ ls -lh
-rw-r--r-- 1 saruman saruman 1M ... list-all.txt
-rw-r--r-- 1 saruman saruman 282K ... list-all.txt.gz
For the reverse case, in which you want to expand the compressed file and keep the original compressed copy, use gunzip with option -c. You must include the .gz extension in the filename when an uncompressed file with the same filename also exists.
$ gunzip -c list-all.txt > copy-of-list-all.txt
gunzip: list-all.txt: not in gzip format
$ gunzip -c list-all.txt.gz > copy-of-list-all.txt
$ ls -lh
-rw-r--r-- 1 saruman saruman 1M ... copy-of-list-all.txt
-rw-r--r-- 1 saruman saruman 1M ... list-all.txt
-rw-r--r-- 1 saruman saruman 282K ... list-all.txt.gz
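The two -c techniques above can be combined into a quick round-trip check. The following is a minimal sketch rather than part of the original session: the temporary directory and the generated sample file (a stand-in for list-all.txt) are assumptions made for illustration, and cmp is used to confirm that the expanded copy matches the original byte for byte.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)

# Hypothetical sample data standing in for list-all.txt.
seq 1 20000 > "$workdir/list-all.txt"

# Compress to a new file; the original stays in place.
gzip -c "$workdir/list-all.txt" > "$workdir/list-all.txt.gz"

# Expand to a differently named copy; the .gz file stays in place.
gunzip -c "$workdir/list-all.txt.gz" > "$workdir/copy-of-list-all.txt"

# cmp -s is silent and exits with status 0 only when the two files
# are byte-for-byte identical.
if cmp -s "$workdir/list-all.txt" "$workdir/copy-of-list-all.txt"; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip: $roundtrip"
```

Because the compression is lossless, the check should report that the files are identical, and all three files remain in the directory afterward.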
Options -1 through -9 are used to set compression levels in gzip. Higher settings yield smaller compressed files but also increase compression times. The default setting is -6, so specify an option in the range -7 to -9 for better but slower compression, or use a setting from -5 to -1 for faster compression but larger compressed files.
$ gzip -9 -c list-all.txt >best.gz
$ gzip -1 -c list-all.txt >worst.gz
$ ls -lh
-rw-r--r-- 1 saruman saruman 271K ... best.gz
-rw-r--r-- 1 saruman saruman 1M ... list-all.txt
-rw-r--r-- 1 saruman saruman 345K ... worst.gz
Option --best is equivalent to -9, and --fast is equivalent to -1.
Create Compressed Archives
Many files can be compressed into a single file with a command like
$ gzip -c *.txt > all.gz
Be warned, however, that when all.gz is uncompressed, it will not be split back into its constituent files.
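The effect is easy to confirm with two small files. In this sketch the filenames a.txt and b.txt and their contents are invented for illustration; the point is that expanding all.gz gives exactly what cat would give, one continuous stream with no record of where one file ended and the next began.

```shell
# Throwaway directory with two small, hypothetical text files.
workdir=$(mktemp -d)
printf 'first file\n'  > "$workdir/a.txt"
printf 'second file\n' > "$workdir/b.txt"

# Compress both files into a single .gz file.
gzip -c "$workdir/a.txt" "$workdir/b.txt" > "$workdir/all.gz"

# Expand it again, and build the plain concatenation for comparison.
gunzip -c "$workdir/all.gz" > "$workdir/all.out"
cat "$workdir/a.txt" "$workdir/b.txt" > "$workdir/expected.out"

# The expanded data is identical to the concatenation of the inputs.
if cmp -s "$workdir/all.out" "$workdir/expected.out"; then
    merged=yes
else
    merged=no
fi
echo "uncompressed output equals concatenation: $merged"
```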
If you want to archive many files into a single compressed file and be able to recover them as individual files, either use zip and unzip, or archive them first by using the tar command (see Project 28).
Here’s an example that uses zip to compress all the files in a directory called week1 into a single file. Command zip takes the name of the archive as its first argument, followed by a list of files to be deflated into the archive file. The wildcard pathname week1/* denotes every file in directory week1.
$ zip week1.zip week1/*
  adding: week1/friday.ws (deflated 48%)
  adding: week1/monday.ws (deflated 47%)
  adding: week1/thursday.ws (deflated 48%)
  adding: week1/tuesday.ws (deflated 46%)
  adding: week1/wednesday.ws (deflated 46%)
We can examine the contents of a zip file by giving option -l to unzip.
$ unzip -l week1.zip
Archive:  week1.zip
  Length     Date    Time   Name
  ------     ----    ----   ----
    1712  05-03-104  17:22  week1/friday.ws
    1593  05-03-104  17:22  week1/monday.ws
    1546  05-03-104  17:22  week1/thursday.ws
    1598  05-03-104  17:22  week1/tuesday.ws
    1545  05-03-104  17:22  week1/wednesday.ws
  ------                    -------
    7994                    5 files
(These files were apparently created in the year 104!)
To unzip the archive, use
$ unzip week1.zip
Archive:  week1.zip
  inflating: week1/friday.ws
  inflating: week1/monday.ws
  inflating: week1/thursday.ws
  inflating: week1/tuesday.ws
  inflating: week1/wednesday.ws
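If you’d like to check that an archive restores its files individually and intact, one approach is to unzip it into a separate directory and compare the result with the originals. The sketch below makes several assumptions for illustration: the sample files and the restored directory are invented, option -q simply suppresses the progress messages, option -d tells unzip where to extract, and the whole test is skipped if zip or unzip is not installed.

```shell
# Throwaway directory containing a small, hypothetical week1 directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir week1
printf 'monday notes\n'  > week1/monday.ws
printf 'tuesday notes\n' > week1/tuesday.ws

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    # Archive every file in week1, quietly.
    zip -q week1.zip week1/*

    # Extract into a separate directory so the originals are not overwritten.
    mkdir restored
    unzip -q week1.zip -d restored

    # diff -r compares the two directory trees file by file.
    if diff -r week1 restored/week1 >/dev/null; then
        zip_check=match
    else
        zip_check=mismatch
    fi
else
    # zip or unzip is not available on this system.
    zip_check=skipped
fi
echo "zip round trip: $zip_check"
```

Unlike the gzip -c approach, each file comes back under its own name, which is the behavior you want from an archive.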
To find out more about zip and unzip, run them without any arguments. Versions of Mac OS X older than 10.4 do not have man pages for either of them.
The bzip2 and bunzip2 commands are very similar to gzip and gunzip but use the newer Burrows-Wheeler algorithm to provide better compression. They use the extension .bz2 or sometimes just .bz.
Here’s a quick demonstration of gzip versus bzip2.
$ gzip -9 -c list-all.txt > list-all.txt.gz
$ bzip2 -9 -c list-all.txt > list-all.txt.bz2
$ ls -lh
-rw-r--r-- 1 saruman saruman 1M ... list-all.txt
-rw-r--r-- 1 saruman saruman 222K ... list-all.txt.bz2
-rw-r--r-- 1 saruman saruman 271K ... list-all.txt.gz
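The same comparison can be scripted so that it prints exact byte counts for any file. This is a rough sketch under stated assumptions: the sample data is generated with seq for illustration only, the bzip2 step is skipped if the command is not installed, and the relative sizes will differ from file to file.

```shell
# Throwaway directory and hypothetical sample data (illustration only).
workdir=$(mktemp -d)
seq 1 50000 > "$workdir/sample.txt"
original_size=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

# Best-compression gzip copy; the original file is left in place.
gzip -9 -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"
gz_size=$(wc -c < "$workdir/sample.txt.gz" | tr -d ' ')

# Best-compression bzip2 copy, if bzip2 is installed.
if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -9 -c "$workdir/sample.txt" > "$workdir/sample.txt.bz2"
    bz2_size=$(wc -c < "$workdir/sample.txt.bz2" | tr -d ' ')
else
    bz2_size=unavailable
fi

echo "original: $original_size bytes"
echo "gzip -9:  $gz_size bytes"
echo "bzip2 -9: $bz2_size bytes"
```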
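If you attempt to uncompress a damaged bzip2 file, bunzip2 will warn you of data corruption. There’s a chance that you can salvage some of the data by using the bzip2recover command, which copies the undamaged blocks of the compressed file into separate files that can then be uncompressed.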
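You can check a file for damage without expanding it by giving option -t to bzip2. The sketch below is illustrative only: the sample data is invented, damage is simulated by keeping just the first 200 bytes of a compressed file, and the test is skipped if bzip2 is not installed.

```shell
# Throwaway directory and hypothetical sample data (illustration only).
workdir=$(mktemp -d)
seq 1 20000 > "$workdir/data.txt"

if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -c "$workdir/data.txt" > "$workdir/data.txt.bz2"

    # Simulate damage by truncating a copy of the compressed file.
    head -c 200 "$workdir/data.txt.bz2" > "$workdir/damaged.bz2"

    # bzip2 -t tests integrity and exits with a nonzero status on failure.
    if bzip2 -t "$workdir/data.txt.bz2" 2>/dev/null; then
        intact_ok=yes
    else
        intact_ok=no
    fi
    if bzip2 -t "$workdir/damaged.bz2" 2>/dev/null; then
        damaged_ok=yes
    else
        damaged_ok=no
    fi
else
    # bzip2 is not available on this system.
    intact_ok=skipped
    damaged_ok=skipped
fi
echo "intact file passes test: $intact_ok"
echo "damaged file passes test: $damaged_ok"
```

The intact file should pass and the truncated copy should fail. A file that fails this test is a candidate for bzip2recover.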