How Does File Compression Work?
We've all dealt with compressed files at some point. A file arrives in zip format and you have to extract it. Or maybe you want to send a folder of documents to a friend, so you zip the folder and email it to them, where they'll probably forget to download it if your friends are anything like mine. But how does compression work, and does it have any benefits?
WHAT IS COMPRESSION?
In physics, compression means reducing the volume that matter occupies, and gases are by far the easiest to compress. File compression is quite similar: it reduces the amount of space a file occupies on your disk. This matters because disk space is limited, so we want to make the most of it.
WHY COMPRESS FILES?
We compress files to make them faster to send to other devices and to make better use of disk space. Imagine sending that folder full of work documents to your boss, or your ChatGPT code to your lecturer: sending 100 MB of data takes more time and network bandwidth than sending 50 MB. Compressed files therefore take less time to transfer and cost less for the servers that handle them, which means more time for you and lower costs for the service providers.
HOW DOES COMPRESSION WORK?
Understanding every detail of compression is something not meant for mortal men, as Thor would put it. But I can explain what happens simply enough that it makes sense, and let the nerds like me dig deeper into the logic. The compression used for zip files and folders is lossless, meaning no data is lost: extracting the archive gives you back exactly the bytes you put in. (Formats like JPEG and MP3 use lossy compression instead, discarding detail you are unlikely to notice.) To achieve lossless compression, algorithms find redundancies in the data and replace them with shorter codes that take up less disk space.
For example, AAAAABBBCC might be represented as 5A3B2C, which records how many times each letter repeats in a row. This technique is called run-length encoding.
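Here is a minimal Python sketch of run-length encoding. The function names are hypothetical, and the decoder assumes the original text contains no digits (otherwise counts and data would be ambiguous).

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Turn each run of a repeated character into <count><char>."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode; assumes the original text had no digit characters."""
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # counts can span several digits, e.g. "12A"
        else:
            result.append(ch * int(count))
            count = ""
    return "".join(result)


original = "AAAAABBBCC"
packed = rle_encode(original)
print(packed)  # 5A3B2C
assert rle_decode(packed) == original  # lossless: we get the exact input back
```

Run-length encoding only pays off when the data has long runs: `ABC` would become `1A1B1C`, which is longer than the input.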
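Other algorithms keep a dictionary of data patterns they have already seen. When a pattern appears again, the algorithm stores a short reference to the earlier copy instead of repeating the data. ZIP files usually rely on this idea through DEFLATE, an algorithm that combines the two techniques listed below.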
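Python's standard `zlib` module implements DEFLATE, so it offers a quick way to watch this kind of compression at work. This minimal sketch compares a highly repetitive input with random bytes; the inputs are made up for illustration, and exact byte counts may vary between zlib versions.

```python
import os
import zlib

repetitive = b"the quick brown fox " * 500  # 10,000 bytes: one phrase repeated
random_bytes = os.urandom(10_000)            # 10,000 bytes with no patterns to exploit

results = {}
for label, data in [("repetitive", repetitive), ("random", random_bytes)]:
    packed = zlib.compress(data, 9)          # DEFLATE at the highest compression level
    assert zlib.decompress(packed) == data   # lossless: the round trip is exact
    results[label] = len(packed)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The repetitive input shrinks to a tiny fraction of its size, while the random bytes do not shrink at all (they grow slightly from format overhead). Files that are already compressed, such as most videos, look a lot like random data to a compressor, which is why zipping them saves little space.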
For my fellow nerds, two common algorithms are:
- Huffman coding: Assigns shorter bit codes to frequently occurring symbols and longer codes to rare ones.
- Lempel-Ziv (LZ77): Replaces repeated sequences with references that point back to an earlier copy of the same data (how far back it is and how long it runs).
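To make Huffman coding concrete, this sketch builds a code table for the same `AAAAABBBCC` example by repeatedly merging the two least frequent groups of symbols. The function names are hypothetical; it assumes a non-empty input and represents bits as `'0'`/`'1'` characters for readability instead of packing real bits.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table for non-empty text: frequent symbols get shorter codes."""
    freq = Counter(text)
    if len(freq) == 1:  # edge case: a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (combined frequency, tie-breaker, {symbol: code built so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """Read bits left to right; no code is a prefix of another, so matches are unambiguous."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


text = "AAAAABBBCC"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
# Compare against 8 bits per character for plain ASCII text
print(len(encoded), "bits instead of", 8 * len(text))  # 15 bits instead of 80
assert huffman_decode(encoded, codes) == text
```

Because `A` is the most common letter, it gets a one-bit code. A real compressor must also store the code table (or enough information to rebuild it) so the decoder can read the bits back.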
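Similarly, here is a minimal LZ77 sketch that emits `(distance, length, next character)` triples. The function names and the 255-character window are illustrative assumptions, and the match search is a slow brute-force scan kept simple for clarity.

```python
def lz77_compress(data: str, window: int = 255) -> list[tuple[int, int, str]]:
    """Greedy LZ77: emit (distance back, match length, next literal) triples."""
    i, out = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):  # try every earlier start in the window
            length = 0
            # Stop one short of the end so a "next character" always exists.
            # The match may run past position i (overlap), which LZ77 allows.
            while i + length < len(data) - 1 and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        out.append((best_dist, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples: list[tuple[int, int, str]]) -> str:
    """Rebuild the text by copying from earlier output, then adding the literal."""
    out: list[str] = []
    for dist, length, char in triples:
        start = len(out) - dist
        for k in range(length):  # copy one character at a time so overlaps work
            out.append(out[start + k])
        out.append(char)
    return "".join(out)


triples = lz77_compress("ABABABABX")
print(triples)  # [(0, 0, 'A'), (0, 0, 'B'), (2, 6, 'X')]
assert lz77_decompress(triples) == "ABABABABX"
```

The third triple copies 6 characters from only 2 characters back, overlapping the text it is producing, which is why the decoder copies one character at a time. Production compressors such as zlib find matches much faster and store the references in far fewer bits.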
How Folder Compression Works
When compressing folders, the compression tool combines the contents of all the files and folders into a single archive (e.g., ZIP or RAR file).
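In the ZIP format, each file is compressed individually using algorithms like those described above. (Some formats, such as .tar.gz, instead compress the whole bundle as one continuous stream.)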
Metadata (file names, folder structure, permissions) is also stored within the compressed archive to ensure everything can be restored during extraction.
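Python's standard `zipfile` module can illustrate these steps. This sketch builds a small, hypothetical folder in a temporary directory, zips it with DEFLATE, and then reads back some of the per-file metadata stored in the archive: each entry's path inside the archive, its original size, and its compressed size.

```python
import tempfile
import zipfile
from pathlib import Path

entries = []  # (name inside archive, original size, compressed size)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical example folder with a nested subfolder
    folder = root / "work_docs"
    (folder / "reports").mkdir(parents=True)
    (folder / "notes.txt").write_text("meeting notes " * 200)
    (folder / "reports" / "q1.txt").write_text("revenue went up " * 300)

    archive = root / "work_docs.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # arcname stores the path relative to root, preserving folder structure
                zf.write(path, arcname=path.relative_to(root).as_posix())

    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Every entry carries its own metadata and compressed size
            entries.append((info.filename, info.file_size, info.compress_size))

for name, size, packed in entries:
    print(f"{name}: {size} -> {packed} bytes")
```

Because the paths are stored relative to the archive root, extracting the archive recreates the `work_docs/reports` subfolder.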
Advantages and Disadvantages
Advantages
- Saves storage space.
- Reduces transfer time for large files.
- Bundles multiple files into a single package (e.g., ZIP archives).
Disadvantages
- Compression takes time and processing power (try compressing a video and listen to your CPU fan sing).
- Compressed files need to be decompressed before use, which takes time.