Data Compression


Data compression, by definition, is a method of storing and transferring the original digital data using less storage space and network bandwidth.

In computer science and information theory, data compression is also called bit-rate reduction: the ability to use fewer bits than the original representation to deliver the same information. Data compression can be lossless or lossy. Lossless compression reduces the number of bits by identifying and eliminating statistical redundancy, so no information is actually lost and the original data can be reconstructed exactly. Lossy compression prioritizes data based on its role in the application and removes what is marginal, such as noise or frequencies the human ear cannot hear, expecting the end user not to notice the loss of quality. Modern data management can hardly be imagined without compression algorithms: because of them we are able to stream movies over relatively slow networks, store large amounts of information on relatively small hard drives, and fit long movies on small optical discs. The loss of data is not the only concern with compression; an algorithm also has to be fast and CPU-efficient so that hardware can compress and decompress data quickly enough for the end user to, say, enjoy the music without interruption.

Introduction and Background

The theory of compression was first proposed and formulated by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication". Today, lossless data compression theory and rate-distortion theory are known together as source coding theory; it sets fundamental limits on the performance of data compression algorithms but does not actually specify any. The practical algorithms were developed much later, once the storage and bandwidth dilemmas started confronting us, and additional mathematical tools such as the Fourier transform have since found their place in the world of compression. Shannon established that there is a limit to how far data can be squeezed before it starts losing information: there is a limit to lossless compression! He also concluded that the quality of compressed information cannot possibly be made better than the quality of the original; that is mathematically impossible. The loss of data quality during compression is called distortion, and lossless compression always has a distortion of zero.
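Shannon's limit can be made concrete. The entropy of a source, measured in bits per symbol, is the minimum average number of bits any lossless code can achieve; a source that is all one symbol carries no information at all. A minimal sketch (the function name is illustrative):

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(data):
    """Shannon entropy H = -sum(p * log2(p)) over the symbol frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A uniform 4-symbol source needs 2 bits/symbol to encode losslessly;
# a constant source needs 0 bits/symbol (it carries no information).
print(entropy_bits_per_symbol("abcd"))  # 2.0
print(entropy_bits_per_symbol("aaaa"))  # 0.0 (displayed as -0.0)
```

No lossless compressor can average fewer bits per symbol than this value, which is exactly the limit Shannon proved.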


Lossless Data Compression

Lossless data compression became possible because most real-world data carries a great deal of repetitive, redundant information. Lossless compression algorithms exploit this statistical redundancy to decrease the size of a file without losing any of the original data. For example, an image may have areas where the color does not change across many pixels. In this case the algorithm records the color once and then stores the locations of the pixels containing that color instead of listing each individual pixel separately. The same applies to compressing text files: there are grammar-based algorithms that can compress highly repetitive texts to an extreme degree without losing any information, by identifying repeated words and patterns. Several effective and refined techniques are available today, often coupled with the well-known method called arithmetic coding. Arithmetic coding is used in the bi-level image compression standard JBIG, the document compression standard DjVu, and the text entry system Dasher.
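The "record the color once" idea described above is run-length encoding, one of the simplest lossless schemes. A minimal sketch (function names are illustrative, not from any particular library):

```python
def rle_encode(data):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back to the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# A row of pixels with long same-color runs compresses to just 3 pairs.
row = ["white"] * 6 + ["black"] * 2 + ["white"] * 4
encoded = rle_encode(row)
print(encoded)                      # [('white', 6), ('black', 2), ('white', 4)]
assert rle_decode(encoded) == row   # lossless: the round-trip is exact
```

The round-trip assertion is the defining property of lossless compression: decoding reproduces the input bit for bit.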


Lossy Data Compression

Lossy data compression uses a different principle than lossless. In these algorithms, some loss of information and some deterioration of quality are acceptable. Lossy compression algorithms are largely guided by the principle of human perception: what a human cannot perceive, or can hardly perceive, can be eliminated! For example, the human eye is more sensitive to brightness than to color, so the JPEG algorithm works in part by rounding off marginal visual information. In lossy audio compression, methods are used to remove inaudible or less audible signal components. Voice coding takes this to the next level, allowing even higher compression rates; voice codecs are quite different from general audio codecs because they use specialized techniques to identify and remove everything that is irrelevant to speech.
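The "rounding off" mentioned above is quantization: dividing values by a step size and rounding, so fine detail the eye or ear barely notices is discarded. This is a toy sketch of the idea applied to raw sample values, not the actual JPEG pipeline (which quantizes frequency-domain coefficients):

```python
def quantize(samples, step):
    """Lossy step: keep only which quantization bin each sample falls in."""
    return [round(s / step) for s in samples]

def dequantize(bins, step):
    """Reconstruct approximate values; the rounding error is gone for good."""
    return [b * step for b in bins]

samples = [12, 14, 15, 200, 203]
bins = quantize(samples, step=10)        # far fewer distinct values to encode
restored = dequantize(bins, step=10)
print(restored)  # [10, 10, 20, 200, 200] -- close to, but not equal to, the input
```

Unlike the lossless round-trip, `restored` differs from `samples`: that difference is the distortion the algorithm has judged the user will not notice.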


Audio Codecs

Audio data compression is designed to reduce transmission bandwidth and the storage capacity required to hold the data. Lossy audio compression provides high compression rates at the cost of sound quality. In an attempt to reduce the space required to store and transmit data, these algorithms exploit psychoacoustics, eliminating less audible and less meaningful sounds. Both lossy and lossless compression use methods such as pattern recognition, coding, and linear prediction to remove redundancy and reduce the amount of information used to represent the original data. A trade-off between storage size and loss of sound quality is always negotiated in the design of any audio compression algorithm: the smaller the file gets, the worse the quality. For example, the roughly 700 MB of space available on a compact disc is sufficient to store only about 80 minutes of uncompressed sound, while over 7 hours of MP3 files can fit on the same disc. Higher compression rates allow squeezing around 200 hours of speech into the same 700 MB. Many lossless audio compression formats are currently in use; among the better known are Free Lossless Audio Codec (FLAC), Apple Lossless, MPEG-4 ALS, Windows Media Audio 9 Lossless, and TTA. Some formats are hybrid: they combine lossless and lossy algorithms to allow scalable compression levels. These include MPEG-4 SLS, WavPack, and others.
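The capacity figures above follow directly from bitrates. Uncompressed CD audio runs at 44,100 samples/s × 2 channels × 16 bits ≈ 1,411 kbit/s, a common MP3 bitrate is 128 kbit/s, and narrowband speech codecs can run near 8 kbit/s (the exact rates vary by codec and settings). A quick check of the arithmetic:

```python
def hours_on_disc(disc_bytes, bitrate_bits_per_s):
    """Hours of audio at the given bitrate that fit in the given space."""
    return disc_bytes * 8 / bitrate_bits_per_s / 3600

CD_BYTES = 700 * 1024 * 1024                    # ~700 MB compact disc
print(hours_on_disc(CD_BYTES, 1_411_200))       # ~1.2 h of uncompressed CD audio
print(hours_on_disc(CD_BYTES, 128_000))         # ~12.7 h of 128 kbit/s MP3
print(hours_on_disc(CD_BYTES, 8_000))           # ~204 h of 8 kbit/s speech
```

At typical MP3 bitrates the result comfortably exceeds the "over 7 hours" quoted above, and low-bitrate speech coding lands right around the 200-hour figure.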


Video Codecs

Video data compression employs several coding approaches to reduce redundancy in video data. Most video compression algorithms combine image compression with temporal motion compensation to achieve high compression rates. Many video codecs also rely on audio compression techniques to handle the separate audio stream. Most video codecs use lossy compression because of the high compression rates the field requires. Video compression works on square groups of pixels called macroblocks.


These groups of pixels are tracked from one frame to the next so that only the changes have to be transmitted; the bit-rate therefore fluctuates depending on how dynamic the motion picture is at any given moment. Video compression algorithms have come a long way since the era of video began, and today extremely high compression rates are available on the market. Recent work on faster ("sparse") Fourier transform algorithms may well lead to a new era of data compression in the near future.
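The macroblock update idea can be sketched as frame differencing: compare each block to the same block in the previous frame and transmit only the blocks that changed. (Real codecs go further and search for where each block moved, i.e. motion compensation; the names below are illustrative.)

```python
def changed_blocks(prev_frame, cur_frame):
    """Return {block_index: new_block} for blocks that differ between frames."""
    return {i: cur for i, (old, cur) in enumerate(zip(prev_frame, cur_frame))
            if old != cur}

def apply_update(prev_frame, update):
    """Rebuild the current frame from the previous one plus the changed blocks."""
    return [update.get(i, block) for i, block in enumerate(prev_frame)]

# Each frame is a list of macroblocks (here, just small tuples of pixel values).
frame1 = [(0, 0), (5, 5), (9, 9)]
frame2 = [(0, 0), (6, 6), (9, 9)]        # only the middle block changed
update = changed_blocks(frame1, frame2)
print(update)                             # {1: (6, 6)} -- one block to transmit
assert apply_update(frame1, update) == frame2
```

A static scene produces an almost empty update and a low bit-rate; a fast-moving scene changes most blocks and the bit-rate spikes, which is exactly the fluctuation described above.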