CHAPTER-3
IMAGE COMPRESSION TECHNIQUES
3.1 NEED FOR COMPRESSION
Compression is necessary in modern data transfer and processing, whether it is applied to general data or to image and video files, because transmitting and storing uncompressed video would be extremely costly and impractical. A frame of 352 x 288 pixels contains 202,752 bytes of information. Recording an uncompressed version of this video at 15 frames per second would require about 3 MB per second. About 180 MB of storage would therefore be required for 1 minute, and one 24-hour day of video would occupy about 262 GB.
With compression, 24 hours of video at 15 frames per second would take only 1.4 GB, and hence 187 days of video could be stored in the same disk space that uncompressed video would use in one day. Compression that maintains image quality is therefore a must for fast transfer and economical storage of digital data, images and video.
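The storage figures above can be checked with a few lines of arithmetic. The following is only a sketch: it assumes decimal units (1 MB = 10^6 bytes) and takes the 1.4 GB per day compressed figure from the text as given. The observation that 202,752 bytes corresponds to 2 bytes per pixel is an inference from the numbers, not something the text states.

```python
# Storage needed for uncompressed 352 x 288 video at 15 frames per second.
WIDTH, HEIGHT = 352, 288
BYTES_PER_FRAME = 202_752          # figure quoted in the text
FRAMES_PER_SECOND = 15

# Inference (not stated in the text): 202,752 bytes is 2 bytes per pixel.
assert WIDTH * HEIGHT * 2 == BYTES_PER_FRAME

bytes_per_second = BYTES_PER_FRAME * FRAMES_PER_SECOND
bytes_per_minute = bytes_per_second * 60
bytes_per_day = bytes_per_second * 24 * 60 * 60

# Compressed size per day as quoted in the text (1.4 GB, decimal units assumed).
compressed_bytes_per_day = 1_400_000_000
days_in_same_space = bytes_per_day // compressed_bytes_per_day

print(f"per second: {bytes_per_second / 1e6:.2f} MB")
print(f"per minute: {bytes_per_minute / 1e6:.1f} MB")
print(f"per day:    {bytes_per_day / 1e9:.1f} GB")
print(f"days of compressed video in the same space: {days_in_same_space}")
```

The results agree with the rounded values in the text: roughly 3 MB per second, 180 MB per minute, 262 GB per day and 187 days.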
The change from the cine film to digital methods of image exchange and archival is primarily motivated by the ease and flexibility of handling digital image information instead of the film media. While preparing this step and developing standards for digital image communication, one has to make absolutely sure that also the image quality of coronary angiograms and ventriculograms is maintained or improved. Similar requirements exist also in echocardiography.
Regarding image quality, the most critical step in going from the analog world (cine film or high definition live video in the catheterization laboratory) to the digital world is the digitization of the signals. For this step, the basic requirement of maintaining image quality is easily translated into two basic quantitative parameters:
• The rate of digital image data transfer or data rate (megabits per second, Mb/s)
• The total amount of digital storage required or data capacity (megabytes, MB).
Computer technology, however, provides flexible principles for processing large amounts of information. Among the algorithms available is image data reduction or ‘image compression’. The principal approach in data compression is the reduction of the amount of image data (bits) while preserving information (image details). This technology is a key enabling factor in many imaging and multimedia concepts outside of medicine. At a closer look one observes that ad hoc approaches to image data compression have been applied in most digital imaging systems for the catheterization laboratory all the time. An example is recording the x-ray images with a smaller matrix of just 512 by 512 pixels (instead of the 1024 by 1024 pixel matrix often applied for real-time displays). In order to objectively assess these and other techniques of image data compression, some systematic knowledge of the tradeoffs implied in different modes of image data reduction is mandatory.
3.2 LOSSLESS AND LOSSY IMAGE COMPRESSION
3.2.1 LOSSLESS IMAGE COMPRESSION
When hearing that image data are reduced, one could expect that the image quality will automatically be reduced as well. A loss of information is, however, totally avoided in lossless compression, where image data are reduced while image information is totally preserved. Lossless compression typically combines two steps, predictive encoding and statistical encoding, both described in Section 3.2.2. Because both steps can be performed in reverse order without any error when the reduced data are read, the original image is recovered exactly.
A simple example demonstrates one of the strategies applied. Let us assume that in one horizontal line of an image the following sequence of gray levels is encountered when starting from the leftmost pixel of that line and going to the right:
212 214 220 222 216 212 212 214…
These gray levels are usually stored as 8-bit numbers (1 byte). Obviously, much smaller numbers or ‘codes’ are involved if one transfers only the first value directly, followed by the differences from the preceding gray levels:
+212 +2 +6 +2 -6 -4 0 +2….
3.2.2 TECHNIQUES FOR LOSSLESS IMAGE COMPRESSION
3.2.2.1 PREDICTIVE ENCODING
This strategy of data reduction is called ‘predictive encoding’, since we use the gray level of each pixel to predict the gray value of its right neighbor. Only the small deviation from this prediction is stored. This is a first step of lossless data reduction. Its effect is to change the statistics of the image signal drastically: typically 80% of the pixels in the resulting ‘difference image’ will now require just 8 gray levels (3 bits plus sign). Of course, we can still reproduce the original gray level values from these reduced data without any error if we only know the rule that was applied when generating the sequence.
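The predictive step and its exact inversion can be sketched in a few lines. This is a minimal illustration using the example gray levels from Section 3.2.1; the function names are hypothetical and the predictor is simply "the left neighbor", as described above.

```python
from itertools import accumulate

def predictive_encode(row):
    """Keep the first gray level, then store each pixel's difference
    from its left neighbor (the 'prediction')."""
    return [row[0]] + [cur - prev for prev, cur in zip(row, row[1:])]

def predictive_decode(codes):
    """Undo the encoding by summing the differences cumulatively."""
    return list(accumulate(codes))

row = [212, 214, 220, 222, 216, 212, 212, 214]
codes = predictive_encode(row)
print(codes)                      # [212, 2, 6, 2, -6, -4, 0, 2]
print(predictive_decode(codes))   # identical to the original row
```

Because the decoder applies the same rule in reverse, the round trip reproduces the original row without error.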
3.2.2.2 STATISTICAL ENCODING
Statistical encoding is another important approach to lossless data reduction. This term sounds very complex, but a similar trick in information coding had already been used by the famous American inventor Samuel Morse more than 150 years ago for his electromagnetic telegraph. A frequently occurring letter such as ‘e’ is transmitted as a single dot ‘ . ‘, while an infrequent ‘x’ requires four Morse symbols ‘ – . . – ‘. In this way the mean data rate required to transmit an English text is decreased as compared to a solution where each letter of the alphabet is coded with the same number of basic symbols. Accordingly in image transmission, short code words or bit sequences (one to four bits) will be used for frequently occurring small gray level differences (0, +1, -1, +2, -2 etc.), while long code words are used for the large differences (for instance the 212 in our example) with their very infrequent occurrence.
Statistical encoding can be especially successful if the gray level statistics of the images have already been changed by predictive coding. The overall result is redundancy reduction, that is, reduction of the repetition of the same bit patterns in the data. Of course, when reading the reduced image data, these processes can be performed in reverse order without any error and thus the original image is recovered. Lossless compression is therefore also called reversible compression. Data compression factors (number of bits required for uncompressed image data divided by number of bits for compressed image data) of 2 to nearly 4 can be attained by reversible compression. A poster (P1672) at this congress will present detailed data on the compression factors attainable.
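Why predictive coding helps a statistical coder can be illustrated by comparing first-order entropies, which give a lower bound on the average code length of a symbol-by-symbol code. The sketch below uses a synthetic, slowly rising row of gray levels (an assumption made purely for illustration, not data from the text) and a hypothetical helper `entropy_bits`.

```python
from collections import Counter
from math import log2

def entropy_bits(values):
    """First-order entropy in bits per symbol of a sequence."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * log2(c / total) for c in counts.values())

# Synthetic smooth row: the gray level rises by 1 every second pixel.
row = [100 + i // 2 for i in range(64)]
# Differences to the left neighbor (the first pixel is sent directly).
diffs = [b - a for a, b in zip(row, row[1:])]

print(f"raw gray levels: {entropy_bits(row):.2f} bits/pixel")
print(f"differences:     {entropy_bits(diffs):.2f} bits/pixel")
```

The raw row uses 32 different gray levels with equal frequency, while the differences take only the values 0 and 1, so far shorter code words suffice after prediction.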
3.2.3 LOSSY IMAGE COMPRESSION
Lossy data compression has of course a strong negative connotation, and it is sometimes doubted quite emotionally whether it is applicable in medical imaging at all.
However, please note that many imaging systems for angiography primarily acquire images as a 1024*1024 pixel matrix while they transfer only 512*512 pixels per image to local storage and to exchange media. In this step, part of the image information is irreversibly lost. So this is an example for a method of data compression that is lossy but that provides a digital image recording format that is presently widely accepted in cardiology. Therefore, instead of banning lossy compression in general, we should discuss objective criteria for the acceptability of specific methods of lossy data compression in coronary angiography.
We all know different strategies for tolerable lossy data reduction also from daily life. For instance, nobody will read all the information offered in a newspaper, so the overall process of information distribution by newspapers is an example of lossy information handling. The first step in this traditional type of ‘information processing’ occurs when the editors divide the incoming events into groups such as world politics, economy, local affairs, and sports. Each of these groups is presented on one (or several) specific pages of the newspaper. Moreover, on each of these pages, large headings draw the attention to those topics that are most important. Note that this first step of the overall process is essentially lossless.
It is this grouping of semantically similar information on certain pages (e.g. sports) which greatly simplifies selective ‘lossy’ data reduction as performed by the reader in a second step. While scanning the contents of the newspaper, he or she can concentrate on those few pages that are especially relevant to him or her (for instance politics). Time is saved by browsing very quickly through the other pages – those that are less relevant to the reader – and reading on those pages mainly the articles with the largest headings. Since the reader will experience a loss of information as compared to the total information offered, this step must be considered as ‘lossy data compression’. But since the information lost is not especially relevant to the reader, he or she will tolerate this and will be glad to use the limited time available for reading the most relevant information. Since the information lost in this process is systematically selected by the reader to be from the less relevant pages of the newspaper, this everyday type of lossy data compression can also be called ‘irrelevancy reduction’.
A trivial example for lossy compression of image data is selecting only 4 to 10 of the most relevant images for exposing them to a multiformat film that is to be sent to a referring physician. This is an interactive strategy of irrelevancy reduction. Usually, however, one applies an automatic algorithm using essentially the same two-step strategy of ‘irrelevancy reduction’ as described in the newspaper example above. For instance, in transform encoding one performs for each image of the cine run a mathematical transformation that is similar to the Fourier transform thus separating image information on gradual spatial variation of brightness (regions of essentially constant brightness) from information with faster variation of brightness at edges of the image (compare: the grouping by the editor of news according to the classes of contents). In the next step, the information on slower changes is transmitted essentially lossless (compare: careful reading of highly relevant pages in the newspaper), but information on faster local changes is communicated with lower accuracy (compare: looking only at the large headings on the less relevant pages). In image data reduction, this second step is called quantization. Since this quantization step cannot be reversed when decompressing the data, the overall compression is ‘lossy’ or ‘irreversible’.
Usually the main overall effect of transform encoding plus quantization is that small structures or ‘edges’ having low contrast are suppressed. The applicability of this strategy is based on the experience that low-contrast edges usually do not contribute information that is important for image interpretation by the human visual system. Larger edge signals are essentially preserved in the quantization process. Table 2 summarizes some of the characteristics of lossless and lossy image compression techniques as described in the previous paragraphs.
Table 2 Overview of principal strategies and methods in lossless (middle) and lossy (right) image compression.
While of course images coded by lossless techniques do not differ in any detail from the original images, lossy images may differ because of lost details or because of artifacts added in the compression process (for instance JPEG ‘blockiness’ artifacts). In the examples shown, the images are shown after edge enhancement, contrast stretching (‘windowing’) and upscanning (1024*1024) for improved visibility of small structures (including the ‘blockiness’ artifacts). The left image is uncompressed, and relatively high data compression factors have been selected for the two other images (CR= 12 and CR=
Fig.1 Comparison of original coronary angiogram (left) with two compression results. Middle: JPEG data compression by factor of 12, right: factor of 24.
3.3 PRINCIPLES BEHIND IMAGE COMPRESSION
A common characteristic of most images is that neighboring pixels are correlated and therefore contain redundant information. The foremost task, then, is to find a less correlated representation of the image. The two fundamental components of compression are redundancy reduction and irrelevancy reduction. Redundancy reduction aims at removing duplication from the signal source (image/video). Irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver, namely the Human Visual System. In an image, or in a video, which consists of a sequence of images, three types of redundancy can be exploited to reduce the file size. They are:
Coding redundancy: Fewer bits to represent frequent symbols.
Interpixel redundancy: Neighboring pixels have similar values.
Psychovisual redundancy: Human visual system cannot simultaneously distinguish all colors.
3.4 APPLICATIONS
Over the years, the need for image compression has grown steadily. Currently it is recognized as an ‘enabling technology.’ It plays a crucial role in many important and diverse applications [1,2] such as:
i. Business documents, where lossy compression is prohibited for legal reasons.
ii. Satellite images, where the data loss is undesirable because of image collecting cost.
iii. Medical images, where a difference between the original image and the decompressed one can compromise diagnostic accuracy.
iv. Tele-videoconferencing.
v. Remote sensing.
vi. Space and hazardous waste control applications.
vii. Control of remotely piloted vehicles in military.
viii. Facsimile transmission (FAX).
Image compression has been and continues to be crucial to the growth of multimedia computing. In addition, it is the natural technology for handling the increased spatial resolutions of today’s imaging sensors and evolving broadcast television standards.
3.5 IMAGE COMPRESSION TECHNIQUES:
Two general techniques for reducing the amount of data required to represent an image are lossless compression and lossy compression. In both of these techniques, one or more of the redundancies discussed earlier are removed. In practice, these techniques are combined to form a practical image compression system.
Generally a compression system consists of two distinct structural blocks: an encoder and a decoder. An input image f(x,y) is fed into the encoder, which creates a set of symbols from the input data. After transmission over the channel, the encoded representation is fed to the decoder, where a reconstructed output image g(x,y) is generated. In general, g(x,y) may or may not be an exact replica of f(x,y).
3.6 A General Compression Model
A general compression model is shown in figure 3.1. It shows that the encoder and decoder each consist of two relatively independent functions or sub-blocks [1]. The encoder is made up of a source encoder, which removes input redundancies, and a channel encoder, which increases the noise immunity of the source encoder’s output. Similarly, the decoder includes a channel decoder followed by a source decoder. If the channel between the encoder and decoder is noise free, the channel encoder and decoder are omitted, and the general encoder and decoder become the source encoder and decoder, respectively.
Figure 3.1: General Compression Model
3.6.1 The Source Encoder
The source encoder is responsible for reducing or eliminating any coding, interpixel, or psychovisual redundancies in the input image. The specific application dictates the best encoding approach. Normally, the approach can be modeled by a series of three independent operations. Each operation is designed to reduce one of the three redundancies discussed earlier.
Figure 3.2 (a) Source encoder
Figure 3.2 (b) Source decoder
3.6.2 Mapper
In the first stage of the source encoding process, the mapper transforms the input data into a (usually non-visual) format designed to reduce interpixel redundancies in the input image. This operation generally is reversible and may or may not reduce directly the amount of data required to represent the image.
3.6.3 Quantizer
The second stage or quantizer block reduces the accuracy of the mapper’s output in accordance with some pre-established fidelity criterion. This stage reduces the Psychovisual redundancies of the input image. This operation is irreversible. Thus, it must be omitted when error-free compression is desired.
3.6.4 Symbol Encoder
In the third and final stage of the source encoding process, the symbol coder creates a fixed- or variable-length code to represent the quantizer output and maps the output in accordance with the code. The term symbol coder distinguishes this coding operation from the overall source encoding process. In most cases, a variable-length code is used to represent the mapped and quantized data set. It assigns the shortest code words to the most frequently occurring output values and thus reduces coding redundancy. The operation is completely reversible. Upon completion of the symbol coding step, the input image has been processed to remove each of the three redundancies discussed earlier. The source encoding process is shown as three successive operations, but not all three operations are necessarily included in every compression system. For example, the quantizer must be omitted when error-free compression is desired. In addition, some compression techniques normally are modeled by merging blocks that are physically separate in figure 3.2 (a).
3.6.5 Source Decoder
The source decoder shown in figure 3.2 (b) contains only two components: a symbol decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations of the source encoder’s symbol encoder and mapper blocks. Because quantization results in irreversible information loss, an inverse quantizer block is not included in the general source decoder model.
3.6.6 Channel Encoder and Decoder
The channel encoder and decoder play an important role in the overall encoding-decoding process when the channel of above figure 3.1 is noisy or prone to error. They are designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source-encoded data. As the output of the source encoder contains little redundancy, it would be highly sensitive to transmission noise without the addition of this “controlled redundancy”.
3.7 Lossless Compression Techniques
In a lossless compression scheme, the image reconstructed after compression is numerically identical to the original image, i.e. the original image can be reconstructed without any errors. However, lossless compression can achieve only a modest amount of compression. Exact reconstruction is important for applications like the compression of text, where very small differences can result in statements with very different meanings. Consider the sentences ‘do now send money’ and ‘do not send money’. A similar argument holds for computer files and for certain types of data such as bank records. Various techniques for lossless compression are described below:
3.7.1 Huffman Coding
The basic idea in Huffman coding is to assign short code words to those input blocks with high probabilities and long code words to those with low probabilities. A Huffman code is designed by merging together the two least probable characters, and repeating this process until there is only one character remaining. A code tree is thus generated and the Huffman code is obtained from the labeling of the code tree [11]. An example of how this is done is shown in table 3.1.
Table 3.1: Huffman Source Reductions
At the far left, a hypothetical set of source symbols and their probabilities are ordered from top to bottom in terms of decreasing probability values. To form the first source reduction, the bottom two probabilities, 0.06 and 0.04, are combined to form a “compound symbol” with probability 0.1. This compound symbol and its associated probability are placed in the first source reduction column so that the probabilities of the reduced source are also ordered from the most to the least probable. This process is then repeated until a reduced source with two symbols (at the far right) is reached. The second step in Huffman’s procedure is to code each reduced source, starting with the smallest source and working back to the original source. The minimal length binary code for a two-symbol source, of course, consists of the symbols 0 and 1. As shown in table 3.2, these symbols are assigned to the two symbols on the right (the assignment is arbitrary; reversing the order of the 0 and 1 would work just as well). As the reduced source symbol with probability 0.6 was generated by combining two symbols in the reduced source to its left, the 0 used to code it is now assigned to both of these symbols, and a 0 and 1 are arbitrarily appended to each to distinguish them from each other. This operation is then repeated for each reduced source until the original source is reached. The final code appears at the far left in table 3.2. The average length of the code is the sum, over all symbols, of the product of the symbol’s probability and the number of bits used to encode it:
Lavg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/symbol
The entropy of the source is 2.14 bits/symbol, so the resulting Huffman code efficiency is 2.14/2.2 = 0.973.
Table 3.2: Huffman Code Assignment Procedure
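The source reduction procedure can be sketched with a priority queue. The probabilities below are the ones that appear in the Lavg calculation; the symbol names `s1` to `s6` and the function name are hypothetical, and the sketch only tracks code lengths rather than the actual bit patterns. Because ties between equal probabilities can be broken differently, individual code lengths may differ from table 3.2, but every Huffman code for this source has the same average length.

```python
import heapq
from math import log2

def huffman_code_lengths(probs):
    """Return {symbol: code length} by repeatedly merging the two
    least probable entries (the 'source reductions')."""
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"s1": 0.4, "s2": 0.3, "s3": 0.1, "s4": 0.1, "s5": 0.06, "s6": 0.04}
lengths = huffman_code_lengths(probs)

average_length = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * log2(p) for p in probs.values())

print(f"average length: {average_length:.2f} bits/symbol")
print(f"entropy:        {entropy:.2f} bits/symbol")
print(f"efficiency:     {entropy / average_length:.3f}")
```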
3.7.2 Arithmetic Coding
Arithmetic coding generates non-block codes. In arithmetic coding, a one-to-one correspondence between source symbols and code words does not exist. Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic code word. The code word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval used to represent it becomes smaller and the number of information units (say, bits) required to represent the interval becomes larger.
Each symbol of the message reduces the size of the interval in accordance with its probability of occurrence.
Figure 3.3: Arithmetic Coding Procedure
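The interval narrowing can be sketched with exact fractions. The four-symbol source and the five-symbol message below are hypothetical choices for illustration, not values taken from the text, and the sketch omits the final step of turning the interval into a bit string.

```python
from fractions import Fraction

def cumulative_ranges(probs):
    """Assign each symbol a sub-interval of [0, 1) sized by its probability."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def encode_interval(message, probs):
    """Narrow [0, 1) once per symbol; return the final interval."""
    ranges = cumulative_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high

def decode_value(value, length, probs):
    """Recover `length` symbols from any number inside the final interval."""
    ranges = cumulative_ranges(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return out

# Hypothetical source and message, chosen only for illustration.
probs = {"a1": Fraction(1, 5), "a2": Fraction(1, 5),
         "a3": Fraction(2, 5), "a4": Fraction(1, 5)}
message = ["a1", "a2", "a3", "a3", "a4"]

low, high = encode_interval(message, probs)
print(float(low), float(high))               # final interval bounds
print(decode_value(low, len(message), probs))
```

The width of the final interval equals the product of the symbol probabilities, which is why each additional symbol makes the interval smaller and the code word longer.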
3.7.3 Run Length Coding
The technique of run length coding exploits the high interpixel redundancy that exists in relatively simple images [2]. In run length coding we look for gray levels that repeat along each row of the image. A ‘run’ of consecutive pixels whose gray level is identical is replaced with two values: the length of the run and the gray level of all the pixels in the run.
Hence, the sequence (50, 50, 50, 50) becomes (4, 50). Run length coding can be applied on a row-by-row basis, or we can consider the image to be a one-dimensional data stream in which the last pixel in a row is adjacent to the first pixel in the next row. This can lead to a slightly higher compression ratio if the left- and right-hand sides of the image are similar. For the special case of binary images, we don’t need to record the value of a run, unless it is the first run of the row. This is because there are only two possible values for a pixel in a binary image. If the first run has one of the values, the second run implicitly has the other value; the third run implicitly has the same value as the first, and so on. Note that, if the run is of length 1, run length coding replaces one value with a pair of values. It is therefore possible for run length coding to increase the size of the dataset in images where runs of length 1 are numerous. This might be the case in noisy or highly textured images. For this reason, it is most useful for the compression of binary images or very simple grayscale images.
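A row-by-row run length coder can be sketched as follows. The function names are hypothetical, and the binary variant simply stores the first pixel value followed by the run lengths, as described above.

```python
from itertools import groupby

def rle_encode(row):
    """Replace each run of equal gray levels with (run length, gray level)."""
    return [(len(list(group)), value) for value, group in groupby(row)]

def rle_decode(pairs):
    """Expand (run length, gray level) pairs back into a row."""
    return [value for length, value in pairs for _ in range(length)]

def rle_encode_binary(row):
    """Binary rows: store the first value once, then only the run lengths."""
    return row[0], [len(list(group)) for _, group in groupby(row)]

smooth = [50, 50, 50, 50]
noisy = [50, 51, 49, 52]
binary = [0, 0, 0, 1, 1, 0, 0, 0, 0]

print(rle_encode(smooth))          # [(4, 50)]
print(rle_encode(noisy))           # four pairs: more values than the input
print(rle_encode_binary(binary))   # (0, [3, 2, 4])
```

The noisy row shows the expansion effect: four pixels turn into four pairs, that is, eight stored values.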
3.7.4 Delta Compression
Delta compression (also known as differential coding) is a very simple, lossless technique in which we recode an image in terms of the differences in gray level between each pixel and the previous pixel in the row. The first pixel, of course, must be represented as an absolute value, but subsequent values can be represented as differences, or ‘deltas’. Most of those differences will be very small, because gradual changes in gray level are more frequent than sudden changes in the majority of images. These small differences can be coded using fewer bits. Thus, delta compression exploits interpixel redundancy to create coding redundancy, which we then remove to achieve compression.
3.8 Lossy Compression Techniques
Lossy compression schemes involve some loss of information, and data that have been compressed using lossy techniques generally cannot be recovered or reconstructed exactly. Often this is because the compression completely discards information judged to be irrelevant. However, lossy schemes are capable of achieving much higher compression. This is important for applications like TV signals and teleconferencing. There is a tradeoff here between compression and accuracy. Various techniques for lossy compression are discussed below:
3.8.1 Lossy Predictive Coding
A quantizer, which also performs rounding, is added between the calculation of the prediction error e_n and the symbol encoder. It maps e_n to a limited range of values q_n and determines both the amount of extra compression and the deviation from error-free compression [1,2]. The quantizer sits in a closed loop with the predictor to prevent errors from accumulating. The predictor does not use e_n but rather q_n, because both the encoder and the decoder know it.
Figure 3.4: A lossy predictive coding model: (a) encoder; (b) decoder
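The closed loop between quantizer and predictor can be sketched as below. This is a minimal illustration under stated assumptions: the predictor is simply the previous reconstructed sample, the quantizer is a uniform rounding quantizer with a hypothetical step size of 4, and the initial prediction is 0. None of these choices come from the text.

```python
def quantize(error, step):
    """Uniform quantizer: round the prediction error to a multiple of `step`."""
    return step * round(error / step)

def dpcm_encode(samples, step):
    """Lossy predictive encoder with the quantizer inside the prediction loop."""
    codes, reconstructed = [], 0
    for x in samples:
        e = x - reconstructed        # prediction error e_n
        q = quantize(e, step)        # quantized error q_n
        codes.append(q)
        reconstructed += q           # predictor is fed q_n, exactly as in the decoder
    return codes

def dpcm_decode(codes):
    """Decoder: accumulate the quantized errors."""
    out, reconstructed = [], 0
    for q in codes:
        reconstructed += q
        out.append(reconstructed)
    return out

samples = [212, 214, 220, 222, 216, 212, 212, 214]
step = 4
decoded = dpcm_decode(dpcm_encode(samples, step))
max_error = max(abs(a - b) for a, b in zip(samples, decoded))
print(decoded)
print("largest reconstruction error:", max_error)
```

Because the encoder predicts from the same reconstructed values the decoder will see, the error per sample stays within half a quantizer step instead of growing along the row.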
3.8.2 Transform Coding
Transform coding first transforms the image from its spatial domain representation to a different type of representation using some well-known transform and then codes the transformed values (coefficients). The goal of the transformation process is to decorrelate the pixels of each subimage, or to pack as much information as possible into the smallest number of transform coefficients [12]. This method provides greater data compression compared to predictive methods, although at the expense of greater computational requirements. The choice of a particular transform in a given application depends on the amount of reconstruction error that can be tolerated and the computational resources available.
3.8.2.1 General Model
As shown in figure 3.5 (a), the encoder performs three relatively straightforward operations: subimage decomposition, transformation and quantization. The decoder, shown in figure 3.5 (b), implements the inverse sequence of steps, with the exception of the quantization function of the encoder.
Figure 3.7: The Zig-Zag Sequence
Different types of Transforms used for coding are:
1. FT (Fourier Transform)
2. DCT (Discrete Cosine Transform)
3. DWT (Discrete Wavelet Transform)
1) Fourier Transforms: The Fourier Transform’s utility lies in its ability to analyze a signal in the time domain for its frequency content. The transform works by first translating a function in the time domain into a function in the frequency domain. The signal can then be analyzed for its frequency content because the Fourier coefficients of the transformed function represent the contribution of each sine and cosine function at each frequency. An Inverse Fourier Transform does just what you’d expect: it transforms data from the frequency domain into the time domain.
I. Discrete Fourier Transforms:
The Discrete Fourier Transform (DFT) estimates the Fourier Transform of a function from a finite number of its sampled points. The sampled points are supposed to be typical of what the signal looks like at all other times. The DFT has symmetry properties almost exactly the same as the continuous Fourier Transform. In addition, the formula for the inverse Discrete Fourier Transform is easily obtained from the one for the Discrete Fourier Transform, because the two formulas are almost identical.
The Fourier Transform is an important image processing tool which is used to decompose an image into its sine and cosine components. The output of the transformation represents the image in the Fourier or frequency domain, while the input image is the spatial domain equivalent. In the Fourier domain image, each point represents a particular frequency contained in the spatial domain image. The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction and image compression. The DFT is the sampled Fourier Transform and therefore does not contain all frequencies forming an image, but only a set of samples which is large enough to fully describe the spatial domain image. The number of frequencies corresponds to the number of pixels in the spatial domain image, i.e. the images in the spatial and Fourier domains are of the same size. For a square image of size N×N, the two-dimensional DFT is given by:
\[ F(k,l) = \sum_{a=0}^{N-1} \sum_{b=0}^{N-1} f(a,b)\, e^{-i 2\pi \left( \frac{ka}{N} + \frac{lb}{N} \right)} \]
where f(a,b) is the image in the spatial domain and the exponential term is the basis function corresponding to each point F(k,l) in the Fourier space. The equation can be interpreted as: the value of each point F(k,l) is obtained by multiplying the spatial image with the corresponding base function and summing the result. In a similar way, the Fourier image can be re-transformed to the spatial domain. The inverse Fourier transform is given by:
\[ f(a,b) = \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} F(k,l)\, e^{i 2\pi \left( \frac{ka}{N} + \frac{lb}{N} \right)} \]
Figure 3.8: (a) Input image without compression; (b) and (c) Discrete Fourier Transforms; (d) output image reconstructed with the inverse transform
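The two-dimensional DFT and its inverse can be written directly from their definitions. The sketch below assumes the common convention in which the forward transform is the unnormalised double sum of f(a,b) multiplied by exp(-i2π(ka + lb)/N) and the inverse carries the factor 1/N²; the function names and the tiny 4×4 test image are hypothetical. It is far too slow for real images and is meant only to make the formulas concrete.

```python
import cmath

def dft2(f):
    """2-D DFT of a square N x N image, straight from the definition."""
    n = len(f)
    return [[sum(f[a][b] * cmath.exp(-2j * cmath.pi * (k * a + l * b) / n)
                 for a in range(n) for b in range(n))
             for l in range(n)]
            for k in range(n)]

def idft2(F):
    """Inverse 2-D DFT with the 1/N^2 normalisation."""
    n = len(F)
    return [[sum(F[k][l] * cmath.exp(2j * cmath.pi * (k * a + l * b) / n)
                 for k in range(n) for l in range(n)) / n ** 2
             for b in range(n)]
            for a in range(n)]

# Hypothetical 4 x 4 gray level image.
image = [[212, 214, 220, 222],
         [216, 212, 212, 214],
         [210, 211, 215, 219],
         [208, 209, 214, 218]]

spectrum = dft2(image)
restored = idft2(spectrum)

# F(0,0) is the sum of all pixels (the 'DC' term).
print(abs(spectrum[0][0]), sum(map(sum, image)))
print([[round(v.real) for v in row] for row in restored])
```

The spectrum has the same size as the image, and applying the inverse transform recovers the original gray levels up to floating point rounding.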
II. Windowed Fourier Transform / Short Time Fourier Transform:
If f(t) is a non-periodic signal, the summation of the periodic functions, sine and cosine, does not accurately represent the signal. The Windowed Fourier Transform (WFT) is one solution to the problem of better representing a non-periodic signal. The WFT can be used to give information about signals simultaneously in the time domain and in the frequency domain.
With the WFT, the input signal f(t) is chopped up into sections, and each section is analyzed for its frequency content separately. This windowing is accomplished via a weight function that places less emphasis near the interval’s endpoints than in the middle. The effect of the window is to localize the signal in time.
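The chopping and weighting can be sketched as follows. The choices here are assumptions made for illustration: non-overlapping sections of 64 samples, a Hann weight function, and a synthetic test signal whose frequency changes halfway through. The function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """1-D DFT straight from the definition."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def windowed_ft(signal, width):
    """Chop the signal into sections, weight each with a Hann window
    (small near the endpoints, largest in the middle), and transform it."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / width) for t in range(width)]
    spectra = []
    for start in range(0, len(signal) - width + 1, width):
        section = [signal[start + t] * window[t] for t in range(width)]
        spectra.append(dft(section))
    return spectra

def dominant_bin(spectrum):
    """Index of the strongest frequency bin in the non-negative half."""
    half = len(spectrum) // 2
    return max(range(half + 1), key=lambda k: abs(spectrum[k]))

# Synthetic signal: 4 cycles per section first, then 12 cycles per section.
width = 64
signal = ([math.sin(2 * math.pi * 4 * t / width) for t in range(width)] +
          [math.sin(2 * math.pi * 12 * t / width) for t in range(width)])

peaks = [dominant_bin(s) for s in windowed_ft(signal, width)]
print(peaks)   # one dominant frequency bin per section
```

A single DFT of the whole signal would show both frequencies but not their order in time, whereas the per-section peaks reveal which frequency is present when.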
III. Fast Fourier Transform:
To approximate a function by samples, and to approximate the Fourier integral by the Discrete Fourier Transform, requires applying a matrix whose order is the number of sample points n. Since multiplying an n x n matrix by a vector costs on the order of n^2 arithmetic operations, the cost grows quickly as the number of sample points increases. However, if the samples are uniformly spaced, then the Fourier matrix can be factored into a product of just a few sparse matrices, and the resulting factors can be applied to a vector in a total of order n log n arithmetic operations. This is the so-called Fast Fourier Transform or FFT.
The main disadvantage of the Fourier expansion, however, is that it has only frequency resolution and no time resolution. This means that although we might be able to determine all the frequencies present in a signal, we do not know when they are present. To overcome this problem, several solutions have been developed in the past decades that are more or less able to represent a signal in the time and frequency domains at the same time. The idea behind these joint time-frequency representations is to cut the signal of interest into several parts and then analyze the parts separately. Analyzing a signal in this way gives more information about when and where the different frequency components occur.
A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. There are many distinct FFT algorithms involving a wide range of mathematics, from simple arithmetic to group theory and number theory. A DFT decomposes a sequence of values into components of different frequencies. This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT computes the same result more quickly: computing a DFT of N points in the naive way, using the definition, takes O(N²) arithmetic operations, while an FFT can compute the same result in only O(N log N) operations. The difference in speed can be substantial, especially for long data sets where N may be in the thousands or millions. In practice, the computation time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to N / log(N). This huge improvement made many DFT-based algorithms practical; FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers.
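One of the many FFT algorithms mentioned above is the recursive radix-2 Cooley-Tukey scheme. The following is a minimal sketch of it in plain Python, assuming the input length is a power of two; the function names are illustrative, and the naive version is included only for comparison.

```python
import cmath

def fft_radix2(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft_radix2(values[0::2])   # DFT of the even-indexed samples
    odd = fft_radix2(values[1::2])    # DFT of the odd-indexed samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result

def dft_naive(values):
    """DFT computed directly from the definition: O(N^2) operations."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]

data = [1, 2, 3, 4, 0, 0, 0, 0]
fast_result = fft_radix2(data)
naive_result = dft_naive(data)
largest_gap = max(abs(a - b) for a, b in zip(fast_result, naive_result))
print(largest_gap)
```

Each level of recursion halves the problem and does O(N) work to combine the halves, which is where the O(N log N) total comes from.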
Figure 3.8 :
a) The Original Cameraman Image
b) Cameraman Image after decompression (cut off=20, MSE=36.82) using FFT
c) Cameraman Image after decompression (cut off=40, MSE=102.43) using FFT
d) Cameraman Image after decompression (cut off=60, MSE=164.16) using FFT
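The kind of experiment summarised in the figure can be sketched as follows. The text does not say how the cut off is defined, so this sketch assumes it is a magnitude threshold below which Fourier coefficients are set to zero; it also uses a synthetic image rather than the cameraman image, so its numbers are not comparable with the MSE values in the caption. NumPy is assumed, and all names and threshold values are illustrative.

```python
import numpy as np

def fft_compress(image, cutoff):
    """Zero every Fourier coefficient whose magnitude is below `cutoff`
    (an assumed meaning of 'cut off'), then reconstruct the image."""
    spectrum = np.fft.fft2(image)
    kept = np.abs(spectrum) >= cutoff
    reconstructed = np.fft.ifft2(spectrum * kept).real
    return reconstructed, int(kept.sum())

def mean_squared_error(original, reconstructed):
    return float(np.mean((original - reconstructed) ** 2))

# Synthetic 64 x 64 stand-in for a photograph: smooth gradient plus noise.
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:64, 0:64]
image = 2.0 * rows + cols + rng.normal(0.0, 5.0, size=(64, 64))

results = []
for cutoff in (100.0, 300.0, 1000.0):      # arbitrary illustrative thresholds
    reconstructed, kept = fft_compress(image, cutoff)
    results.append((cutoff, kept, mean_squared_error(image, reconstructed)))

for cutoff, kept, mse in results:
    print(f"cutoff={cutoff:7.1f}  coefficients kept={kept:5d}  MSE={mse:8.3f}")
```

Under this assumption, a larger cut off discards more coefficients, so fewer values need to be stored and the reconstruction error grows, which is the same trend as in the figure.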
2. The Discrete Cosine Transform (DCT):
The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-bands) of differing importance (with respect to the image’s visual quality). The DCT is similar to the discrete Fourier transform: it transforms a signal or image from the spatial domain to the frequency domain.
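A minimal sketch of this idea, assuming NumPy and an orthonormal DCT-II applied to a square block (the 8 x 8 block size, the helper names and the sample ramp are illustrative assumptions, not taken from the text):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    matrix = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    matrix[0, :] /= np.sqrt(2.0)
    return matrix

def dct2(block):
    """2-D DCT of a square block: transform the columns, then the rows."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coefficients):
    """Inverse 2-D DCT (the matrix is orthogonal, so its inverse is its transpose)."""
    c = dct_matrix(coefficients.shape[0])
    return c.T @ coefficients @ c

# A smooth 8 x 8 block: a horizontal ramp of grey levels.
block = np.tile(np.linspace(100.0, 170.0, 8), (8, 1))
coefficients = dct2(block)

# Share of the total energy held by the four lowest-frequency coefficients.
energy_fraction = np.sum(coefficients[:2, :2] ** 2) / np.sum(coefficients ** 2)
print(round(float(energy_fraction), 4))
```

For a smooth block nearly all of the energy ends up in a handful of low-frequency coefficients, which are the "important" sub-bands; the remaining coefficients can be coded coarsely or dropped with little visible effect.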
3. Discrete Wavelet Transform (DWT):
The discrete wavelet transform (DWT) refers to wavelet transforms for which the wavelets are discretely sampled. It is a transform that localizes a function both in space and in scale, and it has some desirable properties compared to the Fourier transform. The transform is based on a wavelet matrix, which can be computed more quickly than the analogous Fourier matrix. Most notably, the discrete wavelet transform is used for signal coding, where the properties of the transform are exploited to represent a discrete signal in a more redundant form, often as a preconditioning for data compression. The discrete wavelet transform has a huge number of applications in Science, Engineering, Mathematics and Computer Science.
Wavelet compression is a form of data compression well suited for image compression (and sometimes also video compression and audio compression). The goal is to store image data in as little space as possible in a file. A certain loss of quality is accepted (lossy compression). Wavelet compression methods are better at representing transients, such as percussion sounds in audio, or high-frequency components in two-dimensional images, for example an image of stars on a night sky. This means that the transient elements of a data signal can be represented by a smaller amount of information than would be the case if some other transform, such as the more widespread discrete cosine transform, had been used.
First a wavelet transform is applied. This produces as many coefficients as there are pixels in the image (i.e. there is no compression yet, since it is only a transform). These coefficients can then be compressed more easily because the information is statistically concentrated in just a few coefficients. This principle is called transform coding. After that, the coefficients are quantized, and the quantized values are entropy encoded and/or run-length encoded.
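The first step of this pipeline can be illustrated with the simplest wavelet, the Haar wavelet. The sketch below assumes NumPy, a single decomposition level and an image with even side lengths; the choice of the Haar wavelet and all function names are illustrative, and real codecs such as JPEG 2000 use other wavelets.

```python
import numpy as np

def haar_step(data, axis):
    """One level of the orthonormal Haar transform along `axis`:
    scaled pairwise sums (low-pass) followed by pairwise differences (high-pass)."""
    data = np.moveaxis(data, axis, 0)
    even, odd = data[0::2], data[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.moveaxis(np.concatenate([low, high]), 0, axis)

def inverse_haar_step(data, axis):
    """Undo `haar_step` along `axis`."""
    data = np.moveaxis(data, axis, 0)
    half = data.shape[0] // 2
    low, high = data[:half], data[half:]
    out = np.empty_like(data)
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return np.moveaxis(out, 0, axis)

def haar2d(image):
    return haar_step(haar_step(image, axis=0), axis=1)

def inverse_haar2d(coefficients):
    return inverse_haar_step(inverse_haar_step(coefficients, axis=1), axis=0)

# Smooth 8 x 8 test image (diagonal ramp).
image = 50.0 + 10.0 * np.add.outer(np.arange(8.0), np.arange(8.0))
coefficients = haar2d(image)

# The top-left quadrant holds the low-pass (approximation) coefficients.
approx_energy = np.sum(coefficients[:4, :4] ** 2) / np.sum(coefficients ** 2)
print(coefficients.shape, round(float(approx_energy), 4))
```

The transform returns exactly as many coefficients as there are pixels and is perfectly invertible, so nothing is compressed yet; the gain comes from the fact that most of the energy sits in the approximation quadrant, leaving many small detail coefficients for the quantizer and entropy coder to exploit.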
Examples of wavelet compression:
• JPEG 2000
• Ogg Tarkin
• SPIHT
• MrSID
• Dirac
Quantization:
Quantization, as used in image processing, is a lossy compression technique that maps a range of values to a single quantum value. By reducing the number of discrete symbols in a given stream, the stream becomes more compressible. One example is reducing the number of colors required to represent an image. Other widely used examples are DCT data quantization in JPEG and DWT data quantization in JPEG 2000.
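A minimal sketch of uniform scalar quantization in plain Python, assuming a fixed step size (the step of 16 and the sample values are illustrative; JPEG and JPEG 2000 use their own, more elaborate quantization rules):

```python
def quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Replace each index by the single quantum value that represents it."""
    return [i * step for i in indices]

values = [12, 15, 17, 18, 101, 103, 105, 198, 199, 203]
step = 16

indices = quantize(values, step)
restored = dequantize(indices, step)
max_error = max(abs(v - r) for v, r in zip(values, restored))

print(len(set(values)), "distinct input values")
print(len(set(indices)), "distinct symbols after quantization")
print("largest reconstruction error:", max_error)
```

Many nearby inputs collapse onto the same index, so the symbol stream has fewer distinct values and compresses better, at the cost of an error of at most half a step per value.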
Entropy Encoding
An entropy encoding is a coding scheme that assigns codes to symbols so as to match code lengths with the probabilities of the symbols. Typically, entropy encoders are used to compress data by replacing symbols represented by equal-length codes with codes whose lengths are proportional to the negative logarithm of the symbol probability. Therefore, the most common symbols use the shortest codes. According to Shannon’s source coding theorem, the optimal code length for a symbol is −log_b(P), where b is the number of symbols used to make output codes and P is the probability of the input symbol. Three of the most common entropy encoding techniques are Huffman coding, range encoding, and arithmetic coding. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code such as unary coding, Elias gamma coding, Fibonacci coding, Golomb coding, or Rice coding may be useful.
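The optimal code length formula can be evaluated directly. The sketch below uses only the Python standard library and assumes a binary output alphabet (b = 2) and symbol probabilities estimated from the message itself; the function names and the sample message are illustrative.

```python
import math
from collections import Counter

def optimal_code_lengths(message, base=2):
    """Ideal code length -log_b(P) for every symbol in `message`."""
    counts = Counter(message)
    total = len(message)
    return {symbol: -math.log(count / total, base)
            for symbol, count in counts.items()}

def entropy(message, base=2):
    """Average ideal code length per symbol (Shannon entropy)."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log(c / total, base)
                for c in counts.values())

message = "aaaabbcd"
lengths = optimal_code_lengths(message)
average_bits = entropy(message)

for symbol, bits in sorted(lengths.items()):
    print(symbol, round(bits, 3))
print("entropy:", round(average_bits, 3), "bits per symbol")
```

Here "a" occurs half of the time and ideally needs 1 bit, while the rare symbols "c" and "d" need 3 bits each; the average of 1.75 bits per symbol is below the 2 bits a fixed-length code would use for four symbols.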
There are three main techniques for achieving entropy coding:
• Huffman Coding – one of the simplest variable length coding schemes.
• Run-length Coding (RLC) – very useful for binary data containing long runs of ones or zeros.
• Arithmetic Coding – a relatively new variable length coding scheme that can combine the best features of Huffman and run-length coding, and also adapt to data with non-stationary statistics.
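The first two techniques in the list are short enough to sketch in plain Python. This is an illustration using only the standard library, not a production coder; the function names and sample inputs are hypothetical, and arithmetic coding is omitted because it needs considerably more machinery.

```python
import heapq
from collections import Counter
from itertools import groupby

def huffman_code(message):
    """Build a Huffman code table {symbol: bit string} for `message`."""
    counts = Counter(message)
    if len(counts) == 1:                      # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: partial code}).
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        weight_a, _, codes_a = heapq.heappop(heap)   # two least frequent subtrees
        weight_b, _, codes_b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes_a.items()}
        merged.update({s: "1" + c for s, c in codes_b.items()})
        heapq.heappush(heap, (weight_a + weight_b, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]

def run_length_encode(bits):
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(bits)]

message = "aaaabbcd"
table = huffman_code(message)
encoded = "".join(table[symbol] for symbol in message)
print(table, len(encoded), "bits")

runs = run_length_encode("0000011100")
print(runs)
```

The Huffman table gives the most frequent symbol the shortest code, and the run-length coder replaces each run of identical bits by a single pair, which pays off only when runs are long.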