I have recently worked on JPEG Image Compression. Following is what I understood along the way. Following articles really helped me during the process.

Apart from the above articles, this video is also a good resource along with this paper.

I came across two types of JPEG compression algorithms. One is based on Discrete Cosine Transform (DCT) while the other is based on Discrete Wavelet Transform (DWT). I basically focused on the implementation using DCT and after I got it working, used the same pipeline except for replacing DWT with DCT.

The steps included in the compression phase are as follows:

  1. Color Transform
  2. Range Mapping
  3. Patch Generation
  4. DCT
  5. Quantization
  6. Zigzag Encoding
  7. Huffman Encoding

The steps included in the decompression phase are as follows:

  1. Huffman Decoding
  2. Zigzag Decoding
  3. Inverse Quantization
  4. Inverse DCT
  5. Inverse Patch Generation
  6. Inverse Range Mapping
  7. Inverse Color Transform

I will now try to further explain the steps.

It is a simple conversion from RGB to YCbCr.

It is the step to simply subtract 128 from the image to get the image range in [-128,127].

In this step, 8*8 patches are generated of the input image and all other operations are performed patch wise.

In this step Discrete Cosine Transform is applied on a patch.

It simply divides the pixels by a certain amount different for different channels.

I used the following quantization table for Y channel

Y_Q = [16 11 10 16 24 15 1 61;12 12 14 19 26 58 60 55;14 13 16 24 40 57 69 56;14 17 22 29 51 81 80 62;18 22 37 56 68 109 103 77;24 35 55 64 81 104 113 92;49 64 78 87 103 121 120 101;72 92 95 98 112 100 103 99];

The quantization table used for Cb and Cr channels is

Cb_Cr_Q = [17 18 24 47 99 99 99 99;18 21 26 66 99 99 99 99;24 26 56 99 99 99 99 99;47 66 99 99 99 99 99 99;99 99 99 99 99 99 99 99;99 99 99 99 99 99 99 99;99 99 99 99 99 99 99 99;99 99 99 99 99 99 99 99];

This step involves rearranging the 64 values in a zigzag pattern such that all 0’s appear collectively at the end. The zigzag pattern is as follows:

RLE_i = [1 2 9 17 10 3 4 11;18 25 33 26 19 12 5 6;13 20 27 34 41 49 42 35;28 21 14 7 8 15 22 29;36 43 50 57 58 51 44 37;30 23 16 24 31 38 45 52;59 60 53 46 39 32 40 47;54 61 62 55 48 56 63 64];

Huffman encoding follows a tree structure to encode in the form of 0’s and 1’s using probability of each distinct value.

The decompression process is literally the inverse of each individual step in compression applied reverse in order. For instance, Inverse quantization is basically multiplying the same tables which were divided in compression. Inverse range mapping simply involves adding 128 to the image instead of subtracting it. Inverse color transform simply includes the conversion from YCbCr to RGB.

I also tried Discrete Wavelet Transform for which I kept the entire process exactly the same just replacing DCT with DWT.

As far as 2D images are concerned, same process is followed except that at the start all three channels are created with the same values so as to convert it in to a 3D image.

Compression using DCT, Left: RGB Image after decompression, Right: Gray Scale Image after decompression
Compression using DWT, Left: RGB Image after decompression, Right: Gray Scale Image after decompression

Now, I will explain the code structure.

It contains the code for the complete pipeline using DCT.

It contains the complete pipeline using DWT.

It contains only the compression part using DCT.

It contains only the decompression part using DCT.

It contains only the compression part using DWT.

It contains only the decompression part using DWT.

Compression using DCT with GUI.

Compression using DWT with GUI.

Code can be accessed using this repository.

So, what’s next? I don’t know. Most probably another implementation but when? I hope soon.