
Tuning Baseline JPEG Encoding for Individual Images

EE398A

Chang-chuan Yu
Electrical Engineering Department, Stanford University, Stanford, CA
georgeyu@stanford.edu

Chien-An Lai
Electrical Engineering Department, Stanford University, Stanford, CA
laichnan@stanford.edu

Abstract: In this class project, a joint optimization scheme for the quantization table, threshold table, and Huffman table of baseline JPEG is presented and examined. A Lagrange multiplier is used to jointly consider rate and distortion. The optimization operates within the constraints of baseline JPEG, so the output can be decoded by any standard JPEG decoder. Besides finding optimal quantization and Huffman tables, an optimal thresholding scheme is incorporated in the encoder to further reduce the rate. The performance of the optimized JPEG encoder is verified with a rate-distortion metric, compared against JPEG2000 as well as baseline JPEG with the standard recommended tables.

Keywords: JPEG, JPEG-2000, quantization, thresholding, Huffman table

I. INTRODUCTION

JPEG is a widely used image compression standard. Baseline JPEG first divides the image into 8*8 blocks and transforms each block into the coefficient domain with a 2-D DCT. An 8*8 quantization matrix is then applied to each block in the coefficient domain to reach a reasonable rate at a tolerable distortion. After quantization, run-length coding in zigzag order followed by Huffman encoding removes the remaining redundancy in each block. The baseline JPEG standard leaves engineers the flexibility of changing the quantization table and the Huffman table; in practice, however, the default recommended tables are adopted in most cases, and rate control is done by scaling the quantization table. To obtain better performance in terms of bit rate and distortion, a joint thresholding and quantizer selection algorithm is specified in [2]. In Section II, we discuss the basic idea and the accelerated search for the quantization matrix, threshold matrix, and Huffman table, all designed to remain compatible with baseline JPEG decoding. In Section III, we show the performance figures, rate-PSNR plots for the optimized quantization matrix and threshold, discuss the phenomena observed, and explain them. In Section IV, we briefly review the basic idea and draw conclusions.

II. ALGORITHM DESCRIPTION

The baseline JPEG standard uses one unified 8*8 quantization table for all the 8*8 blocks in the whole image. The idea of this project is to squeeze as much performance gain as possible under the constraint that the coded image can still be decoded by a standard baseline JPEG decoder. To preserve compatibility, only the tables can be changed. Another way of increasing flexibility without violating the baseline standard is to insert a thresholding module that filters out some coefficients before encoding, as shown in Fig. 1. The algorithms in the subsequent sections focus on how to jointly optimize the three flexible parts, namely quantization, thresholding, and Huffman coding.

Figure 1. Block diagram of baseline JPEG and the modified block diagram with a thresholding module.
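The following is a minimal sketch of the per-block path through the modified encoder of Fig. 1. Our simulations use Matlab; this illustration uses Python with NumPy/SciPy, and the function names are our own, not from any JPEG library:

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """2-D type-II DCT, the transform baseline JPEG applies to each 8*8 block."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def quantize_block(block, Q, T):
    """Quantize one 8*8 block with step table Q; T is the 0/1 mask produced
    by the thresholding module (an all-ones T reproduces plain baseline JPEG)."""
    C = dct2(block.astype(float) - 128.0)   # level shift, then transform
    return T * np.round(C / Q)              # quantizer indices to be entropy-coded

block = np.random.randint(0, 256, (8, 8))   # toy 8*8 image block
idx = quantize_block(block, np.full((8, 8), 16.0), np.ones((8, 8), dtype=int))
```

Everything downstream of the quantizer (zigzag scan, run-length coding, Huffman coding) is untouched, which is what keeps the bitstream decodable by a standard decoder.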

A. Cost Function and Performance Metric

The quantization table in the JPEG recommendation is not designed purely to minimize mean square error; it is based on a human perception model. This suggests that the cost function for our optimization problem should not consider only mean square error and rate. In many studies, perceptual models are used to weight the DCT coefficients by their differing importance [4]. In this class project, however, we set aside perceptual weighting and choose mean square error and rate as the two elements of our cost function. This is not only for simplicity, but also because we want to show the best performance curve in PSNR-rate plots, the most widely accepted performance measure. Our cost function is then the simple Lagrangian

$$J(\lambda) = D + \lambda R$$

where D is the mean square error and R is the rate of the encoded picture.

B. Quantization matrix

The quantization matrix is an 8*8 matrix in which each entry is the step size of a mid-tread uniform quantizer. The matrix is applied to every block in the coefficient domain, and the valid quantization step runs from 1 to 255; in other words, the number of possible quantization matrices is 255^64. To describe the structure of the quantization table we write its entries as Q_n, with n running from 1 to 64 in zigzag order (Fig. 2).

Figure 2. Zigzag order of the 8*8 matrix.

Threshold matrix

The threshold matrix covers the whole compressed image: if the vertical size of the image is mf and the horizontal size is nf, the threshold matrix has mf*nf entries. Each entry is either 1 or 0: if it is 1, the quantized coefficient is transmitted; otherwise, a 0 is transmitted. We write the entries as T_n^b, where n runs from 1 to 64 as before and b indexes the blocks of the image. Let C_n^b be the coefficient after the 2-D DCT and \hat{C}_n^b the transmitted coefficient after quantization and the threshold decision. Then

$$T_n^b = \begin{cases} 1 \;\Rightarrow\; \hat{C}_n^b = \mathrm{round}(C_n^b / Q_n)\, Q_n \\ 0 \;\Rightarrow\; \hat{C}_n^b = 0 \end{cases} \qquad \text{i.e.} \qquad \hat{C}_n^b = T_n^b \cdot \mathrm{round}(C_n^b / Q_n)\, Q_n .$$

The purpose of the threshold matrix is to trade distortion for a lower rate, with the tradeoff governed by the Lagrange multiplier λ.

C. Huffman table

To transmit the quantized coefficients \hat{C}_n^b efficiently, the JPEG standard specifies run-length coding combined with Huffman coding. The quantized coefficients are divided into DC and AC coefficients and sent with different encoding schemes. The most important coefficient of a block is its DC coefficient, which is treated independently with a separate Huffman table and a differential encoding scheme [1]. The AC coefficients are run-length encoded and then fed to a Huffman encoder. The run-length code is not encoded directly; the coefficient value is first categorized, and the run length together with the category forms a 2-D symbol. The 2-D symbol is Huffman encoded and followed by the category value in binary. The objective here is to optimize the Huffman tables according to the statistics of the individual picture so that the most compact codewords are sent. We write H_DC for the DC Huffman mapping and H_AC for the AC Huffman mapping. Let z_n^b be the number of consecutive zeros before \hat{C}_n^b, and let S_n^b be the transmitted symbol after the Huffman mapping. Then

$$S_n^b = \begin{cases} H_{DC}\big(\hat{C}_1^b - \hat{C}_1^{b-1}\big) & n = 1 \\ H_{AC}\big(z_n^b, \hat{C}_n^b\big) & n = 2, \dots, 64 . \end{cases}$$
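The symbol formation can be made concrete with a short sketch. This is illustrative Python, not our Matlab implementation; the Huffman mappings H_DC and H_AC are left abstract, and only the symbols they would encode are produced:

```python
def block_symbols(coeffs, prev_dc):
    """coeffs: the 64 quantized coefficients of one block in zigzag order.
    Returns the symbol stream and the DC value to difference against next."""
    symbols = [('DC', coeffs[0] - prev_dc)]   # differential DC, fed to H_DC
    run = 0
    for c in coeffs[1:]:                      # AC coefficients, n = 2..64
        if c == 0:
            run += 1                          # extend the zero run z_n
        else:
            symbols.append(('AC', run, c))    # (z_n, C_n) pair, fed to H_AC
            run = 0
    if run:
        symbols.append(('EOB',))              # trailing zeros -> end-of-block
    return symbols, coeffs[0]

syms, dc = block_symbols([52, 3, 0, 0, -1] + [0] * 59, prev_dc=48)
# syms == [('DC', 4), ('AC', 0, 3), ('AC', 2, -1), ('EOB',)]
```

(The real standard additionally splits each value into a category and extra bits, and limits runs to 15 zeros with a ZRL symbol; both are omitted here for brevity.)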

D. Optimization algorithm

The algorithm presented in [2] minimizes the overall cost for a given Lagrange multiplier λ. The cost is defined as J(λ) = D + λR, where D is the distortion and R is the rate. D is a function of the quantization matrix Q and the threshold matrix T, while R is a function of Q, T, and the Huffman table H; the rate-distortion tradeoff is controlled by the choice of λ. Since the time needed to find the global minimum cost is prohibitively high, we instead find a local minimum recursively and try to approach the global minimum. The original algorithm is summarized below (a code sketch follows the list):

1. Find the optimal Qn sequentially for n = 1, ..., 64.
2. Recursively update Qn until the table is stable.
3. Based on the Q matrix, find the optimal T matrix for every block.
4. Use the optimal T and go back to step 1 until both the T and Q matrices have stabilized.
5. Update the Huffman table based on the symbol statistics.
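A Python sketch of this alternating loop follows. The helper callables are hypothetical stand-ins for the rate/distortion machinery described in this section, not functions from any library:

```python
import numpy as np

def optimize_tables(blocks, lam, best_step, best_mask, build_huffman,
                    initial_huffman, max_rounds=10):
    """Alternate Q- and T-updates until both stabilize, then refit H (steps 1-5).

    best_step(blocks, Q, T, H, n, lam) -> argmin_q of D_n(q) + lam * R_n(q)
    best_mask(block, Q, H, lam)        -> optimal 0/1 threshold vector for one block
    build_huffman(blocks, Q, T)        -> Huffman tables fit to the symbol statistics
    """
    Q = np.full(64, 16, dtype=int)              # any valid starting table
    T = [np.ones(64, dtype=int) for _ in blocks]
    H = initial_huffman                          # e.g., the recommended tables
    for _ in range(max_rounds):
        Q_prev, T_prev = Q.copy(), [t.copy() for t in T]
        changed = True
        while changed:                           # steps 1-2: sweep Q_n to a fixed point
            changed = False
            for n in range(64):
                q = best_step(blocks, Q, T, H, n, lam)
                if q != Q[n]:
                    Q[n], changed = q, True
        T = [best_mask(b, Q, H, lam) for b in blocks]   # step 3: per-block T
        if np.array_equal(Q, Q_prev) and all(
                np.array_equal(a, b) for a, b in zip(T, T_prev)):
            break                                # step 4: Q and T both stable
    H = build_huffman(blocks, Q, T)              # step 5: refit the Huffman table
    return Q, T, H
```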

In this project, we exercise the algorithm at different optimization levels and observe the effect of each operation; for example, we can skip the T-matrix calculation, or stop the Q-matrix search at a local minimum. The results are shown in Section III.

E. Computational simplification

The time needed to find even a local minimum is still enormous if we do not apply some simplifications to avoid unnecessary computation. This can be done by modifying the initialization, bounding the Q-matrix search, and using dynamic programming for the T matrix. The first simplification compares the previously generated quantized sequence with the present one: if they are identical, the Huffman table mapping can be skipped, which speeds up the search.

Quantization matrix searching [2]

The second simplification is quite straightforward. Instead of probing quantization steps from 1 to 255, the upper bound for the step at position n is chosen to be 2 max_b |C_n^b| + 1 over all blocks b. If the step is greater than or equal to this value, the quantized value at position n is 0 for every block, so no further computation is needed for larger step sizes.

We can also observe that when updating Qn, only \hat{C}_n^b changes, so the output zigzag sequence is almost the same as the previous symbol sequence; the only differences are at this coefficient and at the next nonzero coefficient, because the run of zeros between them may change. Instead of computing the full

$$R = |EOB| + \sum_{b=1}^{mf \cdot nf / 64} \sum_{n=1}^{64} |S_n^b| \qquad \text{and} \qquad D = \frac{1}{mf \cdot nf} \sum_{b=1}^{mf \cdot nf / 64} \sum_{n=1}^{64} \varepsilon\big(C_n^b, \hat{C}_n^b\big)$$

for J, we only calculate

$$R_n = \sum_{b=1}^{mf \cdot nf / 64} \Big[ |S_n^b| + |S_m^b| \Big] \qquad \text{and} \qquad D_n = \sum_{b=1}^{mf \cdot nf / 64} \varepsilon\big(C_n^b, \hat{C}_n^b\big)$$

for J, because all the other parts stay constant during the Qn search. Here EOB is a special symbol for end of block, ε(·,·) is the distortion measure between the original and the quantized coefficient (in this project we choose MSE, so that PSNR, the standard format for comparison, can be reported), and m is the position of the first nonzero coefficient after n. The algorithm can be summarized as follows (a sketch of the bounded per-coefficient search appears after the list):

1. Initialize Q, T, H, and λ.

2. Update Q1 by computing D1 and R1: for q ∈ {1, 2, ..., 255},
$$R_1(q) = \sum_{b=1}^{mf \cdot nf / 64} |S_1^b(q)|, \qquad D_1(q) = \sum_{b=1}^{mf \cdot nf / 64} \varepsilon\big(C_1^b, \hat{C}_1^b(q)\big), \qquad Q_1 = \arg\min_q \big[ D_1(q) + \lambda R_1(q) \big].$$

3. Update Qn for n = 2, ..., 64: for q ∈ {1, 2, ..., 255} and m = min{ i | n < i ≤ 64, \hat{C}_i^b ≠ 0 },
$$R_n(q) = \sum_{b=1}^{mf \cdot nf / 64} \Big[ |S_n^b(q)| + |S_m^b(q)| \Big], \qquad D_n(q) = \sum_{b=1}^{mf \cdot nf / 64} \varepsilon\big(C_n^b, \hat{C}_n^b(q)\big), \qquad Q_n = \arg\min_q \big[ D_n(q) + \lambda R_n(q) \big].$$
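A Python sketch of the bounded search for one Q_n follows. `local_rate` is a hypothetical proxy for the changed symbol lengths |S_n^b(q)| + |S_m^b(q)|; everything else is self-contained:

```python
import numpy as np

def search_qn(Cn, lam, local_rate):
    """Cn: the n-th zigzag DCT coefficient gathered from every block (1-D array).
    Returns the step q in {1, ..., 255} minimizing D_n(q) + lam * R_n(q)."""
    q_max = min(255, int(2 * np.max(np.abs(Cn))) + 1)  # beyond this, C_n quantizes to 0 everywhere
    best_q, best_cost = q_max, np.inf
    for q in range(1, q_max + 1):
        C_hat = np.round(Cn / q) * q           # mid-tread uniform quantizer
        Dn = np.sum((Cn - C_hat) ** 2)         # squared-error distortion epsilon
        cost = Dn + lam * local_rate(C_hat, q)
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q
```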

Optimal thresholding algorithm [3]

The third simplification is the optimal thresholding algorithm [3]. We want to minimize J over the threshold matrix T for given Q and H:

$$\min_T J(Q, T, H) = \min_T \big[ D(Q, T) + \lambda R(Q, T, H) \big].$$

Instead of a brute-force search through all possible T combinations, this algorithm provides a recursive method that efficiently reuses previous search results. First we define J_k^* to be the optimal Lagrangian cost given that the kth coefficient ends the scan, i.e., T_n is 0 for n > k in the given block b. J_{i,j} (j > i) is the incremental Lagrangian cost accrued by adding coefficient j to a scan that was previously ended by coefficient i, with (j − i − 1) zeros in between them. R_{i,j} is the rate of adding coefficient j to a scan previously ended by coefficient i. By definition we have the following relationships:

$$J_1^* = \sum_{i=2}^{64} \varepsilon(C_i, 0), \qquad J_{i,j} = -\big( \varepsilon(C_j, 0) - \varepsilon(C_j, \hat{C}_j) \big) + \lambda R_{i,j}, \qquad R_{i,j} = \big| H_{AC}(j - i - 1, \hat{C}_j) \big|.$$

Note that we drop the block index b for simplicity; every block runs this algorithm separately. The block diagram of the algorithm is provided in Fig. 3.

Figure 3. Optimal thresholding algorithm.

An intuitive understanding of this algorithm is that we first calculate J_1^*; based on J_1^* we can find J_2^*; given J_1^* and J_2^* we can easily find J_3^*, and so forth. After the entire set of J_k^* for k from 1 to 64 is found, we pick the k^* with minimal cost and backtrack from k^* to recover the T coefficients. A code sketch of this dynamic program is given below.
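This is a self-contained Python sketch of the dynamic program, assuming MSE distortion; `pair_rate(run, value)` is a hypothetical stand-in for the codeword length |H_AC(run, value)|:

```python
import numpy as np

def optimal_mask(C, C_hat, lam, pair_rate, eob_rate=4):
    """C, C_hat: original and quantized AC coefficients of one block, zigzag
    order, DC excluded. Returns the 0/1 threshold vector minimizing J."""
    K = len(C)
    J = np.full(K + 1, np.inf)         # J[k] = J*_k: best cost, scan ends at k
    prev = np.zeros(K + 1, dtype=int)  # back-pointers for recovering the scan
    J[0] = np.sum(C ** 2)              # k = 0: drop everything, sum of eps(C_i, 0)
    for j in range(1, K + 1):
        if C_hat[j - 1] == 0:
            continue                   # a zero coefficient is never worth keeping
        for i in range(j):             # scan previously ended at i, (j-i-1) zeros between
            dJ = ((C[j-1] - C_hat[j-1])**2 - C[j-1]**2
                  + lam * pair_rate(j - i - 1, C_hat[j-1]))   # J_{i,j}
            if J[i] + dJ < J[j]:
                J[j], prev[j] = J[i] + dJ, i
    total = J + lam * eob_rate         # every scan ends with an EOB symbol...
    total[K] = J[K]                    # ...except one running to the last coefficient
    k = int(np.argmin(total))          # optimal scan length k*
    mask = np.zeros(K, dtype=int)
    while k > 0:                       # backtrack from k* to recover T
        mask[k - 1] = 1
        k = prev[k]
    return mask
```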

Optimal Huffman encoding

The last optimization is of the Huffman table. The JPEG standard devises 2-D symbols, and only the 2-D symbol is encoded with Huffman coding; each encoded 2-D symbol is followed by the category value in binary format. The optimization generates the Huffman code for the 2-D symbols according to the statistics at the output of the quantizer-plus-thresholding stage. In our project, we simply permute the entries of the recommended Huffman table. Swapping entries may look like an unnecessary simplification, but it is equivalent to generating the Huffman code natively while always assigning the shortest codeword to the most probable 2-D symbol.

III. SIMULATION RESULTS

In this section, the simulation results are shown and compared. The JPEG encoding and optimization are run in Matlab; the JPEG2000 encoder used for comparison comes from the open-source project in [5].

A. Simulation Methods

To generate a performance metric against which the optimized JPEG encoder can be compared, a means of controlling the rate of baseline JPEG is necessary. This is done by linearly scaling the recommended quantization table from the standard (see the sketch below). For the optimized JPEG encoder, rate control is more difficult since the only knob in hand is λ, so we simply scan across a range of λ. The resulting rate-PSNR pairs are not evenly distributed across the figure: they are denser at the low-rate end, where λ is large. Rate curves for JPEG2000 are much easier to obtain, since JPEG2000 supports precise control of the bit rate as a standard feature. The baseline JPEG and optimized baseline JPEG encoders are simulated in Matlab, while the JPEG2000 encoder runs in C. Grayscale pictures are considered, both for simplicity and to save the limited time of a class project. The picture database comes from the homework set together with some real-life pictures taken with megapixel digital cameras and cropped.
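A Python sketch of that rate-control knob; the table values are the standard recommended luminance quantization table (Annex K of the JPEG specification):

```python
import numpy as np

Q50 = np.array([                       # recommended luminance table
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]])

def scaled_table(scale):
    """Larger scale -> coarser steps -> lower rate and lower PSNR."""
    return np.clip(np.round(Q50 * scale), 1, 255).astype(int)

# one rate-distortion point per scale factor; the sweep traces the baseline curve
tables = [scaled_table(s) for s in (0.25, 0.5, 1.0, 2.0, 4.0)]
```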

B. Rate-Distortion Performance

Figs. 4 and 5 show the rate-distortion performance of the optimized JPEG encoder for two different pictures.

Figure 4. Rate-PSNR curves for bridge512x512 (JPEG2000; Q+T+H, Q+H, and Q+T optimized; Q-optimized JPEG; baseline JPEG).

Figure 5. Rate-PSNR curves for halfdome512x512 (same set of curves as Fig. 4).

Figure 7. Optimized Q table for halfdome512x512.

The optimized JPEG algorithm produces a significant gain over baseline JPEG. In the higher-rate range, the optimization improves on baseline JPEG by approximately 3 dB; equivalently, at the same distortion, the optimized JPEG achieves a 20-30% lower bit rate. In both rate-PSNR plots, the performance of JPEG2000 is by far the best. The curves also show different levels of optimization of the JPEG encoder. For the first picture, the optimized JPEG encoder performs very close to JPEG2000; for the second picture, the optimization does not produce as good a result. Clearly, the nature of the picture affects the effectiveness of the optimization. The effect of adding thresholding optimization is also easy to see: the thresholding algorithm shifts each rate-distortion point toward the lower left of the rate-PSNR plot, whereas Huffman table optimization only shifts a point horizontally to the left, because Huffman encoding does not change the distortion at all.

The reconstructed images of the optimized JPEG encoder are shown in Fig. 9 and Fig. 10, respectively. In Fig. 9, the two images have approximately the same distortion, yet the optimized JPEG encoder achieves a significant rate reduction of up to 30%. There is virtually no perceptible difference between the two pictures encoded with the different quantization tables. The quantization table generated by the optimization algorithm is shown in Fig. 6; it is very different from the table suggested for baseline JPEG, especially in the low-frequency part. In Fig. 10, we again compare reconstructed pictures at the same distortion. The optimized one has a 20% lower bit rate at the same PSNR; however, its blocking artifacts are noticeably more obvious than in the picture encoded with the standard table. This is easily explained: the DC quantization step of the optimized JPEG is too coarse, as shown in Fig. 7. If we restricted the search range for the DC quantization step, we could reduce the blocking at the cost of some rate optimization. Moreover, the optimization algorithm is not designed to optimize the perceptual quality of the pictures, so two pictures with the same mean square error may appear different to human eyes.
Figure 6. Optimized Q table for bridge512x512.

Figure 8. Rate-PSNR curves for peppers512x512 (baseline JPEG; Q-optimized-once JPEG; Q+H optimized; T only, no H).

C. Cases Failing to Do Better Than Baseline JPEG

There are some cases where the optimization algorithm fails to do any better than the recommended tables; one such result is shown in Fig. 8. This proves that the optimized table is not the global optimum, a direct consequence of the simplified Q-search algorithm discussed earlier. In our project, the search always starts from the standard recommended table, so on its path toward the optimum it may sometimes be trapped in local minima. Given a different starting point, e.g., a scaled standard table, one can end up with a different result, which confirms that the optimization suffers from local-minimum problems. Unfortunately, this means the optimization algorithm cannot guarantee the best result.

D. Comparison of Computational Complexity

The computational complexity of the optimization algorithm is thousands of times that of baseline JPEG. A single search of the quantization table requires up to 255 x 64 encodings of the picture, and the quantization table may have to be searched 2-3 times before convergence. In our experiments, finding the optimal table for a 512 x 512 picture takes more than an hour, while encoding a picture with baseline JPEG takes approximately one second. This computational complexity may well rule out any practical use of the optimization in real life.

IV. CONCLUSION

In this class project, we have investigated an optimization algorithm for baseline JPEG, tested it, and compared it to the state-of-the-art JPEG2000. The optimization produces a significant performance improvement over standard baseline JPEG in terms of the rate-PSNR measure. However, two obvious pitfalls prevent the optimization algorithm from being widely adopted: one is its computational complexity, and the other is the lack of any guarantee of global optimality. The simplifications in the algorithm's search for the best quantization table result in local-minimum problems.

ACKNOWLEDGMENT

We would like to thank David Varodayan for acting as our advisor for the project.

WORK BREAKDOWN

The work in the project is evenly distributed between the two group members.
Work                        Changchuan   Chien-An
Paper reading / algorithm       30%         70%
JPEG modeling                   50%         50%
JPEG2000 modeling               70%         30%
Presentation                    60%         40%
Report                          40%         60%

REFERENCES

[1] G. K. Wallace, "The JPEG still picture compression standard," Communications of the ACM, vol. 34, no. 4, April 1991.
[2] M. Crouse and K. Ramchandran, "Joint thresholding and quantizer selection for transform image coding: entropy-constrained analysis and applications to baseline JPEG," IEEE Transactions on Image Processing, vol. 6, no. 2, February 1997.
[3] K. Ramchandran and M. Vetterli, "Rate-distortion optimal fast thresholding with complete JPEG/MPEG decoder compatibility," IEEE Transactions on Image Processing, vol. 3, pp. 700-704, September 1994.
[4] A. Watson, "DCT quantization matrices visually optimized for individual images," in Proc. SPIE Human Vision, Visual Processing, and Digital Display IV, 1993.
[5] JasPer JPEG-2000 codec, http://www.ece.uvic.ca/~mdadams/jasper/

APPENDIX

Figure 9. Reconstructed images for bridge512x512.

Figure 10. Reconstructed images for halfdome512x512.
