
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002

Application of Adaptive Constructive Neural Networks to Image Compression


Liying Ma and K. Khorasani, Member, IEEE
Abstract: The objective of this paper is the application of adaptive constructive one-hidden-layer feedforward neural networks (OHL-FNNs) to image compression. Comparisons with fixed-structure neural networks are performed to demonstrate and illustrate the training and generalization capabilities of the proposed adaptive constructive networks. The influence of quantization effects, as well as a comparison with the baseline JPEG scheme, is also investigated. It is demonstrated through several experiments that very promising results are obtained as compared to presently available techniques in the literature.

Index Terms: Feedforward neural networks (FNNs), constructive algorithms, image compression, generalization capability.

I. INTRODUCTION

Digital image representation requires a large amount of data, and its transmission over communication channels is time consuming. To remedy this situation, a large number of techniques for compressing the data representing a digital image have been developed [20] to make its storage and transmission economical. In particular, in the past decade numerous attempts have been made to pursue the possibility of using various neural networks (NNs) for image compression (see, for example, [9], [11], and [14] for reviews). Autoassociative neural networks [3], the Kohonen self-organizing map (SOM) [1], [7], cellular neural networks [21], [24], and counter-propagation neural networks [23], among others, have been proposed in the literature. In this paper, we are particularly interested in multilayer perceptron (MLP)-type feedforward NNs (FNNs) due to their structural elegance, abundance of training algorithms, and good generalization capabilities [6], [19]. In the above NN-based algorithms, an image is usually divided into small square blocks of pixels which are then framed into patterns for network training. The size of each block is typically taken as 4 × 4, 6 × 6, 8 × 8, or even 16 × 16, although this generally depends on the nature of the image being compressed and the training algorithm used. The NN considered would then have identical input and output dimensions, equal to the number of pixels in a given block. In a multi-hidden-layer FNN used for image compression [19], the hidden layer in the middle of the network has fewer nodes than the input layer, and the hidden layer output values associated with all the blocks are considered as the compressed image, i.e., the image transmitted to the receiver.
Manuscript received July 2, 2001; revised January 16, 2002. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Research Grant RGPIN-42515. The authors are with the Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada (e-mail: liying@ece.concordia; kash@ece.concordia.ca). Publisher Item Identifier S 1045-9227(02)04436-3.

Fig. 1. FNNs for image compression. (a) Multi-hidden-layer FNN for image compression, where ideally n < I. (b) OHL-FNN for image compression, where ideally n < I.

In order to reconstruct the image at the receiver end, the weights of the connections on the right-hand side (RHS) of the middle layer are also transmitted to the receiver [see Fig. 1(a) for details]. If the number of all the output values of the middle-layer units plus their connections on the RHS is less than the total number of pixels of the image, a compression is achieved. The fewer the units in the middle layer, the higher the degree of compression. The multi-hidden-layer FNN considered here is symmetric in structure, with the subnetwork from the input layer to the middle hidden layer acting as a data compressor, while the subnetwork from the middle layer to the output layer plays the role of a reconstructor. These two subnetworks have the same number of connections. This technique works quite well, as shown by many experimental results in the literature [19]. However, there are two basic problems with this strategy. The first is the large size of the network, which makes the training costly and the convergence sluggish. The second is the determination of a proper network size. Generally, tedious and extensive trial-and-error tests have to be performed to choose the proper



network architecture, and consequently the appropriate compression ratio. The idea of using only a one-hidden-layer feedforward neural network (OHL-FNN) was proposed in [4] and [8] (see also the references therein). This idea may lead to faster training convergence at the expense of a lower quality of the reconstructed image for the same compression ratio as in the multi-hidden-layer FNN architecture. To address the two problems mentioned above, a constructive OHL network was proposed for image compression by Setiono and Lu [22] [see Fig. 1(b)]. This algorithm is quite similar to the dynamic node creation algorithm of [2]. It starts with a network that contains only a single unit in the hidden layer. The network is then trained by using a variant of the quasi-Newton method to achieve fast convergence. If the trained network does not satisfy the accuracy requirements, a new hidden unit is added and the whole network is retrained. This process is repeated until a network with the desired accuracy and/or compression ratio is achieved. This compression algorithm is thus flexible in the sense that the user can trade off between image quality and compression ratio according to some a priori specifications. However, the algorithm is not as efficient as constructive algorithms that 1) freeze the input-side weights of the hidden units already added to the existing network and 2) do not retrain the frozen weights. With the exception of the above work, constructive algorithms have not yet been applied to image compression applications. In this paper, we propose a constructive algorithm for an OHL-FNN to adaptively determine a network with possibly improved generalization performance for image compression and a reduced number of internal weights. Specifically, an input-side weight pruning technique is incorporated in the construction of our proposed OHL-FNN. The network training consists of two phases: an input-side training phase and an output-side training phase. Once a new hidden unit is added to the existing network, its input-side weights are fixed (frozen) in the training process that follows. Output-side weights of all the hidden units are then updated in the output-side training phase. We first compress the benchmark image Girl using our proposed adaptive constructive OHL-FNN. This image is used to train and construct a network, and the images Lena and Lake are then used to test the generalization performance of the resulting constructed network. Results are also developed for and applied to four images from a Football sequence of images. The influence of quantization on the compressed image is investigated. Two types of quantization schemes, namely uniform and probability density function (pdf)-optimized quantization, are considered. We introduce a similarity notion (a correlation-based metric) between two images in order to formally implement our proposed NN-based image compression technique. The performance of the constructed network is then compared with the baseline JPEG technique in terms of the PSNRs at the same compression ratios. Furthermore, the generalization ability of the proposed constructive OHL-FNN is investigated during the training and/or generalization of images. Extensive experimental results are provided to demonstrate the effectiveness and the potential of the proposed technique.
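As a concrete illustration of the two-phase constructive procedure just described, the following Python sketch grows an OHL network one hidden unit at a time, freezing the input-side weights of each added unit and re-solving the output-side weights by linear least squares. It is a minimal sketch, not the paper's implementation: the candidate input-side weights are simply drawn at random and scored by a correlation measure (standing in for the quickprop-based candidate training of Section II), and all function and parameter names are illustrative.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def psnr(x, x_hat, peak=1.0):
    """PSNR in dB; patterns are assumed to be scaled to [0, 1] here."""
    mse = np.mean((x - x_hat) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def constructive_ohl_train(X, T, max_hidden=10, target_psnr=30.0, pool_size=8, seed=0):
    """Grow an OHL-FNN one hidden unit at a time.

    X : (P, d) input patterns (image blocks), T : (P, d) targets (= X here).
    Candidate input-side weights are drawn at random and scored by a
    correlation measure; the selected weights are then frozen and only the
    output-side weights are re-solved by linear least squares."""
    rng = np.random.default_rng(seed)
    P, d = X.shape
    W_in = []                                   # frozen input-side weights
    H = np.ones((P, 1))                         # hidden outputs, bias column first
    V = np.linalg.lstsq(H, T, rcond=None)[0]    # initial output-side weights
    for _ in range(max_hidden):
        E = T - H @ V                           # residual error of the current net
        candidates = rng.normal(scale=0.5, size=(pool_size, d + 1))
        scores = []
        for w in candidates:                    # score each candidate unit
            f = sigmoid(X @ w[:-1] + w[-1])
            scores.append(np.abs((E - E.mean(0)).T @ (f - f.mean())).sum())
        w_new = candidates[int(np.argmax(scores))]
        W_in.append(w_new)                      # input-side weights are frozen here
        H = np.hstack([H, sigmoid(X @ w_new[:-1] + w_new[-1])[:, None]])
        V = np.linalg.lstsq(H, T, rcond=None)[0]   # output-side retraining only
        if psnr(T, H @ V) >= target_psnr:
            break
    return np.array(W_in), V
```

In this organization the only retraining that occurs when a unit is added is the linear output-side solve, which is what makes the constructive scheme cheaper than retraining the whole network as in [22].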

Fig. 2. Schematic for an OHL-FNN-based image compression.

The outline of this paper is as follows. Section II briefly introduces our adaptive constructive algorithm for FNNs, and the development and application of our proposed constructive OHL-FNN to image compression are presented. The experimental results illustrating the capabilities and advantages of our proposed technique over fixed-structure FNNs are reported in Section III. In Section IV, the generalization capabilities of the proposed algorithm are demonstrated. Conclusions are included in Section V.

II. IMAGE COMPRESSION USING AN ADAPTIVE CONSTRUCTIVE OHL-FNN

In NN-based image compression, the image of size $N_1 \times N_2$ is first divided into square blocks of equal size $b \times b$. Each square block is then arranged into a vector of dimension $b^2$ that is fed to the neural network as an input training pattern. All such vectors are put together to form a training matrix $X$ of size $b^2 \times P$, where $P = N_1 N_2 / b^2$ is the number of square blocks. The target matrix for network training is considered to be the same as the input matrix $X$. The schematic of OHL NN-based image compression is shown in Fig. 2.

Algorithms that can determine an appropriate network architecture and/or size automatically according to the complexity of the underlying function/map embedded in the data are very cost-efficient, and thus highly desirable. Efforts toward automatic and intelligent network size determination have been investigated in the literature, and many techniques have been reported [16] (see also the references therein). The approach developed in this paper for determining the network size is known as constructive learning. Constructive learning alters the network architecture adaptively and simultaneously as learning proceeds, automatically producing a network with an appropriate structure. In this approach one starts with an initial network of small size, and then incrementally adds new hidden units and/or hidden layers until some prespecified error requirement is reached, or no performance improvement can be observed. The network obtained in this way is reasonably sized for the given problem at hand. Generally, this strategy may not yield a minimal network; however, a subminimal network can be expected [16], [17]. In [16], Kwok and Yeung survey the major constructive algorithms in the literature. The dynamic node creation algorithm and its variants [2], activity-based structure-level adaptation [18], [25], cascade-correlation (CC) algorithms [12], and the constructive one-hidden-layer (OHL) algorithms [17] are among the most important constructive learning algorithms developed so far.
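A minimal sketch of the block-based preprocessing described above, assuming a grayscale image stored as a NumPy array with dimensions divisible by the block size; the function names are illustrative.

```python
import numpy as np

def image_to_patterns(img, b=4):
    """Divide a grayscale image into non-overlapping b x b blocks and return
    them as a (P, b*b) pattern matrix, P being the number of blocks.  Pixel
    values are scaled to [0, 1]; rows (not columns) hold the patterns here."""
    h, w = img.shape
    blocks = (img[:h - h % b, :w - w % b]
              .reshape(h // b, b, w // b, b)
              .swapaxes(1, 2)
              .reshape(-1, b * b))
    return blocks.astype(np.float64) / 255.0

def patterns_to_image(patterns, h, w, b=4):
    """Inverse operation: reassemble the block vectors into an image
    (h and w are assumed divisible by b)."""
    img = (patterns.reshape(h // b, w // b, b, b)
           .swapaxes(1, 2)
           .reshape(h, w))
    return np.clip(img * 255.0, 0, 255).astype(np.uint8)

# For a 512 x 512 image and 4 x 4 blocks this yields P = 16384 patterns of
# dimension 16; the target matrix equals the input pattern matrix.
```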


Fig. 3. Structure of a constructive OHL-FNN.

The development of our proposed adaptive constructive learning algorithm is motivated by the following factors.

F1) The OHL-FNN is simple and elegant in structure. The fan-in problem of the cascade-correlation (CC)-type algorithms is not present in these architectures. Furthermore, the deeper the CC structure becomes, the more input-side connections a new hidden unit requires. This may give rise to a degradation of the generalization performance of the CC network, as some of the connections may become irrelevant to the prediction of the output.

F2) The OHL-FNN is a universal approximator as long as the number of hidden units is sufficiently large. Therefore, the convergence of constructive algorithms can be easily established [17].

F3) The constructive learning process is simple and facilitates the investigation of the training efficiency and the development of other improved constructive strategies.

The constructive OHL algorithm may be initialized to a small network, say a network with one hidden unit. At any given point during the constructive training process, assume there are $n$ hidden units in the hidden layer; the problem is then to train the $(n+1)$th hidden unit that is to be added to the network. All the hidden units are assumed to have identical activation functions, without loss of generality. The network is depicted in Fig. 3. A candidate that maximizes a correlation-based objective function (details are given in the following subsection) is selected from a pool of candidates and is then incorporated into the network as the $(n+1)$th hidden unit. The input to the $(n+1)$th hidden unit for the $p$th training pattern is given by

$s_p^{(n+1)} = \sum_{i=1}^{b^2} w_i^{(n+1)} x_{p,i} + w_0^{(n+1)}$   (1)

where $w_i^{(n+1)}$ is the weight from the $i$th input to the $(n+1)$th hidden unit and $w_0^{(n+1)}$ is the so-called bias of the $(n+1)$th hidden unit, whose input is set to $+1$. The training of the network weights is accomplished in two separate stages, namely the input-side training and the output-side training stages. Before proceeding to the output-side training of a new hidden unit, one has to first train its input-side weights. The input-side training is achieved through the quickprop algorithm and is described in detail in [12]. Once the input-side weights are trained, the output of the hidden unit can be expressed as

$f_p^{(n+1)} = \sigma\big(s_p^{(n+1)}\big)$   (2)

where $\sigma(\cdot)$ is the sigmoidal activation function of the hidden unit. The network's $o$th output with $n+1$ hidden units may now be expressed as follows:

$\hat{y}_{p,o} = \sum_{j=1}^{n+1} v_{j,o}\, f_p^{(j)} + v_{0,o}$   (3)

where $v_{j,o}$, $j = 1, \ldots, n+1$, are the output-side weights of the $j$th hidden unit and $v_{0,o}$ is the bias of the $o$th output unit, with its input being fixed to $+1$. The corresponding output error is now given by $e_{p,o}^{(n+1)} = t_{p,o} - \hat{y}_{p,o}$, where $t_{p,o}$ denotes the target. Subsequently, the output-side weight training is performed by solving the following least-squares (LS) problem, provided that the output layer has linear activation functions, that is

$\min_{\{v_{j,o}\}} \sum_{o} \sum_{p} \Big( t_{p,o} - \sum_{j=1}^{n+1} v_{j,o} f_p^{(j)} - v_{0,o} \Big)^2.$   (4)

Following the output-side training, a new error signal $e_{p,o}^{(n+1)}$ is calculated for the next cycle of input-side training, associated with the $(n+2)$th hidden unit and its corresponding output-side training phase.

A. Error Scaling Strategy for Input-Side Training

In this section, the correlation-based objective function given in [12] is investigated and improved upon. This objective function is selected as follows:

$J = \sum_{o} \Big| \sum_{p} \big( e_{p,o}^{(n)} - \bar{e}_o^{(n)} \big) \big( f_p^{(n+1)} - \bar{f}^{(n+1)} \big) \Big|$   (5)

where $\bar{e}_o^{(n)}$ and $\bar{f}^{(n+1)}$ denote the mean values of the training error and of the output of the $(n+1)$th hidden unit over the entire set of training samples, respectively. The derivative of $J$ with respect to the weight $w_i^{(n+1)}$ is calculated as

$\frac{\partial J}{\partial w_i^{(n+1)}} = \sum_{o} \delta_o \sum_{p} \big( e_{p,o}^{(n)} - \bar{e}_o^{(n)} \big)\, \sigma'\big(s_p^{(n+1)}\big)\, x_{p,i}$   (6)

where $\delta_o$ is the sign of the inner correlation term of (5) for the $o$th output. The quickprop algorithm maximizing (5) may now be expressed as

$\Delta w_i^{(n+1)}(k) = \begin{cases} \beta_i(k)\, \Delta w_i^{(n+1)}(k-1), & \text{if } \Delta w_i^{(n+1)}(k-1) \neq 0 \text{ and } |\beta_i(k)| \le \mu \\ \mu\, \Delta w_i^{(n+1)}(k-1), & \text{if } \Delta w_i^{(n+1)}(k-1) \neq 0 \text{ and } |\beta_i(k)| > \mu \\ \eta\, g_i(k), & \text{otherwise} \end{cases}$   (7)

where $g_i(k) = \partial J / \partial w_i^{(n+1)}$ evaluated at iteration $k$, $\beta_i(k) = g_i(k) / \big( g_i(k-1) - g_i(k) \big)$, and $\eta$ and $\mu$ are positive user-specified parameters.
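The following sketch illustrates the correlation objective (5), the gradient of (6), and a quickprop-style ascent step in the spirit of the reconstruction given in (7). It is an illustrative approximation rather than the exact routine of [12]; in particular, the step-limiting rule and the default values of eta and mu are assumptions.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def correlation_objective(w, X, E):
    """Objective (5): summed absolute covariance between the candidate unit's
    output and each residual error signal.
    w : (d+1,) input-side weights, bias last; X : (P, d) patterns;
    E : (P, m) residual errors of the current n-unit network."""
    f = sigmoid(X @ w[:-1] + w[-1])
    return np.abs((E - E.mean(0)).T @ (f - f.mean())).sum()

def objective_gradient(w, X, E):
    """Gradient of (5) with respect to w, in the spirit of (6)."""
    s = X @ w[:-1] + w[-1]
    f = sigmoid(s)
    Ec = E - E.mean(0)                            # centered errors, (P, m)
    delta = np.sign(Ec.T @ (f - f.mean()))        # sign of each output's correlation, (m,)
    c = Ec @ delta                                # sign-combined error per pattern, (P,)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # inputs with the bias input appended
    return Xb.T @ (c * f * (1.0 - f))             # (d+1,) ascent direction

def quickprop_step(g, g_prev, dw_prev, eta=0.02, mu=1.75):
    """One quickprop-style ascent step in the spirit of (7); eta and mu are
    illustrative defaults for the user-specified parameters."""
    dw = eta * g                                  # plain gradient-ascent step
    with np.errstate(divide="ignore", invalid="ignore"):
        beta = np.where(g_prev != g, g / (g_prev - g), mu)
    beta = np.clip(beta, -mu, mu)                 # limit the growth factor
    nonzero = dw_prev != 0
    dw[nonzero] = beta[nonzero] * dw_prev[nonzero]
    return dw
```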


During the input-side training phase, if the error signal varies over a range that lies beyond the active range of the activation function of a hidden unit, the error signal range is mapped into the active range of the hidden unit by invoking a linear coordinate transformation satisfying

$h_{\max} = a\, e_{\max} + c, \qquad h_{\min} = a\, e_{\min} + c$   (8)

where $[e_{\min}, e_{\max}]$ denotes the range of the error signal and $[h_{\min}, h_{\max}]$ the active range of the activation function. Note that only linear transformations can be utilized here if one wants to preserve the waveform of the error signal. Solving the above expressions for the parameters $a$ and $c$ yields

$a = \frac{h_{\max} - h_{\min}}{e_{\max} - e_{\min}}, \qquad c = \frac{h_{\min}\, e_{\max} - h_{\max}\, e_{\min}}{e_{\max} - e_{\min}}.$   (9)

Therefore, the error signal $e_{p,o}^{(n)}$ is linearly transformed into

$\tilde{e}_{p,o}^{(n)} = a\, e_{p,o}^{(n)} + c.$   (10)

The error signal $e_{p,o}^{(n)}$ in (5) will now be replaced by $\tilde{e}_{p,o}^{(n)}$, which is calculated from the above equation. If the activation function of the hidden unit is a log-sigmoidal function, then its active range may, for example, be set to $[0.1, 0.9]$. It is not difficult to determine from the above equations that if $e_{\min}$ and $e_{\max}$ are chosen symmetrically about the mean value of the error signal, one gets

$\tilde{\bar{e}}_o^{(n)} = \frac{h_{\min} + h_{\max}}{2} = 0.5$   (11)

regardless of the error statistics. This is the point where the activation function reaches its maximum first-order derivative. The replacement of $e_{p,o}^{(n)}$ by $\tilde{e}_{p,o}^{(n)}$ is now expected to improve the capability of the input-side training. What remains to be defined are the bounds for the error signal. Since the correlation between $e_{p,o}^{(n)}$ and $f_p^{(n+1)}$ is statistically defined, the bounds for the error signal also have to be selected in a statistical sense rather than as simple fixed values. Consequently, the values of the bounds may be selected according to

$e_{\max} = \bar{e}_o^{(n)} + \kappa\, \sigma_e, \qquad e_{\min} = \bar{e}_o^{(n)} - \kappa\, \sigma_e$   (12)

where $\kappa$ is a positive constant and $\sigma_e$ is the standard deviation of $e_{p,o}^{(n)}$.
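A small sketch of the error scaling of (8)-(12), assuming a log-sigmoidal unit with active range [0.1, 0.9]; the width parameter kappa used for the statistical bounds is an assumption, since the exact value used in the paper is not reproduced here.

```python
import numpy as np

def scale_error_to_active_range(e, h_min=0.1, h_max=0.9, kappa=3.0):
    """Linearly map the error signal into the active range [h_min, h_max] of a
    log-sigmoidal unit, cf. (8)-(12).  The error bounds are taken kappa
    standard deviations around the mean (kappa is an assumed value), so the
    mean error is mapped to the midpoint (h_min + h_max) / 2, cf. (11)."""
    e_mean, e_std = e.mean(), e.std()
    if e_std == 0.0:                              # constant error: map to midpoint
        return np.full_like(e, 0.5 * (h_min + h_max))
    e_max = e_mean + kappa * e_std                # statistical upper bound, (12)
    e_min = e_mean - kappa * e_std                # statistical lower bound, (12)
    a = (h_max - h_min) / (e_max - e_min)         # slope, (9)
    c = h_min - a * e_min                         # offset, (9)
    return a * e + c                              # transformed error, (10)

# Example: a zero-mean error signal is mapped so that its mean lands at 0.5,
# the point of maximum slope of the log-sigmoid.
e = np.random.default_rng(0).normal(size=1000)
print(round(scale_error_to_active_range(e).mean(), 3))   # approximately 0.5
```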


Fig. 4. (a) Training PSNRs (the Girl image). (b) Generalization PSNRs (the Lena image). (c) Cumulative number of pruned weights.

B. Input-Side Pruning Strategy

One intuitive way to determine the network architecture is to first establish, through some means, a network that is considered to be sufficiently large for the problem under consideration, and then trim the unnecessary connections or units to reduce the network to an appropriate size. This is the basis of the pruning algorithms. Since it is much easier to select a very large network than to find the proper architecture, the pruning idea is expected to provide a practical but partial solution to the structure determination problem. The main problem to resolve here is how to design a criterion for trimming or pruning the redundant connections or units in the network. Karnin [15] defined the sensitivity of the error cost function with respect to the removal of each connection and pruned the connections that have low sensitivities. Le Cun et al. [10] designed a criterion denoted as saliency by estimating the second derivative of the error function with respect to each weight, and trimmed the weights with sensitivities lower than some prespecified bound. Castellano et al. [5] developed an iterative pruning algorithm for FNNs which does not need the problem-dependent tuning phase and the retraining phase required by other pruning algorithms such as those in [10]. The above pruning algorithms have the following limitations: 1) the size of the initial network may be difficult to determine a priori and 2) the computational costs are very excessive, due to the fact that in general repeated pruning and retraining processes have to be performed.

In the input-side training, one can have one or a pool of candidates to train a new hidden unit. In the latter case, the neuron that results in the maximum objective function is selected as the best candidate. This candidate is incorporated into the network and its input-side weights are frozen in the subsequent training process. However, certain input-side weights may not contribute much to the maximization of the objective function or, indirectly, to the reduction of the training error. These connections should first be detected and then removed through a pruning technique. Pruning these connections is expected to produce a smaller network without compromising the performance of the network. Note that since the pruning operation is carried out locally, the generalization performance of the final network is not expected to improve significantly, as the conventional pruning-and-backfitting performed in standard fixed-size network pruning is not implemented here. Toward this end, let the best candidate for the $(n+1)$th hidden unit result in an objective function value $J^{(n+1)}$. The sensitivity of each weight is in this paper defined as follows:

$\mathrm{Sen}_i = J^{(n+1)} - J_i^{(n+1)}$   (13)

where $J_i^{(n+1)}$ is the value of the objective function when $w_i^{(n+1)}$ is set to zero while the other connections are unchanged. Note that the bias is usually not pruned. The above sensitivity function measures the contribution of each connection to the objective function. The largest sensitivity value for the $(n+1)$th hidden unit is denoted by $\mathrm{Sen}_{\max}$. If $\mathrm{Sen}_i$ is very small compared to $\mathrm{Sen}_{\max}$, say less than 3% of it, then the weight $w_i^{(n+1)}$ is removed. After the pruning is performed, the output of the hidden unit is reevaluated and the output-side weight adjustment is performed one more time.

III. EXPERIMENTAL RESULTS

Two definitions may be provided for quantifying the quality and the degree of compression of the images reconstructed by the network. The first is the peak signal-to-noise ratio (PSNR), defined by

$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \text{[dB]}$   (14)

where MSE denotes the mean-squared error between the original and the reconstructed image, or alternatively the global signal-to-noise ratio (SNR), which is defined by

$\mathrm{SNR} = 10 \log_{10} \frac{\text{target image variance}}{\mathrm{MSE}} \ \text{[dB]}.$   (15)

The second is the compression ratio (CR), which is calculated according to

$CR = \frac{P\, b^2}{n P + n b^2 + b^2 + 1}$   (16)

where the first term in the denominator is the number of outputs of the hidden layer with respect to all the input patterns, the second and third terms are the numbers of the output-side weights and biases, respectively, and the final term (one) is the overhead due to the image normalization.

There are two possible approaches that one may adopt to compress an image. The first approach is to train a separate network for each single image that is to be compressed. This approach is clearly simple in its compression policy. However, it may give rise to unrealistically high training costs if it is to be used in real-time applications. The compression ratio in this case may be evaluated according to the definition (16). The second approach is to first train a single network on a series of images having similar properties and statistics, and then transmit only the output of the hidden layer for a newly presented image to be compressed. Given that the network is trained in advance (off-line), the problems of real-time training and convergence are circumvented. However, images that have not been seen during network training might have different degrees of similarity to those used in network training, which could significantly influence and deteriorate the quality of the reconstructed images.
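The quality and compression measures of (14), (16), and (17) can be computed as in the following sketch; the bit-allocation terms of (16) are reconstructed from the prose above and should be treated as an approximation.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB, cf. (14)."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def compression_ratio_dedicated(num_blocks, block_dim, n_hidden):
    """CR of (16), for a network trained per image: hidden outputs for all
    blocks, output-side weights and biases, and one normalization overhead
    value all count against the compressed representation."""
    transmitted = n_hidden * num_blocks + n_hidden * block_dim + block_dim + 1
    return num_blocks * block_dim / transmitted

def compression_ratio_offline(block_dim, n_hidden):
    """CR of (17), for an off-line trained network: only the hidden-layer
    outputs are transmitted, so CR is roughly block_dim / n_hidden."""
    return block_dim / n_hidden

# For 4 x 4 blocks (block_dim = 16) and four hidden units, the off-line
# approach gives roughly a 4:1 compression ratio.
print(compression_ratio_offline(16, 4))
```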


Fig. 5. Reconstructed images of the Girl achieved by three networks with three hidden units trained by the three respective approaches. (a) Constructive training with pruning. (b) Constructive training without pruning. (c) BP-based network training.

Fig. 6. Generalized images of the Lena by a network of three hidden units trained on the Girl image with the three corresponding training approaches. (a) Constructive training with pruning. (b) Constructive training without pruning. (c) BP-based network training.

To address and rectify this problem, a preprocessing procedure is required to monitor and determine the properties and statistics of new images and to formally quantify the degree of similarity between images (details are provided in Section III-A). If the similarity between the images is high, the general idea is that there is no need to retrain the network; if, on the other hand, the similarity is not sufficiently high, the network may be retrained without a change in its structure to represent the new data; and, finally, if the similarity is too low, one would then need to redesign the network structure completely. The compression ratio in this case may be evaluated by the following expression:

$CR = \frac{P\, b^2}{n P} = \frac{b^2}{n}.$   (17)

Two benchmark images, the Girl and the Lena, are used in our initial experiments. Input-side weight pruning is also invoked in these experiments. Using the second (off-line) approach described above for image compression, the input-side weight pruning results in a reduction in the number of input-side weights, and therefore reduces the implementation cost. To assess the relative performance of our constructive NN-based compression technique, the PSNR of a BP-based network having the same number of hidden units as the evolving constructive network is also presented. It should be noted that the whole network needs to be retrained each time a new hidden unit is added to the BP-based network (as in [22]). Two typical results are shown below. The first result is for the Girl image, which is used for training, and the second is for the Lena image, which is used for generalization. Since the initial weights influence the resulting network in a significant manner, ten runs have been conducted to yield an average statistical performance evaluation.

Both the Girl and Lena images are of size 512 × 512, and the block size is selected as 4 × 4. The number of hidden units is increased from one to ten. The PSNRs for training and generalization are shown in Fig. 4(a) and (b), respectively. The number of pruned weights is shown in Fig. 4(c). Clearly, the network whose weights are pruned has a smaller number of weights than the fully connected network, but it yields almost the same PSNR. Also, note that it is much easier and more efficient to trade off between the compression ratio and the quality of the reconstructed image using the OHL constructive network than using the conventional BP-based network. This is due to the fact that the PSNR of the constructive OHL network depends closely on the number of new hidden units added to the active network, which allows the user to decide on a tradeoff as the network training evolves. The reconstructed and the generalized images are shown in Figs. 5 and 6, respectively.

A. Similarity Definition for Two Images

Suppose a constructive OHL-FNN is trained on a single image, and this network is now under consideration for compressing another image. Naturally, one needs to determine to what degree a given image is similar to the one used in network training, and how this similarity is related to the compression quality in terms of PSNR (which is in turn related to the generalization capability of the given network). Thus, it is imperative that a notion of a similarity metric between two images be clearly defined and quantified. Toward this end, let $A$ and $B$ denote two digital images. They are divided into square blocks


of size $b \times b$. The number of blocks will be $P$. The blocks are then arranged into vectors $a_k$ and $b_k$, $k = 1, \ldots, P$, respectively. Various definitions of similarity may be introduced, some of which can actually be expressed in terms of the standard notion of correlation. For the problem considered in this work, the definition of similarity that is proposed, and that was found to be most suitable and practical, is given as follows:

$S(A, B) = \frac{1}{P} \sum_{k=1}^{P} \frac{a_k^{T} b_k}{\|a_k\|\, \|b_k\|}$   (18)

where $\|\cdot\|$ denotes the Euclidean norm. Note that the similarity takes values between zero and one for nonnegative pixel data and equals one when the two images are identical. The above definition uses a block-based average; the denominator on the RHS of (18) may also be replaced by an alternative normalization factor. This similarity measure has a low computational complexity and has been found in our work to be a suitable metric for determining the similarity of two images. For image compression, this similarity may be used to detect a significant scene change, and therefore to automatically restart the network training. However, the PSNR of a generalized image cannot be determined simply from the similarity between the images used for network training and generalization.

It should be pointed out that there is an easy and intuitive way to determine how similar an image used for network training is to a new image: input the new image to the trained network and evaluate the PSNR of the output. A decrease of the PSNR relative to the training result may serve as an indication of the lack of similarity between the two images. This approach, although simple and direct in nature, is only a brute-force method and a priori tells us quantitatively little about the similarity between any two images.

Additional results are now provided to demonstrate that images that have different similarities with the image used to train a network also yield different PSNR results. Specifically, the Girl image is first used to train a network, but this time the Lena and the Lake (512 × 512) images are used to check the generalization capability of the trained network. The Lake image is a natural landscape and is clearly different from a picture of a person. The original Lake image is given in Fig. 7; it is of the same size as the Girl and Lena images (512 × 512). The similarities, as computed from (18), between the Girl image and the Lena and Lake images are 0.983 and 0.972, respectively, for a block size of 4 × 4, and 0.969 and 0.950, respectively, for a block size of 8 × 8. It is clear that the larger the block size, the smaller the similarity. The PSNRs for the reconstructed Girl image and the generalized Lena and Lake images are shown in Figs. 8 and 9, respectively. Alternatively, if one initially trains a network using the Lena image and subsequently invokes the network to generalize the Girl and the Lake images, the results shown in Figs. 10 and 11, with block sizes 4 × 4 and 8 × 8, respectively, are obtained. It is interesting to note from Fig. 10 that the PSNR of the generalized image, Girl, is generally higher than that of the trained image, Lena.
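A sketch of a block-based similarity of the kind defined in (18), computed as the normalized correlation between corresponding blocks averaged over all blocks; since the exact normalization of (18) is reconstructed from the surrounding description, this should be read as an approximation of the paper's measure rather than a verbatim implementation.

```python
import numpy as np

def block_similarity(img_a, img_b, b=4, eps=1e-12):
    """Block-based similarity in the spirit of (18): the normalized correlation
    between corresponding b x b blocks, averaged over all blocks.  Identical
    images give a value of one."""
    def to_blocks(img):
        h, w = img.shape
        return (img[:h - h % b, :w - w % b].astype(float)
                .reshape(h // b, b, w // b, b).swapaxes(1, 2).reshape(-1, b * b))
    A, B = to_blocks(img_a), to_blocks(img_b)
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + eps
    return float(np.mean(num / den))

# A value close to one suggests the trained network can be reused for the new
# image; a sharp drop may be used to flag a scene change and trigger retraining.
```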

Fig. 7. Original image of the Lake (size 512 × 512).

Fig. 8. PSNRs of the reconstructed Girl image and the generalized Lena and Lake images, with a block size of 4 × 4. The similarities between the Girl and the Lena and Lake images are 0.983 and 0.972, respectively.

Fig. 9. PSNRs of the reconstructed Girl image and the generalized Lena and Lake images, with a block size of 8 × 8. The similarities between the Girl and the Lena and Lake images are 0.969 and 0.950, respectively.

One possible interpretation of this result is that the network trained on the more complex image, Lena, has captured its richer features so effectively that it is consequently capable of generalizing better to the simpler image, Girl. This is in contrast to the network that is initially trained on the simpler image, Girl, and is subsequently invoked to generalize to the more complicated image, Lena. It can further be observed from the above figures that different similarities result in different generalization PSNRs.


Fig. 10. PSNRs of the reconstructed Lena image and the generalized Girl and Lake images, with a block size of 4 × 4. The similarities between the Lena and the Girl and Lake images are 0.986 and 0.978, respectively.

Fig. 12. PSNRs of the trained Girl image for the three quantization settings.

Fig. 11. PSNRs of the reconstructed Lena image and the generalized Girl and Lake images, with a block size of 8 × 8. The similarities between the Lena and the Girl and Lake images are 0.976 and 0.961, respectively.

Furthermore, the larger the block size, the smaller the similarity, and consequently the lower the PSNRs of the reconstructed and generalized images.

B. Influence of Quantization Effects

In our proposed technique, the amplitude-continuous output of the hidden layer for a trained or generalized image is to be transmitted in digital form. This requires quantization. The bit rate here is eight bits per pixel or sample. Let $d_k$ and $r_k$, $k = 1, \ldots, L$, denote the decision levels and representation levels of a quantizer, respectively, where $L$ is the number of quantization steps. For a uniform quantizer we have

$d_{k+1} - d_k = r_{k+1} - r_k = \Delta \quad \text{for finite } d_k, r_k$   (19)

where $\Delta$ is known as the quantization step size. The actual mapping between the input $x$ and the quantized output $y$ is represented by $y = Q(x)$, and $Q(\cdot)$ is known as the quantizer characteristic. The quantization error may also be defined according to $q = Q(x) - x$. The characteristic $Q(\cdot)$ can be of midtread or midrise type, depending on whether or not zero is one of the output levels. If the data to be quantized have a uniform probability distribution, the uniform quantizer is an optimum selection in terms of the quantization error variance. On the other hand, if this is not the case, a nonuniform quantizer has to be used. In this section, a typical nonuniform quantizer, namely the pdf-optimized quantizer [13], is considered for comparison purposes. Nonuniform quantization can be achieved by first compressing the signal using a nonuniform compressor characteristic $c(\cdot)$, then quantizing the compressed signal employing a uniform quantizer, and finally expanding the quantized signal using $c^{-1}(\cdot)$, which is the inverse of the compressor. This is known as the companding technique (compressing and expanding). Searching for an optimal compressor characteristic $c(\cdot)$ is the major task in designing a nonuniform quantizer. For high bit-rate applications, where the number of levels $L$ and therefore the bit rate are both sufficiently large, an approximate compressor characteristic can be derived for the pdf-optimized nonuniform quantizer as

$c(x) = x_{\max} \frac{\int_{0}^{x} [p(u)]^{1/3}\, du}{\int_{0}^{x_{\max}} [p(u)]^{1/3}\, du}$   (20)

where $p(x)$ is the pdf of the data to be quantized and $x_{\max}$ is the upper bound of $x$. Given uniform decision levels, the corresponding optimum decision levels can be obtained by using the above expression. In our experiments, the outputs of the hidden units are quantized to eight bits. Three cases are now considered: no quantization, uniform quantization, and pdf-optimized quantization. The PSNRs of the three cases for the trained Girl image are evaluated, compared, and plotted in Fig. 12. It is clear that the pdf-optimized quantization results in almost the same PSNR as the no-quantization case. The pdf-optimized quantization works better than the uniform quantization due to the nonuniform pdf of the data. The performance advantage of the pdf-optimized over the uniform quantization in this experiment is about 0.1 dB at the earlier stages of network training, and increases to around 0.5 dB as the network grows. This is due to the fact that the pdf of the data deviates more from a uniform pdf as more hidden units are added to the network and more data needs to be quantized. From Fig. 12, one can also observe that the difference between the PSNRs obtained without quantization and with the uniform quantizer increases as the number of hidden units increases.
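The two quantization schemes can be sketched as follows; the histogram-based companding below only approximates the pdf-optimized characteristic of (20), and the parameter choices are illustrative.

```python
import numpy as np

def uniform_quantize(x, n_bits=8, x_min=0.0, x_max=1.0):
    """Uniform quantizer with 2**n_bits levels over [x_min, x_max], cf. (19).
    Returns the representation levels and the integer level indices."""
    levels = 2 ** n_bits
    step = (x_max - x_min) / levels
    idx = np.clip(np.floor((x - x_min) / step), 0, levels - 1).astype(int)
    return x_min + (idx + 0.5) * step, idx

def pdf_optimized_quantize(x, n_bits=8, n_bins=1024):
    """Companding approximation of a pdf-optimized quantizer: the compressor is
    the normalized integral of the cube root of an empirical (histogram) pdf,
    cf. (20); the compressed signal is quantized uniformly and then expanded."""
    hist, edges = np.histogram(x, bins=n_bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    c = np.cumsum(hist ** (1.0 / 3.0) + 1e-12)
    c = c / c[-1]                                   # compressor characteristic on [0, 1]
    compressed = np.interp(x, centers, c)
    quantized, _ = uniform_quantize(compressed, n_bits, 0.0, 1.0)
    return np.interp(quantized, c, centers)         # expand back to the signal range
```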


Fig. 13. Comparison of the PSNRs of the reconstructed Girl image obtained by the network trained using the Girl image with input-side weight pruning and by the JPEG scheme.

Fig. 15. Comparison of the PSNRs of the reconstructed Lena image obtained by the network trained using the Lena image with input-side weight pruning and by the JPEG scheme.

Fig. 14. Comparison of the PSNRs of the reconstructed Girl image obtained by the network trained using the Girl image without input-side weight pruning and by the JPEG scheme.

Fig. 16. Comparison of the PSNRs of the generalized Girl image obtained by the network trained using the Lena image with input-side weight pruning and by the JPEG scheme.

However, the PSNR of the pdf-optimized quantizer stays close to the PSNR obtained without quantization regardless of the number of hidden units. Furthermore, the pdf-optimized quantizer behaves exceedingly well even when the number of hidden units increases and the pdf deviates further from a uniform distribution. When one utilizes the proposed NN-based technique in real-life applications, the uniform quantizer may be adopted, taking into consideration both its simplicity and its acceptable PSNR performance in comparison with the other methods.

C. Comparison With the Baseline JPEG

In this section, our proposed constructive network design, when applied to image compression, is compared with the most commonly used still-picture compressor, JPEG. The comparison is made in terms of the PSNRs of the reconstructed images as a function of the compression ratio. The Girl image is first used to train the constructive OHL-FNN. Reconstructed images for different numbers of hidden units (in effect, different compression ratios) are saved for subsequent PSNR evaluation during the constructive training process. The reconstructed images are obtained by quantizing the output of the hidden layer using both the uniform and the pdf-optimized quantization schemes. To make a fair

comparison with the JPEG algorithm, the compression achieved by lossless Huffman coding of the quantized hidden-layer output is incorporated into the total compression ratio for the trained constructive OHL-FNN. The baseline JPEG images are constructed using a free image tool at compression ratios similar to those obtained by the constructive OHL-FNN. The PSNRs of these images are evaluated by simply comparing them with the original image. The PSNRs associated with the JPEG scheme and with our technique are plotted in Figs. 13 and 14 for the networks with and without input-side weight pruning, respectively, as a function of the compression ratio. It is seen that our technique yields higher PSNRs for both low and high compression ratios, and a comparable PSNR for moderate compression ratios, when compared to the JPEG scheme. Figs. 15-17 depict further simulation results comparing the JPEG scheme with the PSNRs of the generalized Girl and Lake images obtained from a network that is initially trained using the Lena image. Although the JPEG scheme results in a higher PSNR, as shown in Fig. 17, the purpose of presenting this figure is to illustrate that the complexity or information content of a given image used to train a constructive network plays a significant role in the generalization performance of that network for other unseen images. Conceivably, if a constructive network is originally trained on an image that is rich in information content, the PSNR performance of the constructive network should generally be higher than that of the JPEG scheme, as illustrated in Figs. 13-16.
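One way the Huffman-coded bit budget of the quantized hidden-layer outputs might be folded into the total compression ratio is sketched below; the variable names (quantized_indices, num_pixels) are hypothetical, and the routine is not the exact accounting used in the paper.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Per-symbol code length (in bits) of a Huffman code built from the
    empirical symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate single-symbol stream
        return {next(iter(freq)): 1}
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        tie += 1
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
    return heap[0][2]

def coded_bits(symbols):
    """Total number of bits needed to Huffman-code the symbol stream."""
    freq = Counter(symbols)
    lengths = huffman_code_lengths(symbols)
    return sum(freq[s] * lengths[s] for s in freq)

def total_compression_ratio(quantized_indices, num_pixels, bits_per_pixel=8):
    """Original bits divided by the entropy-coded bits of the quantized
    hidden-layer outputs (off-line approach, hypothetical variable names)."""
    return num_pixels * bits_per_pixel / coded_bits(quantized_indices)
```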


Fig. 17. Comparison of the PSNRs of the generalized Lake image obtained by the network trained using the Lena image with input-side weight pruning and by the JPEG scheme.

Fig. 18. PSNRs of the reconstructed Girl image and generalized Lena image for training Case T-I and the three generalization cases G-I, G-II, and G-III.

Although our technique generally works as well as, and in some situations even better than, the JPEG scheme, one also has to keep in mind the significant generalization capabilities of our constructive OHL-FNN that are not fully reflected in the above figures. Furthermore, it appears that the uniform quantization scheme is more useful here than the pdf-optimized quantization scheme. This implies that the combination of uniform quantization and Huffman coding performs better than the combination of pdf-optimized quantization and Huffman coding. In addition, the former combination also requires less computational resources.

IV. GENERALIZATION CAPABILITIES OF THE CONSTRUCTIVE OHL-FNNs

The generalization capabilities of the constructive OHL-FNNs in the context of image compression have not yet been investigated systematically in the literature. In this section, experimental results with some useful and insightful observations are provided. In our first set of experiments, the Girl image is used to train the network, while the Lena image is used to test the generalization capabilities of the trained network. Noisy images are constructed by adding Gaussian white noise to the original clean images; the selected SNR is 10 dB for all the images. The following three scenarios are considered for the constructive network training.

Case T-I) Noiseless input image and noiseless target image: this is the case considered in the previous section, where generalization was performed on the noiseless image.

Case T-II) Noisy input image and noisy target image: this case may be viewed as more relevant in practice, since images are generally corrupted by noise to a certain degree.

Case T-III) Noisy input image and noiseless target image: this case is considered in order to simultaneously assess the performance of the constructed network in terms of its compression and noise-reduction capabilities.
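The noisy training and generalization images referred to above can be produced as in the following sketch; the SNR convention (image variance over noise variance, in decibels) is an assumption consistent with the definition in (15).

```python
import numpy as np

def add_noise_at_snr(img, snr_db=10.0, seed=None):
    """Corrupt an image with zero-mean Gaussian white noise whose variance is
    chosen so that (image variance) / (noise variance) equals snr_db in dB."""
    rng = np.random.default_rng(seed)
    x = img.astype(float)
    noise_var = x.var() / (10.0 ** (snr_db / 10.0))
    noisy = x + rng.normal(scale=np.sqrt(noise_var), size=x.shape)
    return np.clip(noisy, 0.0, 255.0)
```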

Fig. 19. PSNRs of the reconstructed Girl image and generalized Lena image for training Case T-II and the three generalization cases G-I, G-II, and G-III.

Fig. 20. PSNRs of the reconstructed Girl image and generalized Lena image for training Case T-III and the three generalization cases G-I, G-II, and G-III.

For the above training cases, the corresponding generalization cases are considered:

Case G-I) noiseless input image and noiseless target image;
Case G-II) noisy input image and noisy target image;
Case G-III) noisy input image and noiseless target image.

Figs. 18-20 depict the PSNRs of the reconstructed Girl image subject to the training cases T-I to T-III, as well as the PSNRs of the generalized Lena image subject to the three generalization cases G-I to G-III.


Fig. 21. (a) Original Football image (uncompressed). (b) First Football image (uncompressed). (c) Second Football image (uncompressed). (d) Third Football image (uncompressed). (688 × 480 pixels, bit rate R = 8 bits/pixel.)

The following comments and remarks are now in order.

C1) In all three training cases, the PSNRs of the reconstructed images always improve each time a new hidden unit is added to the existing network. This is one of the interesting properties of the constructive OHL-FNN.

C2) In all three training cases, the PSNRs of the generalized images corresponding to a noiseless input improve as new hidden units are added to the existing network. This suggests that constructive OHL-FNNs trained on either noiseless or noisy images generalize well to noiseless images.

C3) In Case T-I, the noiseless image is generalized with very good quality (Case G-I), while the generalization PSNR for the noisy image improves little as the number of hidden units is increased in Case G-II. Furthermore, in Case G-III, the PSNR unexpectedly decreases as the number of hidden units increases, which implies that a network trained using a noiseless image is not necessarily robust to additive noise.

C4) The constructive OHL-FNN trained using both noisy input and noisy target images (Case T-II) does not demonstrate any ability to filter the additive noise of the input image used for generalization.

C5) In Case T-III, the network is trained using a noisy input and a noiseless target image. The constructive OHL-FNN can evidently remove the noise in the input image to some degree; however, increasing the number of hidden units improves the reconstructed image PSNR only slightly. This again implies that the constructive OHL-FNN is not robust to additive input noise, and some preprocessing may be needed to remove the additive noise before the constructive OHL-FNN can be used for image compression.

TABLE I BLOCK-BASED SIMILARITY OF THE FIRST, SECOND, AND THIRD FOOTBALL IMAGES WITH RESPECT TO THE ORIGINAL FOOTBALL IMAGE FOR DIFFERENT BLOCK SIZES

A. Relationship Between Block Size, Image Similarity, and Generalization Capability

In the previous sections, the still image Girl was used to train a constructive OHL-FNN. The resulting trained network was then used to compress the Lena image, which had not been seen by the network. It was found that the PSNR of the reconstructed Lena image is quite high, implying that the proposed technique may be considered for use in certain practical applications. Although the two images appear visually different, they may actually have some inherent features that are similar on a block basis. This suggests that a certain block-based similarity measure may establish whether the network that was trained on the Girl image can potentially be used for compressing the Lena image. This has motivated us to further investigate the potential generalization capabilities of our proposed approach, whereby one may initially use an image to train a network and subsequently use the trained network to compress other images, as long as it can be confirmed that the latter images are similar to the training image. Toward this end, our proposed constructive OHL-FNN is now used to compress the Football set of images. The first Football image (original) is used to train an adaptive FNN, and the subsequent images (see Fig. 21 for the original, first, second, and third Football images) are then compressed by the trained network. The closeness between the original image and the subsequent images is first determined by utilizing our similarity measure.


TABLE II TRAINING WITH PRUNING: THE PSNRs OF THE RECONSTRUCTED (ORIGINAL) AND GENERALIZED (FIRST, SECOND, AND THIRD) FOOTBALL IMAGES FOR DIFFERENT NUMBERS OF HIDDEN UNITS, WITH BLOCK SIZES 4 × 4 AND 8 × 8. THE ORIGINAL FOOTBALL IMAGE WAS USED FOR NETWORK TRAINING AND THE OTHER IMAGES WERE GENERALIZED BY THE TRAINED NETWORK. THE PRUNING LEVEL USED WAS 10%, WITH 38% OF THE INPUT-SIDE WEIGHTS PRUNED. THE COMPRESSION RATIO IS APPROXIMATELY (16/n) : 1

TABLE III TRAINING WITHOUT PRUNING: THE PSNRs OF THE RECONSTRUCTED (ORIGINAL) AND GENERALIZED (FIRST, SECOND, AND THIRD) FOOTBALL IMAGES FOR DIFFERENT NUMBERS OF HIDDEN UNITS, WITH BLOCK SIZES 4 × 4 AND 8 × 8. THE ORIGINAL IMAGE WAS USED FOR NETWORK TRAINING AND THE OTHER IMAGES WERE GENERALIZED BY THE TRAINED NETWORK. THE COMPRESSION RATIO IS APPROXIMATELY (16/n) : 1

For example, the similarity between the original and the third Football images is about 0.978. Since the block size is expected to affect the similarity greatly, a comparison between different block sizes is also performed, and the results are shown in Table I. Clearly, the larger the block size, the smaller the similarity measure. It also follows that the similarity changes only slightly within a given scene. This implies that the images within a given scene may, in principle, be compressed with similar qualities (PSNRs) by using the same trained network. The PSNRs of the reconstructed and the generalized (compressed) images are shown in Tables II and III. Moreover, the results with and without pruning are presented as a function of the number of hidden units. It can be concluded that the trained network can successfully compress all the images that follow the first one. It is also clearly shown that the results with pruning are almost the same as those obtained without pruning, where on average the number of input-side weights is reduced by approximately 35% by utilizing our proposed pruning technique. We have obtained inconsistent results (not shown) for the block size of 16 × 16, due in part to the lack of adequate training samples. The PSNR of the reconstructed images improved very little, or even dropped, each time a new hidden unit was added to the network, although the quickprop algorithm for the input-side training converged quite well. Block-overlapping techniques may be used to alleviate this problem to some extent. Furthermore, a larger block size implies a larger number of input-side weights, which could make the network training more complicated and more likely to become trapped in a local minimum. Increasing the number of candidates may lead to some improvement in this case, at the expense of a significant increase in computational time. Fig. 22 depicts the four compressed Football images, namely the reconstructed image of the original Football image that was used by the constructive network for training, and the generalized

first, second, and third Football images obtained from the same network (see Fig. 21 for the uncompressed images). It is seen that the reconstructed/generalized images are very similar to their original counterparts. The PSNRs for the reconstructed as well as the generalized first, second, and third Football images are shown in Figs. 23 and 24 for block sizes 4 × 4 and 8 × 8, respectively. A comparison of the generalization capabilities obtained by utilizing the constructive OHL-FNN with those obtained by utilizing the JPEG scheme is depicted in Figs. 25-28. It is interesting to note from Fig. 25 that the PSNR of the reconstructed original image is generally higher than those of the other images shown in Figs. 26-28, as well as that of the JPEG scheme. One possible interpretation of this result is that the network trained on the simpler original image cannot represent the underlying mapping richly enough to generalize well to the more complex first, second, and third Football images, whereas a network initially trained on the more complex first, second, or third Football image would be better placed to generalize to the simpler original Football image.

Remark 1: Note that in the simulation results presented in this work for the images Girl, Lena, and Lake, the two user-specified parameters in (7) are selected as 0.02 and 0.002. For the Football images, different parameter values were selected, with the nonlinear activation functions of all the hidden units chosen to be identical.

Remark 2: When the proposed technique is to be used for the compression of image sequences with successively different scenes having different properties, one may train a series (bank) of OHL-FNNs off-line to cope with the scenes in real time. This idea is somewhat similar to the well-known vector quantization (VQ) scheme. In this way, for a given image, the OHL-FNN that yields the best PSNR may be selected and used to compress all the subsequent similar images.
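Remark 2 can be sketched as a simple selection rule over a bank of pre-trained networks; reconstruct and psnr are hypothetical callables standing in for a trained compressor and the PSNR of (14).

```python
import numpy as np

def select_best_network(patterns, networks, reconstruct, psnr):
    """Pick, from a bank of pre-trained OHL-FNNs, the one whose reconstruction
    of the given image has the highest PSNR.  `networks` is any iterable of
    trained models; `reconstruct(net, patterns)` and `psnr(a, b)` are
    hypothetical callables standing in for a trained compressor and (14)."""
    best_net, best_value = None, -np.inf
    for net in networks:
        value = psnr(patterns, reconstruct(net, patterns))
        if value > best_value:
            best_net, best_value = net, value
    return best_net, best_value
```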


Fig. 22. (a) Reconstructed original Football image (compressed). (b) Generalized first Football image (compressed). (c) Generalized second Football image (compressed). (d) Generalized third Football image (compressed). The original Football image was used to train a constructive OHL-FNN (block size = 4 × 4, five hidden units, and network training with pruning).

Fig. 23. PSNRs of the reconstructed original Football image and the generalized first, second, and third Football images, with a block size of 4 × 4 and a pruning level of 10%, with 26% of the input-side weights pruned.

Fig. 27. Comparison of the PSNRs of the generalized second Football image obtained by the network trained using the original image (pruning level of 10%, with 33.75% of the input-side weights pruned) and by the JPEG scheme.

Fig. 24. PSNRs of the reconstructed original Football image and the generalized first, second, and third Football images, with a block size of 8 × 8 and a pruning level of 10%, with 55% of the input-side weights pruned.

Fig. 28. Comparison of the PSNRs of the generalized third Football image obtained by the network trained using the original image (pruning level of 10%, with 33.75% of the input-side weights pruned) and by the JPEG scheme.


Fig. 25. Comparison of the PSNRs of the reconstructed original Football image obtained by the network trained using the original image (pruning level of 10%, with 33.75% of the input-side weights pruned) and by the JPEG scheme.

Fig. 26. Comparison of the PSNRs of the generalized first Football image obtained by the network trained using the original image (pruning level of 10%, with 33.75% of the input-side weights pruned) and by the JPEG scheme.

V. CONCLUSION

In this paper, the application of a constructive OHL network to image compression was considered. The proposed technique was applied to the compression of three benchmark images, namely the Girl, the Lena, and the Lake, as well as to four images from a Football sequence of scenes, with promising results. The influence of the quantization of the hidden-layer output on the network performance was also investigated. It was found that the pdf-optimized quantization scheme works better than the uniform quantization scheme as far as the PSNR of the reconstructed image is concerned. A comparison of our proposed technique with the baseline JPEG was also presented. Experimental results have shown that the proposed technique has comparable, or in some cases better, capabilities when compared to the well-known JPEG scheme. The generalization capabilities of the constructive OHL network were investigated in some detail, and a notion of similarity between any two images was presented. For a given image, the trained network may compress (similar) images with quite good reconstruction quality, provided that the block size is not large. In all the experimental results presented, it was revealed that the proposed input-side weight pruning technique yields a smaller network while at the same time providing performance similar to that of its fully connected counterpart. The constructive OHL network has an attractive and efficient training process, whereby the user can easily trade off between the PSNR of the reconstructed image and the compression ratio. Further research is needed to expedite the training convergence time, possibly at the expense of lower compression ratios and/or generalization capabilities, through the development of other training algorithms.

REFERENCES

[1] C. Amerijckx, M. Verleysen, P. Thissen, and J. D. Legat, "Image compression by self-organized Kohonen map," IEEE Trans. Neural Networks, vol. 9, pp. 503-507, May 1998.
[2] T. Ash, "Dynamic node creation in backpropagation networks," Connection Sci., vol. 1, no. 4, pp. 365-375, 1989.
[3] A. Basso and M. Kunt, "Autoassociative neural networks for image compression," European Trans. Telecommun. Related Technol., vol. 3, no. 6, pp. 593-598, 1992.
[4] Y. Benbenisti, D. Kornreich, H. B. Mitchell, and P. A. Schaefer, "New simple three-layer neural network for image compression," Opt. Eng., vol. 36, no. 6, pp. 1814-1817, 1997.
[5] G. Castellano, A. M. Fanelli, and M. Pelillo, "An iterative pruning algorithm for feedforward neural networks," IEEE Trans. Neural Networks, vol. 8, pp. 519-531, May 1997.
[6] Y. Chang, D. Kumar, and N. Mahalingam, "Data compression for image recognition," in Proc. IEEE TENCON: Speech Image Technol. Comput. Telecommun., 1997, pp. 399-402.
[7] O. T. Chen, B. J. Sheu, and W. C. Fang, "Image compression using self-organization networks," IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 480-489, May 1994.
[8] G. W. Cottrell, P. Munro, and D. Zipser, "Learning internal representations from gray-level images: An example of extensional programming," in Proc. 9th Annu. Conf. Cognitive Sci. Soc., 1987, pp. 461-473.
[9] C. Cramer, "Neural networks for image and video compression: A review," European J. Oper. Res., vol. 108, no. 2, pp. 266-282, 1998.
[10] Y. Le Cun, J. S. Denker, and S. A. Solla, "Optimal brain damage," in Advances in Neural Information Processing Systems, D. S. Touretzky, Ed., 1990, pp. 598-605.
[11] R. D. Dony and S. Haykin, "Neural network approaches to image compression," Proc. IEEE, vol. 83, pp. 288-303, Feb. 1995.
[12] S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-CS-90-100, 1991.
[13] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[14] J. Jiang, "Image compression with neural networks: A survey," Signal Processing: Image Commun., vol. 14, no. 9, pp. 737-760, 1999.
[15] E. D. Karnin, "A simple procedure for pruning back-propagation trained neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 239-242, June 1990.
[16] T. Y. Kwok and D. Y. Yeung, "Constructive algorithms for structure learning in feedforward neural networks for regression problems," IEEE Trans. Neural Networks, vol. 8, pp. 630-645, May 1997.
[17] T. Y. Kwok and D. Y. Yeung, "Objective functions for training new hidden units in constructive neural networks," IEEE Trans. Neural Networks, vol. 8, pp. 1131-1148, Sept. 1997.
[18] T. C. Lee, Structure Level Adaptation for Artificial Neural Networks. Boston, MA: Kluwer, 1991.
[19] A. Namphol, S. H. Chin, and M. Arozuliah, "Image compression with a hierarchical neural network," IEEE Trans. Aerosp. Electron. Syst., vol. 32, pp. 326-337, Jan. 1996.
[20] A. N. Netravali and B. G. Haskell, Digital Pictures: Representation, Compression, and Standards, 2nd ed. New York: Plenum, 1995.
[21] L. Rodriguez, P. J. Zufiria, and J. A. Berzal, "Video sequence compression via supervised training on cellular neural networks," Int. J. Neural Syst., vol. 8, no. 1, pp. 127-135, 1997.
[22] R. Setiono and G. Lu, "Image compression using a feedforward neural network," in Proc. IEEE Int. Conf. Neural Networks, 1994, pp. 4761-4765.
[23] W. Sygnowski and B. Macukow, "Counter-propagation neural network for image compression," Opt. Eng., vol. 35, no. 8, pp. 2214-2217, 1996.


[24] P. L. Venetianer and T. Roska, "Image compression by cellular neural networks," IEEE Trans. Circuits Syst. I, vol. 45, pp. 205-215, Mar. 1998.
[25] W. Weng and K. Khorasani, "An adaptive structural neural network with application to EEG automatic seizure detection," Neural Networks, vol. 9, no. 7, pp. 1223-1240, 1996.

Liying Ma received the B.S. degree from Northeastern University, Shenyang, China, in 1984, the M.S. degree from Hiroshima University, Higashi-Hiroshima, Japan, in 1991, and the Ph.D. degree from Concordia University, Montreal, QC, Canada, in 2001, in electrical and computer engineering. She was with the Automation Research Institute of the Ministry of Metallurgical Industry, Beijing, China, as a Research Electrical Engineer from 1984 to 1988. She was then a Systems Engineer at Yokogawa Engineering Service Company, Tokyo, Japan. Her main areas of research interest are neural networks and their applications to image processing and pattern recognition.

K. Khorasani (M'85) received the B.S., M.S., and Ph.D. degrees in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, in 1981, 1982, and 1985, respectively. From 1985 to 1988, he was an Assistant Professor at the University of Michigan-Dearborn, and since 1988 he has been with Concordia University, Montreal, QC, Canada, where he is currently a Professor in the Department of Electrical and Computer Engineering and, since 1998, an Associate Dean of the Faculty of Engineering and Computer Science. His main areas of research are stability theory, nonlinear and adaptive control, modeling and control of flexible-link/joint manipulators, neural-network applications to pattern recognition, robotics and control, adaptive-structure neural networks, and distributed and collaborative force feedback of haptic interfaces in virtual environments. He has authored or coauthored more than 175 publications in these areas.
