Privacy Protection in Video Surveillance Systems: Analysis of Subband-Adaptive Scrambling in JPEG XR

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 2, FEBRUARY 2011

Hosik Sohn, Wesley De Neve, and Yong Man Ro, Senior Member, IEEE
Abstract: This paper discusses a privacy-protected video surveillance system that makes use of JPEG extended range (JPEG XR). JPEG XR offers a low-complexity solution for the scalable coding of high-resolution images. To address privacy concerns, face regions are detected and scrambled in the transform domain, taking into account the quality and spatial scalability features of JPEG XR. Experiments were conducted to investigate the performance of our surveillance system, considering visual distortion, bit stream overhead, and security aspects. Our results demonstrate that subband-adaptive scrambling is able to conceal privacy-sensitive face regions with a feasible level of protection. In addition, our results show that subband-adaptive scrambling of face regions outperforms subband-adaptive scrambling of complete frames in terms of coding efficiency, except when low video bit rates are in use.
Index Terms: JPEG XR, ROI, scrambling, video surveillance.
I. Introduction
Present-day video surveillance systems are often required not to intrude upon the privacy of the general public. This paper proposes a privacy-protected video surveillance system using the JPEG extended range (JPEG XR) coding format [1]. JPEG XR allows for the scalable coding of still images; it was approved in 2009 both as an International Standard (ISO/IEC 29199-2) and as an ITU-T Recommendation (T.832). To mitigate privacy concerns [2], [3], our video surveillance system detects and protects face regions. Face regions, which are considered privacy sensitive, are protected using different scrambling techniques operating in the transform domain. The proposed scrambling strategy takes into account the scalability provisions of JPEG XR and preserves format compliance. The research presented in this paper extends and improves the research discussed in [4]. In [4], we provided an initial analysis of region-of-interest-based
(ROI-based) scrambling for scalable surveillance video content coded using JPEG XR. In particular, two issues were discussed in [4]: first, a subband-adaptive scrambling strategy that takes into account the scalability provisions of JPEG XR, and second, the file size overhead caused by the use of tiling. In this paper, we analyze these issues in more detail, providing additional background information that better motivates our design decisions. We also propose an improved scrambling technique for DC subbands that offers a higher level of protection. Further, we address the need for scrambling the high pass subbands, which was not considered in [4]. Finally, the analysis presented in this paper uses representative surveillance video content.
The rest of this paper is organized as follows. In Section II, we briefly review JPEG XR. Section III outlines the proposed video surveillance system and explains our subband-adaptive technique for scrambling surveillance video content encoded with JPEG XR. Experimental results are presented in Section IV, illustrating the performance of subband-adaptive scrambling for both complete frames and ROIs. Finally, Section V concludes this paper.
II. Image Coding Using JPEG XR
Intra coding is often used in video surveillance systems as it allows for low-complexity and low-delay implementations. Intra coding is also beneficial when CPU power is more expensive than bandwidth and storage. These considerations are important when designing cost-effective video surveillance systems that must process a large number of simultaneous high-resolution video streams [5]. The cameras in our surveillance system encode each video frame using JPEG XR. The main technical benefit of using JPEG XR as an intra video codec is its low computational complexity, while it offers image quality and scalability provisions that are, from a practical point of view, similar to those of Motion JPEG 2000 and the Scalable High Intra profile of H.264/AVC Scalable Video Coding (SVC) [6]. Due to space constraints, we refer the reader to the literature for further information on the coding tools of JPEG XR, in particular its transform design [1], its provisions for ROI coding [4], and its scalability tools [7], [8]. ROI coding allows for selective scrambling, implying that unauthorized users are still able to recognize the background and
Manuscript received November 8, 2009; revised June 1, 2010; accepted September 19, 2010. Date of publication January 17, 2011; date of current version March 2, 2011. This work was supported by the National Research Foundation (NRF) of Korea, under Grant NRF-D00070. This paper was recommended by Associate Editor Q. Sun. The authors are with the Image and Video Systems Laboratory, Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea (e-mail: sohnhosik@kaist.ac.kr; wesley.deneve@kaist.ac.kr; ymro@ee.kaist.ac.kr). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2011.2106250
1051-8215/$26.00 © 2011 IEEE
SOHAN et al.: PRIVACY PROTECTION IN VIDEO SURVEILLANCE SYSTEMS
Fig. 1. Overall architecture of the proposed video surveillance system.
event context. Scalability enables video surveillance systems to monitor public and private places anytime and anywhere, using a wide range of network technologies and devices.
III. Enabling Privacy Protection in JPEG XR
In this section, we first describe the architectural details of our surveillance system. Next, we discuss subband-adaptive scrambling for video content encoded with JPEG XR.
A. Proposed Video Surveillance System
1) System Architecture and Application Scenarios: As shown in Fig. 1, our surveillance system consists of multiple IP cameras that are connected to a central management and storage server (CMSS) by means of a wired 1 Gb/s LAN. Video analysis and encoding are performed by the IP cameras: video analysis is responsible for detecting face regions, while encoding is responsible for compressing the video data using our privacy-enabled JPEG XR encoder (see Section III-B). Our surveillance system is able to accommodate clients that use diverse network connections and devices. Three application scenarios can be defined to reflect this diversity. In a high-complexity scenario, we assume that a desktop PC communicates with the CMSS over a 100 Mb/s wired LAN. In a medium-complexity scenario, we assume that a laptop communicates with the CMSS over a 70 Mb/s WiMAX network. Finally, in a low-complexity scenario, we assume that a smartphone communicates with the CMSS over a 7.2 Mb/s 3GPP network. The scalability tools of JPEG XR can be used to adapt the video content to each of these scenarios.
2) Modified JPEG XR Encoder and Decoder: Fig. 2(a) illustrates the design of our modified JPEG XR encoder. Before entropy coding, different scrambling techniques are applied to the transform coefficients in the DC, low pass (LP), and high pass (HP) subbands (see the gray-shaded boxes).
A detailed explanation of the scrambling techniques employed is provided in Section III-B. When ROI-based scrambling is in use, the location of each face region is communicated by the face detection module to our modified JPEG XR encoder (we assume that face detection is accurate). The modified JPEG XR encoder is then able to construct an appropriate tile layout, making it possible to scramble only the macroblocks (MBs) located in face regions. Note that pseudorandom numbers are generated by relying on a secret key that
Fig. 2. Modified JPEG XR encoder and decoder. (a) Encoder. (b) Decoder.
is used as a seed value. At the decoder side, only an authorized user (i.e., a user who knows the secret key) is able to revert the scrambling process during decoding. The design of our modified JPEG XR decoder is shown in Fig. 2(b). A more detailed description of key management can be found in [9].
B. Subband-Adaptive Scrambling
We use a subband-adaptive approach to scramble surveillance video content encoded with JPEG XR. This approach is motivated by the following observation: when scrambling a particular subband, a tradeoff exists between the visual importance of the subband, the amount of coded data available in the subband, the visual distortion and the level of protection offered by the scrambling technique used, and the effect on coding efficiency. Similar to [10], we only scramble the luminance channel in order to keep the impact on coding efficiency and computational complexity low. However, it should be straightforward to extend the proposed scrambling strategy to the chrominance channels.
1) DC Subbands: Random Level Shift: In a DC subband, only a limited amount of data is available for the purpose of scrambling. Indeed, each MB in a tile contributes only a single DC coefficient to the DC subband of that tile. Therefore, we propose to make use of random level shift (RLS) in order to scramble DC subbands with a sufficient level of protection. RLS pseudo-randomly shifts the level of a DC coefficient value as follows:
$$\mathrm{DCcoeff}^{e} = \mathrm{DCcoeff} + R(L) \qquad (1)$$
where DCcoeff denotes the data to be scrambled and $\mathrm{DCcoeff}^{e}$ denotes the pseudo-randomly level-shifted data. R(L) represents a pseudo-randomly generated integer ranging from $-2^{L-1}$ to $2^{L-1}$ (i.e., one of $2^{L}+1$ possible values). RLS comes with a decrease in coding efficiency due to the pseudo-random offset added to each DC coefficient value. Hence, to avoid a significant loss of coding efficiency, it is necessary to select a proper value for L. This selection process is discussed in more detail
in Section IV-A1. In [4], we applied RSI (see Section III-B3) to the sign of the DC coefficients and pseudo-randomly flipped the refinement bits of the DC coefficients. Although this approach does not affect the coding efficiency, it provides a lower level of security due to the limited amount of coded data that is available for scrambling.
2) LP Subbands: Random Permutation: An LP subband is visually less important than a DC subband, but visually more important than an HP subband. Also, an LP subband contains more transform coefficients than a DC subband, but fewer transform coefficients than an HP subband. Therefore, we propose to apply random permutation (RP [11], [12]) to the transform coefficients stored in an LP subband (RP is applied at the level of a MB). RP pseudo-randomly permutes the ordering of the LP coefficients in a MB as follows:
$$\mathrm{LPcoeff}_{i}^{e} = \mathrm{LPcoeff}_{j}, \quad \text{where } i = 1, \ldots, C \text{ and } j = x_{i} \in \{x_{1}, \ldots, x_{C}\} \qquad (2)$$
Fig. 3. Bit stream overhead when using RLS for scrambling DC subbands (video sequence used: ATM).
IV. Experimental Results
We have implemented the proposed scrambling approach in the HD Photo Device Porting Kit 1.0 [13]. As no surveillance video content with a sufficiently high resolution is publicly available, we have manually generated three representative video sequences: ATM, Stairs, and Hall way [14]. All video sequences contain face images (see Fig. 10) and have 4CIF resolution, a frame rate of 30 frames/s, and a length of 200 frames. Overhead values were obtained by averaging the overhead over all 200 video frames. This section discusses four experiments. Section IV-A investigates the bit stream overhead and the visual effectiveness of subband-adaptive scrambling in a frame-based setting, for each individual subband (as discussed in Section IV-D, an adversary may attack each subband separately). Section IV-B describes the effect of tiling on the coding efficiency. Section IV-C analyzes subband-adaptive scrambling in an ROI-based setting. Finally, Section IV-D discusses the level of protection offered by subband-adaptive scrambling.
A. Analysis of Frame-Based, Subband-Adaptive Scrambling
1) Scrambled DC Subbands: Fig. 3 shows the impact of RLS on the coding efficiency of the DC subbands for a varying bit rate. An appropriate value for L (see Section III-B1) was selected as follows: we first measured the average bit stream overhead over all video sequences for varying values of L, and then selected the maximum value of L that produces less than 30% bit stream overhead. Although the same value of L could be used for the whole range of bit rates, the amount of overhead increases significantly as the bit rate decreases (see the bit stream overhead results for L = 8 in Fig. 3). This observation motivated us to make the value of L dependent on the bit rate. Although the use of a small value of L results in a lower level of protection at lower bit rates, the DC video frames at lower bit rates are already severely distorted due to coarse quantization. Fig. 3 shows that RLS causes a significant decrease in the coding efficiency of the DC subbands. However, the amount of coded data in the DC subbands is relatively small compared to the amount of coded data in the other subbands. As such, the overhead at the level of the DC subbands does not significantly decrease the overall coding efficiency. For instance, scrambling the DC subbands increases the overall bit rate by 0.6% at the highest bit rate and by 11.7% at the lowest bit rate (the increase in overhead for the
where $\mathrm{LPcoeff}_{j}$ denotes the $j$th LP coefficient in a MB and $\mathrm{LPcoeff}_{i}^{e}$ denotes the $i$th LP coefficient in the pseudo-randomly permuted MB. In (2), C represents the number of LP coefficients in a MB (i.e., C is equal to 15) and $x_{1}, \ldots, x_{C}$ represent distinct pseudo-random integers ranging from 1 to C (i.e., a permutation of $1, \ldots, C$). RP offers a higher level of protection than RLS, as RP allows for a higher number of possible combinations. However, RP comes with a larger decrease in coding efficiency, as this approach affects the effectiveness of entropy coding more significantly (see Section IV-A2).
3) HP Subbands: Random Sign Inversion: An HP subband is visually less important than a DC or LP subband. However, this subband still contains visually important information that could, for instance, be exploited by face recognition techniques. The image shown in Fig. 8(b) was decoded using only the luminance information from the HP subbands of the ATM image shown in Fig. 8(a), illustrating that the high frequency information in the HP subbands describes edges that may reveal the silhouette of a face. Consequently, we propose to apply random sign inversion (RSI [11], [12]) to the HP transform coefficients. RSI pseudo-randomly flips the sign of each coefficient as follows:
$$\mathrm{HPcoeff}^{e} = \begin{cases} -\mathrm{HPcoeff}, & \text{if } r = 1 \\ +\mathrm{HPcoeff}, & \text{otherwise} \end{cases} \qquad (3)$$
where HPcoeff denotes the coefficients to be scrambled and $\mathrm{HPcoeff}^{e}$ denotes the pseudo-randomly sign-flipped coefficients. In (3), r denotes a 1-bit pseudo-random number. Since the sign information of HP coefficients is signaled using a Boolean flag, the coding efficiency is not affected.
4) Flexbits Subbands: No Scrambling: We propose not to scramble the Flexbits subbands. These subbands contain the lower-order bits of the HP transform coefficients. Hence, the information provided by the Flexbits subbands is only of limited visual importance. Moreover, the amount of coded data in the Flexbits subbands decreases significantly as the bit rate decreases (see Table I).
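To make the three per-subband operations in (1)-(3) concrete, the following Python sketch scrambles and descrambles the quantized coefficients of a single MB. It is a hypothetical illustration, not the authors' modified JPEG XR encoder: it works on plain lists rather than on a JPEG XR bit stream, it uses Python's `random.Random` seeded with the secret key as a stand-in for the key-seeded pseudo-random generator (this generator is not cryptographically secure; a deployed system would need a keyed cryptographic generator), and it assumes that encoder and decoder draw the random values of each MB in the same fixed order. The function names, the key value, and the coefficient values are our own.

```python
import random

NUM_LP = 15  # LP coefficients per MB (C in (2))


def draw_mb_values(rng, L, num_hp):
    """Draw one MB's pseudo-random values in a fixed order shared by encoder and decoder."""
    half = 1 << (L - 1)
    shift = rng.randint(-half, half)                       # R(L): one of 2^L + 1 integers
    perm = list(range(NUM_LP))
    rng.shuffle(perm)                                      # x_1, ..., x_C (0-based here)
    flips = [rng.getrandbits(1) for _ in range(num_hp)]    # r in (3), one bit per HP coefficient
    return shift, perm, flips


def scramble_mb(dc, lp, hp, rng, L):
    """Apply RLS to the DC value, RP to the LP list, and RSI to the HP list of one MB."""
    shift, perm, flips = draw_mb_values(rng, L, len(hp))
    dc_e = dc + shift                                      # RLS, (1)
    lp_e = [lp[perm[i]] for i in range(NUM_LP)]            # RP, (2)
    hp_e = [-c if r == 1 else c for c, r in zip(hp, flips)]  # RSI, (3)
    return dc_e, lp_e, hp_e


def descramble_mb(dc_e, lp_e, hp_e, rng, L):
    """Invert scramble_mb, given a generator seeded with the same key."""
    shift, perm, flips = draw_mb_values(rng, L, len(hp_e))
    dc = dc_e - shift                                      # undo RLS
    lp = [0] * NUM_LP
    for i in range(NUM_LP):                                # undo RP
        lp[perm[i]] = lp_e[i]
    hp = [-c if r == 1 else c for c, r in zip(hp_e, flips)]  # RSI is its own inverse
    return dc, lp, hp


if __name__ == "__main__":
    key = 0x2011                                           # hypothetical secret key (seed)
    dc = 512
    lp = [9, -4, 3, 0, 0, 1, 0, -2, 0, 0, 0, 1, 0, 0, 0]
    hp = [3, -1, 0, 2] * 60                                # 240 illustrative HP coefficients
    scrambled = scramble_mb(dc, lp, hp, random.Random(key), L=8)
    restored = descramble_mb(*scrambled, random.Random(key), L=8)
    assert restored == (dc, lp, hp)
    print("DC:", dc, "->", scrambled[0])
    print("LP:", lp, "->", scrambled[1])
```

Because RLS and RP are invertible given the same draws and RSI is an involution, a decoder holding the key recovers the MB exactly, whereas a decoder with another key obtains a shifted DC value, permuted LP coefficients, and sign-flipped HP coefficients. For a full frame, one generator would simply be advanced over the MBs in the same scan order on both sides.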
Fig. 4. Visual effect of RLS when scrambling DC subbands. (a) Original DC image of ATM. (b) RLS (L = 6). (c) RLS (L = 8) (images were cropped and magnified for visualization purposes).
Fig. 7. Bit stream overhead when using RP for scrambling HP subbands (video sequence used: ATM).
Fig. 5. Bit stream overhead when using RP for scrambling LP subbands (video sequence used: ATM).
Fig. 8. Visual effect of scrambling HP subbands. (a) Original DC+LP+HP image of ATM. (b) Original HP subband of ATM (luminance only). (c) RP on (b). (d) RSI on (b) (images were cropped and magnified for visualization purposes; contrast and brightness were enhanced).
Fig. 6. Visual effect of scrambling LP subbands. (a) Original DC+LP image of ATM. (b) Original LP subband of ATM (luminance only). (c) RSI on (b). (d) RP on (b) (images were cropped and magnified for visualization purposes).
DC subbands only is equal to 12.8% and 24.5%, respectively, in Fig. 3). Fig. 4(b) and (c) shows DC images scrambled with RLS when L = 6 and L = 8, respectively. The images illustrate that a larger value of L leads to more severe distortion.
2) Scrambled LP Subbands: Fig. 5 shows the bit stream overhead when RP is applied to LP subbands. The overhead becomes higher as the bit rate of the bit streams decreases. This implies that the effect of scrambling on the coding efficiency is more significant at lower bit rates. The signaling overhead also increases as the bit rate decreases. RSI (see Section III-B3) is also a feasible candidate technique for scrambling LP subbands since this technique does not produce any bit stream overhead. However, RSI offers a lower level of protection than RP. Since a MB only contributes 15 coefficients to an LP subband, the use of RSI results in $2^{15}$ combinations per MB, while the use of RP results in $15! \approx 2^{40}$ combinations per MB. Fig. 6 illustrates the visual effect of RSI and RP on the LP subbands, showing that the use of RP leads to more visual distortion than the use of RSI.
3) Scrambled HP Subbands: Fig. 7 shows the bit stream overhead when RP is applied to HP subbands. The HP subbands contain significantly more coefficient information than the other subbands (see, for example, the bit rates of the DC and LP subbands in Figs. 3 and 5, respectively). Hence, even a slight change of the coefficients may significantly affect the effectiveness of entropy coding, resulting in a significant
decrease of the coding efficiency. Although the use of RP results in a very large number of combinations ($240!$ per MB) for an HP subband, it produces a prohibitive amount of overhead (as shown in Fig. 7). Further, the overhead becomes higher as the bit rate decreases. Fig. 8 illustrates the visual effect of RP and RSI on the HP subbands. The use of RP results in more visual distortion than the use of RSI. However, as RP causes a significant amount of bit stream overhead, it is more appropriate to scramble HP subbands using RSI.
4) Unified Scrambling Strategy: The scrambling techniques discussed in the previous sections can be combined into a single, subband-adaptive scrambling strategy: RLS for scrambling DC subbands, RP for scrambling LP subbands, and RSI for scrambling HP subbands. Table I shows the bit stream overhead when applying subband-adaptive scrambling to complete frames. Results are only shown for a few selected quantization parameter (QP) values, resulting in video bit rates that are in line with the bandwidth capabilities of a conventional IP camera: 1 Mb/s, 5 Mb/s, and 10 Mb/s [5]. In Table I, the labels I and IS represent the bit rates of the original and the scrambled video sequences, respectively. Further, the label O denotes the overall increase in bit stream overhead caused by scrambling the original video sequences (hereinafter, the same labels are used in all other tables). The overhead caused by frame-based, subband-adaptive scrambling can be attributed to the use of RLS and RP. Specifically, the bit stream overhead produced by RLS and RP becomes higher as the bit rate decreases, since the effect of scrambling on the coding efficiency is more significant at lower bit rates. Fig. 9 allows for a subjective assessment of the effectiveness of our scrambling strategy, visualizing scrambled versions of a representative image from ATM, each time decoded using a different number of subbands.
The visual distortion caused by scrambling is sufficiently strong to conceal the identity of the subject shown in the original
TABLE I
Overall Bit Stream Overhead Caused by Frame-Based, Subband-Adaptive Scrambling (I and IS in kb/s, O in %)

                 All Subbands           DC+LP+HP               DC+LP                  DC
Sequence   QP    I      IS     O        I      IS     O        I      IS     O        I     IS    O
ATM        20    8837   9070   2.6      8814   9047   2.6      3396   3628   6.9      684   828   20.9
ATM        35    4656   4948   6.3      4656   4948   6.3      2330   2622   12.5     569   751   32.1
ATM        80    819    961    17.2     819    961    17.2     599    740    23.6     263   346   31.3
Stairs     35    8452   8673   2.6      8378   8599   2.6      2794   3015   7.9      559   692   23.7
Stairs     55    4602   4777   3.8      4602   4777   3.8      1873   2049   9.4      432   532   23.1
Stairs     100   855    890    4.0      855    890    4.0      592    627    5.8      228   269   18.4
Hall way   25    8743   9015   3.1      8609   8881   3.2      3236   3508   8.4      665   805   21.0
Hall way   45    4632   4913   6.1      4632   4913   6.1      1995   2276   14.1     517   667   29.1
Hall way   95    760    876    15.3     760    876    15.3     531    647    22.0     221   284   28.9
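The label O in Table I can be read as the relative bit rate increase of the scrambled stream over the original one. Assuming O = 100 · (IS - I)/I, a definition we infer from the tabulated values rather than one stated explicitly in the text, a few entries of Table I can be checked with a one-line helper; the function name is our own.

```python
def overhead_percent(i_kbps: float, is_kbps: float) -> float:
    """Relative bit rate increase O (%) of a scrambled stream (IS) over the original (I)."""
    return 100.0 * (is_kbps - i_kbps) / i_kbps


# (sequence, QP, I, IS) taken from the "All Subbands" columns of Table I
rows = [("ATM", 20, 8837, 9070), ("ATM", 35, 4656, 4948), ("Stairs", 35, 8452, 8673)]
for name, qp, i_kbps, is_kbps in rows:
    print(f"{name} QP {qp}: O = {overhead_percent(i_kbps, is_kbps):.1f}%")
# ATM QP 20: O = 2.6%
# ATM QP 35: O = 6.3%
# Stairs QP 35: O = 2.6%
```

Because I and IS are themselves rounded to whole kb/s, recomputed values can differ from the table in the last digit (for example, ATM at QP 80 gives 17.3% versus the listed 17.2%).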
Fig. 9. Visual effect of frame-based, subband-adaptive scrambling. (a) Scrambled version of Fig. 4(a). (b) Scrambled version of Fig. 6(a). (c) Scrambled version of Fig. 8(a) (images were used without cropping).
images. Similar observations were also made for Stairs and Hall way.
B. Analysis of Tiling
This section investigates the effect of tiling on the coding efficiency. Table II shows the bit stream overhead for uniform tile layouts with a varying tile size. For example, the label 1 × 1 MB refers to a uniform tile layout consisting of 44 × 36 tiles at 4CIF resolution. As shown in Table II, the combined use of a small tile size and a uniform tile layout may significantly decrease the coding efficiency. This can be attributed to the broken continuity of entropy coding across tiles and to the growing number of tile headers and index table entries. Also, the bit stream overhead becomes higher as the bit rate decreases, since the same syntax structures are then used to signal a smaller amount of coded image data. In relative terms, the effect of tiling on the coding efficiency is also lower when the higher frequency subbands are included. Table III shows the bit stream overhead for several non-uniform tile layouts. The size of each tile is interactively determined according to the location of the face regions in the surveillance video content. In summary, our results demonstrate that it is necessary to make use of a non-uniform tile layout to avoid the significant overhead that comes with fine-grained, uniform tiling.
C. Analysis of ROI-Based, Subband-Adaptive Scrambling
Fig. 10 illustrates that the face regions in ATM, Stairs, and Hall way are sufficiently concealed by our scrambling strategy, protecting the identity of the subjects shown.
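The non-uniform tile layouts used for ROI-based scrambling follow the face locations; in our experiments, the tile sizes were determined interactively. As a hypothetical illustration of how such a layout could be derived automatically, the sketch below assumes that the detector returns pixel bounding boxes and that tile boundaries must fall on 16 × 16 MB boundaries. It cuts the frame at the MB-aligned edges of every face region and reports which tiles would be scrambled. The function names and face coordinates are our own and do not come from the paper.

```python
MB_SIZE = 16  # macroblock size in pixels


def ceil_div(a, b):
    return -(-a // b)


def tile_grid(faces, width, height):
    """Return sorted column and row cut positions (in MB units) for a non-uniform tile grid.

    faces: list of (x, y, w, h) pixel boxes, e.g. from a face detector (hypothetical input).
    """
    mb_w, mb_h = ceil_div(width, MB_SIZE), ceil_div(height, MB_SIZE)
    cols, rows = {0, mb_w}, {0, mb_h}
    for x, y, w, h in faces:
        cols.update((x // MB_SIZE, min(mb_w, ceil_div(x + w, MB_SIZE))))
        rows.update((y // MB_SIZE, min(mb_h, ceil_div(y + h, MB_SIZE))))
    return sorted(cols), sorted(rows)


def tiles_to_scramble(faces, cols, rows):
    """Return (column, row) indices of the tiles that lie inside some face region."""
    selected = set()
    for x, y, w, h in faces:
        x0, x1 = x // MB_SIZE, ceil_div(x + w, MB_SIZE)
        y0, y1 = y // MB_SIZE, ceil_div(y + h, MB_SIZE)
        for c in range(len(cols) - 1):
            for r in range(len(rows) - 1):
                if x0 <= cols[c] and cols[c + 1] <= x1 and y0 <= rows[r] and rows[r + 1] <= y1:
                    selected.add((c, r))
    return selected


if __name__ == "__main__":
    faces = [(320, 160, 96, 112)]                 # one hypothetical face in a 4CIF (704 x 576) frame
    cols, rows = tile_grid(faces, 704, 576)
    print(cols, rows)                             # [0, 20, 26, 44] [0, 10, 17, 36] -> 3 x 3 tiles
    print(tiles_to_scramble(faces, cols, rows))   # {(1, 1)}
```

A single face yields a 3 × 3 grid with only the center tile scrambled, and two horizontally separated faces at the same height yield a 5 × 3 grid, matching the shapes of the layouts listed in Table III.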
Fig. 10. Visual effect of ROI-based, subband-adaptive scrambling. (a) ATM (all subbands). (b) ATM (DC+LP+HP). (c) ATM (DC+LP). (d) Stairs (all subbands). (e) Stairs (DC+LP+HP). (f) Stairs (DC+LP). (g) Hall way (all subbands). (h) Hall way (DC+LP+HP). (i) Hall way (DC+LP).
Table IV shows the bit stream overhead when using ROI-based, subband-adaptive scrambling (where the ROI is represented by a non-uniform tile layout). Compared to frame-based scrambling (see Table I), ROI-based scrambling mostly allows for a lower bit stream overhead. However, at the lowest bit rate, the bit stream overhead of ROI-based scrambling is higher than that of frame-based scrambling for Stairs and Hall way. The results in Tables III and IV show that this can be attributed to the use of tiling.
D. Security Considerations
This section analyzes the level of protection offered by the proposed scrambling technique against a brute-force attack. For one MB, the use of RLS at the level of the DC coefficients results in $2^{L}+1$ possible combinations, the use of RP at the level of the LP coefficients results in $15!/(15-K)!$ possible combinations, and the use of RSI at the level of the HP coefficients results in $2^{M}$ possible combinations. Note that L
TABLE II
Bit Stream Overhead Caused by Uniform Tiling (Video Sequence Used: ATM; I and IS in kb/s, O in %)

                  All Subbands            DC+LP+HP                DC+LP                    DC
Tile size   QP    I      IS      O        I      IS      O        I      IS     O          I     IS     O
1 × 1 MB    20    8837   14612   65.4     8814   14620   65.9     3396   8593   153.1      684   3856   463.5
1 × 1 MB    35    4656   11163   139.8    4656   11163   139.8    2330   8309   256.7      569   3856   578.1
1 × 1 MB    80    819    8667    957.7    819    8667    957.7    599    8052   1244.5     263   3856   1365.0
2 × 2 MB    20    8837   11226   27.0     8814   11161   26.6     3396   5493   61.8       684   1810   164.5
2 × 2 MB    35    4656   7606    63.4     4656   7595    63.1     2330   5001   114.7      569   1779   212.9
2 × 2 MB    80    819    4744    478.9    819    4744    478.9    599    4353   1553.7     263   1724   554.9
6 × 6 MB    20    8837   9169    3.8      8814   9139    3.7      3396   3704   9.1        684   816    19.3
6 × 6 MB    35    4656   5111    9.8      4656   5102    9.6      2330   2691   15.5       569   713    25.4
6 × 6 MB    80    819    1325    61.7     819    1325    61.7     599    1055   76.1       263   448    70.2

TABLE III
Bit Stream Overhead Caused by Non-Uniform Tiling (I and IS in kb/s, O in %)

           All Subbands          DC+LP+HP              DC+LP                 DC
QP         I      IS     O       I      IS     O       I      IS     O       I     IS       O
ATM: 9 tiles (3 × 3 non-uniform tile layout)
20         8837   8846   0.1     8814   8824   0.1     3396   3449   1.6     684   708      3.5
35         4656   4734   1.7     4656   4734   1.7     2330   2394   2.8     569   596      4.8
80         819    931    13.6    819    931    13.6    599    690    15.2    263   300      14.0
Stairs: 9 tiles (3 × 3 non-uniform tile layout)
35         8452   8499   0.6     8378   8428   0.6     2794   2850   2.0     559   585.85   4.7
55         4602   4692   2.0     4602   4692   2.0     1873   1945   3.9     432   463.11   7.2
100        855    963    12.6    855    963    12.6    592    686    15.8    228   264.75   16.4
Hall way: 15 tiles (5 × 3 non-uniform tile layout)
25         8743   8868   1.4     8609   8760   1.8     3236   3367   4.1     665   714      7.4
45         4632   4793   3.5     4632   4793   3.5     1995   2146   7.6     517   571      10.6
95         760    948    24.8    760    948    24.8    531    699    31.7    221   288      30.6

TABLE IV
Bit Stream Overhead Caused by ROI-Based, Subband-Adaptive Scrambling (I and IS in kb/s, O in %)

                 All Subbands          DC+LP+HP              DC+LP                 DC
Sequence   QP    I      IS     O       I      IS     O       I      IS     O       I     IS    O
ATM        20    8837   8867   0.3     8814   8844   0.3     3396   3469   2.2     684   714   4.3
ATM        35    4656   4757   2.2     4656   4757   2.2     2330   2417   3.7     569   605   6.3
ATM        80    819    945    15.3    819    945    15.3    599    704    17.5    263   302   14.8
Stairs     35    8452   8500   0.6     8378   8428   0.6     2794   2851   2.0     559   586   4.7
Stairs     55    4602   4693   2.0     4602   4693   2.0     1873   1946   3.9     432   463   7.2
Stairs     100   855    998    16.6    855    998    16.6    592    720    21.6    228   306   34.4
Hall way   25    8743   8870   1.5     8609   8762   1.8     3236   3370   4.1     665   715   7.5
Hall way   45    4632   4796   3.5     4632   4796   3.5     1995   2148   7.7     517   572   10.7
Hall way   95    760    950    25.0    760    950    25.0    531    700    32.0    221   288   30.6

TABLE V
Parameters Used for Scrambling ROIs (N, K, and M Were Obtained by Averaging Over 200 Frames)

ATM (N = 132.0)
QP    L   K      M
20    8   10.9   35.1
35    8   8.6    19.1
80    3   2.5    4.2

Stairs (N = 9.1)
QP    L   K      M
35    7   5.2    28.8
55    5   3.6    16.3
100   1   1.0    2.2

Hall way (N = 12.9 for ROI1 and 17.4 for ROI2)
QP    L   K (ROI1)   M (ROI1)   K (ROI2)   M (ROI2)
25    8   13.5       61.7       11.9       41.2
45    7   11.9       35.1       9.7        23.2
95    2   3.7        5.4        2.3        3.7
TABLE VI
Statistical Properties of the LP Coefficients in the Face Regions Used (ME and VAR Denote the Mean and the Variance of the LP Coefficients, Respectively)

Face region     QP    ME    VAR
ATM             20    0.3   96.5
ATM             35    0.2   26.5
ATM             80    0.0   0.5
Stairs          35    0.0   54.0
Stairs          55    0.0   9.1
Stairs          100   0.0   0.2
Hall way ROI1   25    0.2   245.7
Hall way ROI1   45    0.1   45.1
Hall way ROI1   95    0.0   0.6
Hall way ROI2   25    0.6   156.0
Hall way ROI2   45    0.3   28.7
Hall way ROI2   95    0.0   0.4
denotes the parameter used by RLS in the DC subbands, K denotes the number of non-zero LP coefficients in a single MB, and M denotes the number of non-zero HP coefficients in a single MB. As such, when subbands are incrementally attacked in the order of DC, LP, and HP, the total number of combinations required to break the complete protection of N MBs is equal to
$$(2^{L}+1)^{N} + \left(\frac{15!}{(15-K)!}\right)^{N} + (2^{M})^{N}.$$
Note that, when offering a maximum level of security, the use of RP at the level of the LP coefficients results in 15! possible combinations. However, our computation of the number of possible combinations of LP coefficients reflects the following observations. First, when low bit rates are in use, the major factor that affects the security level is the number of zero-valued LP coefficients (as most coefficients then become zero). Indeed, the presence of repeated zero coefficients in an LP subband reduces the number of combinations that need to be tested, since an adversary can exploit knowledge about the number of zero coefficients in an LP subband during decoding. Second, when high bit rates are in use, the probability that repeated identical non-zero coefficients are present is low. Hence, for simplicity of computation, we assume that all non-zero LP coefficient values in a single MB are different. Similar observations hold true for the HP coefficients. Table V shows the parameters used by RLS, RP, and RSI for the face regions used in our experiments. At the lowest bit rate (when the QP value of ATM, Stairs, and Hall way is 80, 100, and 95, respectively), the total number of combinations that needs to be tested for each face region consisting of N MBs is approximately equal to $3.7 \times 10^{453}$, $5.0 \times 10^{10}$, $1.8 \times 10^{58}$, and $2.6 \times 10^{40}$ (for simplicity of calculation, the K values in Table V were rounded to the nearest integer). Further, we observed that decoding and descrambling the DC subbands requires about 1.9 ms on a quad-core 2.0 GHz processor.
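Because the counts involved exceed the range of floating-point numbers, they are easiest to evaluate in log space. The following sketch is our own reconstruction of the calculation, not the authors' code: it assumes that only K is rounded (half up), that N and M are used as averaged in Table V, that each candidate takes 1.9 ms to test, and that a year has 365.25 days. The function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
DECODE_SECONDS = 1.9e-3  # time to decode and descramble one candidate (from the text)


def log10_sum(logs):
    """Return log10(sum(10**x for x in logs)) without overflowing a float."""
    top = max(logs)
    return top + math.log10(sum(10 ** (x - top) for x in logs))


def log10_rp(k, c=15):
    """log10 of c!/(c-k)!, the orderings of k distinct non-zero LP coefficients."""
    return sum(math.log10(c - i) for i in range(k))


def brute_force(n, L, k, m, with_hp=True):
    """Return log10 of (#combinations, years) to brute-force an ROI of n MBs."""
    k = math.floor(k + 0.5)                     # K rounded half up
    terms = [n * math.log10(2 ** L + 1),        # RLS on DC: (2^L + 1)^N
             n * log10_rp(k)]                   # RP on LP: (15!/(15-K)!)^N
    if with_hp:
        terms.append(n * m * math.log10(2))     # RSI on HP: (2^M)^N
    combos = log10_sum(terms)
    years = combos + math.log10(DECODE_SECONDS / SECONDS_PER_YEAR)
    return combos, years


def sci(log_value):
    """Format 10**log_value as 'a.b x 10^e'."""
    e = math.floor(log_value)
    return f"{10 ** (log_value - e):.1f} x 10^{e}"


# Lowest-bit-rate parameters from Table V: (face region, N, L, K, M)
ROIS = [("ATM", 132.0, 3, 2.5, 4.2), ("Stairs", 9.1, 1, 1.0, 2.2),
        ("Hall way ROI1", 12.9, 2, 3.7, 5.4), ("Hall way ROI2", 17.4, 2, 2.3, 3.7)]

for name, n, L, k, m in ROIS:
    combos, years = brute_force(n, L, k, m)
    print(f"{name}: {sci(combos)} combinations, about {sci(years)} years")
```

Under these assumptions, the sketch reproduces the combination counts and durations reported in this section up to rounding. Setting `with_hp=False` evaluates the DC+LP-only count, which gives the same rounded figures because the LP term dominates.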
Consequently, the time needed to generate all possible face regions is approximately equal to $2.3 \times 10^{443}$, 3.0, $1.1 \times 10^{48}$, and $1.5 \times 10^{30}$ years, respectively. These numbers show that ROI-based, subband-adaptive scrambling provides a feasible level of protection against a brute-force attack. In practice, attacking the DC and LP subbands may already be sufficient to reveal the identity of a subject [see, for example, Fig. 6(a)]. In this context, the total number of combinations required to break the protection of N MBs is reduced to $(2^{L}+1)^{N} + (15!/(15-K)!)^{N}$. As such, at the lowest bit rate, the number of combinations required to break the protection of the DC and LP subbands of the face regions described in Table V is equal to $3.7 \times 10^{453}$, $5.0 \times 10^{10}$, $1.8 \times 10^{58}$, and $2.6 \times 10^{40}$, corresponding to approximately
$2.3 \times 10^{443}$, 3.0, $1.1 \times 10^{48}$, and $1.5 \times 10^{30}$ years needed to generate all possible face images. Hence, leaving the HP subbands out of the attack does not affect the overall security level, since the number of combinations required to break the protection of the HP subbands is significantly smaller than the number of combinations required to break the protection of both the DC and LP subbands. Further, as the bit rate decreases, the security level of the LP subbands decreases. Indeed, most LP coefficients then become zero, lowering the effectiveness of RP (see Table VI). In this case, an adversary may attack the LP subbands by means of an error-concealment attack, simply setting the scrambled LP coefficients to zero [10]. However, as this error-concealment attack comes down to only decoding the DC subbands, we believe that such an approach will, in most cases, be unsuccessful in revealing the identity of a subject [see Fig. 4(a)]. Moreover, the use of lower bit rates also results in DC images with a lower visual quality, further hampering face recognition efforts. Therefore, we believe that, in general, an adversary needs to conduct a brute-force attack on at least the DC and LP subbands in order to obtain visual results that are meaningful for the purpose of face recognition. This could, for instance, be investigated in more detail in a user study.
V. Conclusion
We discussed a subband-adaptive approach for scrambling privacy-sensitive face regions in JPEG XR-encoded surveillance video content. Our subband-adaptive approach is the result of a tradeoff between the visual importance of the subbands, the amount of coded data in the subbands, the level of security offered by a particular scrambling technique, the effect of scrambling on the coding efficiency, and the computational complexity of the scrambling technique used. Experimental results were obtained for representative surveillance video content with 4CIF resolution and a frame rate of 30 frames/s.
Our results show that subband-adaptive scrambling is able to conceal privacy-sensitive face regions with a feasible level of protection. However, the combined use of scrambling and tiling lowers the coding efficiency as the video bit rate decreases. Therefore, instead of only scrambling privacy-sensitive face regions, scrambling the whole image may be more efficient from the point of view of coding efficiency. In particular, for two of the three surveillance video sequences used in our experiments (Stairs and Hall way), we observed that frame-based, subband-adaptive scrambling resulted in a lower bit stream overhead than ROI-based, subband-adaptive scrambling when a video bit rate of 1 Mb/s was in use (for all spatial resolutions used). In all other cases, subband-adaptive scrambling of ROIs outperformed subband-adaptive scrambling of complete frames in terms of coding efficiency.
References
[1] S. Srinivasan, C. Tu, S. L. Regunathan, and G. J. Sullivan, "HD photo: A new image coding technology for digital photography," Proc. SPIE, vol. 6696, pp. 66960A.1–66960A.19, Aug. 2007.
[2] H. Sohn, E. T. Anzaku, W. De Neve, Y. M. Ro, and K. N. Plataniotis, "Privacy protection in video surveillance systems using scalable video coding," in Proc. IEEE Int. Conf. Adv. Video Signal Based Surveillance, Sep. 2009, pp. 424–429.
[3] A. Cavallaro, "Privacy in video surveillance," IEEE Signal Process. Mag., vol. 24, no. 2, pp. 166–168, Mar. 2007.
[4] H. Sohn, W. De Neve, and Y. M. Ro, "Region-of-interest scrambling for scalable surveillance video using JPEG XR," in Proc. ACM Int. Conf. Multimedia, Oct. 2009, pp. 861–864.
[5] J. Honovich. (2009, Aug.). Security Manager's Guide to Video Surveillance, Version 3.0, pp. 26–27 [Online]. Available: http://ipvideomarket.info
[6] T. D. Tran, L. Liu, and P. Topiwala, "Performance comparison of leading image codecs: H.264/AVC intra, JPEG2000, and Microsoft HD photo," Proc. SPIE, vol. 6696, pp. 66960B.1–66960B.14, Oct. 2007.
[7] C. Perra and D. Giusto, "An image browsing application based on JPEG XR," in Proc. Int. Workshop CBMI, Jun. 2008, pp. 396–401.
[8] W. De Neve, D. Van Deursen, W. Van Lancker, Y. M. Ro, and R. Van de Walle, "Improved BSDL-based content adaptation for JPEG 2000 and HD photo (JPEG XR)," Signal Process. Image Commun., vol. 24, no. 6, pp. 452–467, Jul. 2009.
[9] Y. G. Kim, S. H. Jin, and Y. M. Ro, "Scalable security and conditional access control for multiple regions of interest in scalable video coding," in Proc. Int. Workshop Digital Watermarking, LNCS 5041, Dec. 2008, pp. 71–86.
[10] F. Dufaux and T. Ebrahimi, "Scrambling for privacy protection in video surveillance systems," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 8, pp. 1168–1174, Aug. 2008.
[11] F. Dufaux and T. Ebrahimi, "H.264/AVC video scrambling for privacy protection," in Proc. IEEE ICIP, Oct. 2008, pp. 1688–1691.
[12] W. Zeng and S. Lei, "Efficient frequency domain video scrambling for content access control," in Proc. ACM Int. Conf. Multimedia, Oct. 1999, pp. 285–294.
[13] HD Photo Device Porting Kit 1.0 [Online]. Available: http://www.microsoft.com/whdc/xps/hdphotodpk.mspx
[14] IVY Lab Surveillance Video Dataset [Online]. Available: http://ivylab.kaist.ac.kr/demo/vs/dataset.htm
Wesley De Neve received the M.S. degree in computer science and the Ph.D. degree in computer science engineering from Ghent University, Ghent, Belgium, in 2002 and 2007, respectively. He is currently a Senior Researcher with the Image and Video Systems Laboratory (IVY Lab), in the position of an Assistant Research Professor. IVY Lab is part of the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. Prior to joining KAIST, he was a Post-Doctoral Researcher with both Ghent University-IBBT, Ghent, and the Information and Communications University, Daejeon. His current research interests and areas of publication include the coding, annotation, and adaptation of image and video content, graphics processing unit-based video processing, efficient XML processing, and the semantic and social web.

Yong Man Ro (M'92–SM'98) received the B.S. degree from Yonsei University, Seoul, Korea, and the M.S. and Ph.D. degrees from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. In 1987, he was a Visiting Researcher with Columbia University, New York, NY, and from 1992 to 1995 he was a Visiting Researcher with the University of California, Irvine, and KAIST. He was a Research Fellow with the University of California, Berkeley, and a Visiting Professor with the University of Toronto, Toronto, ON, Canada, in 1996 and 2007, respectively. He is currently a Full Professor with KAIST, where he is directing the Image and Video Systems Laboratory, Department of Electrical Engineering. He participated in the MPEG-7 and MPEG-21 international standardization efforts, contributing to the definition of the MPEG-7 texture descriptor, the MPEG-21 digital item adaptation visual impairment descriptors, and modality conversion. His current research interests include image/video processing, multimedia adaptation, visual data mining, image/video indexing, and multimedia security. Dr. Ro received the Young Investigator Finalist Award of ISMRM in 1992 and the Scientist Award in Korea in 2003. He served as a TPC member of international conferences such as IWDW, WIAMIS, AIRS, and CCNC, and was the Program Co-Chair of IWDW in 2004.
Hosik Sohn received the B.S. degree from Korea Aerospace University, Goyang, Korea, in 2007, and the M.S. degree from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2009. He is currently working toward the Ph.D. degree at the Image and Video Systems Laboratory, Department of Electrical Engineering, KAIST. His current research interests include visual quality measurement, video adaptation, multimedia security, bio-cryptography, scalable video coding, and JPEG extended range.

1051-8215/$26.00 © 2011 IEEE

SOHAN et al.: PRIVACY PROTECTION IN VIDEO SURVEILLANCE SYSTEMS

Overall architecture of the proposed video surveillance system.

