
A Selective Video Encryption for the Region of Interest in Scalable Video Coding


Yeongyun Kim, Sung Ho Jin, Tae Meon Bae, and *Yong Man Ro
Image and Video Systems Laboratory
Information and Communications University (ICU)
119 Munjiro, Yuseong-Gu, Daejeon, 305-732, South Korea
Abstract: In this paper, we propose an encryption method to protect the region of interest (ROI), which is a standard element of scalable video coding (SVC). The proposed method provides secure SVC content by means of selective encryption. To confine the encryption to the ROI, three encoding schemes are also proposed: constrained motion estimation, and exceptional handling of half-pel interpolation and of Intra_BL mode at the ROI boundary. Experiments were performed to verify the proposed methods using the joint scalable video model (JSVM), and the results show that the proposed methods protect the ROI in SVC effectively.
Index Terms: Scalable video coding, region of interest, video encryption

I. INTRODUCTION

Scalable video coding (SVC) is an emerging standard that supports spatially, temporally, and quality-scalable coded video. It is considered one of the new video coding methods that can replace conventional transcoding, which re-encodes an original video to meet various network conditions or user devices [1]. With the scalable consumption of SVC, video encryption methods are needed for the copyright protection of scalable video content. Such methods should achieve format compliance without degrading the coding efficiency of the video stream and without excessive processing time [2].
In addition to spatial, temporal, and quality scalability, region of interest (ROI) functionality, e.g., independent ROI decoding, has been standardized in SVC using flexible macroblock ordering (FMO). The ROI is regarded as an important, valuable area: it could be a secured region in surveillance video or a detailed region that is allocated more bits, and it should be accessible to legitimate users only. Until now, there has been no encryption method to protect the ROI in SVC, so a new method is needed that secures the ROI while preserving the scalability of SVC.
In this paper, we propose an encryption method to protect the defined ROI and three encoding schemes that confine the encryption to the ROI in SVC. Experiments were performed with the joint scalable video model (JSVM) to verify the proposed method.

II. BITSTREAM STRUCTURE AND ROI REPRESENTATION IN SVC

A. Bitstream Structure of SVC

In SVC, the encoded bitstream is composed of network abstraction layer (NAL) units. NAL units are classified into video coding layer (VCL) NAL units, which are transport entities containing at least one coded slice, and non-VCL NAL units, which contain additional information such as parameter sets and supplemental enhancement information (SEI) [1]. The NAL unit is the smallest unit to which spatial, temporal, and quality scalability can be assigned. The spatial resolution, frame rate, or SNR of the bitstream can be selected by bitstream extraction to reduce the bit rate of the SVC bitstream, and extraction is performed by removing the corresponding NAL units from the bitstream [1]. Hence, video data should be encrypted NAL unit by NAL unit so that the encryption remains valid after bitstream extraction.

B. ROI Representation in SVC

FMO forms slice groups from sets of macroblocks [3]. The ROI is defined by map type 2 of the macroblock-to-slice-group maps supported by FMO, known as foreground and leftover. In Fig. 1 (a), the foreground-and-leftover map groups the macroblocks located in rectangular regions into slice groups 0 and 1, and the macroblocks belonging to the background are assigned to slice group 2. The picture parameter set (PPS) in Fig. 1 (b) contains the geometric information of the ROI, i.e., the top-left and bottom-right macroblock addresses of each slice group, as well as the slice group id of each slice group [1]. Because each NAL unit contains data of one slice group and the slice header carries the address of the first macroblock in the slice, the slice group id of the ROI identifies the NAL units that belong to the ROI.
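The foreground-and-leftover rule can be made concrete with a short Python sketch of map type 2 as defined in H.264/AVC FMO, on which SVC builds: every macroblock starts in the leftover (last) slice group, and each rectangle, given by its top-left and bottom-right macroblock addresses, is painted from the highest rectangle index down to 0, so lower-numbered groups take priority where rectangles overlap. The function names are hypothetical, and the sketch assumes frame coding, where one map unit is one macroblock.

```python
def foreground_leftover_map(pic_width_in_mbs, pic_height_in_mbs, top_left, bottom_right):
    """Macroblock-to-slice-group map for FMO map type 2 (foreground with leftover).

    top_left[i] and bottom_right[i] are raster-scan macroblock addresses of the
    corners of foreground rectangle i; the leftover area gets the last group id.
    """
    num_groups = len(top_left) + 1
    mb_map = [num_groups - 1] * (pic_width_in_mbs * pic_height_in_mbs)  # all leftover
    # Paint from the highest rectangle index down to 0 so that lower-numbered
    # slice groups win where rectangles overlap.
    for group in range(num_groups - 2, -1, -1):
        y0, x0 = divmod(top_left[group], pic_width_in_mbs)
        y1, x1 = divmod(bottom_right[group], pic_width_in_mbs)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                mb_map[y * pic_width_in_mbs + x] = group
    return mb_map


def slice_group_of_slice(mb_map, first_mb_in_slice):
    """A slice lies in a single slice group, so its first macroblock identifies it."""
    return mb_map[first_mb_in_slice]


# QCIF (11 x 9 macroblocks) with two foreground rectangles, as in Fig. 1 (a).
qcif_map = foreground_leftover_map(11, 9, top_left=[12, 60], bottom_right=[49, 86])
# Group 0 covers 20 macroblocks, group 1 covers 15, and the background (group 2) 64.
```

A slice whose first macroblock maps to an ROI group is then treated as ROI data by the encryption stage.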

*Prof. Yong Man Ro is the director of Image and Video Systems Lab.,
ICU. E-mail: yro@icu.ac.kr, Tel. +82-42-866-6289

Figure 1. ROI representation in SVC: (a) foreground and leftover type, and (b) the SVC bitstream structure defined as the ROI.
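The two properties described in Section II, extraction by removing whole NAL units and ROI identification by slice group id, can be sketched as simple filters over a list of NAL units. The NalUnit record below is a hypothetical, simplified stand-in for the parsed NAL header and slice header (not a JSVM data structure), and the extraction rule, which keeps every VCL NAL unit whose dependency_id, temporal_level, and quality_level do not exceed the target, is a simplification of real SVC extraction. Because a NAL unit is either kept or dropped whole, encryption applied per NAL unit stays decryptable after extraction.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified view of a parsed SVC NAL unit (not a JSVM type)."""
    is_vcl: bool
    dependency_id: int = 0   # spatial layer
    temporal_level: int = 0  # temporal layer
    quality_level: int = 0   # quality (SNR/FGS) layer
    slice_group_id: Optional[int] = None  # from PPS + first_mb_in_slice, VCL only
    payload: bytes = b""


def extract(nals: List[NalUnit], max_d: int, max_t: int, max_q: int) -> List[NalUnit]:
    """Simplified extraction: keep non-VCL units and VCL units at or below the target."""
    return [n for n in nals
            if not n.is_vcl
            or (n.dependency_id <= max_d and n.temporal_level <= max_t
                and n.quality_level <= max_q)]


def roi_vcl_units(nals: List[NalUnit], roi_groups: Set[int]) -> List[NalUnit]:
    """Select the VCL NAL units whose slice group belongs to the ROI."""
    return [n for n in nals if n.is_vcl and n.slice_group_id in roi_groups]


stream = [
    NalUnit(is_vcl=False),                     # parameter set
    NalUnit(True, 0, 0, 0, slice_group_id=0),  # base layer, ROI
    NalUnit(True, 0, 0, 0, slice_group_id=2),  # base layer, background
    NalUnit(True, 1, 1, 0, slice_group_id=0),  # spatial/temporal enhancement, ROI
    NalUnit(True, 1, 1, 1, slice_group_id=0),  # FGS refinement, ROI
]
base_only = extract(stream, max_d=0, max_t=0, max_q=0)  # 3 units remain
roi_in_base = roi_vcl_units(base_only, {0})             # 1 unit
```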

III. PROPOSED APPROACH


A. Selective video encryption of an ROI in SVC
Encryption that is consistent with the scalability of SVC must satisfy three requirements [4]. First, all layers, including the base and enhancement layers, should be encrypted together for robust video security. Therefore, all types of encoded video data in SVC, such as texture, motion vectors, and fine grain scalability (FGS) data, should be encrypted. Encrypting only these data values, and no structural syntax elements, preserves format compliance with the SVC file format.
Second, the bitstream extraction described in Subsection II.A must be considered. To avoid decrypting and re-encrypting the bitstream at the extraction stage, encryption should be applied segment by segment, with the NAL unit as the segment. Conditional access also plays an important role in scalable multimedia: combined with the extraction tool, it can provide different portions of SVC content to different users. For example, a low-quality version of the content could be available for free, encouraging consumers to pay for a high-quality version [5]. For this purpose, [4] proposes conditional access control that assigns different keys to NAL units with different scalability information. Conditional access therefore needs to be considered in ROI encryption in SVC.
Third, the encryption method should be lightweight in computational complexity. Since SVC encoding and decoding are already computationally heavy, a lightweight encryption method keeps the additional complexity low and helps realize real-time applications. The proposed encryption method is designed to meet these requirements.
Fig. 2 shows a block diagram of the proposed encryption method. First, the slice group id in the PPS and the NAL unit information in the NAL header are used to decide which NAL units belong to a given ROI; in other words, the ROI is distinguished within a picture by its slice group id. Then, VCL NAL units are filtered from the selected NAL units. This is done by identifying the coded NAL data type and the scalability information (spatial resolution, frame rate, and SNR quality), which are signaled by NAL_unit_type, dependency_id, temporal_level, and quality_level in the NAL header [1]. The intra- or inter-coded texture, motion vector difference, and FGS data of the selected VCL NAL units are then passed to the context-based adaptive binary arithmetic coding (CABAC) module, the entropy coder, where each value is split into its sign and absolute value.
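The sign/absolute-value split that feeds the encryption stage can be illustrated with a minimal sketch. The helper names are hypothetical; the convention that only nonzero values carry a sign bit mirrors H.264/AVC CABAC, where, for example, coeff_sign_flag is coded only for nonzero coefficient levels.

```python
def split_sign_magnitude(values):
    """Split signed values into (sign_bits, magnitudes); 1 marks a negative value.

    Zero values get no sign bit, since only nonzero levels carry a sign."""
    signs = [1 if v < 0 else 0 for v in values if v != 0]
    magnitudes = [abs(v) for v in values]
    return signs, magnitudes


def join_sign_magnitude(signs, magnitudes):
    """Inverse of split_sign_magnitude: reattach one sign bit per nonzero magnitude."""
    sign_iter = iter(signs)
    return [(-m if next(sign_iter) else m) if m != 0 else 0 for m in magnitudes]


levels = [3, -1, 0, -7, 2]                  # e.g. coefficient levels of one block
signs, mags = split_sign_magnitude(levels)  # signs == [0, 1, 1, 0], mags == [3, 1, 0, 7, 2]
assert join_sign_magnitude(signs, mags) == levels
```

Only the sign list is handed to the cipher; the magnitudes continue through CABAC unchanged.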
In the seed generator, a seed that initializes the random number generator used as a stream cipher is generated from the NAL_unit_type, dependency_id, temporal_level, and quality_level of each NAL unit. A random number sequence is then produced from the seed.
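The paper does not specify how the seed is derived from the four header fields or which generator is used, so the following is only a sketch under stated assumptions: the seed is a SHA-256 hash of a hypothetical master secret concatenated with NAL_unit_type, dependency_id, temporal_level, and quality_level, and the keystream is SHA-256 run in counter mode as a stand-in for the random number generator. The secret matters in this sketch, because a seed computed from the public header fields alone could be recomputed by anyone.

```python
import hashlib
import struct


def make_seed(master_secret: bytes, nal_unit_type: int, dependency_id: int,
              temporal_level: int, quality_level: int) -> bytes:
    """Hypothetical seed derivation: one seed per scalability class of NAL unit.

    The master secret is an assumption; without it, anyone could recompute the
    seed from the (unencrypted) NAL header fields."""
    header = struct.pack(">4B", nal_unit_type, dependency_id,
                         temporal_level, quality_level)
    return hashlib.sha256(master_secret + header).digest()


def keystream_bits(seed: bytes, n: int) -> list:
    """Expand a seed into n pseudo-random bits.

    SHA-256 in counter mode is only a stand-in for the (unspecified)
    random number generator of the paper."""
    bits = []
    counter = 0
    while len(bits) < n:
        block = hashlib.sha256(seed + struct.pack(">Q", counter)).digest()
        for byte in block:
            bits.extend((byte >> (7 - k)) & 1 for k in range(8))
        counter += 1
    return bits[:n]


# Example header values for one NAL unit class (illustrative only).
seed = make_seed(b"hypothetical master secret", 20, 1, 0, 0)
first_bits = keystream_bits(seed, 16)
```

NAL units sharing the same four header values share a seed and therefore a keystream, which is what ties the encryption to the scalability structure.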
Next, the generated random stream is XORed with the signs of the coded data, i.e., texture, motion vector difference, and FGS data. Encrypting the sign data does not change the coding efficiency relative to encoding without encryption, because the sign data do not affect the probability models of CABAC [6]; in H.264/AVC CABAC, for instance, coefficient signs are coded in bypass mode with a fixed probability of one half.
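XORing a sign bit with a keystream bit of 1 inverts the sign and leaves the magnitude untouched, which is why the coded magnitudes, and hence the CABAC context modeling, are unaffected. A minimal sketch follows; the keystream (SHA-256 in counter mode) and the function names are hypothetical stand-ins, not the paper's generator. Because XOR is its own inverse, the same function also decrypts.

```python
import hashlib
import struct


def keystream_bits(seed: bytes, n: int) -> list:
    """Stand-in keystream: SHA-256 in counter mode (assumption, not the paper's PRNG)."""
    bits, counter = [], 0
    while len(bits) < n:
        block = hashlib.sha256(seed + struct.pack(">Q", counter)).digest()
        for byte in block:
            bits.extend((byte >> (7 - k)) & 1 for k in range(8))
        counter += 1
    return bits[:n]


def toggle_signs(values, seed: bytes) -> list:
    """XOR the sign bit of every nonzero value with one keystream bit.

    Magnitudes are never touched, so the same call encrypts and decrypts."""
    nonzero = sum(1 for v in values if v != 0)
    ks = iter(keystream_bits(seed, nonzero))
    out = []
    for v in values:
        if v == 0:
            out.append(0)                 # no sign bit to encrypt
            continue
        sign = (1 if v < 0 else 0) ^ next(ks)
        out.append(-abs(v) if sign else abs(v))
    return out


mvd = [5, -3, 0, 2, -8, 1, 0, 4]          # e.g. motion vector difference components
encrypted = toggle_signs(mvd, b"example-seed")
assert toggle_signs(encrypted, b"example-seed") == mvd
```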
Because the seed depends on NAL_unit_type, dependency_id, temporal_level, and quality_level, NAL units with the same scalability are encrypted with the same seed, so the encryption follows the NAL unit structure. The seed thus becomes the key element for accessing a NAL unit, which enables conditional access control in SVC [4]. As a result, since the encryption only inverts the sign of each value, it is lightweight and meets the third requirement [2].
Finally, the seed used for random number generation is encrypted with a NAL_unit key using a conventional data encryption scheme and inserted into the encrypted SVC bitstream. The choice of a particular cipher is beyond the scope of this work. The encrypted seed must be transmitted with the SVC bitstream because decryption requires the seed to reproduce the same random stream with the same random number generator in the SVC decoder.
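Since the cipher for the seed is left open, the following sketch uses an HMAC-SHA256-derived pad purely as a stand-in for a conventional cipher such as AES (the Python standard library has no block cipher); a real system should use a vetted, authenticated scheme. It also illustrates conditional access under the assumption of one hypothetical NAL_unit key per scalability class: a user who holds only the base-layer key cannot recover an enhancement-layer seed.

```python
import hashlib
import hmac
import os


def _pad(key: bytes, nonce: bytes, n: int) -> bytes:
    """HMAC-SHA256 over (nonce, counter) as a PRF; a stand-in for a real cipher."""
    out, counter = b"", 0
    while len(out) < n:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def wrap_seed(seed: bytes, nal_unit_key: bytes):
    """Encrypt a seed under a NAL_unit key; returns (nonce, wrapped_seed)."""
    nonce = os.urandom(16)                 # must never repeat for the same key
    pad = _pad(nal_unit_key, nonce, len(seed))
    return nonce, bytes(a ^ b for a, b in zip(seed, pad))


def unwrap_seed(nonce: bytes, wrapped: bytes, nal_unit_key: bytes) -> bytes:
    """Recover the seed; a wrong key silently yields a wrong seed."""
    pad = _pad(nal_unit_key, nonce, len(wrapped))
    return bytes(a ^ b for a, b in zip(wrapped, pad))


# Conditional access: hypothetical keys, one per scalability class.
keys = {"base": b"key for base layer", "enhancement": b"key for enhancement layer"}
seed_enh = bytes(32)                       # placeholder seed of an enhancement NAL unit
nonce, wrapped = wrap_seed(seed_enh, keys["enhancement"])
# A free user holding only keys["base"] gets a wrong seed, so the signs stay scrambled.
wrong = unwrap_seed(nonce, wrapped, keys["base"])
```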

Figure 2. Proposed ROI encryption method in SVC encoder.

B. Selective video decryption of an ROI in SVC


The decryption scheme is essentially the inverse of the proposed encryption scheme. After CABAC decoding, the VCL NAL units belonging to a given ROI are first selected using the slice group id in the PPS and the NAL unit information in the NAL header. The encrypted seed is then decrypted with the same NAL_unit key that was used in the encryption procedure described in Subsection III.A, and the same random number stream is regenerated from the seed by the pseudo-random number generator. Finally, the encrypted sign data are XORed with the regenerated random number stream, which recovers the original sign data.
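At the decoder, only slices in ROI slice groups are touched. The following sketch applies the same sign-toggling XOR per slice, using a seed looked up by the slice's scalability class and assumed to be already unwrapped with the NAL_unit key. The tuple layout of a slice, the function names, and the SHA-256 counter-mode keystream are illustrative assumptions, not JSVM structures.

```python
import hashlib
import struct


def keystream_bits(seed: bytes, n: int) -> list:
    """Stand-in keystream: SHA-256 in counter mode (assumption)."""
    bits, counter = [], 0
    while len(bits) < n:
        block = hashlib.sha256(seed + struct.pack(">Q", counter)).digest()
        for byte in block:
            bits.extend((byte >> (7 - k)) & 1 for k in range(8))
        counter += 1
    return bits[:n]


def toggle_signs(values, seed):
    """XOR each nonzero value's sign bit with the keystream; zeros pass through."""
    ks = iter(keystream_bits(seed, sum(1 for v in values if v != 0)))
    return [(-v if next(ks) else v) if v != 0 else 0 for v in values]


def process_picture(slices, roi_groups, seeds):
    """Encrypt or decrypt (the same XOR operation) the slices of ROI slice groups.

    slices: list of (slice_group_id, scalability_class, values) tuples.
    seeds: scalability_class -> seed, already unwrapped with the NAL_unit key."""
    out = []
    for group, cls, values in slices:
        if group in roi_groups:
            values = toggle_signs(values, seeds[cls])
        out.append((group, cls, values))   # background slices are left as they are
    return out


seeds = {(0, 0, 0): b"seed-base", (1, 0, 0): b"seed-enh"}
picture = [(0, (0, 0, 0), [4, -2, 0, 7]),   # ROI slice, base layer
           (2, (0, 0, 0), [3, -5, 1]),      # background slice
           (0, (1, 0, 0), [-1, 6, 0, -9])]  # ROI slice, enhancement layer
encrypted = process_picture(picture, {0}, seeds)
decrypted = process_picture(encrypted, {0}, seeds)
```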
C. Coding schemes for independent ROI encryption in SVC
To confine the visual patterns caused by ROI encryption to the ROI, both independent decoding of the ROI substream extracted from the original bitstream and three specific coding schemes must be considered: constrained motion estimation, and exceptional handling of half- and quarter-pel interpolation and of Intra_BL mode at the ROI boundary.
1) Constrained motion estimation
Constrained motion estimation for independent ROI decoding was proposed in [7]; it prevents inter-frame dependency of the ROI on the background. However, the inter-frame dependency of the background on the ROI remains. Since macroblocks in the background can refer to encrypted regions of the ROI, an ROI boundary violation occurs in which the encryption patterns of the ROI spread into the background. To overcome this problem, the search ranges of the ROI and the background should each be constrained to their own area during motion estimation. As a trade-off, coding efficiency can decrease.
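Constrained motion estimation can be sketched as a full search with a SAD cost in which a candidate reference block is admitted only if it stays on the same side of the ROI boundary as the current block: entirely inside the ROI for ROI blocks, and not overlapping the ROI for background blocks. This is an integer-pel sketch assuming a single rectangular, macroblock-aligned ROI given in pixels as (x0, y0, x1, y1) with exclusive right and bottom bounds; the function names are hypothetical.

```python
def inside(rect, x, y, w, h):
    """True if the w*h block at (x, y) lies entirely inside rect."""
    x0, y0, x1, y1 = rect
    return x0 <= x and y0 <= y and x + w <= x1 and y + h <= y1


def overlaps(rect, x, y, w, h):
    """True if the w*h block at (x, y) shares at least one pixel with rect."""
    x0, y0, x1, y1 = rect
    return x < x1 and x + w > x0 and y < y1 and y + h > y0


def sad(cur, ref, cx, cy, rx, ry, w, h):
    """Sum of absolute differences between the current and a reference block."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(h) for i in range(w))


def constrained_search(cur, ref, cx, cy, w, h, roi, search_range):
    """Full search returning (cost, dx, dy) restricted to the current block's region."""
    height, width = len(ref), len(ref[0])
    cur_in_roi = inside(roi, cx, cy, w, h)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if not (0 <= rx and 0 <= ry and rx + w <= width and ry + h <= height):
                continue  # keep the sketch simple: no out-of-picture padding
            allowed = (inside(roi, rx, ry, w, h) if cur_in_roi
                       else not overlaps(roi, rx, ry, w, h))
            if not allowed:
                continue  # candidate would cross the ROI boundary
            cost = sad(cur, ref, cx, cy, rx, ry, w, h)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


W = H = 16
ref = [[(x * 7 + y * 13) % 256 for x in range(W)] for y in range(H)]
cur = [[(x * 7 + y * 13 + 3) % 256 for x in range(W)] for y in range(H)]
roi = (4, 4, 12, 12)
bg_result = constrained_search(cur, ref, 0, 0, 4, 4, roi, 4)   # background block
roi_result = constrained_search(cur, ref, 4, 4, 4, 4, roi, 4)  # ROI block
```

The restriction only shrinks the candidate set, which is where the coding-efficiency loss of this scheme comes from.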
2) Half-pel and quarter-pel interpolation on the ROI boundary
SVC performs motion estimation with a motion vector accuracy of one quarter of the luminance sample grid spacing. A six-tap FIR filter generates the half-pel samples, and bilinear interpolation is then applied to construct the quarter-pel samples. Fig. 3 shows the half-pel interpolation.
The luminance value of the half-pel position labeled b is generated by applying the six-tap filter to the nearest integer-position samples in the horizontal direction [1]. If half-pel interpolation is performed near the ROI boundary, it requires integer samples outside the ROI. As shown in Fig. 3, the half-pel labeled b requires the integer samples labeled E and F, which are located outside the ROI. Therefore, if only the ROI is decoded without the background, a mismatch between encoding and decoding occurs at the interpolated half-pel [7]. To avoid this problem, the encoder and decoder must agree on how integer samples outside the ROI are referenced.
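The six-tap filter referred to here is the H.264/AVC luma half-sample filter, b = Clip1((E - 5F + 20G + 20H - 5I + J + 16) >> 5), where E to J are the six integer samples in a row around the half-pel position. The sketch below pairs it with a conservative footprint check that decides whether a quarter-pel motion vector reads only integer samples on the permitted side of the ROI boundary. This is one possible way to realize such an agreement, assuming 8-bit samples and a rectangular ROI given in pixels as (x0, y0, x1, y1) with exclusive right and bottom bounds; the function names are hypothetical.

```python
def half_pel(row, x):
    """Half-sample value between row[x] and row[x + 1]; taps E..J are row[x-2..x+3]."""
    e, f, g, h, i, j = row[x - 2:x + 4]
    b1 = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return min(255, max(0, (b1 + 16) >> 5))  # Clip1 for 8-bit samples


def footprint(x, y, w, h, mv_qx, mv_qy):
    """Conservative integer-sample box (x0, y0, x1, y1), exclusive ends, read by a
    w*h block at (x, y) displaced by a quarter-pel motion vector (mv_qx, mv_qy)."""
    ix, fx = x + (mv_qx >> 2), mv_qx & 3  # integer part and quarter-pel fraction
    iy, fy = y + (mv_qy >> 2), mv_qy & 3
    x0, x1 = (ix - 2, ix + w + 3) if fx else (ix, ix + w)  # 6 taps reach -2..+3
    y0, y1 = (iy - 2, iy + h + 3) if fy else (iy, iy + h)
    return x0, y0, x1, y1


def mv_allowed(roi, in_roi, x, y, w, h, mv_qx, mv_qy):
    """ROI blocks may read only ROI samples; background blocks may not read any."""
    x0, y0, x1, y1 = footprint(x, y, w, h, mv_qx, mv_qy)
    rx0, ry0, rx1, ry1 = roi
    if in_roi:
        return rx0 <= x0 and ry0 <= y0 and x1 <= rx1 and y1 <= ry1
    return x1 <= rx0 or rx1 <= x0 or y1 <= ry0 or ry1 <= y0


roi = (16, 16, 48, 48)
# A half-pel vector at the left ROI edge needs samples E and F outside the ROI.
edge_case = mv_allowed(roi, True, 16, 16, 16, 16, 2, 0)
# A background block just above the ROI with a vertical half-pel vector reads ROI rows.
opposite_case = mv_allowed(roi, False, 16, 0, 16, 16, 0, 2)
```

Both example vectors are rejected, while the same fractional vectors remain usable away from the boundary, so the restriction only bites at the ROI edge.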

In the opposite case, half-pels in the background need integer samples in the ROI. As shown in Fig. 3, the half-pel t requires the integer samples labeled G and H, which are located in the ROI. Once the integer samples in the ROI are encrypted as described above, the decoder also uses these encrypted samples to obtain t during motion compensation. Noise is then introduced into the background, so the encryption is no longer confined to the ROI in the decoded SVC video.
To solve these problems, motion estimation with half- and quarter-pel accuracy should be avoided at the ROI boundary. This may also decrease coding efficiency.

Figure 3. Half-pel interpolation on the ROI boundary.

3) Upsampling of Intra_BL mode on the ROI boundary
Intra_BaseLayer (Intra_BL) mode performs inter-layer intra texture prediction for intra-coded macroblocks [1]: the encoder predicts the intra texture of the enhancement layer from that of the base layer. When the spatial resolution of the base layer is half that of the enhancement layer in both the horizontal and vertical directions, the intra texture of the base layer must be upsampled. The interpolator used to generate half-pels is also used for this upsampling; therefore, samples outside the ROI are referenced at the ROI boundary [7].
The approach to this problem is similar to that for half-pel interpolation: to handle Intra_BL mode at the ROI boundary, upsampling across the boundary of the ROI can be prevented, which may also decrease coding efficiency. For macroblocks in inter-layer residual texture prediction mode, residual textures are reconstructed with a bilinear interpolation filter that refers to no samples outside each macroblock, so no error occurs in inter-layer residual texture prediction [7].

IV. APPLICATION BY USING ENCRYPTED ROIS IN SVC
Fig. 4 shows the system layout of applications that use encrypted ROIs in SVC. The ROIs can be consumed independently on appropriate user devices such as digital TVs, PCs, PDAs, and cellular phones. If a user requests higher-quality video in terms of spatial, temporal, or quality scalability, SVC can provide the ROIs at a suitable scalability level. The ROIs are encrypted so that only users holding a license for the protected ROIs can enjoy the content, whereas malicious attackers cannot access the ROIs because they have no keys. In addition, the background area outside the ROIs also has its own slice group id (the last one, slice group 2 in the example of Fig. 1 (a)), so the background can be encrypted in the same way as an ROI. Thus, the entire picture can be encrypted and protected with the proposed methods.

Figure 4. Applications by using the encrypted ROIs in SVC.

V. EXPERIMENTAL RESULTS
We implemented the proposed method in JSVM version 6.0 and performed a functional verification of independent ROI encryption. The SVC test sequence Foreman was used, encoded with a two-layer configuration, {QCIF, 15 fps} and {CIF, 30 fps}, and one ROI. The ROI size is 80 × 64 pixels at QCIF resolution.
Fig. 5 shows decoded results with and without constrained motion estimation. In Fig. 5 (a), the noise pattern of the encrypted macroblocks spreads into the background. In contrast, as shown in Fig. 5 (b), constraining the motion search range separately for the ROI and the background confines the encryption to the ROI.
Fig. 6 shows decoded pictures with and without boundary handling for half-pel interpolation. As shown in Fig. 6 (a), distinct errors appear near the ROI as vertical stripes. They are caused by the half- and quarter-pels of the boundary macroblocks, and they drift along with the motion vectors. In Fig. 6 (b), these errors do not occur because half- and quarter-pel prediction is restricted at the ROI boundary, which yields video with a clean ROI boundary.

Figure 5. Comparison of (a) unconstrained motion estimation and (b) constrained motion estimation.

Figure 6. Comparison of restricted half-pel and quarter-pel interpolation on the ROI boundary: (a) not applied, and (b) applied.

Fig. 7 shows decoded pictures with and without boundary handling for upsampling. Noise clearly appears outside the ROI in Fig. 7 (a). By applying the proposed method at the ROI boundary, a result with no noise in the background is obtained, as shown in Fig. 7 (b).

Figure 7. Comparison of restricted Intra_BL mode on the ROI boundary: (a) not applied, and (b) applied.

Fig. 8 shows the visual patterns of encrypted videos when only texture, only motion vector, only FGS, and all three combined are encrypted, respectively. Table I lists the luminance PSNRs of the decoded pictures shown in Fig. 8; the PSNR of Fig. 8 (e) is the lowest.

Figure 8. Comparison of visual patterns of encrypted video data in terms of texture, motion, and FGS: (a) original picture, (b) picture with encrypted texture data, (c) picture with encrypted motion vector data, (d) picture with encrypted FGS data, and (e) picture with all data encrypted.

TABLE I
PSNR OF DECODED PICTURES IN FIG. 8

Encrypted data                                PSNR Y (dB)
None (original)                               38.3728
Texture                                       22.3170
Motion vector difference                      29.3865
FGS                                           35.067
Texture + Motion vector difference + FGS      20.2982

Table II shows the decrease in coding efficiency caused by the three coding schemes for ROI encryption, for various ROI sizes. For the constrained motion search range and the Intra_BL upsampling handling, the loss grows as the ROI becomes larger. Constraining the motion search range causes the largest loss, followed by constraining half- and quarter-pel interpolation on the ROI boundary; handling the upsampling of Intra_BL mode causes the smallest loss among the coding schemes investigated in this work.

TABLE II
THE DECREMENT OF CODING EFFICIENCY ACCORDING TO HANDLING THE CONSTRAINED CODING SCHEMES AND VARIOUS ROI SIZES (%)

Constrained coding scheme                     48x48    64x64    80x80
Motion search range                           4.6      4.65     5.38
Half-pel and quarter-pel interpolation        4.5      4.53     4.23
Upsampling of Intra_BL mode                   1.88     2.44     2.69

VI. CONCLUSION

In this paper, we proposed an efficient encryption method for the defined ROI and three coding schemes that confine the visual effects of the encryption to the ROI in SVC. For ROI encryption, we analyzed the requirements for a secured ROI and the corresponding coding schemes in SVC. The experimental results show that the proposed method encrypts the ROI effectively. In future work, we will aim to minimize the degradation of coding efficiency caused by the constrained coding schemes.

REFERENCES

[1] ISO/IEC JTC 1/SC 29/WG 11, "Joint Scalable Video Model (JSVM) 6.0 Reference Encoding Algorithm Description," N 8015, Switzerland, April 2006.
[2] S. Lian, Z. Liu, Z. Ren, and H. Wang, "Secured Advanced Video Coding Based on Selective Encryption Algorithms," IEEE Trans. on Consumer Electronics, vol. 52, no. 2, pp. 621-629, 2006.
[3] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
[4] Y. G. Won, T. M. Bae, and Y. M. Ro, "Scalable Protection and Access Control in Full Scalable Video Coding," LNCS, vol. 4283, pp. 407-421, Nov. 2006.
[5] M. Grangetto, E. Magli, and G. Olmo, "Multimedia Selective Encryption by Means of Randomized Arithmetic Coding," IEEE Trans. on Multimedia, vol. 8, no. 5, pp. 905-917, Oct. 2006.
[6] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in H.264/AVC Video Compression Standard," IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620-636, July 2003.
[7] T. M. Bae, T. C. Thang, D. Y. Kim, Y. M. Ro, J. W. Kang, and J. K. Kim, "Multiple Region-of-Interest Support in Scalable Video Coding," ETRI Journal, vol. 28, no. 2, April 2006.
