Abstract: Most existing video streaming systems employ worst-case analysis when dimensioning application layer buffers. Although worst-case buffer dimensioning provides the deterministic quality of service (QoS) guarantees that are desirable in multimedia transmission, it also over-provisions scarce memory resources. In this paper, we propose a dynamic technique for buffer and rate allocation under two scenarios: 1) when the channel conditions are known a priori, and 2) when the channel conditions are unknown. Simulation results show up to an order of magnitude savings in the application layer buffer requirements for the two scenarios considered. Furthermore, a priori knowledge of the channel conditions at the application layer yields improved video quality.

Index Terms: Buffer dimensioning, video streaming, wireless networks, rate control.

I. INTRODUCTION

The H.264 video format is the latest state-of-the-art international video coding standard developed by the Joint Video Team (JVT) of ITU-T and ISO/IEC [1], and it is also useful for video streaming. For video streaming over wireless networks, high efficiency can be achieved by making the mobile devices adapt to the network conditions (e.g., the real-time channel conditions, available transmission bandwidth, traffic load, desired spatial or temporal resolution, delay allowance, error resilience, and so forth).

The dynamic content of video frames makes the bit-rate of the encoded video variable in nature, necessitating buffer management at the application layers of the transcoder and decoder. To avoid long delays in real-time video streaming, the transcoder and decoder buffer sizes are usually limited. However, with smaller buffer sizes, there is an inherent risk of packet-dropping at the application layer. Moreover, on client devices, memory is an important contributor to the overall power budget [2]. Hence, an optimized solution is required at the application layer to balance the tradeoff between packet delay and packet dropping. The key to application layer buffer management is the rate-control scheme employed. A rate-control scheme determines the optimum encoding rate, which is used during the video compression process for adjusting the coding parameters, e.g., the quantization parameter (QP), to prevent the application layer buffers from overflow or underflow.

The motivation for this work comes from the fact that dynamic buffer management for video coding is not very well studied in the literature, mainly because of the variable sizes of video frames. The fixed group of pictures (GOP) size causes the periodic inclusion of intra (I-) frames for MPEG-4 video sequences, thus making the video frame sizes variable in nature. In the case of the H.264 video encoding scheme, where only a single or a very few I-frames are used to encode the video sequence [1], the generated bits per frame are relatively similar for successive frames, unlike those for the MPEG-4 scheme. Therefore, the transcoder and decoder buffer dynamics need to be revisited to obtain optimized sizes for the application layer buffers, under the constraints of avoiding buffer overflow and underflow. For example, [3] employed transcoding ratio constraints to avoid overflow and underflow of the transcoder and decoder buffers, regardless of the video encoding scheme but assuming fixed buffer sizes. It is shown in this paper that dynamic buffer sizes, used in conjunction with encoding rate control, help prevent buffer overflow and underflow. The advantage of dynamic buffer sizes over fixed buffer sizes is the application layer memory savings.

The main contribution of this paper is the proposal of a technique for dynamic buffer and rate control management, studied with and without a priori knowledge of channel information at the application layer. Aside from the memory savings, it is also shown that a priori knowledge of channel information at the application layer enhances the video quality. The problem is formulated as an optimization problem, where the goal is to minimize the distortion without violating the buffer constraints. Throughout the paper, a packet refers to an IEEE 802.11 data-link layer protocol data unit, whereas a frame denotes a video frame at the application layer.

The paper is organized as follows. In Section II, the preliminaries of the analysis are first discussed, followed by a formal definition of the problem. Section III contains the proposed solution scenarios for dynamically controlling the transcoder and decoder buffers. In Section IV, simulation results are presented, and the paper concludes in Section V.

A. Preliminaries
A2. The decoder waits for D video frames in its buffer before starting the decoding process. It is necessary to keep a certain minimum threshold number of frames in the decoder buffer to provide a cushion against any blackout periods, in the case of buffer underflow.

A3. The decoder buffer is considered empty when there are only D frames in the buffer. The deadline time, during which the next frame should arrive, is assumed to meet the criteria of maintaining the threshold of D frames in the [...]

[...] $r'(t) = \beta(t)\,r(t)$, where $\beta(t)$ is a scaling function. After a video frame y is processed at the transcoder, the total number of bits generated $R_{bg}^{(y)}(T)$ at the buffer, during a video frame interval time T, is calculated by:

$$R_{bg}^{(y)}(T) = \int_{(y-1)T}^{yT} r'(t)\,dt, \tag{2}$$

[...]

$$R_{bt}^{(y)}(T) = \int_{(y-1)T}^{yT} R_c(t)\,dt. \tag{3}$$

Specifically, the transcoder buffer occupancy after transcoding y frames is given as:

$$B_t(yT) = \int_{0}^{yT} \left( r'(h) - R_c(h) \right) dh. \tag{5}$$

This can also be written in discrete form as:

$$B_t(yT) = \sum_{j=1}^{y} \left[ R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right], \tag{6}$$

According to assumption A2, the decoder waits for D frames before starting the decoding process; this corresponds to a delay of DT seconds. The initial decoder buffer occupancy at t = DT is calculated by:

$$B_d(DT) = \sum_{j=1}^{D} R_{bt}^{(j)}(T), \tag{9}$$
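As a rough illustration of the discrete relations (6) and (9), the following Python sketch accumulates per-frame bit counts into buffer occupancies. The function names and the sample bit counts are hypothetical and chosen only for illustration. $R_{bt}^{(j)}(T)$ is read here as the bits drained by the channel in frame interval j, following Eq. (3). No clipping at zero is applied, since Eq. (6) itself does not clip.

```python
from itertools import accumulate


def transcoder_occupancy(bits_generated, bits_transmitted):
    """Return [B_t(1T), ..., B_t(NT)] as running sums of (R_bg - R_bt), per Eq. (6)."""
    if len(bits_generated) != len(bits_transmitted):
        raise ValueError("per-frame sequences must have equal length")
    per_frame_net = (g - t for g, t in zip(bits_generated, bits_transmitted))
    return list(accumulate(per_frame_net))


def initial_decoder_occupancy(bits_transmitted, cushion_frames):
    """Return B_d(DT): bits received during the first D frame intervals, per Eq. (9)."""
    return sum(bits_transmitted[:cushion_frames])


# Hypothetical per-frame bit counts (illustrative values, not from the paper).
r_bg = [9000, 3000, 3500, 2800]  # bits generated by the transcoder per interval
r_bt = [4000, 4000, 4000, 4000]  # bits transmitted over the channel per interval

occupancy = transcoder_occupancy(r_bg, r_bt)
startup_bits = initial_decoder_occupancy(r_bt, cushion_frames=3)
```

With a large first frame followed by smaller ones, the occupancy peaks after the first interval and then drains, which is the behaviour the buffer limits in Section III are meant to bound.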
[...] decoder buffer occupancy is a function of the initial buffer occupancy and accumulated bits at the decoder buffer.

5) Channel Estimation

As given in [4] and [5], for an IEEE 802.11 wireless channel, the channel information can be estimated at the data-link layer using the number of transmission attempts. Each transmission attempt at the data-link layer costs a round-trip time (RTT), which is a measure of the delay on the network. Because of the RTT cost, the maximum number of transmission attempts (Rmax) is limited for time-sensitive applications, such as video streaming. According to [5], if the number of transmission attempts reaches Rmax, this indicates a bad network condition. The typical Rmax value for an IEEE 802.11 based wireless network is 4 [6]. In this paper, we introduce three thresholds, L1, L2 and L3, on the number of packet transmission attempts for defining the state of the channel. We assume that the threshold L1 = 1 transmission attempt indicates a good channel. The threshold L2 = 2 packet transmission attempts indicates a moderate channel condition. Finally, the threshold L3 = 3 or 4 packet transmission attempts denotes a bad channel; this setting is consistent with [4] and [5]. The channel information (i.e., good, moderate or bad channel condition) is available after a successful transmission of each data-link layer packet, and this information is used for encoding the next video frame.

B. The Optimization Problem

Define a vector G, which denotes the application layer parameters:

$$G = \left\{ B_t(yT),\; B_d(yT),\; R_{bg}^{(y)}(T) \right\}, \tag{11}$$

where $B_t(yT)$, $B_d(yT)$ and $R_{bg}^{(y)}(T)$ are given by eqns. (7), (10) and (2), respectively.

Problem P1:

$$\begin{aligned} &\arg\min_{(G)} \left\{ \mathrm{MSE}(y) \right\}, \\ &\text{subject to:} \\ &\quad 1)\;\; 0 < B_d(yT) \le B_d^{\max}, \\ &\quad 2)\;\; 0 < B_t(yT) \le B_t^{\max}, \end{aligned} \tag{12}$$

where MSE(y) is given by eqn. (1). According to (12), the goal is to find the application layer parameter vector G for which the video distortion is minimized without violating the buffer constraints.

III. SOLUTION SCENARIOS

The problem P1 is solved by considering two scenarios.

A. Scenario 1: Without Knowledge of Channel Information

When the channel information is not known at the application layer, the transcoding rate cannot be adapted to the channel. The problem P1 is then solved to determine the optimum sizes of the transcoder and decoder buffers, subject to non-occurrence of buffer underflow and overflow. For the decoder buffer, it is important to note that both underflow and overflow are critical: the former leads to terminal screen blackout due to packet starvation, while the latter causes packet-dropping, eventually leading to video jerks. In the case of the transcoder, buffer overflow is more critical than underflow because overflow leads to packet-dropping, hence resulting in quality loss. Conversely, transcoder buffer underflow would not cause much harm, as the decoder still carries a cushion of packets (assumption A2) to be displayed at the terminal.

The decoder buffer underflow can be avoided if $0 < B_d(yT)$. Applying (10) and rearranging the terms, the buffer underflow constraint becomes:

$$\sum_{j=1}^{y} \left[ R_{br}^{(j)}(T) - R_{bt}^{(D+j)}(T) \right] < B_d(DT). \tag{13}$$

The expression given in (13) is the key to finding the threshold number of video frames (D) that the decoder must keep in the buffer to avoid any underflow. From knowledge of the fixed video rates, and by combining equations (9) and (13), the minimum value of D can be determined. Similarly, from (10), the decoder buffer overflow can be avoided if $B_d(yT) \le B_d^{\max}$. This can also be written as:

$$B_d(DT) + \sum_{j=1}^{y} \left[ R_{bt}^{(D+j)}(T) - R_{br}^{(j)}(T) \right] \le B_d^{\max}. \tag{14}$$

Once D is determined from equations (9) and (13), the value of $B_d^{\max}$ can be found using equation (14).

At the transcoder side, the overflow constraint is given by $B_t(yT) \le B_t^{\max}$, which can also be written as:

$$B_t((y-1)T) + R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T) \le B_t^{\max}. \tag{15}$$

From knowledge of the transcoding rate, the channel rate and the previous buffer occupancy, the limit on the transcoder buffer can be determined using (15).

B. Scenario 2: With Known Channel Information

When the channel information is known, the optimization problem P1 is solved by controlling the transcoding rate in the vector G, in conjunction with varying the buffer sizes. In the following, we propose a three-step strategy to solve the problem P1.

1) Step 1:

Regardless of the video coding standard used, the source encoding parameters (e.g., target bit-rate and quantization parameter) are pre-estimated for the required bit-budget allocation. For example, in the H.264 standard, an estimate of the bit-budget is made to distribute the available bits to each frame [1], assuming empty buffers at the application layer.
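The Scenario 1 sizing of Section III.A, i.e., constraints (13) to (15), can be sketched as follows under stated assumptions: $R_{br}^{(j)}(T)$ is taken to be the bits removed from the decoder buffer for frame j (its definition is not reproduced in this text), all per-frame bit counts are assumed known in advance, and the helper names and sample values are hypothetical.

```python
from itertools import accumulate


def decoder_occupancy(r_bt, r_br, d):
    """Decoder occupancy B_d at t = DT and after each decoded frame.

    Starts from Eq. (9), then adds arrivals R_bt^(D+j) and subtracts
    removals R_br^(j). Frames without a matching arrival are ignored.
    """
    occupancy = [sum(r_bt[:d])]  # B_d(DT), Eq. (9)
    for arrived, removed in zip(r_bt[d:], r_br):
        occupancy.append(occupancy[-1] + arrived - removed)
    return occupancy


def min_cushion(r_bt, r_br):
    """Smallest D keeping the occupancy positive at every step (Eq. (13)), else None."""
    for d in range(1, len(r_bt) + 1):
        if all(level > 0 for level in decoder_occupancy(r_bt, r_br, d)):
            return d
    return None


def decoder_buffer_limit(r_bt, r_br, d):
    """Smallest B_d^max that satisfies Eq. (14) for the given sequences."""
    return max(decoder_occupancy(r_bt, r_br, d))


def transcoder_buffer_limit(r_bg, r_bt):
    """Smallest B_t^max that satisfies Eq. (15), starting from an empty buffer."""
    return max(accumulate(g - t for g, t in zip(r_bg, r_bt)))


# Hypothetical bit counts (illustrative only).
r_bg = [9000, 3000, 3500, 2800]              # generated at the transcoder
r_bt = [4000, 4000, 4000, 4000, 4000, 4000]  # delivered by the channel
r_br = [9000, 3000, 3500, 2800]              # removed by the decoder

d_min = min_cushion(r_bt, r_br)
bd_max = decoder_buffer_limit(r_bt, r_br, d_min)
bt_max = transcoder_buffer_limit(r_bg, r_bt)
```

In this example a cushion of one frame would let the large first frame drain the decoder buffer below zero, so the search settles on two frames.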
2) Step 2:

Find the optimal buffer sizes for which the distortion can be minimized, as given in Section III.A. This sets an upper bound on the sizes of the transcoder and decoder buffers for which the optimization is achieved.

3) Step 3:

For the given frame, after capping the transcoder and decoder buffer sizes to the fixed values determined in Step 2, the new transcoding rates are calculated to ensure that the constraints are not violated. It is proposed here to further reduce the video transcoding rate if the channel condition is bad. This will not only help reduce the load on the network, but also smooth out the transcoded video stream. For the moderate channel, it is proposed to use the calculated video bit-rate as is, to take full advantage of the current channel state. When the channel condition is good, the target bit-rate is increased to exploit the good channel condition for higher video quality.

When error correction mechanisms, e.g., joint forward error correction (FEC) and automatic repeat request (ARQ) [7], are used for video streaming over wireless networks, the packet transmission information is readily available at the data-link layer. In this paper, we use a cross-layer signaling strategy to convey the transmission information, and hence the channel condition, to the application layer, where the transcoder utilizes this information for video transcoding. An algorithm for refining the calculated target transcoding rate is given as follows:

Algorithm I: Refining the Calculated Target Transcoding Rate
Input: number of transmission attempts = L, $R_{bg}^{(y)}(T)$
Output: $R_{bg}^{(y)}(T)$
Begin
    if (L = L1)
    {   /* channel state = Good */
        $R_{bg}^{(y)}(T) = 1.2\,R_{bg}^{(y)}(T)$
    }
    else if (L = L2)
    {   /* channel state = Moderate */
        $R_{bg}^{(y)}(T) = R_{bg}^{(y)}(T)$
    }
    else if (L = L3)
    {   /* channel state = Bad */
        $R_{bg}^{(y)}(T) = 0.8\,R_{bg}^{(y)}(T)$
    }
End

where L1, L2 and L3 are given in Section II.A.5. Note that the multiplication factors in Algorithm I are empirically determined values that best suit the channel conditions. For the good channel condition, a larger value (>1.2) of the multiplication factor would disturb the pre-calculated bit-budget allocation in the H.264 encoder [1], and hence should be avoided. Also, a lower value (<0.8) of the multiplication factor for the bad channel condition would distort the video quality.

IV. SIMULATION RESULTS

We used the JM reference software [8] to transcode the video sequence, while an NS2 [9] based simulation is used for video streaming. The first frame of the video sequence is encoded as an I-frame, while subsequent frames are P-frames (i.e., predicted frames). Rate-distortion (RD) optimization was enabled, and context-adaptive binary arithmetic coding (CABAC) was used for the entropy coding. The video frame rate is set to 30 frames per second. Without loss of generality, an IEEE 802.11b link is selected between the access point (AP) and the client device, as shown in Fig. 1. A video frame is divided into data-link layer packets of equal size, such that a packet does not exceed 1000 bytes. A joint FEC/ARQ scheme, as described in our previous work [7], is implemented at the data-link layer for error correction. Three video sequences, Foreman, Container and Akiyo, are selected from the high, medium and low motion categories, respectively. Three test cases are considered: 1) no buffer and rate control scheme is applied, 2) buffer control is applied when no channel information is available, and 3) buffer and rate control schemes are applied when the channel information is available. The maximum size of the decoder buffer ($B_d^{\max}$) is set equal to five times the size of the largest frame in the video sequence, i.e., the I-frame. The cushion value of 3 video frames in the decoder buffer gives sufficient protection against any blackout period.

A. Improvement in Video Quality

Peak signal-to-noise ratio (PSNR) is used as the measure of objective video quality. The quality of the wireless channel is determined from the data-link layer packet error probability ($\mu$), which is varied from $10^{-4}$ to $10^{-1}$, i.e., going from a good to a bad channel condition. As shown in Fig. 2, when the channel condition deteriorates, there is a general trend of lower PSNR, as expected. For all the video sequences, under the bad channel condition ($\mu = 10^{-1}$), a 0.5 dB to 1 dB increase in PSNR is observed when both the buffer and rate control schemes are used. This is attributed to the fact that, for the bad channel condition, there is a high probability of [...]

Figure 1: Simulation model
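Algorithm I, combined with the thresholds of Section II.A.5, can be sketched in Python as follows. The function and constant names are hypothetical. The factors 1.2, 1.0 and 0.8 and the mapping of 1, 2, and 3 or 4 attempts to good, moderate and bad are taken from the text, while rejecting attempt counts outside 1..Rmax is an added assumption.

```python
R_MAX = 4  # typical maximum number of transmission attempts for IEEE 802.11 [6]

# Empirical multiplication factors from Algorithm I.
RATE_FACTOR = {"good": 1.2, "moderate": 1.0, "bad": 0.8}


def channel_state(attempts, r_max=R_MAX):
    """Map the number of data-link transmission attempts to a channel state."""
    if not 1 <= attempts <= r_max:
        # Assumption: counts outside 1..Rmax are treated as invalid input.
        raise ValueError(f"attempts must be in 1..{r_max}, got {attempts}")
    if attempts == 1:   # threshold L1
        return "good"
    if attempts == 2:   # threshold L2
        return "moderate"
    return "bad"        # threshold L3: 3 or 4 attempts


def refine_target_rate(attempts, target_bits):
    """Scale the pre-calculated per-frame bit target according to the channel state."""
    return RATE_FACTOR[channel_state(attempts)] * target_bits


# Hypothetical bit target of 10000 bits for the next frame.
refined = {n: refine_target_rate(n, 10000) for n in range(1, R_MAX + 1)}
```

Because the channel state only becomes available after a packet has been delivered, the refined target applies to the next video frame rather than the one just sent.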
[Figure: bar chart by video sequence (Foreman, Container, Akiyo); x-axis: Video Sequence; legend includes "No Control"]