Application Layer Optimization For Efficient Video

APPLICATION LAYER OPTIMIZATION FOR EFFICIENT VIDEO
STREAMING OVER IEEE 802.11 BASED WIRELESS NETWORKS

Azfar Moid and Abraham O. Fapojuwo
Department of Electrical and Computer Engineering
University of Calgary, AB, Canada, T2N 1N4
{amoid@ucalgary.ca, fapojuwo@ucalgary.ca}
Abstract: Most existing video streaming systems employ worst-case analysis in application layer buffer size dimensioning. Although worst-case buffer size dimensioning provides the deterministic quality of service (QoS) guarantees that are desirable in multimedia transmission, it also over-provisions the scarce memory resources. In this paper, we propose a dynamic technique for buffer and rate allocation under two scenarios: 1) when the channel conditions are known a priori, and 2) when the channel conditions are unknown. Simulation results show up to an order of magnitude savings in the application layer buffer requirements for the two scenarios considered. Furthermore, a priori knowledge of the channel conditions at the application layer yields an improved video quality.

Index Terms: Buffer dimensioning, Video streaming, Wireless networks, Rate-control.

I. INTRODUCTION
The H.264 video format is the latest state-of-the-art international video coding standard developed by the Joint Video Team (JVT) of ITU-T and ISO/IEC [1], and it is also useful for video streaming purposes. For video streaming over wireless networks, high efficiency can be achieved by making the mobile devices adapt to the network conditions (e.g., the real-time channel conditions, available transmission bandwidth, traffic load, desired spatial or temporal resolution, delay allowance, error resilience, and so forth).

The dynamic content of video frames makes the bit-rate of the encoded video variable in nature, necessitating buffer management at the application layers of the transcoder and decoder. To avoid long delays in real-time video streaming, the transcoder and decoder buffer sizes are usually limited. However, with smaller buffer sizes, there is an inherent risk of packet-dropping at the application layer. Moreover, on client devices, the memory is an important contributor to the overall power budget [2]. Hence, an optimized solution is required at the application layer to balance the tradeoff between packet-delay and packet-dropping. The key to application layer buffer management is the rate-control scheme employed. A rate control scheme determines the optimum encoding rate, which is used during the video compression process for adjusting the coding parameters, e.g., the quantization point (QP), to prevent the application layer buffers from overflow or underflow.

The motivation for this work comes from the fact that dynamic buffer management for video coding is not very well studied in the literature, mainly because of the variable sizes of video frames. The fixed group of pictures (GOP) size causes the periodic inclusion of intra (I-) frames for MPEG-4 video sequences, thus making the video frame sizes variable in nature. In the case of the H.264 video encoding scheme, where only a single or a very few I-frames are used to encode the video sequence [1], the generated bits per frame are relatively identical for successive frames, unlike those for the MPEG-4 scheme. Therefore, there is a need to revisit the transcoder and decoder buffer dynamics, to get the optimized sizes of the application layer buffers, under the constraints of avoiding buffer overflow and underflow. For example, Reference [3] employed the transcoding ratio constraints to avoid overflow and underflow of the transcoder and decoder buffers, regardless of the video encoding scheme but assuming fixed buffer sizes. It is shown in this paper that dynamic buffer sizes, used in conjunction with controlling the encoding rate, help prevent buffer overflow and underflow. The advantage of dynamic buffer sizes over fixed buffer sizes is the application layer memory savings.

The main contribution of this paper is the proposal of a technique for dynamic buffer and rate control management, studied with and without a priori knowledge of channel information at the application layer. Aside from the memory savings, it is also shown that a priori knowledge of channel information at the application layer enhances the video quality. The problem is formulated as an optimization problem, where the goal is to minimize the distortion without violating the buffer constraints. Throughout the paper, a packet refers to an IEEE 802.11 data-link layer protocol data unit, whereas a frame denotes a video frame at the application layer.

The paper is organized as follows. In Section II, the preliminaries of the analysis are first discussed, followed by a formal definition of the problem. Section III contains the proposed solution scenarios for dynamically controlling the transcoder and decoder buffers. In Section IV, simulation results are presented, and the paper concludes in Section V.
978-1-4244-3508-1/09/$25.00 ©2009 IEEE

II. PROBLEM FORMULATION
A. Preliminaries
1) Model Assumptions
The system model assumptions are as follows:
A1. The maximum size of transcoder and decoder buffers is limited and denoted by $B_t^{\max}$ and $B_d^{\max}$ (in bits), respectively.
A2. The decoder waits for D video frames in its buffer before starting the decoding process. It is necessary to keep a certain minimum threshold number of frames in the decoder buffer to provide a cushion against any blackout periods, in the case of buffer underflow.
A3. The decoder buffer is considered empty when there are only D frames in the buffer. The deadline time, during which the next frame should arrive, is assumed to meet the criteria of maintaining the threshold of D frames in the decoder buffer.
A4. Transcoder and decoder buffers are empty at the startup time t = 0, i.e., $B_t(t=0) = 0$ and $B_d(t=0) = 0$, respectively. Here, $B_t(t)$ and $B_d(t)$ are respectively the transcoder and decoder buffer occupancy at time t.

2) Video Distortion
Video distortion is a measure of the pixel quality of the received video as compared to the transmitted video. For a given frame y, it is usually estimated as the mean-square error (MSE) value of the difference between the pixel value ($f$) of the transmitted frame and the pixel value ($\hat{f}$) of the received frame, as given in (1):

$$\mathrm{MSE}^{(y)} = \sum_{z=1}^{N_{SL}} \sum_{s=1}^{N_{MB}} \sum_{g=1}^{N_{PX}} E\left[ \left\{ f_{z,s,g}^{(y)} - \hat{f}_{z,s,g}^{(y)} \right\}^{2} \right]. \tag{1}$$

In (1), $N_{SL}$ is the number of slices in frame y, $N_{MB}$ is the number of macro-blocks in a slice, and $N_{PX}$ represents the number of pixels in a macro-block.

3) Transcoder Buffer
Let $r(t)$ denote the incoming video bit-rate (in bits/sec) at the transcoder input, $r'(t)$ denote the bit-rate (bits/sec) of the transcoded video, and $R_c(t)$ denote the channel bit-rate (bits/sec). The transcoded video bit-rate can be written as $r'(t) = \beta(t)\, r(t)$, where $\beta(t)$ is a scaling function. After a video frame y is processed at the transcoder, the total number of bits generated $R_{bg}^{(y)}(T)$ at the buffer, during a video frame interval time T, is calculated by:

$$R_{bg}^{(y)}(T) = \int_{(y-1)T}^{yT} r'(t)\,dt, \tag{2}$$

where y is the video frame index and T is the frame inter-arrival time.
Similarly, the transmitted bits $R_{bt}^{(y)}(T)$ from the transcoder buffer, during the interval $(y-1)T$ to $yT$, is:

$$R_{bt}^{(y)}(T) = \int_{(y-1)T}^{yT} R_c(t)\,dt. \tag{3}$$

By assumption A4, the instantaneous transcoder buffer occupancy at any time t can be calculated as:

$$B_t(t) = \int_{0}^{t} \left( r'(h) - R_c(h) \right) dh. \tag{4}$$

Specifically, the transcoder buffer occupancy after transcoding y frames is given as:

$$B_t(yT) = \int_{0}^{yT} \left( r'(h) - R_c(h) \right) dh. \tag{5}$$

This can also be written in discrete form as:

$$B_t(yT) = \sum_{j=1}^{y} \left[ R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right], \tag{6}$$

where j is the frame index.
The expression in (6) shows that the buffer occupancy after transcoding the yth frame is just the summation of all the accumulated bits during the interval 0 to yT at the transcoder buffer. Equation (6) can also be written in a recursive manner, where the current buffer occupancy after transcoding the yth frame is written in the form of buffer occupancy after transcoding the (y-1)th frame:

$$\begin{aligned} B_t(yT) &= \sum_{j=1}^{y-1} \left[ R_{bg}^{(j)}(T) - R_{bt}^{(j)}(T) \right] + \left[ R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T) \right] \\ &= B_t\big((y-1)T\big) + \left[ R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T) \right]. \end{aligned} \tag{7}$$

4) Decoder Buffer
Let $r''(t)$ denote the rate (in bits/sec) of rendering the video sequence to the user terminal. The number of bits rendered $R_{br}^{(y)}(T)$ to the video terminal during the interval $(y-1)T$ to $yT$ is given as:

$$R_{br}^{(y)}(T) = \int_{(y-1)T}^{yT} r''(t)\,dt. \tag{8}$$

According to assumption A2, the decoder waits for D frames before starting the decoding process; this corresponds to a delay of DT seconds. The initial decoder buffer occupancy at t = DT is calculated by:

$$B_d(DT) = \sum_{j=1}^{D} R_{bt}^{(j)}(T). \tag{9}$$

In general, the decoder buffer occupancy after decoding the yth frame is given by:

$$B_d(yT) = B_d(DT) + \sum_{j=1}^{y} \left[ R_{bt}^{(D+j)}(T) - R_{br}^{(j)}(T) \right]. \tag{10}$$

The expression given in (10) shows that the instantaneous
decoder buffer occupancy is a function of the initial buffer occupancy and accumulated bits at the decoder buffer.

5) Channel Estimation
As given in [4] and [5], for an IEEE 802.11 wireless channel, the channel information can be estimated at the data-link layer using the number of transmission attempts. Each transmission attempt at the data-link layer costs a round-trip time (RTT), which is a measure of the delay on the network. Because of the RTT cost, the maximum number of transmission attempts ($R_{\max}$) is limited for time-sensitive applications, such as video streaming. According to [5], if the number of transmission attempts reaches $R_{\max}$, this indicates a bad network condition. The typical $R_{\max}$ value for an IEEE 802.11 based wireless network is 4 [6]. In this paper, we introduce three thresholds $L_1$, $L_2$ and $L_3$ of packet transmission attempts for defining the state of the channel. We assume the threshold $L_1$ = 1 transmission attempt indicates a good channel. The threshold $L_2$ = 2 packet transmission attempts indicates a moderate channel condition. Finally, the threshold $L_3$ = 3 or 4 packet transmission attempts denotes a bad channel; this setting is consistent with [4] and [5]. The channel information (i.e., good, moderate or bad channel condition) is available after a successful transmission of each data-link layer packet and this information is used for encoding the next video frame.

B. The Optimization Problem
Define a vector G, which denotes the application layer parameters:

$$G = \left\{ B_t(yT),\; B_d(yT),\; R_{bg}^{(y)}(T) \right\}, \tag{11}$$

where $B_t(yT)$, $B_d(yT)$ and $R_{bg}^{(y)}(T)$ are given by eqns. (7), (10) and (2), respectively.
Problem P1:

$$\begin{aligned} &\arg\min_{G} \left\{ \mathrm{MSE}^{(y)} \right\}, \\ &\text{subject to:} \\ &\quad 1)\;\; 0 < B_d(yT) \le B_d^{\max}, \\ &\quad 2)\;\; 0 < B_t(yT) \le B_t^{\max}, \end{aligned} \tag{12}$$

where $\mathrm{MSE}^{(y)}$ is given by eqn. (1). According to (12), the goal is to find the application layer parameters vector G, for which the video distortion is minimized without violating the buffer constraints.

III. SOLUTION SCENARIOS
The problem P1 is solved by considering two scenarios.

A. Scenario 1: Without Knowledge of Channel Information
When the channel information is not known at the application layer, the transcoding rate cannot be adapted to the channel. The problem P1 is then solved to determine the optimum values for the transcoder and decoder buffer, subject to non-occurrence of buffer underflow and overflow. For the decoder buffer, it is important to note that both the underflow and overflow are critical, as the former will lead to terminal screen blackout due to packet starvation, while the latter would cause packet-dropping, eventually leading to video jerks. In the case of the transcoder, buffer overflow is more critical than underflow because overflow leads to packet-dropping, hence resulting in quality loss. Conversely, transcoder buffer underflow would not cause much harm as the decoder still carries a cushion of packets (assumption A2) to be displayed at the terminal.
The decoder buffer underflow can be avoided if $0 < B_d(yT)$. Applying (10) and, after rearranging the terms, the buffer underflow constraint becomes:

$$\sum_{j=1}^{y} \left[ R_{br}^{(j)}(T) - R_{bt}^{(D+j)}(T) \right] < B_d(DT). \tag{13}$$

The expression given in (13) is the key in finding the threshold number of video frames (D) that the decoder must keep in the buffer to avoid any underflow. From knowledge of the fixed video rates and by combining equations (9) and (13), the minimum number of frames D can be determined.
Similarly, from (10) the decoder buffer overflow can be avoided if $B_d(yT) \le B_d^{\max}$. This can also be written as:

$$B_d(DT) + \sum_{j=1}^{y} \left[ R_{bt}^{(D+j)}(T) - R_{br}^{(j)}(T) \right] \le B_d^{\max}. \tag{14}$$

Once D is determined from equations (9) and (13), the value of $B_d^{\max}$ can be found using equation (14).
At the transcoder side, the overflow constraint is given by $B_t(yT) \le B_t^{\max}$, which can also be written as:

$$B_t\big((y-1)T\big) + R_{bg}^{(y)}(T) - R_{bt}^{(y)}(T) \le B_t^{\max}. \tag{15}$$

From knowledge of the transcoding rate, channel rate and previous buffer occupancy conditions, the limit on the transcoder buffer can be determined using (15).

B. Scenario 2: With Known Channel Information
When the channel information is known, the optimization problem P1 is solved by controlling the transcoding rate in the vector G, in conjunction with varying the buffer sizes. In the following, we propose a three-step strategy to solve the problem P1.
1) Step 1:
Regardless of the video coding standard used, the source encoding parameters (e.g., target bit-rate and quantization point) are pre-estimated for the required bit-budget allocation. For example, in the H.264 standard, an estimate of bit-budget is made to distribute the available bits to each frame [1], considering the empty buffers at the application layer.
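As an illustration of the buffer dimensioning of Section III.A, the recursions (7) and (10) and the constraints (13) to (15) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the per-frame bit counts are taken as already known lists, the function names (`transcoder_occupancy`, `decoder_occupancy`, `min_startup_frames`, `required_buffer_sizes`) and the sample numbers are hypothetical and do not come from the paper, and the search over D is a plain linear scan rather than the authors' procedure.

```python
def transcoder_occupancy(generated_bits, transmitted_bits):
    """Eq. (7): B_t(yT) = B_t((y-1)T) + R_bg^(y)(T) - R_bt^(y)(T), with B_t(0) = 0 (A4)."""
    level, occupancy = 0, []
    for r_bg, r_bt in zip(generated_bits, transmitted_bits):
        level += r_bg - r_bt
        occupancy.append(level)
    return occupancy


def decoder_occupancy(transmitted_bits, rendered_bits, d):
    """Eqs. (9)-(10): fill with the first D frames, then add arrivals and subtract rendered bits."""
    initial = sum(transmitted_bits[:d])  # Eq. (9): B_d(DT)
    level, occupancy = initial, []
    # zip pairs R_bt^(D+j) with R_br^(j) and stops when either list runs out
    for r_bt, r_br in zip(transmitted_bits[d:], rendered_bits):
        level += r_bt - r_br
        occupancy.append(level)
    return initial, occupancy


def min_startup_frames(transmitted_bits, rendered_bits, max_d):
    """Smallest D (up to max_d) keeping 0 < B_d(yT) at every step, i.e. constraint (13)."""
    for d in range(1, max_d + 1):
        _, occupancy = decoder_occupancy(transmitted_bits, rendered_bits, d)
        if all(level > 0 for level in occupancy):
            return d
    return None  # no D in the searched range avoids underflow


def required_buffer_sizes(generated_bits, transmitted_bits, rendered_bits, d):
    """Peak occupancies: the smallest B_t^max and B_d^max satisfying (15) and (14)."""
    bt = transcoder_occupancy(generated_bits, transmitted_bits)
    initial, bd = decoder_occupancy(transmitted_bits, rendered_bits, d)
    return max(bt, default=0), max([initial] + bd)


# Hypothetical per-frame bit counts, for illustration only.
generated = [100, 120, 90]
transmitted = [50, 10, 10, 200, 200]
rendered = [100, 100, 50]
d = min_startup_frames(transmitted, rendered, max_d=4)
print(d, required_buffer_sizes(generated, transmitted, rendered, d))
```

In this sketch the buffer limits are simply the peak occupancies observed over the trace, which is one way to read "the limit on the buffer can be determined using (14) and (15)"; a real system would recompute them per frame as new rate information arrives.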
2) Step 2:
Find the optimal buffer sizes for which the distortion can be minimized, as given in section III.A. This sets an upper bound on the size of transcoder and decoder buffer for which the optimization is achieved.
3) Step 3:
For the given frame, after capping the transcoder and decoder buffer sizes to a fixed value determined in step 2, the new transcoding rates are calculated to ensure that the constraints are not violated. It is proposed here to further reduce the video transcoding rate if the channel condition is bad. This will not only help reduce the loading on the network, but also smooth out the transcoded video stream. For the moderate channel, it is proposed to use the calculated video bit-rate as is, to take full advantage of the current channel state. When the channel condition is good, the target bit-rate is increased to exploit the good channel condition for higher video quality.
When error correction mechanisms, e.g., joint forward error correction (FEC) and automatic repeat request (ARQ) [7], are used for video streaming over wireless networks, the packet transmission information is readily available at the data-link layer. In this paper, we use the cross-layer signaling strategy to convey the transmission and hence channel condition information to the application layer, where the transcoder utilizes this information for video transcoding. An algorithm for refining the calculated target transcoding rate is given as follows:

Algorithm I: Refining the Calculated Target Transcoding Rate
Input: number of transmission attempts = $L$, $R_{bg}^{(y)}(T)$
Output: $R_{bg}^{(y)}(T)$
Begin
    if ($L = L_1$)
    {   /* channel state = Good */
        $R_{bg}^{(y)}(T) = 1.2\, R_{bg}^{(y)}(T)$
    }
    else if ($L = L_2$)
    {   /* channel state = Moderate */
        $R_{bg}^{(y)}(T) = R_{bg}^{(y)}(T)$
    }
    else if ($L = L_3$)
    {   /* channel state = Bad */
        $R_{bg}^{(y)}(T) = 0.8\, R_{bg}^{(y)}(T)$
    }
End

where $L_1$, $L_2$ and $L_3$ are given in section II.A.5. Note that the multiplication factors in Algorithm I are empirically determined values that best suit the channel conditions. For the good channel condition, a larger value (>1.2) of the multiplication factor would disturb the pre-calculated bit-budget allocation in the H.264 encoder [1], and hence should be avoided. Also, a lower value (<0.8) of the multiplication factor for the bad channel condition would distort the video quality.

IV. SIMULATION RESULTS
We used the JM reference software [8] to transcode the video sequence, while an NS2 [9] based simulation is used for video streaming. The first frame of the video sequence is encoded as an I-frame, while subsequent frames are P-frames (i.e., predicted frames). The rate-distortion (RD) optimization was enabled and context-adaptive binary arithmetic coding (CABAC) was used for the entropy coding. The video frame rate is set to 30 frames per second. Without loss of generality, an IEEE 802.11b link is selected between the access point (AP) and the client device, as shown in Fig. 1. A video frame is divided into data-link layer packets of equal sizes, such that a packet size would not exceed 1000 bytes. A joint FEC/ARQ scheme, as described in our previous work [7], is implemented at the data-link layer for error correction. Three different video sequences, i.e., Foreman, Container and Akiyo, are selected from the high, medium and low motion categories, respectively. Three test cases are considered, wherein 1) no buffer and rate control scheme is applied, 2) buffer control is applied when no channel information is available, and 3) buffer and rate control schemes are applied when the channel information is available. The maximum size of the decoder buffer ($B_d^{\max}$) is set equal to five times the size of the largest frame in the video sequence, i.e., the I-frame. The cushion value of 3 video frames in the decoder buffer gives sufficient protection against any blackout period.

Figure 1: Simulation model

A. Improvement in Video Quality
Peak-signal-to-noise-ratio (PSNR) is used as the measure of objective video quality. The quality of the wireless channel is determined from the data-link layer packet error probability (µ), which is varied from $10^{-4}$ to $10^{-1}$, i.e., going from good to bad channel condition. As shown in Fig. 2, when the channel condition deteriorates, there is a general trend of lower PSNR, as expected. For all the video sequences, under bad channel condition (µ = $10^{-1}$) a 0.5 dB to 1 dB increase in PSNR is observed when both the buffer and rate control schemes are used. This is attributed to the fact that, for the bad channel condition, there is a high probability of
packet-dropping at the data-link layer, but the rate reduction mechanism proposed in this paper reduces the video bit-rate, thereby lowering the probability of packet-dropping and hence increasing the PSNR. For the good channel condition (i.e., µ = $10^{-4}$), a slight increase of about 0.1 dB to 0.2 dB can be seen in all three video sequences when the rate control mechanism is used, due to the increase in video encoding bit-rate as given by Algorithm I.

Figure 2: Comparison of the average PSNR values

B. Dynamic Buffer Stabilization
From Fig. 3, it is seen that, under moderate channel condition (µ = $10^{-2}$), the average buffer requirement drops by almost an order of magnitude when either the buffer control or the buffer plus rate control scheme is employed. The reduction in buffer size for the adaptive schemes is due to the fact that buffer sizes are now calculated in real-time for each video frame instead of being at a fixed value, as is the case when there is no control. Comparing only the two adaptive schemes, there is a small increase of about 20 bits for the case where the buffer plus rate control scheme is employed. This is attributed to the fact that, when channel information is available, the transcoder gets another chance of increasing or decreasing the encoding rate, based on good or bad channel conditions, respectively. Under bad channel condition, the encoding rate drops, thus giving fewer bits per frame; however, the packet error probability increases, hence negating the effect of the lower bit-rate for the decoder buffer. Under good channel condition, the increase in encoding bit-rate translates to a higher buffer requirement, but this is not a very significant increase when compared to the fixed buffer case.

Figure 3: Comparison of decoder buffer sizes (bar chart; x-axis: Video Sequence, i.e., Foreman, Container, Akiyo; y-axis: Decoder Buffer Size (bits), 0 to 20000; series: No Control, Buffer Control (w/o channel info), Buffer+Rate Control (w/ channel info))

V. CONCLUSION
In this paper, we have presented a technique for dynamically optimizing the application layer parameters and compared it against the case where no such scheme is implemented. It is shown in this paper that when the channel information is available at the application layer, the video quality improves by up to 1 dB, which translates to a better viewing experience. An additional saving in the application layer buffer of approximately an order of magnitude is also achieved, thereby decreasing the memory requirement.

ACKNOWLEDGMENT
The authors acknowledge the support of the University of Calgary, TRLabs and Natural Sciences and Engineering Research Council (NSERC) Canada for this research.

REFERENCES
[1] ITU-T and ISO/IEC JTC1, “Advanced video coding for generic audiovisual services,” ITU-T Recommendation H.264 – ISO/IEC 14496 AVC, 2003.
[2] M. Yokotsuka, “Memory motivates cell-phone growth,” Wireless Systems Design, vol. 9, no. 3, 2004, pp. 27–30.
[3] Z. Lei and N. D. Georganas, “Adaptive video transcoding and streaming over wireless channels,” Journal of Systems and Software, March 2004, pp. 253–270.
[4] M. van der Schaar and D. S. Turaga, “Cross-layer packetization and retransmission strategies for delay-sensitive wireless multimedia transmission,” IEEE Transactions on Multimedia, vol. 9, no. 1, Jan. 2007, pp. 185–197.
[5] J. Lee and M. Kang, “Design of a dynamic bandwidth reallocation scheme for hot-spot video stream transmission over the IEEE 802.11 WLAN,” 2006 TENCON, IEEE Region 10 Conference, Nov. 2006.
[6] V. Sgardoni, P. Ferre, A. Doufexi, A. Nix and D. Bull, “Frame delay and loss analysis for video transmission over time-correlated 802.11a/g channels,” IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2007, 3-7 Sept. 2007.
[7] A. Moid and A. O. Fapojuwo, “An analytical model for optimum byte-level and packet-level FEC assignment using buffer dynamics,” Research Letters in Communications, Article ID 546184, 2008.
[8] JM Reference Software, Available at: http://iphome.hhi.de/suehring/tml, Accessed on June 20, 2008.
[9] Network Simulator (NS2), Available at: http://www.isi.edu/nsnam/ns, Accessed on May 03, 2007.
