
An Evaluation of Quality of Service for H.264 over 802.11e WLANs

Richard MacKenzie (School of Electronic & Electrical Engineering, University of Leeds, Leeds, UK),
David Hands (BT Innovate, British Telecommunications PLC, Adastral Park, Ipswich, UK),
Timothy O'Farrell (Institute of Advanced Telecommunications, Swansea University, Swansea, UK)
Email: eenrtm@leeds.ac.uk, david.2.hands@bt.com, T.OFarrell@swansea.ac.uk

Abstract—802.11 wireless local area networks are now a common feature in the home. In order to meet the quality of service (QoS) demands for the increasing number of multimedia applications on these home networks, the 802.11e amendment was developed. A suitable video coding standard for these multimedia applications is H.264 due to its high compression and error resilience. In this paper we investigate how the quality of H.264 video is affected as the number of concurrent video streams sent over a multi-rate 802.11e network is increased. Several packet mapping schemes are compared. We show that the mapping schemes which differentiate video packets based on their frame type are more successful at maintaining acceptable video quality when congestion occurs, providing a more gradual quality degradation as congestion increases rather than the cliff-edge quality drop that tends to occur with the other mapping schemes. These differentiated schemes are more successful for videos that do not have a high amount of temporal activity. We also identify that impairments caused by congestion tend to occur towards the bottom of each frame when the flexible macroblock ordering (FMO) feature of H.264 is not used, but the use of FMO can reduce this effect.

I. INTRODUCTION

A common feature in the modern home is an 802.11 [1] wireless local area network (WLAN) with an internet connection. The increase in 802.11 physical layer data rates, along with the availability and affordability of high speed internet connections, has led to an increase in the number of multimedia internet applications used in the home. There is a wide range of video applications now available, including low resolution video such as YouTube, video conferencing, and standard definition and high definition internet protocol television (IPTV). The home network typically consists of many devices which may be requesting different services at the same time. Each service may have its own quality of service (QoS) requirements. The 802.11e amendment to the original standard was developed to meet the need to provide QoS over 802.11 networks. Enhanced distributed channel access (EDCA) allows for service differentiation by having four parallel queues which can each have different priorities to access the wireless channel.
This work is funded under an industrial CASE scholarship agreement between British Telecommunications PLC (BT) and the Engineering and Physical Sciences Research Council (EPSRC), under BT/EPSRC CASE studentship agreement CT1080038286.

The H.264 video coding standard [2] was developed by the Joint Video Team (JVT), which was formed by a partnership between the ITU-T Video Coding Experts Group (VCEG) and the Moving Pictures Experts Group (MPEG). This coding standard is appropriate for internet video applications due to its high coding efficiency and is designed to be network friendly for applications which include video telephony, TV broadcasting and internet streaming [3]. Different packets in a video stream can be of varying importance to the decoding process, so prioritising the more important packets can help to maintain a better quality received video. Over an 802.11e network, simply mapping packets into the appropriate EDCA queues can have a significant effect on the QoS of video applications. There is a great deal of work related to providing video QoS using EDCA. In [4], [5] and [6] a variety of traffic mapping schemes have been investigated. Each scheme prioritises packets based on their slice type, slice group or partition type, depending on how the video has been encoded. The priority assigned to each packet determines which EDCA queue each packet is mapped into. In all of these works the video is of CIF resolution (352x288) and is encoded at bitrates typically well below 1Mb/s. IPTV services are usually of standard definition television (SDTV) or high definition television (HDTV) resolutions, with bitrates ranging upwards from 1.5Mb/s. This is reflected in some of the more recent works focused on IPTV in the home, such as [7], [8], [9] and [10]. In this paper we investigate how many concurrent SDTV streams can be maintained with acceptable QoS. Several packet mapping schemes are tested and compared to see which perform the best. Tests are performed over a range of physical layer rates to show how each scheme would cope with a sudden change in the physical layer rate.
We identify that losses tend to occur towards the bottom of each frame if the flexible macroblock ordering (FMO) feature of H.264 is not used, and have results to show that FMO can reduce this effect. This work on FMO extends the work in [9] by testing a greater range of both traffic mapping schemes and FMO patterns to see which can offer the best performance. Peak signal-to-noise ratio (PSNR) has been a common way to judge video quality, as in [4], [5], [6] and [7]. PSNR is, however, a poor indicator of perceptual quality [11]. In [8] the video quality estimation tool described in Annex D of the J.144 standard for

objective perceptual video quality measurement techniques has been used [12]. We are using the tool described in Annex A of the same standard. Within the standard these models are not validated for error impairments such as dropped packets. Also, the various mapping schemes used in our tests produce different types of video impairments. In [10] subjective quality values have been collected for the same mapping schemes used in this paper. The correlation between those subjective scores and the objective quality scores acquired using the Annex A model is 0.91. We have mainly focused on sending synchronised videos over EDCA because this is the most challenging scenario when trying to provide QoS while sending multiple videos over EDCA. We do, however, show how the system performance changes when the videos are not synchronised, and show that the overall QoS of the system is not significantly changed. The rest of this paper is organised as follows. An overview of the EDCA protocol and the H.264 standard is provided in sections II and III. Section IV describes the packet mapping schemes that we will be comparing. The testing procedure follows in section V. Test results are shown in section VI, followed by a summary in section VII.

II. EDCA PROTOCOL

The distributed coordination function (DCF), as defined in the 802.11 standard, provides contention based channel access using carrier sense multiple access with collision avoidance (CSMA/CA). The 802.11e amendment was developed to meet the need to provide quality of service (QoS) over 802.11 WLANs. This amendment specifies the enhanced distributed channel access (EDCA) function, which provides differentiated, contention-based channel access for eight user priorities (UPs). Each UP is mapped into one of four access categories (ACs). Within the 802.11 standard the descriptions of the traffic intended for each of the four ACs are voice, video, best effort and background.
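The eight-to-four UP-to-AC mapping described above can be sketched as a simple lookup (the dictionary and function names here are ours, for illustration only; the mapping itself is the standard one):

```python
# Standard EDCA mapping of the eight user priorities (UPs) onto the
# four access categories (ACs). UPs 1 and 2 are background traffic,
# so they sit *below* UP 0 (best effort) in priority.
UP_TO_AC = {
    1: "AC_BK", 2: "AC_BK",   # background
    0: "AC_BE", 3: "AC_BE",   # best effort
    4: "AC_VI", 5: "AC_VI",   # video
    6: "AC_VO", 7: "AC_VO",   # voice
}

def access_category(up: int) -> str:
    """Return the access category that EDCA uses for a given UP (0-7)."""
    return UP_TO_AC[up]
```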
These ACs are named AC VO, AC VI, AC BE and AC BK respectively. Each AC has its own queue which contends for the channel using its own EDCA function. Each EDCA function uses its own set of EDCA parameters, which includes an arbitration interframe space (AIFS[AC]), minimum and maximum contention window values (CWmin[AC] and CWmax[AC]) and a transmission opportunity (TXOP) limit (TXOP limit[AC]). AIFS[AC], CWmin[AC] and CWmax[AC] are used in the same way as the distributed interframe space (DIFS) and the minimum and maximum contention window values (CWmin and CWmax) are used by the DCF. The TXOP limit[AC] specifies the maximum duration of an EDCA function's TXOP. If TXOP limit[AC] = 0 then that EDCA function can only attempt one frame exchange each time it contends for the channel. If, however, TXOP limit[AC] > 0 then once that EDCA function has successfully contended for the channel it can attempt multiple frame exchanges, separated by short interframe spaces, without having to contend for the channel again, so long as the total duration of the TXOP does not exceed the TXOP limit[AC].
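The TXOP rule just described can be expressed as a short budget check (an illustrative sketch, not from the paper; the exchange and SIFS durations are hypothetical inputs):

```python
def frames_in_txop(txop_limit_us: float, exchange_us: float,
                   sifs_us: float = 10.0) -> int:
    """Count how many frame exchanges fit in one TXOP.

    A TXOP limit of 0 allows exactly one frame exchange per channel
    access; otherwise further exchanges, each preceded by a SIFS,
    may follow while the total duration stays within the limit.
    """
    if txop_limit_us == 0:
        return 1  # special case: a single exchange is always permitted
    count, elapsed = 0, 0.0
    while True:
        # The first exchange needs no separating SIFS.
        cost = exchange_us if count == 0 else sifs_us + exchange_us
        if elapsed + cost > txop_limit_us:
            break
        elapsed += cost
        count += 1
    return count
```

For example, with a 3.008ms TXOP limit and a hypothetical 1ms frame exchange, two exchanges fit in one TXOP.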

III. H.264 STANDARD

The H.264 standard can be described as two distinct layers: the video coding layer (VCL) and the network abstraction layer (NAL). The VCL deals with the block based compression of video samples while the NAL puts the coded video data into a suitable and flexible form for mapping onto various transport mechanisms. The VCL provides efficient compression of the video. Each frame within the video consists of one or more slices, and each slice can be independently decoded provided that all required reference frames are available. Each slice will typically consist of consecutive macroblocks in raster scan order. In this case the loss of a slice can therefore result in the loss of all coded information for the entire area of the frame that the lost slice covered. Error concealment for this area can be very difficult as all coded information within this area has been lost. Flexible macroblock ordering (FMO) is an error resilience feature of H.264 which can help improve the situation of a missing slice. Each macroblock is assigned to a slice group. Each slice then contains consecutive macroblocks from within the same slice group. The pattern of the slice groups is defined by the slice group map. One common slice group map type is interleaved. The value for run length defines how many consecutive macroblocks are assigned to each slice group before switching to the next slice group. Another common slice group map type is the dispersed map type. Here the slice groups are scattered; for example, with two dispersed slice groups the slice group map has the appearance of a checkerboard. If a slice is lost when FMO is being used, the missing macroblocks are more likely to have neighbouring macroblocks, from other correctly received slice(s), available which can provide more local information to help improve error concealment. Frames used for reference are generally considered to be more important than non-reference frames.
If a reference frame contains errors then these errors can propagate into other frames that reference it. For this work all frames formed by B-slices are considered non-reference, while frames formed by either I-slices or P-slices can be used for reference. I-frames (frames formed by I-slices) are considered the most important as they make no reference to other frames and are used as references for the successful decoding of any associated P-frames or B-frames. In contrast to the relative importance of frame types, B-frames tend to offer the highest compression while I-frames tend to offer the lowest compression. This results in the larger encoded frames tending to be the most important. Parameter sets, which are created separately from slice information, contain syntax elements that can apply to the decoding of many frames. A slice refers to the parameter set that it uses in its slice header. Therefore the parameter set information (PSI) must have arrived at the decoder before any slices that require it. As PSI is required to correctly decode slice data it should be considered extremely important. A NAL unit (NALU) consists of a 1B header plus a payload

of encoded video data. There are three fields within the NALU header: the forbidden zero bit, the nal_ref_idc (NRI) and the NALU type. The 2-bit NRI field can be set by the H.264 encoder to represent the relative importance of each video packet to the decoding of the video sequence; a value of 00 indicates that the NALU payload is not used to reconstruct reference frames. The NALU type field indicates what sort of information is contained in the NALU payload, such as slice data or parameter sets.

IV. H.264 PACKET MAPPING FOR EDCA

TABLE I: QoS Mapping Schemes

Scheme                 PSI    I-Slices  P-Slices  B-Slices
Best Effort (Default)  AC BE  AC BE     AC BE     AC BE
Scheme 1               AC VI  AC VI     AC VI     AC VI
Scheme 2               AC VO  AC VO     AC VI     AC BE
Scheme 3               AC VO  AC VI     AC VI     AC BE
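To make the header layout and the mapping schemes concrete, a brief sketch (function and dictionary names are ours; only the 1-byte header format and the Table I assignments come from the text):

```python
def parse_nalu_header(byte0: int):
    """Split the 1-byte NALU header into its three fields:
    forbidden_zero_bit (1 bit), nal_ref_idc / NRI (2 bits) and
    nalu_type (5 bits)."""
    return (byte0 >> 7) & 0x1, (byte0 >> 5) & 0x3, byte0 & 0x1F

# Table I expressed as a lookup: scheme -> {packet class -> access category}.
MAPPING_SCHEMES = {
    "default": {"PSI": "AC_BE", "I": "AC_BE", "P": "AC_BE", "B": "AC_BE"},
    "scheme1": {"PSI": "AC_VI", "I": "AC_VI", "P": "AC_VI", "B": "AC_VI"},
    "scheme2": {"PSI": "AC_VO", "I": "AC_VO", "P": "AC_VI", "B": "AC_BE"},
    "scheme3": {"PSI": "AC_VO", "I": "AC_VI", "P": "AC_VI", "B": "AC_BE"},
}
```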

The 2-bit NRI field in the NALU header can be used to indicate the relative priority of each video packet. The media server must consider this value when sending the video packets and mark them in such a way that when each packet arrives at an access point with an 802.11e MAC they can be mapped into the correct AC. Over an IP network a common way to identify the packet priority is to use the DiffServ Code Point (DSCP) field in the IP header. This has been done in [6], which is an example of a practical implementation of video packet mapping for EDCA. An alternative could be to get the 802.11e access point to read the NRI value directly from the NALU header. This would, however, be a less practical solution which is specific only to H.264 video. If a packet arrives at an 802.11e MAC and is not assigned a priority value, the default is to assign the packet into AC BE, which is described as a best effort service. Video streams are bursty in nature, where larger encoded frames result in larger bursts of packets. Larger frames are therefore more susceptible to losses due to queue overflows and exceeding delay constraints. Unfortunately the larger frames tend to be the more important frames that are used for reference. This means that errors are likely to propagate throughout a group of pictures (GOP) as the reference frames may not be decoded perfectly. To improve the protection of more important frames over less important frames, the more important frames can be placed into higher priority ACs at the 802.11e MAC. Another benefit of using higher priority ACs is that they tend to transmit faster due to their favourable EDCA parameter set values, resulting in a higher capacity channel. Table I shows the packet mapping schemes considered for this work. The first scheme is the default mapping into AC BE. Scheme 1 follows the 802.11e description that video traffic should be mapped into AC VI. Scheme 2 and scheme 3

follow the reasoning that different slice types are of different levels of importance. As parameter set information (PSI) NALUs are required to correctly decode the slice data, these are mapped into the highest priority AC VO. B-slices, being the lowest importance slice type, are mapped into AC BE while the more important I-slices and P-slices are given higher priority. Scheme 2 continues the packet differentiation by assigning I-slices to the higher priority AC VO over P-slices, which are mapped into AC VI. Scheme 3, on the other hand, treats both I-slices and P-slices equally by mapping them both into AC VI. Scheme 2 and scheme 3 may be referred to jointly as the differentiated mapping schemes, while scheme 1 and the default scheme can be jointly referred to as the non-differentiated mapping schemes.

V. TEST PROCEDURE

In order to evaluate how many video streams can be concurrently downloaded through an 802.11e access point (AP) while maintaining an acceptable level of quality, we performed simulations using the Evalvid version of the NS-2 simulator [13]. So that we could perform tests with a range of video content, we selected three Video Quality Experts Group (VQEG) sequences which have a range of characteristics. These sequences are Fries, Mobile & Calendar and Rugby. Each sequence was looped three times. The original 220 frame sequences were clipped to 216 frames before being looped to ensure that each loop started with an I-frame. Each sequence was encoded at SDTV resolution (720x576) at target bitrates of 2Mb/s and 4Mb/s. The frame rate was 25fps and the group of pictures (GOP) size was 24 with an IBBPBBP GOP structure. Table II shows the characteristics of the encoded sequences. The bitrate distribution for each sequence is greatly affected by the video content. For example, the Mobile & Calendar sequences, which have a high amount of spatial information, have large I-frames relative to the other

TABLE II: H.264 Coded Video Characteristics

Sequence           Target    I-slice load  P-slice load  B-slice load  Mean packets  Mean packets  Mean packets
                   bitrate   (Mb/s)        (Mb/s)        (Mb/s)        per I-frame   per P-frame   per B-frame
Fries              2.0 Mb/s  0.273         1.055         0.691         25.2          14.2          4.4
Fries              4.0 Mb/s  0.468         2.080         1.489         42.6          27.3          8.9
Mobile & Calendar  2.0 Mb/s  0.611         1.133         0.275         55.2          15.2          2.1
Mobile & Calendar  4.0 Mb/s  0.915         2.405         0.717         81.7          31.4          4.5
Rugby              2.0 Mb/s  0.197         1.031         0.791         18.2          13.8          5.0
Rugby              4.0 Mb/s  0.335         1.950         1.752         30.8          25.7          10.4
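As a quick consistency check on Table II, the per-slice-type loads should sum approximately to each sequence's target bitrate (a sanity-check sketch; the variable names are ours):

```python
# (I-slice, P-slice, B-slice) loads in Mb/s, copied from Table II.
loads = {
    ("Fries", 2.0):             (0.273, 1.055, 0.691),
    ("Fries", 4.0):             (0.468, 2.080, 1.489),
    ("Mobile & Calendar", 2.0): (0.611, 1.133, 0.275),
    ("Mobile & Calendar", 4.0): (0.915, 2.405, 0.717),
    ("Rugby", 2.0):             (0.197, 1.031, 0.791),
    ("Rugby", 4.0):             (0.335, 1.950, 1.752),
}
for (name, target), parts in loads.items():
    # Encoders only approximate the target bitrate, so allow some slack.
    assert abs(sum(parts) - target) < 0.1, (name, target, sum(parts))
```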

TABLE III: EDCA Parameters

AC     CWmin           CWmax           AIFSN  TXOP limit (HR/DSSS)  TXOP limit (ERP-OFDM)
AC VO  (aCWmin+1)/4-1  (aCWmin+1)/2-1  2      3.264ms               1.504ms
AC VI  (aCWmin+1)/2-1  aCWmin          2      6.016ms               3.008ms
AC BE  aCWmin          aCWmax          3      0                     0
AC BK  aCWmin          aCWmax          7      0                     0

TABLE IV: Physical Layer Dependent Parameters

Physical layer  aCWmin  aCWmax  aSlotTime
HR/DSSS         31      1023    20µs
ERP-OFDM        15      1023    9µs
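The contention window entries of Table III can be evaluated with the aCWmin values of Table IV (a worked sketch; the function name is ours):

```python
def edca_cw(ac: str, aCWmin: int, aCWmax: int):
    """Return (CWmin[AC], CWmax[AC]) using the formulas of Table III."""
    if ac == "AC_VO":
        return (aCWmin + 1) // 4 - 1, (aCWmin + 1) // 2 - 1
    if ac == "AC_VI":
        return (aCWmin + 1) // 2 - 1, aCWmin
    return aCWmin, aCWmax  # AC_BE and AC_BK keep the DCF values

# HR/DSSS (aCWmin = 31): AC_VO gets (7, 15), AC_VI gets (15, 31).
# ERP-OFDM (aCWmin = 15): AC_VO gets (3, 7), AC_VI gets (7, 15).
```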

sequences. The Rugby sequences on the other hand, having a high amount of temporal information, tend to have relatively large B-frames. The average NALU size of these sequences is 1270B while the maximum size is 1400B. Each NALU is encapsulated into RTP packets using the single NAL unit mode as described in [14]. The video packets are delay sensitive and are therefore considered to be lost if they do not arrive at the decoding application of the receiving station within the delay bound. Video packets can also be dropped at the MAC layer of the AP, due either to the number of retransmission attempts reaching the retry limit or to queue overflow. Each EDCA function has its own transmit queue. This is a first in first out (FIFO) queue. Queue overflow refers to the situation where a packet attempts to join a queue that is currently full; the packet is subsequently dropped and is therefore lost. We have used a queue size of 600 packets. For delay sensitive traffic the queue length should ideally be limited so it is only just long enough that all packets which join it arrive within their delay bounds. A shorter queue will drop packets that could have arrived within their delay bound. A longer queue, on the other hand, may waste bandwidth sending packets that will exceed their delay bound and can also lead to an increased delay for any packets which join the queue afterwards. Therefore the ideal queue length is dependent on the testing scenario. This complex problem is outside the scope of this paper but more information on this subject can be found in [7]. Each test uses a physical layer (PHY) that is either high rate direct sequence spread spectrum (HR/DSSS) or extended rate PHY-orthogonal frequency division multiplexing (ERP-OFDM), with the EDCA parameter set values as recommended in the 802.11 standard, which are shown in Table III. Other important physical layer dependent parameters are shown in Table IV.
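The drop behaviour of the transmit queue described above (overflow at 600 packets, loss when the delay bound is exceeded) can be sketched as follows; the class and its counters are ours, for illustration only:

```python
from collections import deque

QUEUE_SIZE = 600    # packets, the queue size used in this paper
DELAY_BOUND = 2.0   # seconds, the bound used for linear TV

class EdcaQueue:
    """Minimal FIFO transmit-queue model for one EDCA function."""
    def __init__(self):
        self.q = deque()      # enqueue timestamps of waiting packets
        self.overflow = 0     # packets dropped because the queue was full
        self.late = 0         # packets that exceeded the delay bound

    def enqueue(self, t: float):
        if len(self.q) >= QUEUE_SIZE:
            self.overflow += 1   # queue overflow: the packet is lost
        else:
            self.q.append(t)

    def dequeue(self, t: float) -> bool:
        """Serve the head-of-line packet at time t; return True if it
        met the delay bound."""
        t_in = self.q.popleft()
        if t - t_in > DELAY_BOUND:
            self.late += 1
            return False
        return True
```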
The HR/DSSS physical layer at 11Mb/s was used as the benchmark for our testing. For each test of n concurrent video streams the simulator was set up to transmit n streams in the download direction from the AP to the n receiving stations. An error file is generated for each video stream which shows which packets have been lost and which have arrived successfully within

the delay bound. A delay bound of 2 seconds has been used, which is acceptable for linear TV broadcasts. The error file for each video is applied to the original transmitted sequences, generating the n received video sequences. Each received video is then decoded. In order to maintain the 25fps frame rate in the event that all slices for a particular frame are lost, the decoder will generate a repeat of the previous frame in play order. Only the second loop of the video sequences has been used to produce our reported results. Most of our tests for this paper use synchronised video streams. For each test with synchronised videos the same video content is used for all of the streams in that test. As the streams are synchronised with one another they all receive very similar treatment by the 802.11e AP, and repeating the same test will give very similar results. Therefore only one run of each test is carried out with each encoded video sequence when synchronised videos are used. We perform some tests to demonstrate how system performance differs when videos are not synchronised and the content of the videos in each test is mixed. As the videos in these tests are not synchronised, repeating a test can give very different results, so 12 runs of each test scenario are performed. In each of these tests every video has a randomly selected start time which can occur over the first 4 seconds of the test. The quality of each received video is objectively measured using the video quality estimation tool defined in the J.144 standard, Annex A. This is a full-reference tool. This means that each degraded sequence is compared to the original non-degraded sequence to predict the mean opinion score (MOS) of the degraded sequence. This predicted MOS (pMOS) uses the 5-grade absolute category rating (ACR) scale [15].

VI. RESULTS

Unless stated otherwise, the results in this section refer to tests with synchronised videos.
The first set of tests compares the four traffic mapping schemes described in Table I using an 11Mb/s HR/DSSS physical layer. The average packet loss rate (PLR) for each mapping scheme is shown in Fig. 1a. When the total video load is below 6Mb/s the PLR is zero. When the load reaches 6Mb/s congestion starts to occur for the default mapping scheme, resulting in a very small PLR (0.015). For the other three mapping schemes congestion starts when the total video load is above 6Mb/s. The PLR for the default scheme is usually higher than for the other mapping schemes. However, there is a steep rise in PLR for the two differentiated mapping schemes when congestion is first encountered. The lowest PLR is always achieved by mapping scheme 1. The

[Fig. 1: Mean PLR for each mapping scheme. (a) All slices PLR; (b) I-slice PLR; (c) P-slice PLR; (d) B-slice PLR. Each panel plots PLR against total video load (Mb/s) for Best Effort, Scheme 1, Scheme 2 and Scheme 3.]

[Fig. 2: Comparison of video streaming schemes with 2Mb/s Fries video sequence using different mapping schemes. (a) No packet losses; (b) Best Effort; (c) Scheme 1; (d) Scheme 2/3.]

average PLR for I-slices and P-slices only are shown in Fig. 1b and Fig. 1c respectively. Here we find that the default scheme always has the highest PLR for both I-slices and P-slices, followed by mapping scheme 1. Mapping scheme 2 shows total protection of I-slices even at the total video load of 12Mb/s. Mapping scheme 3 has a high protection of I-slices, although not as high as scheme 2. Scheme 3 does, on the other hand, offer a higher protection of P-slices than scheme 2. Fig. 1d shows the average PLR for B-slices only. The non-differentiated mapping schemes show a relatively good protection of B-slices compared to how they protect I-slices and P-slices. The differentiated mapping schemes both show

the same behaviour in their protection of B-slices. For the differentiated mapping schemes, while the access point is not congested all B-slices are totally protected, but as soon as the access point becomes congested almost every B-slice is lost. This results in most B-frames being a frame repeat, which effectively means that during congestion there is a drop in the frame rate of the video when the differentiated schemes are used. These PLR results show that although scheme 1 offers the lowest PLR overall, the non-differentiated schemes offer poorer protection to the more important slices. The differentiated schemes, however, sacrifice the less important B-slices during congestion in order to maintain a higher protection for the more important I-slices and P-slices. As shown by the PLR results, each mapping scheme offers different levels of protection for each slice type. This leads to different characteristics in the decoded video. Fig. 2 compares the same frame from the Fries video sequence, encoded at 2Mb/s, for the various mapping schemes. Fig. 2a shows the decoded sequence when there are no packet losses. Fig. 2b, Fig. 2c and Fig. 2d show sequences that suffer losses when the total video load is 8Mb/s, for the default scheme, scheme 1 and the differentiated schemes respectively. The differentiated schemes both output identical frames in this scenario: both have perfectly reconstructed I-frames and P-frames, while all B-slices have been lost, resulting in the loss of all B-frames, thus effecting a form of temporal scaling. Therefore the frame in Fig. 2d, which was a B-frame, shows no noticeable impairments but is actually a repeat of an earlier frame. Both non-differentiated schemes do show noticeable impairments. These impairments tend to be focused towards the bottom of the frame. The reason is that no flexible macroblock ordering (FMO) has been used, so the macroblocks from each frame are encoded in raster scan order.
[Fig. 3: Video quality results for Fries video sequence. (a) 2Mb/s sequences; (b) 4Mb/s sequences. Each panel plots pMOS against the number of streams for Best Effort, Scheme 1, Scheme 2 and Scheme 3.]

[Fig. 4: Video quality results for Mobile & Calendar video sequence. (a) 2Mb/s sequences; (b) 4Mb/s sequences.]

[Fig. 5: Video quality results for Rugby video sequence. (a) 2Mb/s sequences; (b) 4Mb/s sequences.]

The encoded slices are therefore produced and sent in scan order, with each frame transmission
resulting in a burst of packets. Packets towards the end of each burst are more susceptible to losses due to queue overflow or exceeding the delay bound. This results in more slice losses towards the bottom of each frame. This effect is more severe for larger frames, which tend to be the reference frames, and as a result the lossy decoded sequences tend to show many impairments towards the bottom of the screen when FMO is not used. This effect appears as soon as congestion is experienced when videos are sent with a non-differentiated mapping scheme, while the differentiated schemes avoid this effect when congestion is first experienced by causing B-slices to be lost in order to maintain the integrity of the reference frames. The video quality scores for the Fries, Mobile & Calendar and Rugby sequences are shown in Fig. 3, Fig. 4 and Fig. 5. For the 2Mb/s sequences, schemes 1, 2 and 3 all maintain the quality of the original encoded sequences until the number of concurrent video streams exceeds 3. The default scheme, however, shows a slight drop in quality for the 2Mb/s Mobile & Calendar sequence when there are just 3 streamed videos. The reason is a combination of the fact that the default scheme has the lowest channel capacity due to the EDCA parameter settings along with characteristics of this

particular sequence. This sequence has relatively high spatial content with relatively low temporal content compared to the other sequences. As can be seen from Table II, this results in relatively large I-frames and small B-frames. These larger I-frames therefore result in larger bursts of packets which are more prone to losses. In general, both of the non-differentiated mapping schemes show a cliff-edge drop in quality as soon as congestion is experienced. This means that as soon as the video load is high enough to cause congestion, the video quality drops to below 2 on the ACR scale. The differentiated schemes show better quality performance than the non-differentiated schemes. For the Fries and Mobile & Calendar sequences the differentiated schemes avoid this cliff-edge drop in quality, providing a more gradual degradation in quality. The differentiated schemes are less effective in producing this gradual degradation in quality for the Rugby video sequence. This again is due to the video characteristics. The reduction in frame rate caused by the differentiated schemes during congestion is more noticeable, and therefore less acceptable, for high motion content. This next set of tests looks to solve the issue identified earlier, where impairments tend to appear towards the bottom of the frame during congestion. The reason that errors tended to occur towards the bottom of each frame is that flexible macroblock ordering (FMO) was not used, so macroblocks were encoded in, and the resulting coded slices sent in, raster scan order. Each sent frame results in a burst of packets where packets towards the end of the burst are more susceptible to losses. This effect is more severe for the non-differentiated schemes, which do not protect the packet bursts caused by reference frames as well as the differentiated schemes do. We now test to see if the use of FMO can reduce this effect.
Video encoded without using FMO is now compared to video encoded using one of three different FMO slice group map patterns as described in Table V. The run length for the two interleaved slice group maps is equal to one frame width, resulting in interleaved rows. The encoder used only applies FMO to P-frames. Fig. 6 compares the same frame from the Fries video

TABLE V: FMO patterns

Number of slice groups  Slice group map type  Run length (macroblocks)  Run length (frame widths)
1 (no FMO)              N/A                   N/A                       N/A
2                       interleaved           45                        1
3                       interleaved           45                        1
4                       dispersed             N/A                       N/A
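The slice group maps of Table V can be generated with a few lines (a sketch; the function names are ours, and the dispersed pattern follows the dispersed map type formula of the H.264 standard). For the 720x576 sequences used here, a run length of 45 macroblocks equals one frame width:

```python
def interleaved_map(num_mbs: int, num_groups: int, run_length: int):
    """Interleaved FMO: run_length consecutive macroblocks per group,
    cycling through the slice groups."""
    return [(mb // run_length) % num_groups for mb in range(num_mbs)]

def dispersed_map(mbs_wide: int, mbs_high: int, num_groups: int):
    """Dispersed FMO: with two slice groups this yields a checkerboard."""
    return [(x + (y * num_groups) // 2) % num_groups
            for y in range(mbs_high) for x in range(mbs_wide)]
```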
[Fig. 6: Comparison of 2Mb/s Fries video sequence using different FMO patterns with mapping scheme 1. (a) no FMO; (b) 2 interleaved slice groups; (c) 3 interleaved slice groups; (d) 4 dispersed slice groups.]

[Fig. 7: Average pMOS for multiple video streams using different FMO patterns with packet mapping scheme 1. (a) 2Mb/s sequences; (b) 4Mb/s sequences. Each panel plots pMOS against the number of streams for no FMO, 2 and 3 interleaved slice groups, and dispersed FMO.]

sequence encoded at 2Mb/s. The total video offered load is 8Mb/s and mapping scheme 1 has been used. The use of FMO is successful at preventing impairments from being as concentrated towards the bottom of the frame. This is done by spreading lost slice information throughout the screen. By doing this, missing macroblocks are more likely to have some correctly received neighbouring macroblocks. The decoder should therefore have more local information, giving a better chance of successfully concealing errors in the event of losses. Fig. 7 shows the video quality for each FMO pattern. These values are averaged over all 3 video sequences when sent using mapping scheme 1. When the video load is too low to cause congestion, the sequences encoded with FMO show a lower quality value than when FMO is not used. This is due to the lower encoding efficiency experienced when using FMO [16]. This reduction in quality is, however, not very significant. When congestion does occur we see that the quality is very similar whether FMO is used or not, regardless of which of the tested FMO patterns is used. Although results are only shown here using mapping scheme 1, this was the observation for all of our mapping schemes. It is important to note, however, that while we find no significant benefit or loss from using FMO, these results are decoder dependent. The decoder used for all our tests is designed to be extremely robust. It uses a fast error concealment method which uses

temporal information whenever possible before reverting to spatial information. Fig. 6 shows that FMO does succeed in spreading out errors throughout the screen. This spreading of errors should mean that each lost macroblock is likely to have more local information available to help its error concealment. During congestion this should allow for a decoder with more advanced error concealment techniques to produce a higher quality output for a video sequence encoded with FMO than for a sequence encoded without FMO. So far, our results have focused on sending synchronised videos over EDCA. We now see how system performance varies when the videos are not synchronised and the video content in each individual test is mixed. We only provide results in this section for scheme 1 and scheme 3 to represent the non-differentiated and differentiated schemes respectively. PLR results are compared between synchronised and nonsynchronised video tests in Fig. 8. While the overall system PLR remains similar for both synchronised and nonsynchronised video tests, the average PLR per slice type can be quite different. Fig. 8a shows the PLR per slice type when mapping scheme 1 has been used. The synchronised videos receive better protection for B-slices at the expense of poorer protection to the more important I and P slices when compared to non-synchronised videos. This demonstrates why synchronised video tests allow us to test a worst case scenario. As, already explained, a large frame results in a large burst of packets which will be more suseptible to losses than a smaller burst. When there are several synchronised videos a large frame results in several synchronised large bursts of packets meaning that large frames have an even higher likelihood of suffering losses. Fig. 8b shows the PLR per slice type when using mapping scheme 3. Here there is very little difference in how the synchronised and non-synchronised video frames are treated. 
This shows a further advantage of the differentiated mapping schemes: the treatment of video packets is less affected by the relative timings of the videos than with the non-differentiated mapping schemes.

Fig. 8: PLR per slice type for multiple 4Mb/s videos. Synchronised (SYNC) videos compared with non-synchronised (NON-SYNC) videos: (a) Scheme 1, (b) Scheme 3

Fig. 9: Video quality (mean, minimum and maximum pMOS) for 4Mb/s Fries video sequences during tests with non-synchronised videos: (a) Scheme 1, (b) Scheme 3

Fig. 9 shows the video quality results for the 4Mb/s Fries video sequences from the non-synchronised video tests. Fig. 9a shows results when using mapping scheme 1 while Fig. 9b shows results when using mapping scheme 3. When compared to the average pMOS values for the synchronised tests in Fig. 3b, we see that when the videos are not synchronised the average pMOS is greatly improved for scheme 1, but for scheme 3 there is no significant improvement. For each test scenario in Fig. 9 we show the mean pMOS along with the minimum and maximum received pMOS values, because there is a large range of pMOS values experienced in each test scenario. The reason is that the relative timings between videos within the same test can affect how well each video is treated. With the synchronised video tests, all sent videos in a particular test receive very similar treatment, so the mean pMOS value is a good measure for the QoS of the system. If there is a great variation in the quality of the videos sent in a particular test, the overall system performance is likely to be judged more by the pMOS of the poorest quality received video than by the average pMOS of all of the videos in the system. If we compare the minimum pMOS values for the non-synchronised tests with the average pMOS values for the synchronised tests, we see that the system performance is very similar. Another noticeable advantage of the differentiated schemes over the non-differentiated schemes is that during non-synchronised video scenarios the videos receive more even treatment when congestion is first experienced. The tests with synchronised videos show the same loss patterns as the non-synchronised videos, where the errors tend to be concentrated towards the bottom of the screen when FMO has not been used. We have already shown, with the results from our earlier tests, that sending videos through an 802.11e access point using the differentiated mapping schemes can allow for a more gradual quality degradation during congestion than using the non-differentiated schemes.
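The differentiated mapping idea can be sketched as a simple classifier from H.264 packet type to EDCA access category. The queue assignments below are an illustrative reading of the schemes compared in this paper (scheme 2 promoting I-slices to AC_VO; scheme 3 keeping all slices out of AC_VO; schemes 2 and 3 demoting B-slices to AC_BE), reconstructed from the descriptions in the text rather than taken from the authors' exact configuration, and the function and constant names are ours:

```python
# EDCA access categories, highest priority first
AC_VO, AC_VI, AC_BE, AC_BK = "AC_VO", "AC_VI", "AC_BE", "AC_BK"

def map_slice(slice_type, scheme):
    """Map an H.264 packet to an EDCA queue under an illustrative
    reading of the schemes: parameter sets always get the highest
    priority, B-slices (least important for prediction) the lowest."""
    if slice_type == "PS":   # parameter sets: tiny but critical
        return AC_VO
    if scheme == 2:          # scheme 2: I-slices share the voice queue
        return {"I": AC_VO, "P": AC_VI, "B": AC_BE}[slice_type]
    if scheme == 3:          # scheme 3: video slices stay out of AC_VO
        return {"I": AC_VI, "P": AC_VI, "B": AC_BE}[slice_type]
    return AC_VI             # scheme 1: all video in the video queue
```

Under this reading, scheme 2 is the only scheme placing bulky video data in AC_VO, which is why it would compete with any accompanying audio stream mapped there.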
Those tests induced congestion by increasing the number of concurrent video streams sent over a single-rate physical layer (PHY). Over a wireless network, a common cause of increased congestion is a reduction in channel capacity due to PHY rate switching. The following tests compare the same four mapping schemes as earlier, as described in Table I, for different ERP-OFDM physical layer rates, which can range from 6Mb/s up to 54Mb/s. In a home network it is quite possible that there could be as many as six linear TV connections attempted concurrently, so we have tested 1 up to 6 concurrent (synchronised) video streams for each combination of mapping scheme and PHY rate.

Fig. 10 shows the video quality for 1 up to 6 concurrent 2Mb/s sequences. One subfigure is provided for each of the four mapping schemes. The quality values (pMOS) are averaged over the quality scores for the Fries, Mobile & Calendar and Rugby sequences. The physical layer rates tested range from 6Mb/s up to 18Mb/s. From 18Mb/s upwards, all four mapping schemes can send at least 6 concurrent 2Mb/s videos over the network successfully without any reduction in quality.

Fig. 11 shows the video quality for 1 up to 6 concurrent 4Mb/s sequences. Again, one subfigure is provided for each of the four mapping schemes and the quality values are averaged over the quality scores of the Fries, Mobile & Calendar and Rugby sequences. The physical layer rates tested range from 6Mb/s up to 54Mb/s. In order to successfully send at least 6 concurrent 4Mb/s videos over the network without suffering any loss in video quality, the physical layer rate must be at least 48Mb/s for the default mapping scheme but only 36Mb/s for the other three mapping schemes.

When comparing the results for the multiple 2Mb/s and 4Mb/s sequences there are many similar characteristics. The default best effort scheme, which has the lowest capacity, shows poorer performance than the other three mapping schemes. For each physical layer rate, the differentiated mapping schemes show a more gradual degradation in quality than the non-differentiated schemes. Scheme 1 has the highest channel capacity as it is the only scheme that does not map any video data into the slow AC_BE queue. This can allow the maximum video quality to be maintained for a slightly higher load than with the differentiated schemes. This effect can, however, only be seen in the 9Mb/s physical layer rate results. In general the differentiated schemes tend to show the best performance and also allow for a more gradual quality

Fig. 10: Average pMOS for multiple 2Mb/s streams sent using different physical layer data rates (6Mb/s, 9Mb/s, 12Mb/s, 18Mb/s): (a) Best Effort, (b) Scheme 1, (c) Scheme 2, (d) Scheme 3

Fig. 11: Average pMOS for multiple 4Mb/s streams sent using different physical layer data rates (6Mb/s, 9Mb/s, 12Mb/s, 18Mb/s, 24Mb/s, 36Mb/s, 48Mb/s, 54Mb/s): (a) Best Effort, (b) Scheme 1, (c) Scheme 2, (d) Scheme 3
degradation than either of the non-differentiated schemes. This gradual quality degradation can be a great benefit if there is a sudden drop in the physical layer rate. For example, if we are streaming five 4Mb/s videos through an access point with a physical layer rate of 36Mb/s, then all mapping schemes work successfully and maintain the maximum video quality. If the physical layer rate drops to 24Mb/s, all mapping schemes suddenly suffer a drop in video quality. For the default best effort scheme the quality instantly drops to below 2 on the ACR scale, the quality using scheme 1 is around 2.5, while the differentiated schemes only drop as low as around 3.5. A further drop in physical layer rate to 18Mb/s sees the quality using scheme 1 fall well below 2, while the differentiated schemes are still able to stay above 2. It is this ability to maintain better quality during a sudden increase in congestion that highlights the benefit of the differentiated mapping schemes.

Both differentiated mapping schemes show similar performance, with scheme 2 being the best. A real time video service is almost certain to have an accompanying audio stream. Audio streams typically have very low data rates relative to the video stream. The recommended EDCA parameter sets were designed with the intention that real time audio streams are mapped into AC_VO, so that they do not have to compete with the higher rate video streams, which should be mapped into AC_VI. However, scheme 2 already maps I-slices into AC_VO. This means that scheme 2 should experience a drop in audio quality well before scheme 3. While both schemes map PSI into AC_VO, this low amount of data will have little effect on audio streams. The overall quality of an audiovisual service is a combined effect of both the quality of the received video and the received audio [17]. So in order to maintain a robust video streaming service while trying to avoid the loss of audio information, which should be mapped onto AC_VO, we recommend the use of traffic mapping scheme 3.
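The break points reported above can be reproduced with a back-of-envelope capacity model: a PHY rate supports a set of streams only if the offered video load fits within the usable MAC throughput. The sketch below is ours, and the efficiency figures are rough inferences from the results in this paper (roughly two-thirds of the PHY rate usable for the differentiated schemes, about half for best effort), not measured values:

```python
ERP_OFDM_RATES = [6, 9, 12, 18, 24, 36, 48, 54]  # Mb/s, ascending

def min_phy_rate(num_streams, stream_rate_mbps, mac_efficiency):
    """Lowest ERP-OFDM PHY rate whose usable throughput
    (PHY rate x assumed MAC efficiency) covers the offered load."""
    load = num_streams * stream_rate_mbps
    for rate in ERP_OFDM_RATES:
        if rate * mac_efficiency >= load:
            return rate
    return None  # load exceeds even the 54Mb/s rate

# Six 4Mb/s streams: ~36Mb/s suffices under two-thirds efficiency
# (differentiated schemes), while ~50% efficiency requires 48Mb/s
# (best effort), matching the rates observed for Fig. 11.
```

The same model also reproduces the 2Mb/s result: six 2Mb/s streams (12Mb/s of video) first fit at the 18Mb/s PHY rate under two-thirds efficiency.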

VII. SUMMARY

In this paper we have compared streaming multiple concurrent H.264 streams through an 802.11e access point for several traffic mapping schemes. We have shown that all mapping schemes used provide better performance than the default best effort service. The best performing mapping schemes differentiate packets based on their slice type and allow a more gradual drop in video quality as congestion increases, avoiding the much steeper decline in quality that occurs with the other schemes. We have also shown how these differentiated schemes can offer a benefit over the non-differentiated schemes during a drop in physical layer rate. These differentiated schemes are less effective, although still better performing than the non-differentiated schemes, when the video content has a high amount of temporal activity. We have shown the difference between sending multiple synchronised videos and multiple non-synchronised videos. While sending synchronised videos is a worst case scenario in terms of the average perceptual quality of a received video, the overall system QoS is not significantly changed. We have also identified that, without the use of FMO, impairments caused by congestion tend to be concentrated towards the bottom of the frame. By distributing errors, which are often severe, across the frame through the use of FMO, the video quality remains similar for an H.264 decoder that does not have sophisticated error concealment techniques. Further investigation is needed to confirm whether or not this spreading of severe loss errors across the frame will in fact allow a decoder with advanced error concealment techniques to conceal errors more effectively than if FMO is not used. This does, however, highlight the benefit of the differentiated mapping schemes, which are a simple and effective way to improve the robustness of video streams on a congested network without the need for a sophisticated decoder.

REFERENCES

[1] IEEE Std 802.11-2007 (Revision of IEEE Std 802.11-1999), Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, 2007.
[2] ITU-T Recommendation H.264: Advanced video coding for generic audiovisual services, ITU-T Std. H.264, 2007.
[3] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
[4] A. Ksentini, A. Gueroui, and M. Naimi, "Improving H.264 video transmission in 802.11e EDCA," in IEEE ICCCN 2005, Oct. 2005, pp. 381-386.
[5] U. I. Choudhry and J. Kim, "Performance evaluation of H.264 mapping strategies over IEEE 802.11e WLAN for robust video streaming," Lecture Notes in Computer Science, vol. 3768, pp. 818-829, 2005.
[6] N. Cranley and M. Davis, "Video frame differentiation for streamed multimedia over heavily loaded IEEE 802.11e WLAN using TXOP," in IEEE PIMRC 2007, Sept. 2007, pp. 1-5.
[7] E. Shihab, L. Cai, F. Wan, A. Gulliver, and N. Tin, "Wireless mesh networks for in-home IPTV distribution," IEEE Network, vol. 22, no. 1, pp. 52-57, Jan.-Feb. 2008.
[8] K.-H. Lee, S. T. Trong, B.-G. Lee, and Y.-T. Kim, "QoS-guaranteed IPTV service provisioning in home network with IEEE 802.11e wireless LAN," in IEEE NOMS Workshops 2008, April 2008, pp. 71-76.
[9] R. MacKenzie, D. Hands, and T. O'Farrell, "Effectiveness of H.264 error resilience techniques in 802.11e WLANs," in IEEE WCNC 2009, Budapest, Hungary, April 2009.
[10] R. MacKenzie, D. Hands, and T. O'Farrell, "QoS of video delivered over 802.11e WLANs," in IEEE ICC 2009, Dresden, Germany, June 2009.
[11] Objective and Subjective Measures of MPEG Video Quality, ANSI T1A1 Contribution Number T1A1.5/96-121, 1997.
[12] ITU-T Recommendation J.144: Objective Perceptual Video Quality Measurement Techniques for Digital Cable Television in the Presence of a Full Reference, ITU-T Std. J.144, 2004.
[13] C.-H. Ke, C.-K. Shieh, W.-S. Hwang, and A. Ziviani, "An evaluation framework for more realistic simulations of MPEG video transmission," Journal of Information Science and Engineering.
[14] S. Wenger, M. M. Hannuksela, T. Stockhammer, M. Westerlund, and D. Singer, "RTP payload format for H.264 video," Internet proposed standard RFC 3984, Feb. 2005.
[15] ITU-R Recommendation BT.500-11: Methodology for the subjective assessment of the quality of television pictures, ITU-R Std. BT.500-11, 2002.
[16] S. Wenger, "H.264/AVC over IP," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 645-656, July 2003.
[17] J. G. Beerends and F. E. de Caluwe, "The influence of video quality on perceived audio quality and vice versa," Journal of the Audio Engineering Society, vol. 47, no. 5, pp. 355-362, May 1999.
