
the start of the packet can never be 0. Therefore, the range is 1-32, which is biased by 1 to fit in 5 bits.

Set to 0 if the packet begins with a GOB header.

QUANT: Quantizer field (5 bits). Shows the quantizer value (MQUANT or GQUANT) in effect prior to the start of this packet. Set to 0 if the packet begins with a GOB header.

HMVD: Horizontal motion vector data field (5 bits). It carries the reference horizontal motion vector data (MVD). Set to 0 if the V flag is 0, if the packet begins with a GOB header, or if the MTYPE of the last MB encoded in the previous packet was not MC. HMVD is encoded as a two's complement number; the pattern 10000, corresponding to the value -16, is forbidden, since motion vector fields range over +/-15.

VMVD: Vertical motion vector data field (5 bits). It carries the reference vertical motion vector data (MVD). Set to 0 if the V flag is 0, if the packet begins with a GOB header, or if the MTYPE of the last MB encoded in the previous packet was not MC. VMVD is encoded as a two's complement number; the pattern 10000, corresponding to the value -16, is forbidden, since motion vector fields range over +/-15.
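As an illustration of the 5-bit motion vector fields described above, the following Python sketch decodes an HMVD/VMVD value as a two's complement number and rejects the forbidden pattern 10000. The function name is a hypothetical helper for illustration, not part of any standard codebase.

```python
def decode_mvd(bits5: int) -> int:
    """Decode a 5-bit two's complement motion vector field (HMVD/VMVD).

    Valid results are -15..+15; the bit pattern 10000 (-16) is
    forbidden by the payload format.
    """
    if bits5 == 0b10000:
        raise ValueError("MVD pattern 10000 (value -16) is forbidden")
    # Sign-extend the 5-bit value: if the sign bit is set, subtract 32.
    return bits5 - 32 if bits5 & 0b10000 else bits5

print(decode_mvd(0b00001))   # 1
print(decode_mvd(0b11111))   # -1
print(decode_mvd(0b10001))   # -15
```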

G.711

G.711 is the international standard for encoding telephone audio on a 64 kbps channel. It is a pulse code modulation (PCM) scheme operating at an 8 kHz sample rate with 8 bits per sample. According to the Nyquist theorem, which states that a signal must be sampled at twice its highest frequency component, G.711 can encode frequencies between 0 and 4 kHz. Telcos can select between two variants of G.711: A-law and mu-law. A-law is the standard for international circuits. Both encoding schemes are roughly logarithmic: more of the code space is devoted to low signal values than to high ones. This ensures that low-amplitude signals are well represented while maintaining enough range to encode high amplitudes.

2.3 System architecture

H.323 Videoconferencing can be used in two modes: point-to-point and multipoint. Point-to-point H.323 Videoconferencing architecture is shown in Fig. 2.6. Users in a point-to-point

call connect to each other by using either user's IP address or alias. In order to have multiple users in a conference, multipoint Videoconferencing is used. Multipoint H.323 Videoconferencing architecture, as shown in Fig. 2.7, contains four primary components: a terminal, a gateway, a gatekeeper and a multipoint control unit (MCU). A terminal can be a stand-alone appliance or a PC that runs an application capable of bi-directional multimedia communications. Gateways are used to initiate calls between two disparate networks. A gatekeeper is an optional component that is deployed to provide services such as addressing, authenticating terminals, billing and managing bandwidth. When three or more participants join a conference, an MCU is used to multiplex and switch audio and video between the conference participants appropriately.

[Figure: Terminal 1 and Terminal 2 connect directly using an IP address or alias.]

Fig. 2.6 Point-to-point H.323 Videoconferencing architecture

[Figure: Terminals 1 through N on a packet-switched network connect via an MCU, with an optional Gatekeeper and an optional Gateway.]

Fig. 2.7 Multipoint H.323 Videoconferencing architecture


2.4 Factors affecting the system

There are numerous factors that affect the performance of an H.323 Videoconferencing system. The factors can be subdivided into three categories:

1. Human factors
2. Device factors
3. Network factors

The human factor deals with the perception of the quality of the audio/video. Though the quality judgment is a very subjective issue, there is a level of degradation beyond which the quality of the audio/video is unacceptable to any person. Human error due to negligence or lack of training can also become a performance bottleneck that affects the quality of the audio/video. For example, a user not muting his microphone during a conference can introduce undesirable noise into the entire Videoconference. Section 2.4.1 deals with the human perception of the quality of the Videoconference in greater detail.

Devices such as H.323 end-points, MCUs, routers, firewalls, Network Address Translators (NATs) and modems also affect the quality of the Videoconference. The extent to which an H.323 end-point affects the quality of the Videoconference depends on the codec, operating system, processor speed and memory capacity at the end-point. MCUs and routers affect the quality of the Videoconference by contributing to the network factors that are quantified in terms of the overall end-to-end delay, jitter and packet loss. Thus it is hard to clearly distinguish the device factors from the network factors. Sections 2.4.2 and 2.4.3 describe the issues concerning the performance of H.323 audio/video traffic with regard to the overall end-to-end delay, jitter and packet loss.

Network bandwidth also largely affects the quality of H.323 audio/video. Popular dialing speeds for making an H.323 Videoconference call are 128Kbps, 384Kbps and 768Kbps. Of these, 384Kbps has been found to be good enough to qualify an H.323 Videoconference as high quality.
In the course of the work presented in this thesis, it was also verified that increasing the dialing speed to 768Kbps does not significantly increase the perceived quality of the

H.323 Videoconference. On the contrary, it adversely affects the overall end-to-end delay in a multipoint scenario. Please refer to Section 3.3 of Chapter 3 for more details. In general practice, 64Kbps is assigned for audio traffic and the remaining bandwidth is used for video traffic. It is also to be noted that placing a call at 384Kbps actually requires 480Kbps of bandwidth to be available in the network, considering approximately 25% IP overhead [14]. Another requirement for acceptable performance is the usage of switched Ethernet instead of shared Ethernet, operating in full-duplex rather than half-duplex mode. Category 6 wiring is generally used as part of such an installation. End-system LAN cards generally support 10Mbps, while LANs with MCUs support 100Mbps, or multiple 100Mbps cards in some systems. Devices such as firewalls and NATs obstruct the H.323 protocol itself and hence are a major hindrance in the system. Sections 2.4.4 and 2.4.5 discuss the problems associated with using firewalls and NATs, respectively, and describe the proposed solutions.
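The 25% IP overhead figure quoted above translates into a simple provisioning calculation. The following Python sketch is illustrative only; the function name and the default overhead value are assumptions made for the example.

```python
def provisioned_bandwidth_kbps(dial_rate_kbps: float, ip_overhead: float = 0.25) -> float:
    """Network bandwidth needed for a call, assuming ~25% IP overhead [14]."""
    return dial_rate_kbps * (1.0 + ip_overhead)

for rate in (128, 384, 768):
    print(f"{rate} kbps call -> provision {provisioned_bandwidth_kbps(rate):.0f} kbps")
# A 384 kbps call needs 480 kbps, matching the figure quoted in the text.
```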

2.4.1 Human Perception

The most common experience of human interaction through a communication system is a local telephone call through the public telephone network, which has approximately 12ms one-way delay [3]. It has been observed that a one-way audio delay of 0-150ms allows natural, interactive conversation, and that a one-way delay of up to 400ms is tolerable [4]. Human visual perception is also limited: of the roughly 800Mbits of visual information the eye captures, the brain processes about 1/100th [5]. Hence, a reduction in the quality of the video image is barely noticeable to a normal person. Between audio and video, audio latency is observed to be less tolerable because it causes choppiness and breakup in the audio playback at the receiver end. Though the ITU recommends a maximum of 300ms two-way latency for acceptable voice communication, it has been seen that users are willing to accept delays of 400ms or more if the solution greatly reduces cost. Delays above 600ms are rejected by 40% of telephone users [3]. These observations have helped to identify redundancies in the audio and

video streams that inspired the development of audio/video compression technologies.

2.4.2 End-to-end delays

There are many contributing elements to the end-to-end delay in a typical H.323 Videoconferencing system. These elements need to be identified, and suitable solutions have to be developed to keep the total end-to-end delay below the desired bound. The various delays, as shown in Fig. 2.8, can be categorized into three types: sender-side delays, network delays and receiver-side delays.
[Figure: Sender side: compression delay, serialization delay, electronic delay. Network: propagation delay, processing delay, queuing delay. Receiver side: resynchronization delay, decompression delay, presentation delay.]

Fig. 2.8 Various delay elements of the end-to-end delay

Sender-side delay mainly involves the compression delay due to the codec, the serialization delay incurred at the sender's side to digitize the audio and video into a serial string of bits to be sent on the communication line, and the electronic delays caused by communication equipment such as modems at the sender's side. Network delay is the time taken for the data units to travel from the source to the destination. It includes the propagation delay of the signal and the processing and queuing delays in the intermediate routers and other network equipment. Routers have been shown to introduce a latency of about 1ms to 2.4ms on Fast Ethernet links. Our experiments on an isolated LAN show that, without any queuing, the router propagation delay is normally less than 2ms.
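The serialization delay mentioned above is simply the packet length divided by the link rate. The following Python helper is an illustrative sketch (the function name and example sizes are assumptions), contrasting a Fast Ethernet link with a slow access link.

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to clock one packet onto the wire: packet bits / link rate."""
    return packet_bytes * 8 / link_bps * 1000.0

# A 200-byte audio packet is negligible on Fast Ethernet but
# significant on a slow access link:
print(f"{serialization_delay_ms(200, 100e6):.3f} ms on 100 Mbps")
print(f"{serialization_delay_ms(200, 128e3):.1f} ms on 128 kbps")
```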


Receiver-side delay is caused by the resynchronization delay that involves the collection of the compressed audio/video data units in a decoder buffer before decompression. The decoder buffer is also called a dejitter buffer. Jitter is the variation in the delay. Jitter needs to be addressed since it influences the way buffering and replay are done on the receiver side, which is crucial for acceptable audio/video quality. The design of dejitter buffers can be tailored to overcome the various jitter delays, but every solution involves a trade-off. For example, large dejitter buffers prevent late packets from being dropped at the receiver, but they add to the total delay of the system. It is common practice to use adaptive dejitter buffers that change behavior on the fly based on network conditions, so as to keep the delay within the bounds sufficient for human interaction. The decompression process also adds to the delay on the receiver side. Presentation delay is introduced as the data is served into a video frame buffer, which is periodically scanned by a video adapter to produce a video trace on the video output screen. The presentation delay is on the order of 12ms to 17ms [7]. Studies have shown that the presentation delay must be limited such that the audio is not more than 20ms ahead of the video or more than 120ms behind it [6]. Presentation delay can be eliminated by suitably synchronizing the decoder output such that the video frame buffer always holds the data required to produce a video trace. For a decoder to produce such a constant output, there has to be synchronization between the network and the decoder, between the receiver and sender network interfaces, between the network and the encoder, and between the encoder and the capture card. On the sender and receiver sides, if we consider PCs being used for videoconferencing, we can identify many factors that could add to the delay.
There could be operating system overhead due to multitasking, as well as transmission time through the protocol software. The processor speed of the PC also affects the performance. If a modem-based dial-up line is used at the sender and receiver ends to connect to an ISP, an additional 40ms per modem pair is introduced [9], provided that the modem's compression and error correction options are turned off. The modem transmission time also affects the overall delay. If a 56kbps modem is used and 10 characters are sent over the modem, the link time would be 80 bits / 56000 bps ≈ 1.4ms. Thus a total of 41.4ms of delay is introduced by using modems on the sender
and receiver sides when connected to the Internet via modems through ISPs. The local telephone connection to the ISP also adds another 8ms to 12ms to the end-to-end delay. As a consequence of the many factors contributing to the end-to-end delay, we need solutions that can make the necessary trade-offs at each step so as to achieve an overall QoS for the audio/video traffic on the network. The trade-offs should address the computing and memory resources at the end-system, the video packet size and the burstiness of video packets originating from the codecs, and the crucial network parameters such as delay, jitter and packet loss, which have to be kept within bounds for a user to successfully experience an H.323 Videoconference.
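The adaptive dejitter buffering described in this section can be sketched as an exponentially weighted estimator of delay and jitter, loosely in the style used by RTP receivers. The class below is an illustrative assumption, not an implementation from this thesis; the alpha and k parameters are hypothetical tuning knobs.

```python
class AdaptiveDejitterBuffer:
    """Sketch of an adaptive playout-delay estimator.

    The playout delay tracks a smoothed mean of the observed transit
    delay plus a safety margin of k jitter deviations: larger margins
    drop fewer late packets but add end-to-end delay, which is the
    trade-off discussed in the text.
    """

    def __init__(self, alpha: float = 0.998, k: float = 4.0):
        self.alpha = alpha          # smoothing factor for the estimates
        self.k = k                  # safety margin in jitter units
        self.mean_delay = 0.0
        self.jitter = 0.0
        self.initialized = False

    def on_packet(self, transit_delay_ms: float) -> float:
        """Update the estimates with one packet's transit delay and
        return the playout delay to use for the next talkspurt."""
        if not self.initialized:
            self.mean_delay = transit_delay_ms
            self.initialized = True
        deviation = abs(transit_delay_ms - self.mean_delay)
        a = self.alpha
        self.mean_delay = a * self.mean_delay + (1 - a) * transit_delay_ms
        self.jitter = a * self.jitter + (1 - a) * deviation
        return self.mean_delay + self.k * self.jitter

buf = AdaptiveDejitterBuffer(alpha=0.9)  # fast smoothing for the demo
for delay in (50, 52, 49, 51, 50):
    playout = buf.on_packet(delay)
print(f"playout delay ~{playout:.1f} ms")
```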

2.4.3 Jitter and Packet Loss

Jitter is introduced by the internal operations of the components in the network. Queuing and buffering of data in the network, packet rerouting, packet loss, network multiplexing and other such factors can cause jitter. Jitter can also be introduced at the end-user system, which is the source of the network traffic. This jitter, called insertion jitter, is introduced when certain packets are delayed before being placed in transmission slots because the previous transmission is incomplete. Insertion jitter needs to be regulated, as the network tends to amplify jitter. Packet sizes also influence the magnitude of the insertion jitter. Long packets increase the overall delay due to the packet-processing overhead. This is one of the reasons that multimedia applications characteristically use small packet sizes. To alleviate some of the sender-side jitter, playback buffer devices can be used at the end points. Though a network calculus proof [8] supports the viability of this approach, the results need to be verified experimentally. Appropriate scheduling of video and audio traffic could also reduce sender-side jitter. The trade-off in the scheduling is to favor audio transmission at the cost of sacrificing video bandwidth, to ensure better QoS. Packet drop can be caused by changes in the inter-arrival times of the audio packets due to the intermediate router processing along the path of the packets. The packet drop value is

negligible or small for smaller changes in packet inter-arrival times [9]. At the receiver end, when buffers are used for reproducing the data units, buffer overflow or the buffer refreshing frequency can cause packet drop. The impact of packet drop depends on the application. For a multimedia application, dropping of some important frames might be disturbing for the end user. Selective discard of packets at the receiver end can help applications maintain their QoS to the user. The problem with regulating network parameters is that one network parameter influences the others. Sharp variations in jitter values lead to a significant increase in packet loss. Studies have shown that a change of 1% in one-way packet loss is equivalent to a change of 220ms in one-way delay [7]. This interdependency of the network parameters is one of the challenges that a network engineer faces in achieving a balance on the network.
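The perceptual equivalence cited above, 1% of one-way loss being roughly equal to 220ms of one-way delay [7], can be expressed as a rule-of-thumb conversion. The function below is a hypothetical helper written for illustration only.

```python
def loss_as_delay_ms(one_way_loss_pct: float) -> float:
    """Rule of thumb from [7]: a 1% change in one-way packet loss is
    perceptually equivalent to a 220ms change in one-way delay."""
    return one_way_loss_pct * 220.0

# Under this rule, 2% loss degrades quality roughly as much as an
# extra 440ms of one-way delay:
print(f"{loss_as_delay_ms(2.0):.0f} ms")
```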

2.4.4 H.323 vs. Firewalls

A firewall is a device designed to regulate access between networks and enforce an organization's security policy [18]. It is usually a combination of hardware and software. It can exist within a router, a personal computer, a host computer, or a collection of host computers. Without a firewall, a network is exposed to inherently insecure services such as Telnet and FTP. A firewall also simplifies the management of network security by providing a single point of access to the network. All the traffic entering or leaving the network must pass through the firewall. The firewall examines all data and blocks data that does not meet specified security criteria. H.323 uses TCP as well as UDP, both during call setup and during audio/video transport. TCP and UDP use so-called port numbers in their packets to identify individual connections or circuits. These port numbers are a key element of these packets used by firewalls to classify traffic so that policy may be applied. Unfortunately, H.323 uses both statically and dynamically allocated port numbers. That is, most of the data traffic generated by an H.323 call uses TCP and UDP port numbers that are

either predetermined or assigned during call setup. Generally, TCP ports 1718-1720 and 1731 are statically assigned for call setup and control. UDP ports in the range 1024-65535 are dynamically assigned for audio/video data streams. A traditional firewall, with static policy definition, cannot predict which ports will be used for each call. The result is that a standard firewall must permit all possible port numbers to pass, which leaves the network open to a variety of hacker attacks. Also, messages sent with the H.323 protocol contain embedded transport addresses, which the firewall cannot access. Hence, traditional firewalls tend to block the passage of H.323 traffic. Many solutions have been proposed to overcome this issue [17]. A non-scalable yet feasible solution is to allow unrestricted ports for specific, known, external IP addresses. Another solution forces the Videoconferencing clients to confine dynamic ports to a specific narrow range, which can be specified in the firewall policy. Some vendors have proposed a solution in which an H.323 application proxy relays H.323 calls to another H.323 endpoint. This concept is also known as the software plug-boarding technique. However, this approach requires complex software, which is itself vulnerable to attack, and the relaying process makes the proxy a performance bottleneck. The best solution proposed so far uses a firewall that snoops on the H.323 call set-up channels (static ports) and opens ports for audio/video (dynamic ports) as needed.
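The snooping-firewall approach described above can be modeled in miniature: static call-setup ports are always open, while media pinholes are created and destroyed per call. The following Python class is a toy illustration of this idea, not a real packet filter; in a real firewall the negotiated UDP ports would be parsed out of the snooped signalling messages rather than passed in directly.

```python
class H323PinholeFirewall:
    """Toy model of an H.323-aware firewall (illustrative only).

    Call-setup TCP ports are statically allowed; UDP media ports are
    opened only when learned from call signalling and closed again
    when the call ends.
    """

    STATIC_TCP = {1718, 1719, 1720, 1731}  # call setup and control

    def __init__(self):
        self.open_udp = set()

    def on_call_setup(self, negotiated_udp_ports):
        # Open dynamic pinholes for this call's media streams.
        self.open_udp |= set(negotiated_udp_ports)

    def on_call_end(self, negotiated_udp_ports):
        # Close the pinholes when the call tears down.
        self.open_udp -= set(negotiated_udp_ports)

    def permits(self, proto: str, port: int) -> bool:
        if proto == "tcp":
            return port in self.STATIC_TCP
        return port in self.open_udp

fw = H323PinholeFirewall()
print(fw.permits("udp", 30000))        # blocked before any call
fw.on_call_setup([30000, 30001])
print(fw.permits("udp", 30000))        # opened after snooping setup
```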

2.4.5 H.323 vs. NATs

Network Address Translation devices (NATs) are used to translate IP addresses so that users on a private network can see the public network, but public network users cannot see the private network users [17]. Typically, on outgoing packets, a NAT device maps local private network addresses to one or more global public IP addresses. On incoming packets, the NAT device maps global IP addresses back into local IP addresses. NATs affect the performance of H.323, which can be explained by the following example. If an H.323 endpoint A, which is inside the network and behind a NAT, sends a

call setup message to another H.323 endpoint B on the outside, then in the simplest case, H.323 endpoint B extracts the source IP address from the call setup message and sends a response to this address. Because the call setup message came from H.323 endpoint A behind the NAT, the source IP address is fictitious (private) and incorrect. The call setup will not succeed, and hence the attempt to place a call fails. The obvious solution to this problem is to use H.323 protocol-aware NATs. These NATs maintain information about the H.323 calls originating from the private network, map the source addresses to valid IP addresses before passing the messages to the outside network, and later hand over the received responses to the appropriate H.323 endpoints within the private network.

2.5 Mechanisms to enhance the Network QoS for H.323 traffic

2.5.1 Queuing Mechanisms

Since the packet sizes of real-time traffic are generally smaller than those of other best-effort traffic, and real-time traffic is mostly latency sensitive, one might believe that a good queuing strategy would involve identifying and prioritizing the queues to favor real-time traffic. However, such a strict priority queue to reduce latency may be detrimental to H.323 video traffic, as it leads to burstiness [10], especially if the loads are sufficiently large. Burstiness manifests itself as jitter, because multiple fragments cluster in small intervals of time. If the jitter peaks exceed 33ms, the result is bad audio and video at the receiver end, which can be explained as follows. It is known that the jitter cannot be greater than the display time needed to create one frame's worth of data [11]. By this measure, the jitter bound is independent of bandwidth and depends only on the frame rate. At 30 frames per second this interval is 1/30th of a second, which is approximately 33ms. Hence the move is towards queuing schemes that perform some kind of round-robin queuing, keeping jitter peaks below 33ms while catering to the latency demands of real-time traffic such as H.323 audio/video.
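The 33ms bound derived above is just one frame interval at 30 frames per second, which the following one-line helper (a hypothetical name, written for illustration) makes explicit.

```python
def jitter_bound_ms(frames_per_second: float) -> float:
    """Upper bound on tolerable jitter: one frame's display interval [11]."""
    return 1000.0 / frames_per_second

print(f"{jitter_bound_ms(30):.1f} ms")  # one frame at 30 fps, ~33 ms
```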


Of the many queuing disciplines, weighted fair queuing (WFQ) has been proven to provide the better latency guarantees that are necessary for multimedia traffic. WFQ gives priority to low-volume traffic over high-volume traffic. Before WFQ transmits a packet over an outgoing link, it examines the enqueued packets and chooses to send the packet that would complete transmission first. Multimedia packets, which are usually bursty and small in size, move rapidly to the head of the queue, reducing the average packet latencies for real-time traffic without starving larger packets. Such an operation of WFQ requires considerable computational power. Though the WFQ scheme provides a latency guarantee, the guarantee is inversely proportional to the bandwidth allocated to each flow and proportional to the packet length and the number of hops [7].
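A minimal sketch of the WFQ idea described above, under a deliberately simplified virtual clock: each packet is stamped with a virtual finish time and the scheduler always sends the packet with the smallest stamp. The class and its simplifications are illustrative assumptions, not a router implementation.

```python
import heapq

class WeightedFairQueue:
    """Minimal WFQ sketch: each packet gets a virtual finish time
    max(V, F_prev) + length/weight, and the scheduler always sends
    the packet with the smallest finish time. Small packets from
    low-volume flows therefore jump ahead without starving others."""

    def __init__(self):
        self.heap = []
        self.last_finish = {}
        self.virtual_time = 0.0
        self.seq = 0                # tie-breaker for equal finish times

    def enqueue(self, flow: str, length: int, weight: float = 1.0):
        start = max(self.virtual_time, self.last_finish.get(flow, 0.0))
        finish = start + length / weight
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        finish, _, flow, length = heapq.heappop(self.heap)
        self.virtual_time = finish  # simplified virtual-clock advance
        return flow, length

wfq = WeightedFairQueue()
wfq.enqueue("ftp", 1500)     # bulk packet queued first
wfq.enqueue("audio", 160)    # small voice packet arrives after
print(wfq.dequeue()[0])      # the small audio packet is served first
```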

2.5.2 Resource Reservation

Multimedia traffic tends to be bursty, and thus large resources need to be reserved along the network path. The ITU [1] suggests the usage of RSVP to cater to the QoS requirements of H.323 video and audio streams. RSVP is a transport-level signaling protocol used to reserve resources by having routers along the data path keep a record of the packets that need special treatment on the network. By synchronizing RSVP procedures with H.323, end points can set QoS parameters for the multimedia streams in a call, along with the RSVP parameters. The end points can signal their intentions, capabilities and requirements by using a call admission control (CAC) scheme, which reserves resources if available and provides best-effort service if there are not adequate resources to support the reservation. RSVP is only a signaling protocol; by itself it does not enforce any QoS. However, together with appropriate reservation classes (guaranteed service or controlled load), suitable queuing mechanisms, Random Early Detection (RED) and a policy-based QoS management framework [12], RSVP can deliver the required QoS for an H.323 Videoconference. RSVP is also designed targeting point-to-point links; in cases where the conference participants share bandwidth on an Ethernet LAN, resource reservation mechanisms such as Subnet

Bandwidth Management (SBM) can be used. All the mechanisms mentioned are completely controlled from within RSVP. Therefore, H.323 endpoints willing to reserve resources on the network must possess the ability to do RSVP signaling. Recipient endpoints can use RSVP to make reservations for media streams of a quality suitable to their bandwidth availability, and the intermediate routers can be configured to deliver the right flows to the right receiving ends. Though the queuing strategies and resource reservation mechanisms discussed above have proven to be promising solutions to enhance the QoS of H.323 audio/video [20], they are found to be non-scalable in terms of end-to-end guarantees across network boundaries. End-to-end guarantees seem impossible in today's Internet due to factors such as the incoherence in the policies of ISPs and the system complexity involved in supporting such guarantees. Other than queuing mechanisms and resource reservation mechanisms, suitable traffic shaping mechanisms can be deployed at network boundaries to cope with packet loss of the H.323 audio/video packets on the network and to assuage the problem of burstiness, at the cost of adding shaping delay in the network. Congestion control algorithms based on RTCP [9] or based on the number of conferees in a conference [13] can also serve to reduce packet loss and provide better Network QoS to the H.323 audio/video traffic.

2.6 Megaconferences: World's largest H.323 Videoconferences

The Megaconferences, the brainchild of Dr. Bob Dixon [19], have been conducted three times so far. In October 1999, at the Internet2 Fall meeting, the Megaconference-I event was conducted, which brought together almost 50 national and international research institutions and networking organizations collectively engaged in a live demonstration of the capabilities of H.323 video conferencing. Each participating institution had an opportunity to address the conference participants, discuss their deployment of H.323, and showcase H.323 applications at their site. This event was the largest H.323 multipoint conference ever conducted, and was simultaneously broadcast on the Internet. Megaconference-II
