0 Bewertungen0% fanden dieses Dokument nützlich (0 Abstimmungen)
33 Ansichten4 Seiten
This document proposes a characteristics-based bandwidth reduction technique for pre-recorded lecture videos. It captures lectures using a single video camera and exploits unique characteristics of the videos, like periods of static screen content, to reduce bandwidth beyond standard compression techniques. It detects shot boundaries to identify frames where significant changes occur. It then extracts key frames and associated audio segments to create a slideshow presentation in SMIL format, reducing bandwidth to 4-13% of standard encodings while retaining lecture content and quality. This approach aims to make distance learning more practical by addressing bandwidth limitations.
This document proposes a characteristics-based bandwidth reduction technique for pre-recorded lecture videos. It captures lectures using a single video camera and exploits unique characteristics of the videos, like periods of static screen content, to reduce bandwidth beyond standard compression techniques. It detects shot boundaries to identify frames where significant changes occur. It then extracts key frames and associated audio segments to create a slideshow presentation in SMIL format, reducing bandwidth to 4-13% of standard encodings while retaining lecture content and quality. This approach aims to make distance learning more practical by addressing bandwidth limitations.
This document proposes a characteristics-based bandwidth reduction technique for pre-recorded lecture videos. It captures lectures using a single video camera and exploits unique characteristics of the videos, like periods of static screen content, to reduce bandwidth beyond standard compression techniques. It detects shot boundaries to identify frames where significant changes occur. It then extracts key frames and associated audio segments to create a slideshow presentation in SMIL format, reducing bandwidth to 4-13% of standard encodings while retaining lecture content and quality. This approach aims to make distance learning more practical by addressing bandwidth limitations.
A CHARACTERISTICS-BASED BANDWIDTH REDUCTION TECHNIQUE
FOR PRE-RECORDED VIDEOS
Wallapak Tavanapong Srikanth Krishnamohan Department of Computer Science Iowa State University Ames, IA 5001 1-1040 Email: {tavanapo,srikk} @cs.iastate.edu ABSTRACT Advances in networking infrastructures, computing capabili- ties, multimedia processing technologies, and the World Wide Web have brought about real-time delivery of multimedia content through the Intemet. While more communication bandwidth is expected in the near future, it is either costly or unavailable to a large number of users. For important applications such as distance leaming to be successful, bandwidth limitations must be taken into account. In this paper, we propose a novel approach using unique char- acteristics of lecture videos to mitigate their bandwidth requirement for distance leaming applications. The experimental results on our test videos demonstrate that when using our approach, the band- width requirement is reduced down to only about 4% of the video encoded in MPEG and about 13% of the same video in Real format. 1. INTRODUCTION Advances in networking infrastructures, computing capabilities, mul- timedia processing technologies, and the World Wide Web have made delivery of multimedia content through the Intemet a reality. Recent years have witnessed much progress of multimedia deliv- ery in electronic commerce such as the ubiquitous mp3 music and online news videos; nevertheless, incorporating video and audio in course materials for distance leaming has not been as successful due to two major problems. 1. Limited bandwidth at the users end: Despite the fact that the bandwidth of the backbone network is approaching giga- bit rates, each individual user is unlikely to enjoy this huge bandwidth since it is cost prohibitive. Thus, end users op- tions for higher bandwidth over modem rates are via tech- nologies such as ADSL and cable modem, with higher cost of ownership. For distance leaming to reach a large number of audience, such environment cannot be assumed. 2. Inadequate and complicated authoring tools: Multimedia au- thoring still remains a challenging research problem. While several tools [ 1, 2] have been developed, they require a con- siderable amount of computer-related knowledge and prepa- ration efforts making multimedia authoring a time consum- ing and labor intensive process. The first problem has been rigorously investigated as evident by developments of various compression techniques [3, 41and client- server technologies [5, 6, 71to utilize the limited bandwidth effec- 0-7803-6536-41001$10.00 (c) 2000 IEEE tively. Current solutions for the second problem involve adminis- trative supports to offer some incentives to lecturers who develop online courses. In this paper, we propose an altemative solution to address both problems. While existing solutions can produce high to very high quality multimedia presentations, they require either multiple capturing devices, changes in the lecturers teaching style, or preloading of course materials in some format such as Power- Point. In most cases, these techniques are not concemed with the bandwidth consumption of the presentation. Toreduce the amount of preparation efforts, we capture lectures using a single camera. Without losing any lecture content or degrad- ing the lecture quality, we exploit the unique characteristics of the videos to reduce the bandwidth consumption of the presentation be- yond that obtained using well known compression techniques such as MPEG or Realvideo. The output of our approach is a slide-show presentation in SMIL (Synchronized Multimedia Integration Lan- guage) [8] format. The quality of our presentation varies depending on the capability of the video camera. The unique characteristics of the lecture videos are such that a discussion over an issue pre- sented on the screen often lasts for several minutes in which there are no change in the screen content. Displaying one high quality image of the screen and streaming only the associated audio seg- ment results in the minimum bandwidth. Additional benefit of the proposed technique over advance preparations of online materials is that interactions between the lecturer and hisher audience are also captured, which can later be shared with other remote audience. The remainder of this paper is organized as follows. Section 2 presents related work to make the paper self-contained. Wedescribe our proposed technique and quantify the benefits of our approach in Section 3 and Section 4, respectively. Our concluding remarks are discussed in Section 5. 2. RELATED WORK To constrain the bandwidth requirement of a video within the capa- ble rate of the users accessing devices, video quality is often com- promised. Frequently used techniques are reductions i n frame sizes, frame rates, and pixel depths (number of bits per pixel). While these techniques may be suitable for some types of videos such as news clips, they do not fit well for lectures or seminars. Since text and pictures presented on the screen are important to the understanding of the lectures, reductions in frame sizes are not acceptable. Using lower frame rate or less number of bits per pixel do not result in the minimum bandwidth. 1751 Notable research projects developed for distance leaming are the Experience-on-Demand (EOD) project [9], the Classroom 2000 (C2K) [I O], the Comell Lecture Browser [ 111, the AutoAudito- rium [12], STREAMS [13], and the techniques by Minneman et. al. [ 141. Wepresent important features which distinguish these sys- tems from one another in the following. 0 Hardware requirements: Capturing devices range from a cou- ple of video cameras to a specialized room equipped with a variety of devices such as video cameras, electronic white- boards, and sensoring devices. 0 Capturing techniques: Lecturers can either conduct their pre- sentation normally (e.g., the Comell Lecture Browser, the EOD, STREAMS) or they are required to aid the capturing activities such as preparing their materials using Powerpoint slides or teach using electronic whiteboards (e.g., C2K, and 0 End results: The results of these systems are either presenta- tions (e.g., the Comell Lecture Browser, C2K), videos with various resolutions (e.g., STREAMS) or tape (e.g., AutoAu- ditorium). ~41). Using the same criteria, our system differs from each of these systems in at least one aspect. In summary, our proposed technique requires a single video camera, passively captures the lecture, and produces a slide-show presentation as end result. Most importantly, the proposed technique focuses on reducing bandwidth requirement for the presentation. 3. CHARACTERISTICS-BASED BANDWIDTH REDUCTION TECHNIQUE To identify video frames in which significant changes occur, we adapt a shot boundary detection technique proposed in the litera- ture. A cut is a joining point between two different shots without the insertion of any special effect. Special effects such as fades, dis- solves, or wipes are typically inserted between two cuts and result in gradual transitions. The algorithm takes a captured lecture video encoded in MPEG- 1 format as input. In the ideal case, the algorithm outputs a set of J PEG images, audio segments, and a SMIL document which defines the layout, the presentation order, and the association between im- ages and audio segments. The proposed technique consists of two phases as follows. Learningphase : This phase consists of several steps to derive color differences of consecutive frames. Video frames are extracted from the MPEG file and passed through the fol- lowing steps. 1. Channel Separation: In this step, the red components of the RGB color are extracted. Since there are none or few changes in the background during one lecture, one color component is sufficient for shot boundary detections. We note that other color models such as YUV or HSV can also be used. 2. Edge Detection: The red frame is then filtered to re- duce noise. Canny edge detection algorithm is then applied to the frame to extract edges used in the next step. 0-7803-6536-4/00/$10.00 (c) 2000 IEEE 3. 4. Cut Phase 1752 Content Region Detection: To help reduce a trigger of new images due to the lecturers habits such as the use of fingers to move the slide up and down, we do not consider the movements outside the margins of the content portion (e.g., outside the text area). Such mar- gins are determined from the edge detected frame as follows. For each horizontal line, we first identify pos- sible candidates for the left margin and the right mar- gin. For the left margin, we start from the leftmost pixel and search for the first edge pixel that seems to bepart of the first character from the left (i.e., several edge pixels following this pixel in the same horizontal line). Candidates for the right margin are detected sim- ilarly except that the search for the candidates starts from the rightmost pixel. We choose both margins from their corresponding candidates that result in the narrowest content region. Since detecting the margins as proposed is time consuming and the margins do not change often, we perform this procedure once and use them for other frames in the video. Frame Difference Calculation: We use a 2-bin his- togram to capture the distribution of the pixel values of a red frame. If the pixel value is below 128, the pixel color is approximated as black; otherwise, i t is approx- imated as white. Wedefine D( i , i+l) as a color differ- ence between frame i and i +1. Let B( i ) and W( i ) be the number of black pixels and white pixels in frame i, respectively. D(i , i +1) can be expressed mathemati- cally as D( i , i +1) = p( i ) - B( i +1)1+ IW(i) - W( i +1)l (1) Wenote that one can use more number of histogram bins if the lecture video is more colorful. : In this phase, we use the calculated frame differences to determine which images to generate. A sharp cut is usu- ally detected when the difference between two consecutive frames is significantly larger than a pre-defined threshold ( T) and much smaller frame differences immediately follow af- ter the cut point. A gradual transition is generally detected when there is a gradual increasing in frame differences fol- lowed by a gradual decreasing of frame differences. For gradual transitions, the difference between two consecutive frames is not very high; however, the accumulated difference of all frame differences between the beginning of the gradual transition and the middle of the transition is large. Since we observe from our lecture videos that movements of the lecturers exhibit two properties: (i) large frame differ- ences as found in traditional cuts and (ii) gradual changes in these differences as found in gradual transitions, we employ the following techniques to generate the output. Threshold Calculation: Let ,U and (T be the mean and the standard deviation of the frame differences, respec- tively. Weselect T to be as large as ,U +4 * U due to the property (i ). 4 * U is determined experimentally. Representative Frame Selection: Wefirst record frames whose differences are larger than T. These frames are boundary frames. Due to the gradual change property, these frames are in the middle of the changes and not suitable as representative frames. We currently choose the frame with smallest difference among its boundary frames as the representative frame. We think that tech- niques based on cross-modal relationships between au- dio and text can be used to select a better representative frame. It is possible that the boundary frames occur near each other (within one second apart) when the lecturer fre- quently moves the slides or accidentally blocks the camera view. In this case, instead of generating several images, which makes it difficult for synchronization with audio segments, we encode the boundary frames and frames in between as one MPEG file. 3. Media Synchronization: Once the representative frames are selected, they are encoded in J PEG format. The playout times of the representative frames are extracted and audio packets with the playout time between the playout times of the two consecutive representative frames are taken from the MPEG system file and re- encoded. PE G is chosen as an image file format be- cause it offers a good compression-ratio and it is ac- ceptable by most SMIL viewers. SMIL is an XML- based language being developed by W3C. SMIL spec- ification allows integration and synchronization of sev- eral media elements such as text, videos, and audio. Presentation layout and timing relationships among these elements can be specified. testvid4 testvid5 4. PERFORMANCE STUDY In this section, we discuss the experimental setup and the experi- mental results. Weassess the effectiveness of our technique using bandwidth consumption ratio (BC,) defined as follows. Bandwidth Consumption Ratio (BC,): total s i z e ( i n by t e s ) of fi l es generated r ef er ence size BC, = Reference size is the size of the original video encoded in MPEG-1 format in bytes. Wemeasure the bandwidth consumption ratio us- ing our approach (Characteristics-Based) (CB - BC,), Real format with standard quality (Real BC,), and Real format with slide show quality (Real Slide BC,). Wecompare those to Optimal BC, defined as the ratio of the size of the files selected manually to the reference size. The best technique i s the one which has the band- width consumption ratio closest to the optimal consumption ratio. The target bitrate for both Real formats was set to 28.8kbps. I I , I 1000 I 844,134 I 3 5000 I 3,572,256 I 7 4.1. Test Videos Weused five videos, each was captured from a lecture at 29.97 fps. The videos were later encoded into MPEG-1 system files using an MPEG-I hardware encoder. An MPEG demultiplexer was used to separate the video and audio components. Wepresent the results that account for the video component only. The captured frame size was 352x240 pixels. The position of the video camera was fixed throughout the entire capturing process and its focus was set to the overhead projected screen. The characteristics of the videos are presented in Table 1. It shows the number of frames in each test video and the optimal num- ber of manually selected frames. These frames represent all the con- tent of the video. To compute the reference size as shown in Table 1 , 0-7803-6536-4/00/$10.00 (c) 2000 IEEE Table 1: Characteristics of Test Videos [ Video I Frames I Referencesize I OptimalFrames I I testvidl I 5000 I 4.319.924 I 9 I I I , , testvid2 I 5000 I 4,274,344 I 3 testvid3 1 5500 I 4.782.729 I 12 I we extracted a number of consecutive frames from the MPEG sys- tem file and encoded them back using the Berkeley MPEG software encoder. Wenote that the number of frames chosen to create each test video is small since it is time consuming to manually identify the best representative images for each video. Nevertheless, these short videos are sufficient to demonstrate the performance of our technique, which can be seen shortly. 4.2. Experimental Results Table 2 demonstrates the bandwidth consumption ratios using var- ious techniques. For each video, we present the number of J PEG and MPEG files generated using our technique. It can be seen that using our technique, the consumption ratios are around 4% for all the videos except testvid4. Compared with the optimal BC,, our technique has at most four times more bandwidth while Real for- mats require at most twenty-five times more bandwidth. The band- width consumption ratios for videos in Real format are at least 21% for slide show quality and 29% for standard video quality. Though the frame size is reduced in Real video format, the Real BC, is still high compared to the optimal BC,.. In our technique, the frame sizes remain the same and there is no compromise in qual- ity. The high values of Real BC, can be attributed to noise present in the frames. The noise can be due to lecturers hand movement etc. Noise can cause large frame differences between consecutive frames though the useful content of these frames is similar. As a re- sult, the benefit of interframe coding used in standard compression techniques is reduced. Wenote that the small MPEG files obtained from our technique were encoded using the same configuration for the Berkeley MPEG encoder as in the reference video file. The distribution of frame differences calculated using Equa- tion l is shown in Figure l for testvid5. The straight line shows the threshold T used in the Cut phase. Figure 1 demonstrates the three locations of the frames which constitute the three small MPEG files (the points where clustered frames have frame differences over the threshold line). These files contain the lecturers movements to change the position of the cover paper up and down. Wefound that the content-region detection step in Section 3 has lessen the effects of finger movements outside the content region in several frames such as frame 433 to 435 and frame 654 to 727. 5. CONCLUDING REMARKS Wehave presented an altemative approach to reduce bandwidth re- quirements for lecture-like videos for distance learning applications using the unique characteristics of the lecture videos. In most cases, our technique can reduce the bandwidth requirement down to only around 4% of the video encoded in MPEG format and about 13% of the videos in Real format. The amount of savings in the opti- mal case is extremely attractive. The proposed technique requires about four times more bandwidth than the optimal case. Weare cur- 1753 . Video No. JPEG Files No. MPEG Files Optimal BCr CB BCr Real BC, testvidl 8 5 0.015 0.044 0.297 testvid2 14 3 0.013 0.039 0.311 testvid3 11 3 0.026 0.041 0.303 testvid4 2 1 0.032 0.122 0.327 testvid5 5 3 0.017 0.042 0.354 (Standard Video Quality) Figure 1: Distribution of Frame Differences for testvid5 Real-Slide BC, (Slide Show) 0.232 0.329 0.311 0.337 0.217 rently developing a new technique to further reduce the bandwidth for lecture videos and extending the idea for videos generated from sensoring devices observing real-world tasks such as those of surg- eries and scientific experiments. 6. REFERENCES R. Baecker, A. J . Rosenthal, N. Friedlander, E. Smith, and A. Cohen. A multimedia system for authroing motion pic- tures. In Proc. of ACM Multimedia!%, pages 3142, Boston, MA, November 1996. D. C. A. Bulterman, L. Hardman, J. J ansen, K. S . Mullender, and L. Rutlege. GRiNS: A Graphical INterface for Creating and Playing SMIMocuments. Computer Networks and ISDN Systems, September 1998. K. R. Rao and J. J . Hwang. Prentice-Hall PTR, Upper Saddle River, NJ, 1996. ISOlIEC J TCl/SC29/WGlI. J tcl/sc29/wgll n2995 coding of moving pictures and audios. In http://drogo. cselt. stet. it/mpeg/standards/mpeg-4/mpeg-4. htm, October 1999. W. Tavanapong, K. A. Hua, and J . 2. Wang. A framework for supporting previewing and vcr operations in a low bandwidth environment. In Proc. of ACM Multimedia97, pages 303- 312, Seattle, WA, November 1997. K. A. Hua, W. Tavanapong, and J . Z. Wang. 2psm: An effi- cient framework for searching video information in a limited bandwidth environment. ACMMultimedia Systems, 7(5):396- 408, September 1999. Realplayer. In http://www. real.com. W3C. W3c working draft: Synchronized multime- dia integration language (smil) boston specification. In http://www. w3.orgflR/smil-bostod, November 1999. Education on demand project. In http:/hvww. informedia.cs.cmu.eddeo&. G. Abowd et. al. Teaching and learning as multimedia author- ing: The classroom 2000 project. In Proc. of ACM Multime- dia96, pages 187-198, Boston, MA, November 1996. Sugata Mukhopadhyay and Brian Smith. Passive capture and structuring of lectures. In Proc. of ACMMultimedia99, pages 477487, Orlando, FL, October 1999. M. Bianchi. Autoauditorium: A fully automatic, multi- camera system to televise auditorium presentations. In Joinr DARPA/NIST Smart Spaces Workshop, Gaithersburg, MD, J uly 1998. R. Cruz and R. Hill. Capturing and playing multimedia events with streams. In Proc. of ACM Multimedia94, pages 193- 200, San Francisco, CA, October 1994. S . Minneman, S . Harrison, B. J anssen, T. Moran, G. Kurten- bach, and 1. Smith. A confederation of tools for capturing and accessing collaborative activitiy. In Proc. of ACM Multime- dia95, pages 523-534, San Francisco, CA, November 1995. 0-7803-6536-4/00/$10.00 (c) 2000 IEEE 1754