
ISBN: 9789899729209
Title: EuroITV 2011 Adjunct Proceedings
Editors: Damásio, Manuel José; Cardoso, Gustavo; Quico, Célia; Geerts, David
Date: 2011-04-01
Publisher: COFAC / Universidade Lusófona de Humanidades e Tecnologias

PREFACE
Dear EuroITV 2011 participants,

It is with great pleasure that, on behalf of Universidade Lusófona de Humanidades e Tecnologias / CICANT (Center for Research in Applied Communications, Culture and New Technologies) and LINI (Lisbon Internet and Networks Institute), we welcome you to our institutions and to the city of Lisbon. Our institutions associated themselves with this event convinced of its relevance for reflection on, and development of, the state of the art around digital television and associated applications. The evolution of television and media has been a central topic of research and training at our institutions, and it is our belief that an organization like EuroITV, with the high scientific quality that characterizes it, makes an undeniable contribution to the advancement of knowledge in this field and to the shaping of a communitarian dynamic around this topic.

EuroITV is increasingly competitive in terms of paper submissions and acceptance rate. This year, we received approximately 200 submissions relevant to the main and adjunct proceedings of this conference, from 24 countries. Through a peer-review process by the program committee and external experts, we accepted for the adjunct proceedings 6 demos, 8 doctoral consortium proposals, 8 posters, 12 industrial case study papers and 24 workshop papers. We believe we have created a technical program rich in content and of high quality, spanning three full days and covering four different tracks that represent the breadth of activity in the TV research area. Presentations include keynote speeches by Jonathan Taplin from the University of Southern California ("Long Time Coming; Has Interactive TV Finally Arrived?"), Fernando Pereira from Instituto Superior Técnico ("Visual Compression: the Foundational Technology for Better TV Experiences") and Al Kovalick from AVID ("The Media Cloud and its Future").

We would like to thank the authors for providing the content of the program and all members of the organizing committee for their dedication to the success of EuroITV 2011 and the timely review of the submissions. We would like to acknowledge AVID, Caixa Geral de Depósitos, Fundação Gulbenkian, Fundação para a Ciência e Tecnologia, Instituto de Cinema e Audiovisual, Gabinete para os Meios de Comunicação Social, Fundação Luso-Americana, Abreu and ACM for their sponsorship and support. We also thank our media partners, RTP and the International Journal of Digital Television. Finally, we are grateful to our colleagues at Universidade Lusófona and LINI who worked on the organization of this event. It is therefore with great pleasure that we collaborate with EuroITV and wish you a fruitful event, one that is not exhausted in the days of the conference but that opens up spaces and opportunities for new collaborations and scientific dialogues.

Gustavo Cardoso
EuroITV 2011 General Chair
LINI, PT

David Geerts
EuroITV 2011 Program Chair
K.U. Leuven, BE

Manuel José Damásio
EuroITV 2011 General Chair
Universidade Lusófona, PT

Célia Quico
EuroITV 2011 Program Chair
Universidade Lusófona, PT

TABLE OF CONTENTS

PREFACE .......... 3
TABLE OF CONTENTS .......... 5
Keynotes .......... 8
Demos .......... 12
N-Screen Live Baseball Game Watching System: Novel Interaction Concepts within a Public Setting .......... 13
Extraction of Contextual Web Information from TV Video .......... 15
Automatic Measurement of Play-out Differences for Social TV, Interactive TV, Gaming and Inter-destination Synchronization .......... 18
User Interface Toolkit for Ubiquitous TV .......... 20
Demo: using speech recognition for in situ video tagging .......... 22
Value added services and identification system: an approach to elderly viewers .......... 24
Doctoral Consortium .......... 26
Research for Development of Value Added Services for Connected TV .......... 27
Collaboration in Broadcast Media and Content .......... 31
Televisual Leisure Experiences of Different Generations of Basque Speakers .......... 35
Mobile TV: Towards a Theory for Mobile Television .......... 39
Enhancing and Evaluating the User Experience of Interactive TV Systems and their Interaction Techniques .......... 43
Subjective Quality Assessment of Free Viewpoint Video Objects .......... 47
Allocation Algorithms for Interactive TV Advertisements .......... 51
Video access and interaction based on emotions .......... 55
Posters .......... 59
Online iTV use by older people: preliminary findings of a rapid ethnographical study .......... 60
Multipleye - Concurrent Information Delivery on Public Displays .......... 64
Older Adults and Digital Interactive Television: Use of a Wii Controller .......... 68
Predicting Where, When and What People Will Watch on TV Based on their Past Viewing History .......... 72
Unusual Co-Production: Online Co-Creation in Cross-Media Format Development .......... 76
Trendy Episode Detection at a Very Short Time Granularity for Intelligent VOD Service: A Case Study of Live Baseball Game .......... 80
Spatial Tiling And Streaming in an Immersive Media Delivery Network .......... 84
Inclusion of multiple devices, languages and advertisements in iFanzy, a personalized EPG .......... 88
ITV in Industry .......... 92
Convergence of Televised Content and Game .......... 93
heckle.at TV .......... 94
Let the audience direct .......... 95
Rendez-Vous .......... 96
Set-top, over-the-top, future! .......... 97
Smart TV and how to do it right .......... 99
Innovating Usability .......... 100
The ubiquitous remote control .......... 101
Content plus Interactivity as a Key Differentiator .......... 102
ITV strategy: The use of Direct and Indirect Communication as a strategy for the creation of interactive scripts .......... 103
Ambient Media Ecosystems for TV - A forecast 2013 .......... 105
Tutorials .......... 106
Designing and Evaluating Social Video and Television .......... 107
How to investigate the Quality of User Experience for Ubiquitous TV? .......... 109
Deploying Social TV: Content, Connectivity, and Communication .......... 111
Workshops .......... 113
Workshop 1: Quality of Experience for Multimedia Content Sharing: Ubiquitous QoE Assessment and Support .......... 114
Quality of Experience of Multimedia Services: Past, Present, and Future .......... 115
Internet TV Architecture Based on Scalable Video Coding .......... 120
Adaptive testing for video quality assessment .......... 128
Aligning subjective tests using a low cost common set .......... 132
Impact of Reduced Quality Encoding on Object Identification in Stereoscopic Video .......... 136
Impact of Disturbance Locations on Video Quality of Experience .......... 140
Workshop 2: Future TV 2011: Making Television Personal & Social .......... 144
Analysis of the Information Value of User Connections for Video Recommendations in a Social Network .......... 145
Employing User-Assigned Tags to Provide Personalized as well as Collaborative TV Recommendations .......... 145
Social and Interactive TV: An Outside-In Approach .......... 145
Analyzing Twitter for Social TV: Sentiment Extraction for Sports
OurTV: Creating Media Contents Communities through Real World Interactions
ITV services for socializing in public places .......... 145
Ubeel: Generating Local Narratives for Public Displays from Tagged and Annotated Video Content .......... 145
Hybrid algorithms for recommending new items in personal TV .......... 145
Mining Knowledge TV: A Proposal for Data Integration in the Knowledge TV Environment .......... 145
Workshop 3: Interactive Digital TV in Emergent Economies .......... 146
GEmPTV: Ginga-NCL Emulator for Portable Digital TV .......... 147
Business Process Modeling in UML for Interactive Digital Television .......... 151
Guidelines for the content production of t-learning .......... 155
Evaluation of an interactive TV service to reinforce dental hygiene education in children .......... 159
Experiences in Designing and Implementing an Extension API to Converge iDTV and Home Gateway .......... 163
An Approach for Content Personalization of Context-Sensitive Interactive TV Applications .......... 169
A Framework Architecture for Digital Games to the Brazilian Digital Television .......... 171
EuroITV 2011 Organizing Committee .......... 176

Keynotes

Jonathan Taplin (Annenberg Innovation Lab, University of Southern California)
Jonathan Taplin is a Professor at the Annenberg School for Communication at the University of Southern California. Taplin is the Managing Director of the USC Annenberg Innovation Lab (http://www.annenberglab.com/) and also blogs at http://jontaplin.com/, about which Cory Doctorow of Boing Boing said, "Taplin's blog is as eclectic as he is, a straight-up analysis blog that rips into the headlines, illuminating everything from economic news to the writers' strike to heavy weather to democratic politics." Taplin's areas of specialization are international communication management and the field of digital media entertainment. Taplin began his entertainment career in 1969 as Tour Manager for Bob Dylan and The Band. In 1973 he produced Martin Scorsese's first feature film, Mean Streets, which was selected for the Cannes Film Festival. Between 1974 and 1996, Taplin produced 26 hours of television documentaries (including The Prize and Cadillac Desert for PBS) and 12 feature films including The Last Waltz, Until The End of the World, Under Fire and To Die For. His films were nominated for Oscar and Golden Globe awards and chosen for the Cannes Film Festival seven times. In 1984 Taplin acted as the investment advisor to the Bass Brothers in their successful attempt to save Walt Disney Studios from a corporate raid. This experience brought him to Merrill Lynch, where he served as vice president of media mergers and acquisitions. In this role, he helped re-engineer the media landscape on transactions such as the leveraged buyout of Viacom. Taplin was a founder of Intertainer and has served as its Chairman and CEO since June 1996. Intertainer was the pioneer video-on-demand company for both cable and broadband Internet markets. Taplin holds two patents for video-on-demand technologies. Professor Taplin has provided consulting services on broadband technology to the President of Portugal and the Parliament of the Spanish state of Catalonia. In May of 2010 he was appointed Managing Director of the Annenberg Innovation Lab. Mr. Taplin graduated from Princeton University. He is a member of the Academy of Motion Picture Arts and Sciences and sits on the International Advisory Board of the Singapore Media Authority and the Board of Directors of Public Knowledge. Mr. Taplin was appointed by Governor Arnold Schwarzenegger to the California Broadband Task Force in January of 2007.

Fernando Pereira (Instituto Superior Técnico, Lisboa, Portugal)
Fernando Pereira is currently with the Electrical and Computers Engineering Department of Instituto Superior Técnico and with Instituto de Telecomunicações, Lisbon, Portugal (http://www.img.lx.it.pt/~fp/). He is responsible for the participation of IST in many national and international research projects, and often acts as project evaluator and auditor for various organizations. He is an Area Editor of the Signal Processing: Image Communication journal, a member of the Editorial Board of the Signal Processing Magazine, and is or has been an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Image Processing, IEEE Transactions on Multimedia, and IEEE Signal Processing Magazine. He is or has been a member of the IEEE Signal Processing Society Technical Committees on Image, Video and Multidimensional Signal Processing, and Multimedia Signal Processing, and of the IEEE Circuits and Systems Society Technical Committees on Visual Signal Processing and Communications, and Multimedia Systems and Applications. He was an IEEE Distinguished Lecturer in 2005 and was elected an IEEE Fellow in 2008. He is or has been a member of the Scientific and Program Committees of many international conferences. He was the General Chair of the Picture Coding Symposium (PCS) in 2007 and the Technical Program Co-Chair of the International Conference on Image Processing (ICIP) in 2010. He has been participating in the MPEG standardization activities, notably as the head of the Portuguese delegation, chairman of the MPEG Requirements Group, and chairman of many Ad Hoc Groups related to the MPEG-4 and MPEG-7 standards. He is a co-editor of The MPEG-4 Book and The MPEG-21 Book, which are reference books in their topics. He won the first Portuguese IBM Scientific Award in 1990, an ISO award for Outstanding Technical Contribution for his contributions to the MPEG-4 Visual standard in 1998, and an Honour Mention of the UTL/Santander Totta Award for Electrotechnical Engineering in 2009. He has contributed more than 200 papers to international journals, conferences and workshops, and has given several tens of invited talks at conferences and workshops. His areas of interest are video analysis, coding, description and adaptation, and advanced multimedia services.

Al Kovalick (AVID, U.S.A.)
Al Kovalick has worked in the field of hybrid AV/IT systems for the past 18 years. Previously, he was a digital systems designer and technical strategist for Hewlett-Packard. Following HP, from 1999 to 2004, he was the CTO of the Broadcast Products Division at Pinnacle Systems. Currently, he is with Avid and serves as an Enterprise Strategist and Fellow. Al is an active speaker, educator, author and participant with industry bodies including SMPTE and AMWA. He has presented over 50 papers at industry conferences worldwide and holds 18 US and foreign patents. He is the author of the book Video Systems in an IT Environment: The Basics of Networked Media and File-Based Workflows (2nd edition, 2009). Al was awarded the SMPTE David Sarnoff Gold Medal in 2009. He has a BSEE degree from San Jose State University and an MSEE degree from the University of California at Berkeley. He is a life member of Tau Beta Pi, an IEEE member and a SMPTE Fellow.


Demos


N-Screen Live Baseball Game Watching System: Novel Interaction Concepts within a Public Setting
Hogun Park, Geun Young Lee, Dongmahn Seo, Sun-Bum Youn, Suhyun Kim, Heedong Ko
Imaging Media Research Center, Korea Institute of Science and Technology (KIST), Seoul, Korea {hogun, gylee, sarum, dmonkey, suhyun.kim, ko}@imrc.kist.re.kr

ABSTRACT
Recently, as social media has entered the interactive TV domain, many studies have attempted to provide better emotional engagement and satisfaction. However, these approaches are still limited in exploiting the many available types of screens and in supporting social collaboration within a public setting. For example, when people are watching TV together, the TV is not a suitable place for a personal chat, and a mobile phone is too small to support every sharing activity. In order to provide a seamless social experience across any connected screens, in this paper we present an N-Screen-based collaborative baseball watching system. It provides user engagement interfaces within a public setting and novel N-Screen interaction concepts.

Categories and Subject Descriptors

H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems - Video; H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces

General Terms

Design

Keywords

Social TV, N-Screen, Live Sports Game

1. INTRODUCTION

As social media research has entered the interactive TV domain, many studies have tried to provide a more social experience on TV. However, most existing social TV platforms aim to connect viewers with their friends and families by providing a virtual shared space [1][2]. Even if their communication is more social around TV content, they are still limited in exploiting many types of screens and in supporting social collaboration. In particular, a number of communities and corresponding individual viewpoints arise around live events such as a baseball game or a musical performance. Depending on the interest of co-viewers, some parts of the content are worth sharing, while others are not. To facilitate this social watching activity, it is necessary to provide a seamless experience across the screens of participants within a public setting. In this paper, we present an N-Screen collaborative sports watching system. In this system, a public display like the TV constitutes a novel global communication medium. It connects all surrounding displays to enable ubiquitous and cooperative watching. To address this, we make the following contributions: (1) user engagement interfaces within a public setting, and (2) new N-Screen interaction concepts and their implementation for baseball game watching. For evaluation, we implemented a live baseball game watching system on an Android set-top box, an Android mobile phone application, and a PC. In this paper, we present an overview of our proposed system (Section 2) and details of the N-Screen-based watching system interfaces (Section 3).

2. SYSTEM OVERVIEW

Figure 1. Overall Framework of Proposed System

The system framework of our proposed approach is illustrated in Figure 1. First, every user registers their personal displays, such as mobile phones, with the nearest public display. The public display serves as a medium that supports group-based watching activities, and any user can initiate sharing activities through it. The system newly introduces the idea that one of the participants can become a leader, a so-called media jockey (Section 3.2.2), who acts as media director and producer. In our engagement interface, the media jockey takes a proactive role in organizing all information and providing intermediate responses to the live event and to co-viewers. In other words, a group of users including a media jockey and viewers creates its own social watching community, and together they organize a new broadcasting stream. The community can be a group of friends or supporters and does not need to be in the same place. Our P2P streaming system supports transmitting and receiving multiple live video streams, and the media jockey plays the leading role of building the broadcast channel by organizing and selecting some of them. In our demonstration, a TV tuner, 3 HD cameras, and 3 spherical cameras were installed at a baseball stadium, and real-time streaming with synchronized play-out [5] and low media zapping time [3] (< 0.5 sec) was accomplished. Furthermore, micro-blogs corresponding to the live event are analyzed to extract summary keywords, and the system generates highlight video clips by associating the timespans of keywords with the recorded videos. On the TV, the keyword summary and available video clips are displayed at the bottom of the screen when a new issue occurs.
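The highlight mechanism hinges on spotting keywords that suddenly "burst" in the micro-blog stream for a given time span. The paper relies on a parameter-free bursty keyword detection algorithm described in [4]; the sketch below is only a simple ratio-based stand-in for that idea, with illustrative function names, thresholds and data rather than the authors' actual method.

```python
from collections import Counter

def bursty_keywords(feed_bins, min_count=5, burst_ratio=3.0):
    """Toy burst detector over time-binned social media posts.

    feed_bins: list of lists of tokens, one list per fixed time span.
    Returns, per bin, the words whose frequency jumps well above their
    average frequency in the preceding bins.
    """
    history = Counter()
    seen_bins = 0
    bursts = []
    for tokens in feed_bins:
        counts = Counter(tokens)
        bin_bursts = []
        for word, c in counts.items():
            baseline = history[word] / seen_bins if seen_bins else 0.0
            if c >= min_count and c > burst_ratio * max(baseline, 1.0):
                bin_bursts.append(word)
        bursts.append(bin_bursts)
        history.update(counts)
        seen_bins += 1
    return bursts

# Example: the keyword "homerun" bursts in the third time span.
bins = [["pitch", "strike"], ["pitch", "ball"],
        ["homerun"] * 8 + ["pitch"]]
print(bursty_keywords(bins))   # [[], [], ['homerun']]
```

The time span of a detected burst would then be mapped back onto the recorded video to cut the corresponding highlight clip.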

3. INTERFACE OF N-SCREEN LIVE BASEBALL WATCHING SYSTEM

3.1 Public display

Figure 2. Public Display Interfaces

The public display serves as a ubiquitous media medium that supports watching within a public setting. It provides supporting clues for communicating and understanding the emerging issues of a game and the activities of members. The key design criterion is that users should not be distracted too frequently. In particular, to provide highlight videos and keywords, we designed an algorithm [4] to determine an appropriate interaction time by detecting bursty periods of keywords. There are several steps. First, the social media feeds corresponding to a live baseball game are segmented into fixed time spans in order to extract the most representative keywords. To obtain a keyword, the system uses a parameter-free bursty keyword detection algorithm. It then associates the time spans containing bursty keywords with the recorded video and notifies viewers on the screen, as in Figure 2. The figure also shows screen-dumps of the prediction-based game ranking, the keyword summary, and video sharing (in this case, a panorama video).

3.2 Personal Display

3.2.1 Mobile Phone

Figure 3. Mobile Phone Interfaces

The mobile phone allows participants to watch a live game and compete against their friends using the provided polling and voting interfaces. As shown in Figure 3, users can suggest a vote topic and see the result graphically. Participants can also request a different video angle or navigate the live panorama video using our multi-touch interface. These activities are shared on the public display, giving a more engaged and immersive experience.

3.2.2 Media Jockey

Figure 4. Media Jockey Interfaces

One of the community members can take the role of media jockey. In this system, the media jockey mediates interaction within the group, recommends camera angles of interest, and organizes the shared public displays. In other words, he can monitor the list of emerging videos and all social activities within the group, so he can become a content programmer who generates a new channel. The concept of the media jockey is to encourage proactive participation and provide better awareness of co-viewers in an N-Screen environment.

4. DISCUSSION AND CONCLUSION

This paper proposed an N-Screen live baseball game watching system that provides a seamless social experience across any connected platform. It supports public and personal display interfaces with the corresponding dynamic resource and interaction management. To evaluate the system, we demonstrated it during the 2010 championship series of the Korean baseball league. It provides a novel experience of a live baseball game, and the available information and media are organized in a more intuitive way. In future work, more immersive temporal and spatial metaphors will be incorporated in order to organize all information and media. We believe that a virtual world or spherical video can be a good starting point for this development.

5. ACKNOWLEDGMENTS

This research is supported by the Korea Institute of Science and Technology under the "Development of Tangible Web Platform" project.

6. REFERENCES

[1] Mate, S. and Curcio, D. D. I. 2010. Mobile and interactive social television. IEEE Communications Magazine, Vol. 47, No. 12, 116-122.
[2] Nathan, M., Harrison, C., Yarosh, S., Terveen, L., Stead, L., and Amento, B. 2008. CollaboraTV: making television viewing social again. In Proc. of UXTV 2008, 85-94.
[3] Joo, H., Song, H., Lee, D. B., and Lee, I. 2008. An Effective IPTV Channel Control Algorithm Considering Channel Zapping Time and Network Utilization. IEEE Transactions on Broadcasting, Vol. 54, No. 2, 208-216.
[4] Park, H., Youn, S. B., Lee, G. Y., and Ko, H. 2011. Trendy Episode Detection at a Very Short Time Granularity for Intelligent VOD Service: A Case Study of Live Baseball Game. In Proc. of EuroITV 2011.
[5] Seo, D., Kim, S., and Song, G. 2010. SyncStream: Synchronized Media Streaming System in a Peer-to-Peer Environment. In Proc. of HumanCom 2010, 1-5.


Extraction of Contextual Web Information from TV Video


T. Chattopadhyay, Aniruddha Sinha, Avik Ghose, Provat Biswas, Arpan Pal

Innovation Lab, Kolkata, Saltlake Electronics Complex, Tata Consultancy Services

t.chattopadhyay@tcs.com, aniruddha.sinha@tcs.com, avik.ghose@tcs.org, provat.biswas@tcs.org, arpan.pal@tcs.org

ABSTRACT
This demo extracts contextual web information from TV video and renders it on the same TV or on a second screen such as an iPad or iPhone. The objective of this demo is to present an emulation that depicts the functionalities of the STB and TV in a PC-based environment. A Connected TV can be described as an Internet-enabled TV, connected via either a set-top box or some other technology. Recent market trends in consumer electronics show that consumers are increasingly demanding Internet-connected TVs: sales of such products rose to $1 billion in the second quarter of 2009, compared to $776 million in the first quarter of the same year. We have a Home Infotainment Platform (HIP) that supports Internet and TV simultaneously. The input to the HIP is analog video, either through composite (CVBS) or RF cable. The output of the HIP is composite video in which the input TV signal is blended with graphics and Internet content. In this demo we propose a novel system for connected TV that recognizes the context of TV news videos and obtains related information from the Internet or from RSS feeds. That information can be pushed to a second screen, such as a handheld mobile device like an iPhone. We demonstrate the system by recognizing the context of a recorded video on a laptop and pushing the result as a text stream to an adjacent iPad or iPhone.

1. HIGH-LEVEL DESIGN OF THE PROPOSED TOOL

The proposed system consists of the following items:
- Analog TV signal with video and audio
- Extraction of breaking news from the news videos
- Finding the keywords from the breaking news
- A browser-based application to fetch information related to the keywords from the Internet or from RSS feeds
- A blended display module for mashing up the Internet information with the TV video, or pushing it as a text stream to an iPad or iPhone

2. MERITS

- This demo can send relevant information to a second screen that may be of interest to the person watching the TV.
- In India, as per a report from TRAI, 93% of TV users still use an analog RF cable feed. The proposed system is compatible with this input as well.
- The proposed tool is an end-to-end system that can bridge the gap between the technology and the market for TV web mashups with contextual information.
- It can localize the news area and the breaking news automatically.

General Terms
Demo

Keywords
Demo, OCR, Video OCR, Keyword spotting

3. NOVELTY

The existing connected TV services are more or less restricted to movie purchase and rental, TV show purchase, access to YouTube, music services and media on the home network. Moreover, the Internet mashups are primarily based on a predefined set of RSS feeds. The proposed solution enhances the user experience by deriving the context of the viewed channel and then fetching the related information to provide a ubiquitous mashup experience.


4. APPLICATIONS

- Connected TV with contextual Internet mashups
- Interactive advertisements and purchase: this involves fetching information from the Internet that is relevant to a particular advertisement, allowing viewers to review and purchase the products online.

5. TECHNICAL DETAILS

The overview is depicted in Figure 1. The demo uses the following technical modules:
- Text localization from the streamed video
- Preprocessing of the text region prior to Optical Character Recognition (OCR)
- Recognition of the characters from the video
- Removal of false positives
- Spotting the breaking news in the text streams
- Keyword spotting
- Efficient search on the Internet
- Pushing the retrieved information to an iPad or iPhone, or showing it as blended text over the same video

Figure 1: Overview of the technical steps involved

6. TECHNICAL REQUIREMENT

The following systems are required to show the demo:
- A laptop to store and process the recorded video
- Wireless Internet connectivity to search the Internet and fetch related information
- An iPad or iPhone as the second screen to show the related information

7. DEMO SCENARIO

The demo scenario is as follows:
- The laptop opens the stored TV content and extracts the contextual information from the news video using the approach described above. It represents the context in XML format.
- The user launches the application (app) on the second screen when there is an intent to obtain augmented information on the broadcast content.
- The app requests the content (in XML format) from the connected TV over either an ad-hoc Wi-Fi connection or a Bluetooth pairing.
- The gadget then connects to the Internet and requests augmented information based on the class of context detected. Extracted keywords are used as inputs to web 2.0 search and news APIs provided by web-site owners such as Google, Yahoo, etc.
- This augmented information is then displayed on the second screen with real-time context.
- The apps have been implemented on iPhone, iPad and an Android tablet (Galaxy Tab).

8. SCREEN SHOTS

Figure 2 shows the video displayed on the connected TV (in this demo the laptop simulates the connected TV). Screenshots of the second screens, namely the iPad and the iPhone, are shown in Figure 3 and Figure 4 respectively. These second screens are used to display the information related to the video on the connected TV.
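The demo hands the detected context to the second screen as XML and reuses the extracted keywords as search terms. The paper does not spell out the keyword-spotting rules or the XML schema, so the following Python sketch is only an assumed, minimal version of both steps; the element names and the stopword filter are illustrative.

```python
import xml.etree.ElementTree as ET

STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "to", "and", "is", "at"}

def spot_keywords(ticker_text, max_keywords=5):
    """Very small keyword spotter for OCR'd breaking-news text:
    keep the longest non-stopword tokens as search keywords."""
    tokens = [t.strip(".,:;!?\"'").lower() for t in ticker_text.split()]
    candidates = [t for t in tokens if t and t not in STOPWORDS and len(t) > 3]
    # Prefer longer tokens; a real system would use proper keyword extraction.
    return sorted(set(candidates), key=len, reverse=True)[:max_keywords]

def build_context_xml(channel, ticker_text):
    """Wrap the detected context in a small XML document for the second screen.
    The schema here is hypothetical, not the format used in the demo."""
    root = ET.Element("context")
    ET.SubElement(root, "channel").text = channel
    ET.SubElement(root, "ticker").text = ticker_text
    keywords = ET.SubElement(root, "keywords")
    for word in spot_keywords(ticker_text):
        ET.SubElement(keywords, "keyword").text = word
    return ET.tostring(root, encoding="unicode")

print(build_context_xml("News 24",
                        "Breaking news: heavy monsoon floods hit Kolkata suburbs"))
```

The second-screen app would fetch such a document over Wi-Fi or Bluetooth and use the keyword list as input to its web search queries.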


Figure 2: News Video Displayed in the Connected TV

Figure 3: Related information in the iPad

Figure 4: Related information in the iPhone


Automatic Measurement of Play-out Differences for Social TV, Interactive TV, Gaming and Inter-destination Synchronization
R.N. Mekuria
TNO, P.O. Box 5050, 2600 GB Delft, Netherlands
+31 6446 66987
roefi20@gmail.com

H.M. Stokking
TNO, P.O. Box 5050, 2600 GB Delft, Netherlands
+31 6516 08646
hans.stokking@tno.nl

Dr. M.O. van Deventer
TNO, P.O. Box 5050, 2600 GB Delft, Netherlands
+31 651 914 918
oskar.vandeventer@tno.nl

ABSTRACT

Inter-destination media (play-out) synchronization for social TV has gained attention from research and industry in recent years. Applications include social TV and interactive game shows. To motivate further research of inter-destination synchronization technologies, pilot measurements of play-out differences in different TV-broadcasting systems are desirable. However to our knowledge no broadly applicable measurement system exists. This paper fills this gap by presenting and implementing a robust system that can detect constant or slowly varying differences in media play-out between different receivers without accessing receiver hardware or network. The measurements are relevant in various use-cases such as football watching, social TV, interactive game shows around TV-content, TV input lag comparison for gaming and validation of current inter-destination synchronization technologies. The robustness of the system is demonstrated in a working prototype.

Categories and Subject Descriptors

H.5.1 [Multimedia Information Systems]: Video broadcasting, media synchronization.

General Terms

Measurement, Performance.

Keywords

Inter-destination media synchronization, social TV, video delay, input lag, broadcast TV, gaming

1. INTRODUCTION

Digital TV broadcasting and video technologies have improved visual image quality, support different devices such as TVs, PCs and smart phones, and enable interactivity between consumers of content. Users can watch video or TV and simultaneously use text or voice chat, obtaining a social TV experience. However, as different digital broadcasting techniques and schemes have different processing delays, play-out differences between receivers occur. An interactive experience such as social TV can be disrupted by these play-out differences. This was noticed by Shamma et al. [1], who proposed play-out synchronization to enhance social TV watching. Other companies have also started to offer solutions for watching video together in a synchronized way, e.g. Watchitoo, clipsync.com and the BBC iPlayer. Web sites like YouTube Social and Synchtube enable synchronized playback of YouTube videos while watching with Facebook friends. Standards on inter-destination media synchronization for IPTV and web broadcasting have also been introduced recently [2][3]. Techniques for fixing play-out differences in networks are an active topic of research, of which a survey is given in [4]. Play-out differences in TV broadcasting became noticeable to consumers in the Netherlands during the recent FIFA 2010 World Cup: consumers subscribed to different digital broadcast technologies saw goals at different times, spoiling the experience of the consumers lagging behind. Play-out differences can also cause unfairness in an interactive quiz show around TV content, where users answer questions by phone or Internet to compete with other viewers. As consumers and broadcasters become more aware of the play-out difference issue, measurements will become relevant to them as an indicator of quality of service. This paper presents a tool that can accurately and automatically measure play-out differences (inter-destination synchronization) between receivers at the same location, relevant to each of the use cases described in this section.

2. EXISTING PLAY-OUT DIFFERENCE MEASUREMENT METHODS AND THEIR LIMITATIONS

Traditionally, inter-destination media (play-out) synchronization techniques have been applied in video conferencing applications to enhance the collaborative work experience. Play-out differences need to be measured in a test setup to measure and compare the performance of these techniques; Nunome and Tasaka performed such a validation study [5]. Timestamps of received packets were used together with a fixed estimate of the expected decoding and play-out delay to estimate differences between receivers. This approach works well with terminals running similar hardware and a similar network/protocol stack, which can often be expected in commercial video conferencing systems; however, it is not practical for measuring play-out differences in TV broadcasts, where many different networks and receiver types exist. Also, proprietary TV broadcast systems are often not open to third-party measurements in the network and receivers. Another drawback of the timestamp method is that delays can be introduced after digital reception (the timestamp) by both the set-top box and the (digital) TV. We therefore conclude that measuring differences in play-out between broadcasters with timestamps is not always possible and not always completely accurate. In the gaming community a similar synchronization measurement problem exists, related to the measurement of TV input lag [6]. In [6], a digital clock was displayed on different screens, which were recorded together with a separate video camera. Time lags were determined by reading out the displayed digital clocks. This approach is accurate but hard to perform or to automate in practice. Also, when measuring TV broadcasts, a video of a clock signal is not always available. Stokking et al. [7] conducted several measurements comparing play-out differences between TV broadcasts by using a front recording and comparing scene changes manually on a frame-by-frame basis. From these measurements the play-out difference was computed. Whereas this approach is accurate, end-to-end, and does not require access to receiver hardware or the network, it involves a lot of manual work and is hence unsuited for long-term and/or real-time measurements. We have automated the latter approach by extending this method with automatic scene change detection and robust play-out difference estimation. Our method accounts for processing delay after packet reception and does not need on-screen clocks to compute the play-out difference.
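The core of the automated estimation, detailed in Section 3, is to turn each recorded device into a scene-change time series and read the play-out difference off the peak of their cross-correlation. The authors' implementation is in MATLAB and uses the scene-change detector of [8]; the Python sketch below is only an illustration of that idea, assuming per-frame brightness has already been extracted from the front recording, with a deliberately crude thresholding detector and made-up parameters.

```python
import numpy as np

def scene_change_signal(frame_means, threshold=25.0):
    """Binary scene-change indicator from per-frame mean luminance
    (a crude stand-in for the detector of [8])."""
    diffs = np.abs(np.diff(frame_means))
    return (diffs > threshold).astype(float)

def playout_difference(left_means, right_means, fps=25.0):
    """Estimate the play-out difference in seconds between two recordings.
    A positive result means the right device lags the left one."""
    l = scene_change_signal(np.asarray(left_means, dtype=float))
    r = scene_change_signal(np.asarray(right_means, dtype=float))
    n = min(len(l), len(r))
    corr = np.correlate(r[:n], l[:n], mode="full")
    lag = int(np.argmax(corr)) - (n - 1)   # peak position gives the offset in frames
    return lag / fps

# Example: identical synthetic content, with the right device 10 frames behind.
rng = np.random.default_rng(0)
base = rng.uniform(80, 120, 500)
base[100], base[300] = 200, 10            # two abrupt scene changes
left = base
right = np.roll(base, 10)                 # right device lags by 10 frames
print(playout_difference(left, right))    # 0.4 (seconds at 25 fps)
```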

3. SYSTEM OUTLINE

Figure 1. Front recording test setup: a video camera records Receiver 1 and Receiver 2 side by side

Our system uses a recording of two devices displaying the same video file or the same TV content, as shown in Figure 1. A (web-)camera is used to record the terminals and to store the recording as a video signal V(x,y,t). From V(x,y,t), the devices on the left and right sides are automatically extracted as smaller videos L(x,y,t) and R(x,y,t), after first manually selecting the upper and lower corner points with a user interface. The two separate video segments are fed into the scene change detector, which was tested to detect scene changes with a probability of approximately 90-95% and a false positive (detection) rate of approximately 0.1%. The scene-change detector used is a custom-built implementation of [8]; it was carefully tuned to maintain this performance with smaller screen sizes, dark images and low-resolution recordings. Figure 2 shows the outline of the system in a diagram. The computed cross-correlation gives an easily detectable peak (maximum) at the play-out difference.

Figure 2. Play-out difference detection system outline

4. DEMONSTRATION

Our demonstrator, implemented in the MATLAB language, shows that recordings of movies playing on different devices, ranging from smart phones to flat screen TVs, can be fed into the system to quickly estimate the play-out difference. It enables play-out difference detection even when no timestamps or displayed clock signals are available. The robustness to screen size and brightness is also clearly observed in practice. Currently our system works on recorded files, but it is ready to be extended to real-time measurements. By using a mobile DVB-T receiver that was tested to have an approximately constant play-out difference at different locations, we were able to compare play-out differences of non-co-located TV broadcast receivers indirectly, as in [7].

5. CONCLUSION

This article presented a system that measures and computes the play-out difference between receivers at the same physical location. The method is more accurate, more automatic and easier to deploy than previous methods. It enables easy measurements on proprietary TV networks or systems. TV distributors can use the system to compare their signal delay with that of their competitors and potentially use the results to promote their services, for example to co-located soccer match viewers. Companies offering interactive games around TV content can use play-out difference measurements as a basis for their synchronization strategy. The system can also be used to validate and assess the quality of the current generation of inter-destination synchronization algorithms and solutions on the market. Moreover, gamers can compare TVs for their input lag by connecting two screens to a single gaming console or DVD player.

6. REFERENCES

[1] Shamma, D. A., Bastea-Forte, M., Joubert, N., and Liu, Y. Enhancing online personal connections through the synchronized sharing of online video. In CHI '08 Extended Abstracts on Human Factors in Computing Systems, 2008.
[2] ETSI TS 183 064. Telecommunications and Internet converged Services and Protocols for Advanced Networking (TISPAN); NGN integrated IPTV subsystem stage 3 specification, 2010.
[3] Stokking et al. RTCP XR Block Type for inter-destination media synchronization. Internet Draft, 2010.
[4] Boronat, F., Lloret, J., and Garcia, M. Multimedia group and inter-stream synchronization techniques: A comparative study. Elsevier Information Systems, volume 34, issue 1, pages 108-131, 2009.
[5] Nunome, T. and Tasaka, S. An Application-Level QoS Comparison of Inter-destination Synchronization Schemes for Continuous Media Multicasting. GLOBECOM '03, 2003.
[6] HDTV lag, the unofficial guide: http://sites.google.com/site/hdtvlag/
[7] Stokking, H. M., van Deventer, M. O., Niamut, O. A., Walraven, F. A., and Mekuria, R. N. "IPTV Inter-Destination Synchronization: A Network-Based Approach", ICIN 2010, Berlin, 11-14 October 2010.
[8] Huang, C.-L. and Liao, B.-Y. A Robust Scene-Change Detection Method for Video Segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 2001.


User Interface Toolkit for Ubiquitous TV


Javier Burón Fernández (1), Enrique García Salcines (1), Carlos de Castro Lozano (1), Konstantinos Chorianopoulos (2), Beatriz Sainz de Abajo (3)

(1) Department of Informatics and Numeric Analysis, Cordoba University, Córdoba, Spain
(2) Department of Informatics, Ionian University, Corfu, Greece
(3) Department of Communications and Signal Theory and Telematics Engineering, Higher Technical School of Telecommunications Engineering, University of Valladolid, Spain

jburon@uco.es, (egsalcines, ma1caloc)@uco.es, choko@ionio.gr, beasai@tel.uva.es

ABSTRACT


The wide adoption of small and powerful mobile computers, such as smart phones and tablets, has raised the opportunity to employ them in multi-user and multi-device iTV scenarios. In particular, the standardization of HTML5 and the growth of cloud services have made the web browser a suitable tool for managing multimedia content and the user interface, in order to provide seamless session mobility among devices such as smart phones, tablets and TV screens. In this paper we present an architecture and a prototype that let people instantaneously transfer the video they are watching between web devices. This architecture is based on two pillars: WebSockets, a new HTML5 feature, and Internet TV (YouTube, Yahoo Video, Vimeo, etc.). We demonstrate the flexibility of the proposed architecture in a prototype that employs the YouTube API and facilitates seamless session mobility in a ubiquitous TV scenario. This flexible experimental set-up lets us test several hypotheses, such as user attention and user behavior, in the presence of multiple users and multiple videos on personal and shared screens.

Keywords

Tablet, TV, interaction, design, evaluation

1. INTRODUCTION

Since the advent of PDAs there have been several studies on replacing the remote control in the interaction with interactive television. One of the most influential pieces of research for this work is that of Robertson et al. (1996), which proposed a prototype for real estate searching on a PDA communicating bidirectionally via infrared with interactive television. The authors propose a design guide that stresses the importance of distributing information across appropriate devices: the right information to display on a PDA is text and some icons, while the television is suitable for displaying large images, video or audio. The nature and quantity of the information thus determines how to display it and on which device. This research also gives priority to increasing synchronized cooperation between both devices. Until now, user interface systems have assumed a clear distinction between input and output devices. Indeed, the user interface systems of desktop computers, TVs and telephones have usually distinguished between the input and the output devices. Smart phones and tablets are devices that do not make this distinction. Moreover, the plenitude of devices enables the creation of ubiquitous computing scenarios, where the user can interact with two or more devices. One significant research issue is therefore how to balance the visual interface between two devices that both have output abilities. The remote control has been the most common way to interact with iTV, and some released products, such as RedEye (https://thinkflood.com/products/redeye/what-is-redeye/), let the user interact with the TV through a second screen for basic content-control operations; however, RedEye works only as a Wi-Fi to infrared translator for different devices. The popularity of mobile computers such as smart phones and tablets allows us to go beyond this established way of interaction: a second screen can give the user more information and the possibility to interact by controlling, enriching or sharing the content (Cesar et al. 2009, Cesar 2008). In this work, the researchers examine alternative scenarios for controlling the content in a dual-screen set-up.

2. SYSTEM ARCHITECTURE

In our research we are exploring alternative multi-device visual interface configurations in the context of a leisure environment and for entertainment applications. For this purpose we have developed a flexible experimental set-up, which we plan to employ in several user evaluations. The latter are focused on actual user behavior with respect to important parameters such as attention, engagement, and enjoyment. The system architecture for the experimental set-up consists of: 1) a TV connected to a laptop, 2) an Apache/PHP server, 3) an iPad and an iPhone, 4) a local area network, 5) an Apple Remote connected to the TV using infrared. Figure 1 shows a simplified drawing of the system architecture. The novelty of this architecture is the use of a technology drafted in HTML5 called WebSockets, which lets us connect two browsers bidirectionally. Thanks to this characteristic we can turn an iPad, or any device with a browser that supports WebSockets, into a TV remote controller. For this purpose several web pages (depending on the scenario) have been developed; from these pages the user is able to control the other web pages that represent the TV.
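The prototype's control path runs over an Apache/PHP server with HTML5 WebSockets between browsers. As a rough illustration of the relay idea only, the sketch below is written in Python with the third-party `websockets` package (a recent version is assumed) rather than the PHP stack the authors used; the port and message names are made up.

```python
import asyncio
import websockets  # third-party package: pip install websockets

CLIENTS = set()

async def relay(websocket):
    """Forward every control message (e.g. "play", "pause", "next")
    from one connected browser to all the other connected browsers."""
    CLIENTS.add(websocket)
    try:
        async for message in websocket:
            for peer in CLIENTS:
                if peer is not websocket:
                    await peer.send(message)
    finally:
        CLIENTS.remove(websocket)

async def main():
    async with websockets.serve(relay, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```

In this arrangement the tablet page opens a socket to the relay and sends short control messages such as "play" or "next"; the TV page, connected to the same relay, receives them and drives the embedded YouTube player accordingly.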


This work is focused on the secondary screen as a control device for TV content. Previous research has regarded the secondary screen as an editing and sharing interface, but has neglected the control aspect. In particular, we are seeking to understand the balance between the shared and the personal screen during alternative TV-control scenarios that regard the secondary screen as: 1) a simple remote control, 2) a related information display, 3) a mirror of the same TV content.

Figure 1. System architecture (simplified)

3. ONGOING RESEARCH

For our research we consider the following situation: Peter is watching a list of short sailing videos and he wants to control the video content by playing, pausing and stopping the video, moving to the next and the previous video, and seeing more information about the video, including the next video. It is worth highlighting that the proposed functionality is a subset of that provided by the YouTube API, which offers a rather diverse and growing pool of video content. The researchers want to evaluate interaction concepts for iTV, so very simple actions are used in these prototypes in order to draw conclusions for the design of coupled display interfaces in general in a leisure environment. So far we have developed four scenarios of iTV interaction:

1. Interacting with iTV using a remote control: the user interacts with iTV using the remote controller. To control the content there is a play/pause button and two arrows, right and left, to select the next or the previous video. The Menu button is used to show the information related to the video and the next video on the list.
2. Interacting with iTV using a tablet as remote controller: all the overlay information shown in the first scenario is displayed on the tablet, keeping the first screen free of interactive information so that it does not disturb other users.
3. Interacting with iTV using a tablet as remote controller: in this case, all the overlay information is displayed on the TV.
4. iTV inside the tablet and a shared screen: this scenario supposes that the user is watching iTV on the tablet and that there is a shared TV. The user can fly out or expand what he is watching to the shared TV. This scenario is the most interesting one: the user can extend what they are watching to the other shared screen and also retrieve, or fly in, any video that is being watched on the TV.

As has been shown, three of the scenarios include the same options and functionalities. This is important to note, because the more complex these functionalities are, the more appropriate the tablet is for carrying them out. But when we perform the common actions that we usually do when watching videos on the Internet, advanced visual interfaces on a second screen can affect user attention in a negative way (Figure 2).

Figure 2. Fourth scenario illustration

In summary, we are motivated by the introduction and wide adoption of small and powerful mobile computers, such as smart phones and tablets. This has raised the opportunity of employing them in multi-device scenarios and of blurring the distinction between input and output.

4. ACKNOWLEDGEMENTS

This work was partially supported by the European Commission Marie Curie Fellowship program (MC-ERG-2008-230894) and by the ORVITA2 project at EATCO, Cordoba University, Spain.

5. REFERENCES

[1] Cesar, P., Bulterman, D. C., Geerts, D., Jansen, J., Knoche, H., and Seager, W. 2008. Enhancing social sharing of videos: fragment, annotate, enrich, and share. In Proceedings of the 16th ACM International Conference on Multimedia (MM '08). ACM, New York, NY, 11-20.
[2] Cesar, P., Bulterman, D. C. A., and Jansen, J. 2009. Leveraging the User Impact: An Architecture for Secondary Screens Usage in an Interactive Television Environment. Springer/ACM Multimedia Systems Journal (MSJ), 15(3): 127-142.
[3] Fallahkhair, S., Pemberton, L., and Griffiths, R. 2005. Dual Device User Interface Design for Ubiquitous Language Learning: Mobile Phone and Interactive Television (iTV).
[4] Robertson, S., Wharton, C., Ashworth, C., and Franzke, M. 1996. Dual device user interface design: PDAs and interactive television. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '96). ACM, New York, NY, 79-86.


Demo: using speech recognition for in situ video tagging


John Grönvall
Arcada University of Applied Sciences, Jan-Magnus Janssons plats 1, 00550 Helsinki, Finland
+358 50 368 4607
john.gronvall@arcada.fi

Chengyuan Peng
VTT, PL 1000, 02044 VTT, Espoo, Finland
+358 20 722 111
chengyuan.peng@vtt.fi

Lasse Becker
Lingsoft Oy, Linnankatu 10A, 20100 Turku, Finland
+358 2 279 3300
lasse.becker@lingsoft.fi

ABSTRACT

In this demo we present our attempt to solve a well-known problem: how to generate interesting and useful tags describing the content of user generated mobile phone video material. In our case the tagging takes place just after the video recording, at the end of the video file. By using specific keywords we delimit the tags from the rest of the audio track.

Categories and Subject Descriptors

H5.2. User interfaces. Natural language. Theory and methods.

General Terms

Documentation, Design, Experimentation, Human Factors, Standardization, Languages.

Keywords

Mobile video, tag, speech recognition, metadata search, participatory journalism, mediation, local TV.

1. INTRODUCTION

There has been quite substantial research on participatory elements in different news contexts [1][2][3][4][5]. We therefore conclude that if citizens really want to make their voices heard in a local TV context, then, given the right tools, the airwaves are currently open for delivering one's personal opinion [6]. However, when people publish their videos, they usually do not bother to enter the necessary metadata tags, most likely because doing so is considered awkward and laborious [7]. In this paper we address a well-known problem: how to generate interesting and useful tags for user-generated video content. We present our system of speech tagging, in situ, while recording video material on a mobile phone.

2. CURRENT AND PREVIOUS WORK

For some time there has been a multitude of annotation tools available for tagging (or annotating) video material in real time or in retrospect [8][9][10][11][12], and using speech to tag videos is not new [13][14][15]. We base our work on these prior research projects, but in contrast to these systems we want to develop a simple and automated process for retrieving the metadata, in situ, at the moment of video recording. While voice recognition per se is a well-established technique, there have, to our knowledge, not been any attempts to use it for in-situ tagging of UGC videos at the time of recording.

3. PROBLEM DESCRIPTION

Having reliable and good metadata for UGC clips is essential when gathering material for a local TV broadcast. The two research questions we set out to answer are: How can we generate interesting and useful tags in situ, describing the content of user-generated video material at the time of recording?

4. THE SUITCASE OF STORIES DEMO

The goal of the Suitcase of Stories experiment was to create a total of 500 minutes of speech-tagged video. A class of Film & Television students was chosen for the experiment, using 20 Nokia N97 phones for a period of one week. Everyone in the group was to produce short video segments (30-90 seconds in length), totalling 5 minutes of content each day, for five days in a row. The students were asked to film at 5 specific locations. In addition, they were to enable location tagging via GPS on the phone and shoot 5 minutes in a tram, thus testing the usability of the GPS system. Each video clip was tagged by speaking into the handset at the end of the filming process. The tagging was specified to the students as follows:
- When done filming your 30-90 s clip, press PAUSE
- Say the word 'KIRAHVI' (giraffe)
- Speak your single-word tags
- Say the word 'KROKOTIILI' (crocodile)
- Speak one free-text continuous sentence describing the scene into the phone
- Press STOP
After five days the phones were collected. The soundtrack was extracted using ffmpeg and then sent for recognition to the Lingsoft (http://stt.lingsoft.fi/stt.php) voice recognition system, which in turn produced plain-text words that were stored in a database together with the video.
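Given a plain-text transcript from the recognizer, recovering the tags reduces to finding the two delimiter words. The Python sketch below is an assumed illustration of that post-processing step, not the project's actual scripts; the example transcript is hypothetical, and the audio would first be pulled from each clip with ffmpeg (e.g. `ffmpeg -i clip.mp4 -vn clip.wav`).

```python
import re

def split_speech_tags(transcript, tag_start="kirahvi", tag_end="krokotiili"):
    """Split an ASR transcript into single-word tags and a free-text sentence.

    The speaker says tag_start ('KIRAHVI'), then the tags, then tag_end
    ('KROKOTIILI'), then one descriptive sentence, following the
    Suitcase of Stories protocol.
    """
    words = re.findall(r"\w+", transcript.lower(), flags=re.UNICODE)
    try:
        start = words.index(tag_start)
        end = words.index(tag_end, start + 1)
    except ValueError:
        return [], ""          # a delimiter was misrecognized or missing
    tags = words[start + 1:end]
    sentence = " ".join(words[end + 1:])
    return tags, sentence

# Hypothetical transcript as the recognizer might return it.
text = "kirahvi raitiovaunu keskusta kaupunki krokotiili raitiovaunu ajaa kauppatorin ohi"
print(split_speech_tags(text))
# (['raitiovaunu', 'keskusta', 'kaupunki'], 'raitiovaunu ajaa kauppatorin ohi')
```

Note how a misrecognized delimiter makes the whole clip untaggable, which is exactly the failure mode discussed in the results below.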

5. RESULTS AND ANALYSIS

To get an estimate of how well the transcription succeeded, we calculated the percentage of correctly recognized tags, meaning words spoken after the keyword KIRAHVI, using a small group of samples: 97 out of approximately 800 total samples. Out of these samples there were 54 correctly recognized tags and 43 words recognized incorrectly (or not at all). This corresponds to 56% accuracy (54/97), which is much less than what the recognizer achieves in ideal circumstances (no background noise, a quality microphone, clearly spoken words). Furthermore, in discussions with the student group after the production week it became clear that the main difficulties in our tagging experiment were both technical and due to the end users:
- People did not think carefully before speaking their tags, which resulted in sloppy pronunciation and bad recognition.
- Many of the words were out of vocabulary because they were names of persons and physical locations in the city.
- The students should have been put through a retrieval exercise first, learning to identify what constitutes a good tag. In this way the use of sensible tags would have increased.
- Similarly, if they had had a chance to try out an automatic speech recognition system in advance, they would have understood the importance of speaking clearly, into the microphone, with a firm voice and pronunciation, thus improving the ASR process.
- The number of tags per clip was quite good, but some students clearly did not remember what they had just filmed, or suffered from the awkwardness of the milieu they were in at the moment. The students said they at times felt silly entering the tags at public locations.
- The tags were for the most part nouns and verbs, with a few adjectives mixed in.
- The content of the videos was to a large extent scenery from the places we asked them to visit. Many clips were shot at the University where they study, as well as shots of their friends going about their everyday business; not very representative of what typical UGC material could be.
- Even though the group of students was enthusiastic, few of them managed to produce the required 5 minutes of footage per day.
- The free-text sentence at the end was regressed into words in their base form, making the whole idea of one human-readable sentence useless.
- The main keywords were not always understood as the triggers (delimiters) they were intended to be.
- The GPS-based location metadata was lost altogether, since it is not stored with the actual video file and we did not understand how to retrieve it. The GPS also failed to get adequate satellite coverage in the time available when an interesting shooting opportunity appeared.
- The process of extracting the audio from hundreds of video files was tedious, even though ffmpeg and some bash scripts made it easier.
- Our original idea, that the video should be uploaded over the existing open WLANs in Helsinki, was doomed because the actual transmission speeds from the phone were far too slow.
- Many did not realize that you cannot shoot with the phone in horizontal landscape mode; this led to a number of videos that would have to be rotated 90 degrees.
- Often the footage was surprisingly steady, probably because the students are all familiar with the use of a video camera; again, not representative of general UGC.

7. REFERENCES
[1] Boczkowski, P.J. (2004a). The Processes of Adopting Multimedia and Interactivity in Three Online Newsrooms. Journal of Communication 54(2): 197-213.
[2] Boczkowski, P.J. (2004b). Digitizing the News: Innovation in Online Newspapers. Cambridge, MA: MIT Press.
[3] Mitchelstein, E. and Boczkowski, P.J. (2009). Between tradition and change: A review of recent research on online news production. Journalism, vol. 10, pp. 562-586.
[4] Wardle, C. and Williams, A. (2010). Beyond user-generated content: a production study examining the ways in which UGC is used at the BBC. Media, Culture & Society.
[5] Niekamp, R. (2010). Sharing Ike: Citizen Media Cover a Breaking Story. Electronic News, vol. 4, pp. 83-96.
[6] Scheufele, D.A. and Nisbet, M.C. (2002). Being a Citizen Online: New Opportunities and Dead Ends. The Harvard International Journal of Press/Politics, vol. 7.
[7] Rodden, K. and Wood, K. (2003). How Do People Manage Their Digital Photographs? In Proceedings of CHI 2003.
[8] Mackay, W.E. (1989). EVA: An experimental video annotator for symbolic analysis of video data. ACM SIGCHI Bulletin, vol. 21, no. 2, p. 71.
[9] Abowd, G.D., Gauger, M. and Lachenmann, A. (2003). The family video archive: An annotation and browsing environment for home movies. In Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval.
[10] Shamma, D.A., Shaw, R., Shafton, P.L. and Liu, Y. (2007). Watch what I watch: using community activity to understand content. In Proceedings of the International Workshop on Multimedia Information Retrieval, pp. 275-284.
[11] Diakopoulos, N., Goldenberg, S. and Essa, I. (2009). Videolyzer: quality analysis of online informational video for bloggers and journalists. In Proceedings of the 27th International Conference on Human Factors in Computing Systems, pp. 799-808.
[12] Cesar, P., Bulterman, D.C.A., Jansen, J., Geerts, D., Knoche, H. and Seager, W. (2009). Fragment, tag, enrich, and send. ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 5, no. 3, pp. 1-27.
[13] Cherubini, M., Anguera, X., Oliver, N. and de Oliveira, R. (2009). Text versus speech: a comparison of tagging input modalities for camera phones. In Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services.
[14] Zhang, R., North, S. and Koutsofios, E. (2010). A comparison of speech and GUI input for navigation in complex visualizations on mobile devices. In Proceedings of the 12th International Conference on Human-Computer Interaction with Mobile Devices and Services, pp. 357-360.
[15] Froehlich, P. and Hammer, F. (2004). Expressive Text-to-Speech: A user-centred approach to sound design in voice-enabled mobile applications. In Proc. Second Symposium on Sound Design.

6. CONCLUSIONS
The results show that the idea of in situ speech tagging is promising, but having to do the speech recognition offline makes the process impractical. The in situ aspect is important: when entering the tags immediately, the author has a better recollection of what he/she has just shot. This makes the tagging process more natural than having to go through the material and tag it in retrospect.


Value-added services and identification system: an approach to elderly viewers


Telmo Silva
CETAC.MEDIA/DeCA, Universidade de Aveiro, Portugal, +351 234370200
tsilva@ua.pt

Jorge Ferraz
CETAC.MEDIA/DeCA, Universidade de Aveiro, Portugal, +351 234370200
jfa@ua.pt

Pedro Almeida
CETAC.MEDIA/DeCA, Universidade de Aveiro, Portugal, +351 234370200
almeida@ua.pt

Osvaldo Pacheco
Depart. Elect. Telec. e Inf., Universidade de Aveiro, Portugal, +351 234370200
orp@ua.pt

ABSTRACT

Nowadays, with the advent of technical and interactive improvements in TV, operators may provide users with a more personalized television experience, which demands a reliable identification system centred on the viewer rather than on the set-top box. When senior viewers are at stake, an automatic and non-intrusive system seems more suitable than the input of a user ID and a matching PIN. In this work, using an approach based on hands-on experience, demos, direct observations and interviews, we attempt to find the Viewer Identification System best suited for this generation of users.


2. IDENTIFYING ELDERLY VIEWERS


Each person/environment pair is unique and encloses specificities that are complex to address in systems design. Taking this into account, we defined a research process that begins with a set of exploratory interviews, followed by the development and validation of a prototype through a new set of interviews. This research process is supported by a User-Centred Design approach, as David Geerts suggests in [2]: "A good human-centred design approach will lead to applications that take users' needs and the context of use into account".

Categories and Subject Descriptors


H.1.2 [Models and Principles]: User/Machine Systems - human factors.

2.1 Research process


The defined research process starts with a set of five exploratory interviews to help in the system design. The experience gathered from this first contact helped us adjust the interviewing style. Taking into consideration the sociological and technical literature review and the data collected in this first phase, a prototype (explained in 2.1.2) was developed to support a following round of interviews. This new round (nine interviews) was carried out to gather opinions concerning a medical reminder service (thought of as one of the potential value-added services for this age group) and the different identification systems presented in the prototype.

General Terms
Human Factors, Experimentation, Verification.

Keywords
User Centred Design, iTV, Elderly, Viewer Identification.

1. INTRODUCTION
Watching TV is a daily activity for most human beings. In recent years, with the advent of new TV distribution systems such as Digital Terrestrial Television (DTT) and IPTV (Internet Protocol Television), this activity is changing [1]. Some of these systems introduced a return channel, which has the potential to provide a high level of content personalization. In this technological scenario, the multiplicity of interactive TV (iTV) services is constantly increasing. Since the typical scenario relies on multiple unidentified viewers using the same STB, the offered experience is not completely adjusted to the viewer. This limitation can be overcome if the TV provider knows who is really watching TV, allowing the offer of interactive services better suited to the viewer's profile, such as personalized ads, automatic tuning of favourite channels, adjusted audio descriptions, personalised health care systems or communication services. To accomplish this it is of great importance to improve the TV provider infrastructure with a reliable Viewer Identification System (VIS). In the particular context of this work we are especially interested in the development of a VIS targeted at senior viewers. Therefore it is important to understand their needs, motivations and behaviours when they are in front of the TV set. The motivation for this target audience derives from the current worldwide scenario in which the number of older persons is increasing, as confirmed by the World Health Organization (W.H.O.), which reports an increase of 2.7% per year in the group of people aged over 65 (with a prediction of 2 billion people in this group in 2050) [5]. This trend justifies a careful consideration not only of the needs and characteristics of this target group when developing appliances and services for their homes but also of their corresponding gratifications [6].

2.1.1. Exploratory interviews phase I


The elderly are a highly heterogeneous group living in multifaceted environments influenced by several social structures. Due to this heterogeneity we decided to use a broad interview guide approach to ensure that all the issues were addressed in the interviews. This approach allowed a degree of freedom and adaptability in the interviewing process that was important for creating a more relaxed and enthusiastic environment [3]. For the exploratory interviews, the participants (two women and three men) were randomly selected from a list of inhabitants of Anadia, a small Portuguese city. In order to assure a relaxed environment, a preliminary phone call was made to each interviewee explaining the process and the motivations of the work, and all the interviews were carried out at the elders' houses. In this group, the estimated time spent viewing television was three and a half hours per day. Thus, we could verify that people in this set: i) do not use computers and the internet often; ii) have watching TV as their main occupation; iii) generally do not like to speak about technological gadgets; iv) have difficulties in conceiving scenarios related to the integration of new technologies; v) are mostly concerned about health issues and agree on the advantages of an iTV service in this field. Concerning the viewer identification system, we described (using common-sense language) a set of options to the interviewees: i) RFID card (a card that should be passed near a reader); ii) fingerprint reader in the remote control; iii) PIN code; iv) voice recognition; v) a bracelet with an identifier; vi) mobile phone with Bluetooth activated; vii) face recognition; and viii) remote control with a gyroscope providing handling recognition. These interviews (phase I) revealed that, without a prototype, it is difficult for the seniors to clearly identify the advantages of an automatic identification system in an iTV context. Due to this constraint, they tended to disperse their answers: the fingerprint reader and the RFID card each got one vote and face recognition got two; the other methods did not get votes. However, three of the five interviewees referred to the need to be able to turn off this system anytime they wanted. All valued the importance of a system that can be used to help in daily activities and events (e.g. medical prescriptions) and help their caregivers' network. After these interviews we confirmed the need to develop a functional prototype to clearly demonstrate the identification methods correlated with a value-added service.


2.1.4. Selected findings


The collected data (from the phase I and phase II interviews) show that the estimated time spent watching TV (across all participants) is around three and a half hours a day. Specifically from phase II we gathered the following information: i) although all the interviewees were familiar with a TV set, they used the remote control mostly to adjust the volume and select channels (6 of 9 users); ii) all the interviewees considered the identification system useful and recognized the enhancements that could be obtained in iTV services based on it; iii) 4 of the 9 interviewees stated that the VIS should be as automatic as possible (without the need for user intervention); iv) communication services and life support systems, especially those related to medical care, were often referred to as key enhancements that could be obtained if a viewer identification system were implemented; and v) help instructions should be used extensively and should always be present. In spite of these general indications, our main focus was to detect a trend amongst the several automatic user identification techniques. Concerning this, we found that the spectrum of answers about the VIS was considerably large, making it impossible to find a clear trend.

2.1.2. medControl prototype


In order to present to users a layer of services that benefits from the identification systems, the prototype was built on top of a medical reminder service. This module (medControl) was developed under the research project iNeighbourTV (PTDC/CCI-COM/100824/2008), targeted at senior citizens. The medControl module triggers alerts on top of the TV screen when the senior viewer needs to take his or her medication. MedControl was developed using the MS Mediaroom Presentation Framework (PF). On top of this iTV service a multi-modal viewer identification system was developed and used in the interviews. This multimodal system comprises the ability to perform viewer identification through: i) PIN insertion using the remote control; ii) Bluetooth pairing with the user's mobile phone; and iii) detection of 13.56 MHz RFID tags (in an identification card). A laptop computer running the PF simulator was used as an STB at the interviewees' homes. An RFID tag reader and a Bluetooth driver were also part of the prototype. The identification module, a Java-based software component (VIS - Viewer Identification System), reads the RFID tags, discovers the nearest Bluetooth devices and forwards the viewer identification data to the iTV service.
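The identification logic itself can be summarised by the following sketch. It is only an illustrative outline (the real module is Java-based and integrated with the MS Mediaroom service; the reader functions and lookup tables below are placeholders): automatic, non-intrusive methods are tried first and PIN entry is used only as a fallback.

    from typing import Optional

    # Placeholder lookup tables mapping tokens to viewer profiles.
    RFID_TO_VIEWER = {"04A2B9C1": "maria", "04A2B9C2": "antonio"}
    BLUETOOTH_TO_VIEWER = {"00:1A:7D:DA:71:13": "maria"}

    def read_rfid_tag() -> Optional[str]:
        # Stand-in for polling the 13.56 MHz reader; returns a tag UID or None.
        return None

    def discover_bluetooth_devices() -> list:
        # Stand-in for a Bluetooth inquiry scan; returns nearby device addresses.
        return []

    def ask_for_pin() -> Optional[str]:
        # Stand-in for a PIN dialog rendered by the iTV service.
        return None

    def identify_viewer() -> Optional[str]:
        # Prefer automatic, non-intrusive methods; fall back to PIN entry.
        tag = read_rfid_tag()
        if tag and tag in RFID_TO_VIEWER:
            return RFID_TO_VIEWER[tag]
        for address in discover_bluetooth_devices():
            if address in BLUETOOTH_TO_VIEWER:
                return BLUETOOTH_TO_VIEWER[address]
        return ask_for_pin()

    def notify_itv_service(viewer: Optional[str]) -> None:
        # Forward the identification result to the iTV service (e.g. to trigger
        # personalised medication reminders); here we only print it.
        print("identified viewer:", viewer or "unknown")

    if __name__ == "__main__":
        notify_itv_service(identify_viewer())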

3. CONCLUSIONS
At the beginning of this work we aimed to identify a trend concerning a VIS and we defined a simple research process. We also realized that it was very difficult to illustrate and explain our ideas without a tangible prototype. Although the results of the interviews are non-statistical, it was clear that this is an excellent means of involving older people and explaining to them the technological enhancements in the interactive TV area. Regarding the Viewer Identification System, we cannot clearly identify a generally suitable method.

4. REFERENCES
[1] Ardissono, L., Kobsa, A., and Maybury, M. T., Personalized Digital Television. Human-Computer Interaction Series, Vol. 6. Springer, 2004.
[2] Geerts, D., Sociability Heuristics for Interactive TV: Supporting the Social Uses of Television. Faculteit Sociale Wetenschappen, Katholieke Universiteit Leuven, 2009.
[3] Kvale, S., Interviews: An Introduction to Qualitative Research Interviewing. Sage Publications, 1996.
[4] Lewis, C. and Rieman, J., Task-Centered User Interface Design - A Practical Introduction. 1994.
[5] World Health Organization. World Health Organization launches new initiative to address the health needs of a rapidly ageing population. 2004, cited 2-1-2011. Available from: http://www.who.int/mediacentre/news/releases/2004/pr60/en/.
[6] Ruggiero, T. E., Uses and Gratifications Theory in the 21st Century. Mass Communication & Society, 2000, 3(1), p. 34.

2.1.3. Exploratory interviews phase II


This set of interviews was useful for getting insights from the participants' experiences and for gathering more in-depth information on the topics concerned [3]. This type of interview is common in the development of projects based on a User-Centred Design (UCD) methodology [4]. As in the first phase of interviews, we visited the participants at their homes in order to keep them engaged in their natural environment. The researcher also tried to contribute to a relaxed environment, making clear that it was not intended to test the participants' technical skills. The second group of interviewees included nine new participants (five women and four men) over 55 years of age. The adopted procedures were similar to those of the first group of interviews. All the invited individuals accepted, were very kind and demonstrated their willingness to help and to be interviewed again if necessary. All interviews, except one (due to a request from the interviewee), were recorded. During the tests/interviews we could see that the use of the prototype was very important, giving the interviewees a solid and tangible image of the VIS goals. This evaluation helped us not only to improve the prototype but also to get information about the most suitable VIS. Along with the prototype we also proposed other identification options (the same as in phase I) and tried to perceive which was the favourite one.


Doctoral Consortium


Research for Development of Value Added Services for Connected TV


Tanushyam Chattopadhyay
Innovation Lab, Kolkata, Tata Consultancy Services, India
t.chattopadhyay@tcs.com
ABSTRACT
Recent trends in emerging markets [1], [2] show that connected TV is becoming very popular in developing countries like India and the Philippines. A connected TV can be described as an Internet-enabled TV. One such product, referred to as the Home Infotainment Platform (HIP) [3], combines the functions of a television and a computer by allowing customers to use their television sets for low-bandwidth video chats and to access internet websites. It is now commercially available in India [4] and the Philippines [5]. The research presented in this thesis is motivated by the business need to implement some value-added services, hereinafter referred to as VAS, on top of this product, which can work as the unique selling point for the above-mentioned product. The main motivation behind the research described in this thesis is to develop some interesting VAS for the product of the company using frugal computing. As a result, the thesis is not focused on a single problem; instead it provides frugal solutions for the several goals that need to be achieved to develop the VAS for the global product. Some of the planned VAS for these connected TVs are video conferencing, video encryption and watermarking, context-based web and TV mash-up, video summarization and an Electronic Program Guide (EPG) for cable-feed channels [8]. This thesis is devoted to the development of the above-mentioned VAS for connected TV. As all these services need to be developed on an embedded platform, the primary task is to realize the required video CODEC on the target DSP platform. As H.264 is adjudged the best video CODEC of the day [9] because of its compression efficiency, video quality and network friendliness, we have developed most of the VAS on top of an H.264 CODEC. Security can be ensured by either encrypting the video or putting a watermark in the video. Context can be extracted at the top level by recognizing the channel the user is viewing and then getting the relevant information from the website of that particular channel. In addition, textual information in a TV show provides information related to the show at any particular instant of time.


Keywords
VAS, Connected TV, H.264, Text localization, EPG, Video Security

1. INTRODUCTION

The total media viewing and sharing experience is changing and getting richer every day, as videos, music and other multimedia content flood the Internet. The main reason behind the popularity of this product (HIP) in developing countries is that the Internet penetration in those countries is significantly low [6] compared to the penetration of television. As per the report [7], 75% of the population of India owns a TV. TV has been a favoured device of home infotainment for decades. In order to provide a unified Internet experience on TV, it is imperative that the Internet experience blends into the TV experience. This in turn means that it is necessary to create novel VAS that enrich the standard broadcast TV watching experience through Internet capability. This necessity eventually translates into the need for different applications, such as secure distribution of multimedia content, communication using video chat over TV, and applications that can understand what the user is watching on broadcast TV (referred to as TV context) and provide the user with additional information/interactivity on the same context using the Internet connectivity. Understanding the basic TV context is quite simple for digital TV broadcast (cable or satellite) using metadata provided in the digital TV stream. But in developing countries, digital TV penetration is quite low. For example, in India, more than 90% of TV households still have analog broadcast cable TV. Understanding the TV context in the analog broadcast scenario is really a big challenge. Even for the small percentage of homes where satellite TV has penetrated in the form of Direct-to-Home (DTH) service, almost all of them lack back-channel connectivity for providing true interactivity.

2. REVIEW OF RELATED WORKS

In this section a brief overview of the state of the art in the related fields is given. Realization of a Video CODEC on an Embedded Platform: In the literature [9] it is reported that the improvement in video quality and compression ratio for H.264 is obtained at the cost of an increase in computational complexity and memory requirement, so the state of the art is analyzed in light of these challenges. A detailed discussion cannot be provided here because of the page constraint, so we provide the gist of the analysis. Reduction of Computational Complexity: Two different approaches have been taken to reduce the computational complexity, which can be measured in terms of Mega Cycles Per Second (MCPS).


These approaches are (i) platform-independent optimization and (ii) platform-specific optimization. There are also approaches that describe the optimization of the encoder execution time as a whole, such as [10], [11]. Memory Optimization: In [12] the authors propose a novel near-optimal filtering order that achieves a significant reduction in memory requirement. This work also gives a significant reduction in MCPS. However, their methodology is applicable to an FPGA prototype; it cannot be used on a commercially available DSP platform, where the user does not have the flexibility to modify the hardware architecture. The above state of the art reveals some limitations: the platform-independent optimization techniques give good optimization but at the cost of coding efficiency. Moreover, some of these algorithms are sub-optimal and not compliant with the standard. These algorithms are generic and thus can be applied to any type of video. As the target applications are mainly video telephony and video conferencing, the motion in the videos is very low, so if the nature of the video can be exploited, more optimization can be obtained. Thus in this thesis an optimization technique is proposed based on a statistical analysis of the selected modes for these types of video. On the other hand, the platform-specific optimizations are not suited to our choice of video conferencing/video telephony. No literature focuses on efficient rate control at low cost to get better video quality. A comprehensive survey on H.264 video security is given in [13], [14]. As video security is itself a vast field of research, we have restricted the state-of-the-art analysis to the study of encryption and watermarking for videos, and more specifically for H.264 compressed-domain videos. Encryption: Descriptions of video encryption can be found in [15] - [18]. Some of these techniques (i) encrypt the motion vectors, (ii) encrypt the entropy-coded stream or (iii) scramble the prediction modes to achieve encryption. To the best of our knowledge, however, only two such works can be found, in [16], [17]. Watermarking: Different classifications of watermarking technology are described in [14]. Broadly, video watermarking techniques can be classified into two types of approaches, namely (i) pixel domain, where the watermark is inserted directly in the raw video data, and (ii) compressed domain, where the watermark is integrated during the encoding process or implemented by partially decoding the compressed video data. The major problem of implementing the pixel-based approaches in the proposed solution is the additional overhead of decoding the compressed video. Moreover, the watermarking technique for the proposed system should be compliant with the compressed H.264 video format, which differs from previous video codecs in several aspects, as described in [9]. We have also described the differences between H.264 and other video codecs in chapter 2. Text Information Extraction from Video: The input video format for the proposed system differs for different sources of input signal. The input video may come from a Direct-To-Home (DTH) service or in the form of a Radio Frequency (RF) cable. In the case of DTH, the input video is in H.264, MPEG or another compressed digital video format; in the case of the RF cable, the input is an analog video signal. In the second case the video is first digitized for further processing.

The Text Information Extraction (TIE) module localizes the candidate text regions in the video. The state of the art shows that the approaches to TIE can be classified broadly into two sets of solutions: (i) using pixel-domain information when the input video is in raw format, and (ii) using compressed-domain information when the input video is in compressed format. A comprehensive survey on TIE is given in [19], where the different techniques in the literature between 1994 and 2004 are discussed. A survey of the recent work in this field can be found in [20]. EPG for RF-fed TV: Related work on channel logo recognition can be found in [21] - [24]. The best performance is observed on the x86 platform for the approaches described in [24]. However, the approaches taken in [24] involve PCA and ICA, which are computationally very expensive and thus difficult to realize on the said DSP platform with real-time performance. So no solution is available in the literature that can recognize channel logos in real time and provide an EPG for RF-fed TVs.
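As a simple illustration of the pixel-domain processing involved in such an EPG service, the sketch below matches logo templates against the screen corner of a digitized frame and looks up a programme entry for the recognized channel. The file names, corner heuristic and EPG table are placeholders, and the approach shown (plain template matching with OpenCV) is only a stand-in for the fuzzy multifactor analysis actually proposed in [8].

    import cv2  # OpenCV

    # Placeholder logo templates and a toy EPG table keyed by channel name.
    LOGO_TEMPLATES = {"NewsChannel": "news_logo.png", "SportsChannel": "sports_logo.png"}
    EPG = {"NewsChannel": "20:00 Evening Bulletin", "SportsChannel": "20:00 Live Football"}

    def identify_channel(frame_bgr, threshold=0.8):
        # Search only the top-right corner, where logos are typically overlaid.
        h, w = frame_bgr.shape[:2]
        corner = cv2.cvtColor(frame_bgr[0:h // 4, 3 * w // 4:w], cv2.COLOR_BGR2GRAY)
        best_channel, best_score = None, threshold
        for channel, template_path in LOGO_TEMPLATES.items():
            template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
            if template is None or template.shape[0] > corner.shape[0] or template.shape[1] > corner.shape[1]:
                continue
            result = cv2.matchTemplate(corner, template, cv2.TM_CCOEFF_NORMED)
            _, score, _, _ = cv2.minMaxLoc(result)
            if score > best_score:
                best_channel, best_score = channel, score
        return best_channel

    if __name__ == "__main__":
        frame = cv2.imread("captured_frame.png")   # one digitized frame from the RF feed
        if frame is not None:
            channel = identify_channel(frame)
            print(channel, "->", EPG.get(channel, "no EPG entry"))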

3. MOTIVATION FOR THE PRESENT WORK

A review of the previous work on VAS for HIP-like systems reveals that most studies concentrate on individual sub-problems instead of providing a complete end-to-end solution. The work embodied in this thesis is motivated by the wish to fill this gap. The major challenge in developing such a product is the resource constraint, namely the CPU speed and memory of the target hardware. Some of these algorithms describe a good solution for some of the sub-problems in a PC environment, but these solutions cannot be implemented on a fixed-point DSP platform. The proposed study is focused on developing the following VAS: (i) security of the broadcast video, and (ii) context information extraction from streamed video. Moreover, all of these solutions need to be deployed on the target hardware. We therefore plan to incorporate the following VAS as features of the HIP. Low-bandwidth video applications: This feature enables the user to do video conferencing with another person who has a similar HIP installed in his/her home while watching TV. The basic motivation behind this feature is that the TV picture can be scaled down to a lower resolution so that the rest of the TV screen can be used for video conferencing. This feature originated as a wish-list item from customers in urban areas of India whose children are working abroad. Another solution based on the low-bandwidth requirement is a place-shifting solution, which enables the user to access home video content over broadband. Both of these solutions can only be implemented once an efficient video CODEC, satisfying the requirement of high video quality at low bandwidth, is realized on a DSP platform. As H.264 has proved to be the best video CODEC of the day, we have implemented H.264 on a low-cost DSP platform. Video Encryption: This feature was motivated by the demand from TV channel agencies when the PVR entered the market. The video encryption algorithm allows the user to record video content using a key which can be derived from the hardware identification number of the PVR or HIP. As a consequence, the user can be tracked if he/she wants to use the recorded content for an illegal business purpose.


Video Watermarking: The need for video watermarking was motivated by the needs of one of the major content provider companies, which wanted to insert watermarks into the content it provides through a content delivery network (CDN). The same algorithm can be extended to streaming video applications such as the video-on-demand (VoD) services provided by DTH operators. The company also asked for a watermarking evaluation system that can evaluate any watermarking scheme. Mash-up of TV context and Internet information: In the age of Google TV and Yahoo Connected TV, it is impossible to survive in the connected-TV market without providing a mash-up feature. But as most people in India use analog cable TV, and all of the above-mentioned products are based on digital video only, there is a need to develop a system that addresses this variation of input, too. Moreover, the quality of video obtained from the analog feed is quite poor in comparison with that obtained from a DTH service. EPG for RF-fed TVs: The same gap in technology arising from the source of the video content motivates us to develop an EPG service for users receiving the TV signal over an RF feed. We have proposed solutions that deliver about 80% of the accuracy and efficiency of the best related PC-based solutions at about 20% of the cost in terms of execution time and hardware. This concept, commonly known as frugal computing, is mainly targeted at CE products in developing countries. The current thesis is mainly motivated by the aim of providing such frugal solutions that can be deployed on top of the HIP product already developed by the organization.
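The idea of binding recorded content to the device can be sketched as follows. This is only a schematic illustration, assuming SHA-256 key derivation from the hardware identification number and AES-CTR encryption of the whole recorded file with the Python cryptography package; the proposed DSP implementation instead encrypts selected H.264 syntax elements in real time.

    import hashlib
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def derive_device_key(hardware_id: str) -> bytes:
        # Bind the key to the PVR/HIP hardware identification number (assumed scheme).
        return hashlib.sha256(("hip-recording-key:" + hardware_id).encode("utf-8")).digest()

    def encrypt_recording(plain_path: str, cipher_path: str, hardware_id: str) -> None:
        key = derive_device_key(hardware_id)
        nonce = os.urandom(16)                      # stored alongside the recording
        encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        with open(plain_path, "rb") as fin, open(cipher_path, "wb") as fout:
            fout.write(nonce)
            while True:
                chunk = fin.read(64 * 1024)
                if not chunk:
                    break
                fout.write(encryptor.update(chunk))
            fout.write(encryptor.finalize())

    if __name__ == "__main__":
        encrypt_recording("recording.ts", "recording.enc", hardware_id="HIP-0001-ABCD")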

4. SCOPE OF FUTURE RESEARCH

The study presented here can be extended in several directions. Some of them are highlighted below. Video Screen Layout Segmentation: The layout of a video is very complex. We have tried to run different document page layout segmentation techniques on different video frames of news videos, but none of these methods produces a significant result. Frame-by-frame annotation of video frames using multimodal cues: The proposed method for the mash-up of web information and TV context is based on the textual content of the video only, but a better result could perhaps be obtained if multimodal cues such as audio and image were used. This could be used for annotating the frames and indexing the video. Cross-lingual Information Retrieval: The textual content from news videos can be further used to retrieve related information in other languages. Script identification and, subsequently, Cross-Lingual Information Retrieval (CLIR) are further research issues involved with this problem. Automatic channel logo region identification: We have found that channel logo region identification for animated channels is a challenge. Automatic localization for these channels (like 9xM) is a possible future extension of the present research.


5. CONTRIBUTION OF THE THESIS

As far as the state of the art is concerned, this thesis makes several contributions to the development of VAS for a connected television. Some of the major contributions are briefly discussed below. In this thesis some novel approaches are proposed to provide better video quality, coding efficiency and reduced MCPS even under the constraints of the target hardware. Improvement in video quality and coding efficiency under a constant bit rate is achieved by implementing a novel algorithm for adaptively selecting the basic unit for rate control. The proposed method also reduces the computational complexity using platform-independent and platform-specific optimization techniques and yet meets the very low memory constraint of the target processor for a standard H.264 baseline encoder without sacrificing rate-distortion performance. The platform-independent optimizations are useful because this version of the code can be ported to any DSP platform for further platform-specific optimizations. Almost 40% MCPS reduction with respect to the optimized reference code is achieved at the cost of less than 1% reduction in Peak Signal-to-Noise Ratio (PSNR). This thesis also presents an encryption scheme for H.264 video that can be implemented on a DSP platform. With Personal Video Recorder (PVR) enabled STBs and connected TVs, any user can easily store any TV program; the proposed technique is capable of protecting against illegal distribution of video content stored on the PVR. The thesis presents a fast yet robust video encryption algorithm that performs real-time encryption of H.264 video on a commercially available DSP platform. This algorithm has also been applied in a real-time place-shifting solution on a DSP platform. The approach has no negative effect as far as compression ratio and video quality are concerned, and it can be shown mathematically that the proposed method is more robust than the methods for encrypting H.264 video described in the state-of-the-art analysis. With the advent of high-speed machines, a hacker nowadays finds it less difficult to break an encryption key, even though it may require a large number of attempts. Therefore, an encryption method alone is not sufficient for copyright protection and ownership authentication of stored and streamed videos, and digital watermarking techniques are proposed in this thesis for this purpose. A fast method for watermarking streamed H.264 video data is proposed to meet the real-time criteria of streaming video. This solution was deployed in a content delivery network (CDN) environment, too. The thesis further describes a novel TV and web mash-up application. This application first extracts the relevant textual information from the TV video, arriving in either analog or digital format, and then mashes up related information from the web to provide a true connected-TV experience to the viewers. Unlike digital TV transmission, it is not possible to automatically obtain contextual information about TV programs from any metadata; the text in a TV channel is therefore extracted by text region identification, followed by pre-processing of the text regions and Optical Character Recognition (OCR) on them. The applications are presented for an x86-based PC platform and an ARM-based dual-core platform; this type of system is not available in the literature. Finally, the thesis presents a novel method for recognizing channel logos from streamed videos in real time, which has various applications for VAS in the connected-TV space. The prototype was developed on the x86 platform and then ported to a commercially available DSP, with nearly 100% accuracy in real time. In India, where most people still watch TV using a Radio Frequency (RF) cable feed, this image-processing-based approach to providing an EPG is novel in nature.
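Since several of the above results are reported in terms of PSNR, the following small helper illustrates how per-frame PSNR between a reference frame and its encoded/decoded counterpart would typically be computed and then averaged over a test sequence. It is an illustrative snippet, not code from the thesis.

    import math

    def psnr(reference, reconstructed, max_value=255):
        # reference and reconstructed are same-sized sequences of 8-bit luma samples.
        if len(reference) != len(reconstructed) or not reference:
            raise ValueError("frames must be non-empty and of equal size")
        mse = sum((r - d) ** 2 for r, d in zip(reference, reconstructed)) / len(reference)
        if mse == 0:
            return float("inf")            # identical frames
        return 10.0 * math.log10((max_value ** 2) / mse)

    if __name__ == "__main__":
        ref = [16, 128, 235, 64, 90, 200]
        dec = [17, 126, 235, 66, 88, 201]
        print(round(psnr(ref, dec), 2), "dB")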

6. REFERENCES

[1] A. Wooldridge, "A special report on innovation in emerging markets," The Economist, p. 6, 17 April 2010.
[2] B. Stelter, "A TV-Internet Marriage Awaits Blessings of All Parties," The New York Times, 9 January 2011, http://www.nytimes.com/2011/01/10/business/media/10tv.html, last accessed on 14 January 2011.
[3] A. Pal, C. Bhaumik, M. Prashant and A. Ghose, "Home Infotainment Platform," UCMA 2010, Miyazaki, Japan, June 2010.
[4] Economic Times, "Dialog can raise internet penetration," Economic Times Kolkata, Business and IT section, p. 5, 27 April 2010.
[5] "SMART launches SurfTV," http://smart.com.ph/corporate/newsroom/SurfTV.htm, last accessed on 13 January 2011.
[6] "World Internet Usage Statistics News and World Population Stats," http://www.internetworldstats.com/stats.htm, last accessed October 2010.
[7] "ITU report baffled over RP's high mobile phone, TV penetration," http://technews.com.ph/?p=1627, last accessed October 2010.
[8] T. Chattopadhyay and C. Agnuru, "Generation of Electronic Program Guide for RF fed TV Channels by Recognizing the Channel Logo using Fuzzy Multifactor Analysis," ISCE 2010, 7-10 June, Germany, 2010.
[9] I. E. G. Richardson, H.264 and MPEG-4 Video Compression, ISBN 0-470-84837-5, 2003.
[10] X. Kim and C. C. Jay Kuo, "A Feature-based Approach to Fast H.264 Intra/Inter Mode Decision," ISCAS 2005, pp. 308-311, 23-26 May 2005.
[11] H. Wang and Z. Zhu, "Fast Mode Decision and Reduction of the Reference Frames for H.264 Encoder," IICCA 2005, Vol. 6, pp. 1040-1043, June 2005.
[12] S. Y. Shih, C. R. Chang and Y. L. Lin, "A near optimal deblocking filter for H.264 advanced video coding," Asia and South Pacific Conference on Design Automation, pp. 24-27, 2006.
[13] T. Chattopadhyay and A. Pal, "A Survey on Video Security with Focus on H.264: Steganography, Cryptography and Watermarking Techniques," Proc. of the 2nd National Conference on Recent Trends in Information Systems (ReTIS 2008), pp. 63-67, Kolkata, India, 2008.
[14] S. Bhattacharya, T. Chattopadhyay and A. Pal, "A Survey on Different Video Watermarking Techniques and Comparative Analysis with Reference to H.264/AVC," ISCE 2006, pp. 616-621, Russia, 2006.
[15] Y. Ye, X. Zhengquan and L. Wei, "A Compressed Video Encryption Approach Based on Spatial Shuffling," Proc. of the 8th International Conference on Signal Processing, Vol. 4, pp. 16-20, Greece, 2006.
[16] Y. Li, L. Liang, Z. Su and J. Jiang, "A New Video Encryption Algorithm for H.264," ICICS 2005, pp. 1121-1124, Thailand, 2005.
[17] Y. Zou, T. Huang, W. Gao and L. Huo, "H.264 video encryption scheme adaptive to DRM," IEEE Transactions on Consumer Electronics, Vol. 52, No. 4, pp. 1289-1297, Nov. 2006.
[18] Z. Liu and X. Li, "Motion vector encryption in multimedia streaming," Proc. of the 10th International Multimedia Modelling Conference, pp. 64-71, Australia, 2004.
[19] K. Jung, K. I. Kim and A. K. Jain, "Text Information Extraction in Images and Video: A Survey," Pattern Recognition, Vol. 37, No. 5, pp. 977-997, May 2004.
[20] T. Chattopadhyay, A. Pal and A. Sinha, "Recognition of Characters from Streaming Videos," Chapter 2 of Character Recognition, SCIYO, pp. 21-42, 2010, ISBN 978-953-307-105-3.
[21] E. Esen, M. Soysal, T. K. Ates, A. Saracoglu and A. A. Alatan, "A fast method for animated TV logo detection," CBMI 2008, pp. 236-241, June 2008.
[22] A. Ekin and E. Braspenning, "Spatial detection of TV channel logos as outliers from the content," Proc. VCIP, SPIE, 2006.
[23] J. Wang, L. Duan, Z. Li, J. Liu, H. Lu and J. S. Jin, "A robust method for TV logo tracking in video streams," ICME 2006.
[24] N. Ozay and B. Sankur, "Automatic TV Logo Detection and Classification in Broadcast Videos," EUSIPCO 2009, pp. 839-843, Scotland, 2009.


Collaboration in Broadcast Media and Content


Sabine Bachmayer
Department of Telecooperation, Johannes Kepler University Linz, Austria
sabine.bachmayer@jku.at

ABSTRACT
In recent years, we have observed a marked shift in broadcasting from mainly passive analog technology, such as conventional television, towards digital technology. This has caused a boom in the development of interactive applications (beyond teletext) and in inviting the viewer to engage with the content in a collaborative manner. Several popular participatory TV program formats have demonstrated this by inviting their audiences, for instance, to vote for a person. To date, the collaborative acts have been executed via parallel platforms, such as the telephone and the Internet. In summary, broadcast formats with audience involvement and technologies for adding interactivity both exist but remain unlinked in most cases. This poses a problem since synchronicity is lost, and collaboration requires a common focus which, in broadcasting, is the broadcast content. This proposal describes the challenges and working process of my PhD research in the field of computer science, which focuses on collaboration in the broadcasting area. The main research issue is examining whether it is possible to expand 1:n parallel broadcasting into collaboration without the use of parallel platforms. The main contribution is the development of a reference architecture for realizing such scenarios.

1. INTRODUCTION

In the past decade, digital and participatory broadcast technology has emerged alongside traditional analog, mainly passive (except for simple interactive applications like teletext) and informative broadcasting. The concept of collaboration in broadcasting is therefore not completely new. A simple and well-established example is inviting the audience to vote for or against a person or an item, as practised by most participatory reality and casting shows. State-of-the-art work in collaborative broadcasting can be categorized firstly according to its boundedness to TV content (whether it is (1) loose or (2) tied to TV content (footnote 1)) and secondly according to whether it focuses on (a) enhancing broadcast environments with collaborative services or (b) using parallel platforms (e.g., telecommunication features). Cases (1a) and (2b) were found quite often in state-of-the-art work. Examples of (1a) are synchronous and asynchronous chats or commendation functions, tools for group building and recommendation [1, 3, 5, 6, 11, 12]. Case (2b) deals with participatory content, using parallel platforms for participation (mostly individual) and collaboration [13, 14]. The gap is revealed in case (2a), which describes collaborative applications embedded in TV environments and tied to a certain (genre of) TV program format. Exceptional cases occur in T-Learning, for instance by López-Nores [9, 10], and in entertainment TV by the LIVE system. The LIVE system provides passive collaborative influence on the content through the viewers' behavior (e.g., channel switching), which is observed by the broadcaster [7]. Case (1b) is not relevant for this work. The main contribution of this PhD research is to create the missing link between medium, content and collaboration from a technical perspective. In detail, the following three steps are comprised: firstly, the key feature is the development of non-linear and participatory broadcast content which invites the audience to become active, instead of adding activity to passive, linear content; secondly, the development of collaborative applications which are embedded in the television environment; thirdly, the realization of a linkage mechanism between the delivery medium, participatory content and the collaborative activity, regardless of whether the collaboration influences delivery medium and content.
Footnote 1: Adapted from http://soc.kuleuven.be/com/mediac/socialitv/results.htm, introductory presentation, slide 7.

Categories and Subject Descriptors


H.5.3 [Group and Organisation Interfaces]: Collaborative Computing - Synchronous Interaction; H.5.1 [Multimedia Information Systems]: Video - Interactive TV, Collaborative TV, MPEG-J

General Terms
Design, Human Factors

Keywords
Collaborative TV, Interactive TV, Collaboration in Broadcasting


Delivery medium (hereafter termed medium or media) denotes the technical realization and representation of the content. Well-known media standards are, for example, MPEG-2 and MPEG-4 as employed in the Digital Video Broadcasting (DVB) standard in Europe. Characteristics of the medium are metadata, frames, time stamps and other features that are defined by the standard used. Content defines the substance that is transmitted via the medium as an abstract model (e.g., a movie or radio show) and consumed by the audience. Characteristics of the content are, for instance, start and end point(s), scenes and characters in scenes. Its course is defined by the narrative structure, timing and pace. Access to the content is provided via the medium by a mapping mechanism (mapping characteristics of the medium to those of the content). I want to develop a reference architecture for realizing prototypical collaborative broadcast scenarios beyond the well-known voting. One option is to extend MHP (Multimedia Home Platform, http://www.mhp.org) with collaborative support. This was, for instance, suggested in T-Learning by López-Nores [9, 10]. Another possibility is to use the MPEG-J framework [8] in MPEG-4 scenarios. For this research, I will pursue the latter possibility. The next section presents the aim and objective of my research, including the main research questions. This is followed by a description of the methodologies applied and previous work. The paper concludes with prospects and future work.

2. AIMS AND OBJECTIVES

This research project aims to develop a reference architecture incorporating a pool of predefined collaborative services on the one hand and standard linkage mechanisms on the other. The architecture shall act as a tool kit for realizing a prototypical collaborative broadcast scenario. The main research question - What is necessary to expand 1:n parallel broadcasting into collaboration? - implies five research challenges:
Q1: How can the linkage mechanism be designed? This question addresses the challenge of linking collaboration with both the medium and the content. Link to the medium: by linking to characteristics of the medium. For instance, the audience is invited to participate in a live chat on a certain topic for a certain time period; the service is called, activated and deactivated by metadata embedded in and transmitted via the broadcast medium. Link to the content: by linking to the course or characteristics of the content. For instance, activating a collaborative service to help a candidate answer a question in a game show; the linkage is realized by using the candidate, who is acting in the scene, as a hook point.
Q2: How must broadcast content be structured, and where are the hooks to link medium and content to collaboration? Structuring the content is related to the content's storyline. The key is to design and produce participatory content that invites people to collaborate. Which genres are suitable? For the linkage, a set of hooks must be defined, for instance the appearance of a certain person or object in a scene of the content, the beginning of frames, time stamps or metadata in the medium.
Q3: How must collaboration be designed for this purpose? Are existing collaborative mechanisms (e.g., communication mechanisms, concurrency control, context awareness) and tools (e.g., computer-supported cooperative work tools, rating and recommendation systems) applicable for this purpose and for connecting to the broadcast medium / content [4]? In addition, hooks in the collaboration that correspond to those in the medium and content are necessary (e.g., to ascertain the majority, reactions and counter-reactions).
Q4: How can collaborative interaction be measured for further application in a collaborative broadcast scenario? To use collaborative interaction for further application (e.g., to influence the course and characteristics of content), it is necessary to measure (e.g., the level of activity), analyze (e.g., the outcome of the collaborative activity) and finally quantify (i.e., indicate the outcome of the measurement and analysis as numeric values) the collaborative activity.
Q5: What requirements and support must be satisfied? In general, which technology best supports user participation and collaboration? Define technical requirements (e.g., a run-time environment), security requirements (e.g., privacy and resistance against attacks), exception handling (e.g., handling the unexpected drop-out of participants) and the manner of support for this real-time system.
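To make the notion of hooks and linkage in Q1 and Q2 more tangible, the following sketch models one possible in-memory representation: hooks declared on medium characteristics (metadata tags, time stamps) or content characteristics (a character appearing in a scene) are matched against incoming events and trigger a collaborative service. The hook types, event names and services shown are illustrative placeholders, not part of the proposed reference architecture.

    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass(frozen=True)
    class Hook:
        kind: str        # "medium" (e.g. metadata tag, time stamp) or "content" (e.g. character in scene)
        key: str         # which characteristic to watch
        value: str       # value that triggers the hook

    # Registry linking hooks to collaborative services (here: plain callables).
    LINKAGE: Dict[Hook, Callable[[], None]] = {
        Hook("medium", "metadata_tag", "CHAT_WINDOW_OPEN"): lambda: print("activate 60-second group chat"),
        Hook("content", "character_in_scene", "quiz_candidate"): lambda: print("activate collaborative joker service"),
    }

    def dispatch(event_kind: str, characteristics: Dict[str, str]) -> None:
        # Match an event reported by the player/stream analyzer against registered hooks.
        for hook, activate_service in LINKAGE.items():
            if hook.kind == event_kind and characteristics.get(hook.key) == hook.value:
                activate_service()

    if __name__ == "__main__":
        dispatch("medium", {"metadata_tag": "CHAT_WINDOW_OPEN", "timestamp": "00:42:10"})
        dispatch("content", {"character_in_scene": "quiz_candidate"})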

3. METHODOLOGY AND PREVIOUS WORK

This section introduces the applied methodology, consisting of two main steps, and work done so far.

3.1 Step 1 - Descriptive and Non-Empirical Research

The descriptive and non-empirical research phase consisted of literature review work and scenario construction.

3.1.1 Step 1a - Literature Review


To classify the state-of-the-art work in collaboration in streaming and broadcasting, as this work was initially entitled, it was necessary to construct a new taxonomy because, firstly, most of the existing taxonomies were too specialized (e.g., those covering the geometric structure of video content [2] and those relating to the network and the user interface) and, secondly, a focus on the linkage mechanism was missing completely. 32 approaches in this area were chosen for the state-of-the-art analysis with the aim of building a representative overview of the scientific work done so far. The taxonomy developed consists of six categories, namely the narration space of the content (to analyze whether the content is participatory), the level of linkage between collaboration and medium / content (to analyze if any linkage exists), the scope of collaboration (to analyze the level of collaboration), the type of interaction (to analyze the type of interaction used for the collaboration), the delivery medium and the delivery network (both to distinguish between streaming and broadcasting). After conducting this analysis, it was necessary to focus on the linkage between collaboration and content. For this purpose, the results of the analysis were narrowed down by building five classes from the categories level of linkage, narration space and type of delivery network. The classification indicates whether linkage exists and, in case it does, (i) whether it relates to the content (storyline, topic, genre, etc.) or to the delivery medium (e.g., by changing the medium's state collaboratively from play to pause), (ii) to which type of content it links (linear or participatory) and (iii) whether the linkage is realized by streaming or broadcasting.



As mentioned in the introduction, linkage between content and collaboration has been realized in very few cases. In streaming, collaboration has been realized more frequently, with a focus on controlling the state of the medium collaboratively or enhancing the medium with interactive elements. In the following, this work focuses on the broadcast area, to be more exact on television environments. The decision is justified firstly by the findings mentioned above, and secondly by technological differences which make broadcasting more attractive for collaboration, for example: connectivity (1:n in broadcasting, 1:1 in streaming) and marked standardization in broadcasting (DVB, which uses the MPEG-2 / MPEG-4 media formats in Europe), in contrast to streaming, where a lot of media formats are in use (e.g., Windows Media, QuickTime, MP4, Flash, ...). Thirdly, there is the possibility to tie in with popular participatory TV program formats that are well established in the broadcasting sector and currently use parallel platforms. In summary, the result of this step was an analysis of existing work. The lack of linkage mechanisms and reference architectures in broadcasting, and the prevailing usage of linear content, became the starting point.


3.1.2 Step 1.b - Scenario Building


Based on these results, concrete story scenarios were built and, in a next step, classified and abstracted to deduce functional and non-functional requirements. The focal issues in the classification of these scenarios are, on the one hand, the collaboration between the viewers (consumers' side) and, on the other hand, the linking of television medium / content and collaborative services (producers' or broadcasters' side). Summarizing, the consumer scenarios were classified and abstracted into a-/synchronous influence and non-influence on the medium and content, as shown in Figure 1, and the producer and broadcaster scenarios into enhancement and real-time change of the medium and content. This work focuses on synchronous scenarios.

Figure 1: Schematic representation of collaborative services without influence (left side) and with influence (right side) on medium and content.

Consumer scenario: Non-influencing collaboration does not affect the characteristics of the linked medium (e.g., Figure 1c) or the course and characteristics of the linked content (e.g., Figure 1a and b). An example would be to enable participation by providing a collaborative quiz service in conjunction with a broadcast quiz show. The quiz service is linked, for instance, to the genre, which is a characteristic of the content, and it is linked to the medium as it is, for example, initiated and terminated by metadata or by a certain attribute of the medium.

Consumer scenario: Influencing collaboration affects the characteristics of the linked medium (e.g., Figure 1e) or the course and characteristics of the content (e.g., Figure 1d). One example would be to provide a 60-second chat as a joker in "Who wants to be a millionaire". This service is linked to the content and influences its course, and it is linked to the medium, as it is provided for a specified time interval. Influencing the medium means, for example, adapting the duration of the interval depending on the intensity of the collaborative activity.

Producer, broadcaster scenario: Enhancing content means preparing medium and content before broadcast for (non-)influencing collaboration by enhancing them with selected collaborative services and linkage.

Broadcaster scenario: Changing content means making synchronized changes to the medium and content automatically and in real time during the broadcast.

By abstracting and classifying story scenarios, functional and non-functional requirements were identified. Functional requirements include the enhancement and update of medium and content, provision of private and open groups, session management, notification of the opportunity to participate, and medium and content analysis.

3.2 Step 2 - Engineering Research


In the engineering research phase, the reference architecture (artifact) is modeled, developed and evaluated.

3.2.1 Step 2.a - Model Construction (using UML)


UML use cases are designed for:
1.) The producer's and broadcaster's view:
- Define and provide the collaborative services used.
- Enhance medium and content with the selected collaborative services (by using MPEG-J).
- Provide methods and interfaces to define hooks in (a) the medium and content and (b) the collaboration.
- Link hooks (a) and (b) by using the linkage mechanisms provided.
2.) The consumer's perspective (client):
- Provide a player that receives the incoming media stream and displays the decompressed video content. To enable participation, the player must analyze the characteristics of the received and decompressed medium and its content. Available collaborative services must be indicated to the viewer.
- Connect participants on the one hand, and provide a back channel to the broadcaster (server) on the other hand.
- Measure, analyze and quantify collaborative activity. Send the results to the broadcaster (server).
- Provide support and exception handling.
3.) The broadcaster's perspective (server):
- Receive and analyze incoming quantified data from the consumers.
- Change medium and content with respect to the received data and their analysis.
- Compress and broadcast the modified media stream.
Each step of the model construction includes the determination and analysis of the characteristics and requirements that are necessary for the medium and content, the collaboration and the linkage. This step will lead to answering the research questions Q1 to Q4 in a theoretical manner.


Figure 1: Schematic representation of collaborative services without influence (left side) and influence (right side) on medium and content

Consumer scenario: Non-influencing collaboration does not affect the characteristics of the linked medium (e.g., Figure 1c) or the course and characteristics of the linked content (e.g., Figure 1a and b). An example would be to enable participation by providing a collaborative quiz service in conjunction with a broadcast quiz show. The quiz service is linked, for instance, to the genre, which is a characteristic of the content, and it is linked to the medium as it is, for example, initiated and terminated by metadata or at a certain frame number.


3.2.2 Step 2.b - Construction of an artifact


Based on the previously defined model, a layered reference architecture (Figure 2) will be developed in this phase. The reference architecture processes an MPEG-2 / MPEG-4 video stream delivered via IPTV and enables the implementation of a prototypical collaborative broadcasting scenario, as described in the previous sections and in step 1.b. This step will lead to answering the research questions Q1 to Q4 in a practical manner.

Figure 2: Schematic Representation of the Reference Architecture

3.2.3 Step 2.c - Destruction of an artifact

Destruction of the artifact includes testing and evaluation. Testing of this reference architecture will be conducted by building working prototypes of the defined collaborative broadcast scenarios. Due to a lack of time and money, the prototypes will use existing (and possibly linear) video content and consider the consumer's as well as the producer's and broadcaster's perspectives, as mentioned before. By building prototypes from this reference architecture, its functionality is proven. The prototypes in turn are examined by measuring the previously specified essential and desirable functional and non-functional requirements. This step will lead to testing the research questions Q1 to Q4 and to answering Q5.

4. PROSPECTS AND FUTURE WORK

Having finished the descriptive and non-empirical research and step a) of the engineering research, I have recently started constructing the artifact. During the next months I will develop a reference architecture as illustrated in Figure 2. Before developing the basic packages (Medium / Content, Linkage and Collaboration), an analysis of broadcast content (of medium and content) and of existing collaborative services will be done. The reference architecture will be evaluated by realizing the already defined story scenarios (mentioned in step 1.b) prototypically. The prototypes will in turn be evaluated in terms of the previously elaborated essential and desirable functional and non-functional requirements. Since the design and production of a participatory broadcasting content format is beyond the scope of this work, the prototypes will be tested with existing broadcast video content. However, to address the big picture and to give a complete example scenario, I will create a storyboard of a participatory TV program format. Its purpose is to illustrate the role of participatory content in any activity beyond passive watching of TV.

5. REFERENCES

[1] J. Abreu, P. Almeida, and V. Branco. 2beon: interactive television supporting interpersonal communication. In Proceedings of the 6th Eurographics Workshop on Multimedia 2001, pages 199-208. Springer, New York, 2002.
[2] S. Bachmayer, A. Lugmayr, and G. Kotsis. Convergence of collaborative web approaches and interactive TV program formats. International Journal of Web Information Systems, 6(1):74-94, 2010.
[3] E. Boertjes. ConnecTV: Share the experience. In Proceedings of the 5th International Conference on Interactive TV: A Shared Experience, volume 4471 of Lecture Notes in Computer Science, pages 139-140. Springer, 2007.
[4] U. Borghoff and J. Schlichter. Computer-Supported Cooperative Work - Introduction to Distributed Applications. Springer, Berlin, 2000.
[5] T. Coppens, L. Trappeniers, and M. Godon. AmigoTV: A social TV experience through triple-play convergence. In Proceedings of the 2nd International Conference on Interactive TV, Brighton, UK, 2004.
[6] F. de Oliveira, C. Batista, and G. de Souza Filho. A3TV: anytime, anywhere and by anyone TV. In Proceedings of the 12th International Conference on Entertainment and Media in the Ubiquitous Era, pages 109-113. ACM, 2008.
[7] S. Grünvogel et al. A novel system for interactive live TV. In Proceedings of the 6th International Conference on Entertainment Computing, pages 193-204. Springer, Berlin / Heidelberg, 2007.
[8] ISO / IEC. International Standard 14496: Information technology - Coding of audio-visual objects, Part 5: Reference software, 2001.
[9] M. López-Nores et al. Technologies to support collaborative learning over the multimedia home platform. In W. Liu, Y. Shi, and Q. Li, editors, ICWL, volume 3143 of Lecture Notes in Computer Science, pages 83-90. Springer, 2004.
[10] M. López-Nores et al. Formal specification applied to multiuser distributed services: experiences in collaborative t-learning. Journal of Systems and Software, 79:1141-1155, August 2006.
[11] K. Luyten et al. Telebuddies: social stitching with interactive television. In Proceedings of the 1st International Conference on Human Factors in Computing Systems, pages 1049-1054. ACM, 2006.
[12] M. Nathan et al. CollaboraTV: making television viewing social again. In Proceedings of the 1st International Conference on Designing Interactive User Experiences for TV and Video, pages 85-94. ACM, 2008.
[13] P. Tuomi. SMS-based human-hosted interactive TV in Finland. In Proceedings of the 1st International Conference on Designing Interactive User Experiences for TV and Video, pages 67-70. ACM, 2008.
[14] M. Ursu et al. Interactive TV narratives: Opportunities, progress, and challenges. ACM TOMCCAP, 4(4):1-39, 2008.

All links were checked in April 2011.


Televisual Leisure Experiences of Different Generations of Basque Speakers


Xabier Landabidea Urresti
Institute of Leisure Studies University of Deusto Unibertsitateen etorbidea 24 48007 Bilbao Basque Country, Spain (+34) 944139075 xlandabidea@deusto.es

Iratxe Aristegi Fradua


Faculty of Social and Human Sciences - University of Deusto Unibertsitateen etorbidea 24 48007 Bilbao (+34) 944139075 Basque Country, Spain iariste@deusto.es

Aurora Madariaga Ortuzar


Institute of Leisure Studies University of Deusto Unibertsitateen etorbidea 24 48007 Bilbao (+34) 944139075 Basque Country, Spain aurora.madariaga@deusto.es

ABSTRACT
In this research project I argue that the connection between Leisure Studies and Audience Studies is unavoidable, fundamental and fertile in its possibilities, as media are becoming increasingly significant in defining the leisure of 21st century citizens of affluent societies. Audience Studies have focused on the quantitative analysis of media consumption, while research practices traditionally associated with Leisure Studies, such as time budgets, have tended to measure observable activities while overlooking the meanings given to and taken from those same practices [10]. This project is innovative in stressing the need for a new model of audience analysis that blends the polysemic nature of the terms leisure and television, especially in a transforming media landscape. It is argued that adapting the concept of leisure experience to the field of TV audiences is key for a better understanding of the present and future forms of television. This paper presents the justification and relevance of the issue at hand, the broad theoretical concepts on which it is based, the aim, scope and objectives of the thesis, and a brief note about its methodology and the proposed case study with Basque speakers, still in a preliminary stage of definition, as well as the provisional index of the study.

1. JUSTIFICATION
Media-related leisure has gained such psychological, sociological and economic weight in industrialized societies that it can be argued that entertainment and media have become inseparable concepts for the majority of the citizens of affluent societies [3]. Today it becomes increasingly difficult to portray the leisure of a 21st century citizen without considering the media texts (written, audiovisual, digital, analog...) that he/she consumes and participates in. Mass media have become meaningful elements in the everyday life of individuals and communities alike, in postindustrial and developing countries. Citizens' everyday spaces (sitting and sleeping rooms, classrooms, cultural centers, pubs, vehicles) and daily times (routines, habits, frequencies, periods) are saturated with media texts. Individuals in developed countries spend most of their free time watching television, listening to the radio, surfing the internet or playing on the console. Media have become omnipresent, and they have done so especially through leisure. Today, it is unavoidable to study leisure when considering media, as well as to study media when considering leisure [10]. Television provides a fertile empirical and conceptual starting point for a joint exploration of the leisure experiences of media audiences. Studying TV and its audiences from the perspective of Leisure Studies is justified both because of the massive aggregate time that is devoted to it and because of its contested but still not overthrown centrality in a rapidly changing digital media landscape. Both present and future TV are critical for Leisure Studies. Traditional audience research methods have been mostly interested in counting and weighing the time we spend in front of the screen and in trying to measure the effects and range of media on their audiences, but so far they haven't provided us with answers to questions such as how and why we insert television into our dailyness, what meanings TV takes into people's everyday life, or which pleasures we find in our relations with it.

Categories and Subject Descriptors


H.5.2. [User Interfaces] - Evaluation/methodology.

General Terms
Human Factors, Measurement, Performance.

Keywords
Audience Studies, Leisure Studies, Television, Experience, Media Ecosystem, Basque speakers.


Leisure and Audience Studies can no longer avoid these challenges, not only because media have become, along with tourism, major social manifestations of leisure in contemporary societies, but also because they are objects of human choice and specific articulations of human freedom. My PhD thesis aims to contribute to the understanding of the choices behind audience figures by analyzing the meanings that people build in relation to television and the roles that TV plays in their everyday life. A model for the analysis of audiences' televisual leisure experiences will be presented for that purpose.

2. RESEARCH FRAMEWORK

The term television, far from having the unique meaning traditionally associated with it, refers to multiple realities: a household electronic device, a social institution, a content production and distribution system, a leisure resource, a leisure practice, an industry, a market and so on. Alain Le Diberder and Nathalie Coste-Cerdan [2] refer to it as an unknown social object, "our society's immense and central object, which, unable to avoid, we stop perceiving, like the totem, expressing and concentrating all the hopes and fears of the modern tribe" (1990: 12). TV, in its more traditional as well as its most interactive and crossmedia forms, is, in Javier Callejo's words [1], a medium of multiple identities which accumulates socially represented attributes, such as that of the audience, which represents itself in front of the TV set. Beyond the specific devices and their technical specifications, my thesis is concerned with the concept of television that people construct in their everyday relations with it: how they watch, read, understand, love, hate, take into consideration and reject TV in their everyday lives. The object of study of this thesis is the television that is able to enable or make impossible, facilitate or prevent, limit or condition leisure experiences in relation to it.

Interestingly, the concept of leisure shares its polysemic nature with that of television, to the point that it is difficult to determine what is and what is not leisure. Despite the rich and colorful development of interdisciplinary Leisure Studies during the 20th century, especially in the 1990s, a conclusive and universal definition of leisure remains elusive. Leisure theorists venture different definitions, like the one proposed by Robert Stebbins, "Leisure may be defined as: uncoerced activity engaged in during free time, which people want to do and, in either a satisfying or a fulfilling way (or both), use their abilities and resources to succeed at this" [15], or the one by the Institute of Leisure Studies, "Broadly, Leisure comprises freely chosen experiences and actions, carried out in areas of freedom, without primarily utilitarian objectives, and which report satisfaction to the individual" [11]; yet John Neulinger's proposition is also generally accepted: "Perhaps it is best to realize that there is no answer to this question, or better, that there is no correct answer" [12]. Furthermore, as leisure can be studied from different paradigms, each perspective tends to highlight certain characteristics while overshadowing others. Indeed, a too simplistic definition becomes problematic when confronted with the nuances of everyday human experience. Traditionally understood in terms of opposition with work (leisure as rest and recuperation from work, or as an antidote to the stresses and strains of modern life [7]), the concept of leisure has undergone continuous transformations throughout history and has reached a relative conceptual and theoretical emancipation, possibly induced by its explosively growing social relevance. Considered until very recently to be a danger or a secondary matter, today it is understood as a field of development, identification and a right [4]. On the one hand, from an objective point of view, leisure has to do with the available free time, with the time period spent on doing something, with the resources used and the related actions [4]. It refers to the materials employed, the spaces occupied, the repeating habits and the practices that are carried out. On the other hand, a subjective standpoint gives more relevance to the satisfaction, pleasures and meanings that can arise from the experience. Leisure is an area of human experience which is searched for and composed of freely chosen pleasant activities, but its outcome will never be entirely dependent on the action itself, nor on the subject's free time, economic or education level by themselves. As long as leisure is a personal experience, at the same time individual and social, it cannot be understood as a completely subjective phenomenon, because a person's life will always be situated in a specific social and material context. For the Humanist Leisure Perspective of the Leisure Studies Institute of the University of Deusto, leisure is, at the same time, a social phenomenon, an integral personal experience, and a basic human right. This threefold meaning has been explored by the author in relation to television in his dissertation [9] and represents the theoretical starting point of the current PhD thesis.

Marie Gillespie states that "The media are cultural institutions that trade in symbols, stories and meanings. As such they shape the forms of knowledge and ignorance, values and beliefs that circulate in society" [6]. It is this trade in meanings, stories and symbols that constitutes the core of their social relevance as manifestations and enablers of leisure, and it is this that the thesis aims at exploring. The meanings of television and leisure are not fixed, but change along history and across human groups and individuals. Exploring the everyday connections that different generations of Basque speakers make between TV and leisure will be one of the key elements for understanding their conceptions of the place television has in their lives and for understanding the evolution of these two terms, charged with multiple meanings. Comparing the discourse of different age-groups showing different levels of media literacy and expertise with information and communication technologies (ICTs) will help us to clarify the possibilities and manifestations of social and individual interaction with television.
3. SCOPE AND OBJECTIVES


This research project aims at contributing to Audience Studies with an analysis of the leisure experiences of different generations of Basque speakers that goes beyond the counting of time and the analysis of the textual content of the medium. I attempt to explore television consumption not as a leisure practice, but as a complex, multi-dimensional leisure experience. The preliminary objectives and working hypotheses are introduced below:


Main objective: To explore the leisure experiences of different generations of Basque speakers in their relationships with a television in transition.

Secondary objectives:
O1: To explore the contributions of the Humanist Leisure perspective to Audience Studies and vice versa.
O2: To analyze the discourse of the participating subjects in relation to their attitudes, emotions and feelings in the use of television in their everyday lives.
O3: To identify and compare the key structuring elements of various age-groups' relation with television and their conception of it.
O4: To introduce a model for audience analysis that takes television as a framework for leisure experiences, beyond the perspective of leisure practice.

This project is based on the following working hypotheses:
i. Different people establish and develop different relationships with TV. Television viewers show different skills, goals, strategies and usages of TV, and these result in different experiences, including those of leisure.
ii. Both media and the way audiences engage with media are changing profoundly. Different generations have had different contacts with media that result in distinct expertise developments, which lead to different ways of engaging with media.
iii. The skills, goals, strategies and usages employed in TV consumption differ between age-groups, due to, among other factors, differences in expertise with information and communication technologies, media literacy levels and personal and social agendas.
iv. These relationships are complex and varied and are not exhausted by quantitative audience measurements.
v. TV-related leisure experiences do not necessarily start when the TV set is switched on and do not end when it is switched off either.
vi. The experience of the subject of media leisure can be known through the analysis of her/his discourse.

Given that "Television does not mean what it once did" [14], it follows that neither does the study of its audiences. While time and space have been the main parameters in the past century, 21st century Audience Studies must deepen into the leisure experience of individuals and communities. This is the concern with which the present project has been initiated, and the main contribution that it aims to make: to understand and to provide a model to approach television as an enabling, limiting, conditioning and changing reality of leisure experiences.

4. METHODOLOGY

The aim of this thesis is not to study the times and spaces in which Basque speakers watch TV (when, where, how much, how many times) or the contents they consume (what, how, through which channels), nor to generalize trends in Basque audiences, but to collect the audiences' discourses in order to compare and understand their leisure experiences.

The methodological ambition of the thesis is one of understanding, not of totality [16]. The in-depth case study of different generations of Basque speakers will provide a "detailed examination of a single example" that "can be used in the preliminary stages of an investigation to generate hypotheses", but it is not limited to that [5]. This approach will enable a necessarily incomplete but intensive exploration of the meanings and pleasures found and built around television, following the ethnographic statement that experience shows that intensive study provides understanding, while extensive study does not [8]. The nature and complexity of the phenomenon of human experience requires a qualitative approach for its understanding. In the terms used by Chris Rojek [13], what is needed here is more an idiographic than a nomothetic approach: a non-generalizing methodology rather than a generalizing one. My interest lies in the construction of meaning and in the living of meaningful recreation, entertainment and leisure experiences. Therefore the object of study can only be approached through the narration of these experiences, as it only occurs within the person, never outside it. The importance, meaning and significance that audiences grant to television can only be known through their own expression. The working language of this PhD thesis is Basque. The text itself will be written in Basque and in English, and the fieldwork (interviews and focus groups) will also be conducted in Basque, although English, Spanish and French will be employed when indispensable in order to clarify important aspects of the case study to the participants (especially in the case of migrant Basque speakers abroad). The techniques chosen for the collection of data through the case study will be the in-depth interview and the focus group (both homogeneous and heterogeneous in their composition regarding age-groups). Two pilot focus groups have been completed at this stage, one in Spanish and another one in Basque, which have helped to pretest the interview and group-discussion scripts and to determine their limits and reach regarding the proposed objectives.

5. PROVISIONAL INDEX OF THE THESIS

This PhD thesis will have three distinct parts:
Section 1: Theoretical framework
Chapter 1: Leisure and leisure experience
Chapter 2: Television and audience studies
Chapter 3: The media ecosystem
Section 2: Analysis of the experiences of the audience
Chapter 4: Methodology of fieldwork
Chapter 5: Analysis and results of the Case Study
Chapter 6: Comparison and typology of leisure experiences
Section 3: The audience analysis model
Chapter 7: The audience analysis model
Chapter 8: Conclusions and recommendations

At this point in time the index of the thesis is in a provisional state, as the author is in the process of drafting the theoretical section. The methodology of the fieldwork and the definition of the Case Study will be nurtured with the contributions of the present conference and redefined during a predoctoral stay in 2011.

6. ACKNOWLEDGMENTS

I would like to thank my two supervisors, Dr. Iratxe Aristegi Fradua and Dr. Aurora Madariaga Ortuzar; without their help this research project would not have been the same. I am also grateful to Jone Goirigolzarri Garaizar for her invaluable help and constant support. I also want to thank all sixteen participants in the two focus groups that constituted the first methodological pretest.

7. REFERENCES

[1] Callejo, J. La audiencia activa. El consumo televisivo: discursos y estrategias. Centro de Investigaciones Sociológicas (CIS), Madrid, 1995.
[2] Coste Cerdan, N. and Le Diberder, A. Romper las cadenas. Una introducción a la post-televisión. Gustavo Gili, Barcelona, 1990.
[3] Cuenca Amigo, J. and Landabidea Urresti, X. El ocio mediático y la transformación de la experiencia en Walter Benjamin: hacia una comprensión activa del sujeto receptor. In VIII Congreso Vasco de Sociología y Ciencia Política: Sociedad e Innovación en el Siglo XXI (Bilbao), 2010, 33.
[4] Cuenca, M. Las artes escénicas como experiencia de ocio creativo. In Cuenca, M., Lazcano, I. and Landabidea Urresti, X. eds. Sobre ocio creativo: situación actual de las Ferias de Artes Escénicas. Universidad de Deusto, Bilbao, 2010, 13.
[5] Flyvberg, B. Five misunderstandings about case-study research. In Seale, C., Gobo, G., Gubrium, J. and Silverman, D. eds. Qualitative Research Practice. Sage, London and Thousand Oaks, CA, 2004, 420-434.
[6] Gillespie, M. Media audiences. Open University Press, Maidenhead, 2005.
[7] Haywood, L., Kew, F., Bramham, P., Spink, J., Capenerhurst, J. and Henry, I. Understanding leisure. Stanley Thornes, Cheltenham, 1995.
[8] Herskovits, M. J. Some Problems in Ethnography. In Spencer, E. F. ed. Method and Perspective in Anthropology. University of Minnesota Press, Minneapolis, 1954.
[9] Landabidea Urresti, X. Hacia una aproximación cualitativa a las experiencias televisivas de distintas generaciones. 2009.
[10] Landabidea Urresti, X., Aristegi Fradua, I. and Madariaga Ortuzar, A. Aisiazko praktikatik aisiazko esperientziara: ezinbesteko berrikuntzak telebista-audientzien ikerketan. 2011.
[11] Maiztegui, C., Martinez, S. and Monteagudo, M. J. Thesaurus de Ocio. Universidad de Deusto, Bilbao, 1996.
[12] Neulinger, J. The psychology of leisure. C.C. Thomas, Michigan, 1981.
[13] Rojek, C. The labour of leisure: the culture of free time. SAGE Publications Ltd, 2009.
[14] Shimpach, S. Television in Transition. Wiley-Blackwell, Chichester, UK, 2010.
[15] Stebbins, R. A. Choice and experiential definitions of leisure. Leisure Sciences, 27 (2005), 349-352.
[16] Velasco, H. and Díaz de Rada, Á. La lógica de la investigación etnográfica. Trotta, Madrid, 2006.


Mobile TV: Towards a Theory for Mobile Television


Luis Miguel Pato
Dept. of Communication and Art University of Beira Interior Covilhã, Rua Marquês d'Ávila e Bolama (+351) 275 319 700

luis13pato@gmail.com

ABSTRACT
With people's successful adoption of mobile devices and the imminent change to Digital Terrestrial Television, the transition of TV to the emergent mobile scenarios is a foreseeable future. In this upcoming scenario, viewing patterns and behaviors are defined by the time dimension, place and social context. It is within these specifications that the transmission of personalized television through mobile phones is believed to have a tremendous end-user impact. In this doctoral investigation our aim will be to measure this aspect in a country where this emergent medium is verifiable (Portugal). Our methodology proposes the apprehension of mobile television's (mTV) reality through fundamental social and theoretical assumptions, interviews with the basic players of Portugal's mTV market, and a statistical evaluation of Portuguese mTV viewing motivations and the consequent satisfaction levels. We intend to base our theoretical framework on the media gratifications theoretical perspective and on laboratory sessions with three samples of mTV users/adopters in Portugal. Through this approach we believe it is possible to apprehend mTV's usage and current reality in Portugal and possibly its foreseeable future.

1. INTRODUCTION
Today, wireless communication networks are believed to be one of the fastest evolving issues in contemporary societies. From a media-theoretical perspective, McLuhan's historic globalization desire is defined through today's general access to media content and to the technological devices needed to consume it without geographical boundaries. Therefore, it is understandable that the mobile phone is generally considered the trademark of contemporary society. In Innis's terms we could say that it (this device and its impact) defines the society that we live in [1]. However, when we regard the possibility of consuming television through a mobile phone, it is a whole different story. First, it is important to understand that we are talking about the convergence of two successful case studies: television and this medium's consumption through an enhanced end-user mobile media experience on a cell phone. In addition, with the deadline for the implementation of DTT (Digital Terrestrial Television) set for 1 January 2012 and the progressive adoption of DVB-H, mobile television (mTV) is defined as a fundamental killer application of TV's near future. We also have the ever more present use of YouTube to watch mTV. In this study we consider both of these realities as mTV. So, we could say that there are some issues to solve regarding the sources and forms of diffusing this type of television. Nevertheless, besides these issues, we observed that most of the current trials and academic research are based on mere technical analyses. And, to our understanding, this aspect represents a serious problem because, in a world where wireless telecommunications emerge quickly, overlooking data provided by early adopters may be regarded as a problematic issue. Why? Well, through media studies' long history of researching and evaluating new mass media, one basic idea has always emerged: early end-user acceptance is precious. Therefore, our perspective is of a more social order, because we believe "it's about the people (...), not just the technology" [2].

Categories and Subject Descriptors


H.4.3 [Communications Applications], H.5.1 [Multimedia Information Systems]

General Terms
Measurement, Experimentation, Human Factors, Theory.

Keywords
Mobile Television, Expectancies, Satisfaction Levels, Consumers, Portugal, Media Theory.

2. BACKGROUND
Today, several realities are associated with the term Mobile TV. Nevertheless, they all mean the same: television diffused through mobile platforms. When we look at the basics, these television contents are based on live broadcast emissions (pull) and on push-and-store for a quasi-PVR (Personal Video Recorder) television consumption experience. When it comes to methods of delivery, we have satellite (DMB, GPRS), cellular operators (UMTS, CDMA) and terrestrial (DVB-H and WiFi) [3], [8].


When it comes to the delivered television content, through the revision of the literature we observed that the grids are basically filled with the re-emission of recycled broadcast television programs. The same thing happens in Portugal, the country where our doctoral investigation shall be realized. We believe that consumers may not be satisfied with this issue. However, this is nothing new. Historically, we can observe that, for example, "the railway did not introduce movement, or the wheel or road (...), but it enlarged the scale of previous human functions, creating new kinds of work and leisure" [4]. Therefore, we believe it is safe to assume that an experimental phase is needed whenever any new medium is developed. We also believe that mTV is in this phase [5]. It still has unsolved identity issues [6], [7]. Currently, the mTV market is defined through the following realities of re-used broadcast television [5]:

TV in your Pocket: it purely and simply rebroadcasts the television programs that are emitted first on conventional television programming grids [5]. If we look at contemporary mTV corporate options, we may observe that this concept can be characterized through the inherent idea of a promise of an individualized, personal television experience, but without specific mTV content [5]. The user may personalize his television experience, but the basis will always be the conventional reality. This kind of mTV rebroadcasting of regular linear television content is defined as "Simulcasting Linear TV" [7]. Still on this topic, we can also retrieve the concept of "Repurposed TV", where existing content is recycled for the mobile medium with minimal adaptation: basically it is the same content as what is aired on the regular TV grids; however, its counterparts are split up into smaller segments or are cropped to better suit the smaller screens of mobile devices [9].

TV anytime, anywhere: the basis of this concept can be observed in the release of television viewers from the constraint of consuming television in a specific place. This theoretical approach intends to highlight the consumers' ability to control their medium to an extent in which they may choose how, where, when, and what kind of TV content they consume [10].

TV on the Go: it promotes a fast-food idea of television. Therefore, there is the intention of emphasizing the differences between mobile and traditional TV viewing [5]. We agree with this perspective. Why? Because when we think about mTV we can observe that mobile devices are operated at arm's length and continued viewing can cause eye discomfort and eyestrain [11]. Therefore, we also consider that television content for this type of television must have a short duration. This kind of content is mobile-specific, "a necessary final step in the evolution of mobile television" [9]. Shani Orgad considers that the mobile phone's small screen, shorter usage duration and noisier usage environment should lead to a new visual grammar that will eventually be expressed through mobile-specific content [5].

Enhanced TV: in a similar mode to what happens with interactive television, this perspective is what some authors define as an "out of the box" issue [12].

In simple terms, this proposal is based on the interactive possibilities that characterize television: it regards the potential creation of innovative manners of including users and of tailoring media content to satisfy individual needs [5], [12]. In what regards specifically mTV, there has been a large discussion on the potential that this reality has in providing a platform for user-generated content [5]. Through these various stages, we feel obliged to ask: does mTV enable a new television experience? Or is it possible to conclude that conventional television interaction and consumption habits are not enabling a new television experience? As an end-note, and in an attempt to answer these questions, we can say that the specificity of mTV content is currently the main discussion. However, linear content is currently an essential reality in the present and future of mTV content, because this kind of television is regarded, in a very McLuhanian manner, as an extension of the classical media, or as a parallel media reality that exists side by side with classical media [4]. However, one thing is sure: plenty of issues are still unsolved and therefore this reality is still unclear. We believe that television diffused through a mobile phone should apprehend the unique benefits of this gadget [13]. Therefore, we believe that, with the current application of this kind of TV, consumers might not be satisfied with mTV's reality. Why? When we consider the specifications of mobile phone use, we can see that they are fragmented and, therefore, users' media consumption desires are individualized and divided across various fixed and mobile media platforms [14], [15], [18]. When it comes to its applications and content, we believe that mobile phone users demand interactive, flexible, enhanced, personal, and context-aware media realities [7]. And the same thing occurs with mTV [9]. So, we believe that we can define these aspects as possible mTV expectations. And can redistributed mTV content satisfy these needs?

3. THEORETICAL PERSPECTIVE

When we think about the motivations on which we can ground the potential desire for mobile TV (mTV), we believe it is important to look at this issue from a theoretical point of view. Therefore, we will observe this topic through the Uses and Gratifications Theory (UGT). The justification for the use of this theory is the simple fact that we suggest a theoretical approach based on investigating an active audience with their media, and the UGT perspective considers this reality [11]. Its theoretical approach intends to understand consumers' motivations and concerns in the context of media use. We approached this theory through previous investigations that we believe can define the mobile phone's current converged reality: Television [11], [19], [27], [26], [28], Internet [20], Digital Television [23], Internet and Computer Mediated Technology [22], [24] and the Cellular Phone [29], [21]. However, we observed that there does not exist any UGT study that investigates the mobile television (mTV) reality. Thus, this aspect gives this investigation the desired originality and scientific contribution, because we believe that we can expand this theory's approach by including the mTV reality. As an end-note, we believe it is important to state that the selected UGT-based studies all indicate that non-tangible issues (of an emotional nature) are, in fact, the most important elements regarding the expectations that consumers have of their media.


This aspect is fundamental for developing this doctoral study's inquiries for the selected samples. Through the choice of the UGT theoretical concept, we observed that its application is going to help this investigation understand how and why consumers use their cellular phones to watch television. This aspect leads us to our investigation methodology, which shall be presented in the following part of this paper.

4. METHODOLOGY

This investigation will have a fourfold perspective, divided into the following phases:

4.1. The revision of the literature

In this part our main intention is to identify the problem (mobile television end-user expectancy and acceptance) and to explore it through various theoretical perspectives. Since the mobile television (mTV) literature is scarce, we will retrieve all the important media and social academic investigations that are believed to be fundamental. We shall start with McLuhan's perspective. We chose this approach because he was the first theoretician to identify the concept of the contemporary active audience; he implied that it desired constant connectivity and communication without any concern for geographical boundaries [4]. Since we are focusing on the social apprehension of a wireless technology and its progressive social changes, we shall also retrieve other social academic analyses. Given that we are evaluating the consumers' interaction with mTV, we selected the Media Effects Theory, because we will study mTV's effects on its consumers. However, as expressed earlier, we also intend to gather data regarding end-user expectancies and satisfaction levels towards mTV consumption, and the Uses and Gratifications Theory (UGT) suits this objective perfectly. This sub-tradition of the media effects theory focuses on investigating the motives behind the selection of a certain mass medium by consumers [11]. This theory also allows an experimental or quasi-experimental approach in which the manipulation of the evaluated data is possible, in order to discover motives and media selection patterns [11]. And since this investigation includes a laboratory phase, this theory's perspective is necessary for the apprehension of the consumers' expectancies and satisfaction levels towards the use of mTV. Besides this academic perspective, we intend to consult current academic journals, market reports and white papers. This way, a multidisciplinary theoretical perspective is always guaranteed.

4.2. Data analyses of previous mTV user studies

Since we are approaching an emergent technology in the media corporate scenario, we believe that it is important to analyze end-user studies conducted through mobile television trials. These investigations might help us apprehend the contemporary mobile television (mTV) reality from the market's perspective. However, in these scientific productions all of the reported trials are industry-driven events and thus their results must be analyzed with caution. Nonetheless, since we do not believe that this data can complete an mTV market perspective, we also intend to conduct semi-structured interviews with Mobile TV channel/project and mTV distribution directors / project managers. This leads us to the following investigation moment.

4.3. mTV market expert session of interviews

The potential lack of professionally based data regarding mTV will be complemented with other sources of information, for example experts in the field of mTV. We believe that this approach is important to apprehend the mTV panorama. Therefore, we selected a panel of mTV experts that will be composed of two types of experts: Portuguese television and TV production companies (RTP, SIC, TVI and Produções Fictícias) as well as professionals responsible for wireless network companies (TMN, Vodafone, Sapo / Sapo Mobile and Optimus). The reason for the selection of these experts lies in the fact that, when it comes to mTV content production and progressive emissions, only the referred national televisions have ventured into this emergent market. Evidently, the wireless companies are responsible for the support of these mobile emissions. Besides this corporate reality, we might also include interviews with experts on mTV who have demonstrated their expertise through academic and scientific publications in the fields that we are studying.

4.4. Laboratory evaluation of mTV interaction

We believe that the previous research moments shall provide us with the necessary elements to develop two laboratory sessions with end-user samples. Thus, for these phases we shall select three samples of mTV users composed of teenagers, young adults and middle-aged people. We estimate that we may need at least 200 individuals. Through these heterogeneous samples we believe that a general apprehension of Portugal's mTV reality is possible. In the first part of the laboratory sessions, we shall consider which variables (expectancies) may be defined as unique expectancies for mTV consumption. So, before the first mTV interaction moment, we will apply a questionnaire with closed questions aimed at estimating expectancies. These evaluation elements will result from previous UGT investigations regarding mobile phones, television and the Internet, the technical and media elements that are converged by current mobile phones and, progressively, by mTV. After this first evaluation moment, each user shall watch a previously prepared session of current national mTV applications, with programs that reflect the current market offer in Portugal. The TV genres that will be evaluated are news and entertainment programs that are broadcast through mTV emissions, and other contents that have been downloaded previously to the mobile phone that we will use in these sessions. After this interaction moment, we shall apply to each user a second questionnaire to apprehend data regarding the satisfaction levels associated with the previous expectancies. Thus, through this approach, we believe we will outline mTV's unique expectancy variables. In both of these evaluation moments we will apply a nine-point Likert questionnaire ranging from Strongly Disagree to Strongly Agree. Since we are dealing with emotional aspects, we will also apply an Osgood semantic differential scale in both of these evaluation moments, to observe whether a shift in the end-users' opinion occurs. All data collected in these sessions shall be statistically evaluated with SPSS statistical software. Since we are still in an initial phase, we may also include a similar mTV end-user evaluation moment in an exterior environment, thus excluding any eventual bias in the proposed investigation.
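To illustrate how the two evaluation moments could be compared, the sketch below (in Python, with invented item names and ratings; the actual analysis will be carried out in SPSS, and the real expectancy variables will only be known after the pretests) computes a simple per-item gap between the mean expectancy score and the mean satisfaction score on the nine-point Likert scale:

```python
from statistics import mean

# Hypothetical 9-point Likert ratings (1 = Strongly Disagree, 9 = Strongly Agree),
# collected before (expectancy) and after (satisfaction) the mTV viewing session.
expectancy = {
    "personalised content":   [7, 8, 6, 9, 7],
    "short programme length": [8, 9, 8, 7, 9],
}
satisfaction = {
    "personalised content":   [5, 6, 4, 6, 5],
    "short programme length": [7, 8, 8, 6, 8],
}

for item in expectancy:
    gap = mean(satisfaction[item]) - mean(expectancy[item])
    print(f"{item}: expectancy {mean(expectancy[item]):.1f}, "
          f"satisfaction {mean(satisfaction[item]):.1f}, gap {gap:+.1f}")
```

In such a comparison, a negative gap would indicate that the current mTV offer falls short of the expectations measured before the interaction moment.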

5. CONCLUSION

As an end-note, we could say that mTV carries new, engaging and customized promises of a television experience.


In our investigation we will attempt to explain how this reality will occur in Portugal and to evaluate mTV's usage and current reality from the end-users' perspective. Through this approach we consider that we will outline an identity of the Portuguese national mTV market from the end-users' expectancy perspective and see whether the current offer is satisfactory. Thus, we can propose some changes to this reality. Another foreseeable conclusion resides in the fact that mTV will be considered a parallel market with regard to national broadcast TV. Besides these points, we also consider that an understanding of whether mTV enhances the cell phone's specifications or the medium's characteristics will be accomplished. Through the selected end-user-based approach we believe we will apprehend the unique dimensions of the consumption of this type of TV in Portugal, thus expanding the development of UGT. However, since this paper represents a doctoral investigation that is now at its beginning, these aspects have yet to be proved through the scientific approach that we intend to develop in this study.

6. REFERENCES

[1] Innis, H. 2008. The Bias of Communication. University of Toronto Press, Toronto.
[2] Norman, D. 2002. The Design of Everyday Things. Basic Books, New York.
[3] Knoche, H., McCarthy, J., & Sasse, M. 2005. Can small be beautiful?: assessing image resolution requirements for mobile TV. In Proceedings of the 13th Annual ACM International Conference on Multimedia (New York, 2005). Multimedia '05. ACM Press, New York, NY, 1-10. DOI=10.1145/1101149.1101331.
[4] McLuhan, M. 1995. Understanding Media: The Extensions of Man. Cambridge Press, Cambridge, MA, USA.
[5] Orgad, S. 2006. This box was made for walking. London School of Economics. http://europe.nokia.com/NOKIA_COM_1/Press/Press_Events/mobile_tv_report,_november_10,_2006/Mobil_TV_Report.pdf.
[6] Carlsson, C., & Walden, P. 2007. Mobile TV - to live or die by content. In System Sciences, 2007. HICSS 2007. 40th Annual Hawaii International Conference. IEEExplore, Waikoloa, HI. DOI=10.1109/HICSS.2007.382.
[7] O'Hara, K., Mitchell, A., & Vorbau, A. 2007. Consuming video on mobile devices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York. DOI=10.1145/1240624.1240754.
[8] Kumar, A. 2007. Mobile TV: DVB-H, DMB, 3G Systems and Rich Media Applications. Focal Press, Oxford.
[9] Marcus, A. and Cereijo Roibás, A. 2010. Mobile TV: Customizing Content and Experience, Vol. 1. Springer, London, United Kingdom.
[10] Södergård, C. 2003. Mobile television - technology and user experiences. Report on the Mobile-TV project. VTT, Finland.
[11] Katz, E., Blumler, J., & Gurevitch, M. 1973. Uses and gratifications research. SAGE Pub., Beverly Hills, California.
[12] Gawlinski, M. 2003. Interactive television production. Focal Press, London.
[13] Ahonen, T. T. 2008. Mobile as the 7th of the Mass Media. Future Text Ltd, London.
[14] Ling, R. 2004. The mobile connection: The cell phone's impact on society. Morgan Kaufmann Pub., Oslo.
[15] Ling, R. 2008. New Tech, New Ties: How Mobile Communication is Reshaping Social Cohesion. MIT Press, Cambridge, Massachusetts.
[16] Pavlik, J. V. 2008. Media in the Digital Age. Columbia University Press, New York.
[17] Personaz, J. J. I. 2006. Métodos Cuantitativos de Investigación en Comunicación. Editorial Bosch S.A., Barcelona.
[18] Levinson, P. 2004. Cellphone: The story of the world's most mobile medium and how it has transformed everything! Palgrave, New York.
[19] Conway, J., & Rubin, A. 1991. Psychological predictors of television viewing motivation. Communication Research, 18(4), (Jul. 2010), 443.
[20] Ko, H. 2002. A Structural Equation Model of the Uses and Gratifications Theory: Ritualized and Instrumental Internet Usage. Retrieved from https://listserv.cmich.edu/cgibin/wa.exe?A2=ind0209&L=aejmc&T=0&O=D&P=22182.
[21] Leung, L., & Wei, R. 2000. More than just talk on the move: Uses and gratifications of the cellular phone. Journalism and Mass Communication Quarterly, 77(2), 308-320.
[22] Lin, C. 2001. Audience attributes, media supplementation, and likely online service adoption. Mass Communication and Society, 4(1), 19-38. Retrieved from http://pdfserve.informaworld.com/225045_778384746_785315275.pdf.
[23] Livaditi, J., Vassilopoulou, K., Lougos, C., & Chorianopoulos, K. 2003. Needs and gratifications for interactive TV: implications for designers. In System Sciences, 2003. HICSS 2003. 36th Annual Hawaii International Conference. IEEExplore, Waikoloa, HI. DOI=10.1109/HICSS.2003.1174237.
[24] Papacharissi, Z., & Rubin, A. 2000. Predictors of Internet use. Journal of Broadcasting & Electronic Media, 44(2), 175-196. Retrieved from http://pdfserve.informaworld.com/918041_778384746_783685029.pdf.
[25] Papacharissi, Z., & Zaks, A. 2006. Is broadband the future? An analysis of broadband technology potential and diffusion. Telecommunications Policy, 30(1), 64-75.
[26] Rubin, A. 1983. Television uses and gratifications: The interactions of viewing patterns and motivations. Journal of Broadcasting & Electronic Media, 27(1), (Jun. 2008), 37-51.
[27] Rubin, A., & Perse, E. 1987. Audience activity and television news gratifications. Communication Research, 14(1), 58.
[28] Rubin, A., & Rubin, R. 1982. Older Persons' TV Viewing Patterns and Motivations. Communication Research, 9(2), (Mar. 2010), 287.
[29] Chorianopoulos, K. 2008. Personalized and mobile digital TV applications. Multimedia Tools and Applications, 36(1), (Jun. 2008), 1-10.


Enhancing and Evaluating the User Experience of Interactive TV Systems and their Interaction Techniques
Michael M. Pirker
ICS-IRIT 118, Route de Narbonne 31062 Toulouse, France 0033 (0) 561 55 77 07

Michael.Pirker@irit.fr

ABSTRACT
This paper describes the focus of my PhD thesis on how to enhance and evaluate the User Experience (UX) of interaction technologies that are applied in interactive Television (iTV) systems. Interaction technologies for iTV systems differ from standard work on desktop interactions; my thesis will thus describe the following aspects: (a) the usage context (how iTV usage, e.g. in the living room, differs from other usage situations), (b) the set of currently available methods for evaluating UX and (c) how to enhance the UX of interaction technologies for iTV systems. Given that UX evaluation methods, and especially methods that support UX-oriented development, are rare, the following research objectives were defined: to understand (1) how users' UX concepts are related to interaction technologies that are used for iTV systems and how an interaction technology contributes to the overall UX when interacting with an iTV system; (2) how usability and user experience are related in that specific domain (e.g. does the enhanced UX of a gesture-based interaction really contribute to a positive UX in the long term, or is usability the key factor for long-term use); (3) how to inform the design and development process to improve the UX of the interaction technique and the system (before a product is available); and finally (4) how the consumption of iTV content on a variety of devices (cross-device usage) will change the overall UX. The main contribution of this PhD thesis lies in the developed evaluation methods, which should allow researchers to better understand and evaluate the UX of iTV services and their respective interaction technologies in the future.

1. INTRODUCTION
The living room, sometimes called the campfire of the new age, is still one of the most central and important areas in the home. It is a place where people can relax, but also gather together and enjoy leisure activities including entertainment and games. Interactive systems used in the living room receive only limited attention from the HCI research community: while there is a lot of work on how to improve future generations of games, including UX measurement 0, as well as work on social media, personalization, recommendation, and communities, less work is dedicated to understanding the user experience of entertainment applications including interactive TV, especially when it comes to the evaluation of interaction techniques for the living room. Interactive systems that are mainly used in the living room are currently subject to a dramatic change: the way TV and other media are consumed is changing due to new forms of interactive TV services, including IPTV and a new generation of TVs. The user does not only have the possibility to watch a certain number of TV channels; new TVs and set-top boxes enable the user to, e.g., access the Internet on the TV, rent Video on Demand (VOD) movies, play games, access weather and traffic information, watch video clips, communicate with others and use apps. Concerning the interaction techniques, controlling the TV and its services is also changing: TV and entertainment services found in IPTV offers today are no longer controlled only with a standard remote control, but also simply with the mobile phone, with game-oriented input devices allowing motion control (e.g. Nintendo Wii, Free Box 6.0), as well as through gesture recognition (Microsoft X-Box 360 Kinect). When measuring the UX of these new forms of interaction techniques the following problems occur: it is unclear to what extent the user experience of an interaction technique in the living room can be investigated in the same way as in other domains (e.g. for a mobile phone). The same holds true for the comparison with user experience evaluation of games: are the same factors important for entertainment activities in the living room? Games are different from standard interactive TV applications, as they are not task-oriented and typically focus primarily on the fun aspect, which is not the case for, e.g., a VOD service. But is the user experience in terms of interaction with an interactive TV really comparable to a game? Is it comparable to a mobile phone interaction? Or will we simply fail to understand the user experience in the living room when applying UX evaluation methods from other areas? We thus see the need to develop specialized methods that are appropriate for the evaluation of interaction techniques for iTV in the living room context. These methods can subsequently help to improve the UX of interacting with iTV already in the design process and in early development stages.

Categories and Subject Descriptors


H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

General Terms
Measurement, Human Factors

Keywords
User Experience Evaluation, iTV, interaction techniques


2. STATE OF THE ART


Given our research goals and objectives, our research is focusing on UX and its evaluation, which will be briefly discussed in this section. A lot of effort has recently been put in by researchers and practitioners alike to find a clearer definition of UX and of its evaluation methods 0, but nevertheless the HCI community still has no unified definition of UX. An ISO standard defining UX exists (ISO 9241-210), but it leaves a lot of room for interpretation: "A person's perceptions and responses that result from the use and/or anticipated use of a product, system or service." The difficulties in getting a more refined definition of UX have several causes. UX is associated with a broad range of fuzzy and dynamic concepts 0 having a multitude of meanings, ranging from being a synonym for traditional usability to beauty, hedonic, affective or experiential aspects of technology usage. Additionally, the term UX is also influenced by several concepts from other areas, like fun, playability, or flow. Within this multitude of concepts, it has been pointed out 0 that the inclusion and exclusion of particular variables seem arbitrary, depending on the author's background and interest. For our research on understanding UX in the living room, we compiled a working definition of UX, based on definitions by Hassenzahl & Tractinsky 0 and Desmet & Hekkert 0: the user experience when interacting with an iTV system in the specific living room context is mainly influenced by the subjective perception of the quality of experience that is elicited by the interaction of a user with the interactive TV system, which may change dynamically depending on the situational context of usage and on time. Factors influencing the quality of experience include the feelings and emotions that are elicited (emotional experience), the degree to which our senses are gratified by the system (aesthetic experience), the meanings and values that are attached to the system, the perception of system characteristics like utility, purpose and usability, and how well these factors fit the current situational and temporal context. In the current literature, UX is described as being dynamic, context-dependent, and subjective (individual) 0. It highlights non-utilitarian aspects of interactions, shifting the focus to user affect, sensation, and the meaning as well as the value of such interactions in everyday life 0. More generally, UX focuses on the interaction between a person and a product or service, and is likely to change over time and with an embedding context 0,0. A broad variety of UX evaluation methods is available today. To measure the user experience beyond the instrumental, task-based approach, Hassenzahl introduced the AttrakDiff questionnaire. Approaches focusing on the evaluation of emotion and affect include methods that evaluate the emotional state of the user with questionnaires, while other evaluation approaches include physiological measurements or the evaluation of valence and arousal. To evaluate situational or temporal experiences, some approaches in mobile UX exist, using conceptual-analytical research and data gathering techniques 0. For prototypes, usability evaluation methods can be enhanced by including experiential aspects in the evaluations, e.g. experience sampling in long-term field trials 0. To be able to get a clear picture of how UX changes over time, it has also been proposed 0 to measure various aspects of UX both in different contexts and at different points of time.
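As a rough illustration of how questionnaire instruments of the AttrakDiff type aggregate semantic-differential ratings into pragmatic and hedonic quality scores, consider the following Python sketch; the word pairs, their grouping and the seven-point coding are simplified assumptions made for illustration and do not reproduce the published instrument:

```python
from statistics import mean

# Hypothetical ratings of one participant on 7-point bipolar word pairs
# (1 = negative pole, 7 = positive pole) after using an iTV service.
ratings = {
    "confusing / clear":        6,  # assumed pragmatic-quality item
    "cumbersome / direct":      5,  # assumed pragmatic-quality item
    "dull / captivating":       4,  # assumed hedonic-quality item
    "conventional / inventive": 3,  # assumed hedonic-quality item
}

pragmatic_items = ["confusing / clear", "cumbersome / direct"]
hedonic_items = ["dull / captivating", "conventional / inventive"]

pragmatic_quality = mean(ratings[i] for i in pragmatic_items)
hedonic_quality = mean(ratings[i] for i in hedonic_items)
print(f"pragmatic quality: {pragmatic_quality:.1f}, "
      f"hedonic quality: {hedonic_quality:.1f}")
```

Scores of this kind can then be compared across interaction techniques or across points in time, which is the kind of repeated, context-sensitive measurement discussed above.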

For the development and application of UX evaluation methods, it is important to start from a clear definition of UX 0 with an appropriate underlying model 0. The formal definition of UX issued by ISO suggests that UX can be measured in a way similar to the behavioral and attitudinal metrics of usability (i.e. users' performance and satisfaction) 0. As a result of the still ongoing research to define the scope of UX, the methods, techniques and tools currently used to evaluate UX are most of the time taken from the large pool of traditional usability methods 0; thus established techniques such as questionnaires, interviews, and think-aloud remain important for capturing self-reported data 0. For the development of a new UX evaluation approach it is important to understand the relationship of UX to other factors which are important for the development of interactive systems. Especially usability seems to be connected to user experience and is likely to be a sub-factor within UX 0, which also matches our position (cf. the working definition above), while others 0 see it merely as a source of product experience. Based on our research goals and objectives, other research topics that will be investigated within the thesis, but will not be further discussed in this paper due to page limit constraints, include: the evaluation of interaction techniques; research on influences of the usage context, e.g. for cross-device usage; design and development methods which support UX models; and models that explain the interrelation of usability and UX.

3. RESEARCH PROBLEM
The goal of the presented research is to develop a set of methods to better capture the UX of interaction technologies, as well as of entertainment services and systems in the living room, focusing especially on interactive Television (iTV). The living room itself incorporates a special usage context and serves many specific usage situations during leisure time. This includes various aspects of entertainment and social activities, where different usage situations arise when using different devices, some of them passive and laid-back, others requiring active usage and participation. The factors context and usage situation heavily influence the user experience when interacting with an iTV system; while the user likely wants to change the volume simply by blindly pressing a button on the remote control while being immersed in watching a movie, other activities, especially games-related ones, may be enhanced by performing gestures to interact with the user interface (as can be observed in recent developments for games with gesture input, e.g. Microsoft Kinect). The major problem is that currently available UX evaluation methods do not support various aspects that we are interested in for our research, e.g. factors related to the properties of the remote control or the interaction technique itself. Evaluation of UX in games has shown that user experience can be quite independent of usability. While games have to provide a minimum degree of usability (e.g. the possibility to control the game), it is just one sub-factor amongst other factors within UX (e.g. presence, involvement, and flow 0) that seem to shape the UX more intensely and gain a lot of importance once a certain level of usability is given. UX evaluation in games today includes a broad variety of factors, one of them being playability 0. In the context of the living room, it can be assumed that different factors are of importance and influence media usage and UX than in a work environment: e.g. voluntariness or mood may be named as major differences between work and leisure.



Another important aspect that has to be kept in mind, especially when focusing on the evaluation of interaction technologies in the living room, is the differentiation between the content that is delivered via a certain device and the usage experience of the device itself. Existing evaluation methods tend to focus either on a certain aspect of UX or still on basic usability targets 0. Combined methods (e.g. AttrakDiff) exist but seem to lack some aspects of importance for our focus area, the living room and iTV, such as haptic properties of the remote control that could influence UX. Another question is whether the UX evaluation should be included in the usability evaluation or conducted separately, and if so, when and how. Thus, the research focus should be on the identification, analysis and evaluation of factors that are important and contribute to UX in this specific context of use, the living room, if possible at the real location of usage and within a normal usage situation, while keeping in mind, and being adaptable to, recent and future technological changes as well as changes in usage situations.

4. RESEARCH GOALS
The research objectives are to understand: (1) how users' UX concepts are related to the interaction technologies that are used for iTV systems, and how an interaction technology contributes to the overall UX when interacting with an iTV system; (2) how usability and user experience are related in this specific domain (e.g. does the enhanced UX of a gesture-based interaction really contribute to a positive UX in the long term, or is usability the key factor for long-term use); (3) how to inform the design and development process to improve the UX of the interaction technique and the system (before a product is available); and finally (4) how the consumption of iTV content on a variety of devices (cross-device usage) will change the overall UX.

This leads to the research goal, which is to develop a set of methods to better capture the UX of interaction technologies, services and systems in the living room, focusing especially on iTV. The methods should fit the living room context and properly incorporate factors that are important for evaluating the UX of media usage and interaction technologies in this context from a user's perspective. These methods should allow the UX of a system and its accompanying interaction technologies to be evaluated quickly and easily, both during product development and for existing products. The set of methods developed within this PhD thesis aims to be general enough to be applicable to various devices and interaction technologies, taking into account recent and future technological changes, while at the same time being focused enough to still properly grasp the UX of media and interaction technology in the living room. This will be approached by thoroughly choosing and addressing UX factors that seem to have high importance and impact in this context of usage, identified within the current UX literature as well as during studies focusing on this issue. The methods thus do not claim to provide a comprehensive evaluation of the multi-faceted construct of UX, but rather try to provide valuable insights for our small area of research.

5. METHODOLOGY
In order to identify the factors that are influencing UX, a literature review was conducted as a first step to get an overview of concepts, evaluation methods and related work, followed by research conducted to identify factors from a user's perspective.

5.1. Previous Work
In previously conducted studies, we compared field usability studies to lab usability studies, evaluating the same system in both conditions 0. Within the field study, we already addressed the topic of user needs during the pre-interview in order to identify important aspects from a user's perspective. During the study, participants stated that they wanted the system to be easy to handle, user-friendly, and usable without an operating manual. Other stated user needs were individualization and safety issues, as well as the reduction of devices via an all-in-one device. UX was evaluated in this trial using the AttrakDiff questionnaire. Concerning the evaluation of interaction technologies, we also conducted a lab study, comparing touch-based to button-based interaction (using the same remote control shape and functionality) and investigating the relation of user experience and usability 0. iTV usability might still be an important factor in the early usage phases of the system (allowing access to content), but user experience is becoming more and more important. When investigating the relation between usability and user experience, it was noted that, for the compared product, a remote control, good usability values do not necessarily imply a better UX, and low usability values can at the same time lead to high UX ratings. As a result of a high rating of hedonic quality and a good assessment of the touch-based interaction technology, it is concluded that product design as well as visual appeal influence the users' willingness to use a product.
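Since semantic-differential questionnaires such as AttrakDiff recur later in the thesis, the following minimal Python sketch illustrates how responses of this kind can be aggregated into per-scale scores. It assumes a 7-point item format and an illustrative item-to-scale mapping; the scale labels and items are placeholders, not the official AttrakDiff scoring key.

from statistics import mean

def scale_scores(responses, item_scales, reversed_items=frozenset()):
    # responses: {item_id: rating on a 1..7 scale}
    # item_scales: {item_id: scale name, e.g. pragmatic or hedonic quality}
    by_scale = {}
    for item, rating in responses.items():
        if item in reversed_items:
            rating = 8 - rating  # flip polarity of reversed items
        by_scale.setdefault(item_scales[item], []).append(rating)
    return {scale: mean(values) for scale, values in by_scale.items()}

# hypothetical example with two items per scale
item_scales = {"simple-complicated": "PQ", "predictable-unpredictable": "PQ",
               "stylish-tacky": "HQ", "creative-unimaginative": "HQ"}
print(scale_scores({"simple-complicated": 6, "predictable-unpredictable": 5,
                    "stylish-tacky": 3, "creative-unimaginative": 4}, item_scales))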

5.2. Studies to Identify Major UX Factors

In order to address the research goals and get a better understanding of which UX concepts and factors are important for the evaluation of an iTV system in the home, two ethnographically oriented studies were conducted in 2010. Within these studies, the question which factors contribute to a positive UX was addressed in order to identify factors that are really important from a user's perspective and in the real usage context in the home. The studies were conducted in two different countries with a total of 69 participating households and 179 participants (149 adults). Besides other topics that are beyond the focus of this paper, factors influencing the UX of media usage in the home and especially in the living room were addressed and led to first insights for the further development of the UX evaluation method. The UX factors stated most often and most relevant for our context of research were aesthetic experience (including visual and haptic experience), utility, purpose, the elicitation of emotions, functionality and usability. Other UX factors that were not named directly but observed during analysis were the need for stimulation and identification, as well as the contextual factors time, place/situation, social influences and whether a device is perceived as personal or not. Others, e.g. the need for diversion, were omitted because they are more content-related and not related to interaction technology. Also the need for relatedness, or rather its fulfillment, was only observed for technologies that offer communication features and may thus be neglected for our research focus; nevertheless it may gain importance once new services offering communication features reach a mass audience in the iTV sector. Additionally, based on the identified UX factors, our conclusion is that media content does not interfere much with the evaluation of the iTV system, its services and interaction techniques, especially when combining expert and user oriented evaluation, and thus might be neglected, as the influences of the mediated content on the UX are beyond our research focus.

5.3. Current State and Future Work

At the moment, the findings gathered during the ethnographic studies 0 0, combined with those identified in the literature and in previous studies, are used to develop a UX questionnaire for our domain as a first step, which is currently subject to first evaluations and an examination of its validity within user tests. It focuses on the UX evaluation of interaction technologies in the living room, and should allow UX factors to be investigated and measured already in early design phases. The preliminary version of the questionnaire and the underlying framework will be presented at the doctoral consortium.

As described previously in the methodology section, first steps have already been taken within the thesis; a first version of the methodology has been developed and is in the course of being evaluated. At this point, the thesis has progressed far enough to present first results and receive feedback from the community that is working within the same or related areas, in order to further improve the research within the remaining one to one and a half years of the PhD thesis. The doctoral consortium thus should serve as a forum to provide valuable feedback for the further development of the UX methods. Especially interesting would be feedback about the methodology chosen, also regarding the question whether all important aspects of UX can be addressed adequately with a questionnaire, and about the viability of and requirements for expert evaluation. Additionally, feedback about the UX factors identified, how to incorporate other UX factors, how the methods could be further combined, what benefits they could offer in the development process and which insights the methods could provide would be interesting topics for further discussion. The community of the conference seems to be an ideal forum to further discuss the potential of the proposed approach and methodology, possible drawbacks and areas where further investigation might be necessary.

The next steps of the thesis will include the development of expert guidelines that can be used in the tradition of heuristic evaluation to understand if and to what extent future systems support major UX factors. Here the application and adaptation of evaluation methods taken from structural and functional playability 0 seem reasonable, as they already address functional (i.e. more usability-related) as well as structural (i.e. more aesthetics-related) concepts which have a substantial interconnection to current UX concepts and models. These guidelines should offer valuable benefits for the fast-paced industrial product development cycle, where other means of UX evaluation may not be appropriate due to project time constraints or the time and manpower needed to carry out the evaluation.

6. CONTRIBUTION AND CONCLUSION
To sum up: for the evaluation of the UX of interactive TV systems and the respective interaction technologies, factors from other areas like gaming and mobile usage, as well as product-related factors, are important. Based on the factors that we identified in several studies and in the literature, a set of methods is being developed that allows these factors to be investigated and measured already in early design phases. The current approach is to use method triangulation with a questionnaire as a first step, including evaluation of the user interface, the interaction technique and the orthogonality between interaction and user interface, which will be followed by guidelines for expert evaluation in the future. The main contribution of this PhD thesis lies in the proposed framework and evaluation methods for better understanding and evaluating the UX of interaction technologies for iTV and its services in a living room setting. The research conducted to identify the UX factors in this setting will contribute to a better understanding of which aspects are really important in this context, which influencing factors might change the UX and which factors should be included in a UX evaluation method for interaction technologies in the living room. The UX evaluation methods will offer the possibility to quickly and easily evaluate UX within the whole product design and development cycle.

7. REFERENCES
Bernhaupt, R. (Ed.) 2010. Evaluating User Experience in Games: Concepts and Methods. London: Springer.
Bernhaupt, R., Pirker, M., Weiss, A., Wilfinger, D., Tscheligi, M. 2011. Security, Privacy, and Personalization: Informing Next Generation Interaction Concepts for Interactive TV Systems. ACM Computers in Entertainment. In press.
Desmet, P. M. A. & Hekkert, P. 2007. Framework of product experience. International Journal of Design, 1(1), 57-66.
Hassenzahl, M. and Tractinsky, N. 2006. User Experience - a research agenda. Behaviour & Information Technology, 25(2), 91-97.
Hassenzahl, M. and Roto, V. 2007. Being and doing: A perspective on User Experience and its measurement. Interfaces, 72, 10-12.
Järvinen, A., Heliö, S. and Mäyrä, F. 2002. Communication and Community in Digital Entertainment Services. Online: http://tampub.uta.fi/tup/951-44-5432-4.pdf
Law, E. L.-C., Roto, V., Hassenzahl, M., Vermeeren, A., Kort, J. 2009. Understanding, scoping and defining user experience: a survey approach. In Proc. CHI '09, 719-728.
Law, E. L.-C. and Van Schaik, P. 2010. Modelling user experience - An agenda for research and practice. Interacting with Computers, 22(5), 313-322.
Pirker, M., Bernhaupt, R. and Mirlacher, T. 2010. Investigating usability and user experience as possible entry barriers for touch interaction in the living room. In Proc. EuroITV 2010. ACM, New York, 145-154.
Pirker, M. and Bernhaupt, R. 2011. Measuring User Experience in the Living Room: Results from an Ethnographically Oriented Field Study Indicating Major Evaluation Factors. In Proc. EuroITV 2011. Accepted.
Roto, V., Ketola, P. & Huotari, S. 2008. User Experience Evaluation in Nokia. In CHI '08 Workshops, 3961-3964.
Takatalo, J., Häkkinen, J., Kaistinen, J. and Nyman, G. 2010. Presence, Involvement, and Flow in Digital Games. In Bernhaupt, R. (Ed.) Evaluating User Experience in Games: Concepts and Methods. London: Springer, 23-46.
Vermeeren, A., Law, E. L.-C., Roto, V., Obrist, M., Hoonhout, J. and Väänänen-Vainio-Mattila, K. 2010. User experience evaluation methods: current state and development needs. In Proc. NordiCHI 2010, ACM, 521-530.
Wilfinger, D., Pirker, M., Bernhaupt, R., and Tscheligi, M. 2009. Evaluating and investigating an iTV interaction concept in the field. In Proc. EuroITV '09, ACM, 175-178.


Subjective Quality Assessment of Free Viewpoint Video Objects


Sara Kepplinger
Institute for Media Technology Ilmenau University of Technology 98693 Ilmenau, Germany 0049 (0) 3677 69 2671

Sara.Kepplinger@tu-ilmenau.de

ABSTRACT
This paper presents an overview of the intended contribution to the quality assessment of free viewpoint video representations in the video communication use case within the author's PhD proposal. The proposal will analyze opportunities and obstacles for the usage of free viewpoint video objects within video communication systems, focussing on subjective quality of experience. Quality estimation of the emerging free viewpoint video object technology in video communication has not been covered yet, and adequate approaches are missing. The challenges are the definition of quality influencing factors, the formulation of a measure, and linking quality evaluation up with the technical realization. The paper outlines the theoretical background and the intended work. A short description of the related project Skalalgo3d, which offers a useful application framework for the intended work, is included. Preliminary results consist of a tentative research framework and the evaluations conducted so far.

Categories and Subject Descriptors


H.5.1 [Multimedia Information Systems]: Evaluation / methodology

General Terms
Algorithms, Measurement, Design, Experimentation, Human Factors

Keywords
Free viewpoint video, video communication, methodology, quality of experience

1. INTRODUCTION
Free viewpoint video applications enable the user to navigate interactively and freely within a visual real-world scene representation. Applications like free viewpoint choice on DVD, or similar approaches on TV or online, gain more and more attention in the field of interactive media. Free viewpoint video objects (or 3DVOB) used within the context of video communication may offer sociability and communication support. This can be achieved by technical possibilities to overcome the obstacles of absent eye contact, or by freedom of choice regarding the viewing angle and distance to the dialog partner, for example. These are activities which are possible and usual in real face-to-face conversations. There are different approaches to realize this way of representation using multiple views of the recorded scenes. This complex processing chain can be realized in different ways of acquisition, processing, scene representation, coding, transmission, and presentation.

This paper describes the planned efforts within the PhD proposal to pay more attention to the user's perception of these new visual representations allowing interactivity. One goal is to define an extended model or an absolute measure for overall quality including subjective quality assessment. Therefore, the correlation between the used algorithm(s) and the achieved quality will be considered. The opportunity of this approach is to gain further insights which may be useful for system adaptivity and processing scalability. The challenges of this scheme are mainly the (still) open questions about novel algorithms for image analysis and synthesis on the one hand, and the development of evaluation and measurement methods for visual quality on the other hand. This emergent field of research is influenced by several different approaches in both image processing (e.g. [13], [16]) and the inclusion of subjective quality assessment for overall quality estimation (e.g. [17], [12]).

In the following, the most relevant work for the author's PhD proposal will be outlined, starting with a short introduction into the technical background. The proposal focuses on quality assessment within the described technical context and use case. Two main questions are being addressed: How to include subjective quality estimation by the user? How to identify the most relevant quality influencing factors in order to provide an extended quality model supporting technical optimization? This is outlined in the following way: the problem which will be worked on in this research project is stated in section 2 by explaining the theoretical starting point and intended goals. This is followed in section 3 by a general description of the project Skalalgo3d, the approaches chosen within the project, and related work. Section 4 describes the planned methodological approach. First evaluation steps and preliminary results of previous research are outlined briefly in section 5. Section 6 concludes with a discussion leading to future work.

2. STATEMENT OF THE PROBLEM


The theoretical starting point is published research concerning the definition of a quality measure for free viewpoint video objects which was developed at the Ilmenau University of Technology [11]. This measure includes definitions of influencing quality parameters as well as measurable characteristics based on objectively quantifiable errors. It is clearly outlined in the description of the measure, called 3DVQM, that it is open to extension by subjective quality estimation and by the definition of to-be-identified quality influencing factors and their evaluation [5]. Initial efforts towards an extension of the measurements by subjective quality of experience were already made using synthetic free viewpoint video objects [3]. However, up to now, subjective assessment of natural free viewpoint video objects and the resulting user experience has received only little attention and demands more effort in incorporating early user inclusion [11] (see also section 3, Related Work).

A video communication use case is used as a framework regarding eye contact and other communication-based factors. Based on this, and on the further technical development in terms of processing steps, the aim is the identification of further quality influencing factors based on subjective quality assessment. This will lead to the definition of terms and quantifiers and to proposed patterns for applying the results to prospective technical developments. Through the author's experience, literature analysis, and work within the project, a number of questions arose for the PhD proposal: What kind of methodology accounts for reliable subjective quality assessment of free viewpoint video objects in the particular use case of video communication? Which further factors influencing free viewpoint video object quality can be identified? To which extent do factors identified by means of subjective quality assessment influence the overall quality of experience? How can the identified factors benefit prospective technical development and processing algorithms for free viewpoint video objects?

These questions within this interdisciplinary approach address mainly methodological questions and practical realization within the area of human-computer interaction, as well as the intended impact on processing development. The novelties the PhD intends to bring about are definitions of (further) quality influencing factors, the way of linking quality evaluation up with the technical realization of free viewpoint videos, and therefore the formulation of an adequate measure.

3. RELATED WORK
The work related to the PhD proposal consists of three main topics. First, there are evaluation approaches with similar goals. Second, there is the technical realization of free viewpoint video objects in general and their usage for eye contact support in video communication. As a preliminary, the project Skalalgo3d is described in this section, as the PhD proposal arose within the framework of this project.

3.1. Skalalgo3d
The project Skalalgo3d (Scalable algorithms for 3D video objects under consideration of subjective quality factors) intends to improve free viewpoint video objects and eye contact support used within the context of video communication. The project work of Skalalgo3d is divided into two general working areas: the technical realization of free viewpoint video objects and the identification of subjective quality factors. This concerns the optimum processing as well as qualitative displaying under different conditions. It is funded by the German Research Foundation (DFG).

3.2. Technical Realization of 3DVOB
In general, the procedure of 3DVOB generation starts with the acquisition of a time-variant, three-dimensional object. The methods of the reconstruction processes differ in principle [5]. They can be either model based, based on disparity analysis, or a combination of both. The differences are due to the different usage of interpolation, warping, morphing and the recording of several different camera views. The technical development within the project Skalalgo3d is based on the processing chain shown in Figure 1. It uses MATLAB and the project's internal ReVOGS (Realistic Video Object Generation System). A person is recorded by at least two ordinary webcams. This is followed by the processing of first representations out of the recorded scene. This may include adequate pre-processing like colour correction, keying, and calibration. Thereof, second representations are generated by rectification and analysis for accurate disparity determination.

Figure 1. Current status of 3DVOB generation in Skalalgo3d (processing chain: Acquisition with ReVOGS, two stereo and two ground-truth cameras; Calibration with MATLAB, p.r.n. ReVOGS, yielding internal and external camera parameters; Rectification with MATLAB or ReVOGS; Color Correction with MATLAB; Keying, manually with Combustion, later with an automatic algorithm; Analysis, i.e. disparity determination, with MATLAB and ReVOGS; Synthesis with ReVOGS).

After this, the view synthesis leads to the intended 3DVOB, provided by different and new views. There are different approaches available for the view synthesis, the disparity analysis and refinement, as well as for the usage of 3DVOB for eye-contact support. They are outlined in the following sub-sections.

3.2.1. Different disparity and synthesis methods
Within the development process of the most adequate algorithms to create a qualitative 3DVOB representation, different approaches concerning disparity and synthesis methods are considered. These approaches differ in their cost-benefit ratio.

Table 1. Summary of used disparity and synthesis methods
View synthesis: linear interpolation; linear interpolation plus median filtering
Disparity analysis: windowed NCC cost measure
Disparity refinement: with / without hole filling after cross check; with / without temporal cleaning

In Table 1 a summary of the methods used up to now is given. The view synthesis is either done only by accounting for the neighbouring pixel, or a classical equalization filter is additionally used in order to reduce the so-called salt-and-pepper noise, a visual disorder. The disparity analysis undertaken for the test items is realized by the classical usage of a cost-based measure. Within the refinement, differences are made by the usage of a hole-filling filter or temporal cleaning. These different approaches may result in differently perceived quality of the representation. A more detailed description of disparity analysis in general is given in [13].
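To make the "windowed NCC cost measure" from Table 1 concrete, the following Python/NumPy sketch shows a basic block-matching disparity estimation with a normalized cross-correlation cost over a rectified image pair. Window size and disparity search range are illustrative assumptions; the project's actual MATLAB/ReVOGS implementation, including cross check, hole filling and temporal cleaning, may differ.

import numpy as np

def ncc(a, b, eps=1e-8):
    # normalized cross-correlation of two equally sized windows
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def disparity_map(left, right, max_disp=32, win=5):
    # left, right: rectified grayscale images as 2D float arrays of equal size
    h, w = left.shape
    half = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            best_score, best_d = -2.0, 0
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                score = ncc(ref, cand)
                if score > best_score:
                    best_score, best_d = score, d
            disp[y, x] = best_d
    return disp

A left-right cross check followed by hole filling, as listed under disparity refinement in Table 1, would then be applied on top of such a raw disparity map.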

3.2.2. Support of eye contact in video communication
The problem of usual video communication systems is the impossibility of eye contact. The user either has to look at the display to get information or to look into the camera in order to simulate eye gaze (see Figure 2). Eye contact is seen as a critical factor in the fields of communication, psychology, and sociology. In 1976, [1] analyzed the role of gaze and mutual gaze in conversations and communication. A possibility offered by the usage of free viewpoint video objects is to support eye contact via video communication on computers, televisions, or mobile devices. This can be realized by eye adjustment or by the use of the so-called Wollaston illusion, adjusting the displayed person's position (without manipulating the eyes). Skalalgo3d allows this support by virtual camera positioning. This approach is described to some extent in [8]. Several approaches concerning eye-contact support have already been published. In [10] an approach of virtual view image synthesis for eye contact in a TV conversation system is described. In [15] the effects of gaze direction and depth on perceived eye contact and perceived gaze direction are compared between 2D and 3D display conditions. In [2] the role of eye gaze in avatar-mediated conversational interfaces is analysed. One possible approach to technically realize eye contact via a camera/display system for videophone applications is described in [7].

Figure 2. Problem of eye-contact in video communication.

3.3. Existing Evaluation Methods
Objective quality measures compute metrics representing comparable reference values, mostly focused on technical feasibility. Several defined measures, parameters, and assessment methods are available in order to rate general video quality objectively [17]. Subjective quality measurements intend to include the users and their opinion. This is expressed for example via judgment (e.g. yes/no) or adjustment (e.g. the user changes influencing factors and chooses the preferred outcome), resulting in a measure (e.g. mean opinion score) representing these judgments. However, developments, especially in the field of 3D technology, ask for the inclusion of more sophisticated subjective measures in order to reach an adequate consideration of human perception, which may differ from the objective rating [6]. In the context of 3DVOB quality assessment, preliminarily defined subjective quality factors derive for example from occlusion, distortion, and shape, as outlined in [11]. However, the identification and the tighter definition of the extent of these influences ask for further exploration. This is intended by the author's PhD proposal. Researchers of related research areas (mainly associated with user interface design) have already worked on the measurement of user experience and user acceptance, also in a pre-prototype stage of product development (e.g. [9]). Activities are being undertaken in order to clarify the different usage of efforts to understand the quality of experience of new technologies, as summarized in [4]. Research on free viewpoint video object technology up to now mainly regards technical feasibility. Hence, in this particular emergent field of research, user inclusion has not attracted much attention up to now. Specific approaches are available concerning the evaluation of video quality in different usage contexts by means of subjective (e.g. [5]) as well as objective measurement (e.g. [6]). However, the quality estimation of emerging free viewpoint video object technology in video communication has not been covered yet [14].

4. METHODOLOGY
First and foremost, the goal of the PhD effort is to identify factors of subjective quality experience (e.g. disturbing fringes, recognized holes, missing eye contact) and their respective extent of impact as formulated into a measure. To achieve this, an adequate method needs to be defined. There are several methodological approaches paying little attention to subjective factors, besides evaluation efforts on objective measurements (e.g. concerning system processes) in early system development phases, as described in section 3.3. As a consequence, the first step within the proposed evaluation framework is an explorative approach in order to rule out non-critical factors and to define a range of applicable methods for evaluation. In a second step, the application of a method (specified at that point) to collect data about quality influencing subjective factors is verified. The final step is the formulation of a most adequate methodological approach providing a quality measure. This measure provides the possibility of a mathematical formulation of quality influencing factors derived from the user's perception and may therefore be integrated into the technical processing chain (e.g. in the form of a perceptual coder or something similar).

5. PRELIMINARY RESULTS
As a first approximation to the described topic, several methods were applied in 2009 in order to gain more information. Expert interviews, focus groups, and online questionnaires were held to collect information on possible pre-experiences and users' ideas about possible free viewpoint video object usage. In 2010 a methodological framework was established, in cooperation with the Institute of Psychology at the University of Salzburg, Austria, through the systematic evaluation of pre-produced test items. The goal was to detect to what extent the resulting quality of different processing steps was acceptable and to examine subjective factors influencing the experienced quality. This included the experience of eye contact and the measurement of possible influences of several characteristics (e.g. appeal, trustworthiness) of the shown conversational partner or of different conversation contexts (private talk vs. professional conference). The test items were different free viewpoint video objects (produced using the described processing chain) showing four different people (two men, two women) resembling a possible video communication partner. With this study a total of 322 data sets was collected. The data collection was carried out within three weeks in November 2010 in a laboratory providing a standardized environment (i.e. lighting conditions) in five separate rooms with personal computers and 19" LCD displays. Table 2 summarizes the design of the evaluation study and shows the different pre-defined independent variables.

Table 2. Combination of different evaluation variables
Technical items: test items (i.e. videos, 10 sec.), with/without eye contact, different view synthesis and disparity analysis methods (see Table 1)
Usage context: private talk with a friend vs. professional talk to an adviser
Content shown: man or woman (in each usage context)

With this amount of data sets, every possible test setting contains at least 15 data sets. The settings vary in the combination of the different evaluation variables. For the first evaluation activities, a set of different test items was created using the described different methods of view synthesis and disparity analysis (see also Table 1 in section 3.2.1).
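As one standard way to summarize rating data of this kind (cf. the mean opinion score mentioned in section 3.3), the following minimal Python sketch computes a per-condition mean opinion score with a 95% confidence interval; the condition labels and the rating scale are hypothetical.

import math

def mos_with_ci(ratings_by_condition, z=1.96):
    # ratings_by_condition: {condition label: list of individual ratings}
    result = {}
    for condition, ratings in ratings_by_condition.items():
        n = len(ratings)
        mos = sum(ratings) / n
        var = sum((r - mos) ** 2 for r in ratings) / (n - 1) if n > 1 else 0.0
        ci = z * math.sqrt(var / n)
        result[condition] = (mos, ci)
    return result

print(mos_with_ci({"eye contact, interpolation plus median filtering": [4, 5, 4, 3, 4],
                   "no eye contact, plain linear interpolation": [3, 2, 3, 3, 2]}))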

6. DISCUSSION AND FUTURE WORK


There are several open questions concerning the identification of subjective quality factors and their measurement. In a first step, within the framework of the author's PhD proposal, data is collected providing a basis for explorative analysis. The analysis of the preliminary data will be organized in three phases. First of all, quality influencing factors will be identified via categorization and correlation analysis, paying attention to the different evaluation variables (Table 2). A weighting of the identified factors, leading to a list of influences on quality, will be carried out in a second step. This is followed by the elimination of non-critical factors and the conception of a further evaluation in May 2011. There, it is planned to let the users themselves define the best combination of the provided processing steps in order to arrive at a free viewpoint video object representation with the best experienced quality. The identification or development of an ideal methodology in order to reach the above-mentioned goals is seen as an accompanying necessary effort and therefore a main part of the overall result. Results will be published gradually within conference publications, the intended PhD work, and the project report of Skalalgo3d until the end of February 2012.

7. REFERENCES
Argyle, M., Cook, M. 1976. Gaze and mutual gaze. Cambridge University Press, New York, USA.
Colburn, A. R. 2000. The Role of Eye Gaze in Avatar Mediated Conversational Interfaces. Technical Report MSR-TR-2000-81, Microsoft Research, Microsoft Corporation, One Microsoft Way.
Fan, F. 2008. Analyse von Qualitätsparametern von 3D-Videoobjekten. Diplomarbeit, Technical University Ilmenau, Ilmenau, GER.
Geerts, D., De Moor, K., Ketyk, I., Jacobs, A., Van den Bergh, J., Joseph, W., Martens, L., De Marez, L. 2010. Linking an Integrated Framework with Appropriate Methods for Measuring QoE. In Proceedings of QoMEX 2010, Second International Workshop on Quality of Multimedia Experience (June 21-23, 2010, Trondheim, Norway), Paper No. 158.
ITU-R BT.500-11. ITU Recommendation. 2002. Methodology for the subjective assessment of the quality of television pictures. ITU.
Jumisko-Pyykkö, S., Reiter, U., Weigel, C. 2007. Produced quality is not perceived quality - A qualitative approach to overall audiovisual quality. IEEE Xplore.
Kollarits, R. V., Woodworth, C., Ribera, J. F., and Gitlin, R. D. 1995. An eye contact camera/display system for videophone applications using a conventional direct-view LCD. SID 1995, Digest.
Korn, T. 2009. Kalibrierung und Blickrichtungsanalyse für ein 3D-Videokonferenzsystem. Diplomarbeit, Technical University Ilmenau, Ilmenau, GER.
Law, E., Roto, V., Hassenzahl, M., Vermeeren, A., Kort, J. 2009. Understanding, Scoping and Defining User Experience: A Survey Approach. In Proc. Human Factors in Computing Systems, CHI '09, April 4-9, 2009, Boston, MA, USA.
Murayama, D., Kimura, K., Hosaka, T., Hamamoto, T., Shibuhisa, N., Tanaka, S., Sato, S., Saito, S. 2010. Virtual view image synthesis for eye-contact in TV conversation system. In Conference Proceedings of Electronic Imaging (17-21 January 2010, San Jose Convention Center, San Jose, California, United States), Paper No. 7526-11.
Rittermann, M. 2007. Zur Qualitätsbeurteilung von 3D-Videoobjekten. Dissertation, Technical University Ilmenau, Ilmenau, GER.
Jumisko-Pyykkö, S., Strohmeier, D., Utriainen, T., and Kunze, K. 2010. Descriptive quality of experience for mobile 3D video. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (NordiCHI '10). ACM, New York, NY, USA, 266-275. DOI=10.1145/1868914.1868947.
Scharstein, D., Szeliski, R. 2002. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. International Journal of Computer Vision, 47(1-3), 7-42.
Schreer, O., Kauff, P., Sikora, Th. (eds.) 2005. 3D Video Communication: Algorithms, concepts and real-time systems in human centered communication. John Wiley & Sons, Ltd., UK.
Van Eijk, R., Kuijsters, A., Dijkstra, K., IJsselsteijn, W. A. 2010. Human Sensitivity to eye contact in 2D and 3D videoconferencing. In Proceedings of QoMEX 2010, Second International Workshop on Quality of Multimedia Experience (June 21-23, 2010, Trondheim, Norway), Paper No. 76.
Weigel, C., Schwarz, S., Korn, T., Wallebohr, M. 2009. Interactive free viewpoint video from multiple stereo. In Proceedings of the 2009 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON 2009, Potsdam, Germany, May 4-6, 2009). IEEE Xplore, DOI=10.1109/3DTV.2009.5069663.
Winkler, S. 2005. Digital Video Quality - Vision Models and Metrics. John Wiley & Sons, UK.


Allocation Algorithms for Interactive TV Advertisements

Ron Adany
Department of Computer Science Bar-Ilan University Ramat-Gan 52900, Israel

adanyr@cs.biu.ac.il

ABSTRACT
In this research we consider the problem of allocating personalized advertisements (ads) to interactive TV viewers. We focus on the optimization problem of maximizing revenue while taking into account the special constraints and requirements of the TV ads industry. The research is part of studies towards a Ph.D. in Computer Science, currently in the third year of the planned four-year study. In this paper we define the research problem, present the achievements attained to date, detail the research plan and discuss the contribution of the work.

1. INTRODUCTION

Personalization is the next generation in the world of advertisement. It is very attractive to all players. From the commercial companies' perspective, it offers the possibility to tailor their advertisements to specific audiences and to ensure that the target population receives the desired ads in the desired format. From the standpoint of the service suppliers, i.e. the media companies and the operators, whose major source of income is advertisement [13], it is a way to increase revenue [7, 13]. And, from the viewers' perspective, it allows them to view ads which best suit their profile, preferences and interests. Ads personalization is already extensively used in the Internet medium (e.g. Google AdWords [9]), but not in the TV medium. There are several key points that distinguish TV ads, as we consider them in this research, from Internet ads, such as the environment, the method of exposure, the pricing method, allocation constraints, etc.

Over the past few years we have witnessed progress in technology, infrastructure upgrades, increased use of alternative TV screens, e.g. cell phones, and the penetration of interactive TV. This progress has given rise to real personalized services and to the assignment of advertisements to specific viewers, based on their interests and their relevance to the advertised content. The personalization problem becomes even more important for the mobile TV platform, where there is no uncertainty with respect to who is watching [1, 2]. Many studies concern the problem of selecting the personal advertisements most suitable to each individual viewer, e.g. [12, 15, 16], and many others focus on how to deliver them, e.g. [4, 16]. Our research supplements these studies by using their results as input, with the goal of optimizing the allocation of ads. The issue of optimizing the personal TV advertisement problem is still an open problem for which, to date, no adequate solution has been proposed. In this work we propose algorithmic solutions and do not deal with hardware or infrastructure problems. Throughout this research we assume that the infrastructure is similar to the iMedia system framework, which is designed for personal advertisement in the interactive TV environment [4].

The research is part of the NEGEV Consortium [17], targeted at developing personalized content services and directed by SintecMedia [20], a high-tech company that designs and implements management systems for the TV broadcasting, cable and satellite industries.

Based on frameworks such as iMedia, the entire process of personalized advertisement is as follows. Given advertisement requests, ads are allocated to viewers and playlists of ads are generated for the planned time periods in some centralized computing center. Then, advertisement contracts are signed with the advertisers according to the allocations, and the playlists are delivered to and stored in the Set-Top-Box (STB) units with which each viewer is equipped, as is common today. During the planned time periods, viewers watch TV, and on commercial breaks each STB airs ads based on the viewer's playlist. At the end of each time period, each STB sends an ads viewing report to the centralized computing center, detailing the actual airing of the ads from the viewer's playlist. At the end of all the planned time periods the billing process is activated according to the signed contracts.
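The billing step of the process described above can be illustrated with a small Python sketch: given the contracted rating and frequency requirements and the airing reports returned by the STBs, it determines which ads are billable. The data shapes and field names are hypothetical and only serve to make the all-or-nothing nature of the contracts explicit.

def billable_ads(contracts, airing_reports):
    # contracts: {ad_id: {"rating": ..., "freq": ..., "profit": ...}}
    # airing_reports: {viewer_id: {ad_id: number of actual airings in the period}}
    airings_per_ad = {}
    for viewer, report in airing_reports.items():
        for ad, count in report.items():
            airings_per_ad.setdefault(ad, []).append(count)
    billed = {}
    for ad, c in contracts.items():
        # a viewer counts towards the rating only if the ad reached its frequency there
        satisfied_viewers = sum(1 for n in airings_per_ad.get(ad, []) if n >= c["freq"])
        billed[ad] = c["profit"] if satisfied_viewers >= c["rating"] else 0
    return billed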

2. PROBLEM DESCRIPTION

The Ads Allocation problem concerns the allocation of personal TV advertisements to viewers. We consider two versions of the Ads Allocation problem. The deterministic version, where the problem's data is known in advance, is presented in Section 2.1. The uncertain multi-period version, where there are multiple allocation periods and uncertainty about the problem's data, is presented in Section 2.2.

2.1 Deterministic Version


The input for the Ads Allocation problem consists of a set of ads and a set of viewers. Each viewer is associated with a viewing capacity and a profile attributed to him. Each ad is associated with a transmission length, a required rating, a required airing frequency, a profit and a profile defining the target population. The ad rating indicates the required number of different viewers to whom the ad must be assigned in order to be considered allocated and be paid. The ad frequency corresponds to the number of times the same viewer should view the ad in order to be considered assigned to that viewer. The target population defines the set of viewers that are relevant for the ad. An example of the parameters of a viewer would be a viewing capacity of 20 hours a week with a profile of a male from London in the 35-40 age group. An example of an ad request would be a 30-second ad that needs to be allocated to 20,000 viewers, 10 times to each viewer, which will result in a profit of $10,000, where the target population is females from NYC in the 20-35 age group.

The goal of the Ads Allocation problem is to maximize the profit from a valid assignment of ads to viewers. A valid assignment that will result in payment must satisfy the ad rating and frequency requirements, must not exceed the viewers' viewing capacities, and must be personal, i.e. suit the ads' target populations and the viewers' profiles. The deterministic version of the Ads Allocation problem is an extension of several well-studied optimization problems such as the General Assignment Problem (GAP) [14], the Multiple Knapsack Problem (MKP) [5], and the Multiple Knapsack problem with Assignment Restrictions (MKAR) [6, 19]. All of these problems are NP-hard, and as an extension of them the Ads Allocation problem is also NP-hard. Consequently, our proposed method for solving the problem is the heuristic approach (see Section 3.2), which is very common in solving instances of GAPs [14].
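One plausible way to formalize the deterministic version described above is as an integer program; the following is a sketch based on the constraints named in this section, not necessarily the exact model used in the published papers. Let x_{ij} = 1 if ad i is assigned to viewer j (implying f_i airings of length l_i for that viewer) and y_i = 1 if ad i is allocated and paid:

\begin{align*}
\max\ & \sum_i p_i\, y_i \\
\text{s.t.}\ & \sum_{j \in T_i} x_{ij} = r_i\, y_i & \forall i \ \text{(all-or-nothing rating)} \\
& \sum_i f_i\, l_i\, x_{ij} \le c_j & \forall j \ \text{(viewing capacity)} \\
& x_{ij} = 0 & \forall i,\ j \notin T_i \ \text{(target population)} \\
& x_{ij},\, y_i \in \{0,1\}
\end{align*}

where p_i, r_i, f_i, l_i and T_i are the profit, rating, frequency, length and target population of ad i, and c_j is the viewing capacity of viewer j.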

2.2 Multi-Period Uncertain Version
The multi-period uncertain version of the Ads Allocation problem is an extension of the deterministic case into a multi-period problem where the viewers' viewing capacities are uncertain. While data regarding the ads requests, as well as data concerning the viewers' profiles (e.g. obtained by asking the viewers), are known in advance, the data on a viewer's viewing capacity is only a prediction of how much time the viewer will watch TV within a certain period. Situations where viewers watch more or less than expected are possible. The latter case, i.e. less viewing time than expected, is more problematic, since there may be some ads that are not fulfilled, which will cause a loss in revenue. However, the case of more viewing time than expected is also problematic, since knowing the actual viewing time in advance could result in the allocation of more ads, which in turn would increase revenue, which is our goal.

We assume that each estimated viewing time cj of viewer vj is given together with some uncertainty factor 0 ≤ uj ≤ 1, where the real viewing capacity is a value in the range [cj(1 - uj), cj(1 + uj)]. This model allows a more realistic representation of TV viewers, since their viewing capacity is not stable but can be estimated with a bounded error. The estimated viewing times can be based on viewing statistics. Such statistics are already available for some defined viewer groups. For example, according to BARB [3], in 2010 the average weekly viewing per person in Great Britain was 28:13. According to Nielsen [18], in 2009-2010 the average American watched 35:34 (hours:minutes) of TV per week, kids aged 2-11 watched 25:48 of TV per week on average, and adults over 65 watched 48:54 of TV per week. These statistics are averages over specific viewer groups, but similar statistics can be calculated for smaller viewer groups and even for the individual viewer, based on observations of the viewer's past activity (taking privacy issues into account). In order to tackle the uncertainty using an iterative approach of reallocation in future periods, this problem version is defined over multiple periods, where the ads allocations can be split over several periods instead of a single one, e.g. 4 weeks. This modification allows us to handle the uncertainty not only before it is revealed but also after.

3. RESULTS
We have been working on this research for almost three years. To date we have obtained several interesting results and published two papers. We provide a brief description of the theoretical results in Section 3.1, and in Section 3.2 we present a brief summary of the experimental results for the deterministic and multi-period uncertain versions.

3.1 Theoretical Results
We have established the problem hardness and have developed several polynomial time approximation schemes (PTAS) for special instances of the deterministic version. A PTAS algorithm takes the problem instance and a parameter ε > 0 as input and returns a solution with an approximation ratio of at most (1 + ε) from the optimal, i.e. a solution which is worse than the optimal by at most a factor of (1 + ε). The complexity of a PTAS is polynomial in the instance size, but may be exponential in 1/ε. We have shown that the problem is strongly NP-hard using a reduction from the 3-partition problem, which is strongly NP-hard [8]. This reduction holds even for instances where ad profits, ratings and frequencies are equal to 1 and all ads can be allocated to all viewers, i.e. there is no personalization according to profiles. In addition, we have proven that personalization according to arbitrary profiles, i.e. arbitrary assignment restrictions, makes the problem APX-hard, i.e. the problem has no PTAS unless P = NP. This proof was done using a reduction from the maximum 3-bounded 3-dimensional matching problem (3DM-3), which is APX-hard as shown in [11], and it holds even for instances where ad lengths, profits, ratings and frequencies are equal to 1.

Although the problem has no PTAS in general, as described above, we developed PTASs for special instances of the problem:
- instances of the Ads Allocation problem with no assignment restrictions and uniform ad lengths;
- instances with no assignment restrictions, ad rating values taken from a constant number of options, and ad length values that are a power of 2;
- instances with a bounded number of assignment restrictions, i.e. a constant number of profiles, and uniform ad lengths.
All of the PTASs we developed are based on generalizations of the PTAS for the multiple knapsack problem given in [5] and contain two steps: (1) selection of the ads to be assigned and (2) assignment of the selected ads to viewers.


Figure 1: Algorithms' performance for the deterministic version over instances of 100-500 ads (% revenue vs. number of ads for the ProfitPerSec, Backtrack Heuristic and CPLEX algorithms).

Figure 2: Algorithms' performance: average results for the multi-period uncertain version over instances of 1-8 periods (% revenue vs. % uncertainty factor for the Basic, CombinedHalf, CombinedHalfLP and Robust heuristics).

3.2 Experimental Results

Since this is a real-world problem motivated by the industry, the focus of the research has been on developing algorithms that can actually be used and implemented (in contrast to purely theoretical research). Since the Ads Allocation problem is NP-hard, we developed heuristic algorithms, which are commonly used for solving instances of NP-hard problems.

3.2.1 Deterministic Version Results

We developed several heuristic algorithms and evaluated them using simulations. Considering the size of the problem, i.e. thousands of ads and millions of viewers, we could not use an IP (Integer Programming) solver to solve the problem directly. In order to evaluate our heuristics we reduced the size of the problem instances and compared the results to those of a state-of-the-art IP solver, IBM ILOG CPLEX [10]. For the tested instances our heuristics returned results, within a few seconds, which were very close to the upper bound on the optimal value given by the solver (for some instances we could not find the optimal solution using CPLEX even without a runtime limit). Since realistic problem instances are much larger, e.g. millions of viewers, which CPLEX is unable to handle, the heuristic solutions, displaying very good performance, seem to be a good approach. The instances we tested include different ratios of ads per viewers and different kinds of ad profiles, e.g. general ads that are relevant to all viewers vs. specific ads relevant to a unique target population. The heuristics we propose can be split into three categories: payment oriented, target population oriented and backtrack oriented. One of the interesting heuristics we propose is the BacktrackHeuristic algorithm, which takes into account the personalization level and payment of the ads in addition to the backtrack process. Another interesting heuristic is the ProfitPerSec algorithm, a greedy heuristic that prefers ads with a high profit per second. Performance results of these algorithms are presented in Figure 1. Our BacktrackHeuristic outperforms the other heuristics and the IP solver, denoted as CPLEX, and on average attains 98% of the possible revenue. For the full and detailed results for this problem version see [1].
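To illustrate the flavour of the greedy approach, the following Python sketch allocates ads in decreasing profit-per-second order and only books an ad's profit when its full rating requirement can be met. It is a simplified illustration of the ProfitPerSec idea under assumed data shapes, not the authors' exact algorithm.

def greedy_profit_per_sec(ads, capacities, targets):
    # ads:        {ad_id: {"length": sec, "rating": viewers, "freq": airings, "profit": $}}
    # capacities: {viewer_id: remaining viewing capacity in seconds}
    # targets:    {ad_id: set of viewer_ids in the ad's target population}
    remaining = dict(capacities)
    assignment, total_profit = {}, 0
    order = sorted(ads, key=lambda a: ads[a]["profit"] / ads[a]["length"], reverse=True)
    for a in order:
        need = ads[a]["length"] * ads[a]["freq"]  # capacity an assigned viewer must spend
        eligible = [v for v in targets[a] if remaining[v] >= need]
        if len(eligible) < ads[a]["rating"]:
            continue  # all-or-nothing: skip the ad if its rating cannot be met
        chosen = sorted(eligible, key=lambda v: remaining[v], reverse=True)[:ads[a]["rating"]]
        for v in chosen:
            remaining[v] -= need
        assignment[a] = chosen
        total_profit += ads[a]["profit"]
    return total_profit, assignment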

4.

RESEARCH PLAN

During the current year and the upcoming fourth year of research we plan to continue to investigate the problem, develop new algorithms and extend the evaluations. The detailed research plan, described below, includes completed, current and future work. Develop heuristics for the deterministic version of the Ads Allocation problem, and evaluate and compare them to the CPLEX IP solver. Completed and pub- lished (see [1]). Develop heuristics for the multi-period uncertain ver- sion of the Ads Allocation problem and Completed and published (Best winner, see [2]). evaluate them. paper award

4. RESEARCH PLAN

During the current year and the upcoming fourth year of research we plan to continue to investigate the problem, develop new algorithms and extend the evaluations. The detailed research plan, described below, includes completed, current and future work.
Develop heuristics for the deterministic version of the Ads Allocation problem, and evaluate and compare them to the CPLEX IP solver. Completed and published (see [1]).
Develop heuristics for the multi-period uncertain version of the Ads Allocation problem and evaluate them. Completed and published (Best paper award winner, see [2]).

Develop approximation algorithms for the deterministic version of the Ads Allocation problem. Such results provide bounds on the attainable revenue, resulting in guaranteed quality of our heuristics. We already have several theoretical results, including hardness results and polynomial time approximation schemes for special instances of the problem (see Section 3.1). Current work, partially completed.
Extend the evaluation of the multi-period uncertain solutions for the Ads Allocation problem under different types of data environments, for instance by altering the number of ads, the number of viewers, the uncertainty factors, the number of periods, etc. In addition, collect real data regarding viewers' viewing capacities and evaluate the solutions using this data. Current work.
Address the Ads Allocation problem under relaxation of the all-or-nothing rating and frequency constraints. The all-or-nothing constraints, e.g. the request to allocate the ad to the exact number of required different viewers, seem to have a tremendous effect on the problem, while in reality minor violations can be ignored. Current work.
Consider other special constraints of interactive TV, such as past interactions with viewers, currently watched content, user interaction limitations (e.g. a problematic back-channel for participation TV or television commerce services), etc. Future work.

5. RESEARCH CONTRIBUTION

Since the aim of this research is to develop new algorithms that allow optimized ad personalization in the interactive TV environment and other enhanced TV media, its contribution to the interactive TV industry is consequential. This research will allow the industry to maximize revenues from advertising as well as deliver more relevant and interesting advertisements to the viewers. While other studies focus on selecting the ads most suitable to the viewers, this research focuses on optimizing such allocations given the suitable ads for each viewer. As far as we know, there is no other work underway which presents solutions to this problem while taking into account all the special constraints of the TV industry. Along with its contribution to the industry, this work can also be relevant to other domains and industries faced with similar assignment problems, e.g. packing of containers with multiple all-or-nothing constraints. In addition to our heuristic algorithms, we also present theoretical work which contributes to the theoretical investigation of the Generalized Assignment Problem (GAP), the Multiple Knapsack Problem (MKP), and the Multiple Knapsack Problem with Assignment Restrictions (MKAR).

6. REFERENCES

[1] R. Adany, S. Kraus, and F. Ordonez. Personal Advertisement Allocation for Mobile TV. In International Conference on Advances in Mobile Computing & Multimedia, 2009.
[2] R. Adany, S. Kraus, and F. Ordonez. Uncertain Personal Advertisement Allocation for Mobile TV. In International Conference on Advances in Mobile Computing & Multimedia, 2010.
[3] BARB. BARB reports: Monthly total viewing summary. http://www.barb.co.uk, 2010.

[4] T. Bozios, G. Lekakos, V. Skoularidou, and K. Chorianopoulos. Advanced Techniques for Personalized Advertising in a Digital TV Environment: The iMEDIA System. In Proceedings of the eBusiness and eWork Conference, pages 1025-1031, Venice, Italy, 2001.
[5] C. Chekuri and S. Khanna. A PTAS for the multiple knapsack problem. In Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, pages 213-222. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.
[6] M. Dawande, J. Kalagnanam, P. Keskinocak, F. Salman, and R. Ravi. Approximation Algorithms for the Multiple Knapsack Problem with Assignment Restrictions. Journal of Combinatorial Optimization, 4(2):171-186, 2000.
[7] V. Dureau. Addressable advertising on digital television. In Proceedings of the 2nd European conference on interactive television: enhancing the experience, Brighton, UK, March-April 2004.
[8] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York, 1979.
[9] Google AdWords. http://adwords.google.com.
[10] IBM ILOG CPLEX Optimizer. http://ibm.com.
[11] V. Kann. Maximum bounded 3-dimensional matching is MAX SNP-complete. Information Processing Letters, 37(1):27-35, 1991.
[12] G. Kastidou and R. Cohen. An approach for delivering personalized ads in interactive TV customized to both users and advertisers. In Proceedings of the European Conference on Interactive Television, 2006.
[13] E. M. Kim and S. S. Wildman. A deeper look at the economics of advertiser support for television: the implications of consumption-differentiated viewers and ad addressability. Journal of Media Economics, 19:55-79, 2006.
[14] O. E. Kundakcioglu and S. Alizamir. Generalized assignment problem. In C. A. Floudas and P. M. Pardalos, editors, Encyclopedia of Optimization, pages 1153-1162. Springer, 2009.
[15] G. Lekakos and G. Giaglis. A Lifestyle-Based Approach for Delivering Personalized Advertisements in Digital Interactive Television. Journal of Computer-Mediated Communication, 9(2), 2004.
[16] M. Lopez-Nores, J. Pazos-Arias, J. García-Duque, Y. Blanco-Fernandez, M. Martín-Vicente, A. Fernandez-Vilas, M. Ramos-Cabrer, and A. Gil-Solla. MiSPOT: dynamic product placement for digital TV through MPEG-4 processing and semantic reasoning. Knowledge and Information Systems, 22(1):101-128, 2010.
[17] Negev consortium. http://www.negevinitiative.org.
[18] Nielsen Media Research. Snapshot of television use in the U.S. http://nielsen.com, September 2010.
[19] Z. Nutov, I. Beniaminy, and R. Yuster. A (1-1/e)-approximation algorithm for the generalized assignment problem. Operations Research Letters, 34(3):283-288, 2006.
[20] SintecMedia. http://sintecmedia.com.


Video access and interaction based on emotions

Eva Oliveira
LaSIGE, University of Lisbon FCUL, 1749-016 Lisbon, Portugal, IPCA, 4750-117 Arcozelo BCL, Portugal +351 2175000533

eoliveira@ipca.pt

ABSTRACT
Films are by excellence the form of art that exploits our affective, perceptual and intellectual activity. Technological developments and the trends for media convergence are turning video into a dominant and pervasive medium, and online video is becoming a growing entertainment activity on the web and iTV. Alongside, the Human Computer Interaction research community has been using physiologic, brain and behavior measures to study possible ways to identify and use emotions in human-machine interactions. In our work we explore emotional recognition and classification mechanisms in order to provide video classification based on emotions, and to identify each user's emotional states so as to provide different access mechanisms. We also focus on emotional movie access and exploration mechanisms to explore ways to access and visualize videos based on their emotional properties and users' emotions and profiles.


Categories and Subject Descriptors

H.5.1 [Information Interfaces and Presentation (I.7)]: Multimedia Information Systems - video; H.5.2 [Information Interfaces and Presentation (I.7)]: User Interfaces - screen design;

General Terms
Design, Experimentation, Human Factors.

Keywords
Affective computing, Emotion-aware systems, Human-centred design, Psychophysiological measures, Pattern-recognition, Discriminant analysis, Support vector machine classifiers, Movies classification and recommendation.

1.INTRODUCTION
Video growth over the Internet changed the way users search, browse and view video content. Watching movies over the Internet is increasing and becoming a pastime. The possibility of streaming Internet content to the TV, together with advances in video compression techniques and video streaming, has turned this recent modality of watching movies easy and doable. Films are by excellence the form of art that involves affective, perceptual and intellectual activity. Film has been described as a way to transport us to new worlds, lives and fantasies by telling stories [11]. By combining diverse symbol systems, such as pictures, texts, music and narration, video is a very rich media type, often engaging the viewer cognitively and emotionally, and having a great potential in the promotion of emotional experiences. It has been used in different contexts: as a way to capture and show real events, to create and visualize scenarios not observable in reality, to inform, to learn, to tell stories and entertain, and to motivate; as movies, documentaries or short video clips. Isen et al. [9] attested to this potential when she and her colleagues studied the effect of positive affect, induced by ten-minute comedy films, in her patients. The study of films as an emotion induction method has reports dating from 1996, as [12] reported, analyzing the mental operations of film viewers and discussing how emotions guide the motivation of perception and, consequently, the control of our attention by cinematographic narratives.

Emotion studies have been carried out over the last few years, since it has been shown that emotions are fundamental in cognitive and creative processes. In fact, understanding emotions is crucial to understanding motivation, attention or aesthetic phenomena. There is an increasing awareness in the HCI community of the important role of emotion in human computer interactions and interface design, and new mechanisms for the development of interfaces that register and respond to emotions have been studied [3]. Gathering emotional information from users can contribute to creating emotional context in application interfaces. Rosalind Picard in [17] argues that systems that ignore the emotional component of human life are inevitably inferior and incomplete, and she states that systems that provide a proper and useful social and emotional interaction are not science fiction but science fact. Society's relation with technology is changing in such ways that it is predictable that, in the coming years, Human Computer Interaction (HCI) will be dealing with users and computers that can be anywhere and at any time, and this changes interaction perspectives for the future. Human body changes, expressions or emotions would then constitute factors that become naturally included in the design of human computer interactions [2]. There is a wide spectrum of areas that investigate emotions from different, but complementary, perspectives.


For example, in the neurobiological area, [5] showed that emotions play a major role in cognitive and decision-making processes; HCI aims to understand the way users experience interactions and strives to stimulate the sense of pleasure and satisfaction by developing systems that focus on new intelligent ways to react to users' emotions [10]. HCI is also concerned with evaluation and usability, which includes evaluating the extent and accessibility of the system's user interface, assessing a user's experience in interaction and identifying specific problems. The advent of rich multimedia interfaces has been providing new technological foundations to support these emotional trends in HCI. Currently, affective computing systems are being developed that are able to recognize and respond to emotions with the aim of improving the quality of human-computer interaction. Part of this research has concentrated on solving many technical issues involved with emotion recognition technologies. For example, [14] describe their work with sensors in the context of a study on emotional physiological response. According to [1], physiological measures such as galvanic skin response or pupil dilation constitute objective factors but are not easily correlated to particular emotions. Moreover, there are variations in rates which are due to normal individual differences among users, and intrusive wires or sensors may affect users' behaviors. To circumvent this, less intrusive devices were developed [17]. In our work we are developing a novel emotional recognition approach based on pattern recognition techniques, namely discriminant analysis and support vector machine classifiers, which are validated using movie scenes selected to induce emotions ranging from the positive to the negative valence dimension, including happiness, anger, disgust, sadness, and fear. We present the system, iFelt, an interactive web video system designed to learn users' emotional patterns, create emotional profiles for both users and videos, and explore this information to create emotion-based interactions. The iFelt system is composed of two components. The Emotional Recognition and Classification component performs emotional recognition, classification and semantic representation of emotions in order to provide video classification based on emotions, and to identify each user's emotional states so as to provide different search and access mechanisms. The Emotional Movie Access and Exploration component explores ways to access, search, represent and visualize videos based on their emotional properties and users' emotions and profiles.

Considering the effect of emotions on a person's attention, motivation and behavior, a scenario where it would be beneficial to have emotional-impact videos to capture viewers' attention is in educational contexts, where video could capture students' attention in different ways, either to focus or to relax. The induction of emotions using movies has been largely used in psychology studies [8] and in health-related studies. In fact, experimental studies confirmed that positive emotions can have a beneficial effect on physical health [15]. The development of new mechanisms to catalog, find and access movies based on emotions could help to assess videos' emotional impact, and to find movies or scenes that tend to induce a certain feeling in the users. It could also aid filmmakers to perceive the emotional impact of their movies and, in particular, the emotional impact of each scene, compare it to the intention they had for the scene's impact, and relate it to the adoption of specific special effects, acting approaches and settings. Moreover, actors may also be able to perceive their impact in a specific act. Finally, movie consumers may be able to explore movies by the emotions stirred by the content in multiple ways, compare their emotional reactions with other users' reactions and see how they change over time. Another challenge in accessing video is the fact that it conveys a huge amount of audiovisual information that is not structured and that changes along time, and so accessing all the data that a video can provide is often not an easy task. Semantic descriptors, like its emotional properties, either expressed in the movie or felt by the users, can be used to tag some information of the video. And once this information is collected, we can try to use it for a better and more meaningful organization of the individual and collective video spaces, to search, and even to provide new forms of visualization and interaction [7, 18]. Visualization techniques, which emerged from research rooted primarily in visual perception and cognition [4], can actually help to handle the complexity and express the richness in these information spaces. Video visualization can be an intuitive and effective way to convey meaningful information in video [18]. These issues can be synthesized in the following problem statement addressed in this work: emotional classification based on physiologic information acquired from users when watching films improves the relevance of movie search retrieval, contributes to enrich movie recommender systems and enables the design of emotion-aware user interfaces for movie visualization by adapting their structure, their visualization elements and tools.

2.PROBLEM
Emotions can be expressed in a variety of ways, such as body expressions (facial, vocal, body posture) or neurophysiologic symptoms (respiration, heart rate, galvanic skin response and blood pressure). Accordingly, the Human Computer Interaction research community has been using physiologic, brain and behavior measures to study possible ways to identify and use emotions in human-machine interactions [10]. However, there are still challenges in the recognition processes, regarding the effectiveness of the mechanisms used to induce emotions. Induction is the process through which people are guided to feel one or more specific emotions, which provokes body reactions. Some relevant works showed that films are one of the most effective ways to induce emotions. J. Gross et al. [6] tried to find as many films as possible to elicit discrete emotions and find the best films for each discrete emotion. In 1996, a research group [12] tested eleven induction methods and concluded that films are the best method to elicit emotions (both positive and negative). The exploration of movies by their emotional dimensions can be used for entertainment, education or even medical purposes.

3. PHD OBJECTIVES
In order to address the issues identified in the problem statement presented above, and focusing on the research questions that emerged, four main goals were defined for this thesis, more specifically to:
Improve movie search mechanisms, making information retrieval more relevant through the use of emotional profiles from users;
Enrich recommender systems by adding emotional information, allowing movie suggestions based upon users' emotional profiles and movies' emotional profiles;
Access and visualize videos based on emotional characteristics of videos;
Adapt user interface aspects with emotional awareness features that allow controlling the movie sequence visualization, the complementary information available and even the way in which the collected information is displayed.
To address these objectives we are going to follow the methodology described in the next section.

4.METHODOLOGY AND PHD CONTRIBUTIONS

The methodology used to develop this work was the following. First, an extensive research literature review was performed so as to understand the role of emotions in the context of affective computing, with the precise objective of identifying the importance of emotional approaches in Human Computer Interaction, Multimedia Information Retrieval and Video Processing, along with the clarification of associated problems and limitations. Background literature on emotional theories, emotion recognition, biosignal processing, classification techniques, video analysis and low-level feature extraction, emotional design and recommendation systems was also covered with the objective of providing a framework of key concepts and technologies on which to base the design of a system architecture that addresses the emotional classification of users and movies as well as its representation and access. We then developed an interactive web video application, iFelt, designed to learn users' emotional patterns using movie scenes selected to induce emotions. The iFelt system has two main components, aimed at emotional movie Content Classification and emotional movie Access and Exploration. This last component aims to provide video access and visualization based on the movies' emotional properties and the users' emotions and profiles. We are designing different methods to access and watch the movies, at the levels of the whole movie collection and of the individual movies. The first prototype is focused on access based on the emotions felt by the user, to explore and evaluate the emotional paradigm, on top of which we will later add the other perspectives. The design options are thoroughly addressed in [13]. Next we present our main contributions so far, based on the problems stated before.

4.1.Emotional Recognition and Classification

Our emotional recognition and classification component is grounded in the induction of emotional states by having users watch movie scenes. The recognition process included two important phases, the training and the testing phases. Inspired by the work of [16], we based our testing phase on the induction of emotion using a set of emotional movie scenes. This component can be divided into two main modules, the Biosignal Recording module and the Pattern Recognition module. Biosignal Recording uses biosensors for measuring Galvanic Skin Response (GSR), Respiration (Resp) and Electrocardiogram (ECG), and is responsible for recording the users' biosignals and for the signal processing pipeline. These sensors were specifically chosen as they record the physiological responses of emotion, as controlled by the autonomic nervous system. The Pattern Recognition module uses discriminant analysis, support vector machine and k-NN classifiers to analyze the physiological data, and it was validated by the usage of specific movie scenes selected to induce particular emotions. Our objective was to determine whether our classification engine is sufficiently accurate to automatically recognize emotional patterns from new data with a reasonable success rate. Another goal was to determine if the selected scenes had the same emotional impact on all the users, in order to measure the importance of the scene for eliciting a specific emotion. Eight participants, with an average age of 34 years, took part in the experiment. In our study we are using the subjects' data obtained while watching movie scenes to create an engine to support user interaction, and to enhance automatic recognition of users' emotional states. We selected a set of movie scenes to induce subjects to feel five basic emotions (happiness, sadness, anger, fear and disgust) and the neutral one. Every subject watched 16 scenes (four of happiness, four of sadness, four of fear, two of disgust and two of anger) and one neutral scene. Based on their feedback, we associated the captured physiological signals with emotional labels, and trained our engine. Eight movies were randomly chosen from the total pool of 30. An average of two subjects watched these eight movies, which were classified by the system. With the SVM classifier, the overall average recognition rate is 69% (s.d. 5.0%), which represents a 49% improvement over random choice, whereas the k-NN classifier produced an overall average recognition rate of 47% (s.d. 9.3%). The SVM classification score shows promise that the iFelt system can be used to automatically evaluate human emotions.
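The classification stage can be sketched as follows, assuming feature vectors have already been extracted from the GSR, respiration and ECG recordings of each scene. This is an illustrative scikit-learn sketch, not the authors' implementation; the feature set, kernel and parameter choices are assumptions, and the discriminant analysis step used in the actual work is omitted.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "neutral"]

def evaluate_classifiers(X, y, folds=5):
    """X: (n_scenes, n_features) biosignal features; y: emotion label per scene.
    Returns the cross-validated recognition rate of an SVM and a k-NN classifier."""
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
    return {
        "SVM": cross_val_score(svm, X, y, cv=folds).mean(),
        "k-NN": cross_val_score(knn, X, y, cv=folds).mean(),
    }

# Toy call with synthetic data, only to show the expected input shapes.
X = np.random.rand(60, 12)          # 60 labelled scenes, 12 features each
y = np.array(EMOTIONS * 10)         # balanced toy labels, 10 scenes per emotion
print(evaluate_classifiers(X, y))
```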

4.2.Emotional Movie Access and Exploration

iFelt is an interactive web video application that allows users to catalog, access, explore and visualize emotional information about movies. It is being designed to explore the affective dimensions of movies in terms of their properties and in accordance with users' emotional profiles, choices and states. Although iFelt supports any kind of video, we are focusing our analysis on movies. The iFelt system has two main goals: 1) Emotional Movie content classification: to provide video classification based on emotions, either expressed in the movies or felt by the users; 2) Emotional Movie access and exploration: to access and visualize videos based on their emotional properties and users' emotions and profiles. In iFelt, we created different levels to access and explore movies: 1) the movies space, where users get a view over the movies existing in the system, with information about their dominant emotions. We designed different representations, including movie lists and emotional wheels, where the movies are represented by a colored circle, with their dominant emotion color, in ways that represent the level of emotion dominance in each movie; 2) the emotional scenes space, where users can obtain a view of the scenes of the movies based on the scenes' dominant emotions, allowing, for example, access to the individual movies while presenting only the scenes with the selected emotion, as emotional summaries of the movies; 3) the individual movie level, where the movie can be watched and, in addition, information about its dominant emotions and emotional scenes can be viewed, for example through an emotional timeline that represents the emotional scenes along the movie; and 4) users have an emotional profile, with emotional information about their movies, that is, movies classified from their own perspective or view, and statistical information concerning their history, in terms of the emotional classification of the movies they watched. The Emotional Movie Access and Exploration component is thoroughly described in [13].
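As a purely illustrative sketch (not the actual iFelt implementation), the dominant emotion of a movie and a simple emotional summary could be derived from per-scene labels along these lines; the scene representation as (start, end, emotion) tuples is an assumption.

```python
from collections import Counter

def dominant_emotion(scenes):
    """scenes: list of (start_s, end_s, emotion). Weight each emotion by the
    total screen time of its scenes and return the most dominant one."""
    weights = Counter()
    for start, end, emotion in scenes:
        weights[emotion] += end - start
    return weights.most_common(1)[0][0]

def emotional_summary(scenes, wanted):
    """Keep only the scenes of one chosen emotion, e.g. to build an emotional
    summary of a movie showing only its 'fear' scenes."""
    return [s for s in scenes if s[2] == wanted]

scenes = [(0, 120, "happiness"), (120, 300, "fear"),
          (300, 420, "fear"), (420, 600, "sadness")]
print(dominant_emotion(scenes))            # -> 'fear' (300 s of screen time)
print(emotional_summary(scenes, "fear"))   # -> the two fear scenes
```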


5.FUTURE WORK AND PHD JUSTIFICATION

Our ongoing research intends to support real-time classification of discrete emotional states from biosignals, for multimedia content classification and user interaction mechanisms, by developing emotion-aware applications that react in accordance with users' emotions.


We are considering using emotion recognition to automatically create emotional scenes, recommend movies based on the emotional state of the user, and adjust interfaces according to users' emotions and based on emotional regulation theories. By creating emotional profiles for both movies and users, we are developing new ways of discovering interesting emotional information in unknown or unseen movies, comparing reactions to the same movies among other users, comparing directors' intentions with the effective impact on users, and analyzing our reactions or directors' tendencies over time. Regarding visual exploration and access mechanisms for emotional information, the next step would be to improve and extend the system in accordance with users' feedback, our own evaluation of the current design and implementation, and some of the ideas we originally had that were not yet included in the current version. Some of the future features include: extending the concept of video summaries to present movies in chosen emotional perspectives and preferences, with more criteria other than selecting scenes with one chosen emotion; summarizing, searching or recommending movies based on users' current emotional states or defined emotional criteria; finding movies by example, i.e. with emotional timelines similar to the timeline of a given movie; exploring the visual representation of huge amounts of movies and extending selection and browsing methods based on more sophisticated and powerful filters and searches; and including support for historical emotional information gathered along time, so we can witness the evolution of users' emotional reactions to movies over time and compare it to other perspectives, including those of the actors and directors involved, in the several movie genres. We also intend to make all this information more available, or visible, on the web as a shared and recommender environment based on the emotional classification of movies, useful for the general public as well as for the more professional perspectives of directors and actors. Finally, iFelt is currently focused on movies and the web environment, but this same approach can be useful and interesting to explore with other types of videos, as is the case of advertisement videos, which typically aim at specific emotional reactions from the viewers, and of interactive TV and video-on-demand services. The core functional and interface features could be the same, but some new requirements in these contexts might involve some adaptations or extensions.


6.REFERENCES
[1] Axelrod, L., Hone, K.S. Affectemes and all affects: a novel approach to coding user emotional expression during interactive experiences. Behaviour & Information Technology, 25(2), March-April, 159-173 (2006).
[2] Being Human: Human-Computer Interaction in the Year 2020. http://research.microsoft.com/hci2020/, 2007.
[3] Brave, S., Nass, C., & Hutchinson, K. 2005. Computers that care: investigating the effects of orientation of emotion exhibited by an embodied computer agent. International Journal of Human-Computer Studies, 62(2), 161-178.
[4] Card, S.K., Mackinlay, J.D., and Shneiderman, B. 1999. Readings in Information Visualization: Using Vision to Think. San Francisco, California: Morgan Kaufmann.
[5] Damasio, A. (1995). Descartes' Error. Harper Perennial.
[6] Gross, J. J. & Levenson, R. W. (1995). Emotion elicitation using films. Cognition & Emotion, 9(1), 87-108.
[7] Hauptmann, A. G. 2005. Lessons for the Future from a Decade of Informedia Video Analysis Research. Int. Conf. on Image and Video Retrieval, National Univ. of Singapore, Singapore, July 20-22. LNCS, vol. 3568, pp. 1-10.
[8] Huppert, F. 2006. Positive emotions and cognition: developmental, neuroscience and health perspectives. In Forgas, J.P. (Ed.), Hearts and Minds: Affective Influences on Social Cognition and Behavior. Psychology Press, New York.
[9] Isen, A. M., Daubman, K. A., and Nowicki, G. P. (1987). Positive affect facilitates creative problem solving. Journal of Personality and Social Psychology, 52:1122-1131.
[10] Maaoui, C., Pruski, A., & Abdat, F. 2008. Emotion recognition for human-machine communication. 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, 1210-1215.
[11] Metz, C. & Taylor, M. 1991. Film Language: A Semiotics of the Cinema. University of Chicago Press.
[12] Münsterberg, H. (1970). The Film: A Psychological Study: The Silent Photoplay in 1916. Dover Publications.
[13] Oliveira, E., Martins, P., Chambel, T. 2011. iFelt - Accessing Movies Through Our Emotions. In Proceedings of EuroITV'2011, 9th European Conference on Interactive TV and Video, ACM SIGWEB, SIGMM & SIGCHI, Lisbon, Portugal, Jun 29-Jul 1, 2011.
[14] Peter, C., & Herbon, A. (2006). Emotion representation and physiology assignments in digital systems. Interacting with Computers, 18(2), 139-170.
[15] Philippot, P., Baeyens, C., Douilliez, C., & Francart, B. (2004). Cognitive regulation of emotion: Application to clinical disorders. In P. Philippot & R.S. Feldman (Eds.), The Regulation of Emotion. New York: Lawrence Erlbaum Associates.
[16] Picard, R. W., Vyzas, E., & Healey, J. 2001. Toward Machine Emotional Intelligence: Analysis of Affective Physiological State. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1175-1191.
[17] Picard, R.W. 1997. Affective Computing. MIT Press, Cambridge, MA.
[18] Chambel, T., Rocha, T., and Martinho, J. "Creative Visualization and Exploration of Video Spaces". In Proceedings of Artech'2010: Envisioning Digital Spaces, 5th International Conference on Digital Arts, pp. 1-10, Guimarães, Portugal, Apr 22-23, 2010.


Posters


Online iTV use by older people: preliminary findings of a rapid ethnographical study
*Susan Ferreira, **Sergio Sayago, *Valeria Righi, *Guiller Maln, *Josep Blat
*Interactive Technologies Group, Universitat Pompeu Fabra, Roc Boronat, 138 (Barcelona, Spain)

(susanferreira, righi.vale, gmalon)@gmail.com, josepblat@upf.es

**Digital Media Access Group, School of Computing, University of Dundee, DD1 4HN (Dundee, Scotland)

sergiosayago@computing.dundee.ac.uk

ABSTRACT
This poster presents preliminary findings of a rapid ethnographical study of online iTV use by some 40 older people over 2 months. Whereas some research has addressed iTV accessibility for older people, online iTV has largely been overlooked. This paper presents some key issues of online iTV use by ordinary older people who are motivated towards technology uptake, their attitudes towards interacting with iTV on the traditional TV, and a number of issues related to ICT use which can inspire the design of more enriching and inclusive online iTV services.

Categories and Subject Descriptors


H.5.2 [Information Interfaces and Presentation]: User Interfaces - User-centered design; K.4.2 [Computers and Society]: Social Issues - Assistive technologies for persons with disabilities;

General Terms
Design, Human Factors.

Keywords
iTV, older people, accessibility, ethnography

1. INTRODUCTION
The traditional model of watching TV is changing. Today, people can interact with and create TV content almost anywhere. Online iTV (interactive TV on the web) allows us to share and produce content, reinforce communication and personalize information and services. We argue in this paper that exploring online iTV with older people (60+) is worthwhile. Online iTV services can and should be useful to older people, who consume a lot of TV. However, and despite an increasing ageing population, they run the risk of missing out on these opportunities if online iTV is not accessible to them. Most of today's older people lack experience with digital technologies. Furthermore, these technologies have largely been designed without taking older people into account. Thus, there is a need to further online iTV research with older people. A number of previous studies have addressed iTV accessibility with older people (0, 0, 0). These have focused on developing novel and interesting software prototypes. For instance, 0 describes a system aimed at sharing multimedia content and 0 discusses a tool intended to give support to communication. Other studies have explored through interviews the reasons why iTV is unappealing to older people 0. Whereas the interaction barriers (e.g. difficulties using the mouse, understanding computer jargon) that older people face while using other digital technologies have been explored before (0, 0), very little is known about those they encounter when interacting with iTV. 0 points out that significant work should still be done to design interfaces that are more usable by older people and better support their skills and abilities. Moreover, none of the studies reviewed above has addressed online iTV, despite its proliferation. Examples are the BBC iPlayer in the UK 0 and TV3 A la Carta of TVC 0 in Catalonia. There is also a lack of information about how older people use (or would use) online iTV in out-of-laboratory conditions. Most of the studies reviewed above have been conducted in laboratory conditions, which concurs with the main approach adopted in HCI research with older people to date 0. Yet, there is growing awareness in HCI that understanding interactions as they happen in everyday environments is a crucial element in designing better, and therefore more accessible, interactions 0. An extended ethnographical study of technology use by older people revealed, for instance, that accessibility issues due to cognition limit older people's interactions with digital technologies in out-of-laboratory conditions more seriously than those due to vision 0. This paper presents the preliminary results of a rapid ethnographical study 0 of iTV use by older people. This work is being conducted within Life 2.0 0, a research project aimed at making the network of social interactions more visible to older people. This will be done by providing them with an accessible platform consisting of collaborative ICT that track and locate relevant members of their social networks (i.e. relatives, friends and caregivers).


The Life 2.0 platform will allow older people and their social networks to communicate amongst themselves through phone calls, text messages, advanced multimedia content distribution systems (e.g. IPTV, interactive digital signage and WebTV) and video telephony/conference solutions. The preliminary results indicate that our participants are interested in online iTV, especially in re-watching their favorite programs and watching those TV programs they missed. They are also keen to recommend TV programs to important members of their social circles. Interestingly, our participants do not have any interest in writing comments related to TV programs, despite the popularity of comments in online iTV and, in general, Web 2.0. The results also indicate some interaction and social issues that should be considered in the design and evaluation of future iTV services, namely privacy and social exclusion. The remainder of the paper is organized as follows. Section 2 describes the rapid ethnographical study. Section 3 presents the initial results of this study. Section 4 discusses the results and the research approach. Section 5 describes our ongoing and future research activities.

Table 1 - Ongoing fieldwork

Activity | Description | Technology | Participants | Duration
Workshop on Google Maps | Hands-on introduction to Google Maps, collaborative map | Google Maps | 12 (6 men / 6 women) | 2 sessions, 2-hour session / week
Workshop on weblogs | Hands-on introduction to blogs, creation of a blog with Blogger | Blogger | 11 (5 men / 6 women) | 2 sessions, 2-hour session / week
Workshop TV on the Web | Hands-on introduction to TV channels' video-on-demand web pages and YouTube | Internet Explorer, Mozilla Firefox | 11 (6 men / 5 women) | 1 session, 2-hour
Workshop Facebook | Hands-on introduction to Facebook | Facebook | 9 (6 men / 3 women) | 1 session, 2-hour
Participatory Design Workshop | Discussion about collaborative maps and blogs | Map paper prototype and some other images and text | 10 (5 men / 5 women) | 1 session, 2-hour
Course on Gardens of the World | Download and edit pictures from the web about gardens; create/share documents (e.g. calendar, PowerPoint) | Internet Explorer, Mozilla Firefox, MS Office tools, picture-editing tools, e-mail | 9 (4 men / 5 women), 11 (6 men / 5 women), 9 (4 men / 5 women), 9 (4 men / 5 women) | 4 sessions, 2-hour session / week
Course on Wild Life and Nature | Idem | Idem | 13 (4 men / 9 women), 11 (4 men / 7 women) | 2 sessions, 2-hour session / week

2. RAPID ETHNOGRAPHICAL STUDY
2.1. Context: Àgora


We have conducted this study in Àgora, a 20-year-old association in Barcelona, which intends to integrate into Catalan society people who are, or might be, excluded from it, e.g. immigrants, non-educated and older people. Àgora considers that mastering digital technologies is a crucial aspect in achieving this inclusion. Thus, courses in computing, Internet access and workshops are provided. These and other activities are free of charge, and participants, which is the term used by Àgora to reinforce the inclusion aspect of their work, decide what technologies they want to (learn to) use. This decision is often grounded in the participants' daily needs or interests.


2.2. Participants, iTV and research methods

We have conducted 27 hours of fieldwork over a 2-month period. The fieldwork activities consisted of in-situ observations of, and informal conversations with, around 40 older people (aged 60-75) while using several digital technologies. We ran 5 workshops, in which we explored technologies relevant for Life 2.0, such as online iTV (YouTube and on-demand Spanish and Catalan TV channels), Google Maps, weblogs and Facebook. We also participated in 2 courses, which were organized by Àgora as part of their activities to foster the use of digital technologies amongst the older population and had no specific connection with Life 2.0, in order to develop a more comprehensive understanding of older people's interactions with digital technologies. The fieldwork was conducted in Àgora's computer room. Our participants can be considered a heterogeneous user group. They originated from different Spanish and Catalan regions, and had different educational levels (ranging from primary to secondary school). In terms of computer skills, 27 were familiar with basic and more advanced aspects of interacting with computers, such as when to left- or right-click and how to look for information online. Table 1 summarizes the fieldwork activities.

We have recorded fieldnotes by using paper and pencil, and photographs. Our participants wrote down their notes using notebooks and were used to other people in Àgora doing the same. Thus, laptops and video cameras were intrusive. Also, there are no laptops in Àgora's computer room, and our participants are not used to being videoed during their everyday interactions with them. We have analyzed the fieldnotes by using Grounded Theory 0, i.e. while gathering the data. We have conducted initial, axial and selective coding. We discuss the initial results next.

3. PRELIMINARY RESULTS
We first discuss some aspects of how older people use or would use online iTV. We then deal with their attitudes towards watching online iTV on the traditional TV. We also address other aspects related to interacting with online iTV, such as privacy and social inclusion, which emerged from the analysis and which we consider crucial to better understanding and designing online iTV services with older people.

3.1. Using online iTV

0 analyses the data gathered in this ethnographical study in terms of the potential of geo-positioning services based on ICT for social inclusion amidst older people and their social circles.


Whereas none of our participants had used online iTV before, all of them were eager to use it. They showed interest in sharing videos with people they knew, namely their children, grandchildren and close friends, who are also key actors in the use of e-mail by older people 0. Our participants were also interested in the possibility of either re-watching TV programs or watching those they missed. We observed that the participants were keen to share videos by e-mail. This is probably because all of them send and receive e-mails. However, no participant was interested in either commenting on or rating videos. We found a similar result in a previous study of YouTube we conducted with another group of older people 0. In both studies, older people reported not being interested in the opinions of other people, and their strategy for commenting and rating is likely to be sending an e-mail to their children, grandchildren and close friends. Part of our future work is to explore this issue in detail. We observed, and participants reported, that websites such as TVE a la carta 0, TV3 a la carta 0 and BTV a la carta 0 were fairly easy to use. Despite the considerable amount of information presented on these sites, each participant searched for his or her favorite TV programs independently (i.e. without relying on us). They were much more dependent (i.e. relying on us) when conducting other activities, such as downloading an attachment received in an e-mail, in which we consider they deal with a much smaller amount of information. Several factors might account for this interesting finding, which highlights the relevance of cognition in agreement with 0, such as familiarity with the task at hand, or the desktop metaphor (difficult to understand) and the online iTV metaphors (similar to TV magazines). We will explore this result further in our future research.

Concurring with previous studies of ICT and ageing (e.g. 0, 0, [13]), the family is very important in the use our participants make of online iTV services. This suggests that an online iTV channel "with my relatives", or other forms of communication with them mediated by online iTV, can encourage the uptake and use of online iTV by older people, as well as reduce social exclusion. All our participants considered that using digital technologies was crucial to being included in current society. Thus, although numerous older people are not motivated to use ICT, the effort our participants make to use them should be understood as an opportunity to design better iTV services for them (and all of us).

4. DISCUSSION
We have addressed online iTV with a heterogeneous group of older people in an attempt to improve current understanding of iTV with them and of their interactions with digital technologies. The preliminary results are rich, as they have shown expected and unexpected findings and dealt with a broad number of issues. For instance, whereas re-watching favorite TV programs can hold true for the use of online iTV by other user groups, our participants seem to have their own strategy for rating and commenting on online iTV content, and this strategy is not related to writing comments or clicking on "I like it" on the website. We have also identified their changing and positive attitudes towards iTV, and the importance of privacy, the family and social inclusion in understanding their interactions with iTV (and other technologies) and designing better ones for them. We have adopted a research approach which has seldom been used in previous studies of iTV (and HCI in general) with older people: rapid ethnography 0. Although the results are preliminary, our first experiences of recording in-situ observations of, and conversations with, the participants while using the technology in out-of-laboratory conditions suggest that the method has great potential to further our understanding of older people as iTV users. Whereas it is common to include extracts of fieldnotes in ethnographical studies, we have not included any because we feel much more research is needed to expand, confirm or reject our initial ideas. At this stage of our research, we have not made any comparison between younger and older people's interactions with online iTV, because we consider we need to understand the current gap in iTV research with older people much more deeply and work with more participants in order to make valid, significant and useful comparisons. Finally, let us also note that whereas the number of participants who took part in activities related to iTV during our fieldwork can be regarded as small, and we need to work with more participants in our future work, observing and talking with them and others while using different digital technologies has allowed us to start to identify and understand interaction issues which are common across technologies.

3.2. Attitudes towards watching online iTV on the TV

As there is a growing tendency towards accessing the Internet through traditional TVs, we decided to explore the attitudes of our participants towards interacting with online iTV using their own TVs. At first, they found it difficult to imagine an online TV. The computer was the device for doing online activities. However, after having had some contact with online iTV through computers, all participants showed a big interest in doing the same through their TVs. As stated earlier, they were keen to re-watch their favorite TV programs or watch those they missed. It is worth noting that, rather than being afraid of digital technologies, our participants wanted to explore what they could do with them. (During a session of the Course on Gardens of the World, for instance, participants were interested in knowing how to display the presentations they create with MS PowerPoint on the TV, to show them when people paid them a visit.)

3.3. Privacy, family and social inclusion

Our participants are worried about their privacy when they go online. Most of the participants do not use Facebook because they do not relish the idea of letting unknown people read their messages or personal information. However, they were interested in showing other participants, their children and grandchildren what they do in Àgora and in sharing with them (e.g. by e-mail) information which can be regarded as personal (e.g. a presentation with photos of their grandchildren). Privacy is important, independent of the technology, and strongly connected with who should read what.

5. NEXT STEPS
We are gathering more ethnographical data. We expect to combine informal conversations with more structured interviews and focus groups in order to deepen and widen our first-hand observations and in-situ conversations. We are also currently analyzing the diaries filled in by our participants. This analysis should help us cover more activities and gather more quantitative data (e.g. frequency of TV use). We plan to conduct many more activities (e.g. workshops) related to iTV so that we can explore further the use of several iTV services, platforms and technologies.



We will also design quantitative studies to understand the effect of observational and conversational data on the interactions of older and younger people with the prototypes we will design.


6. ACKNOWLEDGMENTS
This work has received the support from the Ministry of Foreign Affairs and Cooperation and the Spanish Agency for International Development Cooperation (MAEC-AECID), and the Commission for Universities and Research of the Ministry of Innovation, Universities and Enterprise of the Autonomous Government of Catalonia. We are indebted to our participants and gora for their participation in our research, and our colleagues at the Interactive Technologies Group for their support and feedback. We also thank the reviewers of this paper for their comments and suggestions.

7. REFERENCES
Arias, J. 2006. Diseño y evaluación de la interfaz de usuario de un buscador de vídeos sencillo para personas mayores. MSc degree project. Universitat Pompeu Fabra.
Barcelona Televisió. http://www.btv.cat/alacarta/. Last accessed on 27-Feb-11.
British Broadcasting Corporation (BBC) Television. http://www.bbc.co.uk/iplayer/tv. Last accessed on 18-Apr-2011.
Charmaz, K., Mitchell, R.G. 2007. Grounded theory in ethnography. In: Atkinson, P., Coffey, A., Delamont, S., Lofland, J., Lofland, L. (Eds.), Handbook of Ethnography. SAGE Publications, London, 160-175.
Dickinson, A., Newell, A. F., Smith, M. J., & Hill, R. L. 2005. Introducing the Internet to the over-60s: Developing an email system for older novice computer users. Interacting with Computers, 6(17), 621-642.
Kurniawan, S. 2007. Older Women and Digital TV: A Case Study. ASSETS'07, 251-252. Tempe, Arizona.
Life 2.0: Geographical positioning services to support independent living and social interaction of elderly people (CIP ICT PSP 20094270965). http://www.life2project.eu/. Last accessed on 1-Feb-2011.
Millen, D. 2000. Rapid Ethnography: Time Deepening Strategies for HCI Field Research. In DIS 2000, New York, 280-286.
Moggridge, B. 2007. Designing Interactions. Cambridge, MA. The MIT Press.
Radio Televisión Española. http://www.rtve.es/alacarta/. Last accessed on 27-Feb-11.
Rice, M., & Alm, N. 2008. Designing New Interfaces for Digital Interactive Television Usable by Older Adults. Computers in Entertainment (CIE) - Social television and user interaction, Volume 6, Issue 1, January 2008.
Righi, V., Maln, G., Ferreira, S., Sayago, S., & Blat, J. 2011. Preliminary findings of an ethnographical research on designing accessible geolocated services with older people. 14th International Conference on Human-Computer Interaction. Orlando, USA. Accepted for publication.
Sayago, S., & Blat, J. 2009. About the relevance of accessibility barriers in the everyday interactions of older people with the web. W4A 2009 - Technical, 104-113. Madrid, Spain.
Sayago, S., & Blat, J. 2010. Telling the story of older people emailing: An ethnographical study. International Journal of Human-Computer Studies, Volume 68, Issues 1-2, January-February, 105-120.
Svensson, M., & Sokoler, T. 2008. Ticket-to-Talk-Television: Designing for the circumstantial nature of everyday social interaction. Proceedings of the 5th Nordic conference on Human-computer interaction: building bridges, October 20-22, Lund, Sweden.
Televisió de Catalunya. http://www.tv3.cat/videos. Last accessed on 27-Feb-11.


Multipleye Concurrent Information Delivery on Public Displays

Morin Ostkamp
Münster University of Applied Sciences Stegerwaldstr. 39 48565 Steinfurt, Germany

Gernot Bauer
Münster University of Applied Sciences Stegerwaldstr. 39 48565 Steinfurt, Germany

morin.ostkamp@fh-muenster.de

ABSTRACT
The visible area of a monitor is often called the display's visual real estate. On many contemporary desktop systems, this visual real estate is subdivided into small units, each showing different types of information independently (e.g. time, user name, network status). Though there may be many independent units on display in parallel, each of them is most commonly used to transport only one piece of information at a time and is dedicated to one particular user: the visual real estate occupied by a web browser cannot be used concurrently by a word processor. This imposes a constraint on the amount of information delivered by the medium: a public display can satisfy people's curiosity only one by one, but not simultaneously without resizing the used visual real estate. Thus, people are compelled to wait in front of such displays until the information they individually desire is shown. This loss of time is often an annoyance to the viewers, since they could have spent this dwell-time on other, more meaningful things. In a project called Multipleye we try to discover ways of multiplexing information visually, e.g. by frequency-, space-, time-, and code-division multiplexing. According to the viewer's choice, a mobile app demultiplexes the individual information from the multiplexed image. The increased amount of information transmitted per time can reduce unnecessary dwell-time in front of public displays. This paper presents a demonstrator based on frequency-division multiplexing, discusses first results and proposes further work.

gernot.bauer@fh-muenster.de

1. INTRODUCTION & RELATED WORK

There is a substantial body of research on the usability of public displays today. Some of the work focuses on how the visual representation of information can be optimized when perceived by more than one person at a time. In [5] Izadi et al. present Dynamo, a system allowing two or more persons to work on a particular task by using a communal interactive interface. However, this approach only focuses on explicit user interaction with dedicated, spatially divided screen areas. Kray et al. investigate people's preferences for logically subdivided displays with the Hermes and GAUDI system in [6]. Their approach is, however, not intended to work with more than two pieces of information at a time. A similar strategy is used by Linden et al. for the UBI-hotspot system as presented in [8]. This outdoor system can be used by multiple passers-by while each of them explicitly interacts with a dedicated display area. Thus, the size of the usable display area shrinks with each new user. Peltonen et al. describe CityWall [10], which also allows passers-by to interact with an outside display. But their approach does not provide any means to protect the visual real estate of one user from others. Vogel and Balakrishnan introduce the Interactive Public Ambient Display in [12]. Their system allows users to access information on public displays while the employed visual real estate can still be used by other viewers because of partial transparency. However, this way the number of parallel information items is quite limited and the resulting image may become cluttered and confusing, since different contents may overlap. In [9] Oliver et al. demonstrate how different devices can be orchestrated to implement a crossmodal public-private display. Their proposed CrossFlow system gives navigational directions to multiple users by using time-division multiplexing and a so-called crossmodal cue (e.g. the vibration of the user's mobile phone). Nevertheless, this multiplexing approach does not really deliver multiple information items in parallel. Instead, it presents each item of information in separate time slots of 0.8 seconds. Due to their interaction concept, all of the above mentioned results can be applied to Computer Supported Collaborative Work (CSCW) environments. However, there is little work on shared use of public displays without direct user interaction. An approach without direct user interaction could be of interest for public spaces such as government agencies, waiting areas, or airports. At such locations, public displays usually show a linear playout of predefined content. In most cases, the viewer cannot influence the presentation, but only perceive it.

Categories and Subject Descriptors


H.5.2 [Information Interfaces and Presentation]: User Interfaces

General Terms
Design

Keywords
public display, multiplexing, dwell-time, channel capacity

If the information appears to be irrelevant to the viewer, she is likely to divert her attention from the screen. The viewer as well as the display operator strive to avoid this irrelevance, for it means a waste of the viewer's time and a waste of the operator's resources. An improvement would be to allow the viewer to switch contents as easily as switching TV channels. However, due to varying interests, there may be some potential for conflicts when picking the TV channel. To avoid such a clash of interests in public spaces, this calls for special means of content selection and distribution. Multipleye is exactly about this: a system capable of multiplexing and demultiplexing information visually, thus increasing the amount of information delivered simultaneously (see Figure 1). In telephone engineering, multiplexing is used to transmit scores of phone calls through one wire at the same time. In a similar way, we try to transport more than one piece of information at once using the same public display. The project's name is a neologism of multiplexer and eye, the latter alluding to our visual approach to multiplexing.

Figure 1: The basic idea of Multipleye: a multiplexed image and the demultiplexed images recovered from it.

2.

IDEA

Our aim is to allow more than one viewer at a time to use the public display's visual real estate for information retrieval, and thus to reduce the dwell-time a viewer has to spend waiting until her individually desired information is shown on the public display. The approach pursued in this paper is to increase the amount of information per time on public displays. Beforehand, it is necessary to define how this amount could be measured and named. Possible well-known terms are bit, bandwidth, and channel capacity. Their eligibility is discussed in the first part of this section.

The unit bit is often used to describe an amount of information (e.g. a JPEG image could be 300,000 bit in size). However, this technical definition of the term information differs from the common human understanding of a piece of information (e.g. the delayed arrival of a flight) and there is no sound conversion between them. Additionally, a bit does not state anything about the time it takes to transfer that information. Besides this definition, bit is also used to specify the uncertainty contained in a piece of information, which is based on Shannon's work on information theory [11]. A possible goal could be to push this value as close to zero as possible, which would mean there is no uncertainty about the public display's content. But again, since it does not provide any information about the consumed time, it is not applicable in this context.

In its original meaning, bandwidth is defined as the difference between the upper and lower frequencies in a contiguous set of frequencies, measured in Hertz (Hz). In the context of computer networks, it is often mistakenly used to quantify the speed of the information transfer instead of the proper term bit rate, like kbit/s or Mbit/s. Since measuring the delivered information in terms of bits turned out to be inappropriate in the previous paragraph, a related but more general term is to be sought.

The channel capacity expresses the amount of information that can be transferred faultlessly over a communication channel within a given time. In this context, the information could be the departure time of a flight; the communication channel would be visual light; the specified amount of time could be the time it usually takes an adult to process an image; and faultlessly would mean that the demultiplexed information can be perceived with accuracy, completeness, and consistency as defined by Wand and Wang in [13].

In many technical domains, a common method to transmit multiple pieces of information over the same medium is to use multiplexing. A couple of strategies can be employed: frequency-division (FDM), space-division (SDM), time-division (TDM), code-division (CDM), and polarization (see [1, 2, 14]). As a start, the Multipleye project focuses on frequency-division multiplexing, the results being discussed in the remainder of this paper.
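For reference, the technical sense of channel capacity mentioned above is usually quantified, for a band-limited channel with Gaussian noise, by the Shannon-Hartley theorem that follows from [11]:

    C = B \log_2\left(1 + \frac{S}{N}\right) \quad \text{bit/s},

where B is the bandwidth in Hz and S/N the signal-to-noise ratio. It is quoted here only to make explicit how far this bits-per-second notion is from the informal, human-centered reading used in this paper.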

3.

FDM DEMONSTRATOR

In the domain of visual light, different frequencies result in different colors. Hence, a straightforward approach to visual multiplexing is to use different colors for different pieces of information. However, not all hues of the RGB color space can be exploited to deliver any number of information layers in one multiplexed image at once; only basis vectors may be used. The reason is that every color is defined by a linear combination of these basis vectors. Here, the most commonly picked basis vectors are red, green, and blue. For example, the color yellow is a mixture of equal parts of red and green, but no parts of blue. The FDM demonstrator works accordingly: if, for instance, the red letter X was to be visually multiplexed with the green letter Y, the intersecting parts of both letters would show up as yellow in the multiplexed image. Figure 2(a) visualizes this concept. Following the TV set analogy, red and green would be the equivalents of two TV channels the user can choose from. Due to the cubical design of the RGB color space, the maximum number of basis vectors is three. Hence, the maximum number of simultaneously transported information layers is also three in order to guarantee an unambiguous reconstruction of the source images from the multiplexed image.

The demultiplexing is done on an Android phone equipped with a camera. The applied algorithm can be kept simple: initially, the whole screen is blank. Every pixel of the camera image is then decomposed into the colors red, green and blue. Depending on which channel the user has picked in the settings, the corresponding pixel on the smartphone's screen will be colored if that color exceeds a certain threshold. Figure 2(b) shows the demultiplexed information contained in each channel of Figure 2(a). Since the software constantly processes the images recorded by the camera and immediately displays the results, it behaves like an optical lens,

filtering visual information behind it. Ishii and Ullmer call this type of device an ActiveLens [4]. The computation can be done within reasonable time (6 FPS) on a Sony Ericsson XPERIA X10i or a Samsung Galaxy S, for example. The presented Android app as well as a website for generating the multiplexed image are available online1.
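To make the color-channel scheme concrete, here is a minimal sketch of the multiplexing step and of the phone-side demultiplexing described above. It is not the authors' implementation: it assumes the Python Pillow imaging library, hypothetical file names, and three single-channel source images of identical size.

    # Minimal sketch of RGB frequency-division multiplexing; not the authors' code.
    from PIL import Image

    def multiplex(red_src, green_src, blue_src):
        """Combine three grayscale images into one RGB image,
        one source per color channel (basis vector)."""
        r = red_src.convert("L")
        g = green_src.convert("L")
        b = blue_src.convert("L")
        return Image.merge("RGB", (r, g, b))

    def demultiplex(multiplexed, channel="R", threshold=128):
        """Recover one source by keeping only the chosen channel and
        thresholding it, mimicking the phone-side 'optical lens' filter."""
        index = {"R": 0, "G": 1, "B": 2}[channel]
        selected = multiplexed.split()[index]
        return selected.point(lambda v: 255 if v > threshold else 0)

    if __name__ == "__main__":
        # Hypothetical file names; each holds one text layer as a grayscale image.
        layers = [Image.open(name) for name in ("text_r.png", "text_g.png", "text_b.png")]
        mux = multiplex(*layers)
        mux.save("multiplexed.png")
        demultiplex(mux, "G").save("channel_g.png")

White letter pixels in, say, text_r.png appear red in the multiplexed image; where two layers overlap, the mixed color (e.g. yellow) appears, just as in the letter example above.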

4.

DISCUSSION

For a start, test images with strong contrasts were used. Since text letters provide this property, three layers of prose were visually multiplexed as shown in Figure 2(a). Although the result may appear hardly legible to the human eye, a small user study with co-workers showed that the perception varies due to their individual vision, which may, for example, be influenced by color blindness. The multiplexed and demultiplexed results shown in Figure 2 are quite promising. Yet, the current approach does not work well for all kinds of images, such as photos taken with a digital camera. Theoretically, the system should be able to decompose the multiplexed image into its original components flawlessly. However, due to some distortion of the recorded image, the retransformation cannot be done precisely enough, as the only slightly varying shades of red, green, and blue cannot properly be distinguished. This may be caused by the camera's optics and may additionally be worsened by the automatic white balance and color correction, which both cannot be disabled or influenced by an application on current devices. Hence, on some occasions, the displayed colors are too bright or they get tainted, thus causing the corresponding image regions to appear as plain white or as a falsified color. Another optical effect causes a distracting glow around the outline of large monochromatic shapes, which results in shadows or blurry parts in the demultiplexed images. Such color mismatches may also be caused by differing characteristics of each manufacturer's camera.

Nevertheless, the dwell-time in front of displays may be reduced because of the display's increased channel capacity. It allows people's individual interests to be served simultaneously. The viewer can select the content she is interested in, which reduces the risk of losing her attention. This may be a step towards a shared use of public displays without direct user interaction, unlike the explicit-interaction sharing found in CSCW environments such as Dynamo described in [5]. Display operators could sell airtime more than once, use their resources more economically and increase their viewers' satisfaction by offering them a broader range of information. Also, the results could be useful in educational environments, since individual content could be shown to each person or group, thus improving the learning experience. Kruppa and Aslan propose a comparable approach applied to museums in [7].

Another aspect to take into consideration is privacy on public displays. There has been some research on this issue, for example by Izadi et al. in [5] or Vogel and Balakrishnan in [12]. Yet, the first one only focuses on how certain areas of the display can be protected from direct manipulation by others, and the latter one tries to occlude those areas to prevent others from spying on them. Multipleye, however, could be used as a true means of privacy for public displays if the shown information was legible only with a certain device or security code. In this paper's context, the device is the smartphone running the demonstrator app and the code is one of the three channels red, green, and blue. Obviously, the code is yet too simple and not very secure, but further work on other multiplexing methods may improve this.

In contrast to approaches which push the information directly to the user's mobile device, Multipleye does not need additional communication channels like GSM or UMTS. This way, it even works in areas without mobile data networks. Furthermore, it can help to visualize the general presence of information at one glance, whereas data push services have to be brought to the user's attention in the first place, e.g. by text messages (SMS), which may be overlooked in certain situations. This notion of information transportation is known as informative art [3]. Since Multipleye is more comparable to the traditional TV broadcast model than to video on demand (VOD) solutions, which push the content onto the user's device, it may be better suited to deliver copyrighted or sensitive material. Along with the fact that broadcasters can decide when to show the specific content, it can also easily be tied to a particular display, hence to a particular location. Combined, the temporal and spatial aspects may contribute to a successful implementation, as recent events like the 2010 FIFA World Cup have shown that people favor watching broadcasts together in public viewing sessions.
1 http://www.multipleye.de

(a) Multiplexed texts.

(b) Demultiplexed texts.

Figure 2: Multiplexed and demultiplexed images.




Furthermore, other platforms, such as Apple's iPhone or Microsoft's Windows Phone 7, could be tested. They possibly offer better image processing capabilities, allowing a faster demultiplexing of the multiplexed image.

5.

CONCLUSIONS

The proposed idea of visual multiplexing based on frequency-division seems to be a valid approach to deliver individual information to multiple viewers by a single public display. However, as explained in the first part of Section 2, it is difficult to define a term which can be used to precisely measure the project's outcomes. The expression channel capacity does not fit naturally in this context, for it is usually used in a purely technical sense. Some refinement needs to be done on this during the further course of the project. Nevertheless, the impact on the viewers' dwell-time, or the number of satisfied viewers who found the information they were looking for on the public display, can already be measured and evaluated objectively. Though the presented demonstrator is capable of multiplexing and demultiplexing three different texts satisfactorily with regard to quality and speed, its performance is less usable when applied to photos. The resulting images do have some resemblance to their originals, yet they probably differ too much from the source material for most applications. In conclusion, a more robust information code has to be used on the apparently noisy transport channel. Section 2 lists four other methods for multiplexing, which will all be analyzed in subsequent work. The results are also expected to help overcome the limit of three simultaneous pieces of information at a time and make the multiplexed image even more illegible for the human eye.

6.

FUTURE WORK

The current demonstrator has not yet been evaluated within a field study. However, we are currently working on the design and implementation of such a study. The research questions that should be answered are: Can visual multiplexing be used on public displays to deliver more than one piece of information at once? and What benefit is there for the viewer? In comparison to solutions which utilize QR tags to guide mobile users to specific contents, Multipleye may turn out to be faster and more direct, since the viewer does not have to scan such a QR tag or manually type in a URL, which would both disrupt the overall user experience in that particular situation. After a thorough evaluation of the frequency-division approach, the four other multiplexing methods space-division, time-division, code-division and polarization will be examined. The most challenging one seems to be code-division, which could be realized based on a Fourier, Laplace or z-transform of the content. Such a transform could improve the fault tolerance when transferring data over the seemingly noisy visual channel and help to overcome the current limit of three simultaneous pieces of information at a time. Another step could be to embed the additional information in a clearly recognizable carrier image, so that the digital signage display will keep its traditional function as a public display and remain legible for conventional viewers without additional tools.
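As a generic illustration of the code-division principle mentioned above (using orthogonal spreading codes rather than the Fourier-, Laplace- or z-transform-based scheme the authors plan to investigate), the following toy example multiplexes two bit streams onto one summed signal and recovers each by correlation. All names and parameters are illustrative assumptions.

    # Toy code-division multiplexing sketch; not part of the Multipleye project.
    import numpy as np

    codes = {"A": np.array([1, 1, 1, 1]),      # orthogonal spreading codes
             "B": np.array([1, -1, 1, -1])}    # (rows of a 4x4 Walsh-Hadamard matrix)

    def spread(bits, code):
        """Map bits {0,1} to {-1,+1} symbols and expand each with the code."""
        symbols = np.array([1 if b else -1 for b in bits])
        return np.concatenate([s * code for s in symbols])

    def despread(signal, code):
        """Correlate the summed signal with one code to recover that channel's bits."""
        chips = signal.reshape(-1, len(code))
        return [1 if np.dot(c, code) > 0 else 0 for c in chips]

    tx = spread([1, 0, 1], codes["A"]) + spread([0, 0, 1], codes["B"])  # multiplexed signal
    print(despread(tx, codes["A"]))  # -> [1, 0, 1]
    print(despread(tx, codes["B"]))  # -> [0, 0, 1]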

7.

REFERENCES

[1] M. Born, E. Wolf, A. B. Bhatia, P. C. Clemmow, D. Gabor, A. R. Stokes, A. M. Taylor, P. A. Wayman, and W. L. Wilcock. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light. Cambridge University Press, 7th edition, Oct. 1999.
[2] L. L. Hanzo, Y. Akhtman, L. Wang, and M. Jiang. MIMO-OFDM for LTE, WiFi and WiMAX: Coherent versus Non-coherent and Cooperative Turbo Transceivers. John Wiley & Sons, 1st edition, Oct. 2010.
[3] L. E. Holmquist and T. Skog. Informative art: information visualization in everyday environments. In Proc. of GRAPHITE '03, pages 229-235.
[4] H. Ishii and B. Ullmer. Tangible bits: towards seamless interfaces between people, bits and atoms. In Proc. of SIGCHI '97, pages 234-241.
[5] S. Izadi, H. Brignull, T. Rodden, Y. Rogers, and M. Underwood. Dynamo: a public interactive surface supporting the cooperative sharing and exchange of media. In Proc. of UIST '03, pages 159-168.
[6] C. Kray, K. Cheverst, D. Fitton, C. Sas, J. Patterson, M. Rouncefield, and C. Stahl. Sharing control of dispersed situated displays between nomadic and residential users. In Proc. of MobileHCI '06, page 61.
[7] M. Kruppa and I. Aslan. Parallel presentations for heterogeneous user groups - an initial user study. In M. Maybury, O. Stock, and W. Wahlster, editors, Intelligent Technologies for Interactive Entertainment, volume 3814 of LNCS, pages 54-63. Springer Berlin/Heidelberg, 2005. doi:10.1007/11590323_6.
[8] T. Linden, T. Heikkinen, T. Ojala, H. Kukka, and M. Jurmu. Web-based framework for spatiotemporal screen real estate management of interactive public displays. In Proc. of WWW '10, pages 1277-1280.
[9] P. Olivier, S. Gilroy, H. Cao, D. Jackson, and C. Kray. Crossmodal attention in Public-Private displays. In Proc. of ACS/IEEE '06, pages 13-18.
[10] P. Peltonen, E. Kurvinen, A. Salovaara, G. Jacucci, T. Ilmonen, J. Evans, A. Oulasvirta, and P. Saarikko. It's mine, don't touch!: interactions at a large multi-touch display in a city centre. In Proc. of CHI '08, pages 1285-1294.
[11] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379-423, 1948.
[12] D. Vogel and R. Balakrishnan. Interactive public ambient displays: transitioning from implicit to explicit, public to personal, interaction with multiple users. In Proc. of UIST '04, pages 137-146.
[13] Y. Wand and R. Y. Wang. Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39:86-95, Nov. 1996.
[14] C. M. White. Data Communications and Computer Networks: A Business User's Approach. Cengage Learning, 4th edition, Mar. 2006.

Older Adults and Digital Interactive Television: Use of a Wii Controller


Amritpal Singh Bhachu
University of Dundee School of Computing Queen Mother Building, Dundee +44 (0) 1382 388115 abhachu@computing.dundee.ac.uk

Vicki L. Hanson
University of Dundee School of Computing Queen Mother Building, Dundee +44 (0) 1382 386510 vlh@computing.dundee.ac.uk

ABSTRACT
Digital Interactive Television (DITV) is full of interactive content that was not previously available through analogue television. However, the interactive content can be complex to use, which in turn creates obstacles for users, and in particular older adults, in accessing this content. This work looks into ways in which DITV content can be made accessible for older adults through the use of a Wii controller. This is a qualitative study in which the opinions of older adults using the BBC iPlayer (the BBC's own video on demand service) on the Wii console with its gesture controller are compared with those of younger adults who also took part in the study.

Categories and Subject Descriptors

H.5.2 [Information Interfaces and Presentation] User Interfaces - graphical user interfaces, input devices and strategies, interaction styles, screen design, user-centered design.

General Terms
Design, Experimentation, Human Factors.

Keywords
Digital Television, Older Adults, Interface Design, Gesture Control.

1. Introduction
Compared to analogue TV, Digital Interactive Television (DITV) has a multitude of new channels, interactive services and features. These services include the Electronic Programme Guide (EPG), Personal Video Recorder (PVR), Video on Demand (VoD), Red Button services and Digital Text. In the future, Internet functionality will be commonplace on Set Top Boxes (STBs). This additional functionality has the potential to enhance the TV experience for viewers as well as to increase audiences for DITV services. The additional functionality brings with it an increase in complexity [9] and an increase in the physical and mental demands put on the viewer to operate their DITV system. This comes at a time when the older population is growing throughout modern society [4], and because of age-related decline in physical and mental abilities, this group in particular will find DITV a challenge to use. DITV content therefore must be made more accessible to older adults.

2. Background

2.1. Usability Constraints of the Elderly
With age, we develop multiple minor impairments [7, 8]. By the age of 65 our vision is often impaired severely, although in most cases it can be corrected to 20/40 for at least 80% of over-65s [5]. Motor abilities decline in terms of control and speed of movement, and there may also be a loss of kinesthetic sense. Working memory, where temporary information is kept active while we work on it or until it is used, is also affected by age. It is likely that older adults will perform less well in dual-task conditions and have difficulties developing new automatic processes [5].

2.2. The Remote Control and its Issues
DITV has also seen the evolution of the remote control. The remote control can be looked upon as a second interface in the use of a TV system [10]. Remote controls now carry more buttons than before, needed to control the additional DITV functionality. The growth in buttons adds complexity to the remote control, with many of them rarely used. Buttons are also likely to be smaller on the modern remote control and will often have multiple functions attached to them for the different TV modes. The typical remote control also constrains on-screen interface design because of its conventional navigational inputs. The small buttons, with poor labeling, can be difficult to find and hard to read for those who have visual impairments. Motor impairments make it hard to press the required button. There are also issues for those with cognitive impairments, as remembering the function of each button can be a challenge. This difficulty increases as the viewer constantly switches focus between the TV screen and the remote control while trying to complete a task, and it is further complicated if the viewer requires different corrective lenses for each interface [3].

2.3. DITV Menu Systems
In most cases, DITV menus are designed from a computing perspective, taking on a hierarchical layout similar to that of a file system structure. This is inappropriate for many reasons. The screen resolution of a TV is traditionally lower than that of a Personal Computer (PC) monitor; modern TV sets do address this issue to an extent. There are limitations on the screen space available due to the low resolution and the inclusion of a safe zone to allow for display cut-off during design [4]. The viewer is also


likely to sit further away when watching TV than when using a PC and will rarely sit directly in front of the screen, so a viewing angle is introduced [2]. Television watching is also considered to be a sit-back activity compared to PC usage, which is a lean-forward activity. For older adults with visual impairments, DITV menu text can be difficult to make out and read and, with too much information on screen, older adults can be cognitively overloaded and become confused by the options available. It can also be laborious for older adults to learn and remember the sequence of steps required to achieve a goal, putting additional strain on their memory.

3. Wii-mote Study

3.1. Introduction
Based on the literature review, it was decided to investigate further how older adults would react to using a controller different to the traditional TV remote control. At the time of the experiment design, the Nintendo Wii was a popular games console and had been regularly advertised as a device that could be used by everyone. In addition, the BBC had recently developed a Wii-specific iPlayer interface, Figure 1, to be operated by the Wii controller, which is a gesture-based controller. The BBC iPlayer is the BBC's own Video on Demand (VoD) service. It was decided to carry out this exploratory study comparing a small group of older and younger adults and to assess whether a study should be performed on a larger scale, possibly with other alternative devices.

Figure 1. Wii BBC iPlayer Interface

Particular interest was in how the user could use a gesture-based control in this environment and how older adults interacted with the design and layout of buttons and controls on the BBC iPlayer interface. The Wii iPlayer interface offers a variety of on-screen objects that the user can interact with. This includes selectable icons, text and images, vertical and horizontal scroll bars, as well as an area to input text during searches.

3.2. Research Method

3.2.1. Participants
There were 11 individuals taking part in the study in total: 5 younger adults (1 female and 4 male) under the age of 38 and 6 older adults (2 female and 4 male) over the age of 65. The younger adults were all staff and students from the School of Computing at the University of XXXX and were experienced computer users. The older adults were all from the older adults' computer center in the School of Computing at the University of XXXX and had a good grasp of the basics of computing. None of the participants had a great deal of experience of using the Nintendo Wii or gesture-based remote controls, although most had played games on the console at least once. Participants were required to have had no experience of using the iPlayer on the Wii, although many in the younger adult group had experience of using the BBC iPlayer on a desktop platform.

3.2.2. Set-up/Layout
Lab space was set up to mimic a living-room layout. A 28-inch LG LCD television was connected to the Wii with BBC iPlayer running. An armchair was used for the participants to sit in and was positioned 6 ft from the screen.

3.2.3. Data Collection and Analysis
The study was designed as a qualitative study. Each session was recorded using two video recorders, one focusing on the TV screen to record on-screen selections and one focusing on the participant to record their movements and reactions. During the session, the experimenter also took notes on any observations and comments made by the participant. This data was later analysed and thematic coding [1] was applied. The themes used are based on the questions asked in the end-of-experiment discussions, which are detailed in section 3.2.5. This allowed comparisons to be made between the results and opinions of the older adults and those of the younger adults.

3.2.4. Tasks
Each participant was asked to perform the same set of tasks. Altogether there were 7 tasks to be carried out. These were:
- Find a programme that they would like to watch from the selections available on the iPlayer and select to view it
- Pause the selected programme
- Browse to 10 minutes into the selected programme
- Turn down the volume of the iPlayer
- Mute the volume of the iPlayer
- Find the most popular television programme for the day
- Find an alternative way of finding the most popular programme for the day
- Find a specific programme using the search function

3.2.5. Post Study Questions
Once the participant had completed all the tasks, they were asked a set of questions to evaluate their experience of using the system. These were:
- Did the user feel they were concentrating more on the screen or the controller?
- How difficult was it to locate the cursor on the screen?
- Does the tactile, vibrating feedback help the user to find the icons on screen?
- How difficult did the user find it to make selections?
- Is it annoying to see the on-screen video control icons (when playing a video)?
- How difficult is it to use the remote without the armrest?
- Does the user feel their arm weakening when using the controller?
- How does it compare to the remote controls the user currently uses?
- Is it something the user could get used to using?

3.2.6. Procedure
After completing a consent form, each participant was asked to fill in a questionnaire asking their age group, their living status and whether they required glasses for watching TV, reading text on the TV screen or using the TV remote control. They were then given each task to complete in turn. During the tasks, the experimenter would not interrupt the participant and would only give appropriate help when the participant was clearly becoming frustrated with the task. Each session lasted an average of 45 minutes. It was stressed to the participant during each session that it was the interaction with the system using a gesture controller that was being assessed


and not the Wii console itself, BBC iPlayer or the user themselves.

3.3. Results
The results in this paper have been categorized by the themes used during the analysis.

3.3.1. Viewer Attention
When asked whether they felt their attention was drawn more to the TV screen or to the Wii-mote using this set-up, the majority of all participants said that their focus was on the TV. For both groups of participants, comments were made that attention was on a specific area of the screen when performing a particular task. An observation made by the experimenter was that this was a particular issue for the older adults group. For example, when using the search function, the older adults were fully focused on the text input area on the right of the screen and failed to notice the predictive suggestions on the left, which also contained the programme they were looking for. One of the older adults said: "There is a focus on one thing and you get stuck in that area looking for the answer", while another commented: "I didn't see that (the prediction) because I was concentrating (on entering the text)." The younger adults appeared to be able to adapt to this situation with more ease and were therefore able to complete the tasks more quickly. Participants from both groups did, however, highlight that the Wii-mote still required them to look down at the control occasionally to make selections. Again, on observing the videos, it was apparent that this was an issue that slowed down the older adults more than the younger adults. The younger adults appeared to adjust more quickly and, by the end of the tasks, were visibly spending less time looking at the control. When asked about finding the cursor on screen, 4 of the younger adults said that they had no problems with this. The other participant in this group said that they had to keep the cursor very steady, which was slightly awkward, but was something that they could get used to. 4 of the older group also felt that the cursor was easy to find. The others felt it was difficult to find but, once it had been found, it was not a problem. 3 of the 6 older adults also felt that the cursor was difficult to keep steady with the Wii-mote. 1 of these 3 suggested that this was made even more difficult by the tremor in their hand.

3.3.2. Tactile Feedback
The tactile, vibrating feedback that the Wii-mote produced when the cursor went over screen elements got mixed reactions within both groups. Several in the younger group commented on how the feedback was only given when large, fast movements were made. Another in this group felt that the feedback was good to start with, but the longer they used it the more annoying it became, while another felt it was annoying from the beginning and that they preferred to use sight instead. One of the younger group also commented that they did not even know what the vibrations were doing. 2 of this group also felt that the vibrations were what caused the slight movements of the cursor that irritated them. None of the older adults found the tactile feedback helpful. 4 of the 6 did not know what it was for. Once it had been explained to them, 2 of the 4 commented on how it might be useful now that they understood what it did. One went further to justify this statement by saying that they like it when they are given meaningful responses to what they are trying to do and don't like it when systems are silent with little feedback. 2 of the older adults found the tactile feedback annoying, but 1 of these 2 did highlight how they felt it might be useful for those who have visual impairments.

3.3.3. On-Screen Selections
The majority of the younger group had no problems with the on-screen elements and one summed this up when commenting that they found the buttons "big and easy to hit". The main issue that this group mentioned was with the scroll bars. They found that they had to be precise when controlling them and commented: "I wouldn't like to use it if I was tired and watching TV." The feedback from the older adults on this topic was slightly different. Many felt that they needed to become more familiar with the layout and how each of the elements worked. This may be a legacy of the fact that this group had little experience of using iPlayer before and so did not know what the buttons did in each case. One comment was: "I'm still at the stage of guessing and it's very much trial and error." 2 of the 6 older adults mentioned how they found it difficult that the pointer moved when they were making selections. This was often caused by a movement that they would make as they hit the A button to make selections. 1 of the 6 older adults also found that the on-screen elements were a bit small. The younger adults were more likely to try different solution paths if one did not work after a few seconds of trying. On these occasions, the older adults were more likely to get stuck down one solution path without attempting to find an alternative. One of the reasons for this was the highlighting of the tabs. Initially, the older adults couldn't understand that a single tab could display different things depending on the process used to get there. For example, when a programme is chosen, the information for that programme is displayed under the TV tab. However, when searching for favourites, most of the older adults were unaware that selecting the TV tab from the home page would give them an overview of the channels and options available, as they expected it to only show single programme information. None of the younger group had issues with this.

3.3.4. On-Screen Video Control Icons
Discussion also focused on the on-screen controls of the video player that would pop up when needed and disappear again when the Wii-mote was set down. In general, the younger adults identified that these controls and their operation were something they were used to when watching DVDs on their own computers or other video players like the iPlayer. 2 of the 5 did indicate that it was annoying that the controls did not go away almost instantly and that the control had to be set down and rested before they did. 1 of the 5 liked that they showed up towards the middle of the screen, which made them more noticeable. Of the older adults, 4 of the 6 had no problems with the controls and did not find them too irritating. 1 of the 6 would have preferred them to be constantly shown on screen, while another indicated that they did not want anything on the screen when they were watching a programme.

3.3.5. Handling the Wii-mote
All the participants who took part used the armrest of the chair or their own body as a way to rest their arm and steady the Wii-mote. In general it was felt that this was easier and less tiring, and most also felt that the pointer was more difficult to control without it. A comment from one of the younger participants was: "Might be easier to put down and use like a mouse."


Another from this group commented: "I would have to use both hands as it needs some steadying." One member of the older group felt that it was actually easier to control without the armrest. However, this same person felt the remote was a bit too heavy to keep held out, which made it tiring. In the main, most participants in both groups found the Wii-mote tiring to use, as it was slightly heavy. 4 of the 5 younger adults liked the Wii-mote as much as, if not more than, the remote control that they currently used with their TV. One felt that they could use it in the same way while relaxing. 3 others liked the fact that there were fewer buttons than on their own remote controls at home. However, 3 of the 5 did say that they needed more buttons on the remote control, at least for the basic functions of the TV. They were also open to having a control that incorporated elements of both controllers. The older adults had a quite different response to this. They preferred their own remote controls in all cases, although they saw the fact that there were fewer buttons as somewhat of a benefit. One of the older adults did mention that they felt it might become uncomfortable to use the Wii-mote over a period of time. The main feeling amongst this group was that the Wii-mote was something that they might get used to with more experience of it.

4. Conclusions
The greatest positive for older adults using this alternative control was that there was less need to switch focus between the screen and the controlling device, although this need was not completely removed. However, although their attention may have been drawn more to the screen, it appears that they still had difficulties focusing on the screen as a whole, with more focus on specific areas of the screen. This may be a skill that is learnt through more interaction with such a set-up and layout. Comments from the older adults suggest that the physical use of a gesture control such as the Wii-mote may cause some issues, as they find it tiring to use. This is something that may be solved by using a lighter gesture device. The other issue with the Wii-mote for the older adults was the ability to steady the cursor on an icon. This was complicated further in some cases by the vibrating tactile feedback moving the cursor, as well as by cursor movements caused when trying to press the select button on the control. The sensitivity could be adjusted to help with this. The feedback from this work suggests that older adults might be open to new ways of interacting with their television given the time to get used to a different way of interaction. Compared to the younger adults, they had problems adapting to the new system throughout the study, although there were indications that, with a better explanation of the set-up and the opportunity to gain experience using the gesture control, they would become more comfortable using it.

5. Future work
Based on the results of the initial Wii-mote study, it has been decided that the next stage of work will explore other alternative interfaces in more depth. To do this, a standard remote control will be directly compared against a tablet PC and a gesture control to assess whether they improve usability for older adults. The on-screen interface will be the same for each device so that the participants are only comparing the operation of the devices against each other.

6. Acknowledgements
Thanks go to the XXX and XXXXX for providing support and funding for this work.

7. References
[1] BRAUN, V. AND CLARKE, V. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 77-101.
[2] CARMICHAEL, A. 1999. Style Guide for the Design of Interactive Television Services for Elderly Viewers. Independent Television Commission, Kings Worthy Court, Winchester.
[3] CARMICHAEL, A., RICE, M., SLOAN, D. AND GREGOR, P. 2006. Digital switchover or digital divide: a prognosis for usable and accessible interactive digital television in the UK. Univers. Access Inf. Soc. 4, 400-416.
[4] CENTRE FOR ECONOMIC POLICY RESEARCH. The Growing Elderly Population. Online. Accessed April 2008. http://www.cepr.org/pubs/bulletin/meets/416.htm
[5] COOPER, W. 2008. The interactive television user experience so far. In Proceedings of the 1st International Conference on Designing Interactive User Experiences for TV and Video, Silicon Valley, California, USA. ACM, 133-142.
[6] FISK, A.D., ROGERS, W.A., CHARNESS, N., CZAJA, S.J. AND SHARIT, J. 2009. Designing for Older Adults - Principles and Creative Human Factors.
[7] GREGOR, P., NEWELL, A.F. AND ZAJICEK, M. 2002. Designing for dynamic diversity: interfaces for older people. In Proceedings of the Fifth International ACM Conference on Assistive Technologies, Edinburgh, Scotland. ACM, 151-156.
[8] KEATES, S. AND CLARKSON, P.J. 2004. Assessing the accessibility of digital television set-top boxes. In DESIGNING A MORE INCLUSIVE WORLD, 183-192.
[9] KURNIAWAN, S. 2007. Older Women and Digital TV: A Case Study. In Proceedings of ASSETS '07, 251-252.
[10] SPRINGETT, M.V. AND GRIFFITHS, R.N. 2007. Accessibility of Interactive Television for Users with Low Vision: Learning from the Web. In EuroITV 2007, P. CESAR ET AL., Eds., 76-85.


Predicting Where, When and What People Will Watch on TV Based on their Past Viewing History
Michael J. Darnell
Microsoft Corporation 1065 La Avenida Street Mountain View, California, 94043, USA +1 650 693 4965

mdarnell@microsoft.com

ABSTRACT
This study investigated predicting what people will watch on TV when they have a specific program in mind that they have been routinely watching in the past. The set of recently watched TV programs for the household would form the basis of prediction. A survey of 264 TV viewers in the USA was conducted to evaluate whether the prediction could be improved by knowing which TV in the home was used to view each recently watched program, knowing the day of the week and time the programs were viewed and knowing who in the household viewed the programs. The results of the survey showed that each of these supplemental information sources, considered alone, would improve the prediction for small subsets of the respondents, but taken together, would improve the predictions for the majority of people. This type of prediction is contrasted with typical TV recommendation systems.

Categories and Subject Descriptors

H.5.2 [INFORMATION INTERFACES AND PRESENTATION]: User Interfaces - evaluation/methodology, user-centered design

General Terms
Design, Human Factors

Keywords
Recommendations, Predictions, Television

1. INTRODUCTION
TV watching is part of most people's daily routine. People tend to watch certain TV programs at certain times with certain people in their households. More than half the time when a person sits down to watch TV, they know what program they want to watch because they have been watching the TV series routinely [1].

Suppose, when a person sits down to watch TV with a particular program in mind, we could predict and automatically tune to that program, or we could present on the TV a brief shortcut list of the most likely to-be-watched programs for that particular time. This would be useful because a person could get to a program on the shortcut list quickly, saving them from having to navigate to the program they want to watch. For instance, if the person had a Live TV (scheduled, linear) program in mind to watch, it would save them from having to navigate to the correct channel. If the person had a recorded program in mind, it would save them from having to go to the list of recordings and scroll down to find the program.

The present study is concerned with the case where people sit down to watch TV and know exactly what they want to watch. This is different in focus from much of the prior work in TV recommender systems, which has been concerned with the case where people do not have a particular program in mind. For example, they may be interested in watching a movie or a new TV series, but do not know which one. The TiVo system provides suggestions for each TiVo PVR receiver. The suggestions include movies and series which similar TiVo users watch or have rated highly and which are not scheduled to record on the given PVR. These suggestions are based on collaborative filtering. The suggestions are not provided in real time but are only updated perhaps once per day. TiVo suggestions are not meant to be used as a shortcut list to help people navigate to programs they already intended to watch. Instead, they are meant as a way of helping people discover new movies and programs that they would find interesting [2]. Netflix also uses a collaborative filtering method to recommend new movies and TV series to subscribers [3]. If a Netflix user has been watching a particular TV series, Netflix does make it convenient to get to the next episode of the series by providing a recently-watched list of programs, ordered by recency.

The focus of the current study is predicting what a person will watch when they sit down to watch TV with a specific program in mind. Knowing what TV programs have been watched in a household in the recent past (Table 1, Column 1) is a starting point for predicting. This is based on the finding, mentioned before, that more than half of the time when people sit down to watch TV, they know what program they want to watch, because they have been watching the series routinely. One could simply present on our hypothetical shortcut list the recently watched programs which are available to watch right now. For instance, if Maury was watched in the recent past, and it is now on Live TV, Maury would be on the shortcut list. An


hour later, when Maury is over, it would not be on the shortcut list. Recently watched PVR recordings and Video on Demand programs would also appear on the shortcut list, although there may be too many of them to keep the shortcut list brief.

Table 1. Example of a list of recently watched TV programs for a household with supplemental TV coding, Day & Time coding, and Personalization coding

TV Program    TV coding      Day & Time coding   Person. coding
CSI           Living room    Fri. 19:00          Art
Big Love      Bedroom        Sun. 20:00          Zoe
Maury         Living room    Tue. 10:00          Zoe
NCIS          Bedroom        Sun. 22:00          Art, Zoe
The Office    Bedroom        Fri. 20:00          Art

How could our hypothetical shortcut list be further filtered? Knowing on which TV in the household (TV coding) each program was watched could aid prediction. For example, a program could only appear on the shortcut list on a given TV (e.g., the Living room TV) if it was actually watched on that TV in the recent past. See Table 1, Column 2. Although current technology makes it possible to view TV programs and movies on a variety of devices both in and out of the home, the vast majority of TV viewing is still on TVs in the home [4]. Knowing what day of the week and time each program was watched (Day & Time coding) may also aid prediction. For example, a program could only appear on our shortcut list on a given day and time if it was actually watched on the same day and time in the recent past. Day & Time coding would probably apply to PVR recordings and Video on Demand as well as Live TV, since people tend to view recorded programs at routine times [5]. Knowing who watched the program (Personalization coding) could improve prediction. For example, a program would only appear on the shortcut list for a given person if that person watched that program in the recent past. How do we know who watched the program? Some advanced commercial TV/Video systems have explicit login capabilities so that the system knows who is using the system. Xbox Kinect ID is even able to identify the person(s) sitting in front of the TV automatically [6].

The current paper presents the results of a survey to find out how routinely people watch Live TV, Recorded TV (PVR) and Video on Demand (VOD). Do they tend to watch a given TV show on the same TV? Do they tend to watch a given show on the same day and time? Do different people in the household tend to watch different shows? Would knowing the "where, when and who" of recently-watched shows in the household improve prediction accuracy of what shows will be watched during a particular future TV viewing session?

2. METHOD
To participate in the survey, respondents were required to have a PVR system at home and to personally watch recorded or Live TV at least once per week. On average, an adult in the US watches about 35 hours of TV and video per week [7]. Thirty-seven percent of US households have a PVR service [8]. Forty-four percent of US households have a Video on Demand service available on a TV in their home.

The survey collected information about people in the respondent's household, the TVs in the household, and the frequency of use of different TV sources (Live TV, PVR and VOD). Survey respondents were asked to recall the titles of the last TV programs they remembered watching on Live TV, on PVR and on VOD. Was it a TV show or a movie? Did they watch it alone? Was it an episode of a series that recurs regularly? For each of the 3 TV program titles recalled (the last Live TV program watched, the last PVR recording viewed, and the last VOD program viewed), respondents were asked:
1. Do you almost always watch it on the same TV in your home? Do others in your household watch, on that same TV, many programs that you never watch?
2. Do you almost always watch it on the same day of the week and time? Do others in the household almost always watch different shows at the same time?
3. What is the overlap between the shows you watch the most and the shows other people in your household watch the most?

The following describes how the respondents' answers to these questions were interpreted as supporting or not supporting each coding method.

TV Coding: If respondents reported almost always watching the last-recalled show on the same TV, that supports using TV Coding to predict what shows will be watched in future TV viewing sessions on that TV. However, if other people in the household watch many other programs on that same TV that the respondent doesn't watch, that does not support using TV Coding.

Day & Time Coding: If respondents reported almost always watching the last-recalled TV show on a specific day of the week and time, that supports using Day & Time Coding to predict what show will be watched in future TV viewing sessions. However, if other people in the household almost always watch other shows at the same time, that does not support using Day & Time Coding.

Personalization Coding: If there is little overlap between the most-watched shows and channels of different people in the household, that supports using Personalization Coding to predict what show will be watched in future TV viewing sessions. However, if there is much overlap between the most-watched shows of different people in the household, that does not support using Personalization Coding.
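To make concrete how these coding methods could eventually drive a shortcut list, the sketch below filters a household's recently watched programs by the current TV, day and time, and identified viewer. It is a hypothetical illustration only, not the system evaluated by this survey: the data structure, field names and matching rules are assumptions, and checks such as whether a Live TV program is actually airing at that moment are omitted.

    # Hypothetical shortcut-list filter based on the three coding methods.
    from dataclasses import dataclass

    @dataclass
    class WatchedProgram:
        title: str
        tv: str          # TV coding, e.g. "Living room"
        day: str         # Day & Time coding, e.g. "Fri"
        hour: int        #   e.g. 19 for 19:00
        viewers: set     # Personalization coding, e.g. {"Art"}

    def shortcut_list(history, tv, day, hour, viewer, slack_hours=1):
        """Keep only recently watched programs that match the current TV,
        roughly the current day/time, and the identified viewer."""
        return [p.title for p in history
                if p.tv == tv
                and p.day == day
                and abs(p.hour - hour) <= slack_hours
                and viewer in p.viewers]

    # Example data taken from Table 1.
    history = [
        WatchedProgram("CSI", "Living room", "Fri", 19, {"Art"}),
        WatchedProgram("Maury", "Living room", "Tue", 10, {"Zoe"}),
        WatchedProgram("The Office", "Bedroom", "Fri", 20, {"Art"}),
    ]
    # On the living-room TV, Friday around 19:00, with Art identified:
    print(shortcut_list(history, "Living room", "Fri", 19, "Art"))  # -> ['CSI']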

3. RESULTS
The survey was completed by 264 people, each representing one household. There was an average of 2.3 adults per household. Forty-two percent of the households had children less than 18 years of age (an average of 1.8 children per household). The respondents' households had an average of 2.9 TVs where Live TV could be viewed, 1.8 TVs with PVR, and 1.8 TVs with VOD. Sixty percent of the respondents watched Live TV every day. PVR recordings were watched daily by about 40% of the respondents. VOD was not used on a daily basis (Figure 1).



Figure 1. Percentage of respondents watching Live TV, PVR and VOD at various frequencies

The last Live TV program respondents recalled watching was a series episode for 75% of the respondents and was watched alone by 43% of the respondents. The last PVR program recalled was even more likely to be a series episode: 86%. For VOD, the last recalled show was unlikely to be a series episode (31%) and was less likely to be watched alone (35%).

Respondent households were divided into 4 groups based on the number of persons and number of TVs in the household. This was done because predicting what a household is going to watch could not benefit from Personalization Coding in households with 1 person and could not benefit from TV Coding in households with 1 TV. Eighty-one percent of the households had multiple people and multiple TVs. In these households, prediction could be made using any of the three coding methods individually (TV Coding, Day & Time Coding or Personalization Coding), any combination of two of the coding methods, or the combination of all three.

TV Coding: Respondents tended to watch the last recalled Live TV program on the same TV (76%). However, other people in the household tended to watch many different programs on that same TV that the respondent never watched (64%). Thus only 22% of the survey respondents supported using TV Coding; that is, they tended to watch their last recalled Live TV program on the same TV and lived in households where others tended not to watch other programs on that TV. There were similar results for PVR. See Figure 3.

Day & Time Coding: Respondents tended to watch their last recalled Live TV program on the same day of the week and time (82%). However, other people in the household tended to watch different Live TV programs on other TVs in the home on the same day and time (56%). Thus, only 26% of the survey respondents supported using Day & Time Coding; that is, they tended to watch the last recalled Live TV program on the same day and time and lived in households where others tended not to watch other Live TV programs on other TVs on the same day and time. There were similar results for PVR. See Figure 3.

Personalization Coding: Nine percent of respondents reported that their 5 most-watched channels were different from the 5 most-watched channels of other household members. Fifty-four percent reported that 1 or 2 of their 5 most-watched channels were among the 5 most-watched channels of other household members. Thirty-seven percent reported that their 5 most-watched channels were the same as the 5 most-watched channels of other household members. Thus, sixty-three percent of respondents reported that a majority of their most-watched channels were unique in the household. This supports using Personalization Coding for Live TV. To support using Personalization Coding for PVR, the respondent's last watched PVR program:
- Had to reside on a PVR that had recordings from other people in the household that the respondent would never watch
- Had to be reached after scrolling through at least six other recordings, including recordings belonging to others in the household

Only 20% of the respondents' last recalled PVR programs met these criteria (Figure 3). The combination of coding methods did not do as well at predicting VOD viewing, probably because VOD is not used often to view series and thus does not have as much routine viewing as Live TV and PVR.

Figure 3. Percentage of multi-person, multi-TV households in which the respondents' answers to survey questions supported each coding method

Overall, only a minority of respondents answered the survey questions in such a way as to support any one of the 3 coding methods considered separately. The only exception was Personalization Coding for Live TV. One should not try to directly compare the percentage of respondents supporting each of the three coding methods because the evidence for each method was based on substantially different question sets in the survey. However, a majority of respondents answered the survey questions in such a way as to support at least one of the coding methods. For example, TV Coding would benefit some households; Day & Time Coding would benefit some of the same households plus some different ones. Together, the two coding methods benefit more households than either one alone.


4. SUMMARY AND DISCUSSION


This survey was a preliminary study to determine how one could predict what someone will watch during their routine TV viewing and how this prediction might be improved by knowing which TV was being watched, what day and time the viewing was taking place, and who was watching the TV. The results suggest that each of these sources of information could improve prediction in some households and, taken together, would improve prediction for a majority of households. This information could be used to construct a brief shortcut list of the most likely watched programs. This list could be dynamically updated, perhaps every half hour, as the predictions change. If the TV viewer sat down to watch a specific program which is part of their TV watching routine, it is likely that program would be in this hypothetical shortcut list, making it very easy to navigate to that program and thus making routine TV watching more efficient.

One potential problem with presenting a brief shortcut list of shows to the TV viewer, even if the list usually includes the show the viewer intends to watch, is that the viewer may want to make sure that there isn't something better to watch at that time. This phenomenon has been a problem for features like Favorite channels commonly found on TV systems. People may not use Favorite channels because they want to make sure there isn't something better on one of the channels that isn't a favorite before committing to watch a particular program [9]. Another potential problem is that about half the time when people sit down to watch TV, they do not have a specific program in mind and are not really sure what they want to watch [1].

The prediction system described in this paper is a type of recommendation system quite different from the type of recommendation systems found in movie services such as Netflix [3] or TiVo [2]. A key difference is that the prediction system described in this paper only recommends TV programs that the household has been routinely viewing. The system adds value by predicting when, where and to whom to recommend those programs. It learns one's routine viewing habits and reinforces them by making it easy to continue those routines. This is appropriate for the scenario where the person sits down to watch TV with a specific program in mind. In contrast, a recommendation system such as that used by Netflix recommends movies and programs that the viewer most likely has not seen, based on movies and programs that the viewer has seen together with viewer-specified ratings. This type of system is geared toward recommending movies, rather than TV programs, because movies are generally only viewed once. This is a different problem and is appropriate to the scenario where the person sits down to watch TV but doesn't have a specific program in mind. The prediction system described here and a Netflix-type system could be combined to cover both types of recommendation situations.

The current study used a survey to measure the potential effectiveness of various sources of information for prediction. Surveys are subject to the limitations of the respondents' judgment and to the fairly crude precision of the questions. Future studies should use actual longitudinal TV viewing data

for households to see how well these sources of information predict routine viewing patterns.

5. ACKNOWLEDGMENTS
I would like to thank Afshan Kleinhanzl for supporting this research.

6. REFERENCES
[1] Lee, B. and Lee, R.S. 1995. How and why people watch TV: Implications for the future of interactive television. J. Advertising Research 35, 6, 9-18.
[2] Ali, K. and van Stam, W. 2004. TiVo: Making Show Recommendations Using a Distributed Collaborative Filtering Architecture. Proceedings of KDD '04, 394-401.
[3] Wilson, T. 2006. How Netflix Works. http://electronics.howstuffworks.com/netflix2.htm
[4] Video Consumer Mapping Study. 2009. Council for Research Excellence. http://www.researchexcellence.com/vcmstudy.php
[5] Gorman, B. 2009. When Do DVR'd Shows Get Watched? Same Night or After? http://tvbythenumbers.zap2it.com/2009/01/09/when-do-dvrd-shows-get-watched-same-night-or-after/10483
[6] Leyvand, T., Meekhof, C., Yi-Chen, W., Jian, S. and Baining, G. 2011. Kinect Identity: Technology and Experience. Computer (44) No. 4, 94-96.
[7] TV Dimensions. 2010. Trends in TV Viewing. Media Dynamics, New York, p. 65.
[8] Snapshot of Television Use in the US - Worldwide | The Nielsen Company. http://blog.nielsen.com/nielsenwire/wpcontent/uploads/2010/09/Nielsen-State-of-TV09232010.pdf
[9] Darnell, M. 2007. How do people really interact with TV? Naturalistic observations of digital TV and digital video recorder users. http://portal.acm.org/citation.cfm?id=1279550&coll=GUIDE&dl=ACM&CFID=49431438&CFTOKEN=76454999


Unusual Co-Production: Online Co-Creation in Cross Media Format Development


Skylla J. Janssen
INHolland University Wildenborch 6 1112 XB Amsterdam/Diemen +31 (0)20 - 495 11 11

skylla.janssen@inholland.nl

ABSTRACT
In order to be interesting, attractive and relevant for the audience, the broadcasting industry is looking for ways to seduce, relate to and bind the audience. In the traditional broadcasting industry there are boundaries between the media professional and the audience [1, 2, 3, 4, 5]. Studies of audience participation generally show that television traditionally offers limited access and almost only controlled space for ordinary people [2, 6, 7]. Changes in media consumption practices and the rise of the "viewser" [8] (the combination of viewer and user; see also Toffler's "prosumer" [9] or Bruns' "produser" [10]) might reduce the gap between program makers and the audience. The media professional is supposed to know the needs and interests of his audience, but does he? Nowadays he focuses upon satisfying the needs of viewers (in combination with a healthy business model) by looking into cross media activities, interactivity and participation. This poses the question whether the traditional broadcasting industry can benefit from audience participation in cross media format development. This paper offers a notion of the process of co-creation while using service design principles. Program makers and viewers are invited to join a temporary online community. Lessons comprise insights into co-creation, the use of service design and participant design in a television industry context, the use of an online pressure cooker platform, and what can be expected from co-operation with the viewer on a level playing field.

1. INTRODUCTION
In my paper "Interactive Television Format Development - Could Participatory Design Bridge the Gap?" I announced that participant design could be a model for TV program makers to become familiar with viewers and their habits, concerns, needs and wishes [11]. In the broadcasting industry, format development is considered to be the work of media professionals with expert skills. Nick Couldry (2000) describes the symbolic power of broadcast institutions that is based on the boundaries between professional media people and non-professional, ordinary people [1]. It is still rather unusual for program makers to invite viewers to participate in creating a TV format. During earlier research I discussed this topic with media professionals and they explained to me that most of the viewers' ideas for formats aren't usable. They also told me that program makers are consumers as well, and so they know what viewers want. They often mentioned that program makers know how to make well-made television and that it is a métier [12]. It is not that the industry is not aware of changing viewing practices and the use of digital media. The policy plan 2010-2016 called "Verbinden, verrijken en verrassen" ("Connect, enrich and surprise") [13] of the Dutch public broadcasting organization (NPO) describes the ambition to create a strong public service for and together with the Dutch citizens. Interaction with audiences is seen as one way to reach that goal. Enli (2007) describes: "In a new media environment, however, the audience role is in transition: broadcasters increasingly address their audience as participants as well as viewers, and new skills and competences are hence required." (p. 62) [14]. The question remains: how can the industry professional involve the audience? My idea is to investigate a viewer-centered approach using principles you normally find in service design [15]. There is no common definition of service design, though it relates to user-centered design and participatory design. "The goal of all these approaches - that is indeed what they are: approaches not methods - is to help innovate (create new) or improve (existing) services to make them more useful, usable, desirable for clients and efficient as well as effective for organizations." (Moritz, 2005, p. 31) [16]. In this paper I describe what online co-creation between program makers and viewers can contribute to cross media format development. You can read about the exploratory study and the preliminary results: I explain how the TV program format will be developed using a service design approach in section 2.4.

Categories and Subject Descriptors


K.m MISCELLANEOUS

General Terms
Management, Economics, Experimentation, Human Factors.

Keywords
Co-creation, participatory design, service design, cross media television format, format development, special interest, end-user participation, broadcasting industry, program makers, online community.


The choice for using a temporary online community, the options I used and my choice for a pressure cooker model are explained in section 2.3. The research took place from 14th March to 4th April 2011. In this paper, preliminary results can be found in section 3.

2. BACKGROUND
Approaching the end of my Ph.D. research, this study is the final stage, focusing on viewer empowerment in the broadcasting industry. The architecture of participation [17] that started on the web finds its way into many other contexts nowadays, ranging from civic journalism to the use of an automatic external defibrillator by non-medical practitioners using SMS alerts. Interviews with media professionals in preceding years taught me that co-operation with viewers is something that is done with the expert in the lead. It is not a level playing field in which TV experts and viewers have equal power. It is important to note that there is no resistance to expertise in participant design: equal does not mean that participants are the same or take over each other's role. However, the product or service will have to be experienced through the eyes of the viewer. As a stakeholder, the viewer will also have to have a say in what the program should look like and how it fits his needs and wishes. The will to collaborate is very important for co-creation to be successful. I am lucky to be able to put co-creation in TV format development to the test. Every participant in this study comes in with his own knowledge, experience and expertise. Knowing the context of program makers as well as the viewers' context from earlier research, I will reflect on the results of this co-creation project in this perspective.

2.1. Cross media format development, a co-production
For this co-creation project I invited television program makers and viewers to join in a temporary online community. The simple existence of the online community is obviously not enough to ensure active exchange of information between community members. Research shows that community members can have different motivations to join [18, 19]. The relevance of the topics is of great importance to the members of the community. Motivational aspects for community collaboration are, amongst others, the level of engagement of a member [20, 21] and the relationship between the moderator of the community and the members [22]. In order to attract enthusiastic viewers as participants in this project I chose to develop a cross media format around a special interest theme. When viewers are interested in the subject, they are more likely to add a surplus in terms of knowledge [23] and experience, and are also likely to contribute more. The theme I picked is horse riding, since it is a very popular sport in The Netherlands. Nevertheless, it seems difficult to develop a horse program that attracts enough viewers to satisfy (business) goals. The programs Hemel op Paarden (RTL4, 2000-2001) and Horse & Co (RTL4, 2006-2007) both disappeared from the screen despite more than 500,000 practitioners and over 1 million horse lovers in The Netherlands.

2.2. Selection of respondents
Thirty viewers, all horsemen and women who really care about their hobby, were selected via a snowball method using my own network as a start. In this study the viewers can be seen as experts on the content of the program. They can deliver input for the stories to be told. In this project they operate as co-makers. Ten program makers were recruited at the network for media makers (NPOX) in Hilversum, The Netherlands on February 15, 2011 and via my personal network. The TV professionals who responded have been aware of changing viewing practices combined with new uses of digital media and became curious about what co-creation could mean for them, because they had the possibility to interact directly with the viewers and other program makers.

2.3. Online community
For this research I used a temporary online community as the workplace for the development of the format. In the community, members could participate anonymously if they wanted to. However, no one used this opportunity. The makers were recognizable by adding the label "maker" in front of their names. I used a white label idea generation community and adjusted the look and feel of the website to the topic. The online community offered the opportunity to participate in an easy way at any moment of the day. The website functionalities consist of assignments, newsletters, threads of discussions, a drawing tool and options for uploading all sorts of files.

2.4. Methodology
Preceding the start of the community I set up a platform and developed the communication and assignment plan. I formulated assignments and questions and drew up other material to invoke storytelling, dialogue, discussion, comments, remarks, suggestions, etc. using text, photo and video material. All participants were briefed on the project. The platform was available for the duration of two weeks (like a pressure cooker). The pressure cooker model helped create a momentum. A limited amount of time was assigned to the different tasks and questions in order to keep focus. During the course of the project I moderated the community, followed up on posts, stimulated contributions and requested additional information, material or explanations. Every time someone posted on the site, the other community members were informed about this new post via e-mail. This helped to stimulate a visit to the platform. A three-step model, common in service design, was used for this co-creation project. The first step was about gathering stories, so-called emergence. The viewers were asked to explain the importance of their hobby in their lives and what kind of media they normally use to inform themselves about horses and horse riding. They described in detail what they specifically like about their hobby and how many hours they spend on their favorite pastime. E.g. some focus on competitions while others just like to be around the horse. The answers stimulated interaction between the viewers; they reacted to each other. In this way the program makers gathered a clear vision of what is interesting to the viewers. In step 2, viewers were asked to tell what they would like to see on television. Program makers were asked to make suggestions as well. Insights were gained into the topics participants came up with. What role can the format play in the viewers' hobby?


What suggestions do the program makers have that are received enthusiastically by the other community members? It is important to filter the insights and check the value from a viewer's and a maker's perspective. In step 3, useful ideas were distilled from these insights for the actual development of a cross media format. A brief guideline on "How to create a format?" of at most two pages was provided in order to clarify the different elements to the viewers. After the fieldwork was done, all the log files with all contributions were downloaded. To conclude the project, the makers were asked to reflect on the co-operation with viewers. What are their experiences, lessons learned, observations, suggestions for future experiments, et cetera? My role as a researcher was to moderate the community and stimulate interaction between participants. The online community offered the possibility to intervene, direct discussions and re-brief the respondents throughout the project, or ask for more detail and argumentation.


3. RESULTS
This project aims to deliver valuable information about co-creation in the TV industry. The members of the community were told that the development of a cross media format was the goal of this project. However, since this type of co-creation is still scarce in the industry, conducting the project was actually of even more importance than the outcome of the format development. While writing this paper I am in the midst of the data analysis. For this reason you will only find preliminary results in this paper.

When starting this project, I was not sure about the enthusiasm amongst media professionals for my plan and their willingness to join the temporary online community. To my surprise it appeared that program makers were very willing to join this project. They were interested in the experiment and looking for opportunities in the field of viewer participation. Ten program makers joined the online community. Only one person who had enrolled did not contribute. For me it was easy to find thirty viewers that wanted to participate. I used my personal network and the network of acquaintances to form this group of viewers. By using the network strategy, there existed a relatively close connection between me, acting as a moderator, and the members in the community. This relation is stressed as an important factor for collaboration by Ludwig, Ruyck & Schillewaert (2011) [22]. Only five viewers refrained from contributing, mainly because of a lack of time. All the others participated in the community and contributed their ideas, suggestions and remarks. TV program creation is attractive and talking about your hobby is a pleasure. The creation of assignments and moderating a community like this is nice to do. Since I am a horse rider myself, it is presumably easier for me than for someone else, because I am well informed on the topics discussed.

In the first week, stories were gathered and ideas for television programs and topics were shared. However, there was hardly any interaction between viewers and program makers. After one week, abundant information had been put in. For community members it started to look crowded with all the different ideas and discussions. The split into different genres in step 3 helped to keep focus. The following genres were suggested: a youth series, a magazine format, a documentary and a (real life) soap. For each genre an example of the different elements that needed to be addressed was given, e.g. target group description, the ambiance, tone of voice, the preferred presenter, different items, et cetera. Participants were asked to contribute to the genre of their liking. By then it became clear that no one was taking the lead. The format was not completed. After two weeks the platform was closed for the viewers, just as planned. However, for the program makers I created a media wiki. The online community platform I used was well fitted for idea creation, but not user friendly enough to create a two-page document where all members could write, as in for example Google Docs. This media wiki offered a good solution for the usability problem. Unfortunately, the change of platform appeared to be a big hurdle. Only four program makers created an account and just one person actually worked on the formats. Talking to the program makers I found out that their lack of time was the main issue. Despite promises to visit the wiki and work on the format, it produced no further results. I had to conclude that the program makers all had a busy schedule and that they set other priorities. Apparently the assignment was too big for a voluntary unpaid activity: creating a format ready for presenting to a broadcaster takes time. The project ended after week 3.

The evaluation showed that the program makers enjoyed the experiment. They made suggestions about how to improve the usability of the community platform. They liked the fact that viewers were already selected. Also, the speed and quantity of idea generation in this online setting was warmly welcomed. Adding a face-to-face meeting to the process might help to stimulate the co-operation amongst the program makers. Because they did not know each other in person, they were cautious to work with and adjust the ideas of other makers, afraid of being rude. After thorough analysis of the coded data, much more can be told about the activity of makers and viewers in the community, about their interactions, about the type of results and about the evaluation of this experiment. This will be reported in a next publication.

4. ACKNOWLEDGMENTS
This Ph.D. research is supported by Leiden University and Inholland University. I thank my promotor, Prof. Dr. P.W.M. Rutten, and my co-promotor, Dr. J. Hermes, for recognizing how strongly I feel about this part of my research.

5. REFERENCES
[1] Couldry, N. (2000). The Place of Media Power. London: Routledge.
[2] Karlsen, F., Sundet, V.S., Syvertsen, T., & Ytreberg, E. (2009). Non-professional activity on television in a time of digitalization. More fun for the elite or new opportunities for ordinary people? Nordicom Review, 30(1), 19-36.
[3] Williams, p. 225-227 in Couldry, N. (2000). The Place of Media Power. London: Routledge.
[4] Caldwell, J.T. (2008). Production Culture. Industrial Reflexivity and Critical Practice in Film and Television. Durham and London: Duke University Press.
[5] Newcomb, H. & Alley, R.S. (1983). The Producer's Medium. Conversations with Creators of American TV. New York and Oxford: Oxford University Press.
[6] Livingstone, S. & Lunt, P. (1994). Talk on Television. Audience Participation and Public Debate. London and New York: Routledge.
[7] Priest, P.J. (1995). Public Intimacies. Talk Show Participants and Tell-All TV. New Jersey: Hampton Press.
[8] Lotz, A.D. (2007). The Television Will Be Revolutionized. New York and London: New York University Press.
[9] Toffler, A. (1980). The Third Wave. New York: Bantam Books.
[10] Bruns, A. (2005). Gatewatching: Collaborative Online News Production. New York: Peter Lang.
[11] Janssen, S.J. (2009). Interactive Television Format Development - Could Participatory Design Bridge the Gap? In Proceedings of the Seventh European Conference on European Interactive Television (EuroITV09) (Leuven, Belgium, June 03-05, 2009). ACM, New York, NY. DOI=http://doi.acm.org/1542084.1542114
[12] Janssen, S.J. (2011). Next-Gen TV. Ph.D. dissertation, Leiden University, unpublished.
[13] NPO (2010). Verbinden, Verrijken, Verrassen. Concessiebeleidsplan 2010-2016. Retrieved from http://corporate.publiekeomroep.nl/data/media/db_download/254_0c8a93.pdf on February 26, 2011.
[14] Enli, G.S. (2007). The Participatory Turn in Broadcast Television: Institutional, Editorial and Textual Challenges and Strategies. Oslo: University of Oslo.
[15] Stickdorn, M., Schneider, J. & the co-authors (2010). This is Service Design. Amsterdam: BIS Publishers.
[16] Moritz, S. (2005). In Stickdorn, M., Schneider, J. & the co-authors (2010). This is Service Design. Amsterdam: BIS Publishers, p. 31.
[17] O'Reilly, T. (2007). What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. Communications & Strategies, No. 1, p. 17, First Quarter. O'Reilly Media.
[18] Dholakia, U.M., Bagozzi, R.P. & Klein Pearo, L. (2004). A social influence model of consumer participation in network- and small-group-based virtual communities. International Journal of Research in Marketing, 21(3), 241-263.
[19] Shirky, C. (2008). Here Comes Everybody: The Power of Organizing Without Organizations. New York: Penguin Press.
[20] McLure Wasko, M. & Faraj, S. (2005). Why should I share? Examining social capital and knowledge contribution in electronic networks of practice. MIS Quarterly, 29(1), 35-57.
[21] Wiertz, C. & De Ruyter, K. (2007). Beyond the Call of Duty: Why Customers Contribute to Firm-hosted Commercial Online Communities. Organization Studies, 28(3), 349-378.
[22] Ludwig, S., Ruyck, T. de & Schillewaert, N. (2011). Op zoek naar de ideale mix: hoe de deelname in online communities voor marktonderzoek stimuleren? [In search of the ideal mix: how to stimulate participation in online communities for market research?] In Bronner, E.A. et al. (eds.). Jaarboek MarktOnderzoek Associatie, Vol. 36, 2011, p. 149-170. Haarlem: Uitgeverij Spaar en Hout BV.
[23] Shirky, C. (2010). Cognitive Surplus: Creativity and Generosity in a Connected Age. London: Allen Lane.


Trendy Episode Detection at a Very Short Time Granularity for Intelligent VOD Service: A Case Study of Live Baseball Game
Hogun Park, Sun-Bum Youn, Geun Young Lee, Heedong Ko
Imaging Media Research Center, Korea Institute of Science and Technology (KIST), Seoul, Korea {hogun,dmonkey,gylee,ko}@imrc.kist.re.kr

ABSTRACT
For intelligent VOD services, social networking services (SNS) like Twitter provide great potential for extracting practical metadata about live TV contents. For example, when breaking news occurs, social media bring not only background but also the trend of public opinion almost in real-time. Much previous research has analyzed social media to detect meaningful topics and trends, but it did not attempt to approach a live event at a very short time granularity. In contrast, we aim to identify a time-constrained entity-relation graph, a so-called episode. The episode is used to annotate a real-time event and to provide an intelligent VOD service for future IPTV. In our experimental study, we evaluated 33 baseball games and also implemented a baseball watching system.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering; J.4 [Computer Applications]: Social and Behavioral Science

General Terms
Design, Algorithms

Keywords
Social Media, Twitter, Bursty Feature Detection

1. INTRODUCTION
With the increasing popularity of Twitter, numerous spatio-temporal data are being created in real-time. Its real-time nature helps us understand a current event at the very moment. For example, as soon as Mr. Choo in Cleveland hits a homerun, the news propagates immediately over Twitter, so that we can notice it at the very moment it occurs. This characteristic has a great advantage in providing immediate and intelligent support for viewers who are watching TV. The support includes extracting a rich episode description with relevant videos and helping viewers to join a new social community. To support them intelligently, it is necessary to provide a framework to detect an emerging episode and associate it with relevant videos and people. In this paper, we propose a framework to discover trendy episodes of a live event and have implemented a prototype of a live baseball watching system. For evaluation, we attempted to discover noticeable trendy episodes of 33 baseball games and compared them with official baseball records. The live baseball watching system detects a trendy moment during a live event and visualizes its trendy episodes with NL (Natural Language)-based descriptions and relevant videos.

2. RELATED WORK
In contrast to many attempts to extract hot topics in the news and blog domains, only a small number of works have targeted real-time social media. Most hot topic detection in social media streams has been aimed at finding a global topical trend and detecting single instances of seminal events and disasters. Tweettronics (http://www.tweettronics.com/) provides a summary using topical trends and sentiment tendencies with respect to company brands or products. Web2express Digest (http://help.web2express.org/about-digest) makes use of Twitter to find real-time topical trends. It monitors Twitter's real-time stream and helps people to discover any topic of interest to follow. A recent study [5] addressed tweets tagged with a semantic web conference hashtag. They observed that people utilized Twitter to identify what happened during a conference, and that they were willing to spread interesting news. However, they did not approach a live event at a very short time granularity. For intelligent VOD services for TV contents, there have been several attempts to extract practical metadata from social media. [7] suggested a concept that chat data of viewers can be extracted to provide a summary of TV contents. In addition, [2][8] implemented a synchronized group watching interface and indexed VOD contents using their chat data. However, only keyword-level indexing was accomplished, and they did not detect topics at the level of an episode.

3. DEFINITION OF EPISODE
DEFINITION [Episode]. Given a live event L, an episode EP is composed of dynamic relationship graphs R = (E1, E2, W, T, S), where E1 and E2 represent entities, T is an appearance time, and S stands for the corresponding social network information of opinion leaders and their messages.

An episode is intuitively a set of dynamic relationship graphs. Each graph is composed of a relation between two entities with a time stamp, and also includes the social network information of the corresponding messages of opinion leaders for later interaction. An entity is represented by a noun which indicates a person or an activity. Therefore, in order to detect an episode from social media, we have four sub-problems to solve: (1) entity detection, (2) relationship graph construction, (3) appearance time estimation, and (4) social network information. In this paper, we focus on describing (1), (2) and (3) and evaluating (1) and (3) using official baseball records. (4) is also an important and challenging problem that has attracted many research efforts; however, to concentrate on our core framework, we simply recommend users who mentioned the entity frequently.

4. SYSTEM OVERVIEW
The architecture of our proposed approach is illustrated in Figure 1. First, noisy Twitter data are detected by our semantic classifier. We created an SVM (Support Vector Machines) classifier (Section 4.1) to find noisy tweets. It was trained on a set of positive and negative training samples to filter out I_STATE [6] expressions (e.g. hope/expectation) and other types of noisy messages. Each training set is composed of keyword features like "seems like" or "I believe that" and statistical features. After the classification, a temporal peaking profile construction module first finds the boundaries of individual peaking periods, utilizing their gradients and averages on the frequency curve (Section 4.2.1). Later on, it identifies bursty features of the period by utilizing our parameter-free bursty feature extraction algorithm (Section 4.2.2). In order to find bursty features, it computes the probability that each feature in a period is likely to make a burst. Subsequently, the extracted bursty features are used for episode detection (Section 4.3) in the form of dynamic relationship graphs. The detected episode is summarized with an NL-based description (Section 4.4). Figure 2 shows a sample result of our system; it detects two trendy episodes with rich contextual information such as who, what, and opinion leader.

Figure 1. System Architecture

Figure 2. Examples of System Output

4.1. Noisy Social Media Filtering
For detecting the above types of noise, we use two groups of features for each tweet: (1) keyword features: terms in a tweet; and (2) statistical features: the number of words appearing in a message and the maximum term frequency in a message. Given the features, we created an SVM (Support Vector Machines) classifier to find noisy Twitter data. We trained the classifier using a collection of positive and negative training sets from Twitter, and only positive Twitter data were used for the later temporal peaking profile construction.

Table 1. Types of Noisy Social Media
  Noisy type: I_State [6] (e.g. believe, intend, or want) - Description: presenting an expectation or expressing a hope.
  Noisy type: Others - Description: miscellaneous (meaningless mentions, excessive repeats of words, or messages which are too short to understand).

4.2. Temporal Peaking Profile Construction

4.2.1. Peaking Period Detection
Algorithm: Peaking Period Detection
  Let C be a collection of tweet counts
  Let μ(C)  // the average of C
  Let σ(C)  // the standard deviation of C
  k <- 0
  repeat
    k <- k + 1
    Ek <- {}
    ck <- the highest count in the given C
    sl <- left end of the spike having peak ck
    sr <- right end of the spike having peak ck
    for each cj with sl < cj < sr do
      C <- C - {cj}
      Ek <- Ek + {cj}
    end for
  until ck <= μ(C) + 2σ(C)
  return {E1, ..., Ek}

A peaking period is a unit of trendy time periods of an event. Detecting a peaking period can help to understand the most meaningful moment from the point of view of social impact. In this section, we describe how to detect a peaking period of a baseball game using temporal patterns of social media. As in the graph in Figure 3 below, sudden spikes caused by an emerging number of tweets can constantly be found. To detect a spike, our algorithm checks only whether it appears significantly bigger than the average and finds spikes that increase exponentially. To be specific, it first looks for the global maximum in the set C. Then, utilizing the timestamp of the maximum value, it investigates the left and right ends of the corresponding spike. The above algorithm describes the details.
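As a minimal, illustrative Python sketch of this peaking-period detection loop, the following assumes the tweet stream has already been bucketed into per-minute counts. The two-sigma threshold follows the description above, while the heuristic for finding the left and right ends of a spike and the variable names are assumptions, not the authors' exact procedure.

import statistics

def detect_peaking_periods(counts):
    """counts: list of tweet counts per time bin (e.g. per minute).
    Returns a list of (start, end) index ranges for detected spikes."""
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    threshold = mean + 2 * stdev          # "significantly bigger than the average"
    remaining = list(counts)              # work on a copy; removed bins are zeroed
    periods = []
    while True:
        peak = max(remaining)
        if peak <= threshold:             # stop once no spike stands out any more
            break
        idx = remaining.index(peak)
        left, right = idx, idx
        # expand to the left/right ends of the spike (simple monotone heuristic)
        while left > 0 and mean < remaining[left - 1] <= remaining[left]:
            left -= 1
        while right < len(remaining) - 1 and mean < remaining[right + 1] <= remaining[right]:
            right += 1
        periods.append((left, right))
        for i in range(left, right + 1):  # remove the spike before searching again
            remaining[i] = 0
    return periods

print(detect_peaking_periods([3, 4, 5, 40, 80, 35, 6, 4, 3, 25, 60, 20, 5]))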

Figure 3. A Sample Result of Peaking Period Detection

4.2.2. Bursty Feature Extraction
Twitter data are normally very noisy and not well formalized, so it is hard to apply traditional TDT or keyword extraction approaches. Existing previous work on bursty feature extraction on Twitter exploits frequency-based term weights or utilizes clustering-based term extraction [3]. However, these methods need many parameters or thresholds for deciding burstiness, and the values are highly variable depending on the size and pattern of the data. In contrast, our approach computes the statistical probability that the Twitter data stream is likely to contain a particular feature. Thus, we model the distribution of a feature in a peaking period by a binomial distribution. Let Nw be the number of features in a peaking period w, and nfre be the frequency of feature f appearing in the period. Then, we compute the burstiness of each feature f, (w, f; pe), using the sum of the probability distribution function, as in (1). It is a modification of [4] for bursty feature detection at each peaking period. The binomial term inside this sum is a probability mass function, and its expected probability pe is calculated by (3). The expected probability denotes the average term frequency over all peaking periods, where L is the number of peaking periods. To find out whether a feature is likely to be a bursty feature in a peaking period, if the function value is larger than 1, we can consider the feature distribution as abnormal behavior. It means that the probability is apparently higher than the prior probability of the feature, and the feature can be selected as a bursty feature.
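Since equations (1)-(3) did not survive the extraction, the following Python sketch shows one plausible reading of the model described above: pe is the feature's average relative frequency over all L peaking periods, and a feature is flagged as bursty when its in-period rate exceeds pe and the binomial tail probability of the observed count is small. The function names and the alpha threshold are illustrative assumptions, not the authors' exact formulation.

from math import comb

def expected_probability(frequencies, totals):
    """pe: average relative frequency of a feature over all L peaking periods."""
    return sum(f / n for f, n in zip(frequencies, totals)) / len(totals)

def binomial_tail(n_fre, n_w, pe):
    """P[X >= n_fre] for X ~ Binomial(n_w, pe): how surprising the observed count is."""
    return sum(comb(n_w, k) * pe**k * (1 - pe)**(n_w - k) for k in range(n_fre, n_w + 1))

def is_bursty(n_fre, n_w, pe, alpha=0.05):
    """Flag a feature whose in-period rate clearly exceeds its expected rate."""
    observed_rate = n_fre / n_w
    return observed_rate / pe > 1.0 and binomial_tail(n_fre, n_w, pe) < alpha

# toy example: the feature makes up 30 of 200 feature occurrences in this period,
# but on average only about 5-8% of a period
pe = expected_probability([5, 8, 30], [150, 180, 200])
print(pe, is_bursty(30, 200, pe))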

Figure 4. Sample Results of Dynamic Relation Detection

4.3.2. Topical Continuity Detection


This section describes topical continuity detection for extracted entities. When we detect an entity on social media, it is important to avoid a temporal bias when estimating the correct appearance time of the entity: a topic about the entity is likely to remain on the social network even if it actually happened many hours ago. We define this as the topical continuity problem. To eliminate topical continuity, our system computes cosine similarities among entities at nearby time periods. If the similarity is significantly large, it regards the first-appearing entity as the true one. Detailed experimental results will be presented in a later publication.
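A minimal sketch of the nearby-period comparison described above, assuming each entity in a peaking period is represented by a bag-of-words vector of the tweets that mention it; the 0.8 similarity threshold and the vector construction are assumptions made only for illustration.

from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def is_continuation(previous_vec: Counter, current_vec: Counter, threshold=0.8) -> bool:
    """If the entity's context in the current period is almost identical to the
    previous period, treat it as lingering discussion rather than a new occurrence."""
    return cosine(previous_vec, current_vec) >= threshold

prev = Counter("lee homerun amazing homerun lee".split())
curr = Counter("lee homerun still talking homerun".split())
print(is_continuation(prev, curr))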

4.4. Relation Graph-based NL (Natural Language) Text Generation

Table 2. A Sample Mapping Table for Generating an NL Text
  Relation: <Person> <Action> - Natural language text to generate: "Are you curious of how Person A did Action B?"
  Relation: <Person> <Person> - Natural language text to generate: "Check out how Person A and B made an accident!"


To provide the VOD service, it is necessary to describe an episode with more expressive representations. In our proposed system, natural language texts can be generated from a mapping table, as in Table 2. For example, if there is a triple <Lee> <Homerun>, then the sentence "Are you curious of how Lee did Homerun?" will be generated. Therefore, each episode can have multiple summary sentences, and they are selectively utilized for the VOD service.
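A small illustrative sketch of such a template mapping in Python; the pattern keys and phrasing mirror Table 2, while the helper names and the stand-in player list are hypothetical.

# Map relation patterns (entity types) to sentence templates, as in Table 2.
TEMPLATES = {
    ("Person", "Action"): "Are you curious of how {0} did {1}?",
    ("Person", "Person"): "Check out how {0} and {1} made an accident!",
}

def entity_type(entity, player_names):
    # In the paper, entities are matched against an official baseball league
    # database; here a simple name list stands in for that lookup.
    return "Person" if entity in player_names else "Action"

def describe(relation, player_names):
    e1, e2 = relation
    pattern = (entity_type(e1, player_names), entity_type(e2, player_names))
    template = TEMPLATES.get(pattern)
    return template.format(e1, e2) if template else None

print(describe(("Lee", "Homerun"), player_names={"Lee", "Park"}))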

5. EXPERIMENT

5.1. Data Set and Evaluation Method


For the evaluation of the proposed framework, we crawled data for 33 baseball games from Twitter. Twitter messages referring to a baseball game could be found in tweets marked with corresponding hash tags such as #gotigers and #doosanbears. The test collection, which we crawled using a Java library for the Twitter API (Twitter4J, http://twitter4j.org/en/index.html), consists of 33 games (8 teams) with 1761.37 tweets per day on average. In the test set, each game had 9.52 peaking periods, and a peaking period included 2.48 bursty features on average.

4.3. Episode Detection

4.3.1. Co-Occurrence-based Dynamic Relationship Detection


Relation detection is the task of identifying a relevant association between two features. Key challenges in detecting a relation include identifying the type of the features and discovering temporally defined relationships at a given time period. In particular, because of the dynamic nature of social media, it is hard to utilize any pre-existing schema or knowledge for understanding an episode. For example, Figure 4 describes sample relation graphs at two different temporal periods. A circle represents an individual activity, and a square means a person. They are associated with temporal episodes. Case 1 depicts that Park gets a strike-out, but we cannot discover any relationship with a homerun or with Lee. On the other hand, case 2 presents different relationships among the three concepts: it is the case that Lee has just hit a homerun off Park. In this paper, a relation graph is constructed through co-occurrence-based dynamic relationship corroboration [1], and the types of entities to detect are restricted to baseball actions and player names. Entity detection is accomplished by matching bursty features with entries of the official baseball league database.
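As a minimal sketch of co-occurrence-based relation detection, the following assumes that two entities are related in a peaking period when they appear together in enough tweets of that period; the counting scheme and the minimum-support value are illustrative and are not the corroboration method of [1].

from itertools import combinations
from collections import Counter

def detect_relations(tweets, entities, min_support=2):
    """tweets: list of token lists for one peaking period.
    entities: set of known players/actions. Returns co-occurring entity pairs."""
    pair_counts = Counter()
    for tokens in tweets:
        present = sorted(set(tokens) & entities)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return [pair for pair, count in pair_counts.items() if count >= min_support]

period_tweets = [
    ["lee", "hits", "homerun", "wow"],
    ["homerun", "by", "lee", "off", "park"],
    ["park", "looks", "tired"],
]
print(detect_relations(period_tweets, {"lee", "park", "homerun"}))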



For evaluating our system, we measure the performance of entity extraction and of appearance time estimation of episodes, respectively. In order to evaluate the performance of entity detection, we measured the mean average precision (MAP) score by comparing extracted features with the web casts of the Korea Baseball Organization. The web casts describe the official baseball game records, including player names and activities like a hit or a homerun. For the MAP score, entities were tagged as true only if the web casts had corresponding players and activities within a 5-minute time window. Appearance time estimation of episodes was also evaluated by comparing the time stamp of each episode's peaking period with the actual appearance in the web casts. For the time estimation, we only considered episodes including entities which were tagged as true.

Figure 6. The system provides real-time trendy-keywords and highlights with relevant VODs.

7. CONCLUSION
In this paper, we propose a generic framework to discover trendy episodes for intelligent VOD services on future IPTV. This is also the first quantitative study on Twitter for a live event at a very short time granularity. In the experiments, it showed promising results and indicated that the real-time nature of micro-blogging services is helpful for understanding the event. In the future, our framework will be expanded into other live event domains and evaluated there.

5.2. Results and Discussion

8. ACKNOWLEDGEMENT
This research is supported by Korea Institute of Science and Technology under "Development of Tangible Web Platform" project.

9. REFERENCES
[1] Sarma, A.D., Jain, A. and Yu, C. 2011. Dynamic Relationship and Event Discovery. In Proc. of WSDM '11, 207-216.
[2] Shamma, D.A., Shaw, R., Shafton, P.L. and Liu, Y. 2007. Watch what I watch: using community activity to understand content. In Proc. of MIR '07, 275-284.
[3] Grinev, M., Grineva, M., Boldakov, A., Novak, L., Syssoev, A. and Lizorkin, D. 2009. Sifting micro-blogging stream for events of user interest. In Proc. of ACM SIGIR '09, 837.
[4] Fung, G.P.C., Yu, J.X., Yu, P.S. and Lu, H. 2005. Parameter free bursty events detection in text streams. In Proc. of VLDB '05, 181-192.
[5] Letierce, J., Passant, A., Breslin, J. and Decker, S. 2010. Using Twitter During an Academic Conference: The #iswc2009 Use-Case. In Proc. of AAAI ICWSM '10, 279-282.
[6] Pustejovsky, J., Castaño, J., Ingria, R., Saurí, R., Gaizauskas, R., Setzer, A. and Katz, G. 2003. TimeML: Robust Specification of Event and Temporal Expressions in Text. In Proc. of IWCS-5.
[7] Miyamori, H., Nakamura, S. and Tanaka, K. 2005. Generation of views of TV content using TV viewers' perspectives expressed in live chats on the web. In Proc. of ACM MULTIMEDIA '05, 853-861.
[8] Park, H., Youn, S.B., Hong, E., Lee, C., Kwon, Y.M., Ko, H., Park, M.W., Sohn, Y.T. and Kim, J.K. 2010. Sharing of baseball event through social media. In Proc. of MIR '10, 389-392.

Figure 5. Influence of the Number of Tweets on Performance

Table 3. Performance of Entity Extraction and Appearance Estimation
  MAP: 0.79
  Appearance time estimation error (sec): 61.81

Table 3 suggests that our entity extraction and appearance time estimation are useful for understanding a real-time event. The MAP score of entity extraction was about 0.79, and the appearance estimation error was only about 1 minute. Even though the remaining features are tagged as false, most of these features are still related to baseball topics such as cheerleaders and casters. Figure 5 shows how the number of tweets affects performance. Each point represents the result of one game, and games with fewer than 400 tweets were ignored in these graphs. In the graph, the number of trendy episodes increases as more tweets are generated. At the same time, the MAP score of each game also increases steadily once the number of tweets exceeds about 1000. Thus, the descriptive power for a live event and the precision are enhanced as people write more tweets.

6. PROTOTYPE OF LIVE BASEBALL WATCHING SYSTEM

Figure 6. Screen Dump of the Prototype of a Live Baseball Watching System: Trendy Keywords and their Video Playback

As a prototype of an intelligent VOD service on future IPTV, a live baseball watching system was implemented. It detects a trendy moment during a live event and visualizes its trendy episodes with descriptions and relevant videos at that time, as shown in Figure 6.


Spatial Tiling And Streaming in an Immersive Media Delivery Network


Omar Niamut TNO Brassersplein 2 Delft, The Netherlands +31 8886 67218 omar.niamut@tno.nl Martin Prins TNO Brassersplein 2 Delft, The Netherlands +31 8886 67816 martin.prins@tno.nl Ray van Brandenburg TNO Brassersplein 2 Delft, The Netherlands +31 8886 63609 ray.vanbrandenburg@tno.nl Anton Havekes TNO Brassersplein 2 Delft, The Netherlands +31 8886 67121 anton.havekes@tno.nl

ABSTRACT
Within the EU FP7 project FascinatE, a capture, production and delivery system capable of supporting pan/tilt/zoom interaction with immersive media is being developed. Intelligent networks with processing components are needed to repurpose the content to suit different device types and framing selections. With this poster presentation, we report on the latest and ongoing developments of the FascinatE delivery network functionality. The associated prototype implementation demonstrates the key concepts of this functionality, with a focus on enabling interaction on mobile devices, such as smartphones and tablets.

Categories and Subject Descriptors


H.5.1 [Information Interfaces And Presentation]: Multimedia Information Systems video, immersive media, interactive media

General Terms
Experimentation, Verification.

Keywords
Immersive media, spatial segmentation, tiled streaming, HTTP Adaptive Streaming.

1. INTRODUCTION
The media industry is currently being pulled in the often-opposing directions of increased realism and personalization. That is, the notion of immersive media with high resolution video, stereoscopic displays and large screen sizes seems contradictory to leveraging the user's ability to select and control content and have it available on personal devices. Leveraging high resolution video to offer immersive media services has been studied by NHK in their Super Hi-Vision 8k developments [1]. In the international organization CineGrid.org [2], 4k video plays a central role. At Fraunhofer HHI, a 6k multi-camera system, called the OmniCam, and an associated panoramic projection system were recently developed [3].

Within the EU FP7 project FascinatE [4], a capture, production and delivery system capable of supporting pan/tilt/zoom (PTZ) interaction with immersive media is being developed by a consortium of 11 European organisations, including partners from the broadcast, film, telecoms and academic sectors. The FascinatE project will develop a system to allow end-users to interactively view and navigate around an ultra-high resolution video panorama showing a live event, with the accompanying audio automatically changing to match the selected view. The output will be adapted to their particular kind of device, covering anything from a mobile handset to an immersive panoramic display. At the production side, an audio and video capture system is developed that delivers a so-called Layered Scene: a multi-resolution, multi-source representation of the audiovisual environment. In addition, scripting systems are employed to control the shot framing options presented to the viewer. Intelligent networks with processing components are used to repurpose the content to suit different device types and framing selections, and user terminals supporting innovative gesture-based interaction methods allow viewers to control and display the content suited to their needs. As an example application, the FascinatE system should enhance home media viewing with gesture-based navigation through a sports game, selecting players to follow and zooming in on interesting events. With this poster presentation, we report on the latest and ongoing developments of the FascinatE delivery network functionality. The associated prototype implementation demonstrates the key concepts of this functionality, with a focus on enabling interaction on mobile devices, such as smartphones and tablets.

2. RELATED WORK
Interaction with immersive media was recently demonstrated by KDDI [5]. The demonstrated prototype allows a user to zoom into a region of interest (ROI) on a mobile device. The ROI parameters are sent to a network proxy, which then crops the transmitted video to reduce the overall video bandwidth. That is, since the user is looking at a specific ROI, only that spatial part of the video can be transmitted without loss of resolution. While this method is relatively simple to implement, it scales poorly when the number of users increases. PTZ interaction was extensively studied by Mavlankar et al. In [6], the authors describe a video coding approach which allows for extracting ROIs directly from the coded bit stream. They note that this method results in nonstandard codec behavior and therefore requires significant modifications to existing deployed hardware.



Figure 1: FascinatE network functionality and in-network rendering.

In [7], Mavlankar employs a spatial tiling and streaming method for ROI interactivity. This approach has been tested and trialed in the ClassX online lecture system [8]. In this paper, we implement and test Mavlankar's work in [7] as a basis for the FascinatE delivery network. We further describe our research challenges and planned extensions for our spatial tiling testbed.

Figure 2: Spatial tiling and representation on multiple end-devices.

3. FASCINATE DELIVERY NETWORK


The FascinatE delivery network plays a key role in delivering the Layered Scene towards a user's end device. This process encompasses the large-scale delivery to network edges and the subsequent personalization in the network edge or on the end terminal. Figure 1 displays this concept, where the audio and video rendering takes place in a so-called FascinatE Rendering Node (FRN). As depicted by the triangular shapes, such an FRN may be located in the network, on the user's end device, or a combination of the two, depending on the use case. In the case of a mobile end device, such as a smartphone or tablet, the in-network rendering can play a significant role in reducing the bandwidth and computational resources required towards and on the end device.

3.2 Using Multicast Delivery


Multicast video delivery is most often used in managed IPTV networks. Within the FascinatE delivery network, it enables independent delivery of scene layers by means of Multicast Groups. The FRN requests the layers it needs via IGMP and the layers are transmitted to the FRN using RTP. It is imperative that the video frame can be aligned at the FRN. Furthermore, the signaling of layers requires study, e.g. the mapping between layers and multicast groups. As with existing IPTV deployments, this approach may suffer from acquisition delays, or ROI switch delays. Existing fast channel change solutions based on retransmission may reduce these delay times.

3.1 Spatial Tiling and Streaming


A comparison between ROI-based coding and spatial tiling/streaming was performed in [9]. While ROI-based coding appears to be more efficient with respect to bandwidth resources, we find spatial tiling to be a suitable candidate for distributing a Layered Scene in the FascinatE delivery network. With spatial tiling, the video is segmented a priori into several spatial regions. Each spatial segment, or tile, is stored as an independent video stream. Based on the display size and the selected ROI, one or more segments are displayed simultaneously on the end device, as shown in Figure 2. Thus, only that part of the video is transmitted that is actually needed on the FascinatE Rendering Node, which allows for optimum usage of network resources and optimal distribution of the Layered Scene in the FascinatE delivery network. The FascinatE delivery network may encompass a multitude of existing delivery network types, such as IPTV multicast, over-the-top CDN and hybrid broadcast broadband. Therefore, we plan to study both push- and pull-based delivery protocols for the distribution of spatial segments between the spatial tile server and the FRN, mainly RTP multicast (push) and HTTP adaptive streaming (pull). Each of these methods comes with its own set of requirements, properties and challenges. However, an overall challenge lies in enabling spatial synchronization at the FRN, i.e. ensuring that the different spatial segments that make up a single video frame are temporally synchronized. Simple experiments have shown that when a spatial segment is even a single frame out of alignment, this has a severe impact on the perceived video quality of the resulting video.
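A minimal sketch of the tile-selection step implied by Figure 2: given an M x N grid over the panorama and a requested ROI rectangle, return the tiles whose streams need to be fetched. The grid dimensions, coordinates and function names are illustrative assumptions rather than the FascinatE implementation.

def tiles_for_roi(panorama_w, panorama_h, grid_cols, grid_rows, roi):
    """roi = (x, y, width, height) in panorama pixels.
    Returns the (column, row) indices of all tiles the ROI overlaps."""
    tile_w = panorama_w / grid_cols
    tile_h = panorama_h / grid_rows
    x, y, w, h = roi
    first_col = int(x // tile_w)
    last_col = min(grid_cols - 1, int((x + w - 1) // tile_w))
    first_row = int(y // tile_h)
    last_row = min(grid_rows - 1, int((y + h - 1) // tile_h))
    return [(c, r) for r in range(first_row, last_row + 1)
                   for c in range(first_col, last_col + 1)]

# e.g. a 6k x 2k panorama split into 8 x 4 tiles; the viewer looks at a 1280x720 ROI
print(tiles_for_roi(6144, 2048, 8, 4, (3000, 600, 1280, 720)))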

3.3 Using HTTP Adaptive Streaming Delivery


HTTP adaptive streaming, or HAS, has recently emerged as a potential standard for video delivery over best-effort networks. It enables independent delivery of tiles by means of HAS technology, where the tile streams are time-segmented and the required tile streams are requested by the end device via HTTP. Although a HAS standard is being developed in MPEG DASH [10], choosing an adaptive streaming implementation may still prove to be difficult. Also, the frame alignment for recombination of layers and the signaling of layers need to be resolved. Furthermore, current HAS implementations suffer from significant buffering delays.

4. PROTOTYPE IMPLEMENTATION
In order to experiment with the different transport mechanisms in combination with tiled streaming we have developed a spatial tiling testbed and a first prototype of the FRN. This architecture is shown in Figure 3. While FascinatE introduces the Layered Scene concept that consists of different audio and video layers, in this first implementation we use single-layer video content only.



Figure 3: The spatial tiling testbed architecture.

4.1 Architecture Description


The testbed consists of two nodes: a segmenter and an FRN. The segmenter, an offline segmentation tool, segments a video into M x N spatially segmented tiles, which are stored in separate video files. In order to enable zoom operations without losing fidelity, the segments are offered in multiple resolutions. This way the complete video can be shown on a small-screen device while limiting the necessary bandwidth. The FRN consists of two components: a combiner, which recombines tiles to create the preferred view, and an adapter, which adapts the created video view to the capabilities of the end device. The combiner performs spatial recombination of the individual tiles to recreate the entire scene or a selected ROI. The result is an uncompressed video stream. The combiner is fully interactive: ROI commands fed by the end-user client through an HTTP request are processed in real-time, which leads to the creation of the selected view, e.g. a ROI, on the next video frame after the command is received. Figure 4 shows a screenshot of the output of the combiner. The adapter performs scaling and cropping of the raw video stream generated by the combiner and encodes the video to a format supported by the end device. The adapter also provides delivery of the video using a multitude of multimedia transmission protocols (RTP/H.264, MPEG-TS over RTP, Adobe Flash RTMP [11] and the Apple HTTP adaptive streaming implementation [12]). During playback, the session between the adapter and the end-user clients remains constant, regardless of the ROI commands given by the terminal, so no session renegotiation or decoder reinitialization is required when changing views. This approach allows for the support of end-user terminals with limited capabilities.
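A minimal sketch of the combiner's spatial recombination step, assuming each decoded tile arrives as a NumPy array and that tiles lie on a regular grid; the array shapes and cropping logic are illustrative assumptions rather than the actual FRN implementation.

import numpy as np

def recombine(tiles, grid_cols, grid_rows, tile_w, tile_h, roi):
    """tiles: dict mapping (col, row) -> HxWx3 uint8 frame for one time instant.
    Stitches the needed tiles and crops the requested ROI (x, y, w, h)."""
    canvas = np.zeros((grid_rows * tile_h, grid_cols * tile_w, 3), dtype=np.uint8)
    for (col, row), frame in tiles.items():
        canvas[row * tile_h:(row + 1) * tile_h,
               col * tile_w:(col + 1) * tile_w] = frame
    x, y, w, h = roi
    return canvas[y:y + h, x:x + w]

# toy example: 2x2 grid of 4x4 tiles, crop a 4x4 ROI spanning the tile boundary
tiles = {(c, r): np.full((4, 4, 3), 50 * (r * 2 + c), dtype=np.uint8)
         for r in range(2) for c in range(2)}
roi_frame = recombine(tiles, 2, 2, 4, 4, (2, 2, 4, 4))
print(roi_frame.shape)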

Figure 4: Screenshot of the output of the combiner. Some artificial lines have been added to illustrate how the video is the result of the recombination of different spatial tiles.

4.2.2 Time-segmented HTTP


The advantage of using time-segmented HTTP for the delivery of spatial tiles is that the inherent time segmentation makes it relatively easy to synchronize different spatial tiles in the FRN. As long as the time segmentation process makes sure that time segments of different spatial tiles have exactly the same length, the relative position of a frame within a time segment can be used as a measure for the position of that frame within the overall timeline; for example, frame number X within time segment Y of tile A should be synchronized with frame number X within time segment Y of tile B.
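The shared-timeline rule above can be expressed in a couple of lines of Python; the segment duration, frame rate and function names are illustrative assumptions.

def global_frame_index(segment_index, frame_in_segment, frames_per_segment):
    """Frames from different tiles align when their global indices are equal."""
    return segment_index * frames_per_segment + frame_in_segment

# Two tiles, both using 4-second segments at 25 fps (100 frames per segment):
# frame 37 of segment 12 of tile A pairs with frame 37 of segment 12 of tile B.
tile_a = global_frame_index(12, 37, 100)
tile_b = global_frame_index(12, 37, 100)
assert tile_a == tile_b == 1237
print(tile_a)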

4.3 Observations
Our current testbed includes time-segmented HTTP streaming and can provide interactive playback on a multitude of devices, including a desktop video player and an Android tablet. This has allowed us to make the following observations.

4.2 Transport mechanisms


As discussed in section 3, we have chosen to implement two different transport mechanisms between the segmenter and the FRN; RTP-based multicast and time-segmented HTTP. Due to the very different nature of these two mechanisms, the spatial synchronization between segments making up a video frame has to be performed using two different methods.

4.3.1 Interactivity delay


One issue we have noticed is a delay, in the order of seconds, between ROI selection and seeing the resulting change on a mobile terminal. The main causes for this delay are video encoding and transmission in the adapter, buffering and decoding in the video client and the switching between different spatial segments. The recombination process itself introduces almost no delay. A performance increase has been achieved by optimizing delivery, encoding settings and decoder initialization:

4.2.1 Multicast RTP


The problem with using separate RTP streams for each of the spatial tiles is that there is no common timeline between the different RTP streams. While RTP packets provide a timestamp field, this field is set with a random offset at the start of the RTP stream, per the RTP specification [13]. This means that this field cannot be used to link RTP packets belonging to different spatial tiles to each other, since their relative position with respect to the start of the stream is not known. In order to solve this, we need to modify the RTP server so that instead of using random offsets for the RTP timestamp field, it uses a fixed offset. This allows the RTP receiver - in this case the FRN - to use the timestamp field as a shared timeline between different tiled streams and thus as a way to synchronize the different spatial tiles making up a video frame. A similar mechanism is sometimes used to synchronize separate audio and video streams. Another aspect that needs to be resolved is determining the start of a media access unit, i.e. the start of video data that can be decoded independently from previous data transmitted in the video stream.
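A small sketch of why a fixed, known timestamp offset makes RTP-based tile alignment possible: once the per-stream offset is known, every packet can be mapped onto a shared media timeline. The 90 kHz clock is the usual RTP clock rate for video; the offset values and function names are assumptions made for illustration.

RTP_VIDEO_CLOCK_HZ = 90000  # common RTP clock rate for video payloads

def to_shared_time(rtp_timestamp, stream_offset, clock_hz=RTP_VIDEO_CLOCK_HZ):
    """Convert an RTP timestamp to seconds on a timeline shared by all tiles,
    assuming the offset used at stream start is known (e.g. fixed to 0)."""
    return ((rtp_timestamp - stream_offset) % 2**32) / clock_hz

# With a fixed, known offset per tile stream the two packets below can be
# recognized as belonging to the same video frame:
tile_a_time = to_shared_time(rtp_timestamp=903000, stream_offset=0)
tile_b_time = to_shared_time(rtp_timestamp=1803000, stream_offset=900000)
print(tile_a_time, tile_b_time, tile_a_time == tile_b_time)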


- We noticed that, without proper signaling, the H.264 decoder initialization on the Android and desktop video players could take about two seconds due to inefficient signaling of Network Abstraction Layer parameters.
- Initial optimizations of the buffering between combiner and adapter led to variable results: decreasing the buffer threshold improves interactivity latency, but also increases playback issues due to buffer underruns. A larger buffer threshold increases smooth playback but contributes to the latency.
- The time-segmented HTTP approach is not suitable in combination with interactivity, e.g. the selection of a ROI, since its implementation requires the buffering of multiple segments, resulting in high delays when switching ROIs. We implemented a fast view change method by having a secondary low resolution stream of the full video image always available for upscaling.


6. ACKNOWLEDGMENTS
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 248138.

7. REFERENCES
[1] M. Maeda, Y. Shishikui, F. Suginoshita, Y. Takiguchi, T. Nakatogawa, M. Kanazawa, K. Mitani, K. Hamasaki, M. Iwaki and Y. Nojiri. Steps Toward the Practical Use of Super Hi-Vision. NAB2006 Proceedings, pp. 450-455 (2006). [2] Cinegrid, http://www.cinegrid.org/. Visited: February 2nd, 2011. [3] Fraunhofer HHI, http://www.hhi.fraunhofer.de/en/departments/imageprocessing/applications/omnicam/. Visited: February 2nd, 2011. [4] FascinatE, http://www.fascinate-project.eu/. Visited: February 18th, 2011. [5] KDDI R&D Labs Three Screen Service platform. http://www.youtube.com/watch?v=urjQjR5VK_Q. Visited: January 17, 2011. [6] Mavlankar, A., "Peer-to-Peer Video Streaming with Interactive Region-of-Interest," Ph.D. Dissertation, Department of Electrical Engineering, Stanford University, April 2010 [7] Mavlankar, A., Agrawal, P., Pang, D., Halawa, S., Cheung, N., Girod, B., "An Interactive Region-of-Interest Video Streaming System for Online Lecture Viewing," Special Session on Advanced Interactive Multimedia Streaming, Proc. of 18th International Packet Video Workshop (PV), Hong Kong, Dec. 2010. [8] ClassX, http://classx.stanford.edu/. Visited: February 18th, 2011. [9] Khiem, N., Ravindra, G., Carlier, A., and Ooi., W. 2010. Supporting zoomable video streams with dynamic region-ofinterest cropping. In Proceedings of the first annual ACM SIGMM conference on Multimedia systems (MMSys '10). ACM, New York, NY, USA, 259-270. [10] T. Stockhammer, Dynamic Adaptive Streaming over HTTP Standards and Design Principles. MMSys11, February 23 25, 2011, San Jose, California, USA. [11] Adobe, http://www.adobe.com/devnet/rtmp.html Visited: February 18th, 2011. [12] Pantos, R., May, W., HTTP Live Streaming, draft-pantoshttp-live-streaming-05, 2011. [13] Schulzrinne, H., Casner, C., Frederick, R., Jacobsen, V., RTP: A Transport Protocol for Real-time Applications, RFC3550, IETF, 2003.

4.3.2 Frame synchronization and video encoding


Another issue we have stumbled upon is a problem with tile synchronization caused by the different frame types used in video encoding. Encoded video often consists of three types of frames: I-frames, B-frames and P-frames. I-frames are the only frames that can be decoded without requiring knowledge of other frames within the video and are therefore the least compressible. B- and P-frames can only be decoded by sequentially decoding all interleaving frames up until the next or previous I-frame. When switching to a new ROI, it will usually be necessary to switch to another set of spatial tiles. When such a switch occurs on a B- or P-frame, the entire GOP must be requested in a worst-case scenario. In our case it is necessary to seek within this file to the temporal location (the timestamp) of the frame which is currently being processed. However, there is a good chance that this frame will be a B- or P-frame, in which case the underlying video decoder will simply return the next I-frame. In normal situations where a user seeks within a video this is not a problem, since the resulting frame will often only be about 1-2 frames off, which is usually good enough. However, in the case of spatial recombination, even a single frame out of alignment has severe implications for the perceived video quality. It is therefore necessary for the combiner to be aware of this problem and to always request the preceding I-frame when seeking within a video and then sequentially decode all subsequent frames up until the requested frame.
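A minimal sketch of the seek strategy described above: instead of trusting the decoder to land on the requested frame, jump back to the preceding I-frame and decode forward until the exact frame is reached. The decoder interface is a hypothetical stand-in, not a real library API.

class FakeDecoder:
    """Stand-in decoder that just reports which frame it would output."""
    def __init__(self):
        self.position = 0
    def seek_to_frame(self, i):
        self.position = i
    def decode_next(self):
        frame = f"frame {self.position}"
        self.position += 1
        return frame

def decode_exact_frame(decoder, keyframe_indices, target_frame):
    """keyframe_indices: sorted list of I-frame positions in the tile stream."""
    preceding_iframe = max(i for i in keyframe_indices if i <= target_frame)
    decoder.seek_to_frame(preceding_iframe)   # jump back to the last I-frame
    frame = None
    for _ in range(preceding_iframe, target_frame + 1):
        frame = decoder.decode_next()         # decode forward to the exact frame
    return frame

# GOP of 25 frames: I-frames at 0, 25, 50, ...; we need frame 63 exactly.
print(decode_exact_frame(FakeDecoder(), [0, 25, 50, 75], 63))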

5. CONCLUDING REMARKS
This poster presents a spatial tiling testbed and the first implementation of a FascinatE Rendering Node. This set-up allows users to view and navigate around an ultra-high definition video, while making sure that only those parts of the video that are actually watched are transferred, optimizing bandwidth requirements. On the testbed we have implemented and tested a time-segmented HTTP delivery method that can be used to deliver spatial tiles in the FascinatE delivery network, and we have shown how spatial synchronization can be achieved using either of the two methods. Future work will consider a variety of topics, such as an additional RTP-based delivery method, the perceptual evaluation of frame alignment errors, and the optimization of the trade-offs between latency and quality and between the level of interactivity and coding performance. The prototype will be extended to allow for


Inclusion of multiple devices, languages and advertisements in iFanzy, a personalized EPG


Pieter Bellekens, Stoneroos Interactive Television, Sumatralaan 45, Hilversum, The Netherlands, pieter.bellekens@stoneroos.nl
Annelies Kaptein, Stoneroos Interactive Television, Sumatralaan 45, Hilversum, The Netherlands, annelies.kaptein@stoneroos.nl
ABSTRACT
In this paper, we introduce the current and ongoing research in iFanzy, a personalized electronic program guide, within the context of the NoTube project. This project specifically focuses on the development of an architecture for the personalized creation, distribution and consumption of television content, using Semantic Web technologies. In particular, we focus on three aspects of this architecture, namely the inclusion of multiple devices, multiple languages and personalized advertisements within the iFanzy interface and back-end server.


Categories and Subject Descriptors


H.4 [Information Systems Applications]: Miscellaneous; H.3.3 [Information Search and Retrieval]: Information Filtering

General Terms
ubiquitous tv, personalized advertisement, pepg

1. INTRODUCTION

The television medium has evolved enormously during the last decade, finally putting an end to the rather conservative reputation it held throughout the previous century. During the first decade of this century we finally witnessed the highly anticipated rise of the digital television platform. Although the digitalization itself did not bring any new features, it did introduce the television medium into the existing digital world, where data is plentiful and interactivity is key. Among the first interactive applications on the new digital television platform was the Electronic Program Guide or EPG, which basically provided the user with an overview of all available channels and programs broadcast. Given that every fervent TV watcher wants to check the television broadcast schedule daily, this seemingly straightforward service soon evolved into one of the most frequently used applications. Moreover, in the Netherlands the EPG holds the first place in the list of most often used digital services1. While a simple but strong concept, the EPG started to become cumbersome as more and more channels became available, requiring users to scroll through endless channel lists. In response to this problem, attention shifted towards the personalized EPG or PEPG, delayed watching and VOD content. In a PEPG the schedule and/or channel list is preselected based on a user profile. In other words, the system analyzes the user's profile and tries to estimate which TV programs fit that user best in a particular situation or context. For the evolution of the PEPG, we refer to [4].

2. IFANZY
iFanzy2 is a personalized EPG developed by Stoneroos Interactive Television3 [1]. At its core, iFanzy is a regular EPG displaying a huge list of channels, providing an extensive overview of tonight's television schedule. However, when a user logs in and enables the inclusion of his accumulated profile, the iFanzy algorithms can estimate to what extent every program in the current listing matches this particular profile. In Figure 1 we see the interface of the iFanzy Web application when a user is logged in. In this figure we see the home page with, in the bottom left, a small PEPG listing three channels. The programs shown on these channels get a specific color ranging from very soft to very strong orange, indicating how well the program fits the profile of the current user. In this way the interface provides a recommendation for the programs playing right now. For a more elaborate overview, the interface also contains a large listing, which can be found under the tab TV-GIDS, showing all the channels in the system, colored accordingly in shades of orange. However, the small PEPG differs for everyone, as it shows the three most appreciated channels of the current user, either indicated by himself or deduced from his program ratings.
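As an illustration only (the actual iFanzy recommendation algorithms are not detailed here, and the colour endpoints are assumptions), mapping a profile-match score to the shade of orange used in the listing could look like this:

```python
# Illustrative sketch: turn a profile-match score in [0, 1] into an orange shade,
# from very soft (low match) to very strong (high match). Colour values are assumed.
def match_to_orange(score: float) -> str:
    """Map a match score to an RGB hex colour."""
    score = max(0.0, min(1.0, score))
    r = 255
    g = round(235 + (140 - 235) * score)  # blend green channel: 235 -> 140
    b = round(205 + (0 - 205) * score)    # blend blue channel: 205 -> 0
    return f"#{r:02x}{g:02x}{b:02x}"

for program, score in [("Evening news", 0.15), ("Top Gear", 0.9)]:
    print(program, match_to_orange(score))
```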

3. BACKGROUND

Currently, new features and collaborations are being developed in the context of the NoTube project4. The NoTube project is an FP7 project with the clear goal of developing an end-to-end architecture, based on semantic technologies (a field of standards and technologies which allow a machine-understandable and machine-processable representation of the meaning of digital content), for the personalized creation, distribution and consumption of television content. Within this project iFanzy was brought in as excluded background technology, meaning that it could be developed further within the context of this project. NoTube further allows for the support of a variety of realistic future television scenarios, including a scenario dealing with personalized semantic news, a personalized TV guide with adaptive advertisements and a TV in the Social Web demonstrator. For more information, we refer to the NoTube Web site. The iFanzy participation in NoTube focuses on multilingual and multi-device support for iFanzy clients and the generation of personalized advertisements. Different iFanzy clients, potentially served in a variety of different languages, need different interfaces, interaction and information extraction, depending on that particular device and/or language. However, in the background, at the server side, communication from the different clients still flows into one semantic database for the extraction of patterns and the deduction of interests. Moreover, every action performed on any of the platforms has a direct influence on the others. Further, knowing which device the user is currently using determines his context and can therefore influence personalization strategies. For example, if the user uses iFanzy on his iPhone, we can favor advertisements for other iPhone apps. In this paper, we introduce three iFanzy features which we develop or extend within the NoTube context. In Section 4 we take a short look at the different devices on which iFanzy is deployed to assist users in selecting the perfect TV program. Afterward, in Section 5, we show how iFanzy copes with the different languages found across Europe and the ease of adding new ones. Thirdly, in Section 6, we look at other benefits of personalization in iFanzy, here illustrated by the deduction of personalized advertisements. We conclude this paper with some conclusions in Section 8.

1 http://www.kijkonderzoek.nl/images/stories/rapporten/SKO TV in NL 2010 def k.pdf
2 http://www.ifanzy.nl
3 http://www.stoneroos.nl
4 http://notube.tv/


4. MULTI-DEVICE

Figure 1: The iFanzy Personalized Program Guide

iFanzy was built to provide a seamless and ubiquitous television experience to the user, independent of his choice of platform. Currently, iFanzy consists of a Web application, a set-top box environment, an iPhone application and a Yahoo Widget running on connected TVs. Each of these platforms is carefully tailored to support the user with the functionality most expected from the respective platform. Behind the scenes, all of them connect to the same server, assuring mutual synchronization of data. Thereby, iFanzy guarantees that every action performed on any of the platforms has an immediate effect on all. For example, a rating given to a TV program via the Web interface will immediately influence the recommendations generated on the iPhone app. However, different devices also introduce different features and limitations. From the interaction side, a personal computer with input devices like keyboard and mouse is better suited to facilitate more complex interactions than the average mobile phone. In the Web interface, for example, a user can browse the complete channel overview and inspect every available program in a very elaborate fashion. E.g. after clicking a program, iFanzy shows all information it was able to grab from multiple online data sources, including the synopsis, cast and accompanying roles, trailers, links to related websites, the user's rating (or an estimation made by the recommendation algorithm) and an iFanzy rating, which is the average of all ratings this program has received from its user base. Further, for every program the user has a number of actions at his disposal to, among others, add the program to his favorites, set alerts (e.g. via e-mail), add programs to his planner, etc. Currently, iFanzy also runs on several thousands of set-top boxes in households across the Netherlands. There, while sitting at the television, users have far fewer input capabilities, as the remote control is usually the only option. Here the interface is optimized to serve the relevant functionalities with as few key-presses as possible. Moreover, the most frequently used actions in a given situation are bound to the colored action buttons on the remote. For the set-top box environment, the personalized EPG was completely redesigned to enable fast browsing of channels and programs with just the four arrow keys. Figure 2 depicts a screenshot of the set-top box interface, with a transparent navigation panel at the left-hand side. Lastly, the iPhone interface imposes yet different requirements, mainly due to the limited screen size. Therefore, people cannot get a thorough overview of the many channels available. Hence, the iPhone interface focuses on browsing all the programs available on one particular channel, or programs sharing a particular genre such as movies. Trailers are shown only if they match an encoding format compatible with the iPhone's player.
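To picture the cross-device synchronization, the following toy sketch (class and method names are assumptions, not the iFanzy server interface) shows how a single server-side profile makes a rating from one client immediately visible to the others:

```python
# Toy sketch of a shared server-side profile: every client writes to the same
# store, so a rating entered on the Web is instantly reflected for the iPhone app.
class ProfileServer:
    def __init__(self) -> None:
        self.ratings: dict[str, dict[str, float]] = {}  # user -> program -> rating

    def rate(self, user: str, program: str, rating: float, client: str) -> None:
        """Record a rating, regardless of which client it comes from."""
        self.ratings.setdefault(user, {})[program] = rating

    def recommend_score(self, user: str, program: str) -> float:
        """Toy score: the user's own rating if known, otherwise a neutral value."""
        return self.ratings.get(user, {}).get(program, 0.5)

server = ProfileServer()
server.rate("alice", "Top Gear", 0.9, client="web")
# The iPhone client queries the same server, so the Web rating is reflected at once.
print(server.recommend_score("alice", "Top Gear"))
```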


Figure 2: The iFanzy set-top box interface

5. MULTILINGUAL

Over the last few years iFanzy has been introduced in a number of different countries across Europe, as shown in Figure 3, including the Netherlands, Belgium, Germany, Austria, Switzerland and Turkey. Unavoidably, however, a multitude of different languages is spoken within this part of the world, and iFanzy needs to be able to deal with these languages, preferably in an efficient way. Moreover, some languages do not only differ in semantics and syntax, but also in terms of character set or direction of reading. For example, among the iFanzy translations we have Korean, which introduces a whole different character set. To cope with this issue, iFanzy maintains an inter-language mapping scheme which maps different words, word sequences and sentences between languages.

Figure 3: iFanzy across Europe + South Korea

By means of this mapping we can easily create an iFanzy interface in a new language just by adding the correct mappings between any of the available languages and the new language instances. Moreover, through this mapping scheme a language can easily be changed if the end user requires so. Imagine an iFanzy interface launched within a specific country matching the native language, and an immigrant user not speaking this language. In such a case, the interface can be reloaded in the user's mother tongue, if correctly indicated in his or her profile, by exploiting these mappings.
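A minimal sketch of what such an inter-language mapping could look like (the string table and translate helper are illustrative assumptions, not the actual iFanzy data structures):

```python
# Illustrative inter-language mapping keyed by UI string identifiers, with a
# fall-back language so that adding a new locale only requires adding its
# column of translations. The entries below are example strings only.
UI_STRINGS = {
    "tv_guide":   {"nl": "TV-gids", "en": "TV guide", "de": "TV-Programm", "ko": "TV 가이드"},
    "favourites": {"nl": "Favorieten", "en": "Favourites", "de": "Favoriten"},
}

def translate(key: str, lang: str, fallback: str = "en") -> str:
    """Look up a UI string in the requested language, falling back if missing."""
    entry = UI_STRINGS.get(key, {})
    return entry.get(lang) or entry.get(fallback) or key

# Reloading the interface in another language is then a matter of re-rendering
# every labelled element with translate(key, user_profile_language).
print(translate("favourites", "ko"))  # falls back to "Favourites"
```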

6. PERSONALIZED ADS

In iFanzy, every user who logs into the system accumulates a user profile containing a consolidation of all actions he or she ever performed on any of the different iFanzy client applications. This profile does not only serve as input for the generation of program recommendations, but for all applications of personalization. One such application is the generation of personalized advertisements, where we estimate for every one of the available advertisements how well it fits a specific user, given his or her current situation or context. The two main (and necessary) information sources to facilitate this technology are an extensive user profile and a thorough description of the relevant domain items, in this case advertisements and TV programs. While the former is covered by the standard iFanzy user profile, the latter is covered by egtaMETA and TV-Anytime, two specific metadata specifications built to describe advertisements and TV programs respectively. As described in [3], egtaMETA can be summarized as a set of semantically defined attributes to describe advertising material as follows:
- Descriptive information: e.g. the title of the ad.
- Exploitation information: e.g. the period during which it shall be used.
- Credits: includes key persons involved in the creation, post-production and release of the ad.
- Technical information: e.g. the file format and its audio, video and data components.
Having both the user profile and the description of the relevant domain items, the third step in the process is to define the algorithms that combine these two resources and produce the personal advertisement selection. In principle, personalized advertisement algorithms can be compared with general recommendation algorithms where ads are recommended instead of regular TV programs, music, books, etc. However, there is one big difference. General recommendation systems are usually fed by the user's explicit or implicit feedback on the target items. The user for example rates or reads books, listens to music, watches TV programs, etc., whereas this is not the case for advertisements. Having no direct feedback on ads complicates the recommendation process severely. In theory, some feedback could be elicited if the user effectively buys the product which was advertised, although in practice such scenarios can be regarded as too intrusive. Assuming that this information is not available, we can distinguish three different approaches to select the best-fitting advertisements for a specific user: the stereotype-driven approach, the user profile-driven approach and the item-driven approach.

6.1 Stereotype-driven approach

In the advertisement world, ads often come accompanied by stereotypes of the target group they aim at. E.g. beer advertisements could carry the stereotype of males between 18 and 28 years old. If the advertisement indicates the intended audience by means of stereotypes, and we can deduce to which stereotypes a user belongs, we can make an ad selection. However, this generalizes the ad selection to a large extent. For example, an ad for a sports car could be labeled by the provider for a target group of men, age 35-55, wealthy; however, women might be interested just as well.

6.2 User profile-driven approach

A second approach is to look solely at the profile of the user, and compare it to the metadata of the advertisements to find out which ones fit best. This method in principle compares all the attributes of the ad with the structured data amassed in the user profile. E.g. if the user profile contains a high interest in the genre auto racing, we could deduce a potential interest in advertisements for sports cars.

6.3 Item-driven approach

As previously mentioned, the recommendation of advertisements suffers from the fact that users do not rate or provide any other direct feedback on advertisements. Usually users experience advertisements as obtrusive and annoying, which does not make them eager to reveal their potential interests. However, we can, to some extent, circumvent this issue by looking at other items of the domain (in this case, for example, TV programs) and comparing these to the advertisements. After all, we do have the user's feedback on TV programs. If we can find connections and similarities between the metadata of TV programs on the one hand and advertisements on the other, we can deduce the user's interest in advertisements. In the TV domain, other strategies towards personalized ads can be found, among others, in the iMEDIA project [2].
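The item-driven idea can be sketched as follows; the keyword sets, weights and function names are invented for illustration and do not correspond to the iFanzy algorithms. An ad is scored by the rating-weighted keyword overlap with programs the user has rated.

```python
# Illustrative item-driven scoring: compare ad metadata keywords with the keywords
# of TV programs the user gave feedback on, weighted by the user's ratings.
def ad_score(ad_keywords: set[str], rated_programs: list[tuple[set[str], float]]) -> float:
    """Sum of rating-weighted keyword overlaps between rated programs and the ad."""
    score = 0.0
    for program_keywords, rating in rated_programs:
        score += rating * len(ad_keywords & program_keywords)
    return score

user_history = [({"auto racing", "sport"}, 0.9), ({"cooking"}, 0.4)]
ads = {"sports car": {"car", "auto racing", "speed"},
       "kitchen mixer": {"cooking", "kitchen"}}
ranked = sorted(ads, key=lambda name: ad_score(ads[name], user_history), reverse=True)
print(ranked)  # the sports car ad ranks first for this viewing history
```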

6.4 Advertisement placement

The placement of advertisements is another important issue, as we want to draw as much attention to them as possible without making them too obtrusive, potentially interfering with the global interface. We investigated ad placement for the different devices. The iPhone interface, for instance, requires a different placement than the Web or set-top box interface. In Figure 4 we see the iFanzy iPhone interface with an included ad. Given the interface, the bottom of the screen was the most suitable place for an ad here. In Figure 2 we see an ad in the bottom right corner, again conspicuous while not being too obtrusive.

Figure 4: The iFanzy iPhone interface including a personalized advertisement

7. ACKNOWLEDGEMENTS

This work has been (partially) supported by the EU project NoTube (http://www.notube.tv). NoTube (ICT-231761) is an Integrated Project (IP) of the European Union's 7th Framework Programme: Information and Communication Technologies (ICT) Call 3.

8. CONCLUSIONS

In this paper we discussed the ongoing research in iFanzy, a personalized electronic program guide, extended within the context of the NoTube project. We focused on three different aspects of the research, namely the inclusion of multiple devices, multiple languages and personal advertisements. We have shown that iFanzy has been developed on several different platforms, where every platform obtained specific interface features and controls to assist the user as well as possible in finding the right TV programs. Further, due to the expansion of iFanzy across several different countries, we have developed an inter-language mapping scheme which maps different words, word sequences and sentences between languages. Lastly, we have developed a set of algorithms to facilitate personalized advertisements based on the accumulated iFanzy profile. To evaluate the latter, we have planned a user study in the next quarter of this year.

9. REFERENCES
[1] Pieter Bellekens, Geert-Jan Houben, Lora Aroyo, Krijn Schaap, and Annelies Kaptein. User model elicitation and enrichment for context-sensitive personalization in a multiplatform TV environment. In Proceedings of the seventh European conference on interactive television, EuroITV '09, pages 119-128, New York, NY, USA, 2009. ACM.
[2] Theodoros Bozios, Georgios Lekakos, and Victoria Skoularidou. Advanced techniques for personalized advertising in a digital TV environment: The iMEDIA system. In Proceedings of the eBusiness and eWork Conference, 2001.
[3] EBU/egta. Metadata for the file exchange of advertising material (egtaMETA). EBU Technical, October 2010 (http://tech.ebu.ch/egtameta).
[4] Barry Smyth and Paul Cotter. Case studies on the evolution of the personalized electronic program guide. In Liliana Ardissono, Alfred Kobsa, and Mark T. Maybury, editors, Personalized Digital Television: Targeting Programs to Individual Viewers, pages 53-72. Kluwer Academic, 2004.


ITV in Industry


Convergence of Televised Content and Game

Wei-Yun Yau
wyyau@i2r.astar.edu.sg
Institute for Infocomm Research

Kong-Wah Wan
kongwah@i2r.astar.edu.sg
Institute for Infocomm Research

Sujoy Roy
sujoy@i2r.astar.edu.sg
Institute for Infocomm Research

Hwee-Keong Lam
hklam@i2r.astar.edu.sg
Institute for Infocomm Research

Wei-Yun Yau received his PhD degree in computer vision (1999) from Nanyang Technological University. He is the recipient of the TEC Innovator Award 2002, the Tan Kah Kee Young Inventors Award 2003 (Merit), the Standards Council Merit Award 2005, the IES Prestigious Engineering Achievement Award 2006 and the Standards Council Distinguished Award 2007. A paper he co-published in 2008 received the Pattern Recognition Journal Honorable Mention 2010. Currently, he is a Programme Manager, leading the research in Interactive Social Tele-Experience, and also the Chairman of the IPTV Working Group, Singapore. The Institute for Infocomm Research (I2R) is a member of the Agency for Science, Technology and Research (A*STAR) family. Established in 2002, our mission is to be the globally preferred source of innovations in 'Interactive Secured Information, Content and Services Anytime Anywhere' through research by passionate people dedicated to Singapore's economic success. I2R performs R&D in information, communications and media (ICM) technologies to develop holistic solutions across the ICM value chain. Our research capabilities are in information technology and science, wireless and optical communications, and interactive digital media. We seek to be the infocomm and media value creator that keeps Singapore ahead.

Executive Summary
Television has long been the source of our home entertainment and news since its introduction in the late 1920s. Recently, video games and computer games have become popular. The question then is whether televised content and games must always be mutually exclusive. This motivates us to explore the convergence of televised content and games. Here, we propose a content-driven game, where the game is related to the televised content and the viewer plays the game while watching the content. Based on our analysis, we categorized content-driven games according to the processing needed to insert the game into the content and the relationship between the content and the game. We then translated this into the framework architecture of the content-driven game shown in Figure 1.

Figure 1. Content driven game framework

We implemented a case of a game synchronized with a football match. The football content is streamed live to the content processing component via a Video-On-Demand server to simulate live broadcasting. We used netbooks as the user interface devices on which the user views and interacts with the game, as this is easy to implement. For the networking component, we used a wireless 802.11g network to communicate between the netbook and the game server, while the communication between the game server and the content analysis server is via wired Ethernet running at 100 Mbps. The television used is an ITU-T H.721 compliant device which supports the Lightweight Interactive Multimedia Environment (ITU-T H.762), a subset of HTML4, CSS and ECMAScript. Thus the user can also play the game on the television itself, in addition or as an alternative to using the netbook, especially if the viewer is alone. The content analysis module determines the various events that occur during the football match received in near real-time, such as goals, goal shots, corner kicks, penalties, yellow or red card offences and the region of the field where the ball is. The analysis is also able to indicate critical moments where the game should not be activated. The content analysis module also keeps track of the players and extracts the images and faces of the players, which are passed to the game server. The games implemented include guessing the player and spot the ball. Figure 2 shows the TV displaying the football match together with the game.
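To illustrate how the detected events could gate the game (event names, the cooldown value and the function names below are assumptions, not the implemented system), the quiz is simply withheld while a critical moment is in progress or cooling down:

```python
# Illustrative sketch: suppress the content-driven game around critical match events
# reported by the content analysis module, and allow it again once things calm down.
CRITICAL_EVENTS = {"goal", "penalty", "red card"}

def cooldown_for(event: str) -> float:
    """Seconds to hold the game back after a given event (assumed values)."""
    return 30.0 if event in CRITICAL_EVENTS else 0.0

def game_allowed(recent_events: list[str], cooldown_remaining: float) -> bool:
    """Allow the game only when nothing critical is happening or cooling down."""
    if cooldown_remaining > 0:
        return False
    return not any(evt in CRITICAL_EVENTS for evt in recent_events)

print(game_allowed(["corner kick"], 0.0))  # True: the quiz can be shown
print(game_allowed(["goal"], 0.0))         # False: hold the quiz during a goal
```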

Figure 2. System implemented on H.721 compliant TV

User evaluation showed that most users would like to play the game while watching the football match, but the game has to be interesting and easy to use.



heckle.at TV
interjection as annotation
The TV of the future could work more like the theatre of the past, where the real action was off-stage, in the social activity of the stalls. Heckle is an experimental web-service that captures the comments, asides, and discussion generated by an audience to annotate video. Heckle explores how conversations between a mix of co-present and remote viewers can be orchestrated to elicit rich, divergent metadata.
A Play in a London Inn Yard, in the Time of Queen Elizabeth. From Thornbury's Old and New London, Cassel & Co, 1881.

heckle.at environmentalism

Using a second screen device, users are able to interject text, Google images and video into a pool of 'heckles', which can be user-rated.

Popular 'heckles' can then be shown on the shared 'main' screen, in the style of news tickers, speech bubbles or infographics. 'Heckles' can then be integrated with stills as a visual overview of the video, also enabling text search within the video timeline.
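As a rough illustration of this mechanism (the data structure and helpers are assumed, not taken from the Heckle implementation), a user-rated pool of timestamped heckles could be surfaced on the main screen and searched along the timeline like this:

```python
# Illustrative sketch: heckles are timestamped, user-rated annotations; the
# top-rated ones are promoted to the shared main screen, and the pool can be
# text-searched to jump to points in the video timeline.
from dataclasses import dataclass

@dataclass
class Heckle:
    time: float    # seconds into the video
    text: str
    rating: int    # net user votes

def main_screen(pool: list[Heckle], limit: int = 3) -> list[Heckle]:
    """Pick the most popular heckles to overlay on the shared screen."""
    return sorted(pool, key=lambda h: h.rating, reverse=True)[:limit]

def search(pool: list[Heckle], term: str) -> list[float]:
    """Return timeline positions whose heckles mention the search term."""
    return [h.time for h in pool if term.lower() in h.text.lower()]

pool = [Heckle(12.0, "nice shot of the turbines", 5),
        Heckle(45.5, "is that even recyclable?", 9),
        Heckle(80.0, "lol", 1)]
print([h.text for h in main_screen(pool, 2)])
print(search(pool, "recyclable"))
```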

Saul Albert, Queen Mary University of London, is a PhD researcher in the Department of Computer Science and Electrical Engineering's Media & Arts Technology Programme. He is currently working with BT Research & Technology, exploring second screen applications for Social TV. In 2006 he co-founded The People Speak, where he continues to work as an artist, technologist and strategist.

The People Speak is a participatory media, art and technology partnership that creates 'tools for the world to take over itself': www.thepeoplespeak.org.uk
BT is one of the world's leading communications services companies, serving the needs of customers in the UK and in more than 170 countries. www.bt.com
The Media & Arts Technology Programme is an innovative inter-disciplinary research programme in the science and technologies that are transforming society's creative possibilities and economies. www.mat.qmul.ac.uk


Let the audience direct

"We've got a few worried editors in the control room. This is an experiment and it's up to you to make it work. George [Mazarakis, executive producer], Peter [Griffiths, interactive editor] and Jessica [Pitchford, managing editor] are monitoring your suggestions. They're hyperventilating a bit, so help them by becoming a part of our team." - Derek Watts (Carte Blanche anchor)

iTV experiment: In late 2010 Carte Blanche turned 22 years old. As an experiment to involve the audience directly in live TV
and to push the limits of social media in this environment, viewers were asked to direct the show via and during our birthday broadcast. The audience could choose which stories went out on the night (4 out of 8), what order they went out in, and how these stories were introduced by the show anchor. The experiment was designed to manipulate the ratio of likes to posted content by using the 'like' button as a voting tool. Eight video promos were posted on Facebook and the one that received the most likes from viewers went out first, followed by the 2nd, 3rd and 4th. Viewers could vote until seconds before the stories aired, meaning that the 4 stories with the highest votes at the start of the show wouldn't necessarily air; no story was guaranteed a spot, as viewer interactivity meant the votes were constantly changing. In order to encourage as much "talking" as possible around the experiment, viewers were asked to write intro and outro links to each story by commenting underneath the video promos. On Twitter, viewers could also suggest links by tweeting us. This process was also partly incentivised by giving prizes to each person whose links we made use of. Likes, comments and tweets were monitored in real time and story changes were made almost to the second - quite a feat in a live studio. The result - a significant spike in activity - clearly shows how successful this experiment was, making the rest of the graph look almost completely flat, and shows that live TV experiments can be used to draw in audiences who wouldn't normally engage in social media applications and/or associate with our brand in this way. The experiment showed that traditional broadcast TV in a live environment can still have a competitive edge using standard social media tools in a live studio. We want to be around for another 22 years, and this experiment shows the extent to which live TV can make use of its competitive advantage in the social media space. It also showed that editorial control in a news environment can be passed on to viewers without compromising the show and that viewers increasingly want to be part of, and see, the (behind-the-scenes) process. As one critic wrote: "If the unofficial slogan for Carte Blanche since its inception has been 'The right to see it all', it's now added a new one: 'The right to be involved in it all'." The presentation will include video of the event as well as graphs showing stats to illustrate the extent to which this experiment spiked all forms of social media traffic.
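For illustration only, the like-based running order described above boils down to something like the sketch below; the story names and counts are invented, and the snippet does not touch Facebook's API.

```python
# Toy sketch of the voting logic: order the eight promos by their like counts at
# the moment the show starts, and air the top four in that order.
def running_order(promo_likes: dict[str, int], slots: int = 4) -> list[str]:
    """Return the stories to air, highest-liked first."""
    return sorted(promo_likes, key=promo_likes.get, reverse=True)[:slots]

likes_at_airtime = {"Story A": 812, "Story B": 455, "Story C": 990,
                    "Story D": 120, "Story E": 640, "Story F": 300,
                    "Story G": 77, "Story H": 503}
print(running_order(likes_at_airtime))  # ['Story C', 'Story A', 'Story E', 'Story H']
```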

Biography: Peter Griffiths has been the Interactive Editor for Carte Blanche for five years. He manages all
technologies and projects to enable the audience to effectively interact with the television programme. Input is critical to the survival of Carte Blanche as many of the stories investigated come from viewers. Peter has worked in traditional publishing, but has a keen interest in web, mobile and using social media as an interactive tool on TV.

Production: Carte Blanche, an investigative current affairs show, has won over 120 local and international awards, and is
credited as one of the most recognisable brands in South Africa. It is broadcast live during primetime every Sunday night. Several sub-brands - including Carte Blanche Africa, Carte Blanche Medical, Carte Blanche Consumer and Carte Blanche Extra have broadcast on other nights of the week during primetime.

Company: Carte Blanche is produced by Combined Artists, which specialises in live TV, live broadcast events, and bringing
new live formats to air. Most recently, the company produced the 2010 South Africa FIFA World Cup Final Draw. Established in 1981, the production company has also produced the "Millennium Broadcast" with Nelson Mandela, transmitted live to a global audience of four billion viewers from Robben Island, Cape Town, and the Miss SA Beauty Pageants and Miss World.


RendezVous - An editorial rule-based contextual TV information system
Sebastien Poivre - Orange Labs - France - sebastien.poivre@orange-ftgroup.com

Company presentation


France Telecom-Orange is one of the world's leading telecommunications operators with 169,000 employees worldwide, including 102,000 employees in France, and sales of 45.5 billion euros in 2010. Present in 32 countries, the Group had a customer base of 209.6 million customers at the end of 2010, including 139.7 million customers under the Orange brand, the Group's single brand for internet, television and mobile services in the majority of countries where the company operates. As of December 31, 2010, the Group had 150.4 million mobile customers and 13.7 million broadband internet (ADSL, fiber) customers worldwide. Orange is one of the main European operators for mobile and broadband internet services and, under the brand Orange Business Services, is one of the world leaders in providing telecommunication services to multinational companies. The aim of R&D is to shed light on the future and contribute to the Group's growth and competitiveness. It provides a portfolio of innovative projects with high growth potential and solutions focused on customers' needs, helps maintain and develop existing assets, and forecasts requirements in the medium and long term. Sebastien Poivre is a software architect and developer for both web and mobile devices at Orange Labs, specializing in modeling and metadata processing, as well as user interaction studies.

RendezVous overview
Orange is investigating the best way to give customers watching TV an added-value experience, assuming the aggregator role thanks to its network capabilities. RendezVous is a system which provides both user interactivity and enriched additional information about the show users are watching. It matches content obtained through partnerships with content editors or found by automatic systems. The system is an application for smartphones or tablets. It involves a second display and does not impact the main TV display, in order to preserve the full-screen collective TV experience and to protect creators' rights while offering rich interaction possibilities. Due to the position of Orange as a central operator, RendezVous aggregates information for every channel. For channel partners it offers more visibility, even if the service is also available for their competitors. Among our first partners, a well-known French TV channel agreed that having a larger audience is better than trying to lock fewer users into a specific application. The additional information they provide will be more easily accessed by end users, and the overall experience for their show will be enhanced. Additional information offers a new dimension of differentiation for content providers. Content displayed on the second screen can be related to voting, betting, shopping, fan or social pages concerning the media or the performers, etc. With the contextual router RendezVous, producers can offer new transmedia services, defining new cross-media storytelling experiences. Intra-content notification, with information tied not to the whole media item but to specific moments or events, paves the way to a more dynamic and interactive experience, and to more user customization possibilities. In order to cover, with the highest quality, the whole range from high-end shows to lower-audience content, RendezVous relies on a semi-editorial system based on semantic analysis, media search treatments and a rule-based system. After an internal test with some 200 testers, a wider field experiment will be launched during 2011 with more media partners to assess our aggregator approach versus channel-specific applications and to get a better understanding of the needs of media producers for provisioning interfaces.


Set-top, over-the-top, future!
Miguel Pinto is a Senior Innovation Architect at Novabase Digital TV, currently working in the R&D team. Miguel's areas of specialization are computer methodologies, interactive TV application design and systems integration. From self-care solutions for PC, mobile and set-top boxes to an interactive newspaper solution, he collaborated with ZON Multimédia from 2000 to 2008 building interaction solutions. In 2008 he joined the R&D team designing set-top and over-the-top solutions for Novabase Digital TV. Founded in 2000 (at the time as Octal TV), Novabase Digital TV has delivered set-top box solutions to several broadcasters in Europe (e.g. ZON) over the years. As part of the Novabase group (the largest Portuguese IT services company), it is currently focusing on SIP (System in a Package) solutions for set-top and over-the-top deployments in Europe (ZON, Mediasat, Canal Plus, etc.), India (Reliance) and South America (Telefónica). Since the beginning of TV, the TV experience has been confined to the living room as a family experience. With the advent of the internet came a new, less passive, interactive way to access content; TV turned to secured content, and a better TV experience was provided using devices such as set-top boxes. Today, improved hardware and software are paving the way for entertainment to jump from the living room to wherever the customer desires. In the TV realm this is also true, where the content and the social environment around it are starting to move from the living room (set-top box paradigm) to the PC, smartphones and connected TVs (over-the-top paradigm). This emerging market shift presents a huge challenge to TV service providers, who need to adapt the way they deliver content in these new markets. To answer this growing necessity from our clients, we designed the IMMS platform. The IMMS platform is built over a three-layered model, where a Silverlight application acts as a client for the services provided by the IMMS Core (EPG, VOD catalog management, user management, rights management, etc.). The IMMS Core layer provides services to the client either by acting as a proxy for the client services providing the functionalities cited above, or by giving access to information stored on the platform (VOD catalog on MS Commerce Server or EPG information in a local database). The IMMS Core layer also provides back-end functionality through dedicated interfaces for updating the VOD catalog and EPG data. Last but not least, the IMMS platform assumes the existence of a third layer of services provided by the client in order to support client-dependent information usage (user profile, billing, etc.).


As delivered out of the box, the IMMS platform provides a client with a low level of customization options, such as simple branding, advertising management and a few color schemes (e.g. picture to the right). The IMMS Core is provided with VOD catalog and EPG gatherer services using a predefined format for data acquisition (metadata and media), also providing services for authentication and requiring only that the client provides a WCF service for user profile management and billing. The basic setup of the platform allows us to predict a period of 6 to 7 months to fully deploy the platform to market. If the client requires a higher level of customization, such as new color schemes, navigation pane adjustments, stronger rebranding and new navigation panels on the client (e.g. picture to the left), or even stronger integration between the IMMS Core and the client services by integrating the IMMS Core workflows into the existing client process workflows, this will impact the overall project estimates by adding 3 months (low-impact integration) up to 5 more months for more radical integration reviews on the client (e.g. full-screen mode, global rebranding and revised navigational logic), giving an estimate of up to 11 months for project conclusion. Summing up, the IMMS platform provides a global solution that gathers adaptive bit rate media broadcast, media and metadata management (e.g. EPG), digital rights management and also a device client to use over HTTP. The IMMS platform achieves this using Microsoft technologies such as IIS, SQL Server (for EPG, media and metadata management), Smooth Streaming (for adaptive bit rate broadcasting), Windows Media DRM (for digital rights management), WIF (for user and authorization management) and Silverlight (the client application, known as Media Portal). The usage of Silverlight opens the door to providing this over-the-top service on any PC or device, only requiring a browser that supports the Silverlight plugin. Smooth Streaming also allows Windows Phone 7 devices to use the client, either via the browser or natively. This adds an advantage for quad-play service providers, because it allows direct integration of the service on devices prior to sale (an advantage when taking into account the recent Nokia/Microsoft partnership that may result in mass-market, low-cost Windows Phone 7 devices). During this presentation it will be shown how an over-the-top solution, supported by Microsoft technologies, can be delivered in 6 to 11 months depending on client customization (UX customization) and target device (PC-only deployment, or PC and native Windows Phone 7 deployment).

Av. D. João II, Lote 1.03.2.3 Parque das Nações, 1998-031 Lisboa, Portugal


EuroITV 2011 - Industry Track - 05.05.2011

Smart TV and how to do it right
The convergence of television and internet is developing faster than ever. With the recently established HbbTV standard (Hybrid Broadcast Broadband TV), a pan-European initiative managed to harmonize the broadcast and broadband delivery of entertainment and pushed television to another level. After the web and the mobile world, now television has become smart. Most major European channels have therefore integrated HbbTV (lately also called Smart TV) along their channels; still, the awareness among TV consumers is rather limited. Optimistic estimations for German households assume a diffusion of up to 23 million HbbTV-ready television devices until 2014. In early February 2011, 34 regular television consumers with a high interest in the internet but no experience with Smart TV services were invited to facit digital's specialized lean-back user experience lab. The goal was to assess the user experience of HbbTV applications provided by the major German TV broadcasters and to find out about success drivers of those program-related services via the Red Button. The test was conducted on a Humax iCord device and a 55-inch flat screen monitor by Samsung. All subjects were highly surprised by the new options Smart TV provided them. The tested services were clearly described as adding value to their TV experience and all participants intended to use Smart TV at home if they had the according device. As one of the key results, participants chose the EPG and media library as the core functions of a program-related Smart TV service, which in their eyes is currently only provided by ARD (First Public German Television). There they easily found background information, trailers, as well as the option to recapture material from the last 7 days in a simple and structured way. In terms of design, participants clearly favored the bar- or split-screen approach over a full-screen approach. In the industrial track at EuroITV 2011, further success factors and identified UX problems of Smart TV will be presented to the audience. Additionally, the case study maxdome, as an example of VOD integration into program-related services, will be discussed.
Test Setting

The author/presenter
As a cultural anthropologist, Mirja Baechle is dedicated to understanding how people are affected by today's technology in their daily lives. With over 7 years of experience in the field of user experience research, she has consulted for clients such as Sky, Alcatel-Lucent, Vodafone or T-Home on how to improve their web-based products and services.

The company
Facit digital (facit-digital.com) is a research and consulting company for digital media based in Munich, Germany. Their goal is to use empirically based user insights for the development and optimization of digital communication channels. They are one of Germany's leading user experience research companies.


Innovating Usability


The ubiquitous remote control.


1) Introduction: Problem statement: There are a lot of electrical devices in the house, each having its own remote, which results in complexity.
2) The past: Many attempts have been made to simplify the user experience by moving to one universal remote. However, most people got used to their original remote and do not want to change. Furthermore, the setup and daily use of a universal remote was difficult.
3) The future: A seamless experience: Some examples and best practices will be shown that will lead to a better user experience in the future:
- Improved interaction with the Remote Control
- HDMI
- Multiple screens

4) Practical lessons based on what often went wrong in the past.


- Technical constraints
- Balancing the needs for backward compatibility versus the attractiveness of innovative user interaction

5) Conclusion
- No single winner for the next generation of TV user interfaces.
- Early involvement of all stakeholders, teamwork and strong leadership.
- Requirements for the UI of the future

By: Ir. Michael W.P. D'Hoore, Philips, Home Control.

Working at Philips: 03- Product Management & New Business Development for Remote Controls // Europe; 95-03 R&D and Project Management for CD & DVD // Belgium and Singapore

Education: Master in engineering, Mechatronics & Business Administration 85 - 92 Katholieke Universiteit Leuven (KUL), Belgium & TU-Braunschweig, Germany

Market experience: working for the cable, telephone and satellite pay TV operators
Innovative technical and business solutions: working with key suppliers (Industrial Design, Technology providers, Usability experts)
Product and usability know-how for remote controls in house
Special interest area: needs of seniors / elderly people.
Context of the business: > 10 account managers worldwide working in the pay TV market segment; 3 development sites (China, Singapore, Europe); production mainly in Asia (China & Indonesia)


Content plus Interactivity as a Key Differentiator


Pay TV customers expect more from their television sets than just watching traditional streams of unidirectional linear content. Service providers that strive to differentiate themselves with new and innovative services can tap this potential with ground-breaking multi-screen interactive TV applications. But it's not enough to have a compelling interactive TV application to drive users; content is also fundamental, and the interaction needs to be with it and not only about it. The successful introduction of live interactivity, in Portugal Telecom's MEO service, on premium talent and reality shows like Portuguese Idol and Secret Story demonstrates that, with the right mix of content and interactivity, users will use the applications and feel they add value to the experience. The Portuguese Idol application had multi-camera support for live streaming, all the content from previous shows available directly on the TV, news feeds with photos and video, direct access to the Idol Facebook wall on TV and a host of other features. The Secret Story application allowed MEO clients to select multiple direct feeds from the house, both in a mosaic fashion and in full screen. In addition, photos, videos and information about each contestant, and a detailed timeline with the most significant events from the house, were always just a red button away. With interactivity in mind, a poll mechanism was implemented, allowing users to vote and see the results every day. This was one of the most successful applications developed to date, with over a million visits in the first month, and it drove an increase in the usage of all the interactive applications. The presentation will focus on showcasing these and other successful interactive applications available today in Portugal Telecom's commercial offer, MEO.
Company and Author Bio

PT Inovação is the research and development branch of the Portugal Telecom Group. The Interactive and Convergent Television and Multimedia Development Center is fully focused on IPTV interactive applications, mobile TV streaming, rich media applications and OTT solutions and services that help operators monetize their assets and distinguish themselves from the competition. Bernardo Cardoso received his Master's degree in Information Management from the University of Aveiro in 2002. He joined PT Inovação in 2000 and is currently heading the Interactive and Convergent Television and Multimedia Division. Previously, since 2001, he had been consulting, researching and developing in the areas of interactive TV and digital video related technology within the Multimedia Technology Unit, being involved in the research, development and deployment of products and systems such as the TV Cabo interactive TV project and Portugal Telecom's successful MEO IPTV commercial offer.


ITV strategy: The use of Direct and Indirect Communication as a strategy for the creation of interactive scripts
Elizabeth Furtado, University of Fortaleza / State University of Ceará, 1321 Washington Soares Avenue, Fortaleza, CE, Brazil, +55 085 34773400, Elizabethsfur@gmail.com
Patrícia Vasconcelos, Active Brazil / University Estácio of Ceará, 5335 Santos Dumont Avenue, Fortaleza, CE, Brazil, +55 085 32654950, Patricia@activebrasil.com
Niedja Cavalcante, University of Fortaleza, 1321 Washington Soares Avenue, Fortaleza, CE, Brazil, +55 085 34773400, Niedjac@gmail.com

This case study made use of direct and indirect communication as a strategy for the production of television program scripts, with a focus on live studio audience TV shows. It will look at the presentation of the author, the TV company, and then the case study. The University of Fortaleza (UNIFOR) is a well-regarded research institution, with more than 47,000 graduated professionals working in the social, economic and cultural process in the Northeast of Brazil. Its campus has a research laboratory for the study of users and systems, LUQs, in which Elizabeth Furtado, a post-doctoral researcher at Stanford University, uses the laboratory's environment to conduct research focusing on digital TV. Elizabeth participated in three research projects for DTV: i) SBTVD, whose goal was to develop the Brazilian DTV standard (Furtado et al, 2005); ii) SAMBA, which aimed at promoting digital inclusion through the production of DTV content for Italian and Brazilian users (Samba, 2007); iii) the M-Player DTV Interactive, which aimed to integrate mobile telephony operators in Brazil with the Brazilian DTV system allied to the needs of interactive users. For this case study, UNIFOR partnered Doctor Elizabeth with the TV broadcaster TV Diário, the largest media group in the state of Ceará. Regardless of its size, TV Diário is not yet prepared to make use of the Brazilian digital TV system's full potential. They do not yet have the necessary experience to script, recognize, or utilize the power of direct communication with the audience. With this in mind, we developed a communication strategy for the production of television scripts by creating an interactive script that was exclusively used in LUQs for the television program "Your Morning from Diário TV". The theme chosen to reach a good audience was a story of betrayal. With the TV program set and the simple script in place, we had to think of how to integrate the three blocks into the program; each block already had a set interactivity location allotted, with no room for additional interactivity.

1st Block
- Opening of the TV show
- Wife's presentation
  o 1st Interview
  o 2nd Interview
  o 1st Interaction: The presenter invites viewers to vote using the remote control and cell phone. By demonstrating the menu options of the remote control and explaining the subtitles appearing on the TV screen showing all phone numbers available for voting, the presenter motivates viewers to interact.
  o 1st Break

2nd Block
- 3rd Interview
  o 2nd Interaction: Without any guidance from the presenter, subtitles appear on the TV screen with the phone numbers available for voting.
- Husband's presentation
  o 4th Interview
  o 3rd Interaction: The presenter informs viewers of partial voting results using a digital board.
  o 2nd Break

3rd Block
- Final considerations
  o 4th Interaction: Indirect invitation to play a game.
  o 5th Interaction: The presenter informs viewers that they can vote throughout the day.
- Closure


(Figure 1)

The script was set but did not contain details on the interactivity; it only indicated the possible locations where it might occur. The structure given to the program would be the same one used commonly. The team at LUQs then added the interactivity to the script, embedding the applications and content that would be integrated (Figure 1). In this interactive guide, we also defined the actions and words of the presenter to meet the objectives of the evaluation. We tried to do this in two ways: with direct communication (Figure 2) and with indirect communication by the presenter (Figure 3). With direct communication we suggest and prompt the audience to interact at specific times. With indirect communication, the announcer just reminds the audience that it can interact throughout the day, even after the program has finished.

(Figure 2)

(Figure 3)

LESSONS LEARNED
Television engineers are focused on having a way to communicate with the viewer that resembles the work of Human-Computer Interaction (HCI).

Scriptwriters had some difficulty in inserting moments of interactivity into the script of the program, due to the fact that interactivity is not a familiar element of the traditional script. Thus, we believe that inserting interactivity at the ideal moment in the script should take into consideration the interactive applications, the television program and the target audience.

We conclude that a program must include an audience-interactive script to facilitate user interaction with digital TV applications. The presenter has a great influence on teaching this interaction and motivating the participation of subjects through direct communication, since 90% of the participants interacted during the direct prompting of the presenter, whereas only 10% of participants noticed or resolved to interact through the indirect communication.

ACKNOWLEDGMENTS
The authors would like to thank FUNCAP and FINEP for financial support. The work leading to this research was supported by the Diário television network.

SELECTED REFERENCES
[1] SAMBA DOW Description of Work. Projeto Samba - Documento interno, 2007.
[2] FURTADO, E.; CARVALHO, F.; SOUSA, K.; SCHILLING, A.; FALCÃO, D.; FAVA, F. Interatividade na Televisão Digital Brasileira: Estratégias de Desenvolvimento das Interfaces. In: Simpósio Brasileiro de Telecomunicações, 2005, São Paulo. SBC, 2005.


Ambient Media Ecosystems for TV - A forecast 2013
Authors: Artur Lugmayr, TUT; Ville Ollikainen, VTT; Tuija Aalto, YLE; Timo Manninen, YLE; Hao Zheng, TUT; Erik Bäckman, YLE; Sauli Niskanen, TUT; Jorma Kivelä, JUTEL; Jere Hartikainen, JUTEL; Juha Rapeli, Hitmedia. Contact author: lartur@acm.org

1. Introduction
NELME (New Electronic Media) is an industrial cooperation project. The project provides justifiable predictions about the most presumable and critical futures of electronic media, and specifically the future of broadcast until 2020. The goal of the project is to impact the media industry, regulation authorities and policy makers with the project results, and most importantly to help the participants develop their business and R&D plans. Artur Lugmayr leads WP4 Ambient Media Experience with great contribution from YLE. The major cooperation includes VTT, YLE, Metropolia, MTV, Jutel, Anvia, Tieto, Sofia Digital, Hitmedia, Genelec and others. More details can be found at the website: www.nelme.com

2. Motivating Forces
The motivating forces are divided into three categories: technology, content and service. This project tries to explain the intrinsic link between the three parts, and not only identifies major players in the traditional media field, but also enumerates major players in innovative ambient media fields.

3. Discovered Megatrends
WP4 identifies five major megatrends, shown in the figure below: Smart Services, Content and Spaces (Internet of Things, ambient service at home, EPG extended service, information distribution multichannels, smart household devices, ...); Broadcasting and Producing for the Mobile Phone (the fusion of broadcasting, personal broadcasting, importance of artistic experience, broadcasting system of the mobile platform, SNS websites and TV programs, economic mechanisms in the content business); Ambient Audience Research, Advertising and Merchandizing (measuring audience experience, automated distribution for ads and system announcements, ads and purchase together, preventing ad-skipping systems); Personal and Living Archives in Digital Environments (the consumer as content creator, live content tagging, personal living archives, the profit for online personal archives, personal living archives as a distribution channel); and Ambient Pro-Active Systems, Interaction, and Content Consumption.

4. Major Changes
There are some potential changes: the merging of television and computer seems to be inevitable; the Internet of Things is based on the fusion of universal networks; advertisers use more efficient methods to monitor the reactions of consumers; Internet search companies provide more in-depth search services; broadcast contents are advertised and personalized; consumers are creators of media contents; multi-customer devices are improved and enhance the interactive experience; the evolution from traditional TV to ambient TV; innovative TV display technology; SNS connected with TV; mobile tagging so that ads and purchasing actions happen together. In the future, the preliminary integration of all kinds of networks is an irreversible megatrend for the Internet, the Internet of Things, broadcasting networks, and mobile broadcasting networks.


Tutorials


Designing and Evaluating Social Video and Television


David Geerts
CUO, IBBT / K.U.Leuven
Parkstraat 45 Bus 3605, 3000 Leuven, Belgium
+32 16 32 31 95
david.geerts@soc.kuleuven.be

ABSTRACT
In this tutorial, we will discuss how the social uses of television have an impact on how we should design and evaluate interactive television and online video applications. After categorizing several social TV applications, we will focus on the concept of sociability, and explain how this can be evaluated using guidelines and heuristics. We will also discuss how sociability can be studied by performing user tests, both in the lab as well as in the field, and which aspects of testing are different from studying usability. Although the guidelines and user tests are especially appropriate for designing and evaluating social television systems and online social video, parts of them are also suitable for other iTV or online video applications.

Categories and Subject Descriptors


H.4.3 Information Systems Applications: Communications Applications, H.5.2 User Interfaces: Evaluation/methodology, H.5.1 Multimedia Information Systems: interactive television

General Terms
Measurement, Design, Experimentation

Keywords
Sociability, heuristics, evaluation, social television

1. INTRODUCTION
In past EuroITV conferences, as well as at uxTV2008, social TV has proven to be an important and exciting new topic of research in interactive television. In the past years, social TV has made the move from academic and industrial research labs [e.g. 1, 4, 7, 8] to the consumer market. The introduction of widgets on connected television sets (e.g. by Yahoo!, Opera and GoogleTV) opens up a whole range of possible social TV applications, and many social video applications on the web have been launched (e.g. Watchitoo, Clipsync, Sofanatics). Even more applications for the iPhone and iPad are being created (e.g. Miso or Tunerfish), bringing into practice the use of a secondary screen as an ideal social interaction medium. As is good practice in user-centered design, evaluating these systems early and often is important to create an optimal user experience. Although several guidelines for evaluating the usability of interactive TV exist [3, 10], and heuristic evaluation as well as usability testing is a well-known and often practiced technique, for applications being used in a social context such as the social television systems and online social video applications mentioned above, evaluating only usability is not enough. Even if these applications are evaluated to improve their usability, this does not mean that the social interactions they are supposed to enable are well supported. This tutorial wants to fill this gap by teaching researchers and practitioners how to design and evaluate social features of interactive television and online video.
Based on his extensive experience in performing user tests of social television systems for evaluating their sociability (e.g. [5]), the presenter will explain the practical issues related to performing user tests with iTV focused on social interactions. Furthermore, he will discuss the sociability heuristics he has created based on these tests [6], as well as several other social interface guidelines, and explain how they can be used to evaluate social television systems and online social video applications, or social aspects of interactive television and online video in general.
The proposed structure of the tutorial is as follows: first, the social uses of television as documented by several media researchers [9, 11] will be shortly introduced. They will be linked with the current state of interactive television services and online video applications, including a wide range of social TV systems and social video applications on the web, and the need for designing and evaluating sociability will be explained. The applications will be categorized according to a recent framework created by Cesar & Geerts [2]. After this, an overview of sociability evaluation methods focused on social interaction will be discussed, including small exercises. Then, an overview of twelve sociability heuristics the presenter has developed will be given, along with an explanation of how to use them to evaluate iTV and online video. Finally, a practical hands-on session will be held in which the participants can apply the sociability heuristics to an online social video application such as Watchitoo (or, due to the fast-changing nature of the area, another suitable application that will be available at the time).



2. SCHEDULE
30 min.: Introduction to the social uses of (interactive) television and online video
60 min.: Overview of social TV applications
60 min.: Discussing sociability evaluation methods
45 min.: Overview of sociability heuristics
45 min.: Practical exercise

3. TARGET AUDIENCE
The target audience for this course is researchers and practitioners who design or evaluate interactive television or online video applications, and who want to focus on social aspects of iTV and online video. The tutorial requires a basic background in general user-centered design methods, but most other concepts related to the content of the tutorial will be explained in detail.


7. REFERENCES
[1] Boertjes, E., Schultz, S. and Klok, J. Pilot ConnecTV. Gebruikersonderzoek. Public report of the Freeband Project, TNO (2008) [2] Cesar, P. & Geerts, D. (2010). Past, Present, and Future of Social TV: A Categorization. To be presented at CCNC'2011 Workshop SocialTV . Las Vegas: IEEE. [3] Chorianopoulos, K. User Interface Design Principles for Interactive Television Applications. In International Journal of Human-Computer Interaction, 24, 6, Taylor Francis (2008) [4] Coppens, T., Trappeniers, L. & Godon, M. AmigoTV: towards a social TV experience. In Masthoff, J. Griffiths, R. & Pemberton, L. (Eds.) Proc. EuroITV2003, University of Brighton, (2004) [5] Geerts, D. 2006. Comparing voice chat and text chat in a communication tool for interactive television. In Proc. NordiCHI 2006, 461464. [6] Geerts, D., & Grooff, D. D. (2009). Supporting the social uses of television: sociability heuristics for social tv. In Proceedings of the 27th international conference on Human factors in computing systems (pp. 595-604). Boston, MA [7] Harboe, G., Massey, N., Metcalf, C., Wheatley, D., & Romano, G. The uses of social television. In ACM Computers in Entertainment (CIE), 6, 1 (2008) [8] Harrison, C., Amento, B., 2007. CollaboraTV: Using asynchronous communication to make TV social again, In Adjunct Proceedings of EuroITV2007, 218-222. [9] Lee, B. & Lee, R.S. How and why people watch tv: implications for the future of interactive television. In: Journal of advertising research, 35, 6 (1995), 9-18 [10] Lu, Karyn Y. Interaction Design Principles for Interactive Television. Master's Thesis, Georgia Institute of Technology (2005) [11] Lull, J. The social uses of television. In: Human Communication Research, 6,:3, Spring, (1980) 197-209

4. LEVEL OF THE TUTORIAL


The level of the tutorial is beginner.

5. TUTORIAL HISTORY
This tutorial was first organized as a workshop for interaction designers and usability professionals as part of the EU CITIZEN MEDIA project at the K.U.Leuven (22/1/2009). Based on the positive responses from participants, the instructor was invited to teach the course for the Eindhoven Birds of a Feather (BOF) group of CHI Nederland at the Technical University of Eindhoven (26/3/2009). The course was also taught at EuroITV2009, EuroITV2010 and NordiCHI2010, each time updated with the most recent developments. Given the fast changes in this area, for EuroITV2011 care will be taken to include the most recent examples of social iTV and online video applications, especially the application used for the interactive exercise. Finally, minor revisions will be made based on previous participants comments.

6. PRESENTERS BIO
David Geerts has a master in Communication Science from the K.U.Leuven and a master in Culture and Communication from the K.U.Brussel. He was project leader of the Mediacentrum of the Katholieke Universiteit Leuven (Belgium) for several years, and now leads the Centre for User Experience Research (CUO). He is involved in several research projects on user-centered design and evaluation. Furthermore, he acts as content manager for the post-academic course Human-Centered Design. David has over ten years of experience in teaching for a diverse audience: e.g. introductions to new technologies for secondary school children, practical seminars in web design for university students and in-depth courses on usability topics for practitioners. Currently he teaches a master course in Human-Computer Interaction for students in communication science and economy, as well as usability design courses and workshops for practitioners. David Geerts has organized workshops and SIGs at CHI2006, CHI2007, CHI2008, EuroITV2007 and EuroITV2008. He finished his doctor's degree on the Sociability of Interactive Television, for which he has developed twelve heuristics for designing and evaluating social television interfaces. David Geerts is co-founder of the Belgian SIGCHI.be chapter, and was program chair of EuroITV2009, the 7th European Interactive TV Conference.


How to investigate the Quality of User Experience for Ubiquitous TV?


Marianna Obrist
HCI & Usability Unit, ICT&S Center University of Salzburg Sigmund-Haffner-Gasse 18 5020 Salzburg, Austria

Hendrik Knoche
Media and Design Laboratory, Ecole Polytechnique Fédérale de Lausanne, EPFL, IC, LDM, Station 14, 1015 Lausanne, Switzerland

marianna.obrist@sbg.ac.at

ABSTRACT
The scope of user experience (UX) supersedes the concept of usability and other performance-oriented measures by including, for example, users' emotions, motivations and a strong focus on the context of use. Thereby UX fits nicely with the scope of Ubiquitous TV, which moves away from a device-centric view to an experience of consuming content on different platforms, in different social settings and various locations: from the traditional living room TV set, to the PC, passing through the mobile phone or the screens in taxis and throughout the city. The purpose of this tutorial is to motivate researchers and practitioners to think about the challenging questions around how to investigate UX for ubiquitous TV for different users in diverse contexts. In particular, a clear understanding of the qualities of such an experience is required. Within this tutorial, we provide insights on state-of-the-art UX methods and measures, highlighting their advantages and disadvantages based on concrete examples relevant to a ubiquitous TV environment.

hendrik.knoche@epfl.ch
Developing applications that are explicitly intended to improve the UX can be seen as an important step towards a positive HCI [3]. UX frees the discourse from the old obsession of HCI to avoid poor designs, which was not geared at providing guidance to arrive at good designs. At the same time we need to keep in mind that we cannot design the UX but only design for UX because people appropriate technology individually especially in discretionary use contexts as McCarthy and Wright [6] point out. In order to compare design solution and inform better designs we need to understand how to best measure the individual factors and the overall UX. Developing these metrics is a current challenge in UX research. Partly this is due to the fact that certain parts like immersion or enchantment are hard to measure without affecting the experience. The current ambitious definitions of UX ([4],[5]), however, do not include the operationalizations such that the UX and its contributing factors can be measured and quantified. Requirements for these measures include amongst others: validity and reliability, speed, and cost efficiency, applicability for concepts, ideas, prototypes, and products [3]. The goal of this tutorial is: to provide participants an overview on methods and measures for studying UX, in particular for Ubiquitous TV, to have the participants apply selected methods and measures on concrete examples (e.g., home and mobile context), to discuss strengths and limitations in an experience-centered design approach.

Categories and Subject Descriptors


H.5.m [Information interfaces and presentation (e.g., HCI)]: Miscellaneous.

General Terms
Measurement, Human Factors, Design.

Keywords
User Experience, Ubiquitous TV, Methods, Measures, Factors, Qualities, Context

1. INTRODUCTION
User experience (UX) is a complex concept that has gained large popularity both in academia and industry in the last decade (e.g., [1],[5],[7]). It contains various contributing factors such as users' emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviors and accomplishments that occur before, during and after use [of a product, system or service] for example in the ISOs definition [4]. Designers have to arrive at a synthesis of these possibly competing factors and face hard decisions in trading them off for another. Researchers are interested in measuring and modeling these contributing factors and resulting overall UX of prototypes or actual implementations of systems and services. Equipped with metrics embedded in a method these factors can be evaluated and hopefully will help informing the trade-offs required in designs.

As tutorials should follow an interactive and participative approach, we will provide the participants with some initial background based on our expertise, followed by an interactive session. Based on applied examples from the home and mobile context for Ubiquitous TV we will prepare some provocative statements on studying UX to elicit views and responses from the participants. Thereby the tutorial ensures a practical focus based on a common theoretical and empirical basis.

2. AUDIENCE
The tutorial aims to bring together researchers on UX from scientific and industrial backgrounds. Target participants are in particular designers of ubiquitous TV systems and services, as well as researchers concerned with requirements elicitation and evaluation of the UX qualities of such services. Their major goal is to provide users with the optimal experience when engaging with an interactive system.


The tutorial aims to attract both experienced designers and researchers in the field of UX, but also affords newcomers a clear overview of what is going on in research and industry with regard to investigating qualities of UX for ubiquitous TV systems and services. In the first part of the tutorial we will present a comprehensive background on the topic to inspire and stimulate the discussion in the practical session, applying selected methods and measures to a concrete Ubiquitous TV scenario. The participants will have the opportunity to apply and internalize this knowledge and share their own expertise in the second part of the tutorial. The overall goal of this combination is to learn from each other's experiences based on clearly defined topics and guiding questions.


5. Related Experiences & Publications


The instructors of this tutorial have been actively involved in the previous EuroiTV conferences, as organizers and presenters. Some selected publications are listed below outlining the instructors field specific experience: Knoche, H., Sasse, M. A. (2008) Getting the big picture on small screens: Quality of Experience in mobile TV. In Ahmad, A. M .A. & Ibrahim, I.K. (eds.) Multimedia Transcoding in Mobile and Wireless Networks, Chapter 3, pp. 31-46, Information Science Reference. Obrist, M., Wurhofer, D., Beck, E., Karahasanovic, A., and Tscheligi, M. (2010). User Experience (UX) Patterns for Audio-Visual Networked Applications: Inspirations for Design. Full paper at NordiCHI2010. Obrist, M., Moser, C., Alliez, D., and Tscheligi, M. (accepted). In-Situ Evaluation of Users First Impressions on a Unified Electronic Program Guide Concept. In Special Issue: New TV Landscape for the Entertainment Computing Journal (to appear 2011).

3. SCHEDULE
The tutorial is foreseen as a half-day event and brings together practitioners and researchers on UX. 09:00 10:00: Introduction to User Experiences The scope of UX research and approaches, qualities of UX, methods and measures overview.

10:00 10:30: Methods and Measures relevant for Ubiquitous TV Example from the Home Context (e.g., self-reporting techniques such as probing methods; challenges for 3DTV evaluation in the home and extended home) Example from the Mobile Context (e.g. experience sampling method, quasi-experimentation, diary studies, datamining)

10:00 10:30: Coffee Break 10:30 12:00: Hands-on Exercises (the organizers will prepare concrete scenarios for guiding the selection and application of the UX methods and measures for a Ubiquitous TV context considering real life constraints time, money, resources, context, target users, cultural restrictions, device, platform, etc.). 12:00 13:00: Plenary Discussion & Conclusions The schedule provides a framework to transfer and interactively discuss different approaches for investigating relevant qualities of UX with ubiquitous TV systems and services. The tutorial will be a forum to elaborate on and share individual knowledge with other participants from research and industry.

6. REFERENCES (Selected)
[1] Beauregard, R., Younkin, A., Corriveau, P., Doherty, R., & Salskov, E. (2007). Assessing the Quality of User Experience. Intel Technology Journal: Designing Technology with People in Mind, 11. [2] Hassenzahl, M. User experience (ux): towards an experiential perspective on product quality. In IHM08: Proc. of the 20th International Conference of the Association Francophone dInteraction Homme-Machine, pages 1115, New York, USA, 2008. ACM. [3] Hassenzahl, M., & Tractinsky, N., (2006). User Experience - a research agenda. In Behavior & Information Technology, 25(2), pp. 91-97. [4] ISO DIS 9241-210:2010. Ergonomics of human system interaction - Part 210: Human-centred design for interactive systems. International Standardization Organization (ISO). Switzerland. [5] Law, E., Roto, V., Vermeeren, A., Kort, J., Hassenzahl, M. (2008). Towards a shared definition of user experience. In Proc CHI 2008, 2395-2398 [6] McCarthy, J. & Wright, P. (2004). Technology as experience. Cambridge, Massachusetts: MIT Press. [7] Tullis, T., and Albert, B. Measuring the User Experience. Collecting, Analyzing, and Presenting Usability Metrics. (2008) Morgan Kaufmann.

4. INSTRUCTORS
Marianna Obrist is Assistant Professor for HCI & Usability at the ICT&S Center of the University of Salzburg. The focal point of her research is on user experience studies and methods. Marianna was involved in several research projects concerned with the study of user experience in different contexts and situations (e.g., home, mobile, games). She was involved in the organization of diverse workshops, special interest groups, and tutorials at HCI-related conferences (e.g., CHI, NordiCHI, MobileHCI). She was conference co-chair for EuroITV2008 held in Salzburg, and organized a tutorial at EuroITV2009 as well as co-organized a workshop on study methods at EuroITV2010.
Hendrik Knoche is a post-doctoral researcher at EPFL where he also teaches a class on mobile user experience design. He holds a PhD in computer science from University College London. His thesis focused on Quality of Experience (QoE) in mobile multimedia services. He has looked into beneficial trade-offs between QoE and economic constraints to deliver satisfying mobile TV experiences. Hendrik has worked in industry as a consultant in information architecture, mobile services (Nokia, Vodafone, Sport1) and research projects investigating distributed collaboration and TV-centric services. He is on the program committee of EuroITV, uxTV and the CHI09 workshop on mobile user experience and recently organized a workshop on user experience in TV-centric services.


Deploying Social TV: Content, Connectivity, and Communication


Marie-José Montpetit
Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, USA, +1 617 715 4295

Thomas Mirlacher
ICS-IRIT, Université Paul Sabatier, 118, avenue de Narbonne, 31062 Toulouse Cedex 9, France

mariejo@mit.edu

thomas.mirlacher@irit.fr

ABSTRACT
This half-day tutorial presents the underlying aspects and the enablers of social television. Three sections will provide a full overview of the different yet overlapping areas that constitute the social converged television chain: content, connectivity and communication with emphasis on interactivity and community viewing.

Keywords
Social TV, interactive TV, Internet-based television, community viewing, video systems.

As Social TV is moving to a multi-screen and multi-device ecosystem of the new TV landscape, these challenges take center stage. Consequently, novel approaches are needed to enable the wider deployment of Social TV services. While, until recently, it was sufficient to research social TV within the confines of the living room and its single TV screen, with the rise of the high resolution network connected tablet computers and powerful smartphones, Social TV is becoming a ubiquitous anytime anywhere service. Deploying Social TV solutions requires a systemic understanding of the end-to-end delivery chain from the networks that will support the service, the multimedia capabilities of the different devices that will render the experience and the properties of the different contexts they are used in, to the viewer quality of experience and the content providers business models. Of particular interest are the overlapping aspects that address new delivery mechanisms beyond traditional client-server architectures and linear delivery. The tutorial will address mechanisms such as community viewing and peer-to-peer communication that improve the quality of experience especially for smartphones and tablet computers. While not dwelling into technical details, device augmentation and network combining will also be briefly mentioned as other means of democratizing TV viewing. Content protection in Social TV requires the review of traditional models of Digital Rights Management to separate the business aspects from the encryption aspects. In particular, layered content protection are necessary to socializes TV viewing and differentiate between the original content and the ancillary information provided by users, which may also need privacy protection. But obviously, there is no Social TV without an audience and means for that audience to interact with the content and other viewers. The proliferation of new interactivity devices and applications can become confusing as per the best approach to favor: remotes, phones, keyboards, and tablets are now competing to provide users with the best interactive experience. The tutorial does not intend to review these exhaustively but to distill their main features into a comprehensive set of requirements, based on user behavior and expectations as well as development on new technology and heritage from human computer interactivity research. Specific design examples will be provided. A review of salient user studies trading off the lean forward aspects of the computer experience versus the lean back expectations of the TV audience will be presented.

1. INTRODUCTION
Social TV has been named as a promising service in the next few years and is currently being deployed in many countries. However, there are still many technological hurdles that impair its full acceptance by the TV community. Amongst those are the issues of content formatting and protection, connectivity across heterogeneous networks and communication between devices and end users. This tutorial aims to provide the participants with an overview of these different elements with a focus on enabling technologies and user research.

2. TUTORIAL DESCRIPTION
Traditional television delivery systems are based on content being acquired and distributed under the control of a single operator to a single end-user device. As content moved onto the Internet and to wireless networks and social viewing, this model is increasingly obsolete. TV content is nowadays available from a variety of networks and operators and rendered via web technology on any device capable of supporting a browser. Moreover, video is combined with ancillary content and extra features that could be inserted anywhere in the network. These elements are building the foundation of Social TV.


Finally, a portion of the tutorial will be dedicated to present a number of current and future Social TV systems and will include recent projects from the MIT Social TV class of the Spring 2011 semester.

5. SELECTED REFERENCES
[1] MIT News 2010, Rethinking Networking, http://web.mit.edu/newsoffice/2010/network-codingpart2.html [2] MIT Technology Review 2010, TR10, May-June 2010, http://www.technologyreview.com/communications/25084/? a=f [3] F. Zhao, T. Kalker, M. Mdard and K. Han, Signatures for Content Distribution with Network Coding, ISIT, July 2007. [4] M.J. Montpetit, Community Networking: Getting Peer-toPeer out of Prison, Communications Futures Program Winter Plenary, January 18 2008, cfp.mit.edu. [5] M.J. Montpetit. T. Mirlacher and N. Klym, The Future of TV: Mobile, IP-based and Social, Springer Journal of Multimedia Tools and Applications, Spring 2010. [6] R. Martin, A.L. Santos, M. Shafran, H. Holtzman and M.J. Montpetit, 2010, neXtream: A Multi-Device, Social Approach to Video Content Consumption, Proceedings of CCNC2010, Las Vegas, January 2010.

3. TARGETED AUDIENCE
The targeted audience for the tutorial ranges from TV generalists, social TV designers and researchers to information technology professionals moving into the TV industry. No prerequisites are necessary except basic knowledge of TV, web technologies and applications.

4. ACKNOWLEDGMENTS
The authors would like to thank the MIT Social TV community and the RLE Network Coding Group for technical support. The work leading to this tutorial was supported in part by British Telecom and NBC-Universal/Comcast.


Workshops


Workshop 1: Quality of Experience for Multimedia Content Sharing: Ubiquitous QoE Assessment and Support


Quality of Experience of Multimedia Services: Past, Present, and Future


Hans-Jürgen Zepernick
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden
hans-jurgen.zepernick@bth.se

Ulrich Engelke
University of Nantes
44035 Nantes Cedex 1, France
ulrichengelke@gmail.com

ABSTRACT
In recent years, the notion of Quality of Experience (QoE) has gained increased attention to represent the quality of a service as it is perceived by the users. Accordingly, QoE amalgamates service integrity quantified by conventional Quality of Service (QoS) parameters with non-technical but subjective factors. In this paper, we provide a survey on the evolution of QoE concepts in multimedia systems touching on the past, present, and future aspects. Specifically, the ideas behind QoE are discussed and an overview of related standardization work is provided. On this basis, QoE metrics for speech, image, and video are described. An outlook of future work and challenges is also provided.

Categories and Subject Descriptors


H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics - performance measures

General Terms
Survey, Quality of Experience, Multimedia Services

1. INTRODUCTION
The variety of services delivered over telecommunication networks has seen significant advances over the last decades. This is due to the widespread use of the Internet and a shift from speech to multimedia applications including data, audio, image, and video services. It is also observed that humans, as the end-user of such services, demand access to their expanded environment without noticing any limitations on the technology or restrictions on their mobility. This is accompanied by a shift from speech to multimedia with the communication partners at both ends expecting always best Quality of Experience (QoE). As humans are the final judges of service quality, a key issue becomes the creation of experiences that closely resemble human-to-human interaction and the related QoE evaluation.

As a matter of fact, the first approaches on how to account for the QoE paradigm in system design and service evaluation emerged from the mobile industry [1, 2, 3]. Specifically, it has been realized that service integrity parameters such as bit error rates, frame error rates, throughput, delay, and jitter may quantify sufficiently the performance of the enabling technical system but may not always correlate well with human perception of the delivered multimedia service. In the latter relation to quality, other factors such as user expectations on and experiences with the service as well as satisfaction with support are included in the overall QoE assessment. In the meantime, the rich field of QoE has also received significant interest in academia leading to large research efforts. For example, the COST Action IC 1003, known as European Network on Quality of Experience in Multimedia Systems and Services (QUALINET) [4], has recently commenced operation as an initiative to coordinate the many fragmented research and development efforts in the QoE field under one formal umbrella. Its work program is organized within five working groups covering application areas, mechanisms and models of human perception, quality metrics, databases and validation as well as standardization and dissemination. It shall be noted that many aspects of QoE are currently being addressed by all important standardization bodies. The International Telecommunications Union (ITU), for instance, has defined QoE in Recommendation ITU-T P.10/G.100 as [5] The overall acceptability of an application or service, as perceived subjectively by the end-user. In view of the above discussion, this paper aims at giving an overview of the young but very rich field of QoE. Specifically, we provide some insights into fundamental QoE concepts in multimedia systems touching on the past, present, and future aspects. For this purpose, the ideas behind QoE are described such as relating Quality of Service (QoS) parameters to QoE through suitable mappings. On this basis, QoE metrics along with standardization efforts for speech, image, and video are described. An outlook of future work and challenges is also provided. The remainder of the paper is organized as follows. Section 2 gives an overview on QoE fundamentals. In Sections 3 to 5, objective perceptual quality metrics are presented for speech, image, and video applications, respectively. A discussion about future work and challenges are contained in Section 6. Finally, a summary is given in Section 7.


2. QUALITY OF EXPERIENCE
In [1], QoE is defined as "how a user perceives the usability of a service when in use - how satisfied he or she is with a service". Clearly, QoE extends well beyond end-to-end QoS integrity parameters, which have conventionally been used. In particular, QoE also accounts for subjective aspects including user expectations, requirements, and particular experiences as well as factors such as network coverage, service offers, or level of support [1]. Accordingly, QoE may be seen as being induced by technical factors and non-technical but subjective factors as illustrated in Fig. 1. Note that service integrity parameters may be related to quality of delivery (temporal impact) and quality of presentation (auditorial/spatial impact). A typical multimedia application may therefore be impaired by a mixture of spatial and temporal artifacts.

Figure 1: Composition of quality of experience.

While the technical QoS parameters can be measured in the actual telecommunication networks or systems, insights about subjective factors need to be deduced from subjective experiments or tests. These tests involve a panel of non-expert participants or subjects to assess the quality of a given multimedia clip, for example, a speech clip, a sequence of images or a video clip. Subjective experiments are typically conducted in a controlled laboratory environment. Clearly, careful planning is needed and several factors have to be considered and selected prior to the experiment. This includes the selection of listening/viewing conditions, methodology, grading scale, selection of the test material as well as the timing of the presentation. For example, Recommendation ITU-R BT.500 [6] provides detailed guidelines for conducting subjective experiments for the assessment of the quality of television pictures, which is often used for image and video services as well. The outcomes of a subjective experiment are the individual ratings or scores given by the subjects, which are used to compute so-called mean opinion scores (MOS). Figure 2 shows the driving components that influence the user perception as quantified by MOS values. In this sense, MOS may be considered as QoE. As the planning and execution of subjective tests is time consuming and expensive, large efforts have been undertaken to develop objective perceptual quality metrics that can automatically predict MOS with high accuracy using some sort of processing algorithm. In addition, objective metrics are better suited to in-service quality monitoring and resource management, as an involvement of panels of test subjects in a live system would be rather impractical. Objective quality methods may be classified into psychophysical and engineering approaches [7]. The former are primarily based on modeling suitable aspects of the human auditory system and human visual system while the latter rely on extracting characteristic features of the multimedia service that stimulate the human audio-visual experiences. Depending on the degree of involvement of the original multimedia signal as a reference in the quality evaluation, one further distinguishes between full-reference (FR), reduced-reference (RR), and no-reference (NR) metrics. Another integral part of a metric design is the derivation of mapping functions that relate the actual objective or subjective quality metric to predicted MOS values (see Fig. 3). In this stage, the MOS obtained from subjective experiments are used to find a curve that fits best to the progression of ratings as a function of the metric under consideration. The obtained fitting curve can then be used to automatically provide predicted MOS values for a given quality measure.

Figure 2: Objective and subjective factors leading to MOS through subjective experiments.

Figure 3: Mapping of QoS parameters or objective perceptual quality to predicted MOS.

As far as the standardization of QoS and QoE aspects is concerned, all major standardization bodies are active and are addressing different components of multimedia systems, such as the following:
Bodies: ITU, ETSI, tmforum, broadband forum, ISTF, 3GPP.
Issues: system metrics, measurement methods, service metrics, frameworks.
Services: data, mobile voice, MTV, VoIP, IPTV.
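To make the metric-to-MOS mapping step sketched in Fig. 3 concrete, the following minimal Python sketch fits a logistic mapping function to hypothetical pairs of objective scores and measured MOS and then predicts MOS for new scores; the function shape, the data points and all names are illustrative assumptions, not part of any standardized model.

# Minimal sketch: fit a logistic mapping from an objective quality
# score to subjective MOS, then use it to predict MOS for new scores.
# The data points and the logistic shape are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def logistic_map(q, a, b, c, d):
    # Monotonic S-shaped curve commonly used for metric-to-MOS fitting.
    return a + (b - a) / (1.0 + np.exp(-(q - c) / d))

# Hypothetical objective scores and the MOS measured for the same clips.
objective_score = np.array([0.2, 0.35, 0.5, 0.65, 0.8, 0.9])
measured_mos    = np.array([1.4, 2.1, 2.9, 3.6, 4.2, 4.6])

params, _ = curve_fit(logistic_map, objective_score, measured_mos,
                      p0=[1.0, 5.0, 0.5, 0.1], maxfev=10000)

def predict_mos(q):
    """Predicted MOS for a given objective quality score."""
    return float(logistic_map(q, *params))

print(predict_mos(0.7))  # roughly 3.8-3.9 for this toy data set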

3. SPEECH
Subjective speech quality measures obtained from ratings given by human listeners are referred to as subjective quality metrics. Among the many different listening test procedures and subjective quality evaluations, the absolute category rating (ACR) test and the degradation category rating (DCR) test are the two most commonly used. Specifically, an ACR listening test requests the subjects to rate the overall quality of a speech clip without having access to the original clip. On the other hand, a DCR listening test asks for a rating of the annoyance by comparing a speech or an audio test clip with the original clip. In order to avoid expensive and time consuming subjective tests, several objective perceptual speech quality metrics have been proposed over the years. The related research activities have resulted in standards within the ITU and the European Telecommunications Standards Institute (ETSI). Figure 4 presents an overview of the three main roadmaps leading to Perceptual Objective Listening Quality Analysis (P.OLQA), the Objective Conversational Voice Quality Assessment Model (P.CQO), and the E-model extension. The rationale behind these different paths may be summarized as follows.
P.OLQA: The root of this path is given by ITU Recommendation P.800 [8] describing methods for subjective determination of transmission quality. In particular, techniques for perceptual quality evaluation of audio (20 kHz) from the early 1990s such as the Perceptual Audio Quality Measure (PAQM) [9] were adopted to speech (3 kHz). In this context, the Perceptual Speech Quality Measure (PSQM) [10] adopts the concept of an asymmetry factor from PAQM as a means of weighting differences between reference and degraded speech signal in each time-frequency cell. Although PAQM and PSQM function well for evaluating audio and speech codecs, impairments induced by communication networks such as unknown delays and filtering processes turned out to cause problems in the quality evaluation. An improved version of PSQM was hence developed, known as PSQM99, which accounts for linear filtering. Continuing development then also included variable delays to cope with situations such as those experienced with Voice over Internet Protocol (VoIP) services. This resulted in the well-known and frequently used Perceptual Evaluation of Speech Quality (PESQ) [11] metric. Finally, P.OLQA [12] advances from narrowband to super-wideband speech (80 Hz to 14 kHz) quality evaluation.
P.CQO: While the aforementioned path follows intrusive, i.e. full-reference methods, the roadmap to P.CQO [13] considers non-intrusive, i.e. single-ended methods, which do not need the original signal. Accordingly, these methods may be used for live network monitoring to evaluate unknown speech signals at the receiving end of a network.


Figure 4: Objective perceptual speech quality metrics.

However, this advantage over intrusive methods is paid for by a reduced correlation between objective and subjective scores (0.935 for PESQ and 0.89 for P.563 [14]). Specifically, P.561 [15] describes the in-service non-intrusive measurement device (INMD) for voice service measurements and P.562 [16] assists with the analysis and interpretation of INMD voice-service measurements. On this basis, ITU Recommendation P.563 [14] specifies the first single-ended signal-based non-intrusive measurement method that takes impairments such as filtering, variable delay, background noise, channel errors, and errors caused by speech codecs into consideration. Subsequently, P.564 [17] has been developed as a parameter-based model for assessing VoIP transmission quality. The transition from listening quality (LTQ) to conversational quality (CQO) is then performed within the work of P.CQO.
E-Model Extension: The ETSI Computation Model (E-model) [18] is also a non-intrusive and parametric model which is often used to predict speech quality in VoIP applications. In the E-model, transmission parameters such as signal-to-noise ratio, speech signal impairments, delay impairments, and impairments due to lossy speech compression are additively combined. Apart from the additive relationship between the transmission parameters, the E-model assumes further that the parameters are independent of each other, which may not always be the case. In order to align with developments in network technology and structures, an extension of the E-model is under study to account for next generation networks (NGN), wideband systems, non-handset terminal equipment and other advances.
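As a worked illustration of the additive combination used by the E-model, the sketch below subtracts example impairment factors from a base rating and maps the resulting rating R to an estimated MOS using the R-to-MOS relation of ITU-T G.107; the numeric inputs are arbitrary placeholders, not measured values.

# Minimal sketch of the additive E-model combination described above:
# a transmission rating R is obtained by subtracting impairment factors
# from a base value and is then mapped to an estimated MOS.
# The numeric inputs below are arbitrary example values, not measurements.

def e_model_rating(r0=93.2, i_s=1.4, i_d=7.0, i_e_eff=11.0, a=0.0):
    # R = Ro - Is - Id - Ie,eff + A (additive, assumes independent factors)
    return r0 - i_s - i_d - i_e_eff + a

def rating_to_mos(r):
    # R-to-MOS mapping commonly used with the E-model (ITU-T G.107).
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

r = e_model_rating()
print(round(r, 1), round(rating_to_mos(r), 2))  # e.g. 73.8 -> about 3.8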

4. IMAGE
In contrast to speech and video services, objective perceptual quality metrics and related QoE considerations for images are largely covered within academia, while standardization seems not as rich as for moving pictures. Examples of contemporary NR models for images are provided in Table 1. The interested reader may be referred to the detailed survey given in [19] and the references therein.


Common to all the NR models listed in the table is the utilization of spatial artifacts as a measure for degradation. This includes blocking, blur, and natural scene statistics for these examples but may also consider the edges describing contours in an image. Given the availability of subjective experiments and the subsequent derivation of a suitable mapping function for the related artifact, QoE may be quantified by way of predicted MOS values.

Table 1: Overview of no-reference image quality metrics
Ref    Features/Artifacts         Domain    Medium     Image size
[20]   Blocking                   DCT       JPEG       512 x 512
[21]   Blocking                   Spatial   JPEG       240 x 480
[22]   Blur                       Spatial   Image      768 x 512
[23]   Blocking, blur             Spatial   JPEG       Various
[24]   Natural scene statistics   DWT       JPEG2000   768 x 512

Table 2: Standardization of QoE for Video Services
Resolution, Subjective, FR, RR, NR:
QCIF ITU-T ITU-T ITU-T VQEG ITU-T CIF P.910 J.247 J.246 MM P.NAMS VGA P.911 P.NBAMS SDTV BT.500 J.144 J.147 RRNR-TV G.OMVAS HDTV J.140 J.249 HDTV J.245

5. VIDEO
Several video QoE models operating in the application layer have been developed and standardized for different video resolutions (see Table 2). Many of those models have emerged from ITU Study Groups (SG) along with the related methodologies of subjective quality estimation. Specifically, the FR model J.247 [25] is applicable to video codecs supporting resolutions of Quarter Common International Format (QCIF) with 176 144 pixels, Common International Format (CIF) with 352 288 pixels, and Video Graphics Array (VGA) ranging from 640 480 pixels in 16 colors to 320 200 pixels in 256 colors. Recommendation J.247, also known as Perceptual Evaluation of Video Quality (PEVQ), allows to evaluate the degradation due to packet-loss. On the other hand, J.144 [26] targets Standard Definition TV (SDTV) and High Definition TV (HDTV) using the Motion Picture Experts Goup (MPEG) format and allows for evaluation of coding distortions while distortions due to packet-loss are not covered. Similarly, J.246 [27] recommends an RR model applicable for QCIF, CIF, and VGA formats. It originated from work in the Multimedia (MM) project of the Video Quality Experts Group (VQEG) [28] and mainly centers around edge information to be carried from transmitter to receiver. As for SDTV and HDTV, Recommendation J.249 [29] operates similar on edges as J.246 does. On the other hand, J.147 [30] uses pseudo-noise (PN) sequences that are embedded as so-called markers into the transmitted video and are used as reference in the receiver to check for transmission impairments to the video service. Ongoing projects are now mainly focusing on NR models. These efforts have been conducted within VQEQ in its RRNR-TV, HDTV, and MM Projects and within SG12 of ITU. In particular, P.NAMS [31] looks into non-intrusive parametric models for QoE assessment of IPTV from header information, P.NBAMS [32] attempts this from payload information, G.OMVAS [33] studies an opinion model for video streaming applications.

7. SUMMARY
In this paper, we have provided a survey on the evolution of QoE concepts in multimedia systems. Fundamental aspects of QoE have been described and the line of thought towards QoE metric development has been provided. An overview on the many accepted standardized QoE issues have been presented for speech, image, and video services. An outlook on future work and challenges has been given with major focus put on upcoming 3D video applications.

6. FUTURE WORK AND CHALLENGES


Although the above survey indicates many activities in the fields of multimedia quality evaluation, subjective tests, QoE metric design, and related standardization efforts, this may be only the tip of the iceberg. While many open questions in handling QoE for existing multimedia services still remain, many new applications are already emerging, posing additional challenges to display, network and service design as well as to their QoE evaluation. Some of the emerging and future work along with related challenges shall be mentioned in the following.
Applications: Services stimulating visual experiences have started to move from 2D to 3D videos with many different kinds of applications being considered. The roadmap appears to follow from 3D cinema over 3D in the home to 3D on mobile devices. This would warrant extensive work on the development of FR, RR, and NR models for 3D video QoE evaluation. In this context, the question arises of how to efficiently capture depth information and relate it to QoE. Similarly, the work on multiview applications will not only pose tremendous demands on future networks to deliver such bandwidth-hungry applications but also on the modeling of the associated QoE as perceived by the end-user.
Display: A key component in the delivery of multimedia services is the display. For 3D applications, insights on how existing displays support various degrees of QoE need to be obtained. As such, advantages and disadvantages including 3D experiences and costs have to be carefully considered for the current options, such as stereoscopic displays along with anaglyph, polarized or shutter glasses, and autostereoscopic displays without the need for glasses.
Psychophysical aspects: Given that human beings may experience discomfort when watching 3D video clips, resulting in headache, eye strain, and nausea, large efforts are undertaken to gain a deeper understanding of such psychophysical aspects [34].
Subjective testing: For the time being, there exist no standards that recommend procedures for conducting subjective tests for 3D applications. For example, the aforementioned discomfort would need to be accounted for and may significantly influence the duration of 3D experiments. The VQEG is looking into such issues within their 3D test plan.


8. REFERENCES
[1] Quality of Experience (QoE) of Mobile Services: Can it be Measured and Improved?, http://www.nokia.com/NOKIA COM 1/Operators/ Downloads/Nokia Services/whitepaper qoe net.pdf, 2005. [2] G. Gmez and R. Snchez (Eds.), End-to-end Quality o a of Service over Cellular Networks, Chichester: John Wiley & Sons, 2005. [3] D. Soldini, M. Li, and R. Cuny (Eds.), QoS and QoE Management in UMTS Cellular Systems, Chichester: John Wiley & Sons, 2006. [4] COST Action IC 1003, European Network on Quality of Experience in Multimedia Systems and Services (QUALINET), http://www.qualinet.eu/ [5] ITU-T P.10/G.100, Vocabulary and Effects of Transmission Parameters on Customer Opinion of Transmission Quality, July 2008. [6] ITU-T BT.500-599, Objective Measuring Apparatus, Dec. 2009. [7] H. R. Wu and K. R. Rao (Ed.), Digital Video Image Quality and Perceptual Coding, CRC Press, 2006. [8] ITU-T P.800, Methods for Subjective Determination of Transmission Quality, Aug. 1996. [9] J. G. Beerends and J. A. Stemerdink, A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation, J. of the Audio Engineering Society, vol. 40, no. 12, pp. 963974, Dec. 1992. [10] ITU-T P.861, Objective Quality Measurement of Telephone-band (300-3400 Hz) Speech Codecs, Aug. 1996. [11] ITU-T P.862, Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-End Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, Feb. 2001. [12] ITU-T P.OLQA, Super-wideband Speech Quality Under Consideration of Acoustical Interfaces, under study. [13] ITU-T P.CQO, Objective Conversational Voice Quality Assessment Model, under study. [14] ITU-T P.563, Single-ended Method for Objective Speech Quality Assessment in Narrow-band Telephony Applications, May 2004. [15] ITU-T P.561, In-service Non-intrusive Measurement Device - Voice Service Measurements, July 2002. [16] ITU-T P.562, Analysis and Interpretation of INMD Voice-service Measurements, May 2004. [17] ITU-T P.564, Conformance Testing for Voice over IP Transmission Quality Assessment Models, Nov. 2007. [18] S. Mller and J. Berger, Describing Telephone Speech o Codec Quality Degradations by Means of Impairment Factors, Journal of the Audio Engineering Society, vol. 50, no. 9, pp. 667680, Sept. 2002. [19] U. Engelke and H.-J. Zepernick, Perceptual-based Quality Metrics for Image and Video Services: A Survey, EURO-NGI Conference on Next Generation Internet Networks, Trondheim, Norway, May 2007, pp. 190-197. [20] S. Liu and A. C. Bovik, Efficient DCT-domain Blind Measurement and Reduction of Blocking Artifacts, [21]

[22]

[23]

[24]

[25]

[26]

[27]

[28] [29]

[30] [31]

[32]

[33] [34]

IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, no. 12, pp. 11391149, Dec. 2002. L. Meesters and J. B. Martens, A Single Ended Blockiness Measure for JPEG Coded Images, Signal Processing, vol. 82, pp. 369387, Mar. 2002. P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi, A No-reference Perceptual Blur Metric, in Proc. of IEEE Int. Conf. on Image Processing, vol. 3, Sept. 2002, pp. 5760. Z. Wang, H. R. Sheikh, and A. C. Bovik, No-reference Perceptual Quality Assessment of JPEG Compressed Images, in Proc. of IEEE Int. Conf. on Image Processing, vol. 1, Sept. 2002, pp. 477480. H. R. Sheikh, A. C. Bovik, and L. Cormack, No-reference Quality Assessment Using Natural Scene Statistics: JPEG2000, IEEE Trans. on Image Processing, vol. 14, no. 11, pp. 19181927, Nov. 2005. ITU-T J.247, Objective Perceptual Multimedia Video Quality Measurement in the Presence of a Full Reference, Aug. 2008. ITU-T J.144, Objective Perceptual Video Quality Measurement Techniques for Digital Cable Television in the Presence of a Full Reference, Mar. 2004. ITU-T J.246, Perceptual Visual Quality Measurement Techniques for Multimedia Services over Digital Cable Television Networks in the Presence of a Reduced Bandwidth Reference, Aug. 2008. http://www.its.bldrdoc.gov/vqeg/ ITU-T J.249, Perceptual Video Quality Measurement Techniques for Digital Cable Television in the Presence of a Reduced Reference, Jan. 2010. ITU-T J.147, Objective Picture Quality Measurement Method by Use of In-service Test Signals, July 2002. ITU-T P.NAMS, Non-intrusive Parametric Model for the Assessment of Performance of Multimedia Streaming, under study. ITU-T P.NBAMS, Non-intrusive Bit-stream Model for the Assessment of Performance of Multimedia Streaming, under study. ITU-T G.OMVAS, Opinion Model for Video Streaming Applications, under study. M.T.M. Lambooij, W.A. Ijsselstein, and I. Heynderickx. Visual Discomfort in Stereoscopic Displays: A Review, Proc. SPIE, Vol. 6490, 2007.


Internet TV Architecture Based on Scalable Video Coding

Pedro G. Moscoso
Instituto Superior Técnico, Lisboa, Portugal
pedro.moscoso@ist.utl.pt

Rui S. Cruz
Instituto Superior Técnico, INESC-ID/INOV, Lisboa, Portugal
rui.s.cruz@ist.utl.pt

Mário S. Nunes
Instituto Superior Técnico, INESC-ID/INOV, Lisboa, Portugal
mario.nunes@inov.pt

ABSTRACT

The heterogeneity of the Internet raises several problems for the distribution of multimedia contents. This paper starts by introducing those problems and briefly overviewing the main approaches being used to mitigate them, in order to present a novel web-based Adaptive Video Streaming solution prototype supporting Scalable Video Coding (SVC) techniques. The proposed solution, with a focus on the client side, incorporates quality control mechanisms to maximize the end user experienced quality, is suitable for Interactive Internet TV architectures and is cooperative with Content Distribution Networks (CDNs) and Peer-to-Peer (P2P) web-streaming environments.

Categories and Subject Descriptors


C.2.1 [Computer-Communication Networks]: Network Architecture and Design - Network communications

General Terms
Algorithms, Performance, Measurement

Keywords
Internet video streaming, adaptive streaming, H.264/AVC, scalable video coding, peer-to-peer

1. INTRODUCTION
The growth and expansion of the broadband Internet and the increasing number of connected devices with multimedia content play-out capabilities, notably the recent handheld devices (smartphones, tablets) supporting high definition video [14], raise several problems for the distribution of multimedia contents. From the end user device side, a wide range of networked devices with different characteristics and capabilities, such as screen size and resolution, Central Processing Unit (CPU) power, operating system and media player applications, poses big challenges to the distribution of contents, hardly scaled for all of those devices. From the access network provider side, the heterogeneous nature of connectivity to the Internet does not guarantee uniform and stable conditions of reception at any moment in time for the end user devices. From the content producer and/or broadcaster side it is rather important that multimedia contents reach the end users with the best quality possible while not wasting precious resources. These situations have been the main drivers for the techniques recently developed that aim to reduce the mentioned problems in order to optimize multimedia distribution and resource usage. Companies like Apple, Microsoft and Adobe already provide mechanisms, more or less transparent to the end user, able to support dynamic variations in video streaming quality while ensuring support for a plethora of end user devices and network connections. For these techniques to work, the original content is re-encoded with different qualities and bitrates [1, 10, 15]. But, with the SVC extension to the H.264/AVC [13] standard for video coding, new approaches are possible for video distribution and consumption, as videos can now be encoded in nested dependent layers corresponding to several hierarchical levels of quality (i.e., higher layers refine the quality of the video of lower layers), not requiring multiple encoded versions (bitrates/quality levels) of the same content. The base layer (i.e., the first layer) is required in order for an SVC video to be decoded and played out, but this layer corresponds to a low definition, H.264/AVC compatible video with acceptable quality and low bitrate, into which higher layers, if successfully received, can be added to produce a higher quality, higher definition video in terms of space (i.e., image resolution), time (i.e., frame rate) or Signal-to-Noise Ratio (SNR) dimensions. Almost all techniques involving SVC are based on packetized bitstream transmission methods. The authors in [5] propose a Raptor coding scheme to protect the transmission of SVC in order to improve the distribution of video over lossy networks. A similar approach is proposed in [16] but with a network-assisted adaptive Forward Error Correction (FEC) scheme. In [11] the authors present a survey of several P2P systems supporting bitstream mode SVC. In [4] SVC is combined with Multiple Description Coding (MDC) to alleviate packet loss in P2P overlay multicast. A packet scheduling mechanism, also for P2P streaming, is proposed in [7]. A prioritized scheduling mechanism, but for a chunk-based P2P transmission approach, is proposed in [8].


This paper presents and evaluates the architecture of a web-based Adaptive Video Streaming solution prototype, supporting scalable video coding techniques, suitable for Interactive Internet TV architectures, that aims to maximize the end user experienced quality. The proposed architecture, being developed by the authors under the scope of the European Project SARACEN [12], can be seamlessly integrated with web-oriented CDNs, allowing the media to be easily cached along the network, and used in a P2P web-streaming environment. Section 2 gives an overview of the proposed architecture, Section 3 analyses the results from the evaluation of the prototype and Section 4 concludes the paper.

2. ARCHITECTURE OVERVIEW
The Internet TV distribution network architecture considers end user nodes and serving platforms. The end user nodes are distributed peers (with P2P capabilities) that can produce, consume and share contents, offering their resources (bandwidth, processing power, storing capacity) to other end user nodes. The serving platforms are centralized service nodes providing control (tracker for P2P), content treatment and distribution (transcoders and media servers), as well as interaction tools and facilities. The architecture is a multi-source Hypertext Transfer Protocol (HTTP) client and server solution providing an advanced form of Web-Streaming and WebSeeding (HTTP based P2P Streaming Protocols) [3]. The process used for streaming distribution relies on a chunk transfer scheme (instead of a bitstream) whereby the original video data is chopped into small video chunks with a short duration (of typically two seconds). The chunk-based streaming protocols allow the deployment of a distribution network compatible with the Internet infrastructure, such as Web caches and CDNs as well as P2P distribution. The description of the serving platforms of the Internet TV distribution network will not be covered in this paper (except for a brief description of the partition system), as its focus is on the client side, for a solution able to consume Video On Demand (VoD) and Live video services, supporting multiple device types and resolutions with adaptive streaming mechanisms based on the SVC extension of H.264/AVC. The overview of the architecture will focus on the following components:
The Partition system, which is responsible for splitting the SVC video files in chunks and then in several layer files.
The Adaptation System, which requests the video with the maximum possible quality.
The Reassembler System, which rebuilds the video file to a given level of quality.
The Media Player, which plays the SVC video.
The first component, the Partition system, is typically deployed in a centralized heavy-duty SVC encoder appliance. The other components are for the end user client.

2.1 Partition system
The Chunk Partitioner (Figure 1) encodes the SVC video file in a set of independent chunks that can be played independently, each one with 2 s duration. The chunks are delimited by Supplemental Enhancement Information (SEI) Network Abstraction Layer (NAL) units, starting when a SEI NAL unit appears in the encoded file and ending in the NAL unit that precedes the next SEI NAL unit. This feature is the basis for the support of Live video streaming, as it turns the moment when a user joins the stream independent of the media timeline (a user can join at the current timeline and start watching the stream, in the worst case, after 2 seconds plus the time required to fill the play-out buffer). The Intra-Chunk Layer Partitioner splits the SVC chunk files into several transmission layers, as illustrated in Figure 1. This process begins with the demultiplexing of the NALs and with their identification with one ID. This is very important for the reconstitution of the original bitstream on the client side. This identification process is done by the insertion of a Sequence Numbering field with 2 bytes between the start code (0001) and the beginning of the NAL unit (Figure 2). Layer separation is done by using the three identifiers that exist on SVC NALs: Definition (DID), Quality (QID) and Temporal (TID) IDs. After the partition of the video file in layer files, an index file (Manifest) of the content is created. The manifest file holds information about the content, i.e., describes the structure of the media, namely the codecs used, the chunks, the number of layers, the audio component, etc., and is a well-formed XML document encoded as double-byte Unicode.

Figure 1: Layer creation process.

Figure 2: Identification of NAL order.
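For illustration, a simplified sketch of the sequence-numbering step described above is given below, assuming plain 4-byte Annex B start codes; the function names are ours, and the real partitioner additionally groups the numbered NAL units into layer files by their DID/QID/TID values.

# Simplified sketch of the identification step described above: each NAL
# unit in an Annex B bitstream chunk gets a 2-byte sequence number
# inserted between the start code (0x00000001) and the NAL unit itself.
# Assumes 4-byte start codes only; function names are illustrative.
START_CODE = b"\x00\x00\x00\x01"

def split_nal_units(chunk: bytes):
    """Return the NAL unit payloads of an Annex B chunk, in order."""
    parts = chunk.split(START_CODE)
    return [p for p in parts if p]  # drop the empty piece before the first start code

def add_sequence_numbers(chunk: bytes) -> bytes:
    """Re-emit the chunk with a 2-byte big-endian sequence number per NAL unit."""
    out = bytearray()
    for seq, nal in enumerate(split_nal_units(chunk)):
        out += START_CODE
        out += seq.to_bytes(2, "big")   # the 2-byte Sequence Numbering field
        out += nal
    return bytes(out)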

2.2 Adaptation System


The Adaptation System is responsible for the adjustments in video quality by determining the number of layers to request from the serving nodes, based on a set of heuristics related to network and host system conditions. Network conditions, such as bandwidth and Round-Trip Time (RTT), are continuously measured, with their averages used as smoothing factors to prevent abrupt changes in the quality of the video. This ensures that the variations between layers are smooth and cause an almost imperceptible impact on the user viewing experience. For the host system conditions the heuristics are related to the Screen Resolution and the CPU Load, and the system always uses the lowest value returned by the metrics. Additionally, the download time of the several layers of each chunk is also limited to 2 s to prevent pauses and re-buffering.
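A minimal sketch of such a layer-selection heuristic is shown below; the per-layer bitrates, the smoothing window and the example values are invented placeholders, since the paper does not specify concrete numbers.

# Minimal sketch of the layer-selection heuristic described above: the
# client requests the highest layer count allowed by smoothed network
# measurements, the screen resolution and the CPU load, taking the
# minimum of the individual limits. All numeric values are placeholders.
from collections import deque

class LayerSelector:
    def __init__(self, layer_bitrates_kbps, window=5):
        self.layer_bitrates = layer_bitrates_kbps      # cumulative bitrate per layer count
        self.bw_samples = deque(maxlen=window)         # moving average smooths changes

    def _bandwidth_limit(self, measured_kbps):
        self.bw_samples.append(measured_kbps)
        avg = sum(self.bw_samples) / len(self.bw_samples)
        usable = [n for n, rate in enumerate(self.layer_bitrates, start=1) if rate <= avg]
        return max(usable) if usable else 1            # the base layer is always requested

    def select_layers(self, measured_kbps, screen_limit, cpu_limit):
        # The system always uses the lowest value returned by the metrics.
        return min(self._bandwidth_limit(measured_kbps), screen_limit, cpu_limit)

# Example: 10 layers with increasing cumulative bitrates (placeholder values).
selector = LayerSelector([200, 350, 500, 700, 900, 1200, 1600, 2000, 2600, 3200])
print(selector.select_layers(measured_kbps=1500, screen_limit=8, cpu_limit=6))  # -> 6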

2.1 Partition system


The Chunk Partitioner (Figure 1) encodes the SVC video file in a set of independent chunks that can be played independently, each one with 2 s duration. The chunks are
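As a rough illustration of these heuristics, the sketch below (Python, with purely hypothetical names and thresholds, since the paper does not publish its client code) picks the number of layers to request for the next chunk as the lowest value allowed by the smoothed bandwidth, the screen resolution and the CPU load, keeping the download within the 2 s chunk budget.

```python
# Hypothetical sketch of the adaptation heuristic described above: the layer
# count requested for the next chunk is bounded by smoothed bandwidth, screen
# resolution and CPU load, and the lowest of the individual limits wins.
# Names and thresholds are illustrative, not taken from the actual system.

CHUNK_DURATION_S = 2.0  # each chunk must download within its play-out duration

def smooth(history, new_sample, alpha=0.25):
    """Exponentially weighted average used to avoid abrupt quality switches.
    The caller feeds the result back as the next 'history' value."""
    return new_sample if history is None else (1 - alpha) * history + alpha * new_sample

def select_layers(avg_bandwidth_bps, layer_sizes_bytes, screen_height, cpu_load):
    """Return how many layers of the next chunk to request."""
    # 1. Bandwidth limit: cumulative layer size must fit in the 2 s budget.
    budget_bytes = avg_bandwidth_bps / 8.0 * CHUNK_DURATION_S
    bw_layers, total = 0, 0
    for size in layer_sizes_bytes:
        total += size
        if total > budget_bytes:
            break
        bw_layers += 1
    # 2. Screen limit: no point requesting spatial layers above the display size.
    screen_layers = 5 if screen_height < 576 else len(layer_sizes_bytes)
    # 3. CPU limit: back off when the host is already saturated.
    cpu_layers = len(layer_sizes_bytes) if cpu_load < 0.8 else max(1, bw_layers - 1)
    # The system always uses the lowest value returned by the metrics.
    return max(1, min(bw_layers, screen_layers, cpu_layers))
```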


2.3 Reassembler System


The Reassembler re-creates each independent video chunk file from the received layer files (which contain several NAL units, identified by unique IDs that give the order of each NAL unit in the final video chunk). The rebuilt video chunk is then sent to the SVC Media Player. Figure 3 illustrates the reassembly chain, which is used for both P2P and client-server streaming methods.

Figure 3: The SVC reassembly chain
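The following sketch illustrates the idea of the reassembly step. The on-disk layout of the layer files (a 2-byte sequence number and a 4-byte length prefix per NAL unit) is an assumption made for the example; only the use of the sequence number to restore NAL order reflects the description above.

```python
# Illustrative sketch (not the project's actual code) of the reassembly step:
# each received layer file carries NAL units prefixed with the 2-byte sequence
# number inserted by the Partition System, so the original chunk bitstream can
# be rebuilt by sorting the units and restoring the 0x00000001 start codes.
import struct

START_CODE = b"\x00\x00\x00\x01"

def parse_layer_file(data):
    """Yield (sequence_number, nal_payload) pairs from one layer file.
    Assumes a hypothetical layout: [seq: 2 bytes][len: 4 bytes][payload]."""
    offset = 0
    while offset < len(data):
        seq, length = struct.unpack_from(">HI", data, offset)
        offset += 6
        yield seq, data[offset:offset + length]
        offset += length

def reassemble_chunk(layer_files):
    """Merge the NAL units of all received layers back into one playable chunk."""
    units = []
    for data in layer_files:
        units.extend(parse_layer_file(data))
    units.sort(key=lambda item: item[0])          # sequence number gives NAL order
    return b"".join(START_CODE + payload for _, payload in units)
```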

2.4 Client Video Player

The end user client media player can be a platform-specific software client that delivers audio-visual content to the user in a variety of formats, a Web browser plug-in embedded into an HTML5 document, or a WebApp targeted at mobile smartphones, providing the user interface and content playback functionalities. The architecture of the client provides not only a client side but also a peer serving side. The client side includes a local HTTP process that also supports standard client-server downloading and streaming via the HTTP protocol. The local HTTP process listens on a local port and redirects HTTP GET or POST methods, initiated either from the local web browser or from the application Graphical User Interface (GUI), to the P2P engine or to the appropriate external Web server, basing its decision on information taken from the Manifest of the content. The client media player supports several codecs, including SVC decoding [9]. As illustrated in Figure 4, the video playback can be rendered directly in the browser video canvas (in the browser plug-in version), making it easier to integrate P2P based video delivery into Web based distribution mechanisms.

The client serving side back-end component is a software daemon that embeds the P2P core engine and an HTTP server to exchange data across the network. Acting as the underlying transport layer for all video and audio streams, as well as for the accompanying metadata related to the stream content and Quality of Service (QoS) metrics, the daemon communicates directly with other P2P nodes, appropriate external Web servers, local video codecs, and the browser plug-in via a standard JavaScript API (JSAPI). The back-end component lets the P2P core engine and the HTTP server run in the background regardless of whether the front-end interface is running or not.

Figure 4: Web browser plug-in prototype interface

3. EVALUATION RESULTS

In order to evaluate the prototype solution, a network scenario was prepared, using only the client-server mode, in which the available bandwidth for the client systems could be artificially adjusted. The HTTP web streaming server contained the SVC encoded videos, either as stored contents or as real-time encoded media chunk streams (simulating a Live TV program), together with the corresponding manifest files. The web streaming server had a public address accessible from the Internet. The SVC video used in the evaluation was encoded with ten layers over two spatial scalability levels: the first five layers at Common Intermediate Format (CIF) resolution and the remaining layers at Double Common Intermediate Format (DCIF) resolution. The Peak Signal-to-Noise Ratio (PSNR) of the video was analyzed for the first 200 frames. The results are plotted in Figure 5, where each numbered Layer PSNR curve corresponds to the number of layers combined in the video.

Figure 5: PSNR (dB) of the test video over the first 200 frames, one curve per number of combined layers (Layer 1 to Layer 9)

The metrics used in each test measured the bandwidth, network load, RTT, cache size and PSNR. For each relevant test, the layer variation during streaming was also collected. As a score reference for the perceived quality of the received media after compression and/or transmission, the following relationship between PSNR and Mean Opinion Score (MOS) was used during the analysis of the results (Table 1).

Table 1: Possible PSNR to MOS conversion [2]
  PSNR (dB)   MOS
  > 37        5 (Excellent)
  31-37       4 (Good)
  25-31       3 (Fair)
  20-25       2 (Poor)
  < 20        1 (Bad)
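For readers who want to script this conversion, a direct transcription of Table 1 might look as follows; the function name and the handling of exact boundary values are our own choices, not the authors'.

```python
# Small helper reflecting Table 1's PSNR-to-MOS mapping [2]; thresholds come
# from the table above, the function itself is just an illustrative convenience.
def psnr_to_mos(psnr_db):
    """Map a PSNR value (dB) to a 1-5 Mean Opinion Score per Table 1."""
    if psnr_db > 37:
        return 5  # Excellent
    if psnr_db > 31:
        return 4  # Good
    if psnr_db > 25:
        return 3  # Fair
    if psnr_db > 20:
        return 2  # Poor
    return 1      # Bad

assert psnr_to_mos(40.2) == 5 and psnr_to_mos(23.5) == 2
```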

The bandwidth variation, from 10 Mbit/s down to 256 kbit/s, allowed testing the adaptability of the solution to fluctuations in network capacity. The lower limit of 256 kbit/s corresponded to the minimum throughput required to play out a video without pauses or rebuffering. The tests started with the maximum available bandwidth, which was kept until second t = 120, when the bandwidth was suddenly dropped to the minimum (Figure 6). The results show, for t < 120, that the system does not occupy a constant bandwidth during the video stream, but has a spiky profile due to the small size of the video chunks (2 s), which are downloaded faster than their play-out duration. The RTT value (Figure 7) was also fairly small during that first period (around 10 ms) and the video could be watched with maximum quality. At instant t = 120, the system detects the variation in network conditions and automatically adapts the number of layers to be requested to the available bandwidth. From that moment onwards, the system uses the full available bandwidth continuously, and still manages, at a few moments, to download up to layer level 4, due to the variable size of the layer files (above the base layer) of each chunk.

Figure 6: Network load and number of layers

Figure 7: Round-Trip Time (RTT)

4. CONCLUSION

This paper describes the architecture of an SVC adaptive streaming solution, using HTTP with P2P capabilities for chunk-based media transport, suitable for heterogeneous network environments. The solution incorporates quality control mechanisms that allow video play-out without pauses, stuttering or image artifacts, by smoothly minimizing the impact of variations in network and system conditions. The system is compatible with H.264/AVC and supports real-time streaming.

4.1 Acknowledgments

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. ICT-248474. The authors would like to thank Jânio Monteiro for all his support and expertise in the Scalable Video Coding area [6].

5. REFERENCES

[1] Adobe. Live Dynamic Streaming.
[2] C.-O. Chow and H. Ishii. Enhancing real-time video streaming over mobile ad hoc networks using multipoint-to-point communication. Computer Communications, 30:1754-1764, Jun. 2007.
[3] R. S. Cruz, M. S. Nunes, C. Patrikakis, and N. Papaoulakis. SARACEN: A platform for adaptive, socially aware multimedia distribution over P2P networks. In Proceedings of the 4th IEEE Workshop on Enabling the Future Service-Oriented Internet: Towards Socially-Aware Networks, GLOBECOM 2010, Dec. 2010.
[4] F. de Asís López-Fuentes. Adaptive mechanism for P2P video streaming using SVC and MDC. In Proceedings of the 2010 International Conference on Complex, Intelligent and Software Intensive Systems, CISIS '10, pages 457-462, Feb. 2010.
[5] J. Monteiro, C. Calafate, and M. Nunes. Robust multipoint and multi-layered transmission of H.264/SVC with Raptor codes. Telecommunication Systems, pages 1-16, 2010.
[6] J. F. Monteiro. Quality Assurance Solutions for Multipoint Scalable Video Distribution over Wireless IP Networks. PhD thesis, Instituto Superior Técnico - Universidade Técnica de Lisboa, Dec. 2009.
[7] M. Mushtaq and T. Ahmed. Smooth video delivery for SVC based media streaming over P2P networks. In Proceedings of the 5th IEEE Consumer Communications and Networking Conference, CCNC '08, pages 447-451, Jan. 2008.
[8] R. P. Nunes, R. S. Cruz, and M. S. Nunes. Scalable video distribution in peer-to-peer architecture. In Proceedings of the 10ª Conferência sobre Redes de Computadores, CRC'10, Nov. 2010.
[9] OpenSVC. OpenSVC Decoder, 2011.
[10] R. Pantos. HTTP Live Streaming. Internet-Draft draft-pantos-http-live-streaming-05, Internet Engineering Task Force, Nov. 2010. Work in progress.
[11] N. Ramzan, E. Quacchio, T. Zgaljic, S. Asioli, L. Celetto, E. Izquierdo, and F. Rovati. Peer-to-peer streaming of scalable video in future Internet applications. IEEE Communications Magazine, 49(3):128-135, Mar. 2011.
[12] SARACEN Consortium. SARACEN: Socially Aware, collaboRative, scAlable Coding mEdia distributioN project Home Page, 2011.
[13] H. Schwarz, D. Marpe, and T. Wiegand. Overview of the Scalable Video Coding extension of the H.264/AVC standard. IEEE Transactions on Circuits and Systems for Video Technology, 17(9):1103-1120, Sep. 2007.
[14] Wikipedia. Apple's iPad, 2010.
[15] A. Zambelli. IIS Smooth Streaming Technical Overview. Microsoft Corporation, March 2009.
[16] X. Zhu, R. Pan, N. Dukkipati, V. Subramanian, and F. Bonomi. Layered Internet Video Engineering: Network-assisted bandwidth sharing and transient loss protection for scalable video streaming. In Proceedings of IEEE INFOCOM 2010, pages 1-5, 2010.

From PSNR to perceived quality in H.264 encoded video sequences


Tomás Brandão
Instituto de Telecomunicações / ISCTE-IUL, Lisbon, Portugal
tomas.brandao@iscte.pt

Miguel Chin
IST-UTL, Lisbon, Portugal
mchin@sapo.pt

Maria Paula Queluz
Instituto de Telecomunicações / IST-UTL, Lisbon, Portugal
paula.queluz@lx.it.pt

ABSTRACT

This paper describes and compares a set of no-reference quality assessment algorithms for H.264/AVC encoded video sequences. These algorithms have in common a module that estimates the error due to lossy encoding of the video signals, using only information available in the compressed bitstream. In order to obtain perceived quality scores from the estimated error, three methods are presented: i) weighting the error estimates according to a perceptual model; ii) linearly combining the mean squared error (MSE) estimates with additional video features; iii) using the MSE estimates as the input of a logistic function. The performances of the algorithms are evaluated using cross-validation procedures, and the one showing the best performance is also used in a preliminary study of quality assessment in the presence of transmission losses.

Keywords
No-reference, Video quality, H.264, Video transmission

1. INTRODUCTION

In order to enable new video delivery possibilities, such as automatic QoE monitoring, QoE-oriented bandwidth allocation, or even scalable billing schemes, it is desirable to have an efficient video quality assessment system that is able to compute quality scores without the need for the original signals. Objective quality metrics that belong to this class are the so-called no-reference (NR) quality assessment metrics. In this paper, three different NR metrics are described and evaluated. All of them have in common a module, proposed in [2], that estimates the quantization error due to H.264/AVC encoding of video sequences. Following this module, each metric combines the estimated error with additional information extracted from the bitstream or from the decoded video, in order to obtain the perceived quality scores. The algorithms are evaluated using subjective data and following a cross-validation procedure. The metric showing the best performance was then used in a preliminary study of video quality assessment under transmission losses (e.g. packet losses in IP networks).

The paper is organized as follows: after the introduction, section 2 provides a brief review of the error estimation module and describes the video quality prediction algorithms considered in this paper. Section 3 presents the assessment results of each algorithm. Main conclusions are given in section 4.

2. QUALITY PREDICTION MODELS

2.1 No-reference PSNR estimation

As already mentioned, all video quality metrics described in this paper use the error estimation module proposed in [2]. It provides estimates for the quantization error between an original video sequence and its H.264 encoded version. The error estimation module uses elements extracted from the video bitstream, namely the quantized DCT coefficients, $X_k$, and their corresponding quantization steps, $q_k$. Using these values, it computes an estimate for the squared error between the original and quantized DCT coefficient values, $\hat{\varepsilon}_k^2$. In order to provide a no-reference mean squared error (MSE) or PSNR prediction for the encoded video sequence, these estimates can be used in the same way as if the true error values were available:

$$\mathrm{PSNR}_{est}\,[\mathrm{dB}] = 10 \log_{10} \frac{255^2}{\mathrm{MSE}_{est}}; \qquad \mathrm{MSE}_{est} = \frac{1}{N} \sum_{k=1}^{N} \hat{\varepsilon}_k^2, \qquad (1)$$

where $N$ is the number of considered DCT coefficients. The error can be estimated by observing the value of the quantized DCT coefficient, $X_k$, according to:

$$\hat{\varepsilon}_k^2 = \int_{-\infty}^{+\infty} f_X(x \mid X_k)\,(X_k - x)^2\, dx, \qquad (2)$$

where $f_X(x \mid X_k)$ is an approximation of the statistical distribution of the original DCT coefficient values, conditioned on the observed value of $X_k$. Using Bayes' rule for conditional densities, (2) can be rewritten as [2]:

$$\hat{\varepsilon}_k^2 = \frac{\int_{a_k}^{b_k} f_X(x)\,(X_k - x)^2\, dx}{\int_{a_k}^{b_k} f_X(x)\, dx}, \qquad (3)$$

where $a_k$ and $b_k$ are the quantization interval limits. These can be derived from the values of the quantized DCT coefficients and their corresponding quantization step sizes, $q_k$. As for $f_X(x)$, it is computed using a method that explores the correlation between distribution parameters at adjacent DCT frequencies and combines it with a maximum-likelihood estimation method that also uses the values of $X_k$ and $q_k$ extracted from the encoded bitstream (the details of this procedure can be found in [2]).
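The following sketch evaluates eq. (3) numerically for a single coefficient. A zero-mean Laplacian density is assumed here purely for illustration; the actual density model and its parameter estimation are the ones described in [2].

```python
# Numerical illustration of eq. (3): expected squared quantization error of one
# DCT coefficient, given its quantized value X_k and quantization interval
# [a_k, b_k]. The Laplacian density is an illustrative assumption only.
import numpy as np

def laplacian_pdf(x, scale):
    return np.exp(-np.abs(x) / scale) / (2.0 * scale)

def estimated_squared_error(x_k, a_k, b_k, scale, samples=10_000):
    """Evaluate eq. (3) by simple numerical integration over [a_k, b_k]."""
    x = np.linspace(a_k, b_k, samples)
    fx = laplacian_pdf(x, scale)
    numerator = np.trapz(fx * (x_k - x) ** 2, x)
    denominator = np.trapz(fx, x)
    return numerator / denominator

# Example: a coefficient quantized to 12, interval [8, 16), Laplacian scale 10.
err2 = estimated_squared_error(x_k=12.0, a_k=8.0, b_k=16.0, scale=10.0)
```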

Figure 1: Perceptual weighting model scheme (video bitstream and motion vectors feed the local error estimation, using $X_k$ and $q_k$, and the perceptual model, which produces the weights $p_k$; pooling and mapping yield $MOS_p$)

Figure 2: Linear model scheme (bitstream features $\log_2 B_r$ and the estimated MSE, together with SI and TI computed from the decoded video, are combined by a linear model into $MOS_p$)

2.2 Perceptual error weighting model

The architecture of the first metric is represented in Figure 1. It comprises two main blocks: a local error estimation block, which was described in the previous section, and a perceptual model whose goal is to provide weights for those estimates, according to the visibility of the corresponding error. The perceptual model is based on Kelly-Daly's spatio-temporal contrast sensitivity function (CSF) [8, 4]. In short, this CSF is a function of the spatial frequency, $f_s$, and the retinal velocity, $v_R$, computed as:

$$\mathrm{CSF}(v_R, f_s) = S\, c_0\, c_2\, v_R\, (2\pi c_1 f_s)^2 \exp\!\left(-\frac{4\pi c_1 f_s}{f_{max}}\right), \qquad (4)$$

where $S$ and $f_{max}$ are defined as:

$$S = s_1 + s_2 \left|\log\!\left(\frac{c_2 v_R}{3}\right)\right|^3 \qquad \text{and} \qquad f_{max} = \frac{p_1}{c_2 v_R + 2}.$$

Using the same settings as in [8] and [4], the constants and parameters of the CSF can be set to: $s_1 = 6.1$; $s_2 = 7.3$; $p_1 = 45.9$; $c_0 = 1.14$; $c_1 = 0.67$; $c_2 = 1.7$. The spatial frequency, $f_s$, is computed for each DCT frequency pair $(i, j)$, and depends on the distance from the observer to the screen and on the dimensions of the displayed images. The velocity on the retina plane, $v_R$, is given by the angular velocity of the object on the image plane, $v_I$, compensated by a term associated with the eye movements (see [4] for additional details). $v_I$ can be estimated using the motion vectors and the frame rate, both extracted from the bitstream:

$$v_I = f_r \sqrt{(MV_x\, \theta_x)^2 + (MV_y\, \theta_y)^2}, \qquad (5)$$

where $MV_x$ and $MV_y$ are the components of the motion vector along the horizontal and vertical directions, respectively, $f_r$ is the frame rate of the video sequence, and $\theta_x$ and $\theta_y$ are the components of the observation angle. Based on the CSF values, computed at each location in the block-wise DCT domain, a global distortion value, $D_f$, for each video frame is computed using $L_4$ error pooling:

$$D_f = \sqrt[4]{\sum_k (\hat{\varepsilon}_k\, p_k)^4}, \qquad (6)$$

where $p_k = \mathrm{CSF}(v_{R_k}, f_{s_k})$ is the result of the contrast sensitivity function computed at the $k$-th DCT coefficient location and $\hat{\varepsilon}_k$ is the error estimate for that coefficient. The same pooling process is applied along the time axis in order to get a global distortion metric, $D_g$, for the entire video sequence. Finally, this global value is mapped into a normalized MOS range using a logistic function similar to the one suggested by the Video Quality Experts Group (VQEG) in [10]:

$$MOS_p = a_0 + \frac{a_1}{1 + e^{a_2 + a_3 D_g}}, \qquad (7)$$

where $a_0$ to $a_3$ are parameters that can be obtained through curve fitting.

2.3 Linear model

The second metric, whose architecture is represented in Figure 2, is based on the work proposed in [3]. MOS predictions are computed as a linear combination of features extracted from both the bitstream and the decoded video. The considered features $f_i$ are: $\log_2(B_r)$, the logarithm of the encoded video's bitrate; $MSE$, an estimation of the mean squared error, computed using the algorithm described in section 2.1; and SI and TI, spatial and temporal activity values, computed using the methods recommended in [6], but applied to the decoded video sequences instead of the original ones. These features are linearly combined, resulting in a MOS estimate for the received video, $MOS_p$, according to:

$$MOS_p = w_0 + \sum_{i=1}^{N} w_i f_i = \mathbf{f}^T \mathbf{w}, \qquad (8)$$

with $\mathbf{f} = [1\; f_1 \ldots f_N]^T$ and $\mathbf{w} = [w_0\; w_1 \ldots w_N]^T$. In (8), $f_i$ is the value of the $i$-th feature, $w_i$ is the corresponding linear weight and $N$ is the number of features. The weights vector, $\mathbf{w}$, can be obtained from a training set using linear regression:

$$\mathbf{w} = (\mathbf{F}^T \mathbf{F})^{-1} \mathbf{F}^T \mathbf{M}, \quad \text{with} \quad \mathbf{F} = \begin{bmatrix} 1 & f_1^{(1)} & \cdots & f_N^{(1)} \\ \vdots & \vdots & & \vdots \\ 1 & f_1^{(K)} & \cdots & f_N^{(K)} \end{bmatrix} \quad \text{and} \quad \mathbf{M} = \begin{bmatrix} MOS^{(1)} \\ \vdots \\ MOS^{(K)} \end{bmatrix}, \qquad (9)$$

where $\mathbf{F}$ is a $K \times (N+1)$ matrix in which each row contains the feature values extracted from the $k$-th degraded video sequence in the training set (preceded by a constant 1), and $\mathbf{M}$ is a vector with the true MOS values of the sequences included in the training set.
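A compact numerical sketch of the training step in (9), using numpy. The feature values and MOS vector below are made-up placeholders, and a least-squares solver is used instead of the explicit matrix inverse for numerical robustness.

```python
# Sketch of the linear model training (eq. 9) and prediction (eq. 8).
# Feature rows are [log2(Br), MSE_est, SI, TI]; the values are illustrative.
import numpy as np

features = np.array([
    [9.0, 120.0, 55.0, 12.0],
    [10.5, 60.0, 80.0, 20.0],
    [11.0, 30.0, 40.0, 8.0],
])
mos = np.array([2.1, 3.4, 4.2])            # true MOS of the training sequences

F = np.hstack([np.ones((features.shape[0], 1)), features])   # prepend the constant 1
w, *_ = np.linalg.lstsq(F, mos, rcond=None)                   # equivalent to (F^T F)^-1 F^T M

def predict_mos(feature_row, weights=w):
    """Eq. (8): MOS_p = w0 + sum_i w_i * f_i."""
    return float(np.dot(np.concatenate(([1.0], feature_row)), weights))
```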

2.4 Logistic model

The last model, whose architecture is represented in Figure 3, is a hybrid model that combines the MSE estimate produced by the error estimation block with a spatio-temporal activity index computed from the decoded video. The spatio-temporal activity index, $s$, is defined as the maximum between the spatial activity and the temporal activity, computed as [1]:

Spatial activity: the gradient norm is computed at all pixel locations, for the whole video sequence (using Sobel filters). A global spatial activity value is obtained by averaging all gradient norm values.

Temporal activity: the absolute values of the pixel-by-pixel differences between each two successive frames of the video sequence are computed. The temporal activity value is obtained by averaging the gradient norm values over all images representing those differences.

The MOS prediction is the outcome of a logistic function of the MSE, with parameters $\beta_1$ and $\beta_2$ that depend on the spatio-temporal activity $s$ of the video under evaluation:

$$MOS_p = \frac{1 + e^{-\beta_1(s)\,\beta_2(s)}}{1 + e^{\beta_1(s)\,(MSE - \beta_2(s))}}, \qquad (10)$$

where each $\beta_i(s)$ is approximated by an exponential function of the activity index $s$. To illustrate the motivation for this model, consider Figure 4, which represents the evolution of subjective MOS values with the MSE, using four video sequences encoded at different bitrates. As can be observed from the figure, the relation between MOS and MSE values can be approximated by a two-parameter logistic function: one parameter, $\beta_1$, regulates the slope, and the other parameter, $\beta_2$, regulates the offset with respect to the MSE axis. Such a function can be written as in (10). However, the optimal parameter values vary with the contents of the video sequence. It has been noticed that sequences with high spatio-temporal activity lead to a slower quality decrease as the MSE increases; on the other hand, for sequences with low spatio-temporal activity, quality decreases faster as the MSE increases. Therefore, $\beta_1$ and $\beta_2$ have been computed based on the spatio-temporal activity index $s$. Consider Figure 5, which depicts the fitting parameters $\beta_1$ and $\beta_2$ for 12 reference video sequences and their relation with the activity index $s$. As the plots suggest, this relation can be approximated by exponential functions.

Figure 3: Logistic model scheme (video bitstream and local error estimation from $X_k$, $q_k$ give the MSE; the decoded video gives the spatio-temporal activity; a logistic function yields $MOS_p$)

Figure 4: MOS (normalized to [0, 1]) as a function of MSE, for the Crew, City, Mobile and Stephan sequences (measured values and their logistic fits)

Figure 5: Fitting parameters $\beta_1$ and $\beta_2$ versus spatio-temporal activity, with the fitted exponential functions
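A direct transcription of eq. (10) in Python follows. The exponential laws for $\beta_1(s)$ and $\beta_2(s)$ and the constants used here are placeholders, since the fitted values are only shown graphically in Figure 5.

```python
# Sketch of the logistic model (eq. 10). The exponential parameter laws and
# their constants are illustrative placeholders, not the authors' fitted values.
import math

def beta(s, kappa, lam):
    """Exponential dependence of a logistic parameter on the activity index s."""
    return kappa * math.exp(lam * s)

def mos_logistic(mse, s, k1=0.05, l1=-0.01, k2=100.0, l2=0.01):
    b1, b2 = beta(s, k1, l1), beta(s, k2, l2)
    return (1.0 + math.exp(-b1 * b2)) / (1.0 + math.exp(b1 * (mse - b2)))

# Sanity check: with zero encoding error the predicted (normalized) MOS is 1.
assert abs(mos_logistic(0.0, s=50.0) - 1.0) < 1e-9
```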

3. RESULTS

3.1 Subjective data

In order to evaluate the performance of the models described in the previous section, the subjective quality assessment data available at the IT Image Group site (http://amalia.img.lx.it.pt/~tgsb/H264_test) has been used. This subjective data was obtained by carrying out a set of double stimulus subjective tests that followed the Degradation Category Rating (DCR) methodology, described in Recommendation ITU-T P.910 [6]. The database comprises 12 reference video sequences that span a large range of spatio-temporal activities. The test conditions considered during the subjective quality assessment sessions are versions of the reference sequences encoded at different bitrates (from 32 kbit/s to 2 Mbit/s) using the JM 12.4 H.264/AVC codec [5]. The total number of test conditions in this database is 58, each one assessed by at least 18 validated subjects.

3.2 Quality prediction results

Since all the algorithms presented in the paper require parameter estimation procedures, a cross-validation training methodology has been followed. The adopted procedure resembles the leave-one-out cross-validation method [9]: the subjective database has been organized into 12 families, where each family is the set of H.264 encoded versions of one reference video sequence, together with the corresponding MOS values. The training/validation process is performed in turns: in each turn, one family is used as the validation set and the remaining 11 families are used as the training set. The process is repeated until every family has taken its place in the validation set. The predicted MOS values that result from each validation turn are depicted in Figures 6-a) to d) and confronted with their true values. Figures 6-a) to c) show the outcomes of the three algorithms described in the paper, while Figure 6-d) depicts the result of MOS predictions using the PSNR only. It can be observed that the logistic model described in section 2.4 yields the best results, leading to MOS predictions that are quite close to the ground truth data.

In [10], VQEG suggests a set of statistical measurements whose goal is to evaluate the performance of an objective metric: the root mean squared error (RMS), the Pearson correlation coefficient (CC), the Spearman rank order coefficient (RC) and the outliers ratio (OR). Using these indicators, a comparison between the models' performances is given in Table 1. It confirms that the logistic model leads to the best results. The perceptual error weighting algorithm holds values of CC and RC above 0.90, but its RMS and OR results are strongly penalized by about 5 points whose MOS predictions are far from their true values. The linear model algorithm seems to be the weakest. Nevertheless, all models lead to results that are substantially better than using the PSNR.

Figure 6: MOS prediction results: (a) error weighting, (b) linear model, (c) logistic model, (d) PSNR

Table 1: Evaluation of the described metrics.
  Metric                       RMS    CC     RC     OR
  Perceptual error weighting   0.52   0.91   0.92   0.14
  Linear model                 0.64   0.86   0.86   0.16
  Logistic model               0.32   0.97   0.96   0.05
  PSNR                         0.83   0.75   0.76   0.28

3.3 MOS prediction under transmission losses

A preliminary study on the applicability of the described models to transmission scenarios with packet losses has also been performed. The subjective quality database from EPFL / Politecnico di Milano (http://vqa.como.polimi.it) has been used for this purpose. The test conditions in this database and their MOS values result from corrupting six H.264 encoded video sequences with varying packet loss rates and different error patterns. Figure 7-a) depicts the evolution of MOS values with the packet loss rate, $PL$, using four video sequences from the database. The plot suggests an exponential decay of MOS as the packet loss rate increases. Therefore, the predictions for MOS have been modeled as:

$$MOS_p = MOS_{p0} \exp\!\left(-\frac{PL}{\tau}\right), \qquad (11)$$

where $MOS_{p0}$ is an initial prediction for MOS that does not consider the effect of the missing packets (i.e., it is computed by any of the presented metrics, using the information that effectively arrives at the decoder), $PL$ is the packet loss rate, and $\tau$ is a constant (obtained using data regression). Note that this model resembles the model proposed in ITU-T Rec. G.1070 [7]. Figure 7-b) depicts MOS vs. $MOS_p$ results, using the subjective quality data mentioned above and the logistic model presented in section 2.4 for computing $MOS_{p0}$ in (11). Following a procedure similar to the one presented in 3.2, the value of $\tau$ was obtained using leave-one-out cross-validation. The performance indicators achieved in this test were RMS = 0.48, CC = 0.94 and RC = 0.93.

Figure 7: MOS prediction for transmission losses: (a) MOS vs. packet loss rate (PL, %) for the Foreman, Hall, Mobile and Mother sequences; (b) prediction results
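A tiny worked example of eq. (11) follows, assuming an illustrative value for the regression constant, which the paper does not report.

```python
# Eq. (11): exponential MOS decay with packet loss rate. tau is a placeholder
# value chosen for illustration; the paper obtains it by data regression.
import math

def mos_under_loss(mos_p0, packet_loss_pct, tau=4.0):
    """Scale the loss-free MOS prediction by an exponential decay in PL (%)."""
    return mos_p0 * math.exp(-packet_loss_pct / tau)

# Example: a sequence predicted at MOS 4.2 without losses, observed with 3% loss.
print(round(mos_under_loss(4.2, 3.0), 2))
```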

4. CONCLUSIONS

Three different no-reference video quality assessment algorithms have been described and evaluated. These algorithms share a common component in their architecture: they all use an algorithm that computes an estimate of the error due to lossy video encoding. The algorithms' performances have been evaluated using a cross-validation procedure over 58 subjective test conditions (which are H.264 encoded versions of 12 different video sequences). The results have shown that the logistic model algorithm has the best performance. In order to assess the quality of video sequences subject to packet losses on IP networks, the logistic model algorithm has also been used in a preliminary study. While the results under transmission losses seem to be promising, no solid conclusions can be drawn yet due to the limited amount of subjective data available for these experiments.

5. REFERENCES

[1] A. Bhat, I. Richardson, and S. Kannangara. A novel perceptual quality metric for video compression. In Proc. of Picture Coding Symposium, USA, 2009.
[2] T. Brandão and M. P. Queluz. No-reference quality assessment of H.264 encoded video. IEEE Transactions on Circuits and Systems for Video Technology, 20(11):1437-1447, November 2010.
[3] T. Brandão, L. Roque, and M. P. Queluz. Quality assessment of H.264/AVC encoded video. In Proc. of Conference on Telecommunications, Santa Maria da Feira, Portugal, April 2009.
[4] S. Daly. Engineering observations from spatiovelocity and spatiotemporal visual models. In Vision Models and Applications to Image and Video Processing. Kluwer, 2001.
[5] Heinrich-Hertz-Institut. JM 12.4 H.264 reference software, December 2007. Available online at http://iphome.hhi.de/suehring/tml/.
[6] ITU-T. Recommendation P.910: Subjective video quality assessment methods for multimedia applications, 1999.
[7] ITU-T. Recommendation G.1070: Opinion model for video-telephony applications, 2007.
[8] D. H. Kelly. Motion and vision II: stabilized spatio-temporal threshold surface. Journal of the Optical Society of America, 69(10):1340-1349, 1979.
[9] R. Picard and R. Cook. Cross-validation of regression models. Journal of the American Statistical Association, 79(387):575-583, September 1984.
[10] VQEG. Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, Phase II. Technical report, www.vqeg.org, August 2003.

Adaptive testing for video quality assessment


Vlado Menkovski
Eindhoven University of Technology Postbus 513 5600 MB Eindhoven +31 402475653 v.menkovski@tue.nl

Georgios Exarchakos
Eindhoven University of Technology Postbus 513 5600 MB Eindhoven +31 402475653 g.exarchakos@tue.nl

Antonio Liotta
Eindhoven University of Technology Postbus 513 5600 MB Eindhoven +31 402473890 a.liotta@tue.nl

ABSTRACT
Optimizing the Quality of Experience and avoiding under- or over-provisioning in video delivery services requires understanding how different resources affect the perceived quality. The utility of a resource, such as bit-rate, is directly calculated by proportioning the improvement in quality over the increase in costs. However, perception of quality in video is subjective and, hence, difficult and costly to estimate directly with the commonly used rating methods. Two-alternative forced-choice methods such as Maximum Likelihood Difference Scaling (MLDS) introduce less bias and variability, but only deliver estimates of relative differences in quality rather than absolute ratings. Nevertheless, this information is sufficient for calculating the utility of the resource for the video quality. In this work, we present an adaptive MLDS method, which incorporates an active test selection scheme that improves the convergence rate and decreases the need to execute the full range of tests.

General Terms
Maximum Likelihood Difference Scaling, adaptive MLDS, Video Quality Assessment (VQA), Quality of Experience, QoE.

1. INTRODUCTION
The goal of efficient management for video delivery services is delivering the desired Quality of Experience (QoE) without overprovisioning the service resources. To make this process feasible we need an understanding of the relationship between the resources and the delivered quality. Moreover, if we can measure the utility of each resource, such as bit-rate, for the perceived quality, we can then provide this resource optimally, or up to the level that is justified by the cost. For example, depending on the context, type of content and screen characteristics, a person might not perceive any further improvement if the video bit-rate is larger than 512 kbps. On the other hand, for a low-cost service a 256 kbps video could offer only slightly lower quality than 512 kbps (again in the specific context) and be the optimal option. Calculating these utilities requires understanding of the costs, but more importantly, it requires understanding of the perceived quality for these resources. Measuring the relationship between a resource provided by the video delivery service and the provided quality requires subjective testing, because of the subjective nature of perceived video quality. Objective and subjective video quality methods have varied levels of success in delivering accurate estimations. The objective methods are considered more practical because they do not necessitate human testing. Nevertheless, they are less accurate, mainly because they do not consider all the factors that affect the quality and disregard the viewers' expectations [1].

The subjective methods are regarded as more accurate and are usually used as a benchmark for the objective methods. One such study by Seshadrinathan et al. [2] analyzes different objective video quality assessment algorithms by correlating their output with the differential mean opinion score (DMOS) of a subjective study they executed. This type of undertaking is costly, time consuming and necessitates a considerable number of tests to achieve statistical significance. The bias and the variability of subjective testing arise from the fact that subjective tests rely on rating as the estimation procedure. Rating is inherently biased due to the variance in the internal representation of the rating scale by the subjects [3][4][5]. Another subjective testing method uses the scale of just noticeable differences (JND). The JND scale measures the amount of subjective impairment in the video. One unit of JND corresponds to the amount of difference that is just noticeable (usually 50% of the time) and as such spans the whole range of the physical parameter of interest [6]. However, this method requires multiple iterations through different levels of stimulus intensity to determine the JND scale, and cannot directly scale, for example, a given video with 10 arbitrary levels of bit-rate. Additionally, the JND unit will not be constant over the wider range of bit-rates that is of interest in practical cases. In our previous work [7] we used a two-alternative forced-choice (2AFC) method to estimate the relative differences in quality. The Maximum Likelihood Difference Scaling (MLDS) method delivers the ratio of subjective quality differences between versions of a video with different levels of the provided resource. Because the method is 2AFC, meaning the participant is forced to choose between two intensities, the amount of bias and variability is significantly lower than in rating [8]. In the case of video quality estimation, the 2AFC test discriminates between different levels of quality: four videos, i.e. two pairs of videos, are presented and the respondent needs to select which pair has the bigger difference in quality. This might sound like a particularly difficult and time-consuming effort, but in reality most of the tests are quickly and easily answered. The videos are typically short (less than 10 seconds) and uniformly impaired, so in most cases the participant is confident enough to vote after watching only a part of each video. Many of the tests are quite obvious and derivative, i.e. based on previous responses the following ones are apparent. Nevertheless, if one wants to explore additional parameters, such as the type of video or the context in which it is being watched, the number of tests increases quickly. For example, in the study executed in [7] for 10 types of videos, a participant needed to answer 210 tests per video. Answering all the 2100 tests took each participant around 8 hours over the period of a week. Motivated by the effectiveness of MLDS in estimating the utility of resources for video quality, and by its drawback in the number of tests, which quickly grows with the number of samples and parameters under test, we have developed an active testing procedure: adaptive MLDS. This approach leads to a significant decrease in the number of tests and an improvement in the learning rate.

2. MLDS
The goal of the MLDS method is to map the objectively measurable scale of video quality to the internal psychological scale of the viewers. The output is a quantitative model for this relationship based on a psychometric function [9] as depicted in Figure 1.

Figure 1. Psychometric function.

The horizontal axis of Figure 1 represents the physical intensity of the stimuli, in our study the bit-rate of the video. The vertical axis represents the psychological scale of the perceived difference in quality. The perceptual difference of quality $\psi_1$ of the first (or reference) sample $x_1$ is fixed to 0 and the difference of quality $\psi_{10}$ of the last sample $x_{10}$ is fixed to 1, without any loss of generality [10]. In other words, there is 0% difference in quality from $x_1$ to $x_1$ (itself), while there is 100% difference in quality from $x_1$ to $x_{10}$. The MLDS method estimates the relative distances of the rest of the videos, $\psi_2$ through $\psi_9$, and therefore models the viewers' internal quality scale.

The 2AFC test is designed in the following manner: two pairs of videos, $\{x_i, x_j\}$ and $\{x_k, x_l\}$, are presented to the viewers, where the indexes of the samples are selected as $1 \le i < j < k < l \le 10$, so that the quality ranges do not overlap. The video with the smaller index has the higher quality. The viewer then selects the pair of videos that has the bigger difference in quality. For a given test $T_n$, the viewer selects the first pair (sets $R_n = 1$) if she perceives the qualities of the videos in the quadruple as $|\psi_j - \psi_i| - |\psi_l - \psi_k| > 0$; otherwise she chooses the second pair ($R_n = 0$). These comparisons between the quality distances of video pairs allow a quality distance model between all of the presented videos to be built. The method calculates the quality differences $\psi_2$ through $\psi_9$ as parameters in a maximum likelihood estimation (MLE). The MLE requires a probability distribution for each response, obtained using signal detection theory (SDT): the difference of the quality differences between the four videos is the signal, contaminated by Gaussian noise. When executing a test, the participant evaluates the value

$$\Delta(i, j, k, l)_n = (\psi_{j_n} - \psi_{i_n}) - (\psi_{l_n} - \psi_{k_n}) + \epsilon,$$

where $\epsilon$ is a value sampled from a Gaussian distribution with zero mean and standard deviation of 1. Using this assumption, the probability of each response is $P(R_n = 1; \Delta_n) = \Phi(\Delta_n)$ for a test where the first pair is selected, and $P(R_n = 0; \Delta_n) = 1 - \Phi(\Delta_n)$ for a test where the second pair is selected, where $\Phi$ denotes the standard normal cumulative distribution function. The likelihood of all the responses is:

$$L(\psi \mid R) = \prod_{n=1}^{N} \Phi(\Delta_n)^{R_n}\,\bigl(1 - \Phi(\Delta_n)\bigr)^{1 - R_n}.$$

There is no closed form for such a solution, so a direct numerical maximization method needs to be used to compute the estimates $\hat{\psi} = \arg\max_{\psi} L(\psi \mid R)$. More details on MLDS for video quality can be found in [7] and for image quality in [11]. A fitted curve through the $\psi$ values also represents the utility of the bit-rate as a resource, or how much we can improve the quality by increasing the bit-rate over the tested range, assuming that the cost of increasing the bit-rate is constant over the same range.
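As a concrete illustration of this maximization (not the authors' Java/R implementation, which is referenced in Section 4), the following sketch fits the internal scale values from a list of quadruple tests and binary responses using scipy.

```python
# Illustrative MLDS fit: given quadruple tests (i, j; k, l) and binary responses
# R, the internal scale values psi_2..psi_9 are found by maximizing the
# likelihood above, with psi_1 fixed to 0 and psi_10 fixed to 1.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(free_psi, tests, responses):
    psi = np.concatenate(([0.0], free_psi, [1.0]))   # psi_1 = 0, psi_10 = 1
    i, j, k, l = tests.T                              # (N, 4) array of 0-based indices
    delta = (psi[j] - psi[i]) - (psi[l] - psi[k])     # difference of differences
    p_first = norm.cdf(delta)                         # P(R = 1) under unit-variance noise
    p = np.where(responses == 1, p_first, 1.0 - p_first)
    return -np.sum(np.log(np.clip(p, 1e-12, None)))

def fit_mlds(tests, responses, n_levels=10):
    x0 = np.linspace(0.1, 0.9, n_levels - 2)          # initial guess for psi_2..psi_9
    res = minimize(neg_log_likelihood, x0, args=(tests, responses), method="Nelder-Mead")
    return np.concatenate(([0.0], res.x, [1.0]))
```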

3. Adaptive MLDS

The MLDS method is appealing for its simplicity and efficiency; however, one full round of tests for ten levels of stimuli (i.e. video qualities) requires 210 individual tests. The full range of tests carries significant redundancy, and removing some of it should not necessarily make the results significantly less reliable; it may even have only negligible effects on the end result. In this adaptive procedure we have two aims: to improve the rate of learning and to decrease the number of required tests. The approach is based on the idea that, with the knowledge acquired by executing a small number of tests, we can estimate the answers of the remaining tests with some confidence. Then, using these estimates together with the known responses, we execute the MLDS method. Executing the MLDS with more responses helps the argument maximization procedure. The estimates rely on the characteristics of the psychometric curve (such as its increasing monotonicity), so that the overall performance of MLDS is improved. The idea comes from the notion that some of the tests cover the range of others; in fact, all of the tests are covered by others in one way or another.

The approach makes use of the characteristics of the psychometric curve. The psychometric curve is a monotonously increasing function $f(x)$. Consequently, for $k < l < m$, $x_k < x_l < x_m$; if $|x_k - x_l| < |x_k - x_m|$ in the physical domain, then $|\psi_k - \psi_l| < |\psi_k - \psi_m|$ in the psychological domain (Figure 2).

If we now observe five samples $x_i, x_j, x_k, x_l, x_m$ such that $i < j < k < l < m$, and we observe two tests $T_1(x_i, x_j; x_k, x_l)$ and $T_2(x_i, x_j; x_k, x_m)$, the perceived qualities in the psychological domain are $\psi_i < \psi_j < \psi_k < \psi_l < \psi_m$. If in $T_2$ the first pair is bigger, that would mean that $\psi_j - \psi_i > \psi_m - \psi_k$, and since $\psi_m - \psi_k > \psi_l - \psi_k$, it follows that $\psi_j - \psi_i > \psi_l - \psi_k$. In other words, if in $T_2$ the first pair is selected as having the bigger difference, then in $T_1$ the first pair has the bigger difference as well (Figure 2).


Figure 2. Monotonicity of the psychometric curve

There are many different combinations of tests that have this dependency for the first pair or the second pair. We can generate a list of dependencies for each pair based on two simple rules:

Let us assume a test $T_1(a, b; c, d)$ such that $a < b < c < d$ and $\psi_b - \psi_a > \psi_d - \psi_c$, and a test $T_2(e, f; g, h)$ with $e < f < g < h$. If $e \le a < b \le f$ and $c \le g < h \le d$, then $\psi_f - \psi_e > \psi_h - \psi_g$.

Let us assume a test $T_1(a, b; c, d)$ with $a < b < c < d$ and $\psi_b - \psi_a < \psi_d - \psi_c$. If for a test $T_2(e, f; g, h)$ with $e < f < g < h$ the following hold: $a \le e < f \le b$ and $g \le c < d \le h$, then $\psi_f - \psi_e < \psi_h - \psi_g$.


After introducing an initial set of responses, we can estimate the probabilities of the rest; however, first we need to learn the probabilities of each of the known responses being actually valid. MLDS estimates the values of the psychological parameters $\psi = (\psi_1, \ldots, \psi_{10})$ such that the combined probability of the responses, i.e. the overall likelihood of the dataset, is maximized. Nevertheless, after the argument maximization is finished, the different responses have different probabilities of being true. Having a set of initial quality values as the prior knowledge about the underlying process coming from the data, we generate the estimations for the rest of the tests. The interdependencies between the tests are, of course, far more complex. Let us assume, for example, a test $T_1$ that depends on tests $T_2$ and $T_3$. If the answer from $T_2$ indicates that the first pair has the larger difference in $T_1$ and the answer from $T_3$ indicates the opposite, then we need to calculate the combined probability of $T_2$ and $T_3$ to estimate the answer of $T_1$. Assuming that the responses of $T_2$ and $T_3$ are independent and that the prior probability of giving the first and the second answer is the same, the combined probability is

$$P(T_1) = \frac{P(T_2)\,\bigl(1 - P(T_3)\bigr)}{P(T_2)\,\bigl(1 - P(T_3)\bigr) + \bigl(1 - P(T_2)\bigr)\,P(T_3)}.$$
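A minimal sketch of this estimation step, with hypothetical helper names: given the probabilities of two already-answered tests that constrain T1 in opposite directions, the combined probability follows the normalized expression above, and the next question to ask is the one whose estimate is least confident.

```python
# Sketch of combining conflicting evidence and selecting the next test; the
# function names and the confidence definition are illustrative assumptions.
def combined_probability(p_t2, p_t3):
    """P(first pair larger in T1) given supporting test T2 and opposing test T3."""
    support = p_t2 * (1.0 - p_t3)
    oppose = (1.0 - p_t2) * p_t3
    return support / (support + oppose)

def pick_next_test(estimates):
    """estimates: dict mapping an unanswered test id to its estimated probability.
    Confidence is the distance from 0.5; the least confident test is asked next."""
    return min(estimates, key=lambda t: abs(estimates[t] - 0.5))
```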

Of the remaining tests that have no responses, some will have higher-confidence estimates than others; in other words, we have better estimations for some tests than for others. To improve the speed of learning, the adaptive MLDS method focuses on the tests that have the smallest confidence in their estimations. This way, when we receive the next batch of responses, the overall uncertainty in the estimates should be minimized. The goal of adaptive MLDS is to develop a metric that indicates how sufficient the amount of tests is for determining the psychometric curve. We can obtain this indication from the probabilities of the estimations. As we get more responses by asking the right questions, the estimation of the remaining tests improves. At some point, adaptive MLDS will estimate all of the remaining tests correctly with very high probability, which is a good indication that no more tests are necessary.

4. EXPERIMENTAL SETUP

To show the performance of the adaptive MLDS we have developed a software simulation, which simulates the learning process of the adaptive MLDS algorithm by sequentially introducing data from the subjective study in [7]. For every iteration a psychometric curve is estimated and compared to the one calculated on the full dataset, and the root mean square error (RMSE) between them is computed. In parallel, a random introduction of data is also executed as a baseline for comparison. The adaptive MLDS algorithm is implemented in Java, while the MLDS software from [10], written in R, is used for estimating the psychometric curves.

5. RESULTS

Adaptive MLDS, as an active learning algorithm, explores the space of all possible 2AFC tests with the goal of optimizing the learning process. It also provides an indication of the confidence in the model built on the subset of the data, so that early stopping of the experiment is feasible. The performance of the adaptive MLDS is presented in Figures 3, 4 and 5. In Figure 3 we present the accuracy of the estimations for three types of videos (blue sky, sun flower and mobile & calendar) against the number of introduced datapoints. In Figure 4, we observe the learning rate of adaptive MLDS against the random MLDS. The horizontal axis represents the number of points introduced at the time the calculation was executed and the vertical axis the RMSE between the estimated curve and the curve built on the whole dataset. We can clearly observe that for these datapoints adaptive MLDS brings a significant improvement in the learning rate. The experiment was repeated 100 times for each number of introduced datapoints, starting from a different random set of 15 datapoints. In Figure 5 we present the standard deviation of the RMSE values at each point. Figure 6 presents the distribution of the confidence, or the probabilities of those estimations. The data in Figure 6 show that the adaptive learning algorithm estimated the unknown answers with high confidence and that after between 40 and 60 collected answers the confidence in the estimations was close to 1, suggesting that the rest of the tests are not necessary and that we can correctly estimate the psychometric curve without them. This is also evident in Figure 3, where the accuracy surpasses 94-96% after 60 tests.

Figure 3. Accuracy of the estimations.

Figure 4. Mean RMSE for the three types of video

Figure 5. Standard deviation of the RMSE for the three types of video

Figure 6. Estimation confidences for the three types of videos over the number of introduced datapoints

6. CONCLUSION
The adaptive MLDS algorithm is an active learning algorithm specifically designed for the MLDS method for estimating a psychometric curve. Motivated by the fact that MLDS is efficient in estimating video quality utility functions, we have developed this adaptive scheme to improve the learning efficiency. The results from the simulations show that adaptive learning provides a significant improvement in the learning rate of MLDS and gives a solid indication for stopping the test early, when further tests bring no significant improvement in the accuracy of the psychometric curve. Overall, this approach adds to the efficiency of MLDS in tackling the issues that arise with subjective estimation of video quality.


7. REFERENCES
[1] S. Winkler and P. Mohandas. The evolution of video quality measurement: From PSNR to hybrid metrics. IEEE Transactions on Broadcasting, 54(3):660-668, 2008.
[2] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack. A subjective study to evaluate video quality assessment algorithms.
[3] D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky. Foundations of Measurement, vol. 1: Additive and Polynomial Representations. New York: Academic, 1971.
[4] R. N. Shepard. On the status of "direct" psychophysical measurement. Minnesota Studies in the Philosophy of Science, vol. 9, pp. 441-490.
[5] R. N. Shepard. Psychological relations and psychophysical scales: On the status of "direct" psychophysical measurement. Journal of Mathematical Psychology, 24(1):21-57, 1981.
[6] A. B. Watson and L. Kreslake. Measurement of visual impairment scales for digital video.
[7] V. Menkovski, G. Exarchakos, and A. Liotta. The value of relative quality in video delivery. Eindhoven, Netherlands: Eindhoven University of Technology, 2011.
[8] A. B. Watson. Proposal: Measurement of a JND scale for video quality. IEEE G-2.1.6 Subcommittee on Video Compression Measurements, 2000.
[9] W. H. Ehrenstein and A. Ehrenstein. Psychophysical methods. In Modern Techniques in Neuroscience Research, pp. 1211-1241, 1999.
[10] K. Knoblauch and L. T. Maloney. MLDS: Maximum likelihood difference scaling in R. Journal of Statistical Software, 25(2):1-26, 2008.
[11] C. Charrier, L. T. Maloney, H. Cherifi, and K. Knoblauch. Maximum likelihood difference scaling of image quality in compression-degraded images. Journal of the Optical Society of America A, 24(11):3418-3426, 2007.

Aligning subjective tests using a low cost common set


Yohann Pitrey, Ulrich Engelke, Marcus Barkowsky, Romuald Pépion, Patrick Le Callet
LUNAM Université, Université de Nantes, IRCCyN UMR CNRS 6597 (Institut de Recherche en Communications et Cybernétique de Nantes), Polytech NANTES, FRANCE {first-name.last-name}@univ-nantes.fr

ABSTRACT
In this paper we use a common set between three subjective tests to build a linear mapping of the results of two tests onto the scale of one test identified as the reference test. We present our low-cost approach for the design of the common set and discuss the choice of the reference test. The mapping is then used to merge the outcomes of the three tests and provide an interesting comparison of the impact of coding artifacts, transmission errors and error-concealment in the context of Scalable Video Coding.

General Terms
subjective quality assessment, inter-experiment mapping, scalable video coding, error-concealment

1. INTRODUCTION

Subjective quality experiments are typically conducted with respect to international recommendations, such as ITU-R Rec. BT.500-11, in order to produce reliable and reproducible outcomes. Despite the strict rules defined in these recommendations, many factors that cannot easily be controlled have an impact on the test results. They lead to a set of context effects that turn each experiment into a non-standard environment. As a consequence, special considerations have to be taken into account in order to compare test results between different laboratories or different experiments. Typically, this involves mapping the outcomes of different tests onto a common scale. The common scale is usually built on a subset of conditions shared by all tests, or by groups of experiments. The design of the common set is of great importance for the mapping, as it must represent precisely the relation between tests. In this paper, we make use of such a method in order to compare the Mean Opinion Scores (MOS) of three subjective experiments that we conducted in the context of Scalable Video Coding. The originality of our approach is to include the common conditions during the design of the tests, so that no extra experiment needs to be conducted in order to build the mapping between tests. After choosing a reference test to map onto, we use a simple linear mapping to derive relationships between the different experiments. The paper is organized as follows. In Section 2 we briefly introduce the three experiments we conducted. In Section 3, we discuss the mapping of the MOS onto the common scale. Results are presented in Section 4 and conclusions are drawn in Section 5.

2. DESIGN OF EXPERIMENTS

We conducted three subjective experiments containing various Hypothetical Reference Circuits (HRC), in order to evaluate the impact of video coding artifacts, transmission errors and error-concealment techniques in the context of Scalable Video Coding (SVC). All the tested videos are in VGA (640 × 480 pixels) and QVGA (320 × 240 pixels) formats, displayed at 15 or 30 frames per second. Nine video sequences of 12 seconds each were used, representing a good variety of contents and high spatial and temporal activity ranges. The perceived quality was assessed using the Absolute Category Rating (ACR) methodology with 5 levels of quality, and conducted in a subjective test room with standard viewing conditions [1].

In the first test (T1), the impact of error-concealment and encoding parameters on two-layer SVC streams is evaluated. We simulate a loss of one second during which no data is received for the highest layer. The visual artifacts induced by the lost data are concealed using two techniques based on upscaling the base layer, which is assumed to be always available. The first technique is referred to as switched and consists in replacing the whole distorted frame with an upscaled version of the corresponding frame from the base layer. The second technique is referred to as patched and consists in replacing only the distorted area in the frame with the upscaled area from the corresponding frame in the base layer. Two constant bit-rate scenarios are combined with two base-layer temporal frequencies in order to identify the best encoding configuration in such a context. The HRCs in this test are referred to using a structure similar to 120/600kb/s-30Hz-switched, reflecting the bitrates used for the two layers, the frequency of the base layer and the error-concealment technique (more details about this test can be found in [4]). For reference, one AVC HRC is included in this test under the same conditions of distortion; its artifacts are concealed using a state-of-the-art technique based on reusing the last non-distorted frames and buffer repetition.

In the second test (T2), we evaluate the impact of two-layer SVC coding artifacts on the perceived quality, without transmission errors.

We use four QP values, 26, 32, 38 and 44, for each layer, leading to 16 combinations of QP. We also include the 4 versions of the upscaled base layer encoded using the same four QP values. The HRCs from this test are referred to using a structure similar to QP38/44, 38 being the QP value for the base layer and 44 the QP value for the enhancement layer. The upscaled base layer HRCs are referred to as QP 38 Upscaled.

In the third test (T3), we evaluate the impact of the distribution of network impairments on a subset of streams from T2. Four factors are varied: the quality of the base layer, the number of impairments, the total length of the impairments and the interval between two impairments. The HRCs from this test are referred to using the values of QP for the two layers and a succession of numbers of frames displayed from each layer in turn. For instance, in 44/32 -32-16-32-, the 44/32 part means that the video was encoded with QP 44 for the base layer and 32 for the enhancement layer. The -32-16-32- part means that two impairments of 32 frames are displayed, separated by 16 frames (more details about this test can be found in [5]). During the impairments, we use the switched error-concealment technique from T1. The positions of the impairments are calculated so that they are globally centered on the middle of the sequence.

We designed the three experiments so that they share a subset of configurations called the common set, used for comparing their respective outcomes. This common set provides a basis to perform fitting operations, in order to compare the results of the three tests. The design of the common set and the fitting process are described in the next section.

Figure 1: Overview of the MOS values for the three tests, rated from Bad (1) to Excellent (5) (from left to right: T1, T2, T3; lower part of bars: before mapping, upper part: after mapping onto T3)

3. METHODOLOGY

During a subjective test, the viewers tend to use the full quality scale, independently of the distribution of the expected quality scores of the presented sequences [2]. This phenomenon, known as the corpus effect, forbids direct comparison of the results from different subjective tests. However, presenting a common set of configurations in several tests allows mapping the results from one test onto another, and comparing them as the outcomes of a single test. When trying to compare more than two tests, the use of a reference test has been proposed to facilitate direct comparison between HRCs from different tests [2, 3]. For this purpose, the conditions contained in the reference test should cover qualities that are evenly spread over a wide range. To achieve this goal, different methodologies have been deployed in the past. In [3] the authors create a reference test, which they refer to as a meta-test, from a set of 6 subjective tests. For that purpose, a subset of 185 Processed Video Sequences (PVS) was carefully selected from 479 available PVS, with respect to a uniform quality distribution over the range of the scale. These sequences were then used to conduct another quality test. The MOS from the 6 original tests were then mapped onto the MOS from the meta-test to facilitate comparison between the original MOS. Despite the clear rationale behind this method and the thorough conduction of its implementation, it is apparent that the preparation of an additional test is cumbersome and time consuming. In addition, a new meta-test needs to be created whenever a new test is to be included for comparison. The work in [2] reports on mapping to a reference test focusing on the particular context of IPTV degradations. Here, a different approach has been chosen by preemptively designing a reference test that contains a wide perceptual range of IPTV degradations. Subsequently, four other tests were designed that focus on a selective subset of degradations. This approach is mainly applicable in cases where a particular application is considered and the range of degradations can be estimated. Thus, it assumes that the reference test is already designed with regard to the tests that are to be compared.

Such foresight, however, is not always given, and often it is of interest to compare tests without a reference test being available. In our work, we therefore take a different approach that avoids conducting additional tests and does not expect a reference test to be available for the mapping. When designing a new experiment, we include in it several configurations that come from the existing common set. All our tests share a small number of configurations, on which we rely to perform mapping operations between experiments. Given a set of available tests, we carefully determine the most suitable reference test with respect to constraints similar to those in the previous works. Hence, this approach is adaptive to the current problem at hand, as it relies only on the data available.


Fitting between T1 and T3 common sets 5 4 T3 3 2 1 1 2 3 T1 4 5 T3

Fitting between T2 and T3 common sets 5 4 3 2 1 1 2 3 T2 4 5

a1 0.759

b1 1.212

a2 1.058

b2 0.281

Table 1: Linear mapping parameters from T1 and T2 to T3. the mapped MOS values, in order to compare the influence of the three types of distortions to each other, namely error concealment, SVC coding artifacts and impairment distributions.

Figure 2: Fitting functions from T1 and T2 onto T3.

Four HRCs are shared by T1, T2 and T3. These four conditions contain SVC coding distortions, transmission errors, upscaling artifacts and temporal discontinuities. Moreover, as the T2 and T3 tests are closer to each other in terms of evaluated factors, they share another 9 HRCs, raising the size of the common set to 14 HRCs. These additional HRCs contain a wider variety of coding distortions and transmission errors in order to obtain a more accurate mapping between the two experiments. The condition MOS (i.e. opinion scores averaged over all observers and all source contents) of the three tests are presented in Figure 1 in order of increasing magnitude. It can be seen that all three tests cover a wide range of qualities, with test T3 covering the widest range. For this reason, we choose T3 to be the reference test in the scope of this work. We derived linear mapping functions to map the MOS from tests T1 and T2 onto the scale of test T3 as follows: y_Ti = a_i x_Ti + b_i, where x_Ti is the MOS of T1 or T2, and y_Ti is the mapped MOS on the T3 scale. The parameters a_i and b_i were determined using linear regression between the condition MOS of the common sets of HRCs from Ti and T3, respectively. The mappings are illustrated in Figure 2 and the corresponding mapping parameters are presented in Table 1. Figure 2 first shows the distribution of the configurations from the common set over the quality scale, and indicates that the MOS of both T1 and T2 are highly correlated with the MOS of T3. In fact, the linear correlation between T1 and T3 is equal to 0.995 and the linear correlation between T2 and T3 is equal to 0.985. It should be noted that for both mappings, the original reference sequence (HRC0) is approximately at the same location and very close to the diagonal. This indicates that the non-distorted reference sequences were rated very similarly in the three tests. From Figure 2 one can further see that in both cases the mapping functions lie above the diagonal, meaning that the mapped MOS are generally higher for both T1 and T2. This is also illustrated in Figure 1, where the mapped MOS are displayed above the MOS scores for T1 and T2. It can be observed that one mapped MOS value in T2 is slightly outside of the scale. This is a result of the mapping between T2 and T3 being considerably above the diagonal; hence, the already high-quality reference condition is mapped slightly outside the scale. This might indicate that the scale was compressed at the upper end for this particular test. After mapping the results of T1 and T2 onto the scale of T3, we consider the outcomes of the three tests as a single experiment. In the next section, we conduct an analysis of the mapped MOS values, in order to compare the influence of the three types of distortions, namely error concealment, SVC coding artifacts and impairment distributions, to each other.
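As a rough illustration of the fitting step, the sketch below derives the a_i and b_i parameters by ordinary least squares on the common-set condition MOS and applies the resulting mapping. The numerical arrays are illustrative placeholders rather than the actual common-set values, and NumPy is assumed to be available.

```python
import numpy as np

def fit_mapping(mos_src, mos_ref):
    """Fit y = a*x + b mapping source-test MOS onto the reference-test scale,
    using the condition MOS of the shared (common-set) HRCs."""
    a, b = np.polyfit(mos_src, mos_ref, deg=1)   # least-squares linear regression
    return a, b

def apply_mapping(mos, a, b):
    return a * np.asarray(mos) + b

# Illustrative common-set condition MOS (placeholders, not the values from the tests)
common_t1 = np.array([1.8, 2.6, 3.4, 4.5])   # the shared HRCs as rated in T1
common_t3 = np.array([2.3, 3.2, 3.8, 4.6])   # the same HRCs as rated in the reference test T3

a1, b1 = fit_mapping(common_t1, common_t3)
t1_on_t3_scale = apply_mapping([2.0, 3.0, 4.0], a1, b1)  # map any T1 MOS onto the T3 scale
print(a1, b1, t1_on_t3_scale)
```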

4. RESULTS AND DISCUSSION


A total of 82 HRCs result from joining T3 and the mappings of T1 and T2 onto T3. This super-set of HRCs contains a large number of different conditions with distinct kinds of distortions. To get a good overview of this large amount of data, Figure 3 compares the different HRCs from the 3 tests after mapping. We display the original test of each HRC as well as a short description following the structures introduced in Section 2. In Figure 3, each HRC is represented by a black symbol on the same line as its description. The grey intervals on the upper and lower parts symbolize the HRCs that are statistically equivalent to one given HRC. The statistical equivalence between configurations is determined using the 95% confidence intervals, calculated on the data after fitting. As an example of interesting results, we can observe that the reference HRC from T3 (line 6) obtains a quality equivalent to (but slightly lower than) the SVC HRCs using a QP of 26 for the enhancement layer (lines 1-5). This might identify a saturation effect at the top of the quality scale, as the viewers fail to give the reference a significantly higher score than the already high-quality HRCs. Alternatively, it could indicate that a QP of 26 is perceived as lossless by the viewers in our scenario. The corpus effect is also illustrated here, as the high-quality SVC HRCs, which only contain limited coding artifacts, are perceived as equivalent to the reference in the context of network impairments such as the ones in T3. The mapping of the three tests allows for comparison of HRCs that are located on distinct dimensions of distortion. For instance, one can observe that the upscaled SVC base layer encoded with a QP of 26 (line 27) is equivalent to several two-layer SVC streams impaired by different loss patterns (e.g.: lines 23, 26, 28, 31). However, the upscaled version only needs an average bitrate of 0.84 Mb/s to be encoded, which is about half the bitrate needed to transmit one of the equivalent two-layer streams. Therefore, using a good-quality video and upscaling may represent an interesting alternative to multiple layers. One can also draw a parallel between constant-bitrate and constant-quality configurations. For instance, the 120kb/s-30Hz-Switched HRC from T1 (line 24) is statistically equivalent in terms of quality to the 44/32 -32- configuration from T3, which has the same network impairment pattern. A correspondence can then be made between QP values and bitrate, which can be useful for the design of adapted bitstreams without using costly bitrate control techniques.
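The equivalence grouping in Figure 3 relies on 95% confidence intervals around each condition MOS. A minimal sketch of how such intervals and the resulting overlap test could be computed is given below, assuming a normal approximation of the standard error; the per-observer score arrays are hypothetical placeholders, not the study data.

```python
import numpy as np

def mos_ci95(scores):
    """MOS and 95% confidence half-width for one HRC.
    `scores` holds the individual ratings (observers x contents), flattened."""
    scores = np.asarray(scores, dtype=float).ravel()
    mos = scores.mean()
    half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))  # normal approximation
    return mos, half_width

def equivalent(scores_a, scores_b):
    """Treat two HRCs as statistically equivalent when their 95% CIs overlap."""
    mos_a, ci_a = mos_ci95(scores_a)
    mos_b, ci_b = mos_ci95(scores_b)
    return abs(mos_a - mos_b) <= ci_a + ci_b

# hypothetical ratings for two HRCs (observers x contents, flattened)
rng = np.random.default_rng(0)
hrc1 = rng.normal(3.2, 0.8, size=96).clip(1, 5)
hrc2 = rng.normal(3.4, 0.8, size=96).clip(1, 5)
print(equivalent(hrc1, hrc2))
```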


No.   Test   HRC description                  MOS
1     T2     QP44/26                          4.92
2     T2     AVC QP26                         4.86
3     T2     QP38/26                          4.85
4     T2     QP26/26                          4.84
5     T2     QP32/26                          4.80
6     T3     Reference                        4.71
7     T2     QP44/32                          4.25
8     T2     QP26/32                          4.24
9     T1     SVC Not-Damaged                  4.24
10    T2     AVC QP32                         4.14
11    T2     QP38/32                          4.13
12    T2     QP32/32                          4.04
13    T3     44/32 -2-                        3.97
14    T3     44/32 -2-16-2-                   3.74
15    T1     120kb/s-30Hz-Patched             3.71
16    T1     200kb/s-30Hz-Switched            3.70
17    T3     44/32 -2-128-2-                  3.70
18    T3     38/32 -16-                       3.65
19    T3     44/32 -8-                        3.65
20    T3     38/32 -8-8-8-                    3.60
21    T2     QP26/38                          3.51
22    T1     200kb/s-15Hz-Switched            3.43
23    T3     44/32 -16-                       3.40
24    T1     120kb/s-30Hz-Switched            3.33
25    T1     200kb/s-15Hz-Patched             3.32
26    T3     44/32 -8-16-8-                   3.28
27    T2     QP26 Upscaled                    3.26
28    T3     44/32 -8-8-8-                    3.24
29    T1     200kb/s-30Hz-Upscaled            3.21
30    T3     44/32 -8-128-8-                  3.21
31    T3     38/32 -64-                       3.20
32    T1     120kb/s-15Hz-Patched             3.19
33    T3     44/32 -32-                       3.18
34    T1     120kb/s-15Hz-Switched            3.16
35    T2     QP26/44                          3.12
36    T3     38/32 -32-128-32-                2.96
37    T3     44/32 -8-8-8-8-8-                2.95
38    T1     200kb/s-15Hz-Upscaled            2.93
39    T3     44/32 -8-8-8-128-8-              2.93
40    T2     QP32/38                          2.87
41    T3     44/32 -8-32-8-32-8-              2.86
42    T3     44/32 -64-                       2.78
43    T1     120kb/s-30Hz-Upscaled            2.75
44    T2     QP38/38                          2.68
45    T2     QP32 Upscaled                    2.67
46    T3     38/32 -128-                      2.67
47    T2     AVC QP38                         2.65
48    T1     120kb/s-15Hz-Upscaled            2.63
49    T2     QP44/38                          2.60
50    T3     44/32 -32-16-32-                 2.58
51    T2     QP32/44                          2.57
52    T3     38/32 -64-64-64-                 2.57
53    T3     44/32 -32-128-32-                2.56
54    T3     38/32 -32-32-32-32-32-32-32      2.54
55    T1     AVC Err.Con.                     2.18
56    T3     44/32 -64-64-64-                 2.07
57    T2     QP38/44                          1.80
58    T2     QP38 Upscaled                    1.71
59    T2     QP44/44                          1.59
60    T2     AVC QP44                         1.55
61    T3     44/32 -56-8-56-8-56-8-56-        1.55
62    T2     QP44 Upscaled                    1.37


Figure 3: HRC comparison after mapping the results of T1 and T2 onto T3.

5. CONCLUSION
In this paper we presented an approach to design subjective tests so that they share a common set of configurations, used to align the outcomes of three experiments into a single data set. We used this data set to compare the impact of coding artifacts, transmission errors and error-concealment techniques in the context of Scalable Video Coding. In our future work, we will investigate statistical tools to help construct the common set, such as determining the minimum number of HRCs to be included in order to reach a sufficient reliability after the fitting. The super-set of data obtained from the three experiments contains high variability in configurations and distortion types, which makes it a valuable resource for data mining and for designing objective quality metrics. This super-set will thus be used to get a better understanding of the factors having an impact on the perceived quality and to design quality metrics adapted to the SVC transmission context.

6. REFERENCES
[1] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications. 1996.
[2] M. N. Garcia and A. Raake. Normalization of subjective video test results using a reference test and anchor conditions for efficient model development. QoMEX, 2010.
[3] M. Pinson and S. Wolf. An objective method for combining multiple subjective data sets. SPIE Visual Communications and Image Processing, 2003.
[4] Y. Pitrey, M. Barkowsky, P. Le Callet and R. Pepion. Evaluation of MPEG4-SVC for QoE protection in the context of transmission errors. In Proc. of SPIE Optical Imaging, 2010.
[5] Y. Pitrey, M. Barkowsky, U. Engelke, P. Le Callet and R. Pepion. Subjective quality of SVC-coded videos with different error-patterns concealed using spatial scalability. Accepted for IEEE EUVIP Conference, July 2011.


Impact of Reduced Quality Encoding on Object Identification in Stereoscopic Video


Werner Robitza, Shelley Buchinger, Helmut Hlavacs
Research Group Entertainment Computing Faculty of Computer Science University of Vienna

{werner.robitza, shelley.buchinger, helmut.hlavacs}@univie.ac.at

ABSTRACT
As more and more mobile stereoscopic displays become available in phones and portable game consoles, many factors still have to be considered for the successful implementation of an end-to-end mobile 3D television system. The limited bandwidth of 3G networks requires a tradeoff between perceived quality and bit rate shaping. When important objects that appear in a video cannot be identified by the viewers, the meaning of a scene or even a whole clip can be lost. In this paper, we conduct a small user-centered experiment in order to find out which quantization parameter settings are necessary for important objects in stereoscopic videos to be identified. Also, the impact of quality on the perceived 3D effect is studied.

Categories and Subject Descriptors


H.1.2 [User/Machine Systems]: Human Factors

Keywords
Subjective Quality Assessment, Video Quality

1. INTRODUCTION
As stereoscopic television is currently on the verge of a comeback, digital cinemas are being equipped with the latest technology for a three-dimensional video experience. More and more devices are pushed onto the consumer market, enabling not only typically technology-oriented users, but also early adopters to enjoy stereoscopic video at home. Projects like 3D4YOU studied the particular requirements for 3D television and broadcasting, including technical aspects of recording, content management and transmission, but also fundamentals of subjective evaluation of stereoscopic video [8].

http://www.3d4you.eu/

However, in addition to the classical, static viewing scenario, where users are sitting in their living room in front of their television, more and more consumers are starting to use their mobile phones to watch the latest news or other clips. In this case, the experienced quality is an important issue because network conditions for mobile devices are limited. When aiming at providing new opportunities for 3D video and gaming on portable devices equipped with autostereoscopic displays, the limited bandwidth plays an important role. To provide an enjoyable experience to the users, the quality of a 3D media presentation needs to be (at least) acceptable. Within the Mobile3DTV project, these and other issues have been studied with a focus on the mobile transmission of stereoscopic content over DVB-H [4]. Some published findings might be of general validity, whereas others might rely specifically on the use of DVB-H. Today, the success of DVB-H seems to be debatable: in some countries this technology has already been abandoned. It is therefore necessary to shift the focus to 3G networks. In 3G, especially in larger cities, bit rate throughput can vary due to shadowing and multi-path fading. Although technically possible, high bit rates like 2 MBit/s cannot be achieved at all times. The bit rate can be expected to drop below certain levels, making it impossible to guarantee perfect fidelity in streaming video. This is why the video has to be encoded at a specific quality setting that is a tradeoff between 1) reasonably good visual quality and 2) an average bit rate that matches the network's typical performance. For mobile 3D television systems, the factors that lead to reduced quality have been classified into several groups ranging from capturing to coding and content visualization [5]. The sheer endless possible combinations of different settings and artifacts make it hard to objectively estimate the perceived quality of stereoscopic video. From a user's point of view it is however clear that in a video scene, the actors and the objects that are interacted with need to be identifiable. This is necessary in order to understand the contents of a scene or even a whole (short) movie. Take as an example a crime scene where the suspect is only identifiable through a unique feature that might be hidden by coding artifacts. In this paper, the major goals consist of identifying at which point the quality degradation makes it impossible for the viewer to recognize the objects in a scene, and how much average bit rate is needed for the video to be transmitted without serious impairments. Also, we assess how the perceived stereo effect is augmented with increased quality.

http://sp.cs.tut.fi/mobile3dtv/


The rest of this paper is structured as follows. In Section 2, the major research questions addressed in this paper are explained. The experimental settings of a user study designed to answer these questions are described in Section 3, and results are presented in Section 4. Conclusions are drawn in Section 5.

QP    AM     BA     DF     LL     TU-B   Avg.
10    5238   6049   4972   5740   4826   5365
12    3883   4502   3606   4257   3504   3951
15    2678   3062   2368   2870   2150   2626
19    1750   1931   1462   1814   1181   1627
22    1238   1355   1003   1259   774    1126
25    886    950    693    886    544    792
32    405    416    309    395    256    356
39    192    192    149    181    128    169
45    107    96     75     85     67     86
51    85     75     64     64     51     68
60    75     64     53     53     48     59

Table 1: Average bit rates in kBit/s for all combinations of scene and QP.

2. MAIN IDEA
We can assume that beyond a certain threshold of impairment, any object in a video or still image becomes undetectable or unidentifiable by a test person. The reduction of quality achieved by increasing the encoding base quantization parameter (QP) introduces distinct artifacts like blocking, blurring or ringing. At some point, those artifacts lead to the aforementioned negative effect. We wanted to measure the grade of impairment at which an object becomes identifiable. Put differently, starting from a heavily impaired video, the question is: how much do we need to increase the quality for the key objects to become identifiable? The stereoscopic view will aid users in identifying objects, for example because of their perceived depth. The main results of this experiment are the following: 1) For every key object in the video there is a threshold QP, averaged over all persons, that defines whether it is identifiable or not. Key objects are, for example, the main actors in a scene or the objects they interact with. 2) There is an average threshold for each combination of video and QP that marks the point where test persons accept the degradation and consider the quality acceptable for consumption. 3) Another working hypothesis is that the perceived stereoscopic effect is correlated with the perceived quality of the video. Moreover, there should be a maximum level of subjective quality attained at a specific QP, so that lowering the QP further will not lead to higher subjective scores. One could also say that the lowered noise is not detectable anymore and the additional bandwidth would be wasted.


3.2 Test material


Video sources: As the main test material, four clips from the Fraunhofer Heinrich Hertz Institute (HHI) and one clip from the TU Berlin were used. The material was made available in the context of the Mobile3DTV project. The HHI videos used were Alt moabit (AM), Book arrival (BA), Door flowers (DF) and Leaving laptop (LL). The clips have a total length of 100 frames at a frame rate of 16.67 Hz, thus leading to a duration of 6 seconds. The video resolution is 512 by 384 pixels. AM shows an outdoor scene in natural light. The camera points towards the street, where cars can be seen rushing by. The depth level is very low. The other three clips show scenes from a studio where an artificial light source was used to illuminate a set-up scene including multiple depth levels and high spatial detail. Of particular interest in those scenes are the several objects placed in the scene, like the statue of a lion facing the camera, or a green board with the letters HHI imprinted (which can be seen in Figure 1). In the scenes, one of the actors carries an object (e.g., a book) that we consider the main region of interest along with the actors themselves. For the clip called TU-Berlin (TU-B), the first 500 frames were used, totaling 20 seconds of content. The resolution is smaller than that of the other four videos: 320 by 288 pixels. The clip shows an equestrian statue with a very distinct foundation. Encoding: It is very likely that H.264 Multi View Coding [3] will become the primary codec for stereoscopic video, at least for the foreseeable future. Hence, for our tests, the videos were encoded with the latest available build of the H.264/MVC reference encoder JMVC (8.3.1) into an Annex B compliant NAL bitstream. We can assume that in professionally transcoded content, stereoscopic encoding will be more sophisticated, but we decided not to modify the software in any way in order to achieve comparable results. To achieve different bit rates, no rate control was used, as this might have led to insufficient results in terms of bit allocation when using short video sequences. Instead,

3. EXPERIMENTAL SETTINGS
In this investigation, we wanted to evaluate those thresholds and correlations by means of a subjective test suited for 3D content [6]. The experiment was conducted with respect to the International Telecommunication Union (ITU) recommendation BT.500 [2]. Prior to testing, all subjects were screened for normal vision or corrected-to-normal vision with a Snellen chart. Color vision was tested with an Ishihara test [7]. To ensure proper stereopsis, a software test included with the active shutter glasses was conducted. Stereopsis tests such as mentioned in [1] have unfortunately not been available. Observers were therefore also asked if they perceived a stereoscopy effect during the clips.

3.1 Technical details


Our tests are targeted towards mobile quality assessment; however, no mobile terminal with an autostereoscopic display was available yet. In the future it will be necessary to repeat a similar test on a real mobile device, but for this evaluation a desktop PC was used, with an Alienware OptX AW2310 120 Hz display. Stereoscopy was achieved by using a pair of NVIDIA 3D Vision active shutter glasses. The videos were played using the NVIDIA 3D Player. We decided to play the videos in their original size and aspect ratio, in front of a mid-grey background. The viewing distance was chosen to be smaller than usual [1], approximately within the length of the observer's arm, but no more than that.

http://sp.cs.tut.fi/mobile3dtv/


Figure 1: Example: video still composed from the same clip, but at different qualities.

Object                        First (QP)   Most (QP)   Votes %
Flip chart                    60           60          100
Jackets (left side)           60           60          83
Telephone (desk)              60           60          29
Wall mounted clock            60           60          71
Mannequin                     51           45          63
Book (right person)           51           39          50
Lion statue (in front)        45           39          63
Coffee mug (on the desk)      51           39          50
HHI imprint on green sign     39           32          50
Flowers on the desk           39           32          43
Text visible on flip chart    32           32          100

Table 2: Objects and QP levels necessary for identification, incl. votes (clip Book arrival).

the desired target average bit rate was obtained by encoding the video with eleven different base quantization parameter (QP) settings. We included a very low QP (10) for extreme bandwidths and very high QP values for a higher amount of artifacts. The average bit rates can be seen in Table 1. As an example, Figure 1 shows a still from a clip used in the experiment. The left two thirds of the picture are taken from the version with QP 60, the right part from the QP 10 version. Clearly, the flower bouquet is unidentifiable in the left part. Objective quality: Before the subjective experiments, the quality of the encoded videos was measured to ensure that the chosen quantization parameters and the resulting bit rates would lead to a constant degradation of quality. First, the peak signal to noise ratio (PSNR) [9] was calculated on all videos for each view. The average difference in PSNR between the two views was marginal (much less than 1 dB), therefore only the PSNR of the left view was evaluated. For brevity's sake the exact results are not provided, but we were able to observe an almost linear relationship between QP and PSNR, with a correlation of -0.988.
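For reference, a minimal sketch of the kind of per-frame PSNR computation used here (left view only) is given below; it assumes the frames are available as 8-bit luma planes in NumPy arrays, which is our assumption rather than a detail from the paper.

```python
import numpy as np

def psnr(ref_frame, deg_frame, peak=255.0):
    """PSNR in dB between a reference and a degraded 8-bit frame (e.g. the luma plane)."""
    ref = ref_frame.astype(np.float64)
    deg = deg_frame.astype(np.float64)
    mse = np.mean((ref - deg) ** 2)
    if mse == 0:
        return float("inf")              # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

def sequence_psnr(ref_frames, deg_frames):
    """Average PSNR over all frames of one view."""
    return np.mean([psnr(r, d) for r, d in zip(ref_frames, deg_frames)])
```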

4. RESULTS
After the experiment, only two evaluators stated that they did not even want to try 3D at home or on a mobile phone. The female participant experienced simulator sickness during the first session already.

4.1 Object Recognition


The level at which the object was first identified was noted for each participant. It is assumed that a participant will continue to recognize the same object for all lower QP values. The QP level with the highest frequency of occurrence among all evaluators was selected to be the minimum QP necessary to properly identify an object.
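A sketch of this selection rule is shown below: for each object, the QP at which each participant first identified it is collected and the most frequent value is taken as the minimum QP necessary for identification. The example data are made up for illustration.

```python
from collections import Counter

def min_identification_qp(first_qp_per_observer):
    """QP level with the highest frequency of first identification among observers.
    Observers who never identified the object are passed as None and ignored."""
    votes = Counter(qp for qp in first_qp_per_observer if qp is not None)
    qp, _count = votes.most_common(1)[0]
    return qp

# hypothetical observations: QP at which each of 8 observers first named an object
first_qp = [45, 39, 39, 45, 39, None, 39, 51]
print(min_identification_qp(first_qp))   # -> 39
```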

3.3 Test procedure


In total, eight naive observers took part in the experiment, aged between 22 and 32 (mean 25, ±2.53 at 95% CI), including one female. Interestingly, five participants had had experience with 3D video before, mostly from 3D cinema (and therefore using polarized 3D glasses). No signs of simulator sickness could be observed (such as headache, nausea or eye strain). Object recognition and acceptability: Observers would see the clips from one scene, starting with the most heavily impaired one. As the quality was increased for each clip, they were asked about the details they saw in the video. Observers were allowed to talk during assessment, and their descriptions were noted on a sheet of paper. Beginning with the lowest objective quality video of a scene, the test persons could watch the video three times. Whenever they were able to correctly identify an object, the QP level was noted. We are aware of the fact that always increasing the quality introduces a bias, but it was the only way of reliably testing the ability of the viewers to name the objects. For each QP in a scene, the participants were asked if they felt that the visual quality was acceptable to watch. By counting the votes for each QP level, it is possible to recommend an acceptable QP that already leads to a pleasant quality level. Quality and stereoscopy: Participants were asked to rate both perceived quality and depth of stereoscopy on a five-point discrete scale. Because of the bias effect mentioned above we did not expect the associated ratings to decrease during one session. We did however expect different correlation strengths, as the depth levels varied throughout the videos.

4.1.1 Indoor clips


A selection of objects and their QP values for the Book arrival clip can be seen in Table 2. The first column identifies the object. The second column denotes the QP level at which the first person was able to name the object. The third column is the QP level at which most of the observers identified the object. Note that some objects have not been identified by all users. As an example, the text on the flip chart was only nominated by two people, but both of them were able to see it at the same QP level. In general, almost all important objects had been identified at a QP level of 39. The scene setting for the other indoor clips was the same as in the first one the observers had seen. Therefore, they were only asked to identify the main objects that changed. In Door flowers, the person that enters the room carries a bouquet of flowers. It was assumed that observers could logically conclude that the flowers, now missing from the desk, would be carried by the person entering; nevertheless, the flowers were visible at QP = 39 for 38% of the participants.


Scene            Acceptable QP   Maximum QP
Alt moabit       32 (75%)        22 (75%)
Book arrival     32 (75%)        22 (87.5%)
Door flowers     32 (75%)        22 (87.5%)
Leaving laptop   32 (87.5%)      22 (87.5%)
TU-Berlin        32 (75%)        22 (87.5%)

Table 3: Acceptable QP and maximum QP values for each scene with fraction of votes.

4.1.2 Outdoor Clips


As the number of features in the outdoor clips is smaller than in the three other clips, only two attributes needed to be identified. For the TU-Berlin clip, the equestrian statue and its foundation had to be described correctly. For 63% of the users, the statue was clearly visible at QP = 45. The foundation was identified at QP = 39 (for 75% of the participants). The Alt moabit clip features several cars passing by. All participants were able to count their number correctly even at the largest QP setting (60). There is a textual advertisement on the bus which could be correctly identified at QP = 39 (88% of users).

4.2 Acceptability Threshold


Even after the visual quality had been marked as acceptable, clips of further increased quality were presented until there was no noticeable difference between two subsequent videos anymore. We took the QP step that most of the viewers agreed upon, not counting the last one where all agreed. For example, if all observers agreed that QP = 25 is an improvement, but only a fraction agrees on QP = 22, then the latter is counted as the maximum QP. As with the acceptable QP before, the results are shown in Table 3. Interestingly, the values for both the acceptable and the maximum QP are the same for all five clips.

4.3 Quality and Stereoscopy

As both criteria, perceived depth and quality, were assessed on an ordinal scale, Spearman's rank order correlation coefficient (SROCC) was calculated for all combinations of scene and observer (total number of 40). In 33 cases, there was an extremely significant correlation between perceived depth and video quality with a confidence level of 99%; in 37 cases, the correlation was significant at 95% confidence. Note that this does not necessarily imply a correlation on the perception level. We can observe two anomalies relating to one scene: 1) Two users gave the same judgement of 1 for every QP level of Alt moabit, which is an indication that in this video, the maximum disparity could be too low for a visible stereoscopy effect. In fact, the clip does not feature any distinct depth levels. 2) The same video also shows no significant correlation for yet another user (p = 0.0655).
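A minimal sketch of this per-scene/per-observer SROCC computation with SciPy is given below; the two rating vectors are placeholders for one observer's quality and depth scores over the QP levels, not the study data.

```python
from scipy.stats import spearmanr

# hypothetical ratings of one observer for one scene, ordered by decreasing QP
quality_scores = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5]   # perceived quality per QP level
depth_scores   = [1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5]   # perceived depth per QP level

rho, p_value = spearmanr(quality_scores, depth_scores)
print(rho, p_value, "significant at 95%:", p_value < 0.05)
```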

5. CONCLUSIONS AND FUTURE WORK

In this paper we conducted a small study on encoding parameters for H.264 Multi View coded video. In order to find the minimum necessary QP for an acceptable video quality, test persons were asked to describe the objects presented in a scene and to rate the quality of the clip with different QP values. The minimum QP for the identification of the scenes' key objects was identified. When looking at the results, it becomes clear that a QP of 32, resulting in an average bit rate of about 350 kBit/s, is sufficient to provide an acceptable video quality. In fact, all key objects could be identified at a QP of 32 or higher. The required bit rate for such a QP level aligns well with the 384 kBit/s of the UMTS base transmission rate. In scenarios where reception is good and technologies like HSDPA are available, an average bit rate of about 1100 kBit/s can be considered the absolute maximum necessary. It would be achieved with a QP of 22, which is already very low. Another important result is the highly significant positive correlation between the experienced depth of the 3D effect and the subjective quality of the video itself. The only exceptions come from the video with the shallowest depth, which explains why no notable 3D effect was perceived by the users. Further investigation has to be performed in order to verify this result. There currently exists no error concealment method in the H.264/MVC reference decoder. For the future it is therefore necessary to develop error resiliency tools for the transmission of MVC video over lossy channels, especially 3G networks, as well as basic error concealment in the decoding stage. With such an error concealment, it would become possible to study the impact of packet loss, thereby creating a more realistic testing scenario. The thresholds for the quality degradation can be refined by testing more combinations of possible scene and shot types. More importantly, the tests should also be performed on mobile and autostereoscopic displays. This will reveal differences in the perceived quality as well as enable more precise studies of mobile usage scenarios.

6. REFERENCES
[1] ITU-R BT.1483, Subjective Assessment of Stereoscopic Television Pictures. 2000.
[2] ITU-R BT.500-11, Methodology for the subjective assessment of the quality of television pictures. 2002.
[3] ISO/IEC 14496-10, Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding, 2010.
[4] G. B. Akar, M. O. Bici, A. Aksay, A. Tikanmäki, and A. Gotchev. Mobile stereo video broadcast. Technical report, Mobile3DTV, 2008.
[5] A. Boev, D. Hollosi, A. Gotchev, and K. Egiazarian. Classification and simulation of stereoscopic artifacts in mobile 3DTV content. In Stereoscopic Displays and Applications, Electronic Imaging Symposium, San Jose, CA, USA, 2009.
[6] W. Chen, J. Fournier, M. Barkowsky, and P. Le Callet. New requirements of subjective video quality assessment methodologies for 3DTV. In VPQM, 2010.
[7] S. Ishihara. Tests For Colour-Blindness. H.K. Lewis, London, 1957.
[8] S. Joll, J. Zubrzyck, and O. Grau. WP1 - Deliverable 1.1.2, 3D Content Requirements & Initial Acquisition Work. Technical report, 3D4YOU, 2009.
[9] S. Winkler. Digital Video Quality. Wiley, 2005.


Impact of Disturbance Locations on Video Quality of Experience


Tahir Nawaz Minhas
School of Computing Blekinge Institute of Technology Karlskrona, Sweden

Markus Fiedler
School of Computing Blekinge Institute of Technology Karlskrona, Sweden

Tahir.Nawaz.Minhas@bth.se

ABSTRACT
Quality of experience is getting the attention of the research community as well as the industry. In the case of real-time streaming, packet loss, delay and jitter degrade the video quality. The player buffer can be emptied due to long delays, which freezes the video at playout, while resuming the streamed content causes a jump to the current location. This raises the question of how users react to such artifacts with respect to the location where they arise. We collected user ratings for videos showing artifacts due to delay variation as well as freezes and jumps at different locations. We also verified these results with the Perceptual Evaluation of Video Quality (PEVQ) application. For the delay and delay variation case, the PEVQ results align with the human ratings, but the two differ in the freeze-and-jump case. The users' responses in the freeze-and-jump case show interesting results with respect to location.

Markus.Fiedler@bth.se

Categories and Subject Descriptors


C.2.3 [Computer Communication Networks]: Network Operations

General Terms
Human Factors, Measurement

Keywords
Quality of Experience, User Perception, PEVQ

1. INTRODUCTION
To win users' loyalty by providing good services, service providers measure the quality of their services with different methods. Among others, the concept of Quality of Experience (QoE) is emerging quickly and getting the attention of service providers and the research community. In the case of video streaming, QoE can be influenced by video transcoding and transmission. Real-time video transmission

over the Internet may suffer from degraded quality due to packet loss, delay and delay variation. Moreover, if for any reason (long delay, packet loss or key frame loss) the receiver stops receiving the stream and the player empties the play-out buffer, then freezing will occur until the stream is resumed. For real-time streaming, the user will miss the content of the video for that time period because the video will jump from the stop position to the resumed position. This will influence the video quality and thus also the user's perception. Video quality is assessed using either objective or subjective methods. Subjective quality depends on various factors based on human psychology and viewing conditions, such as the observer's vision ability, the translation of quality perception into a ranking score, preference for content, adaptation, display devices, ambient light levels etc. [13]. Different studies have been undertaken on video quality of experience in packet networks [3] [4] [7]. In [10], the authors investigated the effect of frozen and skipped frames on video quality. The mean opinion score (MOS) has been considered the most reliable subjective quality measurement method; however, it is time consuming and inconvenient. Alternative algorithms have been implemented and approved by VQEG and ITU-T. Basically, these software tools analyze objective metrics of video quality and correlate them with the subjective quality (MOS). The Perceptual Evaluation of Video Quality (PEVQ) [8] model is recommended by ITU-T (J.247) in the category of objective perceptual multimedia video quality measurement in the presence of a full reference [11]. In this work we evaluate PEVQ with reference to user perception for different jitter conditions. Also, we test PEVQ for a fixed duration of freeze-and-jump of a video sequence at different locations and compare the results with the users' ratings. The remainder of this paper is organized as follows. Section 2 introduces the Perceptual Evaluation of Video Quality (PEVQ). In Section 3 we describe the experiment setup and we discuss freezes and jumps of video sequences with respect to different locations in the sequence. The results are presented and discussed in Sections 4 and 5. In Section 6, we conclude the paper.

2. PERCEPTUAL EVALUATION OF VIDEO QUALITY (PEVQ)

PEVQ is provided by OPTICOM; it is part of the PEXQ software suite and is recommended by the ITU for perceptual multimedia video quality measurement. It measures degradations due to the network by analyzing the degraded video. This model measures the Quality of Experience (QoE) based on modeling the behavior of the human [8]. It also quantifies

other video quality parameters like PSNR, distortion indicators and lip-sync delay. PEVQ is built on PVQM and designed for mobile and multimedia applications. For video quality detection, it is based on five indicators that are motivated by the human visual system (HVS). These indicators operate in the temporal, spatial, luminance and chrominance domains [11]. The results of these indicators are incorporated and integrated in order to derive the MOS [8].

3. EXPERIMENTS
To study the impact of delay and delay variation on video, the experimental setup shown in Figure 1 is used. It consists of a video streamer, a video player, a shaper and a measurement point (MP). For streaming and playing, VLC is used, while for delay and variable-delay shaping the network emulator NetEm [6] is used, as it provides the best delay shaping compared to other shapers [12]. We captured the video traffic before and behind the shaper using the Distributed Passive Measurement Infrastructure [1], based on DAG cards [5], to verify the shaped delay and delay variation. The streamer and player were installed on Microsoft Windows XP, while the shaper ran on Linux. The delay (D) and delay variation (ΔD) settings used for these experiments are D = 100 ms with ΔD = 0, 2, 4, 6, 8, 10, 12, 14 or 16 ms.
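As an illustration, NetEm is normally configured through the standard Linux tc command; a sketch of how the fixed delay and the different delay-variation settings could be applied on the shaper is given below. The interface name and the wrapper code are assumptions, not details taken from the paper.

```python
import subprocess

DELAY_MS = 100                      # fixed one-way delay D
JITTER_VALUES_MS = range(0, 18, 2)  # delay variation ΔD: 0, 2, ..., 16 ms
IFACE = "eth0"                      # outgoing interface towards the player (assumption)

def set_netem(delay_ms, jitter_ms):
    """(Re)configure a NetEm qdisc with the given delay and delay variation."""
    subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=False)
    subprocess.run(["tc", "qdisc", "add", "dev", IFACE, "root", "netem",
                    "delay", f"{delay_ms}ms", f"{jitter_ms}ms"], check=True)

for jitter in JITTER_VALUES_MS:
    set_netem(DELAY_MS, jitter)
    # ... stream the test sequence and capture traffic for this setting ...
```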
[Figure 1 blocks: Streamer, Shaper, Player]

region at ΔD = 4 ms. Similarly, Figure 4 shows the users' feedback for the Football video. Here, in the first case with zero ΔD, the MOS is higher compared to the Foreman case. The MOS decreases linearly with rising delay variation till 100±12 ms and signals BAD perception for ΔD > 6 ms. The Football video shows more resistance to delay variation compared to Foreman because it is a fast-moving video, which makes it difficult for a user to notice quick changes and disturbances. These two video sequences were also analysed with PEVQ by OPTICOM [8]. The original video is used as reference for the 100±0 ms case and all other videos are then ranked with reference to the 100±0 ms video. Figures 3 and 4 show these results alongside the results discussed above. From the figures, it is easy to see that the PEVQ results are very much in agreement with the user ratings.

Figure 3: PEVQ MOS and users' MOS rating for the Foreman video.

Figure 1: Experiment Setup



Figure 2 presents a view of the video sequences that were created to study the user perception with respect to a one-second freeze followed by a jump at different locations of the video. The video sequences used for the experiments have 25 fps. The first test sequence starts with frame number one frozen for one second and then continues playing from the 25th frame to the end of the video. In the second test sequence, the video plays for one second, then freezes at the 25th frame for one second and then jumps to the 50th frame, from where it plays till the end. Similarly, the process continues till the last second of the video sequence, i.e., we move the freeze-and-jump through the video second by second. User perception tests were conducted on campus; most of the participants were undergraduate and graduate students. Moreover, the tests were conducted according to the recommendations of ITU-R BT.500-11 [2] and ITU-T P.910 [9], using absolute category rating (ACR).
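A sketch of how such freeze-and-jump test sequences could be generated from a list of decoded frames (25 fps, one-second freeze followed by a jump of 25 frames) is given below; the function and variable names are ours, not from the paper.

```python
FPS = 25  # frame rate of the test sequences

def freeze_and_jump(frames, freeze_second):
    """Return a sequence where playback freezes for one second at the start of
    `freeze_second` and then jumps over the frames that would have been shown.

    frames        : list of decoded frames of the original sequence
    freeze_second : 0-based index of the second in which the freeze occurs
    """
    freeze_at = freeze_second * FPS          # frame shown when the freeze starts
    resume_at = freeze_at + FPS              # playback resumes one second later
    out = list(frames[:freeze_at + 1])       # play normally up to the frozen frame
    out += [frames[freeze_at]] * (FPS - 1)   # repeat the frozen frame for ~1 s
    out += list(frames[resume_at:])          # jump to the current position
    return out

# e.g. build all variants for a 10-second clip: freeze in second 0, 1, ..., 9
# variants = [freeze_and_jump(frames, s) for s in range(10)]
```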


Figure 4: PEVQ MOS and users' MOS rating for the Football video.

As we saw in Figures 3 and 4, the PEVQ and users' ratings are very much aligned with each other, which can also be verified numerically from Table 1. We can see that the difference between the user rating and PEVQ is small. However, beyond ΔD = 10 ms the PEVQ ratings remain greater than or equal to 1.5, while the user ratings approach one for ΔD ≥ 14 ms.

4. DELAY/DELAY VARIATION
Figure 3 depicts the users' perception for the Foreman video. The x-axis shows the delay variation ΔD in milliseconds around a fixed delay D = 100 ms. The MOS is shown along the y-axis. On the x-axis, zero means that the video is received with a fixed delay of 100 ms; in this case the MOS is 3.72±0.247. There is no remarkable change in MOS between the 100±0 ms and 100±2 ms cases, but after that, with increasing ΔD, the MOS decreases linearly till 100±8 ms and enters the BAD perception

5. FREEZE-AND-JUMP

As shown in Figure 2, we focus on special cases in which a one-second freeze followed by a jump moves along the whole sequence. For this test we selected the Foreman, News and HallMonitor videos. The corresponding users' perceptions are shown in Figures 5, 6 and 7.

Figure 2: Freeze and Jump Variation on Video Sequence

Delay D±ΔD   Foreman PEVQ   Foreman User MOS   Football PEVQ   Football User MOS
100±0        3.78           3.73±0.247         5.00            4.71±0.186
100±2        3.27           3.61±0.226         4.04            4.25±0.177
100±4        3.03           2.95±0.237         3.89            3.79±0.204
100±6        2.15           2.15±0.233         3.16            3.25±0.177
100±8        1.86           1.68±0.174         2.22            2.50±0.264
100±10       1.57           1.41±0.181         1.69            1.67±0.226
100±12       1.92           1.32±0.144         1.56            1.50±0.264
100±14       1.76           1.12±0.122         1.70            1.08±0.113
100±16       1.49           1.15±0.129         1.61            1.08±0.113


Table 1: PEVQ MOS and users' MOS for the Foreman and Football videos.
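Using the Foreman values from Table 1, a short sketch of how the agreement between PEVQ and the users' MOS can be quantified (e.g. by Pearson correlation and mean absolute difference) is given below; SciPy is assumed to be available, and this particular analysis is our illustration rather than one reported by the authors.

```python
import numpy as np
from scipy.stats import pearsonr

# Foreman values from Table 1 (ΔD = 0, 2, ..., 16 ms)
pevq_mos = np.array([3.78, 3.27, 3.03, 2.15, 1.86, 1.57, 1.92, 1.76, 1.49])
user_mos = np.array([3.73, 3.61, 2.95, 2.15, 1.68, 1.41, 1.32, 1.12, 1.15])

r, p = pearsonr(pevq_mos, user_mos)
mad = np.mean(np.abs(pevq_mos - user_mos))
print(f"Pearson r = {r:.3f} (p = {p:.4f}), mean absolute difference = {mad:.2f}")
```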


Figure 5: PEVQ MOS and users' MOS rating for the Foreman video with freeze and jump.

The MOS is shown along the y-axis, whereas frame numbers are shown along the x-axis. The data shown for x = 0 correspond to the original videos of 10 seconds length, whereas all other videos have nine seconds of actual video plus one second of freeze. This one-second freeze-and-jump moves from the first second to the last second of the video. In these figures we can see that the users' perception varies with respect to the location where the one-second freeze-and-jump occurs. For example, in Figure 5, when the freeze occurs at frame number 25 and the frames from 25 to 50 are jumped, the MOS is 3.63, whereas when the freeze and jump cover frames 100 to 125, the MOS drops to 2.63. The corresponding results of PEVQ are also given in Figures 5, 6 and 7 along with the users' results. There is a considerable gap between the users' and the PEVQ ratings, but the graphs are similar in shape. For the Foreman and News videos we can see the rating variations for the users as well as for PEVQ; however, in the case of HallMonitor the PEVQ rating remains approximately constant except for the first two and the last values. Table 2 shows the results of moving the freeze and jump along the Foreman, News and HallMonitor videos. From the table we can see that the ratings of PEVQ and the users differ from each other. For all videos the rating of PEVQ is greater than four, while the users' ratings vary between two and four, which indicates that the user is pickier with respect to the location where the freeze and jump occur, whereas PEVQ just compares the frozen frame with the reference video. Obviously this yields a better rating compared to the user rating. PEVQ gives approximately the same rating for all HallMonitor videos, except for the first and the last video. It has an unchanged background of a hall with two moving persons, so PEVQ obviously could not figure


Figure 6: PEVQ MOS and users' MOS rating for the News video with freeze and jump.

out any big difference between the freeze-and-jump video and the reference video while comparing the frames with each other. From Table 2 we can see that the PEVQ rating for these videos is around 4.5, except for the first and last video, whereas the users' ratings are significantly lower than the PEVQ ratings and vary with respect to the location of the disturbance.

6. CONCLUSION

In this paper, we have presented a study of users' perception for the delay variation and freeze-and-jump cases and have also compared the user ratings with the PEVQ results. In the case of delay and delay variation, we observed that the MOS rating

Frame frozen   Frames jumped   Foreman PEVQ   Foreman User MOS   News PEVQ   News User MOS   HallMonitor PEVQ   HallMonitor User MOS
0              0               5.00           4.63±0.245         4.84        4.69±0.261      5.00               4.56±0.344
1              1-25            4.73           3.63±0.352         4.53        3.77±0.394      4.77               3.67±0.462
25             25-50           4.50           3.00±0.358         4.47        3.00±0.544      4.47               2.33±0.566
50             50-75           4.53           2.88±0.352         4.46        3.08±0.564      4.48               2.44±0.576
75             75-100          4.41           2.75±0.335         4.05        2.85±0.696      4.48               2.22±0.544
100            100-125         4.48           2.94±0.378         4.45        2.61±0.522      4.54               2.22±0.714
125            125-150         4.51           2.88±0.434         4.25        2.92±0.469      4.52               2.11±0.393
150            150-175         4.24           2.63±0.352         4.47        2.85±0.660      4.55               2.33±0.800
175            175-200         4.45           3.06±0.418         4.46        3.08±0.564      4.54               2.56±0.808
200            200-225         4.70           3.75±0.335         3.91        3.08±0.564      4.52               2.44±0.808
225            225-250         4.92           3.88±0.303         4.49        3.92±0.348      4.70               3.22±0.635

Table 2: PEVQ MOS and users' MOS for the freeze-and-jump videos.



Figure 7: PEVQ MOS and users' MOS rating for the HallMonitor video with freeze and jump.

drops linearly towards the bad region with increasing delay variation. In this case PEVQ shows results that are well aligned with the users' ratings. In the freeze-and-jump case, the user reacts differently with respect to the location where the problem happens. User ratings vary for the same kind of disturbance arising at different locations within the video, which indicates that the disturbance and its location have a combined impact on human perception. We observed similar user behavior for the three videos chosen for this study. In this case the results of PEVQ show variations similar to the users' observations, but the reduction of the MOS differs significantly in magnitude. In general, the rating of PEVQ is between good and excellent, while the user rating is between poor and fair. On the basis of these results, one can use PEVQ for MOS studies in the case of delay and delay variation and potentially for other typical network performance issues. But for freeze-and-jump and similar cases, one has to be careful, as PEVQ overestimates user perception.

Acknowledgment
I would like to thank Vasanthi D. Bhamidipati and Swetha Kilari for the QoE survey of the delay/delay variation videos.

7. REFERENCES
[1] P. Arlos, M. Fiedler, and A. A. Nilsson. A distributed passive measurement infrastructure. In Proceedings of the Passive and Active Measurement Workshop, pages 215-227, 2005.
[2] ITU-R BT.500-11. Methodology for the subjective assessment of the quality of television pictures. International Telecommunications Union, 2002.
[3] P. Calyam, M. Sridharan, W. Mandrawa, and P. Schopis. Performance measurement and analysis of H.323 traffic. In Passive and Active Network Measurement, volume 3015 of Lecture Notes in Computer Science, pages 137-146. Springer, 2004.
[4] M. Claypool and J. Tanner. The effects of jitter on the perceptual quality of video. In MULTIMEDIA '99: Proceedings of the seventh ACM international conference on Multimedia (Part 2), pages 115-118.
[5] Endace. Endace measurement systems. [Online] http://www.endace.com/, accessed March 2010.
[6] S. Hemminger. Network emulation with NetEm. In Linux Conf Au, 2005.
[7] Y. J. Liang, J. G. Apostolopoulos, and B. Girod. Analysis of packet loss for compressed video: Does burst-length matter? 5:684-687, 2003.
[8] OPTICOM. Perceptual evaluation of video quality. [Online] http://www.pevq.org/, accessed March 2011.
[9] ITU-T P.910. Subjective video quality assessment methods for multimedia applications. International Telecommunications Union Telecommunication Sector.
[10] Y. Qi and M. Dai. The effect of frame freezing and frame skipping on video quality. Intelligent Information Hiding and Multimedia Signal Processing, International Conference on, 0:423-426, 2006.
[11] ITU-T Rec. J.247. Objective perceptual multimedia video quality measurement in the presence of a full reference. International Telecommunications Union Telecommunication Sector, August 2008.
[12] J. Shaikh, T. Minhas, P. Arlos, and M. Fiedler. Evaluation of delay performance of traffic shapers. In Second International Workshop on Security and Communication Networks, pages 1-8, 2010.
[13] V. Vassiliou, P. Antoniou, I. Giannakou, and A. Pitsillides. Requirements for the transmission of streaming video in mobile wireless networks. In Artificial Neural Networks - ICANN 2006, volume 4132 of Lecture Notes in Computer Science, pages 528-537.
[14] VideoLAN. [Online] http://www.videolan.org/vlc/, accessed June 2010.
[15] VQEG. Video Quality Experts Group. [Online] http://www.its.bldrdoc.gov/vqeg/, accessed July 2010.

Workshop 2: FutureTV 2011: Making Television Personal & Social


Workshop papers of the "Future Television: Making Television Personal & Social" workshop available at: http://CEUR-WS.org/Vol-720

Analysis of the Information Value of User Connections for Video Recommendations in a Social Network
Toon De Pessemier, Simon Dooms, Joost Roelandts and Luc Martens

Employing User-Assigned Tags to Provide Personalized as well as Collaborative TV Recommendations
Andreas Thalhammer, Günther Hölbling and Dieter Fensel

Social and Interactive TV: An Outside-In Approach
Venu Vasudevan and Jehan Wickramasuriya

Analyzing Twitter for Social TV: Sentiment Extraction for Sports
Siqi Zhao, Lin Zhong, Jehan Wickramasuriya and Venu Vasudevan

OurTV: Creating Media Contents Communities through Real World Interactions
Janak Bhimani, Toshihiro Nakakura and Kazunori Sugiura

iTV services for socializing in public places
Pedro Almeida, Jorge Abreu and Rui José

Ubeel: Generating Local Narratives for Public Displays from Tagged and Annotated Video Content
Pia Ojanen, Petri Vuorimaa, Petri Saarikko and Sanna Uotinen

Hybrid algorithms for recommending new items in personal TV
Fabio Airoldi, Paolo Cremonesi and Roberto Turrin

Mining Knowledge TV: A Proposal for Data Integration in the Knowledge TV Environment
José Carlos Almeida Patrício Junior and Natasha Queiroz Lino


Workshop 3: Interactive Digital TV in Emergent Economies


GEmPTV: Ginga-NCL Emulator for Portable Digital TV


Fabio Gomes de Souza
Nokia Institute of Technology Digital TV Department Manaus AM, 69048660, Brazil +55 92 8140 6280

Luiz Filipe da Silva Souza Pinto


Nokia Institute of Technology Digital TV Department Manaus AM, 69054700, Brazil +55 92 9208 6010

Vicente F. de Lucena Jr
University of Amazonas -- Ceteli Campus Universitário Manaus AM, 69077000, Brazil +55 92 3305 4680

fabio.souza@indt.org.br

vicente@ufam.edu.br

luiz-filipe.pinto@indt.org.br

ABSTRACT
Ginga 0 is the interactive middleware of the ISDB-T system. Since it supports new technologies, creating interactive content for this platform is not an easy task, mainly due to the lack of appropriate tools for the development and validation of interactive applications. When we talk about portable devices, this task becomes even more challenging. Such devices have very distinct characteristics that must be taken into account. Therefore, this article proposes the development of an emulator for portable digital TV terminals.

Categories and Subject Descriptors


D.2.11 [Software Engineering]: Software Architectures; I.7.2 [Document Preparation]: Languages and systems, Hypertext/hypermedia, Markup languages, Standards

General Terms
Languages, Standardization, Design, Verification.

Keywords
Digital TV, Ginga, NCL, ISDB-T, Emulator

1. INTRODUCTION

Along with the advent of digital TV in Brazil, a strong demand has arisen for trained professionals properly equipped to meet the needs of this new technology, mainly because the extended ISDB-T incorporates, among other technologies, support for interactive applications through the Ginga middleware. In order to train professionals properly in this new middleware, it is necessary, besides knowledge of the digital TV standard, to have adequate tools, so that interactive content authors may have all the equipment necessary for the preparation of their works. Appropriate professional tools are very expensive to purchase, costing tens of thousands of dollars. In this context, interactive content authors have no alternative but to

acquire such equipment, or work in a large company which already owns all the equipment necessary for the preparation of their work. The ISDB-T digital TV system has been adopted by several countries in South America such as Brazil, Argentina, Chile, Ecuador, and Bolivia. Besides that, the ISDB-T system is also under analysis for adoption by other South American countries as well as by countries of Central America and Africa. One of the great advantages of this new ISDB-T system is related to the transmission of interactive content, where TV users have the possibility to interact with the programming of the content provider without the need for additional devices. The Brazilian digital TV middleware for handling interactive content is Ginga. Since it supports new technologies and enables the development of interactive applications in two different programming language paradigms (declarative and imperative), creating interactive content becomes an arduous task, especially due to the lack of adequate tools for the development and validation of these works. When we talk about portable devices, this task becomes even more challenging, because such devices have very distinct characteristics that must be taken into account, such as: (a) different screen resolutions and screen orientations, (b) hardware resource constraints, (c) usability (input: keyboard vs. touch screen), and (d) support for the interactivity channel. Hence the need for tools that assist in developing interactive content dedicated to mobile TV receivers is evident, and the objective of this paper is to present a proposal for a Ginga-NCL emulator for portable digital TV devices.

2. GINGA-NCL FOR PORTABLE DEVICES


The Ginga-NCL subsystem is a mandatory item for portable terminals 0. Its reference implementation 0 is based on the implementation of Ginga-NCL for fixed terminals, but the former takes into account the hardware and software limitations of this sort of terminal. Ginga-NCL defines the declarative programming language NCL 0 as the main programming language for developing interactive applications for digital TV, with the support of Lua scripts 0 in order to allow better expressiveness in interactive works.

2.1. Ginga-NCL Programming Languages

Since the Ginga-NCL subsystem must be present on portable TV receivers, interactive applications targeted at these terminals should be developed in the NCL language. NCL allows the development of multimedia applications with time-space


synchronization between the media objects involved. By the fact that NCL is a declarative language only, the execution of some tasks may become an arduous process without the help of imperative language capabilities. String manipulation, mathematical calculations and use of the return channel are typical examples of actions that enrich an interactive application. To support these tasks, the Lua scripting language can be used by means of NCLua 0 objects, and thus provide more expressiveness to the interactive content author. For the development of enhanced interactive content it is necessary to have knowledge of the following technologies: NCL defines a time-space relationship between distinct media, thus forming a multimedia presentation. It is characterized by the separation between the content of the media and the structure of NCL elements, where the latter only refers to the media, while the formatter is responsible for displaying them properly. Lua is an interpreted scripting programming language designed to extend applications in general. Lua combines imperative programming with powerful data description constructs based on associative tables and extensible semantics. It is dynamically typed, interpreted from byte codes, and has automatic memory management with garbage collection. NCLua is an extension of the scripting language Lua. It was created to suit the digital TV environment as well as to integrate with NCL through NCLua objects 0. NCLua objects give more expressiveness to the interactive content developer, allowing applications to reap the benefits of imperative languages, such as manipulation of mathematical formulas, word processing, graphic design, etc.

Figure 1. Reference Implementation Architecture

The Lua Machine module was added to the core to meet the specifications of the Brazilian digital TV standard, as it defines Lua as the scripting language for handling NCLua objects. The Update Manager component is responsible for receiving system updates over the air, via the digital TV signal, and updating the software components in a modular way, without interfering with the communication interfaces between other components. The Context Manager manages information about the system embedded in the device as well as the viewer's profile. Finally, the Transport module is responsible for dealing with the interactivity mechanism. Once the interactive content has been downloaded to the receiver and processed by the GC subsystem, it is immediately available to the NCL Formatter. At this point the Formatter waits for a request to present the interactive application on the receiver screen, either by TV user interaction or by automatic initiation of the NCL application, as defined by the interactive content author.

2.2. Reference Implementation

Figure 1 depicts the Ginga-NCL reference implementation architecture for portable devices, which is divided into two logical subsystems: the Ginga Core, responsible for extracting the interactive application from the digital TV signal and providing the presentation machine with access to system functionality; and the Ginga-NCL Presentation Machine, responsible for orchestrating the presentation of an NCL document.

Figure 1. Reference Implementation Architecture

2.2.1. Ginga Core (GC)

The Ginga Core (GC) subsystem is responsible for processing the streams of audio, video and data received from the digital TV signal. The streams enter the GC through the Tuner component, which is responsible for tuning to the TV station frequency. The incoming streams are encapsulated as MPEG Transport Streams and need to be de-multiplexed by the Data Processor component, which separates the interactive content from the audio and video streams. The Persistence module is responsible for storing the interactive application data and for making it available to the Ginga-NCL Presentation Machine (GPM) subsystem whenever required. The Exhibitors module provides the GPM with decoders suitable for the presentation of specific content, according to the media object being manipulated. The Graphic Manager manages the spatial control of the rendering of the objects that compose the application.

The Lua Machine module was added to the core to meet the specifications of the Brazilian digital TV standard, which defines Lua as the scripting language for handling NCLua objects. The Update Manager component is responsible for receiving system updates over the air, via the digital TV signal, and for updating the software components in a modular way, without interfering with the communication interfaces between other components. The Context Manager manages information about the system embedded in the device as well as the viewer's profile. Finally, the Transport module is responsible for dealing with the interactivity mechanism. Once the interactive content is downloaded to the receiver and processed by the GC subsystem, it is immediately available to the NCL Formatter. At this point the Formatter waits for a request to present the interactive application on the receiver screen, triggered either by TV user interaction or by automatic initiation of the NCL application, as defined by the interactive content author.

2.2.2. Ginga-NCL Presentation Machine (GPM)

The GPM subsystem is in charge of controlling the NCL presentation flow. The Formatter module orchestrates the interactive application with the support of the other modules that comprise the GPM. Upon the request to present an interactive application, the Formatter asks the Converter to map the NCL document into data structures, which are grouped and stored in the application's private database. Once the data is stored, the Formatter is able to manipulate it; at this point the Scheduler module kicks in and notifies the Formatter of the time at which each media object that comprises the interactive application must be displayed. The Exhibitor Manager provides the media necessary for the presentation of the interactive application. The Layout Manager controls the layout of the application. The NCL Context Manager ensures that each interactive application runs in its own context, and it is also responsible for handling incoming events from the GC subsystem related to the activation and deactivation of the interactive application. The Private Base Manager module is in charge of ensuring the integrity of the information persisted by the application.


2.2.3. Components Not Supported by the Reference Implementation
Although the work presented in 0 defines the architecture of the reference implementation as illustrated in Figure 1, the following components were not included: Tuner, Data Processor, Update Manager, and Lua Machine. These components, except for the Update Manager, are of utmost importance for the completeness of a presentation environment for interactive content. The Lua Machine module, for example, enables the use of the return channel via SMS messages as well as through the TCP protocol, allowing the interactive content author to explore a bi-directional communication channel. The Tuner module, in turn, is the entry point for digital data. Last but not least, the Data Processor module is responsible for extracting the interactive application and separating it from the audio and video streams. Therefore, the scope of this article is to support the modules that were not handled in the reference implementation.

3. GINGA-NCL EMULATOR FOR PORTABLE DIGITAL TV

With the goal of providing a tool to support interactive content developers in validating their work, this paper proposes a Ginga-NCL Emulator for Portable digital TV (GEmPTV) in accordance with the ISDB-T standard. The GEmPTV emulator extends the functionality of the reference implementation, primarily through the support of the interactive channel via SMS messages, as well as by supporting the following Ginga Core modules presented in subsection 2.2: Tuner, Data Processor, and Lua Machine.

3.1. Target Platform

Among the analyzed target platforms, Symbian was chosen because of the following characteristics: the Symbian operating system is present in 34.3% of smartphones worldwide 0, so GEmPTV is intended to reach a larger number of interactive content developers; the platform is mature and provides a set of APIs to access the system functionality required by the Ginga-NCL middleware specification, especially SMS manipulation for the interactive channel, as well as primitive instructions to manipulate graphic objects; and applications developed for this platform can execute on conventional computers, through the available emulators, as well as on the target devices.

3.2. GEmPTV Architecture

The GEmPTV architecture is based on the reference implementation architecture shown in Figure 1 of Section 2. However, the Tuner module was replaced by an MPEG Transport Stream file reader, so that the digital TV signal can be emulated with no need for a real broadcast station. As in 0, the software modules of the Ginga-NCL Presentation Machine subsystem were implemented. With respect to the Ginga Core subsystem, however, it was necessary to support the Data Processor module, thus allowing the extraction of the interactive application from the emulated digital TV signal. Figure 2 shows the data flow of an interactive application being processed by the Data Processor component.

Figure 2. Data flow of the emulated digital TV signal (the Tuner Emulator reads a TS file from the file system and feeds the Data Processor, which outputs the interactive application and the audio/video streams)

The Tuner Emulator reads the transport stream file (which contains the TV programming) from the file system and passes the data to the Data Processor, which in turn extracts the interactive application and separates the audio and video streams.

3.3. Lua Machine and Interactive Channel

In addition to the signal emulation and the extraction of the interactive application from the transport stream, the Lua Machine component was also integrated into GEmPTV. The Ginga-NCL middleware allows interactive applications to use SMS text messaging as a means of communication between the digital TV user and the interactive content provider. It is therefore necessary for the digital TV receiver to support the Lua Machine module, since this module is the only means by which it is possible to send and receive SMS messages through events generated by NCLua objects embedded in NCL applications.
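As an illustration of how an NCL application might use this interactive channel, the minimal NCLua sketch below posts an outgoing message and listens for incoming ones through the event module. The sketch is not taken from GEmPTV; the destination number is hypothetical, and the exact field layout of the sms event class (here assumed to be "to" and "value") should be checked against the Ginga-NCL specification.

```lua
-- Hypothetical NCLua sketch of the SMS interactive channel.
-- event.register/event.post belong to the Ginga NCLua event module;
-- the field names "to" and "value" for class "sms" are assumptions.

local PROVIDER_NUMBER = "+5521990000000"  -- hypothetical content-provider number

-- Send the viewer's answer back to the interactive content provider.
local function send_answer(text)
  event.post({
    class = "sms",           -- interactive channel via SMS (assumed fields below)
    to    = PROVIDER_NUMBER,
    value = text,
  })
end

-- Handle events delivered by the middleware to this NCLua object.
event.register(function(evt)
  if evt.class == "key" and evt.type == "press" and evt.key == "GREEN" then
    send_answer("VOTE 1")              -- e.g. viewer pressed the green key to vote
  elseif evt.class == "sms" then
    -- An incoming message; evt.value is assumed to carry the received text.
    print("provider replied:", evt.value)
  end
end)
```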

4. VALIDATION OF GEMPTV

To validate the proposed solution, GEmPTV was installed on three Symbian devices with different input mechanisms: 1. Nokia N85, with data input via conventional cellphone keys; 2. Nokia N97, with hybrid data input via keyboard and touch screen; 3. Nokia 5800, with touch screen only.

Figure 3. GEmPTV on Nokia N85 presenting an NCL election application in both screen orientations (portrait and landscape)

5. CONCLUSION
Interactive applications for portable digital TV receivers may be very attractive to TV viewers, especially in developing countries such as Brazil. In São Paulo, for example, one of the largest cities in the world, workers spend about four hours of their precious lives in traffic jams, which makes the population stressed and usually in a bad mood. Providing interactive entertainment for this niche of the population can be an alternative. For this, it is necessary to stimulate the production of good interactive content for this category of viewers. Therefore, the emulator presented here was proposed with the objective of encouraging authors to develop interactive content aimed at portable terminals, such as mobile phones, taking their peculiarities into account.

6. ACKNOWLEDGMENTS
Thanks to the Nokia Institute of Technology (INdT) for providing APIs for handling digital TV functionalities on Nokia Symbian devices. Thanks also to CETELI, CNPq and CAPES for providing infrastructure support and financial resources to Brazilian researchers.

Figure 4. GEmPTV on Nokia N97 presenting an NCL reality show application

7. REFERENCES
Soares, Luiz Fernando G.; Barbosa, Simone Diniz Junqueira. Programando em NCL 3.0: desenvolvimento de aplicações para middleware Ginga, TV digital e Web. Rio de Janeiro: Ed. Elsevier, 2009.
ABNT NBR 15606-5 - Associação Brasileira de Normas Técnicas. 2007. Digital terrestrial television - Data coding and transmission specification for digital broadcasting - Part 5: Ginga-NCL for portable receivers - XML application language for application coding. http://www.dtv.org.br/download/enen/ABNTNBR15606_2D5_2008Vc_2009Ing.pdf
Cruz, Vitor M.; Soares, Luiz Fernando G.; Moreno, Marcio. Ginga-NCL: implementação de referência para dispositivos portáteis. Proceedings of the 14th Brazilian Symposium on Multimedia and the Web, 2008, Vila Velha-ES, Brazil. DOI= http://doi.acm.org/10.1145/1666091.1666105
Soares, Luiz Fernando G. Nested Context Language 3.0. Part 8 - NCL DTV Profiles. October 2006. http://www.ncl.org.br/documentos/NCL3.0-DTV.pdf
Ierusalimschy, R.; de Figueiredo, L. H.; Celes, W. Lua 5.1 Reference Manual. Lua.org, August 2006. ISBN 85-903798-3-3.
Sant'Anna, Francisco; Cerqueira, Renato; Soares, Luiz Fernando G. NCLua: Objetos Imperativos Lua na Linguagem Declarativa NCL. New York, 2008. DOI= http://portal.acm.org/citation.cfm?doid=1666091.1666107
Sant'Anna, Francisco; Neto, Carlos S.; Azevedo, Roberto; Barbosa, Simone. Desenvolvimento de Aplicações Declarativas para TV Digital no Middleware Ginga com Objetos Imperativos NCLua. http://www.ncl.org.br/documentos/MCNCLua.pdf
Gartner, August 2010. http://www.gartner.com

Figure 5. GEmPTV on Nokia 5800 presenting an NCL World Cup application

Figure 6. World cup NCL application presented on Nokia devices N97, 5800, and N85




Business Process Modeling in UML for Interactive Digital Television


Paloma Maria Santos, Marcus de Melo Braga, Marcus Vinícius Antocles da Silva Ferreira, Fernando José Spanhol
Post-Graduate Program in Knowledge Management and Engineering, Federal University of Santa Catarina, Campus Universitário, Trindade, Florianópolis/SC, Brazil
paloma@egc.ufsc.br, marcus@egc.ufsc.br, marcus.ferreira@hotmail.com, spanhol@led.ufsc.br

ABSTRACT
The modeling of business processes related to Interactive Digital Television is still an underexplored research area in the literature. This paper discusses the application of the Unified Modeling Language (UML) and its extensions to the modeling of business processes in the Digital Television environment. The modeling technique proposed here is applied to a particular t-Government case as an illustration. The application of this technique enables the project's graphic representation and facilitates the understanding of the modeled business processes, thus contributing to their implementation.

Keywords
Business Process, Modeling, Interactive Digital Television (iDTV), Unified Modeling Language (UML).

1. INTRODUCTION
Business Processes (BPs) can be defined as a sequence of steps conceived to make products and services available to a customer [1]. By and large, iDTV models bring about meaningful changes to traditional BPs that were created for a particular environment. When applied to a digital context, these changes require a more dynamic view of the BPs, due especially to the inclusion of new actors and stakeholders into the new model. Therefore, the conceptualization of new modeling forms is needed, with a view to reflecting this structure and enabling a greater understanding of the processes linked to iDTV. One of the suitable tools for BP modeling is the traditional Unified Modeling Language (UML). UML can be applied to various modeling types because it presents a high level of generalization and extensions that expand its applications. This study specifically addresses the modeling of business processes, using the extension mechanisms proposed by [2]. According to [3], UML can be used as a knowledge modeling technique. Section 2 presents the concepts related to the modeling of processes that are needed for the understanding of the proposal suggested here. Section 3 shows an application of the technique developed for the modeling of iDTV BPs, considering the use of UML extensions. Finally, Section 4 discusses the main findings and points to some final considerations.

2. BUSINESS PROCESS MODELING

To facilitate the identification and modeling of BPs, [4] suggests a set of questions adapted from the proposal presented by [2]:
- Which activities are involved? They will be described as processes or activities in the diagram.
- When are the activities performed and in which sequence do they occur? This information will correspond to the flows in the chart.
- How are the activities performed? This will be mapped in the process flow chart, usually by decomposing the processes into subprocesses.
- What is the aim of the process? This will be mapped in the process diagram.
- Who or what is involved in the execution of activities? This information refers to the resources that participate in the process.
- What is consumed and produced? This information refers to the resources that will be consumed or produced in the process.
- How are the activities to be performed? This question is defined by a flow control in the process or by a set of business rules.
- How does the process relate to the business organization? This can be shown by means of swimlanes in a process diagram.

[2] proposes an integrated view on the context of BP modeling by means of a set of stereotypes that seek, with the support of four views and their respective diagrams, to reflect the environment and organizational structure that will be supported by the modeled systems. The four views proposed are described as follows:
1. Business Vision deals with the presentation of business requirements. It is the starting point of business process modeling. It is under this view that the business goals are recorded;
2. Business Structure details the resources (physical, human, or informational) that the company consumes, utilizes or produces;
3. Business Process is the business modeling core. It shows the activities performed to achieve the goals as well as to obtain the resources needed to do so;
4. Business Behavior gives details of the way in which resources and processes behave over time.

The combination of these views forms a comprehensive business model, and depending on the project needs some of these views can be suppressed. According to [2], the essential elements for the description of a business process are:

- Resources represent everything that a company consumes, utilizes, refines or produces; they are manipulated by processes, or they manipulate and manage these processes. Resources can be categorized as physical, abstract, informational and personnel;
- Processes represent the business activities performed in a business in which the state of the resources changes. These processes are delimited by rules and they describe how the work is performed;
- Goals represent the general purpose of a business or the results that a business is expected to achieve. They express the desired state of resources and they are reached by executing processes;
- Rules represent the definitions or restrictions of any business aspect. They are categorized as functional, behavioral and structural.

All these elements are related to each other. The business goals are achieved by executing the processes that use, transform and generate resources, following a set of rules. In this way, the goal of business modeling is to define these elements and show the interactions and relationships among them [2]. BP modeling can be realized from the perspective of four UML diagrams and their extensions: (i) business process diagram; (ii) assembly line process diagram; (iii) use case diagram; and (iv) activity diagram.

2.1. Business Process Diagrams

According to the extensions proposed by [2], a business process diagram extends the UML activity diagram and is determined along with the stereotypes that describe the activities performed in the business process and their interactions, events, resources, goals, outputs, rules and process information (Figure 1).

Figure 1 - Business Process Diagram.

In this diagram, the actors of a process interact with it and generate events that affect the system behavior and its execution. A resource is an object that operates or is used in the business, and it can be consumed, transformed, produced or utilized by the business processes. A link of the <<provides>> type means that the object is not consumed, but used in the transformation processes. A link of the <<input>> type, in turn, indicates that the object is consumed. The stereotype of the central element (i.e. the business process) has a traditional process icon and indicates that the flow of activities moves from left to right. A rule is an essential complement that aims to guarantee the representation of information related to the functioning of the business [4]. The goal (stereotype <<goal>>) is the reason by which an organization adopts a process, and it is linked to the process by means of a dependency-type relationship [5]. By definition, every process has at least one output, which can be a business result, a physical object (e.g. a report), or the transformation of resources into organized objects (such as a daily schedule) [5]. A link of the <<output>> type indicates the output flow of a process. The diagram illustrates the information inputs and outputs that are read and recorded by systems, showing how they enable (or impact upon) a business process.

2.2. Assembly Line Diagram

The assembly line diagram is a variation of the process diagram and is particularly useful for those processes that are directly implemented by information systems [2]. The processes communicate with the system packages, called assembly line packages, by means of interactions that record the flows between the information system and the business process. These interactions also define the use cases to be foreseen by the information system [4]. The goal of this type of diagram (Figure 3) is to demonstrate how processes interact with the information system, showing what information is accessed by means of the system and used by the processes [2]. The assembly line diagram enables the connection between BP modeling and system requirement modeling based on use cases [2].

2.3. Use Case Diagram

Use case diagrams (Figure 4) are used to identify and model the context by visualizing the scope and elements of the problem domain and by modeling the system requirements, which include a set of tasks that lead to an understanding of the system's impact upon the business, the customers' needs, and the interaction of the final user with the system [6].

2.4. Activity Diagram

Activity diagrams (Figure 5) facilitate the representation of interaction flows in a way similar to that of a flow chart: rectangles represent functions, arrows correspond to system flows, and diamonds represent decisions [6]. With these four diagrams, it is possible to apply UML to BP modeling in the iDTV environment.

3. APPLYING UML TO AN iDTV BUSINESS PROCESS


To apply BP modeling to the context of an iDTV model, we provide a t-Government application as an illustration. In this context, the four-stage classification for t-Government initiatives proposed by [7] was adopted. These four stages are (i) informational; (ii) interactional; (iii) transactional; and (iv) collaborative. We selected a business process of the collaborative type to better illustrate the kind of modeling proposed here. The selected process refers to participatory budgeting in the municipality of a city. This process may involve several subprocesses, namely: (i) consult the accounting of previous years and the investment plan of the municipality for the current year; (ii) verify the information on projects already approved and completed; (iii) send new demands for investment to the municipality; (iv) vote for projects in the municipality; (v) monitor developing projects; and (vi) participate in the choice of the neighborhood representative. Due to space constraints, this analysis will focus only on the modeling of subprocess number (vi). A more detailed analysis of the processes involved can be obtained in [8].

The choice of the neighborhood representative can be made by means of electronic voting. Every citizen resident in the neighborhood may become a candidate for representative, provided that they meet the deadline set by the government for registration and complete the application form with their personal data and proposed actions. The t-Government application for this collaborative process must provide a virtual keyboard. After completing the form electronically, the data is routed by the system to a service center (Figure 2).

Figure 3 - Voting Business Subprocess Diagram.

Figure 4 shows the Assembly Line Diagram for the voting process.

Figure 2 - Scheme for provision of t-Government applications [8].

Figure 4 - Assembly Line Diagram for the Voting Subprocess.

There, the data is processed and stored in a database. The sector of the municipality responsible for that activity accesses the database, selects the new applications and takes the necessary measures to keep the process going. After being examined, all applications are made available for the citizens to vote on. Within the period specified for the voting process, the residents of the neighborhood analyze the profile and the proposal of each candidate and vote for one candidate. The vote of each citizen is routed through the system to the service center for further processing and storage. Once the voting period has ended, the candidate with the largest number of votes is elected. Citizens can then obtain contact information for the new neighborhood representative and monitor his/her performance, sending suggestions or complaints to this representative. For this subprocess, the system enables three options: (i) candidate application, (ii) vote for a candidate, and (iii) consultation with the elected candidate. Figure 3 presents the modeling of the voting process.

Figure 5 displays the Use Case Diagram for the citizen Actor of the voting subprocess.

Figure 5 - Citizen Use Case Diagram in the Voting Subprocess.

Finally, Figure 6 presents the Activity Diagram in the Voting Subprocess.

The presentation of the activity diagram for the illustration provided highlights how the activities that comprise the processes interact with each other and what flow of action is necessary to achieve the goal of the business process. According to [7], the subprocesses modeled here are all of the collaborative type. These subprocesses require a more advanced view of government, a new paradigm completely different from the traditional managerial model in which the citizen is seen as a customer and the business perspective seeks essentially the efficiency and effectiveness of public management. In this new paradigm, the main value is co-production. The citizen is no longer seen as a client, but as a partner who actively participates in the process, helping to build public policies and to measure and manage government resources. This new paradigm demands that the government invest in modes of political representation that include new components (e.g. e-democracy, e-participation and e-citizenship) in order to expand the opportunities for citizen participation and interaction in new governmental business processes. Further studies along these lines may investigate the application of the model in more complex t-Government applications or in other iDTV applications.

Figure 6 - Activity Diagram related to the Voting Subprocess.

Based on the diagrams above, one can identify the following system requirements for the voting subprocess:
RQ1 - The system should display the time available to vote for a candidate;
RQ2 - If there is no more time to vote, the system should display a deadline-expired message;
RQ3 - If there is still time, the system must display a list with information about the eligible candidates;
RQ4 - After a candidate is selected by the citizen, the system should verify that the return channel is working properly in order to enable a field for the insertion of the CPF [Individual Taxpayer Registry] number;
RQ5 - The system can only send data to the service center after checking the CPF number inserted;
RQ6 - The system must inform the citizen about the status of the procedure performed.
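As a hypothetical illustration only (the paper specifies requirements, not an implementation, and prescribes no programming language), the Lua sketch below shows one way the decision flow behind RQ1-RQ6 could be expressed; all function and field names are invented for the example.

```lua
-- Hypothetical sketch of the voting-subprocess flow derived from RQ1-RQ6.
-- All names (deadline, return_channel_ok, cpf_valid, etc.) are illustrative.

local function remaining_seconds(deadline, now)
  return deadline - now
end

-- Returns a table describing what the application should present next.
local function voting_step(state)
  local left = remaining_seconds(state.deadline, state.now)
  if left <= 0 then
    return { screen = "message", text = "Voting deadline expired" }              -- RQ2
  end
  if state.selected_candidate == nil then
    return { screen = "candidates", list = state.candidates, time_left = left }  -- RQ1, RQ3
  end
  if not state.return_channel_ok then
    return { screen = "message", text = "Return channel unavailable" }           -- RQ4
  end
  if not state.cpf_valid then
    return { screen = "cpf_form" }                                               -- RQ4, RQ5
  end
  return { screen = "confirmation", action = "send_vote" }                       -- RQ5, RQ6
end

-- Example usage with made-up data:
local next_step = voting_step({
  deadline = 1000, now = 400,
  candidates = { "Ana", "Bruno" },
  selected_candidate = "Ana",
  return_channel_ok = true,
  cpf_valid = false,
})
print(next_step.screen)  --> cpf_form
```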

5. REFERENCES
[1] DAVENPORT, T. Process Innovation: Reengineering Work Through Information Technology. Harvard Business School Press, Boston, 1993.
[2] ERIKSSON, H. E.; PENKER, M. Business Modeling with UML: Business Patterns at Work. New York: John Wiley & Sons, Inc., 2000.
[3] KINGSTON, J.; MACINTOSH, I. Knowledge Management through Multi-Perspective Modeling: Representing and Distributing Organizational Memory. Knowledge-Based Systems, Cambridge, April 2000, 121-131.
[4] LIMA, A. D. S. UML 2.0: do requisito à solução. São Paulo: Érica, 2005.
[5] SALM JR, J. F. Extensões da UML para descrever processos de negócio. Dissertação de Mestrado, Departamento de Engenharia de Produção e Sistemas, Universidade Federal de Santa Catarina, Florianópolis, Brasil, 2003.
[6] PRESSMAN, R. S. Software Engineering: A Practitioner's Approach. 5th Ed., McGraw-Hill, New York, Nov. 2001.

4. FINAL CONSIDERATIONS
Generally speaking, the proposed model facilitates the understanding of business processes and the identification of opportunities for improvement. It is worth mentioning that we are dealing here with t-Government applications accessible via iDTV (a fixed device). It is also understood that such applications are not tied to a specific TV program, since they are treated as resident applications; that is, citizens download them from the STB TV channel and can interact with them whenever they want, regardless of the program that is being aired at the moment. The questions suggested by [4] and adapted from the proposal by [2] served as a guide and proved suitable for the survey of the integral elements of the business processes and of the activities required to implement them. The application of the BP diagram has enabled us to obtain, in a single diagram, an overview of all the elements participating in the process (resources, goals, rules, events, input or output elements), as well as the relationships between them, facilitating the understanding of the process as a whole and assisting its implementation. The Assembly Line Diagram not only highlighted the interaction between business processes and the information objects read and written in the assembly line, but also aided in the identification of the use cases that support the actors of the system and, consequently, in the preparation of the t-Government application requirements.

[7] KOK, C.; RYAN, S.; PRYBUTOK, V. Creating value through managing knowledge in an e-government to constituency (G2C) environment. The Journal of Computer Information Systems, v. 45, n. 4, p. 32-41, July 2005.
[8] SANTOS, P. M. Modelagem de processos para disseminação de conhecimento em governo eletrônico via TV Digital. Dissertação de Mestrado, Departamento de Engenharia e Gestão do Conhecimento, Universidade Federal de Santa Catarina, Florianópolis, Brasil, 2011.

Guidelines for the content production of t-learning


Airton Zancanaro, Paloma Maria Santos, José Leomar Todesco, Fernando Alvaro Ostuni Gauthier
Post-Graduate Program in Knowledge Management and Engineering, Federal University of Santa Catarina, Campus Universitário, Trindade, Florianópolis/SC, Brazil.
airtonz@egc.ufsc.br, paloma@egc.ufsc.br, tite@lec.ufsc.br, gauthier@egc.ufsc.br

ABSTRACT
The iDTV technology is becoming a frequent topic in discussions within the scientific community due to the fact that it can provide differentiated resources in terms of technology and interactivity to its users. Its use in the distance education field shows its great potential towards the teaching/learning process, given the experience that the user has regarding the use of television. Thus, setting the guidelines for the content production of t-learning is essential for the development of content designed for iDTV.

Keywords
iDTV, t-learning, Guidelines, Production Process.

1. INTRODUCTION
With the development of Information and Communication Technology (ICT), Distance Education (DE) increasingly seeks to rely on emerging technologies as a means of facilitating the access of students and, especially, their acceptance [1]. For [2], distance education is planned learning that normally occurs in a place other than the learning/teaching venue, requiring special techniques for creating the course and its instructions, communication through various technologies, and special organizational and administrative arrangements. In this sense, e-learning is defined by [3] as the use of new multimedia technologies and the Internet to improve the quality of learning by facilitating access to resources and services, as well as exchanges and collaboration at a distance. T-learning, on the other hand, is a subset of e-learning, as shown in Figure 1. According to [5], t-learning has been adopted as a way to identify the Interactive Digital Television (iDTV) based learning experience, accessing valuable educational materials in video through an easy-to-use device that resembles a TV more than a computer. The importance of this discussion for the scientific community is justified by the scarcity of studies regarding the use of iDTV as an assistive device for the DE experience, especially with regard to guidelines for the content production of t-learning. A guideline is referred to as a "sketch, in outlines, of a general plan, project etc.; policy" [6]. The process of content production for iDTV, in turn, is understood by [7] as the activity of creation or conduction of contents, and is based on three phases: preproduction, production and postproduction, as stated by [8].

Figure 1. Convergence of technologies [4].

In the following sections, the guidelines related to each phase are discussed. Further details of this study can be obtained in [4].

2. GUIDELINES FOR PREPRODUCTION PHASE


The preproduction phase corresponds to the stage of planning the necessary requirements for a course to be available on the iDTV platform. To this end, the challenges are adapting the content with less text than a webpage and motivating different audiences who watch television for entertainment purposes to also realize its potential for educational purposes. Three requirements are addressed in this phase: personal, technical and pedagogical.

2.1. Personal Requirements

The personal requirements relate to aspects regarding accessibility, motivation and expectations of the audience. As guidelines, we have:
1) Check students' accessibility to the t-learning course: in terms of geographic accessibility, it is necessary to determine the availability of iDTV services within the region in which students are immersed. Regarding human accessibility, it is important to check students' ability to adapt (physically, motorily or sensorially) and to study on their own. Finally, technical accessibility refers to students' access to iDTV technology, such as set-top boxes and television sets, among others, and also to their ability to use these technologies.
2) Arouse motivation and expectations regarding t-learning use: this guideline is needed due to the fact that users are accustomed to using television as an entertainment device only. It is therefore necessary to extend the use of television to educational purposes, motivating students through applications that are attractive and easy to understand.

3) Identify and offer actions for the disabled, elderly and children: this guideline points to a reality in which people with special needs, elderly and children spend most of their time watching television. It is therefore important to identify and offer content ensuring that all users are able to use the interactivity resources autonomously. For that, it is important to assess conditions such as lack of dexterity of the elderly, sight and hearing conditions and color blindness, as well as children's age when planning t-learning courses.

2.2. Technical Requirements


Technical requirements are the elements required of the technological infrastructure in order to allow the creation, development and enjoyment of content for iDTV. As guidelines, we have:
1) Have a transmission channel: the existence of a transmission channel is essential, as iDTV is considered the leading technology in t-learning courses. The Internet can be used as a secondary resource, promoting accessibility.
2) Disclose to students the basic technology required for t-learning courses: users need to have their own equipment, and it should be compatible with the Digital TV standard adopted. In addition to the digital signal (TV and antenna), the user will also need: a) in the case of a set-top box, a middleware to run applications and, if possible, the capability of recording (time shifting) content; the return channel, in case the user does not have one and the course requires it, can be hired from a telephone operator; b) in the case of mobile phones and other mobile devices, it is also necessary that these devices be enabled, through middleware, to run applications; regarding the return channel for cell phones, the user's own telephone line can be used.
3) Design courses for multiple technologies: in order to meet the needs of students, particularly in relation to mobility, the use of multiple technologies is recommended, thus allowing users to choose, among fixed and mobile devices and the Internet, the technology that best meets their needs.
4) Plan the learning environment: the learning environment for t-learning is what will support the interactions: a) student and teaching material, b) among the students, c) student and teacher/tutor and d) students and the learning environment. To this end, it should be safe, in order to allow, when necessary, integrity, authentication and data privacy, as well as reliable and fast, providing quick learning by the user and being self-explanatory. The learning environment should be able to support images, videos, texts and animations, among others. It should allow the execution of activities by students as well as the provision of feedback on successful trials and on errors when necessary.
5) Investigate tools for content production: this means access to tools for iDTV content production that are focused on education and usable by laypeople, such as audio, video, text and image editors. There is also the need to investigate specific programming tools for developing systems for the adopted middleware.
6) Ensure availability and reliability of applications and data: television is recognized for the reliability of the services offered, meaning that the programming is hardly ever interrupted by technical problems. This guideline indicates that t-learning courses require high availability, that is, the servers/ISPs used for storage and data access should be available without interruptions or connection failures.

7) Plan applications that take full advantage of the capabilities of the remote control and of mobile devices: the remote control is a simple and limited device. For this reason, applications should be designed in such a way as to fully exploit the features it offers. Due to the difficulty of entering text in fields where this might be a requirement, the use of a virtual keyboard is needed, which favors usability. On the other hand, for users who have cell phone technology suitable for iDTV, the challenge is to plan applications from the perspective of multiple devices, facilitating individual interaction in environments where the use of television is collective.
8) Ensure usability in t-learning courses: usability is directly linked to the dialogue between the user and the software, which should ensure that interaction goals are met. Thus, this guideline points to usability in the following cases: when the TV is for collective use; when cooperation among students in the use of certain pedagogical materials is required; and when users are away from the television screen. In these cases, it is important to use mechanisms that catch the attention of the user.
9) Assess the audience's needs regarding technological issues: assess needs based on age and on special needs, as indicated previously. Provide captions for the hearing impaired and audio description for the visually impaired. The basic and technical knowledge, previous experience and comfort of the target audience regarding the use of television as a medium for t-learning should also be considered.

2.3. Pedagogical Requirements
Pedagogical requirements relate to the content development for t-learning, which involves, among other factors, educational objectives, methodology and pedagogical strategies. As guidelines, the following topics are presented:
1) Define the purpose of the course: this refers to the way the t-learning course will be structured and offered to students in order to achieve the proposed objectives, aiming towards formal, non-formal or informal education. In formal and non-formal education, there is greater control of content distribution and of student monitoring. Conversely, when it comes to informal education, there are no requirements or monitoring and learning occurs on its own.
2) Know the audience in order to develop the pedagogical content: knowing the needs, motivations, attitudes or socio-cognitive characteristics, and the social, financial, educational and cultural contexts of the target audience will assist content production with regard to the definition of the language, the teaching/learning strategy and the technology used for mediation.
3) Set the course objectives: this means being aware of what will be taught during the course. Goals should be linked to the needs, interests, expectations and characteristics of the target audience. In order to do so, they should be clearly defined, aiming at identifying the skills, cognitive competencies and attitudes that are to be assessed throughout the course.
4) Select the course content: the goal is to deliver knowledge to a previously defined group of people, according to their needs. Based on the objectives listed, the teacher selects the content, whenever possible addressing topics according to the course requirements and the target audience's interests. The composition of the course should include the study guide, pedagogical content and tasks.
5) Plan the study guide: the study guide planning brings the organization and structure of the course and also provides guidance for students. The study guide is traditionally distributed in printed form, but it can also be made available online. It may contain information such as course objectives, schedule and planning, course structure, guidelines about the use of the tool, browsing the content, getting help and recommended reading.
6) Define the structure and accessibility of the content: this means knowing the students' level of knowledge (cognitive) and also their capacity for using iDTV resources. According to these characteristics, the course may follow two structures: the first, a linear one, in which the teacher defines the form and order of content presentation; and the second, non-linear, in which the student chooses how to browse the contents. It is up to the teacher to select which way will most effectively facilitate the students' understanding.
7) Determine the type of t-learning support for the courses: this means defining how the t-learning course should be designed and distributed to students. In this case, support can be provided in three ways: a) supplementary: using t-learning as a plus in a traditional education setting; b) partial: as a support to classroom education, especially in activities and self-assessments; and c) substitute or entirely at a distance: classroom education is replaced by t-learning.
8) Provide actions aimed at interaction: understanding the nature of the interaction and how to make it easier through the use of technology favors the assimilation of knowledge by the student. These interactions occur between: a) student and pedagogical content, b) student and teacher/tutor, c) among students and d) student and learning environment.
9) Provide pedagogical mediation to assist the educational process: because the students do their studies on their own, they should often be motivated and encouraged throughout the educational process. This guideline highlights the need for the teacher/tutor to guide, mentor and support students through constant feedback, monitoring students to ensure that they follow the pedagogical content appropriately.
10) Analyze student performance: the assessment of the knowledge assimilated by students is one of the ways of guiding the learning process. This can be an in-person analysis or one that makes use of the learning environment itself, with self-assessment, peer evaluation or summative evaluation.
11) Identify solutions that support the learning process: in order for learning to take place, first the willingness of the student to learn is necessary and, second, the content should be potentially significant. The pedagogical content, tasks, tutoring, study guide, messaging, creation and sharing of knowledge and motivation to learn are fundamental to the process of knowledge acquisition.
12) Offer ways to customize the course content: a course distributed via iDTV may encounter situations in which the student does not have a return channel installed on the set-top box. In this case, content customization happens through the construction of the t-learning course considering different ways of learning, the adaptation of the content to the students' needs and abilities, and the creation of content specific to a particular region.
13) Associate the content of t-learning to a TV show: as iDTV is considered the main technology for t-learning, the content may be linked to a specific TV show or made available through an interactive service portal, which would allow users access at any time as well as enable the uploading of updates by the broadcasting channel.

3. GUIDELINES FOR THE PRODUCTION PHASE


In the production phase, the various modules of the learning environment are built, along with the pedagogical content developed through the planning done at the preproduction phase. Regarding this phase, the guidelines are the following:
1) Adapt the content to be displayed on iDTV: the content should be built in a compact and meaningful way, tailored to the specifications of iDTV. The small size of the screen and of the memory available to store content, mainly on mobile phones, are critical issues that must be considered during content preparation.
2) Build the modules of the learning environment: the learning environment will give geographically dispersed students the opportunity to interact within a variety of times and places. For this reason, the learning environments, through their tools and along with the students, interact, thereby building knowledge.
3) Hire and train an interdisciplinary team to develop the content: due to the specificities involved in the production of t-learning courses, training the professionals who will work on the construction of the content is of fundamental importance. Engineers are needed to develop applications specific to the middleware, and teachers/tutors to provide student monitoring. In addition, speakers, actors, teachers and others who will work directly on the content development are necessary. Thus, the team can perform its duties with greater efficiency, effectiveness and quality of service.
4) Use the interactive buttons of the remote control for content navigation (see the sketch after this list): the colored buttons of the remote control have been developed specifically for iDTV interaction through a fixed device. The red button is used to promote the link to any content on the screen. The green button enables personalized access to communication tools. The yellow one can be flexible and used to replace controls that may be difficult to access. The blue button was designed to promote access to text information on screen or the selection of some service. The navigation buttons of the remote control relate to quick access to applications. Buttons such as the up and down arrows (volume control) and the right and left arrows (channel switching) should also function as navigation options in iDTV applications [9].
5) Provide course validation: this relates to conducting pilot tests with a controlled number of participants. The goal is to identify problems of technology, motivation, usability and satisfaction before the course is distributed to a broader number of people. Problems related to technology may endanger the whole learning process along with its acceptance.
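The sketch below is a hypothetical NCLua-style fragment illustrating guideline 4: it maps the colored and arrow keys to navigation actions. It is not part of the guidelines themselves; the key names follow the usual Ginga conventions, and the action names are invented for the example.

```lua
-- Hypothetical key-to-action mapping for an iDTV learning application.
-- event.register and the "key" event class come from the Ginga NCLua API;
-- the action names (open_link, open_chat, ...) are illustrative only.

local actions = {
  RED          = "open_link",         -- link to the content highlighted on screen
  GREEN        = "open_chat",         -- personalized access to communication tools
  YELLOW       = "alternate_control", -- replaces controls that are hard to reach
  BLUE         = "show_text_info",    -- textual information or service selection
  CURSOR_UP    = "previous_item",     -- arrow keys double as navigation options
  CURSOR_DOWN  = "next_item",
  CURSOR_LEFT  = "previous_page",
  CURSOR_RIGHT = "next_page",
}

event.register(function(evt)
  if evt.class == "key" and evt.type == "press" then
    local action = actions[evt.key]
    if action then
      print("navigation action:", action)  -- a real course would update the UI here
    end
  end
end)
```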

4. GUIDELINES FOR POSTPRODUCTION PHASE


This is the phase in which the packaging and distribution of the course to students occurs, as well as the monitoring by teachers and tutors, in case the course requires it. The guidelines regarding this phase involve:
1) Promote the packaging and distribution of the course: this topic points out the need to bring together the materials used in the course, such as video, audio, animations, text files and programming language files, in order to enable distribution through the broadcasting channel.
2) Promote student monitoring: if the course requires it, the teacher/tutor should provide feedback and monitoring, encouraging and guiding students during the course.
3) Promote verification of the learning process: system errors, along with flaws in the design of a t-learning course, affect the learning process. In this sense, constant monitoring and evaluation of the student, the technology and the pedagogical content become necessary for ongoing upgrades and improvements.

6. REFERENCES
[1] MORAN, J. M. Ensino e aprendizagem inovadores com tecnologias audiovisuais e telemáticas. In: MORAN, J. M.; MASETTO, M. T.; BEHRENS, M. A. Novas tecnologias e mediação pedagógica. Campinas: Papirus, 2000.
[2] MOORE, M.; KEARSLEY, G. Educação a Distância: uma visão integrada. São Paulo: Cengage Learning, 2008.
[3] JOCI. Convite à apresentação de propostas DG EAC/46/02 - Acções preparatórias e inovadoras 2002/b. Jornal Oficial das Comunidades Europeias, p. 179/14 - 179/20, 27 jul. 2002. Available at: <http://eurlex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:C:2002:179:0014:0020:PT:PDF>. Accessed: 13 September 2010.
[4] ZANCANARO, Airton. Conhecimento envolvido na construção de conteúdo para TV Digital interativa na EaD. 2011. 196 f. Dissertação (Mestrado) - Departamento de Engenharia e Gestão do Conhecimento, Universidade Federal de Santa Catarina, Florianópolis, 2011.
[5] BATES, P. J. T-learning Study: A study into TV-based interactive learning to the home. Prepared by pjb Associates, UK, with funding from the European Community under the IST Programme (1998-2002). [S.l.], 2003.
[6] HOUAISS, A. Dicionário Houaiss da língua portuguesa. 1. ed. Rio de Janeiro: Objetiva, v. lix, 2009.
[7] SOUZA, M. I. F.; SANTOS, A. D.; AMARAL, S. F. Infraestrutura tecnológica e metodologia de produção de conteúdo para TV digital interativa - uma proposta para a Embrapa. II Simpósio Internacional de Competências em Tecnologias Digitais Interativas na Educação. Campinas, SP, 2009. p. 1-25.
[8] AARRENIEMI-JOKIPELTO, P. Modelling and content production of distance learning concept for interactive digital television. 2006. 204 f. Tese (Doutorado) - Industrial Information Technology Laboratory, Department of Computer Science and Engineering, Helsinki University of Technology, Helsinki, 2006.
[9] PEÑADA, X. G. et al. Sistemas de tele-educación para televisión digital interativa. Universidad de Oviedo / Universidad del Cauca, Espanha, p. 1-19, 2009.

5. FINAL CONSIDERATIONS
Television is the electronic equipment with the greatest penetration in homes. Thus, iDTV is essentially a tool for digital inclusion, which may lead to social inclusion. The survey of guidelines for the content production of t-learning seeks to assist the future production of courses for iDTV. It should be noted that the personal and technical requirements can be used for any application domain, be it t-commerce, t-government or t-health; the pedagogical requirements, however, are specific to t-learning. Regarding this domain, the design of any course must take into account the target audience involved, particularly their motivations and interests, along with the availability of access to the technology. Future studies are heading in this direction, since different application domains can be considered regarding the specification of guidelines for content design.

Evaluation of an interactive TV service to reinforce dental hygiene education in children


F. Fraile, P. Guzmn, J.C Guerri
Universitat Politècnica de València, Camino de Vera S/N, Edificio 8G, Acceso D, Planta 4, Valencia, España

T.R. Vargas, N. Flórez


Universidad Santo Tomás Bucaramanga, Carrera 18 Nº 9-27, Bucaramanga, Santander, Colombia

{ffraile, paoguzc1, jcguerri} @iteam.upv.es

{tivarher, nataliaf} @mail.ustabuca.edu.co

ABSTRACT
This paper presents an interactive TV service designed to support digital inclusion in emerging economies using the potential of Digital Television. The service adopts a generic platform for TV-Centric e-Health services. The proposed service is designed to promote good dental hygiene habits among children in Colombia. The evaluation hereby presented includes validation studies performed with health care experts and children in order to validate service usability and user interface functionality, hence applying different areas of knowledge in the design of an interactive TV service.

Categories and Subject Descriptors


J.3 [Life and Medical Sciences]: Medical information systems


General Terms
Design, Experimentation, Human Factors.

Keywords
Dental Care, e-Health, Interactive TV, Usability tests.

1. INTRODUCTION
Digital inclusion in emerging economies is a key factor in improving quality of life with the support of new information and communication technologies. These technologies are being adopted rapidly in developing countries. Colombia, as an emerging economy candidate, is a good example of such technological adoption. Free-to-air television services in Colombia are undergoing a transition towards digital platforms based on the European DVB-T standard. According to the roadmap established by the government 0, through the National Commission of Television (CNTV), the transition to digital TV must be completed by 2019. The first phase, which started in 2010 and is to be finalized in 2013, regards the coverage of 63% of the population by the newly introduced Digital Terrestrial Television services. According to the CNTV plans, a key factor for the development of television services in Colombia is convergence with other telecommunication services and networks through IP technology, both for interactivity and transport. Nonetheless, the first interactive TV service offer in the country dates from 2008 and comes from the leading cable Triple-Play service provider 0, who offers an IPTV subscription service that includes several interactive TV services. In this sense, it is important to note how rapidly the number of Colombian households with Internet connections has increased in recent years 0. If the number of connections follows a similar trend in the years to come, it can be expected that, as DTT is introduced in the country, broadband Internet will be sufficiently widespread in Colombia for the Internet to become the prominent interaction channel for interactive TV services, which is not only in line with the CNTV strategy, but also with global trends in both standardization and industry. For instance, in the period between 2006 and 2010, the number of connected households increased sevenfold, from 0.5M households to 3.5M, following an approximately linear increase. An ideal scenario where connected households grow in a similar way yields 5.75M households connected by 2013 and 9.75M households by 2019. Considering an average of 4-5 citizens per household, it can be seen that governmental institutions intend to achieve analogue TV switch-off and universal access to the Internet at a similar pace. On the other hand, the number of fixed telephone lines is decreasing very fast against mobile phone subscriptions. Mobile telephony has already become the most universal telecommunication service in the country, together with TV. In this context, this paper presents an interactive TV service designed to promote good dental hygiene habits among children in Colombia. Section 2 contains a description of the service. Section 3 describes the methodology applied to assess the usability of the proposed service. In Section 4, results are presented and, finally, Section 5 includes some final remarks about the findings of this study.

2. SERVICE DESCRIPTION
General considerations
As mentioned above, the main purpose of the service is to promote healthy dental care habits in Colombia. As in many other countries, dental treatment in Colombia is expensive for most of the population, hence the importance of preventive treatment. Normally, when patients attend dental clinics, they undergo an interview as part of the diagnosis process, so that therapists can assess the dental care habits of patients. With these facts in mind, the service under study is designed to:
- Target children in elementary school, with ages between six and ten years.
- Show general information about dental hygiene. The information should be well adapted to the interactive TV medium and the target audience, presented through video, multimedia presentations or interactive games.
- Collect information about the healthy habits of children, creating user profiles that can later be accessed by professionals.
- Make personalized content selections for children, adapted to the user profile that models their dental care habits.


Interactive TV appeared as an appropriate technology to provide this service, mainly because Colombian children are more familiar with the TV medium than with any other alternative platform. The socio-technological context of the study, with the introduction of digital TV in Colombia, also pointed to this technology, as there is a general interest in exploring the possible applications of interactive TV services in the country. One of the main handicaps of the service is that it requires a broadband connection to support the Video On Demand component, whereas broadband connections are not the most democratic interactive medium nationwide. However, as mentioned before, the only available interactive TV service is offered over an IPTV platform and the number of broadband Internet connections is increasing very rapidly.

All the multimedia content has been obtained from a web service offered by an oral hygiene product manufacturer [Ref]. The Faculty of Odontology of the University of Santo Tomás of Bucaramanga provided the questionnaires. As mentioned, the service prototype followed a 3-tier architecture. At the data tier, a database stores information about users, questionnaires, answers, etc. The logic tier offers two backend web services, one for user login and another to show the next assigned question (if any) or the list of recommended videos (if there are no questions to show to a particular user). The presentation tier connects to this backend service and adapts the information to an appropriate format for the STB embedded browser. Figure 1 shows a screenshot of the login page for the Multimedia service.

Adopted technologies
The aforementioned considerations motivate the adoption of a Hybrid Broadband/Broadcast (HBB) architecture for the service, using web technologies for the interactive service components. This approach seems more future-proof given the different candidate technologies for interactivity: by using a multi-tier architecture for the interactive services, one only needs to adapt the presentation tier for a given technology. The service prototype targeted a proprietary Set-Top Box with an embedded browser that implements an ECMAScript 0 API to access the different STB functions, including interaction through the remote control. The embedded browser supports the CSS TV profile 0 and the XHTML standard 0. Hence, although it is a proprietary implementation, it provides a development platform very similar to CE-HTML 0, an industry standard incorporated in different solutions, such as connected TVs, HbbTV 0 and other embedded browsers. Thus, although it is a proprietary solution, the STB provides an adequate framework for development in academic environments. Additionally, the STB implements an IPTV stack supporting Video On Demand and a DVB-T receiver, together with low-level APIs to modify their behavior. For the user trials, we used HTTP streaming for the VOD components and UDP multicast streams for the digital TV channel, thus emulating an IPTV platform.

Figure 7: Screenshot of the service login page

3. METHODOLOGY
The methodology applied consisted of two usability tests with two different user groups. First, the usability of the service was assessed by a group of dental care professionals. Later, a group of children tested the functionality of the user interface. The methodology applied in each case is explained in the next sub-sections. With the service usability test, we expected to validate the service from the point of view of the therapists, i.e. to find out whether the system is actually useful and could serve as a replacement for, or supplement to, the educational activities in dental hygiene currently taking place in Colombia. Once the service usability was validated, we proceeded to validate the interface functionality with a group of end users (children) in order to identify potential errors or problems that they could find in its use.

Service Architecture
The service prototype is structured as follows. All general information is available as soon as the user accesses the portal. The general information is organized in three different sections of the portal, namely Inicio, Tips and Explora. Inicio contains a welcome message and information about the service. Tips shows pieces of advice about dental care, whereas Explora contains multimedia information about the anatomy of the mouth and tongue. The section Multimedia contains episodes of an animated series about dental care. Before accessing the videos, the user needs to log in so that the service can keep track of the activity of each user individually. Upon login, the service checks whether the user has any questionnaire to fill in. Dentists create these questionnaires in order to profile the dental care habits of users, and the service allows dentists to assign questionnaires to users. If the user needs to fill in any question to complete his/her assignments, the application will prompt the question after login. After filling in the assigned questionnaires, the service shows a list of episodes, sorted by the relevance of the content for the profile of dental care habits obtained through the questionnaires.

Service Usability
Participants
Service usability was validated through tests and questionnaires administered to a group of eight final-year students of the Faculty of Dentistry of the University of Santo Tomás of Bucaramanga, Colombia, who, as part of their professional tasks, run educational workshops on dental hygiene care programs for children of different educational institutions in the city.

Environment
The service usability test was carried out in the Telecommunications laboratory of the University of Santo Tomás. It was coordinated by a faculty professor who acted as facilitator, introducing the participants to the objectives of the service and presenting the service, its user interface and its contents.

In order to evaluate user interaction with the menus and control commands of the system, the children were given a set of tasks to perform in addition to free exploration of the service.

Procedure
First, the facilitator introduced the children to the testing situation and explained its objective: to see what is too easy or too hard for children of their age, in order to fix it and improve the service. The facilitator then gave the children a brief explanation of digital TV and the advantages it can offer, and finally some entry points to the interface and to the use of the remote control. After this introduction the usability test began, in which each child accessed the interactive contents of the site and tried to complete the proposed tasks. The evaluation took into account the children's reactions, expressions and comments while using the interface. Finally, the evaluator asked the children some questions so that they could express what they liked or disliked about the service and report what they found problematic in the use of the system.

Procedure
The main objective of the activity was to determine whether the platform developed is useful for conducting educational activities aimed at the prevention and promotion of dental hygiene care. After a brief introduction to the main features of the interface, the activity was based on the individual exploration of the service and its menus. Finally, each participant filled out a System Usability Scale (SUS) Likert questionnaire, as proposed by Brooke, in order to obtain an indication of the overall service usability. The original questions of the SUS questionnaire were translated into Spanish.
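For reference, SUS scores are computed with Brooke's standard scoring rule (stated here for completeness; it is not restated in the paper). With s_i the 1-to-5 response to item i,

SUS = 2.5 \left( \sum_{i\,\mathrm{odd}} (s_i - 1) + \sum_{i\,\mathrm{even}} (5 - s_i) \right)

which yields a value between 0 and 100.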

Interface Functionality
The interface functionality tests were designed taking into account the characteristics reported in the literature on usability testing with children, as described in the following sections.

Participants
The interface functionality was evaluated with the involvement of a group of elementary school children. The test group was selected taking into account demographic characteristics of children that, according to the literature, may impact the test. In this case, the authors selected a group of nine children between seven and ten years old, four girls and five boys, all students of an elementary school, who, after a brief presentation of the service, expressed their interest in participating in the usability test. Figure 8 shows some pictures taken during the tests. According to published guidelines for usability testing with children, elementary school children (ages 6 to 10 years) are relatively easy to include in software usability testing. Their experience in school makes them ready to sit at a task and follow directions from adults, and they are generally not self-conscious about being observed as they play with the service under test. They will answer questions and try new things with ease. In this age range, children also develop more sophistication in how they describe the things they see and do.

4. RESULTS AND ANALYSIS
Service Usability


From the perspective of the therapists, a service that educates children in a visual and enjoyable way and provides them with recommendations related to dental hygiene care is useful. In the opinion of the dentistry students, the service could be a useful complement to current educational activities. Regarding the use of TV as an interactive medium, they found the use of media formats that children relate to leisure activities very appropriate. The results of the SUS questionnaire show that, as professionals, they considered the service usability appropriate and the components of the service properly integrated. Furthermore, the results of the Likert questionnaire show that its questions were clearly understood, suggesting that the translation used is valid for proceeding with more extensive tests.

Environment
The test took place in the children's classroom. This place was selected because it is a child-friendly environment in which the children feel safe and free from additional distractions.

User Interface
Children were recognized as useful and active participants in the system validation. All of them demonstrated great interest in using the TV as a platform, as well as in the service in particular. The children did not report any problem navigating the service with the remote control, and the evaluator noticed that it was rather easy for all of them to understand and manipulate the menus with the remote control. Only one child had problems with the trick modes of the multimedia video player, as he could not remember how to pause and replay the video, but the evaluator only needed to explain once to this particular child how the trick modes worked.

Figure 8: Pictures taken during the trial with children

When questioned about their general opinion on the interface, children commented that navigation seemed nice and easy to understand.


5. CONCLUSIONS
The results show that, according to the therapists, the proposed service is an easy-to-use tool that can complement dental hygiene promotion activities. The end users expressed their satisfaction with a friendly and easy-to-use interface that can offer learning content. These good results encourage us to continue developing the service prototype. In particular, there is a need to produce new content for the service. Also, even though the navigation proved easy to use, the appearance of the interface needs to be improved by applying specific design criteria for interactive TV user interfaces.

6. ACKNOWLEDGMENTS
The work described in this paper was carried out with the support of a Research and Development Cooperation agreement between the Multimedia Communication Research Group of the Universitat Politècnica de València (Valencia, Spain) and the UNITEL Research Group of the Universidad Santo Tomás (USTA) of Bucaramanga. The technical development of the project was carried out on the USTA premises with the support of the students Javier Rios and Laura Bibiana Villamizar.

7. REFERENCES
CNTV, Plan de desarrollo de la televisión 2010-2013. http://www.cntv.org.co/cntv_bop/plan_2013/plan_desarrollo.pdf
UNE, http://www.une.com.co/hogares/
CINTEL, Cifras y estadísticas del mercado del sector TIC, March 2011. http://www.interactic.org.co
ECMA, Standard ECMA-262: ECMAScript Language Specification, December 2009. http://www.ecma-international.org/publications/standards/Ecma-262.htm
W3C, Candidate Recommendation: CSS TV Profile 1.0, May 2003. http://www.w3.org/TR/css-tv
W3C, XHTML 1.0 The Extensible HyperText Markup Language (Second Edition), August 2002. http://www.w3.org/TR/xhtml1/
CEA, CEA-2014-B: Web-based Protocol and Framework for Remote User Interface on UPnP Networks and the Internet (Web4CE), January 2011. http://www.ce.org/Standards/browseByCommittee_2757.asp
ETSI, TS 102 796 V1.1.1: Hybrid Broadcast Broadband TV, June 2010.
Brooke, J. SUS: a quick and dirty usability scale. In Usability Evaluation in Industry. Taylor & Francis, 1996, pp. 189-194.
Markopoulos, P. and Bekker, M. On the assessment of usability testing methods for children. Elsevier Science, 2003.
Hanna, L., Risden, K. and Alexander, K. Guidelines for Usability Testing with Children. 1997.

An Approach for Content Personalization of Context-Sensitive Interactive TV Applications


Fabiana Ferreira do Nascimento
PPGI, Departamento de Informática, Universidade Federal da Paraíba, João Pessoa, PB, Brasil

Ed Porto Bezerra
PPGI, Departamento de Informática, Universidade Federal da Paraíba, João Pessoa, PB, Brasil

fabiana@nti.ufpb.br

ABSTRACT
This paper presents an approach for content personalization of context-sensitive interactive TV applications. A context model related to content semantics, used by sport services that require adaptation, is proposed. The validation of the model is described through the AvanTV structure, whose functionalities support the use of context by sporting services.

edporto@di.ufpb.br

Categories and Subject Descriptors


H.3.4 [Information Storage and Retrieval]: Systems and Software - current awareness systems, user profile and alert services.


Keywords
Interactive TV applications, context-awareness, context-sensitive, semantic, personalization

1. INTRODUCTION
Watching TV is no longer a passive viewer activity. Reacting to a program, sharing what is being viewed or searching for additional information can be accomplished through an interactive television (ITV) receiver or set-top box. Among the basic service types provided, the most common is the inclusion of content overlaid on programs. Navigating the game table during soccer contests is an example of this kind of enhancement on TV. The limited interaction offered by the receiver and a television-watching behavior oriented to leisure present challenges for supplying interactive services. A desirable feature in television systems is personalization, that is, providing adjusted information based on user interests. User preferences and domain knowledge for content selection, together with viewer identification, are promising engines to adapt content to the user. Context-sensitive applications can be designed in different ways. Research in the field proposes a separation of concerns between context acquisition and context use, which are domain-dependent tasks, to improve system reuse and extensibility. This article presents an approach for content personalization of context-sensitive ITV applications. Content is domain-specific to sports. We propose a model describing the content organization for access to sporting services. A prototype of the AvanTV approach was developed, integrated with a consumer middleware, to validate the context model.

2. CONTEXT SENSIBILITY IN ITV
Context-aware, or context-sensitive, systems differ from other systems because they are adaptive, responsive, proactive and autonomous. Vieira defines the context of an interaction as the set of instances that make it possible to characterize an entity in a domain. The attributes of entities are organized around the who, where, when and what of a situation. Research has explored personalization methods on TV. One approach considers a service based on media descriptors, joined to user preferences and a context model, to compose a personalized program. CollaboraTVware applied context to choose interactive services and programs based on the collaborative participation of users with similar profiles and contexts. ITV application development is supported by the Brazilian middleware Ginga. It is composed of modules that execute NCL declarative and Java imperative applications. The platform specifies a set of shared functionalities, the Common Core, which includes a Context Manager component responsible for generating information about available services and viewer profiles. The AvanTV approach is being built on top of Ginga.

3. CONTEXT MODEL
Context information was specified for sporting content. Knowledge about the user is structured by XML Schemas7 integrated with the User Description Profile (UDP) subset of the MPEG-7 standard. Context semantics are defined by Resource Description Framework8 (RDF) statements. The Sporting Content (SPTC) data model explores sport concepts. It aims to improve content interoperability and validation related to preference objects, such as competitors (teams, athletes, and so on), competitions and sporting services. Unlike similar works, such as SportML9, the application domain data are associated with a semantic description of the user context. Sporting content is depicted by the sptc class (Figure 1), whose type extends MPEG7BaseType. The elements refer to alternative compositions of preference objects: sport events in general (sport-event, tournament), metadata about published data (sports-metadata) and services to be processed (sports-service). Viewer profile preferences and usage history (Viewers) are instantiated by the UserDescriptionType from MPEG-7. A constraint set was defined to reduce the degree of complexity. The preference values and the usage actions are weighed to decide whether a received content is of individual (User) or group (Group) audience interest. Together with viewer identification, this information composes the properties of the user context.

7 http://www.w3.org/2001/XMLSchema
8 W3C Resource Description Framework, http://www.w3.org/rdf
9 IPTC Council, http://www.iptc.org/cms/site/index.html?channel=CH0105


4. AVANTV STRUCTURE
Defining the AvanTV structure provides support for content personalization of context-sensitive ITV applications. A prototype was designed as a Ginga extension. It consists of an Inner-module and an Outer-module. The Inner-module is responsible for content data and viewer interaction management (ModelManager), for acquiring context information (AcquisitionManager), for monitoring (BehaviorMonitor), for managing context rules (RuleManager) and for making inferences (ReasonEngine). Applications that require personalization services use the Outer-module to register the user interaction path (mapKeyActions()) and trigger rules (enterRules()); to get context information (getContextInfo()), the user profile (getUserProfile()) and the usage history (getHistoryUsage()); and to realize the content adaptation process (such as filterItems(), mapItems() and sortItems()). Therefore personalization may happen either within the scope of the AvanTV prototype or in the applications themselves.
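A minimal sketch of what the Outer-module contract could look like in Java is given below; the method names are the ones listed above, while the parameter and return types are assumptions made only for illustration:

import java.util.List;
import java.util.Map;

// Illustrative facade of the Outer-module; only the method names come from the paper,
// the signatures and types are assumed for this sketch.
public interface OuterModule {
    // Interaction registration and rule activation.
    void mapKeyActions(Map<Integer, String> keyToAction);
    void enterRules(List<String> contextRules);

    // Context, profile and history retrieval.
    Map<String, String> getContextInfo();
    Map<String, String> getUserProfile(String viewerId);
    List<String> getHistoryUsage(String viewerId);

    // Content adaptation helpers applied to candidate items.
    <T> List<T> filterItems(List<T> items, Map<String, String> context);
    <T> List<T> mapItems(List<T> items, Map<String, String> context);
    <T> List<T> sortItems(List<T> items, Map<String, String> context);
}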

Figure 1: Sporting content representation

Context information about platform characteristics is acquired mainly from ITV middleware components. The relevant properties regard the TV schedule and the hardware capacity, obtained from the EPG and from the Context Manager, respectively. The matching between user preferences and application content defines the RDF statements, in the format [subject, predicate, object]. Table 1 shows some statements from the model.
Table 1: Examples of RDF statements from the model
1 [urn://avantv/2010/umsem#Joao, urn://avantv/2010/umsem#hasPreference, urn://avantv/2010/sptcsem#Corinthians]
2 [urn://tvdigital:TVProgram#Brasileirao2011, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn://avantv/2010/sptcsem#SportEvent]
3 [urn://avantv/2010/sptcsem#Competition, urn://avantv/2010/sptcsem#performedBy, urn://avantv/2010/sptcsem#Competitor]
4 [urn://tvdigital:Memory, urn://avantv/2010/ctxsem#hasSize, 512Mb]

Besides resource relationships, the RDF schemas keep useful properties for the classification (hasContextType), updating (isUpdated) and acquisition (isAcquiredBy) of context information. It is possible to infer further statements based on rules over the knowledge base. For example, the rule
[alertRule: (?V um#hasStatus watching), (?V um#hasPreference ?S), (?C prog#contains ?S) -> alert(?V, ?C)]
means that if an instance ?V has status watching and has a preference for an instance ?S, and if the schedule ?C contains a content ?S, then an alert about ?C is issued to ?V. Instance ?V can represent either a user or a group, and ?S can be any preference object of the SPTC model. Within the scope of this research project we have added procedural primitives, such as alert, to support the personalization services through the Jena framework11.
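A minimal sketch of how such a rule could be evaluated with the Jena rule engine is shown below; the triples mirror Table 1 and the rule mirrors alertRule, the package names are those of the Jena 2.x releases of that period, and the namespace bindings and the derived alertAbout property (standing in for the procedural alert builtin the authors register) are assumptions for illustration:

import com.hp.hpl.jena.rdf.model.InfModel;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner;
import com.hp.hpl.jena.reasoner.rulesys.Rule;
import com.hp.hpl.jena.util.PrintUtil;

public class AlertRuleSketch {
    static final String UM = "urn://avantv/2010/umsem#";
    static final String PROG = "urn://tvdigital:TVProgram#";

    public static void main(String[] args) {
        // Prefixes used inside the rule text (bindings assumed for this sketch).
        PrintUtil.registerPrefix("um", UM);
        PrintUtil.registerPrefix("prog", PROG);

        Model model = ModelFactory.createDefaultModel();
        Resource joao = model.createResource(UM + "Joao");
        Resource team = model.createResource("urn://avantv/2010/sptcsem#Corinthians");
        Resource game = model.createResource(PROG + "Brasileirao2011");

        // Statement (1) from Table 1 plus the viewing status and schedule facts the rule needs.
        joao.addProperty(model.createProperty(UM, "hasPreference"), team);
        joao.addProperty(model.createProperty(UM, "hasStatus"), model.createResource(UM + "watching"));
        game.addProperty(model.createProperty(PROG, "contains"), team);

        // Same shape as the alertRule above; a derived triple stands in for the alert builtin.
        String rule = "[alertRule: (?v um:hasStatus um:watching) (?v um:hasPreference ?s) "
                + "(?c prog:contains ?s) -> (?v um:alertAbout ?c)]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rule));
        InfModel inf = ModelFactory.createInfModel(reasoner, model);

        Property alertAbout = inf.getProperty(UM + "alertAbout");
        System.out.println("Alert Joao about: " + inf.getProperty(joao, alertAbout).getObject());
    }
}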

5. CONCLUSION
We have focused on an approach for content adaptation of context-sensitive ITV applications. Context information is specified over a model that attaches semantics to entities. The use of context to achieve sport content adaptations is described in accordance with the AvanTV structure. Some features are still being implemented to validate the presented artifacts.

6. ACKNOWLEDGMENTS
Our thanks to CAPES, the Brazilian Government agency for the training of human resources, which supported this work.

7. REFERENCES
Alves, L. G. P. 2008. CollaboraTVware: uma infra-estrutura ciente de contexto para suporte a participação colaborativa no cenário de TV Digital Interativa. Master Thesis. Depto. Eng. de Computação e Sistemas, Escola Politécnica da USP.
Dey, A. K. 2000. Providing architectural support for building context-aware applications. PhD Thesis, College of Computing, Georgia Institute of Technology.
Goularte, R. 2003. Personalização e Adaptação de Conteúdo Baseadas em Contexto para TV Interativa. Doctoral Thesis. Inst. de Mat. e Computação, USP, São Carlos.
Martinez, J. M. 2004. MPEG-7 Overview, ISO/IEC N6828, V.10, Oct. 2004.
Souza Filho, G. L., Leite, L. E. C. and Batista, C. E. C. F. 2007. Ginga-J: The Procedural Middleware for the Brazilian Digital TV System. Journal of the Brazilian Computer Society, Vol. 13, pp. 47-56. Porto Alegre, RS.
Vieira, V. 2008. CEManTIKA: A Domain-Independent Framework for Designing Context-Sensitive Systems. Doctoral Thesis. Centro de Informática, UFPE, Recife.



10 Uniform Resource Identifier, used to identify a name (URN) or a location (URL). http://www.w3.org/TR/uri-clarification/
11 Hewlett-Packard Development Company (2011). Jena Framework. http://jena.sourceforge.net

A Framework Architecture for Digital Games to the Brazilian Digital Television


Lady Daiana de Oliveira Pinto
Universidade Federal do Amazonas UFAM/ICET, Av. Gen. Rodrigo Octávio Jordão Ramos, 300, 69077-000, Manaus, AM, Brasil.

José Pinheiro Queiroz Neto


Instituto Federal do Amazonas - IFAM Av. 7 de setembro, 1975 Centro, 69020-120, Manaus - Amazonas Brasil.

ladydaiana@gmail.com

ABSTRACT
Digital TV is in its consolidation process and has begun its popularization in Brazil. The experience acquired in the development of interactive applications for the Brazilian Digital TV model is therefore still limited and constitutes a challenge for researchers, technicians and entrepreneurs, especially in regard to digital games, since the game industry grew 31% last year across the various existing platforms. The purpose of this study is to investigate the Digital TV environment in Brazil, conduct a survey of the requirements for games to run on this platform, and propose an architecture for a framework aimed at assisting the game development process. In order to achieve this goal, the framework development uses the concept of tools for electronic games and will help non-expert developers to create design patterns typical of games, providing several facilities for the construction of graphical user interfaces, event handling and navigation control. A software engineering methodology for building game frameworks is also presented, together with the specifications for the Brazilian television standard, as an additional result. Finally, we present an application of the proposed framework, as a proof of concept, using emulation on a computer, which shows the facilities this study offers for developing a game for Brazilian digital TV.

pinheiro@cefetam.edu.br

2. GAME TVD FRAMEWORK


There are several definitions of the framework concept in the academic literature. In this study, we use the definitions of [3] and [4], in which a framework is defined as a set of objects that collaborate in order to carry out a set of actions for an application or a specific application domain. This study presents the steps required to create the GameTVD Framework. The steps used in this assignment are: (i) the architecture specification, (ii) analysis and modeling, (iii) implementation of the Framework and (iv) all the activities related to each stage, respectively. These steps are very important to the development of this assignment, because they present the conceptual modeling of the framework, its requirements and the tools used in its development, as shown in Figure 1.

General Terms
Games, DTV, process, GUI, scenarios, generating the source code.

Keywords: Digital TV, Electronic games, framework.

1. INTRODUCTION


The International System of Digital Television (ISDTV) is the Brazilian digital TV standard, which is open and free, allowing high-quality images and sound, features such as portability and mobility, and the implementation of interactive applications. The ISDTV is based on the Integrated System for Digital Broadcasting (ISDB) of Japan, adopted and developed in Brazil, including updates to the technology standards for audio, video and interactivity. This system was chosen over the other two existing standards, the ATSC (American standard) and DVB (European standard); the ISDB has shown itself to be the most appropriate for our social reality, according to [1]. Considering only games for digital TV, [1] says that the initiatives are fewer than for other applications, because an electronic game requires a set of computational resources in order to meet the expectations of the user and, moreover, there are some limitations imposed by Digital TV technology, such as the interface with the game (remote control), the few tools to assist application development, and the lack of a strategy for the design of games. Some ventures aimed at Digital TV constitute a challenge for researchers, technicians and companies. The Brazilian industry is now responsible for 0.16% of the worldwide business with electronic games, and the 31% growth of the games industry between 2007 and 2008 is a clear sign of the strengthening of this business [2]. Therefore, this study has great importance in encouraging the development of digital TV games, not just as an academic assignment, but mainly in supporting the development of commercially viable applications. Additionally, the Brazilian government has a special interest in the use of digital TV in social inclusion programs, and thus it may be used by public educational institutions to create educational games with free access for the poorest population, raising the educational level of the country, among other possibilities.



Figure 1: Stages in the development of GameTVD Framework.


2.1. Architecture specification

The GameTVD Framework is an environment for the generation of games that consists of a set of basic classes allowing the creation of new games; the user only needs to add objects and scenarios through an intuitive graphical interface. GameTVD offers several features, such as the display of images using a two-dimensional (2D) display engine, facilities to play music and sound, the processing of events generated by the remote control, and the processing of the game logic through actions. The actions represent changes in the game objects. The focus of the Framework is to facilitate the creation of the graphical interfaces of the games and to provide support for running games on the DTV. Moreover, the user does not need to know any specific programming language, which facilitates its use.

Figure 2: Main Interface.

2.3.1 Creating Sprites


The interface Create Sprites, shown in Figure 3, is used to add images or sprites in the formats ".jpg", ".jpeg", ".png" and ".gif" that can be used in the game. After inserting the chosen image, it appears on the left side of the interface, in a folder structure corresponding to the Sprites directory.

2.2. Analysis and modeling

In order to assist the development of the architecture, it is necessary to survey the main processes related to the functionality of the Framework. Such processes capture simpler pieces of system functionality, and each process may involve one or more control classes according to the analysis performed. In the analysis phase we defined four basic processes. Main process: it is responsible for system startup and for the coordination of the other processes, including the creation of the Xlet life cycle. Process of adding objects: it is responsible for creating objects such as text, image and sound, and allows the loading and manipulation of objects in memory. Procedure for handling events: it is responsible for all the input and output events of the system, that is, the user interaction with the game, and it is used to attach actions to objects. Process of converting the game for DTV: it is the fundamental process of the framework; it allows the insertion of each object and performs the processing in an Xlet to run the game on Digital TV or on a set-top box. For the development of Xlet applications, a survey of the libraries available for the DTV environment was needed, so that they would work according to the Ginga-J specifications. The specifications are used to ensure compatibility with the GEM (Globally Executable MHP) standard. For this reason we did not use the new specifications delivered by Sun Microsystems, since the work was already being completed. Ginga-J was initially specified in a manner compatible with the GEM standard and also has a set of APIs specific to ISDTV-T. Thus it is possible to build applications that can run on any middleware, as long as they use the API set specified by GEM.
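Since every generated game ultimately runs as an Xlet, the conversion process has to produce something shaped like the minimal skeleton below (this is the standard JavaTV/GEM Xlet life cycle; the comments about where the game code would hook in are only indicative):

import javax.tv.xlet.Xlet;
import javax.tv.xlet.XletContext;
import javax.tv.xlet.XletStateChangeException;

// Minimal Xlet life cycle as required by the JavaTV/GEM specifications;
// the generated game code would be started from startXlet().
public class GeneratedGameXlet implements Xlet {
    private XletContext context;

    public void initXlet(XletContext ctx) throws XletStateChangeException {
        this.context = ctx;          // keep the context; load objects, sprites and rooms here
    }

    public void startXlet() throws XletStateChangeException {
        // start or resume the game loop and show the first room
    }

    public void pauseXlet() {
        // release scarce resources and pause the game loop
    }

    public void destroyXlet(boolean unconditional) throws XletStateChangeException {
        // free all resources; the application manager will discard this Xlet
    }
}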

Figure 3: Interface to create sprites.

2.3.2 Creating Sounds


Using the interface Create Sound, represented in Figure 4, it is possible to insert sound files. The Create Sound class is responsible for the reading and reproduction of sound and for the volume control parameter, represented by the Sound Manager module.

Figure 4: Interface to create sound.

2.3.3 Creating Scenarios


The interface Create Room is used to define the level, or scenario, of your game. In the scenario it is possible to have a backdrop, known as the background, which can be a color or an image and can be loaded from a file through the interface Create Background. When creating a scenario, this interface allows objects to be positioned on the screen, corresponding to the objects of the game, as well as the generation of multiple instances of the same game object. For example, you can add an object "stone" and use it in multiple locations, or even have multiple instances of an object "craft", all with the same behavior. The scenarios are the entities where the game takes place. Every game needs at least one scenario where the objects and the background are placed beforehand. In addition, text can also be added to the scenario. The Create Room window, corresponding to Figure 5, presents the properties of the room through the tabs Settings, Objects, Background and Titles. The properties are defined as follows: Settings contains the settings window; under Objects the objects to be added are selected; under Backgrounds a background color is inserted; and under Titles it is possible to enter texts and position them.

2.3. Implementation

The GameTVD was developed on a 32-bit PC platform, implemented in the Java language and its APIs, on the Windows Vista operating system; the hardware used was a Pentium Dual-Core 1.7 GHz with 3 GB of memory. The implementation was done through the development of the following modules: Game Manager, Graphic Manager, Sound Manager, Event Manager, Object Manager, Object of the Game and Scenarios Manager. The framework has an intuitive interface, based on a computer game engine called GameMaker [5], which has a friendly interface as shown in Figure 2.


3. CASE STUDY: PAC-MAN GAME


The case study was developed by a graduate student in computer science, with expertise in programming but without experience in gaming and in the technologies associated with DTV, under the supervision of the researcher who developed the GameTVD. The aim is to demonstrate the ease of use of the interfaces. To implement the case study, we chose a game created in the 1980s by the Namco company, named Pac-Man. This game was a phenomenon of popularity and entered the record books as the best remembered video-game icon in the world [6]. The development objective of this game is to allow the testing of the functionalities implemented by GameTVD. The game consists of a mobile object named Pac-Man, which is controlled by the user and positioned in a simple maze full of small and large food tablets and 4 (four) ghosts that haunt this maze. The goal is to "eat" all the food tablets without Pac-Man being caught by the ghosts. Each food tablet accrues points (score), and when Pac-Man eats a large food tablet it gains the ability to "eat" the ghosts for a few seconds. The screens of the game are called rooms. The first room is the title screen, which displays buttons to start the game, to show the help instructions and to finish the game. Figure 7 represents the initial room with its objects; the colors of the buttons were chosen according to the special colored buttons on the remote control, which is on the right side of the emulator.

Figure 5: Interface to create room.

2.3.4 Creating objects


The objects are the features that allow the generation of movements for the sprites. With the interface Create Object, shown in Figure 6, the user can choose which sprite he wants to add, together with its events and actions, which means he can set the object's behavior and create the perception of animation and dynamism.

Figure 6: Interface to create object.

The difference between objects and sprites is that sprites are just images, animated or not, while objects have behaviors and are represented by sprites. Games are made of objects: an object describes an entity, and from it instances are created that perform the actions and take part in the game. The idea of adding objects to games is interaction, and it is necessary to indicate which events are to be added to the object and which actions should be taken after each event. The game's animation is created through a friendly interface where the user can add and delete events and actions.

2.3.5 Creating the DTV Game

The last step is generating the source code of the game as a digital TV application. The code is generated using the classes and methods of the GameTVD framework, which simplifies the process and reduces the amount of source code that has to be produced. The code is created automatically at run time with the basic structure of an Xlet, including its life cycle, and the digital TV environment can be simulated with XletView, which is also invoked by the framework at run time. The idea of GameTVD is that the game is created on the computer through the graphical interface, while a usable, user-friendly framework is responsible for generating the code for DTV. To generate the code, GameTVD translates the data into XML files, one for each object and each room (scenario), so that the game can be loaded and adapted for digital TV applications.
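The basic structure mentioned above follows the standard JavaTV/Ginga-J Xlet life cycle (javax.tv.xlet); what the framework actually places inside each method is not published, so the comments below are assumptions about where the generated game code would go.

import javax.tv.xlet.Xlet;
import javax.tv.xlet.XletContext;
import javax.tv.xlet.XletStateChangeException;

public class GeneratedGameXlet implements Xlet {
    public void initXlet(XletContext context) throws XletStateChangeException {
        // e.g. read the XML files that describe the rooms and objects of the game
    }
    public void startXlet() throws XletStateChangeException {
        // e.g. show the first room and start the game loop
    }
    public void pauseXlet() {
        // e.g. stop the game loop and release scarce resources
    }
    public void destroyXlet(boolean unconditional) throws XletStateChangeException {
        // e.g. free all resources before the Xlet is discarded
    }
}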

3. CASE STUDY: PAC-MAN GAME

The case study was developed by a graduate student in computer science, with expertise in programming but without experience in games or the technologies associated with DTV, under the supervision of the researcher who developed GameTVD. The aim is to demonstrate how easy the interfaces are to use. For the case study we chose Pac-Man, a game created in the 1980s by Namco that became a phenomenon of popularity and entered the record books as the best-remembered video-game icon in the world [6]. Developing this game makes it possible to test the functionalities implemented by GameTVD. The game consists of a mobile object named Pac-Man, controlled by the user, placed in a simple maze full of small and large food tablets and four ghosts that haunt the maze. The goal is to "eat" all the food tablets without Pac-Man being caught by the ghosts. Each food tablet adds points to the score, and when Pac-Man eats a large food tablet it gains the ability to "eat" the ghosts for a few seconds. The screens of the game are called rooms. The first room is the title screen, which displays buttons to begin the game, to show the help with the playing instructions, and to exit the game. Figure 7 shows this initial room with its objects; the colors of the buttons were chosen to match the special colored buttons on the remote control, which appears on the right side of the emulator.

Figure 7: Interface to create initial room.

Figure 8 shows the components of the main stage of the game, represented by sprites.

Figure 8: Pac-Man game.

According to this approach, after inserting the sprites the user must create the game objects. Every game has several object managers, each responsible for managing a group of similar objects. The following objects were therefore created to compose the game scenario, as shown in Figure 9: Food, Ghost, Maze and Buttons.

Figure 9: Room developed in GameTVD.

The objects of the created scenario have actions defined by the framework, which assigns the following responsibilities to them (a code sketch of these rules follows Figure 10):
Pac-Man: responsible for managing its own actions. When the user presses an arrow key, the movement of Pac-Man changes accordingly: the left arrow moves it left, the down arrow moves it down, and so on. Pac-Man also handles its collisions with the other objects on the screen: if it collides with one of the walls of the maze it keeps moving, but if it collides with a ghost it loses a life, until the game ends.
Ghost: controls the ghost's collisions with the other objects in the scene; its sprite is given the action of moving randomly on the screen until it finds Pac-Man.
Small food: when it collides with Pac-Man it disappears and adds points to the game score.
Great food: when it collides with Pac-Man it disappears and allows Pac-Man to "eat" the ghosts for a certain time.
Figure 10 shows the main room of the game running on the digital TV emulator. The maze was created by placing the sprites in an arbitrary layout; it can be modified, or another room can be created if the user wants to play other levels. The game has a single difficulty level, and the screen displays the score achieved by Pac-Man and the number of lives it has left.

Figure 10: Room of the Pac-Man DTV game.
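The following is a minimal sketch of how the collision rules listed above could be expressed; the method and object names (onCollision, smallFood, greatFood, ghost) are assumptions for illustration, not the framework's generated code.

public class PacManRules {
    private int score = 0;
    private int lives = 3;
    private boolean ghostsEdible = false;

    // Called when Pac-Man collides with another object; maze walls are ignored,
    // so Pac-Man simply keeps moving when it touches them.
    void onCollision(String other) {
        if ("smallFood".equals(other)) {
            score += 10;                        // small food only adds points
        } else if ("greatFood".equals(other)) {
            score += 50;
            ghostsEdible = true;                // ghosts become edible for a while
        } else if ("ghost".equals(other)) {
            if (ghostsEdible) {
                score += 200;                   // Pac-Man eats the ghost
            } else {
                lives--;                        // otherwise Pac-Man loses a life
            }
        }
    }
}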

4. TESTS AND RESULTS

The case study was conducted to verify the functionalities of the framework, identifying its qualities, its defects and the modules still needed for developing DTV games. We observed aspects of performance, usability and portability for the Brazilian digital TV standard. Since Ginga-J was not yet complete at the time of this project, we implemented only the specifications that already work well in Ginga-J. The framework creates and runs the games on the PC platform when compiled and generates code compatible with the DTV environment; in the subsequent tests on the emulator, the generated code did not present any incompatibility. It was observed that as more objects are placed in the room (scene), the game takes longer to load. This happens because all the objects described in the XML files must be read and loaded into memory, which increases the load time of the Xlet (a small illustrative sketch of this loading step is given below).

GameTVD facilitates the creation of interfaces and event handling, since no prior knowledge of any programming language is needed to use it. The great difference is that anyone can build a game, whether for entertainment at home or for educational purposes in the classroom, and it can also be used as a tool to support social inclusion. For more experienced users, the framework allowed navigation between the screens during development and, consequently, a better perception of the different parts of game development. However, a major factor in using the framework, the development time of a game, could not be measured quantitatively, although the users' perception is that it saves time and effort in creating a game, because the framework already provides actions for the main events of a game, such as moving, jumping and stopping. GameTVD does not aim to be a definitive solution: it still needs adaptation and testing in a real digital TV environment, which is itself still under development and standardization, and after that we intend to disseminate its use. Moreover, interesting future work would include a module implementing basic artificial intelligence algorithms and support for a scripting language.
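As an illustration of the load-time observation above, the sketch below reads a room's XML file and touches every object element before the game starts; the element name "instance" and the file layout are assumptions, since the actual GameTVD file format is not described in the paper.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RoomLoader {
    // Counts the object instances of a room; in the framework every one of them
    // is turned into an in-memory object during initXlet, so the more instances
    // a room has, the longer the Xlet takes to load.
    public static int countInstances(File roomXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(roomXml);
        NodeList instances = doc.getElementsByTagName("instance");
        return instances.getLength();
    }
}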

5. CONCLUSION

The convergence between television and the world of digital games will increase the games business in Brazil, mainly because of the attraction exerted by games, which already achieve significant results in the Brazilian games industry. However, it is important to think carefully about the development of these games, because the new environment in which they will run has several limitations imposed by the hardware, and because such games differ from computer games. Using the framework proposed in this study to develop games for digital TV, viewers would ultimately be able to create games on their PCs and then make them available to their family and friends through their TV sets. This also gives people who cannot afford a PC the opportunity to play a game, which may have a positive effect on social inclusion. This study contributed to the creation of games for DTV and can be used as a basis for other projects intended to develop games for this platform.

6. REFERENCES

[1] Alencar, M. B. Digital Television Systems, Chapter 9: International System of Digital Television (ISDTV). Cambridge University Press, 2009.
[2] Pinto, L. D. O., Queiroz-Neto, J. P. and Lucena Junior, V. F. An Engineering Educational Application Developed for the Brazilian Digital TV System. In: 38th IEEE Frontiers in Education Conference, Saratoga Springs, 2008, pp. S2F-14 to S2F-19.
[3] Johnson, R. E. Reusing Object-Oriented Design. University of Illinois, 1991.
[4] Gamma, E., Helm, R., Johnson, R. and Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading, MA, 1995.
[5] GameMaker. GameMaker Lite. Available at http://www.yoyogames.com/make. Accessed July 2009.
[6] Barboza, D. and Clua, E. W. G. Ginga Game: A Framework for Game Development for the Interactive Digital Television. In: VIII Simpósio Brasileiro de Jogos e Entretenimento Digital, Rio de Janeiro, 2009, p. 162-
