
Thesis Proposal User-centric Models for Network Applications

Athula Balachandran October 2013

Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Srinivasan Seshan, Chair
Vyas Sekar
Hui Zhang
Peter Steenkiste
Aditya Akella (University of Wisconsin-Madison)

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Abstract

Users are the true end points of various network applications (e.g., Internet video, web browsing). They sustain the advertisement-based and subscription-based revenue models that enable the growth of these applications. However, the design and evaluation of network applications are traditionally based on network-centric metrics (e.g., throughput, latency). Typically, the impact of these metrics on user behavior and quality of experience (QoE) is studied separately, using controlled user studies involving a few tens of users. With recent advances in big data technologies, however, we now have the ability to collect large-scale measurements of network-centric metrics and user access patterns in the wild. Leveraging these measurements, this thesis explores the use of big data techniques, including machine learning approaches, to characterize and capture various user access patterns and to develop user-centric models of quality of experience and user behavior in the wild. Different players such as content providers, ISPs, and CDNs can use these models to improve content delivery.

1 Introduction

Traditionally, the design and evaluation of network applications and protocols have been largely based on simple network-centric metrics such as throughput and latency. Studies of how these network-centric metrics affect user behavior and experience are primarily done through controlled user studies involving a few tens of users [9, 32]. However, users are the true end points of these applications, since they sustain the advertisement-based and subscription-based revenue models that enable their growth. Hence, it is important to understand and architect network applications primarily by taking user behavior and user satisfaction into account. This is becoming more important due to ever-increasing Internet traffic [4] and rising user expectations for quality (lower buffering and higher bitrates for video, lower page load times for web browsing) [1, 23, 33]. Together, these phenomena place the onus on content providers, CDNs, and ISPs to understand and characterize user satisfaction and user behavior in order to explore strategies to cope with increasing traffic and user expectations. The traditional approach of employing small-scale controlled user studies to understand user access patterns is no longer applicable because of the significant heterogeneity of factors (e.g., device, connectivity) in the modern network application setting. Fortunately, due to recent developments in big data technologies, we now have the ability to collect large-scale measurements of actual user access patterns in the wild. This opens up the opportunity to leverage big data techniques and analytics to design and evaluate network applications.
My aim in this thesis is to explore the use of big data technologies, including machine learning techniques, to characterize and capture user access patterns in the wild, and to develop user-centric models of quality of experience and user behavior that different players in the ecosystem (e.g., content providers, CDNs, ISPs) can use to improve network applications. To systematically explore the problem space, I will present the following three studies:

1. First, I look at how we can use big data and machine learning techniques to understand and capture user Quality of Experience (QoE) in the wild. I perform this study using client-side measurements of user access from Internet video providers. Despite the rich literature on video and QoE measurement, our understanding of Internet video QoE is limited because of the shift away from traditional methods of measuring user experience and quality. Traditionally used quality metrics such as Peak Signal-to-Noise Ratio (PSNR) have been replaced by average bitrate, rate of buffering, etc. Similarly, experience is now measured in terms of metrics such as user engagement that are directly related to the revenue of providers. Using machine learning techniques, I develop a predictive model for video QoE from the user access data. Content providers can use the QoE model to select the bitrate and CDN for each session.

2. However, not all players have access to end-point (server-side or client-side) logs of user access patterns. For example, improving and monitoring QoE is an important goal for network carriers and operators. But, unlike other players (e.g., content providers and CDNs), their knowledge of user access patterns is limited to information that can be extracted from network flow traces. Next, I look at how we can overcome this challenge by using data-driven techniques to infer quality metrics and user experience measures from network flow traces, and I analyze quality of experience for mobile web browsing sessions using network traces from a mobile ISP.

3. Third, I look at how we can use data-driven techniques to characterize various user access patterns that have important implications for system design. We identified various user behaviors (e.g., evolution of interest in content) that can be used to profile and classify users. We also observed several aggregate user behaviors (e.g., partial interest in content, regional interests) that have important implications for the design of content delivery infrastructure. We use these observations to profile individual users and generate population models.
In this proposal, I will discuss existing results and planned work for the above three studies. In Section 2, I discuss my work on using machine learning techniques to develop a predictive model for Internet video Quality of Experience. In Section 3, I present results and planned work toward analyzing QoE for mobile web browsing sessions using network flow traces. In Section 4, I present various viewing patterns and population models that we observed in our dataset, along with planned work to extend the results, before presenting the proposed timeline for completing the thesis in Section 5.

2 Developing a Predictive Model for Internet Video Quality of Experience

The growth of Internet video has been driven by the confluence of low content delivery costs and the success of subscription-based and advertisement-based revenue models [5]. Hence, there is agreement among leading industry and academic initiatives that improving users' quality of experience (QoE) is crucial to sustaining these revenue models, especially as user expectations of video quality are steadily rising [1, 23, 33]. However, our understanding of Internet video QoE is limited despite a very rich history in the multimedia community [6, 7, 9]. The reason is that Internet video introduces new effects with respect to both quality and experience. First, traditional quality indices (e.g., Peak Signal-to-Noise Ratio (PSNR) [8]) are now replaced by metrics that capture delivery-related effects such as rate of buffering, bitrate delivered, bitrate switching, and join time [1, 15, 23, 37, 43]. Second, traditional methods of quantifying experience through user opinion scores are replaced by new measurable engagement measures, such as viewing time and number of visits, that more directly impact content providers' business objectives [1, 43].

[Figure 1: Challenges in developing a QoE model for Internet video. (a) Complex relationships: fraction of content viewed (%) vs. average bitrate (Kbps). (b) Interaction between metrics: join time (s) vs. average bitrate (Kbps). (c) Confounding factors: fraction of content viewed (%) vs. rate of buffering (/minute) for DSL/cable providers and wireless providers.]

In [17], we described a systematic methodology to develop a predictive model for Internet video QoE. We identified two key requirements that any such model should satisfy. First, we want an engagement-centric model that accurately predicts user engagement in the wild (measured as the fraction of the video viewed before quitting). Second, the model should be actionable and useful for guiding the design of video delivery mechanisms; e.g., content providers can use it to evaluate the cost-performance tradeoffs of different CDNs and bitrates [3, 37], and adaptive video player designers can use it to trade off bitrate, join time, and buffering [13, 25, 42]. However, meeting these requirements is very challenging. In Section 2.1, I summarize the three main challenges in developing a predictive model. In Section 2.2, I present our data-driven approach to developing a robust model. In Section 2.3, I demonstrate a practical use of the QoE model to improve content delivery, before discussing related work in Section 2.4. The results are based on data collected by conviva.com over 3 months, spanning two popular video content providers (based in the US) and consisting of around 40 million video sessions.

2.1 Challenges

We use our dataset to highlight the three main challenges in developing an engagement-centric model for video QoE.

Complex relationships: The relationships between individual quality metrics and user engagement are very complex. These relationships were shown by Dobrian et al., and we reconfirm some of their observations [23]. For example, one might assume that a higher bitrate should result in higher user engagement. Surprisingly, the relationship between them is non-monotonic, as shown in Figure 1a. The reason is that videos are served at specific bitrates, so average bitrates that fall between these standard bitrates correspond to clients that had to switch bitrates during the session. These clients likely experienced higher buffering, which led to a drop in engagement.

Complex interaction between metrics: The quality metrics are interdependent in complex ways. For example, streaming video at a higher bitrate leads to better quality. However, as shown in Figure 1b, it takes longer for the video player buffer to fill sufficiently to start playback, leading to higher join times.

Confounding factors: In addition to the quality metrics, several external factors also directly or indirectly affect user engagement [33]. A confounding factor can affect engagement and quality metrics in the following three ways. First, some factors may affect user viewing behavior itself and result in different observed engagement. For example, we observed that live and VOD video sessions have significantly different viewing patterns [17] (not shown here). Second, the confounding factor can impact the quality metric. For example, we observed that the join time distributions for live and VOD sessions are considerably different [17] (not shown here). Finally, and perhaps most importantly, the confounding factor can affect the relationship between the quality metrics and engagement. For example, Figure 1c shows that users on wireless connectivity are more tolerant of buffering than users on DSL/cable connectivity.

[Figure 2: Machine learning approach to capture QoE. (a) Compare models: accuracy (%) vs. number of classes for Naive Bayes, decision tree, regression, single-metric regression, and a random coin toss. (b) Split vs. feature: improvement in accuracy (%) for each confounding factor under the split and feature-addition approaches.]

2.2 Approach

The three main steps in our approach to overcoming the above challenges and developing a predictive model are as follows:

1. Tackling complex relationships and interactions: We cast the problem of modeling the relationship between the different quality metrics and engagement as a machine learning problem and use discrete classification algorithms. Engagement is classified based on the fraction of the video that the user viewed before quitting. For example, when the number of classes is set to 5, the model tries to predict whether the user viewed 0-20%, 20-40%, 40-60%, 60-80%, or 80-100% of the video before quitting. We use similar domain-specific discrete classes to bin the different quality metrics. Figure 2a compares the performance of three different machine learning algorithms: binary decision trees, naive Bayes, and classification based on linear regression. The results are based on 10-fold cross-validation [39]. We need to be careful about using machine learning as a black box in two respects. First, the learning algorithms must be expressive enough to tackle our challenges. As observed in Figure 2a, approaches such as Naive Bayes, which assumes that the quality metrics are independent variables, or simple regression techniques, which implicitly assume that the relationships between quality and engagement are linear, are unlikely to work. Second, we do not want an overly complex machine learning algorithm that becomes unintuitive or unusable for practitioners. Fortunately, we find that decision trees, which are generally perceived as usable, intuitive models [35, 44], are also the most accurate.

2. Identifying confounding factors: As mentioned in Section 2.1, confounding factors can have the following three effects:
(1) They can affect the observed engagement.
(2) They can affect the observed quality metric and thus indirectly impact engagement.
(3) They can impact the nature and magnitude of the quality→engagement relationship.
For (1) and (2), we use information gain analysis to identify whether there is a hidden relationship between the potential confounding factor and engagement or the quality metrics. For (3), we identify two sub-effects: the impact of the confounding factor on the quality→engagement relationship can be qualitative (i.e., the relative importance of the different quality metrics may change) or quantitative (i.e., the tolerance to one or more of the quality metrics may differ). For the qualitative sub-effect, we use the technique described in [35] to compact the decision tree separately for each class (e.g., TV vs. mobile vs. PC) and compare the tree structures across classes. For the quantitative sub-effect, we simply check whether there is any significant difference in tolerance. More details can be found in [17]. We identify a few potential confounding factors from our dataset (Table 1) and perform each of these tests on all of them.

[Table 1: Summary of the confounding factors. Columns: Engagement, Quality, Q→E (qualitative), Q→E (quantitative); a check mark indicates whether a factor impacts engagement, quality, or the quality→engagement relationship. Rows: type of video (live or VOD); overall popularity (live); overall popularity (VOD); time since release (VOD); time of day (VOD); day of week (VOD); device (live); device (VOD); region (live); region (VOD); connectivity (live); connectivity (VOD). The highlighted rows show the key confounding factors that we identify and use to refine our predictive model.]
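The following is a minimal sketch of steps 1 and 2, not the implementation from [17]. It assumes engagement is binned into equal-width classes of fraction viewed (as in the 5-class example above) and that a candidate confounding factor is a categorical attribute. The function names `engagement_class`, `entropy`, and `information_gain` are hypothetical. A factor whose information gain with respect to the engagement class is clearly above zero would be flagged by test (1).

```python
import math
from collections import Counter, defaultdict


def engagement_class(fraction_viewed: float, num_classes: int = 5) -> int:
    """Map the fraction of a video viewed (0.0-1.0) to an equal-width class.

    With num_classes=5: 0-20% -> 0, 20-40% -> 1, ..., 80-100% -> 4.
    """
    if not 0.0 <= fraction_viewed <= 1.0:
        raise ValueError("fraction_viewed must lie in [0, 1]")
    # min() keeps a fully watched video (1.0) in the top class.
    return min(int(fraction_viewed * num_classes), num_classes - 1)


def entropy(labels) -> float:
    """Shannon entropy (bits) of a sequence of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(factor_values, labels) -> float:
    """IG = H(labels) - sum_v P(factor=v) * H(labels | factor=v)."""
    groups = defaultdict(list)
    for value, label in zip(factor_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy sessions: (type of video, fraction of video viewed).
sessions = [("live", 0.10), ("live", 0.15), ("vod", 0.85), ("vod", 0.95)]
video_type = [t for t, _ in sessions]
classes = [engagement_class(f) for _, f in sessions]
print(classes)                                # [0, 0, 4, 4]
print(information_gain(video_type, classes))  # 1.0
```

In this made-up data the type of video fully determines the engagement class, so the gain equals the full label entropy (1 bit); a factor unrelated to engagement would score close to 0.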
We acknowledge that this list is only representative, as we account only for factors that can be measured directly and objectively. We take a very conservative stance and mark a factor as confounding if any of the tests is positive. The results are shown in Table 1.

3. Incorporating confounding factors: There are two candidate approaches to incorporating the confounding factors that we identified into the predictive model:

Add as new feature: The simplest approach is to add the key confounding factors as additional features in the input to the machine learning algorithm and relearn the prediction model.

Split data: Another possibility is to split the data based on the confounding factors (e.g., live on mobile device) and learn a separate model for each split. Our predictive model would then be the logical union of multiple decision trees, one for each combination of the values of the various confounding factors.

Each of the two approaches has its pros and cons. While the feature-addition approach has the appeal of being simple and requiring minimal modifications to the machine learning framework, it assumes that the learning algorithm is robust enough to capture the effects caused by the confounding factors. The split-data approach avoids any doubts we may have about the expressiveness of the machine learning algorithm. The challenge with the split approach is the curse of dimensionality: the available data per split becomes progressively sparser as the number of splits increases. However, this concern is alleviated because we have already pruned the set of potentially confounding external factors, and because the growth of Internet video will let us capture ever larger datasets. We analyze the improvement in prediction accuracy that each approach gives for different datasets in Figure 2b and observe that the split method performs as well as or better than the feature-addition approach. The reason is related to the decision tree algorithm. Decision trees use information gain to identify the best attribute to branch on. Information-gain-based schemes, however, are biased towards attributes that have many levels [22]. While we bin all the quality metrics at an extremely fine level, the confounding factors have only a few categories. This biases the decision tree towards always selecting the quality metrics as more important.
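A small, hypothetical example makes this bias concrete. With eight sessions, a finely binned metric that puts every session in its own bin reaches the maximum possible information gain, H(engagement) = 2 bits, simply because every branch is pure, while a two-level factor that cleanly separates low from high engagement scores only 1 bit. The data and names below are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels) -> float:
    """Shannon entropy (bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(attribute, labels) -> float:
    """Entropy reduction from splitting `labels` on the values of `attribute`."""
    groups = defaultdict(list)
    for value, label in zip(attribute, labels):
        groups[value].append(label)
    total = len(labels)
    return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())


# Hypothetical engagement classes for eight sessions.
engagement = [0, 0, 1, 1, 2, 2, 3, 3]
# Two-level confounding factor: separates low (0-1) from high (2-3) engagement.
video_type = ["live"] * 4 + ["vod"] * 4
# Finely binned quality metric with a distinct bin for every session.
bitrate_bin = list(range(8))

print(information_gain(video_type, engagement))   # 1.0
print(information_gain(bitrate_bin, engagement))  # 2.0
```

A tree builder that greedily maximizes information gain would branch on `bitrate_bin` first, even though a per-session bin carries no generalizable signal, which mirrors why the feature-addition approach under-uses the coarse confounding factors.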
Final model: We observed many users who sample the video and quit early if it is not of interest to them. Taking this domain-specific observation into account, we remove these early-quitter sessions from our dataset and relearn the model, leading to a 6% increase in accuracy. Further, incorporating the three key confounding factors (type of video, device, and connectivity), we propose a unified QoE model based on splitting the dataset by these confounding factors and learning multiple decision trees, one for each split. Accounting for all the confounding factors leads to a further improvement of around 18%. Our final model predicts the fraction of video viewed within the same 10% bucket as the actual user viewing duration with an accuracy of 70%.
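A minimal sketch of this split-data model, under loudly stated assumptions: each session carries a tuple of already-binned quality metrics; the early-quitter cutoff `min_fraction` is a hypothetical parameter (the proposal does not state the rule used); and `train_majority_lookup` is a deliberately simple stand-in learner (the most common 10% bucket per quality tuple), not the decision trees used in [17]. A decision-tree learner could be passed through the `learner` argument instead. All names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    video_type: str         # "live" or "vod"
    device: str             # e.g. "pc", "tv", "mobile"
    connectivity: str       # e.g. "dsl", "wireless"
    quality: tuple          # binned quality metrics, e.g. (bitrate_bin, buffering_bin)
    fraction_viewed: float  # engagement, 0.0-1.0


def bucket(fraction: float, n: int = 10) -> int:
    """10% engagement bucket (0..9); a fully watched video falls in bucket 9."""
    return min(int(fraction * n), n - 1)


def split_key(s: Session) -> tuple:
    """The three key confounding factors define one split per combination."""
    return (s.video_type, s.device, s.connectivity)


def drop_early_quitters(sessions, min_fraction: float):
    """Hypothetical filter: sessions below min_fraction are treated as sampling."""
    return [s for s in sessions if s.fraction_viewed >= min_fraction]


def train_majority_lookup(sessions):
    """Stand-in learner (NOT a decision tree): most common bucket per quality tuple."""
    per_quality = defaultdict(Counter)
    for s in sessions:
        per_quality[s.quality][bucket(s.fraction_viewed)] += 1
    fallback = Counter(bucket(s.fraction_viewed) for s in sessions).most_common(1)[0][0]
    table = {q: counts.most_common(1)[0][0] for q, counts in per_quality.items()}
    return lambda quality: table.get(quality, fallback)


def train_split_models(sessions, learner=train_majority_lookup):
    """Learn one model per combination of confounding-factor values."""
    splits = defaultdict(list)
    for s in sessions:
        splits[split_key(s)].append(s)
    return {key: learner(group) for key, group in splits.items()}


def bucket_accuracy(models, sessions) -> float:
    """Fraction of sessions whose predicted 10% bucket matches the actual one."""
    hits = 0
    for s in sessions:
        model = models.get(split_key(s))  # a split never seen in training counts as a miss
        if model is not None and model(s.quality) == bucket(s.fraction_viewed):
            hits += 1
    return hits / len(sessions)


train = [
    Session("live", "pc", "dsl", ("hi",), 0.95),
    Session("live", "pc", "dsl", ("hi",), 0.91),
    Session("live", "pc", "dsl", ("lo",), 0.35),
    Session("vod", "mobile", "wireless", ("lo",), 0.72),
    Session("vod", "mobile", "wireless", ("lo",), 0.05),  # an early quitter
]
kept = drop_early_quitters(train, min_fraction=0.1)
models = train_split_models(kept)
print(len(kept), bucket_accuracy(models, kept))  # 4 1.0
```

Accuracy here is measured on the training data only to show the mechanics; the 70% figure above comes from the study's own evaluation, not from this sketch.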

2.3 Implication for System Design

The QoE model that we developed can be used by various principals in the Internet video ecosystem to guide system design decisions (e.g., video player designers can use the model to design efficient bitrate adaptation algorithms, and CDNs can use it to pick bitrates). We evaluate the model in the context of a (hypothetical) control plane [36] that content providers can use to choose the CDN and bitrate for each session using a global optimization framework. For this evaluation, we also need a quality model that predicts the various quality metrics for a given session. We use a simplified version of the quality prediction model proposed in prior work [36], which computes the mean performance (buffering ratio, rate of buffering, and join time) for each combination of attributes (e.g., type of video, ISP, region, device) and control parameters (e.g., bitrate and CDN) using empirical estimation.
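A minimal sketch of such an empirical quality model, assuming historical sessions are available as dictionaries. The field names (`isp`, `cdn`, `buff_ratio`, and so on) and the function name are hypothetical, and the model in [36] may differ, for example in how it handles cells with few sessions.

```python
from collections import defaultdict
from statistics import mean


def fit_quality_model(sessions, attribute_keys, control_keys, metrics):
    """Empirical quality model: mean of each metric per (attributes, controls) cell.

    sessions: iterable of dicts holding attribute, control, and metric fields.
    Returns {cell_key: {metric: mean_value}}.
    """
    cells = defaultdict(list)
    for s in sessions:
        key = tuple(s[k] for k in (*attribute_keys, *control_keys))
        cells[key].append(s)
    return {
        key: {m: mean(s[m] for s in group) for m in metrics}
        for key, group in cells.items()
    }


history = [
    {"isp": "isp1", "device": "pc", "cdn": "A", "bitrate": 1000, "buff_ratio": 0.02, "join_time": 2},
    {"isp": "isp1", "device": "pc", "cdn": "A", "bitrate": 1000, "buff_ratio": 0.04, "join_time": 4},
    {"isp": "isp1", "device": "pc", "cdn": "B", "bitrate": 1000, "buff_ratio": 0.10, "join_time": 6},
]
model = fit_quality_model(
    history, ("isp", "device"), ("cdn", "bitrate"), ("buff_ratio", "join_time")
)
print(model[("isp1", "pc", "A", 1000)]["join_time"])  # 3
```

Querying the returned dictionary with a session's attributes and a candidate (CDN, bitrate) pair yields the predicted quality metrics that the strategies below consume.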

[Figure 3 panels: (a) VOD and (b) Live. Each panel plots the average engagement (fraction of video viewed) per dataset (PC, TV, Mobile, Overall) for the Smart QoE, Baseline, Smart CDN + Lowest BuffRatio, Smart CDN + Highest Bitrate, and Smart CDN + Utility Function strategies.]

Figure 3: Comparing the predicted average engagement for the different strategies.

Using this framework, we compare different strategies for picking the control parameters (CDN and bitrate):

Smart QoE approach: This approach uses a predicted quality model and a predicted QoE model based on historical data. To choose the best control parameters for a particular session, we estimate the expected engagement for all possible combinations of CDNs and bitrates by querying the predicted quality model and the predicted QoE model with the appropriate attributes (ISP, device, etc.), and we assign the CDN and bitrate combination that gives the best predicted engagement.

Smart CDN approaches: We find the best CDN for a given combination of attributes (region, ISP, and device) using the predicted quality model, by comparing the mean buffering ratio of each CDN across all bitrates, and we assign clients to this CDN. We implement three variants for picking the bitrate:
(a) Smart CDN, highest bitrate: The client always streams at the highest available bitrate.
(b) Smart CDN, lowest buffering ratio: The client is assigned the bitrate that is expected to cause the lowest buffering ratio according to the predicted quality model.
(c) Smart CDN, control plane utility function: The client is assigned the bitrate that maximizes the utility function (−3.7 × BuffRatio + Bitrate/20), which was the optimization goal in prior work [36].

Baseline: We implemented a naive approach in which the client picks a CDN and bitrate randomly.

We quantitatively evaluate the benefits of these techniques using a trace-based simulation. We use a week-long trace to simulate client attributes and arrival times. In each epoch (a one-hour time slot), a number of clients with varying attributes (type of video, ISP, device) arrive. For each client session, we assign the CDN and bitrate according to each of the strategies described above.
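The selection rules can be sketched over two predictive models passed in as functions. This is a minimal illustration under stated assumptions, not the simulator used in the study: `quality_model(attrs, cdn, bitrate)` is assumed to return a dict with a `buff_ratio` entry, `qoe_model(attrs, bitrate, quality)` is assumed to return predicted engagement, and both signatures, the function names, and the toy numbers are hypothetical. The utility-function variant is omitted for brevity.

```python
from itertools import product
from statistics import mean


def smart_qoe(attrs, cdns, bitrates, quality_model, qoe_model):
    """Pick the (CDN, bitrate) pair with the highest predicted engagement."""
    return max(
        product(cdns, bitrates),
        key=lambda cb: qoe_model(attrs, cb[1], quality_model(attrs, cb[0], cb[1])),
    )


def smart_cdn(attrs, cdns, bitrates, quality_model):
    """Pick the CDN with the lowest mean predicted buffering ratio across bitrates."""
    return min(
        cdns,
        key=lambda c: mean(quality_model(attrs, c, b)["buff_ratio"] for b in bitrates),
    )


def smart_cdn_highest_bitrate(attrs, cdns, bitrates, quality_model):
    return smart_cdn(attrs, cdns, bitrates, quality_model), max(bitrates)


def smart_cdn_lowest_buffering(attrs, cdns, bitrates, quality_model):
    cdn = smart_cdn(attrs, cdns, bitrates, quality_model)
    return cdn, min(bitrates, key=lambda b: quality_model(attrs, cdn, b)["buff_ratio"])


# Toy predicted buffering ratios per (CDN, bitrate in Kbps).
BUFF = {("A", 400): 0.01, ("A", 1000): 0.02, ("A", 3000): 0.10,
        ("B", 400): 0.02, ("B", 1000): 0.03, ("B", 3000): 0.20}


def toy_quality(attrs, cdn, bitrate):
    return {"buff_ratio": BUFF[(cdn, bitrate)]}


def toy_qoe(attrs, bitrate, quality):
    # Made-up engagement model: rewards bitrate, penalizes buffering.
    return 0.5 + bitrate / 10000 - 5 * quality["buff_ratio"]


args = ({"device": "pc"}, ["A", "B"], [400, 1000, 3000], toy_quality)
print(smart_qoe(*args, toy_qoe))          # ('A', 1000)
print(smart_cdn_highest_bitrate(*args))   # ('A', 3000)
print(smart_cdn_lowest_buffering(*args))  # ('A', 400)
```

With these toy models, the QoE-driven rule settles on an intermediate bitrate, while the two fixed bitrate rules go to the extremes, which loosely mirrors the behavior reported for Figure 3.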
For simplicity, we assume that the CDNs are sufficiently provisioned and do not degrade in performance during our simulation. To evaluate the performance of these strategies, we develop actual engagement and quality models based on the empirical data from the current measurement epoch and compare the engagement predicted by these models for each session. Since the arrival patterns and the client attributes are the same for all the strategies, they have the same denominator in each epoch. Figure 3 compares the performance of the different strategies for the live and VOD datasets, broken down by device type. As expected, the baseline scheme has the worst performance. The smart QoE approach can potentially improve user engagement by up to 2× compared to the baseline scheme. We observed that the smart CDN with lowest buffering ratio scheme picks the lowest bitrates, and hence its expected engagement is lower than that of the other smart schemes. The smart CDN with utility function and smart CDN with highest bitrate approaches have very comparable performance, because the utility function favors the highest bitrate in most cases. Our smart QoE approach picks intermediate bitrates and dynamically shifts between lower and higher bitrates based on the various attributes and the predicted quality. Thus, it can potentially improve user engagement by more than 20% compared to the other strategies.

2.4 Related Work

Engagement in Internet video: Past measurement studies provide a simple quantitative understanding of the impact of individual quality metrics on engagement [23, 33]. We shed further light and provide a unified understanding of how all the quality metrics, taken together, impact engagement by developing a QoE model. Similarly, although previous studies have shown that a few external factors (e.g., connectivity) affect user engagement [33], no existing technique identifies whether an external factor is potentially confounding. We extend our previous work [15] by developing techniques to identify confounding external factors and to incorporate them into a unified QoE model.

User studies: Prior work by the multimedia community assesses video quality by performing subjective user studies and validating objective video quality models against the user study scores [9, 20, 32, 38, 40]. User studies are typically done at a small scale with a few hundred users, and the perceptual scores given by users in a controlled setting may not translate into measures of user engagement in the wild. The data-driven approach that we propose is scalable and produces an engagement-centric model.

QoE metrics in other media: There have been attempts to study the impact of network factors on user engagement and user satisfaction in the context of other media technologies. For example, in [10], the authors study the impact of bitrate, jitter, and delay on call duration in Skype and propose a unified user satisfaction metric that combines these factors.

3 Analyzing Quality of Experience for Mobile Web Browsing Sessions

With the proliferation of smartphone applications, cellular network operators are now expected to support and provide quality of experience (QoE) comparable to wireline networks for several applications. However, measuring QoE is extremely challenging for a network operator, compared to other players such as CDNs and content providers, for the following reasons:

1. Unlike CDNs and content providers, ISPs do not have access to the client-side and server-side logs that are used extensively for QoE estimation in prior work [17].
2. The physical environment plays a critical role in wireless user experience. Hence, it is not practical to use active probes to measure and understand application QoE.
3. Applications running over cellular networks interact in complex ways with a number of different protocol layers. This leads to trade-offs among several performance characteristics (e.g., latency vs. capacity [29], average throughput vs. link disruptions [30]). Because of these complex interactions and trade-offs, the relationship between network characteristics and QoE is poorly understood.

To understand how different network characteristics affect user QoE, cellular network operators first need to develop techniques to extract ground-truth benchmark data on user engagement and other quality metrics from passive network traces. I next present my study on analyzing QoE for mobile web browsing sessions using passive network flow traces and radio access network traces collected by AT&T over two months. It examines whether the techniques proposed and evaluated for developing a QoE model for Internet video can be extended to a new domain (web browsing) under a more constrained setting (e.g., the lack of client-side logs on user engagement).

3.1 Limitations of Previous Work

Previous work on measuring web browsing QoE has focused on the impact of page load time on user experience [19, 21]. Like past Internet video QoE studies, these were controlled user studies: researchers installed browser plugins that monitor web page load time and frequently prompt users for feedback on their experience (e.g., satisfied or not). This approach has two major limitations. First, page load time does not capture all aspects of a web browsing session. For instance, since web pages are downloaded progressively, users can interact with and browse a page even before it has completely loaded. To get a holistic view of the impact of network characteristics on mobile web browsing QoE, we also need to study the impact of new metrics, such as network-flow-based parameters (e.g., TCP reset flags, partially downloaded objects) and radio-network-based parameters (e.g., cellular handover, radio signal strength). Second, user studies are typically conducted in a controlled setting with a few tens of users, and it is impossible to cover all possible scenarios in such a setting. By leveraging the data collected by providers, we can replace user feedback with engagement measured in the wild. In this study, I use web session length, measured as the number of clicks that the user makes within a session, as the measure of engagement. However, as mentioned earlier, estimating engagement is challenging for a network operator, since it does not have access to client-side or server-side logs. In what follows, I first present a few initial results and then discuss my plans to complete this study.

3.2 Preliminary Results

Estimating session length: Since ISPs do not have access to client-side or server-side logs, they need to estimate the ground truth for engagement (measured as web session length) from passive network traces. I worked on measuring web session length from HTTP traces. The HTTP trace consists of GET requests for web objects, generated either by (1) the user clicking on a link or (2) automatic requests for embedded objects in web pages. The main challenge in estimating session length is differentiating clicks from embedded objects. I first worked on classifying requests within a single domain (www.cnn.com). By investigating the domain names, I was able to find patterns for web clicks. Codifying these patterns as regular expressions, we collected ground truth for the data and classified URLs as either web clicks or embedded objects.

[Figure 4: Selected results from analyzing mobile web browsing QoE. (a) Arrival time distribution: CDF (% of sessions) of request arrival time for embedded objects and page clicks. (b) Partial download ratio vs. session length: average session length vs. fraction of partially downloaded objects. (c) Average session length vs. fraction of flows with client-to-server (C->S) reset flags.]

We explored the following three techniques to automate this classification and generalize it to other domains:

Using inter-arrival time: Based on the assumption that embedded objects have low inter-arrival times, since they are triggered automatically, whereas clicks require human intervention, I tried to find an inter-arrival-time threshold that differentiates the two. The inter-arrival time distribution is shown in Figure 4a. Using a threshold of 2 seconds to differentiate between the two led to about 85% accuracy. Investigating the misclassified cases, we found that CNN uses various third-party services (e.g., Disqus, ScoreCard Research) that regularly send beacons to track user web browsing behavior. Although these are requests for embedded objects, this technique wrongly classifies them as clicks.

Using domain name: We then classified requests based on domain names, representing each domain name as a bag of words. We obtained the training set by looking only at the first 10 seconds of a session and assuming that the first request during this period is a click and the rest are embedded objects. Using this training set, we learned a model with the Naive Bayes algorithm. The model classified clicks and embedded objects with 92% accuracy.
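Both of the first two techniques can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: `classify_by_gap` applies the 2-second inter-arrival threshold and treats the first request of a session as a click, and `DomainNaiveBayes` is a textbook multinomial Naive Bayes with add-one smoothing over dot- and hyphen-separated domain tokens. The tokenization, the smoothing, the class names, and the hand-labeled example domains are assumptions; in the study, training labels came from the first-10-seconds heuristic described above.

```python
import math
import re
from collections import Counter, defaultdict


def classify_by_gap(timestamps, threshold_s=2.0):
    """Label a request a click if it arrives >= threshold_s after the previous one."""
    labels, prev = [], None
    for t in sorted(timestamps):
        labels.append("click" if prev is None or t - prev >= threshold_s else "embedded")
        prev = t
    return labels


def tokenize_domain(domain):
    """Bag-of-words view of a domain name: split on dots and hyphens."""
    return [tok for tok in re.split(r"[.\-]", domain.lower()) if tok]


class DomainNaiveBayes:
    """Multinomial Naive Bayes over domain tokens with add-one smoothing."""

    def fit(self, domains, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for domain, label in zip(domains, labels):
            tokens = tokenize_domain(domain)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, tokens, label):
        n_docs = sum(self.class_counts.values())
        n_tokens = sum(self.token_counts[label].values())
        score = math.log(self.class_counts[label] / n_docs)
        for tok in tokens:
            score += math.log((self.token_counts[label][tok] + 1) / (n_tokens + len(self.vocab)))
        return score

    def predict(self, domain):
        tokens = tokenize_domain(domain)
        return max(self.class_counts, key=lambda label: self.log_posterior(tokens, label))


print(classify_by_gap([0.0, 0.3, 0.5, 12.0, 12.1]))
# ['click', 'embedded', 'embedded', 'click', 'embedded']

train_domains = ["www.cnn.com"] * 3 + ["i.cdn.turner.com", "z.cdn.turner.com",
                                       "b.scorecardresearch.com", "disqus.com"]
train_labels = ["click"] * 3 + ["embedded"] * 4
nb = DomainNaiveBayes().fit(train_domains, train_labels)
print(nb.predict("www.cnn.com"), nb.predict("cdn.turner.com"))  # click embedded
```

The threshold rule shows the failure mode described above: a periodic beacon arriving more than 2 seconds after the previous request would be labeled a click, whereas the domain-based model can learn that beacon domains belong to the embedded class.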
Using domain name, URN, and type of object: Incorporating information about the type of object and the URN along with the domain-name bag of words and relearning the model led to 98% classification accuracy.

Session length vs. network-flow features: We also investigated the relationship between session length and a few network-flow features. Partial download ratio and client-to-server reset flag ratio are two network-flow metrics that we measured from the traces. We observed several objects that are partially downloaded (i.e., downloaded bytes < content length). We calculated the fraction of objects in a session that are partially downloaded and performed a correlation analysis with session length, as shown in Figure 4b. We observed that sessions with a higher partial download ratio had shorter session lengths. Another metric that we used to characterize a session is the fraction of TCP flows in the session that had client-to-server reset flags. We observed that the higher this ratio, the shorter the session (Figure 4c). In short, as expected, worse flow metrics resulted in shorter sessions. We plan to extend this study by incorporating more network-flow metrics as well as radio network metrics.
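The two per-session flow metrics and the correlation step can be sketched as follows. The input representations (a list of `(downloaded_bytes, content_length)` pairs per session and one boolean per TCP flow marking a client-to-server reset) are assumptions, as is the use of Pearson's coefficient, since the proposal does not say which correlation measure was used. The sessions are made up.

```python
import math


def partial_download_ratio(objects):
    """objects: (downloaded_bytes, content_length) pairs for one session."""
    partial = sum(1 for got, length in objects if got < length)
    return partial / len(objects)


def reset_flag_ratio(flow_has_client_reset):
    """One boolean per TCP flow: True if the client sent a reset to the server."""
    return sum(flow_has_client_reset) / len(flow_has_client_reset)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sessions: (objects, per-flow reset flags, session length in clicks).
sessions = [
    ([(100, 100), (300, 300), (50, 50), (80, 80)], [False, False, False, False], 4),
    ([(100, 100), (10, 300), (50, 50), (80, 80)], [True, False, False, False], 3),
    ([(40, 100), (10, 300), (50, 50), (80, 80)], [True, True, False, False], 2),
    ([(40, 100), (10, 300), (5, 50), (80, 80)], [True, True, True, False], 1),
]
pdr = [partial_download_ratio(objs) for objs, _, _ in sessions]
rst = [reset_flag_ratio(flows) for _, flows, _ in sessions]
length = [clicks for _, _, clicks in sessions]
print(pdr)                     # [0.0, 0.25, 0.5, 0.75]
print(pearson(pdr, length))    # close to -1.0
print(pearson(rst, length))    # close to -1.0
```

Real traces would show a much weaker, noisier negative relationship than this perfectly linear toy data; the sketch only shows how each session is reduced to a ratio before correlating it with session length.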

3.3 Planned Work

Better session length models: We want to further improve the session length model by (1) trying other, more advanced machine learning algorithms for the click vs. embedded object classification, and (2) adding more input features (e.g., inter-arrival time) to the model. Further, we want to test whether our technique for classifying web clicks and embedded objects works for other types of websites (e.g., blogging (Tumblr), shopping (Amazon), social networking (Twitter)). This would require collecting ground truth for these domains and testing our final classification algorithm against it.

Unified QoE model: We want to incorporate more features, including radio-network-level features (e.g., handoffs, signal strength), to gain a more holistic understanding of how network features affect web browsing QoE. The first step would be to extract these features from the dataset and then perform correlation analysis to understand the relationships. I would then use machine learning techniques to build a QoE model similar to the one in the Internet video study. It would also be interesting to study whether the relationship between performance and engagement is affected by external confounding factors (e.g., the type of website).

Characterizing Internet Video Viewing Behavior In the Wild for User Customization

The previous two studies leveraged big data techniques to understand how network characteristics affect users' quality of experience. Another application of big data techniques is to characterize and model user behavior in the wild, in order to customize the design of network applications and delivery infrastructure to individual users and to general trends. In [16], we studied various user behaviors and their implications for CDN augmentation strategies. In this section, I present some of these results.

The data used for this analysis was collected by conviva.com in real time using a client-side instrumentation library in the video player. We focus on data queried over two months (around 30 million video sessions) from two US-based content providers. The first provider serves VOD objects between 35 and 60 minutes long, comprising TV series, news shows, and reality show episodes. The second provider serves sports events that are broadcast live while the event is happening, and hence viewing behavior is synchronized.

Network applications can be customized based on general trends in user behavior as well as on individual user access patterns. In Section 4.1, I present selected measurements of general trends in video viewing patterns that have significant implications for delivery infrastructure design. In Section 4.2, I present some observations that can be used to profile and classify individual users in order to customize the application. I then present planned work to extend these results in Section 4.3 and discuss related work in Section 4.4.

Figure 5: User access patterns. (a) Regional effect: CDF over live video objects of the Pearson correlation coefficient, for regional and non-regional content. (b) Time-of-day effect: normalized number of accesses vs. time (hours, PST) for regions 1, 2 and 9. (c) Partial interest, VOD: fraction of sessions vs. fraction of the video viewed (%), data and mixture model. (d) Partial interest, live: fraction of sessions vs. fraction of the video viewed (%), data and model.

4.1 General Trends

Some of the interesting general trends in video viewing behavior that we observed are:

1. Regional effects: Typically, the number of accesses to a particular piece of content from a geographical region is strongly correlated with the total population of the region. However, in our live dataset, we observed anomalies for content with region-specific interest (e.g., when a local team is playing a game). Our data consists only of clients within the United States, so we classified content as regional or non-regional based on whether it appeals to a particular region within the US. Sports matches between local teams within the US (e.g., NCAA) were classified as regional, as opposed to events that are non-regional for US viewers (e.g., Eurocup soccer). Figure 5a shows the CDF of the Pearson correlation coefficient [39] between the number of accesses from each region and the population of the region (obtained from census data [2]) for all live video objects. Access rates of non-regional content show strong correlation with population, whereas for regional matches they are uncorrelated or negatively correlated. Implications: The skewness in access rates caused by regional interest is an important factor to consider when provisioning the delivery infrastructure to handle unexpectedly high loads.

2. Time-of-day effects: In our VOD dataset, we observe strong time-of-day effects: accesses peak in the evenings, with a lull at night. To identify regional variations in peak load, we zoom into a single day and plot the time series of the normalized number of accesses separately for each region in Figure 5b. Due to lack of space, we show results only for the top 3 regions. The number of accesses peaks around 8 pm local time, with a lull at night, and the time at which load peaks differs across regions because of time-zone differences. We further performed auto-correlation and cross-correlation analysis to confirm that these patterns hold over the entire two months of data [16] (results not shown here). Implications: The temporal shift in peak access times across regions opens up new opportunities to handle peak loads; e.g., spare capacity at servers in regions 1 and 2 can be used to serve content in region 9 when access rates peak in region 9.
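A minimal Python sketch of the two analyses above follows. It assumes hypothetical data layouts: per-object access counts and census populations are dictionaries keyed by region, and a day of accesses is a list of 24 hourly counts. `access_population_corr` computes the per-object Pearson coefficient whose CDF Figure 5a plots. `peak_lag_hours` estimates the offset between two regions' daily peaks by maximizing circular cross-correlation. Names and toy numbers are assumptions for illustration, not the study's code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def access_population_corr(accesses, population):
    """Correlate one video object's per-region access counts with regional
    population. Both arguments map region -> value; missing regions count as 0."""
    regions = sorted(population)
    return pearson([accesses.get(g, 0) for g in regions],
                   [population[g] for g in regions])


def empirical_cdf(values):
    """Points (x_i, i/n) of the empirical CDF, with values sorted ascending."""
    xs = sorted(values)
    return [(x, (i + 1) / len(xs)) for i, x in enumerate(xs)]


def peak_lag_hours(series_a, series_b):
    """Circular shift (in bins) by which series_b lags series_a, chosen to
    maximise the cross-correlation sum_t a[t] * b[(t + lag) mod n]."""
    n = len(series_a)
    best_lag, best_score = 0, -math.inf
    for lag in range(n):
        score = sum(series_a[t] * series_b[(t + lag) % n] for t in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy inputs for illustration only (not data from the study).
population = {"r1": 10, "r2": 20, "r3": 30, "r4": 40}
non_regional = {"r1": 100, "r2": 210, "r3": 290, "r4": 405}  # tracks population
regional = {"r1": 500, "r2": 50, "r3": 40, "r4": 60}         # local-team interest
corrs = [access_population_corr(obj, population) for obj in (non_regional, regional)]
print(empirical_cdf(corrs))

# Hourly accesses in region A; region B sees the same curve 3 hours later.
region_a = [1] * 24
region_a[18:22] = [5, 8, 10, 6]
region_b = [region_a[(t - 3) % 24] for t in range(24)]
print(peak_lag_hours(region_a, region_b))  # 3
```

A circular shift does not change a series' sum or norm. Maximizing the raw sum therefore picks the same lag as maximizing the normalized (or mean-centred) cross-correlation.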


4.2 User profiling and classification

We also observed patterns in user behavior that can be used to profile and classify users. For instance, for both VOD and live content, several users had only partial interest in the content they were viewing and quit the session without watching it fully. We investigated the temporal characteristics of user behavior within a video session by analyzing what fraction of a video object users typically view before quitting. For VOD content, Figure 5c shows that, based on the fraction of the video viewed within a session, users can be classified into three categories. These are early quitters (around 10% of the users watch less than 10% of the video before quitting and might be sampling the video), drop-outs (users steadily drop out of the session, possibly due to quality issues or lack of interest in the content) and steady viewers (a significant fraction of the users watch the video to completion). This distribution can be modeled using a mixture model with separate components. We also find that 4.5% of the users quit early in more than 75% of their sessions; i.e., these users are serial early quitters. Similarly, 16.6% of the users consistently viewed videos to completion; i.e., they are consistently steady viewers. We performed the same analysis for live content. As shown in Figure 5d, users watching live content can be classified into two categories: early quitters (a very large fraction of users watch less than 20% of the video before quitting the session) and drop-outs (the remaining users steadily drop out of the video session). Profiling users' viewing histories, we find that around 20.7% of the clients are serial early quitters, i.e., they quit early in more than 75% of their live sessions. We also observe several users joining and quitting multiple times during the same event.
Since our dataset consists of sporting events, one possibility is that these users are checking the current score of the match. Implications: (1) This analysis is particularly relevant in the context of augmenting CDNs with P2P-based delivery. For example, the fact that users typically view VOD objects from the start and quit early might imply higher availability for the first few chunks of the video. For live content, even though users quit quickly, they arrive at random points during the event, so the first part of the event is not necessarily the most popular part; (2) content providers and content delivery infrastructures can identify early quitters and steady viewers and customize the allocation of resources (e.g., use P2P to serve content to early quitters who are sampling the video).
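The user profiling described above can be sketched as follows, assuming hypothetically that each session is reduced to a (user ID, fraction of the video viewed) record. The 10% (VOD) and 20% (live) early-quit cutoffs and the 75% serial-quitter share come from the text. Two choices are assumptions: a session counts as complete at 95% viewed, and a consistently steady viewer must complete every session. The function name and toy records are illustrative only.

```python
from collections import defaultdict


def profile_users(sessions, early_cutoff=0.10, complete_cutoff=0.95, serial_share=0.75):
    """Profile users from (user_id, fraction_viewed) session records.

    A session is an early quit if fraction_viewed < early_cutoff, and complete
    if fraction_viewed >= complete_cutoff (0.95 is an assumed threshold).
    Returns (serial early quitters, consistently steady viewers) as sets:
    the former quit early in more than `serial_share` of their sessions,
    the latter completed every session (an assumed reading of "consistently")."""
    per_user = defaultdict(list)
    for user, fraction in sessions:
        per_user[user].append(fraction)

    serial_early, steady = set(), set()
    for user, fractions in per_user.items():
        early_share = sum(1 for f in fractions if f < early_cutoff) / len(fractions)
        if early_share > serial_share:
            serial_early.add(user)
        if all(f >= complete_cutoff for f in fractions):
            steady.add(user)
    return serial_early, steady


# Toy VOD records for illustration only (not data from the study).
vod = [("a", 0.05), ("a", 0.02), ("a", 0.08), ("a", 0.01), ("a", 0.50),
       ("b", 1.00), ("b", 0.97),
       ("c", 0.40), ("c", 0.05)]
early, steady = profile_users(vod)
print(sorted(early), sorted(steady))  # ['a'] ['b']

# Live content: the text uses a 20% cutoff for early quitters.
live = [("d", 0.15), ("d", 0.12), ("d", 0.18), ("d", 0.19), ("d", 0.90)]
early_live, _ = profile_users(live, early_cutoff=0.20)
```

Dividing the size of each returned set by the number of distinct users gives percentages comparable in form to the 4.5%, 16.6% and 20.7% figures reported above.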

4.3 Planned Work

Characterizing User Interest: I plan to extend the above results by profiling individual users further, in terms of understanding and predicting their interest in a particular piece of content. As Section 4.2 shows, some users sample VOD videos and leave the session early. While quality issues could be one reason for this, it could also be due to lack of interest in the content. Being able to predict ahead of time whether a user is interested in a particular video has important implications for the caching and provisioning decisions of the infrastructure (e.g., P2P vs. server-based delivery). To characterize user interest, I plan to use the fraction of the video viewed as an indicator of interest (after filtering out obvious cases where quality issues could have affected engagement). I will explore various machine learning techniques to learn a model for user interest.

Applications of User Interest Model: The user interest model can be used for two purposes:

Identifying early quitters: The model can be used to predict whether a user will quit the session early or watch the video to completion. I will evaluate how well the model performs at identifying early quitters.

Understanding user tolerance: The model can also be used to understand whether user interest affects tolerance to quality issues. If interest significantly impacts user tolerance, I will examine whether user interest is a potential confounding factor and further fine-tune the video QoE model by incorporating this factor.

Table 2: Proposed Timeline

Time                   Task
Nov 2013 - Jan 2014    QoE for mobile web browsing sessions
Feb 2014 - Apr 2014    User customization models
May 2014 - Aug 2014    Writing dissertation
Sept 2014              Thesis defense

4.4 Related Work

Content Popularity: There have been studies of content popularity in user-generated content systems (e.g., [18, 28]), IPTV systems (e.g., [12, 14, 41]), and other VOD systems (e.g., [24, 31, 34]). The focus of these studies was on understanding content popularity to enable efficient content caching and prefetching. Other studies analyze the impact of recommendation systems on program popularity (e.g., [45]) or the impact of flash-crowd-like events (e.g., [26]). In contrast, this work analyzes general trends in content popularity in order to customize the infrastructure design, and extends these studies along two key dimensions. First, we model the longitudinal evolution of interest for different genres of video content. Second, we analyze regional variations and biases in content popularity.

User behavior: Previous studies show that many users leave after a very short duration, possibly due to low interest in the content (e.g., [11, 27]). While we reconfirm these observations, we also provide a systematic model of the fraction of video viewed by users using a mixture model and gamma distributions, and highlight key differences between live and VOD viewing behavior. Furthermore, we look at the implications of such partial user interest for customizing the delivery infrastructure design.

Timeline

Table 2 describes the timeline for completing the remainder of this thesis. We expect to write at least one research paper as a result of the proposed research.


References
[1] Driving Engagement for Online Video. http://events.digitallyspeaking.com/akamai/mddec10/post.html?hash=ZDlBSGhsMXBidnJ3RXNWSW5mSE1HZz09.
[2] Census Bureau Divisioning. http://www.census.gov/geo/www/us_regdiv.pdf.
[3] Buyers Guide: Content Delivery Networks. http://goo.gl/B6gMK.
[4] Cisco forecast. http://blogs.cisco.com/sp/comments/cisco_visual_networking_index_forecast_annual_update/.
[5] Cisco study. http://goo.gl/tMRwM.
[6] P.800: Methods for subjective determination of transmission quality. http://www.itu.int/rec/T-REC-P.800-199608-I/en.
[7] P.910: Subjective video quality assessment methods for multimedia applications. http://goo.gl/QjFhZ.
[8] Peak signal to noise ratio. http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio.
[9] VQEG. http://www.its.bldrdoc.gov/vqeg/vqeg-home.aspx.
[10] K. Chen, C. Huang, P. Huang, and C. Lei. Quantifying Skype User Satisfaction. In Proc. SIGCOMM, 2006.
[11] L. Plissonneau and E. Biersack. A Longitudinal View of HTTP Video Streaming Performance. In Proc. MMSys, 2012.
[12] Henrik Abrahamsson and Mattias Nordmark. Program popularity and viewer behavior in a large TV-on-Demand system. In Proc. IMC, 2012.
[13] Saamer Akhshabi, Lakshmi Anantakrishnan, Constantine Dovrolis, and Ali C. Begen. What Happens when HTTP Adaptive Streaming Players Compete for Bandwidth? In Proc. NOSSDAV, 2012.
[14] David Applegate, Aaron Archer, Vijay Gopalakrishnan, Seungjoon Lee, and Kadangode K. Ramakrishnan. Optimal Content Placement for a Large-Scale VoD System. In Proc. CoNEXT, 2010.
[15] Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica, and Hui Zhang. A quest for an internet video quality-of-experience metric. In Proc. HotNets, 2012.


[16] Athula Balachandran, Vyas Sekar, Aditya Akella, and Srinivasan Seshan. Analyzing the Potential Benefits of CDN Augmentation Strategies for Internet Video Workloads. In Proc. IMC, 2013.
[17] Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica, and Hui Zhang. Developing a Predictive Model of Quality of Experience for Internet Video. In Proc. SIGCOMM, 2013.
[18] Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User Generated Content Video System. In Proc. IMC, 2007.
[19] Kuan-Ta Chen, Cheng-Chun Tu, and Wei-Cheng Xiao. OneClick: A Framework for Measuring Network Quality of Experience. In Proc. INFOCOM, 2009.
[20] Nicola Cranley, Philip Perry, and Liam Murphy. User perception of adapting video quality. International Journal of Human-Computer Studies, 2006.
[21] Heng Cui and Ernst Biersack. On the Relationship between QoS and QoE for Web sessions. Technical Report, EURECOM, 2012.
[22] Houtao Deng, George Runger, and Eugene Tuv. Bias of importance measures for multivalued attributes and solutions. In Proc. ICANN, 2011.
[23] Florin Dobrian, Vyas Sekar, Asad Awan, Ion Stoica, Dilip Antony Joseph, Aditya Ganjam, Jibin Zhan, and Hui Zhang. Understanding the impact of video quality on user engagement. In Proc. SIGCOMM, 2011.
[24] Jeffrey Erman, Alexandre Gerber, K. K. Ramakrishnan, Subhabrata Sen, and Oliver Spatscheck. Over the top video: The gorilla in cellular networks. In Proc. IMC, 2011.
[25] Jairo Esteban, Steven Benno, Andre Beck, Yang Guo, Volker Hilt, and Ivica Rimac. Interactions Between HTTP Adaptive Streaming and TCP. In Proc. NOSSDAV, 2012.
[26] H. Yin et al. Inside the Bird's Nest: Measurements of Large-Scale Live VoD from the 2008 Olympics. In Proc. IMC, 2009.
[27] Alessandro Finamore, Marco Mellia, Maurizio Munafo, Ruben Torres, and Sanjay G. Rao. YouTube everywhere: Impact of device and infrastructure synergies on user experience. In Proc. IMC, 2011.
[28] H. Yu, D. Zheng, B. Y. Zhao, and W. Zheng. Understanding User Behavior in Large-Scale Video-on-Demand Systems. In Proc. EuroSys, 2006.
[29] Emir Halepovic, Jeffrey Pang, and Oliver Spatscheck. Can you GET me now? Estimating the time-to-first-byte of HTTP transactions with passive measurements. In Proc. IMC, 2012.


[30] Robert Hsieh and Aruna Seneviratne. A Comparison of Mechanisms for Improving Mobile IP Handoff Latency for End-to-End TCP. In Proc. MobiCom, 2003.
[31] Yan Huang, Dah-Ming Chiu, Tom Z. J. Fu, John C. S. Lui, and Cheng Huang. Challenges, Design and Analysis of a Large-scale P2P-VoD System. In Proc. SIGCOMM, 2008.
[32] A. Khan, Lingfen Sun, and E. Ifeachor. QoE prediction model and its applications in video quality adaptation over UMTS networks. IEEE Transactions on Multimedia, 2012.
[33] S. Shunmuga Krishnan and Ramesh K. Sitaraman. Video stream quality impacts viewer behavior: inferring causality using quasi-experimental designs. In Proc. IMC, 2012.
[34] Zhenyu Li, Jiali Lin, Marc-Ismael Akodjenou-Jeannin, Gaogang Xie, Mohamed Ali Kaafar, Yun Jin, and Gang Peng. Watching video from everywhere: a study of the PPTV mobile VoD system. In Proc. IMC, 2012.
[35] Bing Liu, Minqing Hu, and Wynne Hsu. Intuitive Representation of Decision Trees Using General Rules and Exceptions. In Proc. AAAI, 2000.
[36] X. Liu, F. Dobrian, H. Milner, J. Jiang, V. Sekar, I. Stoica, and H. Zhang. A Case for a Coordinated Internet Video Control Plane. In Proc. SIGCOMM, 2012.
[37] Xi Liu, Florin Dobrian, Henry Milner, Junchen Jiang, Vyas Sekar, Ion Stoica, and Hui Zhang. A case for a coordinated internet video control plane. In Proc. SIGCOMM, 2012.
[38] V. Menkovski, A. Oredope, A. Liotta, and A. C. Sanchez. Optimized online learning for QoE prediction. In Proc. BNAIC, 2009.
[39] Tom Mitchell. Machine Learning. McGraw-Hill.
[40] Ricky K. P. Mok, Edmond W. W. Chan, Xiapu Luo, and Rocky K. C. Chang. Inferring the QoE of HTTP Video Streaming from User-Viewing Activities. In Proc. SIGCOMM W-MUST, 2011.
[41] Tongqing Qiu, Zihui Ge, Seungjoon Lee, Jia Wang, Qi Zhao, and Jun Xu. Modeling channel popularity dynamics in a large IPTV system. In Proc. SIGMETRICS, 2009.
[42] S. Akhshabi, A. Begen, and C. Dovrolis. An Experimental Evaluation of Rate Adaptation Algorithms in Adaptive Streaming over HTTP. In Proc. MMSys, 2011.
[43] Mark Watson. HTTP adaptive streaming in practice. http://web.cs.wpi.edu/claypool/mmsys-2011/Keynote02.pdf.

[44] Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan. Detecting large-scale system problems by mining console logs. In Proc. SOSP, 2009.
[45] Renjie Zhou, Samamon Khemmarat, and Lixin Gao. The impact of YouTube recommendation system on video views. In Proc. IMC, 2010.