
University of Kent

MSc Dissertation
Remote Support Services Using Peer to Peer
Communication Between Browsers (WebRTC)

Author:
Yann Guillon - Ydg2

Project realized with:


Quentin Huet

Supervisors:
Dr. Frank Wang
Dr. Matteo Migliavacca

January 4, 2015

Contents

1 Introduction 3
  1.1 Context 3
  1.2 Objectives 3
  1.3 Chapter Overview 3

2 Background: WebRTC and the real-time web 4
  2.1 The real-time web 4
    2.1.1 Overview 4
    2.1.2 HTTP-based Techniques 4
    2.1.3 WebSocket 6
  2.2 WebRTC 6
    2.2.1 Overview 6
    2.2.2 Technical aspects 8
    2.2.3 Browser support 12

3 Technologies 13
  3.1 Considerations 13
    3.1.1 A future-oriented architecture 13
    3.1.2 An environment built for real-time interactions 13
    3.1.3 Scalability and cloud computing 14
  3.2 The development stack 14
    3.2.1 The MEAN stack as a base 14
    3.2.2 Stack additions 16
  3.3 RTCMultiConnection 17
    3.3.1 Initialization 17
    3.3.2 Sessions 17
    3.3.3 Rooms management 18
    3.3.4 Signaling 18
    3.3.5 Media streams gathering 18
    3.3.6 Error handling and browser capabilities 18

4 Project architecture 19
  4.1 Overview 19
  4.2 Application flow and features 19
  4.3 Application architecture 19
    4.3.1 The client side (1) 20
    4.3.2 The server side (2) 21
  4.4 System architecture 22
    4.4.1 The development environment 22
    4.4.2 The production environment 23

5 Project implementation 23
  5.1 Key points 23
    5.1.1 Signaling server strategies¹ 23
    5.1.2 AngularJS and WebRTC 24
  5.2 Teamwork strategy: pair programming 25
  5.3 Contribution to the related communities 26
    5.3.1 Contributions to RTCMultiConnection 26
    5.3.2 Contributions to the MEAN stack 26
  5.4 Future work 26
    5.4.1 Full cloud integration 26
    5.4.2 Screen interactions 26
    5.4.3 File transfer 26

6 Performance and security concerns 27
  6.1 Bandwidth and media quality 27
    6.1.1 Context of the experiment 27
    6.1.2 Testing environment 27
    6.1.3 Process followed 27
    6.1.4 Results and analysis 27
  6.2 Security concerns 29
    6.2.1 Data security 29
    6.2.2 Signaling server concerns 29

7 Conclusion 30

8 Annexes 30
  8.1 Project resources 30

¹ This section is related to many concepts of RTCMultiConnection; please see section 3.3.

1 Introduction

1.1 Context

Since the very early days of its creation, the internet has mainly been used to communicate. During the late 1970s, the introduction of Bulletin Board Systems allowed people to send direct messages to each other. A decade later, with the tremendous expansion of the World Wide Web, blogging platforms and social networks started to appear. But users wanted more: in the early 2000s, with the appearance of Web 2.0 and the explosion of social media, video over the internet was made popular by software like Skype, iChat, and MSN Messenger.
The very large amount of data involved in the transmission of video-based content has always been a problem for software companies. The most efficient and least bandwidth-consuming solution found by software creators was to use direct peer-to-peer communication between the software of each participant. Each company created its own proprietary or open-source software to do so.
In the interest of unification and standardization, the WebRTC project was created by Google in 2011. Its aim is to provide a set of technologies allowing seamless peer-to-peer communication directly between web browsers, without the need for any external plug-in. It includes audio, video, screen sharing, and data transfer. It focuses on reshaping the real-time communication world, targeting both businesses and individuals.

1.2 Objectives

The aim of this dissertation is to provide a broad overview of the WebRTC project. As the WebRTC technology is still under development, only the currently implemented features will be analysed. This work is based on the project developed alongside this dissertation: a fully featured web conferencing website using WebRTC. This dissertation will also provide benchmarks and tests to assess the current usability and performance of WebRTC (see section 8.1).

1.3 Chapter Overview

This dissertation is organized in several parts. The following chapter reviews the literature about the WebRTC standard and the real-time web. Chapter 3 provides an overview of the technologies we used in our project, and the considerations behind the technical choices we made. Chapter 4 covers the architecture of the project, including the development environment and the internal structure of the project. Chapter 5 covers the implementation of the project, including project management, the issues encountered and the successes achieved during its realization. Chapter 6 analyses the performance of the WebRTC standards using the provided implementation.

2 Background: WebRTC and the real-time web

2.1 The real-time web

2.1.1 Overview

The real-time web consists of sending data through the web as soon as it is made available by its publisher(s). The evolution of online media is the main reason for the expansion of this technology since the early 1990s: everybody wants to know everything before everyone else.

Ajax
The purpose of Ajax (Asynchronous JavaScript + XML) is to send asynchronous requests to any HTTP server. Ajax is implemented using JavaScript, a language that is automatically interpreted by every recent browser when an HTTP page is loaded. The JavaScript layer of the website submits the request in a defined format and fetches the response from the HTTP server without the need to reload the page. It can be triggered by the user via JavaScript events, such as mouse or keyboard interactions. Ajax is the best-known way to build HTTP-based real-time interactions [5].
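The pattern can be sketched with the XMLHttpRequest object that backs Ajax; the endpoint "/api/news" and the helper name are purely illustrative:

```javascript
// Minimal Ajax helper: fetch JSON from the server without a page
// reload. The URL used below is a hypothetical example endpoint.
function getJSON(url, onSuccess, onError) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true); // true = asynchronous request
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = request completed
    if (xhr.status === 200) {
      onSuccess(JSON.parse(xhr.responseText));
    } else if (onError) {
      onError(xhr.status);
    }
  };
  xhr.send();
}

// Typical trigger, bound to a user interaction:
// document.getElementById('refresh').onclick = function () {
//   getJSON('/api/news', function (items) { console.log(items); });
// };
```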
2.1.2 HTTP-based Techniques

Since its invention, the HTTP protocol has been built on a request-response structure and was not designed for real-time interactions. The browser submits a request to an HTTP server, which returns a response as a result of the request made [5].

Figure 1: Classic HTTP interaction [7]


Polling
Polling uses the timer feature of JavaScript to submit Ajax requests periodically. If a change has been made on the server side, the server will respond with the new data; but if no change has been made, a response is still sent, resulting in excess resource and bandwidth usage for both the client and the server [7].

Figure 2: An HTTP real-time technique: polling [7]
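One polling step can be sketched as follows; the server's reply shape ({version, data}) and the endpoint are assumptions made for the example:

```javascript
// One step of the polling loop. The point is that a reply arrives
// even when nothing changed on the server side.
var lastVersion = 0;

function pollOnce(serverReply, onChange) {
  if (serverReply.version > lastVersion) {
    lastVersion = serverReply.version;
    onChange(serverReply.data);
    return true;  // fresh data arrived
  }
  return false;   // empty round trip: the wasted bandwidth described above
}

// Scheduled with the JavaScript timer, e.g. every 3 seconds:
// setInterval(function () {
//   ajaxGet('/api/updates', function (reply) { pollOnce(reply, render); });
// }, 3000);
```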


Long Polling
Long polling is a technique where the browser submits a request to the server, which waits for a change on its side before responding to the request. This technique is much more efficient than classic polling (2.1.2), but it still implies wasted resources on the server side: as the connection is held, any extra request from the same client will require a new connection [7].

Figure 3: Another HTTP real-time technique: long polling [7]
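The reconnection behaviour described above can be sketched as follows; requestFn(url, cb) stands in for any Ajax helper and is an assumption of this example:

```javascript
// Long-polling loop sketch: the next request is issued as soon as the
// previous one completes, so the server can hold each request open
// until it has news to deliver.
function longPoll(url, requestFn, onData, shouldContinue) {
  requestFn(url, function (data) {
    onData(data);
    if (shouldContinue()) {
      // Immediately reconnect: this is the "new connection per message"
      // cost mentioned above.
      longPoll(url, requestFn, onData, shouldContinue);
    }
  });
}
```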


Streaming
HTTP streaming is very close to long polling. The only difference is that streaming does not close the connection when data is available from the server. HTTP streaming is considered the most effective HTTP real-time technique, due to its responsiveness, but the data flow is mono-directional: only the server can send messages to the client [5].

Issues with the HTTP-based real-time web
Each technique has its own drawbacks, which can be grouped in sub-categories.

Resource and bandwidth problems are found in every technique. This is a consequence of the alteration of the HTTP model. From a server point of view, either many requests are made (i.e. polling, see 2.1.2), or connections have to be held by the server (i.e. long polling and streaming, see 2.1.2), which can pose a scalability problem for applications that have to handle a very large number of users, as the server load for a single user can be multiplied depending on the technique used.

Timing delays are also to be considered, as no permanent connection is established between the clients and the server. A delay is incurred each time the browser needs to query the server for new information [5].
2.1.3 WebSocket

WebSocket is a protocol that provides bidirectional real-time communication over a single connection between a web browser and a server. WebSocket is directly integrated into every recent web browser. It is a standard within the HTML5 project, managed by the World Wide Web Consortium (W3C). As with the HTTP-based techniques, the JavaScript layer loaded in the web browser handles the connection and communication with the server. On the server side, a custom server has to be used: it needs to implement the WebSocket RFC and the Berkeley sockets interface [7]. WebSocket solves all the problems raised by HTTP solutions for real-time interactions (2.1.2).

Figure 4: A simplified WebSocket communication [5]
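The client side of the figure above can be sketched with the browser's WebSocket API; the endpoint and the subscribe-message format are illustrative assumptions, and the constructor is passed in only to keep the sketch self-contained:

```javascript
// Client-side WebSocket sketch. One connection carries traffic in both
// directions; the URL and the JSON message shapes are assumptions.
function openLiveChannel(SocketCtor, url, onMessage) {
  var socket = new SocketCtor(url);
  socket.onopen = function () {
    // client -> server on the same connection
    socket.send(JSON.stringify({ type: 'subscribe', topic: 'news' }));
  };
  socket.onmessage = function (event) {
    // server -> client, no extra request needed
    onMessage(JSON.parse(event.data));
  };
  return socket;
}
// In a browser: openLiveChannel(WebSocket, 'wss://example.org/live', render);
```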

2.2 WebRTC

2.2.1 Overview

WebRTC is a project that aims to regroup a set of standards in order to provide real-time web communication without the need for any external plug-in or software. WebRTC is designed to be directly included into software by its developers. Web browsers are the main focus of the WebRTC project, as they are the most common way to access the internet and are present on every internet-connected device, from phones to computers.
All the media communications in WebRTC are peer-to-peer: each device establishes a direct connection with every device it is communicating with. In the case of multiple participants, a mesh network¹ is automatically created, in order to optimize bandwidth usage and provide the best possible quality.
The WebRTC standards are not only about web browsers: they also offer the opportunity to communicate with any kind of software supporting the WebRTC standards. Therefore, it can be extended to telephony (SIP) or proprietary communication software, in the business sector for example. This makes WebRTC adaptive and extendable.
We will mainly focus on web browsers, as they are the key to the wide establishment of WebRTC. WebRTC in browsers can be seen as a built-in feature. The WebRTC functionalities can be accessed through a JavaScript API, which is defined by a specific part of the WebRTC standards [6]. This API provides a link to each user media resource and is accessible through the JavaScript engine of browsers. HTML5 is used to display each medium, using the video and audio elements it features.
History and contributors
Google first acquired On2 in 2010, a video codec company that had developed the VP series of codecs (VP8), made for real-time media and not proprietary. Google went on to acquire Global IP Solutions (GIPS) the same year, a company that was providing a proprietary implementation of peer-to-peer connections between browsers. Google wanted to make this technology available freely and openly to everybody: the WebRTC project was born. This project quickly became supported by two working groups: the World Wide Web Consortium (W3C) for the establishment of the standards, and the Internet Engineering Task Force (IETF) for the development of the protocols, known as the RTCWEB project [2].
Early adopters' use cases
The two largest internet companies by revenue in 2014 already own software running on WebRTC. The first example is Google, with its communication platform Google Hangouts, which uses a proprietary plug-in running WebRTC to handle audio, video and screen sharing during conversations. Amazon also integrates WebRTC into its customer service, called Mayday, which features the standard WebRTC audio and video, but also remote help via screen sharing and control [9].
¹ A mesh network is a network topology where all the hosts are connected without a central hierarchy, by building a mesh structure.

2.2.2 Technical aspects

Overview of a standard WebRTC application

The simplified architecture of a basic WebRTC application can be split into two distinct parts.
The first noticeable part is the client side. It is composed of the web browser, with its WebRTC bindings and its JavaScript layer. The latter is commonly added by developers, depending on the behaviour of their application, to add scriptable actions executed from the user's browser. In a WebRTC application, this layer has two extra roles. It must, of course, include the calls to the WebRTC API provided by the browser, mixed with the business logic of the client-side behaviour of the program. It is also the role of this JavaScript layer to communicate with the second mandatory part of a WebRTC application: the signaling server.
The signaling server is used to make a link between the different clients connected to the main server. Signals are data messages sent through the network containing various control information, which will be detailed in section 2.2.2.
The connection between the signaling server and the JavaScript layer can be made using any kind of real-time web protocol (see section 2.1). The reason why a classic HTTP request is not suitable is that the information to be received about other participants is not predictable [6].

Figure 5: Real-Time scheme of WebRTC [4]


Sessions management
Each time participants want to create a conference, a WebRTC session is created. All the data to be transferred will be exchanged within the scope of this session.
The current WebRTC communication model allows two types of configuration. In the triangular configuration, a single web server instance is used for all the participants. The second configuration, known as trapezoidal, allows multiple signaling server instances to handle different clients, using messages known as "jingles" to share the session information between the signaling servers. These two architectures are very similar, and for the sake of simplicity we will only focus on the triangular architecture [4].
Establishment
The establishment of a WebRTC session is done in five steps.

1. Like any other website, the first step is to fetch the public files from the server. This includes the JavaScript layer mentioned in section 2.2.1, alongside the HTML and CSS code. The JavaScript is loaded into the browser, and the connection between the JavaScript and the signaling server is instantiated.

2. Once the client is connected to the signaling server, messages following the Session Description Protocol (SDP) are exchanged with the server to define the set of technologies to use during the session (2.2.2).

3. The initiation of the session using the signaling server is then finished. The browser itself now needs to establish a peer-to-peer link with the other browsers involved. To do so, the ICE hole punching technique is used. Described in section 2.2.2, this technique uses a distant server to instantiate a bidirectional access to each client, regardless of the topology of their network.

4. The next step is a handshake about the secure pass-phrases to be transferred, if a secured transfer (using SRTP, see 2.2.2) is to occur.

5. Finally, the media session is opened, using either RTP or SRTP, and the communication can start [4].

Figure 6: Session establishment of a triangular WebRTC communication [4]
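Steps 2 and 3 can be sketched with the browser's RTCPeerConnection API; the signaling message shapes ({type, sdp} and {type, candidate}) are an assumption of this example, since signaling formats are not standardized:

```javascript
// Sketch of the caller's side: create the SDP offer, set it locally
// and push it to the signaling server, while candidate IP addresses
// discovered by the ICE agent are forwarded as they appear. The
// constructor is injected only so the flow reads as one
// self-contained unit.
function startCall(PeerConnection, signalingSend, iceConfig) {
  var pc = new PeerConnection(iceConfig);
  pc.onicecandidate = function (event) {
    if (event.candidate) { // step 3: exchange candidates via signaling
      signalingSend({ type: 'candidate', candidate: event.candidate });
    }
  };
  pc.createOffer().then(function (offer) { // step 2: SDP negotiation
    return pc.setLocalDescription(offer).then(function () {
      signalingSend({ type: 'offer', sdp: offer.sdp });
    });
  });
  return pc;
}
// In a browser: startCall(RTCPeerConnection, sendViaSignaling, config);
```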


ICE servers and hole punching
With the massive use of Network Address Translation (NAT) and proxies by internet users, establishing a peer-to-peer connection between users without opening ports on their routers is difficult, as users tend to be hidden either in a local sub-network or behind the restrictions imposed by firewalls. In order to solve this problem, a technique called hole punching has been created. Following the Interactive Connectivity Establishment (ICE) protocol, this technique consists of multiple steps. The first one is to gather as much information as possible about the transport of packets from the browser to the internet, which means collecting as many IP addresses as possible along the transit of a packet. To do so, a request is made to a third-party server, known as a Session Traversal Utilities for NAT (STUN) server, which is used to get the public IP address of the caller. The local IP address of a client can be fetched by using local network utilities on the machine. The IP addresses are then sent through the signaling server, in a process known as candidate exchange [1]. Each ICE agent (integrated in browsers) then performs connectivity checks with the provided IPs and selects the most suitable one for the transfer.
Traversal Using Relays around NAT (TURN) servers can also be used in case of unreachable peers. TURN servers act as relays, assigning a publicly accessible relay IP address to each party that re-routes the data to the owner of the said IP address. In the case of an ICE communication, this relay address is added to the list of IP candidates [1].
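In the browser, the ICE servers are simply declared in the configuration handed to RTCPeerConnection; the server URLs and credentials below are placeholders:

```javascript
// ICE configuration sketch: the browser's ICE agent is told where the
// STUN and TURN servers live. All URLs and credentials are placeholders.
var iceConfiguration = {
  iceServers: [
    // STUN: used to discover the client's public IP address
    { urls: 'stun:stun.example.org:3478' },
    // TURN: relay of last resort when peers cannot reach each other
    { urls: 'turn:turn.example.org:3478', username: 'demo', credential: 'secret' }
  ]
};
// var pc = new RTCPeerConnection(iceConfiguration);
```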


Signaling
The purpose of the signaling server is to handle messages called signals, used to coordinate and carry control messages during a WebRTC communication [2]. The way signaling is implemented is up to the developer, as it is not standardized: in every web standard, only the elements that need to be standardized are [4]. Signaling remains mandatory, but the way to implement it does not need to follow a specific structure for a WebRTC implementation to function.
The signaling server handles four principal tasks:

The negotiation of media and session settings regroups the session establishment calls. Using the SDP protocol, each party will share the media formats and codecs it supports, plus information about the bandwidth and IP addresses to be used during the ICE hole punching phase (2.2.2) [4].

The credentials management of users is also handled by the signaling server. It needs to identify each party and remember them, using storage methods in either memory or a database system.

The management of the current session regroups all the user actions that can be done during a session. This can be the addition of new participants, new media shared or added, hang-ups and drops of users during the session, and the session termination.

The glare resolution removes the risk of simultaneous connections to a server. It establishes a master/slave relation between two nodes, using timeslots, and therefore avoids conflicts [4].
Main protocols
WebRTC is an aggregation of both old protocols, heavily used over time, and very recent and effective ones [4].

Video codec
The video is handled by the VP8 codec, in a WebM video container. Acquired by Google, it used to be proprietary, but Google released its source code in 2010. It is known to be less efficient in nearly every domain than its widely used competitor, H.264, but the latter is patent-encumbered and therefore could not be included in the WebRTC project, in a matter of source-code transparency and openness [2].
Media transport

The negotiation of the technologies to be used between participants is done using the Session Description Protocol (SDP). Established in 1999, this protocol describes the media used during a session. The first party shares the media it supports, and the other party chooses the media depending on its own supported list [3]. This protocol has been widely used by other protocols, such as VoIP or SIP.
After the negotiation, the transport of media streams follows the Real-time Transport Protocol (RTP) standard. It handles the transport of real-time data (i.e. media streams) over the Internet Protocol (IP), and defines a packet format to send those data. There is no verification that the data arrived at the proper recipient, but as media streams are transferred, data loss is negligible [8]. A secured version of this protocol, SRTP, can also be used to add an encryption layer for the transfer of media [4].
2.2.3 Browser support

WebRTC is not yet supported by every popular web browser available today. In fact, huge gaps can be noted between the features supported by each browser. They can be classified in three groups.
Good participation: Google and Mozilla, respectively with their browsers Chrome and Firefox, have been making great efforts to include each WebRTC specification in their browsers since the beginning of the WebRTC project. Both browsers implement almost the whole specification, with a slight advantage for Google Chrome, which can be explained by the fact that WebRTC has been carried by Google since the appearance of this technology.
Medium participation: In March 2014, Opera Software released a version of its browser including the most basic functionalities of WebRTC, which shows an interest in this technology. As this support is recent, only simple audio and video calls are possible.
No participation yet: Apple and Microsoft have not shown any interest in the WebRTC project yet. Microsoft is considering the integration of Object Real-time Communications (ORTC) into Internet Explorer, a WebRTC variant with an object-oriented API, where all the data transfer is automated and sent using peer-to-peer communication only.


Figure 7: WebRTC support in mid-August 2014 (source: iswebrtcreadyyet.com)

3 Technologies

3.1 Considerations

3.1.1 A future-oriented architecture

WebRTC is still at its early development stages, and will surely evolve with technologies that will become popular in the near future. For the sake of accuracy, we focused on the most recent technologies currently available that are becoming popular. We also took a look at the key projects already developed for WebRTC, to decide which technologies would be the most adequate for our implementation. The guidelines of the WebRTC and RTCWEB projects have also been taken into account.
3.1.2 An environment built for real-time interactions

In order to function flawlessly and quickly, WebRTC signaling strategies require the most efficient and responsive web techniques available today. Our concern was to have an environment that is really built for live interactions, without the need to add external plug-ins or layers, in order to offer easy maintainability and ease of use.

3.1.3 Scalability and cloud computing

PaaS¹ cloud services are becoming a standard way to maintain performance and scalability. They preserve the whole "web application" concept: web sites are no longer seen as an interweaving of scripts, but as an application as a whole. Still from a future-oriented point of view, it is a real must-have for current web applications. Most PaaS solutions also offer metrics and analysis of the performance of the application, useful for further experiments.

3.2 The development stack

3.2.1 The MEAN stack as a base

Presentation
The MEAN stack is an open-source development stack featuring the very latest web technologies: MongoDB as a database, Express as a back-end web framework, AngularJS as a front-end web framework and Node.js as a back-end platform. This stack provides very useful features for developers. First, it wraps all the technologies cited before in a coherent, ready-to-use environment, as every component is linked to the others. This results in a considerable gain of time, as developers do not need to build their own stack, which can be a very long task if the technologies are not perfectly mastered. Second, it offers powerful command-line interface (CLI) tools for monitoring and debugging, such as the uptime of each component. Third, it includes many productivity tools, such as the live reload of both the server and the web page if any changes have been detected in the code.
Two different but very similar implementations of the MEAN stack are available: the MEAN.io version, which is owned by Linnovate, and MEAN.js, maintained by the original creator of the MEAN stack.
The MEAN.io stack was more adapted to this project, as it is more popular and structured, and provides a package manager. The fact that a whole company is behind the project also ensures more evolution and quicker communication on eventual bugs or requests.
The MEAN stack has two drawbacks for the implementation of our project. First, the learning curve is significant: the fact that every component of the library is integrated with the others, and the difficulty of understanding the whole architecture without a good knowledge of each element, make it very hard to master at the early stages of development. Second, it does not provide a WebSocket architecture by default (which is essential for our signaling server), but one can easily be added to the Express framework.
¹ Platform as a Service: provides a stack and a platform as a service.

Node.js and Express

Node.js is a full-JavaScript server-side development platform for building web applications. Its main advantage, and the reason for its creation, is its speed. As JavaScript is the only universal language for web scripting on the client side of websites, the competition between the modern web browsers (mainly Google Chrome, Mozilla Firefox and Microsoft Internet Explorer) has been very intense, resulting in more and more powerful virtual machines to load web pages faster. Node.js uses the fastest JavaScript virtual machine currently available, Google Chrome's V8 engine, to execute server-side code. But it is not only about speed: Node.js also produces lightweight applications, as its structure is minimalistic, which improves scalability for demanding applications.
Node.js features a famous package manager, the Node Package Manager (NPM), which allows developers to easily publish their libraries. The community can then include those extensions easily within their projects.
Express.js, as an application framework, adds a complete structure on top of Node.js to build web applications quickly and easily. To follow the Node.js guidelines on lightness and performance, its structure is very minimalistic. It follows the classic Model-View-Controller (MVC) design pattern. Express.js is combined with a Node.js entity manager, Mongoose, to provide an abstraction over the persistent data to be stored (also known as the model in the MVC architecture). It also features an authentication utility, Passport, which handles user management and built-in third-party social media login with multiple famous authentication providers included, such as Facebook, Twitter, and Google.

Figure 8: The MVC architecture, as used by Express.js (source: yalantis.com)
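The MVC split can be sketched in Express terms; the route, the model API and the payloads are invented for the example, not taken from the project:

```javascript
// MVC sketch: the route maps a URL to a controller, the controller
// asks the model layer for persistent data and renders it as JSON.
// `app` would come from require('express')() in a real project.
function registerArticleRoutes(app, articleModel) {
  app.get('/api/articles', function (req, res) { // route
    articleModel.findAll(function (err, articles) { // model access
      if (err) return res.status(500).json({ error: 'lookup failed' });
      res.json(articles); // controller hands data to the view layer
    });
  });
}
```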
AngularJS
AngularJS is a front-end JavaScript framework owned by Google. Its main feature is to extend the basic HTML syntax, allowing cleaner and more maintainable code. It regroups the following features: dynamic data binding directly inside the HTML, the possibility to repeat DOM elements, helpers to build HTML forms and, finally, the set-up of reusable HTML modules. Its structure is based on three main components:

The view can be seen as the structure of what is going to be displayed on the web page. It regroups HTML and CSS, and uses data bindings to display the dynamic data.

The controller uses the Angular API to fetch the dynamic data from the server, and then makes it available to the view.

The model structures the data that is going to be exchanged on the web page, either from the server through the controller, or from user input. It also features validation helpers, for example if a form needs to be filled on the page.

Figure 9: The basic architecture of AngularJS
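The three components can be sketched for Angular 1.x as follows; the controller name, endpoint and data shape are assumptions made for the example:

```javascript
// Minimal Angular 1.x controller sketch. The controller asks the
// server for dynamic data through $http and publishes it on $scope,
// where the view's {{ }} bindings pick it up automatically.
// View (illustrative):
// <div ng-controller="NewsCtrl"><p ng-repeat="n in news">{{ n.title }}</p></div>
function NewsCtrl($scope, $http) {
  $scope.news = []; // model: the data exchanged with the page
  $http.get('/api/news').then(function (response) {
    $scope.news = response.data; // the view refreshes via data binding
  });
}
// Registration: angular.module('app', []).controller('NewsCtrl',
//   ['$scope', '$http', NewsCtrl]);
```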


MongoDB
MongoDB is a database management system (DBMS) following the NoSQL guidelines. The purpose of NoSQL database systems is to avoid the table-based scheme of classic SQL systems, and to directly store data as objects with dynamic schemas. The structure of MongoDB is designed to handle big volumes of data and to be integrated in scalable environments. For this purpose, it features automated load balancing using horizontal scaling, an advanced indexing system, and a file storage that splits elements into parts, which can then be stored seamlessly across different MongoDB instances.
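The dynamic-schema point can be illustrated with two documents destined for the same collection; the collection and field names are invented for the example:

```javascript
// Two documents headed for the same (hypothetical) "users" collection.
// Unlike rows of a SQL table, they do not share a fixed schema: the
// second one carries fields the first one never declared.
var user1 = { name: 'alice', rooms: ['support-42'] };
var user2 = {
  name: 'bob',
  rooms: [],
  lastSeen: new Date(0),
  profile: { locale: 'fr' } // nested object, stored as-is
};
// In the mongo shell both inserts are valid:
// db.users.insert(user1); db.users.insert(user2);
```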
3.2.2 Stack additions

Socket.io and WebSockets
The MEAN stack does not provide tools for real-time interactions by default. We chose Socket.io, a JavaScript library for establishing real-time event-driven communications using either HTTP techniques (2.1.2) or WebSockets (2.1.3). Socket.io is made of two distinct libraries, for the client and the server. Its main strength is its ease of use, as minimal code needs to be written on each side. Its event-driven architecture is also simple yet powerful: events are identified by a name and contain a message, and each party just has to define a behaviour for the reception of each event.
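The event model can be sketched as follows; the event names and the room bookkeeping are invented for the example, and only the on/emit API shape comes from Socket.io:

```javascript
// Event-driven sketch: each event has a name and carries a message,
// and each party attaches one handler per event name. `socket` stands
// for a Socket.io socket on either side of the connection.
function wireSignaling(socket, rooms) {
  socket.on('join-room', function (msg) {
    (rooms[msg.room] = rooms[msg.room] || []).push(msg.user);
    socket.emit('joined', { room: msg.room, count: rooms[msg.room].length });
  });
}
// Server side, this would be attached on each connection:
// io.on('connection', function (socket) { wireSignaling(socket, rooms); });
```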

3.3 RTCMultiConnection

RTCMultiConnection is a JavaScript library for creating WebRTC applications with multiple participants. It allows people to share video, sound, screen and data through WebRTC channels. Unlike the other libraries available, it only wraps the WebRTC calls at a fairly low level, allowing developers to have more flexibility and create complex WebRTC experiments. Its creator, Muaz Khan, is very responsive to questions and bug solving.
3.3.1 Initialization

The initialization of the RTCMultiConnection library takes place in four simple steps:

1. Initialization of the Session Description Protocol (SDP) and its constraints
2. Initialization of the ICE servers, which can be configured using the iceServers() method
3. Setting of the bandwidth options
4. Initialization and test of the WebSocket communication
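The four steps can be gathered in one configuration sketch; the property names mirror RTCMultiConnection's documented style (sdpConstraints, iceServers, bandwidth, openSignalingChannel) but exact details vary per version, so treat this as illustrative:

```javascript
// The four initialization steps in one place; placeholder values
// throughout.
function configureConnection(connection, signalingImpl) {
  // 1. SDP and its constraints
  connection.sdpConstraints.mandatory = {
    OfferToReceiveAudio: true,
    OfferToReceiveVideo: true
  };
  // 2. ICE servers (placeholder URL)
  connection.iceServers = [{ urls: 'stun:stun.example.org:3478' }];
  // 3. Bandwidth options (kbit/s)
  connection.bandwidth = { audio: 50, video: 256 };
  // 4. Signaling transport, to be opened and tested
  connection.openSignalingChannel = signalingImpl;
  return connection;
}
// In the page: configureConnection(new RTCMultiConnection(), mySignaling);
```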
3.3.2 Sessions

The RTCMultiConnection library allows the creation of local media sessions for each user. These contain information about each medium shared by the user. A simple call is required to define the media.
connection.session({
    audio: true,  // audio-only connection
    video: true,  // audio + video connection (disables audio-only)
    screen: true, // screen sharing
    data: true    // direct data transfer, e.g. files
});

When the session is initialized locally, the initialization steps of WebRTC
are performed automatically, and the defined streams (audio, video, screen sharing)
are dynamically added or removed using the WebRTC API calls:
- RTCPeerConnection.addStream()
- RTCPeerConnection.removeStream()


3.3.3 Rooms management

3.3.4 Signaling

The RTCMulticonnection library has its own signaling server strategy. It creates
multiple channels (identified by a unique hash) for each data stream. The default implementation of signaling uses the WebSocket JavaScript library. However, the whole signaling method can be changed by overriding the openSignalingChannel() method. Therefore, any kind of real-time web service that supports
the Pub/Sub mechanism can be used to handle signaling.
connection.openSignalingChannel = function (config) {
    var channel = config.channel || defaultChannel;
    // The connection to the main channel, expressing the need to
    // join a sub-channel, has to be done here.

    var socket;
    // The sub-channel has to be opened here, using the channel
    // variable as an identifier, and needs to be stored locally
    // (in the socket variable in this example).

    socket.send = function (message) {
        // The socket logic to send and receive messages has to be
        // implemented here.
    };
};

The openSignalingChannel() function is called every time RTCMulticonnection needs to open a channel. The config variable passed to the function
contains the channel to open. The first step is to connect to the main channel,
to tell the server that a sub-channel connection is required. The next step is to
connect to the sub-channel, and finally to override the send method on the socket,
as send is bound by RTCMulticonnection.
3.3.5 Media streams gathering

RTCMulticonnection wraps the different streams in a single method, onStream(),
called when a stream is received. The event contains the stream itself, and various
information about the originator and the kind of stream (audio, video, screen share
or data). It also contains the type of stream (local or remote). The stream can be
bound to an HTML5 video element using the URL.createObjectURL() JavaScript
method.
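The shape of such an event can be illustrated with a small handler; the field names (type, userid, isScreen, isVideo) follow the description above but are assumptions about the exact API:

```javascript
// Hypothetical sketch of inspecting an onStream event. The event
// field names are illustrative assumptions.
function describeStreamEvent(e) {
  // e.type is 'local' or 'remote'; the is* flags indicate the kind
  // of media being shared
  var kind = e.isScreen ? 'screen' : (e.isVideo ? 'video' : 'audio');
  return e.type + ' ' + kind + ' stream from ' + e.userid;
}

// In a browser, the stream would then be bound to a <video> element:
//   videoElement.src = URL.createObjectURL(e.stream);
console.log(describeStreamEvent({ type: 'remote', isScreen: true, userid: 'alice' }));
// → remote screen stream from alice
```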
3.3.6 Error Handling and browser capabilities

Many methods are available to simplify the complex error handling of WebRTC.
All the following methods can be overridden to display errors to fit any user
interface.


Errors
- The onError() method handles RTCMulticonnection errors.
- The onMediaError() method handles the general media errors of WebRTC media connections, such as stream errors and connection-related issues.

Browser Capabilities
- The DetectRTC object detects whether media devices, such as the web camera, the microphone, or any external plug-ins, are present.
- The connection.UA object contains various information about the browser.

4 Project architecture

4.1 Overview

The purpose of the project we implemented, called OpenHangouts, is to provide
a working demonstration of the main features of WebRTC. It uses new and
upcoming web technologies, which are likely to be adopted once the WebRTC
project is fully completed and embraced by the community.

4.2 Application flow and features

OpenHangouts is a simple and easy-to-use conferencing website. After an authentication step, using local or social media authentication, users are able to
join conferencing rooms. The user who wants to create a room simply copies
the room identifier given when the room is opened, and gives it to the other
participants. People who have joined the room can then communicate using audio
and video.
A presenter role also exists: the presenter can share what is displayed on
his or her screen (or a part of it) with all the other participants in the conference. This
role is first assigned to the creator of the session, and the current presenter can
then transfer this role to any other participant.

4.3 Application architecture

The purpose of this section is to explain how the different technologies listed in
section 3 interact with each other within the scope of the project, and to highlight
the main parts we developed.


Figure 10: Application Architecture of the OpenHangouts project


4.3.1 The client side (1)

AngularJS
AngularJS (see 3.2.1), as an MVC front-end framework, handles the display
and interactions of the client side of the project. It uses a URL routing system
to display content dynamically depending on the URL requested by the browser.
If a URL matches a defined route, a targeted controller is called. It generates a
view with dynamic content based on the data sent by the server via an API.
WebRTC as an AngularJS service
AngularJS (3.2.1) is modular: generic modules can be added as services, which can
be queried by the controllers for data or actions. The bindings of
the RTCMulticonnection library are integrated into a service. It groups all the
actions that can be executed on the RTCMulticonnection library, along with a
callback system that notifies controllers when a change is made directly within the
service, for example the connection of a new participant.


4.3.2 The server side (2)

The server side is composed of two distinct servers. First, Express.js handles
the HTTP requests made by the browser, following the MVC architecture on
the server side. Its controllers interact with Mongoose, using two main data
models (User and Channel) with classic CRUD actions. Second, the socket.io
server plays the role of signaling server. It includes sub-channels, known
as namespaces, to fulfil this role. It also uses the channel controller from
Express.js to access the Channel model.
The signaling management is split into two different groups of namespaces.
Socket.io offers the opportunity to create namespaces easily; they have their
own logic, and can be seen as rooms. This can easily be implemented using a
simple URL parameter.
From a server point of view:
io.of(DEFAULT_URL + channel) // channel is defined in the calling URL;
                             // of() declares the use of a namespace
  .on('connection', function (socket) {
      // The signaling logic for this specific namespace can be
      // defined here
  });

From a client point of view:

namespace_socket = io.connect(DEFAULT_URL + channel); // channel
    // represents the namespace to join
// actions can be bound on the namespace_socket variable

When a client connects to the signaling server, it first connects to the main
socket.io server, and submits multiple requests to access different namespaces.
The first namespaces joined concern classical WebRTC signaling actions;
they are created dynamically by the RTCMulticonnection library.
For each session, one additional namespace is joined: created by us, this
namespace is used to manage extra actions, in our case the presenter
management.


4.4 System architecture

Figure 11: System Architecture of the OpenHangouts project


The system architecture can be split into two parts.

4.4.1 The development environment

The development environment is minimalistic. A NodeJS instance is run locally,
with a local MongoDB database connected to it. For testing purposes, multiple
browsers are run locally. A GitHub repository is also used. Its main purpose
is to handle versioning of the source code using Git. GitHub is
also a way to expose the source code to the community, which can review bugs,
suggest improvements, or modify the code at any point of the project. The
MEAN stack provides debugging tools and useful plug-ins such as code syntax
validators (JSLint, CSSLint).


4.4.2 The production environment

The production environment is located on Modulus, a PaaS cloud platform
made for NodeJS hosting. It offers several interesting features:
- Easy code deployment using a single command on a command-line interface. The code is compressed and then fully deployed on a cloud instance.
- On-demand scaling. Computing instances can easily be added using the web interface provided.
- Metrics and statistics about the running project, available from the web interface.
A shared MongoDB server is also run on the Modulus cloud. It is shared by
all instances and also offers scaling opportunities.

5 Project implementation

5.1 Key points

5.1.1 Signaling server strategies

Problem
We wanted to offer the possibility of building custom features on top of the
signaling server's basic capabilities. As seen in section 2.2.2, the RTCMulticonnection library uses the openSignalingChannel() method each time a user
needs to connect to a specific signaling channel, to get or send control messages.
The openSignalingChannel() method uses a channel stored in the configuration of RTCMulticonnection, and no control is given over the channels that are
opened, as RTCMulticonnection uses this method for its own signaling strategy.
Resolution
We went through multiple tests before finding the most appropriate solution
to this problem.
- First, we tried to add the extra methods directly in the sockets opened by the openSignalingChannel() method, but this resulted in the extra methods being called multiple times on the clients. As multiple channels are created by RTCMulticonnection through the same function, the socket function was bound for each channel opened.
- We then tried setting the config.channel variable in the WebRTC connection. The problem is that this channel variable can be rewritten at any time by RTCMulticonnection, removing any possibility of overriding it ourselves.
- To finally solve this problem, we created a method analogous to openSignalingChannel(), over which we have full control of the behaviour of the signals sent through it, and which is called at the creation of a room.

1 This section is related to many concepts of RTCMulticonnection; please see section 3.3


var openCustomActionsChannel = function (channel, connection) {
    // a request for a custom channel is sent here
    io.connect(SIGNALING_SERVER).emit('new-custom-channel', {
        channel: channel,
        sender: Global.user._id
    });

    self.channels[channel] = channel; // channels are stored locally
                                      // for deletion purposes

    self.mysock = io.connect(SIGNALING_SERVER + channel, {
        custom: true
    }); // the custom socket is opened, on the right sub-channel

    // all the custom signaling actions can be made here; below is
    // the example of setting a new presenter through the signaling
    // server
    self.mysock.setPresenter = function (id) {
        self.mysock.emit('setPresenter', {
            id: id
        });
    };

    // ...
};

5.1.2 AngularJS and WebRTC

Problem
We wanted the data on the front side of our application to be updated automatically each time a modification is made in our AngularJS
service. AngularJS controllers use the $scope variable to store and update
variables in real time. The problem is that the queries to our AngularJS service
are only one-sided: if any variable changed within the service, the controller
could not know that a modification had been made, as the controller was only
active when the user was.
Resolution
To solve this problem, we used the JavaScript observer design pattern.
The AngularJS controller simply registers its intention to observe the modifications
through a registerObserverCallback() method. Each time a modification
within the service itself is made, a notifyObservers() method is triggered and the
controller is notified of the changes.
// In the service, a list of observers is created. A function
// notifyObservers is also created, to apply a callback function
// to each observer.
var observerCallbacks = [];

var notifyObservers = function () {
    angular.forEach(observerCallbacks, function (callback) {
        callback();
    });
};

// The registerObserverCallback is exposed to the caller
return {
    registerObserverCallback: function (callback) {
        observerCallbacks.push(callback);
    },
};

// In the controller, the registering method can be called, and the
// data, in this case the stream of the screen currently shared,
// is updated automatically.
WebRTC.registerObserverCallback(function () {
    $scope.screen = WebRTC.getScreen();
    $scope.$apply();
});

// The $scope variable is bound to the elements in the view,
// allowing elements to update automatically.
// The <div> element containing the screen stream only appears if
// $scope.screen is set (ng-show directive), and uses
// $scope.screen.stream as its source.
<div class="screen-container container">
    <video ng-show="screen" ng-src="{{ screen.stream }}" autoplay></video>
</div>

5.2 Teamwork strategy: pair programming

My co-worker and I tried to figure out the most effective way to
produce the best quality software possible in the time given for the realization
of this project. We came to the conclusion that pair programming was the
most suitable approach for us. Pair programming refers to two developers
working on the same feature of a project behind a single screen. Our choice was
motivated by the following reasons:
- Our background knowledge of the technologies required at the beginning of the project was limited.
- We were sure to gain skills at the same time and always have the same level of knowledge of the project.
- It also helped us produce better quality code, as one developer can often see a better way to do a certain task.
- The complexity of various parts of the project really required two developers working on them at the same time, especially the integration of the AngularJS WebRTC service (5.1.2) and the signaling server (5.1.1).
At the beginning and the end of the project, we took the liberty of doing small,
non-blocking tasks separately, such as small features to improve the usability
of the project, the design and its integration, and finally the error handling.

5.3 Contribution to the related communities

5.3.1 Contributions to RTCMulticonnection

We had the opportunity to discover a major issue in the RTCMulticonnection
library. When the sharing of a medium was stopped using the built-in button in
Google Chrome, the stream continued to flow, and if the user decided to open
this stream again, an SDP (see 2.2.2) error appeared saying that the connection
was called in a wrong state. After multiple e-mail exchanges with Muaz Khan,
the creator of the library, we fixed the bug together in the latest version of his
library (2.4).
OpenHangouts will also be released soon on the RTCMulticonnection website in the demonstration category.
5.3.2 Contributions to the MEAN Stack

The MEAN stack offers the possibility of creating code packages to be installed
directly in any MEAN code structure, using the built-in package manager or
the Node Package Manager (see 3.2.1). We decided to release the code we created as
one of those packages, allowing our whole conferencing website to be included in
any kind of application with minimal integration code.

5.4 Future work

5.4.1 Full cloud integration

Only one step is missing to achieve full cloud integration and scaling in our
application. Horizontal scaling balances the load across identical instances of the application, using a shared database. As these two points
are already included in our project, the only thing left to add is the ability to share
the WebSocket server connections between multiple instances, using, for example, a
shared Redis1 database.
5.4.2 Screen interactions

Many interactions could be added to the screen sharing feature. These could include
drawings on screen shared via a WebRTC data connection, or a point-and-click
system allowing participants to have more interactions with the presenter.
5.4.3 File transfer

Peer-to-peer file transfer could also be included easily, as a raw data connection
is present by default in WebRTC. This feature is quite basic and does not constitute
a real breakthrough, unlike screen sharing and video sharing.
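A sketch of how such a transfer might split a file into data-channel-sized messages is shown below; the 16 KB chunk size is a commonly recommended limit for data channel messages, not a value taken from the project:

```javascript
// Hypothetical sketch of splitting a file into chunks small enough
// for a WebRTC data channel message.
const CHUNK_SIZE = 16 * 1024; // assumed per-message limit, in bytes

function chunkFile(buffer) {
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += CHUNK_SIZE) {
    chunks.push(buffer.slice(offset, offset + CHUNK_SIZE));
  }
  return chunks;
}

// In a browser, each chunk would then be sent in order with
// dataChannel.send(chunk) and reassembled on the receiving side.
const file = Buffer.alloc(40 * 1024); // a 40 KB dummy "file"
console.log(chunkFile(file).length); // → 3
```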
1 Redis is a key-value data store mainly used for scalability purposes


6 Performance and security concerns

6.1 Bandwidth and media quality

6.1.1 Context of the experiment

The Internet has to be seen as a public resource: the quality of users' Internet
connections is subject to many factors, such as the coverage of their country,
the type of connection used (modem, ADSL, optical fibre, 3G, etc.), and
various parameters within their area (quality of the line, distance to the Internet
relay). We decided to run an experiment to test the quality of WebRTC
connections by limiting the bandwidth allocated to them, and to observe the
resulting performance and usability of the WebRTC services.
6.1.2 Testing environment

- The computer used for the test was a Lenovo Thinkpad W520 running two distinct Google Chrome instances.
- The Internet connection used was a fast optical fibre connection, with a bandwidth far exceeding the bandwidth needs of each test.
- The OpenHangouts project was run remotely at the following URL: https://openhangouts.uni.me
6.1.3 Process followed

The bandwidth was limited using a call to the RTCMulticonnection library
to manually restrict the bandwidth allocated to each type of stream.
For each test, we reached the point where the media would not be displayed.
We then increased the bandwidth until the usability changed slightly.
To benchmark the displayed video, we modified a benchmarking code1 using
Google Chrome system functions, allowing us to obtain an accurate frame rate
and resolution for the video under analysis.
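The exact RTCMulticonnection call is not reproduced here; a common underlying technique for such limits is to insert a b=AS (Application-Specific maximum, in kbit/s) line after each media description in the SDP, as sketched below with an illustrative SDP fragment (real SDP uses CRLF line endings; plain newlines are used here for simplicity):

```javascript
// Sketch of bandwidth limiting through SDP munging: a "b=AS:<kbit/s>"
// line is inserted after every m= (media description) line.
function limitBandwidth(sdp, kbps) {
  return sdp.replace(/^(m=.*)$/gm, '$1\nb=AS:' + kbps);
}

// illustrative SDP fragment with one audio and one video section
const sampleSdp =
  'v=0\n' +
  'm=audio 9 UDP/TLS/RTP/SAVPF 111\n' +
  'm=video 9 UDP/TLS/RTP/SAVPF 96\n' +
  'a=sendrecv';

const limited = limitBandwidth(sampleSdp, 300);
console.log((limited.match(/b=AS:300/g) || []).length); // → 2
```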
6.1.4 Results and analysis

Video and sound sharing

1 http://webrtchacks.com/mirror-framerate/


Figure 12: Results of bandwidth tests on the video and audio streams
WebRTC allows a decent conversation quality even at very low upload rates,
but as seen in almost every result, high activity on the web camera
causes the frames per second to drop to a very low value.
An automatic resizing of the video is also worth noting, even if it is minor. The
WebRTC algorithms favour high resolution over the available frames per second,
as the resolution of the video could have been reduced much more, yielding
a gain in frames per second.
Finally, we can see that a bandwidth of around 300 kb/s ensures a very good
conversation quality, even with high activity on the web camera.
Screen sharing


Figure 13: Results of bandwidth tests on the screen sharing stream


When it comes to screen sharing, the WebRTC algorithms do not lower the
resolution to potentially obtain a higher frame rate. In every test realized,
high activity on the screen resulted in a dramatic drop in frames per second,
as the screen to be shared was using a high resolution display. The only notable
element is the increase in frames per second with the bandwidth when no big
changes were made on the screen (for example, mouse movements).

6.2 Security concerns

6.2.1 Data security

The first concern is about data and media security. Most of the WebRTC
applications already implemented mainly use unprotected data transfer over
RTP, for reasons of speed or simplicity, which can result in the interception of
those media. There is also a problem of transparency between the user and the
actions performed by WebRTC. Once media sharing has been accepted by
a user, the permission is remembered by the browser, and malicious recording of
data can take place. A client may also be infected by a virus or malicious software
that can load on top of WebRTC and fetch the shared media from all the parties
in the conference. To counter those problems, developers can set up secure
protocols for the transmission of the data itself, with the streams encrypted by
both parties.
6.2.2 Signaling server concerns

The signaling server implementation is also a key point when it comes to security. This part is not standardized, yet it is important because it deals with
reasonably sensitive data, such as IP addresses, sessions, and various information
about the browsers of each participant. The use of SSL for the
real-time communications can add a layer of security, but the developers still need
to be careful and fully aware of the data being manipulated.

7 Conclusion

The implementation of this project has been very beneficial for us. First,
it allowed us to learn a lot about various cutting-edge technologies alongside
WebRTC. We also went through a complete process of research and implementation, ending with a finished product usable online. Working on very recent
concepts is far from easy. Our teamwork strategies helped us finish the
project in time despite the complexity of some parts of the implementation.
This project will hopefully be reused by the WebRTC community, especially by
developers wanting to include WebRTC features in either the MEAN stack or
an AngularJS project.
WebRTC on its own is a very promising technology that constitutes a major
breakthrough in the world of web conferencing, through its simplicity of use and its
support by the most trusted authorities on the web. The performance shown
by WebRTC is good but can be improved for large conferences, with, for example,
the possibility of using relay servers to manage the data transfer for users
with limited bandwidth. The possible success of WebRTC depends first on the
willingness of the major web browser vendors to implement it. Second, the actors of
the web scene can also decide whether WebRTC becomes the only real-time communication standard on the web, by using it in popular projects. The support
of ORTC by Internet Explorer shows that real-time communication on the
web may be divided into two camps, or WebRTC may evolve in a different way.

Word count excluding figures: 7395

8 Annexes

8.1 Project resources

A fully working demonstration of the project is available online:
https://openhangouts.uni.me
The project has also been submitted to the WebRTC-Experiment project,
and will soon be available in the list of public demonstrations:
https://www.webrtc-experiment.com/RTCMultiConnection/


Developers can also get full access to the code using our public repository.
It can be used to analyse the structure of the project, modify it, or report bugs
or desired improvements:
http://github.com/overlox/openhangouts
A brief documentation for installation and use can also be found at:
http://github.com/overlox/openhangouts/wiki

References
[1] B. Ford, P. Srisuresh, and D. Kegel, Peer-to-peer communication across network address translators. http://www.brynosaurus.com/pub/net/p2pnat/, Feb. 2005.
[2] S. Dutton, Getting started with WebRTC. http://www.html5rocks.com/en/tutorials/webrtc/basics/, Jul. 2012.
[3] M. Handley and V. Jacobson, RFC 2327: SDP: Session description protocol, Apr. 1998. Status: PROPOSED STANDARD.
[4] A. B. Johnston and D. C. Burnett, WebRTC: APIs and RTCWEB Protocols of the HTML5 Real-Time Web, Digital Codex LLC, USA, 2012.
[5] J. Lengstorf, Realtime Web Apps: HTML5 WebSocket, Pusher, and the Web's Next Big Thing, Apress, Berkeley, Calif., 2013.
[6] A. Bergkvist, D. C. Burnett, C. Jennings, and A. Narayanan, WebRTC 1.0: Real-time communication between browsers, W3C working draft, W3C, Sept. 2013. http://www.w3.org/TR/2013/WD-webrtc-20130910/.
[7] R. Rai, Socket.io Real-time Web Application Development, Packt Publishing, Birmingham, 2013.
[8] H. Schulzrinne, S. L. Casner, R. Frederick, and V. Jacobson, RTP: A transport protocol for real-time applications. IETF Request for Comments: RFC 3550, Jul. 2003.
[9] T. Levent-Levi, Seven reasons for WebRTC server-side media processing. http://networkfuel.dialogic.com/webrtc-whitepaper, Apr. 2014.

