
IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 20, NO. 1, FIRST QUARTER 2018 333

Distributed SDN Control: Survey, Taxonomy, and Challenges

Fetia Bannour, Sami Souihi, and Abdelhamid Mellouk

Abstract—As opposed to the decentralized control logic underpinning the devising of the Internet as a complex bundle of box-centric protocols and vertically integrated solutions, the software-defined networking (SDN) paradigm advocates the separation of the control logic from hardware and its centralization in software-based controllers. These key tenets offer new opportunities to introduce innovative applications and incorporate automatic and adaptive control aspects, thereby easing network management and guaranteeing the user's quality of experience. Despite the excitement, SDN adoption raises many challenges, including the scalability and reliability issues of centralized designs, which can be addressed with the physical decentralization of the control plane. However, such physically distributed, but logically centralized, systems bring an additional set of challenges. This paper presents a survey on SDN with a special focus on distributed SDN control. Besides reviewing the SDN concept and studying the SDN architecture as compared to the classical one, the main contribution of this survey is a detailed analysis of state-of-the-art distributed SDN controller platforms, which assesses their advantages and drawbacks and classifies them in novel ways (physical and logical classifications) in order to provide useful guidelines for SDN research and deployment initiatives. A thorough discussion of the major challenges of distributed SDN control is also provided, along with some insights into emerging and future trends in that area.

Index Terms—Software-defined networking (SDN), distributed control, network management, quality of experience (QoE), adaptive and automatic control approaches, programmable networks.

Manuscript received March 19, 2017; revised October 1, 2017; accepted November 18, 2017. Date of publication December 12, 2017; date of current version February 26, 2018. (Corresponding author: Abdelhamid Mellouk.) The authors are with the LiSSi/TincNetwork Research Team, University of Paris-Est Créteil, 94400 Créteil, France (e-mail: fetia.bannour@u-pec.fr; sami.souihi@u-pec.fr; mellouk@u-pec.fr). Digital Object Identifier 10.1109/COMST.2017.2782482

1553-877X © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

THE UNPRECEDENTED growth in demands and data traffic, the emergence of network virtualization and the ever-expanding use of mobile equipment in the modern network environment have highlighted major problems that are basically inherent to the Internet's conventional architecture. They have made the task of managing and controlling the information coming from a growing number of connected devices increasingly complex and specialized.

Indeed, the traditional networking infrastructure is considered highly rigid and static, as it was initially conceived for a particular type of traffic, namely monotonous text-based content, which makes it poorly suited to the interactive and dynamic multimedia streams generated by today's increasingly demanding users. Along with multimedia trends, the recent emergence of the Internet of Things (IoT) has allowed for the creation of new advanced services with more stringent communication requirements to support its innovative use cases. In particular, e-health is a typical IoT use case where the health-care services delivered to remote patients (e.g., diagnosis, surgery, medical records) are highly sensitive to delay, quality and privacy. Such sensitive data and life-critical traffic are hardly supported by traditional networks.

Furthermore, in the traditional architecture, where the control logic is purely distributed and localized, solving a specific networking problem or adjusting a particular network policy requires acting separately on the affected devices and manually changing their configuration. In this context, the current growth in devices and data has exacerbated scalability concerns by making such human interventions and network operations harder and more error-prone.

Altogether, it has become particularly challenging for today's networks to deliver the required level of Quality of Service (QoS), let alone the Quality of Experience (QoE), which introduces additional user-centric requirements. To be more specific, relying solely on traditional QoS, based on technical performance parameters (e.g., bandwidth and latency), turns out to be insufficient for today's advanced and expanding networks. Additionally, meeting this growing number of performance metrics is a complex optimization task that can be treated as an NP-complete problem. Alternatively, network operators are increasingly realizing that the end-user's overall experience and subjective perception of the delivered services are as important as QoS-based mechanisms. As a result, current trends in network management are heading towards this new concept, commonly referred to as QoE, which represents the overall quality of a network service from an end-user perspective.

That said, the huge gap between, on the one hand, the advances achieved in computer and software technologies and, on the other, the traditional, non-evolving and hard-to-manage [1] underlying network infrastructure supporting these changes has stressed the need for an automated networking platform [2] that facilitates network operations and matches IoT needs. In this context, several research strategies have been proposed to integrate automatic and adaptive approaches into the current infrastructure for the purpose of meeting the challenges of scalability, reliability and availability for real-time traffic, and therefore guaranteeing the user's QoE.

While radical alternatives argue that a brand-new network architecture should be built from scratch by breaking with

the conventional network architecture and bringing fundamental changes to keep up with current and future requirements, other, more realistic alternatives are appreciated for introducing slight changes tailored to specific needs and for making a gradual network architecture transition without causing costly disruptions to existing network operations.

In particular, the early Overlay Network alternative introduces an application-layer overlay on top of the conventional routing substrate to facilitate the implementation of new network control approaches. However, the obvious disadvantage of Overlay Networks is that they depend on several aspects (e.g., the selected overlay nodes) to achieve the required performance. Besides, such networks can be criticized for compounding the complexity of existing networks due to the additional virtual layers.

On the other hand, the recent Software-Defined Networking (SDN) paradigm [3] offers the possibility to program the network and thus facilitates the introduction of automatic and adaptive control approaches by separating hardware (data plane) and software (control plane), enabling their independent evolution. SDN aims for the centralization of network control, offering improved visibility and better flexibility to manage the network and optimize its performance. Compared to the Overlay Network alternative, SDN has the ability to control the entire network, not only a selected set of nodes, and to use a public network for transporting data. Besides, SDN spares network operators the tedious task of temporarily creating an appropriate overlay network for a specific use case. Instead, it provides an inherent programmatic framework for hosting control and security applications that are developed in a centralized way while taking into consideration the IoT requirements [4] to guarantee the user's QoE.

Along with the excitement, there have been several concerns and questions regarding the widespread adoption of SDN networks. For instance, research studies on the feasibility of SDN deployment have revealed that the physical centralization of the control plane in a single programmable software component, called the controller, is constrained by several limitations in terms of scalability, availability, reliability, etc. Gradually, it became inevitable to think about the control plane as a distributed system [5], where several SDN controllers are in charge of handling the whole network while maintaining a logically centralized network view.

In that respect, networking communities argued about the best way to implement distributed SDN architectures while taking into account the new challenges brought by such distributed systems. Consequently, several SDN solutions have been explored and many SDN projects have emerged. Each proposed SDN controller platform adopted a specific architectural design approach based on various factors such as the aspects of interest, the performance goals, the deployed SDN use case, and also the trade-offs involved in the presence of multiple conflicting challenges.

Despite that great interest in SDN, its deployment in the industrial context is still in its relatively early stages. There might indeed be a long road ahead before the technology matures and standardization efforts pay off so that the full potential of SDN can be achieved. At this point, we underline the importance of conducting a serious analysis of the proposed SDN solutions in envisioning the potential trends that may drive future research in this field.

A. Main Contributions of This Survey

Prior surveys [1], [6]–[8] have covered different aspects of the SDN paradigm. In particular, surveys published in IEEE CST over the last few years elaborated on various topics within the SDN scope, such as the concept, benefits and historical roots [9], [10], the architecture elements and the design challenges [9]–[12], SDN programming languages [13], the virtualization of SDN networks using hypervisors [14], the security challenge in SDN [15], the fault management challenge in SDN [16] and the application of SDN in wireless networks [17]. Despite reviewing the distributed SDN control topic in some specific sections (e.g., the future perspective section), none of these surveys has, to the best of our knowledge, particularly focused on covering the various aspects of the decentralization problem in SDN.

While decentralized SDN control may be implemented using the existing distributed SDN controllers, their great number, along with their particular pros and cons, has made the choice extremely difficult for those who attempt to adopt a distributed SDN architecture in the context of large-scale deployments. In order to assist and promote recent initiatives to put the SDN paradigm into practice, this survey proposes original classifications that compare the broad range of SDN controller platform solutions with respect to various scalability, reliability and performance criteria.

B. Outline

In this paper, we present a survey on distributed control in Software-Defined Networking. In Section II, we start by exposing the promises and solutions offered by SDN as compared to conventional networking. Then, we elaborate on the fundamental elements of the SDN architecture. In subsequent sections, we expand our knowledge of the different approaches to SDN by exploring the wide variety of existing SDN controller platforms. In doing so, we place a special emphasis on distributed SDN solutions and classify them in two different ways: In Section III, we propose a physical classification of SDN control plane architectures into centralized and distributed (flat or hierarchical) in order to highlight the SDN performance, scalability and reliability challenges. In Section IV, we put forward a logical classification of distributed SDN control plane architectures into logically centralized and logically distributed while tackling the associated consistency and knowledge dissemination issues. Finally, Section V discusses the emerging challenges, opportunities and trends facing distributed control in SDNs.

II. SDN ARCHITECTURE

Over the last few years, the need for a new approach to networking has been expressed to overcome the many issues associated with current networks. In particular, the main vision of the SDN approach is to simplify networking operations,

Fig. 1. Conventional Networking Versus Software-Defined Networking.

optimize network management and introduce innovation and flexibility as compared to legacy networking architectures.

In this context, and in line with the vision of Kim and Feamster [18], four key reasons for the problems encountered in the management of existing networks can be identified:
(i) Complex and low-level network configuration: Network configuration is a complex distributed task where each device is typically configured in a low-level, vendor-specific manner. Additionally, the rapid growth of the network together with changing networking conditions have resulted in network operators constantly performing manual changes to network configurations, thereby compounding the complexity of the configuration process and introducing additional configuration errors.
(ii) Dynamic network state: Networks are growing dramatically in size, complexity and, consequently, in dynamicity. Furthermore, with the rise of mobile computing trends as well as the advent of network virtualization [19] and cloud computing [20], [21], the networking environment becomes even more dynamic as hosts are continually moving, arriving and departing due to the flexibility offered by VM migration, thus making traffic patterns and network conditions change in a more rapid and significant way.
(iii) Exposed complexity: In today's large-scale networks, network management tasks are challenged by the high complexity exposed by distributed low-level network configuration interfaces. That complexity is mainly generated by the tight coupling between the management, control, and data planes, where many control and management features are implemented in hardware.
(iv) Heterogeneous network devices: Current networks comprise a large number of heterogeneous network devices, including routers, switches and a wide variety of specialized middle-boxes. Each of these appliances has its own proprietary configuration tools and operates according to specific protocols, therefore increasing both complexity and inefficiency in network management.

All that said, network management is becoming more difficult and challenging given that the static and inflexible architecture of legacy networks is ill-suited to cope with today's increasingly dynamic networking trends and to meet the QoE requirements of modern users. This fact has fueled the need for the enforcement of complex and high-level policies to adapt to current networking environments, and for the automation of network operations to reduce the tedious workload of low-level device configuration tasks.

In this sense, and to deliver the goals of easing network management in real networks, operators have considered running dynamic scripts as a way to automate network configuration settings, before realizing the limitations of such approaches, which led to misconfiguration issues. It is, however, worth noting that recent approaches to scripting configurations and network automation are becoming relevant [22].

The SDN initiative led by the Open Networking Foundation (ONF) [23], on the other hand, proposes a new open architecture to address current networking challenges, with the potential to facilitate the automation of network configurations and, better yet, to fully program the network. Unlike the conventional distributed network architecture (Figure 1(a)), where network devices are closed and vertically integrated, bundling software with hardware, the SDN architecture (Figure 1(b)) raises the level of abstraction by separating the network data and control planes. That way, network devices become simple forwarding switches, whereas all the control logic is centralized in software controllers providing a flexible programming framework for the development of specialized applications and for the deployment of new services.

Such aspects of SDN are believed to simplify and improve network management by offering the possibility to innovate, customize behaviors and control the network according to high-level policies expressed as centralized programs, therefore bypassing the complexity of low-level network details and overcoming the fundamental architectural problems raised in (i) and (iii). Added to these features is the ability of SDN to easily cope with the heterogeneity of the underlying infrastructure (outlined in (iv)) thanks to the SDN Southbound interface abstraction.

More detailed information on the SDN-based architecture, which is split vertically into three layers (see Figure 2), is provided in the next subsections.

A. SDN Data Plane

The data plane, also known as the forwarding plane, consists of a distributed set of forwarding network elements (mainly switches) in charge of forwarding packets. In the

Fig. 2. A three-layer distributed SDN architecture.

context of SDN, the control-to-data-plane separation requires the data plane to be remotely accessible for software-based control via an open, vendor-agnostic Southbound interface.

Both OpenFlow [24] and ForCES [25] are well-known candidate protocols for the Southbound interface. They both follow the basic principle of splitting the control plane and the forwarding plane in network elements, and they both standardize the communication between the two planes. However, these solutions differ in many aspects, especially in terms of network architecture design.

Standardized by the IETF, ForCES (Forwarding and Control Element Separation) [25] introduced the separation between the control plane and the forwarding plane. In doing so, ForCES defines two logical entities that are kept in the same physical device: the Control Element (CE) and the Forwarding Element (FE). However, despite being a mature standard solution, the ForCES alternative did not gain widespread adoption by major router vendors.

On the other hand, OpenFlow [24] received major attention in both the research community and the industry. Standardized by the ONF [23], it is considered the first widely accepted communication protocol for the SDN Southbound interface. OpenFlow enables the control plane to specify, in a centralized way, the desired forwarding behavior of the data plane. Such traffic forwarding decisions reflect the specified network control policies and are translated by controllers into actual packet forwarding rules populated in the flow tables of OpenFlow switches.

In more specific terms, and according to the original version 1.0.0 of the standard defined in [26], an OpenFlow-enabled switch consists of a flow table and an OpenFlow secure channel to an external OpenFlow controller. Typically, the forwarding table maintains a list of flow entries; each flow entry comprises match fields containing header values to match packets against, counters to update when packets match (for flow statistics collection purposes), and a set of actions to apply to matching packets.

Accordingly, all incoming packets processed by the switch are compared against the flow table, where flow entries match packets based on a priority order specified by the controller. In case a matching entry is found, the flow counter is incremented and the actions associated with the specific flow entry are performed on the incoming packet belonging to that flow. According to the OpenFlow specification [26], these actions may include forwarding a packet out on a specific port, dropping the packet, removing or updating packet headers, etc. If no match is found in the flow table, the unmatched packet is encapsulated and sent over the secure channel to the controller, which decides on the way it should be processed. Among other possible actions, the controller may define a new flow for that packet by inserting new flow table entries.

Despite the advantages linked to the flexibility and innovation brought to network management, OpenFlow [24] suffers from scalability and performance issues that stem mainly from pushing all network intelligence and control logic to the centralized OpenFlow controller, thus restricting the task of OpenFlow switches to a dumb execution of forwarding actions.
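For illustration, the flow-table lookup logic described above can be sketched in a few lines of Python. This is a simplified model, not an actual switch implementation: the match fields are reduced to a small dictionary rather than the full header tuple of the specification, and the "packet-in" path is represented by a placeholder action.

```python
# Simplified model of OpenFlow 1.0-style flow-table processing:
# priority-ordered matching, per-entry counters, and a table-miss
# that hands the packet to the controller (packet-in).
from dataclasses import dataclass


@dataclass
class FlowEntry:
    priority: int
    match: dict            # header fields to match, e.g. {"ip_dst": "10.0.0.2"}
    actions: list          # e.g. ["output:2"], or [] to drop the packet
    packet_count: int = 0  # counter updated on every match


class FlowTable:
    def __init__(self):
        self.entries = []

    def add_entry(self, entry):
        # The controller installs entries; highest priority is checked first.
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def process(self, packet):
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                entry.packet_count += 1  # update flow statistics
                return entry.actions     # apply actions to the matching packet
        # Table miss: encapsulate and send to the controller over
        # the secure channel (modeled here as a placeholder action).
        return ["packet_in_to_controller"]


table = FlowTable()
table.add_entry(FlowEntry(priority=10, match={"ip_dst": "10.0.0.2"},
                          actions=["output:2"]))

print(table.process({"ip_dst": "10.0.0.2"}))  # -> ['output:2']
print(table.process({"ip_dst": "10.0.0.9"}))  # -> ['packet_in_to_controller']
```

On a table miss, a real controller would typically react by installing a new entry via `add_entry`, which is exactly the reactive flow-setup loop that the scalability criticisms above target.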

To circumvent these limitations, several approaches [27]–[30] suggest revisiting the delegation of control between the controller and switches and introducing new SDN switch Southbound interfaces.

Notably, DevoFlow [28] claims to minimize switch-to-controller interactions by introducing new control mechanisms inside switches. That way, switches can make local control decisions when handling frequent events, without involving controllers, whose primary tasks will be limited to keeping centralized control over far fewer significant events that require network-wide visibility. Despite introducing innovative ideas, the DevoFlow alternative has been mainly criticized for imposing major modifications to switch designs [31].

On the other hand, stateful approaches [29], [32], [33], as opposed to the original stateless OpenFlow abstraction, motivate the need to delegate some stateful control functions back to switches in order to offload the SDN controller. These approaches face the challenging dilemma of programming stateful devices (evolving the data plane) while retaining the simplicity, generality and vendor-agnostic features offered by the OpenFlow abstraction. In particular, the OpenState proposal [29] is a stateful, platform-independent data plane extension of the current OpenFlow match/action abstraction supporting a finite-state machine (FSM) programming model, called a Mealy Machine, in addition to the flow programming model adopted by OpenFlow. That model is implemented inside the OpenFlow switches using additional state tables in order to reduce the reliance on remote controllers for applications involving local state, such as MAC learning operations and port-knocking on a firewall.

Despite having the advantage of building on the adaptation activity of the OpenFlow standard and leveraging its evolution using the (stateful) extensions provided by recent versions (1.3 and 1.4), OpenState faces important challenges regarding the implementation of a stateful extension for programming the forwarding behaviour inside switches while following an OpenFlow-like implementation approach. The feasibility of the hardware implementation of OpenState has been addressed in [34]. Finally, the same authors extended their work into a more general and expressive abstraction of OpenState called OPP [35], which supports a full extended finite-state machine (XFSM) model, thereby enabling a broader range of applications and complex stateful flow processing operations.

In the same spirit, the approach presented in [36] explored delegating some parts of the controller functions involving packet generation tasks to OpenFlow switches in order to address both switch and controller scalability issues. The InSP API was introduced as a generic API that extends OpenFlow to allow for the programming of autonomous packet generation operations inside the switches, such as ARP and ICMP handling. The proposed OpenFlow-like abstractions include an InSP Instruction for specifying the actions that the switch should apply to a packet being generated after a triggering event, and a Packet Template Table (PTE) for storing the content of any packet generated by the switch.

According to [36], the InSP function, like any particular offloading function, faces the challenging issue of finding the relevant positioning with respect to the broad design space for delegation of control to SDN switches. In their opinion, a good approach to conceiving (and eventually standardizing) a particular offloading function should involve a programming abstraction that achieves a fair compromise between viability and flexibility, far from extreme solutions that simply turn on well-known legacy protocol functions (e.g., MAC learning) or push a piece of code inside the switches [37], [38].

The authors of FOCUS [39] express the same challenges but, unlike the above proposals, they reject a performance-based design choice that requires adding new hardware primitives to OpenFlow switches in the development of the delegated control function. Instead, they promote a deployable software-based solution to be implemented in the switch's software stack to achieve a balanced trade-off between the flexibility and cost of the control function delegation process.

B. SDN Control Plane

Regarded as the most fundamental building entity in the SDN architecture, the control plane consists of a centralized software controller that is responsible for handling communications between network applications and devices through open interfaces. More specifically, SDN controllers translate the requirements of the application layer down to the underlying data plane elements and give relevant information up to SDN applications.

The SDN control layer is commonly referred to as the Network Operating System (NOS), as it supports the network control logic and provides the application layer with an abstracted view of the global network, which contains enough information to specify policies while hiding all implementation details.

Typically, the control plane is logically centralized and yet implemented as a physically distributed system for scalability and reliability reasons, as discussed in Sections III and IV. In a distributed SDN control configuration, East-Westbound APIs [40] are required to enable multiple SDN controllers to communicate with each other and exchange network information. Despite the many attempts to standardize SDN protocols, there is to date no standard for the East-West API, which remains proprietary to each controller vendor. Although a number of East-Westbound communications happen only at the data-store level and do not require additional protocol specifics, it is becoming increasingly advisable to standardize that communication interface in order to provide wider interoperability between different controller technologies in different autonomous SDN networks.

On the other hand, an East-Westbound API standard requires advanced data distribution mechanisms and involves other special considerations. This brings about additional SDN challenges, some of which have been raised by the state-of-the-art distributed controller platforms discussed in Sections III and IV, but have yet to be fully addressed.
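Since, as noted above, no standard East-Westbound API exists, the following sketch is purely illustrative: two hypothetical controller instances disseminate link updates to each other so that both converge on the same global topology, which is the essence of a logically centralized view over a physically distributed control plane. The class and message shapes are our own invention, not any vendor's API.

```python
# Illustrative (non-standard) East-Westbound exchange between two
# SDN controller instances: each controller owns a domain and
# propagates topology changes to its peers.
class Controller:
    def __init__(self, name, domain_links):
        self.name = name
        self.links = set(domain_links)  # links known in the local domain
        self.peers = []

    def connect(self, peer):
        # Establish a bidirectional East-West channel.
        self.peers.append(peer)
        peer.peers.append(self)

    def add_link(self, link):
        # A local topology change is applied, then disseminated to all
        # peers so every controller converges on the same global view.
        self.links.add(link)
        for peer in self.peers:
            peer.receive_update(link)

    def receive_update(self, link):
        self.links.add(link)


c1 = Controller("ctrl-east", {("s1", "s2")})
c2 = Controller("ctrl-west", {("s3", "s4")})
c1.connect(c2)

# Exchange initial views, then propagate a newly discovered link.
for link in list(c2.links):
    c1.receive_update(link)
for link in list(c1.links):
    c2.receive_update(link)
c1.add_link(("s2", "s3"))

print(c1.links == c2.links)  # -> True: a logically centralized view
```

A real platform must additionally handle the consistency problems this toy model ignores (lost updates, concurrent conflicting changes, controller failures), which is precisely where the challenges discussed in Sections III and IV arise.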

C. SDN Application Plane

The SDN application plane comprises SDN applications, which are control programs designed to implement the network control logic and strategies. This higher-level plane interacts with the control plane via an open Northbound API. In doing so, SDN applications communicate their network requirements to the SDN controller, which translates them into Southbound-specific commands and forwarding rules dictating the behavior of the individual data plane devices. Routing, Traffic Engineering (TE), firewalls and load balancing are typical examples of common SDN applications running on top of existing controller platforms.

In the context of SDN, applications leverage the decoupling of the application logic from the network hardware, along with the logical centralization of the network control, to directly express the desired goals and policies in a centralized, high-level manner without being tied to the implementation and state-distribution details of the underlying networking infrastructure. Concurrently, SDN applications make use of the abstracted network view exposed through the Northbound interface to consume the network services and functions provided by the control plane according to their specific purposes.

That being said, the Northbound API implemented by SDN controllers can be regarded as a network abstraction interface to applications, easing network programmability, simplifying control and management tasks and allowing for innovation. In contrast to the Southbound API, the Northbound API is not supported by an accepted standard.

Despite the broad variety of Northbound APIs adopted by the SDN community (see Figure 2), we can classify them into two main categories:
• The first set involves simple and primitive APIs that are directly linked to the internal services of the controller platform. These implementations include:
– Low-level, ad-hoc APIs that are proprietary and tightly dependent on the controller platform. Such APIs are not considered high-level abstractions, as they allow developers to directly implement applications within the controller in a low-level manner. Deployed internally, these applications are tightly coupled with the controller and written in its native general-purpose language. NOX in C++ and POX in Python are typical examples of controllers that use their own basic sets of APIs.
– APIs based on Web services, such as the widely used REST API. This group of programming interfaces enables independent external applications (clients) to access the functions and services of the SDN controller (server). These applications can be written in any programming language and are not run inside the bundle hosting the controller software. Floodlight is an example of an SDN controller that adopts an embedded Northbound API based on REST.
• The second category contains higher-level APIs that rely on domain-specific programming languages, such as Frenetic [41], Procera [42] and Pyretic [43], as an indirect way for applications to interact with the controller. These APIs are designed to raise the level of abstraction in order to allow for the flexible development of applications and for the specification of high-level network policies.

III. PHYSICAL CLASSIFICATION OF SDN CONTROL PLANE ARCHITECTURES

Despite the undeniable strengths of SDN, there have always been serious concerns about the ability to extend SDN to large-scale networks.

Some argue that these scalability limits are basically linked to the protocol standards being used for the implementation of SDN. OpenFlow [24] in particular, although recognized as a leading and widely deployed SDN Southbound technology, is currently being rethought for potentially causing excessive overheads on switches (the switch bottleneck). Scalable alternatives to the OpenFlow standard, which propose to revisit the delegation of control between the controller and the switches with the aim of reducing the reliance on the SDN control plane, have been discussed in Section II-A.

Another, entirely different approach to addressing the SDN scalability and reliability challenges, which is advocated by the present paper, is to physically distribute the SDN control plane. This has led to a first categorization of existing controller platforms into centralized and distributed architectures (see Figure 3). Please note that, in Figure 3 and Figure 4, controllers that present similar characteristics for the discussed comparison criteria are depicted in the same color.

A. Centralized SDN Control

A physically centralized control plane consisting of a single controller for the entire network is a theoretically perfect design choice in terms of simplicity. However, a single-controller system may not keep up with the growth of the network. It is likely to become overwhelmed (the controller bottleneck) while dealing with an increasing number of requests and concurrently struggling to achieve the same performance guarantees.

Obviously, a centralized SDN controller does not meet the different requirements of large-scale, real-world network deployments. Data Centers and Service Provider Networks are typical examples of such large-scale networks presenting different requirements in terms of scalability and reliability. More specifically, a Data Center Network involves tens of thousands of switching elements. Such a great number of forwarding elements, which can grow at a fast pace, is expected to generate a huge number of control events, enough to overload a single centralized SDN controller [44], [45]. Studies conducted in [46] show important scalability implications (in terms of throughput) for centralized controller approaches. They demonstrate that multiple controllers should be used to scale the throughput of a centralized controller and meet the traffic characteristics of realistic data centers.

Unlike data centers, Service Provider Networks are characterized by a modest number of network nodes. However, these nodes are usually geographically distributed, making the diameter of these networks very large [44]. This entails a different
Fig. 3. Physical classification of SDN control plane architectures.

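The REST-style Northbound APIs discussed above can be exercised from any programming language. As an illustrative sketch, the Python snippet below constructs a flow-installation request in the style of Floodlight's Static Flow Pusher; the endpoint path and field names follow Floodlight's commonly documented conventions, while the controller address, switch DPID and port numbers are hypothetical examples, not values taken from this survey.

```python
import json

def build_flow_push_request(controller_host, dpid, name, in_port, out_port,
                            priority=32768, port=8080):
    """Build the URL and JSON body for a Floodlight-style static flow push.

    POSTing the returned body to the returned URL on a running controller
    would install a rule on the switch identified by `dpid` that forwards
    packets arriving on `in_port` out of `out_port`.
    """
    url = "http://%s:%d/wm/staticflowpusher/json" % (controller_host, port)
    body = json.dumps({
        "switch": dpid,                      # datapath ID of the target switch
        "name": name,                        # unique name for this static entry
        "priority": str(priority),
        "in_port": str(in_port),             # match: ingress port
        "active": "true",
        "actions": "output=%d" % out_port,   # action: forward to out_port
    })
    return url, body

# Example with a hypothetical local controller and switch DPID:
url, body = build_flow_push_request("127.0.0.1", "00:00:00:00:00:00:00:01",
                                    "flow-mod-1", in_port=1, out_port=2)
```

Because the application only speaks HTTP/JSON, it runs outside the controller bundle and is decoupled from the controller's implementation language, which is precisely the advantage of this category of Northbound API.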
type of controller scalability issues for centralized controller approaches, more specifically, high latencies. In addition to latency requirements, service provider networks have large numbers of flows that may generate overhead and bandwidth issues.
In general, Wide Area Network (WAN) deployments typically impose strict resiliency requirements. In addition, they present higher propagation delays as compared to data center networks. Obviously, a centralized controller design in an SD-WAN cannot achieve the desired failure resiliency and scale-out behaviors [47]. Several studies have emphasized the need for a distributed control plane in an SD-WAN architecture: they indeed focused on placing multiple controllers on real WAN topologies to benefit both control plane latency and fault-tolerance [48], [49].
That said, the potential scalability, reliability and vulnerability concerns associated with centralized controller approaches have been further confirmed through studies [7], [50] on the behavior of state-of-the-art centralized SDN controllers such as NOX [51], Beacon [52] and Floodlight [53] in different networking environments.
In particular, NOX classic [51], the world's first-generation OpenFlow controller with an event-based programming model, is believed to be limited in terms of throughput. Indeed, it cannot handle a large number of flows, namely a rate of 30k flow initiation events per second [7], [54]. Such a flow setup throughput may sound sufficient for an enterprise network, but it could be arguable for data-center deployments with high flow initiation rates [46]. Improved versions of NOX have consequently been developed by the same community (Nicira Networks), such as NOX-MT [55] for better performance and POX [56] for a more developer-friendly environment.
However, while none of these centralized designs is believed to meet the above scalability and reliability requirements of large-scale networks, they have gained greater prominence as they were widely used for research and educational purposes.
Additionally, Floodlight [53], a very popular Java-based OpenFlow controller from Big Switch Networks, suffers from serious security and resiliency issues. For instance, Dhawan et al. [57] have reported that the centralized SDN controller is inherently susceptible to Denial-of-Service (DoS) attacks. A subsequent version of Floodlight, called SE-Floodlight, has therefore been released to overcome these problems by integrating security applications. However, despite the introduced security enhancements aimed at shielding the centralized controller, the latter remains a potential weakness compromising the whole network. In fact, the controller still constitutes a single point of failure and a bottleneck, even if its latest version is less vulnerable to malicious attacks.
On the other hand, given its obvious performance and functionality advantages, the open-source Floodlight has been extensively used to build other SDN controller platforms supporting distributed architectures such as ONOS [58] and DISCO [59].

B. Distributed SDN Control
Alternatively, physically-distributed control plane architectures have received increased research attention in recent years
since they appeared as a potential solution to mitigate the issues brought about by centralized SDN architectures (poor scalability, Single Point of Failure (SPOF), performance bottlenecks, etc.). As a result, various SDN control plane designs have been proposed in recent literature. Yet, we discern two main categories of distributed SDN control architectures based on the physical organization of SDN controllers: a flat SDN control architecture and a hierarchical SDN control architecture (see Figure 3).
1) Flat SDN Control: The flat structure implies the horizontal partitioning of the network into multiple areas, each of which is handled by a single controller in charge of managing a subset of SDN switches. There are several advantages to organizing controllers in such a flat style, including reduced control latency and improved resiliency.
Onix [60], HyperFlow [61] and ONOS [58] are typical examples of flat physically-distributed controller platforms which are initially designed to improve control plane scalability through the use of multiple interconnected controllers sharing a global network-wide view and allowing for the development of centralized control applications. However, each of these contributions takes a different approach to distributing controller state and providing control plane scalability.
For example, Onix provides good scalability through additional partitioning and aggregation mechanisms. To be more specific, Onix partitions the NIB (Network Information Base), giving each controller instance responsibility for a subset of the NIB, and it aggregates by making applications reduce the fidelity of information before sharing it between other Onix instances within the cluster. Similar to Onix, each ONOS instance (composing the cluster) that is responsible for a subset of network devices holds a portion of the network view that is also represented in a graph. Different from Onix and ONOS, every controller in HyperFlow has the global network view, thus getting the illusion of control over the whole network. Yet, HyperFlow can be considered as a scalable option for specific policies in which a small number of network events affect the global network state. In that case, scalability is ensured by propagating these (less frequent) selected events through the event propagation system.
Furthermore, different mechanisms are put in place by these distributed controller platforms to meet fault-tolerance and reliability requirements in the event of failures or attacks.
Onix [60] uses different recovery mechanisms depending on the detected failures. Onix instance failures are most of the time handled by distributed coordination mechanisms among replicas, whereas network element/link failures are under the full responsibility of applications developed atop Onix. Besides, Onix is assumed reliable when it comes to connectivity infrastructure failures, as it can dedicate the failure recovery task to a separate management backbone that uses a multi-pathing protocol.
Likewise, HyperFlow [61] focuses on ensuring resiliency and fault tolerance as a means for achieving availability. When a controller failure is discovered by the failure detection mechanisms deployed by its publish/subscribe WheelFS [62] system, HyperFlow reconfigures the affected switches and redirects them to another nearby controller instance (from a neighbor's site). Alongside this ability to tackle component failures, HyperFlow is resilient to network partitioning thanks to the partition tolerance property of WheelFS. In fact, in the presence of a network partitioning, WheelFS partitions continue to operate independently, thus favoring availability.
Similarly, ONOS [58] considers fault-tolerance as a prerequisite for adopting SDN in Service Provider networks. ONOS's distributed control plane guards against controller instance failures by connecting, from the onset, each SDN switch to more than one SDN controller: its master controller and other backup controllers (from other domains) that may take over in the wake of master controller failures. Load balancing mechanisms are also provided to balance the mastership of switches among the controllers of the cluster for scalability purposes. Besides, ONOS incorporates additional recovery protocols, such as the Anti-Entropy protocol [63], for healing from lost updates due to such controller crashes.
Recent SDN controller platform solutions [64]–[69] focused specifically on improving fault-tolerance in the distributed SDN control plane. Some of these works assumed a simplified flat design where the SDN control was centralized. However, since the main focus was placed on the fault-tolerance aspect, we believe that their ideas and their fault-tolerance approaches can be leveraged in the context of medium to large scale SDNs where the network control is physically distributed among multiple controllers.
In particular, Botelho et al. [70] developed a hybrid SDN controller architecture that combines both passive and active replication approaches for achieving control plane fault-tolerance. SMaRtLight adopts a simple Floodlight [53]-based multi-controller design following OpenFlow 1.3, where one main controller (the primary) manages all network switches, and other controller replicas monitor the primary controller and serve as backups in case it fails.
This variant of a traditional passive replication system relies on an external data store that is implemented using a modern active Replicated State Machine (RSM) built with a Paxos-like protocol (BFT-SMaRt [71]) to ensure fault-tolerance and strong consistency. This shared data store is used for storing the network and application state (the common global NIB) and also for coordinating fault detection and leader election operations between controller replicas that run a lease management algorithm.
In case of a failure of the primary controller, the elected backup controller starts reading the current state from the shared consistent data store in order to mitigate the cold-start (empty state) issue associated with traditional passive replication approaches, and thereby ensure a smoother transition to the new primary controller role.
The limited feasibility of the deployed controller fault-tolerance strategy is explained by the limited scope of the SMaRtLight solution, which is only intended for small to medium-sized SDN networks. On the other hand, in large-scale deployments, adopting a simplified Master-Slave approach, and more importantly, assuming a single main controller scheme where one controller replica must retrieve all the network state from the shared data store in failure scenarios, have
major disadvantages in terms of increased latency and failover time.
Similarly, the Ravana controller platform proposal [66] addresses the issue of recovering from complete fail-stop controller crashes. It offers the abstraction of a fault-free centralized SDN controller to unmodified control applications, which are relieved of the burden of handling controller failures. Accordingly, network programmers write application programs for a single main controller, and the transparent master-slave Ravana protocol takes care of replicating, seamlessly and consistently, the control logic to other backup controllers for fault-tolerance.
The Ravana approach deploys enhanced Replicated State Machine (RSM) techniques that are extended with switch-side mechanisms to ensure that control messages are processed transactionally with ordered and exactly-once semantics even in the presence of failures. The three Ravana prototype components, namely the Ryu [72]-based controller runtime, the switch runtime, and the control channel interface, work cooperatively to guarantee the desired correctness and robustness properties of a fault-tolerant logically centralized SDN controller.
More specifically, when the master controller crashes, the Ravana protocol detects the failure within a short failover time and elects the standby slave controller to take over using ZooKeeper [73]-like failure detection and leader election mechanisms. The new leader finishes processing any logged events in order to catch up with the failed master controller's state. Then, it registers with the affected switches in the role of the new master before proceeding with normal controller operations.
2) Hierarchical SDN Control: The hierarchical SDN control architecture assumes that the network control plane is vertically partitioned into multiple levels (layers) depending on the required services. According to [74], a hierarchical organization of the control plane can improve SDN scalability and performance.
To improve scalability, Kandoo [31] assumes a hierarchical two-layer control structure that partitions control applications into local and global. Contrary to DevoFlow [28] and DIFANE [27], Kandoo proposes to reduce the overall stress on the control plane without the need to modify OpenFlow switches. Instead, it establishes a two-level hierarchical control plane, where frequent events occurring near the data path are handled by the bottom layer (local controllers with no interconnection running local applications) and non-local events requiring a network-wide view are handled by the top layer (a logically centralized root controller running non-local applications and managing local controllers).
Despite the obvious scalability advantages of such a control plane configuration, where local controllers can scale linearly as they do not share information, Kandoo did not envision fault-tolerance and resiliency strategies to protect itself from potential failures and attacks in the data and control planes. Besides, from a developer perspective, Kandoo imposes some Kandoo-specific conditions on the control applications developed on top of it, in such a way that makes them aware of its existence.
On the other hand, Google's B4 [75], [76], a private intra-domain software-defined WAN connecting their data centers across the planet, proposes a two-level hierarchical control framework for improving scalability. At the lower layer, each data-center site is handled by an Onix-based [60] SDN controller hosting local site-level control applications. These site controllers are managed by a global SDN Gateway that collects network information from multiple sites through site-level TE services and sends it to a logically centralized TE server which also operates at the upper layer of the control hierarchy. Based on an abstract topology, the latter enforces high-level TE policies that are mainly aimed at optimizing bandwidth allocation between competing applications across the different data-center sites. That being said, the TE server programs these forwarding rules at the different sites through the same gateway API. These TE entries will be installed into higher-priority switch forwarding tables alongside the standard shortest-path forwarding tables. In this context, it is worth mentioning that the topology abstraction, which consists in abstracting each site into a super-node with an aggregated super-trunk to a remote site, is key to improving the scalability of the B4 network. Indeed, this abstraction hides the details and complexity from the logically centralized TE controller, thereby allowing it to run protocols at a coarse granularity based on a global controller view and, more importantly, preventing it from becoming a serious performance bottleneck.
Unlike Kandoo [31], B4 [75] deploys robust reliability and fault-tolerance mechanisms at both levels of the control hierarchy in order to enhance the B4 system availability. These mechanisms have been especially enhanced after experiencing a large-scale B4 outage. In particular, Paxos [77] is used for detecting and handling the primary controller failure within each data-center site by electing a new leader controller among a set of reachable standby instances. On the other hand, network failures at the upper layer are addressed by the logically centralized TE controller, which adapts to failed or unresponsive site controllers in the bandwidth allocation process. Additionally, B4 is resilient against other failure scenarios where the upper-level TE controller encounters major problems in reaching the lower-level site controllers (e.g., TE operation/session failures). Moreover, B4 guards against the failure of the logically centralized TE controller by geographically replicating TE servers across multiple WAN sites (one master TE server and four secondary hot standbys). Finally, another fault recovery mechanism is used in case the TE controller service itself faces serious problems. That mechanism stops the TE service and enables the standard shortest-path routing mechanism as an independent service.
In the same spirit, Espresso [78] is another interesting SDN contribution that represents the latest and most challenging pillar of Google's SDN strategy. Building on the previous three layers of that strategy (the B4 WAN [75], the Andromeda NFV stack and the Jupiter data center interconnect), Espresso extends the SDN approach to the peering edge of Google's network, where it connects to other networks worldwide. Considered as a large-scale SDN deployment for the public Internet, Espresso, which has been in production for more than two years, routes over 22% of Google's total
traffic to the Internet. More specifically, the Espresso technology allows Google to dynamically choose from where to serve content for individual users based on real-time measurements of end-to-end network connections.
To deliver unprecedented scale-out and efficiency, Espresso assumes a hierarchical control plane design split between Global controllers and Local controllers that perform different functions. Besides, Espresso's software programmability design principle externalizes features into software, thereby exploiting commodity servers for scale.
Moreover, Espresso achieves higher availability (reliability) when compared to existing router-centric Internet protocols. Indeed, it supports a fail-static system, where the local data plane keeps the last known good state to allow for control plane unavailability without impacting data plane and BGP peering operations. Finally, another important feature of Espresso is that it provides full interoperability with the rest of the Internet and the traditional heterogeneous peers.

IV. LOGICAL CLASSIFICATION OF DISTRIBUTED SDN CONTROL PLANE ARCHITECTURES
Apart from the physical classification, we can categorize distributed SDN control architectures according to the way knowledge is disseminated among controller instances (the consistency challenge) into logically centralized and logically distributed architectures (see Figure 4). This classification has been recently adopted by [79].

A. Logically Centralized SDN Control
1) Onix and SMaRtLight: Both Onix [60] and SMaRtLight [70] are logically centralized controller platforms that achieve controller state redundancy through state replication. But the main difference is that Onix uses a distributed data store while SMaRtLight uses a centralized data store for replicating the shared network state. They also deploy different techniques for sharing knowledge and maintaining a consistent network state.
Onix is a distributed control platform for large-scale production networks that stands out from previous proposals by providing a simple general-purpose API, a central NIB abstraction and standard state distribution primitives for easing the implementation of network applications.
In more specific terms, Onix uses the NIB data structure to store the global network state (in the form of a network graph) that is distributed across running Onix instances and synchronized through Onix's built-in state distribution tools according to different levels of consistency as dictated by application requirements. In fact, besides interacting with the NIB at run-time, network applications on top of Onix initially configure their own data storage and dissemination mechanisms by choosing among two data-store options already implemented by Onix in the NIB: a replicated transactional database that guarantees strong consistency at the cost of performance for persistent but slowly-changing data (state), and a high-performance memory-only distributed hash table (DHT) for volatile data that does not require strict consistency.
While the main advantage of Onix is its programmatic framework created for the flexible development of control applications with desired trade-offs between performance and state consistency (strong/eventual), it carries the limitations of eventually consistent systems, which rely on application-specific logic to detect network state inconsistencies for the eventually-consistent data and provide conflict resolution methods for handling them.
As mentioned in Section III-B1, SMaRtLight is a fault-tolerant logically centralized Master-Slave SDN controller platform, where a single controller is in charge of all network decisions. This main controller is supported by backup controller replicas that should have a synchronized network view in order to take over the network control in case of the primary's failure. All controller replicas are coordinated through a shared data store that is kept fault-tolerant and strongly consistent using an implementation of Replicated State Machine (RSM).
Consistency between the master and backup controllers is guaranteed by replicating each change in the network image (NIB) of the master into the shared data store before modifying the state of the network. However, such synchronization updates generate additional time overheads and have a drastic impact on the controller's performance. To address this issue, the controllers keep a local cache (maintained by one active primary controller at any time) to avoid accessing the shared data store for read operations. By keeping the local cache and the data store consistent even in the presence of controller failures, the authors claim that their simple Master-Slave structure achieves, in the context of small to medium-sized networks, a balance between consistency and fault-tolerance while keeping performance at an acceptable level.
2) HyperFlow and Ravana: Both HyperFlow [61] and Ravana [66] are logically centralized controller platforms that achieve state redundancy through event replication. Despite their similarities in building the application state, one difference is that the Ravana protocol is completely transparent to control applications while HyperFlow requires minor modifications to applications. Besides, while HyperFlow is eventually consistent, favoring availability, Ravana ensures strong consistency guarantees.
More specifically, HyperFlow [61] is an extension of NOX into a distributed event-based control plane where each NOX-based controller manages a subset of OpenFlow network switches. It uses an event-propagation publish/subscribe mechanism based on the distributed WheelFS [62] file system for propagating selected network events and maintaining the global network-wide view across controllers. Accordingly, the HyperFlow controller application instance running on top of an individual NOX controller selectively publishes relevant events that affect the network state and receives events on subscribed channels to other controllers. Then, other controllers locally replay all the published events in order to reconstruct the state and achieve the synchronization of the global view.
By this means, all controller instances make decisions locally and individually (without contacting remote controller instances): they indeed operate based on their synchronized eventually-consistent network-wide view as if they are in
Fig. 4. Logical classification of distributed SDN control plane architectures.

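Several of the logically centralized platforms classified in Figure 4 reconcile eventually-consistent state through gossip-style anti-entropy, where controllers periodically compare views and keep, per element, the update with the highest logical timestamp. The toy Python sketch below illustrates this last-writer-wins reconciliation under stated assumptions: the dictionary layout and the function are illustrative only, not the actual API of ONOS or any other platform.

```python
def anti_entropy_merge(view_a, view_b):
    """Reconcile two controllers' topology views (toy last-writer-wins merge).

    Each view maps a topology element to a (logical_timestamp, value) pair,
    in the spirit of an eventually-consistent map. For every key, the entry
    carrying the higher timestamp wins, so two controllers that exchange
    views converge to the same merged state.
    """
    merged = dict(view_a)
    for key, (ts, value) in view_b.items():
        # Adopt the peer's entry only if it is newer (or unknown locally).
        if key not in merged or merged[key][0] < ts:
            merged[key] = (ts, value)
    return merged

# Two controllers holding partially stale views of the same two links:
ctrl1 = {"link:s1-s2": (5, "UP"), "link:s2-s3": (2, "UP")}
ctrl2 = {"link:s1-s2": (3, "UP"), "link:s2-s3": (4, "DOWN")}

converged = anti_entropy_merge(ctrl1, ctrl2)
# Both controllers end up with the most recent state of every link:
# {"link:s1-s2": (5, "UP"), "link:s2-s3": (4, "DOWN")}
```

Note that this resolves conflicts but cannot order concurrent updates, which is why state demanding strong guarantees (such as switch mastership) is instead kept in a consensus-backed store, as discussed below.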
control of the entire network. Through this synchronization scheme, HyperFlow achieves the goal of minimizing flow setup times and also congestion, in other words, the cross-site traffic required to synchronize the state among controllers. However, the potential downside of HyperFlow is related to the performance of the publish/subscribe system, which can only deal with non-frequent events. Besides, HyperFlow does not guarantee a strict ordering of events and does not handle consistency problems. This makes the scope of HyperFlow restricted to applications that do not require a strict event ordering with strict consistency guarantees.
To correctly ensure the abstraction of a "logically centralized SDN controller", an elaborate fault-tolerant controller platform called Ravana [66] extended beyond the requirements for controller state consistency to include those for switch state consistency under controller failures.
Maintaining such strong levels of consistency in both controllers and switches in the presence of failures requires handling the entire event-processing cycle as a transaction in accordance with the following properties: (i) events are processed in the same total order at all controller replicas so that controller application instances would reach the same internal state, (ii) events are processed exactly-once across all the controller replicas, (iii) commands are executed exactly-once on the switches.
To achieve such design goals, Ravana follows a Replicated State Machine (RSM) approach, but extends its scope to deal with switch state consistency under failures. Indeed, while Ravana permits unmodified applications to run in a transparent fault-tolerant environment, it requires modifications to the OpenFlow protocol, and it makes changes to current switches instead of involving them in a complex consensus protocol.
To be more specific, Ravana uses a two-stage replication protocol that separates the reliable logging of the master's event delivery information (stage 1) from the logging of the master's event-processing transaction completion information (stage 2) in the shared in-memory log (using Viewstamped Replication [80]) in order to guarantee consistency under
joint switch and controller failures. Besides, it adds explicit acknowledgement messages to the OpenFlow 1.3 protocol and implements buffers on existing switches for event retransmission and command filtering. The main objective of the addition of these extensions and mechanisms is to guarantee the exactly-once execution of any event transaction on the switches during controller failures.
Such strong correctness guarantees for a logically centralized controller under Ravana come at the cost of generating additional throughput and latency overheads that can be reduced to a quite reasonable extent with specific performance optimizations. Since the Ravana runtime is completely transparent and oblivious to control applications, achieving relaxed consistency requirements for the sake of improved availability, as required by some specific applications, entails considering new mechanisms that relax some of the correctness constraints on Ravana's design goals.
A similar approach to Ravana [66] was adopted by Mantas and Ramos [81] to achieve a consistent and fault-tolerant SDN controller platform. In their ongoing work, the authors claim to retain the same requirements expressed by Ravana, namely the transparency, reliability, consistency and performance guarantees, but without requiring changes to the OpenFlow protocol or to existing switches.
Likewise, Kandoo [31] falls in this category of logically centralized controllers that distribute the control state by propagating network events. Indeed, Kandoo assumes, at the top layer of its hierarchical design, a logically centralized root controller for handling global and rare network events. Since the main aim was to preserve scalability without changing the OpenFlow devices, Kandoo did not focus on knowledge distribution mechanisms for achieving network state consistency.
3) ONOS and OpenDayLight: ONOS and OpenDayLight [82] represent another category of logically centralized SDN solutions that set themselves apart from state-of-the-art distributed SDN controller platforms by offering community-driven open-source frameworks as well as providing the full functionalities of Network Operating Systems. Despite their obvious similarities, these prominent Java-based projects present major differences in terms of structure, target customers, focus areas and inspirations.
Dissimilar to OpenDayLight [83], which is applicable to different domains, ONOS [58] from ON.LAB is specifically targeted towards service providers and is thus architected to meet their carrier-grade requirements in terms of scalability, high availability and performance. In addition to the high-level Northbound abstraction (a global network view and an application intent framework) and the pluggable Southbound abstraction (supporting multiple protocols), ONOS, in the same way as Onix and HyperFlow, offers state dissemination mechanisms [84] to achieve a consistent network state across the distributed cluster of ONOS controllers, a required or highly desirable condition for network applications to run correctly.
More specifically, ONOS's distributed core eases the state management and cluster coordination tasks for application developers by providing them with an available set of core building blocks for dealing with different types of distributed control plane state, including a ConsistentMap primitive for state requiring strong consistency and an EventuallyConsistentMap for state tolerating weak consistency.
In particular, applications that favor performance over consistency store their state in the shared eventually-consistent data structure that uses optimistic replication assisted by the gossip-based Anti-Entropy protocol [63]. For example, the global network topology state, which should be accessible to applications with minimal delays, is managed by the Network Topology store according to this eventual consistency model. Recent releases of ONOS treat the network topology view as an in-memory state machine graph. The latter is built and updated in each SDN controller by applying local topology events and replicating them to other controller instances in the cluster in an order-aware fashion based on the events' logical timestamps. Potential conflicts and loss of updates due to failure scenarios are resolved by the anti-entropy approach [63], where each controller periodically compares its topology view with that of another randomly-selected controller in order to reconcile possible differences and recover from stale information.
On the other hand, state imposing strong consistency guarantees is managed by the second data structure primitive, built using RAFT [85], a protocol that achieves consensus via an elected leader controller in charge of replicating the received log updates to follower controllers and then committing these updates upon receipt of confirmation from the majority. The mapping between controllers and switches, which is handled by ONOS's Mastership store, is an example of a network state that is maintained in a strongly consistent manner.
Administered by the Linux Foundation and backed by the industry, OpenDayLight (ODL) [83] is a generic and general-purpose controller framework which, unlike ONOS, was conceived to accommodate a wide variety of applications and use cases concerning different domains (e.g., Data Center, Service Provider and Enterprise). One important architectural feature of ODL is its YANG-based Model-Driven Service Abstraction Layer (MD-SAL), which allows for the easy and flexible incorporation of network services requested by the higher layers via the Northbound Interface (the OSGi framework and the bidirectional RESTful Interfaces) irrespective of the multiple Southbound protocols used between the controller and the heterogeneous network devices.
The main focus of ODL was to accelerate the integration of SDN in legacy network environments by automating the configuration of traditional network devices and enabling their communication with OpenFlow devices. As a result, the project was perceived as adopting vendor-driven solutions that mainly aim at preserving the brands of legacy hardware. This represents a broad divergence from ONOS, which envisions a carrier-grade SDN platform with enhanced performance capabilities to explore the full potential of SDN and demonstrate its real value.
The latest releases of ODL provided a distributed SDN controller architecture referred to as ODL clustering. Differently from ONOS, ODL did not offer various consistency models
BANNOUR et al.: DISTRIBUTED SDN CONTROL: SURVEY, TAXONOMY, AND CHALLENGES 345
for different types of network data. All the data shared across the distributed cluster of ODL controllers for maintaining the logically centralized network view is handled in a strongly-consistent manner using the RAFT consensus algorithm [85] and the Akka framework [86].

4) B4 and SWAN: Google's B4 [75] network leverages the logical centralization enabled by the SDN paradigm to deploy centralized TE in coexistence with the standard shortest-path routing for the purpose of increasing the utilization of the inter-data-center links (near 100%) as compared to conventional networks and thereby enhancing network efficiency and performance. As previously explained in Section III-B2, the logically centralized TE server uses the network information collected by the centralized SDN Gateway to control and coordinate the behavior of site-level SDN controllers based on an abstracted topology view. The main task of the TE server is indeed to optimize the allocation of bandwidth among competing applications (based on their priority) across the geographically-distributed data-center sites.

That being said, we implicitly assume the presence of a specific consistency model used by the centralized SDN Gateway for handling the distributed network state across the data-center site controllers and ensuring that the centralized TE application runs correctly based on a consistent network-wide view. However, there has been very little information provided on the level of consistency adopted by the B4 system. As a matter of fact, one potential downside of the SDN approach followed by Google could be the fact that it is too customized and tailored to their specific network requirements, as no general control model has been proposed for future use by other SDN projects.

Similarly, Microsoft has presented SWAN [87] as an intra-domain software-driven WAN deployment that takes advantage of the logically-centralized SDN control using a global TE solution to significantly improve the efficiency, reliability and fairness of their inter-DC WAN. In the same way as Google, Microsoft did not provide much information about the control plane state consistency updates.

B. Logically Distributed SDN Control

The potential of the SDN paradigm has been properly explored within single administrative domains like data centers, enterprise networks, campus networks and even WANs, as discussed in Section IV-A. Indeed, the main pillars of SDN – the decoupling between the control and data planes together with the consequent ability to program the network in a logically centralized manner – have unleashed productive innovation and novel capabilities in the management of such intra-domain networks. These benefits include the effective deployment of new domain-specific services as well as the improvement of standard control functions following the SDN principles like intra-domain routing and TE. RCP [88] and RouteFlow [89] are practical examples of successful intra-AS platforms that use OpenFlow to provide conventional IP routing services in a centralized manner.

However, that main feature of logically-centralized control which has been leveraged by most SDN solutions to improve network management at the intra-domain level cannot be fully exploited for controlling heterogeneous networks involving multiple Autonomous Systems (ASes) under different administrative authorities (e.g., the Internet). In this context, recent works have considered extending the SDN scheme to such inter-domain networks while remaining compatible with their distributed architecture. In this section, we shed light on these SDN solutions which adopted a logically distributed architecture in accordance with legacy networks. For that reason, we place them in the category of logically distributed SDN platforms as opposed to the logically centralized ones mainly used for intra-domain scenarios.

1) DISCO and D-SDN: For instance, the DISCO project [59] suggests a logically distributed control plane architecture that operates in such multi-domain heterogeneous environments, more precisely WANs and overlay networks. Built on top of Floodlight [53], each DISCO controller administers its own SDN network domain and interacts with other controllers to provide end-to-end network services. This inter-AS communication is ensured by a unique lightweight control channel to share summary network-wide information.

The most obvious contribution of DISCO lies in the separation between intra-domain and inter-domain features of the control plane, where each type of feature is performed by a separate part of the DISCO architecture. The intra-domain modules are responsible for ensuring the main functions of the controller, such as monitoring the network and reacting to network issues, and the inter-domain modules (Messenger, Agents) are designed to enable a message-oriented communication between neighbor domain controllers. Indeed, the AMQP-based Messenger [90] offers a distributed publish/subscribe communication channel used by agents which operate at the inter-domain level by exchanging aggregated information with intra-domain modules. DISCO was assessed on an emulated environment according to three use cases: inter-domain topology disruption, end-to-end service priority request and Virtual Machine Migration.

The main advantage of the DISCO solution is the possibility to adapt it to large-scale networks with different ASes such as the Internet [79]. However, we believe that there are also several drawbacks associated with such a solution, including the static non-evolving decomposition of the network into several independent entities, which is in contrast to emerging theories such as David D. Clark's theory [91] about the network being manageable by an additional high-level entity known as the Knowledge Plane. Besides, following the DISCO architecture, network performance optimization becomes a local task dedicated to local entities with different policies, each of which acts in its own best interest at the expense of the general interest. This leads to local optima rather than the global optimum that achieves the global network performance. Additionally, from the DISCO perspective, one SDN controller is responsible for one independent domain. However, an AS is usually too large to be handled by a single controller. Finally, DISCO did not provide appropriate reliability strategies suited to its geographically-distributed architecture. In fact, in the event of a controller failure, one might infer that a remote controller instance will be in charge of the subset of affected
switches, thereby resulting in a significant increase in the control plane latency. In our opinion, a better reliability strategy would involve local per-domain redundancy; local controller replicas should indeed take over and serve as backups in case the local primary controller fails.

In the same spirit, INRIA's D-SDN [92] enables a logical distribution of the SDN control plane based on a hierarchy of Main Controllers and Secondary Controllers, matching the organizational and administrative structure of the current and future Internet. In addition to dealing with levels of control hierarchy, another advantage of D-SDN over DISCO is related to its enhanced security and fault tolerance features.

2) SDX-Based Controllers: Different from DISCO, which proposes per-domain SDN controllers with inter-domain functions for allowing autonomous end-to-end flow management across SDN domains, recent trends have considered deploying SDN at Internet eXchange Points (IXPs), thus giving rise to the concept of Software-Defined eXchanges (SDXes). These SDXes are used to interconnect participants of different domains via a shared software-based platform. That platform is usually aimed at bringing innovation to traditional peering, easing the implementation of customized peering policies and enhancing the control over inter-domain traffic management. Prominent projects adopting that vision of software-defined IXPs and implementing it in real production networks include Google's Cardigan in New Zealand [93], SDX at Princeton [94], CNRS's French TouIX [95] (European ENDEAVOUR [96]) and the AtlanticWave-SDX [97]. Here we chose to focus on the SDX project at Princeton, since we believe in its potential for demonstrating the capabilities of SDN to innovate IXPs and for bringing answers to deploying SDX in practice.

The SDX project [94] takes advantage of SDN-enabled IXPs to fundamentally improve wide-area traffic delivery and enhance conventional inter-domain routing protocols that lack the required flexibility for achieving various TE tasks. Today's BGP is indeed limited to destination-based routing; it has a local forwarding influence restricted to immediate neighbors, and it deploys indirect mechanisms for controlling path selection. To overcome these limitations, SDX relies on SDN features to ensure fine-grained, flexible and direct expression of inter-domain control policies, thereby enabling a wider range of valuable end-to-end services such as Inbound TE, application-specific peering, server load balancing, and traffic redirection through middle-boxes.

The SDX architecture consists of a smart SDX controller handling both SDX policies (Policy compiler) and BGP routes (Route Server), conventional Edge routers, and an OpenFlow-enabled switching fabric. The main idea behind this implementation is to allow participant ASes to compose their own policies in a high-level (using Pyretic) and independent manner (through the virtual switch abstraction), and then send them to the SDX controller. The latter is in charge of compiling these policies to SDN forwarding rules while taking into account BGP information.

Besides offering this high-level softwarized framework that is easily integrated into the existing infrastructure while maintaining good interoperability with its routing protocol, SDX also stands out from similar solutions like Cardigan [93] by the efficient mechanisms used for optimizing control and data plane operations. In particular, the scalability challenges faced by SDX under realistic scenarios have been further investigated by iSDX [98], an enhanced Ryu [72]-based version of SDX intended to operate at the scale of large industrial IXPs.

However, one major drawback of the SDX contribution is that it is limited to the participant ASes being connected via the software-based IXP, implying that non-peering ASes would not benefit from the routing opportunities offered by SDX. Besides, while solutions built on SDX use central TE policies for augmenting BGP and promote a logical centralization of the routing control plane at the IXP level, SDX controllers are still logically decentralized at the inter-domain level, since no information is shared between them about their respective interconnected ASes. This brings us back to the same problem we pointed out for DISCO [59] about end-to-end traffic optimization being a local task for each part of the network. To remedy this issue, some recent works [99] have considered centralizing the whole inter-domain routing control plane to improve BGP convergence by outsourcing the control logic to a multi-AS routing controller that has a "Bird's-eye view" over multiple ASes.

It is also worth mentioning that SDX-based controllers face several limitations in terms of both security and reliability. Because the SDX controller is the central element in the SDX architecture, security strategies must focus on securing the SDX infrastructure by protecting the SDX controller against cyber attacks and by authenticating any access to it. In particular, Chung et al. [100] argue that SDX-based controllers are subject to the potential vulnerabilities introduced by SDN in addition to the common vulnerabilities associated with classical protocols. In that respect, they distinguish three types of current SDX architectures and discuss the involved security concerns. In their opinion, Layer-3 SDX [93], [94] will inherit all BGP vulnerabilities, Layer-2 SDX [101] will get the vulnerabilities of a shared Ethernet network, and SDN SDX [40] will also bring controller vulnerabilities like DDoS attacks, compromised controllers and malicious controller applications. Moreover, the same authors of [100] point out that SDX-based controllers require security considerations with respect to policy isolation between different SDX participants.

Finally, since the SDX controller becomes a potential single point of failure, fault-tolerance and resiliency measures should be taken into account when building an SDX architecture. While the distributed peer-to-peer SDN SDX architecture [102] is inherently resilient, centralized SDX approaches should incorporate fault-tolerance mechanisms like those discussed in Section V-B and should also leverage the existing fault-tolerant distributed SDN controller platforms [58].

V. SUMMARY AND FUTURE PERSPECTIVES

While offering a promising potential to transform and improve current networks, the SDN initiative is still in the early stages of addressing the wide variety of challenges involving different disciplines. In particular, the distributed control of SDNs faces a series of pressing challenges
TABLE I
MAIN CHARACTERISTICS OF THE DISCUSSED SDN CONTROLLERS
that require our special consideration. These include the issues of (1) Scalability, (2) Reliability, (3) Consistency, (4) Interoperability, (5) Monitoring and (6) Security.

In this paper, we surveyed the most prominent state-of-the-art distributed SDN controller platforms and, more importantly, we discussed the different approaches adopted in tackling the above challenges and proposing potential solutions. Table I gives a brief summary of the main features and KPIs of the discussed SDN controllers. Physically-centralized controllers such as NOX [51], POX [56] and Floodlight [53] suffer from scalability and reliability issues. Solutions like DevoFlow [28] and DIFANE [27] attempted to solve these scalability issues by rethinking the OpenFlow protocol, whereas most SDN groups geared their focus towards distributing the control plane. While some of the distributed SDN proposals such as Kandoo [31] promoted a hierarchical organization of the control plane to further improve scalability, other alternatives opted for a flat organization for increased reliability and performance (latency). On the other hand, distributed platforms like Onix [60], HyperFlow [61], ONOS [58] and OpenDaylight [83] focused on building consistency models for their logically centralized control plane designs. In particular, Onix [60] chose DHT and transactional databases for network state distribution over the publish/subscribe system used by HyperFlow [61]. A different class of solutions has been recently introduced by DISCO, which promoted a logically distributed control plane based on existing ASes within the Internet.

In previous sections, we classified these existing controllers according to the physical organization of the control plane
(Physical classification) and, alternatively, according to the way knowledge is shared in distributed control plane designs (Logical classification). Furthermore, within each of these classifications, we performed another internal classification based on the similarities between competing SDN controllers (the color classification shown in Figure 3 and Figure 4).

In light of the above, it is obvious that there are various approaches to building a distributed SDN architecture; some of these approaches met some performance criteria better than others but failed in some other aspects. Clearly, none of the proposed SDN controller platforms met all the discussed challenges and fulfilled all the KPIs required for an optimal deployment of SDN. At this stage, and building on these previous efforts, we communicate our vision of a distributed SDN control model by going through these open challenges, identifying the best ways of solving them, and envisioning future opportunities.

A. Scalability

Scalability concerns in SDN may stem from the decoupling between the control and data planes [103] and the centralization of the control logic in a software-based controller. In fact, as the network grows in size (e.g., switches, hosts, etc.), the centralized SDN controller becomes highly solicited (in terms of events/requests) and thus overloaded (in terms of bandwidth, processing power and memory). Furthermore, when the network scales up in terms of both size and diameter, communication delays between the SDN controller and the network switches may become high, thus affecting flow-setup latencies. This may also cause congestion in both the control and data planes and may generate longer failover times [7].

That said, since control plane scalability in SDN is commonly assessed in terms of both throughput (the number of flow requests handled per second) and flow setup latency (the delay to respond to flow requests) metrics [7], a single physically-centralized SDN controller may not particularly fulfill the performance requirements (with respect to these metrics) of large-scale networks as compared to small or medium-scale networks (see Section III-A).

One way to alleviate some of these scalability concerns is to extend the responsibilities of the data plane in order to relieve the load on the controller (see Section II-A). The main drawback of that method is that it imposes some modifications to the design of OpenFlow switches.

The second way, which we believe to be more effective, is to model the control plane in a way that mitigates scalability limitations. In a physically-centralized control model, a single SDN controller is in charge of handling all requests coming from SDN switches. As the network grows, the latter is likely to become a serious bottleneck in terms of scalability and performance [104]. On the other hand, a physically-distributed control model uses multiple controllers that maintain a logically centralized network view. This solution is appreciated for handling the controller bottleneck, hence ensuring a better scale of the network control plane while decreasing control-plane latencies.

Even though the distributed control model is considered a scalable option when compared to the centralized control model, achieving network scalability while preserving good performance requires a relevant control distribution scheme that takes into account both the organization of the SDN control plane and the physical placement of the SDN controllers. In this context, we recommend a hierarchical organization of the control plane over a flat organization for increased scalability and improved performance. We also believe that the placement of controllers should be further investigated and treated as an optimization problem that depends on specific performance metrics [48].

Finally, by physically distributing the SDN control plane for scalability (and reliability, Section V-B) purposes, it is worth mentioning that new kinds of challenges may arise. In particular, to maintain the logically centralized view, a strongly-consistent model can be used to meet certain application requirements. However, as discussed in Section V-C, a strongly consistent model may introduce new scalability issues. In fact, retaining strong consistency when propagating frequent state updates might block the state progress and cause the network to become unavailable, thus increasing switch-to-controller latencies.

B. Reliability

Concerns about reliability have been considered serious in SDN. The data-to-control plane decoupling has indeed a significant impact on the reliability of the SDN control plane. In a centralized SDN-based network, the failure of the central controller may collapse the overall network. In contrast, the use of multiple controllers in a physically distributed (but logically centralized) controller architecture alleviates the issue of a single point of failure.

Despite not providing information on how a distributed SDN controller architecture should be implemented, the OpenFlow standard gives (since version 1.2) the ability for a switch to simultaneously connect to multiple controllers. That OpenFlow option allows each controller to operate in one of three roles (master, slave, equal) with respect to an active connection to the switch. Leveraging these OpenFlow roles, which reflect the importance of controller replication in achieving a highly available SDN control plane, various resiliency strategies have been adopted by different fault-tolerant controller architectures. Among the main challenges faced by these architectures are control state redundancy and controller failover.

Controller redundancy can be achieved by adopting different approaches for processing network updates. In the Active replication approach [69], also known as State Machine Replication, multiple controllers process the commands issued by the connected clients in a coordinated and deterministic way in order to concurrently update the replicated network state. The main challenge of that method lies in enforcing a strict ordering of events to guarantee strong consistency among controller replicas. That approach to replication has the advantage of offering high resilience with an insignificant downtime, making it a suitable option for delay-intolerant
scenarios. On the other hand, in passive replication, referred to as primary/backup replication, one controller (the primary) processes the requests, updates the replicated state, and periodically informs the other controller replicas (the backups) about state changes. Despite offering simplicity and lower overhead, the passive replication scheme may create (controller and switch) state inconsistencies and generate additional delay in case the primary controller fails.

Additional concerns that should be explored are related to the kind of information to be replicated across controllers. Existing controller platform solutions follow three approaches for achieving controller state redundancy [16]: state replication [60], [70], event replication [61], [66] and traffic replication [105].

Moreover, control distribution is a central challenge when designing a fault-tolerant controller platform. The centralized control approach that follows the simple Master/Slave concept [66], [70] relies on a single controller (the master) that keeps the entire network state and takes all decisions based on a global network view. Backup controllers (the slaves) are used for fault-tolerance purposes. The centralized alternative is usually considered in small to medium-sized networks. On the other hand, in the distributed control approach [58], [60], the network state is partitioned across many controllers that simultaneously take control of the network while exchanging information to maintain the logically centralized network view. In that model, controller coordination strategies should be applied to reach agreements and solve the issues of concurrent updates and state consistency. Mostly effective in large-scale networks, the distributed alternative provides fault tolerance by redistributing the network load among the remaining active controllers.

Finally, the implementation aspect is another important challenge in designing a replication strategy [69]. While some approaches opted for replicating controllers that store their network state locally and communicate through a specific group coordination framework [106], other approaches went for replicating the network state by delegating state storage, replication and management to external data stores [58], [60], [61] like distributed data structures and distributed file systems.

Apart from controller redundancy, other works focused on failure detection and controller recovery mechanisms. Some of these works considered reliability criteria from the outset in the placement of distributed SDN controllers. Both the number and locations of controllers were determined in a reliability-aware manner while preserving good performance. Reliability was indeed introduced in the form of controller placement metrics (switch-to-controller delay, controller load) to prevent worst-case switch-to-controller re-assignment scenarios in the event of failures. Other works elaborated on efficient controller failover strategies that consider the same reliability criteria. Strategies for recovering from controller failures can be split into redundant controller strategies (with backups) and non-redundant controller strategies (without backups) [107].

The redundant controller strategy assumes more than one controller per controller domain; one primary controller actively controls the network domain and the remaining controllers (backups) automatically take over the domain in case it fails. Despite providing a fast failover technique, this strategy depends on the associated standby methods (cold, warm or hot), which have different advantages and drawbacks [108]. For instance, the cold standby method imposes a full initialization process on the standby controller given the complete loss of the state upon the primary controller failure. This makes it an adequate alternative for stateless applications. In contrast, the hot standby method is effective in ensuring a minimum recovery time with no controller state loss, but it imposes a high communication overhead due to the full state synchronization requirements between primary and standby controllers. The warm standby method reduces that communication overhead at the cost of a partial state loss.

On the other hand, the non-redundant controller strategy requires only one controller per controller domain. In case it fails, controllers from other domains extend their domains to adopt orphan switches, thereby reducing the network overhead. Two well-known strategies for non-redundant controllers are the greedy failover and the pre-partitioning failover [109]. While the former strategy relies on neighbor controllers to adopt orphan switches at the edge of their domains and from which they can receive messages, the latter relies on controllers to proactively exchange information about the list of switches to take over in controller failure scenarios.

All things considered, a number of challenges and key design choices based on a set of requirements are involved when adopting a specific controller replication and failover strategy. In addition to reliability and fault-tolerance considerations, scalability, consistency and performance requirements should be properly taken into account when designing a fault-tolerant SDN controller architecture.

C. Consistency

Contrary to physically centralized SDN designs, distributed SDN controller platforms face major consistency challenges [110], [111]. Clearly, physically distributed SDN controllers must exchange network information and handle the consistency of the network state being distributed across them and stored in their shared data structures in order to maintain a logically centralized network-wide view that eases the development of control applications. However, achieving a convenient level of consistency while keeping good performance in software-defined networks facing network partitions is a complex task. As claimed by the CAP theorem applied to networks [112], it is generally impossible for SDN networks to simultaneously achieve all three of Consistency (C), high Availability (A) and Partition tolerance (P). In the presence of network partitions, a weak level of consistency in exchange for high availability (AP) results in state staleness causing an incorrect behavior of applications, whereas a strong level of consistency serving the correct enforcement of network policies (CP) comes at the cost of network availability.

The Strong Consistency model used in distributed file systems implies that only one consistent state is observed by ensuring that any read operation on a data item returns the value of the latest write operation that occurred on that data
item. However, such consistency guarantees are achieved at the consistency models have been applied to SDNs and adopted by
penalty of increased data store access latencies. In SDNs, the most distributed SDN controller platforms: strong consistency
strong consistency model guarantees that all controller repli- and eventual consistency.
cas in the cluster have the most updated network information, In our opinion, a hybrid approach that merges various
albeit at the cost of increased synchronization and communi- consistency levels should be considered to find the optimal
cation overhead. In fact, if certain data occurring in different trade-off between consistency and performance. Unlike the
controllers are not updated to all of them, then these data are previously-mentioned approaches which are based on static
not allowed to be read, thereby impacting network availability consistency requirements where SDN designers decide which
and scalability. consistency level should be applied for each knowledge upon
Strong consistency is crucial for implementing a wide range application development, we argue that an SDN application
of SDN applications that require the latest network informa- should be able to assign a priority for each knowledge and,
tion and that are intolerant of network state inconsistencies. depending on the network context (i.e., instantaneous con-
Among the distributed data store designs that provide strong straints, network load, etc), select the appropriate consistency
consistency properties are the traditional SQL-based relational level that should be enforced.
databases like Oracle [113] and MySQL [114]. In that sense, recent approaches [119] introduced the con-
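The cost of the strong consistency model discussed above can be made concrete with a toy quorum-replicated store of the kind such data stores build on. The following Python sketch is purely illustrative (it is not taken from any of the surveyed platforms): with N replicas, requiring W acknowledgements per write and R replies per read such that R + W > N guarantees that every read set intersects the latest write set, at the price of refusing service when too few replicas are reachable.

```python
# Illustrative quorum-replicated store (R + W > N => every read sees the
# latest write). All names are hypothetical; real controller data stores
# (e.g., Raft-based ones) are far more involved.

class QuorumError(Exception):
    pass

class QuorumStore:
    def __init__(self, n_replicas=3, w=2, r=2):
        assert r + w > n_replicas, "R + W must exceed N for strong consistency"
        self.replicas = [{} for _ in range(n_replicas)]  # key -> (version, value)
        self.up = [True] * n_replicas                    # reachability of each replica
        self.w, self.r = w, r
        self.version = 0

    def write(self, key, value):
        reachable = [i for i, alive in enumerate(self.up) if alive]
        if len(reachable) < self.w:
            raise QuorumError("write quorum unavailable")  # availability sacrificed
        self.version += 1
        for i in reachable:
            self.replicas[i][key] = (self.version, value)

    def read(self, key):
        reachable = [i for i, alive in enumerate(self.up) if alive]
        if len(reachable) < self.r:
            raise QuorumError("read quorum unavailable")
        # Return the freshest version seen among R reachable replicas.
        answers = [self.replicas[i].get(key, (0, None)) for i in reachable[:self.r]]
        return max(answers)[1]

store = QuorumStore()
store.write("link:s1-s2", "UP")
store.up[2] = False               # one replica partitioned away: quorum still holds
print(store.read("link:s1-s2"))   # UP
store.up[1] = False               # a second failure: reads now fail (no R quorum)
```

Blocking reads and writes when a quorum is lost is exactly the availability penalty described above: correctness is preserved, service is not.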
On the other hand, as opposed to the strong consistency cept of adaptive consistency in the context of distributed SDN
model, the Eventual Consistency model (sometimes referred controllers, where adaptively-consistent controllers can tune
to as a Weak Consistency model) implies that concurrent reads their consistency level to reach the desired level of perfor-
of a data item may return values that are different from the mance based on specific metrics. That alternative has the
actual updated value for a transient time period. This model advantage of sparing application developers the tedious task of
takes a more relaxed approach to consistency by assuming that selecting the appropriate consistency level and implementing
the system will eventually (after some period) become con- multiple application-specific consistency models. Furthermore,
sistent in order to gain in network availability. Accordingly, that approach can be efficient in handling the issues associated
in a distributed SDN scenario, reads of some data occurring with eventual consistency models [120].
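The adaptive-consistency idea introduced in [119] can be sketched as a simple per-knowledge policy function: the controller derives the synchronization mode from the priority assigned by the application and the current network context. The thresholds, mode names and priority scale below are invented for illustration only.

```python
# Toy sketch of adaptive consistency tuning: the controller picks a
# synchronization mode per knowledge item from its application-assigned
# priority and the current network load. All thresholds are invented.

def pick_consistency(priority, load):
    """priority: 1 (critical, e.g., routing state) .. 3 (best-effort stats);
    load: current control-channel utilization in [0, 1]."""
    if priority == 1:
        return "strong"            # critical state is always fully synchronized
    if priority == 2:
        # Medium-priority knowledge degrades gracefully under load.
        return "strong" if load < 0.5 else "eventual"
    return "eventual"              # statistics tolerate staleness

print(pick_consistency(priority=1, load=0.9))  # strong
print(pick_consistency(priority=2, load=0.9))  # eventual
```

The point of the sketch is that the choice moves from application development time to run time, which is precisely what spares developers from hard-coding one consistency model per application.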
in different SDN controller replicas may return different val- Finally, in the same way as scalability and reliability, we
ues for some time before eventually converging to the same believe that consistency should be considered while investigat-
global state. As a result, SDN controllers may temporarily ing the optimal placement of controllers. In fact, minimizing
have an inconsistent network view and thus cause an incorrect inter-controller latencies (distances) which are critical for sys-
application behavior. tem performance facilitates controller communications and
Eventually-consistent models have also been extensively enhances network state consistency.
used by SDN designers for developing inconsistency-tolerant
applications that require high scalability and availability. These
control models provide simplicity and efficiency of imple- D. Interoperability
mentation but they push the complexity of resolving state Alongside the concerns about the SDN network interoper-
inconsistencies and conflicts to the application logic and the ability with legacy networks, there is the challenge of ensuring
consensus algorithms being put in place by the controller plat- interoperability between disparate distributed SDN controllers
form. Cassandra [115], Riak [116] and Dynamo [117] are belonging to different SDN domains and using different con-
popular examples of NoSQL databases that have adopted the troller technologies in order to foster the development and
eventual consistency model. adoption of SDN.
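The transient divergence and later convergence of eventually-consistent replicas can be illustrated with a last-writer-wins anti-entropy exchange, a deliberate simplification of what Dynamo-style stores actually implement. All names below are hypothetical.

```python
# Illustrative eventual consistency: two controller replicas accept writes
# independently and reconcile via a last-writer-wins anti-entropy exchange
# (a simplification of Dynamo-style reconciliation).

import itertools

_clock = itertools.count(1)  # stand-in for a timestamp/version source

class Replica:
    def __init__(self):
        self.data = {}  # key -> (stamp, value)

    def write(self, key, value):
        self.data[key] = (next(_clock), value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def anti_entropy(a, b):
    """Merge both replicas: for each key, the newer write wins everywhere."""
    for key in set(a.data) | set(b.data):
        newest = max(a.data.get(key, (0, None)), b.data.get(key, (0, None)))
        a.data[key] = b.data[key] = newest

r1, r2 = Replica(), Replica()
r1.write("path:h1->h2", "via s3")   # update lands on replica 1 only
print(r2.read("path:h1->h2"))       # None: a stale read during the window
anti_entropy(r1, r2)                # background reconciliation
print(r2.read("path:h1->h2"))       # via s3: replicas have converged
```

The stale `None` read is exactly the window in which an SDN application observing replica 2 would act on an outdated network view.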
All things considered, maintaining state consistency across In today’s multi-vendor environments, the limited interop-
logically centralized SDN controllers is a significant SDN erability between SDN controller platforms is mainly due to
design challenge that involves trade-offs between policy a lack of open standards for inter-controller communications.
enforcement and network performance [118]. The issue is Apart from the standardization of the Southbound interface—
that achieving strong consistency in an SDN environment OpenFlow being the most popular Southbound standard,
that is prone to network failures is almost impossible with- there is to date no open standard for the Northbound and
out compromising availability and without adding complexity East-Westbound interfaces to provide compatibility between
to network state management. Panda et al. [112] proposed OpenFlow implementations.
new ways to circumvent these impossibility results but their Despite the emerging standardization efforts underway by
approaches can be regarded as specific to particular cases. SDN organizations, we argue that there are many barriers
In a more general context, SDN designers need to leverage to effective and rapid standardization of the SDN East-
the flexibility offered by SDN to select the appropriate consis- Westbound interfaces, including the heterogeneity of the data
tency models for developing applications with various degrees models being used by SDN controller vendors. Accordingly,
of state consistency requirements and with different policies. we emphasize the need for common data models to achieve
In particular, adopting a single consistency model for handling interoperability and facilitate the tasks of standardization in
different types of shared states may not be the best approach SDNs. In this context, YANG [121] has emerged as a solid
to coping with such a heterogeneous SDN environment. As a data modeling language used to model configuration and state
matter of fact, recent works on SDN have stressed the need for data for standard representation. This NETCONF-based con-
achieving consistency at different levels. So far, two levels of tribution from IETF is intended to be extended in the future
BANNOUR et al.: DISTRIBUTED SDN CONTROL: SURVEY, TAXONOMY, AND CHALLENGES 351

and it is, more importantly, expected to pave the way for the emergence of standard data models driving interoperability in SDN networks.

Among the recent initiatives taken in that direction, we can mention OpenConfig's [122] effort on building a vendor-neutral data model written in YANG for configuration and management operations. Also worth mentioning is ONF's OF-Config protocol [123], which implements a YANG-based data model referred to as the Core Data Model. That protocol was introduced to enable remote configuration of OpenFlow-capable equipment.

E. Other Challenges

Efficient network monitoring is required for the development of control and management applications in distributed SDN-based networks. However, collecting the appropriate data and statistics without impacting the network performance is a challenging task. In fact, the continuous monitoring of network data and statistics may generate excessive overheads and thus affect the network performance, whereas the lack of monitoring may cause incorrect behavior of management applications. Current network monitoring proposals have developed different techniques to find the appropriate trade-offs between data accuracy and monitoring overhead. In particular, IETF's NETCONF Southbound protocol provides some effective monitoring mechanisms for collecting statistics and configuring network devices. In the near future, we expect the OpenFlow specification to be extended to incorporate new monitoring tools and functions.

Like network monitoring, network security is another crucial challenge that should be studied. The decentralization of the SDN control reduces the risk associated with a single point of failure and attacks (e.g., the risk of a DDoS attack). However, the integrity of data flows between the SDN controllers and switches is still not safe. For instance, we can imagine that an attacker can corrupt the network by acting as an SDN controller. In this context, new solutions and strategies (e.g., based on TLS/SSL) have been introduced with the aim of guaranteeing security in SDN environments.

Another aspect related to SDN security is the isolation of flows and networks through network virtualization. In the case of an underlying physical SDN network, this could be implemented using an SDN network hypervisor that creates multiple logically-isolated virtual network slices (called vSDNs), each managed by its own vSDN controller [14]. At this point, care should be taken to design and secure the SDN hypervisor as an essential part of the SDN network.

VI. CONCLUSION

Software-Defined Networking has increasingly gained traction over the last few years in both academia and industry. The SDN paradigm builds its promises on the separation of concerns between the network control logic and the forwarding devices, as well as the logical centralization of the network intelligence in software components. Thanks to these key attributes, SDN is believed to work with network virtualization to fundamentally change the networking landscape towards more flexible, agile, adaptable and highly automated Next Generation Networks.

Despite all the hype, SDN entails many concerns and questions regarding its implementation and deployment. For instance, current SDN deployments based on physically-centralized control architectures have raised several issues of scalability and reliability. As a result, distributed SDN control architectures were proposed as a suitable solution for overcoming these problems. However, there are still ongoing community debates about the best approach to decentralizing the network control plane in order to harness the full potential of SDN.

The novel aspect of this survey is the special focus placed on studying the wide variety of existing SDN controller platforms. These platforms are categorized in two ways: based on a physical classification or a logical classification. Our thorough analysis of these proposals allowed us to achieve an extensive understanding of their advantages and drawbacks and to develop a critical awareness of the challenges facing distributed control in SDNs.

The scalability, reliability, consistency, and interoperability of the SDN control plane are among the key competing challenges faced in designing an efficient and robust high-performance distributed SDN controller platform. Although considered as the main limitations of fully centralized SDN control designs, scalability and reliability are also major concerns in the case of distributed SDN architectures, as they are highly impacted by the structure of the distributed control plane (e.g., flat, hierarchical or hybrid organization) as well as the number and placement of the multiple controllers within the SDN network. Achieving such performance and availability requirements usually comes at the cost of guaranteeing a consistent centralized network view that is required for the design and correct behavior of SDN applications. Consistency considerations should therefore be explored among the trade-offs involved in the design process of an SDN controller platform. Last but not least, the interoperability between different SDN controller platforms of multiple vendors is another crucial operational challenge surrounding the development, maturity and commercial adoption of SDN. Overcoming that challenge calls for major standardization efforts at various levels of inter-controller communications (e.g., data models, Northbound and East-Westbound interfaces). Furthermore, such interoperability guarantees with respect to different SDN technology solutions represent an important step towards easing the widespread interoperability of these SDN platforms with legacy networks and, effectively, ensuring the gradual transition towards softwarized network environments.

Given that rich variety of promising SDN controller platforms with their broad range of significant challenges, we argue that developing a brand-new one may not be the best solution. Instead, it is essential to leverage the existing platforms by aggregating, merging and improving their proposed ideas in order to get as close as possible to a common standard that could emerge in the upcoming years. That distributed SDN controller platform should meet the emerging challenges
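The value of such a common data model can be illustrated without any YANG tooling: if every vendor's controller exports state that validates against one shared schema, an East-Westbound exchange reduces to a schema check plus, at worst, a translation step. The schema fragment below is hypothetical and only loosely inspired by the style of YANG-modeled configuration; it is not an actual OpenConfig or OF-Config model.

```python
# Hypothetical shared data model for exchanging port state between the
# controllers of different vendors (loosely YANG-flavoured; not an actual
# OpenConfig or OF-Config model).

SCHEMA = {
    "port-id":      str,
    "admin-status": {"UP", "DOWN"},   # enumeration
    "mtu":          int,
}

def validate(record, schema=SCHEMA):
    """Return True iff the record matches the shared model."""
    if set(record) != set(schema):
        return False
    for field, expected in schema.items():
        value = record[field]
        ok = value in expected if isinstance(expected, set) else isinstance(value, expected)
        if not ok:
            return False
    return True

vendor_a = {"port-id": "eth0", "admin-status": "UP", "mtu": 1500}
vendor_b = {"portId": "eth0", "state": "up"}   # proprietary layout

print(validate(vendor_a))  # True: can be exchanged as-is
print(validate(vendor_b))  # False: needs translation to the common model
```

The heterogeneity problem described above is exactly the `vendor_b` case: without a common model, every controller pair needs its own pairwise translation.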
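The accuracy/overhead trade-off in monitoring can be sketched with an adaptive polling rule: the controller widens the polling interval while a flow's counters are stable and tightens it when they change quickly. This is a generic illustration, not one of the surveyed proposals, and the rate thresholds are invented.

```python
# Sketch of the accuracy/overhead trade-off in flow-statistics polling:
# poll busy flows more often (accuracy), idle flows less often (overhead).
# All thresholds are invented for illustration.

def next_interval(interval, prev_bytes, cur_bytes, min_s=1.0, max_s=30.0):
    rate = abs(cur_bytes - prev_bytes) / interval   # bytes/s since last poll
    if rate > 1e6:                  # busy flow: tighten for accuracy
        return max(min_s, interval / 2)
    if rate < 1e3:                  # quiet flow: back off to cut overhead
        return min(max_s, interval * 2)
    return interval                 # steady state: keep the current period

interval = 4.0
interval = next_interval(interval, prev_bytes=0, cur_bytes=16_000_000)
print(interval)   # 2.0  (high rate halves the period)
interval = next_interval(interval, prev_bytes=16_000_000, cur_bytes=16_000_500)
print(interval)   # 4.0  (quiet flow doubles it again)
```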
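A TLS-based defence against the rogue-controller scenario above relies on mutual authentication: the controller presents its certificate and also requires one from every switch. The sketch below builds such a context with Python's standard `ssl` module; the certificate paths are hypothetical, and a production deployment would follow the OpenFlow/ONF TLS guidelines rather than this minimal setup.

```python
# Sketch of mutually-authenticated TLS for the controller-switch channel,
# using Python's standard ssl module. Certificate paths are hypothetical.

import ssl

def make_controller_context(ca=None, cert=None, key=None):
    """Build a mutually-authenticated TLS context for the controller side.
    Pass real paths in deployment, e.g.,
    make_controller_context("ca.pem", "ctl.pem", "ctl.key")."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2        # refuse legacy TLS/SSL
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # controller's identity
    if ca:
        ctx.load_verify_locations(cafile=ca)             # CA that signed switch certs
    ctx.verify_mode = ssl.CERT_REQUIRED                  # switches must authenticate too
    return ctx
```

With `CERT_REQUIRED`, an attacker without a certificate signed by the deployment's CA cannot impersonate either endpoint of the control channel.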
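The isolation property of such a hypervisor can be sketched as a dispatch table in the style of FlowVisor-like slicing: every switch event is mapped to exactly one vSDN slice, and only that slice's controller sees it. The slice definitions and addresses below are hypothetical.

```python
# Sketch of hypervisor-based network slicing: each flow belongs to exactly
# one vSDN slice, and control events are forwarded only to that slice's
# controller, so tenants cannot observe or steer each other's traffic.
# Slice definitions are hypothetical.

SLICES = {
    "tenant-A": {"vlan": 10, "controller": "tcp://10.0.0.1:6653"},
    "tenant-B": {"vlan": 20, "controller": "tcp://10.0.0.2:6653"},
}

def dispatch(packet_in):
    """Route a switch event to the unique vSDN controller owning its VLAN."""
    for name, slice_ in SLICES.items():
        if packet_in["vlan"] == slice_["vlan"]:
            return name, slice_["controller"]
    return None, None   # unknown traffic: isolated by default

print(dispatch({"vlan": 10, "in_port": 3}))  # ('tenant-A', 'tcp://10.0.0.1:6653')
print(dispatch({"vlan": 99, "in_port": 3}))  # (None, None)
```

Securing the hypervisor matters precisely because this dispatch point sees, and partitions, the control traffic of every tenant.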
associated with next generation networks (e.g., IoT [124] and Fog Computing [125]).

With these considerations in mind, we intend, as part of our future work, to shed more light on the complex problem of distributed SDN control. We propose to split that problem into two manageable challenges which are correlated: the controller placement problem and the knowledge sharing problem. The first problem investigates the required number of controllers along with their appropriate locations with respect to the desired performance and reliability objectives and depending on the existing constraints. The second one is related to the type and amount of information to be shared among the controller instances given a desired level of consistency.

REFERENCES

[1] D. Kreutz et al., "Software-defined networking: A comprehensive survey," Proc. IEEE, vol. 103, no. 1, pp. 14–76, Jan. 2015.
[2] N. Samaan and A. Karmouch, "Towards autonomic network management: An analysis of current and future research directions," IEEE Commun. Surveys Tuts., vol. 11, no. 3, pp. 22–36, 3rd Quart., 2009.
[3] N. Feamster, J. Rexford, and E. Zegura, "The road to SDN: An intellectual history of programmable networks," SIGCOMM Comput. Commun. Rev., vol. 44, no. 2, pp. 87–98, Apr. 2014.
[4] Y. Li, X. Su, J. Riekki, T. Kanter, and R. Rahmani, "A SDN-based architecture for horizontal Internet of Things services," in Proc. IEEE Int. Conf. Commun. (ICC), May 2016, pp. 1–7.
[5] M. Canini et al., STN: A Robust and Distributed SDN Control Plane, Open Netw. Summit Res. Track, Mar. 2014.
[6] W. Braun and M. Menth, "Software-defined networking using OpenFlow: Protocols, applications and architectural design choices," Future Internet, vol. 6, no. 2, pp. 302–336, 2014.
[7] M. Karakus and A. Durresi, "A survey: Control plane scalability issues and approaches in software-defined networking (SDN)," Comput. Netw., vol. 112, pp. 279–293, Jan. 2017.
[8] W. Li, W. Meng, and L. F. Kwok, "A survey on OpenFlow-based software defined networks: Security challenges and countermeasures," J. Netw. Comput. Appl., vol. 68, pp. 126–139, Jun. 2016.
[9] Y. Jarraya, T. Madi, and M. Debbabi, "A survey and a layered taxonomy of software-defined networking," IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 1955–1980, 4th Quart., 2014.
[10] B. A. A. Nunes, M. Mendonca, X.-N. Nguyen, K. Obraczka, and T. Turletti, "A survey of software-defined networking: Past, present, and future of programmable networks," IEEE Commun. Surveys Tuts., vol. 16, no. 3, pp. 1617–1634, 3rd Quart., 2014.
[11] F. Hu, Q. Hao, and K. Bao, "A survey on software-defined network and OpenFlow: From concept to implementation," IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 2181–2206, 4th Quart., 2014.
[12] W. Xia, Y. Wen, C. H. Foh, D. Niyato, and H. Xie, "A survey on software-defined networking," IEEE Commun. Surveys Tuts., vol. 17, no. 1, pp. 27–51, 1st Quart., 2015.
[13] C. Trois, M. D. D. Fabro, L. C. E. de Bona, and M. Martinello, "A survey on SDN programming languages: Toward a taxonomy," IEEE Commun. Surveys Tuts., vol. 18, no. 4, pp. 2687–2712, 4th Quart., 2016.
[14] A. Blenk, A. Basta, M. Reisslein, and W. Kellerer, "Survey on network virtualization hypervisors for software defined networking," IEEE Commun. Surveys Tuts., vol. 18, no. 1, pp. 655–685, 1st Quart., 2016.
[15] S. Scott-Hayward, S. Natarajan, and S. Sezer, "A survey of security in software defined networks," IEEE Commun. Surveys Tuts., vol. 18, no. 1, pp. 623–654, 1st Quart., 2016.
[16] P. C. da Rocha Fonseca and E. S. Mota, "A survey on fault management in software-defined networks," IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2284–2321, 4th Quart., 2017.
[17] I. T. Haque and N. Abu-Ghazaleh, "Wireless software defined networking: A survey and taxonomy," IEEE Commun. Surveys Tuts., vol. 18, no. 4, pp. 2713–2737, 4th Quart., 2016.
[18] H. Kim and N. Feamster, "Improving network management with software defined networking," IEEE Commun. Mag., vol. 51, no. 2, pp. 114–119, Feb. 2013.
[19] M. F. Bari et al., "Data center network virtualization: A survey," IEEE Commun. Surveys Tuts., vol. 15, no. 2, pp. 909–928, 2nd Quart., 2013.
[20] Q. Zhang, L. Cheng, and R. Boutaba, "Cloud computing: State-of-the-art and research challenges," J. Internet Services Appl., vol. 1, no. 1, pp. 7–18, 2010.
[21] M. A. Sharkh, M. Jammal, A. Shami, and A. Ouda, "Resource allocation in a network-based cloud computing environment: Design challenges," IEEE Commun. Mag., vol. 51, no. 11, pp. 46–52, Nov. 2013.
[22] Ansible. Accessed: Aug. 25, 2017. [Online]. Available: https://www.ansible.com/
[23] Open Networking Foundation, ONF, Palo Alto, CA, USA, 2014. [Online]. Available: https://www.opennetworking.org/
[24] N. McKeown et al., "OpenFlow: Enabling innovation in campus networks," SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 69–74, Apr. 2008.
[25] A. Doria et al., "Forwarding and control element separation (ForCES) protocol specification," Internet Eng. Task Force, Fremont, CA, USA, Rep. 22, Mar. 2010.
[26] "OpenFlow switch specification," Open Netw. Found., Palo Alto, CA, USA, Rep. 1.0.0, Dec. 2009. [Online]. Available: https://www.opennetworking.org/
[27] M. Yu, J. Rexford, M. J. Freedman, and J. Wang, "Scalable flow-based networking with DIFANE," in Proc. ACM SIGCOMM Conf. (SIGCOMM), New Delhi, India, 2010, pp. 351–362.
[28] A. R. Curtis et al., "DevoFlow: Scaling flow management for high-performance networks," in Proc. ACM SIGCOMM Conf. (SIGCOMM), Toronto, ON, Canada, 2011, pp. 254–265.
[29] G. Bianchi, M. Bonola, A. Capone, and C. Cascone, "OpenState: Programming platform-independent stateful OpenFlow applications inside the switch," SIGCOMM Comput. Commun. Rev., vol. 44, no. 2, pp. 44–51, Apr. 2014.
[30] P. Bosshart et al., "P4: Programming protocol-independent packet processors," SIGCOMM Comput. Commun. Rev., vol. 44, no. 3, pp. 87–95, Jul. 2014.
[31] S. H. Yeganeh and Y. Ganjali, "Kandoo: A framework for efficient and scalable offloading of control applications," in Proc. 1st Workshop Hot Topics Softw. Defined Netw. (HotSDN), Helsinki, Finland, 2012, pp. 19–24.
[32] M. Moshref, A. Bhargava, A. Gupta, M. Yu, and R. Govindan, "Flow-level state transition as a new switch primitive for SDN," in Proc. ACM Conf. SIGCOMM (SIGCOMM), Chicago, IL, USA, 2014, pp. 377–378.
[33] M. T. Arashloo, Y. Koral, M. Greenberg, J. Rexford, and D. Walker, "SNAP: Stateful network-wide abstractions for packet processing," in Proc. ACM SIGCOMM Conf. (SIGCOMM), 2016, pp. 29–43.
[34] S. Pontarelli, M. Bonola, G. Bianchi, A. Capone, and C. Cascone, "Stateful OpenFlow: Hardware proof of concept," in Proc. IEEE 16th Int. Conf. High Perform. Switching Routing (HPSR), Jul. 2015, pp. 1–8.
[35] G. Bianchi et al., "Open packet processor: A programmable architecture for wire speed platform-independent stateful in-network processing," CoRR, vol. abs/1605.01977, no. 47, pp. 777–780, 2016.
[36] R. Bifulco, J. Boite, M. Bouet, and F. Schneider, "Improving SDN with InSPired switches," in Proc. Symp. SDN Res. (SOSR), 2016, pp. 1–12.
[37] D. L. Tennenhouse and D. J. Wetherall, "Towards an active network architecture," SIGCOMM Comput. Commun. Rev., vol. 37, no. 5, pp. 81–94, Oct. 2007.
[38] V. Jeyakumar, M. Alizadeh, C. Kim, and D. Mazières, "Tiny packet programs for low-latency network control and monitoring," in Proc. 12th ACM Workshop Hot Topics Netw. (HotNets XII), 2013, pp. 1–7.
[39] J. Yang et al., "FOCUS: Function offloading from a controller to utilize switch power," in Proc. IEEE Conf. Netw. Funct. Virtualization Softw. Defined Netw. (NFV SDN), Nov. 2016, pp. 199–205.
[40] P. Lin et al., "A west-east bridge based SDN inter-domain testbed," IEEE Commun. Mag., vol. 53, no. 2, pp. 190–197, Feb. 2015.
[41] N. Foster et al., "Frenetic: A network programming language," in Proc. 16th ACM SIGPLAN Int. Conf. Funct. Program. (ICFP), 2011, pp. 279–291.
[42] A. Voellmy, H. Kim, and N. Feamster, "Procera: A language for high-level reactive network control," in Proc. 1st Workshop Hot Topics Softw. Defined Netw. (HotSDN), 2012, pp. 43–48.
[43] C. Monsanto, J. Reich, N. Foster, J. Rexford, and D. Walker, "Composing software-defined networks," in Proc. 10th USENIX Conf. Netw. Syst. Design Implement. (NSDI), 2013, pp. 1–14.
[44] S. H. Yeganeh, A. Tootoonchian, and Y. Ganjali, "On scalability of software-defined networking," IEEE Commun. Mag., vol. 51, no. 2, pp. 136–141, Feb. 2013.
[45] S. Azodolmolky, P. Wieder, and R. Yahyapour, "Performance evaluation of a scalable software-defined networking deployment," in Proc. 2nd Eur. Workshop Softw. Defined Netw., Oct. 2013, pp. 68–74.
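The first of these two challenges can be made concrete with a brute-force sketch in the spirit of the controller placement problem formulated by Heller et al. [48]: among all k-subsets of nodes, pick the placement minimizing the worst-case switch-to-controller latency. The 5-node topology and latencies below are invented for illustration; real instances require heuristics rather than enumeration.

```python
# Brute-force controller placement: among all k-subsets of nodes, pick the
# one minimizing the worst-case switch-to-controller latency. The 5-node
# topology and latencies (ms) are invented for illustration.

from itertools import combinations

LATENCY = {  # symmetric pairwise latencies between nodes 0..4
    (0, 1): 2, (0, 2): 7, (0, 3): 9, (0, 4): 10,
    (1, 2): 4, (1, 3): 8, (1, 4): 9,
    (2, 3): 3, (2, 4): 5,
    (3, 4): 2,
}

def dist(a, b):
    return 0 if a == b else LATENCY[tuple(sorted((a, b)))]

def place_controllers(nodes, k):
    """Return (best placement, worst-case latency) over all k-subsets."""
    def worst_case(placement):
        # Each switch attaches to its nearest controller in the placement.
        return max(min(dist(s, c) for c in placement) for s in nodes)
    return min(((p, worst_case(p)) for p in combinations(nodes, k)),
               key=lambda item: item[1])

print(place_controllers(range(5), k=2))   # ((0, 3), 3)
```

The same skeleton extends naturally to the average-latency objective or, as argued above, to objectives that also penalize inter-controller distances for the sake of state consistency.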
[46] T. Benson, A. Akella, and D. A. Maltz, "Network traffic characteristics of data centers in the wild," in Proc. 10th ACM SIGCOMM Conf. Internet Meas. (IMC), 2010, pp. 267–280.
[47] O. Michel and E. Keller, "SDN in wide-area networks: A survey," in Proc. 4th Int. Conf. Softw. Defined Syst. (SDS), May 2017, pp. 37–42.
[48] B. Heller, R. Sherwood, and N. McKeown, "The controller placement problem," in Proc. 1st Workshop Hot Topics Softw. Defined Netw. (HotSDN), 2012, pp. 7–12.
[49] M. T. I. ul Huque, W. Si, G. Jourjon, and V. Gramoli, "Large-scale dynamic controller placement," IEEE Trans. Netw. Service Manag., vol. 14, no. 1, pp. 63–76, Mar. 2017.
[50] A. Shalimov, D. Zuikov, D. Zimarina, V. Pashkov, and R. Smeliansky, "Advanced study of SDN/OpenFlow controllers," in Proc. 9th Central Eastern Eur. Softw. Eng. Conf. Russia (CEE SECR), 2013, pp. 1–6.
[51] N. Gude et al., "NOX: Towards an operating system for networks," SIGCOMM Comput. Commun. Rev., vol. 38, no. 3, pp. 105–110, Jul. 2008.
[52] D. Erickson, "The Beacon OpenFlow controller," in Proc. 2nd ACM SIGCOMM Workshop Hot Topics Softw. Defined Netw. (HotSDN), 2013, pp. 13–18.
[53] Floodlight Project. Accessed: Aug. 25, 2017. [Online]. Available: http://www.projectfloodlight.org/
[54] A. Tavakoli, M. Casado, T. Koponen, and S. Shenker, "Applying NOX to the datacenter," in Proc. Workshop Hot Topics Netw. (HotNets VIII), 2009.
[55] A. Tootoonchian, S. Gorbunov, Y. Ganjali, M. Casado, and R. Sherwood, "On controller performance in software-defined networks," in Proc. 2nd USENIX Conf. Hot Topics Manag. Internet Cloud Enterprise Netw. Services (Hot ICE), San Jose, CA, USA, 2012, p. 10.
[56] POX. Accessed: Aug. 25, 2017. [Online]. Available: http://www.noxrepo.org/pox/about-pox/
[57] M. Dhawan, R. Poddar, K. Mahajan, and V. Mann, "SPHINX: Detecting security attacks in software-defined networks," in Proc. NDSS, 2015.
[58] P. Berde et al., "ONOS: Towards an open, distributed SDN OS," in Proc. 3rd Workshop Hot Topics Softw. Defined Netw. (HotSDN), Chicago, IL, USA, 2014, pp. 1–6.
[59] K. Phemius, M. Bouet, and J. Leguay, "DISCO: Distributed multi-domain SDN controllers," CoRR, vol. abs/1308.6138, Aug. 2013.
[60] T. Koponen et al., "Onix: A distributed control platform for large-scale production networks," in Proc. 9th USENIX Conf. Oper. Syst. Design Implement. (OSDI), 2010, pp. 351–364.
[61] A. Tootoonchian and Y. Ganjali, "HyperFlow: A distributed control plane for OpenFlow," in Proc. Internet Netw. Manag. Conf. Res. Enterprise Netw. (INM/WREN), 2010, p. 3.
[62] J. Stribling et al., "Flexible, wide-area storage for distributed systems with WheelFS," in Proc. 6th USENIX Symp. Netw. Syst. Design Implement. (NSDI), Boston, MA, USA, Apr. 2009, pp. 43–58.
[63] A. Bianco, P. Giaccone, S. D. Domenico, and T. Zhang, "The role of inter-controller traffic for placement of distributed SDN controllers," CoRR, vol. abs/1605.09268, May 2016.
[64] B. Chandrasekaran and T. Benson, "Tolerating SDN application failures with LegoSDN," in Proc. 13th ACM Workshop Hot Topics Netw. (HotNets XIII), 2014, pp. 1–7.
[65] S. Shin et al., "Rosemary: A robust, secure, and high-performance network operating system," in Proc. ACM SIGSAC Conf. Comput. Commun. Security (CCS), 2014, pp. 78–89.
[66] N. Katta, H. Zhang, M. Freedman, and J. Rexford, "Ravana: Controller fault-tolerance in software-defined networking," in Proc. 1st ACM SIGCOMM Symp. Softw. Defined Netw. Res. (SOSR), Scottsdale, AZ, USA, 2015, pp. 1–12.
[67] S. H. Yeganeh and Y. Ganjali, "Beehive: Simple distributed programming in software-defined networks," in Proc. Symp. SDN Res. (SOSR), Santa Clara, CA, USA, 2016, pp. 1–12.
[68] B. Chandrasekaran, B. Tschaen, and T. Benson, "Isolating and tolerating SDN application failures with LegoSDN," in Proc. Symp. SDN Res. (SOSR), Santa Clara, CA, USA, 2016, pp. 1–12.
[69] E. S. Spalla et al., "AR2C2: Actively replicated controllers for SDN resilient control plane," in Proc. IEEE/IFIP Netw. Oper. Manag. Symp. (NOMS), Istanbul, Turkey, Apr. 2016, pp. 189–196.
[70] F. Botelho, A. Bessani, F. M. V. Ramos, and P. Ferreira, "On the design of practical fault-tolerant SDN controllers," in Proc. 3rd Eur. Workshop Softw. Defined Netw., London, U.K., Sep. 2014, pp. 73–78.
[71] A. Bessani, J. A. Sousa, and E. E. P. Alchieri, "State machine replication for the masses with BFT-SMART," in Proc. 44th Annu. IEEE/IFIP Int. Conf. Depend. Syst. Netw. (DSN), Atlanta, GA, USA, 2014, pp. 355–362.
[72] Ryu SDN Framework. Accessed: Aug. 25, 2017. [Online]. Available: https://osrg.github.io/ryu/
[73] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, "ZooKeeper: Wait-free coordination for Internet-scale systems," in Proc. USENIX Conf. USENIX Annu. Tech. Conf. (USENIX ATC), Boston, MA, USA, 2010, p. 11.
[74] Y. Liu, A. Hecker, R. Guerzoni, Z. Despotovic, and S. Beker, "On optimal hierarchical SDN," in Proc. IEEE Int. Conf. Commun. (ICC), London, U.K., Jun. 2015, pp. 5374–5379.
[75] S. Jain et al., "B4: Experience with a globally-deployed software defined WAN," in Proc. ACM Conf. SIGCOMM (SIGCOMM), Hong Kong, 2013, pp. 3–14.
[76] A. Kumar et al., "BwE: Flexible, hierarchical bandwidth allocation for WAN distributed computing," in Proc. ACM Conf. Special Interest Group Data Commun. (SIGCOMM), London, U.K., Aug. 2015, pp. 1–14.
[77] T. D. Chandra, R. Griesemer, and J. Redstone, "Paxos made live: An engineering perspective," in Proc. 26th Annu. ACM Symp. Principles Distrib. Comput. (PODC), Portland, OR, USA, 2007, pp. 398–407.
[78] K.-K. Yap et al., "Taking the edge off with Espresso: Scale, reliability and programmability for global Internet peering," in Proc. Conf. ACM Special Interest Group Data Commun. (SIGCOMM), Los Angeles, CA, USA, 2017, pp. 432–445.
[79] F. Benamrane, M. B. Mamoun, and R. Benaini, "Performances of OpenFlow-based software-defined networks: An overview," J. Netw., vol. 10, no. 6, pp. 329–337, 2015.
[80] B. Oki and B. Liskov, "Viewstamped replication: A new primary copy method to support highly-available distributed systems," in Proc. 7th Annu. ACM Symp. Principles Distrib. Comput. (PODC), Aug. 1988, pp. 8–17.
[81] A. Mantas and F. M. V. Ramos, "Consistent and fault-tolerant SDN with unmodified switches," CoRR, vol. abs/1602.04211, Mar. 2016.
[82] A. Bondkovskii, J. Keeney, S. van der Meer, and S. Weber, "Qualitative comparison of open-source SDN controllers," in Proc. IEEE/IFIP Netw. Oper. Manag. Symp. (NOMS), Istanbul, Turkey, Apr. 2016, pp. 889–894.
[83] OpenDaylight Project. Accessed: Aug. 25, 2017. [Online]. Available: http://www.opendaylight.org/
[84] A. S. Muqaddas, A. Bianco, P. Giaccone, and G. Maier, "Inter-controller traffic in ONOS clusters for SDN networks," in Proc. IEEE Int. Conf. Commun. (ICC), Kuala Lumpur, Malaysia, May 2016, pp. 1–6.
[85] D. Ongaro and J. Ousterhout, "In search of an understandable consensus algorithm," in Proc. USENIX Conf. USENIX Annu. Tech. Conf. (USENIX ATC), Philadelphia, PA, USA, 2014, pp. 305–320.
[86] Akka Framework. Accessed: Aug. 25, 2017. [Online]. Available: http://akka.io/
[87] C.-Y. Hong et al., "Achieving high utilization with software-driven WAN," in Proc. ACM Conf. SIGCOMM (SIGCOMM), Hong Kong, 2013, pp. 15–26.
[88] M. Caesar et al., "Design and implementation of a routing control platform," in Proc. 2nd Conf. Symp. Netw. Syst. Design Implement. (NSDI), vol. 2, 2005, pp. 15–28.
[89] C. E. Rothenberg et al., "Revisiting routing control platforms with the eyes and muscles of software-defined networking," in Proc. 1st Workshop Hot Topics Softw. Defined Netw. (HotSDN), Helsinki, Finland, 2012, pp. 13–18.
[90] AMQP. Accessed: Aug. 25, 2017. [Online]. Available: http://www.amqp.org/
[91] D. D. Clark, C. Partridge, J. C. Ramming, and J. T. Wroclawski, "A knowledge plane for the Internet," in Proc. Conf. Appl. Technol. Archit. Protocols Comput. Commun. (SIGCOMM), Karlsruhe, Germany, 2003, pp. 3–10.
[92] M. A. S. Santos et al., "Decentralizing SDN's control plane," in Proc. IEEE 39th Conf. Local Comput. Netw. (LCN), Edmonton, AB, Canada, Sep. 2014, pp. 402–405.
[93] J. P. Stringer et al., "Cardigan: SDN distributed routing fabric going live at an Internet exchange," in Proc. IEEE Symp. Comput. Commun. (ISCC), Funchal, Portugal, Jun. 2014, pp. 1–7.
[94] A. Gupta et al., "SDX: A software defined Internet exchange," in Proc. ACM SIGCOMM, Chicago, IL, USA, 2014, pp. 551–562.
[95] R. Lapeyrade, M. Bruyere, and P. Owezarski, "OpenFlow-based migration and management of the TouIX IXP," in Proc. IEEE/IFIP Netw. Oper. Manag. Symp. (NOMS), Istanbul, Turkey, Apr. 2016, pp. 1131–1136.
[96] ENDEAVOUR Project. Accessed: Aug. 25, 2017. [Online]. Available: https://www.h2020-endeavour.eu/
[97] H. Morgan. (2015). AtlanticWave-SDX: A Distributed Intercontinental [121] S. Wallin and C. Wikström, “Automating network and service config-
Experimental Software Defined Exchange for Research and uration using NETCONF and YANG,” in Proc. 25th Int. Conf. Large
Education Networking. [Online]. Available: https://itnews.fiu.edu/wp- Installation Syst. Admin. (LISA), Boston, MA, USA, 2011, p. 22.
content/uploads/sites/8/2015/04/AtlanticWaveSDX-Press- [122] OpenConfig. Accessed: Aug. 25, 2017. [Online]. Available:
Release_FinalDraft.pdf http://www.openconfig.net/
[98] A. Gupta et al., “An industrial-scale software defined Internet exchange [123] “OF-CONFIG 1.2: OpenFlow management and configuration proto-
point,” in Proc. 13th USENIX Symp. Netw. Syst. Design Implement. col,” Open Netw. Found., Palo Alto, CA, USA, Rep. ONF TS-016,
(NSDI), Santa Clara, CA, USA, Mar. 2016, pp. 1–14. 2014.
[99] V. Kotronis, A. Gämperli, and X. Dimitropoulos, “Routing central- [124] M. Ojo, D. Adami, and S. Giordano, “A SDN-IoT architecture
ization across domains via SDN,” Comput. Netw., vol. 92, no. P2, with NFV implementation,” in Proc. IEEE Globecom Workshops
pp. 227–239, Dec. 2015. (GC Wkshps), Washington, DC, USA, Dec. 2016, pp. 1–6.
[100] J. Chung et al., “AtlanticWave-SDX: An international SDX to support [125] K. Liang, L. Zhao, X. Chu, and H.-H. Chen, “An integrated architecture
science data applications,” in Proc. Sci. Netw. Workshop Softw. Defined for software defined and virtualized radio access networks with fog
Netw. (SDN), vol. 15. Austin, TX, USA, Nov. 2015, pp. 1–7. computing,” IEEE Netw., vol. 31, no. 1, pp. 80–87, Jan./Feb. 2017.
[101] Advanced Layer 2 System, Internet2, Ann Arbor, MI, USA,
Dec. 2014. [Online]. Available: https://www.internet2.edu/products-
services/advanced-networking/layer-2-services/
[102] J. Chung, H. Owen, and R. Clark, "SDX architectures: A qualitative analysis," in Proc. SoutheastCon, Norfolk, VA, USA, Mar./Apr. 2016, pp. 1–8.
[103] M. Jammal, T. Singh, A. Shami, R. Asal, and Y. Li, "Software defined networking: State of the art and research challenges," Comput. Netw., vol. 72, pp. 74–98, Oct. 2014.
[104] K. Qiu et al., "ParaCon: A parallel control plane for scaling up path computation in SDN," IEEE Trans. Netw. Service Manag., vol. 14, no. 4, pp. 978–990, Dec. 2017.
[105] P. Fonseca, R. Bennesby, E. Mota, and A. Passito, "Resilience of SDNs based on active and passive replication mechanisms," in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Atlanta, GA, USA, Dec. 2013, pp. 2188–2193.

Fetia Bannour received the B.Eng. degree in multidisciplinary engineering from the Tunisia Polytechnic School of the University of Carthage, Tunisia, in 2014. She is currently pursuing the Ph.D. degree in computer science with the Networks and Telecommunications Department, University of Paris-Est Créteil, and the LiSSi/TincNet Laboratory, France. Her research interests include computer networks, software-defined networks, distributed control, the Internet of Things, adaptive and autonomous control systems, and network virtualization. She also served as a Local Organizer for SaCCoNeT 2016 and as a Student Volunteer for IEEE ICC 2017.
[106] V. Yazici, M. O. Sunay, and A. O. Ercan, "Controlling a software-defined network via distributed controllers," CoRR, vol. abs/1401.7651, Jan. 2014.
[107] N. Kong, "Design concept for a failover mechanism in distributed SDN controllers," M.S. thesis, Dept. Comput. Sci., San Jose State Univ., San Jose, CA, USA, 2017. [Online]. Available: http://scholarworks.sjsu.edu/etd_projects/548/
[108] V. Pashkov, A. Shalimov, and R. Smeliansky, "Controller failover for SDN enterprise networks," in Proc. Int. Sci. Technol. Conf. Mod. Netw. Technol. (MoNeTeC), Moscow, Russia, Oct. 2014, pp. 1–6.
[109] M. Obadia, M. Bouet, J. Leguay, K. Phemius, and L. Iannone, "Failover mechanisms for distributed SDN controllers," in Proc. Int. Conf. Workshop Netw. Future (NOF), Paris, France, Dec. 2014, pp. 1–6.
[110] L. Schiff, S. Schmid, and P. Kuznetsov, "In-band synchronization for distributed SDN control planes," SIGCOMM Comput. Commun. Rev., vol. 46, no. 1, pp. 37–43, Jan. 2016.
[111] F. Botelho, T. A. Ribeiro, P. Ferreira, F. M. V. Ramos, and A. Bessani, "Design and implementation of a consistent data store for a distributed SDN control plane," in Proc. 12th Eur. Depend. Comput. Conf. (EDCC), Gothenburg, Sweden, 2016, pp. 169–180.

Sami Souihi received the M.Sc. degree from the University of Paris 6, France, in 2010, and the Ph.D. degree from the University of Paris-Est Créteil in 2013. He is an Associate Professor of computer science with the Networks and Telecommunications Department, University of Paris-Est Créteil, and the LiSSi/TincNet Laboratory, France. His research work focuses on adaptive mechanisms in large-scale dynamic networks. These mechanisms are based on context-enhanced knowledge, network functions virtualization, and software-defined networking. He also served as a Technical Program Committee (TPC) member for international conferences, such as IEEE SACONET, IEEE ICC, IEEE Globecom, and WWIC, and as a TPC Co-Chair of the IEEE ICC 2017 Symposium on Communications QoS, Reliability and Modeling. He is a reviewer for several IEEE conferences and journals, including IEEE ICC, IEEE WCNC, IEEE Globecom, the IEEE TRANSACTIONS ON NETWORKING, and the IEEE TRANSACTIONS ON COMMUNICATIONS.
[112] A. Panda, C. Scott, A. Ghodsi, T. Koponen, and S. Shenker, "CAP for networks," in Proc. 2nd ACM SIGCOMM Workshop Hot Topics Softw. Defined Netw. (HotSDN), 2013, pp. 91–96.
[113] Oracle. Accessed: Aug. 25, 2017. [Online]. Available: https://www.oracle.com
[114] MySQL. Accessed: Aug. 25, 2017. [Online]. Available: http://www.mysql.fr/
[115] A. Lakshman and P. Malik, "Cassandra: A decentralized structured storage system," SIGOPS Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40, Apr. 2010.
[116] R. Klophaus, "Riak core: Building distributed applications without shared state," in Proc. ACM SIGPLAN Commercial Users Funct. Program. (CUFP), Baltimore, MD, USA, 2010, p. 14.
[117] G. DeCandia et al., "Dynamo: Amazon's highly available key-value store," in Proc. 21st ACM SIGOPS Symp. Oper. Syst. Principles (SOSP), Stevenson, WA, USA, 2007, pp. 205–220.
[118] D. Levin, A. Wundsam, B. Heller, N. Handigol, and A. Feldmann, "Logically centralized?: State distribution trade-offs in software defined networks," in Proc. 1st Workshop Hot Topics Softw. Defined Netw. (HotSDN), 2012, pp. 1–6.
[119] M. Aslan and A. Matrawy, "Adaptive consistency for distributed SDN controllers," in Proc. 17th Int. Telecommun. Netw. Strategy Plann. Symp. (Netw.), Montreal, QC, Canada, Sep. 2016, pp. 150–157.
[120] E. Sakic, F. Sardis, J. W. Guck, and W. Kellerer, "Towards adaptive state consistency in distributed SDN control plane," in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017, pp. 1–7.

Abdelhamid Mellouk is a Full Professor with the Networks and Telecommunications (N&T) Department and the LiSSi/TincNet Laboratory, University of Paris-Est Créteil (UPEC), France. He graduated in computer network engineering from the Computer Science High Engineering School, University of Oran-EsSenia, Algeria, and the University of Paris Sud XI Orsay, received the Ph.D. degree in computer science from the same university, and received a Doctorate of Sciences (Habilitation) diploma from UPEC. As the founder of the network control research activity in UPEC, with extensive international academic and industrial collaborations, his general area of research is adaptive real-time bio-inspired control for high-speed new-generation dynamic wired/wireless networking, in order to maintain acceptable quality of service/experience for added-value services. He is an active member of the IEEE Communications Society and has held several offices, including leadership positions in IEEE Communications Society Technical Committees. He has published/coordinated 11 books, 3 lecture notes, and several refereed international publications in journals, conferences, and books, in addition to numerous keynotes and plenary talks in flagship venues. He serves on the editorial boards or as an Associate Editor for several journals, and he is chairing or has chaired (or co-chaired) some of the top international conferences and symposia, including serving as the TPC Chair of ICC 2017.