
SDN AND OPENFLOW

THE HYPE AND THE HARSH REALITY

This material is copyrighted and licensed for the sole use by Srdjan Milenkovic (srdjan.milenkovicsm@gmail.com [109.121.110.253]). More information at http://www.ipSpace.net/Web

Ivan Pepelnjak, CCIE#1354 Emeritus

Copyright 2014 ipSpace.net AG

WARNING AND DISCLAIMER


This book is a collection of blog posts written between March 2011 and the book publication date,
providing independent information about Software Defined Networking and OpenFlow. Every effort
has been made to make this book as complete and as accurate as possible, but no warranty or
fitness for any particular purpose is implied. Read the introductory paragraphs before the blog post
headings to understand the context in which the blog posts were written, and make sure you read
the Introduction section. The information is provided on an "as is" basis. The author and ipSpace.net
shall have neither liability nor responsibility to any person or entity with respect to any loss or
damages arising from the information contained in this book.

Copyright ipSpace.net 2014

Page ii


CONTENT AT A GLANCE
FOREWORD ............................................................................................................................IV
INTRODUCTION .....................................................................................................................VI
1  THE INITIAL HYPE ......................................................................................1-1

2  SOFTWARE DEFINED NETWORKING 101 ................................................2-1

3  OPENFLOW BASICS ...................................................................................3-1

4  OPENFLOW IMPLEMENTATION NOTES ....................................................4-1

5  OPENFLOW SCALABILITY CHALLENGES ..................................................5-1

6  OPENFLOW AND SDN USE CASES..........................................................6-1

7  SDN BEYOND OPENFLOW ......................................................................7-1


FOREWORD
Ivan asked me to write the intro for his latest book on Software Defined Networking and I'm a bit
mystified why. Granted, he's like the control plane to my forwarding plane. The brilliant technical
insights I've gathered from Ivan's web site and webinars have provided me with valuable content
and creative inspiration ever since I first discovered them. In fact, I almost feel like I'm cheating at
my job. Every time I clarify SDN in a conversation with, "It's the decoupling of the logical from the
physical," I want to insert a footnote referencing him.
I remember the first time I heard him on a podcast, I thought to myself "This guy must be super
smart, because he sounds like a Bond villain and I can only grasp 50% of what he's saying." I
started telling colleagues about him, "Hey, check this guy out. His webinars will make your brain
bleed out of your ears!" Trust me, in my circle that's a HUGE compliment.
When I was chosen to attend my first Tech Field Day event, I was most excited because I would
finally get to meet Ivan in person. All my engineering friends were jealous and I was almost
apoplectic when the moment finally arrived, fearful I would do something foolish like confuse SMTP
and SNMP. This is when I discovered a really wonderful aspect of Ivan: if you're ever lucky enough
to interact with him personally (stalking doesn't count), you'll find him to be witty, friendly,
generous and gracious. He never makes you feel stupid for not understanding a protocol, the details
of an RFC, or an IEEE standard.
He's the consummate educator and a giving mentor to almost anyone who asks. The more I know
him, the more I admire and respect his dedication to engineering. It truly is a vocation for him.


I guess I need to say something about SDN now, so here goes. While it could be the idea that finally
revolutionizes networking, data centers and even security, I advise caution. Vendors will latch onto
this new buzzword like a pitbull and promote it like the industry's new secret sauce. With this book,
you'll be able to separate facts from hype and make some educated decisions regarding your own
infrastructure.

Michele Chubirka
Security architect, analyst, writer and podcaster
December 2013


INTRODUCTION
OpenFlow and Software Defined Networks (SDN) entered mainstream awareness in March 2011,
when several large cloud providers and Internet Service Providers formed the Open Networking
Foundation.
More than three years later, the media still doesn't understand the basics of SDN, and many
networking engineers feel threatened by what they see as a fundamental shift in the way they do
their jobs.
In the meantime, I published over a hundred blog posts on ipSpace.net trying to debunk the myths
and explain how SDN and OpenFlow work, and what their advantages and limitations are. Most of
the posts were responses to external triggers: false claims, vendor launches, or questions I received
from my readers.
This book contains a collection of the most relevant blog posts describing the concepts of SDN and
OpenFlow. I cleaned up the blog posts and corrected obvious errors and omissions, but also tried to
leave most of the content intact. The commentaries between the individual blog posts will help you
understand the timeline or the context in which a particular blog post was written.
The book covers these topics:

- Debunking of the initial hype surrounding the OpenFlow public launch, and the most blatant
  misconceptions (Chapter 1);

- Overview of what SDN is, what its benefits might be, and deliberations on whether or not it makes
  sense (Chapter 2);


- Introduction to OpenFlow, from architectural basics to protocol details, and deployment and
  forwarding models (Chapter 3);

- OpenFlow implementation notes, describing the peculiarities of hardware and software
  implementations of OpenFlow switches (Chapter 4);

- OpenFlow scalability challenges, from control-plane complexity to packet punting and limitations
  of flow table updates (Chapter 5);

- OpenFlow use cases, from the production deployment at Google to interesting ready-to-use
  architectures and musings on potential future uses (Chapter 6);

- SDN beyond OpenFlow (Chapter 7), covering BGP-based SDN, NETCONF, I2RS, Cisco's onePK,
  and Plexxi's controller-based data center fabrics.

You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

- Start with the SDN, OpenFlow and NFV Resources page;

- Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
  OpenFlow (some of them are freely available thanks to industry sponsors);

- The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network
  function virtualization and SDDC technologies in your network;

- Finally, I'm always available for short online or on-site consulting engagements.

As always, please do feel free to send me any questions you might have; the best way to reach me
is the contact form on my web site (www.ipSpace.net).
Happy reading!
Ivan Pepelnjak
July 2014


THE INITIAL HYPE

Academic researchers had been working on OpenFlow concepts (distributed data plane with a
centralized controller) for years, but in early 2011 a fundamental marketing shift happened: major
cloud providers (Google) and Internet Service Providers (Deutsche Telekom) created the Open
Networking Foundation (ONF) to push forward commercial adoption of OpenFlow and Software
Defined Networking (SDN), or at least their definition of it.
Since then, every single vendor has started offering SDN products. Almost none of them come even
close to the (narrow) vision promoted by the Open Networking Foundation (centralized control plane
with distributed data plane), NEC's ProgrammableFlow being a notable exception.
Most vendors decided to SDN-wash their existing products, branding their existing APIs "open" and
claiming they have SDN-enabled products.

MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

- Start with the SDN, OpenFlow and NFV Resources page;

- Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;

- Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
  OpenFlow (some of them are freely available thanks to industry sponsors);

- The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network
  function virtualization and SDDC technologies in your network;

- Finally, I'm always available for short online or on-site consulting engagements.


As usual, the industry media didn't help: they enthusiastically jumped onto the OpenFlow/SDN
bandwagon and started propagating myths. More than two years later, they still don't understand
the fundamentals of SDN, and tend to focus exclusively on how SDN is supposed to hurt Cisco (or
not).

IN THIS CHAPTER:
OPEN NETWORKING FOUNDATION FABRIC CRAZINESS REACHES NEW HEIGHTS
OPENFLOW FAQ: WILL THE HYPE EVER STOP?
OPENFLOW IS LIKE IPV6
FOR THE RECORD: I AM NOT AGAINST OPENFLOW
NETWORK FIELD DAY FIRST IMPRESSIONS
I APOLOGIZE, BUT I'M EXCITED
THE REALITY TWO YEARS LATER
CONTROL AND DATA PLANE SEPARATION THREE YEARS LATER
TWO AND A HALF YEARS AFTER OPENFLOW DEBUT, THE MEDIA REMAINS CLUELESS
WHERE'S THE REVOLUTIONARY NETWORKING INNOVATION?
FALLACIES OF GUI


In March 2011, industry media quickly picked up the buzz created by the Open Networking
Foundation (ONF) press releases and started exaggerating the already extravagant claims made by
ONF, prompting me to write the following blog post.

OPEN NETWORKING FOUNDATION FABRIC CRAZINESS REACHES NEW HEIGHTS
Some of the biggest buyers of networking gear have decided to squeeze some extra discount out of
the networking vendors and threatened them with an open-source alternative, hoping to repeat the
Linux/Apache/MySQL/PHP saga that made it possible to build server farms out of low-cost
commodity gear with almost zero licensing costs. They formed the Open Networking Foundation,
found a convenient technology (OpenFlow), and launched another major entrant in the buzzword
bingo: Software-Defined Networking (SDN).
Networking vendors, either trying to protect their margins by stalling the progress of this initiative,
or stampeding into another Wild West gold rush (hoping to unseat their bigger competitors with
low-cost standards-based alternatives), have joined the foundation in hordes; the list of initial
members reads like a Who's Who of networking.
Now, let's try to figure out what SDN might be all about. The ONF mission statement (on the first
page) says SDN "allows owners and operators of networks to control and manage their networks to
best serve their needs." Are the founding members of ONF trying to tell us they have no control over
their networks and lack network management systems? It must be something else. How about this
one (from the same paragraph): OpenFlow "seeks to increase network functionality while lowering
the cost associated with operating networks." Now we're getting somewhere: I told you it was all
about reducing costs (starting with the networking vendors' margins).
(Some of) the industry media happily joined the craze, parroting meaningless phrases from various
press releases. Consider, for example, this article from IT World Canada.
"SDN would give network operators the ability to virtualize network resources, being able to
dynamically improve latency or security on demand." If you want to do that, you can do it today,
using dynamic routing protocols or QoS (latency), vShield/VSG (on-demand security), or a number
of virtualized networking appliances.
Also, protocols like RSVP to signal per-session bandwidth needs have been around for more than a
decade, but somehow never caught on. Must be the fault of those stupid networking vendors.
"Sites like Facebook, Google or Yahoo would be able to tailor their networks so searches would be
blindingly fast." I never realized the main search problem was network bandwidth; I always somehow
thought it was related to large datasets, CPU, database indices ... Anyhow, if network bandwidth is
the bottleneck, why don't they upgrade to next-generation Ethernet (10G/40G)? Ah, yes, it might be
expensive. How about deploying a Clos network architecture? Ouch, might be a nightmare to
configure and manage. How exactly will SDN solve this problem?
"Stock exchanges could assure brokerage customers on the other side of the globe they'd get
financial data as fast as a dealer beside the exchange." Will SDN manage to flatten and shrink the
earth, will it change the speed of light, or will it use large-scale quantum entanglement?
"It could be programmed to order certain routers to be powered down during off-peak power
periods." What stops you from doing that today?


Don't get me wrong: OpenFlow might be a good idea and it will probably lead to interesting new
opportunities (assuming they can solve the scalability and resilience issues) ... and I'm absolutely
looking forward to the podcast we're recording later today (available on the Packet Pushers web site).
However, there are plenty of open standards in the networking industry (including XML-based
network configuration and management) waiting to be used. There are also (existing, standard)
technologies that you can use to solve most of the problems these people are complaining about.
The problem is that these standards and technologies are not used by operating systems or
applications (when was the last time you deployed a server running OSPF to get seamless
multihoming?).
The main problems we're facing today arise primarily from non-scalable application architectures
and a broken TCP/IP stack. In a world with scale-out applications you don't need fancy combinations
of routing, bridging and whatever else; you just need fast L3 transport between endpoints. In an
Internet with a decent session layer or a multipath transport layer (be it SCTP, Multipath TCP or
something else) you don't need load balancers, BGP sessions with end customers to support
multihoming, or LISP. All these kludges were invented to support OS/App people firmly believing in
the fallacies of distributed computing. How is SDN supposed to change that? I'm anxiously waiting to
see an answer beyond marketing/positioning/negotiating bullshit bingo.


Not surprisingly, the OpenFlow hype did not subside, and totally inaccurate articles started
appearing in the industry press, prompting me to write yet another rant in April 2011.

OPENFLOW FAQ: WILL THE HYPE EVER STOP?


Network World published another masterpiece last week: FAQ: What is OpenFlow and why is it
needed? Following the physics-changing promises made during the Open Networking Foundation
launch, one would hope to get some straight facts; obviously things don't work that way. Let's walk
through some of the points. While most of them might not be too incorrect from an oversimplified
perspective, they do over-hype a potentially useful technology way out of proportion.
NW: "OpenFlow is a programmable network protocol designed to manage and direct traffic among
routers and switches from various vendors." This one is just a tad misleading. OpenFlow is actually a
protocol that allows a controller to download forwarding tables into one or more switches. Whether
that manages or directs traffic depends on what the controller is programmed to do.
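To see how thin the protocol layer really is, here is a minimal Python sketch (my illustration, not production controller code) of the fixed 8-byte header that precedes every OpenFlow 1.0 message, including the flow-mod a controller uses to download forwarding entries:

```python
import struct

# Every OpenFlow 1.0 message starts with the same 8-byte header:
# version (1 byte), message type (1 byte), total length (2 bytes),
# transaction id (4 bytes), all in network byte order.
OFP_VERSION_1_0 = 0x01
OFPT_FLOW_MOD = 14  # message used to install/modify/delete flow entries

def pack_header(msg_type, body_len, xid):
    """Pack the header that precedes every OpenFlow message."""
    return struct.pack("!BBHI", OFP_VERSION_1_0, msg_type, 8 + body_len, xid)

def parse_header(data):
    """Return (version, type, total_length, xid) from a raw message."""
    return struct.unpack("!BBHI", data[:8])

# A bare OpenFlow 1.0 flow-mod body (match structure, cookie, command,
# timeouts, priority, buffer id, out port, flags, no actions) is 64 bytes.
hdr = pack_header(OFPT_FLOW_MOD, body_len=64, xid=42)
print(parse_header(hdr))  # (1, 14, 72, 42)
```

Whether pushing such messages "manages and directs traffic" is, as the post says, entirely up to what the controller decides to put into them.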
NW: "The technology consists of three parts: [...] and a proprietary OpenFlow protocol for the
controller to talk securely with switches." Please do decide what you think "proprietary" means. All
parts of the OpenFlow technology are defined in publicly available documents under a BSD-like
license.
NW: "OpenFlow is designed to provide consistency in traffic management and engineering by
making this control function independent of the hardware it's intended to control." How can a
low-level flow-table-control API provide what this statement claims it does? It all depends on the
controller implementation.


NW: "The programmability of the MPLS capabilities of a particular vendor's platform is specific to
that vendor." And the OpenFlow-related capabilities of individual switches will depend on specific
implementations by specific vendors. Likewise, the capabilities of an OpenFlow controller will be
specific to that vendor. What exactly is the fundamental change?
NW: "MPLS is a Layer 3 technique while OpenFlow is a Layer 2 method." Do I need to elaborate on
this gem? Let's just point out that OpenFlow works with MAC addresses, IP subnets, IP flow
5-tuples, VLANs or MPLS labels. Whatever a switch can do, OpenFlow can control.
But wait ... OpenFlow has no provision for IPv6 at all. Maybe Network World is so futuristic they
consider a technology without IPv6 support a layer-2 technology.
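The "whatever a switch can do, OpenFlow can control" point is easy to illustrate with a toy flow table: a flow entry is nothing more than "match fields → action", and the match can name L2, L3 or L4 fields (or wildcard them). The field names below mimic OpenFlow 1.0 match fields, but the code is a simplified sketch, not a real switch pipeline:

```python
# A flow table: (priority, match fields, action). Unlisted fields are
# wildcarded; the empty match is the catch-all (table-miss) entry.
FLOW_TABLE = [
    (300, {"nw_src": "10.0.0.5", "tp_dst": 80}, "output:2"),  # L3/L4 match
    (200, {"dl_dst": "aa:bb:cc:dd:ee:ff"}, "output:1"),       # L2 match
    (100, {"dl_vlan": 42}, "output:3"),                       # VLAN match
    (0,   {}, "controller"),                                  # table-miss
]

def lookup(packet):
    """Return the action of the highest-priority matching entry."""
    for prio, match, action in sorted(FLOW_TABLE, key=lambda e: e[0], reverse=True):
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return "drop"

print(lookup({"nw_src": "10.0.0.5", "tp_dst": 80}))  # output:2
print(lookup({"dl_vlan": 42}))                        # output:3
```

The same lookup logic works regardless of whether the match names a MAC address, an IP 5-tuple, or a VLAN tag, which is exactly why calling OpenFlow a "Layer 2 method" misses the point.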


In another blog post, I compared OpenFlow to IPv6: the evangelists of both technologies promised
way more than the technologies were ever capable of delivering.

OPENFLOW IS LIKE IPV6


Frequent eruptions of OpenFlow-related hype (a recent one caused by the Brocade Technology Day
Summit; I'm positive Interop will not lag behind) call for continuous myth-busting efforts. Let's
start with a widely quoted (and immediately glossed-over) fact from Professor Scott Shenker, a
founding board member of the ONF: "[OpenFlow] doesn't let you do anything you couldn't do on a
network before."
To understand his statement, remember that OpenFlow is nothing more than a standardized version
of the communication protocol between the control and data planes. It does not define a radically new
architecture, it does not solve distributed or virtualized networking challenges, and it does not create
new APIs that applications could use. The only thing it provides is the exchange of TCAM (flow)
data between a controller and one or more switches.
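A minimal sketch of that division of labor (the names are illustrative, not any real controller API): the switch holds nothing but a flow table and forwards on it, while all decisions live in the controller, which pushes entries down:

```python
class Switch:
    """Data plane: a dumb flow table, no local intelligence."""
    def __init__(self, name):
        self.name, self.flows = name, {}

    def install(self, match, action):
        # Called by the controller; the switch never decides anything itself.
        self.flows[match] = action

    def forward(self, dst):
        # Unknown destinations go back up to the controller.
        return self.flows.get(dst, "punt-to-controller")

class Controller:
    """Control plane: computes paths and programs every switch."""
    def __init__(self, switches):
        self.switches = switches

    def program_path(self, dst, ports):
        # One output port per switch along the path.
        for sw, port in zip(self.switches, ports):
            sw.install(dst, f"output:{port}")

s1, s2 = Switch("s1"), Switch("s2")
Controller([s1, s2]).program_path("10.0.0.7", ports=[3, 1])
print(s1.forward("10.0.0.7"), s2.forward("10.0.0.7"))  # output:3 output:1
print(s1.forward("10.0.0.9"))                          # punt-to-controller
```

Everything interesting (topology discovery, path computation, failure handling) happens outside this exchange, which is why the protocol by itself doesn't let you do anything you couldn't do before.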
Cold-fusion-like claims are nothing new in the IT industry. More than a decade ago another group of
people tried to persuade us that changing the network-layer address length from 32 bits to 128 bits
and writing it in hex instead of decimal would solve global routing and multihoming, and improve
QoS, security and mobility. After the reality distortion field collapsed, we were left with the same set
of problems, exacerbated by the purist approach of the original IPv6 architects.


Learn from past bubble bursts. Whenever someone makes an extraordinary claim about OpenFlow,
remember the "it can't do anything you couldn't do before" fact and ask yourself:

- Did we have similar functionality in the past? If not, why not? Was there no need, or were the
  vendors too lazy to implement it (don't forget they usually follow the money)?

- Did it work? If not, why not?

- If it did, do we really need a new technology to replace a working solution?

- Did it get used? If not, why not? What were the roadblocks? Why would OpenFlow remove them?

Repeat this exercise regularly and you'll probably discover the new emperor's clothes aren't nearly
as shiny as some people would make you believe.


The OpenFlow pundits quickly labeled me an OpenFlow hater, but I was just being my grumpy old
self ;) Here's the blog post (from May 2011) that tried to set the record straight (not that such
things would ever work).

FOR THE RECORD: I AM NOT AGAINST OPENFLOW


... as some of its supporters seem to believe every now and then (I do get a severe allergic reaction
when someone claims it will change the laws of physics, or when I'm faced with technical
inaccuracies, not to mention knee-jerking financial experts). Even more: assuming it can cross the
adoption gap, it could fundamentally change the business models of networking vendors (maybe not
in the way you'd like them to be changed). You can read more about my OpenFlow views in the
article I wrote for SearchNetworking.
On the more technological front, I still don't expect to see miracles. Most OpenFlow-related ideas
I've heard about have been tried (and failed) before. I fail to see why things would be different just
because we use a different protocol to program the forwarding tables.


In just a few months, everyone was talking about OpenFlow and SDN, and Stephen Foskett, the
mastermind behind GestaltIT, decided to organize the first ever OpenFlow symposium in September
2011.
The vendor and user presentations we saw at that symposium, combined with the vendor
presentations we attended during Networking Tech Field Day 2, seemed very promising: everyone
was talking about the right topics and tried to address real-life scalability concerns.

NETWORK FIELD DAY FIRST IMPRESSIONS


We finished a fantastic Network Field Day (second edition) yesterday. While it will take me a while
(and 20+ blog posts) to recover from the information blast I received during the last two days, here
are the first impressions:
Explosion of innovation, and it's not just OpenFlow and/or SDN. Last year we saw some great
products and a few good ideas (earning me the "grumpy old man that's hard to make smile" fame);
this year almost every vendor had something that excited me.
If you were watching the video stream, you probably got sick and tired of my "wow, that's cool"
comments. I apologize, but that's how I felt.
Everyone gets the problem ... and some of the vendors were trying to tell us what the problem is in
a CIO-level pitch. Not a good idea. However, it's refreshing to see that everyone identified the
same problem (large-scale data centers, VM mobility ...), that it's the problem we're all familiar
with, and that it's actually getting solved.


Most vendors have sensible answers. They are addressing different parts of the big problem, and
they talk about different technologies, but the answers aren't bad. For example, every time I spotted
a scalability issue, they were aware of it and/or had good answers (if not a solution).
Layer-2 is fading away (again). While every switching vendor will tell you how you can build large L2
domains with their fabric, nobody is actually pushing them anymore. And the only time layer-2 Data
Center Interconnect (DCI) appeared on a slide, there was a unicorn image next to it. Even more,
two vendors actually said they think long-distance VM mobility is not a good idea (you'll have to
watch the videos to figure out who they were).
We're cutting through the hype. Even the OpenFlow symposium was hype-less. It's so nice being able
to spend three days with highly intelligent people who are excited about the next great thing
(whatever it is), while being perfectly realistic about its current state and its limitations.
You'll see lots of new things in the future. Even if you're working in an SMB environment, you might
get exposed to OpenFlow in the not-too-distant future (more about that in an upcoming post).
Get ready for a bumpy ride. Lots of exciting technologies are being developed. Some of them make
perfect sense, some others less so. Some of them might work, some might fade away (not because
they would be inherently bad, but because of bad execution). Now is the time to jump on those
bandwagons: get involved (hint: you just might start with IPv6), build a test lab, kick the tires, and
figure out whether the new technologies might be a good fit for your environment when they
become stable.
Disclosure: vendors mentioned in this post indirectly covered my travel expenses. Read the full
disclosure (or a more precise one by Tony Bourke).


Even more, the real-life approach of the numerous vendors I saw during those two events made me
overly optimistic: I thought we just might be able to get to real-life OpenFlow and SDN use cases
without the usual vendor jousting and get-rich-quick startup mentality. This is what I wrote in
October 2011:

I APOLOGIZE, BUT I'M EXCITED


The last few days were exquisite fun: it was great meeting so many people focusing on a single
technology (OpenFlow) and concept (Software-Defined Networking, whatever that means) that just
might overcome some of the old obstacles (and introduce new ones). You should be at least a bit
curious what this is all about, and even if you don't see yourself ever using OpenFlow or any other
incarnation of SDN in your network, it never hurts to enhance your resume with another technology
(as long as it's relevant; don't put "CICS programmer" at the top of it).
Watching the presentations from the OpenFlow symposium is a great starting point. I would start
with the ones from Igor Gashinsky (Yahoo!) and Ed Crabbe (Google): they succinctly explained the
problems they're facing in their networks and how they feel OpenFlow could solve them. If you're an
IaaS cloud provider, this is the time to start thinking about the potential OpenFlow could bring to
your network, and if you're not talking to NEC, Big Switch or Nicira, you're missing out. I would also
talk with Juniper (more about that later).
Next step: watch the vendor presentations from the OpenFlow symposium. Kyle Forster presented a
high-level overview of Big Switch architecture, Curt Beckmann from Brocade added a healthy dose
of reality check (highly appreciated), David Meyer (Cisco) presented an interesting perspective on
robustness and complexity (and several OpenFlow use cases), Don Clark from NEC talked about


their OpenFlow products (watch the video; the PDF is not online), and finally David Ward from
Juniper presented the hybrid approach: use OpenFlow in combination with (not as a replacement
for) existing technologies.
The afternoon technical Q&A panel just confirmed that numerous vendors understand the challenges
associated with OpenFlow deployments outside of small lab setups very well, and that they're
actively working on solving those problems and making OpenFlow a viable technology.
Two vendors expanded their coverage of OpenFlow during the Network Field Day: David Ward from
Juniper did a technical deep dive (don't skip the Junos automation part at the beginning of the
video; it's interesting ... and you just might spot the VRF Smurf), and NEC even showed us a demo
of their OpenFlow-based switched network.
Luckily there are still some cool-headed people around (read Ethan Banks' OpenFlow State of the
Union and Derick Winkworth's More Open Flow Symposium Notes), but I can't help myself. The
grumpy old man from the L3 ivory tower is excited (listen to the Packet Pushers OpenFlow/SDN
podcast if you don't believe me), and not just about OpenFlow. I still can't believe that I stumbled
upon so many interesting or cool technologies or solutions in the last few days. It could be that it's
just vendors adapting to the blogging audience, or there actually might be something fundamentally
new coming to light, like MPLS (then known as tag switching) was in the late 1990s.
Disclosure: vendors mentioned in this post indirectly covered my travel expenses. Read the full
disclosure (or a more precise one by Tony Bourke).


The hard reality of the intervening two years has crushed all my high hopes. This is the reality of
OpenFlow and SDN as I see it in November 2013:

THE REALITY TWO YEARS LATER


Major vendors (with the exception of NEC) haven't made any progress. Juniper still hasn't delivered
on its promises. Cisco still hasn't shipped an OpenFlow switch or an SDN controller (although it
announced both months ago). Brocade supposedly has OpenFlow on their high-end routers, and
Arista supports OpenFlow on its old high-end switch (but not in a GA EOS release).
Every major vendor is talking about SDN, but it's mostly SDN-washing (aka CLI-in-API-disguise).
Cisco is talking about onePK and has a shipping early-adopter SDK, but it will take a while before
we see onePK in GA code on a widespread platform.
Startups aren't doing any better. Big Switch is treading water and trying to find a useful use case for
their controller. Nicira was acquired by VMware and is moving away from OpenFlow. Contrail was
acquired by Juniper and recently shipped its product (which has nothing to do with OpenFlow and
not much with SDN). LineRate Systems was acquired by F5 and disappeared.
We haven't seen customer deployments either. Facebook is doing interesting things (but from what
I've heard they're not OpenFlow-based), Google has an OpenFlow/SDN deployment (but they could
have done the exact same thing with classical routers and PCEP), and Microsoft's SDN is based on BGP
(and works fine).
It seems reality hit OpenFlow, and it was a very hard hit; according to Gartner, we
haven't reached the trough of disillusionment yet.


In January 2014 I took another look at what the Open Networking Foundation founding members
managed to achieve between March 2011 (the beginning of OpenFlow/SDN hype) and early 2014.
The only one that made significant progress on the centralized control plane front was Google.
Since I wrote this blog post, Facebook launched their own switch operating system, which seems to
be working along the same lines as classical network operating systems (one device, one control
plane).

CONTROL AND DATA PLANE SEPARATION THREE YEARS LATER
Almost three years ago the OpenFlow/SDN hype exploded and the Open Networking Foundation
started promoting the concept of physically separate control and data planes. Let's see how far its
founding members got in the meantime:

Google implemented their inter-DC WAN network with switches that use OpenFlow within a
switching fabric and BGP/IS-IS and something akin to PCEP between sites;

Facebook is working on the networking platform for their Open Compute Project. It seems
they've got to the switch hardware specs; I haven't heard about software running on those switches
yet, or maybe they'll go down the same path as Google ("We got cheap switches, and we have
our own software. Goodbye and thank you!")


Yahoo! was talking about custom changes to standard networking protocols. I haven't heard about
their progress since the first OpenFlow Symposium; the April 2012 presentation from Igor
Gashinsky still concluded with "Where's My Pony?"

Deutsche Telekom is still using traditional routers and a great NFV platform.

Microsoft implemented SDN with BGP, using a central controller but not a centralized control
plane.

I have no idea what Verizon is doing.

In the networking vendor world, NEC seems to be the only company with a mature commercial
product that matches the ONF definition of SDN. Cisco has just shipped the initial version of their
controller, as did HP, and those products seem pretty limited at the moment.
Wondering why I didn't include Big Switch Networks in the above list? My definition of "shipping"
includes publicly available product documentation, or (at the very minimum) something resembling
a data sheet with feature descriptions, system requirements and maximum limits. I couldn't find
either on the Big Switch web site.
On the other hand, the virtual networking world was always full of solutions with separate control
and data planes, starting with the venerable VMware Distributed vSwitch and Nexus 1000V, and
continuing with newer entrants, from the Hyper-V extensible switch and VMware NSX to Juniper Contrail
and IBM's 5000V and DOVE. Some of these solutions were used years before the explosion of
OpenFlow/SDN hype (only we didn't know we should call them SDN).


In the meantime, the industry media still hasn't grasped the basics of SDN. Here's my response to a
particularly misleading article written in November 2013:

TWO AND A HALF YEARS AFTER OPENFLOW DEBUT, THE MEDIA REMAINS CLUELESS
If you repeat something often enough, it becomes a fact (or an urban myth). SDN is no exception;
the industry press loves to explain SDN like this:
[SDN] takes the high-end features built into routers and switches and puts them into
software that can run on cheaper hardware. Corporations still need to buy routers and
switches, but they can buy fewer of them and cheaper ones.
That nice soundbite contains at least one stupidity per sentence:
SDN cannot move hardware features into software. If a device relies on hardware forwarding,
you cannot move the same feature into software without significantly impacting the forwarding
performance.
SDN software runs on cheaper hardware. Ignoring the intricacies of custom ASICs and
merchant silicon (and the fact that Cisco produces more custom ASICs than all merchant silicon
vendors combined), complexity and economies of scale dictate the hardware costs. It's pretty hard
to make cheaper hardware with the same performance and feature set.
However, all networking vendors bundle the software with the hardware devices and expense R&D
costs (instead of including them in COGS) to boost their perceived margins.


Does the above paragraph sound like Latin to you? Don't worry; just keep in mind that software
usually costs about as much as (or more than) the hardware it runs on, but you don't see that cost.
Corporations can buy fewer routers and switches. It can't get any better than this. If you need
100 10GE ports, you need 100 10GE ports. If you need two devices for two WAN uplinks (for
redundancy), you need two devices. SDN won't change the port count, redundancy requirements, or
the laws of physics.
Corporations can buy cheaper [routers and switches]. Guess what: you still need the
software to run them, and until we see the price tags of SDN controllers and do a TCO calculation,
claims like this one remain wishful thinking (you did notice I'm extremely diplomatic today, didn't
you?).


Finally, numerous marketers and SDN/OpenFlow pundits keep repeating how they'll save the
(networking) world and bring true nirvana to network operations with their flashy new gadgets.
Nothing could be further from the truth, because we cannot get rid of the legacy permeating the whole
TCP/IP stack, as I explained in this post written in July 2013:

WHERE'S THE REVOLUTIONARY NETWORKING INNOVATION?
In his recent blog post Joe Onisick wrote: "What network virtualization doesn't provide, in any form,
is a change to the model we use to deploy networks and support applications. [...] All of the same
broken or misused methodologies are carried forward. [...] Faithful replication of today's networking
challenges as virtual machines with encapsulation tunnels doesn't move the bar for deploying
applications."
Much as I agree with him, we can't change much on planet Earth due to the fact that VMs use
Ethernet NICs (so we need some form of VLANs to cater to the infinite creativity of some people), IP
addresses (so we need L3 forwarding), a broken TCP stack (requiring load balancers to fix it), and
obviously can't be relied upon to be sufficiently protected (so we need external firewalls).
Furthermore, unless we manage to stop shifting the problems around, networking as a whole
won't get simpler.
What overlay network virtualization does bring us is a decoupling that makes the physical infrastructure
less complex, so it can focus on packet forwarding instead of zillions of customer-specific features,
preferably baked into custom ASICs. Obviously that's not a good thing for everyone out there.


The final bit of hype I want to dispel is the misleading focus on CLI that we use to configure
networking devices. CLI is not the problem, and GUI will not save the world.

FALLACIES OF GUI
I love Greg Ferro's characterization of CLI:
We need to realise that the CLI is a power tool for specialist tradespeople and not a
knife and fork for everyday use.
However, you do know that most devices' GUIs offer nothing more than the CLI does, don't you?
Where's the catch?
For whatever reason, people find colorful screens full of clickable items less intimidating than a
blinking cursor on a black background. Makes sense: after all, you can see all the options you have;
you can pull down menus to explore possible values, and commit the changes once you think
you've enabled the right set of options. Does that make a product easier to use? Probably. Will it result
in a better-performing product? Hardly.
Have you ever tried to configure OSPF through a GUI? How about trying to configure usernames and
passwords for individual wireless users? In both cases you're left with the same options you'd have
in the CLI (because most vendors implement the GUI as eye candy in front of the CLI or API). If you know
how to configure OSPF or a RADIUS server, the GUI helps you break the language barrier (example:
moving from Cisco IOS to Junos); if you don't know what OSPF is, the GUI still won't save the day ... or
it might, if you try clicking all the possible options until you find one that seems to work (expect a
few meltdowns on the way if you're practicing your clicking skills on a live network).


What casual network admins need are GUI wizards: tools that help you achieve a goal while
keeping your involvement to a minimum. For example: "I need IP routing between these three
boxes. Go do it!" should translate into "Configure OSPF in area 0 on all transit interfaces." When you
see a GUI offering this level of abstraction, please let me know. In the meantime, I'm positive that
engineers who have to get a job done quickly prefer using the CLI over a clickety-click GUI (and I'm
not the only one), regardless of whether they have to configure a network device, Linux server,
Apache, MySQL, MongoDB or a zillion other products. Why do you think Microsoft invested so heavily
in PowerShell?
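To make the wizard idea concrete, here's a minimal sketch of how a single routing intent could expand into per-device OSPF configuration. All device names, subnets and the single-area design are invented for illustration; no shipping product works exactly this way:

```python
# Hypothetical sketch of the wizard-level abstraction described above:
# one high-level intent ("IP routing between these boxes") expands into
# per-device OSPF configuration. Device names and subnets are invented.

def enable_routing(transit_links):
    """transit_links: list of (device, network) pairs describing transit
    subnets; returns an OSPF area-0 config snippet per device."""
    configs = {}
    for device, network in transit_links:
        cfg = configs.setdefault(device, ["router ospf 1"])
        cfg.append(f" network {network} 0.0.0.255 area 0")
    return {dev: "\n".join(lines) for dev, lines in configs.items()}

# Three boxes connected in a chain: R1 -- R2 -- R3
topology = [
    ("R1", "10.0.12.0"),
    ("R2", "10.0.12.0"),
    ("R2", "10.0.23.0"),
    ("R3", "10.0.23.0"),
]

for device, cfg in sorted(enable_routing(topology).items()):
    print(f"! {device}\n{cfg}")
```

The point is the level of abstraction: the operator supplies the topology and the intent, and the tool decides which CLI commands that implies.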


SOFTWARE DEFINED NETWORKING 101

The Open Networking Foundation (ONF), launched in March 2011, quickly defined Software Defined
Networking (SDN) as an architecture with a centralized control plane that controls multiple physically
distinct devices.
That definition definitely suits one of the ONF founding members (Google), but is it relevant to the
networking community at large? Or does it make more sense to focus on network programmability,
or on using existing protocols (BGP) in novel ways?
This chapter contains my introductory posts on SDN-related topics, musings on what makes
sense, and a few thoughts on career changes we might experience in the upcoming years. You'll find
more details in subsequent chapters, including an overview of OpenFlow, an in-depth analysis of
OpenFlow-based architectures, some real-life OpenFlow and SDN deployments, and alternate
approaches to SDN.

MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

Start with the SDN, OpenFlow and NFV Resources page;

Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;

Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);

The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network;

Finally, I'm always available for short online or on-site consulting engagements.


IN THIS CHAPTER:
WHAT EXACTLY IS SDN (AND DOES IT MAKE SENSE)?
BENEFITS OF SDN
DOES CENTRALIZED CONTROL PLANE MAKE SENSE?
HOW DID SOFTWARE DEFINED NETWORKING START?
WE HAD SDN IN 1993 AND DIDN'T KNOW IT
STILL WAITING FOR THE STUPID NETWORK
IS CLI IN MY WAY OR IS IT JUST A SYMPTOM OF A BIGGER PROBLEM?
OPENFLOW AND SDN: DO YOU WANT TO BUILD YOUR OWN RACING CAR?
SDN, WINDOWS AND FRUITY ALTERNATIVES
SDN, CAREER CHOICES AND MAGIC GRAPHS
RESPONSE: SDN'S CASUALTIES


The very strict definition of SDN as understood by the Open Networking Foundation promotes an
architecture with strict separation between a controller and totally dumb devices that cannot do
more than forward packets based on forwarding rules downloaded from the controller. Does that
definition make sense? This is what I wrote in January 2014:

WHAT EXACTLY IS SDN (AND DOES IT MAKE SENSE)?


When the Open Networking Foundation claimed ownership of Software-Defined Networking, they defined
it as the separation of control and data plane:
[SDN is] The physical separation of the network control plane from the forwarding plane,
and where a control plane controls several devices.
Does this definition make sense or is it too limiting? Is there more to SDN? Would a broader scope
make more sense?

A BIT OF A HISTORY
Its worth looking at the founding members of ONF and their interests: most of them are large cloud
providers looking for cheapest possible hardware, preferably using a standard API so it can be
sourced from multiple suppliers, driving the prices even lower. Most of them are big enough to write
their own control plane software (and Google already did).
A separation of control plane (running their own software) and data plane (implemented in a lowcost white-label switches) was exactly what they wanted to see, and the Stanford team working on


OpenFlow provided the architectural framework they could use. No wonder ONF pushes this
particular definition of SDN.

MEANWHILE DEEP BELOW THE CLOUDY HEIGHTS


I have yet to meet a customer (academics might be an exception) who would consider writing their
own control-plane software; most of my customers aren't anywhere close to writing an SDN
application on top of a controller framework (OpenDaylight, Cisco XNC or the HP VAN SDN controller).
Buying a shrink-wrapped application bundled with commercial support might be a different
story, but then nobody really cares whether such a solution uses OpenFlow or RFC 2549;
the protocols and encapsulation mechanisms used within a controller-based network
solution are often proprietary and thus impossible to troubleshoot anyway.
On the other hand, I keep hearing about common themes:

The need for faster, more standardized, and automated provisioning;

The need for programmable network elements and vendor-neutral programming mechanisms
(I'm looking at you, netmod working group);

Centralized policies and decision making based on end-to-end visibility;

Easier integration of network elements with orchestration and provisioning systems.

Will the physical separation of control and forwarding planes solve any of these? It might, but there are
numerous tools out there that can do the same without overhauling everything we've been doing for
the last 30 years.


We don't need the physical separation of the control plane to solve our problems (although the ability to
control individual forwarding entries does help), and it will probably take a decade before we
glimpse the promised savings of white-label switches and open-source software (even Greg Ferro
stopped believing that).

NOW WHAT?
Does it make sense to accept a definition of SDN that makes sense to the ONF founding members but
not to your environment? Shall we strive for a different definition of SDN, or just move on, declare it
as meaningless as cloud, and focus on solving our problems? Would it be better to talk about
NetOps?
Maybe we should stop talking and start doing: there are plenty of things you can do within existing
networks using existing protocols.


Every new networking technology is supposed to solve most of our headaches. SDN is no exception.
The reality might be a bit different.

BENEFITS OF SDN
Paul Stewart wrote a fantastic blog post in May 2014 listing the potential business benefits of SDN
(as promoted by SDN evangelists and SDN-washing vendors).
Here's his list:

Abstracted Control Plane for a Central Point of Management

Granular Control of Flows (as required/desired)

Network Function Virtualization and Service Chaining

Decreased dependence on devices like load balancers

Facilitation of system orchestration

Easier troubleshooting/visibility

Platform for chargeback/showback

Decreased complexity and cost

Increased ability to utilize hardware and interconnections

DevOps friendly architecture

I have just one problem with this list: I've seen a similar list of benefits for IPv6:


Figure 2-1: IPv6 myths

Unfortunately, the reality of IT in general and IPv6 in particular is a bit different. The overly hyped
IPv6 benefits remain myths and legends; all we got were longer addresses, incompatible protocols
(OSPFv3, anyone?), and half-thought-out implementations (example: DNS autoconfiguration) riddled
with religious wars (try asking "why don't we have first-hop router in DHCPv6" on any IPv6 mailing
list ;).
For more information, watch the fantastically cynical presentation Enno Rey gave at the Troopers 2014
IPv6 Security Summit, or browse my IPv6 resources.


With the Open Networking Foundation adamantly promoting their definition of SDN, and based on
experiences with previous (now mostly extinct) centralized architectures, one has to ask a simple
question: does it make sense? Here's what I thought in May 2014:

DOES CENTRALIZED CONTROL PLANE MAKE SENSE?


A friend of mine sent me a challenging question:
You've stated a couple of times that you don't favor the OpenFlow version of SDN due to
a variety of problems like scaling and latency. What model/mechanism do you like?
Hybrid? Something else?
Before answering the question, let's step back and ask another one: does a centralized control plane,
as evangelized by ONF, make sense?

A BIT OF HISTORY
As always, let's start with one of the greatest teachers: history. We've had centralized architectures
for decades, from SNA to various WAN technologies (SDH/SONET, Frame Relay and ATM). They all
share a common problem: when the network partitions, the nodes cut off from the central
intelligence stop functioning (in the SNA case) or remain in a frozen state (WAN technologies).
One might be tempted to conclude that the ONF version of SDN won't fare any better than the
switched WAN technologies. The reality is far worse:


WAN technologies had little control-plane interaction with the outside world (example: Frame
Relay LMI), and those interactions were run by the local devices, not from the centralized control
plane;

WAN devices (SONET/SDH multiplexers, or ATM and Frame Relay switches) had local OAM
functionality that allowed them to detect link or node failures and reroute around them using
preconfigured backup paths. One could argue that those devices had a local control plane,
although it was never as independent as the control planes used in today's routers.
Interestingly, MPLS-TP wants to reinvent the glorious past and re-introduce centralized path
management, yet again proving RFC 1925 section 2.11.

The last architecture (that I remember) that used a truly centralized control plane was SNA, and if
you're old enough you know how well that ended.

WOULD A CENTRAL CONTROL PLANE MAKE SENSE IN LIMITED DEPLOYMENTS?
A central control plane is obviously a single point of failure, and network partitioning is a nightmare if
you have a central point of control. Large-scale deployments of the ONF variant of SDN are thus out of
the question. But does it make sense to deploy a centralized control plane in smaller independent islands
(campus networks, data center availability zones)?
Interestingly, numerous data center architectures already use a centralized control plane, so we can
analyze how well they perform:


Juniper XRE can control up to four EX8200 switches, or a total of 512 10GE ports;

Nexus 7700 can control 64 fabric extenders with 3072 ports, plus a few hundred directly
attached 10GE ports;

HP IRF can bind together two 12916 switches for a total of 1536 10GE ports;

QFabric Network Node Group could control eight nodes, for a total of 384 10GE ports.

NEC ProgrammableFlow seems to be an outlier: they can control up to 200 switches, for a total of
over 9000 GE (not 10GE) ports, but they don't run any control-plane protocol (apart from ARP and
dynamic MAC learning) with the outside world. No STP, LACP, LLDP, BFD or routing protocols.
One could argue that we could get an order of magnitude beyond those numbers if only we were
using proper control-plane hardware (Xeon CPUs, for example). I don't buy that argument until I
actually see a production deployment, and do keep in mind that the NEC ProgrammableFlow Controller
already uses decent Intel-based hardware. Real-time distributed systems with fast feedback loops are way
more complex than most people looking in from the outside realize (see also RFC 1925, section 2.4).

DOES A CENTRAL CONTROL PLANE MAKE SENSE?


It does in certain smaller-scale environments (see above), as long as you can guarantee redundant
connectivity between the controller and the controlled devices, or don't care what happens after a link
loss (see also: wireless access points). Does it make sense to generate a huge hoopla while
reinventing this particular wheel? I would spend my energy doing something else.


I absolutely understand why NEC went down this path: they did something extraordinary
to differentiate themselves in a very crowded market. I also understand why Google decided
to use this approach, and why they evangelize it as much as they do. I'm just saying that it
doesn't make that much sense for the rest of us.
Finally, do keep in mind that the whole world of IT is moving toward scale-out architectures. Netflix
& Co. are already there, and the enterprise world is grudgingly taking its first steps. In the
meantime, OpenFlow evangelists talk about the immeasurable revolutionary merits of a centralized
scale-up architecture. They must be living on a different planet.


Just in case you're wondering how the OpenFlow/SDN movement started, here's a bit of pre-2011
history.

HOW DID SOFTWARE DEFINED NETWORKING START?


Software-Defined Networking is clearly a tautological term; after all, software has defined networking
device behavior ever since we stopped using Token Ring MAUs and unmanaged hubs. The Open
Networking Foundation claims it owns the definition of the term (which makes approximately as
much sense as someone claiming they own the definition of red-colored clouds), but I was always
wondering who coined the term in the first place.
I finally found the answer in a fantastic overview of the technologies and ideas that led to OpenFlow and
SDN, published in the December 2013 issue of acmqueue. According to that article, SDN first appeared in
an article published by MIT Technology Review that explains how Nick McKeown and his team at
Stanford use OpenFlow:
Frustrated by this inability to fiddle with Internet routing in the real world, Stanford
computer scientist Nick McKeown and colleagues developed a standard called OpenFlow
that essentially opens up the Internet to researchers, allowing them to define data flows
using software--a sort of "software-defined networking."
You did notice the "a sort of" qualifier and the quotes around SDN, didn't you? It's pretty obvious
how the article uses "software-defined networking" to illustrate the point, but once marketing took
over, all hope for reasonable discussion was lost, and SDN became even more meaningless than cloud.


Assuming we forget the ONF-promoted definition of SDN and define SDN as a network programmed
from a central controller, it's obvious we've had SDN for at least 20 years.

WE HAD SDN IN 1993 AND DIDN'T KNOW IT


I gave three SDN 101 presentations during my 2013 visit to South Africa and tried really hard to
overcome my grumpy skeptic self and find the essence of SDN while preparing for them. As I've
been thinking about controllers, central visibility and network device programmability, it struck me:
we already had SDN in 1993.
In 1993 we were (among other things) an Internet Service Provider offering dial-up and leased-line
Internet access. Being somewhat lazy, we hated typing in the same commands every time we had
to provision a new user (in pre-TACACS+ days we had to use local authentication to get
autocommand capability for dial-up users), so we developed a solution that automatically changed
the router configurations after we added a new user. Here's a high-level diagram of what we did:


Figure 2-2: Simple router provisioning system built in 1993

An HTML user interface (written in Perl) gave the operators easy access to the user database (probably
implemented as a text file; we were true believers in the NoSQL movement in those days), and a
back-end Perl script generated router configuration commands from the user definitions and downloaded
them (probably through rcp; the details are a bit sketchy) to the dial-up access servers.
The next revision of the software included support for leased-line users: the script generated interface
configurations and static routes for our core router (it was actually an MGS, but I found no good
MGS images on the Internet) or one of the access servers (for users with asynchronous modems).
How is that different from all the shiny new stuff vendors are excitedly talking about? Beats me, I
can't figure it out ;) and as I said before, you don't always need new protocols to solve old
problems.
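For the curious, the spirit of that 1993 workflow is easy to reconstruct. This sketch uses Python instead of the original Perl, and the record format and IOS-style commands are invented from the description above, not taken from the actual scripts:

```python
# A reconstruction in spirit (Python instead of the original Perl) of the 1993
# provisioning flow: parse flat-file user records, emit router config commands.
# The record format and the IOS-style commands below are hypothetical.

USER_DB = """\
alice dialup 192.0.2.10
bob leased 192.0.2.20
"""

def generate_config(user_db):
    """Turn one-line user records (name, access type, IP) into config commands."""
    commands = []
    for record in user_db.splitlines():
        username, access, ip = record.split()
        if access == "dialup":
            # dial-up user: local username with autocommand (pre-TACACS+ style)
            commands.append(f"username {username} autocommand ppp {ip}")
        else:
            # leased-line user: static route toward the access interface
            commands.append(f"ip route {ip} 255.255.255.255 Serial0")
    return "\n".join(commands)

print(generate_config(USER_DB))
```

The real system then pushed the generated commands to the devices (probably through rcp); the generation step is the interesting part.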


While we're happily arguing the merits of reinvented architectures, we keep forgetting that the
basics of sound network architecture have been known for over a decade, and we still haven't made any
progress getting closer to them.

STILL WAITING FOR THE STUPID NETWORK


More than 15 years ago the cover story of ACM's netWorker magazine discussed the dawn of the
stupid network: an architecture with smart edge nodes and simple packet forwarding code.
Obviously we learned nothing in all those years; we're still having the same discussions.
Here are a few juicy quotes from that article (taken completely out of context solely for your
enjoyment).
The telcos seemed to "fall asleep at the switch" at the core of their network.
"Keep it simple, stupid," or KISS, is an engineering virtue. The Intelligent Network,
however, is anything but simple; it is a marketing concept for scarce, complicated, highpriced services.
The Intelligent Network impedes innovation. Existing features are integrally spaghetticoded into the guts of the network, and new features must intertwine with the old.
Infrastructure improvements are rapidly making the telcos' Intelligent Network a
distinctly second-rate choice. The bottom line, though, is not the infrastructure; it is the
innovation that the Stupid Network unleashes.
The whole article is well worth reading, even more so considering it's over 15 years old and still spot-on.


Some SDN proponents claim that the way we configure networking devices (using the CLI) is the biggest
networking problem we're facing today. They also conveniently forget that every scalable IT solution
uses automation, text files and the CLI, because they work and allow experienced operators to work
faster.

IS CLI IN MY WAY OR IS IT JUST A SYMPTOM OF A BIGGER PROBLEM?
My good friend Ethan published a blog post in February 2014 rightfully complaining how various
vendor CLIs hamper our productivity. He's absolutely correct from the productivity standpoint, and I
agree with his conclusions (we need a layer of abstraction), but there's more behind the scenes.
We're all sick of CLI. I don't think anyone would disagree. However, the CLI is not our biggest
problem. We happen to be exposed to the CLI on a daily basis due to a lack of automation tools and
a missing abstraction layer; occasional fights with the usual brown substance flowing down the
application stack don't help either.
The CLI problem is mostly hype. The "we need to replace CLI with (insert-your-favorite-gizmo)"
hype was generated by SDN startups (one in particular) that want to sell their disruptive way of
doing things to venture capitalists. BTW, the best way to configure their tools is through a CLI.
CLI is still the most effective way of doing things ask any really proficient sysadmin, web
server admin or database admin how they manage their environment. Its not through point-andclick GUI, its through automation tools coupled with simple CLI commands (because automation
tools dont work that well when they have to simulate mouse clicks).

CLI generates vendor lock-in. Another pile of startup hype, in this case coming from startups that want to replace the network device lock-in with controller lock-in (here's a similar story).

WE'RE NOT UNIQUE


Startups and pundits would like to persuade you how broken traditional networking is, but every other field in IT has to deal with the same problems: just try to manage a Windows server with Linux commands, or create tables on Microsoft SQL Server with MySQL or Oracle syntax; even Linux distributions don't have the same command set.

The true difference between other IT fields and networking is that the other people did something to solve their problems while we keep complaining. Networking is no worse than any other IT discipline; we just have to start moving forward, create community tools, and vote with our wallets.

Whenever you have a choice between two comparable products from different vendors, buy the one that offers greater flexibility and programmability. Don't know what to look for? Talk with your server- and virtualization buddies (I hope you're on speaking terms with them; if not, it's high time you buy them a beer or two). If they happen to use Puppet or Chef to manage servers, you might try to use the same tools to manage your routers and switches. Your favorite boxes don't support the tools used by the rest of your IT? Maybe it's time to change the vendor.

It's reasonably easy to add automation and orchestration on top of an existing network implementation. Throwing away decades of field experience and replacing existing solutions with an OpenFlow-based controller is a totally different story, as I explained in May 2013:

OPENFLOW AND SDN: DO YOU WANT TO BUILD YOUR OWN RACING CAR?
The OpenFlow zealots are quick to point out the beauties of the centralized control plane, and the
huge savings you can expect from using commodity hardware and open-source software. What they
usually forget to tell you is that you also have to reinvent all the wheels the networking industry has
invented in the last 30 years.
Imagine you want to build your own F1 racing car... but the only component you got is a super-duper racing engine from Mercedes Benz. You're left with the "easy" task of designing the car body, suspension, gears, wheels, brakes and a few other choice bits and pieces. You can definitely do all that if you're Google or the McLaren team, but not if you're a Sunday hobbyist mechanic. No wonder some open-source OpenFlow controllers look like Red Bull Flugtag contestants.
Does that mean we should ignore OpenFlow? Absolutely not, but unless you want to become really
fluent in real-time event-driven programming (which might look great on your resume), you should
join me watching from the sidelines until there's a solid controller (maybe we'll get it with Daylight,
Floodlight definitely doesn't fit the bill) and some application architecture blueprints.

Till then, it might make sense to focus on more down-to-earth technologies; after all, you don't
exactly need OpenFlow and a central controller to solve real-life problems, like Tail-f clearly
demonstrated with their NCS software.

Openness (for whatever value of Open) is another perceived benefit of SDN. In reality, you're trading hardware vendor lock-in for controller vendor lock-in.

SDN, WINDOWS AND FRUITY ALTERNATIVES


Brad Hedlund made a pretty valid comment to my NEC Launched a Virtual OpenFlow Switch blog post: "On the other hand, it's NEC end-to-end or no dice," implicating the ultimate vendor lock-in.

Of course he's right, and while, as Bob Plankers explains, you can never escape some lock-in (part 1, response from Greg Ferro, part 2; all definitely worth reading), you do have to ask yourself: am I looking for Windows or a Mac?

There are all sorts of arguments one hears from Mac fanboys (here's a networking-related one), but regardless of what you think of Macs and OSX, there's one indisputable truth: compared to the reloadful experience we get on most Windows-based boxes, Macs and OSX are rock solid; I have to reboot my Macbook every other blue moon. Even Windows is stable when running on a Macbook (apart from upgrade-induced reboots).

Before you start praising Steve Jobs and blaming Bill Gates and Microsoft at large, consider a simple fact: OSX runs on a tightly controlled hardware platform built with stability and reliability in mind. Windows has to run on every possible underperforming concoction a hardware vendor throws at you (example: my high-end laptop cannot record system audio because the 6-letter hardware vendor wanted to save $0.02 on the sound chipset and chose the cheapest possible one), and has to deal with all sorts of crap third-party device drivers loaded straight into the operating system kernel.

Now, what do you want to have in your mission-critical SDN/OpenFlow data center networking infrastructure: a Mac-like tightly controlled and vendor-tested mix of equipment and associated controller, or a Windows-like hodgepodge of boxes from numerous vendors, controlled by third-party software that might have never encountered the exact mix of equipment you have?

If you're young and brazen (like I was two decades ago), go ahead and be your own system integrator. If you're too old and covered with vendor-inflicted scars, you might prefer a tested end-to-end solution regardless of what Gartner says in vendor-sponsored reports (and even solutions that vendor X claims were tested don't always work). Just don't forget to consider the cost of downtime in your total-cost-of-ownership calculations.

SDN controllers will replace networking engineers, at least if you believe what the SDN and virtualization vendors are telling you. I don't think we have to worry about that happening in the foreseeable future (and nothing has changed since I wrote the following blog post in late 2012).

SDN, CAREER CHOICES AND MAGIC GRAPHS


The current explosion of SDN hype (further fueled by the recent VMworld announcement of Software-Defined Data Centers) made some networking engineers understandably nervous. This is the question I got from one of them:

I have 8 plus years in Cisco, have recently passed my CCIE RS theory, and was looking forward to complete the lab test when this SDN thing hit me hard. Do you suggest completing the CCIE lab looking at this new future of Networking?

Short answer: the sky is not falling, CCIE still makes sense, and IT will still need networking people.

However, as I recently collected a few magic graphs for a short keynote speech, let me reuse them to illustrate this particular challenge we're all facing. Starting with the obvious, here's the legendary Diffusion of Innovations: every idea is first adopted by a few early adopters, followed by the early and late majority.

Figure 2-3: Diffusion of ideas (source: Wikipedia)

Networking in general is clearly in the late majority/laggards phase. What's important for our discussion is the destruction of value-add through the diffusion process. Oh my, I sound like a freshly-baked MBA whiz-kid; let's reword it: as a technology gets adopted, more people understand it, the job market competition increases, and thus it's harder to get a well-paying job in that particular technology area. Supporting Windows desktops might be a good example.

As a successful technology matures, it moves through the four parts of another magic matrix (this
one from Boston Consulting Group).

Figure 2-4: Boston Consulting Group matrix

Initially every new idea is a great unknown, with only a few people brave enough to invest time in it (CCIE R&S before Cisco made it mandatory for Silver/Gold partner status). After a while, the successful ideas explode into stars with huge opportunities and fat margins (example: CCIE R&S a decade ago, or Nicira-style SDN today, at least for Nicira's founders), degenerate into a cash cow as
the market slowly gets saturated (CCIE R&S is probably at this stage by now), and finally (when everyone starts doing it) becomes an old dog not worth bothering with.

Does it make sense to invest into something that's probably in the cash cow stage? The theory says "as much as needed to keep it alive," but don't forget that CCIE R&S will likely remain very relevant for a long time:

- The protocol stacks we're using haven't changed in the last three decades (apart from extending the address field from 32 to 128 bits), and although people are working on proposals like MPTCP, those proposals are still in the experimental stage;

- Regardless of all the SDN hoopla, neither OpenFlow nor other SDN technologies address the real problems we're facing today: the lack of a session layer in TCP and the use of IP addresses in the application layer. They just give you different tools to implement today's kludges.

- Cisco is doing constant refreshes of its CCIE programs to keep them in the early adopters or early majority technology space, so the CCIE certification is not getting commoditized.

- If you approach the networking certifications the right way, you'll learn a lot about the principles and fundamentals, and you'll need that knowledge regardless of the daily hype.

Now that I've mentioned experimental technologies: don't forget that not all of them get adopted (even by early adopters). Geoffrey Moore made millions writing a book that pointed out that obvious fact. Of course he was smart enough to invent a great-looking wrapper; he called it Crossing the Chasm.

Figure 2-5: The chasm before the mainstream market adoption (source: Crossing the Chasm & Inside the
Tornado)

The crossing-the-chasm dilemma is best illustrated with Gartner Hype Cycles. After all the initial hype (that we've seen with OpenFlow and SDN) resulting in the peak of inflated expectations, there's the ubiquitous trough of disillusionment. Some technologies die in that quagmire; in other, more successful cases we eventually figure out how to use them (slope of enlightenment).

Figure 2-6: Gartner hype cycle (source: Wikipedia)

We still don't know how well SDN will do crossing the chasm (according to the latest Gartner charts, OpenFlow still hasn't reached the hype peak; I dread what's still lying ahead of us); we've seen only a few commercial products, and none of them has anything close to widespread adoption (not to mention the reality of three IT geographies).

Anyhow, since you've decided you want to work in networking, one thing is certain: technology will change (whatever the change will be), and it will happen with or without you. At every point in your career you have to invest some of your time into learning something new. Some of those new things will be duds; others might turn into stars. See also Private Clouds Will Change IT Jobs, Not Eliminate Them by Mike Fratto.

Finally, don't ask me for "what will the next big thing be" advice. Browse through the six years of my blog posts. You might notice a clear shift in focus; it's there for a reason.

Finally, here's a response to an industry press gem that I wrote in 2013:

RESPONSE: SDN'S CASUALTIES


An individual focused more on sensationalism than content deemed it appropriate to publish an article declaring networking engineers an endangered species on an industry press web site that I considered somewhat reliable in the past.

The resulting flurry of expected blog posts included an interesting one from Steven Iveson in which he made a good point: it's easy for the cream-of-the-crop not to be concerned, but what about the others lower down the pile? As always, it makes sense to do a bit of a reality check.

- While everyone talks about SDN, the products are scarce, and it will take years before they'll appear in a typical enterprise network. Apart from NEC's ProgrammableFlow and overlay networks, most other SDN-washed things I've seen are still point products.

- Overlay virtual networks seem to be the killer app of the moment. They are extremely useful and versatile ... if you're not bound to VLANs by physical appliances. We'll have to wait for at least another refresh cycle before we get rid of them.

- Data center networking is hot and sexy, but it's only a part of what networking is. I haven't seen a commercial SDN app for enterprise WAN, campus or wireless (I'm positive I'm wrong; write a comment to correct me), because that's not where the VCs are looking at the moment.

Also, consider that the "my job will be lost to technology" sentiments started approximately 200 years ago, and yet the population has increased by almost an order of magnitude in the meantime, there
are obviously way more jobs now (in absolute terms) than there were in those days, and nobody in his right mind wants to do the menial chores that the technology took over.

Obviously you should be worried if you're a VLAN provisioning technician. However, with everyone writing about SDN you know what's coming down the pipe, and you have a few years to adapt, expand the scope of your knowledge, and figure out where it makes sense to move (and don't forget to focus on where you can add value, not what job openings you see today). If you don't do any of the above, don't blame SDN when the VLANs (finally) join the dinosaurs and you have nothing left to configure.

Finally, I'm positive there will be places using VLANs 20 years from now. After all, AS/400s and APPN are still kicking, and people are still fixing COBOL apps (which IBM just made sexier with XML and Java support).

OPENFLOW BASICS

Based on the exorbitant claims made by the industry press you might have concluded there must be some revolutionary concepts in the OpenFlow technology. Nothing could be further from the truth: OpenFlow is a very simple technology that allows a controller to program forwarding entries in a networking device.

Did you ever encounter a Catalyst 5000 with a Route Switch Module (RSM), or a combination of a Catalyst 5000 and an external router using Multilayer Switching (MLS)? Those products used an architecture identical to OpenFlow almost 20 years ago, the only difference being the relative openness of the OpenFlow protocol.
This chapter will answer a number of basic OpenFlow questions, including:

- What is OpenFlow?

- What can different versions of OpenFlow do?

- How can a controller implement control-plane protocols (like LACP, STP or routing protocols), and does it have to?

- Can we deploy OpenFlow in combination with traditional forwarding mechanisms?

MORE INFORMATION

You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

- Start with the SDN, OpenFlow and NFV Resources page;

- Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;

- Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);

- The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;

- Finally, I'm always available for short online or on-site consulting engagements.

IN THIS CHAPTER:
MANAGEMENT, CONTROL AND DATA PLANES IN NETWORK DEVICES AND SYSTEMS
WHAT EXACTLY IS THE CONTROL PLANE?
WHAT IS OPENFLOW?
WHAT IS OPENFLOW (PART 2)?
OPENFLOW PACKET MATCHING CAPABILITIES
OPENFLOW ACTIONS
OPENFLOW DEPLOYMENT MODELS
FORWARDING MODELS IN OPENFLOW NETWORKS
YOU DON'T NEED OPENFLOW TO SOLVE EVERY AGE-OLD PROBLEM
OPENFLOW AND IPSILON: NOTHING NEW UNDER THE SUN

OPENFLOW: BIOS DOES NOT A SERVER MAKE


SDN CONTROLLER NORTHBOUND API IS THE CRUCIAL MISSING PIECE
IS OPENFLOW THE BEST TOOL FOR OVERLAY VIRTUAL NETWORKS?
IS OPENFLOW USEFUL?

The fundamental principle underlying OpenFlow and Software Defined Networking (as defined by the Open Networking Foundation) is the decoupling of the control and data planes, with the data (forwarding) plane running in a networking device (switch or router) and the control plane being implemented in a central controller that controls numerous dumb devices. Let's start with the basics: what are the data, control and management planes?

MANAGEMENT, CONTROL AND DATA PLANES IN NETWORK DEVICES AND SYSTEMS
Every single network device (or a distributed system like QFabric) has to perform at least three distinct activities:

- Process the transit traffic (that's why we buy them) in the data plane;

- Figure out what's going on around it with the control plane protocols;

- Interact with its owner (or a Network Management System, NMS) through the management plane.

Routers are used as a typical example in every text describing the three planes of operation, so let's stick to this time-honored tradition:

- Interfaces, IP subnets and routing protocols are configured through management plane protocols, ranging from CLI to NETCONF and the latest buzzword: the northbound RESTful API;

- The router runs control plane routing protocols (OSPF, EIGRP, BGP ...) to discover adjacent devices and the overall network topology (or reachability information in the case of distance/path vector protocols);

- The router inserts the results of the control-plane protocols into the Routing Information Base (RIB) and the Forwarding Information Base (FIB). Data plane software or ASICs use the FIB structures to forward the transit traffic;

- Management plane protocols like SNMP can be used to monitor the device operation, its performance, interface counters ...
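The RIB-to-FIB division of labor can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation, and the interface names (ge-0/0/1 and friends) are invented: the control plane picks the best route per prefix based on administrative distance, downloads the winners into the FIB, and the data plane does nothing but longest-prefix lookups against the FIB.

```python
# Toy model of the control/data plane split (not vendor code; interface
# names are invented). The control plane fills the RIB, downloads the
# best routes into the FIB, and the data plane does FIB lookups only.
import ipaddress

class Router:
    def __init__(self):
        self.rib = {}   # prefix -> (next-hop, administrative distance)
        self.fib = []   # (network, next-hop), most specific prefix first

    def learn_route(self, prefix, next_hop, distance):
        """Control plane: keep the lowest-distance route for each prefix."""
        current = self.rib.get(prefix)
        if current is None or distance < current[1]:
            self.rib[prefix] = (next_hop, distance)
            self._download_fib()

    def _download_fib(self):
        """Download RIB winners into the FIB (longest prefixes first)."""
        self.fib = sorted(
            ((ipaddress.ip_network(pfx), nh) for pfx, (nh, _) in self.rib.items()),
            key=lambda entry: entry[0].prefixlen, reverse=True)

    def forward(self, destination):
        """Data plane: longest-prefix match; a miss means punt or drop."""
        address = ipaddress.ip_address(destination)
        for network, next_hop in self.fib:
            if address in network:
                return next_hop
        return None

r = Router()
r.learn_route("10.0.0.0/8", "ge-0/0/1", 110)    # OSPF route
r.learn_route("10.0.0.0/8", "ge-0/0/9", 200)    # BGP route loses to OSPF
r.learn_route("10.1.0.0/16", "ge-0/0/2", 110)   # more specific prefix
```

A lookup for 10.1.2.3 returns ge-0/0/2 (the longest prefix wins regardless of administrative distance), while 10.9.9.9 falls back to the /8 route.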

Figure 3-1: Management, control and data planes

The management plane is pretty straightforward, so let's focus on a few intricacies of the control and data planes.

We usually have routing protocols in mind when talking about control plane protocols, but in reality the control plane protocols perform numerous other functions, including:

- Interface state management (PPP, LACP);

- Connectivity management (BFD, CFM);

- Adjacent device discovery (hello mechanisms present in most routing protocols, ES-IS, ARP, IPv6 ND, UPnP SSDP);

- Topology or reachability information exchange (IP/IPv6 routing protocols, IS-IS in TRILL/SPB, STP);

- Service provisioning (RSVP for IntServ or MPLS/TE, UPnP SOAP calls).

The data plane should be focused on forwarding packets, but is commonly burdened by other activities:

- NAT session creation and NAT table maintenance;

- Neighbor address gleaning (example: dynamic MAC address learning in bridging, IPv6 SAVI);

- NetFlow accounting (sFlow is cheap compared to NetFlow);

- ACL logging;

- Error signaling (ICMP).

Data plane forwarding is hopefully performed in dedicated hardware or in high-speed code (within the interrupt handler on low-end Cisco IOS routers), while the overhead activities usually happen on the device CPU (sometimes even in userspace processes); the switch from high-speed forwarding to user-mode processing is commonly called punting.

In reactive OpenFlow architectures a punting decision sends a packet all the way to the OpenFlow controller.
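The punting path can be illustrated with a tiny Python sketch (purely illustrative; the packet format and port names are made up): the fast path forwards packets that hit its flow cache, and punts everything else to a slow-path handler, which in the reactive OpenFlow world is the controller receiving a packet-in and responding with a flow-mod.

```python
# Illustrative sketch of punting (made-up packet format and port names):
# flow-cache hits stay in the fast path; misses are punted to the slow
# path, which installs a flow entry so later packets are not punted.
class FastPath:
    def __init__(self, punt_handler):
        self.flows = {}            # exact-match flow cache: dst -> port
        self.punts = 0             # how often we hit the slow path
        self.punt_handler = punt_handler

    def rx(self, packet):
        dst = packet["dst"]
        if dst in self.flows:
            return self.flows[dst]              # hardware-speed forwarding
        self.punts += 1
        return self.punt_handler(self, packet)  # table miss: punt (packet-in)

def reactive_controller(switch, packet):
    """Slow path: decide on a flow entry, program it, forward the packet."""
    port = "port-1" if packet["dst"].startswith("10.") else "port-2"
    switch.flows[packet["dst"]] = port          # flow-mod: program fast path
    return port

switch = FastPath(reactive_controller)
switch.rx({"dst": "10.0.0.5"})   # first packet of a flow: punted
switch.rx({"dst": "10.0.0.5"})   # second packet: fast-path hit
```

Only the first packet of each flow reaches the slow path, which is exactly why a reactive design turns the controller (and the punt channel) into the performance bottleneck described above.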

Regardless of the implementation details, it's obvious the device CPU represents a significant bottleneck (in some cases the switch to CPU-based forwarding causes several orders of magnitude lower performance); that's the main reason one has to rate-limit ACL logging and protect the device CPU with Control Plane Protection features.

It seems it's easy to define what a network device control plane is (and how it's different from the data plane) until someone starts unearthing the interesting corner cases.

WHAT EXACTLY IS THE CONTROL PLANE?


Tassos opened an interesting can of worms in a comment to my Management, Control and Data Planes post: is an ICMP response to a forwarded packet (TTL exceeded, fragmentation needed or destination unreachable) a control- or data-plane activity?

Other control plane protocols (BGP, OSPF, LDP, LACP, BFD ...) are more clear-cut: they run between individual network devices (usually adjacent, but there's also targeted LDP and multihop BGP) and could be (at least in theory) made to run across a separate control plane network (or VRF).

Control plane protocols usually run over data plane interfaces to ensure shared fate: if the packet forwarding fails, the control plane protocol fails as well. There are, however, scenarios (example: optical gear) where the data plane interfaces cannot process packets, forcing you to run control plane protocols across a separate set of interfaces.

Typical control plane protocols aren't data-driven: a BGP, LACP or BFD packet is never sent as a direct response to a data plane packet.

ICMP is different: some ICMP packets are sent as replies to other ICMP packets, others are triggered by data plane packets (ICMP unreachables and ICMPv6 neighbor discovery).

Trying to classify protocols based on where they're run is also misleading. It's true that the networking device CPU almost always generates ICMP requests and responses (it doesn't make sense to spend silicon real estate to generate ICMP responses). In some cases, ICMP packets might be generated in the slow path, but that's just how a particular network operating system works. Let's ignore those dirty details for the moment; just because a device's CPU touches a packet doesn't make that packet a control plane packet.

Vendor terminology doesn't help us either: most vendors talk about Control Plane Policing or Protection. These mechanisms usually apply to control plane protocols as well as data plane packets punted from ASICs to the device CPU.

Even IETF terminology isn't exactly helpful: while the C in ICMP does stand for Control, it doesn't necessarily imply control plane involvement. ICMP is simply a protocol that passes control messages (as opposed to user data) between IP devices.

Honestly, I'm stuck. Is ICMP a control plane protocol that's triggered by data plane activity, or is it a data plane protocol? Can you point me to an authoritative source explaining what ICMP is? Share your thoughts in the comments!

Now that we know what the data, control and management planes are, let's see how OpenFlow fits into the picture.

WHAT IS OPENFLOW?
A typical networking device (bridge, router, switch, LSR ...) runs all the control protocols (including port aggregation, STP, TRILL, MAC address learning and routing protocols) in the control plane (usually implemented in a central CPU or supervisor module), and downloads the forwarding instructions into the data plane structures, which can be simple lookup tables or specialized hardware (hash tables or TCAMs).

In architectures with distributed forwarding hardware the control plane has to use a communications protocol to download the forwarding information into the data plane instances. Every vendor uses its own proprietary protocol (Cisco uses IPC, InterProcess Communication, to implement distributed CEF); OpenFlow tries to define a standard protocol between the control plane and the associated data plane elements.

The OpenFlow zealots would like you to believe that we're just one small step away from implementing Skynet; the reality is a bit more sobering. You need a protocol between control and data plane elements in all distributed architectures, starting with modular high-end routers and switches. Almost every modular high-end switch that you can buy today has one or more supervisor modules and numerous linecards performing distributed switching (preferably over a crossbar matrix, not over a shared bus). In such a switch, an OpenFlow-like protocol runs between the supervisor module(s) and the linecards.

Moving into a more distributed space, the fabric architectures with a central control plane (HP's IRF, Cisco's VSS) use an OpenFlow-like protocol between the central control plane and the forwarding instances.

You might have noticed that all vendors support only a limited number of high-end switches in a central control plane architecture (Cisco's VSS cluster has two nodes, and HP's IRF cluster can have up to four high-end switches). This decision has nothing to do with vendor lock-in and lack of open protocols; it rather reflects the practical challenges of implementing a high-speed distributed architecture (alternatively, you might decide to believe the whole networking industry is a confusopoly of morons who are unable to implement what every post-graduate student can simulate with open-source tools).
Moving deeper into the technical details, the OpenFlow Specs page on the OpenFlow web site contains a link to the OpenFlow Switch Specification v1.1.0, which defines:

- OpenFlow tables (the TCAM structure used by OpenFlow);

- OpenFlow channel (the session between an OpenFlow switch and an OpenFlow controller);

- OpenFlow protocol (the actual protocol messages and data structures).

The designers of OpenFlow had to make the TCAM structure very generic if they wanted to offer an alternative to the numerous forwarding mechanisms implemented today. Each entry in the flow tables contains the following fields: ingress port, source and destination MAC address, ethertype, VLAN tag & priority bits, MPLS label & traffic class (starting with OpenFlow 1.1), IP source and destination address (and masks), layer-4 IP protocol, IP ToS bits and TCP/UDP port numbers.
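A rough Python model of such a flow table may help (a sketch of the matching concept only, not the wire format defined in the specification; the field names and actions are invented): every entry matches an arbitrary subset of the header fields listed above, unspecified fields act as wildcards, and when several entries overlap, the highest-priority one wins, mimicking a TCAM lookup.

```python
# Conceptual model of OpenFlow flow matching (not the wire format; field
# names and actions are invented). Unlisted fields are wildcarded; when
# several entries match, the highest-priority entry wins.
flow_table = [
    # (priority, match fields, action)
    (200, {"ip_dst": "10.0.0.5", "ip_proto": 6, "tcp_dst": 80}, "output:web"),
    (100, {"ip_dst": "10.0.0.5"},                               "output:host"),
    (10,  {},                                                   "flood"),
]

def lookup(packet):
    """Return the action of the best-matching flow entry (or None)."""
    for priority, match, action in sorted(flow_table, key=lambda e: e[0],
                                          reverse=True):
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return None

web = lookup({"ip_dst": "10.0.0.5", "ip_proto": 6, "tcp_dst": 80})
ssh = lookup({"ip_dst": "10.0.0.5", "ip_proto": 6, "tcp_dst": 22})
```

Web traffic to 10.0.0.5 hits the most specific entry, SSH traffic to the same host falls through to the priority-100 entry, and everything else matches the wildcard-everything table-miss entry.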

To make the data plane structures scalable, OpenFlow 1.1 introduces the concept of multiple flow tables linked into a tree (and group tables to support multicasts and broadcasts). This concept allows you to implement multi-step forwarding, for example:

- Check the inbound ACL (table #1);

- Check the QoS bits (table #2);

- Match local MAC addresses and move into the L3/MPLS table; perform L2 forwarding otherwise (table #3);

- Perform L3 or MPLS forwarding (tables #4 and #5).

You can pass metadata between tables to make the architecture even more versatile.
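The multi-table pipeline above can be sketched like this (an illustration of the concept, with invented table names and packet fields, not OpenFlow 1.1 wire semantics): each table returns either a final action or the next table to visit, and metadata written by one table is visible to the tables after it.

```python
# Conceptual sketch of an OpenFlow 1.1 multi-table pipeline (invented
# table names and packet fields). A table yields (action, next_table);
# metadata computed in one table is carried to the next.
def acl_table(pkt, meta):
    if pkt.get("tcp_dst") == 23:
        return "drop", None                   # inbound ACL: block telnet
    return None, "qos"

def qos_table(pkt, meta):
    meta["class"] = "gold" if pkt.get("dscp", 0) >= 40 else "best-effort"
    return None, "l2"

def l2_table(pkt, meta):
    if pkt["mac_dst"] == "router-mac":
        return None, "l3"                     # local MAC: continue to L3
    return ("l2-forward", pkt["mac_dst"]), None

def l3_table(pkt, meta):
    return ("l3-forward", pkt["ip_dst"], meta["class"]), None

TABLES = {"acl": acl_table, "qos": qos_table, "l2": l2_table, "l3": l3_table}

def pipeline(pkt):
    """Walk a packet through the linked flow tables."""
    table, meta = "acl", {}
    while table is not None:
        action, table = TABLES[table](pkt, meta)
        if action is not None:
            return action

telnet = pipeline({"tcp_dst": 23, "mac_dst": "x", "ip_dst": "10.0.0.1"})
routed = pipeline({"mac_dst": "router-mac", "ip_dst": "10.0.0.1", "dscp": 46})
```

The telnet packet is dropped in the first table; the packet addressed to the router's own MAC address traverses all four tables and is forwarded at layer 3, carrying the QoS class computed two tables earlier.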
The proposed flow table architecture is extremely versatile (and I'm positive there's a PhD thesis being written proving that it is a superset of every known and imaginable forwarding paradigm), but it will have to meet the harsh reality before we'll see full-blown OpenFlow switch products. You can implement the flow tables in software (in which case the versatility never hurts, but you'll have to wait a few years before the Moore's Law curve catches up with terabit speeds) or in hardware, where the large TCAM entries will drive the price up.


I started getting more detailed OpenFlow questions after the initial "What is OpenFlow" post, and
tried to answer the most common ones in a follow-up post.

WHAT IS OPENFLOW (PART 2)?


Here's a typical list of questions I'm getting from my readers:
I don't think OpenFlow is clearly defined yet. Is it a protocol? A model for control-plane to
forwarding-plane (FP) interaction? An abstraction of the forwarding plane? An automation
technology? Is it a virtualization technology? I don't think there is consensus on these things
yet.
OpenFlow is very well defined. It's a control-plane (controller) to data-plane (switch) protocol that
allows the control plane to:

Modify forwarding entries in the data plane;

Send control protocol (or data) packets through any port of any controlled data-plane devices;

Receive (and process) packets that cannot be handled by the data plane forwarding rules. These
packets could be control-plane protocol packets (for example, LLDP) or user data packets that
need special processing.

As part of the protocol, OpenFlow defines abstract data plane structures (forwarding table entries)
that have to be implemented by OpenFlow-compliant forwarding devices (switches).
Is it an abstraction of the forwarding plane? Yes, insofar as it defines data structures that can be used
in OpenFlow messages to update data-plane forwarding structures.
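The classic example combining all three functions is a learning-switch controller application: packet-in events teach it the MAC-to-port mapping, flow modifications program the data plane, and packet-out messages flood packets toward unknown destinations. Here is a Python sketch; the `switch` object and its two methods are hypothetical stand-ins for real controller APIs:

```python
# Toy controller logic: learn source MACs from packet-in events, install
# a flow entry once the destination is known, and fall back to
# packet-out flooding otherwise. The switch object is hypothetical.

class LearningSwitchApp:
    def __init__(self):
        self.mac_to_port = {}

    def on_packet_in(self, switch, in_port, eth_src, eth_dst):
        # packet-in: the data plane had no matching entry for this packet
        self.mac_to_port[eth_src] = in_port          # learn the source MAC
        out_port = self.mac_to_port.get(eth_dst)
        if out_port is None:
            # destination unknown: flood via packet-out, install nothing
            switch.packet_out(ports="flood")
        else:
            # modify the data plane, then release the packet
            switch.install_flow(match={"eth_dst": eth_dst},
                                actions=[("output", out_port)])
            switch.packet_out(ports=[out_port])
```

Real controllers (Ryu, OpenDaylight and friends) expose equivalent primitives under different names; the control flow is the same.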


Is it an automation technology? No, but it can be used to automate network deployments.
Imagine a cluster of OpenFlow controllers with shared configuration rules that use the packet-carrying
capabilities of the OpenFlow protocol to discover network topology (using LLDP or a similar protocol),
build a shared topology map of the network, and use it to download forwarding entries into the
controlled data planes (switches). Such a setup would definitely automate new device provisioning in
a large-scale network.
Alternatively, you could use OpenFlow to create additional forwarding (actually packet dropping)
entries in access switches or wireless access points deployed throughout your network, resulting in a
scalable multi-vendor ACL solution.
Is it a virtualization technology? Of course not. However, its data structures can be used to perform
MAC address, IP address or MPLS label lookup and push user packets into VLANs (or push additional
VLAN tags to implement Q-in-Q) or MPLS-labeled frames, so you can implement most commonly
used virtualization techniques (VLANs, Q-in-Q VLANs, L2 MPLS-based VPNs or L3 MPLS-based VPNs)
with it.
There's no reason you couldn't control a soft switch (embedded in the hypervisor) with OpenFlow. An
open-source hypervisor switch implementation (Open vSwitch) that has many extensions for
virtualization is already available and can be used with Xen/XenServer (it's the default networking
stack in XenServer 6.0) or KVM.
Open vSwitch became the de-facto OpenFlow switch reference implementation. It's used by
many hardware and software vendors, including VMware, which uses Open vSwitch in the
multi-hypervisor version of NSX.


I'm positive the list of Open vSwitch extensions is hidden somewhere in its somewhat cryptic
documentation (or you could try to find them in the source code), but the list of OpenFlow 1.2
proposals implemented by Open vSwitch or sponsored by Nicira should give you some clues:

IPv6 matching with IPv6 header rewrite;

Virtual Port Tunnel configuration protocol and GRE/L3 tunnel support.

Controller master/slave switch. A must for resilient large-scale solutions.

Summary: OpenFlow is like C++. You can use it to implement all sorts of interesting solutions, but
it's just a tool.


OpenFlow can match on almost any field in layer-2 (Ethernet, 802.1Q, PBB, MPLS), layer-3
(IPv4 and IPv6) and layer-4 (TCP and UDP) headers. Here's an overview covering OpenFlow versions
1.0 through 1.3.

OPENFLOW PACKET MATCHING CAPABILITIES


The original OpenFlow specification (version 1.0) allowed a controller to specify matches on MAC
and IPv4 addresses in forwarding entries downloaded to OpenFlow switches. Later versions of the
OpenFlow protocol added matching capabilities on almost all fields encountered in typical modern
networks, as shown in the following table (see the release notes of the latest OpenFlow specification
for more details).
Match condition                                                          Version
Input port                                                               1.0
Ethernet source and destination MAC addresses                            1.0
Ethernet frame type                                                      1.0
VLAN tag                                                                 1.0
802.1p value                                                             1.0
802.1ad (Q-in-Q) VLAN tags                                               1.1
Provider Backbone Bridging (PBB 802.1ah)                                 1.3
MPLS tags                                                                1.1
MPLS bottom-of-stack matching                                            1.3

NETWORK LAYER MATCHING
Source and destination IP addresses (with subnet masks)                  1.0
ToS/DSCP bits                                                            1.0
Layer-4 IP protocol                                                      1.0
IP addresses in ARP packets                                              1.0
IPv6 header fields (addresses, traffic class, higher-level protocols)    1.2
IPv6 extension headers                                                   1.3

TRANSPORT LAYER MATCHING
TCP and UDP port numbers                                                 1.0
SCTP port numbers                                                        1.1
ICMP type and code fields                                                1.0
ICMPv6 support                                                           1.2

OTHER OPTIONS
Extensible matching (matching on any bit pattern)                        1.2

OpenFlow switches might not support all match conditions specified in the OpenFlow version
they support. For example, most data center switches don't support MPLS or PBB matching.
Furthermore, some switches might implement certain match conditions in software. For
example, early OpenFlow code for HP ProCurve switches implemented layer-3 forwarding in
hardware and layer-2 forwarding in software, resulting in significantly reduced forwarding
performance.
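A controller has to cope with such partial implementations. Here is a minimal Python sketch of the sanity check (version numbers follow the table above; the per-switch supported-field set is an assumed input you would normally learn from the switch's features reply):

```python
# Sketch: reject a flow specification whose match fields the target
# switch cannot handle, either because the field is newer than the
# switch's OpenFlow version or because the implementation is partial.

FIELD_MIN_VERSION = {
    "in_port": 1.0, "eth_src": 1.0, "eth_dst": 1.0, "eth_type": 1.0,
    "vlan_id": 1.0, "vlan_pcp": 1.0, "qinq_vlan": 1.1, "pbb_isid": 1.3,
    "mpls_label": 1.1, "mpls_bos": 1.3, "ipv4_src": 1.0, "ipv4_dst": 1.0,
    "ip_dscp": 1.0, "ip_proto": 1.0, "ipv6_src": 1.2, "ipv6_dst": 1.2,
    "tcp_src": 1.0, "tcp_dst": 1.0, "sctp_src": 1.1, "icmpv6_type": 1.2,
}

def unsupported_fields(match, switch_version, switch_supported_fields):
    """Fields the switch cannot match on, sorted for stable output."""
    return sorted(
        f for f in match
        if FIELD_MIN_VERSION.get(f, 99) > switch_version
        or f not in switch_supported_fields
    )

# A 1.0-only data center switch cannot match on MPLS labels:
print(unsupported_fields({"eth_dst": "aa:bb:cc:dd:ee:ff", "mpls_label": 17},
                         1.0, {"eth_dst", "eth_src", "ipv4_dst"}))
# -> ['mpls_label']
```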


After matching a packet, an OpenFlow forwarding entry performs a list of actions on the matched
packet. This blog post lists actions supported in OpenFlow versions 1.0 through 1.3.

OPENFLOW ACTIONS
Every OpenFlow forwarding entry has two components:

Flow match specification, which can use any combination of fields listed in the previous table;

List of actions to be performed on the matched packets.

The initial OpenFlow specification contained the basic actions one needs to implement MAC- and IPv4
forwarding, as well as actions one might need to implement NAT or load balancing. Later versions of
the OpenFlow protocol added support for MPLS, IPv6 and Provider Backbone Bridging (PBB).
OpenFlow switches might not support all actions specified in the OpenFlow version they
support. For example, most switches don't support MAC, IP address or TCP/UDP port
number rewrites.

OpenFlow action                                                             Version
Send to output port (or normal processing)                                  1.0
Set output queue                                                            1.1
Process the packet through specified group (example: LAG or fast failover)  1.1
Drop packet                                                                 1.0
Send input packet to controller                                             1.0
Add or remove 802.1q VLAN ID and 802.1p priority                            1.0
Rewrite source or destination MAC address                                   1.0
Add or remove 802.1ad (Q-in-Q) tags                                         1.1
Provider Backbone Bridging (PBB 802.1ah) push and pop                       1.3
Push or pop MPLS tags                                                       1.1

NETWORK LAYER ACTIONS
Rewrite source or destination IP address                                    1.0
Rewrite DSCP header                                                         1.0
Decrement TTL                                                               1.1

TRANSPORT LAYER ACTIONS
Rewrite TCP or UDP port numbers                                             1.0

OTHER OPTIONS
Extensible rewriting (rewriting any bit pattern)                            1.2
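A toy interpreter for such an action list shows how the rewrite and output actions compose (action names are illustrative, not the specification's wire-format action types):

```python
# Each action rewrites a header field, adjusts TTL, pushes a tag, or
# emits the packet; a drop action discards it. Purely illustrative.

def apply_actions(packet, actions):
    pkt = dict(packet)
    out = []
    for action in actions:
        kind = action[0]
        if kind == "set-field":                 # MAC/IP/port rewrite
            pkt[action[1]] = action[2]
        elif kind == "push-vlan":
            pkt.setdefault("vlan_stack", []).append(action[1])
        elif kind == "dec-ttl":
            pkt["ttl"] = pkt.get("ttl", 64) - 1
        elif kind == "output":
            out.append((action[1], dict(pkt)))  # copy per output port
        elif kind == "drop":
            return []
    return out

# NAT-like destination rewrite followed by forwarding out of port 7:
sent = apply_actions(
    {"ip_dst": "192.0.2.10", "ttl": 64},
    [("set-field", "ip_dst", "10.0.0.10"), ("dec-ttl",), ("output", 7)],
)
print(sent[0][0], sent[0][1]["ip_dst"], sent[0][1]["ttl"])  # -> 7 10.0.0.10 63
```

Note that actions apply in order, which is exactly why a rewrite-then-output sequence behaves like NAT while output-then-rewrite would not.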


The all-or-nothing approach to OpenFlow was quickly replaced with a more realistic approach. An
OpenFlow-only deployment is potentially viable in dedicated greenfield environments, but even there
it's sometimes better to rely on functionality already available in networking devices instead of
reinventing all the features and protocols that were designed, programmed, tested and deployed in
the last 20 years.
Not surprisingly, the traditional networking vendors quickly moved from the OpenFlow-only approach
to a plethora of hybrid solutions.

OPENFLOW DEPLOYMENT MODELS


I hope you never believed the OpenFlow networking nirvana hype, in which smart open-source
programmable controllers control dumb low-cost switches, busting the "networking = mainframes"
model and bringing a Linux-like golden age to every network. As the debates during the OpenFlow
Symposium clearly illustrated, the OpenFlow reality is way more complex than it appears at first
glance.
To make it even more interesting, at least four different models for OpenFlow deployment have
already emerged:


NATIVE OPENFLOW
The switches are totally dumb; the controller performs all control-plane functions, including running
control-plane protocols with the outside world. For example, the controller has to use packet-out
messages to send LACP, LLDP and CDP packets to adjacent servers and packet-in messages to
process inbound control-plane packets from attached devices.
This model has at least two serious drawbacks even if we ignore the load placed on the controller by
periodic control-plane protocols:

The switches need IP connectivity to the controller for the OpenFlow control session. They can
use out-of-band network (where OpenFlow switches appear as IP hosts), similar to the QFabric
architecture. They could also use in-band communication sufficiently isolated from the OpenFlow
network to prevent misconfigurations (VLAN 1, for example), in which case they would probably
have to run STP (at least in VLAN 1) to prevent bridging loops.

Fast control loops like BFD are hard to implement with a central controller, even more so if you
want very fast response times.

NEC seems to be using this model quite successfully (although they probably have a few
extensions), but has already encountered inherent limitations: a single controller can control up to
~50 switches, and rerouting around failed links takes around 200 msec (depending on the network
size). For more details, watch their Networking Tech Field Day presentation.
NEC has since enhanced the scalability of their controller – a single controller cluster can
manage over 200 switches.


NATIVE OPENFLOW WITH EXTENSIONS


A switch controlled entirely by the OpenFlow controller could perform some of the low-level
control-plane functions independently. For example, it could run LLDP and LACP, and bundle physical
links into port channels (link aggregation groups). Likewise, it could perform load balancing across
multiple links without involvement of the controller.
OpenFlow got multipathing support in version 1.1. In late 2013 there are only a few
commercially available switches supporting OpenFlow 1.3 (vendors decided to skip versions
1.1 and 1.2).

Some controller vendors went down that route and significantly extended OpenFlow 1.1. For
example, Nicira has added support for generic pattern matching, IPv6 and load balancing.
Needless to say, the moment you start using OpenFlow extensions or functionality implemented
locally on the switch, you destroy the mirage of the nirvana described at the beginning of the article –
we're back in the muddy waters of incompatible extensions and hardware compatibility lists. The
specter of Fibre Channel looms large.

SHIPS IN THE NIGHT


Switches have a traditional control plane; the OpenFlow controller manages only certain ports or
VLANs on trunked links. The local control plane (or linecards) can perform the tedious periodic tasks
like running LACP, LLDP and BFD, passing only the link status to the OpenFlow controller. The
controller-to-switch communication problem is also solved: the TCP session between them traverses
the non-OpenFlow part of the network.
This approach is commonly used in academic environments where OpenFlow is running in parallel
with the production network. It's also one of the viable pilot deployment models.

INTEGRATED OPENFLOW
OpenFlow classifiers and forwarding entries are integrated with the traditional control plane. For
example, Juniper's OpenFlow implementation inserts compatible flow entries (those that contain only
destination IP address matching) as ephemeral static routes into the RIB (Routing Information Base).
OpenFlow-configured static routes can also be redistributed into other routing protocols.

Figure 3-2: Integrated OpenFlow (source: Juniper's presentation @ OpenFlow Symposium)


Going a step further, Juniper's OpenFlow model presents routing tables (including VRFs) as virtual
interfaces to the OpenFlow controller (or so it was explained to me). It's thus possible to use
OpenFlow on the network edge (on user-facing ports), and combine the flexibility it offers with
traditional routing and forwarding mechanisms.
From my perspective, this approach makes the most sense: don't rip-and-replace the existing
network with a totally new control plane, but augment the existing well-known mechanisms with
functionality that's currently hard (or impossible) to implement. You'll obviously lose the
vaguely-promised benefits of Software Defined Networking, but I guess that the ability to retain
field-proven mechanisms while adding customized functionality and new SDN applications more than
outweighs that.


An OpenFlow network can emulate any network behavior supported by its components (hardware or
virtual switches), from hop-by-hop forwarding to path-based forwarding paradigms.

FORWARDING MODELS IN OPENFLOW NETWORKS


A few days ago Tom (@NetworkingNerd) Hollingsworth asked a seemingly simple question:
"OpenFlow programs hop-by-hop packet forwarding, right? No tunnels?" and wasn't satisfied with
my standard answer, so here's a longer explanation.
Before we get started, keep in mind OpenFlow is just a tool that one can use (or not) in
numerous environments. Tom's question is (almost) equivalent to "C programs use string
functions, right?" Some do, some don't; it depends on what you're trying to do.

POINT OPENFLOW DEPLOYMENTS


Sometimes you can solve your problem by using OpenFlow on individual (uncoupled) devices.
Typical use cases:

Edge security policy – authenticate users (or VMs) and deploy per-user ACLs before
connecting a user to the network (example: IPv6 first-hop security);

Programmable SPAN ports – use OpenFlow entries on a single switch to mirror selected traffic
to a SPAN port;

DoS traffic blackholing – use OpenFlow to block DoS traffic as close to the source as possible,
using N-tuples for more selective traffic targeting than the more traditional RTBH approach.


Traffic redirection – use OpenFlow to redirect an interesting subset of traffic to a network services
appliance (example: IDS).

Using OpenFlow on one or more isolated devices is simple (no interaction with adjacent devices) and
linearly scalable – you can add more devices and controllers as needed because there's no tight
coupling anywhere in the system.
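The DoS blackholing use case from the list above can be sketched as simple rule generation (the flow-entry format is the illustrative one used earlier in this chapter; switch names and addresses are made up):

```python
# Turn a reported attack flow into per-switch drop entries that match
# the full 5-tuple instead of just the victim address (the selectivity
# advantage over a /32 RTBH route).

def blackhole_entries(attack, edge_switches, priority=1000):
    """One high-priority drop entry per edge switch."""
    match = {k: attack[k]
             for k in ("ip_src", "ip_dst", "ip_proto", "l4_dst_port")
             if k in attack}
    return [(sw, {"priority": priority, "match": match, "actions": ["drop"]})
            for sw in edge_switches]

rules = blackhole_entries(
    {"ip_src": "198.51.100.7", "ip_dst": "192.0.2.80",
     "ip_proto": 17, "l4_dst_port": 53},
    ["edge-1", "edge-2"])
print(len(rules), rules[0][1]["match"]["ip_proto"])  # -> 2 17
```

Legitimate traffic to the victim keeps flowing because only the offending 5-tuple is dropped.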

FABRIC OPENFLOW DEPLOYMENTS


Most OpenFlow products being developed these days try to solve the OpenFlow fabric use case
(because existing data center fabrics and the Internet clearly don't work, right?). In these scenarios
the OpenFlow controller manages all the switches in the forwarding path and has to install forwarding
entries on every one of them.
Not surprisingly, developers of these products took different approaches based on their
understanding of networking challenges and limitations of OpenFlow devices.
Some solutions (example: VMware NSX) bypass the complexities of fabric forwarding by
establishing end-to-end something-over-IP tunnels, effectively reducing the fabric to a
single hop.
Path-based forwarding. Install end-to-end path forwarding entries into the fabric and assign user
traffic to paths at the edge nodes (aka Edge and Core OpenFlow). Bonus points if you're smart
enough to pre-compute and install backup paths.
If this looks like a description of MPLS LSPs, FECs and FRR, youre spot on. There are only so many
ways you can solve a problem in a scalable way.


The dirty details of path-based forwarding vary based on the hardware capabilities of the switches
you use and your programming preferences. Using MPLS or PBB would be the cleanest option –
those packet formats are well understood by network troubleshooting tools, so an unlucky engineer
trying to fix a problem in an OpenFlow-based fabric would have a fighting chance.
Unfortunately you won't see much PBB or MPLS in OpenFlow products any time soon – they require
OpenFlow 1.3 (or vendor extensions) and hardware support that's often lacking in switches used for
OpenFlow forwarding these days. OpenFlow controller developers are trying to bypass those
problems with creative uses of packet headers (VLAN or MAC rewrite comes to mind), making a
troubleshooter's job much more interesting.
Hop-by-hop forwarding. Install flow-matching N-tuples in every switch along the path. Results in
an architecture that works great in PowerPoint and lab tests, but breaks down in anything remotely
similar to a production network due to scalability problems, primarily FIB update challenges.
If an OpenFlow controller using the hop-by-hop forwarding paradigm implements proactive flow
installation (install N-tuples based on configuration and topology), it just might work in small
deployments. If it uses reactive flow installation (punt new flows to the controller, install microflow
entries on every hop for each new flow), it deserves a nomination for a Darwin Award.
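The back-of-envelope arithmetic behind that criticism is worth spelling out (all inputs are illustrative assumptions, not measured values):

```python
# Reactive flow installation: per-microflow entries exhaust typical
# hardware flow tables almost immediately.

new_flows_per_server = 200    # new TCP sessions per second (assumed)
servers_per_rack = 40
flow_idle_timeout = 10        # seconds before an entry is aged out (assumed)
tcam_flow_entries = 2000      # assumed flow-table size of a 1.0-era ToR switch

# Little's law style estimate of concurrently installed entries
concurrent_flows = new_flows_per_server * servers_per_rack * flow_idle_timeout
print(concurrent_flows)                       # -> 80000
print(concurrent_flows / tcam_flow_entries)   # -> 40.0 (40x over capacity)
```

Even with these conservative numbers a single rack needs 40 times more flow entries than the switch can hold, before you count the controller round-trip added to every new flow.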

WHY DOES IT MATTER?


Would you buy a core router that only supports RIPv1? Would you use a solution that uses PBR
instead of routing protocols? Would you use NetFlow-based forwarding with flows being instantiated
by a central router (remember Multi-Layer Switching on Cat5000)? Probably not – we've learned the
hard way which protocols and architectures work and which ones don't.


OpenFlow is an emerging technology, and you'll stumble upon numerous vendors (from startups to
major brand names) selling you OpenFlow-based solutions (and pixie dust). It's important to
understand how these solutions work behind the scenes when evaluating them. Everything will work
great in your 2-node proof-of-concept lab, but you might encounter severe scalability limitations in
real-life deployments.


Networking engineers' reactions to OpenFlow were easy to predict, ranging from "this will never
work" to "here's how I can solve my problem with OpenFlow." It turns out we can solve many
problems without involving OpenFlow; the traditional networking protocols are often good enough.

YOU DON'T NEED OPENFLOW TO SOLVE EVERY AGE-OLD PROBLEM


Two great blog posts appeared almost simultaneously: the evergreen Fallacies of Distributed
Computing from Bob Plankers and the forward-looking Understanding Hadoop Clusters and the
Network from Brad Hedlund. Read them both before continuing (they are both great reads) and try
to figure out why I'm mentioning them in the same sentence (no, it's not the fact that Hadoop uses
distributed computing).
OK, here's the quote that ties them together. While describing rack awareness, Brad wrote:
What is NOT cool about Rack Awareness at this point is the manual work required to
define it the first time, continually update it, and keep the information accurate. If the
rack switch could auto-magically provide the Name Node with the list of Data Nodes it
has, that would be cool. Or vice versa, if the Data Nodes could auto-magically tell the
Name Node what switch they're connected to, that would be cool too. Even more
interesting would be an OpenFlow network, where the Name Node could query the
OpenFlow controller about a Node's location in the topology.


The only problem with Brad's reasoning is that we already have the tools to do exactly what he's
looking for. The magic acronym is LLDP (802.1AB).
LLDP was standardized years ago and is available on numerous platforms, including Catalyst
and Nexus switches, and the Linux operating system (for example, lldpad is part of the standard
Fedora distribution). Not to mention that every DCB-compliant switch must support LLDP, as the
DCBX protocol uses LLDP to advertise DCB settings between adjacent nodes.
The LLDP MIB is standard and allows anyone with SNMP read access to discover the exact local LAN
topology – the connected port names, adjacent nodes (and their names), and their management
addresses (IPv4 or IPv6). The management addresses that should be present in LLDP
advertisements can then be used to expand the topology discovery beyond the initial set of nodes
(assuming your switches do include them in LLDP advertisements; for example, NX-OS does but
Force10 doesn't).
Building the exact network topology from LLDP MIB is a very trivial exercise. Even a somewhat
reasonable API is available (yeah, having an API returning a network topology graph would be even
cooler). Mapping the Hadoop Data Nodes to ToR switches and Name Nodes can thus be done on
existing gear using existing protocols.
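The rack-awareness mapping Brad asks for is a few lines of code once you have the LLDP neighbor table (the records below are made-up stand-ins for values you would read from the LLDP MIB over SNMP):

```python
# Build the ToR-switch-to-data-node mapping from LLDP neighbor records:
# for each local port, LLDP reports the neighbor's system name and port.
from collections import defaultdict

lldp_neighbors = [
    # (local switch, local port, remote system name, remote port)
    ("tor-1", "Eth1/1", "hadoop-dn-01", "eth0"),
    ("tor-1", "Eth1/2", "hadoop-dn-02", "eth0"),
    ("tor-2", "Eth1/1", "hadoop-dn-03", "eth0"),
]

def rack_awareness(records):
    """Map each ToR switch to the nodes attached to it."""
    topology = defaultdict(list)
    for switch, _local_port, neighbor, _remote_port in records:
        topology[switch].append(neighbor)
    return dict(topology)

print(rack_awareness(lldp_neighbors))
# -> {'tor-1': ['hadoop-dn-01', 'hadoop-dn-02'], 'tor-2': ['hadoop-dn-03']}
```

Feed the result to the Hadoop Name Node and you have automated rack awareness with no OpenFlow in sight.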
Would OpenFlow bring anything to the table? Actually no – it also needs packets exchanged between
adjacent devices to discover the topology, and the easiest thing for OpenFlow controllers to use is ...
ta-da ... LLDP ... oops, OFDP, because LLDP just wasn't good enough. The only difference is that in
a traditional network the devices would send LLDP packets themselves, whereas in the OpenFlow
world the controller would use Packet-Out messages of the OpenFlow control session to send LLDP
packets from individual controlled devices and wait for Packet-In messages from other devices to
discover which device received them.


The Linux configuration wouldn't change much. If you want the switches to see the hosts, you still
have to run an LLDP (or OFDP or whatever you call it) daemon on the hosts.
Last but definitely not least, you could use the well-defined SNMP protocol with a number of
readily-available Linux or Windows libraries to read the LLDP results available in the SNMP MIB of
old-world devices. I'm still waiting to see a high-level SDN/OpenFlow API; everything I've seen so far
are OpenFlow virtualization attempts (multiple controllers accessing the same devices) and
discussions indicating a standard API isn't necessarily a good idea. Really? Haven't you learned
anything from the database world?
So, why did I mention the two posts at the beginning of this article? Because Bob pointed out that
"those who cannot remember the past are condemned to repeat it." At the moment, OpenFlow seems
to fit the bill perfectly.


We're now coming to the skeptic part of this chapter. Let's start with an easy observation: ideas
similar to OpenFlow were floated in the 1990s (and failed miserably).

OPENFLOW AND IPSILON: NOTHING NEW UNDER THE SUN
Several companies were trying to solve the IP+ATM integration problem in the mid-nineties, most of
them using IP-based architectures (Cisco, IBM, 3Com), while Ipsilon tried its luck with a flow-based
solution.
I found a great overview of IP+ATM solutions in an article published on the University of Washington
web site. This is what the article has to say about Ipsilon's approach (and if you really want to know
the details, read GSMP (RFC 1987) and the Ipsilon Flow Management Protocol (RFC 1953)):
An IP switch controller routes like an ordinary router, forwarding packets on a default VC.
However, it also performs flow classification for traffic optimization.
Replace "IP switch controller" with "OpenFlow controller" and "default VC" with the
switch-to-controller OpenFlow session.
Once a flow is identified, the IP switch sets up a cut-through connection by first
establishing a VC for subsequent flow traffic, and then by asking the upstream node to
use this VC.


Likewise, some people propose downloading 5-tuples or 12-tuples into all the switches along the flow
path. The only difference is that 15 years ago engineers understood that virtual circuit labels use
fewer resources than 5-to-12-tuple policy-based routing.
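A rough bit-count comparison illustrates the point (the field widths are approximate and the port-number width is an assumption; real TCAM layouts vary by chipset):

```python
# Exact-match virtual-circuit (or MPLS) label vs. a wildcard-capable
# 12-tuple entry: approximate bits of match state per forwarding entry.

label_bits = 20                 # MPLS label (an ATM VPI/VCI is similar in size)
tuple_bits = sum([
    32,        # ingress port (assumed width)
    48 * 2,    # source + destination MAC
    16,        # ethertype
    12 + 3,    # VLAN ID + 802.1p priority
    32 * 2,    # IPv4 source + destination
    8 + 6,     # IP protocol + DSCP
    16 * 2,    # TCP/UDP source + destination ports
])
print(tuple_bits, tuple_bits / label_bits)  # -> 269 13.45
```

An order of magnitude more match state per entry, and that is before counting the mask bits a ternary (wildcard-capable) TCAM needs on top of the value bits.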
As expected, Ipsilon's approach had a few scaling issues. From the same article:
The bulk of the criticism, however, relates to Ipsilon's use of virtual circuits. Flows are
associated with application-to-application conversations and each flow gets its very own
VC. Large environments like the Internet with millions of individual flows would exhaust
VC tables.
Not surprisingly, a number of people (myself included) who still remember a bit of networking
history are making the exact same argument about the use of microflows in OpenFlow environments
... but it seems RFC 1925 (section 2.11) will yet again carry the day.
An hour after publishing this blog post, I realized (reading an article by W.R. Koss) that Ed
Crabbe mentioned Ipsilon being the first attempt at SDN during his OpenFlow Symposium
presentation.


Continuing the skeptic streak: do you really expect to get a network operating system just because
you have a protocol that allows you to download forwarding tables into a switch?
The blog post was written in 2011, when the shortcomings of OpenFlow weren't that well
understood. Three years later (August 2014), all we have is a single production-grade commercial
controller (NEC ProgrammableFlow).

OPENFLOW: BIOS DOES NOT A SERVER MAKE


Greg (@etherealmind) Ferro invited me to a Packet Pushers podcast discussing OpenFlow with Matt
Davey (then working at Indiana University). I was pleasantly surprised by Matt's realistic attitude
(you should really listen to the whole podcast); it was nice to hear that they're running a country-wide
pilot with OpenFlow-enabled switches deployed at several universities, and some of the
applications he mentioned (for example, the capability to download ACLs into the switch from your
customized application) definitely tickled my inner geek. However, I'm even more convinced that the
brouhaha surrounding the Open Networking Foundation has little grounds in the realities of OpenFlow.
Remember: OpenFlow is a protocol allowing controlling software to download forwarding table
entries into one or more switches (which can be L2, L3 or LSR switches). Any OpenFlow-based
solution requires two components: the switching hardware with OpenFlow-capable firmware and the
controlling software using the OpenFlow protocol.
The OpenFlow protocol will definitely enable many copycat vendors to buy merchant silicon, put it
together and start selling their product with little investment in R&D (like the PC motherboard
manufacturers are doing today). I am also positive the silicon manufacturers (like Broadcom) will
have "How to Build an OpenFlow Switch with Our Chipset" application notes available as soon as
they find OpenFlow commercially viable. Hopefully we'll see another Dell (or HP) emerge, producing
low-cost reasonable-quality products in the low-end to mid-range market ... but all these switches
will still need networking software controlling them.
If you're old enough to remember the original PCs from IBM, you'll easily recognize the parallels.
IBM documented the PC hardware architecture and BIOS API (you even got the BIOS source code),
allowing numerous third-party vendors to build adapter cards (and later PC clones), but all those
machines had to run an operating system ... and most of them used MS-DOS (and later Windows).
Almost three decades later, the vast majority of PCs still run on Microsoft's operating systems.
Some people think that the potential adoption of the OpenFlow protocol will magically materialize open-source software to control the OpenFlow switches, breaking the bonds of proprietary networking solutions. In reality, the companies that invested heavily in networking software (Cisco, Juniper, HP and a few others) might be the big winners ... if they figure out fast enough that they should morph into software-focused companies.
Cisco has clearly realized the winds are changing and started talking about the inclusion of OpenFlow in the NX-OS operating system. I would bet their first OpenFlow implementation won't be an OpenFlow-enabled Nexus switch.
Moving a bit further, you cannot program a controller unless it has a well-defined API you can use (the northbound API). More than two years after the creation of the Open Networking Foundation, we still don't have a specification (not even a public draft), and every controller vendor uses a different API. The situation might improve with the release of Open Daylight, an open-source OpenFlow controller that will (if it becomes widely used) set a de facto standard.
SDN CONTROLLER NORTHBOUND API IS THE CRUCIAL MISSING PIECE
Imagine you'd like to write a simple Perl (or Python, Ruby, JavaScript; you get the idea) script to automate a burdensome function on your server (or a router/switch from any vendor running Linux/BSD behind the scenes) that the vendor never bothered to implement. The script interpreter relies on numerous APIs being available from the operating system, from the process API (to load and start the interpreter) to the file system API, console I/O API, memory management API, and probably a few others.

Now imagine none of those APIs were standardized (the various mutually incompatible dialects of Tcl used by Cisco IOS come to mind); that's the situation we're facing in the SDN land today.
If we accept the analogy of OpenFlow being the x86 instruction set (it's actually more like the p-code machine from the UCSD Pascal days, but let's not go there today), and all we want to do is to write a simple script that will (for example) redirect the backup-to-tape traffic to a secondary path during peak hours, we need a standard API to get the network topology, create a path across the network,
and create an ingress Forwarding Equivalence Class (FEC) to map the backup traffic to that path. In short, we need what's called the SDN Controller Northbound API.
THERE IS NO STANDARD NORTHBOUND API
I have some bad news for you: nobody is working on standardizing such an API (read a great
summary by Brad Casemore, and make sure to read all the articles he linked to).
Are you old enough to remember the video games for the early IBM PC? None of them used MS-DOS. They were embedded software solutions that you had to boot off a floppy disk (remember those?), and then they took over all the hardware you had. That's exactly what we have in the SDN land today.
Don't try to tell me I've missed Flowvisor, an OpenFlow controller that allocates slices of actual hardware to individual OpenFlow controllers. I haven't; but using Flowvisor to solve this problem is like using Xen (or KVM or ESXi) to boot multiple embedded video games in separate VMs. Not highly useful for a regular guy trying to steer some traffic around the network (or any one of the other small things that bother us), is it?
Also, don't tell me each SDN controller has an API. While NEC and startups like Big Switch Networks are creating something akin to a network operating system that we could use to program our network (no, I really don't want to deal with the topology discovery and fast failover myself), and each one of them has an API, no two APIs are even remotely similar.
I still remember the days when there were at least a dozen operating systems running on top of the 8088 processor, and it was mission impossible to write a meaningful application that would run on more than a few of them without major porting efforts.
LET'S SPECULATE

There might be several good reasons for the current state of affairs:
- The only people truly interested in OpenFlow are the Googles of the world (Nicira is using OpenFlow purely as an information transfer tool to get MAC-to-IP mappings into their vSwitches);
- Developers figure out all sorts of excellent reasons why their dynamic and creative work couldn't possibly be hammered into the tight confines of a standard API;
- Nobody is interested in creating a Linux-like solution; everyone is striving to achieve the maximum possible vendor lock-in;
- We still don't know what we're looking for.
The reality is probably a random mixture of all four (and a few others), but that doesn't change the basic facts: until there's a somewhat standard and stable API (like SQL-86) that I could use with SDN controllers from multiple vendors, I'm better off using Cisco ONE or the Junos XML API; otherwise I'm just trading lock-ins (as the ecstatic users of umbrella network management systems would be more than happy to tell you).

On the other hand, if I stick with Cisco or Juniper (and implement a simple abstraction layer in my application to work with both APIs), at least I can be pretty positive they'll still be around in a year or two.
When you have a hammer, every problem seems like a nail. Nicira and later Open Daylight tried to implement network virtualization with OpenFlow. As it turns out, they might have used the wrong tool.
IS OPENFLOW THE BEST TOOL FOR OVERLAY VIRTUAL NETWORKS?
Overlay virtual networks were the first commercial-grade OpenFlow use case: Nicira's Network Virtualization Platform (NVP, now VMware NSX for Multiple Hypervisors) used OpenFlow to program the hypervisor virtual switches (Open vSwitches, OVS).

OpenStack is using the same approach in its OVS Neutron plugin, and it seems Open Daylight aims to reinvent that same wheel, replacing the OVS plugin running on the hypervisor host agent with a central controller.

Does that mean that one should use OpenFlow to implement overlay virtual networks? Not really; OpenFlow is not exactly the best tool for the job.
EASY START: ISOLATED LAYER-2 OVERLAY NETWORKS
Most OVS-based solutions (VMware NSX for Multiple Hypervisors, OpenStack ...) use OpenFlow to program forwarding entries in hypervisor virtual switches. In an isolated layer-2 overlay virtual network OpenFlow isn't such a bad fit: after all, the hypervisor virtual switches need nothing more than the mapping between VM MAC addresses and hypervisor transport IP addresses, and that information is readily available in the cloud orchestration system.
The OpenFlow controller can thus proactively download the forwarding information to the switches, and stay out of the forwarding path, ensuring reasonable scalability.

BTW, even this picture isn't all rosy: Nicira had to implement virtual tunnels to work around the OpenFlow point-to-point interface model.
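To make "proactive download" concrete, here is a minimal Python sketch of the controller-side logic: turn the orchestration system's VM inventory into per-hypervisor forwarding entries before any traffic flows. All names, addresses, and data structures below are made up for illustration; a real controller would encode the entries as OpenFlow FLOW_MOD messages instead of dicts:

```python
# Minimal sketch: build overlay forwarding entries from orchestration data.
# All names and structures are illustrative, not any specific controller's API.

# What the cloud orchestration system already knows: which VM (MAC address)
# runs on which hypervisor (transport IP), per virtual network segment.
vm_inventory = {
    "segment-blue": {
        "02:00:00:00:00:01": "192.0.2.10",   # VM MAC -> hypervisor transport IP
        "02:00:00:00:00:02": "192.0.2.11",
    },
}

def build_flow_entries(segment, local_hypervisor_ip):
    """Return the flow entries one hypervisor switch needs for a segment:
    for every remote VM, match on destination MAC and send the frame into
    a tunnel toward the VM's hypervisor."""
    entries = []
    for mac, hv_ip in vm_inventory[segment].items():
        if hv_ip == local_hypervisor_ip:
            continue  # local VMs are reached through local vswitch ports
        entries.append({"match_dst_mac": mac, "action_tunnel_to": hv_ip})
    return entries

# The controller pushes these entries down before any VM sends traffic,
# so no packet ever has to be punted to the controller.
flows = build_flow_entries("segment-blue", "192.0.2.10")
print(flows)
```

The point of the sketch is the direction of the information flow: everything comes from the orchestration system, which is exactly why the controller can stay out of the forwarding path.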
THE FIRST GLITCHES: LAYER-2 GATEWAYS
Adding layer-2 gateways to overlay virtual networks reveals the first shortcomings of OpenFlow. Once the layer-2 environment stops being completely deterministic (layer-2 gateways introduce the need for dynamic MAC learning), the solution architects have only a few choices:

- Perform dynamic MAC learning in the OpenFlow controller: all frames with unknown source MAC addresses are punted to the controller, which builds the dynamic MAC address table and downloads the modified forwarding information to all switches participating in a layer-2 segment. This is the approach used by NEC's ProgrammableFlow solution. Drawback: the controller gets involved in the data plane, which limits the scalability of the solution.
- Offload dynamic MAC learning to specialized service nodes, which serve as an intermediary between the predictive static world of virtual switching and the dynamic world of VLANs. It seems NVP used this approach in one of its early releases. Drawback: the service nodes become an obvious chokepoint, and an additional hop through a service node increases latency.
- Give up, half-ditch OpenFlow, and either implement dynamic MAC learning in the virtual switches in parallel with OpenFlow, or report dynamic MAC addresses to the controller using a non-OpenFlow protocol (to avoid data-path punting to the controller). It seems recent versions of VMware NSX use this approach.
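The first option (dynamic MAC learning in the controller) can be reduced to a few lines of table-maintenance logic. This is an illustrative sketch, not any vendor's implementation: a punted frame is represented by a (switch, port, source MAC) triple, while timeouts, topology changes, and the actual FLOW_MOD downloads are omitted:

```python
# Sketch of controller-side dynamic MAC learning (the first option above).
# A real controller would receive OpenFlow PACKET_IN messages and push
# FLOW_MOD updates; here only the MAC table logic is shown.

mac_table = {}  # MAC address -> (switch, port) where it was last seen

def packet_in(switch, port, src_mac):
    """Called for every frame with an unknown source MAC punted to the
    controller. Returns True when the forwarding information changed and
    updated entries must be downloaded to all switches in the segment."""
    if mac_table.get(src_mac) == (switch, port):
        return False  # nothing new, no flow-entry churn needed
    mac_table[src_mac] = (switch, port)
    return True  # controller must update flow entries on member switches

changed = packet_in("gw-1", 3, "02:00:00:00:00:07")
print(mac_table, changed)
```

Note how every learning event involves the controller and potentially a fan-out of flow updates to all member switches; that is precisely the scalability drawback mentioned above.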
THE KILLER: DISTRIBUTED LAYER-3 FORWARDING
Every layer-2 overlay virtual networking solution must eventually support distributed layer-3 forwarding (the customers that matter usually want that for one reason or another). Regardless of how you implement the distributed forwarding, hypervisor switches need ARP entries (see this blog post for more details), and have to reply to ARP queries from the virtual machines.

Even without the ARP proxy functionality, someone has to reply to the ARP queries for the default gateway IP address.

ARP is a nasty beast in an OpenFlow world: it's a control-plane protocol, and thus not implementable in pure OpenFlow switches. The implementers have (yet again) two choices:
- Punt the ARP packets to the controller, which yet again places the OpenFlow controller in the forwarding path (and limits its scalability);
- Solve layer-3 forwarding with a different tool (the approach used by VMware NSX and distributed layer-3 forwarding in OpenStack Icehouse).
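To show why ARP handling is mostly mechanical once a query reaches the controller, here's a sketch that builds the payload of an ARP reply for the default gateway using only the Python standard library (field layout per RFC 826; all MAC and IP addresses are made up). A real controller would wrap this payload in an Ethernet header and push it out through an OpenFlow packet-out message:

```python
import struct

def build_arp_reply(gw_mac, gw_ip, requester_mac, requester_ip):
    """Build the payload of an ARP reply claiming gw_ip is at gw_mac
    (fields per RFC 826). All arguments are raw bytes: 6-byte MAC
    addresses and 4-byte IPv4 addresses."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                # hardware type: Ethernet
        0x0800,           # protocol type: IPv4
        6, 4,             # hardware/protocol address lengths
        2,                # opcode: reply
        gw_mac, gw_ip,    # sender = the virtual default gateway
        requester_mac, requester_ip,  # target = the asking VM
    )

reply = build_arp_reply(
    bytes.fromhex("020000000001"), bytes([10, 0, 0, 1]),
    bytes.fromhex("020000000007"), bytes([10, 0, 0, 7]),
)
print(len(reply))  # 28-byte ARP payload
```

The packet construction is trivial; the architectural question from the bullet list above (who answers, and whether the controller sits in the forwarding path) is the hard part.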
DO WE REALLY NEED OPENFLOW?
With all the challenges listed above, does it make sense to use OpenFlow to control overlay virtual networks? Not really. OpenFlow is like a Swiss Army knife (or a duck): it can solve many problems, but is not ideal for any one of them.
Instead of continuously adjusting the tool to make it fit the job, let's step back a bit and ask another question: what information do we really need to implement layer-2 and layer-3 forwarding in an overlay virtual network? All we need are three simple lookup tables that can be installed via any API mechanism of your choice (Hyper-V uses PowerShell):

- IP forwarding table;
- ARP table;
- VM MAC-to-underlay IP table.

Some implementations would have a separate connected-interfaces table; other implementations would merge that with the forwarding table. There are also implementations merging the ARP and IP forwarding tables.

These three tables, combined with local layer-2 and layer-3 forwarding, are all you need. Wouldn't it be better to keep things simple instead of introducing yet another less-than-perfect abstraction layer?
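As a sanity check on the "three simple tables" claim, here's the complete layer-3 forwarding decision written around those tables in a few lines of Python. The addresses are illustrative, and a real implementation would do longest-prefix matching and handle default routes:

```python
import ipaddress

# The three lookup tables from the text, populated with illustrative data.
ip_forwarding   = {ipaddress.ip_network("10.0.2.0/24"): "connected"}  # prefix -> next hop
arp_table       = {"10.0.2.7": "02:00:00:00:00:07"}                   # IP -> VM MAC
mac_to_underlay = {"02:00:00:00:00:07": "192.0.2.11"}                 # VM MAC -> hypervisor IP

def forward(dst_ip):
    """Layer-3 forwarding in an overlay network, reduced to three lookups:
    route lookup, ARP lookup, then MAC-to-underlay mapping. Returns the
    (destination VM MAC, hypervisor transport IP) pair used to build the
    encapsulated packet."""
    addr = ipaddress.ip_address(dst_ip)
    if not any(addr in net for net in ip_forwarding):
        raise ValueError("no route")        # a real stack would fall back to a default route
    dst_mac = arp_table[dst_ip]             # who owns the destination IP
    underlay_ip = mac_to_underlay[dst_mac]  # where that VM currently lives
    return dst_mac, underlay_ip

print(forward("10.0.2.7"))
```

Whether the tables arrive over OpenFlow, PowerShell, or any other API is an implementation detail, which is exactly the argument being made above.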
The blog post explaining how OpenFlow doesn't fit the needs of overlay virtual networks triggered a flurry of questions along the lines of "do you think there's no need for OpenFlow?" Here's the response:

IS OPENFLOW USEFUL?
OpenFlow is just a tool that allows you to install PBR-like forwarding entries into networking devices using a standard protocol that should work across multiple vendors (more about that in another blog post). From this perspective OpenFlow offers the same functionality as BGP FlowSpec or ForCES, and a major advantage: it's already implemented in networking gear from numerous vendors.

Where could you use PBR-like functionality? I'm positive you already have a dozen ideas with various levels of craziness; here are a few more:
- Network monitoring (flow entries have counters);
- Intelligent SPAN ports that collect only the traffic you're interested in;
- Transparent service insertion;
- Scale-out stateful network services;
- Distributed DoS prevention;
- Policy enforcement (read: ACLs) at the network edge.
OpenFlow has another advantage over BGP FlowSpec: the packet-in and packet-out functionality that allows the controller to communicate with devices outside of the OpenFlow network. You could use this functionality to implement new control-plane protocols or (for example) an interesting layered authentication scheme that is not available in off-the-shelf switches.
Summary: OpenFlow is a great low-level tool that can help you implement numerous interesting ideas, but I wouldn't spend my time reinventing the switching fabric wheel (or other things we already do well).
OPENFLOW IMPLEMENTATION NOTES

It's easy to say "OpenFlow allows you to separate the forwarding and control planes, and control multiple devices from a single controller," but how do you implement the control plane? How does the control plane interact with the outside world? How do you implement legacy protocols in an OpenFlow controller, and do you have to implement them? You'll get answers to all these questions in this chapter.

Can you build an OpenFlow-based network with existing hardware? Is it possible to build a multi-vendor network? These questions are answered in the second half of the chapter, which focuses on vendor-specific implementation details.

MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

- Start with the SDN, OpenFlow and NFV Resources page;
- Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
- Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);
- The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;
- Finally, I'm always available for short online or on-site consulting engagements.
IN THIS CHAPTER:
CONTROL PLANE IN OPENFLOW NETWORKS
IS OPEN VSWITCH CONTROL PLANE IN-BAND OR OUT-OF-BAND?
IMPLEMENTING CONTROL-PLANE PROTOCOLS WITH OPENFLOW
LEGACY PROTOCOLS IN OPENFLOW-BASED NETWORKS
OPENFLOW 1.1 IN HARDWARE: I WAS WRONG
OPTIMIZING OPENFLOW HARDWARE TABLES
OPENFLOW SUPPORT IN DATA CENTER SWITCHES
MULTI-VENDOR OPENFLOW: MYTH OR REALITY?
HYBRID OPENFLOW, THE BROCADE WAY
OPEN DAYLIGHT: INTERNET EXPLORER OR LINUX OF THE SDN WORLD?
How do you build a control plane network in a distributed controller-based system? How does the
controller communicate with the devices it controls? Should it use in-band or out-of-band
communication? This blog post, written in late 2013, tries to provide some answers.

CONTROL PLANE IN OPENFLOW NETWORKS
It's easy to say "SDN is the physical separation of the network control plane from the forwarding plane, and where a control plane controls several devices," handwave over the details, and let someone else figure them out. Implementing that concept in a reliable manner is a totally different undertaking.

OPENFLOW CONTROL PLANE 101
In an OpenFlow-based network architecture, the controller (or a cluster of redundant controllers) implements the control-plane functionality: discovering the network topology and the external endpoints (or adjacent network devices), computing the forwarding entries that have to be installed into individual network devices, and downloading them into the controlled network devices using the OpenFlow protocol.
Figure 4-1: OpenFlow control plane 101

OpenFlow is an application-level protocol running on top of TCP (and optionally TLS): the controller and the controlled device are IP hosts using the IP connectivity services of some unspecified control-plane network. Does that bring back fond memories of SDH/SONET days? It should.
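To underline how ordinary that transport is: every OpenFlow message starts with the same 8-byte header (version, type, length, transaction ID), and a session opens with both sides sending a HELLO message over the TCP (or TLS) connection (the well-known controller port at the time was TCP 6633). Building a HELLO is a single struct.pack call; OpenFlow 1.0 is shown here:

```python
import struct

# Every OpenFlow message starts with the same 8-byte header:
# version (1 byte), type (1 byte), total length (2 bytes), xid (4 bytes).
OFP_VERSION_10 = 0x01
OFPT_HELLO = 0

def ofp_hello(xid=1):
    """Return the wire format of an OpenFlow 1.0 HELLO message (header only,
    no body). The controller and the switch each send one of these as soon
    as the TCP session comes up."""
    return struct.pack("!BBHI", OFP_VERSION_10, OFPT_HELLO, 8, xid)

msg = ofp_hello()
print(msg.hex())  # 0100000800000001
```

From the network's perspective there is nothing special going on: two IP hosts exchanging TCP segments, which is why the control-plane network design questions below matter so much.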

WHO WILL BUILD THE CONTROL PLANE NETWORK?
The answer you'll get from academic-minded OpenFlow zealots is likely "that's out of scope; let's focus on the magic new stuff the separation of control and data plane brings you." Pretty useless, right? We need a more detailed answer before we start building OpenFlow-based solutions.

As always, history is our best teacher: similar architectures commonly used out-of-band control-plane networks.
OUT-OF-BAND OPENFLOW CONTROL PLANE
The easiest (but not necessarily the most cost-effective) approach to an OpenFlow control-plane network is to build a separate network connecting the management ports of the OpenFlow switches with the OpenFlow controller. NEC is using this approach in their ProgrammableFlow solution, as is Juniper in its QFabric architecture. There's something slightly ironic in this approach: you have to build a traditional L2 or L3 network to control the new gear.

Figure 4-2: Out-of-band OpenFlow control plane network
You could (in theory) build another OpenFlow-controlled network to implement the control-plane network you need, but you'd quickly end up with turtles all the way down.

On the other hand, an out-of-band control-plane network is safe: we know how to build a robust L3 network with traditional gear, and a controller bug cannot disrupt the control-plane communication. I would definitely use this approach in a data center environment, where the costs of implementing a dedicated 1GE control-plane network wouldn't be prohibitively high.

Would the same approach work in WAN/Service Provider environments? Of course it would; after all, we've been using it forever to manage traditional optical gear. Does it make sense? It definitely does if you already have an out-of-band network, less so if someone asks you to build a new one to support their bleeding-edge SDN solution.

IN-BAND CONTROL PLANE
It's possible (in theory) to get OpenFlow switches working with an in-band control plane, but it's a complex and potentially risky undertaking. To get an understanding of the complexities involved, read the relevant Open vSwitch documentation, which succinctly explains the challenges and the OVS solution.

That solution would work under optimal circumstances on properly configured switches, but I would still use an out-of-band control plane in networks with transit OpenFlow-controlled switches (a transit switch being a switch passing control-plane traffic between the controller and another switch).
BUT GOOGLE GOT IT TO WORK
No, they didn't. They use OpenFlow within the data center edge to control the low-cost fixed-configuration switches they used to implement a large-scale routing device. They still run IS-IS and BGP between data centers, and use something functionally equivalent to PCEP to download centrally computed traffic-engineering tunnels into the data center edge routers.
A few days after I wrote the Control Plane in OpenFlow Networks blog post, I got a comment saying "we worked really hard to implement numerous safeguards that make Open vSwitch in-band control plane safe." Here's the whole story:

IS OPEN VSWITCH CONTROL PLANE IN-BAND OR OUT-OF-BAND?
A few days ago I described how most OpenFlow data center fabric solutions use an out-of-band control plane (a separate control-plane network). Can we do something similar when running an OpenFlow switch (example: Open vSwitch) in a hypervisor host?

TL&DR answer: Sure we can. Does it make sense? It depends.

Open vSwitch supports an in-band control plane, but that's not the focus of this post.

If you buy servers with half a dozen interfaces (I wouldn't), it makes perfect sense to follow the usual design best practices published by hypervisor vendors, and allocate a pair of interfaces to user traffic, another pair to management/control-plane/vMotion traffic, and a third pair to storage traffic. Problem solved.
Figure 4-3: Interfaces dedicated to individual hypervisor functions

Buying servers with two 10GE uplinks (what I would do) definitely makes your cabling friend happy, and reduces the overall networking costs, but does result in a slightly more interesting hypervisor configuration.

Best case, you split the 10GE uplinks into multiple virtual uplink NICs (examples: Cisco's Adapter FEX, Broadcom's NIC Embedded Switch, or SR-IOV) and transform the problem into a known problem (see above); but what if you're stuck with two uplinks?
Figure 4-4: Logical interfaces created on physical NICs appear as physical interfaces to the hypervisor

OVERLAY VIRTUAL NETWORKS TO THE RESCUE
If you implement all the virtual networks (used by a particular hypervisor host) with overlay virtual networking technology, you don't have a problem. The virtual switch in the hypervisor (for example, OVS) has no external connectivity; it just generates IP packets that have to be sent across the transport network. The uplinks are thus used for control-plane traffic and encapsulated user traffic; the OpenFlow switch never touches the physical uplinks.
Figure 4-5: Overlay virtual networks are not connected to the physical NICs
INTEGRATING OPENFLOW SWITCH WITH PHYSICAL NETWORK
Finally, there's the scenario where an OpenFlow-based virtual switch (usually OVS) provides VLAN-based switching, and potentially interferes with control-plane traffic running over shared uplinks. Most products solve this challenge by somehow inserting the control-plane TCP stack in parallel with the OpenFlow switch.

Figure 4-6: Hypervisor TCP/IP stack running in parallel with the Open vSwitch
For example, the OVS Neutron agent creates a dedicated bridge for each uplink, and connects the OVS uplinks and the host TCP/IP stack to the physical uplinks through the per-interface bridge. That setup ensures the control-plane traffic continues to flow even when a bug in the Neutron agent or OVS breaks VM connectivity across OVS. For more details see the OpenStack Networking in Too Much Detail blog post published on the Red Hat OpenStack site.

Figure 4-7: External bridges used by Neutron OVS plugin
Regardless of how an OpenFlow-based network is implemented, it has to exchange information with the outside world: routing protocol information with adjacent routers, STP BPDUs with adjacent switches, and LACP control frames with all adjacent devices (including some servers).

Similar to the forwarding model, the OpenFlow controller designers could use numerous implementation paths.

IMPLEMENTING CONTROL-PLANE PROTOCOLS WITH OPENFLOW
The true OpenFlow zealots would love you to believe that you can drop whatever you've been doing before and replace it with a clean-slate solution using the dumbest (and cheapest) possible switches and OpenFlow controllers.

In the real world, your shiny new network has to communicate with the outside world ... or you could take the approach most controller vendors did: decide to pretend STP is irrelevant, and ask people to configure static LAGs because you're also not supporting LACP.

HYBRID-MODE OPENFLOW WITH TRADITIONAL CONTROL PLANE
If you're implementing hybrid-mode OpenFlow, you'll probably rely on the traditional software running in the switches to handle the boring details of control-plane protocols, and use OpenFlow only to add new functionality (example: edge access lists).
Needless to say, this approach usually won't result in better forwarding behavior. For example, it would be hard to implement layer-2 multipathing in a hybrid OpenFlow network if the switches rely on STP to detect and break loops.

OPENFLOW-BASED CONTROL PLANE
In an OpenFlow-only network, the switches have no standalone control-plane logic, and thus the OpenFlow controller (or a cluster of controllers) has to implement the control plane and the control-plane protocols. This is the approach Google used in their OpenFlow deployment: the OpenFlow controllers run IS-IS and BGP with the outside world.
The OpenFlow protocol provides two messages the controllers can use to implement any control-plane protocol they wish:

- The Packet-out message is used by the OpenFlow controller to send packets through any port of any controlled switch.
- The Packet-in message is used to send messages from the switches to the OpenFlow controller. You could configure the switches to send all unknown packets to the controller, or set up flow matching entries (based on the controller's MAC/IP address and/or TCP/UDP port numbers) to select only those packets the controller is truly interested in.

For example, you could write a very simple implementation of STP (similar to what Avaya is doing on their ERS-series switches when they run MLAG) where the OpenFlow controller would always pretend to be the root bridge and shut down any port where inbound BPDUs indicate someone else is the root bridge:
- Get the list of ports with a Read State message;
- Send BPDUs through all the ports, claiming the controller is the root bridge with a very high priority;
- Configure flow entries that match the multicast destination address used by STP and forward those packets to the controller;
- Inspect incoming BPDUs, and shut down the port if a BPDU indicates someone else claims to be the root bridge.
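The four steps above map onto controller code almost one-to-one. In the illustrative sketch below, a received BPDU is reduced to the root priority it advertises (a numerically lower value is a better root claim, as in STP), and "shut down the port" is reduced to adding it to a set; sending BPDUs and the actual OpenFlow port-mod messages are left out, and all values are made up:

```python
# Sketch of the simplified STP logic described above: the controller always
# advertises itself as root with a very good (numerically low) priority and
# shuts down any port where a received BPDU claims an even better root.

CONTROLLER_PRIORITY = 0x1000  # the priority the controller advertises as root

disabled_ports = set()

def bpdu_received(port, advertised_root_priority):
    """Called when the flow entry matching the STP multicast address punts a
    BPDU to the controller. Shut the port down (a port-mod message in a real
    controller) if the neighbor claims a better root than we advertise."""
    if advertised_root_priority < CONTROLLER_PRIORITY:
        disabled_ports.add(port)   # someone else claims root: block the loop
        return "shutdown"
    return "ignore"                # our own claim wins; nothing to do

print(bpdu_received(1, 0x8000))  # worse root claim than ours: ignore
print(bpdu_received(2, 0x0000))  # better root claim: shutdown
```

The point is not that this is a good STP implementation (it isn't), but that packet-in and packet-out give the controller everything it needs to build one.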

SUMMARY
The OpenFlow protocol allows you to implement any control-plane protocol you wish in the OpenFlow controller; if a controller does not implement the protocols you need in your data center, it's not due to a lack of OpenFlow functionality, but due to other factors (fill in the blanks).
If the OpenFlow product you're interested in uses hybrid-mode OpenFlow (where the control plane resides in the traditional switch software) or uses OpenFlow to program overlay networks (example: Nicira's NVP), you don't have to worry about its control-plane protocols.

If, however, someone tries to sell you software that's supposed to control your physical switches, and does not support the usual set of protocols you need to integrate the OpenFlow-controlled switches with the rest of your network (example: STP, LACP and LLDP on L2, and some routing protocol on L3), think twice. If you use the OpenFlow-controlled part of the network in an isolated fabric or small-scale environment, you probably don't care whether the new toy supports STP or OSPF; if you want to integrate it with the rest of your existing data center network, be very careful.
Most OpenFlow controller vendors try to ignore the legacy control-plane protocols; after all, there's no glory to be had in implementing LACP, LLDP or STP. Their myopic vision might hinder the success of your OpenFlow deployment, as you'll have to integrate the new network with the legacy equipment.

LEGACY PROTOCOLS IN OPENFLOW-BASED NETWORKS
I'm positive your CIO will get a visit from a vendor offering clean-slate OpenFlow/SDN-based data center fabrics in the not-so-distant future. At that moment, one of the first questions you should ask is "how well does your new wonderland integrate with my existing network?" or, more specifically, "which L2 and L3 protocols do you support?"

At least one of the vendors offering OpenFlow controllers that manage physical switches has a simple answer: use a static LAG to connect your existing gear to our OpenFlow-based network (because our controller doesn't support LACP), use static routes (because we don't run any routing protocols), and don't create any L2 loops in your network (because we also don't have STP). If you wonder how reliable that is, you obviously haven't implemented a redundant network with static routes before.

However, to be a bit more optimistic, the need for legacy protocol support depends primarily on how the new solution integrates with your network.


Overlay solutions (like VMware NSX) don't interact with the existing network at all. A hypervisor running Open vSwitch and using STT or GRE appears as an IP host to the network, and uses existing Linux mechanisms (including NIC bonding and LACP) to solve the L2 connectivity issues.
Layer-2 gateways included with VMware NSX for multiple hypervisors support STP and LACP. VM-based gateways included with VMware NSX for vSphere run routing protocols (BGP, OSPF and IS-IS) and rely on the underlying hypervisor's support of layer-2 control-plane protocols (LACP and LLDP).
Hybrid OpenFlow solutions that only modify the behavior of the user-facing network edge (example: per-user access control) are also OK. You should closely inspect what the product does and ensure it doesn't modify the network device behavior you rely upon in your network, but in principle you should be fine. For example, the XenServer vSwitch Controller modifies just the VM-facing behavior, but not the behavior configured on uplink ports.
Rip-and-replace OpenFlow-based network fabrics are the truly interesting problem. You'll have to connect existing hosts to them, so you'd probably want to have LACP support (unless you're a VMware-only shop), and they'll have to integrate with the rest of the network, so you should ask for at least:

- LACP, if you plan to connect anything but vSphere hosts to the fabric (and you'll probably need a device to connect the OpenFlow-based part of the network to the outside world);
- LLDP or CDP. If nothing else, they simplify troubleshooting, and they are implemented on almost everything including vSphere vSwitch;
- STP, unless the OpenFlow controller implements split-horizon bridging like vSphere's vSwitch, but even then we need basic things like BPDU guard;


- A routing protocol if the OpenFlow-based solution supports L3 (OSPF comes to mind).

Call me a grumpy old man, but I wouldn't touch an OpenFlow controller that doesn't support the above-mentioned protocols. Worst case, if I were forced to implement a network using such a controller, I would make sure it's totally isolated from the rest of my network. Even then a single point of failure wouldn't make much sense, so I would need two firewalls or routers, and static routing in redundant scenarios breaks sooner or later. You get the picture.
To summarize: dynamic link-status and routing protocols were created for a reason. Don't allow glitzy new-age solutions to daze you, or you just might experience a major headache down the road.


In 2011 I thought we might have to wait a few years before seeing the first products supporting the multiple lookup tables introduced by OpenFlow 1.1. I was wrong about the lack of hardware support for OpenFlow 1.1: the first proof-of-concept products appeared a few months later. Unfortunately that product never became mainstream because the hardware it uses is too expensive; we had to wait till September 2013 to get the first production-grade OpenFlow 1.3 switches (almost all vendors decided to skip OpenFlow versions 1.1 and 1.2).

OPENFLOW 1.1 IN HARDWARE: I WAS WRONG


Earlier this month I wrote "we'll probably have to wait at least a few years before we'll see a full-blown hardware product implementing OpenFlow 1.1" (and probably repeated something along the same lines during the OpenFlow Packet Pushers podcast). I was wrong (and I won't split hairs and claim that an academic proof-of-concept doesn't count). Here it is: @nbk1 pointed me to a 100 Gbps switch implementing the latest-and-greatest OpenFlow 1.1.
The trick lies in the NP-4 network processors from EZchip. These amazing beasts are powerful enough to handle the linked tables required by OpenFlow 1.1; the researchers just had to implement the OpenFlow API and compile OpenFlow TCAM structures into NP-4 microcode.
I have to admit I'm impressed (and as some people know, that's not an easy task). It doesn't matter whether the solution can handle full 100 Gbps or what the pps figures are; they got very far very soon using off-the-shelf hardware, so it shouldn't be impossibly hard to repeat the performance and launch a commercial product. The only question is the price of the NP-4 chipset (including the associated TCAM they were using): can someone build a reasonably priced switch out of that hardware?


Initial hardware OpenFlow implementations installed OpenFlow forwarding rules in TCAM (the specialized memory used to implement packet filters and policy-based routing), resulting in a dismally low maximum number of forwarding entries. Most vendors quickly realized it's possible to combine multiple hardware tables available in their switching silicon, and present them as a single table to an OpenFlow controller.

OPTIMIZING OPENFLOW HARDWARE TABLES


Initial OpenFlow hardware implementations used a simplistic approach: install all OpenFlow entries in TCAM (the hardware that's used to implement ACLs and PBR) and hope for the best.
That approach was good enough to get you a tick-in-the-box on RFP responses, but it fails miserably when you try to get OpenFlow working in a reasonably sized network. On the other hand, many problems people try to solve with OpenFlow, like data center fabrics, involve simple destination-only L2 or L3 switching.
Problems that can be solved with destination-only L2 or L3 switching are so similar to what we're doing with traditional routing protocols that I keep wondering whether it makes sense to reinvent that particular well-working wheel, but let's not go there.
In the last few months the switching hardware vendors realized what the OpenFlow developers were doing and started implementing forwarding optimizations: they would install OpenFlow entries that require 12-tuple matching in TCAM, and entries that specify only a destination MAC address or destination IP prefix in L2 and L3 switching structures (usually hash tables for L2 switching and


some variant of binary tree for L3 switching). The two or three switching tables would appear as a single OpenFlow table to the controller, and the hardware switch would be able to install more flows. Quite ingenious ;)
The vendors using this approach include Arista (L2), Cisco (L2), and Dell Force 10 (L2 and L3). HP is using both the MAC table and the TCAM in its 5900 switch, but presents them as two separate tables to the OpenFlow controller (at least that was my understanding of their documentation; please do correct me if I got it wrong), pushing the optimization challenge back to the controller.
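The classification logic such a switch agent might use is easy to sketch. The following Python fragment is my own illustration of the idea (field names and table names are invented, not any vendor's actual code): entries matching only the destination MAC address go into the L2 hash table, destination-prefix-only entries into the L3 lookup structure, and everything else burns a scarce TCAM entry.

```python
# Hypothetical sketch of hardware-table selection for OpenFlow entries.
L2_ONLY = {"eth_dst"}               # destination-MAC-only match -> L2 hash table
L3_ONLY = {"eth_type", "ipv4_dst"}  # destination-prefix-only match -> L3 lookup table

def classify(match_fields):
    """Return the hardware table that can hold a flow entry with these match fields."""
    fields = set(match_fields)
    if fields <= L2_ONLY:
        return "l2-hash-table"
    if fields <= L3_ONLY:
        return "l3-lpm-table"
    return "tcam"  # anything resembling a 12-tuple match needs TCAM

print(classify({"eth_dst"}))                          # l2-hash-table
print(classify({"eth_type", "ipv4_dst"}))             # l3-lpm-table
print(classify({"ipv4_src", "tcp_dst", "eth_dst"}))   # tcam
```

The controller still sees a single flow table; the switch agent silently spreads the entries across whatever silicon can hold them.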


In spring 2014 most data center switching vendors supported OpenFlow on at least some of their products. Here's an overview documenting the state of the data center switching market in May 2014:

OPENFLOW SUPPORT IN DATA CENTER SWITCHES


Good news: in the last few months, almost all major data center Ethernet switching vendors (Arista, Cisco, Dell Force 10, HP, and Juniper) released documented GA versions of OpenFlow on some of their data center switches.
Bad news: no two vendors have even remotely comparable functionality.
All the information in this blog post comes from publicly available vendor documentation (configuration guides, command references, release notes). NEC is the only vendor mentioned in this blog post that does not have public documentation, so it's impossible to figure out (from the outside) what functionality their switches support.
Some other facts:
- Most vendors offer OpenFlow 1.0. Exceptions: HP and NEC;
- Most vendors have a single OpenFlow lookup table (one of the limitations of OpenFlow 1.0). HP has a single table on the 12500, two tables on the 5900, and a totally convoluted schema on ProCurve switches;
- Most vendors work with a single controller. Cisco's Nexus switches can work with up to 8 concurrent controllers, HP switches with up to 64 concurrent controllers;


- Many vendors optimize the OpenFlow lookup table by installing L2-only or L3-only flow entries in dedicated hardware (which still looks like the same table to the OpenFlow controller);
- OpenFlow table sizes remain dismal. Most switches support low thousands of 12-tuple flows. Exception: NEC edge switches support between 64K and 160K 12-tuple flows;
- While everyone supports full 12-tuple matching (additionally, HP supports IPv6, MPLS, and PBB), almost no one (apart from HP) offers significant packet rewrite functionality. Most vendors can set the destination MAC address or push a VLAN tag; HP's 5900 can set any field in the packet, copy/decrement IP or MPLS TTL, and push VLAN, PBB or MPLS tags.
Summary: it's nigh impossible to implement anything but destination-only L2+L3 switching at scale using existing hardware (the latest chipsets from Intel or Broadcom aren't much better), and I wouldn't want to be a controller vendor dealing with the idiosyncrasies of all the hardware out there: all you can do consistently across most hardware switches is forward packets (without rewrites), drop packets, or set VLAN tags.


Based on the state of OpenFlow support in existing data center switches (see the previous post), it's fair to ask the question: is it realistic to expect multi-vendor OpenFlow deployments? The answer I got in May 2013 was "no, unless you want to live with extremely baseline functionality." The situation wasn't any better in August 2014 when this chapter was last updated.

MULTI-VENDOR OPENFLOW: MYTH OR REALITY?


NEC demonstrated a multi-vendor OpenFlow network @ Interop Las Vegas, linking physical switches from Arista, Brocade, Centec, Dell, Extreme, Intel and NEC, and virtual switches in Linux (OVS) and Hyper-V (PF1000) environments in a leaf-and-spine fabric controlled by the ProgrammableFlow controller (watch the video of Samrat Ganguly demonstrating the network).
Does that mean we've entered the era of multi-vendor OpenFlow networking? Not so fast.
You see, building real-life networks with fast feedback loops and fast failure reroutes is hard. It took NEC years to get a stable, well-performing implementation, and they had to implement numerous OpenFlow 1.0 extensions to get all the features they needed. For example, they circumvented the flow update rate challenges by implementing a very smart architecture effectively equivalent to the Edge+Core OpenFlow ideas.
In a NEC-only ProgrammableFlow network, the edge switches (be they PF5240 GE switches or PF1000 virtual switches in a Hyper-V environment) do all the hard work, while the core switches do simple path forwarding. Rerouting around a core link failure is thus just a matter of path rerouting, not flow rerouting, reducing the number of entries that have to be rerouted by several orders of magnitude.
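A back-of-the-napkin comparison shows why path rerouting is such a big deal. The numbers below are entirely made up for illustration (a fabric with 100,000 granular edge flows and a few dozen core paths), not NEC's figures:

```python
# Fermi estimate: entries to update after a core link failure,
# flow-based (hop-by-hop) vs. Edge+Core path-based forwarding.

edge_flows = 100_000   # assumed granular flows installed at the network edge
core_paths = 64        # assumed edge-to-edge paths crossing the failed link

# Vanilla hop-by-hop OpenFlow: every flow crossing the failed link
# must be reinstalled on every switch along a new path.
flow_based_updates = edge_flows

# Edge+Core: edge flows keep pointing at the same path identifiers;
# only the affected core paths get rerouted.
path_based_updates = core_paths

print(flow_based_updates // path_based_updates)  # prints 1562
```

Three to four orders of magnitude fewer updates is the difference between sub-second reconvergence and a controller grinding away for minutes.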


Figure 4-8: Interop 2013 OpenFlow demo network (source: NEC Corporation of America)

In a mixed-vendor environment, the ProgrammableFlow controller obviously cannot use all the smarts of the PF5240 switches; it has to fall back to the least common denominator (vanilla OpenFlow 1.0) and install granular flows in every single switch along the path, significantly increasing the time it takes to install new flows after a core link failure.


Will multi-vendor OpenFlow get any better? It might: OpenFlow 1.3 has enough functionality to implement the Edge+Core design, but of course there aren't too many OpenFlow 1.3 products out there ... and even the products that have been announced might not have the features the ProgrammableFlow controller needs to scale the OpenFlow fabric.
For the moment, the best advice I can give you is: if you want to have a working OpenFlow data center fabric, stick with a NEC-only solution.


Most traditional data center switching vendors implemented hybrid OpenFlow functionality that
allows an OpenFlow controller to manage individual ports or VLANs instead of the whole switch.
Brocade was probably the first vendor that shipped a working solution (in June 2012).

HYBRID OPENFLOW, THE BROCADE WAY


A few days after Brocade unveiled its SDN/OpenFlow strategy, Katie Bromley organized a phone call with Keith Stewart, who kindly explained to me some of the background behind their current OpenFlow support. Apart from the fact that it runs on the 100GE adapters, the most interesting part is their twist on the hybrid OpenFlow deployment.
The traditional hybrid OpenFlow model (what Keith called "hybrid switch") is well known (and supported by multiple vendors): an OpenFlow-capable switch has two forwarding tables (or FIBs), a regular one (built from source MAC address gleaning or routing protocol information) and an OpenFlow-controlled one. Some ports of the switch use one of the tables, other ports the other. Effectively, a hardware switch supporting hybrid-switch OpenFlow is split into two independent switches that operate in a ships-in-the-night fashion.
More interesting is the second hybrid mode Brocade supports: the hybrid port mode, where the OpenFlow FIB augments the traditional FIB. Brocade's switches using the hybrid port approach can operate in protected or unprotected mode:
Protected hybrid port mode uses the OpenFlow FIB for certain VLANs or packets matching a packet filter (ACL). This mode allows you to run OpenFlow in parallel (ships-in-the-night) with the


traditional forwarding over the same port: a major win if you're not willing to spend money on two 100GE ports (one for OpenFlow traffic, another for regular traffic).
Unprotected hybrid port mode performs a lookup in the OpenFlow FIB first and uses the traditional FIB as a fallback mechanism (in case there's no match in the OpenFlow table). This mode can be used to augment the traditional forwarding mechanisms (example: OpenFlow-controlled PBR) or create value-added services on top of (not in parallel with) the traditional network.
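The unprotected lookup order is easy to model. The following Python sketch is my own illustration of the two-stage lookup (addresses, actions and field names are invented, not Brocade's implementation): the OpenFlow FIB is consulted first, and the traditional FIB catches anything the controller hasn't claimed.

```python
# Toy model of "unprotected hybrid port" forwarding:
# OpenFlow FIB first, traditional FIB as fallback on a miss.

def lookup(packet, openflow_fib, traditional_fib):
    action = openflow_fib.get(packet["dst"])
    if action is not None:
        return action                                   # OpenFlow entry wins
    return traditional_fib.get(packet["dst"], "drop")   # fallback to regular forwarding

of_fib = {"10.0.0.1": "redirect-to-analyzer"}           # PBR-style override
trad_fib = {"10.0.0.1": "port-1", "10.0.0.2": "port-2"}

print(lookup({"dst": "10.0.0.1"}, of_fib, trad_fib))    # redirect-to-analyzer
print(lookup({"dst": "10.0.0.2"}, of_fib, trad_fib))    # port-2
```

The value of this mode is exactly that the OpenFlow controller only has to install the exceptions; everything else keeps flowing through the time-tested traditional control plane.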

The set of applications one can build with hybrid OpenFlow is well known, from policy-based routing and traffic engineering to bandwidth-on-demand. However, Brocade MLX has one more trick up its sleeve: it supports packet replication actions that can be used to implement behavior similar to IP multicast or SPAN port functionality. You can use that feature in environments that need reliable packet delivery over UDP to increase the chance that at least a single copy of the packet will reach the destination.
I like the hybrid approach Brocade took (it's quite similar to what Juniper is doing with its integrated OpenFlow) and the interesting new features (like the packet replication), but the big question remains unanswered: where are the applications (aka OpenFlow controllers)? At the moment, everyone (Brocade included) is partnering with NEC or demoing their gear with public-domain controllers. Is this really the best the traditional networking vendors can do? I sincerely hope not.


Is Open Daylight the right answer to the controller wars that seemed inevitable in early 2013? Here's my take (written in February 2013):

OPEN DAYLIGHT: INTERNET EXPLORER OR LINUX OF THE SDN WORLD?
You've probably heard that the networking hardware vendors decided to pool resources to create an open-source OpenFlow controller. Just in case you're wondering whether they lost their minds (no, they didn't), here's my cynical take.
Are you old enough to remember how Microsoft killed the browser market? After the World Wide Web exploded (and caught Microsoft totally unprepared), there was a blooming browser market (with Netscape being the absolute market leader). Microsoft couldn't compete in that market with an immature product (Internet Explorer) and decided it's best to destroy the market. They made Internet Explorer freely available, and the rest is history: after the free product won the browser wars (it's hard to beat free and good enough), it took years for reasonable alternatives to emerge. Not surprisingly, browser innovation almost stopped until Internet Explorer lost its dominant market position.
Even if you don't remember Netscape Navigator, you've probably heard of Linux. Have you ever wondered how you could get a high-quality open-source operating system for free? Check the list of top Linux contributors (pages 9-11 of the Linux Kernel Development report): Red Hat, Intel, Novell and IBM. You might wonder why Intel and IBM invest in Linux. It's simple: the less users have to


pay for the operating system, the more money will be left to buy hardware. For more details, you absolutely have to read "Be Wary of Geeks Bearing Gifts" by Simon Wardley.
So what will Daylight be? Another Internet Explorer (killing the OpenFlow controller market, Big Switch in particular) or another Linux (a good product ensuring OpenFlow believers continue spending money on hardware, not software)? I'm hoping we'll get a robust networking Linux, but your guess is as good as mine.


OPENFLOW SCALABILITY CHALLENGES

An architecture in which a central controller runs the control plane and uses attached devices as pure forwarding elements has numerous scalability challenges, including:
- The flow-based forwarding paradigm doesn't scale;
- The hop-by-hop forwarding paradigm imposes significant overhead in large-scale networks; the path forwarding paradigm works much better;
- Existing hardware (data center switches) supports low thousands of full OpenFlow entries, making it useless for large-scale deployments;
- Existing hardware switches can install at most a few thousand new flow entries per second;
- Data plane punting and packet forwarding to the controller in existing switches is extremely slow when compared to the regular data plane forwarding performance.

MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:
- Start with the SDN, OpenFlow and NFV Resources page;
- Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
- Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);
- The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;
- Finally, I'm always available for short online or on-site consulting engagements.


This chapter describes numerous challenges every OpenFlow controller implementation has to overcome to work well in large-scale environments. Use it as a (partial) checklist when evaluating OpenFlow controller products and solutions.

IN THIS CHAPTER:
OPENFLOW FABRIC CONTROLLERS ARE LIGHT-YEARS AWAY FROM WIRELESS ONES
OPENFLOW AND FERMI ESTIMATES
50 SHADES OF STATEFULNESS
FLOW TABLE EXPLOSION WITH OPENFLOW 1.0 (AND WHY WE NEED OPENFLOW 1.3)
FLOW-BASED FORWARDING DOESN'T WORK WELL IN VIRTUAL SWITCHES
PROCESS, FAST AND CEF SWITCHING AND PACKET PUNTING
CONTROLLER-BASED PACKET FORWARDING IN OPENFLOW NETWORKS
CONTROL-PLANE POLICING IN OPENFLOW NETWORKS
PREFIX-INDEPENDENT CONVERGENCE (PIC): FIXING THE FIB BOTTLENECK
FIB UPDATE CHALLENGES IN OPENFLOW NETWORKS


FORWARDING STATE ABSTRACTION WITH TUNNELING AND LABELING


EDGE AND CORE OPENFLOW (AND WHY MPLS IS NOT NAT)
EDGE PROTOCOL INDEPENDENCE: ANOTHER BENEFIT OF EDGE-AND-CORE LAYERING
VIRTUAL CIRCUITS IN OPENFLOW 1.0 WORLD
MPLS IS NOT TUNNELING
WHY IS OPENFLOW FOCUSED ON L2-4?
DOES CPU-BASED FORWARDING PERFORMANCE MATTER FOR SDN?
OPENFLOW AND THE STATE EXPLOSION


OpenFlow controllers are usually compared with wireless controllers (particularly when someone tries to prove that they're a good idea). Nothing could be further from the truth.

OPENFLOW FABRIC CONTROLLERS ARE LIGHT-YEARS AWAY FROM WIRELESS ONES
When talking about OpenFlow and the whole idea of controller-based networking, people usually say "well, it's nothing radically new, we've been using wireless controllers for years and they work well, so the OpenFlow ones will work as well."
Unfortunately, the comparison is totally misleading.
While OpenFlow-based data center fabrics and wireless controller-based networks look very similar on a high-level PowerPoint diagram, in reality they're light-years apart. Here are just a few dissimilarities that make OpenFlow-based fabrics so much more complex than wireless controllers.

TOPOLOGY MANAGEMENT
Wireless controllers work with the devices on the network edge. A typical wireless access point has two interfaces: a wireless interface and an Ethernet uplink, and the wireless controller isn't managing the Ethernet interface or any control-plane protocols that interface might have to run. The wireless access point communicates with the controller through an IP tunnel and expects someone


else to provide IP connectivity, routing and failure recovery. The underlying physical topology of the
network is thus totally abstracted and invisible to the wireless controller.
Data center fabrics are built from high-speed switches with tens of 10/40GE ports, and the
OpenFlow controller must manage topology discovery, topology calculation, flow placement, failure
detection and fast rerouting. There are zillions of things you have to do in data center fabrics that
you never see in a controller-based wireless network.

TRAFFIC FLOW
In traditional wireless networks all traffic flows through the controller (there are some exceptions, but let's ignore them for the moment). The hub-and-spoke tunnels between the controller and the individual access points carry all the user traffic, and the controller makes all the smart forwarding decisions.
In an OpenFlow-based fabric the controller should make a minimal number of data-plane decisions (ideally: none), because every time you have to punt packets to the controller, you reduce the overall network performance (not to mention the dismal capabilities of today's switches when they have to do CPU-based packet forwarding across an SSL session).

AMOUNT OF TRAFFIC
Wireless access points handle megabits of traffic, making hub-and-spoke controller-based forwarding a viable alternative.


Data center fabrics are usually multi-terabit structures (every single pizza-box ToR switch has over a terabit of forwarding capacity): three to four orders of magnitude faster than the wireless networks we're comparing them with. Controller-based forwarding is totally unrealistic.

FORWARDING INFORMATION
In a traditional controller-based wireless network, the access point forwarding is totally stupid: the access points forward the data between directly connected clients (if allowed to do so) or send the data received from them into the IP tunnel established with the controller (and vice versa). There's no forwarding state to distribute; all an access point needs to know are the MAC addresses of the wireless clients.
In an OpenFlow-based fabric the controller must distribute as much forwarding, filtering and rewriting (example: decrease TTL) information as possible to the OpenFlow-enabled switches to minimize the amount of traffic flowing through the controller.
Furthermore, smart OpenFlow controllers build forwarding information in a way that allows the switches to cope with link failures (the controller has to install backup entries with lower matching priority); you wouldn't want to have an overloaded controller and a burnt-out switch CPU every time a link goes down, network topology is lost, and the switch (in deep panic) forwards all the traffic to the controller.
The functionality of a good OpenFlow controller that proactively pre-programs backup forwarding entries (example: NEC ProgrammableFlow) is very similar to MPLS Traffic Engineering with Fast Reroute; you cannot expect its complexity to be significantly lower than that.
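"Backup entries with lower matching priority" can be illustrated with a toy model like the one below. Everything here is invented for illustration (prefixes, priorities, port names), and I'm assuming the switch skips entries whose output port is down; later OpenFlow versions solve the same problem more cleanly with fast-failover groups:

```python
# Toy model: two pre-installed entries for the same match, the backup
# at lower priority, so the switch can fail over locally without
# punting anything to the controller.

flow_table = [
    {"match": "10.1.0.0/16", "priority": 200, "out_port": "uplink-1"},  # primary
    {"match": "10.1.0.0/16", "priority": 100, "out_port": "uplink-2"},  # backup
]

def best_entry(prefix, ports_up):
    """Highest-priority entry for a prefix whose output port is still alive."""
    candidates = [e for e in flow_table
                  if e["match"] == prefix and e["out_port"] in ports_up]
    return max(candidates, key=lambda e: e["priority"])

print(best_entry("10.1.0.0/16", {"uplink-1", "uplink-2"})["out_port"])  # uplink-1
print(best_entry("10.1.0.0/16", {"uplink-2"})["out_port"])              # uplink-2
```

The controller pays the price up front (computing and installing twice the state), in exchange for failover that doesn't depend on the controller being reachable at the worst possible moment.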


REAL-TIME EVENTS
User roaming is the only real-time event in a controller-based wireless network (remember: access point uplink failure is not handled by the controller). Access points do most of the work on their own (the expected behavior is specified in IEEE standards anyway), and the controller just updates the MAC forwarding information. The worst thing that can happen if the controller is too slow is a slight delay experienced by the user (noticeable only on voice calls and by players of WoW running around large buildings).
The other near-real-time wireless event is user authentication, which often takes seconds (or my wireless network is severely misconfigured). Yet again, nothing critical; the controller can take its time.
In data center fabrics, you have to react to a failure in milliseconds and reprogram the forwarding entries on tens of switches (unless you know what you're doing and have already installed the pre-computed backup entries; see above).

FREQUENCY OF REAL-TIME EVENTS


Wireless controllers probably handle between tens and a few hundred real-time events per second (unless you had a power glitch and every user wants to log into the network at the same time).
OpenFlow controllers that implement flow-based forwarding (flow entries are downloaded into the switches for each individual TCP/UDP session, a patently bad idea if I ever saw one) are designed to handle millions of flow setups per second (not that the physical switches could take that load).


SUMMARY
As you can see, wireless controllers have nothing to do with OpenFlow controllers; they aren't even remotely similar in requirements or complexity (the only exception being OpenFlow controllers that program just the network edge, like Nicira's NVP).
Comparing the two is misleading and hides the real scope of the problem; no wonder some people would love you to believe otherwise, because that makes selling controller-based fabrics easier.
In reality, an OpenFlow controller managing a physical data center fabric is a complex piece of real-time software, as anyone who tried to build a high-end switch or router has learned the hard way.


Before going into the details of OpenFlow scalability challenges, let's try to estimate the size of the problem we're dealing with.

OPENFLOW AND FERMI ESTIMATES


Fast advances in networking technologies (and the pixie dust sprinkled on them) blinded us: we lost our gut feeling and rules-of-thumb. Guess what: contrary to what we love to believe, networking isn't unique. Physicists faced the same challenge for a long time; one of them was so good that they named the whole problem category after him.
Every time someone tries to tell you what your problem is, and how their wonderful new gizmo will solve it, it's time for another Fermi estimate.
Let's start with a few examples.
Data center bandwidth. A few weeks ago a clueless individual working for a major networking vendor wrote a blog post (which unfortunately got pulled before I could link to it) explaining how network virtualization differs from server virtualization because we don't have enough bandwidth in the data center. A quick estimate shows a few ToR switches have all the bandwidth you usually need (you might need more due to traffic bursts and the number of server ports you have to provide, but that's a different story).
VM mobility for disaster avoidance needs. A back-of-the-napkin calculation shows you can't evacuate more than half a rack per hour over a 10GE link. The response I usually get when I prod networking engineers into doing the calculation: "OMG, that's just hilarious. Why would anyone want to do that?"


And now for the real question that triggered this blog post: some people still think we can
implement stateful OpenFlow-based network services (NAT, FW, LB) in hardware. How realistic is
that?
Scenario: web application(s) hosted in a data center with 10GE WAN uplink.
Questions:

- How many new sessions are established per second (how many OpenFlow flows does the controller have to install in the hardware)?
- How many parallel sessions will there be (how many OpenFlow flows does the hardware have to support)?
Facts (these are usually the hardest to find):

1. The size of an average web page is ~1 MB
2. An average web page loads in ~5 seconds
3. An average web page uses ~20 domains
4. An average browser can open up to 6 sessions per hostname
Using facts #3 and #4 we can estimate the total number of sessions needed for a single web page. It's anywhere between 20 and 120; let's be conservative and use 20.
Using fact #1 and the previous result, we can estimate the amount of data transferred over a typical HTTP session: ~50 KB.
Assuming a typical web page takes 5 seconds to load, a typical web user receives 200 KB/second (1.6 Mbps) over 20 sessions, or 10 KB/second (80 kbps) per session. Seems low, but do remember that most of the time the browser (or the server) waits due to RTT latency and TCP slow start issues.


Assuming a constant stream of users with these characteristics, we get 125.000 new sessions over a 10GE link every 5 seconds, or 25.000 new sessions per second per 10 Gbps.
Always do a reality check. Is this number realistic? Load balancing vendors support way more connections per second (cps) @ 10 Gbps speeds. F5 BIG-IP 4000s claims 150K cps @ 10 Gbps, and VMware claims its NSX Edge Services Router (improved vShield Edge) will support 30K cps @ 4 Gbps. It seems my guesstimate is on the lower end of reality (if you have real-life numbers, please do share them in the comments!).
Modern web browsers use persistent HTTP sessions. Browsers want to keep sessions established as long as possible; web servers serving high-volume content commonly drop them after ~15 seconds to reduce the server load (Apache is notoriously bad at handling a very high number of concurrent sessions). 25.000 cps x 15 seconds = 375.000 flow records.
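The whole estimate condenses into a few lines of Python; the constants are the rules of thumb listed above (assumptions, not measurements):

```python
# Fermi estimate of flow-table load on a 10GE uplink. All inputs are the
# rough rules of thumb quoted in the text, not measured values.

PAGE_SIZE_KB = 1000        # ~1 MB per average web page
PAGE_LOAD_SECONDS = 5      # ~5 s to load a page
SESSIONS_PER_PAGE = 20     # conservative estimate (20..120 sessions per page)
LINK_BPS = 10e9            # 10GE WAN uplink
KEEPALIVE_SECONDS = 15     # typical server-side persistent-session timeout

# Per-user throughput while a page loads (~1600 kbps)
user_kbps = PAGE_SIZE_KB * 8 / PAGE_LOAD_SECONDS
# Number of concurrent users the link can sustain
concurrent_users = LINK_BPS / (user_kbps * 1000)

# Each user opens SESSIONS_PER_PAGE new sessions every PAGE_LOAD_SECONDS
new_sessions_per_second = concurrent_users * SESSIONS_PER_PAGE / PAGE_LOAD_SECONDS
# Flow records alive at any moment, given the keepalive timeout
concurrent_flows = new_sessions_per_second * KEEPALIVE_SECONDS

print(f"{concurrent_users:.0f} concurrent users")
print(f"{new_sessions_per_second:.0f} new sessions per second")
print(f"{concurrent_flows:.0f} concurrent flow records")
```

Change the sessions-per-page or keepalive assumptions and the flow-table requirements move by the same factor, which is exactly what makes Fermi estimates useful.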
Trident-2-based switches can handle 100K+ L4 OpenFlow entries (at least Big Switch claimed so when we met @ NFD6). That's definitely on the low end of the required number of sessions at 10 Gbps; do keep in mind that the total throughput of a typical Trident-2 switch is above 1 Tbps, or two orders of magnitude higher. Enterasys switches support 64M concurrent flows @ 1 Tbps, which seems to be enough.
The flow setup rate on Trident-2-based switches is supposedly still in the low thousands, or an order of magnitude too low to support even a single 10 Gbps link (the switches based on this chipset usually have 64 10GE interfaces).
Now is the time for someone to invoke the ultimate Moore's Law spell and claim that the hardware will support whatever number of flow entries in the not-so-distant future. Good luck with that; I'll settle for an Intel Xeon server that can be pushed to 25 Mpps. OpenFlow has its uses, but large-scale stateful services are obviously not one of them.


State kept by networking devices is obviously one of the factors impacting scalability. Let's see how much state we might need, how we can reduce the amount of state kept in a device, and how we can get rid of real-time state changes.

50 SHADES OF STATEFULNESS
A while ago Greg Ferro wrote a great article describing the integration of overlay and physical networks in which he wrote that "an overlay network tunnel has no state in the physical network", triggering an almost-immediate reaction from Marten Terpstra (of RIPE fame, now @ Plexxi) arguing that the network (at least the first ToR switch) knows the MAC and IP address of the hypervisor host and thus has at least some state associated with the tunnel.
Marten is correct from a purely scholastic perspective (using his argument, the network keeps some state about TCP sessions as well), but what really matters is how much state is kept, which device keeps it, how it's created, and how often it changes.

HOW MUCH STATE DOES A DEVICE KEEP?


The end hosts have to keep state for every single TCP and UDP session, but most transit network devices (apart from abominations like NAT) don't care about those sessions, making the Internet as fast as it is.


Decades ago we had a truly reliable system that kept session state in every single network node; it never lost a packet, but it barely coped with 2 Mbps links (the old-timers might remember it as X.25).
The state granularity should get ever coarser as you go deeper into the network core: edge switches keep MAC address tables and ARP/ND caches of adjacent end hosts, core routers know about IP subnets, routers in the public Internet know about the publicly advertised prefixes (including every prefix Bell South ever assigned to one of its single-homed customers), while the high-speed MPLS routers know about BGP next hops and other forwarding equivalence classes (FECs).

WHICH DEVICE KEEPS THE STATE


A well-designed architecture has complexity (and state) concentrated at the network edge. The core devices keep minimal state (example: IP subnets), while the edge devices keep session state. In a virtual network, the hypervisors should know the VM endpoints (MAC addresses, IP addresses, virtual segments) and the physical devices just the hypervisor IP addresses, not the other way around.
Furthermore, as much state as possible should be stored in low-speed devices using software-based forwarding. It's pretty simple to store a million flows in a software-based Open vSwitch (updating them is a different story) and mission impossible to store 10.000 5-tuple flows in the Trident 2 chipset used by most ToR switches.


HOW IS STATE CREATED


Systems with control-plane (proactive) state creation (example: routing table built from routing protocol information) are always more scalable than systems that have to react to data-plane events in real time (example: MAC address learning or NAT table maintenance).
Data-plane-driven state is particularly problematic for devices with hardware forwarding: packets that change state (example: TCP SYN packets creating new NAT translations) might have to be punted to the CPU.
Finally, there are the "soft state" cases where the protocol designers needed state in the network but didn't want to create a proper protocol to maintain it, so the end devices get burdened with periodic state refresh messages, and the transit devices spend CPU cycles refreshing the state. RSVP is a typical example, and everyone running large-scale MPLS/TE networks simply "loves" the periodic refresh messages sent by tunnel head-ends: they keep the core routers processing them cozily warm.

HOW OFTEN DOES STATE CHANGE


Devices with slow-changing state (example: BGP routers) are clearly more stable than devices with
fast-changing state (example: Carrier-Grade NAT).


SUMMARY
Whenever you're evaluating a network architecture or reading a vendor whitepaper describing a next-generation unicorn-tears-blessed solution, try to identify how much state the individual components keep, how it's created, and how often it changes. Hardware devices storing plenty of state tend to be complex and expensive (keep that in mind when evaluating the next application-aware fabric).
Not surprisingly, RFC 3439 (Some Internet Architectural Guidelines and Philosophy) gives you similar advice, although in a way more eloquent form.


In the initial What is OpenFlow blog post I mentioned multi-table support and why it's crucial for a scalable OpenFlow implementation. It took me almost two years to write a follow-up blog post explaining the scalability problems of OpenFlow 1.0.

FLOW TABLE EXPLOSION WITH OPENFLOW 1.0 (AND WHY WE NEED OPENFLOW 1.3)
The number of flows in hardware switches (dictated by the underlying TCAM size) is one of the
major roadblocks in a large-scale OpenFlow deployment. Vendors are supposedly making progress,
with Intel claiming up to 4000 12-tuple flow entries in their new Ethernet Switch FM6700 series. Is
that good enough? As always, it depends.
First, let's put the 4000-flow number in perspective. It's definitely a bit better than what current commodity switches can do (for vendors trying to keep mum about their OpenFlow limitations, check their ACL sizes: flow entries would use the same TCAM), but NEC had 64.000+ flows on the PF5240 years ago and Enterasys has 64 million flows per box with their CoreFlow2 technology. Judge for yourself whether 4000 flows is such a major step forward.
Now let's focus on whether 4000 flows are enough. As always, the answer depends on the use case, network size, and implementation details. This blog post will focus on the last part.


USE CASE: DATA CENTER FABRIC


The simplest possible data center use case is a traditional (non-virtualized) data center network
implemented with OpenFlow (similar to what NEC is doing with their Virtual Tenant Networks).
An OpenFlow-based network trying to get feature parity with low-cost traditional ToR switches should support:

- Layer-2 and layer-3 forwarding;
- Per-port or per-MAC ingress and egress access lists.

We'll focus on a single layer-2 segment (you really don't want to get me started on the complexities of scalable OpenFlow-based layer-3 forwarding) implemented on a single hardware switch. Our segment will have two web servers (ports 1 and 2), a MySQL server (port 3), and a default gateway on port 4.
The default gateway could be a firewall, a router, or a load balancer; it really doesn't matter as long as we stay focused on layer-2 forwarding.

STEP 1: SIMPLE MAC-BASED FORWARDING


The OpenFlow controller has to install a few forwarding rules in the switch to get the traffic started. Ignoring the multi-tenancy requirements, you need a single flow forwarding rule per destination MAC address:


Flow match        Action
DMAC = Web-1      Forward to port 1
DMAC = Web-2      Forward to port 2
DMAC = MYSQL-1    Forward to port 3
DMAC = GW         Forward to port 4

It seems we don't need much TCAM: one flow entry per destination MAC address.

Smart switches wouldn't store the MAC-only flow rules in TCAM; they would use other forwarding structures available in the switch, like MAC hash tables.

STEP 2: MULTI-TENANT INFRASTRUCTURE


If you want to implement multi-tenancy, you need multiple forwarding tables (like VRFs), which are not available in OpenFlow 1.0, or you have to add the tenant ID to the existing forwarding table. Traditional switches would do it in two steps:

- Mark inbound packets with VLAN tags;
- Perform packet forwarding based on destination MAC address and VLAN tag.


Switches using the OpenFlow 1.0 forwarding model cannot perform more than one lookup during the packet forwarding process: they must match the input port and the destination MAC address in a single flow rule, resulting in a flow table similar to this one:

Flow match                        Action
SrcPort = Port 2, DMAC = Web-1    Forward to port 1
SrcPort = Port 3, DMAC = Web-1    Forward to port 1
SrcPort = Port 4, DMAC = Web-1    Forward to port 1
SrcPort = Port 1, DMAC = Web-2    Forward to port 2
SrcPort = Port 3, DMAC = Web-2    Forward to port 2
SrcPort = Port 4, DMAC = Web-2    Forward to port 2

The number of TCAM entries needed to support multi-tenant layer-2 forwarding has exploded: you now need a flow entry for every (ingress port, destination MAC) combination.


STEP 3: ACCESS LISTS


Let's assume we want to protect the web servers with an input (server-to-switch) port ACL, which would look similar to this one:

Flow match                        Action
TCP SRC = 80                      Permit
TCP SRC = 443                     Permit
TCP DST = 53 & IP DST = DNS       Permit
TCP DST = 25 & IP DST = Mail      Permit
TCP DST = 3306 & IP DST = MySql   Permit
Anything else                     Drop

By now you've probably realized what happens when you try to combine the input ACL with the other forwarding rules. The OpenFlow controller has to generate a Cartesian product of all three requirements: the switch needs a flow entry for every possible combination of input port, ACL entry, and destination MAC address.
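The explosion is easy to quantify with a quick sketch; the port, ACL, and MAC counts below are illustrative assumptions, not data from any particular switch:

```python
# Illustrative flow-table sizing for the three steps above. The port, ACL
# and MAC counts are assumptions chosen for a typical ToR switch, not
# measurements from any real device.

ports = 48          # ToR switch ports
acl_entries = 6     # lines in the per-port input ACL
macs = 47           # destination MACs reachable from each ingress port

mac_only = macs                            # step 1: one rule per destination MAC
port_and_mac = ports * macs                # step 2: (ingress port, DMAC) pairs
full_product = ports * acl_entries * macs  # step 3: Cartesian product

print(mac_only, port_and_mac, full_product)
```

Even with this modest setup the required TCAM space grows from dozens of entries to over ten thousand; a longer ACL multiplies the total yet again.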


OPENFLOW 1.3 TO THE RESCUE


Is the situation really as hopeless as illustrated above? Of course not: smart people trying to implement real-life OpenFlow solutions quickly realized that OpenFlow 1.0 works well only in PPTs, lab tests, PoCs, and glitzy demos, and started working on a solution.
OpenFlow 1.1 (and later versions) has the concept of tables: independent lookup tables that can be chained in any way you wish (further complicating the life of hardware vendors).
This is how you could implement our requirements with switches supporting OpenFlow 1.3:

- Table #1: the ACL and tenant classification table. This table would match input ports (for tenant classification) and ACL entries, drop the packets not matched by input ACLs, and redirect the forwarding logic to the correct per-tenant table.
- Tables #2 .. #n: per-tenant forwarding tables, matching destination MAC addresses and specifying output ports.

The first table could be further optimized in networks using the same (overly long) access list on numerous ports. That decision could also be made dynamically by the OpenFlow controller.

A typical switch would probably have to implement the first table with a TCAM. All the other tables could use the regular MAC forwarding logic (the MAC forwarding table is usually orders of magnitude bigger than the TCAM). Scalability problem solved.
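A quick sketch with illustrative, assumed counts shows how the multi-table pipeline turns the Cartesian product into a sum:

```python
# Comparison of TCAM usage: OpenFlow 1.0 single table vs. an OpenFlow 1.3
# multi-table pipeline. All counts are illustrative assumptions, not data
# from any real switch.

ports, acl_entries, tenants, macs_per_tenant = 48, 6, 4, 47

# OpenFlow 1.0: everything collapses into one TCAM table
single_table_tcam = ports * acl_entries * macs_per_tenant

# OpenFlow 1.3: TCAM holds only the ACL/tenant classification table;
# per-tenant MAC forwarding moves into cheap MAC hash tables
multi_table_tcam = ports * acl_entries
mac_table_entries = tenants * macs_per_tenant

print(single_table_tcam, multi_table_tcam, mac_table_entries)
```

The expensive TCAM requirement drops by the size of the MAC table factor, which is exactly why the multi-table pipeline fits into existing hardware.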
Summary: buy switches and controllers that support OpenFlow 1.3.


BUT THERE ARE NO OPENFLOW 1.3-COMPLIANT SWITCHES ON THE MARKET

Not true anymore. NEC is shipping OpenFlow 1.3 on their ProgrammableFlow switches, as does HP on its 5900- and 12500-series switches.

CAN WE STILL USE OPENFLOW 1.0 SWITCHES?


Of course you can: either make sure the use case is small enough that the Cartesian product of your independent requirements fits into the existing TCAM, or figure out which vendors have table-like extensions to OpenFlow 1.0 (hint: NEC does, or their VTN wouldn't work in real-life networks).


After you spend a few minutes researching the data sheets of existing OpenFlow-capable switches from major networking vendors, it becomes painfully obvious that flow-based forwarding makes no sense on hardware switching platforms. Surprisingly, the virtual switches aren't much better.

FLOW-BASED FORWARDING DOESN'T WORK WELL IN VIRTUAL SWITCHES
I hope it's obvious to everyone by now that flow-based forwarding doesn't work well in existing hardware. Switches designed for a large number of flow-like forwarding entries (NEC ProgrammableFlow switches, Enterasys data center switches, and a few others) might be an exception, but even they can't cope with the tremendous flow update rate required by reactive (flow-by-flow) flow setup ideas.
One would expect virtual switches to fare better. That doesn't seem to be the case.

A FEW DEFINITIONS FIRST


Flow-based forwarding is sometimes defined as forwarding of individual transport-layer sessions (sometimes also called microflows). Numerous failed technologies are pretty good proof that this approach doesn't scale.
Other people define flow-based forwarding as anything that is not destination-address-only forwarding. I don't really understand how this definition differs from an MPLS Forwarding Equivalence Class (FEC), or why we need a new confusing term.


MICROFLOW FORWARDING IN OPEN VSWITCH


Initial versions of Open vSwitch were a prime example of an ideal microflow-based forwarding architecture: the in-kernel forwarding module performed microflow forwarding and punted all unknown packets to the user-mode daemon.
The user-mode daemon would then perform a packet lookup (using OpenFlow forwarding entries or any other forwarding algorithm) and install a microflow entry for the newly discovered flow in the kernel module.

Third parties (example: Midokura MidoNet) use the Open vSwitch kernel module in combination with their own user-mode agent to implement non-OpenFlow forwarding architectures.

If you're old enough to remember the Catalyst 5000, you're probably getting unpleasant flashbacks of NetFlow switching... but the problems we experienced with that solution must have been caused by poor hardware and an underperforming CPU, right? Well, it turns out virtual switches don't fare much better.
Digging deep into the bowels of Open vSwitch reveals an interesting behavior: flow eviction. Once the kernel module hits the maximum number of microflows, it starts throwing out old flows. Makes perfect sense (after all, that's how every caching system works) until you realize the default limit is 2500 microflows, which is barely good enough for a single web server and definitely orders of magnitude too low for a hypervisor hosting 50 or 100 virtual machines.
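A toy LRU cache makes the thrashing obvious; the class below is a deliberately simplified model, not the actual Open vSwitch eviction code:

```python
# Toy model of a fixed-size microflow cache with LRU eviction, showing why a
# 2500-entry cache thrashes when a host juggles 10.000 concurrent sessions.
# This is a sketch for illustration, not the real OVS datapath logic.
from collections import OrderedDict
import random

class MicroflowCache:
    def __init__(self, limit=2500):
        self.limit, self.flows, self.misses = limit, OrderedDict(), 0

    def lookup(self, flow):
        if flow in self.flows:
            self.flows.move_to_end(flow)      # refresh LRU position
            return True
        self.misses += 1                      # punt to the user-mode daemon
        self.flows[flow] = True
        if len(self.flows) > self.limit:
            self.flows.popitem(last=False)    # evict the oldest flow
        return False

random.seed(42)
cache = MicroflowCache(limit=2500)
sessions = list(range(10_000))                # 10k live sessions on the host
for _ in range(100_000):                      # packets arrive in random order
    cache.lookup(random.choice(sessions))

print(f"miss rate: {cache.misses / 100_000:.0%}")
```

With four times more live sessions than cache entries, roughly three out of four packets miss the cache and take the slow path, no matter how clever the eviction policy is.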


WHY, OH WHY?
The very small microflow cache size doesn't make any obvious sense. After all, web servers easily handle 10.000 sessions, and some Linux-based load balancers handle an order of magnitude more sessions per server. While you can increase the default cache size, one is bound to wonder what the reason for the dismally low default value is.
I wasn't able to figure out the underlying root cause, but I suspect it has to do with per-flow accounting: flow counters have to be transferred from the kernel module to the user-mode daemon periodically. Copying hundreds of thousands of flow counters over a kernel-to-user socket at short intervals might result in somewhat noticeable CPU utilization.

HOW CAN YOU FIX IT?


Isn't it obvious? You drop the whole notion of microflow-based forwarding and do things the traditional way. OVS moved in this direction with release 1.11, which implemented megaflows (coarser OpenFlow-like forwarding entries) in the kernel module, and moved flow eviction from the kernel to the user-mode OpenFlow agent (which makes perfect sense, as kernel forwarding entries then almost exactly match user-mode OpenFlow entries).
Not surprisingly, no other virtual switch uses microflow-based forwarding. VMware vSwitch, Cisco's Nexus 1000V, and IBM's 5000V make forwarding decisions based on destination MAC addresses, Hyper-V and Contrail on destination IP addresses, and even VMware NSX for vSphere uses the distributed vSwitch and an in-kernel layer-3 forwarding module.


After establishing the size of the problem, let's move on to the first scalability obstacle: controller-based packet forwarding. A review of existing network platform behavior (Cisco IOS) might help you understand the challenges of large-scale OpenFlow implementations.

PROCESS, FAST AND CEF SWITCHING AND PACKET PUNTING
Process switching is the oldest, simplest and slowest packet forwarding mechanism in Cisco IOS.
Packets received on an interface trigger an interrupt, the interrupt handler identifies the layer-3
protocol based on layer-2 packet headers (example: Ethertype in Ethernet packets) and queues the
packets to (user mode) packet forwarding processes (IP Input and IPv6 Input processes in Cisco
IOS).
Once the input queue of a packet forwarding process becomes non-empty, the operating system
schedules it. When there are no higher-priority processes ready to be run, the operating system
performs a context switch to the packet forwarding process.
When the packet forwarding process wakes up, it reads the next entry from its input queue,
performs destination address lookup and numerous other functions that might be configured on
input and output interfaces (NAT, ACL ...), and sends the packet to the output interface queue.
Not surprisingly, this mechanism is exceedingly slow... and Cisco IOS is not the only operating system struggling with it: just ask anyone who has tried to run high-speed VPN tunnels implemented in Linux user-mode processes on SOHO routers.


Interrupt switching (packet forwarding within the interrupt handler) is much faster, as it doesn't involve context switching and potential process preemption. There's a gotcha, though: if you spend too much time in an interrupt handler, the device becomes non-responsive, starts adding unnecessary latency to forwarded packets, and eventually starts dropping packets due to receive queue overflows. (You don't believe me? Configure debug all on the console interface of a Cisco router.)
There's not much you can do to speed up ACLs (which have to be read sequentially), and NAT is usually not a big deal (assuming the programmers were smart enough to use hash tables). Destination address lookup might be a real problem, more so if you have to do it numerous times (example: the destination is a BGP route with a BGP next hop based on a static route with a next hop learned from OSPF). Welcome to fast switching.
Fast switching is a reactive, cache-based IP forwarding mechanism. The address lookup within the interrupt handler uses a cache of destinations to find the IP next hop, outgoing interface, and outbound layer-2 header. If the destination is not found in the fast switching cache, the packet is punted to the IP(v6) Input process, which eventually performs a full-blown destination address lookup (including ARP/ND resolution) and stores the results in the fast switching cache.
Fast switching worked great two decades ago (there were even hardware implementations of fast switching)... until the bad guys started spraying the Internet with vulnerability scans. No caching code works well with miss rates approaching 100% (because every packet is sent to a different destination) and very high cache churn (because nobody designed the cache to hold 100.000 or more entries).
When faced with simple host scanning activity, routers using fast switching in combination with a high number of IP routes (read: Internet core routers) experienced severe brownouts, because most

of the received packets had destination addresses that were not yet in the fast switching cache, and so the packets had to be punted to process switching. Welcome to CEF switching.
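A minimal sketch (not the actual fast-switching code) shows the failure mode under a scan:

```python
# Toy model of a reactive route cache facing an address scan: every packet
# targets a fresh destination, so every lookup misses and gets punted to the
# slow path. The dictionary-as-cache below is a sketch for illustration only.

route_cache = {}                  # reactive: populated on demand
punts = 0

for host in range(100_000):       # scan hits 100.000 unique destinations
    dst = f"10.{host >> 16}.{(host >> 8) & 255}.{host & 255}"
    if dst not in route_cache:    # cache miss -> punt to process switching
        punts += 1
        route_cache[dst] = "cached entry"

print(f"{punts} of 100000 packets punted")
# A proactive FIB built from the RIB would answer every lookup in the fast
# path instead: if the destination is not in the FIB, it does not exist.
```

The cache never helps, because the scan never revisits a destination; that is exactly the traffic pattern that melted fast-switching routers.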
CEF switching (or Cisco Express Forwarding) is a proactive, deterministic IP forwarding mechanism. The routing table (RIB), as computed by the routing protocols, is copied into the forwarding table (FIB), where it's combined with adjacency information (the ARP or ND table) to form a deterministic lookup table. When a router uses CEF switching, there's (almost) no need to punt packets sent to unknown destinations to the IP Input process; if a destination is not in the FIB, it does not exist.
There are still cases where CEF switching cannot do its job. For example, packets sent to IP addresses on directly connected interfaces cannot be sent to the destination hosts until the router performs ARP/ND MAC address resolution; these packets have to be sent to the IP Input process. The directly connected prefixes are thus entered as glean adjacencies in the FIB, and as the router learns the MAC address of the target host (through an ARP or ND reply), it creates a dynamic host route in the FIB pointing to the adjacency entry for the newly discovered directly connected host.
Actually, you wouldn't want to send too many packets to the IP Input process; it's better to create the host route in the FIB (pointing to the bit bucket, /dev/null, or something equivalent) even before the ARP/ND reply is received, to ensure that subsequent packets sent to the same destination are dropped, not punted; the punting behavior is nicely exploitable by an ND exhaustion attack.
It's pretty obvious that the CEF table must stay current. For example, if the adjacency information is lost (due to ARP/ND aging), the packets sent to that destination are yet again punted to process switching. No wonder the router periodically refreshes ARP entries to ensure they never expire.


Controller-based packet forwarding in an OpenFlow implementation is almost exactly like process switching in Cisco IOS. Here are the details:

CONTROLLER-BASED PACKET FORWARDING IN OPENFLOW NETWORKS
One of the attendees of the ProgrammableFlow webinar sent me an interesting observation:
Though there is separate control plane and separate data plane, it appears that there is
crossover from one to the other. Consider the scenario when flow tables are not
programmed and so the packets will be punted by the ingress switch to PFC. The PFC will
then forward these packets to the egress switch so that the initial packets are not
dropped. So in some sense: we are seeing packet traversing the boundaries of typical
data-plane and control-plane and vice-versa.
He's absolutely right, and if the above description reminds you of fast and process switching, you're spot on. There really is nothing new under the sun.
OpenFlow controllers use one of the following two approaches to switch programming (more details @ NetworkStatic):

- Proactive flow table setup, where the controller downloads flow entries into the switches based on user configuration (ex: ports, VLANs, subnets, ACLs) and network topology;
Copyright ipSpace.net 2014

Page 5-29

This material is copyrighted and licensed for the sole use by Srdjan Milenkovic (srdjan.milenkovicsm@gmail.com [109.121.110.253]). More information at http://www.ipSpace.net/Web

- Reactive flow table setup (or flow-driven forwarding), where the controller downloads flow entries into the switches based on the unknown traffic the OpenFlow switches forward to the controller.
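The two models can be sketched in a few lines; the Switch and Controller classes below are illustrative stand-ins, not a real OpenFlow implementation (which would exchange PACKET_IN and FLOW_MOD messages):

```python
# Minimal sketch of proactive vs. reactive flow setup. Class and method
# names are illustrative assumptions, not real OpenFlow protocol elements.

class Switch:
    def __init__(self, controller):
        self.flow_table, self.controller = {}, controller

    def receive(self, dmac):
        if dmac in self.flow_table:                   # hardware fast path
            return self.flow_table[dmac]
        return self.controller.packet_in(self, dmac)  # punt to controller

class Controller:
    def __init__(self, topology):
        self.topology = topology                      # dmac -> egress port

    def proactive_setup(self, switch):
        switch.flow_table.update(self.topology)       # preinstall everything

    def packet_in(self, switch, dmac):                # reactive setup
        port = self.topology.get(dmac, "flood")
        switch.flow_table[dmac] = port                # install flow entry
        return port                                   # forward first packet

ctrl = Controller({"web-1": 1, "web-2": 2, "gw": 4})
sw = Switch(ctrl)
assert sw.receive("web-1") == 1        # first packet takes the punt path
assert "web-1" in sw.flow_table        # subsequent packets stay in hardware
```

With proactive_setup the punt path is never exercised; in the reactive model every first packet of every new destination crosses the switch-to-controller boundary, which is precisely the scaling problem described above.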

Even though I write about flow tables, don't confuse them with the per-flow forwarding that Doug Gourlay loves almost as much as I do. A flow entry might match solely on the destination MAC address, making flow tables equivalent to MAC address tables, or it might match the destination IP address with the longest IP prefix in the flow table, making the flow table equivalent to a routing table or FIB.
The controller must know the topology of the network and all the endpoint addresses (MAC addresses, IP addresses, or IP subnets) for the proactive (predictive?) flow setup to work. If you had an OpenFlow controller emulating an OSPF or BGP router, it would be easy to use proactive flow setup; after all, the IP routes never change based on the application traffic observed by the switches.
Intra-subnet L3 forwarding is already a different beast. One could declare ARP/ND to be an authoritative control-plane protocol (please don't get me started on the shortcomings of ARP and whether ES-IS would be a better solution), in which case you could use proactive flow setup to create host routes toward IP hosts (using an approach similar to Mobile ARP; what did I just say about nothing being really new?).
However, most vendors' marketing departments (with a few notable exceptions) think their gear needs to support every bridging-abusing stupidity ever invented, from load balancing schemes that work best with hubs to floating IP or MAC addresses used to implement high-availability solutions. End result: the network has to support dynamic MAC learning, which makes OpenFlow-based networks reactive; nobody can predict when and where a new MAC address will appear (and it's not guaranteed that the first packet sent from the new MAC address will be an ARP packet), so the


switches have to send user traffic with unknown source or destination MAC addresses to the controller, and we're back to packet punting.
Some bridges (lovingly called layer-2 switches) don't punt packets with unknown MAC addresses to the CPU, but perform dynamic MAC address learning and unknown unicast flooding in hardware... but that's not how OpenFlow is supposed to work.
Within a single device, the software punts packets from hardware (or interrupt) switching to CPU/process switching; in a controller-based network, the switches punt packets to the controller. Plus ça change, plus c'est la même chose.


Packets punted to the controller from the data plane of an OpenFlow switch represent a significant
burden on the switch CPU. A large number of punted packets (triggered, for example, by an address
scan) can easily result in a denial-of-service attack. It's time to reinvent another wheel: control-plane policing (CoPP).

CONTROL-PLANE POLICING IN OPENFLOW NETWORKS


The Controller-Based Packet Forwarding in OpenFlow Networks post generated the obvious question:
does that mean we need some kind of Control-Plane Protection (CoPP) in the OpenFlow controller? Of
course it does, but things aren't as simple as that.
The weakest link in today's OpenFlow implementations (like NEC's ProgrammableFlow) is not the
controller, but the dismal CPU used in the hardware switches. The controller could handle millions of
packets per second (that's the flow setup rate claimed by the Floodlight developers); the switches
usually burn out at thousands of flow setups per second.
The CoPP function thus has to be implemented in the OpenFlow switches (like it's implemented in
linecard hardware in traditional switches), and that's where the problems start: OpenFlow didn't
have usable rate-limiting functionality until version 1.3, which added meters.
OpenFlow meters are a really cool concept: they have multiple bands, and you can apply either
DSCP remarking or packet dropping at each band, which would allow an OpenFlow controller to
closely mimic the CoPP functionality and apply different rate limits to different types of control or
punted traffic.
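To make the meter idea concrete, here's a minimal sketch of a two-band CoPP-style meter as plain Python data structures; the field names and the helper function are mine, loosely modeled on OpenFlow 1.3 meter-mod semantics, not any real controller API:

```python
# Illustrative OpenFlow 1.3-style meter: a DSCP-remark band that kicks in
# first, and a drop band for the real excess. Plain-Python sketch only.

def build_copp_meter(meter_id, remark_kbps, drop_kbps):
    """Return a meter mimicking CoPP for traffic punted to the controller."""
    assert remark_kbps < drop_kbps, "remark threshold must kick in first"
    return {
        "meter_id": meter_id,
        "flags": ["KBPS"],
        "bands": [
            # Above remark_kbps: increase the DSCP drop precedence.
            {"type": "DSCP_REMARK", "rate": remark_kbps, "prec_level": 1},
            # Above drop_kbps: drop the excess punted traffic outright.
            {"type": "DROP", "rate": drop_kbps},
        ],
    }

meter = build_copp_meter(meter_id=1, remark_kbps=500, drop_kbps=1000)
```

A controller would install one such meter per punted-traffic class (ARP, unknown unicast, routing protocol packets ...) to get per-class rate limits.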


Unfortunately, only a few hardware switches available on the market support OpenFlow 1.3 so far,
and some of them might not support meters (or meters on flows sent to the controller). In the
meantime, it's proprietary extensions galore: NEC used one to limit unicast flooding in its
ProgrammableFlow switches.


Time to move forward to another scalability roadblock: the number of flows you can install in a
hardware device per second. This limitation has nothing to do with OpenFlow; the choke point is the
communication path between the switch CPU and the forwarding hardware. Traditional switches and
routers had the same problem and solved it with Prefix-Independent Convergence.

PREFIX-INDEPENDENT CONVERGENCE (PIC): FIXING THE FIB BOTTLENECK
Did you rush to try OSPF Loop-Free Alternate on a Cisco 7200 after reading my LFA blog post ... and
discover, disappointed, that it only works on the Cisco 7600? The reason is simple: while LFA does
add feasible-successor-like behavior to OSPF, its primary mission is to improve RIB-to-FIB
convergence time.
If you want to know more details, I would strongly suggest you browse through the IP Fast Reroute
Applicability presentation Pierre Francois had @ EuroNOG 2011. To summarize what he told us:

It's relatively easy to fine-tune OSPF or IS-IS and get convergence times in tens of milliseconds.
SPF runs reasonably fast on modern processors, more so with incremental SPF optimizations.

A platform using software-based switching can use the SPF results immediately (thus there's no
real need for LFA on a Cisco 7200).

The true bottleneck is the process of updating distributed forwarding tables (FIBs) from the IP
routing table (RIB) on platforms that use hardware switching. That operation can take a
relatively long time if you have to update many prefixes.


The generic optimization of the RIB-to-FIB update process is known as Prefix-Independent
Convergence (PIC): if the routing protocols can pre-compute alternate paths, a suitably designed FIB
can use that information to cache alternate next hops. Updating such a FIB no longer involves
numerous updates to individual prefixes; you have to change only the next-hop reachability
information.
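Here's a toy illustration of that indirection; the class, names and addresses are made up, but the point stands: prefixes point to a shared next-hop group, so a failover touches one object instead of one entry per prefix:

```python
# Toy FIB illustrating Prefix-Independent Convergence: prefixes point to
# a next-hop group; the group (not each prefix) caches the backup path.

class NextHopGroup:
    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.active = primary

    def fail_over(self):
        # One update, regardless of how many prefixes use this group.
        self.active = self.backup

group = NextHopGroup(primary="10.0.0.1", backup="10.0.0.2")

# 10k prefixes all share the same group object (one level of indirection).
fib = {f"192.0.{i // 256}.{i % 256}/32": group for i in range(10_000)}

group.fail_over()  # a single operation repairs every prefix

assert all(entry.active == "10.0.0.2" for entry in fib.values())
```

Without the indirection, the same failover would mean 10,000 individual FIB writes through the slow CPU-to-hardware path.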
PIC was first implemented for BGP (you can find more details, including interesting discussions of
FIB architectures, in another presentation Pierre Francois had @ EuroNOG), which usually carries
hundreds of thousands of prefixes that point to a few tens of different next hops. It seems some
Service Providers carry way too many routes in OSPF or IS-IS, so it made sense to implement LFA
for those routing protocols as well.
In its simplest form, BGP PIC goes a bit beyond existing EBGP/IBGP multipathing and copies
backup path information into the RIB and FIB. Distributing alternate paths throughout the
network requires numerous additional tweaks, from modified BGP path propagation rules to
modified BGP route reflector behavior.


Adding support for OpenFlow on an existing switch doesn't change the underlying hardware. The
OpenFlow agent on a hardware device has to deal with the same challenges as the traditional
control-plane software.

FIB UPDATE CHALLENGES IN OPENFLOW NETWORKS


Last week I described the problems high-end service provider routers (or layer-3 switches if you
prefer that terminology) face when they have to update a large number of entries in the forwarding
tables (FIBs). Will these problems go away when we introduce OpenFlow into our networks?
Absolutely not: OpenFlow is just another mechanism to download forwarding entries (this time from
an external controller), not a laws-of-physics-changing miracle.
NEC, the only company I'm aware of that has production-grade OpenFlow deployments and is willing
to talk about them, admitted as much in their Networking Tech Field Day 2 presentation (watch the
ProgrammableFlow Architecture and Use Cases video around 12:00). Their particular
controller/switch combo can set up 600-1000 flows per switch per second (which is still way better
than what researchers using HP switches found and documented in the DevoFlow paper: they found
the switches can set up ~275 flows per second).
Now imagine the core of a simple L2 network built from tens of switches and connecting hundreds of
servers and thousands of VMs. Using traditional L2 forwarding techniques, each switch would have
to know the MAC address of each VM ... and the core switches would have to update thousands of
entries after a link failure, resulting in multi-second convergence times. Obviously, OpenFlow-based
networks need prefix-independent convergence (PIC) as badly as anyone else.
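Plugging the flow-setup rates quoted above into a back-of-the-envelope calculation (the network size is my assumption) shows why:

```python
# Rough convergence estimate: entries to reprogram divided by the
# flow-setup rate of the switch. The entry count is an assumption.

vm_mac_entries = 5_000    # assumed number of VM MAC entries per core switch
nec_setup_rate = 1_000    # flows/second (NEC's upper figure)
hp_setup_rate = 275       # flows/second (DevoFlow measurement on HP switches)

print(vm_mac_entries / nec_setup_rate)             # 5.0 seconds
print(round(vm_mac_entries / hp_setup_rate, 1))    # 18.2 seconds
```

Five to eighteen seconds of post-failure blackholing per core switch, for a network that isn't even particularly large.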


Figure 5-1: Core link failure in an OpenFlow network

OpenFlow 1.0 could use flow matching priorities to implement primary/backup forwarding entries,
and OpenFlow 1.1 provides a fast failover mechanism in its group tables that could be used for
prefix-independent convergence, but it's questionable how far you can get with existing hardware
devices, and PIC doesn't work in all topologies anyway.
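A minimal sketch of the priority trick (the flow-entry format is illustrative, not the OpenFlow wire format): the highest-priority matching entry wins, so deleting the primary entry automatically uncovers the backup without touching any other state:

```python
# Primary/backup forwarding with flow priorities: the switch uses the
# highest-priority matching entry; removing the primary exposes the backup.

flow_table = [
    {"match": "dst=10.1.1.0/24", "priority": 200, "out_port": 1},  # primary
    {"match": "dst=10.1.1.0/24", "priority": 100, "out_port": 2},  # backup
]

def lookup(table, match):
    candidates = [f for f in table if f["match"] == match]
    return max(candidates, key=lambda f: f["priority"])

assert lookup(flow_table, "dst=10.1.1.0/24")["out_port"] == 1

# Primary path fails: the controller deletes only the high-priority entry.
flow_table = [f for f in flow_table if f["priority"] != 200]
assert lookup(flow_table, "dst=10.1.1.0/24")["out_port"] == 2
```

One delete per destination is still not prefix-independent, but it beats installing a replacement entry for every prefix after the failure.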
Just in case you're wondering how existing L2 networks work at all: the data plane in high-speed switches performs dynamic MAC learning and populates the forwarding table in
hardware; the communication between the control and the data plane is limited to the bare
minimum (which is another reason why implementing OpenFlow agents on existing switches
is like attaching a jetpack to a camel).


Is there another option? Sure: it's called forwarding state abstraction or, for those more familiar
with MPLS terminology, Forwarding Equivalence Class (FEC). While you might have thousands of
servers or VMs in your network, you have only hundreds of possible paths between switches. The
trick every single OpenFlow controller vendor has to use is to replace endpoint-based forwarding
entries in the core switches with path-indicating forwarding entries. Welcome back to virtual circuits
and BGP-free MPLS cores. It's amazing how the old tricks keep resurfacing in new disguises every
few years.


Forwarding state abstraction (known as Forwarding Equivalence Classes in MPLS lingo) is the only
way toward scalable OpenFlow fabrics. The following blog post (written in February 2012) has some
of the details:

FORWARDING STATE ABSTRACTION WITH TUNNELING AND LABELING
Yesterday I described how the limited flow setup rates offered by most commercially available
switches force the developers of production-grade OpenFlow controllers to drop the microflow ideas
and focus on state abstraction (people living in a dreamland usually go in the totally opposite
direction). Before going into OpenFlow-specific details, let's review the existing forwarding state
abstraction technologies.

A MOSTLY THEORETICAL DETOUR


Most forwarding state abstraction solutions that I'm aware of use a variant of the Forwarding
Equivalence Class (FEC) concept from MPLS:

All the traffic that expects the same forwarding behavior gets the same label;

The intermediate nodes no longer have to inspect the individual packet/frame headers; they
forward the traffic solely based on the FEC indicated by the label.

The grouping/labeling operation thus greatly reduces the forwarding state in the core nodes (you
can call them P-routers, backbone bridges, or whatever other terminology you prefer) and improves


the core network convergence due to the significantly reduced number of forwarding entries in the
core nodes.

Figure 5-2: MPLS forwarding diagram from the Enterprise MPLS/VPN Deployment webinar

The core network convergence is improved due to reduced state, not due to the pre-computed
alternate paths that Prefix-Independent Convergence or MPLS Fast Reroute use.
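A toy calculation shows how drastic the state reduction is (the endpoint and edge-node counts are made up):

```python
# FEC-style state abstraction: core state scales with the number of
# egress nodes (labels), not with the number of endpoints behind them.

# 5,000 VMs spread across 20 edge switches (assumed numbers).
endpoints = {f"vm-{i}": f"edge-{i % 20}" for i in range(5_000)}

# Core forwarding state without abstraction: one entry per endpoint.
core_state_per_endpoint = len(endpoints)

# With FEC labeling: one label (one entry) per egress edge switch.
core_state_with_fec = len(set(endpoints.values()))

assert core_state_per_endpoint == 5_000
assert core_state_with_fec == 20
```

A core node tracking 20 labels instead of 5,000 endpoints also has 250 times fewer entries to fix after a topology change.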

FROM THEORY TO PRACTICE


There are two well-known techniques you can use to transport traffic grouped in a FEC across the
network core: tunneling and virtual circuits (or Label Switched Paths if you want to use non-ITU
terminology).


When you use tunneling, the FEC is the tunnel endpoint: all traffic going to the same tunnel
egress node uses the same tunnel destination address.
All sorts of tunneling mechanisms have been proposed to scale layer-2 broadcast domains and
virtualized networks (IP-based layer-3 networks scale way better by design):

Provider Backbone Bridges (PBB, 802.1ah), Shortest Path Bridging-MAC (SPBM, 802.1aq) and
vCDNI use MAC-in-MAC tunneling: the destination MAC address used to forward user traffic
across the network core is that of the egress bridge or the destination physical server (for vCDNI).

Figure 5-3: SPBM forwarding diagram from the Data Center 3.0 for Networking Engineers webinar

VXLAN, NVGRE and GRE (used by Open vSwitch) use MAC-over-IP tunneling, which scales way
better than MAC-over-MAC tunneling because the core switches can do another layer of state
abstraction (subnet-based forwarding and IP prefix aggregation).


Figure 5-4: Typical VXLAN architecture from the Introduction to Virtual Networking webinar

TRILL is closer to VXLAN/NVGRE than to SPB/vCDNI as it uses full L3 tunneling between TRILL
endpoints with L3 forwarding inside RBridges and L2 forwarding between RBridges.

Figure 5-5: TRILL forwarding diagram from the Data Center 3.0 for Networking Engineers webinar


With tagging or labeling, a short tag is attached in front of the data (ATM VPI/VCI, MPLS label
stack on point-to-point links) or somewhere in the header (VLAN tags) instead of encapsulating the
user's data into a full L2/L3 header. The core network devices perform packet/frame forwarding
based exclusively on the tags. That's how SPBV, MPLS and ATM work.
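A label-swapping core node can be sketched in a few lines; the LFIB layout and interface names are purely illustrative:

```python
# Label swapping in a core node: the forwarding decision uses only the
# incoming label; the payload (and its L2/L3 headers) is never inspected.

lfib = {
    # (in_interface, in_label) -> (out_interface, out_label)
    ("ge-0/0/1", 100): ("ge-0/0/2", 200),
    ("ge-0/0/1", 101): ("ge-0/0/3", 300),
}

def forward(in_if, packet):
    out_if, out_label = lfib[(in_if, packet["label"])]
    # Swap the label; leave the payload untouched.
    return out_if, {**packet, "label": out_label}

out_if, out_pkt = forward("ge-0/0/1", {"label": 100, "payload": "ip-packet"})
assert (out_if, out_pkt["label"]) == ("ge-0/0/2", 200)
assert out_pkt["payload"] == "ip-packet"
```

Note that the lookup key never includes anything from the payload; that's what makes the core protocol-agnostic.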

Figure 5-6: MPLS-over-Ethernet frame format from the Enterprise MPLS/VPN Deployment webinar

MPLS-over-Ethernet, commonly used in today's high-speed networks, is an abomination, as it
uses both L2 tunneling between adjacent LSRs and labeling ... but that's what you get when
you have to reuse existing hardware to support new technologies.


A few months after I wrote the Forwarding State Abstraction blog post, Martin Casado and his
team presented an article with similar ideas at the HotSDN conference. Here's my summary of that
article (written in August 2012):

EDGE AND CORE OPENFLOW (AND WHY MPLS IS NOT NAT)
More than a year ago, I explained why end-to-end flow-based forwarding doesn't scale (and Doug
Gourlay did the same using way more colorful language) and what the real-life limitations are. Not
surprisingly, the gurus that started the whole OpenFlow movement came to the same conclusions
and presented them at the HotSDN conference in August 2012 ... but even that hasn't stopped some
people from evangelizing the second coming.

THE PROBLEM
Contrary to what some pundits claim, flow-based forwarding will never scale. If you've been around
long enough to experience the ATM-to-the-desktop failure, Multi-Layer Switching (MLS) kludges, the
demise of end-to-end X.25, or the cost of traditional circuit switching telephony, you know what I'm
talking about. If not, supposedly it's best to learn from your own mistakes; be my guest.
Before someone starts the Moore's Law incantations: software-based forwarding will always be more
expensive than predefined hardware-based forwarding. Yes, you can push tens of gigabits through a
highly optimized multi-core Intel server. You can also push 1.2 Tbps through a Broadcom chipset at


a comparable price. The ratios haven't changed much in the last decades, and I don't expect them to
change in the near future.

SCALABLE ARCHITECTURES
The scalability challenges of flow-based forwarding have been well understood (at least within IETF,
ITU is living on a different planet) decades ago. Thats why we have destination-only forwarding,
variable-length subnet masks and summarization, and Diffserv (with a limited number of traffic
classes) instead of Intserv (with per-flow QoS).
The limitations of destination-only hop-by-hop forwarding have also been well understood for at
least two decades; they resulted in the MPLS architecture and various MPLS-based applications
(including MPLS Traffic Engineering).
There's a huge difference between the MPLS TE forwarding mechanism (which is the right tool for the
job) and the distributed MPLS TE control plane (which sucks big time). Traffic engineering is ultimately
an NP-complete knapsack problem best solved with centralized end-to-end visibility.
MPLS architecture solves the forwarding rigidity problems while maintaining core network scalability
by recognizing that while each flow might be special, numerous flows share the same forwarding
behavior.
Edge MPLS routers (edge LSRs) thus sort the incoming packets into forwarding equivalence classes
(FECs) and use a different Label Switched Path (LSP) across the network for each of the forwarding
classes.


Please note that this is a gross oversimplification. I'm trying to explain the fundamentals,
and (following the great example of physicists) ignore all the details... oops, take the ideal
case.
The simplest classification, implemented in all MPLS-capable devices today, is destination prefix-based
classification (equivalent to traditional IP forwarding), but there's nothing in the MPLS architecture that
would prevent you from using N-tuples to classify the traffic based on source addresses, port
numbers, or any other packet attribute (yet again, ignoring the reality of having to use PBR with the
infinitely disgusting route-map CLI to achieve that).

MPLS IS JUST A TOOL


Always keep in mind that every single network technology is a tool, not a solution (some of them
might be solutions looking for a problem, but that's another story), and some tools are more useful
in some scenarios than others ... which still doesn't make them good or bad, but applicable or
inapplicable.
Also, after more than a decade of tinkering, the vendor MPLS implementations leave a lot to be
desired. If you hate a particular vendor's CLI or implementation kludges, blame them, not the
technology.


EDGE AND CORE OPENFLOW


After this short MPLS digression, let's come back to the headline topic. Large-scale OpenFlow-based
solutions face two significant challenges:

It's hard to build resilient networks with a centralized control plane and unreliable transport
between the controller and the controlled devices (this problem was well known in the days of Frame
Relay and ATM);

You must introduce layers of abstraction in order to scale the network.

Martin Casado, Teemu Koponen, Scott Shenker and Amin Tootoonchian addressed the second
challenge in their Fabric: A Retrospective on Evolving SDN paper, where they propose two layers in
an SDN architectural framework:

Edge switches, which classify the packets, perform network services, and send the packets
across core fabric toward the egress edge switch;

Core fabric, which provides end-to-end transport.

Not surprisingly, they're also proposing to use MPLS labels as the fabric forwarding mechanism.

WHERE'S THE BEEF?


The fundamental difference between typical MPLS networks we have today and the SDN Fabric
proposed by Martin Casado et al. is the edge switch control/management plane: FEC classification is
downloaded into the edge switches through OpenFlow (or some similar mechanism).


Existing MPLS implementations or protocols have no equivalent mechanism, and a mechanism for a
consistent implementation of a distributed network edge policy would be highly welcome (all of my
enterprise OpenFlow use cases fall into this category).

FINALLY, IS MPLS NAT?


Now that we've covered the MPLS fundamentals, I have to mention another pet peeve of mine:
let's see why it's ridiculous to compare MPLS to NAT.
As explained above, MPLS edge routers classify ingress packets into FECs and attach a label
signifying the desired treatment to each packet. The original packet is not changed in any
way; any intermediate node can get the raw packet content if needed.
NAT, on the other hand, always changes the packet content (at least the layer-3 addresses,
sometimes also the layer-4 port numbers), or it wouldn't be NAT.
NAT breaks transparent end-to-end connectivity; MPLS doesn't. MPLS is similar to lossless
compression (ZIP), NAT to lossy compression (JPEG). Do I need to say more?


Using Forwarding Equivalence Classes (FECs) and path-based forwarding in an OpenFlow network
results in another simplification: core switches don't have to support the same rich functionality as
the edge switches.

EDGE PROTOCOL INDEPENDENCE: ANOTHER BENEFIT OF EDGE-AND-CORE LAYERING
I asked Martin Casado to check whether I had correctly described his HotSDN'12 paper in my Edge and
Core OpenFlow post, and he replied with another interesting observation:
The (somewhat nuanced) issue I would raise is that [...] decoupling [also] allows
evolving the edge and core separately. Today, changing the edge addressing scheme
requires a wholesale upgrade to the core.
The 6PE architecture (IPv6 on the edge, MPLS in the core) is a perfect example of this concept.

WHY DOES IT MATTER?


Traditional scalable network designs always have at least two layers: an access or aggregation layer,
where most of the network services are performed, and a core layer that provides high-speed
transport across a stable network core.
In IP-only networks, the core and access routers (aka layer-3 switches) share the same forwarding
mechanism (ignoring the option of having default routing in the access layer); if you want to


introduce a new protocol (example: IPv6) you have to deploy it on every single router throughout
the network, including all core routers.
On the other hand, you can introduce IPv6, IPX or AppleTalk (not really), or anything else in an
MPLS network, without upgrading the core routers. The core routers continue to provide a single
function: optimal transport based on MPLS paths signaled by the edge routers (either through LDP,
MPLS-TE, MPLS-TP or more creative approaches, including NETCONF-configured static MPLS labels).
The same ideas apply to OpenFlow-configured networks. The edge devices have to be smart and
support a rich set of flow matching and manipulation functionality; the core (fabric) devices have to
match on simple packet tags (VLAN tags, MAC addresses with PBB encapsulation, MPLS tags ...) and
provide fast packet forwarding.

IS THIS AN IVORY TOWER DREAM?


Apart from MPLS, there are several real-life SDN implementations of this concept:

Nicira's NVP provides virtual networking functionality in OpenFlow-controlled hypervisor
switches that use simple IP transport (with STT or GRE encapsulation) across the network core;

Microsoft's Hyper-V Network Virtualization uses a similar architecture with PowerShell instead of
OpenFlow/OVSDB as the hypervisor configuration API;

NEC's ProgrammableFlow solution uses the PF5420 (with 160K OpenFlow entries) at the edge and
the PF5820 (with 750 full OpenFlow entries and 80K MAC entries) at the core.

Before you mention (multicast-based) VXLAN in the comments: I fail to see anything software-defined in a technology that uses flooding to learn dynamic VM-MAC-to-VTEP-IP mappings.


The idea of edge and core OpenFlow makes perfect sense, but OpenFlow 1.0 doesn't support MPLS.
Could we use something else to make it work?
The following blog post was written in February 2012; in summer 2014 I inserted a few comments
to illustrate how we got nowhere in more than two years.

VIRTUAL CIRCUITS IN OPENFLOW 1.0 WORLD


Two days ago I described how you can use tunneling or labeling to reduce the forwarding state in
the network core (which you have to do if you want to have reasonably fast convergence with
currently available OpenFlow-enabled switches). Now let's see what you can do in the very limited
world of OpenFlow 1.0 (which is what most OpenFlow-enabled switches shipping in summer 2014
support).

OPENFLOW 1.0 DOES NOT SUPPORT TUNNELING OF ANY SORT


Open vSwitch (an OpenFlow-capable soft switch running on Linux/Xen/KVM) can use GRE tunnels to
exchange MAC frames between hypervisor hosts across an IP backbone, but cannot use OpenFlow to
provision those tunnels; it uses the Open vSwitch Database to get its configuration information
(including GRE tunnel definitions).
After the GRE tunnels have been created, they appear as regular interfaces within the Open
vSwitch; an OpenFlow controller can use them in flow entries to push user packets across GRE
tunnels to other hypervisor hosts.


Tunneling support within existing OpenFlow-enabled data center switches is virtually non-existent
(Juniper's MX routers with the OpenFlow add-on might be an exception), primarily due to hardware
constraints.
We will probably see VXLAN/NVGRE/GRE implementations in data center switches in the next few
months, but I expect most of those implementations to be software-based and thus useless for
anything but a proof-of-concept (August 2014: no major data center switching vendor supports
OpenFlow over any tunneling technology).
Cisco already has a VXLAN-capable chipset in the M-series linecards; believers in merchant silicon will
have to wait for the next-generation chipsets (August 2014: Broadcom's and Intel's chipsets support
VXLAN, but so far no vendor has shipped VXLAN termination that would work with OpenFlow).

OPENFLOW 1.0 HAS LIMITED LABELING FUNCTIONALITY


MPLS support was added to OpenFlow in release 1.1, and while MPLS-capable hardware devices
could use MPLS labeling with OpenFlow, there aren't many devices that support both MPLS
and OpenFlow today (yet again, talk to Juniper). Forget MPLS for the moment.
VLAN stacking was also introduced in OpenFlow 1.1. While it would be a convenient labeling
mechanism (similar to SPBV, but with a different control plane), many data center switches don't
support Q-in-Q (802.1ad). No VLAN stacking today.
The only standard labeling mechanism left to OpenFlow-enabled switches is thus VLAN tagging
(OpenFlow 1.0 supports VLAN tagging, VLAN translation and tag stripping). You could use VLAN tags
to build virtual circuits across the network core (similar to what MPLS labels do) and the source/destination-MAC combination at the egress node to recreate the original VLAN tag, but the solution
is messy, hard to troubleshoot, and immense fun to audit. But wait, it gets worse.

THE REALITY
I had the virtual circuits discussion with multiple vendors during the OpenFlow Symposium and
Networking Tech Field Day, and we always came to the same conclusions:

Forwarding state abstraction is mandatory;

OpenFlow 1.0 has very limited functionality;

Standard tagging/tunneling mechanisms are almost useless due to hardware/OpenFlow
limitations (see above);

Everyone uses their own secret awesomesauce to solve the problem ... often with proprietary
OpenFlow extensions.

Someone was also kind enough to give me a hint that solved the secret awesomesauce riddle: "We
can use any field in the frame header in any way we like."
Looking at the OpenFlow 1.0 specs (assuming no proprietary extensions are used), you can rewrite the
source and destination MAC addresses to indicate whatever you wish; you have 96 bits to work
with. Assuming the hardware devices support wildcard matches on MAC addresses (either by
supporting OpenFlow 1.1 or a proprietary extension to OpenFlow 1.0), you could use the 48 bits of
the destination MAC address to indicate the egress node, egress port, and egress MAC address.
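Here's one way the 48 bits could be carved up; the 16/16/16 split and the helper functions are purely my illustration, not any vendor's actual encoding:

```python
# Packing egress information into a rewritten destination MAC address.
# The 16-bit node / 16-bit port / 16-bit host-index split is illustrative.

def encode_mac(node_id, port_id, host_id):
    """Pack three 16-bit fields into a 48-bit MAC address string."""
    value = (node_id << 32) | (port_id << 16) | host_id
    return ":".join(f"{(value >> s) & 0xFF:02x}" for s in range(40, -8, -8))

def decode_mac(mac):
    """Recover (node_id, port_id, host_id) from the rewritten MAC."""
    value = int(mac.replace(":", ""), 16)
    return (value >> 32) & 0xFFFF, (value >> 16) & 0xFFFF, value & 0xFFFF

mac = encode_mac(node_id=7, port_id=3, host_id=42)
assert mac == "00:07:00:03:00:2a"
assert decode_mac(mac) == (7, 3, 42)
```

Core switches would then need only a wildcard match on the node-ID bits to forward toward the right egress switch.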


I might have doubts about the VLAN translation mechanism described in the previous paragraph (I'm
positive many security-focused engineers will have doubts), but the reuse-the-header-fields
approach is even more interesting to support. How can you troubleshoot a network if you never
know what the source/destination MAC addresses really mean?

SUMMARY
Before buying an OpenFlow-based data center network, figure out what the vendors are doing (they
will probably ask you to sign an NDA, which is fine), including:

What are the mechanisms used to reduce forwarding state in the OpenFlow-based network core?

What's the actual packet format used in the network core (or: how are the fields in the packet
header really used)?

Will you be able to use standard network analysis tools to troubleshoot the network?

Which version of OpenFlow are they using?

Which proprietary extensions are they using (or not using)?

Which switch/controller combinations are tested and fully supported?


Let's conclude the forwarding scalability part of this chapter with a slightly irrelevant detour: is
MPLS tunneling?

MPLS IS NOT TUNNELING


Greg (@etherealmind) Ferro started an interesting discussion on Google+, claiming MPLS is just
tunneling and a duct tape like NAT. I would be the first one to admit MPLS has its complexities and
shortcomings, but calling it a tunnel just confuses the innocents. MPLS is not tunneling; it's a virtual-circuits-based technology, and the difference between the two is a major one.
You can talk about tunneling when a protocol that should be lower in the protocol stack gets
encapsulated in a protocol that you'd usually find above or next to it. MAC-in-IP, IPv6-in-IPv4, IP-over-GRE-over-IP, MAC-over-VPLS-over-MPLS-over-GRE-over-IPsec-over-IP ... these are tunnels.
IP-over-MPLS-over-PPP/Ethernet is not tunneling, just like IP-over-LLC1-over-Token Ring or IP-over-X.25-over-LAPD wasn't.
It is true, however, that MPLS uses virtual circuits, but they are not identical to tunnels. Just
because all packets between two endpoints follow the same path and the switches in the middle
don't inspect their IP headers doesn't mean you're using a tunneling technology.
One-label MPLS is (almost) functionally equivalent to two well-known virtual circuit technologies:
ATM and Frame Relay (that was also its first use case). However, MPLS-based networks scale better
than those using ATM or Frame Relay because of two major improvements:
Automatic setup of virtual circuits based on network topology (core IP routing information), both
between the core switches and between the core (P-routers) and edge (PE-routers) devices. Unless
configured otherwise, the IP routing protocol performs topology autodiscovery and LDP establishes a full
mesh of virtual circuits across the core.
VC merge: Virtual circuits from multiple ingress points to the same egress point can merge within
the network. VC merge significantly reduces the overall number of VCs (and the amount of state the
core switches have to keep) in fully meshed networks.
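
To put a rough number on that reduction, here's a back-of-the-envelope sketch (my own simplified model, not from the original text; real LFIB sizes depend on topology and configuration):

```python
# Per-core-switch state in a full mesh of N edge (PE) routers, with and
# without VC merge. Simplified model: a busy core switch sits on most paths.

def vcs_without_merge(n_pe):
    # Every ingress-to-egress pair is a separate virtual circuit, so core
    # state can grow toward N*(N-1) label mappings.
    return n_pe * (n_pe - 1)

def vcs_with_merge(n_pe):
    # Circuits toward the same egress PE merge into a single label, so
    # core state grows with the number of egress points only.
    return n_pe

for n in (10, 100, 1000):
    print(n, vcs_without_merge(n), vcs_with_merge(n))
```

The quadratic-versus-linear growth is the whole point: at 1000 PE-routers the difference is three orders of magnitude.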
It's interesting to note that ITU wants to cripple MPLS to the point of being equivalent to
ATM/Frame Relay: MPLS-TP introduces an out-of-band management network and management
plane-based virtual circuit establishment.

DOES IT MATTER?
It might seem like I'm splitting hairs just for the fun of it, but there's a significant scalability
difference between virtual circuits and tunnels: devices using tunnels appear as hosts to the
underlying network and require no in-network state, while solutions using virtual circuits (including
MPLS) require per-VC state entries (MPLS: inbound-to-outbound label mapping in LFIB) on every
forwarding device in the path. Even worse, end-to-end virtual circuits (like MPLS TE) require state
maintenance (provided by periodic RSVP signaling in MPLS TE) involving every single switch in the
VC path.
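
The tunnel-versus-VC state difference can be expressed in two lines (again my illustration, with made-up numbers):

```python
# A tunnel keeps state only at its two endpoints; a virtual circuit needs
# an entry on every forwarding device along the path.

def tunnel_state(circuits, path_hops):
    return 2 * circuits           # head-end and tail-end only

def vc_state(circuits, path_hops):
    return circuits * path_hops   # e.g. an in/out label mapping per hop

print(tunnel_state(1000, 8), vc_state(1000, 8))   # 2000 vs 8000
```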
You can find scalability differences even within the MPLS world: MPLS/VPN-over-mGRE (tunneling)
scales better than pure label-based MPLS/VPN (virtual circuits) because MPLS/VPN-over-mGRE relies
on IP transport and not on end-to-end LSPs between PE-routers. You can summarize loopback
addresses if you use MPLS/VPN-over-mGRE; doing the same in end-to-end-LSP-based MPLS/VPN
networks breaks them. L2TPv3 scales better than AToM for the same reason.

All VC-based solutions require a signaling protocol between the end devices and the core switches
(or an out-of-band layer-8+ communication and management-plane provisioning). Two common
protocols used in MPLS networks are LDP (for IP routing-based MPLS) and RSVP (for traffic
engineering). Secure and scalable inter-domain signaling protocols are rare; VC-based solutions are
thus usually limited to a single management domain (state explosion is another problem that limits
the size of a VC-based network).
The only global networks using on-demand virtual circuits were the telephone system and X.25; one
of them has already died because of its high per-bit costs, and the other one is surviving primarily
because we're replacing virtual circuits (TDM voice calls) with tunnels (VoIP).

TANGENTIAL AFTERTHOUGHTS
Don't be sloppy with your terminology. There's a reason we use different terms to indicate different
behavior: it helps us understand the implications (e.g., scalability) of the technology. For example,
it's important to understand why bridging differs from routing and why it's wrong to call them both
switching, and it helps if you understand that Fibre Channel actually uses routing (hidden deep
inside switching terminology).

Based on all the limitations documented in this chapter, it's easy to see why nobody tries to use
OpenFlow to solve problems that reside above the transport layer (the following blog post was
written in autumn of 2012; nothing has changed in the meantime).

WHY IS OPENFLOW FOCUSED ON L2-4?


Another great question I got from David Le Goff:
So far, SDN is relying or stressing mainly the L2-L3 network programmability (switches
and routers). Why are most of the people not mentioning L4-L7 network services such as
firewalls or ADCs. Why would those elements not have to be SDNed with an OpenFlow
support for instance?
To understand the focus on L2/L3 switching, let's go back a year and a half to the laws-of-physics-changing big bang event.
OpenFlow started as a research project used by academics working on clean-slate network
architectures, and it was not the first or the only approach to distributed control/data plane
architecture (for more details, watch Ed Crabbe's presentation from the OpenFlow Symposium).
However, suddenly someone felt the great urge to get OpenFlow monetized, had to invent a fancy
name, and thus SDN was born.
The main proponents of OpenFlow/SDN (in the Open Networking Foundation sense) are still the
Googles of the world, and what they want is the ability to run their own control plane on top of
commodity switching hardware. They don't care that much about L4-7 appliances, or people who'd
like to program those appliances from orchestration software. They have already solved the L4-7
appliance problem with existing open-source tools running on commodity x86 hardware.

DOES OPENFLOW/SDN MAKE SENSE IN L4-7 WORLD?


It makes perfect sense to offer programmable APIs in L4-7 appliances, and an ever-increasing
number of vendors is doing that, from major vendors like F5 (with its Open API) to startups like Embrane
and LineRate Systems. However, appliance configuration and programming is a totally different
problem that cannot be solved with OpenFlow. OpenFlow is not a generic programming language but
a simple protocol that allows you to download forwarding information from a controller to the data plane
residing in a networking element.

IS OPENFLOW STILL USEFUL IN L4-7 WORLD?


If you really want to use OpenFlow to implement a firewall or a load balancer (not that it's always a
good idea), you can use the same architecture Cisco used to implement the fast path in its Virtual
Security Gateway (VSG) firewall: send all traffic to the central controller until the controller
decides it has enough information to either block or permit the flow, at which time the flow
information (5-tuple) is installed in the forwarding elements. Does this sound like Multi-Layer
Switching, the technology every Catalyst 5000 user loved to death? Sure it does. Does it make
sense? Well, it failed miserably the first time, but maybe we'll get luckier with the next attempt.
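
The punt-then-install idea can be sketched in a few lines of Python (my own illustration of the concept, not Cisco's or anyone's real implementation; the packet fields and the toy policy are made up):

```python
# Reactive "fast path": punt a flow's first packets to the controller,
# then install a 5-tuple entry so subsequent packets skip the controller.

flow_table = {}   # 5-tuple -> "permit" or "deny"; lives in the switch

def controller_decide(pkt):
    # Toy stand-in for real firewall logic; a production controller might
    # need several packets before deciding (returning None = undecided).
    return "deny" if pkt["dst_port"] == 23 else "permit"

def handle_packet(pkt):
    key = (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
           pkt["src_port"], pkt["dst_port"])
    if key in flow_table:              # hardware fast path: entry exists
        return flow_table[key]
    verdict = controller_decide(pkt)   # slow path: ask the controller
    if verdict is not None:
        flow_table[key] = verdict      # install the 5-tuple flow entry
        return verdict
    return "punt"                      # keep punting until decided
```

The scalability caveat discussed elsewhere in this chapter applies in full: every new session costs a controller round-trip and a flow-table entry.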

Does it make sense to use OpenFlow on virtual switches, or is its usability limited to hardware
devices? I tried to give a few hints in July 2012 while answering questions from David Le Goff, who
was at that time working for 6WIND.

DOES CPU-BASED FORWARDING PERFORMANCE MATTER FOR SDN?
David Le Goff sent me several great SDN-related questions. Here's the first one:
What is your take on the performance issue with software-based equipment when dealing
with general purpose CPU only? Do you see this challenge as a hard stop to SDN
business?
The short answer (as always) is: it depends. However, I think most people approach this issue the wrong
way.
First, let's agree that SDN means programmable networks (or more precisely, network elements
that can be configured through a reasonable and documented API), not the Open Networking
Foundation's self-serving definition.
Second, I hope we agree it makes no sense to perpetuate the existing spaghetti mess we have in
most data centers. It's time to decouple content and services from the transport, decouple virtual
networks from the physical transport, and start building networks that provide equidistant endpoints
(in which case it doesn't matter to which port a load balancer or firewall is connected).

Now, assuming you've cleaned up your design, you have switches that do fast packet forwarding
and have few needs for additional services, and services-focused elements (firewalls, caches,
load balancers) that work on L4-7. These two sets of network elements have totally different
requirements:

Implementing fast (and dumb) packet forwarding on L2 (bridge) or L3 (router) on generic x86
hardware makes no sense. It makes perfect sense to implement the control plane on generic x86
hardware (almost all switch vendors use this approach) and a generic OS platform, but it definitely
doesn't make sense to let the x86 CPU get involved with packet forwarding. Broadcom's chipsets
can do a way better job for less money.

L4-7 services are usually complex enough to require lots of CPU power anyway. Firewalls
configured to perform deep packet inspection and load balancers inspecting HTTP sessions must
process the first few packets of every session in the CPU anyway, and only then potentially
offload the flow record to dedicated hardware. With optimized networking stacks, it's possible to
get reasonable forwarding performance on well-designed x86 platforms, so there's little reason
to use dedicated hardware in L4-7 appliances today (SSL offload is still a grey area).

On top of everything else, the shortsighted design of dedicated hardware used by L4-7 appliances
severely limits your options. Just ask a major vendor that needed years to roll out IPv6-enabled load
balancers and high-performance IPv6-enabled firewall blades ... and still doesn't have hardware-based
deep packet inspection of IPv6 traffic.

SUMMARY
While it's nice to have high-performance packet forwarding on generic x86 architecture, the
performance of software switching is definitely not an SDN showstopper. Also, keep in mind that a
software appliance running on a single vCPU can provide up to a few gigabits of forwarding
performance, there are plenty of cores in today's Xeon-based servers (10Gbps per physical server is
thus very realistic), and not that many people have multiple 10GE uplinks from their data centers.
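
The arithmetic behind that claim, with assumed figures rather than benchmarks (2 Gbps per vCPU is my conservative reading of "a few gigabits"):

```python
# Aggregate software-forwarding capacity of one server, rough estimate.
GBPS_PER_VCPU = 2          # assumed per-vCPU forwarding performance
CORES_PER_SERVER = 16      # a typical two-socket Xeon server of that era

aggregate_gbps = GBPS_PER_VCPU * CORES_PER_SERVER
print(aggregate_gbps)      # 32 Gbps - well above a single 10GE uplink
```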

The final blog post in this chapter illustrates what happens when overexcited engineers forget the
harsh limits of reality. I hope this chapter gave you enough information to analyze how bad the idea
described in this blog post is (the blog post was written in late 2011, but there are still people
proposing similar solutions in 2014).

OPENFLOW AND THE STATE EXPLOSION


While everyone deeply involved with OpenFlow agrees it's just a low-level tool that can't solve
problems we couldn't solve in the past (just like replacing Tcl with C++ won't help you prove P =
NP), occasionally you stumble across mindboggling ideas that are so simple you have to ask
yourself: were we really that stupid? One idea that obviously impressed James Hamilton is a
solution to load balancing that requires no load balancers.
Before clicking Read more, watch this video and try to figure out what the solution is and why we're
not using it in large-scale networks.
The proposal is truly simple: it uses anycast with per-flow forwarding. All servers have the same IP
address, and the OpenFlow controller establishes a path from each client to one of the servers. In its
most simplistic implementation, a flow entry is installed in all devices in the path every time a client
establishes a session with a server (you could easily improve it by using MPLS LSPs or any other
virtual circuit/tunneling mechanism in the core).
Now ask yourself: will this ever scale? Of course it won't. It might be a good solution for long-lived
sessions (after all, that's how voice networks handle 800-numbers), but not for the data world
where a single client could establish tens of TCP sessions per second.
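
A quick calculation (my numbers, chosen purely for illustration) shows how fast the flow-setup load explodes when every session installs state on every switch in its path:

```python
# Flow-table churn for per-session anycast load balancing.
def flow_installs_per_second(clients, sessions_per_client_per_sec, path_hops):
    # Each new session installs one flow entry on every device in its path.
    return clients * sessions_per_client_per_sec * path_hops

# 10,000 clients, 10 TCP sessions/second each, 5-hop paths:
print(flow_installs_per_second(10_000, 10, 5))   # 500000 flow installs/s
```

No 2011-era switch could absorb anywhere near that many flow modifications per second, which is exactly why the idea fails.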

A quick look back confirms that hunch: all technologies that required per-session state in every
network device have failed. IntServ (with RSVP) never really took off on a global scale, and ATM-to-the-desktop failed miserably. The only two exceptions are global X.25 networks (they were so
expensive that nobody ever established more than a few sessions) and voice networks (where
sessions usually last for minutes ... or hours if teenagers get involved).
Load balancers work as well as they do because a single device in the whole path (the load balancer)
keeps the per-session state, and because you can scale them out: if they become overloaded, you
just add another pair of redundant devices with new IP addresses to the load balancing pool (and
use DNS-based load balancing on top of them).
Some researchers have quickly figured out the scaling problem and there's work being done to make
OpenFlow-based load balancing scale better, but one has to wonder: after they're done and their
solution scales, will it be any better than what we have today, or will it just be different?
Moral of the story: every time you hear about an incredible solution to a well-known problem, ask
yourself: why weren't we using it in the past? Were we really that stupid, or are there some inherent
limitations that are not immediately visible? Will it scale? Is it resilient? Will it survive device or link
failures? And don't forget: history is a great teacher.

OPENFLOW AND SDN USE CASES

Traditional networking architectures and protocols are a perfect solution to a specific set of
problems: shortest-path destination-only layer-2 and layer-3 forwarding. It's amazing how many
problems one can solve with such a specific toolset, from scale-out data center fabrics to the global
Internet.
More complex challenges (example: traffic engineering) have been solved using the traditional
architecture of distributed loosely coupled independent nodes (example: MPLS TE), but could benefit
from centralized network visibility.
Finally, the traditional solutions haven't even tried to tackle some of the harder networking problems
(example: megaflow-based forwarding or centralized policies with on-demand deployment) that
could be solved with a controller-based architecture.

MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

Start with the SDN, OpenFlow and NFV Resources page;

Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;

Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);

The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network;

Finally, I'm always available for short online or on-site consulting engagements.

This chapter contains several real-life SDN solutions, most of them OpenFlow-based. For alternate
approaches see the SDN Beyond OpenFlow chapter; for even more use cases watch the publicly
available videos from my OpenFlow-based SDN Use Cases webinar.

IN THIS CHAPTER:
OPENFLOW: ENTERPRISE USE CASES
OPENFLOW @ GOOGLE: BRILLIANT, BUT NOT REVOLUTIONARY
COULD IXPS USE OPENFLOW TO SCALE?
IPV6 FIRST-HOP SECURITY: IDEAL OPENFLOW USE CASE
OPENFLOW: A PERFECT TOOL TO BUILD SMB DATA CENTER
SCALING DOS MITIGATION WITH OPENFLOW
NEC+IBM: ENTERPRISE OPENFLOW YOU CAN ACTUALLY TOUCH
BANDWIDTH-ON-DEMAND: IS OPENFLOW THE SILVER BULLET?
OPENSTACK/QUANTUM SDN-BASED VIRTUAL NETWORKS WITH FLOODLIGHT
NICIRA, BIGSWITCH, NEC, OPENFLOW AND SDN

Half a year after the public launch of OpenFlow and SDN (in autumn 2011), we had already identified
numerous enterprise use cases. Most of them are still largely ignored as every startup and major
networking vendor rushes toward the (supposedly) low-hanging fruit of data center fabrics and
cloud-scale virtual networks.

OPENFLOW: ENTERPRISE USE CASES


One of the comments I usually get about OpenFlow is "sounds great, and I'm positive Yahoo! and
Google will eventually use it, but I see no enterprise use case" (see also this blog post). Obviously
nobody would go for a full-blown native OpenFlow deployment, and we'll probably see the hybrid (ships-in-the-night) approach more often in research labs than in enterprise networks, but there's always
the integrated mode that allows you to add OpenFlow-based functionality on top of the existing
networking infrastructure.
Leaving aside the pretentious claims that OpenFlow will solve hard problems like global load
balancing, there are four functions you can easily implement with OpenFlow (Tony Bourke wrote
about them in more detail):

packet filters: a flow classifier followed by a drop or normal action;

policy-based routing: a flow classifier followed by an outgoing interface and/or VLAN tag push;

static routes: flow classifiers using only the destination IP prefix; and

NAT: some OpenFlow switches might support source/destination IP address/port rewrites.
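
The four functions can be expressed as abstract match-to-action rules in the spirit of OpenFlow flow entries (the field names and values below are my own illustrative shorthand, not the actual wire protocol):

```python
# One rule per function: packet filter, PBR, static route, NAT.
rules = [
    # packet filter: classify, then drop
    {"match": {"ip_dst": "10.1.1.1", "tcp_dst": 23},
     "actions": ["drop"]},
    # policy-based routing: classify, then force VLAN tag and output port
    {"match": {"ip_src": "10.2.0.0/16"},
     "actions": ["push_vlan:200", "output:12"]},
    # static route: match on the destination IP prefix only
    {"match": {"ip_dst": "192.168.0.0/24"},
     "actions": ["output:3"]},
    # NAT: rewrite the destination address where the switch supports it
    {"match": {"ip_dst": "203.0.113.10"},
     "actions": ["set_ip_dst:10.0.0.10", "output:7"]},
]
```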

Combine that with the ephemeral nature of OpenFlow (whatever the controller downloads into the
networking device does not affect the running/startup configuration and disappears when it's no longer
needed), and the ability to use the same protocol with multiple product families, either from one or
multiple vendors, and you have a pretty interesting combo.
Actually, I don't care whether the mechanism used to change a networking device's forwarding tables is OpenFlow
or something completely different, as long as it's programmable, multi-vendor and integrated with
the existing networking technologies. As I wrote a number of times, OpenFlow is just a
TCAM/FIB/packet classifier download tool.
Remember one of OpenFlow's primary use cases: adding functionality where the vendor is lacking it (see
Igor Gashinsky's presentation from the OpenFlow Symposium for a good coverage of that topic).
Now stop for a minute and remember how many times you badly needed some functionality along
the lines of the four functions I mentioned above (packet filters, PBR, static routes, NAT) that you
couldn't implement at all, or that required a hodgepodge of expect scripts (or XML/Netconf requests
if you're a Junos automation fan) that you have to modify every time you deploy a different device
type or a different software release.
Here are a few ideas I got in the first 30 seconds (if you get other ideas, please do write a
comment):

User authentication for devices that don't support 802.1X;

Per-user access control (I guess NAC is the popular buzzword) that works identically on dial-up,
VPN, wireless and wired access devices;

Push a user into a specific VLAN based on whatever he's doing (or based on customized user
authentication);

Give users controlled access to a single application in another VLAN (combine that with NAT to
solve return path problems);

Layer-2 service insertion, be it firewall, IDS/IPS, WAAS or some yet-unknown device.
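
One of those ideas, sketched as a flow rule: after out-of-band user authentication, push the user's traffic into a per-role VLAN. All values (the role-to-VLAN table, the rule format) are made up for illustration; they're not any vendor's API:

```python
# Generate a per-user flow rule mapping an authenticated user's MAC
# address to the VLAN assigned to his role.
def vlan_rule_for_user(user_mac, role):
    role_vlan = {"guest": 100, "staff": 200, "contractor": 300}
    return {"match": {"eth_src": user_mac},
            "actions": ["push_vlan:%d" % role_vlan[role], "normal"]}

print(vlan_rule_for_user("00:11:22:33:44:55", "guest"))
```

The same rule template works regardless of whether the access device is a wired switch, a wireless controller, or a VPN concentrator, which is exactly the appeal of the approach.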

Looking at my short list, it seems @beaker was right: security just might be the killer app for
OpenFlow/SDN. OpenFlow could be used either to implement some security features (packet filters
and traffic steering), to help integrate traditional security functions with the rest of the network, or
to implement dynamic security service insertion at any point in the network: something we badly
need but almost never get.

Google uses OpenFlow to control their WAN edge routers, which they built from commodity switching
components. The details of their implementation are proprietary (and they haven't open-sourced
their solution); here's what I was able to deduce from publicly available information in May 2012:

OPENFLOW @ GOOGLE: BRILLIANT, BUT NOT REVOLUTIONARY
Google unveiled some details of its new internal network at the Open Networking Summit in April and
predictably the industry press and OpenFlow pundits exploded with "this is the end of
networking as we know it" glee. Unfortunately I haven't seen a single serious technical analysis of
what it is they're actually doing and how different their new network is from what we have today.
This is a work of fiction, based solely on the publicly available information presented by
Google's engineers at the Open Networking Summit (plus an interview or two published by the
industry press). Read and use it at your own risk.

WHAT IS GOOGLE DOING?


After supposedly building their own switches, Google decided to build their own routers. They use a
distributed multi-chassis architecture with a redundant central control plane (not unlike Juniper's
XRE/EX8200 combo). Let's call their combo a G-router.

A G-router is used as a WAN edge device in their data centers and runs traditional routing protocols:
EBGP with the data center routers and IBGP+IS-IS across the WAN with other G-routers (or traditional
gear during the transition phase).
On top of that, every G-router has a (proprietary, I would assume) northbound API that is used by
Google's Traffic Engineering (G-TE): a centralized application that analyzes the application
requirements, computes the optimal paths across the network, and creates those paths through the
network of G-routers using the above-mentioned API.
I wouldn't be surprised if G-TE used MPLS forwarding instead of installing 5-tuples into mid-path switches. Doing Forwarding Equivalence Class (FEC) classification at the head-end device
instead of at every hop is way simpler and less loop-prone.
Like MPLS-TE, G-TE runs in parallel with the traditional routing protocols. If it fails (or an end-to-end
path is broken), G-routers can always fall back to traditional BGP+IGP-based forwarding, and like
with MPLS-TE+IGP, you'll still have a loop-free (although potentially suboptimal) forwarding
topology.

IS IT SO DIFFERENT?
Not really. Similar concepts (central path computation) were used in ATM and Frame Relay
networks, as well as in early MPLS-TE implementations (before Cisco implemented OSPF/IS-IS traffic
engineering extensions, RSVP was all you had).
Some networks are supposedly still running offline TE computations and static MPLS TE tunnels
because they give you way better results than the distributed MPLS-TE/autobandwidth/automesh
kludges.

MPLS-TP is also going in the same direction: paths are computed by the NMS, which then installs in/out
label mappings (and fast failover alternatives if desired) to the Label Switch Routers (LSRs).

THEN WHAT IS DIFFERENT?


Google is (as far as I know) the first one that implemented the end-to-end system: gathering
application needs, computing paths, and installing them in the routers in real time.
You could do the same thing (should you wish to do it) with traditional gear using NETCONF with
a bit of MPLS-TP sprinkled on top (or your own API if you have switches that can be easily
programmed in a decent programming language: Arista immediately comes to mind), but it would
be a slight nightmare and would still suffer the drawbacks of distributed signaling protocols (even
static MPLS-TE tunnels use RSVP these days).
The true difference between their implementation and everything else on the market is thus that
they did it the right way, learning from all the failures and mistakes we made in the last two
decades.

WHY DID THEY DO IT?


Wouldn't you do the same, assuming you had the necessary intellectual potential and resources?
Google's engineers built themselves a high-end router with a modern scale-out software architecture
that runs only the features they need (with no code bloat and no bugs from unrelated features), and
they can extend the network functionality in any way they wish with the northbound API.
Even though they had to make a hefty investment in the G-router platform, they claim their network
already converges almost 10x faster than before (on the other hand, it's not hard converging faster
than IS-IS with default timers), and has average link utilization above 90% (which in itself is a huge
money-saver).

HYPE GALORE
Based on the information from the Open Networking Summit (which is all the information I have at the
moment), you might wonder what all the hype is about. In one word: OpenFlow. Let's try to debunk
those claims a bit.
Google is running an OpenFlow network. Get lost. Google is using OpenFlow between the controller and
adjacent chassis switches because (like everyone else) they need a protocol between the control
plane and forwarding planes, and they decided to use an already-documented one instead of
inventing their own (the extra OpenFlow hype could also persuade hardware vendors and chipset
manufacturers to implement more OpenFlow capabilities in their next-generation products).
Google built their own routers ... and so can you. Really? Based on the scarce information from ONS
talks and an interview in Wired, Google probably threw more money and resources at the problem than
a typical successful startup. They effectively decided to become a router manufacturer, and they did.
Can you repeat their feat? Maybe, if you have comparable resources.
Google used open-source software ... so the monopolistic Ciscos of the world are doomed. Just in
case you believe the fairy-tale conclusion, let me point out that many Internet exchanges use open-source software for BGP route servers, and almost all networking appliances and most switches built
today run on open-source software (namely Linux or FreeBSD). It's the added value that matters; in
Google's case, that's their traffic engineering solution.

Google built an open network. Really? They use standard protocols (BGP and IS-IS) like everyone
else, and their traffic engineering implementation (and probably the northbound API) is proprietary.
How is that different (from the openness perspective) from networks built from Juniper's or Cisco's
gear?

CONCLUSIONS
Google's engineers did a great job: it seems they built a modern routing platform that everyone
would love to have, and an awesome traffic engineering application. Does it matter to you and me?
Probably not; I don't expect them to give their crown jewels away. Does it matter that they used
OpenFlow? Not really; it's a small piece of their whole puzzle. Will someone else repeat their feat
and bring a low-cost high-end router to the market? I doubt it, but I hope to be wrong.

OpenFlow might be an ideal tool to solve interesting problems that are too rare to merit the attention of
traditional networking vendors. Internet Exchange Points (IXPs) might be one of those scenarios.

COULD IXPS USE OPENFLOW TO SCALE?


The SDN industry probably considers me an old and grumpy naysayer (and I'm positive Mrs Y has a
special place in their hearts after her recent blog post), so I tried really hard to find a real-life
example where OpenFlow could be used to solve the mid-market innovator's dilemma, to balance my
usual OpenFlow and SDN presentation.
Internet Exchange Points (IXPs) seemed a perfect fit: they are high-speed mission-critical
environments usually implemented as geographically stretched layer-2 networks, and they face all sorts
of security and scaling problems. Deploying OpenFlow on IXP edge switches would result in a
standardized security posture that wouldn't rely on the idiosyncrasies of a particular vendor's
implementation, and we could use OpenFlow to implement an ARP sponge (or turn ARPs into unicasts
sent to an ARP server).
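
The broadcast-to-unicast ARP idea fits in a few lines of classification logic (a sketch of my own; AMS-IX's real ARP sponge works differently, and the port number is made up):

```python
# Redirect broadcast ARP requests to a dedicated ARP server port instead
# of flooding them across the stretched layer-2 IXP fabric.
ARP_SERVER_PORT = 48            # assumed uplink toward the ARP server

def classify(pkt):
    is_arp = pkt["ethertype"] == 0x0806
    is_broadcast = pkt["eth_dst"] == "ff:ff:ff:ff:ff:ff"
    if is_arp and is_broadcast:
        return ["output:%d" % ARP_SERVER_PORT]
    return ["normal"]           # everything else: regular forwarding
```

In a real deployment this classification would be pre-installed as proactive flow entries, so the controller never touches the data path.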
I presented these ideas at MENOG 12 in March 2013 and got a few somewhat interested responses,
and then I asked a really good friend with significant operational experience in IXP environments
for feedback. Not surprisingly, the reply was a cold shower:
I am not quite sure how this improves the current situation. Except for the ARP sponge,
everything else seems to be implemented by vendors in one form or another. For the ARP
sponge, AMS-IX uses great software developed in house that they've open-sourced.
As always, from the ops perspective proven technologies beat shiny new tools.

On a somewhat tangential topic, Dean Pemberton runs OpenFlow in production at the New Zealand
Internet Exchange. His deployment model is totally different: the IXP is a layer-3 fabric (not a layer-2 fabric like most Internet exchanges), and his route server is the only way to exchange BGP routes
between members. He's using Quagga and RouteFlow to program Pica8 switches.
A note from a grumpy skeptic: his deployment works great because hes carrying a pretty
limited number of BGP routes the Pica8 switches hes using support up to 12K routes.
IPv4 or IPv6? Who knows, the data sheet ignores that nasty detail.


First-hop IPv6 security is another morass lacking a systemic solution. Could we solve it with OpenFlow? Yes, we could, but nobody is approaching this problem from the controller-based perspective (at least based on my knowledge in August 2014).

IPV6 FIRST-HOP SECURITY: IDEAL OPENFLOW USE CASE
Supposedly it's a good idea to be able to identify which one of your users had a particular IP address at the time that source IP address created significant havoc. We have a definitive solution in the IPv4 world: DHCP server logs combined with DHCP snooping, IP source guard and dynamic ARP inspection. The IPv6 world is a mess: read this e-mail message from the v6ops mailing list and watch Eric Vyncke's RIPE65 presentation for the excruciating details.

SHORT SUMMARY

- Many layer-2 switches still lack feature parity with IPv4;
- IPv6 uses three address allocation mechanisms (SLAAC, privacy extensions, DHCPv6), and it's quite hard to enforce a specific one;
- Host implementations are wildly different (aka "The nice thing about standards is that you have so many to choose from.").

IPv6 address tracking is a hodgepodge of kludges.


WHAT IF... THERE WERE AN OPENFLOW SOLUTION?

Now imagine a parallel universe in which the edge switches support OpenFlow 1.3 and IPv6 (the only vendors matching these criteria in August 2014 are NEC and HP). IPv6 address tracking would become an ideal job for an OpenFlow controller:

- Whenever a new end-host appears on the network, it's authenticated and its MAC address is logged. Only that MAC address can be used on that port (many switches already implement this functionality).
- Whenever an end-host starts using a new IPv6 source address, its packets are not matched by any existing OpenFlow entries and thus get forwarded to the OpenFlow controller.
- The OpenFlow controller decides whether the new source IPv6 address is legal (enforcing DHCPv6-only address allocation if needed), logs the new IPv6-to-MAC address mapping, and modifies the flow entries in the first-hop switch. The IPv6 end-host can use many IPv6 addresses; each one of them is logged immediately.
- Ideally, if the first-hop switches support all the nuances introduced in OpenFlow 1.2, the controller can install neighbor advertisement (NA) filters, effectively blocking ND spoofing.
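The controller-side decision described in the bullets above is conceptually simple. Here's a toy sketch of it (the data model and method names are my assumptions, not any real controller's API):

```python
# Sketch of the controller decision for a packet-in with an unknown IPv6
# source address: validate, log the IPv6-to-MAC binding, install a flow.
import ipaddress
import time

class V6Tracker:
    def __init__(self, dhcpv6_only=False, dhcpv6_leases=None):
        self.dhcpv6_only = dhcpv6_only
        self.leases = dhcpv6_leases or {}   # IPv6 address -> MAC, from DHCPv6 logs
        self.bindings = []                  # audit log: (time, ipv6, mac, port)

    def packet_in(self, src_ip, src_mac, port):
        ip = ipaddress.IPv6Address(src_ip)
        if self.dhcpv6_only and self.leases.get(ip) != src_mac:
            return 'drop'                   # address not allocated via DHCPv6
        self.bindings.append((time.time(), ip, src_mac, port))
        return 'install-flow'               # program the edge switch, mapping logged

tracker = V6Tracker(
    dhcpv6_only=True,
    dhcpv6_leases={ipaddress.IPv6Address('2001:db8::10'): '00:00:5e:00:53:01'})

print(tracker.packet_in('2001:db8::10', '00:00:5e:00:53:01', port=3))   # install-flow
print(tracker.packet_in('2001:db8::bad', '00:00:5e:00:53:01', port=3))  # drop
```

Every additional address an end-host starts using triggers exactly one packet-in, so the logging cost stays proportional to the number of address changes, not the traffic volume.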

Will this nirvana appear anytime soon? Not likely. Most switch vendors support only OpenFlow 1.0,
which is totally IPv6-ignorant. Also, solving real-life operational issues is never as sexy as promoting
the next unicorn-powered fountain of youth.


Imagine the world where you can buy a prepackaged data center (or a pod for your private cloud deployment), with compute, storage and networking handled from a single central management console.

As of August 2014, NEC is still the only vendor with a commercial-grade data center fabric product using OpenFlow. Most other vendors use more traditional architectures, and the virtualization world is quickly moving toward overlay virtual networks.
Anyhow, this is how I envisioned potential OpenFlow use in a small data center in 2012:

OPENFLOW: A PERFECT TOOL TO BUILD AN SMB DATA CENTER
When I was writing about the NEC+IBM OpenFlow trials, I figured out a perfect use case for OpenFlow-controlled network forwarding: SMB data centers that need fewer than a few hundred physical servers, be they bare-metal servers or hypervisor hosts (hat tip to Brad Hedlund for nudging me in the right direction a while ago).
As I wrote before, OpenFlow-controlled network forwarding (example: NEC, BigSwitch)
experiences a totally different set of problems than OpenFlow-controlled edge (example:
Nicira or XenServer vSwitch Controller).


THE DREAM
As you can imagine, it's extremely simple to configure an OpenFlow-controlled switch: configure its own IP address, the management VLAN, and the controller's IP address, and let the controller do the rest.

Once the networking vendors figure out the fine details, they could use dedicated management ports for an out-of-band OpenFlow control plane (similar to what QFabric is doing today), DHCP to assign an IP address to the switch, and a new DHCP option to tell the switch where the controller is. The DHCP server would obviously run on the OpenFlow controller, and the whole control-plane infrastructure would be completely isolated from the outside world, making it pretty secure.
The extra hardware cost for significantly reduced complexity (no per-switch configuration and a
single management/SNMP IP address): two dumb 1GE switches (to make the setup redundant),
hopefully running MLAG (to get rid of STP).
Finally, assuming server virtualization is the most common use case in an SMB data center, you could tightly couple the OpenFlow controller with VMware's vCenter and let vCenter configure the whole network:

- CDP or LLDP would be used to discover server-to-switch connectivity;
- The OpenFlow controller would automatically download port group information from vCenter and provision VLANs on server-to-switch links;
- Going a step further, the OpenFlow controller could automatically configure static port channels based on the load balancing settings configured on port groups.
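The VLAN auto-provisioning step amounts to joining the LLDP-discovered adjacency table with the port-group data pulled from vCenter. A toy sketch of that join (all host, switch and port names are hypothetical):

```python
# Given LLDP adjacency (host -> switch port) and vCenter port-group data
# (host -> set of VLANs used by its port groups), compute the VLANs the
# controller should provision on every server-facing switch port.

def vlans_per_switch_port(lldp_links, host_vlans):
    wanted = {}
    for host, switch_port in lldp_links:
        wanted.setdefault(switch_port, set()).update(host_vlans.get(host, set()))
    return wanted

lldp_links = [('esx1', ('sw1', 'ge-0/0/1')),
              ('esx2', ('sw1', 'ge-0/0/2'))]
host_vlans = {'esx1': {10, 20}, 'esx2': {20, 30}}

print(vlans_per_switch_port(lldp_links, host_vlans))
```

Re-running the computation whenever vCenter reports a port-group change gives you the "automatic adjustment to VLAN changes" described below, with no manual switch configuration.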


End result: a decently large layer-2 network with no STP, automatic multipathing, automatic adjustment to VLAN changes, a single management interface, and the minimum number of moving parts. How cool is that?

SCENARIO #1: GE-ATTACHED SERVERS


If you decide to use GE-attached servers and run virtual machines on them, it would be wise to use four to six uplinks per hypervisor host (two for VM data, two for kernel activities, and optionally two more for iSCSI or NFS storage traffic).

You could easily build a GE Clos fabric using switches from NEC America: PF5240 (ToR switch) as leaf nodes (you'd have almost no oversubscription with 48 GE ports and 4 x 10GE uplinks) and PF5820 (10GE switch) as spine nodes and the interconnection point with the rest of the network. Using just two PF5820 spine switches, you could get over 1200 1GE server ports, enough to connect 200 to 300 servers (probably hosting anywhere between 5,000 and 10,000 VMs).
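The fabric sizing can be sanity-checked with simple arithmetic; the leaf numbers come from the PF5240 description above, while the spine port count is my assumption:

```python
# Leaf: 48 x 1GE server ports and 4 x 10GE uplinks (PF5240, per the text).
leaf_down_gbps = 48 * 1
leaf_up_gbps = 4 * 10
oversub = leaf_down_gbps / leaf_up_gbps
print(f"leaf oversubscription {oversub:.1f}:1")   # 1.2:1 -- almost none

# Assume 48 x 10GE ports per spine switch; each leaf consumes 4 spine ports.
spine_ports = 2 * 48
leaves = spine_ports // 4
server_ports = leaves * 48
print(leaves, "leaves,", server_ports, "1GE server ports")
```

With the assumed spine port count you land at roughly 1150 server ports, the same ballpark as the "over 1200" claim above.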
You'd want to keep the number of switches controlled by the OpenFlow controller low to avoid
scalability issues. NEC claims they can control up to 200 ToR switches with a controller cluster; I
would be slightly more conservative.

SCENARIO #2: 10GE-ATTACHED SERVERS


Things get hairy if you want to use 10GE-attached servers (or, to put it more diplomatically, IBM and NEC are not yet ready to handle this use case):

- If you want true converged storage with DCB, you have to use IBM's switches (NEC does not have DCB), and even then I'm not sure how DCB would work with OpenFlow;


- PF5820 (NEC) and G8264 (IBM) have 40GE uplinks, but I have yet to see a 40GE OpenFlow-enabled switch with enough port density to serve as a spine node. At the moment, it seems that bundles of 10GE uplinks are the way to go;
- It seems (according to the data sheets, but I could be wrong) that NEC supports 8-way multipathing, and we'd need at least 16-way multipathing to get 3:1 oversubscription.
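The 16-way multipathing requirement follows directly from the arithmetic (assuming a ToR switch with 48 x 10GE server-facing ports):

```python
downlink_gbps = 48 * 10          # 48 x 10GE server ports
target_oversub = 3               # desired 3:1 oversubscription

uplink_gbps_needed = downlink_gbps / target_oversub
print(uplink_gbps_needed / 10)   # 16.0 -> sixteen 10GE uplinks, i.e. 16-way multipathing
print(downlink_gbps / (8 * 10))  # 6.0  -> with only 8-way multipathing you get 6:1
```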

Anyhow, assuming all the bumps eventually do get ironed out, you could have a very easy-to-manage network connecting a few hundred 10GE-attached servers.

WILL IT EVER HAPPEN?


I remain skeptical, mostly because every vendor seems obsessed with cloud computing and zettascale data centers, ignoring the mid-scale market, but there might be a silver lining. This idea would make the most sense if you could buy a prepackaged data center (think VCE Vblock) at a reasonably low price (to make it attractive to SMB customers).

A few companies have all the components one would need in an SMB data center (Dell, HP, IBM), and Dell just might be able to pull it off (while HP is telling everyone how they'll forever change the networking industry). And now that I've mentioned Dell: how about configuring your data center through a user-friendly web interface and having it shipped to your location in a few weeks?


OpenFlow is an ideal tool when you want to augment software-based networking services with packet forwarding at hardware speeds. This post describes a DoS prevention solution demonstrated by NEC and Radware in spring 2013:

SCALING DOS MITIGATION WITH OPENFLOW


NEC and a slew of its partners demonstrated an interesting next step in the SDN saga at Interop Las Vegas 2013: multi-vendor SDN applications. Load balancing, orchestration and security solutions from A10, Silver Peak, Red Hat and Radware were happily cooperating with the ProgrammableFlow controller.

A curious mind obviously wants to know what's behind the scenes. Masterpieces of engineering? Large integration projects... or is it just a smart application of API glue? In most cases, it's the latter. Let's look at the ProgrammableFlow and Radware integration.

Here's a slide from NEC's white paper: an interesting high-level view, but no details. Radware's press release is even less helpful (but it's definitely a masterpiece of marketing).


Figure 6-1: NEC+Radware high-level solution architecture


Fortunately, Ron Meyran provided more details on the Radware blog, as did Lior Cohen in his SDN Central Demo Friday presentation:

- DefenseFlow software monitors the flow entries and counters provided by an OpenFlow controller and tries to identify abnormal traffic patterns;
- The abnormal traffic is diverted to a Radware DefensePro appliance that scrubs the traffic before it's returned to the data center.

Both operations are easily done with the ProgrammableFlow API: it provides both flow data and the ability to redirect traffic to a third-party next hop (or MAC address) based on a dynamically configured access list. Here's a CLI example from the ProgrammableFlow webinar; an API call would be very similar (but formatted as a JSON or XML object):


Figure 6-2: Sample ProgrammableFlow traffic steering configuration
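An API call equivalent to the CLI filter in Figure 6-2 would carry a structured body along these lines. The object shape below is purely illustrative guesswork (every field name is an assumption, not the actual ProgrammableFlow schema):

```python
import json

# Hypothetical redirect rule: match a suspicious source prefix and steer
# matching traffic toward the scrubbing appliance's MAC address.
redirect_rule = {
    "filter": {
        "match": {"ipv4_src": "192.0.2.0/24", "ip_proto": "tcp"},
        "action": "redirect",
        "next_hop_mac": "00:00:5e:00:53:aa",   # the DefensePro appliance
        "priority": 100,
    }
}

print(json.dumps(redirect_rule, indent=2))
```

The important part is not the exact schema but the abstraction level: the caller names a match condition and a next-hop MAC address, and the controller works out the per-switch flow entries.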


WHY IS THIS USEFUL?

Deep packet inspection is CPU-intensive and hard to implement at high speeds. DPI products and solutions (including traffic-scrubbing appliances like DefensePro) thus tend to be expensive: 40 Gbps of DefensePro DPI (DefensePro 40420) sets you back almost half a million dollars (according to this price list).

Doing the initial triage and subsequent traffic blackholing in cheaper forwarding hardware (programmed through OpenFlow), and diverting only a small portion of the traffic through the scrubbing appliance, significantly improves the average bandwidth a DPI solution can handle at a reasonable cost.
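A back-of-the-envelope calculation shows why the triage matters; the appliance capacity comes from the text, while the diverted fraction is my assumption:

```python
scrubber_gbps = 40          # DefensePro 40420 capacity (per the text)
suspect_fraction = 0.05     # assume triage diverts ~5% of traffic to the scrubber

aggregate_gbps = scrubber_gbps / suspect_fraction
print(aggregate_gbps)       # 800.0 -> one 40 Gbps scrubber fronts 800 Gbps of traffic
```

In other words, the cheaper the hardware doing the first-pass filtering, the further the expensive DPI dollars stretch.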

IS THIS SOMETHING ONLY OPENFLOW COULD DO?

Of course not: flow monitoring and statistics have been available for decades, in either NetFlow or sFlow format. Likewise, we've been using PBR to redirect traffic for decades, and configuring PBR through NETCONF is not exactly rocket science... and of course there's FlowSpec, which real-life engineers sometimes use to mitigate real-life DoS attacks (although, like any other tool, it does fail every now and then).

However, an OpenFlow controller does provide a more abstract API: instead of configuring PBR entries that push traffic toward a next hop (or an MPLS TE tunnel if you're an MPLS ninja), modifying router configurations while doing so, you just tell the OpenFlow controller that you want the traffic redirected toward a specific MAC address, and the necessary forwarding entries automagically appear all along the path.

Finally, there's the sexiness factor: mentioning SDN instead of NetFlow or PBR in your press release is infinitely more attractive to bedazzled buyers.


WILL IT SCALE?

You should be aware of the major OpenFlow scaling issues by now, and I hope you've realized that real-life switches have real-life limitations. Most existing hardware reuses ACL entries when you ask for full-blown OpenFlow flow entries. Now go and check the ACL table size on your favorite switch, and imagine you need one entry for each flow spec you want to monitor or divert to the DPI appliance.

Done? Disappointed? Pleasantly surprised?

However, a well-tuned solution using the right combination of hardware and software (example: NEC's PF5240, which can handle 160,000 L2, IPv4 or IPv6 flows in hardware) just might work. Still, we're early in the development cycle, so make sure you do thorough (stress) testing before buying anything... and just in case you need a rock-solid traffic generator, Spirent will be more than happy to sell you one (or a few).


NEC and IBM gave me access to one of their early ProgrammableFlow customers. This is what I got out of that discussion, which took place in February 2012.

In the meantime, I've encountered at least one large-scale production deployment of ProgrammableFlow, proving that NEC's solution works in large data centers.

NEC+IBM: ENTERPRISE OPENFLOW YOU CAN ACTUALLY TOUCH
I didn't expect we'd see a multi-vendor OpenFlow deployment any time soon. NEC and IBM decided to change that, and Tervela, a company specializing in building messaging-based data fabrics, decided to verify their interoperability claims. Janice Roberts, who works with NEC Corporation of America, helped me get in touch with them, and I was pleasantly surprised by their optimistic view of OpenFlow deployment in typical enterprise networks.

A BIT OF BACKGROUND

Tervela's data fabric solutions typically run on top of traditional networking infrastructure, and an underperforming network (particularly long outages triggered by suboptimal STP implementations) can severely impact the behavior of the services running on their platform.

They were looking for a solution that would perform way better than what their customers typically use today (large layer-2 networks), while at the same time being easy to design,


provision and operate. It seems they found a viable alternative to existing networks in a combination of NEC's ProgrammableFlow Controller and IBM's BNT 8264 switches.

EASY TO DEPLOY?

As long as your network is not too big (NEC claimed their controller can manage up to 50 switches in their Networking Tech Field Day presentation, and later releases of ProgrammableFlow increased that limit to 200), the design and deployment isn't too hard, according to Tervela's engineers:

- They decided to use an out-of-band management network and connected the management port of the BNT 8264 to it (they could also have used any other switch port).
- All you have to configure on an individual switch is the management VLAN, a management IP address, and the IP addresses of the OpenFlow controllers.
- The ProgrammableFlow controller automatically discovers the network topology using LLDP packets sent from the controller through individual switch interfaces.
- After those basic steps, you can start configuring virtual networks on the OpenFlow controller (see the demo NEC made during the Networking Tech Field Day).

Obviously, you'd want to follow some basic design rules, for example:

- Make the management network fully redundant (read the QFabric documentation to see how that's done properly);
- Connect the switches into a structure somewhat resembling a Clos fabric, not into a ring or a random mess of cables.


TEST RESULTS: LATENCY

Tervela's engineers ran a number of tests, focusing primarily on latency and failure recovery.

They found that (as expected) the first packet exchanged between a pair of VMs experiences 8-9 milliseconds of latency because it's forwarded through the OpenFlow controller, with subsequent packets having latency they were not able to measure (their tool has a 1 msec resolution).

Lesson #1: If the initial packet latency matters, use proactive programming mode (if available) to pre-populate the forwarding tables in the switches.

Lesson #2: Don't do full 12-tuple lookups unless absolutely necessary. You'd want to experience the latency only when the inter-VM communication starts, not for every TCP/UDP flow (not to mention that capturing every flow in a data center environment is a sure recipe for disaster).

TEST RESULTS: FAILURE RECOVERY

Very fast failure recovery was another pleasant surprise. They tested just the basic scenario (parallel primary/backup links) and found that in most cases the traffic switches over to the second link in less than a millisecond, indicating that the NEC/IBM engineers did a really good job and pre-populated the forwarding tables with backup entries.

If it takes 8-9 milliseconds for the controller to program a single flow into the switches (see the latency results above), it's totally impossible that the same controller would massively reprogram the forwarding tables in less than a millisecond. The failure response must have been preprogrammed into the forwarding tables.
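The reasoning is easy to quantify (the per-flow latency comes from the measurements above; the flow count is an assumption):

```python
per_flow_ms = 8.5     # measured first-packet setup latency (8-9 ms per flow)
flows = 1000          # assume a modest 1000 flows crossing the failed link

total_s = per_flow_ms * flows / 1000
print(total_s, "seconds")   # 8.5 seconds -- nowhere near sub-millisecond failover
```

Even under generous assumptions, controller-driven reprogramming would be four orders of magnitude too slow, so the backup entries had to be in the hardware tables before the failure.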


There were a few outliers (10-15 seconds), probably caused by the lack of failure detection at the physical layer. As I wrote before, detecting link failures via control packets sent by the OpenFlow controller doesn't scale; you need distributed linecard protocols (LACP, BFD) if you want a scalable solution.

NEC added OAM functionality in later releases of ProgrammableFlow, probably solving this problem.

Finally, assuming their test bed allowed the ProgrammableFlow controller to prepopulate the backup entries, it would be interesting to observe the behavior of a four-node square network, where it's impossible to find a loop-free alternate path unless you use virtual circuits the way MPLS Fast Reroute does.

TEST RESULTS: BANDWIDTH ALLOCATION AND TRAFFIC ENGINEERING

One of the interesting things OpenFlow should enable is bandwidth-aware flow routing. Tervela's engineers were somewhat disappointed to discover that the software/hardware combination they were testing doesn't meet those expectations yet.

They were able to reserve a link for high-priority traffic and observe automatic load balancing across alternate paths (which would be impossible in an STP-based layer-2 network), but they were not able to configure statistics-based routing (routing important flows across underutilized links).


NEXT STEPS?

Tervela's engineers said the test results made them confident in the OpenFlow solution from NEC and IBM. They plan to run more extensive tests, and if those results work out, they'll start recommending OpenFlow-based solutions as a proof-of-concept-level alternative to their customers.

A HUGE THANK YOU!

This blog post would never have happened without Janice Roberts, who organized the exchange of ideas, and Michael Matatia, Jake Ciarlante and Brian Gladstein from Tervela, who were willing to spend their time sharing their experience with me.


Every time a new networking technology appears, someone tries to solve the Bandwidth-on-Demand
problem with it. OpenFlow is no exception.

BANDWIDTH-ON-DEMAND: IS OPENFLOW THE SILVER BULLET?
Whenever the networking industry invents a new (somewhat radical) technology, bandwidth-on-demand seems to be one of the much-touted use cases. OpenFlow/SDN is no different: Juniper used its OpenFlow implementation (Open vSwitch sitting on top of the Junos SDK) to demonstrate Bandwidth Calendaring (see Dave Ward's presentation at the OpenFlow Symposium for more details), Greg Ferro talked about the same topic in his fantastic Introduction to OpenFlow/SDN webinar, and Dmitri Kalintsev recently blogged: "How about an ability for things like Open vSwitch ... to actually signal the transport network its connectivity requirements ... say desired bandwidth." I have only one problem with these ideas: I've seen them all before.

In the last 20 years, at least three technologies have been invented to solve the bandwidth-on-demand problem: RSVP, ATM Switched Virtual Circuits (SVC), and MPLS Traffic Engineering (MPLS-TE). None of them was ever widely used to create a ubiquitous bandwidth-on-demand service.

I'm positive very smart network operators (including major CDN and content providers like Google) use MPLS-TE very creatively. I'm also sure there are environments where RSVP is mission-critical functionality. I'm just saying that bandwidth-on-demand is like IP multicast: it's used by the 1% of networks that badly need it.

All three technologies faced the same set of problems:


- Per-flow (or per-granular-FEC) state in the network core never scales. This is what killed RSVP and ATM SVCs.
- It's pretty hard to traffic-engineer just the elephant flows. Either you do it properly and traffic-engineer all the traffic, or you end up with a suboptimal network.
- Reacting to short-term changes in bandwidth requirements can cause interesting oscillations in the network (I'm positive Petr Lapukhov could point you to a dozen sources analyzing this problem).
- Nobody above the network layer really cares; it's way simpler to blame the network when the bandwidth fairy fails to deliver.
You don't think the last bullet is real? Then tell me how many off-the-shelf applications have RSVP support... even though RSVP has been available in Windows and Unix/Linux servers for ages. How many applications can mark their packets properly? How many of them allow you to configure the DSCP value to use (apart from IP phones)?

Similarly, it's not hard to implement bandwidth-on-demand for specific elephant flows (inter-DC backup, for example) with a pretty simple combination of MPLS-TE and PBR, potentially configured with NETCONF (assuming you have a platform with a decent API). You could even do it with SNMP: pre-instantiate the tunnels and PBR rules, and enable the tunnel interface by changing ifAdminStatus. When have you last seen that done?
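The SNMP trick needs nothing more than a SET on IF-MIB::ifAdminStatus (OID 1.3.6.1.2.1.2.2.1.7, value 1 = up, 2 = down). Here's a sketch that builds the varbind; actually sending it would require an SNMP library or the snmpset utility, and the ifIndex value is a made-up example:

```python
IF_ADMIN_STATUS = "1.3.6.1.2.1.2.2.1.7"   # IF-MIB::ifAdminStatus

def admin_status_varbind(if_index, up=True):
    """Build the (OID, value) pair that enables or disables an interface."""
    return (f"{IF_ADMIN_STATUS}.{if_index}", 1 if up else 2)

# Enable a pre-instantiated tunnel interface with ifIndex 42:
print(admin_status_varbind(42, up=True))   # ('1.3.6.1.2.1.2.2.1.7.42', 1)
```

One SNMP SET flips the pre-built tunnel (and its PBR rules) into service, which is the whole point: the mechanism has been trivially available for years, yet almost nobody uses it.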
So, although I'm the first to admit OpenFlow is an elegant tool for integrating flow classification (previously done with PBR) with traffic engineering (using MPLS-TE or any of the novel technologies proposed by Juniper) in the hybrid deployment model, being a seasoned skeptic, I just don't believe we'll reach the holy grail of bandwidth-on-demand during this hype cycle. However, being an eternal optimist, I sincerely hope I'm wrong.


In one of their pivoting phases, Big Switch Networks proposed to implement virtual networking with MAC-layer access control lists installed through OpenFlow. I'm not aware of any commercial deployment of this idea.

OPENSTACK/QUANTUM SDN-BASED VIRTUAL NETWORKS WITH FLOODLIGHT
A few years before MPLS/VPN was invented, I'd worked with a service provider who wanted to offer an L3-based (peer-to-peer) VPN service to their clients. Having a single forwarding table in the PE-routers, they had to be very creative and used ACLs to provide customer isolation (you'll find more details in the Shared-router Approach to Peer-to-peer VPN Model section of my MPLS/VPN Architectures book).

Now, what does that have to do with OpenFlow, SDN, Floodlight and Quantum?

THE BIG PICTURE

Big Switch has released a plug-in for Quantum that provides OpenFlow-based virtual network support with their open-source Floodlight controller, and they use layer-2 ACLs to implement virtual networks, confirming the infinite wisdom of RFC 1925:

Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.


HOW DOES IT WORK?

The 30,000-foot perspective first:

- OpenStack virtual networks are created with the REST API of the Quantum (networking) component of OpenStack;
- Quantum uses back-end plug-ins to create the virtual networks in the actual underlying network fabric. Quantum (and the rest of OpenStack) does not care how the virtual networks are implemented, as long as they provide isolated L2 domains.

And a quick look behind the scenes:

- Big Switch decided to implement virtual networks with dynamic OpenFlow-based L2 ACLs instead of VLAN tags;
- The REST API offered by Floodlight's VirtualNetworkFilter module provides simple methods that create virtual networks and assign MAC addresses to them;
- The VirtualNetworkFilter intercepts new flow setup requests (PacketIn messages sent to the Floodlight controller), checks whether the source and destination MAC addresses belong to the same virtual network, and permits or drops the packet;
- If the VirtualNetworkFilter accepts the flow, Floodlight's Forwarding module installs the flow entries for the newly-created flow throughout the network.

The current release of Floodlight installs per-flow entries throughout the network. I'm not particularly impressed with the scalability of this approach (and I'm not the only one).
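The VirtualNetworkFilter logic itself is conceptually tiny. Here's a Python sketch of the same idea (an illustrative data model, not Floodlight's actual Java implementation; note that the mapping table lives only in memory, the exact weakness discussed below):

```python
# Sketch of MAC-based virtual network isolation: on a new-flow event,
# permit the flow only when source and destination MAC addresses belong
# to the same virtual network.

class VirtualNetworkFilter:
    def __init__(self):
        self.mac_to_net = {}              # in-memory only -- lost on restart

    def assign(self, mac, net_id):
        self.mac_to_net[mac] = net_id

    def packet_in(self, src_mac, dst_mac):
        src = self.mac_to_net.get(src_mac)
        dst = self.mac_to_net.get(dst_mac)
        return 'permit' if src is not None and src == dst else 'drop'

vnf = VirtualNetworkFilter()
vnf.assign('00:00:5e:00:53:01', 'tenant-a')
vnf.assign('00:00:5e:00:53:02', 'tenant-a')
vnf.assign('00:00:5e:00:53:03', 'tenant-b')

print(vnf.packet_in('00:00:5e:00:53:01', '00:00:5e:00:53:02'))  # permit (same tenant)
print(vnf.packet_in('00:00:5e:00:53:01', '00:00:5e:00:53:03'))  # drop (crosses tenants)
```

The check runs once per new flow; everything after the permit decision is plain per-flow forwarding, which is exactly where the scalability concerns come from.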


DOES IT MAKE SENSE?

The Floodlight controller and its Quantum plug-in have a very long way to go before I'd use them in a production environment:

- The Floodlight controller is a single point of failure (there's no provision for a redundant controller);
- Unless I can't read Java code (which wouldn't surprise me at all), the VirtualNetworkFilter stores all mappings (including MAC membership information) in in-memory structures that are lost if the controller, or the server on which it runs, crashes;
- As mentioned above, the per-flow entries used by the Floodlight controller don't scale at all (more about that in an upcoming post).

The whole thing is thus a nice proof-of-concept tool that will require significant effort (probably including a major rewrite of the forwarding module) before it becomes production-ready.

However, we should not use Floodlight to judge the quality of the yet-to-be-released commercial OpenFlow controller from Big Switch Networks. This is how Mike Cohen explained the differences:

I want to highlight that all of the points you raised around production deployability and flow scalability (and some you didn't around how isolation is managed / enforced) are indeed addressed in significant ways in our commercial products. There's a separation between what's in Floodlight and the code folks will eventually see from Big Switch.

As always, I might become a believer once I see the product and its documentation.


The final blog post in this chapter was written in early 2012, when the industry press still wasn't able to figure out what the individual companies using OpenFlow were doing. Although it's a bit old, it still provides an overview of different solutions that use OpenFlow as a low-level forwarding-table programming tool.

In the meantime, VMware bought Nicira (as I predicted in the last paragraph), and Nicira's NVP became the basis for VMware's NSX.

NICIRA, BIGSWITCH, NEC, OPENFLOW AND SDN


Numerous articles published in the last few days describing how Nicira clashes head-on with Cisco and Juniper just proved that you should never let facts interfere with a good story (let alone an eye-catching headline). Just in case you got swayed by those catchy stories, here's the real McCoy (as I see it):

WHAT ARE THEY ACTUALLY DOING?

Nicira is building a virtual networking solution using tunneling (VLAN tags, MAC-over-GRE or whatever else is available) between hypervisor switches. It expects the underlying network transport to do its work, be it at layer 2 or layer 3. An Open vSwitch appears as a regular VLAN-capable learning switch or as an IP host to the physical network, and uses existing non-OpenFlow mechanisms to interact with the network.


Deployment paradigm: complexity belongs to the hypervisor soft switches, let's keep the network
simple. It should provide no more and no less than optimal transport between equidistant hypervisor
hosts (Clos fabrics come to mind).
Target environment: Large cloud builders and other organizations leaning toward Xen/OpenStack.
NEC and BigSwitch are building virtual networks by rearranging the forwarding tables in the physical
switches. Their OpenFlow controllers are actively reconfiguring the physical network, creating virtual
networks out of VLANs, interfaces, or sets of MAC/IP addresses.
Deployment paradigm: we know hypervisor switches are stupid and can't see beyond VLANs, so
we'll make the network smarter (aka VM-aware networking).
Target environment: large enterprise networks and those that build cloud solutions with existing
software using VLAN-based virtual switches.

COMPETITIVE HOT SPOTS?


Between Nicira and NEC/BigSwitch: few. There is an overlap in functionality (NEC and BigSwitch can
obviously manage Open vSwitch as well), but not much overlap in typical use case or sweet-spot
target environments (I am positive there will be marketing efforts to shoehorn all of them in places
where they don't really fit, but that's a different story).
Between Nicira and Cisco/Juniper switches: few. Large cloud providers already got rid of enterprise
kludges and use simple L2 or L3 fabrics. Facebooks, Googles and Amazons of the world run on IP;
they don't care much about TRILL-like inventions. Some of them buy equipment from Juniper, Cisco,
Force10 or Arista, some of them build their own boxes, but however they build their network, that


won't change because of Nicira. No wonder Michael Bushong from Juniper embraced Nicira's
solution.
Between Nicira and Cisco's Nexus 1000V: not at the moment. Open vSwitch runs on Xen/KVM,
Nexus 1000V runs on VMware/Hyper-V. Open vSwitch runs on vSphere, but with way lower
throughput than Nexus 1000V. Obviously Cisco could easily turn Nexus 1000V VSM into an
OpenFlow controller (I predicted that would be their first move into the OpenFlow world, and was proven
dead wrong) and manage Open vSwitches, but there's nothing at the moment to indicate they're
considering it.
Between BigSwitch/NEC and Cisco/Juniper: this one will be fun to watch, more so with IBM, Brocade
and HP clearly joining the OpenFlow camp and Juniper cautiously being on the sidelines.
However, Nicira might trigger an interesting mindset shift in the cloud aspirant community: all of a
sudden, Xen/OpenStack/Quantum makes more sense from the scalability perspective. A certain
virtualization vendor will indubitably notice that ... unless they already focused their true efforts on
PaaS (at which point all of the above becomes a moot point).


SDN BEYOND OPENFLOW

The "SDN = centralized control plane" (preferably using OpenFlow) definition promoted by the Open
Networking Foundation (ONF) is too narrow for most real-life use cases, as it forces a controller
vendor to reinvent all the mechanisms we had in networking devices for the last 30 years, and make
them work within a distributed system with unreliable communication paths.
Many end-users (including Microsoft, a founding member of ONF) and vendors took a different
approach, and created solutions that use traditional networking protocols in a different way, rely on
overlays to reduce the complexity through decoupling, or use a hierarchy of control planes to
achieve better resilience.
This chapter starts with a blog post describing the alternate approaches to SDN and documents
several potentially usable protocols and solutions.

MORE INFORMATION
Youll find additional SDN- and OpenFlow-related information on ipSpace.net web site:

Start with the SDN, OpenFlow and NFV Resources page;

Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;

Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);

2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network;

Finally, I'm always available for short online or on-site consulting engagements.


IN THIS CHAPTER:
THE FOUR PATHS TO SDN
THE MANY PROTOCOLS OF SDN
EXCEPTION ROUTING WITH BGP: SDN DONE RIGHT
NETCONF = EXPECT ON STEROIDS
DEAR $VENDOR, NETCONF != SDN
WE NEED BOTH OPENFLOW AND NETCONF
CISCO ONE: MORE THAN JUST OPENFLOW/SDN
THE PLEXXI CHALLENGE (OR: DON'T BLAME THE TOOLS)
I2RS: JUST WHAT THE SDN GOLDILOCKS IS LOOKING FOR?


The very strict definition of SDN as understood by the Open Networking Foundation promotes an
architecture with strict separation between a controller and totally dumb devices that cannot do
more than forward packets based on forwarding rules downloaded from the controller.
This definition is too narrow for most use cases, resulting in numerous solutions and architectures
being branded as SDN. Most of these solutions fall into one of the four categories described in the
blog post I wrote in August 2014.

THE FOUR PATHS TO SDN


After the initial onslaught of SDN washing, four distinct approaches to SDN have started to emerge,
from centralized control plane architectures to smart reuse of existing protocols.
As always, each approach has its benefits and drawbacks, and there's no universally best solution.
You just got four more (somewhat immature) tools in your toolbox. And now for the details.

CONTROL-DATA PLANE SEPARATION


The original (or shall I say "orthodox") SDN definition comes from the Open Networking Foundation and
calls for a strict separation of control- and data planes, with a single control plane being responsible
for multiple data planes.
That definition, while serving the goals of the ONF founding members, is at the moment mostly
irrelevant for most enterprise or service provider organizations, which cannot decide to become a
router manufacturer to build a few dozen WAN edge routers. And based on the amount of


resources NEC invested in ProgrammableFlow over the last years, it's not realistic to expect that
we'll be able to use OpenDaylight in production environments any time soon (assuming you'd want
to use it in an architecture with a single central failure point in the first place).
FYI, I'm not blaming OpenFlow. OpenFlow is just a low-level tool that can be extremely
handy when you're trying to implement unusual ideas.

Reasonably-sized organizations could use OpenFlow to augment the forwarding functionality of


existing network devices (in which case the only hardware one could use is a few HP switches, as
no other major vendor supports the send-to-normal OpenFlow action).
I am positive there will be people building OpenFlow controllers controlling forwarding fabrics, but
they'll eventually realize what a monumental task they undertook when they have to reinvent all
the wheels the networking industry invented in the last 30 years, including:

Topology discovery;

Fast failure detection (including detection of bad links, not just lost links);

Fast reroute around failures;

Path-based forwarding and prefix-independent convergence;

Scalable linecard protocols (LACP, LLDP, STP, BFD ...).


OVERLAY VIRTUAL NETWORKS


The proponents of overlay virtual networking solutions use the same architectural approach that
worked well with Telnet (replacing X.25 PAD), VoIP (replacing telephone exchanges) or iSCSI, not to
mention the global Internet: reduce the complexity of the problem by decoupling the transport fabric
from edge functionality (a more cynical mind might be tempted to quote RFC 1925 section 2.6a).
The decoupling approach works well assuming there are no leaky abstractions (in other words, the
overlay can ignore the transport network, which wasn't exactly the case in Frame Relay or ATM
networks). Overlay virtual networks work well over fabrics with equidistant endpoints, and fail as
miserably as any other technology when being misused for long-distance VLAN extensions.

VENDOR-SPECIFIC APIS
After the initial magical dust of SDN-washing settled down, few vendors remained standing (I'm
skipping those that allow you to send configuration commands in an XML envelope and call that
"programmability"):

Arista has eAPI (access to EOS command line through REST) as well as the capability to install
any Linux component on their switches, and use programmatic access to EOS data structures
(sysdb);

Cisco's onePK gives you extensive access to the inner workings of Cisco IOS and IOS XE (haven't
found anything NX-OS-related on DevNet);

Juniper has some SDK that's safely tucked behind a partner-only regwall. Just the right thing to
do in 2014.


F5 had iRules and iControl for years (and there's a Perl library to use it, which is totally
awesome).

Not surprisingly, vendors would love you to use their API. After all, that's the ultimate lock-in they can get.

REUSE OF EXISTING PROTOCOL


While the vendors and the marketers were fighting the positioning battles, experienced engineers
did what they do best: they found a solution to a problem with the tools at hand. Many scalable
real-life SDN implementations (as opposed to "works great in PowerPoint" ones) use BGP to modify
forwarding information in the network (or even filter traffic with BGP FlowSpec), and implement
programmatic access to BGP with something like ExaBGP.
Finally, don't forget that we've been using remote-triggered black holes for years (the RFC
describing it is five years old, but the technology itself is way older); we just didn't know we were
doing SDN back in those days.
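To make the remote-triggered black hole idea concrete, here's a minimal sketch of the kind of announcement an ExaBGP-based controller process could emit. The setup is an assumption for illustration: ExaBGP reads announce/withdraw commands from a helper script's standard output, and all addresses below are examples (65535:666 is a commonly-used black hole community; the discard next hop must be statically routed to a null interface on every router).

```python
# Sketch of a remote-triggered black hole (RTBH) announcement generator.
# Assumes an ExaBGP-style setup where a helper process prints routing
# commands on stdout; addresses and community values are examples.

BLACKHOLE_COMMUNITY = "65535:666"   # community the routers match to discard traffic
DISCARD_NEXT_HOP = "192.0.2.1"      # routers statically route this address to Null0

def blackhole_route(victim_ip):
    """Build the ExaBGP-style command that black-holes traffic to one host."""
    return (f"announce route {victim_ip}/32 "
            f"next-hop {DISCARD_NEXT_HOP} "
            f"community [{BLACKHOLE_COMMUNITY}]")

if __name__ == "__main__":
    # An ExaBGP process section would read this line and send the BGP update.
    print(blackhole_route("203.0.113.99"))
```

The controller never touches the routers' configurations; it just announces a more specific route, and withdrawing it restores normal forwarding.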


WHICH ONE SHOULD I USE?


You know the answer: it depends.
If you're planning to implement novel ideas in the data center, overlay virtual networks might be the
way to go (more so as you can change the edge functionality without touching the physical
networking infrastructure).
Do you need flexible dynamic ACLs or PBR? Use OpenFlow (or even better, DirectFlow if you have
Arista switches).
Looking for a large-scale solution that controls the traffic in LAN or WAN fabric? BGP might be the
way to go.
Finally, with some vendor APIs you can do things you cannot do with anything else (but do
remember the price you're paying).


The following text is a slightly reworded blog post I wrote in April 2013:

THE MANY PROTOCOLS OF SDN


One could use a number of existing protocols to implement a controller-based networking solution
depending on the desired level of interaction between the controller and the controlled devices. The
following diagram lists some of them sorted by the networking device plane they operate on.

Figure 7-1: The many protocols of SDN


NETCONF, OF-Config (a YANG data model used to configure OpenFlow devices through NETCONF)
and XMPP (a chat protocol creatively used by Arista EOS) operate at the management plane: they
can change network device configuration or monitor its state.
Remote-Triggered Black Holes is one of the oldest solutions, using BGP as the mechanism to modify
a network's forwarding behavior from a central controller.
Some network virtualization vendors use BGP to build MPLS/VPN-like overlay virtual networking
solutions.
I2RS and PCEP (a protocol used to create MPLS-TE tunnels from a central controller) operate on the
control plane (parallel to traditional routing protocols). BGP-LS exports link-state topology and MPLS-TE data through BGP.
OVSDB is a protocol that treats control-plane data structures as database tables and enables a
controller to query and modify those structures. It's used extensively in VMware's NSX, but could be
used to modify any data structure (assuming one defines an additional schema that describes the
data).
OpenFlow, MPLS-TP, ForCES and Flowspec (PBR through BGP used by creative network operators
like CloudFlare) work on the data plane and can modify the forwarding behavior of a controlled
device. OpenFlow is the only one of them that defines data-to-control-plane interactions (with the
Packet In and Packet Out OpenFlow messages).


Microsoft was one of the first companies to document their use of BGP to implement a controller-based architecture. Numerous similar solutions have been described since the time I wrote this blog
post (October 2013); it seems BGP is becoming one of the most popular SDN implementation tools.

EXCEPTION ROUTING WITH BGP: SDN DONE RIGHT


One of the holy grails of data center SDN evangelists is controller-driven traffic engineering
(throwing more leaf-and-spine bandwidth at the problem might be cheaper, but definitely not
sexier). Obviously they don't call it traffic engineering as they don't want to scare their audience
with MPLS TE nightmares, but the idea is the same.
Interestingly, you don't need new technologies to get as close to that holy grail as you wish; Petr
Lapukhov got there with a 20-year-old technology: BGP.

THE PROBLEM
I'll use a well-known suboptimal network to illustrate the problem: a ring of four nodes (it could be
anything, from a monkey-designed fabric, to a stack of switches) with heavy traffic between nodes A
and D.


Figure 7-2: Sample network diagram

In a shortest-path forwarding environment you cannot spread the traffic between A and D across all
links (although you might get close with a large bag of tricks).
Can we do any better with controller-based forwarding? We definitely should. Let's see how we can
tweak BGP to serve our SDN purposes.

INFRASTRUCTURE: USING BGP AS IGP


If you want to use BGP as the information delivery vehicle for your SDN needs, you MUST ensure it's
the highest-priority routing protocol in your network. The easiest design you can use is a BGP-only
network using BGP as a more scalable (albeit a bit slower) IGP.


Figure 7-3: Using BGP as a large-scale IGP

BGP-BASED SDN CONTROLLER


After building a BGP-only data center, you can start to insert controller-generated routes into it:
establish an IBGP session from the controller (cluster) to every BGP router and use higher local
preference to override the EBGP-learned routes. You might also want to set the no-export community
on those routes to ensure they aren't leaked across multiple routers.


Figure 7-4: BGP-based SDN controller

Obviously I'm handwaving over lots of moving parts: you need topology discovery, reliable next
hops, and a few other things. If you really want to know all those details, listen to the Packet
Pushers podcast where we deep dive around them (hint: you could also engage me to help you build
it).


RESULTS: UNEQUAL-COST MULTIPATH


The SDN controller in our network could decide to split the traffic between A and D across multiple
paths. All it has to do to make it work is to send the following IBGP routing updates for prefix D:

Two identical BGP paths (with next hops B and D) to A (to ensure the BGP route selection
process in A uses BGP multipathing);

A BGP path with next hop C to B (B might otherwise send some of the traffic for D to A, resulting
in a forwarding loop between B and A).

Figure 7-5: Unequal cost multipathing with BGP-based SDN controller
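The update set described above can be sketched as data. The snippet below is a schematic illustration, not a real BGP implementation: the router names, prefix, and local preference value are arbitrary. It builds the per-router IBGP announcements the controller would send for prefix D, with a higher local preference so they win over the EBGP-learned paths and no-export to keep them local:

```python
# Schematic sketch of the controller's per-router IBGP updates for prefix D.
# Not a real BGP speaker -- just the data an ExaBGP-like process would announce.

PREFIX_D = "10.0.4.0/24"   # example prefix behind node D

def controller_updates():
    common = {"prefix": PREFIX_D, "local_preference": 200,
              "community": "no-export"}
    return {
        # Two equal paths to A, so A's best-path selection multipaths across B and D
        "A": [dict(common, next_hop="B"), dict(common, next_hop="D")],
        # Force B toward C, preventing a B-to-A forwarding loop
        "B": [dict(common, next_hop="C")],
    }

updates = controller_updates()
assert len(updates["A"]) == 2              # A load-shares across two next hops
assert updates["B"][0]["next_hop"] == "C"  # B never sends D-bound traffic to A
```

The interesting part is how little per-router state the controller has to push: only the routers whose default forwarding behavior must change receive an update.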


You can get even fancier results if you run MPLS in your network (hint: read the IETF draft on
remote LFA to get a few crazy ideas).

MORE INFORMATION

Routing Design for Large-Scale Data Centers (Petr's presentation @ NANOG 55)

Use of BGP for Routing in Large-Scale Data Centers (IETF draft)

Centralized Routing Control in BGP Networks (IETF draft)


Not surprisingly, the SDN-washing (labeling whatever you have as SDN) started just a few months
after the initial SDN hype, with some people calling their NETCONF implementation SDN. This is
what NETCONF really is.

NETCONF = EXPECT ON STEROIDS


After the initial explosion of OpenFlow/SDN hype, a number of people made claims that OpenFlow is
not the tool one can use to make SDN work, and NETCONF is commonly mentioned as an alternative
(not surprisingly, considering that both Cisco IOS and Junos support it). Unfortunately, considering
today's state of NETCONF, nothing can be further from the truth.

WHAT IS NETCONF?
NETCONF (RFC 6241) is an XML-based protocol used to manage the configuration of networking
equipment. It allows the management console (manager) to issue commands and change the
configuration of networking devices (NETCONF agents). In this respect, it's somewhat similar to
SNMP, but since it uses XML, it provides a much richer set of functionality than the simple key/value
pairs of SNMP.
For more details, I would strongly suggest you listen to the NETCONF Packet Pushers podcast.
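To show what "XML-based" means in practice, here's a sketch of the get-config RPC a NETCONF manager sends, built with nothing but Python's standard library (the message-id value is an arbitrary choice, and the SSH transport and message framing of a real session are omitted). The envelope shown here is standardized; as discussed below, the interesting payload inside it usually isn't:

```python
# Build a minimal NETCONF <get-config> RPC using only the standard library.
# The rpc/get-config/source elements are standard NETCONF (RFC 6241); in a
# real session this would travel over the SSH "netconf" subsystem.
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:netconf:base:1.0"

def build_get_config(message_id="101"):
    rpc = ET.Element(f"{{{NS}}}rpc", {"message-id": message_id})
    get_config = ET.SubElement(rpc, f"{{{NS}}}get-config")
    source = ET.SubElement(get_config, f"{{{NS}}}source")
    ET.SubElement(source, f"{{{NS}}}running")  # ask for the running datastore
    return ET.tostring(rpc, encoding="unicode")

print(build_get_config())
```

The reply carries the device configuration inside a data element; what that configuration looks like is entirely up to the vendor, which is exactly the problem described in the next section.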


WHAT'S WRONG WITH NETCONF?


Greg Ferro made a great analogy in the above-mentioned podcast: NETCONF is like SNMPv2/v3 (the
transport protocol) and YANG (the language used to describe valid NETCONF messages) is like ASN.1
(the syntax describing SNMP variables). However, there's a third component in the SNMP
framework: a large set of standardized MIBs that are implemented by almost all networking
vendors.
Its thus possible to write a network management application using a standard MIB that would work
with equipment from all vendors that decided to implement that MIB. For example, should the
Hadoop developers decide to use LLDP to auto-discover the topology of the Hadoop clusters, they
could rely on LLDP MIB being available in switches from most data center networking vendors.
Apart from a few basic aspects of session management, no such standardized data structure exists in
the NETCONF world. For example, there's no standardized command (specified in an RFC) that you
could use to get the list of interfaces, shut down an interface, or configure an IP address on an
interface. The drafts are being written by the NETMOD working group, but it will take a while before
they make it to the RFC status and get implemented by major vendors.
Every single vendor that graced us with a NETCONF implementation thus uses its own proprietary
format within NETCONF's XML envelope. In most cases, the vendor-specific part of the message
maps directly into existing CLI commands (in the Junos case, the commands are XML-formatted
because Junos uses XML internally). Could I thus write a NETCONF application that would work with
Cisco IOS and Junos? Sure I could, if I'd implement a vendor-specific module for every device
family I plan to support in my application.


WHY WOULD YOU USE NETCONF?


Let's consider the alternatives: decades ago we configured network devices over Telnet sessions
using expect scripts: simple automation scripts that would specify what one needs to send to the
device, and what response one should expect. You could implement the scripts with the original
expect tool, or with a scripting language like Tcl or Perl.
Using a standard protocol that provides clear message delineation (expect scripts were mainly
guesswork and could break with every software upgrade done on the networking devices) and error
reporting (another guesswork part of the expect scripts) is evidently a much more robust solution,
but it's still too little and delivered way too slowly. What we need is a standard mechanism of
configuring a multi-vendor environment, not a better wrapper around existing CLI (although the
better wrapper does come in handy).


NETCONF (or XMPP as used by Arista) operates solely on the management plane, making it an
interesting device configuration mechanism, but we might need more to implement something that
could rightfully be called SDN. This is my response (written in October of 2012) to SDN-washing
activities performed by a large data center vendor.

DEAR $VENDOR, NETCONF != SDN


Some vendors, feeling the urge to SDN-wash their products, claim that the ability to program them
through NETCONF (or XMPP or whatever other similar mechanism) makes them SDN-blessed.
There might be a yet-to-be-discovered vendor out there that creatively uses NETCONF to change the
device behavior in ways that cannot be achieved by CLI or GUI configuration, but most of them use
NETCONF as a reliable Expect script.
More precisely: what Ive seen being done with NETCONF or XMPP is executing CLI commands or
changing device (router, switch) configuration on-the-fly using a mechanism that is slightly more
reliable than a Perl script doing the same thing over an SSH session. Functionally it's the same thing
as typing the exec-level or configuration commands manually (only a bit faster and with no autocorrect).
What's missing? A few examples: you cannot change the device behavior beyond the parameters
already programmed in its operating system (like you could with iRules on F5 BIG-IP). You cannot
implement new functionality (apart from trivial things like configuring and removing static routes or
packet/route filters). And yet some $vendors I respect call that SDN. Give me a break, I know you
can do better than that.


Most NETCONF implementations don't allow you to go below the device configuration level. On the
other hand, OpenFlow by itself isn't enough to implement a self-sufficient SDN solution, as it doesn't
allow the controller to configure the initial state of the attached devices. In a solution that
implements novel forwarding functionality we might need both.

WE NEED BOTH OPENFLOW AND NETCONF


Every time I write about a simple use case that could benefit from OpenFlow, I invariably get a
comment along the lines of "you can do that with NETCONF". Repeated often enough, such
comments might make an outside observer believe you don't need OpenFlow for Software Defined
Networking (SDN), which is simply not true. Here are at least three fundamental reasons why that's
the case.

CONTROL/DATA PLANE SEPARATION


Whether you need OpenFlow for SDN obviously depends on how you define SDN. The networking
components were defined by their software the moment they became smarter than cables
connecting individual hosts (around the time IBM launched the 3705 in the seventies, if not earlier), so
you definitely don't need OpenFlow to implement networking defined by software.
However, lame joking aside, the definition of SDN as promoted by the Open Networking Foundation
requires the separation of control and data planes, and you simply can't do that with NETCONF. If
anything, ForCES would be the right tool for the job, but you've not heard much about ForCES from


your favorite vendor, have you, even though its development has been slowly progressing (or not,
depending on your point of view) for the last decade.

IMPLEMENTING NEW FUNCTIONALITY


NETCONF is a protocol that allows you to modify a networking device's configuration. OpenFlow is a
protocol that allows you to modify its forwarding table. If you need to reconfigure a device,
NETCONF is the way to go. If you want to implement new functionality (whatever it is) that is not
easily configurable within the software your networking device is running, you better be able to
modify the forwarding plane directly.
There might be interesting things you could do through network device configuration with NETCONF
(installing route maps with policy-based routing, access lists or static MPLS in/out label mappings,
for example), but installing the same entries via OpenFlow would be way easier, simpler and (most
importantly) device- and vendor-independent.
For example, NETCONF has no standard mechanism you can use today to create and apply an ACL
to an interface. You can create an ACL on a Cisco IOS/XR/NX-OS or a Junos switch or router with
NETCONF, but the actual contents of the NETCONF message would be vendor-specific. To support
devices made by multiple vendors, youd have to implement vendor-specific functionality in your
NETCONF controller. By contrast, you could install the same forwarding entries (with the DROP
action) through OpenFlow into any OpenFlow-enabled switch (the only question being whether
these entries would be executed in hardware or on the central CPU).
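As an illustration of how simple such a drop entry is at the OpenFlow level, here's a schematic sketch using plain data structures rather than any specific controller framework (the field names loosely follow OpenFlow 1.0 match-field naming, and the helper function is hypothetical). The key point is that an OpenFlow flow entry with an empty action list drops all matching packets, and the same entry works against any OpenFlow switch:

```python
# Schematic representation of an OpenFlow "ACL" entry: a match plus an empty
# action list (no actions == drop in OpenFlow). This models the flow entry a
# controller would install; it is not tied to a specific controller API.

def drop_flow(dl_type=0x0800, nw_src=None, nw_dst=None, priority=100):
    """Build a flow entry that drops matching IPv4 traffic."""
    match = {"dl_type": dl_type}           # 0x0800 = IPv4 ethertype
    if nw_src:
        match["nw_src"] = nw_src
    if nw_dst:
        match["nw_dst"] = nw_dst
    return {"priority": priority, "match": match, "actions": []}

# The same entry fits any OpenFlow switch -- no vendor-specific syntax needed.
entry = drop_flow(nw_src="10.1.1.0/24", nw_dst="10.2.2.0/24")
assert entry["actions"] == []   # empty action list means drop
```

Compare this with building per-vendor NETCONF payloads for the same ACL, and the appeal of a vendor-independent forwarding-table interface becomes obvious.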


EPHEMERAL STATE
The NETCONF protocol modifies device configuration. Whatever you configure with NETCONF appears in
the device configuration and can be saved from running configuration to permanent (or startup) one
when you decide to save the changes. You might not want that to happen if all you want to do is
apply a temporary ACL on an interface or create an MPLS-TP-like traffic engineering tunnel
(computed externally, not signaled through RSVP).
OpenFlow-created entries in the forwarding table are by definition temporary. They don't appear in
device configuration (and are probably fun to troubleshoot because they only appear in the
forwarding table) and are lost on device reload or link loss.

CAN WE DO IT WITHOUT NETCONF?


Given all of the above, can we implement SDN networks without NETCONF? Of course we can,
assuming we go down the OpenFlow-only route, but not many users or vendors considering
OpenFlow are willing to do that (Google being one of the exceptions); most of them would like to
retain the field-proven smarts of their networking devices and augment them with additional
functionality configured through OpenFlow. In a real-life network, we will thus need both NETCONF
to configure the existing software running in networking devices (hopefully through standardized
messages in the not-too-distant future), and potentially OpenFlow to add new functionality where
needed.


Not surprisingly, some vendors reacted to the SDN movement by launching their own proprietary
APIs. Cisco's onePK is (in August 2014) by far the most comprehensive one.

CISCO ONE: MORE THAN JUST OPENFLOW/SDN


As expected, Cisco launched its programmable networks strategy (Cisco Open Networking
Environment, ONE) at Cisco Live US ... and as we all hoped, it was more than just OpenFlow
support on Nexus 3000. It was also totally different from the usual "we support OpenFlow on our
gear" me-too announcements we've seen in the last few months.
One of the most important messages in the Cisco ONE launch is "OpenFlow is just a small part of
the big picture." That's pretty obvious to anyone who tried to understand what OpenFlow is all about,
and we've heard that before, but realistic statements like this tend to get lost in all the hype
generated by OpenFlow zealots and industry press.


Figure 7-6: Cisco OnePK high level overview

The second, even more important message is "let's not reinvent the wheel." Google might have the
needs and resources to write their own OpenFlow controllers, northbound API, and custom
applications on top of that API; the rest of us would just like to get our job done with minimum
hassle. To help us get there, Cisco plans to add One Platform Kit (onePK) API to IOS, IOS-XR and
NX-OS.


Figure 7-7: Cisco OnePK APIs

WHY IS ONEPK IMPORTANT?


You probably remember the "OpenFlow is like the x86 instruction set" statement made by Kyle Forster
in 2011. Now, imagine you'd like to write a small Perl script on top of the x86 instruction set. You can't
do that; you're missing a whole stack in the middle: the operating system, file system, user
authentication and authorization, shell, CLI utilities, Perl interpreter ... you get the picture.
OpenFlow has the same problem: it's useless without a controller with a northbound API, and
there's no standard northbound API at the moment. If I want to modify packet filters on my wireless
access point, or create a new traffic engineering tunnel, I have to start from scratch.
That's where onePK comes in: it gives you high-level APIs that allow you to inspect or modify the
behavior of the production-grade software you already have in your network. You don't have to deal
with low-level details; you can (hopefully, we have to see the API first) focus on getting your
job done.

OPEN OR PROPRIETARY?
No doubt the OpenFlow camp will be quick to claim onePK is proprietary. Of course it is, but so is
almost every other SDK or API in this industry. If you decide to develop an iOS application, you
cannot run it on Windows 7; if your orchestration software works with VMware's API, you cannot use
it to manage Hyper-V.
The real difference between networking and most other parts of IT is that in networking
you have a choice. You can use onePK, in which case your application will only work with Cisco IOS
and its cousins, or you can write your own application stack (or use a third-party one) using
OpenFlow to communicate with the networking gear. The choice is yours.

MORE DETAILS
You can get more details about Cisco ONE on Cisco's web site and its data center blog, and a
number of bloggers published really good reviews:

Derick Winkworth is underwear-throwing excited about Cisco ONE

Jason Edelman did an initial analysis of Cisco's SDN material and is waiting to see the results of
the Cisco ONE announcement.

Colin McNamara's blog post is a bit more product-focused.


Plexxi implemented an interesting controller-based architecture that combines smart autonomous
switches with a central controller. The fabric can work without the controller, but behaves better
when the controller is present.

THE PLEXXI CHALLENGE (OR: DON'T BLAME THE TOOLS)
Plexxi has an incredibly creative data center fabric solution: they paired data center switching with
CWDM optics, programmable ROADMs and controller-based traffic engineering to get something that
looks almost like a distributed switched version of FDDI (or Token Ring for the FCoTR fans). Not
surprisingly, the tools we use to build traditional networks don't work well with their architecture.
In a recent blog post Marten Terpstra hinted at the shortcomings of the Shortest Path First (SPF) approach
used by every single modern routing algorithm. Let's take a closer look at why Plexxi's engineers
couldn't use SPF.

ONE RING TO RULE THEM ALL


The cornerstone of the Plexxi ring is the optical mesh that's automatically built between the switches.
Each switch can control 24 lambdas in the CWDM ring (8 lambdas pass through the switch) and uses
them to establish connectivity with (not so very) adjacent switches:

Four lambdas (40 Gbps) are used to connect to the adjacent (east and west) switch;

Two lambdas (20 Gbps) are used to connect to four additional switches in both directions.
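Assuming those lambda counts mean every switch connects to the five nearest switches in each direction (2 × 4 lambdas for the direct neighbors plus 8 × 2 lambdas for the rest adds up to 24), the resulting topology is easy to model with a few lines of Python. This is a sketch based on that assumption, not Plexxi's actual wiring logic:

```python
def chordal_ring(n, max_dist=5):
    """Adjacency map of a chordal ring in which every switch links to
    the switches up to max_dist hops away in both directions."""
    return {
        i: {(i + d) % n for d in range(1, max_dist + 1)} |
           {(i - d) % n for d in range(1, max_dist + 1)}
        for i in range(n)
    }

ring = chordal_ring(25)
print(len(ring[0]))                              # 10 neighbors per switch
print(chordal_ring(10)[0] == set(range(1, 10)))  # True: a 10-node ring is a full mesh
```

The 10-node result matches Figure 7-10: with five chords per direction, every switch ends up with a direct lambda to every other switch.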


Figure 7-8: The Plexxi optical network


The CWDM lambdas established by Plexxi switches build a chordal ring. Here's the topology you get
in a 25-node network:

Figure 7-9: Topology of a 25 node Plexxi ring


And here's what a 10-node topology looks like:

Figure 7-10: Topology of a 10 node Plexxi ring

The beauty of the Plexxi ring is the ease of horizontal expansion: assuming you got the wiring right, all
you need to do to add a new ToR switch to the fabric is to disconnect a cable between two switches
and insert the new switch between them, as shown in the next diagram. You could do it in a live
network if the network can survive a short-term drop in fabric bandwidth while the CWDM ring is
reconfigured.


Figure 7-11: Adding a new Plexxi switch into an existing ring

FULL MESH SUCKS WITH SPF ROUTING


Now imagine you're running a shortest path routing protocol over a chordal ring topology. Smaller
chordal rings look exactly like a full mesh, and we know that a full mesh is the worst possible fabric
topology. You need non-SPF routing to get reasonable bandwidth utilization and more than 20 (or
40) Gbps of bandwidth between a pair of nodes.
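A quick sanity check of that claim: run plain BFS (standing in for SPF with uniform link costs) over a 10-node chordal ring, and every destination turns out to be one hop away, so shortest-path forwarding always picks the single direct link (a 20 Gbps lambda for non-adjacent switches) and leaves the rest of the mesh idle:

```python
from collections import deque

# 10-node chordal ring: each switch links to the switches 1-5 hops
# away in both directions (the topology from Figure 7-10)
n = 10
adj = {i: {(i + d) % n for d in (1, 2, 3, 4, 5)} |
          {(i - d) % n for d in (1, 2, 3, 4, 5)} for i in range(n)}

def hops(src, dst):
    """BFS hop count: the path length an SPF-based protocol with
    equal link costs would compute."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nbr in adj[node] - seen:
            seen.add(nbr)
            queue.append((nbr, dist + 1))

# every pair of switches is one hop apart => SPF uses only the direct link
print(all(hops(a, b) == 1 for a in range(n) for b in range(n) if a != b))  # True
```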


There are at least two well-known solutions to the non-SPF routing challenge:

Central controllers (well known from SONET/SDH, Frame Relay and ATM days);

Distributed traffic engineering (thoroughly hated by anyone who had to operate a large MPLS TE
network close to its maximum capacity).

Plexxi decided to use a central controller, not to provision virtual circuits (like we did in the ATM
days) but to program the UCMP (Unequal Cost Multipath) forwarding entries in their switches.
Does that mean we should forget all we know about routing algorithms and SPF-based ECMP
and rush into controller-based fabrics? Of course not. SPF and ECMP are just tools. They have
well-known characteristics and well-understood use cases (for example, they work great in
leaf-and-spine fabrics). In other words, don't blame the hammer if you decided to buy screws instead of nails.
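The principle behind those UCMP entries is simple enough to show in a toy example (an illustration of unequal-cost load sharing, not Plexxi's actual algorithm): split traffic across the available paths in proportion to their bandwidth instead of dumping everything onto the single shortest path:

```python
def ucmp_weights(path_bandwidths_gbps):
    """Forwarding weights proportional to per-path bandwidth: the kind
    of unequal-cost split a central controller could program."""
    total = sum(path_bandwidths_gbps)
    return [bw / total for bw in path_bandwidths_gbps]

# direct 20 Gbps lambda plus two 40 Gbps two-hop detours
print(ucmp_weights([20, 40, 40]))  # [0.2, 0.4, 0.4]
```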


In the summer of 2012 IETF launched yet another working group to develop a protocol that could
interact with routers on the control plane. I2RS (initially called IRS) might be exactly what a resilient
SDN solution needs, assuming it ever gets off the ground.

I2RS: WHAT THE SDN GOLDILOCKS IS LOOKING FOR?


Most current SDN-ish tools are too cumbersome for everyday use: OpenFlow is too granular (the
controller interacts directly with the FIB or TCAM), and NETCONF is too coarse (it works at the
device configuration level and thus cannot be used to implement anything the networking device
can't already do). In many cases, we'd like an external application to interact with the device's
routing table or routing protocols (similar to tracked static routes available in Cisco IOS, but without
the configuration hassle).
Interface to the Routing System (I2RS) is a new initiative that should provide just what we might
need in those cases. To learn more about IRS, you might want to read the problem statement and
framework drafts, view the slides presented at IETF 84, or even join the irs-discuss mailing list.
Even if you don't want to know those details, but consider yourself a person interested in routing
and routing protocols, do read two excellent e-mails written by Russ White: in the first one he
explained how IRS might appear as yet another routing protocol and benefit from the existing
routing-table-related infrastructure (including admin distance and route redistribution); in the
second one he described several interesting use cases.
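The "yet another routing protocol" idea is straightforward to sketch: give I2RS-injected routes their own administrative distance and let the usual RIB selection logic arbitrate between route sources. The I2RS distance below is an invented placeholder; the other values are common Cisco IOS defaults:

```python
# Admin distances: the I2RS value is a made-up placeholder,
# the rest are common Cisco IOS defaults.
ADMIN_DISTANCE = {'connected': 0, 'static': 1, 'ebgp': 20, 'i2rs': 30, 'ospf': 110}

def best_route(candidates):
    """RIB selection: the route source with the lowest admin distance
    wins, so an I2RS-injected route overrides OSPF but not a static route."""
    return min(candidates, key=lambda r: ADMIN_DISTANCE[r['source']])

routes = [
    {'prefix': '10.0.0.0/24', 'source': 'ospf', 'next_hop': '192.0.2.1'},
    {'prefix': '10.0.0.0/24', 'source': 'i2rs', 'next_hop': '192.0.2.99'},
]
print(best_route(routes)['next_hop'])  # 192.0.2.99
```

The nice side effect Russ points out is that such routes automatically participate in the existing machinery, including route redistribution into other protocols.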
Is I2RS the SDN porridge we're looking for? It's way too early to tell (we need to see more than an
initial attempt to define the problem and the framework), but the idea is definitely promising.
