Outline
Objective / Intro [CF: 10 min.]
Traffic Matrix [CF/PL: 20 min., incl. pmacct]
pmacct
Introduction & Objective
Objective
SLA enforcement
expressed as loss, latency and jitter availability targets
How to enforce
Ensure that capacity exceeds demands frequently enough to
achieve availability targets
Highlight: catastrophic events (multiple non-SRLG failures) may lead
to planned congestion. The planner decided not to plan enough
capacity for such an event because the cost of that solution outweighs
the penalty. A notion of probability and risk assessment is fundamental
to efficient capacity planning.
Output
Is the per-class, per-link OPF < a target threshold (e.g., 85%)?
If yes
then be happy
else either modify inputs
or the target output threshold
or accept the violation
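The planning "output" check above can be sketched as a small script. All names, loads, and capacities below are purely illustrative assumptions, not data from the talk.

```python
# Flag any (link, class) pair whose worst-case utilization exceeds the
# over-provisioning threshold; the planner then fixes inputs, relaxes
# the threshold, or accepts the violation.

TARGET_OPF = 0.85  # e.g. 85% per-class, per-link threshold

# Hypothetical worst-case load (Mbps) per (link, class) across all
# simulated failure scenarios.
worst_case_load = {
    ("cr1-cr2", "class1"): 700.0,
    ("cr1-cr2", "class2"): 950.0,
}
capacity = {"cr1-cr2": 1000.0}  # link capacity in Mbps

def violations(loads, capacity, threshold=TARGET_OPF):
    """Return (link, class, utilization) tuples exceeding the threshold."""
    out = []
    for (link, cls), load in loads.items():
        util = load / capacity[link]
        if util >= threshold:
            out.append((link, cls, util))
    return out

print(violations(worst_case_load, capacity))
```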
© 2010 Cisco Systems, Inc. / Cariden Technologies, Inc.
Topology
Base topology is simple to collect
ISIS/OSPF LS Database
Dynamic MPLS-TE
Complex because non-deterministic
Static MPLS-TE
Simple: the planning tool computes the route of each TE LSP
Over-Provision Factor
Area of research
Common agreement that [80-90%] should be OK when the
underlying capacity is >10 Gbps
with some implicit assumptions on traffic being a large mix of
independent flows
Data:
BELL, AIX, MFN, NZIX
But this is fine because the availability target for Class2 is much looser than for Class1
(e.g., 99% vs. 99.999%)
QoS makes it possible to achieve an excellent OPF for the tightest-SLA classes at the
expense of the loosest-SLA classes.
More details in [Filsfils and Evans 2005] and in [Deploy QoS]
Traffic Matrix
B. Claise, Cisco
[Diagram: reference SP topology — PoPs with access routers (AR) and core routers (CR), customers and Server Farms 1-2 attached at the edge, peerings toward AS1-AS5]
TE mesh
misses per-class information (unless multiple meshes are deployed, one per class; very rare today)
opex implication of operating a TE mesh
Demand Estimation
Goal: derive the Traffic Matrix (TM) from easy-to-measure variables
y: link utilizations
A: routing matrix
x: point-to-point demands
Solve: y = Ax
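A toy instance of the estimation problem, solved with NumPy's least-squares routine; the routing matrix and demand values are made up for illustration. Because there are usually more demands than links, the system is under-determined and the recovered demands need not match the true ones.

```python
import numpy as np

# Toy instance of y = A x: 2 links, 3 point-to-point demands.
# A[i, j] = 1 if demand j is routed over link i (routing matrix);
# all values here are hypothetical.
A = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=float)
x_true = np.array([2.0, 4.0, 1.0])  # true demands (Mbps), unknown in practice
y = A @ x_true                      # measured link utilizations

# Under-determined system: take the minimum-norm solution as one estimate.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_hat)  # consistent with the link counters y, yet not equal to x_true
```

The estimate reproduces the observed link utilizations exactly, but the individual demands differ from the truth, matching the caveat that individual demand estimates can be inaccurate.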
Individual demand
estimates can be inaccurate
See also [Zhang et al, 2004]: How to Compute Accurate Traffic Matrices
for Your Network in Seconds
Results show similar accuracy for AT&T IP backbone (AS 7018)
[Map: example US topology with PoPs OAK, SJC, PAO, CHI, IAD, BWI, DCA]
Regressed Measurements
Interface counters remain the most reliable and
relevant statistics
Collect LSP, Netflow, etc. stats as convenient
Can afford partial coverage
(e.g., one or two big PoPs)
more sparse sampling
(1:10000 or 1:50000 instead of 1:500 or 1:1000)
less frequent measurements
(hourly instead of by the minute)
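A minimal sketch of how sparsely sampled flow data can be combined with the trusted interface counters; the sampling rate, byte counts, and PoP names below are hypothetical.

```python
# With 1:N packet sampling, scale observed bytes by N to estimate real
# volumes, then reconcile against the SNMP interface counter, which
# remains the ground truth.

SAMPLING_RATE = 10_000  # 1:10000, as suggested on the slide

# Hypothetical sampled byte counts per (source PoP, destination PoP).
sampled_bytes = {("pop_a", "pop_b"): 52_000, ("pop_a", "pop_c"): 9_500}

# Step 1: scale each sampled demand up by the sampling rate.
estimated = {k: v * SAMPLING_RATE for k, v in sampled_bytes.items()}

# Step 2: rescale so the estimates sum to what the link actually carried.
link_counter_bytes = 600_000_000  # trusted interface counter
total = sum(estimated.values())
reconciled = {k: v * link_counter_bytes / total for k, v in estimated.items()}
print(reconciled)
```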
pmacct
http://www.pmacct.net/
[Diagram: core topology — P routers P1-P4 connecting PE routers PE-B, PE-C, PE-D]
[Bar chart: collector memory footprint in MB per BGP peer vs. number of peers (0-1400) — 44.03 MB/peer at the low end, roughly 17.4-22.7 MB/peer at larger peer counts]
sql_history: 5m
CREATE TABLE acct_5mins_%Y%m%d_%H (
  id int(4) unsigned NOT NULL AUTO_INCREMENT,
  as_src int(4) unsigned NOT NULL,
  as_dst int(4) unsigned NOT NULL,
  peer_as_src int(4) unsigned NOT NULL,
  peer_as_dst int(4) unsigned NOT NULL,
  peer_ip_src char(15) NOT NULL,
  peer_ip_dst char(15) NOT NULL,
  packets int(10) unsigned NOT NULL,
  bytes bigint(20) unsigned NOT NULL,
  stamp_inserted datetime NOT NULL,
  stamp_updated datetime DEFAULT NULL,
  [ ]
);
acct_5mins_YYYYMMDD_12
acct_5mins_YYYYMMDD_09
[Diagram: collectors partitioned per cluster — each cluster writes its own tables, cluster1_YYYYMMDD_HH through cluster4_YYYYMMDD_HH, covering PE-A/PE-B/PE-C/PE-D and P1-P4]
Still on scalability
A single collector might not fit it all:
Memory: can't store all BGP full routing tables
CPU: can't cope with the pace of telemetry export
FROM <table>
WHERE peer_ip_src = < Pi | PEj > AND
iface_in = k AND
stamp_inserted = < today | last hour | last 5 mins >
[ GROUP BY ];
Further information
http://www.pmacct.net/lucente_pmacct_uknof14.pdf
AS-PATH radius, Communities filter, asymmetric routing
Entities on the provider IP address space
http://www.pmacct.net/building_traffic_matrices_n49.pdf
http://www.pmacct.net/pmacct_peering_epf5.pdf
Building traffic matrices to support peering decisions
http://wiki.pmacct.net/OfficialExamples
Quick-start guide to set up a NetFlow/sFlow + BGP collector instance,
implementation notes, etc.
Network Planning
[Diagram: planning cycle — Strategic Planning; Architecture & Engineering (days to months): Design Analysis, Failure Analysis, Strategic TE, RFO Analysis; Operations (minutes to hours): Infrastructure Monitoring, Tactical TE; fed by online measurements (SNMP, ...)]
(Core Topologies)
N:1 Savings
1:1 protection: $100 of carrying capacity requires $200 of expenditure
2:1 protection: $100 of carrying capacity requires $150 of expenditure
15%-20% savings in practice
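The savings arithmetic above reduces to a one-line formula: protecting capacity with N:1 sharing costs (N+1)/N of the carrying capacity, since one backup unit is shared across N working units.

```python
# Cost of N:1 protection: N working units share one backup unit,
# so total expenditure is carrying_capacity * (N + 1) / N.

def expenditure(carrying_capacity, n):
    """Capacity units spent to protect `carrying_capacity` with N:1."""
    return carrying_capacity * (n + 1) / n

print(expenditure(100, 1))  # 1:1 -> 200.0
print(expenditure(100, 2))  # 2:1 -> 150.0
```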
N:1 Costs
Physical diversity may not be present, or may be expensive
However, it is usually present at high-traffic points (e.g., no diversity
in far-away provinces, but available in capital regions)
Engineering/architecture considerations
E.g., how effectively balance traffic
Planning considerations
Subject of this talk
Planning Methodologies
Monitoring per-link statistics doesn't cut it
Planning needs to be topology aware
Failure Planning
Scenario: Planning receives traffic projections, wants to
determine what buildout is necessary
Worst case view
Perform topology what-if analysis
Congestion relieved
Simulate results
Optimization/
Traffic Engineering
Network Optimization
Network Optimization encompasses network
engineering and traffic engineering
Network engineering
Strategic or tactical?
How often to re-optimise?
If strategic MPLS TE chosen:
Core or edge mesh
Statically (explicit) or dynamically established tunnels
Tunnel sizing
Online or offline optimization
Traffic sloshing
[Diagram: example topology with routers R1-R8]
Either
minimizing maximum utilization in the normal working (non-failure) case
Or
[Diagram: example topology with OC3, OC12, and OC48 links]
[Diagrams: IGP metrics on the R1-R8 topology before and after optimization]
Connect Existing
Regional Networks
Anonymized (by
permission)
Thickness ~ Speed
Color ~ Utilization
Yellow ~ 50%
Red ~ 100%
Arrows ~ Routes
Solid ~ Normal
X ~ Failure Location
Under failure: traffic shifted back North
Share backup capacity: outer circuits fail into central ones
Change 16 metrics
Remove congestion:
Normal (121% -> 72%)
Worst-case link failure (131% -> 86%)
Optimized metrics can also be deployed in an MPLS network
(e.g., LDP networks)
[Bar chart: (theoretically optimal max utilization) / (max utilization), 0-100%, across Networks A-F and a US WAN, comparing Optimized Metrics against Optimized Explicit (Primary + Secondary) tunnels]
A single set of metrics achieves 80-95% of the theoretical best across failures
Tunnel Sizing
Tunnel sizing is key
Needless congestion if actual load >> reserved bandwidth
Needless tunnel rejection if reservation >> actual load:
enough capacity for the actual load, but not for the tunnel reservation
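One way to sketch the sizing trade-off (an illustrative percentile rule, not any vendor's auto-bandwidth algorithm; the load samples are hypothetical): reserve a high percentile of the measured load, so the reservation tracks demand without chasing every burst.

```python
# Size a tunnel reservation from measured 5-minute load samples by
# taking a configurable percentile, balancing congestion risk (sizing
# too low) against needless reservation rejection (sizing too high).

def size_tunnel(samples_mbps, percentile=95):
    """Return a reservation covering `percentile`% of observed samples."""
    ordered = sorted(samples_mbps)
    idx = int(percentile / 100 * (len(ordered) - 1))
    return ordered[idx]

loads = [40, 42, 45, 44, 80, 43, 41, 46, 47, 44]  # hypothetical Mbps samples
print(size_tunnel(loads))       # 95th percentile ignores the single burst
print(size_tunnel(loads, 100))  # sizing to the peak chases the 80 Mbps burst
```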
Tunnel Sizing
Online vs. offline sizing:
Offline sizing
Tunnel Sizing
When to re-optimise?
Event-driven optimisation, e.g., on link or node failures
Won't re-optimise due to tunnel changes
Periodically
Tunnel churn if re-optimisation is too frequent
Inefficiencies if too infrequent
Can be online or offline
Traffic sloshing
[Diagram: topology with unit link metrics; Tunnel #1 and Tunnel #2 toward Y]
In the normal case:
For traffic from X to Y, router X's IGP will see the best path via router A
Tunnel #1 will be sized for the X-to-Y demand
Traffic sloshing
[Diagram: the same topology with link metrics, Tunnel #1 and Tunnel #2 between X, D, and Y]
Traffic sloshing
[Diagram: as above, with router C shown on the path]
A holistic view of traffic demands (the core traffic matrix) and routing (including failure
cases if necessary) is needed to understand the impact of TE
Topology:
highly meshed
(*) Presented at TERENA Networking Conference, June 2004
Optimized Metrics: normal utilizations (no failures); optimized metrics shown in BLUE
core network
263 scenarios
Results:
Delay-based metrics cause congestion
CSPF fills links to 100%
Metric Optimization achieves
<90% worst-case utilizations
TE Case Study 3
Anonymous network
TE Options:
Dynamic MPLS
Mesh of CSPF tunnels in the core network
Sloshing causes congestion under failure scenarios
Metric Based TE
Explicit Pri. + Sec. LSPs
Failures Considered
Single-circuit, circuit+SRLG, circuit+SRLG+Node
Plot is for single-circuit failures
* Metric-Based TE
o Explicit Pri. + Sec.
* Metric-Based TE
o Explicit Pri. + Sec.
MPLS TE
Dynamic tunnels
Very efficient
Requires complex solutions to deploy
[Diagram: LFA example referencing Eq1]
Benefits
Simple
the router computes everything automatically
<50msec
pre-computed and pre-installed
prefix-independent
Leverage IOS-XR Hierarchical dataplane FIB
Deployment friendly
no IETF protocol change, no interop testing, incremental
deployment
Benefits
Good Scaling
No degradation on IGP convergence for primary paths
Capacity Planning
Node Protection (Guaranteed or De Facto)
an LFA can be chosen on the basis of guaranteed node protection
simulations indicate that most link-based LFAs avoid the node anyway
(i.e., de facto node protection)
Constraints
Topology dependent
availability of a backup path depends on topology
Is there a neighbor which meets Eq1?
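The "Eq1" test can be written out directly. Assuming Eq1 is the loop-free condition of RFC 5286 — Dist(N, D) < Dist(N, S) + Dist(S, D) — here is a sketch with hypothetical distance tables:

```python
# Loop-free alternate check: neighbor N of source S is an LFA for
# destination D iff its own shortest path to D does not loop back
# through S, i.e. dist(N, D) < dist(N, S) + dist(S, D).

def is_lfa(n, s, d, dist):
    """True if neighbor `n` satisfies the loop-free alternate inequality."""
    return dist[(n, d)] < dist[(n, s)] + dist[(s, d)]

# N reaches D only back through S: no LFA (2 is not < 1 + 1).
dist_ring = {("N", "D"): 2, ("N", "S"): 1, ("S", "D"): 1}
print(is_lfa("N", "S", "D", dist_ring))  # False

# N has a direct path to D: valid LFA (1 < 1 + 1).
dist_mesh = {("N", "D"): 1, ("N", "S"): 1, ("S", "D"): 1}
print(is_lfa("N", "S", "D", dist_mesh))  # True
```

This illustrates the topology dependence noted above: the same neighbor is or is not an LFA purely as a function of the distance table.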
Deployment
[Diagram: LFA applicability decision — in the BB, with a target of <50 msec, TE FRR is better where no LFA applies, combined with topology optimization; at the edge, with a looser (sub-second) target, the sweet spot for LFA!]
draft-ietf-rtgwg-lfa-applicability-00
Backbone Applicability
Based on ~10 SP backbone topologies
Link LFA: 70% of the links are protected
Prefix LFA: 94% of the prefixes across all links are protected
Access/Aggregation Topologies
Zero u-Loop
A reference to consult if interested.
Slight modification to slide 17: the solution will be called Remote LFA,
and an IETF draft should be released in the coming weeks.
IP/Optical Integration
SRLG
To back up R1-R4: via R2 or R3?
[Diagram: routers R1-R5 connected over optical nodes O1-O5]
[Diagram: the same IP/optical topology annotated with optical circuit IDs (e.g., Link O1-123) — the mapping helps troubleshooting and verifying disjointness]
A final example
Network Design
For Mobile backbone network
Fictional (German) topology
IP over optical
Topology:
IP/Optical
6 core sites
N:1 protection
IP Design Rules
2 P-routers in core sites, 2 PE-routers in all sites
E(dge)-routers represent traffic sources (behind PEs)
Note: fiber
used for more
than one
circuit around
Frankfurt
SRLGs on IP layer
Move Dusseldorf-Stuttgart away from Frankfurt
Move Dusseldorf-Frankfurt away from Cologne
Diverse!
3. etc
Final IP topology
Highest utilization due to any circuit or SRLG failure is 90%
Saving of 20% due to the diversity of Dusseldorf-Frankfurt and Dusseldorf-Stuttgart
IPFRR LFAs
75% of interface traffic has an LFA available
Some inter-site links are not protected due to the ring topology
LFA applicability draft, section 3.3: Square
Conclusion
Conclusion
Capacity Planning is essential for enforcing SLA with min capacity
Router vendors to provide input data
Traffic Matrix (NetFlow v9)
References
[Filsfils and Evans 2005]
Clarence Filsfils and John Evans, "Deploying Diffserv in IP/MPLS Backbone Networks for Tight SLA Control", IEEE Internet
Computing, vol. 9, no. 1, January 2005, pp. 58-65
http://www.employees.org/~jevans/papers.html
[Deploy QoS]
Deploying IP and MPLS QoS for multiservice networks: theory and practice, By John Evans, Clarence Filsfils
http://books.google.be/books?id=r6121tRwA6sC&pg=PA76&lpg=PA76&dq=book+deploying+qos+sp+filsfils&source=bl&ots=xau
vtXLg3X&sig=f1NGddiXrZ_FAA3ZbRtoxVDiwPc&hl=en&ei=grDaTL6nBY3CsAOOsoHIBw&sa=X&oi=book_result&ct=result&res
num=1&ved=0CBUQ6AEwAA#v=onepage&q&f=false
[Telkamp 2003]
Thomas Telkamp, Backbone Traffic Management, Asia Pacific IP Experts Conference (Cisco), November 4th, 2003, Shanghai,
P.R. China
http://www.cariden.com/technology/white_papers/entry/backbone_traffic_management
[Vardi 1996]
Y. Vardi, "Network Tomography: Estimating Source-Destination Traffic Intensities from Link Data", Journal of the American
Statistical Association, pages 365-377, 1996.
References
[Gunnar et al.]
Anders Gunnar (SICS), Mikael Johansson (KTH), Thomas Telkamp (Global Crossing). Traffic Matrix Estimation on a Large IP
Backbone - A Comparison on Real Data
http://www.cariden.com/technology/white_papers/entry/traffic_matrix_estimation_on_a_large_ip_backbone__a_comparison_on_real_dat
[Telkamp 2009]
How Full is Full?, DENOG 2009
http://www.cariden.com/technology/white_papers/entry/how_full_is_full
[Maghbouleh 2002]
Arman Maghbouleh, Metric-Based Traffic Engineering: Panacea or Snake Oil? A Real-World Study,
NANOG 26, October 2002, Phoenix
http://www.cariden.com/technologies/papers.html
[Cao 2004]
Jin Cao, William S. Cleveland, and Don X. Sun. Bandwidth Estimation for Best-Effort Internet Traffic
Statist. Sci. Volume 19, Number 3 (2004), 518-543.