
Master's Thesis, Mikko Nieminen

Espoo, February 14th, 2006

TROUBLESHOOTING IN LIVE WCDMA NETWORKS

Supervisor: Professor Heikki Hämmäinen
Background to the Study
The number of live WCDMA networks is growing
quickly.
The first commercial Third Generation
Partnership Project (3GPP) compliant network,
J-Phone, was opened in December 2002.
By October of 2005, there were 80 live
commercial WCDMA networks and the number
of subscribers was nearly 40 million. By that
time, around 140 licenses had been awarded
for WCDMA, and the current WCDMA license holders
had more than 500 million subscribers in
their Second Generation (2G) networks.
Especially in Europe and Asia, WCDMA network
deployment after successful field trials and
service launches has entered a new critical
stage: the phase of network optimisation and
network troubleshooting.
Research Problem

As the number of WCDMA subscribers quickly
increases, operators and equipment vendors are facing
big challenges in maintaining and troubleshooting their
networks.
We may raise the question of how one can efficiently
narrow down the root causes of the problems when there
is a huge number of subscribers and a large volume of
traffic in a live WCDMA network.
What are the principles for examining the fault
scenarios and breaking the problem investigation down
into logical, manageable pieces?
Which tools and methods are used in practice
in WCDMA network troubleshooting today?
In order to tackle these questions and challenges, this
Thesis presents a Framework for KPI-triggered
troubleshooting in live WCDMA networks.
The applicability of the Framework is demonstrated by
applying it to a selection of real troubleshooting cases
that have occurred in commercial WCDMA networks.
Scope of the Study

This study concentrates on the KPI-triggered
problems in live WCDMA networks.
In general, the faults can be classified into three
categories:
Critical, which are emergency problems that require
immediate actions,
Major (which we refer to in this study as KPI-triggered
problems),
Minor, which do not affect the services of the network.
The viewpoint is that of the equipment vendor,
the main objective being to create guidelines
for troubleshooting experts and technical support
personnel of WCDMA network manufacturers so that they
can perform troubleshooting and narrow the problems
down following a defined logic.
This Thesis mainly concentrates on WCDMA network
troubleshooting from a Radio Access Network
perspective. The reasoning behind this approach is
that the UTRAN covers most of the WCDMA specific
functionality and intelligence, and therefore also brings
the majority of the troubleshooting challenges.
Research Methods

This Thesis is mainly based on the study
of various technical specifications and
interviews of WCDMA network
troubleshooting experts.
The main literature sources are the 3GPP
specifications of release 99, since the
majority of the live WCDMA networks
were based on 3GPP release 99 during
the writing of this Thesis.
It can be noted that 3GPP release 4
networks are currently gaining a foothold among
the live WCDMA networks. However,
there are only minor differences in the
Radio Access functionality of the
aforementioned two 3GPP specification
releases.
Structure of the Thesis

Introduction to WCDMA Networks
UTRAN Protocols
Call Trace Analysis
Key Performance Indicators
Framework for KPI-Triggered
Troubleshooting
Cases from Live WCDMA Networks
WCDMA network architecture

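[Figure: WCDMA network architecture. The core network (GMSC, GGSN, HLR, AuC, EIR, MSC/VLR, SGSN) connects to the PSTN and the Internet. The UTRAN consists of RNCs, each controlling several Node Bs and their cells. The UE consists of the ME and the USIM.]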
UTRAN architecture

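[Figure: UTRAN architecture. The User Equipment (UE) connects to the Node Bs over the Uu interface. Node Bs connect to their RNC over Iub, and RNCs are interconnected over Iur. The RNCs connect to the Core Network (CN) over Iu-CS towards the 3G MSC and over Iu-PS towards the SGSN.]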
UMTS Bearer Services

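[Figure: UMTS bearer services between UE, RAN and CN across the Uu and Iu interfaces. Non-Access Stratum: Radio Access Bearer and signalling connection. Access Stratum: RRC connection and radio bearer service between UE and RAN, Iu connection and Iu bearer service between RAN and CN. Service access points (SAP) are marked in the figure.]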
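Summary of Protocols (CS user plane)
Interfaces: UE, Uu, Node B, Iub, RNC, Iu, MSC
Protocol stacks, listed from top to bottom:
UE: CS application and coding / RLC / MAC / WCDMA L1
Node B (Uu side): WCDMA L1
Node B (Iub side): FP / AAL2 / ATM / PDH/SDH
RNC (Iub side): RLC / MAC / FP / AAL2 / ATM / PDH/SDH
RNC (Iu side): Iu-UP protocol / AAL2 / ATM / PDH/SDH
MSC: CS application and coding / Iu-UP protocol / AAL2 / ATM / PDH/SDH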


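Summary of Protocols (UE control plane)
Interfaces: UE, Uu, Node B, Iub, RNC, Iu, CN
Protocol stacks, listed from top to bottom:
UE: NAS / RRC / RLC / MAC / WCDMA L1
Node B (Uu side): WCDMA L1
Node B (Iub side): FP / AAL2 / ATM / PDH/SDH
RNC (Iub side): RRC / RLC / MAC / FP / AAL2 / ATM / PDH/SDH
RNC (Iu side): RANAP / SCCP / MTP3b / SSCF-NNI / SSCOP / AAL5 / ATM / PDH/SDH
CN: NAS / RANAP / SCCP / MTP3b / SSCF-NNI / SSCOP / AAL5 / ATM / PDH/SDH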
Overview of WCDMA Call Setup

MT Call: Paging, RRC Connection Establishment, Radio Access Bearer Establishment, User Plane Data Flow
MO Call: RRC Connection Establishment, Radio Access Bearer Establishment, User Plane Data Flow
RRC connection establishment (DCH)
Participants: UE, Node B, RNC
1. RRC CONNECTION REQUEST (RRC, UE to RNC)
2. Admission Control (RNC)
3. RADIO LINK SETUP REQUEST (C-NBAP, RNC to Node B)
4. Start RX (Node B)
5. RADIO LINK SETUP RESPONSE (C-NBAP, Node B to RNC)
6. ESTABLISH REQUEST (ALCAP, RNC to Node B)
7. ESTABLISH CONFIRM (ALCAP, Node B to RNC)
8. UPLINK & DOWNLINK SYNC (FP, between Node B and RNC)
9. Start TX (Node B)
10. RRC CONNECTION SETUP (RRC, RNC to UE)
11. L1 SYNCH (between UE and Node B)
12. RL RESTORE INDICATION (D-NBAP, Node B to RNC)
13. RRC CONNECTION SETUP COMPLETE (RRC, UE to RNC)
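When a call trace is analysed, a common first question is how far the RRC connection establishment got before it stopped. The following is a minimal Python sketch of that check, not a tool described in the Thesis. It assumes a trace has already been decoded into (protocol, message) pairs; the names `EXPECTED_SEQUENCE` and `first_missing_step` are hypothetical, and the internal actions (admission control, Start RX/TX, L1 synchronisation) are left out because they do not appear as messages in an interface trace.

```python
# Expected signalling messages for RRC connection establishment on a DCH,
# in the order given on the slide. Internal Node B / RNC actions are omitted.
EXPECTED_SEQUENCE = [
    ("RRC", "RRC CONNECTION REQUEST"),
    ("C-NBAP", "RADIO LINK SETUP REQUEST"),
    ("C-NBAP", "RADIO LINK SETUP RESPONSE"),
    ("ALCAP", "ESTABLISH REQUEST"),
    ("ALCAP", "ESTABLISH CONFIRM"),
    ("FP", "UPLINK & DOWNLINK SYNC"),
    ("RRC", "RRC CONNECTION SETUP"),
    ("D-NBAP", "RL RESTORE INDICATION"),
    ("RRC", "RRC CONNECTION SETUP COMPLETE"),
]


def first_missing_step(trace):
    """Return the first expected (protocol, message) pair not found in order.

    `trace` is a list of (protocol, message) tuples in capture order
    (an assumed, simplified trace format). Unrelated messages between the
    expected ones are ignored. Returns None when the whole expected
    sequence is present.
    """
    position = 0
    for expected in EXPECTED_SEQUENCE:
        try:
            # Search only after the previously matched message.
            position = trace.index(expected, position) + 1
        except ValueError:
            return expected
    return None


if __name__ == "__main__":
    # Hypothetical trace that stops after the ALCAP establishment.
    partial_trace = EXPECTED_SEQUENCE[:5]
    print(first_missing_step(partial_trace))
```

A result such as `("FP", "UPLINK & DOWNLINK SYNC")` would point the investigation at the Iub user plane rather than at the radio interface.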
Protocol Analysers

Company         Product                     Home Country
Nethawk [47]    3G Analyser                 Finland
Agilent [48]    Signaling Analyzer          United States
Tektronix [49]  K15                         United States
Radcom [50]     Performer Analyser          Israel
Acterna [51]    Telecom Protocol Analyzer   United States
RRC Connection Events and KPIs
Message sequence between UE, RNC and CN:
RRC CONNECTION REQUEST (Event 1)
Setup phase
RRC CONNECTION SETUP (Event 2)
Access phase
RRC CONNECTION SETUP COMPLETE (Event 3)
Active phase
IU RELEASE COMMAND (Event 4)

Event 1: RRC_CONN_ATT_EST incremented
Event 2: RRC_CONN_ATT_COMP incremented
Event 3: RRC_CONN_ACC_COMP incremented
Event 4: RRC_CONN_ACT_COMP incremented

RRC Setup Complete Rate = (Sum of RRC_CONN_STP_COMP / Sum of RRC_CONN_STP_ATT) x 100 %
RRC Establishment Complete Rate = (Sum of RRC_CONN_ACC_COMP / Sum of RRC_CONN_STP_ATT) x 100 %
RRC Retainability Rate = (Sum of RRC_CONN_ACT_COMP / Sum of RRC_CONN_ACC_COMP) x 100 %
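The three RRC formulas can be evaluated directly from counter samples. The sketch below is illustrative only: it assumes the counters are available as one dictionary per measurement period (or per cell), keyed by the counter names used in the formulas. The function names `rate` and `rrc_kpis` and the sample values are hypothetical.

```python
def rate(numerator, denominator):
    """Return numerator / denominator as a percentage, or None if undefined."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def rrc_kpis(samples):
    """Compute the three RRC KPIs from a list of counter samples.

    Each sample is a dict keyed by counter name. The counters are summed
    over all samples before the ratio is taken, as in the formulas.
    """
    def total(name):
        return sum(sample[name] for sample in samples)

    attempts = total("RRC_CONN_STP_ATT")
    access_complete = total("RRC_CONN_ACC_COMP")
    return {
        "setup_complete_rate": rate(total("RRC_CONN_STP_COMP"), attempts),
        "establishment_complete_rate": rate(access_complete, attempts),
        "retainability_rate": rate(total("RRC_CONN_ACT_COMP"), access_complete),
    }


if __name__ == "__main__":
    # Hypothetical counter values for two measurement periods.
    samples = [
        {"RRC_CONN_STP_ATT": 1000, "RRC_CONN_STP_COMP": 990,
         "RRC_CONN_ACC_COMP": 980, "RRC_CONN_ACT_COMP": 970},
        {"RRC_CONN_STP_ATT": 1000, "RRC_CONN_STP_COMP": 980,
         "RRC_CONN_ACC_COMP": 970, "RRC_CONN_ACT_COMP": 950},
    ]
    for name, value in rrc_kpis(samples).items():
        print(name, value)
```

Summing the counters before dividing, rather than averaging per-period percentages, keeps busy periods from being under-weighted.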
RRC connection Phases

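[Figure: RRC connection phases. Attempts pass through the Setup, Access and Active phases, which end in Setup Complete, Access Complete and Active Complete respectively. Losses along the way: Setup Failures and Blocking in the Setup phase, Access Failures in the Access phase, and Active Failures (RRC Drop) in the Active phase. A connection that completes the Active phase ends in a successful release.]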

Other WCDMA network KPIs

RAB Setup Complete Rate = (Sum of RAB_STP_COMP / Sum of RAB_STP_ATT) x 100 %
RAB Establishment Complete Rate = (Sum of RAB_ACC_COMP / Sum of RAB_STP_ATT) x 100 %
RAB Retainability Rate = (Sum of RAB_ACT_COMP / Sum of RAB_ACC_COMP) x 100 %
CSSR = (Sum of RAB_ACC_COMP / Sum of RRC_CONN_STP_ATT) x 100 %
CCSR = (Sum of RAB_ACT_COMP / Sum of RRC_CONN_STP_ATT) x 100 %
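Taken as written, the CSSR, CCSR and RAB Retainability Rate formulas are linked by a simple identity, which can help when deciding whether a CCSR degradation originates in call setup or in call retention. The derivation below assumes that all sums are taken over the same cells and the same measurement period and that the sum of RAB_ACC_COMP is non-zero.

```latex
\begin{align*}
\text{CCSR}
  &= \frac{\sum \text{RAB\_ACT\_COMP}}{\sum \text{RRC\_CONN\_STP\_ATT}} \times 100\,\% \\
  &= \frac{\sum \text{RAB\_ACC\_COMP}}{\sum \text{RRC\_CONN\_STP\_ATT}}
     \times \frac{\sum \text{RAB\_ACT\_COMP}}{\sum \text{RAB\_ACC\_COMP}} \times 100\,\% \\
  &= \text{CSSR} \times \frac{\text{RAB Retainability Rate}}{100\,\%}
\end{align*}
```

As a purely hypothetical illustration, a CSSR of 98 % combined with a RAB Retainability Rate of 99 % gives a CCSR of 97.02 %.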
Fault Classification

A-CRITICAL
Fault class: Total or major outages that are not avoidable with a workaround solution.
Description: Critical (emergency duty contacted) problems severely affect service, capacity/traffic, billing, and maintenance capabilities and require immediate corrective action, regardless of time of day or day of the week, as viewed by the operator.
Examples:
System restart, all links down
Simultaneous restarts of active computer units
More than 50 per cent of traffic handling capacity out of use
Subscriber related network element functionality is not working

B-MAJOR
Fault class: The problem leads to degradation of network performance or the fault affects traffic randomly.
Description: Major problems cause conditions that seriously affect system performance, operation, maintenance, and administration and require immediate attention as viewed by the operator. The urgency is less than in critical situations because of a lesser immediate or impending effect on system performance, customers, and the customer's operation and revenue.
Examples:
Capacity/quality related functionality is not working as intended
Problems seriously affecting end user service, but avoidable with a workaround solution
Configuration changes (network, HW, and SW) are not working as intended
Subscriber related functions are not working completely
Performance measurement, alarm management or activation of a new feature fails
Single restart of computer units

C-MINOR
Fault class: Minor fault not affecting operation or service quality.
Description: Other problems that the operator does not view as critical or major are considered minor. Minor problems do not significantly impair the functioning of the system or affect the service to customers. These problems are tolerable during system use.
Examples:
Failures not seriously affecting traffic
Errors in operating command syntax
Cosmetic errors in operational commands or statistics output
Minor errors in documentation
Framework for KPI-Triggered
Troubleshooting

The Framework is designed for investigating and
solving B-MAJOR level, i.e. KPI-triggered,
faults.
Before applying the Framework
The general alarm status of the network has been
checked. No clear network alarms pointing to the
root cause of the fault can be detected.
Traces from the external interfaces of the RNC have been
taken with a protocol analyser in order to record the
fault scenario. An RNC internal trace has also been
taken when the fault took place.
The basic fault scenario has been analysed and
clarified.
[Flow chart: Framework for KPI-triggered troubleshooting. Steps are labelled A to R; branch targets are given where the chart states them.]
A. Is the problem new in the operator network? (No: go to B / Yes: go to C)
B. Perform simulation of the fault in test bed. Does the fault still occur? (Yes / No)
C. New SW, HW, parameters, UE model or feature introduced? (Yes: go to E / No)
D. Is the fault operator specific? (Yes / No)
E. Perform simulation of the fault with reference conditions. Does the fault still occur? (Yes / No: go to G)
F. Has average network load increased significantly and/or does the problem occur at a specific time of day? (Yes: go to H / No)
G. Analyse and investigate the differences between the working and faulty conditions.
H. Use RNC Performance Tester to generate load in test bed and perform analysis.
I. Analyse the traces. Investigate fault scope.
J. CN specific
K. RNC specific
L. Node B specific
M. Transmission specific
N. Service specific
O. Country specific
P. UE specific
Q. Analyse network element and interface specific alarms, parameters, capacity, logs and traces. Take specific actions depending on problem scope (refer to detailed Framework notes).
R. In case of MVI environment, check IOT results and contact foreign vendor. Investigate own vendor's default parameters and compare implementation against 3GPP specifications. Compare own default parameters with the default parameters of other vendors. Execute air interface protocol analysis and drive tests.
Case: Increased AMR call drop rate

A decrease in the RAB Retainability Rate KPI for
the AMR telephony service had been observed
over the previous three months in an operator
network.
The decrease was around 2% on each RNC
compared to the time when the network was
performing well. Actions that had already
been taken with no positive effect:
Soft reset for all Node Bs and for all RNCs
Hard reset and re-commissioning of Node Bs
Alarms checked and no major alarms found
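A KPI-triggered fault of this kind is typically noticed by comparing the current KPI value of each RNC with a reference period in which the network was performing well. The sketch below illustrates that comparison. It is an assumption-laden example, not the operator's actual monitoring logic: the function name `degraded_rncs`, the threshold of one percentage point and the sample values are hypothetical, and the "around 2%" decrease is interpreted here as percentage points.

```python
def degraded_rncs(reference, current, threshold_pp=1.0):
    """Return {rnc: drop} for RNCs whose KPI fell by at least threshold_pp.

    `reference` and `current` map RNC names to a KPI value in percent,
    for example the RAB Retainability Rate for AMR. The drop is expressed
    in percentage points. RNCs missing from `current` are skipped.
    """
    drops = {}
    for rnc, reference_value in reference.items():
        if rnc not in current:
            continue
        drop = reference_value - current[rnc]
        if drop >= threshold_pp:
            drops[rnc] = round(drop, 2)
    return drops


if __name__ == "__main__":
    # Hypothetical KPI values in percent.
    reference = {"RNC1": 99.0, "RNC2": 98.8}
    current = {"RNC1": 97.0, "RNC2": 98.5}
    print(degraded_rncs(reference, current))
```

A drop that shows up on every RNC at once, as in this case, suggests a network-wide cause rather than a fault in a single network element.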
Case: Increased AMR call drop rate

I. Step A: Is the problem new in the operator network? Yes.
II. Step C: New SW, HW, parameters, UE model or feature introduced? Yes.
III. Step E: Perform simulation of the fault in reference conditions. Does the fault still occur? No.
IV. Step G: Analyse and investigate the differences between the working and faulty conditions.
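The path taken in this case can be replayed programmatically, which is one way to keep troubleshooting decisions traceable. The sketch below encodes only the branches exercised in this case (A, C, E and G); it is a partial, hypothetical encoding, and the remaining branches of the Framework are deliberately left out rather than guessed. The names `FLOW` and `walk` are illustrative.

```python
# Partial encoding of the Framework flow chart: only the branches used in
# the AMR call drop case are included. Other branches are intentionally
# omitted instead of being guessed.
FLOW = {
    "A": {"question": "Is the problem new in the operator network?",
          "yes": "C"},
    "C": {"question": "New SW, HW, parameters, UE model or feature introduced?",
          "yes": "E"},
    "E": {"question": "Does the fault still occur in reference conditions?",
          "no": "G"},
    "G": {"action": "Analyse and investigate the differences between "
                    "the working and faulty conditions."},
}


def walk(answers, start="A"):
    """Follow the chart from `start` using `answers` ({step: "yes" or "no"}).

    Returns the list of visited steps and stops at the first action step.
    Raises KeyError when an answer or a branch is not available in FLOW.
    """
    path = [start]
    step = start
    while "question" in FLOW[step]:
        step = FLOW[step][answers[step]]
        path.append(step)
    return path


if __name__ == "__main__":
    case_answers = {"A": "yes", "C": "yes", "E": "no"}
    path = walk(case_answers)
    print(" -> ".join(path))
    print(FLOW[path[-1]]["action"])
```

Raising an error for a branch that is not encoded is a deliberate choice here: it makes gaps in the encoded chart visible instead of silently picking a default path.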
Case: Increased AMR call drop rate

Solution
The short-term solution was to change the
parameter for planned maximum
downlink transmission power of all
the Node Bs in the operator network
to the default value of
34 dBm. With this change, the problem
disappeared in the operator network.
The long-term solution was to
implement a fix for the bug in the
next software release of the Node B.
Results

As a result of thorough research
conducted for this Thesis, a Framework
for KPI-triggered troubleshooting for
live WCDMA networks was developed.
The Framework is mainly targeted for
WCDMA network equipment vendors,
to help them in solving major service
affecting faults occurring in the live
WCDMA networks of today.
Troubleshooting cases from live
WCDMA networks were solved using
the Framework developed, in order to
verify the results and test the
applicability and practicality of the
Framework.
Assessment of the results

The applicability and relevance of the
troubleshooting Framework were tested against three
different fault cases from live WCDMA networks.
The results were fairly promising since all the cases
were successfully solved by utilising the Framework.
The Framework was found to be quite practical and
suitable for solving KPI-triggered problems in live
WCDMA networks.
However, it must be taken into account that the
Framework was tested with a limited number of
cases, because of time and resource limitations.
More extensive testing and verification with a larger
number of cases could reveal optimisations and
improvements to the Framework.
Still, the basic logic of the Framework was shown
to be sound for the cases tested. The results presented in
this study can be tested in the future against a
larger number of cases in order to verify them with
greater statistical reliability.
Exploitation of the results

The results of this study will be used as
source material for UTRAN troubleshooting
competence development and for the creation of
advanced learning solutions, targeted at
troubleshooting experts and customer
support engineers of one of the leading
WCDMA network equipment vendors.
Also, the results of the Thesis will be used
as input in the creation of customer
documentation for UTRAN
troubleshooting.
There is also an intention to further test
the relevance and reliability of the results
of this Thesis by applying them in the 24/7
RAN technical support service that the
equipment vendor in question provides to operators.
Future Research

The significance of Performance Indicator
based troubleshooting is increasing
continuously in live WCDMA networks.
Once the PI and KPI specifications become
more mature, a more extensive study of the
most relevant Performance Indicators used in
WCDMA network troubleshooting will be essential.
Also, there is a need to develop a Framework
and logic for solving emergency problems in
WCDMA networks.
As the complexity of
telecommunication networks grows,
effective and efficient troubleshooting
procedures are essential in order to manage
the diversity of network technologies and the
increasing quality requirements of the
operators.
