Highlights
• The concept of "generalization latency" is proposed, taking user experience as the starting point. Its main influencing factors include air-interface latency and the latency from the core network to the application server.
• A mapping model between wireless network performance and both air signaling latency and downlink throughput rate is established in depth in this paper.
• A modified Gaussian mixture model introducing temporal characteristics is researched, so as to detect data services with abnormal latency. The algorithm introduces a temporal characteristic value, improving detection accuracy.
Article history:
Received 31 October 2017
Received in revised form 24 January 2018
Accepted 12 February 2018
Available online 9 March 2018

Keywords:
Anomaly detection
Wireless network
Generalization latency
User perception
Temporal characteristics

Abstract

With the rapid development of the Mobile Internet and the 4th Generation mobile communication technology, data service has exceeded voice service and has become an important means for mobile operators to expand their share of the communication market. Therefore, the quality of data services directly influences mobile users' perception of and satisfaction with the network. Because the networking procedure in the data service process is complicated and long, the root causes of problems are relatively difficult to locate. During voice communication in mobile networks, the important factors affecting user perception, such as call drop, network congestion and signal interference, are relatively unitary. However, users' perception of data services is somewhat different and shows a strong association with the usage scenarios of the users' various applications. For example, in the data browsing service, if the terminal connection fails, the background will start automatic repeated connection attempts, during which latency increases, latently influencing the user's perception of the data service. Besides, in the video service, initialization delay, stalling during playback and the number of stalls are also factors that affect perceived video quality. The above analysis shows that the latencies in the various data service processes and the usual network latency indicators, such as the TCP three-way handshake and DNS, are gathered and mapped into a total latency, which is the latency perceived from the perspective of user experience. In the current work, it is defined as generalization latency: the total latency covering the latency for users to establish a connection on the signaling control plane and the latency of the user plane.
The first innovation of this paper is to establish a mapping model in which generalization latency, seen from the perspective of user perception, is related to the performance indicators of the telecommunication network under different data service characteristic scenarios, so as to forecast the inflection point of network performance anomalies. The second innovation is to introduce an anomaly detection model for generalization latency, so as to detect the performance stability of the application layer of the application service plane.
© 2018 Elsevier B.V. All rights reserved.
1. Introduction
https://doi.org/10.1016/j.future.2018.02.022
0167-739X/© 2018 Elsevier B.V. All rights reserved.
10 Y. Wang et al. / Future Generation Computer Systems 85 (2018) 9–18
Table 1
The start-stop signaling of the air signaling control plane and user plane.
Signaling procedure        Start node   End node     Start signaling                 End signaling
RRC connection             UE           eNodeB       RRC connection request          RRC connection setup complete
Service request            UE           MME          Service request                 Initial context setup request
eRAB establishment         MME          eNodeB       Initial context setup request   Initial context setup response
DNS                        UE           DNS server   DNS query                       DNS query response
TCP handshake (1st of 2)   UE           SP server    SYN                             SYN ACK
TCP handshake (2nd of 2)   UE           SP server    SYN ACK                         ACK
Service response           UE           SP server    First GET                       First response
Service latency            UE           SP server    First GET                       Response
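To make the role of Table 1 concrete, the sketch below sums per-procedure control-plane durations with the user-plane service latency into one user-perceived total, in the spirit of the generalization latency defined in the abstract. All millisecond values and the helper name are invented for illustration.

```python
# Hypothetical per-procedure durations for the control-plane stages of Table 1;
# the numbers are invented for illustration only.
stages_ms = {
    "RRC connection": 45.0,      # RRC connection request -> setup complete
    "Service request": 30.0,     # Service request -> Initial context setup request
    "eRAB establishment": 20.0,  # Initial context setup request -> response
    "DNS": 15.0,                 # DNS query -> DNS query response
    "TCP handshake": 25.0,       # SYN -> ACK
}
service_latency_ms = 120.0       # user plane: first GET -> response

def generalization_latency(control_plane_ms, user_plane_ms):
    """Total user-perceived latency: control-plane setup plus user-plane latency."""
    return sum(control_plane_ms.values()) + user_plane_ms

total = generalization_latency(stages_ms, service_latency_ms)  # 255.0
```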
Table 2
The network indicators of various SP servers of NetEase News.
Host Service type Indicator 1 Indicator 2 Indicator 3
mimg.127.net Image 119.6880 76.5645 88.1134
comment.api.163.com Text 123.2254 77.1528 76.3300
war.163.com Gif 60.6628 79.3935 91.3400
c.m.163.com Text 24.3160 76.4377 86.3451
mm.bst.126.net Image 39.8235 76.6526 71.4561
imgsize.ph.126.net Image 15.1923 84.7826 92.4322
c.3g.163.com Text 135.0727 76.8205 87.8731
163.wrating.com Gif 241.2105 85.9929 79.5471
core network deployed on a large scale; at the stage of making the mapping model between wireless performance indicators and air latency, air signaling data shall be collected for model verification. In this paper, the air signaling data collected from a small-range pilot area of a city in China are adopted for data training and result verification.
• Not only the duration for the air signaling to establish the link connection shall be considered, but also the latency influence of the air user plane, namely the downlink throughput. However, since the downlink throughput is affected not only by cellular wireless network downlink resource scheduling but also comprehensively by the SP service server, when building the air model the downlink rate of the same kind of application service, such as SohuNews, NetEase News and WeChat, may be selected for analysis.

3.1. Presentation of models

In this chapter, the mapping model between key network performance indicators and air latency is established by taking advantage of the ensemble machine learning method, on the basis of partial sample data of air signaling, core network signaling, wireless measurement reports (MR) and network management indicators; the air latency is then predicted according to the key indicator data.

3.1.1. Definition of correlation equations

If WI refers to a wireless performance indicator of the telecommunication network, Latency refers to air latency and Throughput refers to downlink throughput capacity, the correlation equations between air signaling latency (formula (1)), air user latency (formula (2)) and the wireless performance indicators are established as follows:

Latency = f(WI_i), i = 1, 2, 3, ..., 7  (1)

Throughput = f(WI_i), i = 1, 2, 3, ..., 7  (2)

where WI_i refers to the wireless performance indicators; their data originates from full core network signaling, system data including wireless measurement reports (MR) and network management indicators, and performance indicators including RSRP, RSRQ, SINR, CQI, MCS, PDSCH PRB and PUSCH PRB. Latency refers to air latency, and its data originates from core network signaling above interface S11 and air signaling (for instance, when the RRC connection is constructed, Latency originates from air signaling). Throughput refers to downlink throughput capacity, and its data comes from core network signaling above interface S11.

3.1.2. Ensemble machine learning

In network optimizing activities, there is always the situation that a certain network problem depends on a few influencing factors, that is, a dependence relationship exists between one dependent variable and a few independent variables [4]. Furthermore, it is hard to differentiate the primary among the influencing factors, and the effects of some secondary factors still cannot be omitted. Such kinds of problems are jointly named "regression". "Regression analysis" [5] is a kind of prediction model technology that researches the relationship between dependent and independent variables. However, there are some limitations and deficiencies in existing regression algorithms, both in regard to their range of applicability and their regression effects. Thus, how to avoid the deficiencies of existing algorithms so as to reduce errors and improve prediction accuracy remains a problem to be urgently solved. In view of this, this paper proposes a kind of modeling that applies ensemble machine learning to the correlation equations.

Ensemble learning is a technique which uses multiple learners to solve the same problem and can significantly improve the generalization ability and stability of learning systems. Its greatest advantage lies in the sufficient consideration and use of the cognition differences of different learners on the same problem: through the comprehensive decision of multiple learners, the problem can be understood more comprehensively. The key to the success of ensemble learning is the balance between the accuracy and the degree of difference of the individual learners. Thus, as the major difference from individual learning, ensemble learning not only contains the generation of individual learners but also involves the possible interaction effects between individuals and the combination of the individual prediction results.

Step 1: The construction process of individual learners

In research on the framework of ensemble regression learning algorithms, a number of researchers have proposed different construction processes to construct several learners for integration from different perspectives, which can mainly be divided into the following kinds:

• Sequential construction process: the learning of each learner is conducted successively in order, and the performance of the previous learner can directly or indirectly influence the learning of the next one. This construction process is quite efficient for some specific modes; however, its reliability is worse, which is mainly expressed in that when an error occurs in a certain learner during construction, the following learners may be influenced by the error. Ensemble learning algorithms based on Boosting are the most typical case.
• Parallel construction process: each learner can independently accomplish the learning of the sample space in parallel, and the uniform ensemble is conducted only at the final output stage. This method has the most theoretical and experimental research and the broadest application in the framework of ensemble learning algorithms. Its major advantage is that the construction process of each learner is independent of the others, giving quite strong robustness and easy parallelization. The bagging algorithm is one of the most typical cases.
• Selective construction process: through optimization selection, partial learners are selected from the set of initial individual learners for the ensemble, so as to reduce calculation time and improve generalization ability.
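As a minimal, stdlib-only sketch of the parallel construction process described above, the snippet below builds T independent base learners on bootstrap samples (bagging) and combines them by a simple mean. The base learner `fit_line`, the data and all parameter values are invented for illustration.

```python
import random

def fit_line(xs, ys):
    """1-D least-squares fit, standing in for an arbitrary base regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:                      # degenerate bootstrap sample: constant model
        return lambda x: my
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def bagging(xs, ys, T=10, seed=0):
    """Each learner is fit independently on a bootstrap sample (parallelizable),
    then combined by a simple mean -- the parallel construction process."""
    rng = random.Random(seed)
    learners = []
    for _ in range(T):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        learners.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(f(x) for f in learners) / len(learners)

xs = [float(x) for x in range(10)]
ys = [2 * x + 1 for x in xs]          # exactly linear toy data
model = bagging(xs, ys)
```

On this noiseless toy data every bootstrap fit recovers the same line, so the ensemble agrees with it; with noisy data, the averaging across independently built learners is what reduces variance.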
The model proposed in this chapter applies the parallel construction process and the selective construction process for the construction of individual learners, including the following six kinds of regression algorithms:

(1) Linear Regression
It is one of the modeling techniques most familiar to people. Generally speaking, linear regression is one of the preferred techniques when people are learning prediction models. In this technique [6], dependent variables are continuous while independent variables can be continuous or discrete, and the regression line is linear in nature. In linear regression, the best-fitting straight line (that is, the regression line) is used to construct a relationship between the dependent variable (Y) and one or more independent variables (X). A simple equation, Y = a + bX + e, is used to express it, where "a" refers to the intercept, "b" refers to the slope of the straight line and "e" refers to the error term. The equation can be adopted to predict the values of the target variable according to the given prediction variables (X).

(2) Polynomial Regression
Though linear regression predicts the overall tendency, an underfitting phenomenon appears because linear regression solves for the unbiased estimation of the minimum mean square deviation. As data fluctuate around the straight line, polynomial regression is introduced, allowing some deviations in the estimation so as to reduce the prediction mean square deviation; the fit finally appears as a broken line. The formula of the polynomial algorithm is listed below:

ŵ = (X^T W X)^{-1} X^T W y.  (3)

W refers to a matrix that endows a weight for each data point. Kernel functions are used to endow higher weight to the neighboring points, and the formula is as follows:

w(i, i) = exp(−|x^(i) − x| / (2k^2)).  (4)

(3) Stepwise Regression
In processing multiple independent variables, regression of this form can be used. In this technique, the choice of independent variables is accomplished in an automatic process without manual operation. Important variables are identified through observed statistics such as R-squared, t-statistics and the AIC indicator. Stepwise regression fits the model through the addition/deletion of covariates based on a designated standard. This modeling technique aims at using the least number of prediction variables to maximize the prediction ability, and it is also one of the methods for processing high-dimensional datasets.

(4) Ridge Regression
Ridge regression [7] is an algorithm for data with multicollinearity (high correlation between independent variables). Under the multicollinearity circumstance, though the least square method (OLS) treats each variable fairly, the differences are large, leading to deviation of the observed values from the actual values. Ridge regression is a kind of biased estimation regression method, which is an improved least square estimation in essence. By abandoning the unbiasedness of the least square method, it acquires regression coefficients at the price of losing partial information and reducing accuracy, obtaining a more realistic and reliable regression method whose fitness for abnormal data is stronger than that of the least square method. In a linear equation, Y = a + b_1 x_1 + ... + b_n x_n + e, the error e can be divided into two subcomponents, namely deviation and variance, and prediction errors may result from either or both of them. Through the shrinkage parameter λ, ridge regression solves the multicollinearity problem. For the detailed condition, please refer to the formula below:

β̂ = argmin_{β∈R^p} (‖y − Xβ‖₂² + λ‖β‖₂²).  (5)

In the formula, there are two constituent parts: a least square term and λ·β², where β refers to the correlation coefficient. The shrinkage parameter is added to the least square term to obtain an extremely low variance.

(5) ElasticNet Regression
Similar to ridge regression, Lasso also penalizes the absolute values of regression coefficients [8]. Additionally, it can reduce the degree of variation and improve the accuracy of the linear regression model. ElasticNet is the combination of the Lasso and Ridge regression techniques: it uses L1 regularization for training and regards L2 as the regularization matrix in priority. When there are several correlated characteristics, ElasticNet is useful: Lasso randomly selects one of them while ElasticNet selects two.

β̂ = argmin_β (‖y − Xβ‖² + λ₂‖β‖² + λ₁‖β‖₁).  (6)

(6) Bayesian Linear Regression
There are huge differences between the Bayesian linear regression model and the classic linear regression model. The latter regards regression coefficients as fixed unknown parameters, while the former regards the regression coefficients as following an unknown probability distribution. These unknown distributions can then be deduced from the available samples. In calculating the distribution of the variables to be predicted, sampling should be conducted over the distribution of the regression coefficients given the independent variables so as to acquire the distribution of the variables to be predicted. Thus, generally speaking, the calculation amount of training and prediction with Bayesian models [8] is usually larger than that of regular linear regression.

Step 2: Ensemble combination

Ensemble combination [9] is accomplished through coordination and mutual compensation between individual learners, and the various combinations basically accord with this basic principle. When ensemble learning is applied to solve regression problems, the most commonly used combination is the linear combination, which can be divided into simple mean combination and weighted mean combination. When it is applied to solve classification problems, the most commonly used combination [10] is majority voting, which includes relative majority voting and absolute majority voting.

Besides the commonly used ensemble combinations mentioned above, there are various other ensemble combinations in the existing literature. Some scholars use learning algorithms, such as neural networks, to conduct re-learning on the new input space consisting of the prediction results of the individual learners so as to realize a trainable nonlinear combination. The Bayesian method and the Bayesian network method are also used for ensemble combination. Additionally, the mixed-expert system allocates different learners to different local regions in the problem space; a gating network is used to determine which learners to choose each time, and training is necessary for both the learners and the gating network. In this paper, weighted combination and dynamic weighted combination are used separately for ensemble combination, and the effects of the two combinations are evaluated and compared in the following chapters.

In the stage of ensemble combination (combining the prediction results of the individual learners), if the target output corresponding to input x_i is y_i, the mapping relation f: x_i → y_i exists between input and output. Through certain individual generating methods, the given training dataset D = {(x_i, y_i)}_{i=1}^N trains T learners, {f_1, f_2, ..., f_T}, to constitute the set of individual learners F_0 = {f_t}_{t=1}^T. Each learner in F_0 is an approximation of the function f. The output of the ensemble of learners can be expressed by the following equation:

f̂(x_i) = g(w_t, f_t(x_i, y_i)), t = 1, 2, ..., T.  (7)
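The ridge objective of formula (5) has the well-known closed-form minimizer β̂ = (XᵀX + λI)⁻¹Xᵀy. Below is a two-feature sketch of that closed form with invented data; the helper name `ridge_2` is hypothetical.

```python
def ridge_2(X, y, lam):
    """Closed-form ridge solution (X^T X + lam*I)^{-1} X^T y for 2 features."""
    a = sum(r[0] * r[0] for r in X) + lam       # (X^T X + lam*I)[0][0]
    b = sum(r[0] * r[1] for r in X)             # off-diagonal entry
    d = sum(r[1] * r[1] for r in X) + lam       # (X^T X + lam*I)[1][1]
    g0 = sum(r[0] * yi for r, yi in zip(X, y))  # (X^T y)[0]
    g1 = sum(r[1] * yi for r, yi in zip(X, y))  # (X^T y)[1]
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]                # exactly y = 1*x1 + 2*x2
beta_ols = ridge_2(X, y, lam=0.0)  # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_2(X, y, lam=1.0)
# increasing lam shrinks both coefficients toward zero
```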
If the sum of the weighted values is limited to 1, the weight that minimizes the MSE is:

w_t = Σ_{j=1}^T C_{tj}^{-1} / Σ_{k=1}^T Σ_{j=1}^T C_{kj}^{-1}  (9)

where

C_{tj} = (1/N) Σ_{i=1}^N (y_i − f_t(x_i)) (y_i − f_j(x_i)).  (10)

If the weighted values are not limited, the weight that minimizes the MSE should be defined as:

w_t = (h_{it}^T h_{it})^{-1} h_{it}^T y_i,  (11)

where

h_{it} = h_t(x_i) (1 ≤ i ≤ N, 1 ≤ t ≤ T), y_i = f(x_i).  (12)

Fig. 3. The error rate of the training and testing sets of eight kinds of algorithms.
Fig. 4. The nonlinear relationship between some key indicators and air latency.
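Formula (9) can be read as: invert the error-correlation matrix C of formula (10), then normalize its row sums so the weights sum to one. A two-learner sketch, with the 2×2 inverse written in closed form and an invented C:

```python
def optimal_weights_2(C):
    """Sum-to-one MSE-optimal weights of formula (9) for T = 2 learners."""
    (a, b), (c, d) = C
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # C^{-1} in closed form
    denom = sum(sum(row) for row in inv)              # double sum over C^{-1}
    return [sum(row) / denom for row in inv]          # row sums, normalized

C = [[2.0, 0.5],
     [0.5, 1.0]]  # learner 1 has the larger error variance (invented values)
w = optimal_weights_2(C)  # -> [0.25, 0.75]: the lower-variance learner dominates
```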
data-based services. Consumption of the services becomes more diversified, with data-based services such as web browsing, video communication or streaming, and the types of content in the diversified services have different requirements for latency. Moreover, due to different traffic loads at different time periods, the latency will differ as well. When collecting telecommunication big data, both the traffic load and the latency of data services are recorded at all times; therefore, temporal information will usually be collected simultaneously. For example, a value that is normal during peak hours may actually be abnormal at other times, yet remain undetected without this temporal context. When time characteristics are added to a model, periodic behavior on each time axis can be observed. To weight different data-service indicators and capture the trend of the indicators over time, anomaly detection algorithms based on a single indicator with a designated threshold value are no longer applicable.

Telecommunication data services are affected by time, physical and data-service scenarios and various other factors. Based on such characteristics, a modified Gaussian mixture model introducing temporal characteristics is researched and developed, so as to mine data services with abnormal latency. This algorithm not only makes full use of the characteristic of mixtures of Gaussians, namely adaptation to complex scenarios, but also introduces the temporal characteristic value into the model, improving detection accuracy.

4.1. Presentation of models

As researched in this paper, the anomaly detection model [11] is a new model that introduces the correlation between data features and time properties on the basis of the Gaussian mixture model. Furthermore, the relationship between the potential cluster and the time axis is also introduced. Meanwhile, only the correlation inside the variables is considered; in other words, when the potential cluster Z is known, time is no longer a dependence. To facilitate calculation, the independence hypothesis is made that data and the time axis are conditionally independent if the cluster category Z is known. Here are the detailed model steps.

Network access indicator data of a certain telecom operator from December 15, 2015 to July 26, 2016 is taken as the example. Effective sample data is acquired after removing the zero and missing values of page display latency (feature 1), HTTP success ratio (feature 2) and TCP response success ratio (feature 3). The first fifty thousand lines of the three indicators from June 29 to July 26 are taken as the modeling sample. X refers to the indicator dataset that contains N values with subscript i; each value is a p-dimensional vector, where p is the number of features, and each feature value is assumed to be continuous. D refers to the time-axis category set that also contains N values; because the period is one day and each value d_i corresponds to an hour of the day, the values are taken as {0, ..., 23} (see Table 3).

The anomaly detection model proposed in this paper is based on the Gaussian mixture model, which is the extension of a single Gaussian probability density function. If each point is generated by a single Gaussian distribution, the group of data is generated by M single Gaussian models [12]; it remains unknown which single Gaussian model a specific data point belongs to; the proportion α_k of each single Gaussian model in the mixture is unknown; all the data points from the different distributions are mixed together; such a distribution is named a Gaussian mixture model. The EM algorithm is generally used to estimate the parameters of a Gaussian mixture model [13].

The model stated in this paper introduces the correlation between the data features and the time feature on the basis of the traditional Gaussian mixture model. For each s ∈ {0, ..., 23}, if D_i = s, due to the introduction of the time feature D_i, there is a correlation between the time feature s and the probability that each record belongs to cluster Z_i = k ∈ {1, ..., K}; this probability is defined as α_{k,s}. When the potential cluster Z is known, time is no longer a dependence. To simplify calculation, the network access indicator data and time are assumed conditionally independent given the cluster category Z. Each variable (X_i | Z_i = k) obeys the Gaussian distribution with mean µ_k and variance Σ_k; for all i, P(X_i | D_i, Z_i) = P(X_i | Z_i).

In order to solve the problem, the following decomposition is conducted (the independence hypothesis with known category Z is used for the first factor in the sum):

P(X_i | D_i = s) = Σ_k P(X_i | Z_i = k) P(Z_i = k | D_i = s).  (15)

The EM algorithm is generally used for parameter estimation. Due to the introduction of the time feature D_i = s into the Gaussian mixture model, the model parameter α_{k,s} satisfies:

Σ_k α_{k,s} = 1.  (16)

The probability density function can be expressed as a weighted sum:

P(X_i) = Σ_k α_{k,s} N_k(X_i; µ_k, Σ_k).  (17)

N(x; µ, Σ) is used to express the Gaussian density with parameters µ and Σ. If the time feature d_i = s, I_s is used to express the corresponding set of subscripts i.

N_k(X_i; µ_k, Σ_k) = exp[−(1/2)(x − µ_k)^T Σ_k^{-1} (x − µ_k)].  (18)

In order to obtain the final parameters, the algorithm is described as below:

Step 1: for all k and i, when X_i = x_i and D_i = d_i, calculate the possibility of Z_i = k. If the posterior probability of α_{k,s} is β_{k,s}, then

β_{k,s} = N_k(X_i; µ_k, Σ_k) α_{k,d_i} / Σ_{l=1}^K N(X_i | µ_l, Σ_l) α_{l,d_i}.  (19)

Step 2: for all k and s, calculate S_{k,s}:

S_{k,s} = Σ_{j=1}^{#I_s} β_{k,I_s(j)}.  (20)

Step 3: for all k and s, update the possibility α_{k,s}:

α_{k,s} = S_{k,s}^{(t)} / Σ_{l=1}^K S_{l,s}^{(t)}.  (21)

Step 4: for all k, update the mean value µ_k:

µ_k = Σ_{i=1}^N β_{k,i} x_i / Σ_{i=1}^N β_{k,i}.  (22)

Step 5: for all k, update the covariance matrix:

Σ_k = Σ_{i=1}^N (x_i − µ_k)^T (x_i − µ_k) β_{k,i} / Σ_{i=1}^N β_{k,i}.  (23)

As for the algorithm based on the Gaussian mixture model [14] that introduces the time feature, there are three kinds of parameters. Because the mean value µ_k (k = 1, ..., K) is the same for each D, there are K parameters in total; because the variance Σ_k (k = 1, ..., K) is the same for each D, there are K parameters in total; since the weight of each category α_{k,s} = P(Z_i = k | D_i = s) and Σ_k α_{k,s} = 1, there are D · (K − 1) parameters in total. As for the model, there
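Steps 1–5 can be condensed into a small EM loop. The sketch below is a deliberately simplified one-dimensional version with K = 2 components and only S = 2 time slots instead of 24; the function name, data and all settings are invented for illustration, with the per-slot weights α_{k,s} following formulas (19)–(21) and the shared means and variances following formulas (22)–(23).

```python
import math

def em_time_gmm(xs, slots, K=2, S=2, iters=50):
    """EM for a 1-D Gaussian mixture whose weights alpha[k][s] depend on time slot s."""
    mu = [min(xs), max(xs)]                      # crude initialization
    var = [1.0] * K
    alpha = [[1.0 / K] * S for _ in range(K)]
    for _ in range(iters):
        # Step 1: posterior beta[i][k] of Z_i = k given x_i and slot d_i (formula (19))
        beta = []
        for x, s in zip(xs, slots):
            dens = [alpha[k][s]
                    * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    / math.sqrt(2 * math.pi * var[k]) for k in range(K)]
            z = sum(dens)
            beta.append([d / z for d in dens])
        # Steps 2-3: per-slot sums S_{k,s}, then the weight update (formulas (20)-(21))
        for s in range(S):
            idx = [i for i, d in enumerate(slots) if d == s]
            Sks = [sum(beta[i][k] for i in idx) for k in range(K)]
            tot = sum(Sks)
            for k in range(K):
                alpha[k][s] = Sks[k] / tot
        # Steps 4-5: means and variances shared across all slots (formulas (22)-(23))
        for k in range(K):
            w = sum(b[k] for b in beta)
            mu[k] = sum(b[k] * x for b, x in zip(beta, xs)) / w
            var[k] = sum(b[k] * (x - mu[k]) ** 2 for b, x in zip(beta, xs)) / w
    return mu, var, alpha

# toy data: one cluster near 0, one near 10, spread over the two slots
xs = [0.1, -0.2, 0.0, 0.3, 9.8, 10.2, 10.0, 9.9]
slots = [0, 0, 0, 1, 1, 1, 1, 0]
mu, var, alpha = em_time_gmm(xs, slots)
```

The weights alpha[:][s] sum to one within each slot, mirroring formula (16), while the means and variances are shared across slots, as in the parameter count given above.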
Table 3
The modeling sample of network access indicators with the time feature D.
Time          Feature 1   Feature 2   Feature 3   X                      D
06/29 0:00    340         80.85       60.87       (340, 80.85, 60.87)    0
06/29 0:30    321         84.54       89.21       (321, 84.54, 89.21)    0
...           ...         ...         ...         ...                    ...
07/26 23:30   368         76.38       82.70       (368, 76.38, 82.70)    23
[6] R. Li, Z. Wang, C. Gu, F. Li, H. Wu, A novel time-of-use tariff design based on Gaussian mixture model, Appl. Energy 162 (2016) 1530–1536.
[7] D. Agarwal, Detecting anomalies in cross-classified streams: a Bayesian approach, Knowl. Inf. Syst. 11 (1) (2007) 29–44.
[8] L. Chen, J. Zheng, Selective transfer learning for cross domain recommendation, in: SDM, 2013, pp. 641–649.
[9] M.T. Chiang, B. Mirkin, Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads, J. Classification 27 (1) (2010) 3–40.
[10] A. Delgado, I. Romero, Environmental conflict analysis using an integrated grey clustering and entropy-weight method, 77 (C) (2016) 108–121.
[11] A. Patcha, J.M. Park, An overview of anomaly detection techniques: Existing solutions and latest technological trends, Comput. Netw. 51 (12) (2007) 3448–3470.
[12] A.K. Jain, Data clustering: 50 years beyond K-means, Pattern Recognit. Lett. 31 (8) (2010) 651–666.
[13] D. Hsu, Identifying key variables and interactions in statistical models of building energy consumption using regularization, Energy 83 (4) (2015) 144–155.
[14] D. Wang, Integrated dynamic evaluation of depletion-drive performance in naturally fractured-vuggy carbonate reservoirs using DPSOFCM clustering, Fuel 181 (2016) 996–1010.
[15] J.D. Banfield, A.E. Raftery, Model-based Gaussian and non-Gaussian clustering, Biometrics 49 (3) (1993) 803–821.
[16] A. Khan, L. Sun, QoE prediction model and its application in video quality adaptation over UMTS networks, IEEE Trans. Multimedia 14 (2) (2012) 431–442.

Zhensen Wu (M'97–SM'04) received the B.Sc. degree in applied physics from Xi'an Jiaotong University, Xi'an, China, in 1969 and the M.Sc. degree in space physics from Wuhan University, Wuhan, China, in 1981. He is currently a Professor at Xidian University, Xi'an, China. From 1995 to 2001, he was invited multiple times as a Visiting Professor to Rouen University, France, for implementing the joint study of two projects supported by the Sino-France Program for Advanced Research. His research interests include electromagnetic and optical waves in random media, optical wave propagation and scattering, and ionospheric radio propagation.

Yuanjian Zhu received the Bachelor's degree in information engineering from Jilin University and the M.S. degree in electromagnetic and microwave technology from Southeast University, China, in 2007 and 2010, respectively.

Pei Zhang received the Bachelor's degree in communication engineering and the M.S. degree in communication and information systems from Nanjing University of Posts and Telecommunications, China, in 2009 and 2012, respectively.

Yan Wang received the Bachelor's degree in communication engineering from Xidian University and the M.S. degree in communication and information systems from Huazhong University of Science and Technology, China, in 2004 and 2007, respectively. Since 2015, she has been working towards the Ph.D. degree at Xidian University, Xi'an, China. Her current work concerns the big data analysis of mobile communication.