Node Probability Table Generation Method

A Method for Developing Node Probability Table Using Qualitative Value of Software Metrics
Authors: Chandan Kumar, Dr. D. K. Yadav
Presented By:
Chandan Kumar
chandan.ca@nitjsr.ac.in
Department of Computer Applications
National Institute of Technology Jamshedpur
CONTENTS
INTRODUCTION
PROPOSED METHOD
CONCLUSION
REFERENCES
INTRODUCTION
A Bayesian belief network (BBN) combines probability theory and
graph theory. The uncertainty of the system is modeled using
probability theory, and the graph indicates the independence
structure that enables the probability distribution to be
decomposed into smaller pieces.
BBNs are especially useful for modeling uncertainty and have
been applied successfully in areas such as medical diagnostic
systems, weather forecasting, project management, signal
processing and software engineering.
Wherever BBNs have been applied, they have proved capable of
representing uncertainty. However, there are two significant
barriers to building large-scale BBNs:
1) Building the causal relationships among the nodes, and
2) Developing the node probability tables (NPTs).
In the Bayesian belief network framework, the independence
structure in a joint distribution is characterized by a directed
acyclic graph (DAG), with nodes representing random variables
and edges representing causal relationships between variables.
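The decomposition "into smaller pieces" can be written out as the standard factorization of a joint distribution over a DAG. The notation pa(X_i), for the set of parents of node X_i, is an assumption introduced here for illustration and does not appear in the slides:

```latex
P(X_1, X_2, \ldots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

Each factor on the right-hand side corresponds to the probability table of one node given its parents.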
The causal relationships between variables are defined by
probability functions that take the values of the parent nodes
as input and return the probability of the given node. These
probability functions are commonly represented by tables,
namely node probability tables (NPTs).
Ex: [figure: example NPT, not reproduced]
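As a minimal sketch of what an NPT looks like as a data structure, the snippet below stores one probability row per combination of parent values. The node names ("Tested", "Defect") and all numbers are hypothetical and are not taken from the slides.

```python
# Hypothetical NPT for a Boolean child node "Defect" with a single
# Boolean parent "Tested". Keys are tuples of parent values; each value
# is the conditional distribution of the child. All numbers are illustrative.
npt = {
    (True,):  {True: 0.1, False: 0.9},
    (False,): {True: 0.6, False: 0.4},
}


def probability(table, parent_values, child_value):
    """Look up P(child = child_value | parents = parent_values)."""
    return table[tuple(parent_values)][child_value]


def is_valid(table, tol=1e-9):
    """Check that every row is a probability distribution (sums to 1)."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in table.values())
```

A call such as `probability(npt, [False], True)` reads one cell of the table, and `is_valid(npt)` checks the row sums.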
Designing the NPT data is one of the fundamental issues
associated with BBNs. There are no guidelines or rules for
developing NPT data that are appropriate for all types of
problems. For example, manually defining the NPTs for a Bayesian
belief network is a complex task whose effort grows
exponentially.
Several methods have been proposed in the literature to reduce
the complexity of defining NPTs manually.
K. Huang and M. Henrion [13] proposed a method known as
Noisy-OR. The disadvantage of this method is that it applies
only to Boolean nodes and completely ignores the interaction
effects between variables.
F. J. Díez [14] proposed a method known as Noisy-MAX that is
applied to ranked nodes with many states. This method does not
model the range of relationships.
B. Das [15] proposed an algorithm to construct the NPT while
easing the extent of knowledge acquisition. With this technique,
the NPT is developed by computing appropriate weighted functions
of the elicited distributions.
Fenton et al. [16] proposed an approach which is
based on the doubly truncated normal distribution.
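To make the Boolean-only restriction of Noisy-OR concrete, here is a minimal sketch of the standard leak-free Noisy-OR combination rule, which builds a full table from one "link probability" per parent. The leak-free variant and the example link probabilities are assumptions chosen for illustration; this is not code from any of the cited papers.

```python
from itertools import product


def noisy_or_npt(link_probs):
    """Build P(child = True | parents) for every Boolean parent combination.

    link_probs[i] is the (hypothetical) probability that parent i alone,
    when True, causes the child to be True. Leak-free variant: with all
    parents False the child is never True.
    """
    table = {}
    for config in product([False, True], repeat=len(link_probs)):
        p_child_off = 1.0
        for active, p in zip(config, link_probs):
            if active:
                p_child_off *= 1.0 - p  # each active cause fails independently
        table[config] = 1.0 - p_child_off
    return table
```

With k parents, only k numbers are elicited instead of 2^k rows, which is the saving the method offers; the independence of causes in the loop is also why interaction effects between parents are not captured.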
The literature shows that most of these approaches, techniques
and algorithms assume a particular type of statistical
distribution. However, when developing the NPT of a node in a
BBN, the distribution of the assessment may depend on the kind
of problem and the type of data available, and it may follow any
type of probability distribution.
In fact, knowledge about a node is held by the domain expert and
can be elicited as qualitative values (low, medium and high).
From these qualitative values, the corresponding probabilities
can be generated with less effort.
This approach does not assume any particular statistical
distribution and reflects a true probability distribution of the
node behavior.
Therefore, in this paper, a new method is proposed to develop
the node probability table using the qualitative value of
software metrics.
PROPOSED METHOD
The node probability tables in a BBN model state the strengths
of the uncertain relationships between the factors. The number
of probabilities required for each node is given by Eq. (1),
where NP = number of probabilities of the child node; m = number
of states of the child node; and n = number of states of a
parent node.
In its simplest form, when all the nodes have the same number of
states, NP = n^k, where k is the total number of parents.
Therefore, if n = 2 and k = 3, NP = 8; the number of
probabilities increases exponentially with the number of parent
nodes, and it also grows with the number of states of the parent
nodes.
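The count NP = n^k can be checked by enumerating the parent-state combinations directly. The sketch below is illustrative only; the function names are hypothetical, and it counts combinations of parent states (the rows of the table), each of which then holds one probability per child state.

```python
from itertools import product


def count_parent_combinations(n_states, n_parents):
    """NP = n^k when every parent has the same number of states."""
    return n_states ** n_parents


def parent_combinations(parent_states):
    """Enumerate every combination of parent states (one NPT row each).

    parent_states is a list with one list of state labels per parent.
    """
    return list(product(*parent_states))


# n = 2 states, k = 3 parents, as in the text
boolean_rows = parent_combinations([[True, False]] * 3)

# n = 3 states (High, Medium, Low), k = 2 parents
ranked_rows = parent_combinations([["H", "M", "L"]] * 2)
```

Here `len(boolean_rows)` equals `count_parent_combinations(2, 3)`, and adding one more three-state parent to `ranked_rows` would triple the number of rows.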
Steps for obtaining the NPT

Step 1: Construct a validation table of the NPT with the help of
domain experts.
Substep (1a): Data Collection - Assessments of the dependent
variable in terms of High (H), Medium (M) and Low (L), with
weightage, are collected from the domain experts round by round
(Round 1: 1/4 of the total assessment; Round 2: 1/2 of the total
assessment; Round 3: complete assessment).
Substep (1b): Data Analysis - The data is collected round by
round from several experts. Therefore, it is necessary to check
the consistency of their opinions.
If the difference in probabilities between the rounds is 0%, the
domain experts are perfectly consistent.
If the difference in probabilities between the rounds is not 0%,
the average of the probabilities over all rounds is taken.
Substep (1c): Estimate the complete set of NPT - Estimate the
complete assessment, with weightage, of the dependent variable
using interpolation.
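The consistency rule of substep (1b) can be sketched as follows. The function name and the sample round values are hypothetical; the slides do not give the per-round numbers, so this only illustrates the "0% difference, otherwise average" rule.

```python
def consolidate_rounds(round_values, tol=1e-9):
    """Combine the probabilities elicited for one NPT entry over several rounds.

    Returns (value, consistent). If all rounds agree (difference of 0%),
    the agreed value is kept; otherwise the mean over all rounds is taken.
    """
    spread = max(round_values) - min(round_values)
    consistent = spread <= tol
    if consistent:
        value = round_values[0]
    else:
        value = sum(round_values) / len(round_values)
    return value, consistent


# Hypothetical assessments of one entry across Rounds 1-3
agreed = consolidate_rounds([0.8, 0.8, 0.8])
averaged = consolidate_rounds([0.7, 0.8, 0.9])
```

The small tolerance `tol` is a design choice to avoid treating floating-point noise as disagreement.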
Step 2: Generate probability value of the NPT from
random function.
Step 3: Update the randomized NPT obtained in step 2
until it approximately matches with the NPT of step 1.
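Steps 2 and 3 can be sketched as a simple random search. The slides do not specify the random function, the update rule or the matching criterion, so everything below is an assumption: rows are drawn uniformly from the set of valid distributions, a new draw replaces the current row only if it is closer to the expert row, and "approximately matches" is taken to mean every entry is within a tolerance `tol`.

```python
import random


def random_row(rng, n_states):
    """Step 2: draw a random probability row (non-negative, sums to 1)."""
    draws = [rng.expovariate(1.0) for _ in range(n_states)]
    total = sum(draws)
    return [d / total for d in draws]


def max_gap(row, target):
    """Largest absolute difference between two rows."""
    return max(abs(a - b) for a, b in zip(row, target))


def fit_row(target, tol=0.05, max_iter=10000, seed=0):
    """Step 3: redraw until the random row approximately matches the target."""
    rng = random.Random(seed)
    best = random_row(rng, len(target))
    for _ in range(max_iter):
        if max_gap(best, target) <= tol:
            break
        candidate = random_row(rng, len(target))
        if max_gap(candidate, target) < max_gap(best, target):
            best = candidate  # keep the draw that is closer to the expert row
    return best


# Expert row for EDT = High, CP = High (values from the illustrative example)
expert_row = [0.8, 0.15, 0.05]
fitted = fit_row(expert_row)
```

A tighter `tol` makes the randomized row track the expert row more closely at the cost of more iterations.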
An illustrative example
A BBN model of software development is shown in Fig. 1.
EDT - Experience of development team
CP - Capability of the programmer
QS - Quality of staff
DPF - Defined process followed
QPD - Quality of process development
EDP - Effort on development process
ODPE - Overall development process effectiveness
PODD - Probability of avoiding defect in development
Fig. 1
A subcomponent of the model is shown in Fig. 2, where the
experience of development team (EDT) metric and the capability
of the programmer (CP) metric are the parents of the quality of
staff (QS) metric. In the BBN of Fig. 1, every node has three
states (High, Medium and Low). Therefore, the number of
probability values for the node quality of staff will be 9
(3^2 = 9).
Fig. 2
NPT Development
Step 1: Construct validation table of NPT with the help
of domain experts.
Substep (1a): Assessments of the dependent variable in terms of
High (H), Medium (M) and Low (L), with weightage, are collected
from the domain experts round by round. The domain experts
applied fuzzy rules for the assessment. Here, fuzzy IF-THEN
rules with AND have been used:
1. IF EDT is H and CP is H then expected QS is H
2. IF EDT is H and CP is M then expected QS is M
...
9. IF EDT is L and CP is L then expected QS is L
The assessments, with weightage, of the dependent variable
collected from the domain experts in the different rounds are
shown in Table 1.
Table 1
EDT      CP       Round 1   Round 2   Round 3
High     High
High     Medium
High     Low
Medium   High
Medium   Medium
Medium   Low
Low      High
Low      Medium
Low      Low
[The Round 1-3 entries are not preserved in this copy.]
Substep (1b): Data Analysis - The data collected in substep (1a)
is analyzed and reproduced in Table 2.
Table 2
EDT      CP       HIGH       MEDIUM     LOW
High     High     H (0.8)    --         L (0.05)
High     Medium   H (0.4)    M (0.5)    --
High     Low      H (0.1)    --         L (0.6)
Medium   High     --         M (0.2)    L (0.1)
Medium   Medium   H (0.1)    --         L (0.3)
Medium   Low      --         M (0.25)   L (0.7)
Low      High     H (0.25)   M (0.35)   --
Low      Medium   --         M (0.2)    L (0.7)
Low      Low      H (0.05)   --         L (0.85)
Substep (1c): Estimate the complete set of NPT through
interpolation. The estimated complete set of NPT is shown
in Table 3.
Table 3
EDT      CP       HIGH       MEDIUM     LOW
High     High     H (0.8)    M (0.15)   L (0.05)
High     Medium   H (0.4)    M (0.5)    L (0.1)
High     Low      H (0.1)    M (0.3)    L (0.6)
Medium   High     H (0.7)    M (0.2)    L (0.1)
Medium   Medium   H (0.1)    M (0.6)    L (0.3)
Medium   Low      H (0.05)   M (0.25)   L (0.7)
Low      High     H (0.25)   M (0.35)   L (0.40)
Low      Medium   H (0.1)    M (0.2)    L (0.7)
Low      Low      H (0.05)   M (0.1)    L (0.85)
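The slides do not spell out the interpolation used in substep (1c). The values in Table 3 are, however, consistent with filling the single missing entry of each Table 2 row so that the row sums to 1. The sketch below applies that assumed rule to the Table 2 data; the function and variable names are hypothetical.

```python
# Table 2 from the text; None marks the entry the experts did not assess.
TABLE_2 = {
    ("High", "High"):     {"H": 0.8,  "M": None, "L": 0.05},
    ("High", "Medium"):   {"H": 0.4,  "M": 0.5,  "L": None},
    ("High", "Low"):      {"H": 0.1,  "M": None, "L": 0.6},
    ("Medium", "High"):   {"H": None, "M": 0.2,  "L": 0.1},
    ("Medium", "Medium"): {"H": 0.1,  "M": None, "L": 0.3},
    ("Medium", "Low"):    {"H": None, "M": 0.25, "L": 0.7},
    ("Low", "High"):      {"H": 0.25, "M": 0.35, "L": None},
    ("Low", "Medium"):    {"H": None, "M": 0.2,  "L": 0.7},
    ("Low", "Low"):       {"H": 0.05, "M": None, "L": 0.85},
}


def fill_missing(row):
    """Fill the one missing probability so that the row sums to 1 (assumed rule)."""
    missing = [state for state, p in row.items() if p is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing entry per row")
    known = sum(p for p in row.values() if p is not None)
    filled = dict(row)
    filled[missing[0]] = round(1.0 - known, 2)  # round away float noise
    return filled


# Keys are (EDT, CP); values are the completed QS distributions
table_3 = {parents: fill_missing(row) for parents, row in TABLE_2.items()}
```

For example, the row for EDT = High, CP = High gets M = 0.15, matching the corresponding entry in Table 3.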
Step 2: A probabilistic value for the dependent node is
generated from a random function. The result is shown in Fig. 3.
Fig. 3
Step 3: Update the randomized NPT obtained in step 2 until it
approximately matches the NPT of step 1: The NPT is revised by
iterating the random function in view of the NPT obtained in
step 1. The updated randomized NPT is shown in Fig. 4.
Fig. 4
Conclusion
The node probability table has a major impact on the BBN. The
uncertainty of the causal relationship is quantified with the
NPT. No single universal and practical method is available in
the literature for constructing the NPT for a BBN. The proposed
methodology is more general and can be used for any type of BBN
problem because it combines expert knowledge, fuzzy logic and
iterated random functions. Therefore, this methodology is
capable of developing the NPT data for all types of
applications.
References
1) N. E. Fenton and M. Neil, A Critique of Software Defect Prediction Models,
IEEE Transactions on Software Engineering, vol. 25, pp. 675-689, 1999.
2) N. E. Fenton, M. Neil, W. Marsh, P. Hearty, L. Radlinski and P. Krause, On the
effectiveness of early life cycle defect prediction with Bayesian Nets, Empirical
Software Engineering, vol. 13, pp. 499-537, 2008.
3) S. Amasaki, Y. Takagi, O. Mizuno and T. Kikuno, A Bayesian belief network for
assessing the likelihood of fault content, Proceedings of 14th International
Symposium on Software Reliability Engineering (ISSRE), pp. 215-226, 2003.
4) Ganesh J. Pai and Joanne B. Dugan, Empirical analysis of software fault content
and fault proneness using Bayesian methods, IEEE Transactions on Software
Engineering, vol. 33, pp. 675-686, 2007.
5) K. Dejaeger, T. Verbraken and B. Baesens, Toward Comprehensible Software
Fault Prediction Models Using Bayesian Network Classifiers, IEEE Transactions
on Software Engineering, vol. 39, pp. 237-257, 2013.
6) A. Okutan and O. T. Yıldız, Software defect prediction using Bayesian networks,
Empirical Software Engineering, vol. 19, pp. 154-181, 2014.
7) N. E. Fenton and M. Neil, Predicting software quality using Bayesian belief
networks, Proceedings of 21st Annual Software Engineering Workshop,
pp. 217-230, 1996.
8) S. Mohanta, G. Vinod, A. K. Ghosh, R. Mall, An Approach for Early Prediction
of Software Reliability, ACM SIGSOFT Software Engineering Notes, vol. 35,
pp. 1-9, 2010.
9) S. Mohanta, G. Vinod and R. Mall, A technique for early prediction of software
reliability based on design metrics, International Journal of System Assurance
Engineering and Management, vol. 2, pp. 261-281, 2011.
10) Nasir Majeed Awan and Adnan Khadem Alvi, Predicting software test effort in
iterative development using a dynamic Bayesian network, Master Thesis, School
of Engineering, Blekinge Institute of Technology, 2010.
11) Yong Hu, Xiangzhou Zhang, E.W.T. Ngai, Ruichu Cai, Mei Liu, Software project
risk analysis using Bayesian networks with causality constraints, Decision
Support Systems, vol. 56, pp. 439-449, 2013.
12) P. Weber, G. Medina-Oliva, C. Simon, B. Iung, Overview on Bayesian networks
applications for dependability, risk analysis and maintenance areas, Engineering
Applications of Artificial Intelligence, vol. 25, pp. 671-682, 2012.
13) K. Huang and M. Henrion, Efficient Search-Based Inference for Noisy-OR
Belief Networks, Twelfth Conference on Uncertainty in Artificial Intelligence,
Portland, pp. 325-331, 1996.
14) F. J. Díez, Parameter adjustment in Bayes networks: the generalized noisy OR-gate,
Proc. Ninth Conference on Uncertainty in Artificial Intelligence, D. Heckerman and
A. Mamdani, eds., pp. 99-105, 1993.
15) B. Das, Generating node probabilities for Bayesian networks: Easing the
knowledge acquisition problem, arXiv preprint cs/0411034, 2004.
16) N. E. Fenton, M. Neil and J. S. Caballero, Using Ranked Nodes to Model
Qualitative Judgments in Bayesian Networks, IEEE Transactions on
Knowledge and Data Engineering, vol. 19, pp. 1420-1432, 2007.
17) Radjenović D, Heričko M, Torkar R and Živkovič A, Software fault prediction
metrics: A systematic literature review, Information and Software Technology, vol.
55, pp. 1397-1418, 2013.
18) Wang HJ, Khoshgoftaar TM, Liang QA, A study of software metric selection
techniques: stability analysis and defect prediction model performance,
International Journal on Artificial Intelligence Tools, vol. 22, pp. , 2013.
19) Wang HJ, Khoshgoftaar TM, Wald R et al., A study on first order statistics-based
feature selection technique on software metric data, in: Proceedings of the 25th
Intl. Conf. on Software Engineering & Knowledge Engineering (SEKE13), Boston,
USA, pp. 467-472, 2013.
20) Liu H and Yu L, Toward integrating feature selection algorithm for classification
and clustering, IEEE Trans. on Knowledge and Data Engineering, vol. 27,
pp. 491-502, 2005.
21) Peng H, Long F, Ding C, Feature selection based on mutual information criteria of
max-dependency, max-relevance, and min-redundancy, IEEE Trans. on Pattern
Analysis and Machine Intelligence, vol. 27, pp. 1226-1238, 2005.
22) Chandan Kumar, D. K. Yadav, Software Quality Modeling using Metrics of Early
Artifacts, in Proc. of the Confluence 2013: The Next Generation Information
Technology Summit (4th International Conference), Noida, India, 26-27
September, IET Publications, pp. 7-11, 2013.
23) D. K. Yadav, S. K. Chaturvedi and R. B. Mishra, Early Software Defects Prediction
Using Fuzzy Logic, International Journal of Performability Engineering, vol. 8,
pp. 399-408, July 2012.
24) Timo Koski and John M. Noble (2009). Bayesian Networks: An Introduction, John
Wiley & Sons, Ltd, UK, ISBN: 978-0-470-74304-1.
Questions?