
Clarity Guided Belief Revision for Domain Knowledge Recovery in Legacy Systems

Yang Li and Hongji Yang
Computer Science Department, De Montfort University, England
E-mail: yangli, hyang@strl.dmu.ac.uk

William Chu
Computer Science Department, TungHai University, Taiwan
E-mail: chu@cis.thu.edu.tw

Abstract

Program understanding is the process of acquiring knowledge from a computer program. Research that uses knowledge engineering techniques has been carried out in this field, but in our observation a thorough application of AI methodology has not yet been explored. In this paper, we present a clarity guided belief revision approach to domain knowledge recovery in legacy software systems. We give novel solutions to three key AI issues in domain knowledge recovery from source code:

- knowledge representation, where the concrete semantic network is separated from the abstract semantic network to better accommodate uncertainty reasoning and propagation;
- uncertainty reasoning, which borrows ideas from confirmation theory and recasts them for semantic network reasoning;
- heuristic search, which is designed on the principles of programming psychology.

Our approach is lightweight. It can be used stand-alone or as a complement to traditional heavyweight domain knowledge recovery methods.

Keywords: program understanding, knowledge recovery, semantic network, belief revision, heuristic search, programming psychology

1. Introduction

After decades of software development, the software industry has reached a point where most software engineering effort is spent on maintaining existing systems rather than developing new ones. Domain knowledge, the core of software, has accumulated over the years and has reached a certain degree of "saturation". Current practice suggests that software maintainers spend most of their time developing an understanding of the software being maintained [15]. One of the primary reasons is that software documentation is often inadequate or unreliable. As a result, source code becomes the only reliable source for developing understanding [2].

Computer Aided Reverse Engineering (CARE) tools integrate techniques ranging from syntax analysis, through structural analysis, to Domain Knowledge Base Analysis (DKBA) to help maintainers understand programs. Given the focus and space limits of this paper, we discuss only the third of these, DKBA.

Traditional DKBA methods are heavyweight. Their effectiveness in recovering domain knowledge depends heavily on knowledge at several abstraction layers, and this dependence hinders their efficiency. Table 1 lists related work on DKBA, categorised by typical knowledge engineering issues. In our observation, important issues such as knowledge representation, uncertainty reasoning and program space management have not been sufficiently addressed in domain knowledge recovery from source code. In this paper, we present a lightweight approach to DKBA that gives a suite of novel solutions to these three issues.

The rest of the paper is organised as follows. Section 2 introduces the variant of the semantic network that our system uses as its supporting knowledge representation. Section 3 presents a novel belief revision model for semantic networks, which recasts ideas borrowed from confirmation theory in our new context; it also discusses belief initialisation and belief propagation. Section 4 introduces a new idea on heuristic search, which uses programming psychology as an efficient way to choose a good search path during singular-way belief propagation. Section 5 compares our work with related work. Finally, Section 6 summarises the paper and outlines future work.
================================================================================
Knowledge Engineering Issue        Related Work
--------------------------------------------------------------------------------
Knowledge Representation           plan [1, 3, 9, 10, 11, 13, 14, 18, 19],
                                   semantic/connectionist network [4],
                                   graph (chart) [21, 22], tree/outline/hierarchy [1, 3]
Reasoning Techniques               classic reasoning [1, 3, 9, 10, 11, 13, 14, 19],
                                   uncertainty reasoning [4], inductive reasoning [6]
Control Strategies for Reasoning   bottom-up [4, 9, 10, 13, 19], top-down [4],
                                   dynamic programming/hybrid search [11],
                                   flexible/multi-purpose application [21]
Knowledge Base Management          hierarchy [1]
Program Space Management           dominance tree [5]
Assessment Environment             MACS [7], Medona [12]
Embedded in Other Tasks            transformation [8, 20], bug-seeking [10]
================================================================================

Table 1. A Survey of DKBA Research Categorised by Knowledge Engineering Issues

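===============================================
Inter-relationship   Examples
-----------------------------------------------
objects-objects      instance-of, part-of, etc.
objects-actions      receiver-of, sender-of
actions-actions      sub-plan-of, precedent-of, etc.
===============================================

Table 2. Inter-relationship examples among nodes in semantic network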
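Section 2 classifies the inter-relationships of SN = (N, E) into three categories according to the kinds of their two endpoints. The Python sketch below illustrates that rule under our own assumptions. The `SemanticNetwork` class, the `Kind` enum and the edge-triple layout are hypothetical and are not the data structures of the authors' system. Only the Accept/Connect link is suggested by Figure 2; the other two edges were invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    OBJECT = "object"  # class, instance, feature, ...
    ACTION = "action"  # operation or event occurring among objects


@dataclass
class SemanticNetwork:
    """SN = (N, E): concept nodes typed as objects or actions, plus labelled edges."""
    nodes: dict = field(default_factory=dict)  # concept name -> Kind
    edges: list = field(default_factory=list)  # (source, relation, target) triples

    def add_concept(self, name: str, kind: Kind) -> None:
        self.nodes[name] = kind

    def relate(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be concepts in N")
        self.edges.append((source, relation, target))

    def category(self, edge) -> str:
        """Place an edge in one of the three Table 2 categories by its endpoint kinds."""
        source, _, target = edge
        kinds = {self.nodes[source], self.nodes[target]}
        if kinds == {Kind.OBJECT}:
            return "objects-objects"
        if kinds == {Kind.ACTION}:
            return "actions-actions"
        return "objects-actions"


sn = SemanticNetwork()
for name in ("Connect", "Accept"):
    sn.add_concept(name, Kind.ACTION)
for name in ("Client", "Socket"):
    sn.add_concept(name, Kind.OBJECT)
sn.relate("Accept", "part-of", "Connect")    # suggested by Figure 2
sn.relate("Client", "sender-of", "Connect")  # hypothetical
sn.relate("Socket", "part-of", "Client")     # hypothetical
print([sn.category(e) for e in sn.edges])
# ['actions-actions', 'objects-actions', 'objects-objects']
```

The sketch does not restrict relation names to a category. Figure 2, for example, uses part-of between actions, even though Table 2 lists part-of under objects-objects.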

2. Two-layer Semantic Network

We use a semantic network as the domain knowledge representation in our approach. In our context, the semantic network is defined as follows.

Let SN be a semantic network, $SN = (N, E)$, where N is the set of all concepts and E is the set of all inter-relationships among these concepts.

- N is divided into two kinds of concepts, objects and actions. An object represents a class, instance, feature, etc. An action represents an operation or event that occurs among several objects.
- E is therefore divided into inter-relationships between objects and objects, objects and actions, and actions and actions. Table 2 gives examples of inter-relationships in each category.

We introduce a new concept for DKBA, the domain knowledge slice. A domain knowledge slice is a set of strongly related domain concepts linked by a set of inter-relationships among them. Several domain knowledge slices can exist for a single set of domain concepts, one for each distinct group of inter-relationships among those concepts. Domain knowledge is therefore regarded as a collection of domain knowledge slices, linked to each other through common concepts.

To accommodate this idea, we turn the classic semantic network into a two-layer network: a concrete semantic network and an abstract semantic network.

- The concrete semantic network contains detailed information on concepts and the inter-relationships among them. Each concrete semantic sub-net is associated with a single knowledge slice. The concrete layer is used to accommodate belief revision.
- The abstract semantic network contains only domain concepts and links to the corresponding concrete semantic sub-nets. It connects the different knowledge slices and provides the infrastructure for belief propagation.

Figure 1 shows an abstract semantic network in the telecommunication domain. Figure 2 is an example of a concrete semantic sub-net.

[Figure 1. An example of Abstract Semantic Network in Telecommunication Domain. Concepts shown include Transfer, Source, Destination, File, Connect, Server, Communicate, Client, Accept, Listen, Bind, Allocate, Socket, Send, Receive, Record, Client-Information, Server-Information, Domain-Name, Host-Name and IP-Address, together with labels AA1-AA4, AO1-AO8 and OO1-OO6.]

[Figure 2. An example of concrete semantic sub-net, relating Connect, Accept, Listen, Bind and Allocate through part-of and precedent-of links.]

3. Belief Initialisation, Revision and Propagation

This section introduces the uncertainty reasoning method used in our approach. We build the uncertainty reasoning model on the axiom system of confirmation theory [17] because it is simple to use. We have fully adapted the calculus of confirmation theory to our new context. We discuss three topics: belief initialisation, belief revision and belief propagation.

3.1. Belief Initialisation

A predefined semantic network exists in the knowledge base. A working semantic network is held in a blackboard. It is obtained by checking the concepts and inter-relationships in the knowledge base against the source code being analysed. Once a concept matches a name in the code, the blackboard records the link between the concept and its location in the code.

An inter-relationship is checked in the program region constrained by its participating concepts. To avoid a combinatorial explosion of search in the program space, the search for an inter-relationship is also confined to specific program constructs. For example, the search for the inheritance relation is carried out only in the following structure:

    class Clock : public Click {
    };

The initial belief of a concept in the blackboard comes from how well the concept matches a name in the source code being analysed. In practice, programmers often abbreviate meaningful names in source code, which makes these names ambiguous. A database called the name dictionary holds name recovery rules.

A single domain concept can be represented as either an atomic name or a compound name. An atomic name is an indivisible lexical unit, whereas a compound name consists of several atomic names. There are two classes of atomic name recovery rules: regular and irregular.

One example of a regular atomic name recovery rule is "first three letters (0.15)". If a name in the code matches the first three letters of a concept, the name can be linked to that concept with an assigned belief of 0.15. An irregular abbreviation usually comes from the pronunciation of a word, for which standard rules are difficult to write, so individual irregular abbreviation cases are collected instead. For example:

========================================
Atomic Name    Irregular Cases
----------------------------------------
Information    info (0.90), infor (0.85)
========================================

If a name in the code directly matches one of a concept's irregular cases, a quantified link between the concept and the name can be set up.

A compound name is composed of several atomic names joined by connecting symbols such as '-' or '_'. The ambiguity of a compound name comes mainly from the ambiguity of each of its composite atomic names. Compound name recovery rules also fall into two classes: regular and irregular.

- A regular compound name recovery rule checks each composite name using the atomic name recovery rules. The belief of the whole name is the minimum belief among its composite atomic names.
- Irregular compound name abbreviations are commonly used abbreviations that follow no standard rule. They are also collected, for example:

===============================================
Compound Name  Composite Names  Irregular Cases
-----------------------------------------------
Domain-Name    Domain, Name     DN (0.7)
===============================================

An irregular compound name is checked by matching its irregular cases directly.

For both atomic and compound names, irregular name recovery rules receive higher beliefs than regular ones, which accords with common sense. Even so, name recovery is exhaustive: direct matching, irregular name recovery rules and regular name recovery rules must all be tested without exception.

The belief of an inter-relationship depends on the beliefs of its participating concepts, so inter-relationships need no initial beliefs.

3.2. Belief Revision

In DKBA, we view domain knowledge as a set of cooperative domain knowledge slices. A domain knowledge slice is in turn composed of a set of cooperative domain concepts. A domain concept can get support from other concepts in the same slice, or even from other slices. The belief of a domain concept is therefore subject to revision whenever the beliefs of its cooperative concepts change. This subsection discusses the dynamics of this belief revision process.

3.2.1 Calculating the Authenticity of a Knowledge Slice

Empirically, the concepts and inter-relationships within a knowledge slice contribute differently to recognising the authenticity of the slice. Take Figure 2 as an example. To recognise the scenario in which sub-actions are taken to fulfil the task of connect, the actions connect, listen and accept contribute more than allocate and bind. Moreover, the absence of a domain concept or an inter-relationship can completely refute the link between a particular knowledge slice and a program. We therefore introduce two quantities:

- Contribution Strength (CS): how strongly the presence of a domain concept or an inter-relationship builds up positive belief in the context of a knowledge slice.
- Refutation Strength (RS): how strongly the absence of a domain concept or an inter-relationship refutes the context of a knowledge slice.

Both CS and RS are real numbers in [0, 1.0]. By default, both are set to 0.

Let $SN = \{c_1, \ldots, c_m, r_1, \ldots, r_k\}$ be a domain knowledge slice, where $c_i$ ($1 \le i \le m$) are the domain concepts in the slice and $r_j$ ($1 \le j \le k$) are its inter-relationships. We also use the following notation:

- $\{(CS(i), RS(i)) \mid 1 \le i \le m+k\}$ are the contribution strength and refutation strength of each element in SN.
- $\{CF(i) \mid 1 \le i \le m+k\}$ is the current belief of each element in SN.
- $\{CF'(i) \mid 1 \le i \le m+k\}$ is the prepared belief of each element in SN.
- AU is the authenticity degree of SN.

The prepared belief is $CF'(i) = CF(i)$ if i is a concept. If i is an inter-relationship, CF'(i) is the product of the beliefs of all its participating concepts. The algorithm for calculating AU is as follows:

===================================================
Algorithm for Calculating the Authenticity of a Context
---------------------------------------------------
MB <- 0
FOR i <- 1 TO m+k DO
    IF CF'(i) > 0 THEN
        MB <- MB + (1-MB)*CF'(i)*CS(i)
    ENDIF
ENDFOR

MD <- 0
FOR i <- 1 TO m+k DO
    IF CF'(i) < 0 THEN
        MD <- MD + (1-MD)*(-CF'(i))*RS(i)
    ENDIF
ENDFOR

AU <- (1-MD)*MB - MD
---------------------------------------------------
Note:
<- stands for the operation of ordinary assignment
===================================================
Script 1

Theorem 3.2.1 The calculation of MB (or MD) in Script 1 constitutes an Abelian group on [-1, 1].

A proof of Theorem 3.2.1 follows easily from the definition of an Abelian group. For reasons of space, we omit it here. Theorem 3.2.1 implies that the computation of AU does not depend on the order of the concepts and inter-relationships.

MB collects the beliefs that favour the context of a knowledge slice, and MD collects the beliefs that argue against it. Computing AU from MB and MD gives MD higher priority than MB. This property is desirable because refutations are normally issued by users and should therefore carry more weight. It is easy to verify that $-1 \le AU \le 1$. AU is initially set to 0. The calculation of AU is done in the concrete semantic network.
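Belief initialisation (Section 3.1) and the authenticity calculation of Script 1 can be chained in a small Python sketch:

- Initial beliefs come from name matching.
- Prepared beliefs CF'(i) follow the concept/product rule.
- AU is accumulated through MB and MD.

Several details are our own assumptions rather than the paper's. A direct match gets belief 1.0. The best belief among all tested rules is kept. Composite names are compared position by position. All function names, strengths and code names in the example are hypothetical.

```python
import re
from dataclasses import dataclass
from math import prod

# Hypothetical name-dictionary entries built from the examples in Section 3.1.
IRREGULAR = {
    "information": {"info": 0.90, "infor": 0.85},
    "domain-name": {"dn": 0.7},
}
FIRST_THREE_LETTERS = 0.15  # the regular atomic rule quoted in Section 3.1
DIRECT_MATCH = 1.0          # assumption: the paper gives no value for a direct match


def atomic_belief(concept: str, name: str) -> float:
    """Test direct match, irregular cases and the regular rule; keep the best belief."""
    c, n = concept.lower(), name.lower()
    candidates = [0.0, IRREGULAR.get(c, {}).get(n, 0.0)]
    if n == c:
        candidates.append(DIRECT_MATCH)
    if n == c[:3]:
        candidates.append(FIRST_THREE_LETTERS)
    return max(candidates)


def name_belief(concept: str, name: str) -> float:
    """Initial belief that a code name denotes a (possibly compound) concept."""
    c_parts = re.split(r"[-_]", concept)
    if len(c_parts) == 1:
        return atomic_belief(concept, name)
    irregular = IRREGULAR.get(concept.lower(), {}).get(name.lower(), 0.0)
    n_parts = re.split(r"[-_]", name)
    regular = 0.0
    if len(n_parts) == len(c_parts):
        # Regular compound rule: the weakest composite atomic name decides.
        regular = min(atomic_belief(c, n) for c, n in zip(c_parts, n_parts))
    return max(irregular, regular)


@dataclass(frozen=True)
class Element:
    """A concept, or an inter-relationship if it lists participating concepts."""
    name: str
    cs: float = 0.0           # contribution strength CS(i), default 0
    rs: float = 0.0           # refutation strength RS(i), default 0
    participants: tuple = ()  # concepts joined by an inter-relationship


def prepared_belief(e: Element, cf: dict) -> float:
    """CF'(i): own belief for a concept, product of participants' beliefs otherwise."""
    if e.participants:
        return prod(cf[c] for c in e.participants)
    return cf[e.name]


def authenticity(elements, cf: dict) -> float:
    """Script 1: accumulate MB and MD, then AU = (1 - MD) * MB - MD."""
    mb = md = 0.0
    for e in elements:
        b = prepared_belief(e, cf)
        if b > 0:
            mb += (1 - mb) * b * e.cs
        elif b < 0:
            md += (1 - md) * (-b) * e.rs
    return (1 - md) * mb - md


cf = {"Connect": name_belief("Connect", "connect"),  # 1.0, direct match
      "Accept": name_belief("Accept", "acc"),        # 0.15, first three letters
      "Bind": name_belief("Bind", "bind")}           # 1.0, direct match
slice_ = [Element("Connect", cs=0.8), Element("Accept", cs=0.8),
          Element("Bind", cs=0.4, rs=0.9),
          Element("Accept part-of Connect", cs=0.5, participants=("Accept", "Connect"))]
print(round(authenticity(slice_, cf), 5))  # 0.90232
```

Reversing the element order leaves AU unchanged up to floating-point rounding. This is the order independence that Theorem 3.2.1 asserts. If a user refutes Bind by setting its belief to -1.0, then with `rs=0.9` AU drops to about -0.816, because MD outweighs MB.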

3.2.2 Revising the Belief of a Concept within a Single Knowledge Slice

The authenticity of a knowledge slice is calculated by combining the "votes" of all its participating concepts and inter-relationships. The concepts in the slice can therefore be re-evaluated whenever the slice's authenticity changes. We now give formulas for the effect that a revised authenticity has on a slice's participating concepts. We do not revise the beliefs of inter-relationships.

Let $\delta(AU) = AU - AU'$, where AU' is the previous authenticity of the knowledge slice. Let $D(CF_i)$ be the belief-revision ratio for concept i in the slice. All other notation follows Section 3.2.1. We have

$$D(CF_i) = \begin{cases} \delta(AU) \cdot CS(i) & \text{if } \delta(AU) > 0 \\ \delta(AU) \cdot RS(i) & \text{otherwise} \end{cases}$$

When the authenticity of a context increases, the positive belief of a concept in that context grows at the rate of its contribution strength. Otherwise, the concept's negative belief grows at the rate of its refutation strength. $D(CF_i)$ is also calculated in the concrete semantic network, and the result is written to the abstract semantic network.

3.2.3 Synthesising the Beliefs of a Concept in Multiple Knowledge Slices

A domain concept can play different roles in different contexts. After a concept's beliefs have been revised in several contexts, they must be combined into a final belief. We use the following notation:

- $D(CF'(i, j))$ is the belief-revision ratio for concept i in context j, $1 \le j \le m$.
- CF'(i) is the old belief of concept i.
- CF(i) is the synthesised belief of concept i.

The algorithm for calculating CF(i) is as follows:

===================================================
Algorithm for Synthesising Multiple Beliefs
---------------------------------------------------
MB <- 0
FOR j <- 1 TO m DO
    IF D(CF'(i, j)) > 0 THEN
        MB <- MB + (1-MB)*D(CF'(i, j))
    ENDIF
ENDFOR

MD <- 0
FOR j <- 1 TO m DO
    IF D(CF'(i, j)) < 0 THEN
        MD <- MD + (1-MD)*(-D(CF'(i, j)))
    ENDIF
ENDFOR

D(CF'(i)) <- MB - MD
CF(i) <- CF'(i) + (1-CF'(i)) * D(CF'(i))
---------------------------------------------------
Note:
<- stands for the operation of ordinary assignment
===================================================
Script 2

As in Section 3.2.1, we have:

Theorem 3.2.3 The calculation of MB (or MD) in Script 2 constitutes an Abelian group on [-1, 1].

Theorem 3.2.3 implies that the computation of CF(i) does not depend on the order of the ratios $D(CF'(i, j))$. The algorithm above first combines a concept's belief-revision ratios from the different knowledge slices. Only then does it update the concept's old belief by the combined ratio. The calculation of CF(i) is done in the abstract semantic network.

3.3. Belief Propagation

The revised belief of a concept can in turn revise the authenticity of the knowledge slices in which the concept occurs. Those slices then affect the beliefs of the concepts they contain, and so on. The effect of a revised belief therefore propagates through the semantic network. We describe two kinds of belief propagation: dual-way and singular-way.

3.3.1 Dual-way Belief Propagation

Dual-way belief propagation runs immediately after the beliefs of domain concepts are initialised. Suppose we regard each domain knowledge slice as a domain expert, and the belief of a domain concept as an expert's viewpoint on that concept. When the beliefs have just been initialised, the experts have not yet imposed their viewpoints on the concepts. A substantial exchange of viewpoints among the experts is therefore needed.

Viewpoints are exchanged through the belief revision of concepts. The three steps in Section 3.2 describe how a domain expert's viewpoint is imposed on a concept. We call this "dual-way" because the exchange goes in both directions: each domain expert can influence the others, and vice versa.

Dual-way belief propagation is iterative. After one pass, only closely linked domain experts have exchanged viewpoints. As the process continues, remote experts can also exchange theirs. In our system, a constraint 1 limits the number of iteration rounds. Closely related experts should not over-influence each other, and remote experts have less influence on each other in any case.
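The per-slice revision formula of Section 3.2.2 and the synthesis of Script 2 translate almost line for line into the minimal Python sketch below. The function names are hypothetical. The example numbers are also hypothetical: a prior belief of 0.15, and two slices with the given AU changes and strengths. The formulas are applied exactly as written, with no extra normalisation.

```python
def revision_ratio(au_new: float, au_old: float, cs: float, rs: float) -> float:
    """Section 3.2.2: D(CF_i) = delta(AU)*CS(i) if delta(AU) > 0, else delta(AU)*RS(i)."""
    delta = au_new - au_old
    return delta * cs if delta > 0 else delta * rs


def synthesise(cf_old: float, ratios) -> float:
    """Script 2: combine the ratios D(CF'(i, j)) over all slices j, then update CF'(i)."""
    mb = md = 0.0
    for d in ratios:
        if d > 0:
            mb += (1 - mb) * d
        elif d < 0:
            md += (1 - md) * (-d)
    d_total = mb - md                       # D(CF'(i))
    return cf_old + (1 - cf_old) * d_total  # CF(i)


# Hypothetical example: a concept with prior belief 0.15 occurring in two slices.
ratios = [
    revision_ratio(au_new=0.5, au_old=0.0, cs=0.8, rs=0.3),   # slice A gained authenticity -> 0.4
    revision_ratio(au_new=-0.2, au_old=0.0, cs=0.6, rs=0.5),  # slice B lost authenticity -> -0.1
]
print(round(synthesise(0.15, ratios), 3))  # 0.405
```

Positive and negative ratios are pooled separately in MB and MD. The result therefore does not depend on the order in which the slices are visited, in line with Theorem 3.2.3.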
3.3.2 Singular-way Belief Propagation

Singular-way belief propagation is invoked when a user intervenes in the reasoning process by manually specifying the belief of a concept. We treat the user as the domain expert with the highest authority. Other "domain experts" (knowledge slices) may not revise beliefs set by the user or by the user's successors, that is, the knowledge slices chosen for belief revision after the user's intervention. Belief therefore propagates in a single direction. For the same reason as in Section 3.3.1, a constraint DC limits the number of propagation layers. Script 3 gives an algorithm for controlling singular-way belief propagation.

=====================================================
FOR EACH knowledge slice KS affected by the concept
         revised by a user DO
    STACK <= (KS, 1)
ENDFOR
WHILE not-empty(STACK) DO
    (CKS, layer) <= STACK
    IF not-marked(CKS) AND (layer <= DC) THEN
        1. Compute the authenticity of CKS.
        2. Revise the beliefs of all the concepts in CKS.
        3. Synthesise the multiple beliefs of a concept
           if needed.
        4. Mark CKS as 'processed'.
        5. CKS' <- knowledge slices affected by the
           concepts in CKS.
        6. FOR EACH KS in CKS' DO STACK <= (KS, layer+1)
    ENDIF
ENDWHILE
-----------------------------------------------------
Note:
<= stands for the operation of POP or PUSH on a STACK.
<- stands for the operation of general assignment.
=====================================================
Script 3

Singular-way belief propagation cannot be undone, and different propagation tracks give different results. We therefore design a heuristic rule for choosing a good path, which the next section discusses.

4. Programming Psychology Based Program Space Partitioning and Heuristic Search

Program space partitioning has been studied before [5], but it has received little attention in DKBA. In this section, we present a novel method for program space partitioning based on the principles of programming psychology. On top of this method, we design a heuristic rule for selecting a good search path during singular-way belief propagation.

4.1. Motivation for Programming Psychology

A large software program is usually written by a group of programmers, each responsible for only part of the whole program. Each part is usually a self-contained component with relatively independent functionality. Empirical studies [16] suggest that each programmer, having a different training background and temperament, tends to use a particular code-writing style consistently. If we can identify the different programming styles in a program, we can identify the programmers who wrote it. We can then partition the program into smaller self-contained sub-modules. For DKBA, this method has two benefits:

(1) The search for a good path during singular-way belief propagation can be concentrated in a single self-contained module, where strongly coupled domain knowledge exists.
(2) The search can give priority to program regions with good readability.

4.2. Program Partitioning

Three groups of key features in source code can distinguish programming styles: comment style, naming style and indentation style. We have defined a taxonomy for each group of features but omit it here for reasons of space. Program partitioning has three stages: programming style sampling, program cutting and program re-healing. Script 4 gives an algorithm for creating the sampling function of programming styles. It uses the abbreviations Programming Styles (PS), Current Program Line (CPL), Sampling Function (SF) and Sample Interval (SI).

==================================================
Algorithm for Creating the Sampling Function of
Programming Styles for a Program
==================================================
PS <- null
CPL <- 1
SF <- null
WHILE CPL <> END-OF-PROGRAM DO
    ps <- programming style in CPL
    IF ps in PS THEN
        PS[ps] <- PS[ps] + 1
    ELSE
        PS <= ps
        PS[ps] <- 1
    ENDIF
    IF (CPL mod SI) == 0 THEN
        SF <= PS
        PS <- null
    ENDIF
    CPL <- CPL + 1
ENDWHILE
IF PS <> null THEN
    SF <= PS
ENDIF
---------------------------------------------------
Note:
<- stands for the operation of general assignment
<= stands for the operation of adding an element
   to a set
===================================================
Script 4

A programmer may occasionally use other programming styles. Applying a threshold to the sampling function filters out such noise, leaving the main features of each programmer's style. Different programmers may also share some programming styles.
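The Python sketch below combines three steps: the sampling of Script 4, the noise threshold just described, and the cutting rule of Script 5. It is an illustration under stated assumptions. A caller-supplied classifier maps each line to exactly one style. The `comment_style` classifier is hypothetical and looks only at comment markers, since the authors' style taxonomy is not given. The threshold is applied to each sample separately.

```python
from collections import Counter
from typing import Callable, List


def sampling_function(lines: List[str], style_of: Callable[[str], str], si: int) -> List[Counter]:
    """Script 4: count the style seen on each line; emit one Counter per SI lines."""
    sf: List[Counter] = []
    ps: Counter = Counter()
    for cpl, line in enumerate(lines, start=1):
        ps[style_of(line)] += 1
        if cpl % si == 0:
            sf.append(ps)
            ps = Counter()
    if ps:  # keep a trailing, partially filled interval
        sf.append(ps)
    return sf


def filter_noise(sf: List[Counter], threshold: int) -> List[Counter]:
    """Drop styles seen fewer than `threshold` times in a sample (occasional use)."""
    return [Counter({s: n for s, n in sample.items() if n >= threshold}) for sample in sf]


def cutting_points(sf: List[Counter]) -> List[int]:
    """Script 5: sample Sp (1-based) is a cut if some style appears or vanishes at Sp + 1."""
    styles = set().union(*sf)
    return sorted({sp for sp in range(1, len(sf)) for s in styles
                   if (sf[sp - 1][s] == 0) != (sf[sp][s] == 0)})


def comment_style(line: str) -> str:
    """Hypothetical classifier that looks only at the comment marker of a line."""
    text = line.strip()
    if text.startswith("//"):
        return "line-comment"
    if text.startswith("/*"):
        return "block-comment"
    return "code"


program = ["// open", "x = 1;", "// loop", "y = 2;",
           "/* send */", "z = 3;", "/* recv */", "w = 4;"]
samples = filter_noise(sampling_function(program, comment_style, si=2), threshold=1)
print(cutting_points(samples))  # [2]
```

In this example, the switch from `//` to `/* */` comments between samples 2 and 3 is reported as a cut after sample 2. Re-healing would then rejoin regions whose programming styles match exactly.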
A program should be cut where new programming styles appear or old ones disappear. Script 5 gives an algorithm for program cutting. It uses the abbreviations Sample Pointer (Sp), Sample Number (SN), Programming Style Numbers (PSN), Partition (Par) and programming style pointer (ps).

===================================================
Algorithm for Program Cutting
===================================================
Sp <- 1
Par <- null
WHILE Sp <> SN DO
    ps <- 1
    WHILE ps <= PSN DO
        IF (PS[ps][Sp]==0 AND PS[ps][Sp+1]>0) OR
           (PS[ps][Sp]>0 AND PS[ps][Sp+1]==0) THEN
            Par <= Sp
        ENDIF
        ps <- ps + 1
    ENDWHILE
    Sp <- Sp + 1
ENDWHILE
----------------------------------------------------
Note:
<- stands for the operation of general assignment
<= stands for the operation of adding an element
   into a set.
====================================================
Script 5

Cutting is done in the sequential program space. A self-contained module, however, can spread across separate program regions, and these regions must be re-healed. Healing works by matching identical programming styles across program regions.

4.3. Heuristic Search

We first address how to evaluate the quality of a program section. We use the following notation:

- $Qua_i$ is the quality of program section i.
- CW and NW are the weights for comments and names respectively, where $CW > 0$, $NW > 0$ and $CW + NW = 1$.
- $CD_i$ and $ND_i$ are the densities of comments and names in program section i respectively.

Both $CD_i$ and $ND_i$ are easy to calculate with the algorithm described in Script 4. We have:

$$Qua_i = CW \cdot CD_i + NW \cdot ND_i$$

The key to selecting a good search path during singular-way belief propagation is selecting a good knowledge slice. Let KS be a knowledge slice and PSS the set of program sections corresponding to the slice's participating domain concepts. Let QPSS be the qualities of these program sections and QKS the quality of KS. We have $QKS = \min\{QPSS\}$. When choosing knowledge slices, priority goes first to (1) slices with high QKS, and then to (2) slices in the same program partition as the currently selected slice.

5. A Comparison to Related Work

Traditional concept assignment methods for linking program sections with domain concepts are heavyweight. Biggerstaff et al. [4] proposed an approach to concept assignment that uses a hybrid semantic/connectionist network as its knowledge representation. A neural-network-like mechanism propagates uncertainty across multiple layers, from the syntax level, through the structure level, to the domain concept level. The propagation is one-way, i.e. bottom-up. The uncertainty is represented implicitly in the connectionist network, which must be trained to acquire knowledge.

This is interesting work on concept assignment for a single knowledge slice, but Biggerstaff et al. did not address how recovered knowledge slices affect each other. The reason is that their approach already needs substantial supporting knowledge across many layers to assign concepts within a single slice, which is itself a big problem. In contrast, our lightweight approach to domain knowledge recovery from source code concentrates on the "width" of DKBA rather than its "depth". Our rationale is that deep analysis or checking of a legacy program is largely unnecessary, since the legacy system has been running correctly for years. Our approach can be used stand-alone or as a complement to traditional DKBA methods.

MYCIN [17], an early expert system for disease diagnosis, uses production rules as its knowledge representation. Its uncertainty reasoning mechanism is built on the axiom system of confirmation theory, with a calculus designed for rule-based uncertainty reasoning. Our uncertainty reasoning model is also built on the axiom system of confirmation theory. However, its calculus is designed specifically for uncertainty reasoning over semantic networks.

Much work has also been done on program partitioning. Burd et al. [5], for example, propose a dominance tree as a representation for programs and design a set of metrics to cut components out of the tree. Their work is structure-oriented and does not consider DKBA. In contrast, our work focuses on domain-oriented program partitioning, with the aim of finding an "empirically effective and efficient to use" heuristic rule for selecting a good belief propagation path.

6. Concluding Remarks and Future Work

Domain knowledge recovery from legacy systems is a major challenge in the new era of software engineering. Software engineering has long been regarded as an ideal test-bed for AI methodology, but existing DKBA methods have not exploited this potential sufficiently. We
develop an approach to domain knowledge recovery from source code, based on our understanding of AI over the years. It is a lightweight approach and a good complement to traditional DKBA methods.

Our approach has been implemented in the Reverse Engineering Assistant (RA) [23]. It has been tested on a database application from a major telecommunication company, with which we set up a research collaboration to discover ontology from legacy systems. We will report on this experience elsewhere.

We see several branches of future work:

- a theoretical analysis of the network-oriented uncertainty propagation model;
- more tests of our approach in different kinds of "knowledge to program" scenarios, such as network protocols and interfaces;
- the implications of our approach for other AI areas, such as information retrieval from the web and image or speech recognition;
- the roles of the programming-psychology-based program partitioning method in other software engineering tasks;
- studies of the knowledge points, knowledge patterns and programming patterns of different programmers, applied in our context.

References

[1] S. K. Abd-El-Hafiz and V. R. Basili. Documenting programs using a library of tree structured plans. In Proceedings of the International Conference on Software Maintenance 1993, pages 152-161. IEEE Computer Society Press, Sep 1993.
[2] G. Antoniol, G. Canfora, and A. De Lucia. Recovering code to documentation links in OO systems. In Proceedings of the Working Conference on Reverse Engineering, Atlanta, Georgia, Oct 1999.
[3] F. Balmas. Toward a framework for conceptual and formal outlines of programs. In Fourth Working Conference on Reverse Engineering, pages 226-235, Amsterdam, Oct 1997. IEEE Computer Society.
[4] T. J. Biggerstaff, B. G. Mitbander, and D. Webster. The concept assignment problem in program understanding. In Proceedings of the 15th International Conference on Software Engineering.
[5] E. Burd, M. Munro, and C. Wezeman. Extracting reusable modules from legacy code: Considering the issues of module granularity. In Proceedings of the Working Conference on Reverse Engineering.
[6] W. W. Cohen. Inductive specification recovery: Understanding software by learning from example behaviors. Automated Software Engineering, 1995.
[7] C. Desclaux and M. Ribault. MACS: Maintenance assistance capability for software maintenance. In Proceedings of the International Conference on Software Maintenance 1991.
[8] A. Engberts, W. Kozaczynski, and J. Q. Ning. Concept recognition-based program transformation. In Proceedings of the International Conference on Software Maintenance 1991.
[9] J. Hartman. Understanding natural programs using proper decomposition. In Proceedings of the 13th International Conference on Software Engineering.
[10] W. L. Johnson and E. Soloway. PROUST: Knowledge-based program understanding. In Proceedings of the 7th International Conference on Software Engineering.
[11] K. Kontogiannis, R. DeMori, M. Bernstein, and E. Merlo. Localization of design concepts in legacy systems. In Proceedings of the International Conference on Software Maintenance 1994.
[12] N. C. Mendonca and J. Kramer. A quality-based analysis of architecture recovery. In 1st Euromicro Working Conference on Software Maintenance and Reengineering (CSMR 97).
[13] A. Quilici. A hybrid approach to recognizing programming plans. In Proceedings of the 1st Working Conference on Reverse Engineering.
[14] A. Quilici. Reverse engineering of legacy systems: A path toward success. In Proceedings of the 17th International Conference on Software Engineering.
[15] S. Rugaber. White paper on reverse engineering, Mar 1994.
[16] S. Rugaber and V. Tisdale. Software psychology requirement for software maintenance activities. Technical report, Software Engineering Centre, Georgia Institute of Technology, Atlanta, GA, 1994.
[17] E. H. Shortliffe et al. A model of inexact reasoning in medicine. Mathematical Biosciences, 23:351-379, 1975.
[18] Tan and Dietz. Abstracting plan-like program information: A demonstration. In Proceedings of the International Conference on Software Maintenance 1994, pages 262-271. IEEE Computer Society Press, Sep 1994.
[19] P. Tonella, R. Fiutem, G. Antoniol, and E. Merlo. Augmenting pattern-based architectural recovery with flow analysis: Mosaic - a case study. In Working Conference on Reverse Engineering. IEEE Computer Society, 1996.
[20] R. C. Waters. Program translation via abstraction and reimplementation. IEEE Transactions on Software Engineering, 1988.
[21] L. Wills. Flexible control for program recognition. In Proceedings of the 1st Working Conference on Reverse Engineering.
[22] L. M. Wills. Automated program recognition by graph parsing. Technical report, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, 1992.
[23] H. Yang and K. Bennett. Extension of a transformation system for maintenance: dealing with data-intensive programs. In Proceedings of the International Conference on Software Maintenance 1994, pages 344-353. IEEE Computer Society Press, Sep 1994.
