
The Journal of Systems and Software 84 (2011) 1089–1099

Is my model right? Let me ask the expert
Antonia Bertolino, Guglielmo De Angelis∗, Alessio Di Sandro, Antonino Sabetta

CNR-ISTI, Pisa, Italy
ARTICLE INFO
Article history:
Received 2 August 2010
Received in revised form 27 December 2010
Accepted 25 January 2011
Available online 16 February 2011
Keywords:
Model validation
Domain modeling
Model driven engineering
Experience report
ABSTRACT
Defining a domain model is a costly and error-prone process. It requires that the knowledge possessed by domain experts be suitably captured by modeling experts. Eliciting what is in the domain experts' mind and expressing it using a modeling language involve substantial human effort. In the process, conceptual errors may be introduced that are hard to detect without a suitable validation methodology. This paper proposes an approach to support such validation, by reducing the knowledge gap that separates modeling experts and domain experts. While our methodology still requires the domain experts' judgement, it partially automates the validation process by generating a set of yes/no questions from the model. Answers differing from expected ones point to elements in the model which require further consideration and can be used to guide the dialogue between domain experts and modeling experts. Our methodology was implemented as a tool and was applied to a real case study, within the IPERMOB project.
© 2011 Elsevier Inc. All rights reserved.
1. Introduction
Model Driven Engineering (MDE) is a very active research topic (Schmidt, 2006; France and Rumpe, 2007). It focuses on creating an abstract representation of a system that is close to some relevant domain concepts, rather than describing the system in terms of the underlying and general-purpose technologies by which it is implemented. The principle behind it is that "everything is a model" (Bézivin, 2005).
The definition of models (e.g. domain models) is an effort-intensive activity that requires synergy between different backgrounds and skills: modeling experts (MEs) master MDE environments and resources, while domain experts (DEs) delve into the application domain.
The definition of models is also a critical activity. Since models usually are the starting point for many subsequent transformations, faults possibly introduced in this stage can have deleterious impacts. Therefore, ensuring that the model is right becomes of paramount importance. The term "right" in the previous sentence may mean different things: paraphrasing Boehm's informal definitions for distinguishing between verification and validation (Boehm, 1984), we have to ask ourselves: are we building the model right? Are we building the right model?

∗ Corresponding author at: CNR, Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo", 56124 Pisa, Italy. Tel.: +39 050 3153468; fax: +39 050 3152924.
E-mail addresses: antonia.bertolino@isti.cnr.it (A. Bertolino), guglielmo.deangelis@isti.cnr.it (G. De Angelis), alessio.disandro@isti.cnr.it (A. Di Sandro), antonino.sabetta@isti.cnr.it (A. Sabetta).
Our work addresses the second question, which is the thornier of the two.
More precisely, in their highly referenced framework for conceptual model quality, Lindland et al. (1994) distinguish the syntactic, semantic and pragmatic quality of a model. Syntactic quality targets syntactic correctness, i.e., adherence to the language rules, which can be done, relatively easily, in an automated way. In contrast, achieving semantic quality is extremely difficult and costly (Lindland et al., 1994). Lindland and coworkers reduce it to two main goals: validity and completeness,¹ and we agree with their consideration that semantic quality mostly entails manual inspection of the model. The goal is to validate that the model rightly captures the intended domain knowledge, which is a matter of mutual understanding between MEs and DEs (who usually are not the same people). Here, the third quality dimension of a model comes into play: pragmatic quality, which targets model comprehension, i.e., that all involved stakeholders understand the parts of the model that are relevant to them.
The issue is that a gap in knowledge, background, and skills exists between MEs and DEs. In general we cannot assume that the DEs possess the expertise required to directly inspect and navigate models. Visual aids and animation can be of help, but ultimately the usual approach remains that MEs explain models to DEs in natural language (NL) and collect and process their feedback. Tool support in this step can be extremely valuable, especially when models are large and span many different conceptual domains. Indeed, the MEs themselves, without guidance, may lack systematicity or introduce errors when translating from the model to NL.
¹ Model consistency checking in this framework is seen as a means to achieve semantic quality, and not a goal.
Fig. 1. Overview of a generic domain modeling process.
We recently experienced the challenge of building and validating the model of a complex embedded information system for urban mobility in the scope of the IPERMOB² project (presented in Section 4.1). IPERMOB brings together experts from a variety of domains, spanning software engineering, real-time embedded systems, wireless sensor networks, transport engineering, image processing, and data mining. In IPERMOB, we experienced the situation described above: we acted as the MEs and built a domain model in UML by going through several interactions with our partners in the project, who acted as the DEs in their respective fields of expertise. The model obtained needed to undergo semantic validation to check that we had captured the DEs' intents correctly, but the knowledge of the IPERMOB domain was distributed among several partners, not all of whom possessed sufficient expertise to directly inspect the UML model we built.
Although the use of UML as a language to describe domain models is common practice nowadays, other choices are possible, such as dedicated domain-specific languages (DSLs) (Mernik et al., 2005), ontologies, as well as combinations thereof (Walter et al., 2009). It is easy to find in the literature comparisons of the different approaches, and advocates of one approach or the other (Selic, 2007; Dalgarno and Fowler, 2008). Joining this debate is out of the scope of this paper, so we do not discuss it further. However, in the context of this work, it is important to stress that while the prototype we present to demonstrate our approach assumes UML is used as a conceptual modeling notation, the methodology and the technique on which the prototype is based are much more general. Indeed, they could be applied with other choices of modeling notations, requiring just implementation-level extensions, but no major changes to the underlying principles.
To support the task of interviewing the IPERMOB experts, we developed a novel approach that uses state-of-the-art model-driven techniques to navigate the current version of a model and to automatically formulate questions in NL that the DEs can understand, so as to stimulate their feedback and possibly identify problematic parts of the model. This approach has been implemented into a practical tool and has already been usefully applied in the IPERMOB project.

Our approach stands between semantic and pragmatic model quality, and the scientific challenges that it poses lie at the convergence of different research directions: (a) approaches for manipulating and querying formal models, such as (Störrle, 2007, 2009; Opoka et al., 2008); (b) approaches and techniques for UML model verification (Malgouyres and Motet, 2006; Gogolla et al., 2005); (c) approaches for expressing model properties in NL (Konrad and Cheng, 2005); (d) cognitive approaches for eliciting requirements and extracting human knowledge (Glinz and Wieringa, 2007; Cheng and Atlee, 2007). We are aware that our results are far from having answered all the challenges posed in the above research fields. Rather, we see our contribution as a humble start which, from our experience, looks promising and certainly deserves further investigation.

² http://www.ipermob.org/.
The paper is organized as follows: after an overview of the approach in the next section, we introduce the developed tool in Section 3. Section 4 describes the settings of the case study in the IPERMOB project and reports the data we collected. Section 5 discusses the results of the case study, while Section 6 outlines the potential threats to validity of our experience. Finally, relevant related work is dealt with in Section 7, while Section 8 wraps up the conclusions of the paper and outlines future work.
2. Overview
In this section we first overview a generic process of semantic model validation, in which we highlight the steps related to the communication between DEs and MEs. Then, we introduce our approach and tool within this process, to facilitate mutual comprehension during validation.
2.1. Model building and validation process
Through the UML Activity Diagram in Fig. 1 we conceptualize the steps involved in a domain modeling process, with an emphasis on the communications occurring between DEs and MEs.

DEs and MEs start interacting to establish a shared vocabulary and elicit a common high-level view of the input domain. After this phase, the MEs can propose an initial model of the domain, which, as anticipated in Section 1, has to be validated by the DEs. To this purpose, an iterative process starts. Along these iterations the MEs try to make the model comprehensible to the DEs by generating and explaining appropriate model views. Based on such views, the MEs request, collect, and analyse the feedback from the DEs. If the feedback reveals any mismatch between the domain model and the DEs' intents, the MEs should understand the nature of the conflict and address it (e.g. by proposing a new version of the domain model). This process can be iterated until all stakeholders are satisfied with the domain model.
2.2. Tool-supported querying
Although the process described above can never be fully automated, we can use model-driven technology to make it more systematic and more effective. In particular, the approach we propose here helps MEs in bridging the knowledge gap existing between them and the DEs.
With reference to Fig. 1, we support the automatic transformation of the model into views comprehensible by DEs (the dark-shaded activity in the figure), by automatically generating a tunable list of simple Yes/No questions that span all model elements.
The idea is that at each iteration of the model validation process, MEs can interview the DEs by means of such automatically generated questionnaires; thus, MEs are freed from the difficult and error-prone task of translating the model into NL descriptions.

We also generate for each question the expected answer, based on the current version of the model. Therefore, in the subsequent step of feedback analysis, MEs can immediately identify possible conflicts between actual and expected answers, and efficaciously focus the costly task of discussing feedback on those parts of the model to which the conflicts belong.

How we obtain the questionnaires is described in the next section.
3. Tool architecture and implementation

Our tool is called MOTHIA (Model Testing by Human Interrogations & Answers).³ Section 3.1 describes the overall architecture of MOTHIA, and Section 3.2 presents the reference implementation we developed.
3.1. MOTHIA: the architecture
As described in Section 2, MOTHIA deals with the knowledge gap that separates MEs and DEs. MOTHIA takes as input a model representative of the domain and produces as output a questionnaire expressed in a language that is closer to the DEs. Fig. 2 depicts MOTHIA's internal structure (left side) and how it interacts with the model (right side). The KnowledgeBaseGenerator is responsible for converting the input domain model into an internal representation describing both the domain entities and their relations as facts. The InferenceEngine loads such facts, together with the set of rules representing the semantics of the input modeling language. By querying the InferenceEngine, MOTHIA checks whether a given property is valid in the input domain model. Also, the InferenceEngine can return all the entities in the domain model satisfying a given property.

The QuestionnaireGenerator loads the configurations that drive the queries on the InferenceEngine and the creation of the questionnaire. The configurations of the QuestionnaireGenerator are expressed in terms of patterns and criteria.
A pattern represents a syntactical combination of elements in the input domain model that is considered relevant. In other words, MOTHIA uses patterns as a strategy to match a portion of the specific input domain model, driven by the syntax. A pattern is defined in terms of an id, a signature (i.e. the name of the pattern with the list of its formal parameters), and a textual description. Also, a pattern composes one or more queries: each query implements a clause that selects syntactical structures on the input domain model. A priority attribute defines the order in which the different clauses should be applied.
The criteria define the properties that MOTHIA uses in order to explore and query the input domain model. In other words, each criterion abstracts a type of question in the questionnaire. A criterion implements a question by composing patterns. Deductive questions are generated from criteria that infer true predicates from the input domain model. Alternatively, a question that corresponds to a false predicate in the input domain model is called, in this paper, a distractor.⁴ Examples of the two types of questions can be found in Section 3.2 (deductive questions are marked with ✓, distractors with ✗).
³ http://labse.isti.cnr.it/tools/mothia.
⁴ This is a slight abuse of language; in the literature, distractors are incorrect alternative answers to a multiple-choice question (Revuelta, 2004).
Thus, MOTHIA expects that DEs confirm the deductive questions and reject the distractors.

In general, querying the input domain model from the set of criteria leads to a questionnaire with an intractably large number of questions. To reduce the number of questions, the QuestionnaireGenerator uses the Filters component. For example, the Filters component can perform a random selection of questions, or can target the questionnaire only towards some specific domain.
Patterns and criteria are artifacts that conform to a grammar, which MOTHIA defines. Patterns and criteria are provided by the user as inputs to the tool. In particular, they are loaded and processed by the QuestionnaireGenerator. On the contrary, the Filters component is a software module internal to MOTHIA. Users cannot provide their own filters, but can configure the existing ones. Fig. 2 emphasizes such architecture. In short, there is a language to create patterns and criteria, while there is no language to create filters.

The Filters component has an impact on which questions will be submitted to the DEs. Thus, each time the validation process is carried out, the selection of an appropriate filtering strategy can be crucial to its success. For example, if MEs decide both to apply a random filtering of the questions and to keep the number of questions in each questionnaire reasonably low, they balance the number of generated questionnaires against the probability of covering the whole domain.

Finally, the EmittersContainer component deals with the output format of the questionnaire. It abstracts the functionalities of a configurable transformation engine and composes one or more specific emitters.
3.2. MOTHIA: the reference implementation
This section describes a reference implementation of MOTHIA. Specifically, we present the tool by means of the illustrative scenario depicted in Fig. 3. This scenario models the entities of an imaginary municipality and their relations: among the candidates of SomeParty, a City votes its Mayor, who in turn appoints the Councillors (up to 10).

MOTHIA is developed as an Eclipse plugin, using Java and EMF technologies such as Ecore and Acceleo. In particular, the current implementation can generate questions formulated in NL from models defined by means of either Ecore or UML Class Diagrams.
SWI-Prolog⁵ is the InferenceEngine we chose (see Section 7); it is accessed by our plugin via the JPL library. The internal representation of the input model is therefore a collection of facts. Without loss of generality, the reference implementation considers only the most important modeling elements (additional ones can easily be added to the KnowledgeBaseGenerator if needed). Four types of facts are used:
1. class, to identify a class;
2. classAttribute, to identify an attribute of a class, and its properties;
3. inheritance, to identify a generalization;
4. association, to identify an association between two classes, and its properties.
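For illustration, a minimal sketch of how such facts might look for part of the model in Fig. 3 is shown below; the inheritance/2 layout follows Appendix A, while the arities and argument order of the other facts are assumptions made here for the sake of the example.

% Illustrative fact base for part of Fig. 3.
% inheritance(Super, Sub) follows the layout of Appendix A;
% the other fact layouts are illustrative assumptions.
class('Person').
class('Mayor').
class('City').
classAttribute('Person', name).               % assumed layout: class, attribute
inheritance('Person', 'Mayor').               % Mayor specializes Person
association(citizen, 'City', 'Person').       % assumed layout: name, from, to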
⁵ http://www.swi-prolog.org/.

Fig. 2. The architecture of MOTHIA.

Fig. 3. The domain model about an imaginary municipality.

In addition to the facts, we defined a set of Prolog rules implementing the semantics of the input modeling language (i.e. UML Class Diagrams, Ecore Diagrams). For example, the InferenceEngine can load a semantic rule implementing the application of the Liskov substitution principle (Liskov and Wing, 1994): in all the questions where a super-class is used, it is possible to substitute it with any of its sub-classes. Thus, with respect to Fig. 3, any question concerning the relation citizen from City to Person can also be rephrased in terms of either Councillor or Mayor, since they are persons too. Note that the semantic rules we implemented within MOTHIA can be extended or overloaded with other semantic interpretations (e.g. domain-specific semantics).
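As an illustration, such a semantic rule could be rendered in SWI-Prolog roughly as follows; this is a minimal sketch, and the predicate name isKindOf is ours, not necessarily the one used in the tool.

% Minimal sketch of a subtyping rule in the spirit of the Liskov
% substitution principle (transitive closure of generalizations).
isKindOf(Sub, Super) :- inheritance(Super, Sub).
isKindOf(Sub, Super) :- inheritance(Mid, Sub), isKindOf(Mid, Super).

For instance, with the facts of Appendix A, the query isKindOf('Mayor', 'Person') succeeds, licensing the substitution of Mayor for Person in a question.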
The previous four facts are the basic blocks from which patterns are built. As previously stated, questionnaires are derived by using patterns and criteria. A pattern represents a syntactical structure in the model. The InferenceEngine implements a pattern as a SWI-Prolog rule where the signature is the head and a query is the body. It is interesting to note that a pattern itself can be used as a building block for other patterns, allowing for arbitrarily complex structures. The most relevant patterns defined in the reference implementation are described in Table 1.
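For instance, the marriage pattern of Table 1 could be sketched as the rule below, whose head is the signature marriage(Father, Mother, StepSon, Son) shown in Appendix A; the body is our reconstruction for illustration, not necessarily the tool's exact implementation.

% Sketch: a son inheriting from two super-classes (father and mother),
% one of which (the father) has another sub-class (a step-son).
marriage(Father, Mother, StepSon, Son) :-
    inheritance(Father, Son),
    inheritance(Mother, Son),
    Father \= Mother,
    inheritance(Father, StepSon),
    StepSon \= Son.

Run against the facts of Appendix A, this rule binds Father to Person, Mother to Authority, StepSon to Councillor and Son to Mayor, as reported there.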
For example, three noteworthy patterns can be found in Fig. 3: the marriage pattern matches the inheritance structure composed of the classes Person and Authority (i.e. father and mother, respectively), Councillor (i.e. a step-son) and Mayor (i.e. a son); the subclassesAssociation pattern matches the inheritance structure composed of the classes Authority and Party (i.e. super-classes), Mayor and SomeParty (i.e. sub-classes); the indirectAssociation pattern matches the chain of associations composed of the classes Councillor, Mayor and SomeParty, and another chain composed of the classes Councillor, Mayor and City.
As described in Section 3.1, a criterion combines patterns together in order to generate specific kinds of questions. In our reference implementation, we defined some criteria from common modeling slips, while others were inspired by the literature. In particular, in Dalianis (1992) and Papasalouros et al. (2008) the respective authors discussed some relevant syntactic structures to look for in a domain model (the main differences between our approach and such works are discussed in Section 7).
Table 1
Summary of patterns.

1. attributeNotAssociation: matches class attributes that are not used as association member ends.
2. chain: matches all sorts of information about a class, to rank its relevance (attributes, associations from and to the class, multiplicities, parent and child classes).
3. family: matches a grandfather super-class with sub-classes, some of which (fathers) have other sub-classes (sons).
4. indirectAssociation: matches all classes that are reachable from a certain class by navigating associations, with a configurable number of intermediate classes.
5. marriage: matches a class (son) that inherits from two super-classes (father and mother), one of which has other sub-classes (step-sons).
6. multiplicityRandom: generates a positive random integer between the lower and upper bounds of an association multiplicity.
7. siblings: matches sibling classes that inherit from the same super-class.
8. subclassesAssociation: matches an association between two classes that are each derived from a distinct super-class.
9. superclassesAssociation: matches an association between two classes that each have a distinct sub-class.
10. unrelatedClasses: matches two classes in a super/sub-class relationship and a third class, with no associations to and/or from the previous two classes.
They also presented some strategies to generate questions. For example, we took inspiration from Dalianis (1992) for questions about both the direction and the type of a relation (e.g. "What is the relation between EntityA and EntityB?", "Can an EntityA have exactly one and only one EntityB?"). Also, we referred to Papasalouros et al. (2008) to generate questions about generalizations (e.g. "Is EntityA an EntityB?"). Finally, we defined complex criteria combining some of these generation strategies.
In the following, some definitions of criteria are shown.
Let us consider a simple criterion crit1 that uses only the marriage pattern. For this criterion, MOTHIA generates a simple deductive question considering the inheritance relationship between a class (e.g., a step-son) and its parent, while a distractor can fake such a relationship by matching the same class with the wrong parent class.

For example, with respect to the illustrative scenario depicted in Fig. 3, crit1 produces:

✓ Is Councillor a kind of Person?
✗ Is Councillor a kind of Authority?
A complex criterion can be formulated by combining more than one pattern by means of a join operation (i.e., combining together the overlapping model elements). For example, let us define crit2 as the criterion that looks for a class A acting: as a sub-class in the pattern subclassesAssociation; as a son in the pattern marriage; and also as an intermediate class in the pattern indirectAssociation. By querying the model in Fig. 3 with crit2, class A is instantiated with the entity Mayor. The following are examples of the questions that can be generated using crit2:
✓ Can the relation from Mayor to Councillor exist, where Councillor is appointed by Mayor?
✓ Can the relation from City to Mayor exist, where Mayor is a citizen of City?
✓ Is Mayor a kind of Person?
✗ Can the relation from SomeParty to Councillor exist, where Councillor is appointed by SomeParty?
✗ Is Mayor a kind of Party?
✗ Can the relation from SomeParty to an unlimited number of Mayors exist, where a Mayor is a candidate of SomeParty?
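The join underlying crit2 can be pictured as a conjunction of pattern rules sharing the variable A, as in the sketch below; only the marriage signature is documented in Appendix A, so the other two signatures are assumptions made for illustration.

% Sketch of crit2 as a join of three patterns on the shared variable A.
crit2(A) :-
    subclassesAssociation(_Super1, _Super2, A, _OtherSub),  % A as a sub-class (assumed signature)
    marriage(_Father, _Mother, _StepSon, A),                % A as a son (signature from Appendix A)
    indirectAssociation(_From, A, _To).                     % A as an intermediate class (assumed signature)

With the model of Fig. 3, the only binding satisfying all three conjuncts is A = 'Mayor', as stated above.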
To summarize, MOTHIA allows its users (i.e., the MEs) to add arbitrary complexity when looking for elements in the input model, just by combining elementary building blocks. Even more, it allows MEs to define what such building blocks are. Both patterns and criteria conform to their respective metamodels, which we defined. MOTHIA stores them as XMI-encoded files. We refer to Appendix A for examples of XMI files and for more details about the previous examples.
Note that in this paper our goal is to present the MOTHIA framework and investigate its effectiveness in practice through its application to a real case study. For this reason we do not devote specific attention to investigating which are the most common modeling mistakes and which patterns and criteria should be defined to identify them; although we believe that this is an important topic, it is outside the scope of this paper. Nonetheless, we put effort into creating a reference implementation that supports extensible patterns and criteria. MEs using MOTHIA can include our default set of patterns and criteria. Besides, they can enhance and customize the QuestionnaireGenerator by extending the existing patterns and criteria or by defining their own. Such customization can be easily implemented by MEs, as MOTHIA relies on many popular technologies running under the Eclipse Modeling Framework Project.⁶
⁶ http://www.eclipse.org/modeling/emf/.
Filters are Java classes that implement a specific filtering strategy, in order to reduce the size of the questionnaire after all questions have been generated. Each filter must define three filtering functions working at different levels of data aggregation, precisely: a reasoning category (deductions or distractors), a criterion, and a whole questionnaire. This allows for fine-grained control over which questions make their way to the output. Two types of filter have been implemented: a random filter, and a score-based filter that exploits class relevance in the model. For example, by applying the score-based filter, only the x most important questions are kept from each criterion. Alternatively, by applying the random filter, a cap of y questions overall is fixed for the questionnaire, and the exceeding ones are randomly cut. The two filters can also be applied in combination.

Finally, two output emitters have been implemented within the EmittersContainer: an Ecore emitter, to visualize the raw composition of the questionnaire, and an XHTML emitter, to publish it as a web form.
4. Case study
This section reports our experience in applying the described approach to a real case study.

Specifically, the main goal of our experience was to stimulate further feedback from DEs with respect to the specification of a domain model. For this purpose, we applied MOTHIA as a means for DEs and MEs to communicate with each other without using the modeling notation, but only through NL. We also wanted to estimate the performance of such an approach in revealing errors in the IPERMOB domain model.

In the following, we first define the reference scenario within IPERMOB. Then, we describe the settings and the activities that we conducted. The results of these activities are discussed in the next section.
4.1. Scenario
Our approach has been developed and validated in the context of the IPERMOB project.

The goal of IPERMOB is to realize an integrated multi-tier information system to improve urban mobility. From a technical point of view, the project aims at combining low-cost wireless technology and efficient image processing techniques to collect, analyse, and elaborate information related to traffic flow, in order to offer real-time advanced information to citizens and management functionalities to local governing authorities.

As anticipated in Section 1, a distinctive characteristic of the project is its interdisciplinary nature, which brings together experts from a variety of domains (software engineering, real-time embedded systems, wireless sensor networks, transportation engineering, image processing, and data mining). In the early stages of the project, the partners decided that a common domain model should be constructed. Because of their heterogeneous backgrounds, this was deemed necessary to establish a shared understanding of the key concepts coming from each of the sub-domains.

The resulting domain model is structured in six packages, each containing concepts from a particular sub-domain or area of concern, and includes 50 classes, 68 associations (of which 14 are compositions), and 23 inheritance relationships.⁷

⁷ Available at http://labsewiki.isti.cnr.it/labse/projects/ipermob/public/main.
Table 2
Injected faults.

F1: Create association between two (previously unassociated) classes (referred to as CACA in Mottu et al., 2006).
F2: Replace element with parent or sibling (CCCR in Mottu et al., 2006).
F3: Flip direction of monodirectional association.
F4: Turn simple association into composition and vice versa.
F5: Change cardinality (set lower bound to 0 or 1, set upper bound to 1 or ∗).
4.2. The Experts' Answers
In this study we used MOTHIA to generate 14 questionnaires.⁸ This corresponds to the number of DEs from different partners who were available to carry out this model validation exercise on behalf of the IPERMOB project. Each questionnaire covered all the areas of the domain model and included a mix of both deductive and distractor questions.
For each question produced by MOTHIA, we proposed to the interviewee a multiple-choice answer based on four possible, and mutually exclusive, items: (1) Yes; (2) No; (3) "I'm not an expert of this particular subject" (NE); and (4) "I'm an expert of the subject but I'm not sure what to answer" (DK).
Of course, we did not know in advance which and how many errors were actually present in the domain model as originally formulated. In other words, we had no reference baseline against which to measure how effective MOTHIA was in discovering faults in the model.

Hence, only for the purpose of evaluating how effective MOTHIA is in generating questionnaires that reveal errors in the input model, we decided to mutate the input domain model and to create a new, slightly different version by injecting some artificial faults. Similarly to the idea used in mutation testing, this controlled manipulation ensures that even if errors were not present in the model, we could still evaluate the effectiveness of MOTHIA in detecting the injected errors; on the other hand, even if some real errors were detected (as indeed happened), we could not know how many others remained, i.e. we could not calculate their percentage.
Table 2 reports the faults we used to mutate the domain model; some of them were taken from the literature on mutation testing for both models and model transformations (Mottu et al., 2006).

In order to avoid bias, for each fault we randomly selected the portion of the domain model to mutate. Note that, differently from mutation testing of code (where one mutant includes a single mutation), all five faults were simultaneously inserted in the same model.
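At the level of the fact base sketched in Section 3.2, a fault such as F3 can be pictured as swapping the member ends of an association fact; the fact layout below is the same illustrative one assumed earlier, not necessarily the tool's.

% Before F3 (assumed layout: association(Name, From, To)):
association(appoints, 'Mayor', 'Councillor').
% After F3, the direction of the association is flipped:
association(appoints, 'Councillor', 'Mayor').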
From the mutated domain model, MOTHIA generated the questionnaires, made of 30 questions each. Each questionnaire randomly mixed questions affected by the artificial faults (in the following, faulty questions) and questions not affected (in the following, genuine questions). The two types of questions could not be distinguished by the DEs, who were unaware that faults had been introduced ad hoc.

Finally, each questionnaire was administered to DEs of one or more sub-domains of IPERMOB. The exercises were carried out independently in small separate groups (depending on the availability of the DEs). Answers to genuine questions are summarized in Table 3(a), while results for the faulty questions are given in Table 3(b).
⁸ The patterns and the criteria implemented are available at http://labse.isti.cnr.it/tools/mothia.
Table 3
The Experts' Answers.

(a) Results for the genuine questions

Questionnaire | #Pass | #Fail | #NE | #DK
1 | 12 | 4 | 6 | 2
2 | 16 | 9 | 1 | 0
3 | 9 | 10 | 4 | 0
4 | 15 | 4 | 2 | 3
5 | 15 | 7 | 0 | 0
6 | 11 | 11 | 1 | 1
7 | 14 | 5 | 6 | 0
8 | 8 | 8 | 9 | 0
9 | 15 | 8 | 0 | 2
10 | 13 | 8 | 4 | 0
11 | 13 | 9 | 3 | 0
12 | 15 | 6 | 4 | 0
13 | 10 | 6 | 7 | 1
14 | 10 | 8 | 6 | 1
Totals | 176 | 103 | 53 | 10

(b) Results for the faulty questions

Questionnaire | #Faulty questions | #UN | #DK | #K (#UN + #DK) | #NK | #NE
1 | 6 | 2 | 1 | 3 | 1 | 2
2 | 4 | 2 | 0 | 2 | 1 | 1
3 | 7 | 3 | 0 | 3 | 3 | 1
4 | 6 | 2 | 3 | 5 | 0 | 1
5 | 8 | 1 | 3 | 4 | 4 | 0
6 | 6 | 2 | 1 | 3 | 3 | 0
7 | 5 | 3 | 0 | 3 | 1 | 1
8 | 5 | 2 | 1 | 3 | 1 | 1
9 | 5 | 2 | 1 | 3 | 2 | 0
10 | 5 | 1 | 0 | 1 | 4 | 0
11 | 5 | 1 | 2 | 3 | 2 | 0
12 | 5 | 1 | 0 | 1 | 2 | 2
13 | 6 | 1 | 0 | 1 | 1 | 4
14 | 5 | 3 | 0 | 3 | 2 | 0
Totals | 78 | 26 | 12 | 38 | 27 | 13
For the purposes of reporting, a question can be regarded as a test on the input model: the test is Passed if the answer of the DE matches the answer expected by MOTHIA, and Failed otherwise. Faulty questions are considered not killed⁹ (NK column in Table 3(b)) if the interviewed DE provides a Yes or No answer that agrees with the tool's expectation (since we modified the original model, the expected answer should be deemed wrong). On the contrary, a faulty question is considered killed (K column in Table 3(b)) if the DE gives either an answer different from what MOTHIA expected (UN column), or a DK answer. In fact, MEs should always pay attention when DEs assert their expertise on a question but answer DK. It is something worth investigating more deeply, because it could be due either to a model error that disorients the DE or to a malformed question. Hence, when the answer is DK, a faulty question is considered killed for this very reason, as the consequent analysis will very likely lead to the discovery of the (injected) fault.
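The classification just described can be summarized by the minimal sketch below; the predicate name and the answer encoding are ours, for illustration only.

% Answer is one of yes/no/dk; Expected is the Yes/No answer derived
% from the (mutated) model. NE answers are tallied separately.
faulty_outcome(dk, _Expected, killed).               % DK on a faulty question kills it
faulty_outcome(Ans, Expected, killed) :-
    member(Ans, [yes, no]), Ans \== Expected.        % unexpected answer (UN)
faulty_outcome(Ans, Expected, not_killed) :-
    member(Ans, [yes, no]), Ans == Expected.         % answer matches the mutated model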
After collecting the answers, we performed an ex-post analysis of the questionnaires, during which we discussed the unexpected answers with the interviewees. Of course, in this analysis we only considered the questions that were not related to the artificial faults we injected, which were analysed and interpreted separately, as discussed in Section 5.
4.3. The Experts' Feedback
In the case study we also aimed at collecting feedback from the interviewees about the questionnaires generated by MOTHIA.

⁹ In mutation testing, a mutant is said to be killed when its outcome on a test case is different from the outcome of the original program on the same test case.
Table 4
The Experts' Feedback.

1. Did any question stimulate you to think about aspects of the domain that you had not considered before? | Yes: 9; No: 5
2. If yes, which ones? | Feedback: 9; No feedback: 0
3. In your opinion, were the questions heterogeneous enough? | Yes: 12; No: 2
4. In your opinion, were the questions covering most of the concepts within the domain? | Yes: 14; No: 0
5. Which is your overall evaluation about the previous questionnaire? | A: 6; B: 6; C: 2; D: 0
6. Please, explain your answer | Feedback: 13; No feedback: 1
7. Please, suggest us how we can improve the questionnaire | Feedback: 9; No feedback: 5
To this end, after they filled in their questionnaire, we asked the DEs to fill out a form, which was structured in two main parts. The first part aimed at evaluating the extent to which the generated questionnaires covered the domain (evaluation questions 1–4); the second part aimed at evaluating the usefulness of the questionnaire as perceived by the DE (evaluation questions 5–7), and foresaw free-text answers. The questions of the evaluation form are presented in Table 4.
To evaluate how the questionnaires generated by MOTHIA were perceived by the domain experts, in question 5 we asked them to express a forced-choice preference by means of a four-point Likert scale (Likert, 1932). Specifically, the domain experts evaluated our approach by ranking the questionnaire with a score from A (e.g. useful, effective, etc.) to D (e.g. useless, waste of time, superficial, etc.).
5. Discussion
This section discusses the results of the case study presented
in the previous section, and attempts an interpretation of those
results.
Note that in this paper we report our experience in modeling and validating a domain model for a real case study, interacting with real DEs who were not MEs, and vice versa. The difficulty and costs of experimenting with professionals are a well-known issue in empirical software engineering (Tichy, 2000). Moreover, we remark that the interpretation of the results we give in the following is valid for the IPERMOB case study. Thus, even though the results of this experience look promising, formulating generally valid considerations is not in the scope of the paper (this is left to future work).
As described in Section 4, our evaluation was conducted by means of two-phase questionnaires: questions for the first phase were generated through MOTHIA (see Section 4.2), while questions for the second phase (see Section 4.3) were formulated by us and aimed at collecting feedback about the questionnaire of the first phase.
5.1. Was MOTHIA effective in revealing modeling errors in IPERMOB?

Although we do not have extensive empirical results, we believe the outcome of our experience can be interpreted positively with respect to the effectiveness of MOTHIA in helping to reveal modeling errors. In particular, as described in Section 4.2, we assessed the ability of MOTHIA to detect modeling faults based on how good the generated questionnaires were at spotting the faults that we deliberately inserted (see Table 2). Such assessment is based on the assumption that if MOTHIA is effective in revealing the artificial faults, then it will be as good at revealing real modeling faults (in mutation testing this assumption, which has been empirically validated (Andrews et al., 2005), is known as the coupling effect).

Concerning the artificial faults, the answers to the questionnaires were able to detect all types of faults (in at least one questionnaire), and to kill almost half of the faulty questions (58% if we exclude the questions on which the DEs declared themselves not expert).
Beyond this, we also collected answers differing from the expected ones to the genuine questions, i.e., those not affected by the artificial faults. In the ex-post analysis, the discussion between us and the DEs stimulated by such conflicts allowed us to detect several other real faults that had slipped into the model. Due to the heterogeneity of the IPERMOB experts, the interconnections among concepts of different sub-domains (i.e. the five model packages) in fact proved to be particularly critical. The faults we discovered amount to 16 major faults (i.e. associations and generalizations either deleted or introduced) plus several other minor issues, and included mostly ambiguities in relating the same entity across different domains, as well as errors introduced when interpreting the DEs' needs. Notably, some wrong answers highlighted the need to introduce new model elements (classes and associations).

The last result seems to hint at the promising perspective that, with reference to semantic quality (Lindland et al., 1994), MOTHIA not only can be a means to check validity, which is what we had expected, but could also help to check model completeness. We plan to devote specific experiments to this purpose.
5.2. Was MOTHIA helpful in reducing the gap in IPERMOB?
The possible answers to the questionnaires described in Section 4.2 can be partitioned into two groups: those that classify the responder as an expert of the topic treated by the question (i.e., Yes, No, DK), and those that classify the responder as not expert with respect to that specific topic (i.e., NE). To assess whether MOTHIA is effective in formulating questions, we refer to the number of responders indicating themselves as experts.

Hence, we define the efficacy of a questionnaire as a metric comparing the number of questions on which the responder was actually able to express an opinion (i.e., answering Yes or No) with the total number of questions in the questionnaire for which the interviewee classified him/herself as an expert of the topic:

efficacy = (#Pass + #Fail) / (#Total − #NE) × 100
Table 5
Evaluation metrics.

Questionnaire | %Efficacy | %Adequacy
1 | 88.89 | 75
2 | 100 | 96.15
3 | 100 | 82.61
4 | 86.36 | 91.67
5 | 100 | 100
6 | 95.65 | 95.83
7 | 100 | 76
8 | 100 | 64
9 | 92 | 100
10 | 100 | 84
11 | 100 | 88
12 | 100 | 84
13 | 94.12 | 70.83
14 | 94.74 | 76
Mean | 96.55 | 84.58
Since in this case study the questionnaires covered several sub-domains, we had to accept that a questionnaire might not be perfectly tailored to the expertise of a particular interviewee. This explains why we admit an NE answer. Thus, when considering the efficacy of a questionnaire, we should also take into account a measure of the adequacy of that questionnaire with respect to the specific interviewee. We define the adequacy metric for a questionnaire and a responder as the number of questions where the responder identified him/herself as an expert, over the total number of questions contained in the questionnaire:

adequacy = (#Total − #NE) / #Total × 100
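As a worked example, consider the genuine questions of questionnaire 1 in Table 3(a): 12 Pass, 4 Fail, 6 NE and 2 DK, for a total of 24 questions. Its efficacy is (12 + 4)/(24 − 6) × 100 ≈ 88.89%, and its adequacy is (24 − 6)/24 × 100 = 75%, matching the first row of Table 5.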
According to Table 5, most DEs were able to express an opinion on the modeled elements for which they declared expertise. Thus, using the responses to the questionnaires, we could effectively communicate with the DEs in IPERMOB.
It is important to note that we are not arguing that MOTHIA is more useful than, or an alternative to, the dialogue between DEs and MEs. MOTHIA is an approach that generates artifacts (i.e., the questionnaires) that can facilitate such dialogue. For example, in IPERMOB we drove the discussion with the DEs by means of those questions whose DE answer was different from the one expected by MOTHIA. Although we cannot yet claim that the results of the experience in IPERMOB have general validity, the results reported in Table 3(a) and the ex-post analysis described in Section 5.1 show that such artifacts actually helped in guiding the communication between DEs and MEs.
5.3. Was MOTHIA perceived as useful in IPERMOB?
To evaluate the opinions of the DEs with respect to the perceived usefulness of MOTHIA, we examined the responses to the questions described in Section 4.3. According to Table 4, most of the domain experts had a positive impression of the questions and of the way they supported the cooperation and the sharing of knowledge among the different stakeholders within IPERMOB. Specifically, 9 DEs out of 14 answered that the questionnaire generated by MOTHIA made them think about aspects of the domain that they had not considered before (see Table 4, evaluation question 1). In addition, some of the experts spontaneously declared themselves interested in further investigating those aspects. These positive results were also clearly confirmed by the positive answers to evaluation questions 3 and 4.

Furthermore, in one case, even before our ex-post interviews, a DE realized that she had introduced an ambiguity in the domain description that she had previously given to us.

Overall, we got a first indication that the IPERMOB DEs perceived MOTHIA as useful.
6. From the experience to the experiment
In light of some potential threats to validity, our case study must be considered closer to an experience than to an experiment.

Concerning the effectiveness of MOTHIA in generating questionnaires that reveal modeling errors, although we derived them from a study of the literature and attentive consideration, the artificial faults we introduced might not be representative of real faults. Nevertheless, our analysis was also backed by the detection of real faults. In addition, to avoid biasing DEs towards negative answers, responders were not aware that we had deliberately introduced some faulty questions.

Moreover, we are not deep experts of question lexicalization; thus MOTHIA questions might not always be meaningful to DEs. It was to cope with this issue that we introduced the answer of type DK. We also fear that the positive feedback collected in Section 4.3 might be caused by the existing work relationship. To minimize such bias, we did not tell the DEs that the questionnaires were aimed at evaluating our approach, but only at validating the IPERMOB model.

Finally, even though we believe that the results of this first application of MOTHIA look promising, the conclusions of our experience are far from being generalizable, as our validation was based on one model only and a relatively small number of interviewees. Indeed, we plan to carry out more extensive validation studies in the future. In addition, all the experiment settings (see Appendix B), such as the number of questions in each questionnaire, the distribution of questions over the criteria, the number and types of artificial faults, and so on, were decided ad hoc, although common sense and reasoning were applied. Next studies will be adjusted based on these first results.
7. Related work
The research in this paper is related to work in several fields.
7.1. Quality of models
Bolloju and Leung (2006) tried to identify the most typical set of errors frequently committed in UML modeling, and discussed how they affect the quality (syntactic, semantic and pragmatic) of the models developed. With regard to class diagrams, the errors they listed are analysed by our criteria.

Shanks et al. (2003) focused on the importance of early, effective and efficient validation to avoid the propagation of defects to subsequent system design. They argued that ontologies could help in choosing the domain grammar, understanding the phenomena represented in conceptual models, and making sense of ambiguous semantics.
7.2. Syntactic realization
The problem of generating plausible NL questions has not been tackled in our paper: we used a naive approach, as this is not in the scope of our research. There exist tools that address this problem, referred to as the problem of syntactic realization.¹⁰

Kroha and Rink (2009) described a text generation method for requirements validation, in which they automatically generate a textual description of a UML diagram (limited to use case, sequence or state diagrams). Similarly to our approach, their purpose is to bridge the gap between DEs and MEs, but they only translate the model to NL rather than also testing wrong assertions on it.

¹⁰ See http://www.aclweb.org/aclwiki/index.php?title=Downloadable_NLG_systems, accessed in May 2010.
Cabot et al. (2010) implemented a transformation from UML class diagrams (with possible OCL constraints) to SBVR, an OMG metamodel for documenting the semantics of business vocabulary, business facts and business rules. They also paraphrased the result in Structured English and RuleSpeak English notations to facilitate the discussion of the diagrams with the stakeholders. Again, they only provide a NL translation of the model.

Dalianis (1992) proposed several categories of NL discourses on a conceptual model, in order to ease its representation and validation. The input model is initially converted to a Prolog knowledge base to be queried. The DE then actively investigates the knowledge base by asking questions (chosen from a limited set) that involve model entities, and gets an NL reply with a selection of relevant information. The main difference from our approach is the proactivity of the DE, who chooses which areas of the model to inspect and may miss some problems.
7.3. Validation using multiple-choice questions
Mitkov and Ha (2003) proposed a technique to generate multiple-choice tests starting from electronic texts. The approach of Mitkov and Ha differs from ours in several respects: their goal is to produce tests that are useful to assess the expertise of the responders in the area covered by the questionnaire, and they use NL analysis to extract knowledge from a corpus and to formulate questions related to that corpus. Our goal is to test the model, not the responders' knowledge.

Hoshino and Nakagawa (2005) used machine learning techniques to generate questions of the fill-in-the-blank type, in order to allow the testing of any kind of knowledge, rather than for a specific purpose as in our approach.

Papasalouros et al. (2008) proposed the automatic creation of NL multiple-choice questions from domain ontologies. The goal is to create a questionnaire for educational purposes, rather than validation, since the input knowledge base is considered correct. While we focused on simpler Yes/No questions, their strategies to create distractors can be seen as the equivalent of our criteria.
7.4. Model querying
In recent years, Prolog has been proposed and studied in the literature as an analysis engine for software models. Specifically, in Störrle (2007) the author discussed why Prolog should be considered a valid alternative to other widespread and more common querying languages such as SQL or OCL. In Opoka et al. (2008) the authors measured and compared the model querying performance of Prolog and OCL; more recently, Prolog has also been proposed as a platform for a high-level programming interface to query UML models (Störrle, 2009).

Such discussions are stimulating several debates within the model-driven communities; here we maintain a neutral position on which is the best approach for model querying. What we consider important in this work is having an inference engine able to deal with the semantics associated with the input modeling language. Given the well-established tradition of the Prolog inference engine, we adopted Prolog in implementing MOTHIA, but the overall architecture proposed in Fig. 2 does not prevent other implementations based on other querying languages.
8. Conclusions and future work
Domain models are an essential asset in MDE; their definition involves on the one side MEs, who master the modeling technology but might be agnostic of specific domain concepts, and on the other side DEs, who know all the tiny details of the domain but might not understand the model representation. The knowledge gap between MEs and DEs is a well-known challenge of MDE, and different research directions are being taken to fill it.

In this work we contributed to such effort with an approach, and its supporting tool MOTHIA, to interview DEs via NL questionnaires that are automatically generated starting from the domain model. Although in this paper we considered UML Class Diagrams as the input models, the tool itself was engineered with flexibility in mind from the outset and relies on input models expressed in Ecore. As a consequence, it can be easily adapted to other Ecore-based conceptual modeling notations and DSLs with no change to the overall architecture and with limited implementation effort.

The approach has been applied to a real-life model validation problem in the context of the IPERMOB research project. The results of the case study within IPERMOB are by no means conclusive, and a more extensive evaluation is necessary to draw generally valid conclusions. Certainly, MOTHIA is not a magic tool which can by itself perform model validation and deliver perfect models. Nonetheless, our initial studies provide some evidence confirming that the approach is viable and applicable to real modeling problems, to facilitate the dialogue between DEs and MEs.

Future research work will include experimentation on other models with more interviewees and an expanded set of artificial faults, focusing on understanding how patterns of common modeling slips affect the modeling activity. We also plan to further automate the process illustrated in Fig. 1, by automating the analysis of DEs' responses so as to map wrong answers onto the model elements they refer to, and by exploring more systematic ways to transform and correct the input models, which we currently handle manually.

As a technical improvement to MOTHIA, we plan to revise the lexicalization module, in order to reduce the effect of poor syntactic realization on the understandability of questions. In order to simplify the definition of custom patterns and criteria, we are considering the integration of more sophisticated Prolog libraries (such as the one presented in Störrle, 2009). We are also planning to widen the range of accepted UML input models, to include Use Case Diagrams, Activity Diagrams, and Sequence Diagrams. Due to the modularity of the tool architecture, only a few modifications are required: (a) extension of the KnowledgeBaseGenerator component to deal with such diagrams and to create appropriate facts; (b) definition of new semantic rules, if any; (c) creation of patterns and criteria that use the new facts and rules.

An interesting direction for future research concerns extending MOTHIA to support the validation of model completeness. In other words, we are investigating how to generate questions that stimulate discussions about elements that do not yet exist in the model. To this end, the information contained in the input model itself is not enough, and it should be supplemented by additional knowledge bases (e.g. domain ontologies). Specifically, we are planning to exploit semantic relations among sets of cognitive synonyms (i.e. synsets) (Garshol, 2004) that include model elements. Unfortunately, such ontologies are seldom available, and they definitely do not cover every possible domain. However, in a preliminary study under development, we are planning to implement a domain-independent approach integrating MOTHIA with a lexical database such as WordNet (Miller, 1995). For example, by means of such integration MOTHIA would generate questions using entities that are not strictly defined within the domain model, but that WordNet defines as synonyms of another entity contained in the domain model.
Acknowledgements
This paper has been supported by the Regione Toscana Project POR CReO FESR: IPERMOB, and partially by the European Project FP7 IP 216287: TAS³. We would like to thank the anonymous reviewers for their thoughtful comments, which helped us to greatly improve the presentation.
Appendix A. Code examples for Section 3.2
Listing A.1 shows the XMI file that implements the marriage pattern. A pattern is composed of a signature, a SWI-Prolog functor with variables as arguments (line 4), and a number of implementations that can be sorted through the priority attribute (just one in this case, lines 5 through 13).

In particular, the marriage pattern implementation is a SWI-Prolog rule whose head is the signature itself. The implementation body uses the inheritance fact to select the relevant portion of the input model. If we consider again the scenario in Fig. 3, the inheritance facts in the knowledge base are:
inheritance('Party', 'SomeParty').
inheritance('Authority', 'Mayor').
inheritance('Person', 'Mayor').
inheritance('Person', 'Councillor').
When the marriage pattern is used to query the InferenceEngine, the variables of its signature match just once with the information in the knowledge base. Thus, the following quadruple is returned:

Father = 'Person',
Mother = 'Authority',
StepSon = 'Councillor',
Son = 'Mayor'.
Listing A.2 shows the XMI file that implements the crit1 criterion.

A criterion contains a number of sections to create deductive questions (lines 3 through 10), as well as a number of sections to create distractors (lines 11 through 18). Each of these sections has a template to generate an NL question, where variables between $ characters need to be replaced (lines 3 and 11). A number of patterns can be used by each section to get the values needed to fill in the templates. The crit1 criterion uses only the marriage pattern, both for deductive questions and distractors (lines 4 and 12). For the successful construction of the NL question, variables in the question template must match the variables of the pattern signature, as happens in this example. It is also possible here to rename the variables in the pattern signatures, or to pass atoms as actual parameters, to allow for flexibility when using more than one pattern. The remaining lines deal with the use of filters.
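For instance, given the marriage signature of Listing A.1 and the questions shown in Section 3.2, the crit1 templates could look roughly as follows; this is a reconstruction for illustration, since the listing itself is not reproduced in the running text.

Deductive template:  "Is $StepSon$ a kind of $Father$?"  ->  "Is Councillor a kind of Person?"
Distractor template: "Is $StepSon$ a kind of $Mother$?"  ->  "Is Councillor a kind of Authority?"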
Listing A.1. marriage pattern implementation.

Listing A.2. crit1 criterion implementation.

Appendix B. Configuration of the case study

In the following we report the settings we used to configure our case study, and some brief motivations behind them:

• number of questions per questionnaire: set to 30, as a trade-off between a high number of questions and an appropriate duration of the interview;
• distribution of questions over criteria: we ensured at least one question from each criterion, and picked more questions for criteria focusing on syntactical structures (patterns) which were more widely used in the model;
• number and types of injected faults: 5 injected faults of 5 different types, 2 from the literature on mutation testing and 3 affecting associations;
• percentage of faulty questions per questionnaire: a target percentage was fixed, and we reached it after the introduction of the fifth fault.
References
Andrews, J.H., Briand, L.C., Labiche, Y., 2005. Is mutation an appropriate tool for testing experiments? In: Proceedings of the 27th International Conference on Software Engineering, ICSE '05. ACM, New York, NY, USA, pp. 402–411.
Bézivin, J., 2005. On the unification power of models. Softw. Syst. Model. 4 (2), 171–188.
Boehm, B.W., 1984. Verifying and validating software requirements and design specifications. IEEE Softw. 1 (1), 75–88.
Bolloju, N., Leung, F.S.K., 2006. Assisting novice analysts in developing quality conceptual models with UML. Commun. ACM 49 (7), 108–112.
Cabot, J., Pau, R., Raventós, R., 2010. From UML/OCL to SBVR specifications: a challenging transformation. Inf. Syst. 35 (4), 417–440.
Cheng, B.H.C., Atlee, J.M., 2007. Research directions in requirements engineering. In: Briand, L., Wolf, A. (Eds.), FOSE, pp. 285–303.
Dalgarno, M., Fowler, M., 2008. UML vs. domain-specific languages.
Dalianis, H., 1992. A method for validating a conceptual model by natural language discourse generation. In: CAiSE, pp. 425–444.
France, R., Rumpe, B., 2007. Model-driven development of complex software: a research roadmap. In: FOSE '07: 2007 Future of Software Engineering. IEEE Computer Society, Washington, DC, USA, pp. 37–54.
Garshol, L.M., 2004. Metadata? Thesauri? Taxonomies? Topic maps! Making sense of it all. J. Inform. Sci. 30 (4), 378–391.
Glinz, M., Wieringa, R.J., 2007. Guest editors' introduction: stakeholders in requirements engineering. IEEE Softw. 24, 18–20.
Gogolla, M., Bohling, J., Richters, M., 2005. Validating UML and OCL models in USE by automatic snapshot generation. Softw. Syst. Model. 4 (4), 386–398.
Hoshino, A., Nakagawa, H., 2005. A real-time multiple-choice question generation for language testing: a preliminary study. In: Proc. of EdAppsNLP '05. ACL, Morristown, NJ, USA, pp. 17–20.
Konrad, S., Cheng, B.H.C., 2005. Automated analysis of natural language properties for UML models. In: Bruel, J.-M. (Ed.), MoDELS Satellite Events. Vol. 3844 of LNCS. Springer, pp. 48–57.
Kroha, P., Rink, M., 2009. Text generation for requirements validation. In: ICEIS, pp. 467–478.
Likert, R., 1932. A technique for the measurement of attitudes. Arch. Psychol. 22 (140), 1–55.
Lindland, O.I., Sindre, G., Sølvberg, A., 1994. Understanding quality in conceptual modeling. IEEE Softw. 11 (2), 42–49.
Liskov, B., Wing, J.M., 1994. A behavioral notion of subtyping. ACM Trans. Prog. Lang. Syst. 16 (6), 1811–1841.
Malgouyres, H., Motet, G., 2006. A UML model consistency verification approach based on meta-modeling formalization. In: Haddad, H. (Ed.), SAC. ACM, Dijon, France, pp. 1804–1809.
Mernik, M., Heering, J., Sloane, A.M., 2005. When and how to develop domain-specific languages. ACM Comput. Surv. 37 (4), 316–344.
Miller, G.A., 1995. WordNet: a lexical database for English. Commun. ACM 38 (11), 39–41.
Mitkov, R., Ha, L.A., 2003. Computer-aided generation of multiple-choice tests. In: Proc. of HLT-NAACL '03. ACL, Morristown, NJ, USA, pp. 17–22.
Mottu, J.M., Baudry, B., Le Traon, Y., 2006. Mutation analysis testing for model transformations. In: Rensink, A., Warmer, J. (Eds.), ECMDA-FA. Vol. 4066 of LNCS. Springer, pp. 376–390.
Opoka, C.J., Felderer, M., Lenz, C., Lange, C., 2008. Querying UML models using OCL and Prolog: a performance study. In: Proc. of ICSTW '08. IEEE CS, Washington, DC, USA, pp. 81–88.
Papasalouros, A., Kanaris, K., Kotis, K., 2008. Automatic generation of multiple choice questions from domain ontologies. In: Nunes, M.B., McPherson, M. (Eds.), e-Learning. IADIS, pp. 427–434.
Revuelta, J., 2004. Analysis of distractor difficulty in multiple-choice items. Psychometrika 69 (2), 217–234.
Schmidt, D.C., 2006. Guest editor's introduction: model-driven engineering. Computer 39, 25–31.
Selic, B., 2007. A systematic approach to domain-specific language design using UML. In: ISORC '07: Proceedings of the 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing. IEEE Computer Society, Washington, DC, USA, pp. 2–9.
Shanks, G., Tansley, E., Weber, R., 2003. Using ontology to validate conceptual models. Commun. ACM 46 (10), 85–89.
Störrle, H., 2007. A Prolog-based approach to representing and querying software engineering models. In: Cox, P., Fish, A., Howse, J. (Eds.), VLL. Vol. 274 of CEUR Workshop Proceedings, CEUR-WS.org, pp. 71–83.
Störrle, H., 2009. A logical model query interface. In: Proc. of the VLL, vol. 510, pp. 18–36.
Tichy, W.F., 2000. Hints for reviewing empirical work in software engineering. Empirical Softw. Eng. 5 (4), 309–312.
Walter, T., Silva Parreiras, F., Staab, S., 2009. OntoDSL: an ontology-based framework for domain-specific languages. In: MODELS '09: Proceedings of the 12th International Conference on Model Driven Engineering Languages and Systems. Springer-Verlag, Berlin, Heidelberg, pp. 408–422.
Antonia Bertolino (http://www.iei.pi.cnr.it/~antonia/) is a CNR Research Director at ISTI-CNR, Pisa. She is an internationally renowned researcher in the fields of software testing and dependability, and participates in the FP7 projects CHOReOS, TAS3 and CONNECT. She is an Associate Editor of the IEEE Transactions on Software Engineering and of the Springer Empirical Software Engineering Journal, and serves as the Software Testing Area Editor for the Elsevier Journal of Systems and Software. She is the Program Co-Chair of IEEE ICST 2012, and has been the Program Chair of the flagship conference ESEC/FSE 2007. She has (co)authored over 100 papers in international journals and conferences.
Guglielmo De Angelis received a first-class honors degree in Computer Science in 2003 from the University of L'Aquila and a PhD in Industrial and Information Engineering from Scuola Superiore Sant'Anna of Pisa in 2007. Currently he is a researcher at ISTI-CNR. His research mainly focuses on Model-Driven Engineering and on Service Oriented Architectures. In particular, he is studying approaches for: the generation of environments for QoS testing; the generation of adaptable monitoring infrastructures for QoS; the analytical prediction of performance parameters; the empirical testing of performance parameters; and the on-line testing of QoS parameters such as trustworthiness.
Alessio Di Sandro received his degree (with honors) in Computer Engineering from the University of Pisa in 2009. During his Master's thesis, he also worked for Ericsson Research in Stockholm, Sweden. He was a researcher at CNR-ISTI in Pisa, Italy, and currently he is an independent IT professional in Sydney, Australia. His research interests include topics from Model-Driven Engineering and Visual Technologies, such as model validation, automated code generation, network monitoring, design of graphical interfaces, and web technologies.
Antonino Sabetta is a researcher at SAP Research, Sophia-Antipolis, France. Before joining SAP in October 2010, he was a researcher at CNR-ISTI, Pisa, Italy. Antonino received his PhD in Computer Science and Automation Engineering from the University of Rome Tor Vergata, Italy, in June 2007. He had previously earned his Laurea cum Laude degree in Engineering in 2003 from the same university. Antonino's research interests cover various topics in software engineering, including applications of model-driven engineering, online monitoring, and more recently, security assurance in service-oriented systems. Antonino serves regularly as a reviewer for the major journals and conferences on software engineering; he is on the organizing committee of international schools (MDD4DRES, Modeling Wizards) as well as a program committee member of several international conferences and workshops.