Proceedings of the First Symposium on Healthcare Systems Interoperability

Organized by the OpenHealth-Spain group

Alcalá de Henares, Madrid, Spain, April 2009

Proc. of the First Symposium on Healthcare Systems Interoperability (Alcalá de Henares, April 2009)

Table of contents

Foreword

Standardization of Clinical Health Record through CEN/ISO 13606
R. Somolinos Cristóbal(1), A. Muñoz Carrero(2)

Standardization Framework for Legacy Health Information Systems
David Moner(1), José-Alberto Maldonado(1), Diego Boscá(1), Montserrat Robles(1), Carlos Angulo(1), Ernesto Reig(1), Luis Marco(1), Pablo Serrano(2), Daniel Pérez(3)

Ontology-based Archetype Interoperability and Management
Catalina Martínez-Costa(1), Marcos Menárguez-Tortosa(1), J. T. Fernández-Breis(1)

Standardized Access Policies for the EHR
David Moner(1), Montserrat Robles(1), José-Alberto Maldonado(1), Diego Boscá(1), Carlos Angulo(1), Ernesto Reig(1), Luis Marco(1)


Raimundo Lozano(1,), Xavier Pastor(1), Esther Lozano(2)

Evaluation of a Named-Entity Recognition System over SNOMED CT
Elena Castro(1), Leonardo Castaño(1), Paloma Martinez(1)

A practical approach to create ontology networks in e-Health: The NeOn take
Tomás Pariente Lobo(1,), Germán Herrero Cárcel(1)

Software Agent Standards Based Electronic Health Records Communication Platform
Diego Boscá(1), David Moner(1), José Alberto Maldonado(1), Carlos Angulo(1), Ernesto Reig(1), Montserrat Robles(1)

CARDEA: Service platform for monitoring patients and medicines based on SIP-OSGi and RFID technologies in hospital environment
Saúl Navarro, Ramón Alcarria, Juan A. Botía, Silvia Platas, Tomás Robles

On processing processes in healthcare: combining processes and reasoning in personal health records
Leonardo Lezcano, Miguel-Angel Sicilia

Generating Standardized Demographic Repositories
Diego Boscá(1), David Moner(1), José Alberto Maldonado(1), Carlos Angulo(1), Ernesto Reig(1), Montserrat Robles(1)

Archetypes and ontologies to facilitate the breast cancer identification and treatment process
Ainhoa Serna(1), Jon Kepa Gerrikagoitia(1), Iker Huerga, Jose Antonio Zumalakarregi(2), Jose Ignacio Pijoan(3)

A method to reconcile biomedical terminologies
M. Taboada(1), R. Lalín(1), D. Martínez(2)

Annex I. Committees

Foreword

The first OpenHealth-Spain symposium aimed to bring together researchers and professionals
interested in the interoperability of health systems. The workshop took place at the
Rectorate building of the University of Alcalá (Spain) on 29 and 30 April 2009. The Information
Engineering Unit, a research group of the Computer Science Department of the University,
organized the event in collaboration with the Hospital of Fuenlabrada (located in the
metropolitan area of Madrid) and the Health Division of Atos Origin.
The workshop was initially conceived by participants of the CISEP project (“Historia Clínica
Inteligente para la seguridad del Paciente” / “Intelligent Clinical Records for Patient Safety”, code
FIT-350301-2007-18, funded by the Spanish Ministry of Industry), and later supported by members
of an informal Spanish network of researchers and practitioners working on related topics. Even
though the working language of the workshop was Spanish, peer-reviewed contributions were
requested in English to achieve broader dissemination.
The workshop featured an invited talk by the Director of ATOS Research & Innovation,
Jose Maria Cavanillas, titled “Personalized Health and the Future Internet”, and a round
table discussing the recently issued SemanticHealth report, which was explained by
Raimundo Lozano from the Hospital Clínic (Barcelona). A session was also devoted to the
EuroRec network, which promotes best practice in Electronic Health Records (EHR).

This proceedings book collects the peer-reviewed papers presented at the workshop, which
reflect the diversity and high quality of the work being carried out at a national level on the
topics of the workshop. The second edition of the workshop will take place at the Hospital of
Fuenlabrada in 2010.

Miguel-Angel Sicilia
Workshop co-chair


Standardization of Clinical Health Record through CEN/ISO 13606
R. Somolinos Cristóbal(1), A. Muñoz Carrero(2)
(1) Bioengineering, Biomaterials and Telemedicine Laboratory, Hospital Universitario Puerta de Hierro Majadahonda,
Majadahonda, Madrid, Spain
(2) Telemedicine and eHealth Research Unit, Health Institute Carlos III, Madrid, Spain

The Electronic Health Record (EHR) is one of the most important research areas in the telemedicine
field. This paper describes CEN/ISO 13606, the standard that addresses the great European challenge of
EHR standardization and interoperability. Our research group has an open line of work in this field, with
projects such as the design and development of a ‘middleware’ EHR server module conforming to
ISO 13606, compatibility studies between different standards (CCR modeling, design of
harmonization mechanisms based on archetypes) and the development of a demographic server
conforming to EN13606, integrated in different clinical trials.

1. Introduction
The ongoing monitoring of patients and their high mobility are the factors that drive the need
for Electronic Health Records (EHR) in current health systems. Information systems
should be able to transfer information so that its meaning is preserved, regardless of where
they are located. For this reason, interoperability is an indispensable requirement for the EHR. To
meet this requirement, either the EHR itself or, at least, the messages interchanged
between EHR systems should be standardized.
The main goal of the European standard EN13606 is to standardize the transfer of EHRs (or parts
of them) between information systems so that they can interoperate, without specifying
how such systems are implemented. The EHRCom Task Force, part of Working Group 1 of
Technical Committee 251 (Health Informatics) of the European Committee for Standardization (CEN)
[1], is responsible for drafting EN13606. It follows the current paradigms of
standards creation: separation of points of view, separation of responsibilities and separation
between information and knowledge. This standard, automatically adopted as a Spanish UNE
standard, is in the process of becoming an ISO standard. The first four parts have already been accepted and
the fifth part is following a common process in both standardization organizations under the Vienna
Agreement.
This standard is oriented towards communication. It defines how to perform the EHR interchange
between information systems, allowing them to store clinical data as they prefer. The standard is
divided into five parts: (1) Reference model, (2) Archetype interchange specification, (3) Reference
archetypes and term lists, (4) Security, and (5) Exchange models.
ISO 13606 is based on the dual information/knowledge model, and its design differs
radically from that of previous standards. The dual model comprises two models: the reference model
and the archetype model. The reference model represents the structures used to store information,
while the archetype model is used to generate structures that capture the domain knowledge.
These structures are called archetypes.
The Reference Model (RM) defines the structures needed to organize the information. The most
general structure is the extract, which contains the part of the EHR chosen for transfer to
another information system. Extracts are contained in messages, which are higher-level
structures defined in part 5 of the standard. Extracts include demographic information to
identify patients and all related agents, information about access policies, clinical
information and other types of auxiliary information such as audits or signatures.

2. EHR server according to ISO 13606

Our research group has designed and developed a ‘middleware’ EHR server conforming to
ISO 13606 [2-5]. This server uses technologies from very different fields: Java as
programming language, XML Schemas to validate XML extracts, SAX to parse the extracts and
Web Services as communication technology.
The service has a client-server architecture and is designed to serve many clients
simultaneously. Clients access its functions through Web Services. The server offers two
functions, storeExtract and retrieveExtract. The storeExtract function
sends an extract as an XML file to the server, which extracts its information and stores it in the
database. The retrieveExtract function retrieves a previously stored extract from the
database as an XML file. These functions rely on unique identifiers to store and retrieve the
extracts.
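The store/retrieve contract described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Java web service; the class name `ExtractStore`, the in-memory dictionary standing in for the database, and the assumption that the server generates the unique identifier are all hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET


class ExtractStore:
    """In-memory stand-in for the server database (hypothetical)."""

    def __init__(self):
        self._extracts = {}

    def store_extract(self, xml_text):
        # Reject input that is not well-formed XML before storing it.
        ET.fromstring(xml_text)
        # Assumption: the server assigns the unique identifier itself.
        extract_id = str(uuid.uuid4())
        self._extracts[extract_id] = xml_text
        return extract_id

    def retrieve_extract(self, extract_id):
        # Raises KeyError if no extract was stored under this identifier.
        return self._extracts[extract_id]


SAMPLE = "<EHR_EXTRACT><subject_of_care>p-001</subject_of_care></EHR_EXTRACT>"
store = ExtractStore()
extract_id = store.store_extract(SAMPLE)
print(store.retrieve_extract(extract_id) == SAMPLE)
```

A real deployment would decompose the extract into reference-model objects before storage, as the following paragraphs describe; the sketch only shows the identifier-based round trip.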
The server is structured around a set of modules that address different functional
requirements. The modules are independent, which makes their integration and reuse easier. Figure 1
shows the server modules in a diagram. These modules and the way they manage the information
are described in the following paragraphs.

Fig. 1: EHR server diagram.

The main server module is the EHR representation according to the reference model of
ISO 13606. To this end, four packages of Java classes have been developed: one to represent
the reference model, one for the demographic package, one for the support data types and
one for other data types. One Java class is generated for each UML entity of the
different reference models. Each class includes variables that represent the different fields, methods
to access and update those variables, methods to obtain the information from the XML extracts and
methods to communicate with the database.
The storage module is based on an ODBC interface, so the database can be changed easily
without modifying the classes. Currently, data is stored in a MySQL database, though other
databases, such as Access, were used before. Each Java class is represented by a different
table in the database. The relationships between classes are implemented through the
unique key of each class.
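As an illustration of the one-table-per-class layout, the following sketch uses Python's built-in sqlite3 module in place of MySQL over ODBC. The table and column names are hypothetical and do not come from the authors' schema; the point is only that a child table refers to its parent through the parent's unique key.

```python
import sqlite3

# One table per reference-model class; relationships use the unique key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ehr_extract (
        id TEXT PRIMARY KEY,
        subject_of_care TEXT
    );
    CREATE TABLE composition (
        id TEXT PRIMARY KEY,
        extract_id TEXT REFERENCES ehr_extract(id),
        name TEXT
    );
""")
conn.execute("INSERT INTO ehr_extract VALUES (?, ?)", ("ext-1", "patient-001"))
conn.executemany(
    "INSERT INTO composition VALUES (?, ?, ?)",
    [("comp-1", "ext-1", "Discharge report"), ("comp-2", "ext-1", "Lab results")],
)


def compositions_of(extract_id):
    """Return the names of the compositions that belong to one extract."""
    rows = conn.execute(
        "SELECT name FROM composition WHERE extract_id = ? ORDER BY id",
        (extract_id,),
    )
    return [name for (name,) in rows]


print(compositions_of("ext-1"))
```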
The validation module verifies the XML files that arrive at the server before
they are processed, as well as the XML files that are going to be transmitted to the client before they
are sent, so that later errors are avoided. To this end, XML Schemas
have been implemented with the same structure as the libraries.
The parser module is responsible for processing XML data. It transforms the
information from the validated XML extracts into Java objects according to the reference model.
Processing is possible in both directions: from XML files to Java objects and from
Java objects to XML files. The first direction is more complex and requires a specific XML
parsing technology. The chosen technology is SAX, and the parser has been developed with the
SAX Java library. The second direction is easier: the transformation is performed by a recursive
function called convertToXml.
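Both directions can be sketched with Python's standard xml.sax module; this is an illustration of the technique, not the authors' Java code. The generic `Node` class is a hypothetical stand-in for the reference-model classes, XML attributes are ignored for brevity, and `convert_to_xml` only mirrors the idea of the recursive convertToXml function.

```python
import xml.sax
from xml.sax.saxutils import escape


class Node:
    """Generic stand-in for a reference-model object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.text = ""
        self.children = []


class ExtractHandler(xml.sax.ContentHandler):
    """Builds a Node tree from SAX events using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []

    def startElement(self, name, attrs):
        node = Node(name)
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate it.
        if self._stack:
            self._stack[-1].text += content

    def endElement(self, name):
        node = self._stack.pop()
        node.text = node.text.strip()


def parse_extract(xml_text):
    """XML to objects: the direction that needs an event-driven parser."""
    handler = ExtractHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root


def convert_to_xml(node):
    """Objects to XML: a plain recursive walk over the tree."""
    inner = escape(node.text) + "".join(convert_to_xml(c) for c in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


xml_in = "<EHR_EXTRACT><composition><name>Visit &amp; notes</name></composition></EHR_EXTRACT>"
root = parse_extract(xml_in)
print(convert_to_xml(root) == xml_in)
```

The asymmetry noted in the text shows up here as well: parsing needs a stateful handler, whereas serialization is a few lines of recursion.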
The communication module is based on web services. To implement them, the Apache Axis
tool has been chosen as the Java library for developing the functions. The web services
were deployed on a Tomcat application server on port 8080.

3. Compatibility studies with other norms

In addition to ISO 13606, there are other alternatives for EHR standardization, such as
HL7v3 and the openEHR initiative. Compatibility among the different EHR standards is therefore
vital to achieve interoperability between systems. Our research group has carried out several studies
to improve compatibility among standards. The two most relevant studies are described in the
following sections.

3.1. CCR modeling according to ISO 13606

The Continuity of Care Record (CCR) [6] is the specification of a standard developed by ASTM
International (American Society for Testing and Materials), MMS (Massachusetts Medical Society),
HIMSS (Healthcare Information and Management Systems Society), AAFP (American Academy of
Family Physicians) and AAP (American Academy of Pediatrics). Its objectives are to organize
and transfer enough patient data to support continuity of care. Secondary
objectives are to increase patient safety, to reduce the risk of medical errors, to
reduce costs, to make information interchange more efficient and to ensure a standard for
information interchange when the patient is referred or transferred to another health professional.
The CCR is used to refer a patient from one department to another, to keep a personal health record,
and to store a record of care and present it on future occasions. The CCR contains three main
components: the header, the body and the footer. The header includes fields such as identifier,
language, version, date/time, patient identifier, from, to and purpose. The body includes
sections on clinical status and administrative aspects. The footer includes additional
fields such as actors, references, comments and signatures.
Our group carried out an exhaustive study of how to represent the CCR and its fields in an EN
13606 extract, for possible transmission among different information systems [7]. The result was
that the CCR can be represented as a composition in an extract, because it is a document created
in a single interaction with the information system. This composition consists of three main
sections corresponding to the three main parts of the CCR: header, body and footer, as shown in
figure 2. Each section is modeled by a different archetype.

Fig. 2: CCR representation in an EN 13606 extract.

Each header field is translated as an entry in its respective section. The demographic entities
are linked to their respective classes of the demographic package and are referenced from the
From and To fields. The patient identifier field is translated in the header section as
subject_of_care, a field specifically defined by EN 13606.
The body fields can be classified into two groups. On the one hand, there are non-clinical
fields (payers, support, healthcare providers), each of which is mapped to its own section
inside the body section of the extract and can point to demographic entities. On the other
hand, there are fields that contain clinical information, which are translated into the necessary clinical
sections. These sections can link to other compositions of the patient (other healthcare encounters).
The footer has only four fields. The references and comments fields are translated as two sections in
the footer section of the extract. The signatures field is translated using the Attestation_info class
from the reference model, and the actors field is represented by the necessary demographic
entities of EN 13606.
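The mapping just described can be outlined as a small data-structure sketch in Python. The classes below are simplified, hypothetical stand-ins for the EN 13606 composition, section and entry classes, and the dictionary layout of the input CCR is an assumption made for illustration; signatures and actors, which the study maps to Attestation_info and demographic entities, are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    value: str


@dataclass
class Section:
    name: str
    entries: list = field(default_factory=list)
    sections: list = field(default_factory=list)


@dataclass
class Composition:
    name: str
    subject_of_care: str
    sections: list = field(default_factory=list)


def ccr_to_composition(ccr):
    """Map a CCR (as a plain dict) to one composition with three sections."""
    # Header fields become entries; the patient identifier is handled apart.
    header = Section(
        "header",
        entries=[Entry(k, v) for k, v in ccr["header"].items() if k != "patient"],
    )
    # Each body field becomes its own section inside the body section.
    body = Section(
        "body",
        sections=[
            Section(name, entries=[Entry("text", item) for item in items])
            for name, items in ccr["body"].items()
        ],
    )
    # References and comments become two sections inside the footer section.
    footer = Section(
        "footer",
        sections=[
            Section(name, entries=[Entry("text", item) for item in ccr["footer"].get(name, [])])
            for name in ("references", "comments")
        ],
    )
    return Composition("CCR", ccr["header"]["patient"], [header, body, footer])


ccr = {
    "header": {"identifier": "ccr-1", "language": "en", "patient": "patient-001"},
    "body": {"payers": ["Public insurance"], "problems": ["Hypertension"]},
    "footer": {"comments": ["Follow up in 3 months"]},
}
composition = ccr_to_composition(ccr)
print([section.name for section in composition.sections])
```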

3.2. Harmonization mechanism HL7v3/ISO13606 based on archetypes

Continuing the effort to harmonize the different EHR standards, our group proposed a new model
for the automatic translation of HL7v3 messages into ISO13606 extracts [8, 9]. The design of this
mechanism is based on two rules: the mechanism must be applicable to all HL7v3 messages,
and the mechanisms defined in ISO13606 must be reusable in the new model.
Thus, the proposed method for information modeling consists of constraining the reference
model using the archetype model, modified with a new “paths” mechanism. This new
mechanism is very similar to the ADL mechanism and is used to link the content of HL7 messages with the
classes of the EN13606 reference model.
The proposed harmonization mechanism is based on the following points:
• An archetype is created according to the EN13606 archetype model (with the necessary
modifications) for each message defined by a Hierarchical Message Description (HMD).
• The value constraints on the archetype attributes can contain “paths” that point directly
to the value attributes in the HMD.
• The “paths” are obtained automatically and recursively from the HMD, starting from the
attribute names of the hierarchy expressed in the HMD.
• These archetypes can be stored in repositories in order to be used later for translating
messages.
• When a message is to be translated, the appropriate archetype is retrieved
from the repository according to the message type. The classes expressed in the archetype
are instantiated and their fields are filled in with the contents of the original message
pointed to by the archetype “paths”.
• Archetypes can be used inside other archetypes. There is a mechanism to create
relative “paths”, so paths can be generated from the paths of different archetypes in a recursive
way.
The proposed mechanism uses the ISO13606 archetype model with small modifications. The
changes make it possible to “point” from an EN13606 attribute to its equivalent value in the HL7 message.
The modifications are the following:
• Constraint on attribute values: a constraint must link the attribute values of the ISO13606
extracts with the attribute content of the HL7v3 messages. To this end, the reserved word
use_value_path is added to ADL in order to include the path extracted
from the HMD.
• Relative paths in external archetypes: a relative path mechanism is needed to use external
archetypes. ADL is modified to add a new constraint to the allow_archetype structures,
with the reserved word paths.
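The translation step can be sketched as follows. The keywords use_value_path, allow_archetype and paths come from the text, but everything else is an assumption for illustration: archetypes are encoded as Python dictionaries rather than ADL, the HL7v3 message is a nested dictionary rather than XML, and the element names in the example message are hypothetical.

```python
def resolve(message, path):
    """Follow a '/'-separated path through a nested dict."""
    node = message
    for step in path.strip("/").split("/"):
        node = node[step]
    return node


def fill(archetype, message, base_path=""):
    """Instantiate an archetype with the message values its paths point to."""
    result = {}
    for attribute, rule in archetype.items():
        if "use_value_path" in rule:
            # Copy the value found at the (possibly relative) path.
            result[attribute] = resolve(message, base_path + "/" + rule["use_value_path"])
        elif "allow_archetype" in rule:
            # Nested archetype: its paths are relative to the 'paths' prefix.
            result[attribute] = fill(
                rule["allow_archetype"], message, base_path + "/" + rule["paths"]
            )
    return result


name_archetype = {
    "given": {"use_value_path": "name/given"},
    "family": {"use_value_path": "name/family"},
}
patient_archetype = {
    "id": {"use_value_path": "recordTarget/patient/id"},
    "name": {"allow_archetype": name_archetype, "paths": "recordTarget/patient"},
}
message = {
    "recordTarget": {
        "patient": {"id": "p-001", "name": {"given": "Ana", "family": "Ruiz"}}
    }
}
extract = fill(patient_archetype, message)
print(extract)
```

Because the nested archetype only holds relative paths, the same `name_archetype` could be reused under a different prefix in another message type, which is the reuse that the relative path mechanism is meant to allow.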
4. Demographic server according to ISO 13606
A major need of health information systems is the management of demographic data of patients,
health staff and other entities involved in medical care, separately from clinical information. This
need has grown with the data protection law. Moreover, information systems are not currently
prepared to manage demographic information in a shared way.
Our research group has a broad line of work on the treatment and monitoring of chronic
patients using telemedicine. Within this line, a platform has been developed to support
large research projects and clinical trials on different pathologies (arterial hypertension [10],
oral anticoagulant treatment [11] and asthma [12]). In these projects, the patients' personal
information is stored in local databases. Moreover, the demographic information used can vary
from project to project.
For these reasons, our research group designed and developed an external, independent
demographic server conforming to ISO13606. This server can be accessed by different
information systems to manage their demographic data. In this way, clinical data and
patients' personal data are stored separately, as required by the data
protection law.
From now on, our group intends to integrate the new research projects of this line
with the demographic server, separating conceptually and physically the storage and
management of clinical and demographic information. This is a first step towards demonstrating to
health authorities the need for, and viability of, external and independent demographic servers.
The design and development of the demographic server built on the work and experience gained
with the EHR server. Many technologies are the same in both servers: Java as
programming language, an ODBC database interface (MySQL), SAX technology for the parser, and
communication based on Web Services implemented with the Axis tool.
The demographic server focuses on the classes of the demographic package and the support
package of the EHR server. Its functions are accessible to clients through a web
service. These functions are storeExtract and retrieveExtract, already known from the EHR
server, and new functions specific to the management of demographic data from the
telemedicine projects:
• retrieveIdentifiedEntity: returns an extract with all the demographic information of a
patient, located by one of the patient identifiers stored in the system.
• getPatientName, getPatientFullName and getPatientAllData: functions that
return different demographic data depending on the needs of each client. The input
arguments are a patient identifier and the project identifier.

The need for, and the concept of, the demographic archetype arose during the development of the
demographic server. A demographic archetype is a way to represent the knowledge that narrows down the
many possibilities that the ADL language allows over the classes of the demographic package. So,
information could be sent to the server according to specific archetypes, and demographic
information could be requested from the server with only a patient identifier and a specific archetype. In
this way, the server would return only the requested data in the indicated format. Each project or
application would define its own archetypes modeling the demographic information it uses.
The server could be used with different archetypes without modification.
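The idea of requesting demographic data with a patient identifier and an archetype can be sketched as follows. This is a deliberately reduced illustration in Python: a real demographic archetype is an ADL definition over the demographic package, whereas here it is assumed to be just a list of field names, and the patient record, field names and archetype identifiers are all hypothetical.

```python
# Hypothetical demographic repository, keyed by patient identifier.
PATIENTS = {
    "p-001": {
        "given_name": "Ana",
        "family_name": "Ruiz",
        "birth_date": "1970-05-01",
        "city": "Madrid",
    }
}

# Each project defines its own archetypes; here an archetype is reduced
# to the list of demographic fields that the project wants returned.
ARCHETYPES = {
    "hypertension.name_only": ["given_name", "family_name"],
    "asthma.all_data": ["given_name", "family_name", "birth_date", "city"],
}


def retrieve_demographics(patient_id, archetype_id):
    """Return only the fields that the chosen archetype asks for."""
    record = PATIENTS[patient_id]
    return {name: record[name] for name in ARCHETYPES[archetype_id]}


print(retrieve_demographics("p-001", "hypertension.name_only"))
```

Adding a new project then means adding an entry to `ARCHETYPES`, with no change to `retrieve_demographics`, which mirrors the claim that the server could serve different archetypes without modification.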

References
1. CEN (technical committee 251). (available Apr-2009).
2. A. Muñoz, R. Somolinos, J. A. Fragua, C. H. Salvador. Servidor de historias clínicas electrónicas
conforme a la norma EN 13606. Informática y Salud, n. 51, March 2005, 47-52.
3. R. Somolinos, A. Muñoz, J. A. Fragua, C. H. Salvador. Servidor de historias clínicas
electrónicas conforme a la norma europea EN 13606. VIII Congreso Nacional de
Informática de la Salud – INFORSALUD 2005, Madrid, 5-7 April 2005, 143-147.
4. Somolinos R, Muñoz A, Fragua JA, Pascual M, González MA, Salvador CH. Servidor de
extractos de historias clínicas conformes a la norma EN 13606. XXIII Congreso Anual de la
Sociedad Española de Ingeniería Biomédica (CASEIB'05), Madrid, 10-12 November 2005,
5. Muñoz A, Somolinos R, Pascual M, Fragua JA, González MA, Monteagudo JL, Salvador CH.
Proof-of-concept Design and Development of an EN13606-based Electronic Health Care
Record Service. Journal of the American Medical Informatics Association (J Am Med Inform
Assoc), vol 14, 2007, 118-129 (DOI 10.1197/jamia.M2058).
6. Continuity of Care Record (CCR). (available Apr-2009).
7. A. Muñoz, R. Somolinos, C. H. Salvador. Descripción del ASTM E2369
Continuity of Care Record (CCR) según la Norma Europea EN13606. Informática y Salud, n.
59, November 2006, 79-86.
8. Muñoz A. Interoperabilidad semántica entre los modelos de historia clínica electrónica de
CEN y HL7. Propuesta de un modelo de armonización. Doctoral thesis. 2007.
9. R. Somolinos, J. A. Fragua, M. A. González, M. Pascual, E. Pregigueiro, P. García, M.
Carmona, A. Muñoz. Mecanismo de armonización HL7v3 / CEN/ISO 13606 basado en
arquetipos. XXVI Congreso Anual de la Sociedad Española de Ingeniería Biomédica
(CASEIB'08), Valladolid, 15-17 October 2008, 533-536.
10. M. Pascual, CH. Salvador, PG. Sagredo, J. Marquez-Montes, MA. González, JA. Fragua, M.
Carmona, LM. Garcia-Olmos, F. Garcia-Lopez, A. Muñoz, JL. Monteagudo. Impact of
patient-general practitioner interaction on the control of hypertension in a follow-up
service for low-to-medium risk hypertensive patients. IEEE T Inf Technol B, 2008, 12(6):780-
11. CH. Salvador, A. Ruiz-Sanchez, MA. Gonzalez, M. Carmona, M. Pascual, PG. Sagredo, JA.
Fragua, F. Caballero-Martinez, F. García-López, J. Márquez-Montes, JL. Monteagudo.
Evaluation of a telemedicine-based service for the follow-up and monitoring of patients
treated with oral anticoagulant therapy. IEEE Trans Inf Technol Biomed. 2008, 12(6):696-
12. L. Otero, M. Pascual, J. A. Fragua, P. García-Sagredo, M. Carmona, I. Urgoiti, M. A.
González, A. Muñoz, A. López-Viña, L. Sánchez-Agudo, J. L. Monteagudo, C. H. Salvador.
Seguimiento de planes de autocuidado en asma mediante un sistema de telemedicina. VIII
Congreso Nacional de Informática de la Salud – INFORSALUD 2005, Madrid, 5-7 April 2005,

Standardization Framework for Legacy Health Information Systems
David Moner(1), José-Alberto Maldonado(1), Diego Boscá(1), Montserrat Robles(1), Carlos Angulo(1),
Ernesto Reig(1), Luis Marco(1), Pablo Serrano(2), Daniel Pérez(3)
(1) IBIME group, ITACA Institute, Universidad Politécnica de Valencia,
Camí de Vera S/N, 46022 Valencia, Spain
(2) Hospital de Fuenlabrada, Madrid, Spain
(3) Hospital General Universitario de Valencia, Valencia, Spain

The construction of a Virtual Federated Electronic Health Record (VFEHR) requires standards, tools
and an adequate technological infrastructure. We have developed LinkEHR as a framework for
the standardization, integration and sharing of health information among distributed and
heterogeneous Health Information Systems. To perform this task, LinkEHR relies on archetypes as a
mechanism for the semi-automatic normalization of legacy data. This framework has already been
evaluated in existing health institutions for the construction of standardized EHR extracts.
Keywords: Health Information Systems, Standardization, Electronic Health Record, Archetype

1. Introduction
For many years, the standardization of health information systems was seen as an added value with
no direct influence on the daily work of healthcare institutions. Other problems, such as the
basic digitization of diverse clinical information, were more urgent. But healthcare
institutions are no longer a closed environment. Nowadays, information sharing is not the
exception but the rule. Obtaining a unified and universal electronic health record (EHR) for each
person is one of the most important objectives of health informatics. This Virtual Federated EHR
(VFEHR) should include all the existing information related to a person from birth to death,
regardless of where the patient has received care. Solving this problem
requires interconnecting all the information systems and reaching an agreement on the
format of the transmitted information. Not only the syntax is important, however, but also the meaning
of the information, which ensures its correct interpretation by human readers or computer systems.
This is called semantic interoperability, and it is mainly based on the use of ontologies or medical
terminologies and on the formal definition of the domain concepts used by the health information
systems (HIS). Both problems are addressed by the CEN EN13606 standard for EHR communication.
2. CEN EN13606
CEN EN13606 [1] is a five-part standard developed by the European Committee for
Standardization (CEN) intended for the communication and semantic interoperability of EHR
extracts among heterogeneous HIS.
Reference Model
The first part of the standard defines an information reference model (RM), a
generic model for representing any clinical annotation of the EHR. It specifies how health data
should be aggregated to create more complex data structures, and the context information that
must accompany every piece of data in order to meet ethical and legal requirements. The RM also
holds context information about clinical events alongside the clinical information itself. For example, it
supports information about the subject of care, the place and date of the clinical event, and the
participants in the clinical act.
Archetype Model
The second part of the standard defines an Archetype Model (AM). Archetypes are formal
definitions of domain-level concepts in the form of structured and constrained combinations of
the classes contained in the reference model. Their principal purpose is to provide a powerful,
reusable and interoperable mechanism for managing the creation, description, validation and
query of EHRs. For each domain concept, a definition can be developed in terms of constraints on
structure, types, values, and behaviors of business objects. Basically, archetypes are a means for
providing semantics to data instances that conform to some reference model by assuring that data
obey a particular structure and satisfy a set of semantic constraints.
Examples of archetypes include prescriptions, problem lists, differential diagnoses, pregnancy
reports and blood pressure observations. In fact, any desired archetype can be defined when it is
needed.
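To make the notion of an archetype as a set of constraints more concrete, the following Python sketch checks a data instance against a simplified blood pressure archetype. Real archetypes are written in ADL over the reference-model classes; the dictionary encoding, the attribute names and the numeric ranges used here are illustrative assumptions only.

```python
# Hypothetical, simplified archetype: each attribute has a type and a range.
BLOOD_PRESSURE = {
    "systolic": {"type": (int, float), "min": 0, "max": 1000},
    "diastolic": {"type": (int, float), "min": 0, "max": 1000},
}


def validate(instance, archetype):
    """Return a list of constraint violations (empty if the data conforms)."""
    errors = []
    for name, rule in archetype.items():
        if name not in instance:
            errors.append(f"missing {name}")
            continue
        value = instance[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: wrong type")
        elif not rule["min"] <= value <= rule["max"]:
            errors.append(f"{name}: out of range")
    # Attributes that the archetype does not define are rejected as well.
    for name in instance:
        if name not in archetype:
            errors.append(f"unexpected {name}")
    return errors


print(validate({"systolic": 120, "diastolic": 80}, BLOOD_PRESSURE))
print(validate({"systolic": "high"}, BLOOD_PRESSURE))
```

The first call returns an empty list, while the second reports a type violation and a missing attribute, which is the sense in which an archetype ensures that data obey a particular structure and satisfy a set of semantic constraints.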
3. The LinkEHR approach
In most health organizations, several information sources that are heterogeneous in
platform, structure (data model) and semantics coexist. They were created
and are maintained to fulfill the requirements of a particular set of users or a department. As a
consequence, they are suitable for the department or application they were created for, but not for
other users or applications that may also need the information held by these
sources.
To take maximum advantage of the information, source data must be transformed to
meet the data format of the target applications. This problem is known in the literature as the
data exchange problem: taking data structured under a source schema and creating an instance
of a target schema that reflects the source data as accurately as possible [2,3]. The effort required
to create and manage such transformations is considerable. It involves writing and managing
complex data transformation programs and keeping up with changing sources.

3.1. LinkEHR-Ed Integration Archetype Editor

LinkEHR-Ed [4] is a visual tool implemented in Java on the Eclipse platform.
It supports the editing of archetypes based on different reference models, the specification of
mappings between archetypes and data sources, and the semi-automatic generation of data
conversion scripts. These scripts translate unnormalized data into XML documents that conform to the
selected reference model (in our context, the standard chosen to represent and communicate EHR
extracts) and at the same time satisfy all the data constraints imposed by the archetypes. LinkEHR-Ed
explores the use of archetypes as a means to achieve standardization and semantic integration of
distributed health data. We intend to employ archetypes to publish existing clinical
information in the form of standardized EHR extracts. We call them Integration Archetypes.
Figure 1 shows an overview of the editing process of an integration archetype with LinkEHR-Ed,
which is divided into four main steps.

Fig. 1: Overall integration archetype editing process in LinkEHR-Ed.

The first step deals with importing reference models. In LinkEHR-Ed, a new reference
model expressed as a W3C XML Schema can be imported at any time. This step only
needs to be performed once for each reference model. It is therefore possible to define
archetypes based on different standards. Three reference models have been tested
successfully, namely CEN EN13606, openEHR and CCR. To the authors’ knowledge, LinkEHR-Ed is
the only editor capable of handling CEN EN13606.
The second step is the actual archetype edition process. New archetypes can be edited either from
scratch or by specialization of an existing one. Our job was to define a formal modeling framework
as a prerequisite for implementing a tool providing enhanced support for the edition of
The third step is about mapping specification. Since the health data to be made public resides in
the underlying data sources, it is necessary to define some kind of mapping information that links
archetype entities to data elements in data repositories (e.g. tables and attributes in the case of
relational data sources or element and attributes in the case of XML data). A mapping specification
between an archetype and a source schema is done by specifying a set of value mappings which
define how to obtain a value for an atomic attribute of an archetype by using a set of atomic
elements from the data sources and applying, if necessary, some transformation functions. For
instance, it includes functions for the transformation of source time and date values into values
conforming to the international standard ISO 8601 for date and time representation.
Finally, the fourth step is about the generation of data transformation programs. The actual
transformation of source data into archetype instances is done by an XQuery program which is
automatically generated from the mapping specification. Its execution over a set of data sources
yields a XML instance that satisfies the constraints imposed by the archetype and at the same time
is compliant with the underlying reference model.
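
The idea of value mappings with transformation functions can be illustrated with a minimal sketch. It is written in Python rather than XQuery, and it is not the code generated by LinkEHR-Ed: the column names (PAT_ID, DISCH_DT), the attribute names, the source date format and the helper functions are all assumptions made for the example.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def to_iso8601(value, source_format="%d/%m/%Y %H:%M"):
    """Transformation function: legacy date string -> ISO 8601 string.
    The source format is an assumption for this example."""
    return datetime.strptime(value, source_format).isoformat()


# Hypothetical value mappings: archetype attribute -> (source column, transform).
VALUE_MAPPINGS = {
    "patient_id": ("PAT_ID", str),
    "discharge_time": ("DISCH_DT", to_iso8601),
}


def build_instance(row, mappings=VALUE_MAPPINGS, root_tag="discharge_report"):
    """Apply every value mapping to one source row and build an XML instance."""
    root = ET.Element(root_tag)
    for attribute, (column, transform) in mappings.items():
        child = ET.SubElement(root, attribute)
        child.text = transform(row[column])
    return root


row = {"PAT_ID": 1234, "DISCH_DT": "25/01/2009 14:30"}
instance = build_instance(row)
print(ET.tostring(instance, encoding="unicode"))
```

In this sketch each archetype attribute is filled from a single source column; a fuller version would allow several source elements per attribute and would validate the result against the archetype constraints.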

3.2. The LinkEHR Standardization Platform

Once we have defined an integration archetype and automatically generated an XQuery
program to extract and normalize data from a HIS, the next step is to build an infrastructure for
the global EHR. LinkEHR-Ed is just one part of a bigger system called LinkEHR. The LinkEHR
Standardization Platform makes it possible to create a VFEHR infrastructure, that is, a virtual and
federated view of the EHR of a patient whose data are distributed among heterogeneous HIS. The
global architecture of the LinkEHR platform is shown in Figure 2.

[Figure 2 labels: HIS, EHR Viewer, extract servers, agent-based communication layer, demographic
server, archetype repository.]

Fig. 2: The LinkEHR Platform architecture.

Extract server. An EHR extract server must be deployed in each HIS or data source. It receives
the requests for information, extracts the data and normalizes it by using the XQuery scripts
previously generated with LinkEHR-Ed. Then it builds the extract header and communicates the result
to the requester. The requester might be a final system or another extract server.
Archetype repository. This is the support server for archetype definitions, archetype management
and development. It is based on an XML database to ease structural queries over archetypes.
Demographic server. A patient information repository compatible with the specifications of the
CEN EN13606 norm. It does not replace the master patient index, but normalizes demographic
information of patients, health professionals and organizations.
Clinical information index. An index of the locations of clinical information for each patient. This
system avoids unnecessary queries to nodes that hold no relevant information.
EHR Viewer. A web-based generic EHR viewer for standardized clinical information. It is the main
user entry point to the VFEHR.
Agent-based communication layer. The distributed EHR infrastructure depends on a flexible and
adaptable communication layer. This layer has been implemented with software agent
technologies, which provide a high degree of independence and scalability for the complete
LinkEHR platform.
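
The interplay between the clinical information index and the extract servers can be sketched as follows. This is only an illustration of the idea of querying the nodes that hold data for a patient; the class and function names are hypothetical, and the agent-based communication layer is replaced by direct method calls.

```python
class ExtractServer:
    """Stand-in for an extract server deployed next to one HIS."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # patient id -> list of normalized entries
        self.queries = 0        # counts how often this node was contacted

    def get_extract(self, patient_id):
        self.queries += 1
        return list(self.records.get(patient_id, []))


def build_index(servers):
    """Clinical information index: patient id -> names of nodes holding data."""
    index = {}
    for server in servers:
        for patient_id in server.records:
            index.setdefault(patient_id, set()).add(server.name)
    return index


def federated_view(patient_id, servers, index):
    """Collect entries only from the nodes that the index lists for the patient."""
    relevant = index.get(patient_id, set())
    entries = []
    for server in servers:
        if server.name in relevant:
            entries.extend(server.get_extract(patient_id))
    return entries


hospital_a = ExtractServer("hospital_a", {"p1": ["discharge report"]})
hospital_b = ExtractServer("hospital_b", {"p2": ["lab result"]})
index = build_index([hospital_a, hospital_b])
print(federated_view("p1", [hospital_a, hospital_b], index))
print(hospital_a.queries, hospital_b.queries)
```

Here the second node is never contacted for patient p1, which is the saving the index is meant to provide.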

4. Examples of application
Several examples of the application of this methodology have been developed during the last year
in collaboration with healthcare institutions: the Hospital General Universitario de Valencia and
the Hospital de Fuenlabrada, in Madrid.

4.1. Hospital General Universitario de Valencia

LinkEHR-Ed is being validated at the Hospital General Universitario de Valencia (CHGUV) in several
scenarios. CHGUV is a 592-bed hospital with more than 1,000 medical professionals which serves a
population of nearly 350,000 people. So far, archetypes for discharge reports and for an EHR
summary of patients have been designed and mapped to the hospital HIS.
Normalized information has been extracted and included in the existing EHR viewer.

4.2. Hospital de Fuenlabrada

At the Hospital de Fuenlabrada different use cases have been implemented using LinkEHR-Ed. These
include archetypes for measuring pressure ulcer risk with the Norton scale and a set of
archetypes for the reconciliation of active medication for patients in transit from primary
to specialized care.

4.3. HCDSNS national project pilot

The Spanish Ministry of Health is developing the HCDSNS project [5] in order to provide national
access to a summarized EHR with information from the different autonomous regions. As a
pilot for that project, the semantic interoperability of a summarized EHR will be tested between
the CHGUV and the Hospital de Fuenlabrada. The transmitted information will be standardized
according to the CEN EN13606 norm and communicated through the LinkEHR platform.

5. Conclusions
We have presented LinkEHR, a platform that uses archetypes to upgrade already deployed systems
in order to make them compatible with an EHR standard. The overall objective is to maintain
in-production systems and applications without any changes while providing a means for publishing
clinical information in the form of standardized EHR extracts, hiding the technical details,
location and heterogeneity of data repositories. Archetypes can thus be used as a semantic layer
over the underlying databases, associating them with domain-specific semantics and thereby
upgrading the semantics of the data they hold. In our future work, intensive testing of the XQuery
generation module will be carried out, and the visual interface of both the archetype editor and
the mapping editor will be improved.

Acknowledgements
This work was supported in part by the Spanish Ministry of Education and Science under Grant
TSI2007-66575-C02-01 and by the Generalitat Valenciana under grant APOSTD/2007/055.

References
1. European Committee for Standardization: Health informatics - Electronic health record
communication. EN13606. (2006)
2. R. Fagin, P.G. Kolaitis, R. J. Miller, and L. Popa: Data exchange: semantics and query answering.
Proceedings of the 9th International Conference on Database Theory, pp. 207-224, (2003).
3. A. Raffio, D. Braga, S. Ceri, P. Papotti, M.A. Hernández. “Clip: a Visual Language for Explicit
Schema Mappings,” presented at the 24th Int. Conf. on Data Engineering, Cancún, México.
4. J. A. Maldonado, D. Moner, D. Boscá, C. Angulo, M. Robles, and J. T. Fernández-Breis, “Framework
for clinical data standardization based on archetypes”. Proceedings of 12th World Congress on
Health (Medical) Informatics (MedInfo’07), pp. 454-458.
5. Plan de calidad del Sistema Nacional de Salud. Ministerio de Sanidad.

Ontology-based Archetype Interoperability and Management

Catalina Martínez-Costa(1), Marcos Menárguez-Tortosa(1), J. T. Fernández-Breis(1)
Departamento de Informática y Sistemas, Facultad de Informática
Universidad de Murcia, CP 30100, Murcia, Spain
{cmartinezcosta, marcos, jfernand}

Semantic interoperability of clinical standards is a major challenge in eHealth across Europe,
because it would allow healthcare professionals to manage the complete EHR of a patient. Archetypes
are considered a cornerstone for delivering fully interoperable EHRs. Our work is focused on the
development of ontology-based methods and techniques for providing semantic interoperability
between different EHR standards at archetype level. Hence, solutions for the semantic representation,
transformation and management of clinical archetypes are described in this work.

1. Introduction
The lifelong clinical information of a person, supported by electronic means, constitutes his or her
Electronic Healthcare Record (EHR). Nowadays there are different advanced standards and
architectures [1] for representing and communicating EHRs, such as HL7 [2], OpenEHR [3] and
UNE-EN 13606 [4]. Some of these advanced EHR standards, such as OpenEHR and UNE-EN 13606,
make use of the dual model architecture approach [5]. This architecture is based on two modelling
levels: information and knowledge. The information level is provided by the reference model and
the knowledge level by the archetype model. Archetypes define clinical concepts and are usually
built by domain experts. They are a tool for building clinical consensus in a consistent way. The
semantic interoperability of clinical standards is a major challenge in eHealth across Europe,
because it would allow healthcare professionals to manage the complete EHR of a patient. Clinical
archetypes are fundamental for achieving semantic interoperability, and they are built for
particular EHR standards.
Our recent work has focused on the development of methods and techniques for providing
semantic interoperability between different EHR standards at archetype level. First, a methodology
for obtaining a semantic representation of archetypes will be presented, describing how syntactic
archetypes can automatically be transformed into semantic ones. Next, the semantic interoperability
of two dual-model based standards, UNE-EN 13606 and OpenEHR, will be addressed. Finally, the
development of a prototype for the semantic management of clinical archetypes will be described.

2. An Ontological Representation of Clinical Archetypes
Clinical archetypes are defined using the Archetype Definition Language (ADL). ADL is a generic
language that offers no support for performing semantic activities over archetypes, such as
comparison, classification, and integration of information and knowledge coming from different,
heterogeneous systems. Such activities are knowledge intensive: they require the semantic
management of knowledge and information, and semantic interoperability between such
heterogeneous systems. Advances in the Semantic Web [6] community make it a candidate
technology for supporting such knowledge-intensive tasks related to archetypes and EHR systems.
This section aims at providing a mechanism for representing archetypes in a Semantic
Web-manageable manner.

2.1. An ontological infrastructure for representing clinical archetypes

The use of ontologies to represent biomedical knowledge is not new, since ontologies have been
widely used in biomedical domains in recent years for different purposes [7,8]. For representing
clinical information semantics, we have used ontologies modeled in the Web Ontology Language
(OWL) [9]. The representation of archetypes in OWL requires the semantic interpretation of
clinical archetypes, so both the UNE-EN 13606 and OpenEHR reference and archetype models
were analyzed [10]. As a result of this process, two main ontologies were built for each standard
(see Table 1 for details). The CEN-SP and OpenEHR-SP ontologies represent the clinical data
structures and data types defined in the corresponding standards, while CEN-AR and OpenEHR-AR
are the archetype model ontologies.
These ontologies are available online.
Table 1. Details of the OWL ontologies, in terms of classes, data properties (DP), object
properties (OP) and restrictions

Ontology      Classes   DP   OP    Restrictions
CEN-SP        68        16   92    227
CEN-AR        122       76   142   462
OpenEHR-SP    87        14   156   302
OpenEHR-AR    144       75   210   524

2.2. Transforming ADL archetypes into OWL

Once the ontological representation has been provided, a transformation process is needed to
obtain archetypes conforming to this representation from the ADL ones. Our technical solution
involves three technical spaces (TS) [11]: the Grammar TS, to which ADL belongs; the Semantic
Web TS, to which ontologies belong; and the Model Driven Engineering (MDE) TS, which is the
pivotal TS in which the transformation takes place. The transformation process is divided into
the three phases described below and shown in Figure 1. A web tool for transforming ADL
archetypes into OWL is available online [12].
• Phase One: ADL archetypes are transformed into models according to the Archetype
Object Model (AOM). The AOM representation is common to any dual-model EHR
standard. In this phase (left side of Figure 1) a change of TS is carried out, from the Grammar
TS to the MDE TS. Archetypes in ADL are processed and serialized in XML by an ADL parser
[13]. The AOM metamodel is obtained from its XML Schema, and XML archetypes are
transformed into models by using EMF [14].
• Phase Two: AOM models are transformed into models according to the ontological
structure modeled before. The ODM standard [15] defines the representation of OWL
ontologies in the MDE TS. Protégé [16] implements the transformation from OWL to the MDE TS
and was used to get the archetype metamodels (OWL-AR) from the OWL archetype
ontologies. Next, models that conform to the AOM metamodel are transformed into OWL-AR
models. This transformation is implemented in RubyTL [17], a model transformation
language, and is standard specific.
• Phase Three: Finally, OWL files are obtained from the ontological models. This is shown on
the right side of the figure. In this phase we move from the MDE TS to the Semantic Web TS.
To get OWL instances from ontology models, a model-to-text transformation language,
MOFScript [18], has been used.
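
The three phases can be mimicked with a toy pipeline that shows the role of the pivot space: text is parsed into a model, the model is re-expressed in an ontology-oriented model, and that model is serialized to text. The sketch below is an analogy only. It does not use ADL, AOM, EMF, RubyTL or MOFScript; the "key = value" input, the property names and the Turtle-like output are assumptions made for the example.

```python
def phase_one(text):
    """Grammar space -> model space: parse 'key = value' lines into a plain model.
    The input format is a made-up stand-in, not real ADL."""
    model = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        model[key.strip()] = value.strip()
    return model


def phase_two(source_model):
    """Model-to-model step: re-express the model as an ontology-style individual."""
    properties = {k: v for k, v in source_model.items() if k != "archetype_id"}
    return {
        "individual": source_model["archetype_id"],
        "type": "ARCHETYPE",
        "properties": properties,
    }


def phase_three(onto_model):
    """Model-to-text step: serialize the individual in a Turtle-like syntax."""
    head = f":{onto_model['individual']} a :{onto_model['type']}"
    props = sorted(onto_model["properties"].items())
    if not props:
        return head + " ."
    lines = [head + " ;"]
    for position, (name, value) in enumerate(props):
        end = " ." if position == len(props) - 1 else " ;"
        lines.append(f'    :{name} "{value}"{end}')
    return "\n".join(lines)


SOURCE = """
archetype_id = blood_pressure_v1
concept = at0000
original_language = en
"""

print(phase_three(phase_two(phase_one(SOURCE))))
```

Keeping the middle step as a model-to-model function is what makes it standard specific in the text above: only that step would change for a different reference model.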

Fig. 1: The ADL to OWL transformation process.

3. Towards UNE-EN 13606 and OpenEHR archetype-based semantic interoperability
Both clinical standards, UNE-EN 13606 and OpenEHR, follow the dual model approach. However,
they differ in how they structure the EHR domain, that is, they define different reference models.
OpenEHR offers a full specification for the creation, storage, maintenance and querying of EHRs.
On the other hand, the UNE-EN 13606 standard was developed to act as an EHR extract exchange
standard, so it does not provide proper version management, workflow management, interfaces
to other systems, etc. It provides the necessary requirements related to moving pieces of the EHR
from one system to another. In summary, OpenEHR allows data to be defined more precisely thanks
to its richer data structures and data types. To transform OpenEHR archetypes into UNE-EN 13606
and vice versa, a technological solution similar to the one adopted in the ADL to OWL
transformation has been used, which allows us to reuse some previous work. The transformation
mechanism starts with the ontological representation of the archetype obtained as explained in
Section 2. An integrated ontology has been developed to be used in this process, and
correspondences between the specific standards and the integrated one have been established.
Figure 2 describes the schema of our solution. There, solid arrows represent the correspondences
between the three ontologies and dashed arrows show the possible archetype transformations. The
transformations carried out between the three ontologies take place in the MDE TS, but this has
been omitted in the figure for the sake of simplicity. At present, the proposed methodology has
been applied to the OpenEHR to UNE-EN 13606 transformation, but in the near future it will also
be applied to the opposite transformation.
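
The use of an integrated ontology as a pivot can be sketched with two correspondence tables that are composed to translate a node tree. The class names on both sides come from the two reference models, but the integrated concepts (ClinicalStatement, DataGroup, DataItem) and the correspondences themselves are hypothetical choices for this example, not the mappings defined by the authors.

```python
# Hypothetical correspondences: OpenEHR class -> integrated ontology concept.
OPENEHR_TO_INTEGRATED = {
    "OBSERVATION": "ClinicalStatement",
    "ITEM_TREE": "DataGroup",
    "ELEMENT": "DataItem",
}

# Hypothetical correspondences: integrated ontology concept -> EN 13606 class.
INTEGRATED_TO_EN13606 = {
    "ClinicalStatement": "ENTRY",
    "DataGroup": "CLUSTER",
    "DataItem": "ELEMENT",
}


def transform_class(openehr_class):
    """Translate one class name by going through the integrated ontology."""
    integrated = OPENEHR_TO_INTEGRATED[openehr_class]
    return INTEGRATED_TO_EN13606[integrated]


def transform_node(node):
    """Recursively translate a simplified archetype node tree."""
    return {
        "class": transform_class(node["class"]),
        "name": node["name"],
        "children": [transform_node(child) for child in node.get("children", [])],
    }


openehr_archetype = {
    "class": "OBSERVATION",
    "name": "blood pressure",
    "children": [
        {
            "class": "ITEM_TREE",
            "name": "data",
            "children": [{"class": "ELEMENT", "name": "systolic"}],
        }
    ],
}

print(transform_node(openehr_archetype))
```

With a pivot, adding a further standard needs one new pair of correspondence tables instead of one per existing standard.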

Fig. 2: Ontology-based archetype transformation process.

4. Clinical archetypes management
Concerning archetype management, a semantic system called Archetype Management System
(ArchMS) [19] has been designed. The objective of the system is to support the execution of
clinical, semantic activities over archetypes. ArchMS is built on the idea of a virtual archetype
repository for dual-model based EHR standards, whose basic unit is the archetype. It is capable of
working with UNE-EN 13606 and OpenEHR archetypes, providing the same functionality for both
standards. Archetypes can be semantically annotated in the system, so that these metadata can
be used to support semantic searches and comparisons. Annotating in ArchMS means adding
semantic metadata to archetypes. This semantic metadata can be associated with a complete
archetype or with one of its terms, which allows annotations of different granularity. To define an
annotation a classifier resource is needed. This has to be an OWL ontology and can be a domain
ontology, a terminology and so on. Any OWL resource can be used for this purpose (see Figure 3).

Fig. 3: Annotating in ArchMS.

Semantic annotations allow searching for archetypes that hold some specific properties, that is,
exploiting the repository along its different dimensions. The system can also suggest
annotations for new archetypes, since similar archetypes can be searched for. Search
mechanisms make use of semantic similarity functions for this purpose. In general, two main
searches can be performed: for similar archetypes, and for archetypes holding some properties.
The global similarity search looks for archetypes similar to a given one by doing semantic
comparisons in the context of the archetype ontology available for the particular standard.
Archetypes are instances of that ontology, so instance comparison mechanisms are used. These
mechanisms take into account the following categories: conceptual proximity, property similarity,
annotation similarity and linguistic proximity. Alternatively, users can search for archetypes
holding some properties. These can be either definitional properties or annotation properties.
For the former, we might be looking for archetypes written in English, or archetypes including an
element measured in a certain unit. For the latter, we might be looking for archetypes
related to a particular disease, such associations being established through a classifier
resource.
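
One simple way to combine the four categories into a global similarity is a weighted average of per-category scores. The sketch below assumes this combination, with arbitrary weights and a Jaccard measure for annotation sets. The text does not specify the functions used in ArchMS, so every name and number here is an assumption.

```python
# Assumed weights for the four categories named in the text.
WEIGHTS = {
    "conceptual": 0.4,
    "property": 0.3,
    "annotation": 0.2,
    "linguistic": 0.1,
}


def jaccard(first, second):
    """Set overlap in [0, 1]; used here as a stand-in for annotation similarity."""
    first, second = set(first), set(second)
    if not first and not second:
        return 1.0
    return len(first & second) / len(first | second)


def global_similarity(scores, weights=WEIGHTS):
    """Weighted average of the per-category scores (each expected in [0, 1])."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


def rank_similar(candidates, weights=WEIGHTS):
    """candidates: archetype name -> per-category scores; most similar first."""
    return sorted(
        candidates,
        key=lambda name: global_similarity(candidates[name], weights),
        reverse=True,
    )


candidates = {
    "blood_pressure": {
        "conceptual": 0.9,
        "property": 0.8,
        "annotation": jaccard({"hypertension", "vital_sign"}, {"vital_sign"}),
        "linguistic": 0.7,
    },
    "medication_list": {
        "conceptual": 0.2,
        "property": 0.3,
        "annotation": jaccard({"hypertension"}, {"allergy"}),
        "linguistic": 0.1,
    },
}

print(rank_similar(candidates))
```
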
5. Conclusions
Providing an OWL representation for archetypes allows semantic activities such as comparison,
classification, selection or consistency checking to be carried out more efficiently. Here, an
overview of our work towards semantic interoperability between archetypes has been presented.
The first step was the design and implementation of a methodology for the transformation of ADL
archetypes into semantic archetypes expressed in OWL. This methodology has been applied to
two dual-model based EHR clinical standards: UNE-EN 13606 and OpenEHR.
After that, a similar technological solution has been applied for transforming OpenEHR archetypes
into UNE-EN 13606 archetypes and vice versa. Finally, the ArchMS system for annotating
archetypes and performing different types of semantic searches has been presented.

References
1. Blobel, B.: Advanced EHR architectures - promises or reality. Methods of Information in Medicine
45(1) (2006) 95-101
2. HL7:
3. OpenEHR:
4. UNE-EN13606:
5. Beale, T.: Archetypes and the EHR. Stud Health Technol Inform 96 (2003) 238-244
6. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American May (2001) 29-37
7. Schulz, S., Hahn, U.: Part-whole representation and reasoning in formal biomedical ontologies.
Artificial Intelligence in Medicine 34(3) (2005) 179-200
8. Smith, B.: From concepts to clinical reality: An essay on the benchmarking of biomedical
terminologies. Journal of Biomedical Informatics 39(3) (2006) 288-298
10. Fernandez-Breis, J., Vivancos-Vicente, P., Menarguez-Tortosa, M., Moner, D., Maldonado, J.,
Valencia-Garcia, R., Miranda-Mena, T.: Using semantic technologies to promote interoperability
between electronic healthcare records' information models. In: Proc. 28th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society EMBS '06. (Aug. 30-Sept. 3,
2006) 2614-2617
11. Kurtev, I., Bezivin, J., Aksit, M.: Technological spaces: an initial appraisal. In: CoopIS, DOA'2002
Federated Conferences, Industrial track. (2003)
12. Martinez-Costa, C., Menarguez-Tortosa, M., Fernandez-Breis, J., Maldonado, J.: A model-driven
approach for representing clinical archetypes for semantic web environments. J. of Biomedical
Informatics 42(1) (2009) 150-164
13. ADL-Parser:
14. EMF:
15. OMG: Ontology metamodel definition specification.
doc?ad/2006-05-01.pdf (2006)
16. Protege:
17. Sanchez-Cuadrado, J., Garcia-Molina, J., Menarguez-Tortosa, M.: RubyTL: A practical, extensible
transformation language. In: ECMDA-FA. (2006) 158-172
18. MOFScript:
19. Fernandez-Breis, J.T., Menarguez-Tortosa, M., Martinez-Costa, C., Fernandez-Breis, E.,
Herrero-Sempere, J., Moner, D., Sanchez, J., Valencia-Garcia, R., Robles, M.: A semantic web-based
system for managing clinical archetypes. Conf Proc IEEE Eng Med Biol Soc 2008 (2008) 1482-1485

Standardized Access Policies for the EHR

David Moner(1), Montserrat Robles(1), José-Alberto Maldonado(1), Diego Boscá(1), Carlos Angulo(1),
Ernesto Reig(1), Luis Marco(1)
IBIME group, ITACA Institute, Universidad Politécnica de Valencia,
Camí de Vera S/N, 46022 Valencia, Spain

Privacy of personal health information is the target of many efforts by Health Information Systems
administrators. However, every person also has the right to access and control the security rules
applied to his or her own information. In this work we propose a framework for the definition of
access policies intended for use by the legal owners of the data: the patients. At the same time,
the framework guarantees some degree of decision and control to other levels of responsibility: the
organization that is custodian of the information and the health professionals responsible for
generating and incorporating the clinical data of a patient. Finally, this framework is based on
the CEN EN13606 standard to ensure the interoperability of the defined access policies.
Keywords: Health Information Systems, Standardization, Electronic Health Record, Archetype, Security,
Access Policy

1. Introduction
Security and privacy of health information has become one of the most relevant aspects of Health
Information Systems (HIS). Sharing information among different health institutions is a daily reality
and must be protected. At the same time, awareness of the value of our own data and of
the rights we have over it is increasing. Different laws and regulations on this matter have
appeared during the last decade, at national and international levels. We can highlight two main
international regulation frameworks: Directive 95/46/EC on the processing and
communication of personal data in the European Union [1] and the Health Insurance Portability
and Accountability Act (HIPAA) of the United States of America [2]. They define which
information about health status, provision of health care, or payment for health care can be linked
to an individual and thus has a special protection status.
Another group of applicable laws concerns the protection of personal data and the rights
that every person has over his or her own information. Basically, these laws provide a way in which
individuals can enforce control over information about themselves: the right of access, the right
to have factually incorrect information corrected and the right to delete the stored information.
We need to define a common technical framework to make all these rights effective in a simple and,
preferably, standardized manner. This will enable true patient empowerment in the definition of
access rules to their own clinical information.

2. CEN EN13606 Access Policies

The CEN EN13606 norm [3] is a five-part standard developed by the European Committee for
Standardization (CEN) intended for the communication and semantic interoperability of EHR
extracts among heterogeneous HIS. Parts 1 and 2 have been developed in order to represent the
clinical information of the EHR. This is achieved by the combination of the information reference
model (CEN EN13606-1) and archetypes for the definition of high level clinical concepts (CEN
EN13606-2). Part 4 of the standard is about security in the communication of the EHR. In this
context, security refers to access policies and audit logs. Other issues, such as secure
communications, encryption or anonymization of data, are not covered by this norm.

The Access Policy Archetype
We call the representation of the access rules to the EHR information in the form of an EN13606
archetype an Access Policy Archetype (APA). This archetype uses the EN13606-1 reference
model to build a structure, divided into several sections, which defines the rules or access rights
to the EHR information. These rules include the admitted or prohibited requester individuals, roles
or organizations (the request specifications section), the designation of access rights to specific
EHR nodes or time intervals (the EHR target section) and specific access rights for creation,
revision or communication of information (the access rules section).
One interesting characteristic of an APA is that it is readable by people and, at the same time,
easy for a computer system to interpret automatically.
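
The three sections of an APA can be pictured as a small data structure. The sketch below is a simplified illustration and does not follow the EN13606-4 information model: the class names, fields and the permits check are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class RequestSpecification:
    """Who: an admitted or prohibited person, role or organization."""
    kind: str          # "person", "role" or "organization"
    identifier: str
    admitted: bool = True


@dataclass
class EHRTarget:
    """What: an EHR node, optionally restricted to a time interval."""
    node_path: str
    start: Optional[date] = None
    end: Optional[date] = None


@dataclass
class AccessRule:
    """How: whether creation, revision or communication is allowed."""
    action: str        # "create", "revise" or "communicate"
    allowed: bool


@dataclass
class AccessPolicy:
    requesters: List[RequestSpecification] = field(default_factory=list)
    targets: List[EHRTarget] = field(default_factory=list)
    rules: List[AccessRule] = field(default_factory=list)

    def permits(self, kind, identifier, node_path, action, when):
        who = any(
            r.admitted and r.kind == kind and r.identifier == identifier
            for r in self.requesters
        )
        what = any(
            t.node_path == node_path
            and (t.start is None or t.start <= when)
            and (t.end is None or when <= t.end)
            for t in self.targets
        )
        how = any(r.action == action and r.allowed for r in self.rules)
        return who and what and how


policy = AccessPolicy(
    requesters=[RequestSpecification("role", "GP")],
    targets=[EHRTarget("discharge_report/diagnosis", start=date(2008, 1, 1))],
    rules=[AccessRule("communicate", True), AccessRule("revise", False)],
)
print(policy.permits("role", "GP", "discharge_report/diagnosis",
                     "communicate", date(2009, 1, 25)))
```
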
3. Definition of Standardized Access Policies
Currently, the CEN EN13606-4 norm is just a specification for the communication of access
policies. Our framework proposes to use APAs not only for transmission but also for the definition
of new access policies. The benefit of this approach is that a patient, clinician or
organization can define their own access policies. With an adequate set of tools, this will
empower users in the definition of these policies. We have divided the definition of APAs into
three consecutive steps, depending on the level of responsibility over the information of each
participant. A global view of the APA definition path can be seen in Figure 1.

3.1. Organizational Access Policies

Health organizations are usually responsible for maintaining most of our clinical information and,
therefore, for maintaining its security and privacy. At this point, we will suppose that this
information is standardized or can be standardized according to the CEN EN13606 norm. In that
scenario, all our clinical information is represented and can be accessed through a set of existing
archetypes. We call them Clinical Archetypes (CA), which are different from the APAs that we will
define. Not all the information existing in an organizational HIS is clinically relevant. There may
be documents or reports that are private to the organization or are just bureaucratic forms. These
documents might be archetyped just like the other clinical information. For the purposes of EHR
communication, this kind of information can be safely hidden.
An Organizational Access Policy Archetype (O-APA) defines which CA of all the existing ones in the
organization can be used for the communication of information. In other words, it selects which
groups of information are clinically relevant for the construction of the EHR of a patient.

3.2. Clinical Access Policies

The next step is to define a Clinical Access Policy Archetype (C-APA) for each CA which has been
approved by the organization. The aim of these policies is to ensure that the health professionals
who have introduced the clinical information can hide subjective information, opinions or
comment fields which are not clinically relevant.
A C-APA is defined by browsing the structure of a CA and selecting which of its nodes can
be made public and which should stay private or restricted.

3.3. Personal Access Policies

The third and final step of the definition of access policies represents the true patient
empowerment over access policies. A Personal Access Policy Archetype (P-APA) is the set of rules
and access rights to the EHR of a single person. A P-APA is a relation of who can access what.
The “who” can be a role, a specific person or a specific organization. The “what” can be the
remaining nodes of the CAs that have been approved by the organization and the clinicians in
the two previous steps, that is, the information remaining after applying O-APAs and C-APAs. It can
also be specific instances of data from the EHR, selected by date, originating health service or
any other parameter.
For example, a person could hide a section or an entire archetype from the “nurse” role, or from a
particular GP. Or he or she could hide the discharge report of a particular encounter on a
particular date.
The definition of P-APAs is done by generating configuration screens with cross tables of
archetype nodes or archetype instances in combination with roles, people or organizations. In
these tables, the EHR owner can freely approve or deny access to the information nodes, thus
generating a P-APA for each CA.
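
A cross table of this kind maps naturally onto a nested dictionary. The following sketch is a hypothetical illustration: the node names, the roles, the allow-by-default choice for listed cells and the deny-by-default choice for unknown cells are all assumptions, not part of the framework description.

```python
def build_p_apa(nodes, requesters, denied):
    """Build the cross table: every listed cell starts as allowed, then the
    (node, requester) pairs chosen by the EHR owner are switched to denied."""
    table = {node: {requester: True for requester in requesters} for node in nodes}
    for node, requester in denied:
        table[node][requester] = False
    return table


def can_access(p_apa, node, requester):
    """Cells that are not in the table are treated as denied (an assumption)."""
    return p_apa.get(node, {}).get(requester, False)


nodes = ["diagnosis", "medication", "social_history"]
requesters = ["GP", "Nurse", "Clinician"]
p_apa = build_p_apa(nodes, requesters, denied=[("social_history", "Nurse")])

for node in nodes:
    print(node, [can_access(p_apa, node, r) for r in requesters])
```

The same structure works for the table by data instance if the row keys are instance identifiers or dates instead of archetype nodes.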

[Figure 1 content: the normalized HIS EHR data pass through three consecutive data filters.
Organizational Access Policy: selection of archetypes (CA1 … CAn), giving the O-APA set.
Clinical Access Policy: selection of nodes of each CA, giving the C-APA set.
Personal Access Policy: selection of access rights by archetype node and by data instance
(e.g. 2008/05/16, 2009/01/25) for GP, Nurse and Clinician, giving the P-APA set.
APA: Access Policy Archetype. CA1..n: Clinical Archetypes.]

Fig. 1: Global architecture for the standardized access policy framework.

4. Application of Standardized Access Policies
Once we have defined the different sets of policies (O-APA, C-APA and P-APA), they must be
applied correctly. The APA application path is represented in Figure 1. When an EHR
access request arrives at our EHR system, the different policies are evaluated consecutively,
filtering the data that will finally be communicated. First, the system checks that the requested
archetypes are approved by the O-APA set. Then, the subjective information indicated in the C-APAs
is hidden. Finally, the remaining nodes or data instances are filtered according to the role or
identity of the requester. The result is an EHR extract containing only information approved by
all levels of responsibility.
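
The consecutive application of the three policy sets can be sketched as a filter chain. The data shapes below (a set of approved archetypes, a set of hidden nodes per archetype, and a table keyed by archetype, node and role) are assumptions for the example and not the representation used in the framework.

```python
def apply_policies(request, ehr, o_apa, c_apa, p_apa):
    """Return the part of the EHR that survives the three consecutive filters."""
    extract = {}
    for archetype in request["archetypes"]:
        # Step 1: the archetype must be approved by the organization (O-APA).
        if archetype not in o_apa:
            continue
        # Step 2: nodes marked as hidden by the clinicians (C-APA) are removed.
        hidden = c_apa.get(archetype, set())
        visible = {n: v for n, v in ehr.get(archetype, {}).items() if n not in hidden}
        # Step 3: the patient's rules (P-APA) decide per node and requester role.
        allowed = {
            n: v
            for n, v in visible.items()
            if p_apa.get((archetype, n, request["role"]), False)
        }
        if allowed:
            extract[archetype] = allowed
    return extract


ehr = {
    "discharge_report": {"diagnosis": "I10", "comments": "subjective note"},
    "billing_form": {"amount": "120"},
}
o_apa = {"discharge_report"}                       # approved archetypes
c_apa = {"discharge_report": {"comments"}}         # nodes hidden by clinicians
p_apa = {("discharge_report", "diagnosis", "GP"): True}

request = {"role": "GP", "archetypes": ["discharge_report", "billing_form"]}
print(apply_policies(request, ehr, o_apa, c_apa, p_apa))
```

Because each step only removes data, the order of the filters does not change the final extract here, but applying the organizational filter first avoids evaluating finer rules on archetypes that will never be communicated.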

5. Conclusions
We have presented a standard-based framework for the definition of interoperable and
patient-empowered access policies. The consecutive refinement of APAs from the organization to the
patient who owns the data ensures that every level of responsibility can include its own opinions,
rules or limitations. The use of standard representations makes it possible to refine access
policies to the desired level of complexity by using automatically generated administration forms.
Moreover, the use of archetypes and standardized clinical information allows access policies to be
communicated among different health institutions. Some issues remain to be solved in order to
implement this framework adequately. A common definition of identifiers for roles, professionals
and organizations must exist. An ontology representing the different access policy concepts, such
as “allow”, “disallow”, “read”, “write”, etc., is also needed. Finally, we will also require some
mechanism which can guarantee that the health information remaining after applying the APAs is
still a clinically valid EHR extract.

Acknowledgements
This work was supported in part by the Spanish Ministry of Education and Science under Grant
TSI2007-66575-C02-01 and by the Generalitat Valenciana under grant APOSTD/2007/055.

References
1. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the
protection of individuals with regard to the processing of personal data and on the free movement
of such data. Official Journal of the European Communities, Nº L281, 23-11-1995.
2. Health Insurance Portability and Accountability Act of 1996, Public Law 104-191, (1996).
3. European Committee for Standardization: Health informatics - Electronic health record
communication. EN13606. (2006)



Raimundo Lozano(1,), Xavier Pastor(1) , Esther Lozano(2)
Medical Informatics. Hospital Clínic. Barcelona. Spain
Technical Engineering High School. Autonomous University. Barcelona. Spain.

Explicit representation and management of knowledge has been proposed as a key element in solving
some of the current challenges in health informatics. In parallel, knowledge management tools and
methodologies have advanced considerably in recent years, and some of them are now mature enough
to face implementation in real environments.
OntoDDB is a knowledge-management-based framework for the definition and modeling of data
repositories. All required information is gathered into an ontology in OWL format. The user
interface is built automatically on the fly in the form of web pages, whereas data are stored in a
generic repository. This allows immediate deployment and population of the database, and makes any
modification available online at once.
Keywords: Ontology, Clinical repositories, knowledge management

1. Introduction
Health Sciences in general, and Medicine in particular, are sciences based upon information and
communication. A large part of clinical practice and research consists of gathering, summarizing
and using information that, properly integrated with clinical knowledge, constitutes the basis for
decision support and the generation of new knowledge. Nevertheless, in spite of great advances in
the information and communication technologies (ICT) domain in recent years, progress in Medical
Informatics is slower than predicted and clinical information systems are failing to provide true
support for clinicians' needs [1,2].
Many causes have been identified to explain this situation and, as expected in a domain so closely
related to knowledge, several authors point toward knowledge management as the way to solve the
open challenges in Medical Informatics [3]. Following the track data → information → knowledge, it
is possible to raise the abstraction level when looking at reality and to open new horizons for
progress in the application of ICT to clinical information processing.
Recent advances make it possible to envisage a way out of the impasse in Medical Informatics. One
of them is the use of ontologies in the field of knowledge representation. Although there are not
many examples of real implementations yet, there is hardly any research area of Artificial
Intelligence that does not use them [4–7], and their presence is more and more common in the
healthcare field [8].
Another is the progress in standardization. Now that some more “technical” standards, such as
HL7 2.X or DICOM, are well established in many installations, standards in medical vocabularies
like SNOMED CT [9] and in information models [10] are beginning to establish the foundations for
information

Corresponding author: Raimundo Lozano, Informática Médica, Hospital Clínic de Barcelona, 08036 Barcelona, Spain.
Tel.: +34 93 227 92 06; E-mail:


interchange and reuse, giving full meaning to knowledge management as the engine that empowers
the Health Informatics scenario.

OntoDDB was designed with the aim of advancing along the path of explicit knowledge management,
as a proof of concept with the following premises:
• Clear-cut separation between information and knowledge.
• Formal declaration of knowledge about the information to be stored as data, making it
• Capability to manage explicit knowledge.
• Capability to build knowledge-driven systems.
• Greater abstraction level which allows a greater expressiveness level when building
The chosen context is a real case of clinical information usage. The Hospital Clinic of Barcelona
(HCB) carries out a large amount of biomedical research with a high impact factor in the scientific
literature [11]. In recent years, a research project can hardly be conceived without ICT support
of some kind. Obviously, each project has specific requirements regarding the subject and
objectives of the research, but there is always a set of common requirements related to
information systems support: all projects need to enter and store structured data for later
analysis, in a distributed access context and with a high rate of knowledge turnover.
2. Objectives
OntoDDB is a system designed to build clinical data repositories for research, based on knowledge
engineering techniques.
In recent years, systems for gathering clinical data for research purposes have been built using a
multi-tier architecture composed of a centralized database, an application server and a web server
providing the user interface. Although this architecture meets the basic requirements of this kind
of project, it has some disadvantages. First of all, it is costly. The investment in the
development of a medium-size application easily reaches several tens of thousands of euros.
Moreover, the maintenance cost is very high too, because any modification implies changes to the
database, the web pages and the application. Secondly, the investment is made for a very short
period of time. Research projects typically last 2-3 years, and this time has to include the
development phase of the system, which tends to be long. Thirdly, this classical approach requires
a very specialized team of computer technicians, which opens a big gap between the biomedical
researcher and the development team and hinders communication between functional and technical
people. And last but not least, this kind of approach within an organization produces great
heterogeneity among the different applications devoted to research projects, without any reuse of
components.
The general objective of OntoDDB is to test the capabilities of a system driven by knowledge
management to meet the requirements for gathering data in a research context while avoiding the
disadvantages of the more traditional approaches described above. The underlying idea is that, by
making explicit the usually implicit knowledge about the systems, it is possible to raise the
abstraction level in the design and management phases, allowing an easier and cheaper


construction and maintenance of these systems without losing functionality, while lengthening
their life cycle.
The system should achieve some specific objectives:
• To allow the data model specification by means of an ontology
• To allow the user interface specification by means of an ontology
• To get automatic data storage and user interface
• To allow data model and user interface modifications in real time
• To allow extracting data for analysis purposes
• To allow distributed access to the system
3. Methods
The core of OntoDDB is the use of ontologies for knowledge representation. They are stored in a
database accessible by the rest of the system.
OntoDDB is composed of four modules and a metamodel:
• A module for editing the ontology
• A module for storage of data and ontologies
• The metamodel of the system
• A module for building the user interface
• A module for data extraction

3.1. Ontology edition

The ontology is edited with Protégé 2000® [12]. Protégé® is a well-recognized standard tool for
ontology editing, with more than 60,000 registered users all around the world. A very interesting
characteristic of Protégé® is its extensibility: new functionality can be added to the tool
through plug-ins. We have developed the OWL-DB Plug-in, which connects Protégé® with the OntoDDB
storage module. This plug-in uses Jena [13] to manage OWL statements, since OWL is a format
supported by Protégé®. This development overcomes the storage limitations of Protégé®.

3.2. Storage of data and ontologies

The OWL-DB module was developed for storage. It is a relational database designed according to the
OWL specification [14]. This approach has all the advantages of Entity-Attribute-Value (EAV)
systems [15,16]. This means that there are no specific tables to store patient data, lab data and
so on, but abstract tables representing the elements of the OWL specification (resource, class,
property, domain, range, etc.) and tables that store the values of instances by data type
(integer, float, string, etc.). With this approach, neither the design nor the structure of the
database needs to be changed, regardless of the specific data model used for a particular
application.
Other applications can interact with OWL-DB using an API built with stored procedures. Information
is added to an ontology by adding new statements to the statements table. A set of stored
procedures is responsible for propagating this information to the rest of the tables in the
database. In this way, any application that manages OWL statements (for example, ontology editing
tools) can be connected to OWL-DB.
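
The generic storage idea can be illustrated with a minimal sketch using Python's standard
`sqlite3` module. The schema below is hypothetical and much reduced: the table and column names
are illustrative and are not taken from OWL-DB, and because SQLite has no stored procedures, a
plain Python function (`add_statement`, also an invented name) stands in for them.

```python
import sqlite3

# Hypothetical, much-reduced schema; names are NOT those of OWL-DB.
SCHEMA = """
CREATE TABLE statement (subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE class (name TEXT PRIMARY KEY);
CREATE TABLE string_value (instance TEXT, property TEXT, value TEXT);
CREATE TABLE integer_value (instance TEXT, property TEXT, value INTEGER);
"""


def add_statement(db, subject, predicate, obj):
    """Store one statement and propagate it to the derived tables.

    This function plays the role that stored procedures play in the
    description above (an assumption made for the sake of the sketch).
    """
    db.execute(
        "INSERT INTO statement VALUES (?, ?, ?)", (subject, predicate, str(obj))
    )
    if predicate == "rdf:type" and obj == "owl:Class":
        db.execute("INSERT OR IGNORE INTO class VALUES (?)", (subject,))
    elif isinstance(obj, int):
        db.execute(
            "INSERT INTO integer_value VALUES (?, ?, ?)", (subject, predicate, obj)
        )
    elif predicate != "rdf:type":
        db.execute(
            "INSERT INTO string_value VALUES (?, ?, ?)", (subject, predicate, obj)
        )


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# A new class or a new attribute needs new rows only, never a new table.
add_statement(db, "Patient", "rdf:type", "owl:Class")
add_statement(db, "patient_1", "rdf:type", "Patient")
add_statement(db, "patient_1", "age", 54)
add_statement(db, "patient_1", "diagnosis", "Budd-Chiari syndrome")

classes = [row[0] for row in db.execute("SELECT name FROM class")]
age = db.execute(
    "SELECT value FROM integer_value WHERE instance = ? AND property = ?",
    ("patient_1", "age"),
).fetchone()[0]
statement_count = db.execute("SELECT COUNT(*) FROM statement").fetchone()[0]
print(classes, age, statement_count)
```

The point of the sketch is that the four tables stay the same whatever data model the ontology
describes; only their rows change.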


To extract information from the database, a set of functions gives access to the subclasses,
properties and instances of a class, the domain and range of properties, values of instance properties,

3.3. The Metamodel

OntoDDB-MM, the OntoDDB metamodel, defines the set of classes and properties representing the
system's own elements: application entities, auxiliary entities, menu elements, etc.
OntoDDB-MM is represented in Figure 1.

Fig. 1: OntoDDB-MM.

OntoDDB-MM is composed of the following elements:

• Metaclasses
  o Class_root: a subclass of owl:Class. It is used to define the entry points in the
• Metaproperties
  o webDataProperty: a subclass of DatatypeProperty used to collect the information needed to
    manage data type properties. It introduces the following facets:
    ▪ webColumn: window column in which to show the property.
    ▪ webRow: window row in which to show the property.
    ▪ webDescriptionProperty: a flag to mark properties that are part of the description of the
      corresponding object and are shown in the headers, etc.
    ▪ webIdProperty: a flag to mark properties that constitute the Id of the corresponding
      object.


    webDataProperty has two subclasses:

    ▪ webLiteralProperty: to represent fixed text.
    ▪ WebMultilineStringProperty: to represent multiline strings.
  o webObjectProperty: a subclass of ObjectProperty used to collect the information needed to
    manage object properties in the application. It introduces the following facets:
    ▪ webColumn: the same as above.
    ▪ webRow: the same as above.
    ▪ webDescriptionProperty: the same as above.
    ▪ webIdProperty: the same as above.
    ▪ webDirectlyDependent: to identify dependent objects. The objects that are values of these
      properties cannot exist without the object that has this
• Classes
  o Application_item: the classes of the application have to be subclasses of this one.
  o ValidationState: represents the different validation states allowed.
  o Data_extraction: defines data extractions. It has the following properties:
    ▪ main_class: the class to export.
    ▪ class_properties: the properties to go inside recursively.
• Other properties
  o validationProperty: defined as a property of the class Thing; it allows the validation state
    of any object to be specified.

3.4. Building the user interface

The user interface is a web portal that accesses OWL-DB directly, extracts the specific ontology
and builds the web pages dynamically, on the fly, using the information derived from the
metamodel. It is at this point that the explicit knowledge about the application, expressed in the
ontology, is used. We have avoided a heavily hard-coded interface and achieved a dynamic one that
reflects the ontology, so any conceptual modification made there is automatically reflected in the
interface.
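
As an illustration of how layout facets could drive page generation, the following minimal Python
sketch renders an HTML form from a list of properties. It is not the OntoDDB implementation: the
`WebDataProperty` dataclass, the `render_form` function and the generated markup are hypothetical,
and only the facet names webRow and webColumn come from the metamodel described above.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List


@dataclass
class WebDataProperty:
    """Hypothetical in-memory view of a property and its layout facets."""
    name: str
    web_row: int
    web_column: int
    multiline: bool = False  # stands in for WebMultilineStringProperty


def render_form(class_name: str, properties: List[WebDataProperty]) -> str:
    """Build an HTML form whose layout follows the webRow/webColumn facets."""
    rows: Dict[int, List[str]] = {}
    ordered = sorted(properties, key=lambda p: (p.web_row, p.web_column))
    for prop in ordered:
        name = escape(prop.name)
        if prop.multiline:
            field = f'<textarea name="{name}"></textarea>'
        else:
            field = f'<input name="{name}">'
        rows.setdefault(prop.web_row, []).append(f"<td>{name}: {field}</td>")
    body = "\n".join(
        "<tr>" + "".join(cells) + "</tr>" for _, cells in sorted(rows.items())
    )
    return f'<form id="{escape(class_name)}"><table>\n{body}\n</table></form>'


page = render_form(
    "Patient",
    [
        WebDataProperty("notes", web_row=2, web_column=1, multiline=True),
        WebDataProperty("surname", web_row=1, web_column=2),
        WebDataProperty("name", web_row=1, web_column=1),
    ],
)
print(page)
```

Because the form is derived from the property list on every call, changing a facet value changes
the page without touching the rendering code, which is the behavior the section describes.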

3.5. Data extraction

The data extraction module allows periodic extractions of stored data for analysis. Which data
have to be extracted is defined in the ontology itself. This module is composed of two parts. The
first one is a metamodel class, Data_extraction, which defines data extractions. This class has
two properties, main_class and class_properties, and should be instantiated to define which class
of the application has to be extracted and the way to extract the values of its properties. The
range of the main_class property is owl:Class and the range of class_properties is
WebObjectProperty. When an instance of Data_extraction is created, the value of main_class is the
class whose instances have to be extracted. Some of the properties of these instances are
expected to be of predefined data types, and others of object type, pointing to instances of
other classes which in turn have their own properties, and so on. Two methods are provided to
extract information from the latter: extracting only the identifier of the object, or extracting
all the information in the object. For properties that appear as values of class_properties, the
module goes inside their instances to extract all


the properties of the corresponding object recursively. For the rest of them, only the identifier
of the object is extracted.
The other part of the module is a small application which accesses OWL-DB once the connection
parameters have been defined. It interprets the instances of the Data_extraction class and creates
a set of text files in CSV format, one for each instance of Data_extraction, containing the
intended data. This module goes inside the properties appearing as values of class_properties to
extract all the properties of the corresponding object recursively. For the rest of the
properties, their value is extracted, which in the case of object properties is the identifier of
the object acting as value. These files can be imported into a conventional relational database
to be analyzed.
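
The two extraction methods (identifier only, or recursive expansion) can be sketched in a few
lines of Python. The instance store, the `flatten` and `extract_csv` functions and the dotted
column names are all hypothetical; only the roles of main_class and class_properties follow the
description above.

```python
import csv
import io

# Hypothetical instance store: instance id -> {property: value}.
# Object-valued properties hold the id of another instance.
INSTANCES = {
    "patient_1": {"id": "P001", "age": 54, "diagnosis": "dx_1", "hospital": "hosp_1"},
    "dx_1": {"id": "D010", "label": "Portal venous thrombosis"},
    "hosp_1": {"id": "H001", "city": "Barcelona"},
}


def flatten(instance_id, class_properties, prefix=""):
    """Return one flat row; recurse only into properties in class_properties."""
    row = {}
    for prop, value in INSTANCES[instance_id].items():
        column = prefix + prop
        if value in INSTANCES:  # object property
            if prop in class_properties:
                # Go inside the object and extract all of its properties.
                row.update(flatten(value, class_properties, column + "."))
            else:
                # Extract only the identifier of the object.
                row[column] = INSTANCES[value]["id"]
        else:
            row[column] = value
    return row


def extract_csv(main_class_instances, class_properties):
    """Write one CSV text with a row per instance of the main class."""
    rows = [flatten(i, class_properties) for i in main_class_instances]
    columns = sorted({column for row in rows for column in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


row = flatten("patient_1", class_properties={"diagnosis"})
text = extract_csv(["patient_1"], class_properties={"diagnosis"})
print(text)
```

Here `diagnosis` is expanded into its own columns while `hospital` is reduced to its identifier.
The sketch assumes that the object graph has no cycles; a real extractor would need a guard
against them.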

3.6. Requirements
The hardware and software requirements to run OntoDDB are the following:
• Server: a simple PC with up to 2 GB of RAM and 40 MB of hard disk is enough. The operating
  system does not matter, but a Java Virtual Machine is needed.
• Database server: tested on MS SQL Server®, Sybase® and Oracle®; no problems are envisaged with
  other database management systems.
• Application server: Apache Tomcat® 5.5 or higher.
• Ontology edition: a PC with Protégé 2000® + OWL-DB Plug-in.
• Client: MS Internet Explorer® 6 or higher.
4. Results
In order to evaluate the suitability of OntoDDB, we have used it in a real research project. VALID
is a project that gathers clinical data from patients affected by Budd-Chiari syndrome or portal
venous thrombosis in order to advance knowledge about these diseases. It is managed and financed
by a French research group on liver vascular diseases (“Centre de Référence des Maladies
Vasculaires Du Foie”).
The VALID project derives from a bigger European-funded project called EN-Vie [17], which has been
using a traditional architecture for its information system. Although somewhat less data are
intended to be gathered in VALID than in EN-Vie, the functional requirements are very similar.
This fact allows us to compare the behavior of both systems.
Storing the more than 300 different items defined in VALID would require building a traditional
database with around 40 tables. With OntoDDB, all data model and storage requirements were
covered. An ontology consisting of 60 classes represents both the data model and the user
interface. Some additional work was needed only to fulfill the requirement of showing several
calculated fields on the web pages, a functionality that is not available in this version of
OntoDDB.
This project has been running in production for a year. Around 75 cases have been entered into
the system, which means more than 20,000 data items, and the first data extraction has been
performed without any special difficulties.
We are now using OntoDDB in a new project, currently in the testing phase, with the same good
results. In both projects, the flexibility provided by the system allowed us to have prototypes
available from the first moment, which is a very valuable resource for working closely with the
physicians. From the very beginning of the project, key users had something to work with, and it
was even possible to make online modifications and check the results immediately.


5. Discussion
The use of OntoDDB has several advantages. First of all, the application development phase
practically disappears, leaving only analysis and design, with prototypes available from the very
beginning. This implies a very important drop in cost and time. Maintenance is also less
expensive, not only for the reason stated above but especially because modifications are very easy
to make.
On the other hand, as the differences between applications are reduced to their conceptual model,
the same infrastructure can be shared, taking advantage of economies of scale; some elements or
models can be reused; and homogeneous criteria can be established inside an organization.
There are some more conceptual advantages derived from the use of ontologies and standards. An
ontological analysis of an application moves the focus of attention to a higher abstraction level
and concentrates on the domain aspects, helping researchers to clarify the implicit knowledge
structure. The use of standards like OWL makes the interchange and reuse of models easier.
The metamodel of OntoDDB has no capability for process representation, and for the moment it is
not possible to manage explicit knowledge related to processes. This weakness was well illustrated
by the impossibility of representing calculated fields.
So far OntoDDB has been used only in the clinical environment, but the model is totally
independent of the domain, so it would be suitable for gathering data in any context.
6. Future work
The first version of OntoDDB has served as a proof of concept: it has demonstrated that it is
possible to detach the knowledge from the information when building a system.
Current work in progress aims to go deeper into making knowledge explicit, in order to separate
the data model further from the presentation layer, to incorporate more web functionality and, in
the future, to incorporate processes.
Using this tool as a base for other kinds of applications is envisioned too, as we consider that
OntoDDB has a more general range of possibilities than gathering data. In particular, we are now
working on building powerful and versatile knowledge servers able to cope with medical knowledge.
7. Conclusions
OntoDDB is a tool that allows some aspects of reality to be modelled and a database to be created
automatically from the model in order to collect data. Still more interesting is that the
knowledge to be gathered in the database remains very well documented.
Having this functionality available allows data to be shared very efficiently and allows us to
explore new ways of managing and sharing knowledge.
This research was partially supported by the Centre de Référence des Maladies Vasculaires Du
Foie, Paris.


[1] Ball MJ, Silva JS, Bierstock S, Douglas JV, Norcio AF, Chakraborty J, et al. Failure to Provide
Clinicians Useful IT Systems: Opportunities to Leapfrog Current technologies. Methods Inf Med
47:4-7 (2008)
[2] Lehmann CU, Altuwaijri MM, Li YC, Ball MJ, Haux R. Translational Research in Medical
Informatics or from Theory to Practice. Methods Inf Med;47:1-3 (2008).
[3] Shepherd M. Challenges in Health Informatics. Proc 40 HICSS;1-10 (2007).
[4] Smith B, Ceusters W. Ontology as the Core Discipline of Biomedical Informatics. Legacies of the
Past and Recommendations for the Future Direction of Research. Comput Ph Cognit Science;1-14
[5] Knublauch H. Ontology-Driven Software Development in the Context of the Semantic Web: An
Example Scenario with Protégé/OWL. Stanford University; (2004).
[6] Boella G, van der Torre L, Verhagen H. Roles, an interdisciplinary perspective. Applied
Ontology;2[2]:81-8 (2007).
[7] Ferrario R, Prévot L. Formal ontologies for communicating agents. Applied Ontology;2[3-4]:209-
15 (2007).
[8] Bodenreider O, Burgun A. Biomedical ontologies. Medical informatics: Advances in knowledge
management and data mining in biomedicine.p. 1-25 (2005).
[10] Muñoz A, Somolinos R, Pascual M, Fragua JA, Gonzalez MA, Monteagudo JL, et al. Proof-of-
concept Design and Development of an EN 13606-based Electronic Health Care Record Service. J
Am Med Inform Assoc Oct 26;14[1]:118-29 (2006).
[11] Asenjo MA, Bertrán MJ, Guinovart C, Llach M, Prat A, Trilla A Analysis of Spanish hospital's
reputation: relationship with their scientific production in different subspecialities Med Clin (Barc).
2006 May 27;126(20):768-70.
[14] OWL Web Ontology Language Reference.
[15] Anhoj J. Generic Design of Web-Based Clinical Databases. Journal of Medical Internet
Research Oct;5[4] (2003).
[16] Stephen B, Johnson PhD, Paul T, Khenina A. Generic database design for patient
management information. p. 22-6 (1997).


Evaluation of a Named-Entity Recognition System over SNOMED CT
Elena Castro(1) , Leonardo Castaño(1), Paloma Martinez(1)
Computer Science Department of the University Carlos III of Madrid, Spain
{ecastro, lcastano, pmf}

Biomedical information processing related to medical records and clinical notes is a complex task
due to the nature of the documents (hand-written and semi-structured or non-structured data) and
the diversity of the terminology used. Some technologies rely on standards to deal with this kind
of data in English; for Spanish, however, there are only a few initiatives. This paper briefly
describes a tool that maps Spanish medical terminology onto the SNOMED CT meta-thesaurus. The
tool's performance and architectural features are addressed, and a short assessment is presented.
Keywords: semantic tagging, meta-thesaurus, SNOMED.

1. Foreword
Medical text processing has been one of the most interesting areas of the last few years, for
several reasons: first, the huge amount of scientific papers produced; next, the need for
automatic tools to manage and search this documentation; and finally, the complexity of processing
the different types of information drawn up by domain specialists. In most scenarios, such
documentation consists of several records composed of non-structured data. These data are usually
created manually (which may lead to spelling mistakes) and do not follow any naming convention
for concepts or for the transcription of acronyms. Also, personal information is included in most
medical records, which is a security risk because private information related to patients or
specialists might be revealed.
On the one hand, for the processing of English clinical notes there are several tools and
meta-thesauri, such as MeSH and UMLS [1], which perform well; on the other hand, there is a lack
of similar tools focused on other languages such as Spanish. The ISSE project
(FIT-350300-2007-75) provides a SNOMED-based tool which attempts to recognize concepts within the
meta-thesaurus SNOMED Clinical Terms; those concepts belong to several Spanish written clinical
The following sections address related work, a brief description of the tool, the tool
assessment, and conclusions and proposed future work.
2. Related works
Medical information technologies focus on the processing of clinical notes in order to discover
new treatments and drugs. For that purpose, several disciplines such as Computer Science, Linguistics,


Biomedicine, Genetics, etc., should join forces to develop management and search applications.
Those applications should incorporate new medical resources; at this stage, one of the most
important steps is the semantic tagging of documents, which is mandatory to reach the following
stages.
The first step in document tagging consists of term identification or recognition; afterwards,
these terms should be matched against the meta-thesaurus. System performance depends on the
efficiency of the linguistic processing and on the coverage and quality of the chosen thesaurus.
Thesauri such as SNOMED or UMLS [3], [4], which are considered standards, assure the quality they
provide to multilingual semantic networks. Several arguments justify the use of such thesauri; for
example, they provide wider coverage than others like GALEN or MeSH [5]. However, in spite of
their advantages, these terminologies do not cover all languages, which works against non-English
speakers, who must build their own terminologies in order to benefit from similar tools [6].
Records in the biomedical domain are written by human specialists, who make a large number of
mistakes owing to the use of symbols that may have several meanings and the use of non-normalized
terms. Hence it is necessary to add new resources, such as spellcheckers and acronym dictionaries,
in order to deal with these problems [7], [8].
3. Concept recognizer
The concept matching unit is part of a text pre-processing framework. This framework provides a
system in charge of retrieving semantic information from a set of clinical notes, which act as
the system input. The framework matches a set of sentences belonging to clinical notes against the
SNOMED thesaurus. The unit attempts to identify all terms in the input sentences that are in the
thesaurus; during this task, the system also recognizes synonyms and related terms. The SNOMED
concept recognizer performs much like other tools such as MetaMap, which provides concept
recognition over UMLS, that is, matching of English concepts. Our concept recognizer, however,
works over the Spanish section of SNOMED; this is the main difference between the new concept
recognizer and MetaMap.
Two solutions were proposed for SNOMED storage. The first one is an index-based solution, which
uses Lucene indexes to access SNOMED; access is thus improved, because the inverted index provides
good response times. Indexes have been built to allow querying several fields of the SNOMED
description table, such as Term, conceptId, etc. The second is a MySQL database developed by the
software development company Isoco; this database includes the information of the three SNOMED
tables, so this solution provides wider coverage but worse response times than the Lucene indexes.
The score formula is explained next, in order to clarify system performance and concept
categorization. The formula is based on the one proposed by Patrick, J., Wang, Y. and Bud, P. [9],
modified to fit the new requirements. The proposed formula is the following:

SNOMED: The systematized nomenclature of medicine.
Apache Lucene:


$$Score = \frac{\sum_{Q \in Query} \sum_{R \in Retrieved} matches}{length(Q) \cdot length(R)}$$

Thus the score takes into account both the length of the query and the length of the retrieved
An example of how the score is computed follows.
Suppose the following query:
Q = “Bacterial pneumonia”

Suppose the following retrieved sentence:

R = “The patient presents bacterial pneumonia”
The score is calculated as follows:
Σ matches = 2, length(Q) = 2, length(R) = 4

It is important to emphasize that the length of the retrieved sentence is 4 because “The” is
considered a stop word and is ignored by the system.
The final score, based on the data detailed above, is:
Score = 4/8 = 0.5
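
The worked example can be reproduced with a short Python sketch. The text does not spell out how
matches are counted, so the sketch makes an assumption that is not confirmed by the source:
matched tokens are counted once on the query side and once on the retrieved side, which gives the
numerator 4 of the example. The stop-word list and the function names are also hypothetical.

```python
STOP_WORDS = {"the"}  # assumed stop list; the example only mentions "The"


def tokens(text):
    """Lower-case, split on whitespace and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def score(query, retrieved):
    q, r = tokens(query), tokens(retrieved)
    if not q or not r:
        return 0.0
    # Assumed reading of the numerator: matches are counted on both sides
    # (2 query tokens found in R + 2 retrieved tokens found in Q = 4).
    matches = sum(t in r for t in q) + sum(t in q for t in r)
    return matches / (len(q) * len(r))


example = score("Bacterial pneumonia", "The patient presents bacterial pneumonia")
print(example)
```

Under this reading, a longer retrieved sentence with the same matches lowers the score, which is
consistent with the statement that both lengths are taken into account.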

4. Recognizer assessment
The system was assessed over a set of 100 clinical notes previously hand-tagged by a specialist.
This set is considered the Gold Standard, and it is compared with the results yielded by the
recognizer in order to assess system performance. Because of the breadth of SNOMED, only two
hierarchies within SNOMED were considered during this process: “procedures” and “disruptions”,
which the specialists considered the most relevant.
The parameters taken into account during the evaluation were:
Acceptance threshold: the minimum score that a retrieved concept must have in order to be
considered relevant.
Number of retrieved concepts: the number of concepts which will be retrieved for each
Several evaluation functions were also tested in order to find out which one fits better with the
In order to evaluate the tool completely, both complete matching and partial matching techniques
were tested. Partial matching consists of splitting the retrieved sentence into three parts: the
left one, the center one and the right one. Once this process has been


accomplished, the system checks whether any part matches the query and, if it does, the sentence
is considered relevant for this query.
Hence, six experiments were run for each evaluation function: retrieving 1, 2 and 5 concepts per
query, each with a threshold of 0.2 and of 0.4. Finally, two approaches were evaluated, one
performing complete matching and the other performing partial matching; coverage and precision
rates were then estimated, as shown in Table 1.
The results show that precision and coverage improve slightly with partial matching: the
precision rate for complete matching is 0.4 and the coverage rate reaches 0.08, whereas with
partial matching both coverage and precision rates are higher.
Table 1. Assessment results
            Partial matching   Complete matching   Disruptions   Procedures
Precision   72%                43%                 70%           35%
Coverage    9%                 6%                  5.5%          7%
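The way such rates can be obtained from a Gold-Standard may be sketched as below. The paper does not give the formulas, so this sketch assumes that precision is the fraction of retrieved concepts that appear in the Gold-Standard and that coverage is the fraction of Gold-Standard concepts that were retrieved; the function name and data layout are hypothetical.

```python
def evaluate(retrieved, gold, top_k, threshold):
    """Estimate precision and coverage for one experiment.

    retrieved: {query: [(concept, score), ...]} sorted by descending score.
    gold:      {query: set of concepts hand-tagged by the specialist}.
    top_k:     number of concepts retrieved per query (1, 2 or 5 in the paper).
    threshold: acceptance threshold on the score (0.2 or 0.4 in the paper).
    """
    proposed = relevant = 0
    for query, candidates in retrieved.items():
        # Keep only the best top_k concepts whose score reaches the threshold.
        kept = [concept for concept, score in candidates[:top_k] if score >= threshold]
        proposed += len(kept)
        relevant += sum(1 for concept in kept if concept in gold.get(query, set()))
    total_gold = sum(len(concepts) for concepts in gold.values())
    precision = relevant / proposed if proposed else 0.0
    coverage = relevant / total_gold if total_gold else 0.0  # assumed definition
    return precision, coverage


def run_experiments(retrieved, gold):
    """Run the six (top_k, threshold) combinations described in the text."""
    return {(k, t): evaluate(retrieved, gold, k, t)
            for k in (1, 2, 5) for t in (0.2, 0.4)}
```

Raising the threshold or lowering the number of retrieved concepts can only remove candidates, so under these definitions coverage cannot increase when the settings become stricter.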

Taking these results into account, performance is slightly better when using a partial
matching technique; the system also performs better when retrieving disruptions than
procedures. This may be due to the system architecture, but also to the reliability of the
Gold-Standard file, which might be better for disruptions than for procedures. Several
conclusions may be drawn from the table above. First of all, the precision rates are good enough,
especially with partial matching, but the real problem arises with the coverage rates, which are
lower than expected and always below 10 per cent. Thus, several analyses should be carried out in
order to improve those rates: first, analyzing the system behavior to check whether it is
performing as expected, and second, checking the Gold-Standard reliability, which should be done
together with domain experts.
5. Future research work
In order to refine the system and get better results, future work includes building a repository
of medical resources which, based on dictionaries and ontologies belonging to the medical domain,
allows term recognition even when those terms are not included in SNOMED but are related
to other terms of the thesaurus.
Last but not least, it would be really useful to extend the Gold-Standard and also the
corpus scope to reach higher reliability in results verification.
The main purpose of the whole work is to establish new semantic relationships which allow
retrieving knowledge and inferring new knowledge. Following this trend, all the future work
mentioned above is crucial in order to reach our goals.

6. References
[1] Ananiadou, S. and McNaught, J. Text Mining for Biology and Biomedicine. Artech House, Inc.
[2] Vintar, S., Buitelaar, P., Volk, M. Semantic relations in concept-based cross-language medical
information retrieval. In: Proceedings of the Workshop on Adaptive Text Extraction and Mining,
Cavtat-Dubrovnik (2003).
[3] Volk, M., Ripplinger, B., Vintar, S., Buitelaar, P., Raileanu, D., Sacaleanu, B. Semantic annotation
for concept-based cross-language medical information retrieval. International Journal of Medical
Informatics; 67(1): 97-112 (2002).
[4] Jang, H., Song, S. K., Myaeng, S. H. Semantic Tagging for Medical Knowledge Tracking.
Proceedings of the 28th IEEE EMBS Annual International Conference. New York City, USA, Aug 30-
Sept 3 (2006).
[5] Ruch, P., Wagner, J., Bouillon, P., Baud, R., Rassinoux, A.-M., Robert, G. Medtag: Tag-like
semantics for medical document indexing. In Proceedings of AMIA'99, p. 35-42 (1999).
[6] Lu, W-H., Lin, R., Chan, Y-CH, Chen, K-H. Overcoming Terminology Barrier Using Web Resources
for Cross-Language Medical Information Retrieval. AMIA Annu Symp Proc.; 519–523 (2006).
[7] Schuler, K., Kaggal, V., Masanz, J., Ogren, P., Savova, G. System Evaluation on a Named Entity
Corpus from Clinical Notes. In Proceedings of the Sixth International Conference on Language
Resources and Evaluation (LREC'08) (2008).
[8] Ogren, P., Savova, G., Chute, Ch. Constructing Evaluation Corpora for Automated Clinical
Named Entity Recognition. In Proceedings of the Sixth International Conference on Language
Resources and Evaluation (LREC'08) (2008).
[9] Patrick, J., Wang, Y., Bud, P. An Automated System for Conversion of Clinical Notes into
SNOMED Clinical Terminology. Proceedings of the Fifth Australasian Symposium on ACSW Frontiers;
68: 219-226 (2007).

A practical approach to create ontology networks in e-Health: The NeOn take
Tomás Pariente Lobo(1,), Germán Herrero Cárcel(1)
ATOS Research and Innovation, ATOS Origin SAE, 28037 Madrid, Spain

Ontological representation of the health domain is widespread. In particular, the semantic description of
drugs is being tackled in several ongoing initiatives. However, the resulting ontologies tend to be large
and unmanageable. In recent years, recommendations have moved towards the use of smaller, dynamic and
interlinked ontologies to ease the ontology life-cycle. So far there have been attempts to build
ontologies using this approach, but with very little methodological and tooling support. In this paper we
propose to apply the notion of networked ontologies, together with the methodology and tools developed
within the NeOn project.
Keywords: Ontologies, networked, mappings, tools, knowledge management, nomenclature,

1. Introduction
In recent years, there has been an increasing interest in semantic interoperability in e-Health.
Semantic interoperability is about sharing and combining data and health records among different
systems and actors. It also involves fostering a consistent usage of terminology (drugs and
bio-medical knowledge bases) and the adoption of shared and standard models of clinical data. In
short, semantic interoperability serves the underlying objective of formalizing health science
using shared or linkable models.
One of the key aspects to tackle in order to achieve semantic interoperability is the usage of
common or interoperable terminologies about drugs, diseases, treatments and so on. Different
actors (governmental bodies, hospitals, labs, key industries, etc.) should be able to understand the
terminology used by others. To complicate matters, it is quite common that different systems in
the same organization do not use the same terminology. In order to overcome this problem,
numerous initiatives, roadmaps and emerging standards have developed at an increasingly rapid
pace over the past years. SNOMED-CT [1] is emerging as a de-facto terminological standard for many
international initiatives. Examples of this are the information models adopted in Australia (NEHTA)
[2], the UK (NHS dm+d) [3] or the USA. In 2008 the W3C created the Semantic Web Health Care and Life
Sciences (HCLS) Interest Group [4]. The EU financed the SemanticHEALTH FP6 project [5] with the
objective of delivering a Semantic Interoperability roadmap for Europe.
In particular, SemanticHEALTH issues recommendations such as interlinking health models and
terminologies by means of modular, multilingual, dynamic (just-in-time), collaboratively-designed
networks of ontologies [6] [7]. These recommendations also stress the methodological support
needed to specify high-quality, consistent and scalable ontologies. They also recommend the use of
the W3C standard ontology language, OWL [8], because a large and growing community is
developing tools and software (often freely available) that will benefit the integration
and maintenance of ontologies based on this language.

Corresponding author: Tomás Pariente Lobo, ATOS Research and Innovation, ATOS Origin
SAE, 28037 Madrid, Spain. E-mail:

The tooling and methodological support needed to foster the adoption of interoperable solutions,
especially when it comes to bridging the gap between huge terminologies, has not evolved
as rapidly. There are partial solutions that tackle one or several of the issues raised by
SemanticHEALTH. However, far too little attention has been paid to the delivery of an overall
framework that covers most of the recommendations cited above.
2. The NeOn approach
NeOn [9] is an EU-funded FP6 ICT project whose aim is to create an open infrastructure, and
associated methodology, to support the overall development lifecycle of large-scale, complex,
semantic applications. This infrastructure is based on the notion of networked ontologies. A
network of ontologies is a collection of ontologies related together via a variety of different
relationships such as mapping, modularization, version, and dependency relationships [10]. NeOn
defines four main ontology assumptions: Dynamic (ontologies will evolve), Networking (ontologies
are interconnected via mappings, alignments or by means of reuse), Shared (ontologies are shared
by people and applications), and Contextualized (ontologies are dependent on the context in which
they are built or used) [11].
NeOn has defined a service-based reference architecture that covers design and runtime aspects
of ontology engineering, plus the usage and integration of the networked ontologies into
semantic-enabled applications.

Fig. 1: NeOn Architecture [12].

As part of the reference implementation of the NeOn architecture, NeOn delivers an open
software suite called the NeOn Toolkit [13]. The NeOn Toolkit is an extensible Eclipse-based
Ontology Engineering Environment containing plugins for ontology management and visualization.
The NeOn Toolkit includes some core features such as editing the ontology schema and visualizing
and browsing ontology entities, and it allows the usage of OWL, F-Logic and (subsets of) RDF(S)
ontologies. There are a number of commercial plugins that extend the functionality of the NeOn
Currently, there are more than thirty plugins available to download, providing several
functionalities such as rule support, mapping editors, database integration, dealing with ontology
dynamics (modularization, inconsistency checking), collaboration, localization (multilingualism),
etc. The NeOn Toolkit offers a simple Eclipse plugin extension mechanism that allows an easy
deployment of new plugins.
NeOn also offers methodological support for dealing with networks of ontologies.
The methodological approach is twofold: on the one hand, it provides ontology engineering
support, and on the other hand, it provides guidance for developing applications
using networked ontologies. The usage of publicly available Ontology Design Patterns [14], in
order to improve the quality of the ontology design in a variety of scenarios and needs, is also one
of the most relevant outcomes of the project.
The alignment between the NeOn objectives and the recommendations issued by
SemanticHEALTH is clear. However, NeOn does not specifically target the e-Health domain; it takes a
horizontal approach valid for multiple domains. Nevertheless, one of the NeOn case studies is focused
on the pharmaceutical domain. In particular, the pilot targeting the interoperability between
different drug terminologies is the so-called Semantic Nomenclature case study.
The Semantic Nomenclature case study is trying to pave the way towards the use of a network of
ontologies to relate different drug terminologies. It defines an ontology network where each actor
potentially plugs in its own model as an ontology. All ontologies are interconnected and mapped
to one another to share information.

Fig. 2: Semantic Nomenclature ontology network [15].

The main ontology in the network is the Nomenclature Reference Ontology. This OWL ontology
acts as a bridge between the different application ontologies modeling drugs and the domain
ontologies that support some other aspects of the model. The approach followed in this case is to
map all the ontologies to the reference ontology. This design decision allows easy access to the
whole ontology network. The ontology is based on the study of several product definitions in
other ontologies. Specifically, it partially follows the semantic model of SNOMED-CT as
background knowledge, mainly from the Pharmaceutical/Biological product term used in that
terminology, allowing the distinction between clinical and commercial drugs.

Fig. 3: Part of the semantic nomenclature reference ontology.

The main hierarchy is Pharmaceutical_Product, and the underlying Clinical_Drug and
Prescription_Drug concepts are mapped to the equivalent concepts of the domain ontologies in
the ontology network. On the other hand, the Marketed_Drug concept is mapped to the equivalent
concepts of the application ontologies, which provide access to relevant product information on
the pharmaceutical products marketed in Spain.
Based on the hierarchy provided by SNOMED, the reference ontology defines a hierarchy that
distinguishes between clinical drugs and branded drugs. The main generic concept is
Pharmaceutical_Product. The Categorized_Product concept serves to classify the products
according to their therapeutic use. In the next level, we define the Clinical_Drug, which can be
mapped for instance to active ingredients. The Prescription_Drug concept can be described by
the pharmaceutical form and dosage of the Pharmaceutical_Product and is useful, for example,
for prescription in hospitals. Finally, the Marketed_Product concept is used to describe
branded or commercial drugs, as they are dispensed in pharmacies. This last concept is defined by
its national code, price, etc. The next figure depicts the hierarchy relation for pharmaceutical
products in the reference ontology.

Fig. 4: Drug-related concepts in the reference ontology.

The rest of the ontologies are linked to the different entities of the reference ontology by means
of mappings and axioms.
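The design decision of mapping every ontology to the reference ontology, rather than to every other ontology, can be illustrated with a small sketch. It is not NeOn code and uses no NeOn API: the ontology names and their local terms are invented for the example, and only the reference concept names come from the text.

```python
# Concepts of the reference ontology named in the text.
REFERENCE_CONCEPTS = {
    "Pharmaceutical_Product", "Categorized_Product", "Clinical_Drug",
    "Prescription_Drug", "Marketed_Product",
}

# Hypothetical mappings: each networked ontology maps its own terms
# to reference concepts only (one mapping set per ontology).
MAPPINGS = {
    "hospital_ontology": {
        "ActiveIngredient": "Clinical_Drug",
        "WardPrescription": "Prescription_Drug",
    },
    "pharmacy_ontology": {
        "Substance": "Clinical_Drug",
        "BrandedMedicine": "Marketed_Product",
    },
}


def to_reference(ontology, term):
    """Return the reference concept a local term is mapped to, or None."""
    return MAPPINGS.get(ontology, {}).get(term)


def translate(term, source, target):
    """Find the terms of `target` that share a reference concept with `term` of `source`."""
    concept = to_reference(source, term)
    if concept is None:
        return []
    return sorted(t for t, c in MAPPINGS.get(target, {}).items() if c == concept)
```

With this layout, adding a new ontology to the network requires one new mapping set towards the reference ontology instead of one mapping per existing ontology, which is the practical benefit of using the reference ontology as a bridge.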
3. Discussion
The NeOn approach is promising because it offers a complete and coherent set of tools, APIs and
methodological support for ontology engineering and usage.
The outcomes of the Semantic Nomenclature case study can be seen as a proof of concept of the
NeOn approach applied to the semantic interoperability between drug terminologies. In this
sense, the network of ontologies covered by the case study does not intend to be exhaustive or
cover the whole set of information coming from external data sources.
4. Future work
NeOn is now entering its final year, which means that some of the results of the project are still
not finalized. The true potential of NeOn is expected by summer 2009, when a new Manchester
OWL Syntax version of the NeOn Toolkit will be released.
In the scope of the Semantic Nomenclature case study, several experiments have been carried out
in order to semi-automatically generate mappings using one of the NeOn plugins (the Alignment
plugin). During the last year of the project we expect to verify the quality of these alignments and
perform some more experiments with other NeOn Toolkit plugins.
Besides, by the end of the first semester of 2009, the Semantic Nomenclature case study will
release a Web application based on the underlying knowledge base (networked ontologies). This
application will show the feasibility of developing semantic applications based on NeOn
5. Conclusion
This paper has given an account of the results of the NeOn project with respect to their usage in the
e-Health domain. The purpose of this paper was to show the different tools and methodology that
NeOn puts at the disposal of the e-Health community to model the domain.
It was also shown that the NeOn approach towards the use of networked ontologies is clearly in
line with some of the ongoing initiatives and roadmaps regarding semantic interoperability in e-
In summary, the use of NeOn, especially the NeOn Toolkit, the methodology provided by the
project, and the approach towards the use of a network of ontologies in order to bridge the gap
between different drug terminologies could prove to be beneficial to health informatics in
general and to semantic interoperability in particular.
[2] NEHTA - National E-Health Transition Authority,
[3] Dictionary of Medicines and Devices (dm + d),
[4] Semantic Web Health Care and Life Sciences (HCLS) Interest Group,
[5] SemanticHEALTH project,
[6] SemanticHEALTH partners. Semantic Interoperability Deployment and Research Roadmap.
SemanticHEALTH SSA project Deliverable D7.1, 2008
[7] Rector A. Barriers, approaches and research priorities for integrating biomedical ontologies.
SemanticHEALTH SSA project Deliverable D6.1, 2008.
[8] Web Ontology Language (OWL),
[9] NeOn Project,
[10] Haase P, Rudolph S, Wang Y, Brockmans S, 2006. Networked Ontology Model. NeOn
Deliverable D1.1.1
[11] Sabou M. et al, 2006. NeOn Requirements and Vision Deliverable
[12] Waterfeld W, Erdmann M, Schweitzer T, Haase P. Specification of NeOn architecture and API
V2. NeOn Deliverable D6.9.1, 2008
[13] NeOn Toolkit website
[14] Ontology Design Patterns,
[15] Herrero G, Pariente T. Revision of ontologies for Semantic Nomenclature: pharmaceutical
networked ontologies. NeOn Deliverable D8.3.2, 2008

Software Agent Standards Based Electronic Health Records Communication Platform
Diego Boscá(1), David Moner(1), José Alberto Maldonado(1), Carlos Angulo(1), Ernesto Reig(1),
Montserrat Robles(1)
IBIME Group, ITACA Institute, Universidad Politécnica de Valencia
{diebosto, damoca, jamaldo, cangulo, ereial, mrobles}

In recent years, great efforts have been made in the computerization of healthcare systems. Those
efforts represent a great leap in both the quantity and quality of patient care. However, nearly all
currently developed systems are still being built ad hoc for each organization. This makes
communication between organizations a time- and money-consuming task. In this document we present a
platform for standardized Electronic Health Record (EHR) communication based on software agents that
provides many functionalities for standard EHR communication.
Keywords: Electronic Health Records, software agents, standardization.

1. Introduction
Communicating EHRs in a semantically interoperable way is still a “work in progress” problem. In
the best-case scenario, the developed solutions manage to connect a limited number of
systems. Such a solution cannot be exported to other similar problems (for example, a solution
built for one hospital cannot be exported to another one). Even worse, changing anything in any of
the integrated systems entails changes in the remaining systems if they want to access the
information of the former. This maintenance is time and money consuming, which makes it very
Due to those problems, new models for EHR communication based on a dual model approach
[1] have been developed (standards such as ISO 13606 and openEHR). The dual model for EHR
communication is based on the separation between information (the data) and knowledge (which
changes and improves over time). This knowledge is represented in the ISO 13606 and openEHR
standards as archetypes (formal descriptions of the domain concepts). Those archetypes can (and
should) be defined by domain experts, who know which concepts the system uses.
A standard software agent platform based on those dual models has been developed to solve this
problem. The platform can be used to access standardized and non-standardized data sources.
In the latter case, the LinkEHR tool is used to generate the standard EHR extracts using the chosen
reference model [2].
In this work we describe the architecture of a dual model based health information system and
show how the presented architecture of software agents achieves the integration of distributed EHR
2. Background
Software agents were chosen as the technology of the platform because they have solved the
integration problem in several health scenarios [3-4]. However, until now this integration has been
made ad hoc for the systems involved, so the integration result is still not shareable in an
interoperable way. This is an essential aspect of the developed system, as sharing health records
between different systems is currently a need.

Despite that, the use of standards in healthcare is still at an initial stage. With some exceptions
(such as DICOM [5], which is the de facto standard for image storage and communication), there is
not yet sufficient awareness among public administrations and healthcare professionals [6]. One of
the steps needed for the improvement and evolution of current systems is the use of EHR standards.
Although dual model standards are being used in production systems in Australia, Holland and
England [7], the dual model standards field is still unknown to a large number of professionals.
3. Methods
The developed multi-agent system is a distributed EHR system based on software agents. The
platform allows to standardize all the systems so t
For the implementation and testing, the JADE platform was used.
The proposed agent system allows searching and retrieving standardized extracts from
distributed data sources. We created an ontology based on the knowledge of a dual model based
clinical system (archetypes, EHR extracts, patients, etc.). This ontology has allowed us to define
four different agent roles for the platform: the archetype repository agent, the EHR agent, the
register agent and the user agent.

• The archetype repository agent: an agent assigned to each of the known archetype
repositories. This agent has an XML database [8] to allow queries on the archetypes. Those
queries allow:
  ◦ Storing archetypes in the repository (or a full load of the repository)
  ◦ Updating, changing the version of, or marking archetypes as deleted
  ◦ Querying the database to get archetypes by their Id (archetype name), the organization
  that created them (CEN, openEHR, CCR, etc.), the archetype description (institution,
  author, date, status, etc.), the textual content of the archetype, the archetype paths and
  the ontology section of the archetype.
All those queries are applicable to the “integration archetypes” developed by the IBIME group [9],
which are archetypes mapped to a data source from which we want to extract information in a
standardized way. This agent allows access to archetypes and their mappings.
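The kind of operations offered by the archetype repository agent can be sketched with an in-memory stand-in. This is a simplified illustration, not the platform's implementation: the real agent queries an XML database, whereas here a dictionary is used, and the class, field and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Archetype:
    archetype_id: str      # archetype name
    organization: str      # e.g. CEN, openEHR, CCR
    status: str            # part of the archetype description
    text: str              # textual content used for free-text search
    deleted: bool = False  # archetypes are marked as deleted, not removed


class ArchetypeRepository:
    """In-memory stand-in for the XML database behind the repository agent."""

    def __init__(self):
        self._items = {}

    def store(self, archetype):
        """Store a new archetype or replace an existing one with the same Id."""
        self._items[archetype.archetype_id] = archetype

    def mark_deleted(self, archetype_id):
        self._items[archetype_id].deleted = True

    def get(self, archetype_id):
        """Return the archetype with that Id, or None if unknown or deleted."""
        found = self._items.get(archetype_id)
        return None if found is None or found.deleted else found

    def query(self, organization=None, text=None):
        """Return the Ids of non-deleted archetypes matching every given criterion."""
        results = []
        for archetype in self._items.values():
            if archetype.deleted:
                continue
            if organization is not None and archetype.organization != organization:
                continue
            if text is not None and text.lower() not in archetype.text.lower():
                continue
            results.append(archetype.archetype_id)
        return sorted(results)
```

A search such as `query(organization="CEN", text="pressure")` then mimics how a domain expert could narrow down the most suitable archetype before asking for an EHR extract.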

The defined queries allow the clinical domain expert to search for the most suitable archetype for
the health record query. Ontology rules can be applied to the archetype search, as the ontology
bindings of the archetype can be easily obtained.

• The EHR agent: the agent assigned to each of the health record data sources. The
main function of this agent is the generation of standardized EHR extracts from the
available data sources of the system. Those data sources can be non-standardized data
sources. In that case the agent must preprocess the existing information to standardize it.
This preprocessing is performed by the LinkEHR tool, which provides an easy way to access
existing health data by using integration archetypes. This process consists of mapping
the data sources to the archetypes used by the system [9]. The mapping definition does
not need to be specified if the source can already provide standardized EHR extracts. In
that case the agent will only fill the headers of the extract wrapping the data and transfer
• The register agent: this agent holds the information about the systems where a patient has
parts of his health information (clinical information index). This agent also unifies all the
received EHR extracts into a single one. That EHR extract is then transferred to the user
agent. The clinical census is a basic service for every system that has its information
scattered across several sources. For example, in the project for the electronic health
records of the national health service (in Spanish, HCDSNS), a clinical information index will
be created. This service will show in which regional systems a patient has part of his EHR.
• The user agent: these agents are assigned to each of the users of the system. The
function of this agent is to be the gateway between the users and the system
(understanding users as both those who access directly through a graphical user (or
web) interface and the software programs that access the system through web services).
These agents are able to ask the repository agents for the most suitable archetype
for the EHR query. Once the user knows the archetype identifier of the concept, he can
query the register agent to obtain the EHR extract from the complete EHR of the desired
patients. Additionally, this agent should provide the authentication mechanisms and the
certificates needed to ensure the security of the system.

Figure 1 shows a schema of the platform flow.

Fig. 1: EHR communication system flow chart.

The user agent is the access point to the system. The user agent can obtain standardized extracts
and ask about the available archetypes.
To obtain the EHR extract, the User agent first queries the Register agent with a patient Id and an
archetype identifier. Then the Register agent asks the clinical census for the places where the
patient has parts of his health information. Next, the Register agent queries the EHR agents of
each of those data sources. Then the EHR agents build the standardized EHR extract for their data
source. Finally, those EHR extracts are unified by the Register agent and returned to the User
agent as a single extract.
To query the archetypes, the queries are built on the User agent. The queries are sent to the
Repository agent, which then returns the archetype or the archetype list to the User agent.
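The extract retrieval flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: plain method calls stand in for the messages exchanged between agents, dictionaries stand in for the clinical census and the data sources, and every class, method and field name is hypothetical rather than taken from the platform.

```python
class EHRAgent:
    """One agent per data source; builds the extract for that source."""

    def __init__(self, source_name, records):
        self.source_name = source_name
        self._records = records  # {(patient_id, archetype_id): [data items]}

    def build_extract(self, patient_id, archetype_id):
        items = self._records.get((patient_id, archetype_id), [])
        return {"source": self.source_name, "items": list(items)}


class RegisterAgent:
    """Holds the clinical census and unifies the partial extracts."""

    def __init__(self, census, ehr_agents):
        self._census = census          # {patient_id: [source names]}
        self._ehr_agents = ehr_agents  # {source name: EHRAgent}

    def get_extract(self, patient_id, archetype_id):
        # 1. Ask the census where the patient has health information.
        sources = self._census.get(patient_id, [])
        # 2. Query the EHR agent of each of those sources.
        partial = [self._ehr_agents[name].build_extract(patient_id, archetype_id)
                   for name in sources]
        # 3. Unify the partial extracts into a single one.
        return {
            "patient": patient_id,
            "archetype": archetype_id,
            "sources": [p["source"] for p in partial],
            "items": [item for p in partial for item in p["items"]],
        }


class UserAgent:
    """Access point: forwards the user's request to the register agent."""

    def __init__(self, register_agent):
        self._register = register_agent

    def request_extract(self, patient_id, archetype_id):
        return self._register.get_extract(patient_id, archetype_id)
```

The user agent never needs to know which sources hold data for a patient; that knowledge stays in the register agent, which is what allows new data sources to be added without changing the clients.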

3.1. Advanced software agents

An advantage of using an agent platform is that we can define more agents with different roles.
Those agents are not strictly necessary to make the platform work. However, they add
value to the platform. Examples of such agents are:
• Ontological reasoning agents: agents can be extended to reason according to an
ontology or a set of rules and facts.
• Alert control agents: agents can act proactively (on their own initiative). This is well
suited to raising alerts when a condition is met.
• Result subscription agents: in a similar way to the previous one, agents could be designed to
notify when a result is ready to be served.
This work has been funded by project TSI2007-66575-C02-01 from Ministerio de Educación y
Ciencia, the Consellería d’Empresa, Universitat i Ciencia, reference APOSTD/2007/055 and the
Programa de Apoyo a la Investigación y Desarrollo (PAID-06-07) from the Universidad Politécnica
de Valencia.
[1] T. Beale. Archetypes, Constraint-based Domain Models for Future-proof Information Systems.
[2] JA. Maldonado, D. Moner, D. Boscá, C. Angulo, M. Robles, JT. Fernández. Framework for clinical
data standardization based on archetypes. Stud. Health Technol. Inform., 454-8, (2007)
[3] G. Lanzola, A framework for building cooperative software agents in medical applications.
Artif. Int. Med., Volume 16, Issue 3, Pag 223-249 (1999)
[4] D. Isern, D. Sánchez Moreno, A. Valls, HeCaSe: an agent-based system to provide personalized
medical services. CAEPIA (2003)
[5] DICOM: Digital Imaging and Communications in Medicine
[6] ICT standards in the health sector: current situation and prospects
[7] M. Al-Ubaydli. Open source medical records systems around the world. UKHIT 49 (2006)
[8] eXist – Open Source Native XML Database
[9] Bosca D, Moner D, Maldonado JA, Angulo C, Robles M. LinkEHR: a tool for standarization and
integration of legacy clinical data. Proceedings of the 12th World Congress on Health Informatics

CARDEA: Service platform for monitoring patients and medicines based on SIP-OSGi and
RFID technologies in hospital environment
Saúl Navarro, Ramón Alcarria, Juan A. Botía, Silvia Platas, Tomás Robles

This paper describes the CARDEA platform. The main technologies are described, along with how they
are being used in this platform. The CARDEA architecture deployed in Hospital Gregorio Marañón is
shown, together with the expected results of the pilot.
Keywords: monitoring, hospital, medicines, patients, RFID, OSGi, ESB, SIP

1. Introduction
A hospital site is usually a complex environment with a huge number of rooms, normally
arranged in different, connected buildings. This situation creates many obstacles for the
transmission of radio signals. Therefore, solid strategies and redundant position measurements,
as well as radio technologies with good material-penetration capabilities, must be used in such
an environment.
A typical hospital service may be composed of units that are physically and logistically scattered,
that serve a large number of patients with a very high turnover, and in which many
hospital staff resources and assets (medicines, material and equipment) need to be
managed, which in many cases reside outside the service itself.
The CARDEA Project, in order to allow new-generation multimedia services to be integrated in a
uniform way into the hospital environment, researches the definition and development of a
service platform for hospital monitoring based on different standards, which in turn allows third
parties to deploy services in a standard, safe and controlled way, without having to invest in
proprietary solutions and difficult integrations.
2. Technologies
As explained before, this project has been developed using four different technologies:
• The global element in this environment is the Framework. For this framework we have
used the OSGi technology.
• The use of RFID tags allows the platform to capture information about medicines such
as quantity, name, composition, expiry date, etc.
• The captured information is processed in the platform, which generates alarms and
error messages that are sent to mobile nodes. For this functionality we have used the
SIP technology by integrating SIP elements into the OSGi Framework.
• CARDEA is context-aware. The people considered as actors in the hospital,
together with assets like expensive medicines, are treated as elements with a dynamic
situation. A Semantic Web based ontology managed with Jena, a knowledge rules API
integrating Pellet, and a middleware for capturing, storing and delivering context
information are integrated. This subsystem is called OCP (Open Context Platform).
To understand correctly how this system works, it is necessary to explain the main features
of the technologies involved: OSGi, RFID, SIP and OCP.
The OSGi platform (Open Service Gateway initiative) was defined by the international association
OSGi Alliance. Its main objective was to define open software specifications for designing
compatible platforms that are able to provide multiple services. OSGi defines an extremely
efficient infrastructure for designing service-based applications inside a Java Virtual Machine (JVM)
and provides a development environment running on Java and 100% compatible with J2ME.
The main part of the infrastructure is the Framework, which implements a dynamic component-
based model. With this model the environment is able to manage the applications
installed in the framework in a dynamic way. The applications (named bundles) can be installed,
started, stopped, uninstalled and updated remotely without needing to restart either the device
or the framework.
The key element in the OSGi framework is the component or bundle. Every bundle can consume services provided by other bundles and can provide other services at the same time. This process works as follows:
• A bundle can register and unregister services in the container dynamically. To enable interaction, the bundle must register a service interface and a class that implements this interface. Every change in the service (register, change, unregister) produces events that are captured and processed by the framework.
• If another bundle wants to use the service registered by the previous bundle, it asks the platform for the service reference. Once it obtains the service reference through the OSGi platform, the service is ready to be used. The consumer bundle can call any method of the service whose implementation was registered by the provider bundle.
The services are available and registered only while the bundles that implement them are installed in the platform.
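The register, look up and call cycle described above can be illustrated with a small sketch. It is written in Python for brevity and is only an analogy: the class `ServiceRegistry`, the `ConsoleAlarm` implementation and the event names are assumptions made for this illustration and are not part of the OSGi API.

```python
class ServiceRegistry:
    """Toy service registry; a stand-in for the framework, not the OSGi API."""

    def __init__(self):
        self._services = {}    # interface name -> implementing object
        self._listeners = []   # callbacks told about every registry change

    def add_listener(self, callback):
        self._listeners.append(callback)

    def _fire(self, kind, interface):
        for callback in self._listeners:
            callback(kind, interface)

    def register(self, interface, implementation):
        self._services[interface] = implementation
        self._fire("REGISTERED", interface)

    def unregister(self, interface):
        self._services.pop(interface)
        self._fire("UNREGISTERED", interface)

    def get_service(self, interface):
        # Returns None when no installed bundle provides the service.
        return self._services.get(interface)


class ConsoleAlarm:
    """Hypothetical implementation registered by a provider bundle."""

    def notify(self, text):
        return "ALARM: " + text


registry = ServiceRegistry()
events = []
registry.add_listener(lambda kind, name: events.append((kind, name)))

registry.register("AlarmService", ConsoleAlarm())   # provider bundle
alarm = registry.get_service("AlarmService")        # consumer bundle
message = alarm.notify("expired medicine")
registry.unregister("AlarmService")                 # service disappears
```

After the provider unregisters, a lookup returns nothing, which mirrors the statement that a service is available only while its bundle is installed.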
RFID (radio-frequency identification) is an automatic identification method that relies on storing and remotely retrieving data using devices called RFID tags or transponders. The term describes a communications system that wirelessly transmits the information held in an RFID tag. The technology requires cooperation between an RFID reader and an RFID tag.
An RFID tag contains the identity of the object it is attached to, which may be a product, resource, person, etc. An RFID tag has two parts: an integrated circuit for storing and processing information and for modulating and demodulating the radio-frequency signal, and an antenna for receiving and transmitting the signal.
The main properties of RFID technology for this project are: it lets the system store information about the products marked with RFID tags; RFID tags can be read from several meters away and beyond the line of sight of the reader; and RFID tags have become very cheap, with a unit cost of a few euro cents.

In this project we have used passive RFID tags, which have no battery, rather than active RFID tags, which need a battery to be located by the reader.
The SIP protocol (Session Initiation Protocol) is an application-level signaling protocol defined by the IETF (Internet Engineering Task Force) in RFC 3261 [4]. The aim of the IETF is for SIP to become the standard for the initiation, modification and termination of interactive sessions involving multimedia elements such as video, voice, instant messaging, online games, etc.
SIP supports device mobility and location independence, and has robust security specifications. The main feature of SIP, and the one that made us choose it for the project, is that it resolves addresses by means of URIs (Uniform Resource Identifiers) [5]. With these addresses we can determine the physical address of the user at any time, as well as the IP address of the device in use. Any user can establish communication with another user knowing only that user's URI. Other features of SIP are: session negotiation, call management, modification of the features of an established session and the possibility of updating the protocol with
OCP is a middleware that allows services and applications in a service-oriented architecture to be context-aware. Context-aware computing is a recently emerged paradigm in which software adapts to changes in the environment. Services and applications adapt by using up-to-date information about the state of end users (i.e. patients and hospital personnel in the case
In OCP environments, software entities play one of two main roles. The first is the context producer: a dynamic entity, usually a software entity tied to a specific person, whose state changes as the user's state changes. For example, a CARDEA client running on a handheld device changes its current location within the hospital when the nurse carrying the device moves. The second role is the context consumer: usually a service or application that uses context information to adapt its behaviour. For example, a different interface for the CARDEA application is displayed depending on the device that the user is working with (e.g. PDA, laptop,
Users, devices and the physical environment are all represented in a domain ontology. This representation is based on OWL and is hosted and managed by Jena. Its main advantage is a common, shared representation of the domain for all CARDEA elements, managed by OCP.
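The two roles can be sketched as follows. This is a minimal illustration in Python, not the OCP interface: `ContextBroker`, `choose_interface`, the entity name `nurse-17` and the layout names are all hypothetical, and a plain dictionary stands in for the ontology managed by Jena.

```python
class ContextBroker:
    """Hypothetical middleware: keeps the latest context and notifies consumers."""

    def __init__(self):
        self._state = {}       # (entity, attribute) -> latest value
        self._consumers = []   # callbacks registered by context consumers

    def subscribe(self, consumer):
        self._consumers.append(consumer)

    def publish(self, entity, attribute, value):
        # Called by a context producer whenever the user's state changes.
        self._state[(entity, attribute)] = value
        for consumer in self._consumers:
            consumer(entity, attribute, value)

    def query(self, entity, attribute):
        return self._state.get((entity, attribute))


def choose_interface(device):
    """Consumer-side adaptation: pick a user interface variant per device type."""
    return {"PDA": "compact", "laptop": "full"}.get(device, "default")


broker = ContextBroker()
seen = []
broker.subscribe(lambda entity, attribute, value: seen.append((entity, attribute, value)))

# Producer: the client on the nurse's handheld device reports its context.
broker.publish("nurse-17", "location", "pharmacy corridor")
broker.publish("nurse-17", "device", "PDA")

# Consumer: the application adapts its interface to the reported device.
layout = choose_interface(broker.query("nurse-17", "device"))
```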
4. Global Architecture of CARDEA
The diagram in Fig. 1 describes the architecture by levels and identifies the subsystems involved and the relationships between them.

Fig. 1: Global Architecture

The global element in the infrastructure is the OSGi node. It deploys the basic elements of CARDEA: the Contextual Information Management System (OCP), the service that allows multi-device interaction (SIP), and the services of hospitals and laboratories through the Mule module. The architecture has several layers. The bottom layer is the operating system, with a Java virtual machine installed. The OSGi framework, which is 100% Java, runs on the virtual machine. The items placed on the upper layer are the OSGi bundles, which register services in the platform that can be used by other applications.
One example of the services built on this architecture is the “OCP - Contextual Information Management System”, formed by the following components:
• Jena [6]: Open source Semantic Web framework for Java. It provides a mechanism to manage ontologies in OWL format, extract the data and store it in the database, and obtain results through its inference engine.
• OCP Service: Main part of the OCP system. It creates context objects from the information provided by other agents and manages them. It communicates with Jena in order to make them persistent.
• RFID: Acts as an RFID server. This service receives the information read by the RFID readers and transmits it to the OCP service so that it can be transformed into context
• SIP: The SIP service is used to establish communication between the OSGi framework and external elements (mobile phones, PDAs, etc.). The SIP service is provided by a bundle installed in the platform, so every bundle can use it to send messages or make SIP calls.
• ESB: This element is a middleware based on synchronous and asynchronous messaging that provides secure interoperability between applications using XML. It allows business applications to communicate with each other. This service makes logical decisions using the information provided by the OCP service and uses the SIP bundle to access external entities.

This system provides some functional facilities:
1. When a medicine tagged with RFID passes the RFID readers, the information on the tag is transmitted to the RFID service in the OSGi platform.
2. This information contains the type of the medicine, the expiry date, the current time and the identification of the tag. When the OCP service is called, these attributes that define a medicine are transformed into a context object called “Medicine” and managed by the OCP
3. The OCP Service stores this object in a database through the Jena service, so the database contains all the object references generated in the OCP bundle.
4. The ESB service is subscribed to the context notifications generated by new events in the OCP bundle. This subscription allows the ESB service to be notified when a new medicine object arrives. The ESB service can determine the stock of a certain medicine and act in
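
The four steps listed above can be sketched in Python as a small publish and subscribe pipeline. All names here (`Medicine` fields, `OcpService`, `StockMonitor`, the tag identifiers and dates) are assumptions made for illustration; a dictionary stands in for the Jena-backed database, and the monitor simply counts readings per medicine type because the exact stock logic is not described.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Medicine:
    """Context object built from the attributes carried by a tag reading."""
    tag_id: str
    kind: str
    expiry: str    # ISO date string
    read_at: str   # ISO timestamp of the reading


class OcpService:
    """Hypothetical stand-in for the OCP service."""

    def __init__(self):
        self.store = {}          # stands in for the database (step 3)
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def on_tag_read(self, tag_id, kind, expiry, read_at):
        medicine = Medicine(tag_id, kind, expiry, read_at)   # step 2
        self.store[tag_id] = medicine                        # step 3
        for callback in self._subscribers:                   # step 4
            callback(medicine)
        return medicine


class StockMonitor:
    """Plays the role of the ESB subscriber; counts readings per medicine type."""

    def __init__(self):
        self.readings = Counter()

    def on_medicine(self, medicine):
        self.readings[medicine.kind] += 1


ocp = OcpService()
monitor = StockMonitor()
ocp.subscribe(monitor.on_medicine)

# Step 1: the RFID service forwards two tag readings (invented values).
ocp.on_tag_read("tag-001", "paracetamol", "2010-01-31", "2009-03-02T10:15:00")
ocp.on_tag_read("tag-002", "paracetamol", "2010-01-31", "2009-03-02T10:16:00")
```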
5. Validation at Gregorio Marañon Hospital
The CARDEA platform is being tested in a real scenario by means of a pilot deployed in Gregorio Marañon Hospital. This scenario has the following elements:
• CARDEA platform. The CARDEA platform was installed on a server located at the Gregorio Marañon facilities.
• RFID tags. These tags are attached to the medicine units (Fig. 2-b).
• RFID sensors (Fig. 2-a). Two groups of RFID sensors are installed in the Gregorio Marañon Hospital facilities.
• RFID PDA reader (Fig. 2-b). This reader registers the tags attached to medicine units and inserts them into the CARDEA platform. Besides registration, the reader can be used to get or modify the registered information and to check the location of the medicine units
• Console of activity. This console displays the activity registered by the CARDEA platform. Basically, it receives all the notifications sent by the CARDEA ESB module.
• SIP agent on device emulator. A SIP agent was installed in a mobile device emulator. This agent receives alarm notifications from the CARDEA SIP module of different alarms:
Currently the pharmacy service of Gregorio Marañon Hospital is validating CARDEA and the results will be collected at the end of March.

Fig. 2: a) Pharmacy service corridor. Two RFID sensors are shown, which register the exit of medicine units. b) Medicine units with RFID tag and the RFID PDA reader.
Many people have been involved in the success of the CARDEA platform: Alfredo Pedromingo, Eneko Taberna, Pablo Piñeiro (Ariadna Servicios Informáticos); Francisco López (Murcia University), Augusto Morales (Universidad Politécnica de Madrid), Carlos Ángel Iglesias (E-Práctica), Dr. Ana Herranz and Dr. Arantxa (Pharmacy service in Gregorio Marañon Hospital).
[1] OSGi Alliance Home Page,
[2] RFID Technology information page,
[3] SIP information page,
[4] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley and E. Schooler, “SIP: Session Initiation Protocol”, Internet Eng. Task Force RFC 3261, June 2002.
[5] T. Berners-Lee, R. Fielding, U.C. Irvine, L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax”, Internet Eng. Task Force RFC 2396, 1998.
[6] Jena project page,

On processing processes in healthcare: combining processes and reasoning in personal health records
Leonardo Lezcano, Miguel-Angel Sicilia
Information Engineering Research Unit, Computer Science Dept., University of Alcalá
Ctra. Barcelona km. 33.6 – 28871 Alcalá de Henares (Madrid), Spain
{Leonardo.lezcano, msicilia}

Healthcare is inherently process-oriented in the sense that care requires continuous assistance sustained over time and linked to previous states or events. Process orientation has two main aspects: the natural/biological processes people are subject to, and the planned and systematized care processes devised by professionals and managed by healthcare providers or institutions. The paradigm of Personal Health Records (PHR) places the personal, subjective view of health at the centre of the data model, and it must account for biological and care processes if the possibilities of enhancing safety and empowering the patient are to be maximized. This paper explores the elements required to integrate process models in personal health record platforms, and the role of ontologies in making process information
Keywords: Healthcare process, ontology

1. Introduction
The Merriam-Webster on-line dictionary includes the following two senses for the word “process”: (a) a natural phenomenon marked by gradual changes that lead toward a particular result, e.g. the “process of growth”, and (b) a series of actions or operations conducing to an end. The former sense can be considered to include natural processes such as pregnancy, but also other biological processes such as illness (be it chronic or transient), disorders or trauma. The latter can be interpreted as purposeful actions performed by humans towards an end, which includes healthcare or clinical processes. Natural and care processes are in many cases interwoven, as care processes are typically triggered by natural or biological processes and their steps attempt to follow and intervene in their evolution. It is widely acknowledged that analyzing and modeling healthcare processes is important for improving patient safety (Carstens et al., 2009).
The concept of personal health record (PHR) allows for combining in a single piece of information technology both the personal, subjective view of natural processes and the planned and systematic course of care processes (Tang et al., 2005). This enables providing services or alerts based on the analysis of the state of the user in the declared natural processes, combined with clinical knowledge represented and associated to these processes, and informed by the results of tests, procedures, medications and other care events. This also has a significant application in anticipating alternate paths in the biological processes by exploiting what is represented in clinical process ontologies. In the clinical domain, there are several kinds of long-lived processes (i.e. spanning more than a single session and potentially weeks, months or years), including protocols, guidelines and preventive programs. Several languages for guideline and protocol modeling have been developed in recent years, including GLIF, PROforma and the Arden Syntax, to name a few. These are care-oriented in the sense that they are devised for the systematic delivery of care events from a healthcare institution perspective. However, when approaching the problem from the PHR perspective, new opportunities appear from the combination of the knowledge about several ongoing processes for a given person, and from the possibility of combining them with purely subjective statements on health not coming from any health record maintainer institution.
The rest of this paper is structured as follows. Section 2 briefly sketches existing technology for process-oriented healthcare and explores how PHR systems could integrate with it. Then, Section 3 discusses the role of reasoning about natural and care processes as a means to increase patient safety, link to healthcare offerings and improve the (self-)tracking of personal processes. Finally, conclusions and outlook are provided in Section 4.
2. Representing processes associated to PHR
There are a number of languages specific to clinical computer-interpretable guidelines (CIG) that are close to general-purpose executable workflow (orchestration) languages such as XPDL, XLANG or BPEL (Mulyar, van der Aalst, and Peleg, 2007).
To make the discussion concrete, we will focus in what follows on PROforma as a well-equipped process language for the execution of care guidelines (Sutton and Fox, 2003) and on the technology used in the GoogleHealth PHR. These technologies represent a typical deployment scenario of a process-based, multi-source health record under full control of the user. GoogleHealth is currently based on the ASTM Continuity of Care Record (CCR). The CCR was developed to store the most relevant patient information electronically and make it available to all providers, systems, and patients. An important aspect of the ASTM CCR is that it is technology neutral, which makes it a good candidate for PHR (Smolij & Dun, 2006). The CCR can be used as the base information model for a process representation, and recommendations can be directed either to the user or to healthcare professionals. Table 1 provides an illustration of the mapping of a fragment of the NICE guideline CG62 “Antenatal care” to CCR elements.
Table 1. Example guideline and mapping to CCR elements

Clinical guideline:
“Screening for gestational diabetes using risk factors is recommended in a healthy population. At the booking appointment, the following risk factors for gestational diabetes should be determined:
- body mass index above 30 kg/m2
- previous gestational diabetes
- family history of diabetes (first-degree relative with diabetes)…”

CCR elements:
Pregnancy: FunctionalStatus/Function [...77386006/SNOMED]
Weight: VitalSigns/Result/Test [...363808001/SNOMED]
Height:
Gestational diabetes:
NOTE: Family history: requires links to related CCR profiles.

The example in Table 1 can be mapped to a decision element in PROforma. The decision step is
simply that of recommending screening for gestational diabetes to the individual or the healthcare
provider. In the first case and considering GoogleHealth, this can be realized through a notice (an
Atom Feed with an optional CCR document).
The process execution engine can be implemented as a client application of the PHR that uses data extraction from the PHR as input and the posting of notices as output. In terms of PROforma, the main task would be mapping PHR information (represented for example in CCR) to different task elements. For example, the guideline fragment in Table 1 would be a decision with two candidates (recommending screening or not), with choice_model set to single and support_mode set to symbolic. The condition that must be true for a candidate to be “recommended” would then be in the recommendation property of that candidate, and it would be expressed in terms of the information extracted from the PHR shown in Table 1.
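As a rough illustration of such a decision, the following Python sketch evaluates the three risk factors quoted in Table 1 against data extracted from a PHR. It is not PROforma syntax. The field names (`weight_kg`, `height_m`, and so on), the candidate labels and the reading that any single risk factor supports the screening candidate are assumptions made for this example.

```python
def body_mass_index(weight_kg, height_m):
    """BMI in kg/m2 from the weight and height stored in the record."""
    return weight_kg / height_m ** 2


def screening_decision(phr):
    """Return the chosen candidate and the arguments that support it."""
    arguments = []
    if body_mass_index(phr["weight_kg"], phr["height_m"]) > 30:
        arguments.append("body mass index above 30 kg/m2")
    if phr.get("previous_gestational_diabetes"):
        arguments.append("previous gestational diabetes")
    if phr.get("family_history_of_diabetes"):
        arguments.append("family history of diabetes")
    # Single-choice decision between the two candidates.
    candidate = "recommend_screening" if arguments else "do_not_recommend"
    return candidate, arguments


high_bmi = screening_decision({"weight_kg": 95, "height_m": 1.65})
no_risk = screening_decision({"weight_kg": 60, "height_m": 1.70})
```

The returned arguments could be attached to the notice posted back to the PHR, so that the user or professional sees why the recommendation was made.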
PROforma plans can be generated to schedule appointments (from the information contained in the Pregnancy condition’s start date). For example, the CG62 guideline states that “for a woman who is nulliparous with an uncomplicated pregnancy, a schedule of 10 appointments should be adequate”. The different tests included in the guidelines can be arranged in the plan schedule, for example with CycleConditions.
In addition to the care processes typically specified in guideline process models, personal processes (subjective, biological ones) can be modeled in a loose, episodic way, e.g. occasional self-reporting of fever. For these indications to be useful, it is important to have an adequate level of detail in the data model. Coming back to the CG62 guideline, it states “All pregnant women should be made aware of the need to seek immediate advice from a healthcare professional if they experience symptoms of pre-eclampsia. Symptoms include: […] severe pain just below the ribs”. While pain is available as a condition in GoogleHealth, there is no formal way of specifying location. The ribcage is represented in the Foundational Model of Anatomy (FMA) ontology with id 7480, so combining that representation with some relative location predicates associated to symptoms would enable representing that aspect of the guideline. This can be integrated in the PHR by combining ontologies with predicates specific to symptoms and signs.
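A minimal Python sketch of this idea follows. Only the FMA identifier 7480 for the ribcage comes from the text; the `Symptom` structure, the `just_below` predicate name and the `FMA:` prefix are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional

RIBCAGE = "FMA:7480"   # ribcage identifier in the FMA, as given in the text


@dataclass(frozen=True)
class Symptom:
    kind: str                        # e.g. "pain"
    severity: str                    # e.g. "severe"
    relation: Optional[str] = None   # relative location predicate (assumed name)
    anatomy: Optional[str] = None    # anatomical entity the relation refers to


def needs_immediate_advice(pregnant, symptom):
    """True for 'severe pain just below the ribs' reported during pregnancy."""
    return bool(
        pregnant
        and symptom.kind == "pain"
        and symptom.severity == "severe"
        and symptom.relation == "just_below"
        and symptom.anatomy == RIBCAGE
    )


located_pain = Symptom("pain", "severe", "just_below", RIBCAGE)
unlocated_pain = Symptom("pain", "severe")   # location not recorded
```

Without the location predicate, the second symptom cannot be matched against the guideline text, which is the limitation noted above.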
3. Reasoning in the context of processes
Reasoning and inference require health information to be represented in some form of knowledge representation formalism. There are existing reports of integrating process models with clinical care ontologies. For example, Eccher et al. (2005) describe a system combining ontologies with OpenEHR archetypes in the framework of heart failure prevention and monitoring. However, the ontological description reported is limited to classifying processes as biological and non-biological, the latter being further categorized as MEDICAL-VISITs and DIAGNOSTIC-INVESTIGATIONs, and borrowing process semantics from DOLCE (Gangemi et al., 2002). Fox et al. (2006) have described a comprehensive approach to supporting complex treatment plans and care pathways, focusing on the expression of goals.
Reasoning processes can be applied to the data and process model described above. It is important to note that an open world assumption is required, as the absence of data on some piece of information cannot be interpreted as negation. For example, it might be that for the decision in Table 1 the user has not downloaded or registered data on family history. This fits with the notion of enquiry tasks in PROforma, which are used to get or request (missing) information.
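The difference between unknown and false can be made explicit with a three-valued check, sketched here in Python. The risk factor keys and the outcome labels are assumptions for illustration; a value of `None` (or a missing key) means the information was never recorded.

```python
RISK_FACTORS = (
    "bmi_above_30",
    "previous_gestational_diabetes",
    "family_history_of_diabetes",
)


def decide_screening(phr):
    """Open-world reading: a missing value is unknown, not False."""
    values = {name: phr.get(name) for name in RISK_FACTORS}
    if any(value is True for value in values.values()):
        return "recommend_screening", []
    missing = [name for name, value in values.items() if value is None]
    if missing:
        # Corresponds to an enquiry: ask for the data instead of assuming "no".
        return "enquire", missing
    return "do_not_recommend", []


partial = decide_screening(
    {"bmi_above_30": False, "previous_gestational_diabetes": False}
)
```

Here `partial` asks for the family history rather than concluding that screening is unnecessary.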
Medication analysis can also be built for conditions. For example, NICE guideline CG22 “Anxiety” includes the following: “Benzodiazepines are associated with a less good outcome in the long term and should not be prescribed for the treatment of individuals with panic disorder. […] Benzodiazepines should not usually be used beyond 2–4 weeks”. These decision points can be implemented with SWRL if the supporting ontologies use OWL. The following rule can be used at any time as a way to increase safety:
Patient(?p) and Medication(?m) and taking(?p, ?m)
and current(?m) and active-ingredient(?m, benzodiazepines)
and CurrentCondition(?c) and disorder(?c, panic-disorder)
-> alert(?p, ?m)

The rule above would have been different at diagnosis time, e.g.:
Patient(?p) and CurrentCondition(?c)
and disorder(?c, panic-disorder)
-> recommend-negative(?p, benzodiazepines)

In that second case, the guideline implementation is giving advice to the healthcare professional (we assume that the negative statement is important as information in the given clinical context). In both cases, the rules are not deciding the flow of tasks in the process but serving as alert triggers and aiding the decision making of one of the steps. The decision steps themselves can be modeled as SWRL rules in most cases, e.g. the simple procedural decisions used in the Arden Syntax can be expressed that way. However, some guidelines are fuzzy in nature, such as “beyond 2-4 weeks”, and mechanisms for dealing with such vagueness are not present in ontology-based representations such as the combination OWL+SWRL.
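To show what the two rules compute, the following Python sketch applies the same conditions to a plain dictionary. It does not use OWL or SWRL; the record layout, the patient identifier and the medication name are invented for illustration, and the sketch assumes that the current condition belongs to the same patient.

```python
def has_current_panic_disorder(patient):
    return any(
        c["current"] and c["disorder"] == "panic-disorder"
        for c in patient["conditions"]
    )


def medication_alerts(patient):
    """First rule: alert(?p, ?m) for current benzodiazepine medications."""
    if not has_current_panic_disorder(patient):
        return []
    return [
        (patient["id"], m["name"])
        for m in patient["medications"]
        if m["current"] and m["active_ingredient"] == "benzodiazepines"
    ]


def diagnosis_advice(patient):
    """Second rule: negative recommendation issued at diagnosis time."""
    if has_current_panic_disorder(patient):
        return [("recommend-negative", patient["id"], "benzodiazepines")]
    return []


patient = {
    "id": "p1",
    "conditions": [{"disorder": "panic-disorder", "current": True}],
    "medications": [
        {"name": "drug-a", "active_ingredient": "benzodiazepines", "current": True},
        {"name": "drug-b", "active_ingredient": "other", "current": True},
    ],
}
alerts = medication_alerts(patient)
advice = diagnosis_advice(patient)
```

A crisp comparison like this also shows why “beyond 2-4 weeks” is hard to encode: any fixed duration threshold would be an arbitrary choice within the stated range.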
Reasoning external to processes also makes it possible to detect the interaction of conflicting care processes. For example, NICE guideline CG23 “Depression” states that “When depressive symptoms are accompanied by anxious symptoms, the first priority should usually be to treat the depression. Psychological treatment for depression often reduces anxiety, and many antidepressants also have sedative/anxiolytic effects”. This requires a representation of the processes themselves. The following sketches a possible formulation for the mentioned interaction in the context of a healthcare event:

Patient(?p) and CareProcess(?p, ?cp) and ongoing(?cp)
and related-To(?cp, anxiety)
and PreDiagnosis(?p, depression)
-> recommend-cancellation(?cp)

This is an example of meta-process tracking that would integrate with several ongoing processes. This can be realized by having a meta-process ontology that unifies the task status of the processes (as the results of those tasks can be assumed to be integrated in the PHR).
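The meta-process check can be sketched in Python as a query over the unified status of a person's care processes. The record layout and the process identifiers below are hypothetical; only the conditions mirror the rule given above.

```python
def cancellation_candidates(patient):
    """Ongoing anxiety-related care processes of a patient pre-diagnosed with depression."""
    if "depression" not in patient["pre_diagnoses"]:
        return []
    return [
        cp["id"]
        for cp in patient["care_processes"]
        if cp["status"] == "ongoing" and cp["related_to"] == "anxiety"
    ]


patient = {
    "pre_diagnoses": ["depression"],
    "care_processes": [
        {"id": "cp-anxiety", "status": "ongoing", "related_to": "anxiety"},
        {"id": "cp-old", "status": "finished", "related_to": "anxiety"},
        {"id": "cp-antenatal", "status": "ongoing", "related_to": "pregnancy"},
    ],
}
candidates = cancellation_candidates(patient)
```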

4. Conclusions and outlook

Personal Health Record (PHR) systems enable the combination of subjective reports and information entered directly by the user with clinical data coming from healthcare institutions and services. Process models representing guidelines or protocols can be used for tracking and alerting from the perspective of the healthcare organization, provided that the information model in the PHR is sufficiently detailed and provides the required computational semantics. Such semantics can be given by a combination of existing ontologies (including anatomical and clinical ones, but also others such as those representing drugs) with formal representations of user conditions and/or functional status. While there exist several process execution languages specific to healthcare guidelines, the use of ontology languages combined with rules and reasoners enables the expression of decision points in these models that exploit richer models and inference produced outside of their execution engines. This paper has explored the combination of PHR information with ontologies and process models as a way to introduce richer functionality in the context of continuity of assistance. Ontologies and rules can be used both to implement the logic of the decision points and to provide support during the tasks that is currently not part of process models. As illustrated above, representing processes requires support for reasoning about time and some account of uncertainty, as these processes have an inherent variability that needs to be accounted for.
Combining process models with existing ontologies and reasoning mechanisms in PHR clients
represents a promising research direction as it would bring together process knowledge with
declarative knowledge and a fine-grained account of both objective and subjective condition
This work has been supported by the project “Historia Clínica Inteligente para la seguridad del Paciente/Intelligent Clinical Records for Patient Safety” (CISEP), code FIT-350301-2007-18, funded by the Spanish Ministry of Science and Technology.
Carstens, D., Patterson, P., Laird, R. and Preston, P. (2009) Task analysis of healthcare delivery: A case study, Journal of Engineering and Technology Management (to appear)
Eccher C et al. (2005). Ontologies supporting continuity of care: The case of heart failure. Computers in Biology and Medicine. 36 (2006), 789-801
Fox, J., Alabassi, A., Patkar, V., Rose, T., Black, E. (2006) An ontological approach to modelling tasks and goals, Computers in Biology and Medicine, 36(7-8), pp. 837-856
Gangemi, A., Guarino, N., Masolo, C., Oltramari, A., Schneider, L. (2002). Sweetening Ontologies with DOLCE. In A. Gómez-Pérez, V.R. Benjamins (eds.) Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web, 13th International Conference, EKAW 2002, Siguenza, Spain, October 1-4, 2002, Springer Verlag, pp. 166-181
Mulyar, N., van der Aalst, W.M.P and Peleg, M. (2007) A Pattern-based Analysis of Clinical Computer-Interpretable Guideline Modeling Languages. Journal of the American Medical Informatics Association 14(6), pp. 781-787.
Smolij, K., Dun, K. (2006) Patient Health Information Management: Searching for the Right Model. Perspect Health Inf Manag. V3 2006; 3-10.
Sutton, D.R. and Fox, J. (2003) The Syntax and Semantics of the PROforma Guideline Modeling Language. J. Am. Med. Inform. Assoc. 10, pp. 433-443.
Tang, P.C., Ash, J.S., Bates, D.W., Overhage, J.M. and Sands, D.Z. (2005) Personal Health Records: Definitions, Benefits, and Strategies for Overcoming Barriers to Adoption. J. Am. Med. Inform. Assoc. 13: 121-126.

Generating Standardized Demographic

Diego Boscá(1), David Moner(1), José Alberto Maldonado(1), Carlos Angulo(1), Ernesto Reig(1),
Montserrat Robles(1)
IBIME Group, ITACA Institute, Universidad Politécnica de Valencia

The management of patients' demographic information in an information system is usually considered a secondary problem. As a result, demographic information ends up scattered around the organization or stored along with the clinical information. With the standardization of clinical information becoming a popular topic, the standardization of demographic information is even more important. This paper shows a way of generating standardized demographic repositories from the different demographic sources available in the system, using a standardization process based on a dual model approach.
Keywords: Demographic, standardization, EHR.

1. Introduction
Demographic data is key in any health information system. Its value grows as systems turn from local to federated (as more systems are integrated, it is more likely that diverse demographic data will be found). Thus, a good definition of the generic demographic concepts of the system is needed in order to summarize all the requirements. The demographic data should be stored according to those concepts.
However, nowadays the demographic data of a system is usually scattered across organizations and systems: patient-related data is stored in a Master Patient Index or within the clinical data; healthcare professional demographics are stored in a different system (usually an LDAP-like server, if they are stored at all); device demographic information is stored in a resource catalog and on the devices themselves; and the demographic information of the organizations is provided by a resource catalog (for example, the catalog of Spanish health organizations can be found on the Ministry of Health website [1]).
There are also two dual model 9 standards for EHR interoperability (CEN EN13606 and openEHR).
Both define a reference model and a demographic reference model [2], [3]. The instances defined
by the demographics model are used along with the EHR instances defined by the reference model
to create the EHR extract. Figure 1 shows a portion from a demographic instance from a CEN
EN13606 EHR extract.

The dual model for EHR communication is based on the separation between information (the data) and knowledge
(what we know about the data). Knowledge changes and improves over time. For further explanation check [4]

Fig. 1: portion of a demographic instance

This paper proposes the creation and population of a standardized demographic data repository from available data sources, as EHR data changes more often than demographic data. Thus, when an EHR extract is generated, the EHR data is extracted on the fly and then completed with the demographic instances available in the repository.
As an additional result of this work, CEN EN13606 and openEHR demographic XSD schemas have been created from the Reference Model class diagrams. The class diagrams of both models are shown in Figures 2 and 3.
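The split between slowly changing demographic data and frequently changing clinical data can be sketched as follows. This Python example is only a schematic illustration: the dictionary layouts, identifiers and values are invented, and real EHR extracts are XML documents conforming to the standards' reference models rather than dictionaries.

```python
# Pre-populated repository of standardized demographic instances (invented data).
demographic_repository = {
    "patient-42": {"name": "Jane Doe", "birth_date": "1970-05-01"},
}

# Clinical source queried on the fly at extract time (invented data).
clinical_source = [
    {"patient_id": "patient-42", "entry": "blood pressure 120/80"},
    {"patient_id": "patient-07", "entry": "glucose 5.4 mmol/L"},
]


def fetch_clinical_entries(patient_id, source):
    """On-the-fly extraction of the clinical part of the extract."""
    return [row["entry"] for row in source if row["patient_id"] == patient_id]


def build_extract(patient_id, source, repository):
    """Complete the freshly extracted EHR data with the stored demographics."""
    return {
        "subject": patient_id,
        "demographics": repository[patient_id],
        "entries": fetch_clinical_entries(patient_id, source),
    }


extract = build_extract("patient-42", clinical_source, demographic_repository)
```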

Fig. 2: CEN EN13606 demographic class diagram.

Also, the first CEN EN13606 demographic archetypes have been created based on CEN EN13606
demographic schema.

Fig. 3: openEHR demographic class diagram.

2. Background
In both CEN EN13606 and openEHR, knowledge concepts are represented as archetypes. The correct definition of demographic archetypes requires taking into account not only the clinical knowledge of the data, but also the legal and privacy constraints, in order to obtain usable archetypes. Several standards have been proposed to represent demographic information for health purposes: ISO/TS 22220 “Identification of subjects of health care”, which defines the identification requirements of the subjects of care within and between health care organizations; ISO/TR 11487 “Clinical stakeholder participation in the work of ISO TC 215”, which defines the identification requirements of the service providers; and EN 14822 “General Purpose Information Components” (GPIC), which defines the components of an information system based on HL7 RIM classes.
Also, each government defines its own laws regarding data protection and privacy. For example,
the law “Ley Organica de Protección de datos” (LOPD) in Spain defines how personal data should
be used and which are the requirements for the personal data storage.
Some demographic archetypes for the openEHR model have been developed following the ISO
recommendations [5]. No demographic archetypes for CEN EN13606 model exist yet.
The tool LinkEHR [6] is used to build the EHR extract on the fly. LinkEHR generates the
transformation scripts to transform the un-standardized data sources into standard EHR extracts.
This process is accomplished with the mapping of the archetypes to the data sources schemas.
3. Methods
As LinkEHR can import any Reference Model from its XSD schema, XSD schemas were generated for
the CEN EN13606 demographic model and the openEHR demographic model (see note 10). Both
demographic schemas were successfully imported into LinkEHR.

(10) The XSD schemas for CEN EN13606 demographic model and openEHR demographic model can be downloaded at

Since openEHR demographic archetypes already exist, the effort was focused on creating the
CEN EN13606 demographic archetypes. Archetypes for the basic concepts of the CEN EN13606
demographic model were created. Figure 4 shows an archetype defined according to the CEN EN13606
standard.

Fig. 4: CEN EN13606 demographic archetype.

Once the archetypes have been created and the data source schemas have been imported into
LinkEHR, the mapping can be defined. The mapping consists of functions between the data source
schemas and the archetype nodes. A full explanation of the mapping process can be found in
[6]. Once the mapping has been defined, LinkEHR automatically generates the transformation
script that converts the data available in the data sources into XML demographic instances. These
instances can be stored in an XML-capable data store so that they can be queried.
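As a rough illustration of the mapping idea only (this is not the transformation script that LinkEHR generates, whose format is described in [6]), the following Python sketch fills hypothetical archetype node paths with functions of a legacy source row and serializes the result as an XML instance. The column names, node paths, element names and the sample OID are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy row; the column names are illustrative only.
legacy_row = {
    "pat_id": "1.2.3.4.5",
    "nombre": " Ana ",
    "apellido": "Garcia",
    "tel": "+34 600 000 000",
}

# Each (hypothetical) archetype node path is filled by a function of the source row.
MAPPING = {
    "person/id": lambda r: r["pat_id"],
    "person/name/given": lambda r: r["nombre"].strip(),
    "person/name/family": lambda r: r["apellido"].strip(),
    "person/telecom": lambda r: r["tel"].replace(" ", ""),
}

def to_demographic_instance(row):
    """Build an XML instance by applying every mapping function to the row."""
    root = ET.Element("demographic_instance")
    for path, fn in MAPPING.items():
        node = root
        for step in path.split("/"):
            child = node.find(step)
            if child is None:
                # Create intermediate nodes only once, so siblings share a parent.
                child = ET.SubElement(node, step)
            node = child
        node.text = fn(row)
    return root

instance = to_demographic_instance(legacy_row)
xml_text = ET.tostring(instance, encoding="unicode")
```

The resulting `xml_text` could then be stored in any XML-capable store; the choice of store is left open here, as in the text.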
The creation of this repository allows complete EHR extracts to be generated, because the OIDs
of the demographic data are known at data transformation time. The demographic repository is
queried for those OIDs, and the resulting XML demographic instances are included in the
standardized EHR extract.
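The completion step can be sketched as follows, under stated assumptions: the repository is modelled as a plain dictionary from OID to a stored XML instance, and the element names (`ehr_extract`, `subject_ref`, `demographic_extract`) are hypothetical placeholders rather than names taken from the CEN EN13606 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: OID -> stored XML demographic instance.
REPOSITORY = {
    "1.2.3.4.5": "<person><id>1.2.3.4.5</id><name>Ana Garcia</name></person>",
}

def complete_extract(extract, repository):
    """Append the demographic instance of every OID referenced in the extract."""
    # Collect the referenced OIDs first (deduplicated, order preserved),
    # so the tree is not modified while it is being traversed.
    oids = list(dict.fromkeys(ref.get("oid") for ref in extract.iter("subject_ref")))
    demographics = ET.SubElement(extract, "demographic_extract")
    for oid in oids:
        if oid in repository:
            demographics.append(ET.fromstring(repository[oid]))
    return extract

extract = ET.fromstring(
    "<ehr_extract>"
    '<composition><subject_ref oid="1.2.3.4.5"/></composition>'
    '<composition><subject_ref oid="1.2.3.4.5"/></composition>'
    "</ehr_extract>"
)
completed = complete_extract(extract, REPOSITORY)
```

Each referenced OID is resolved once, so a patient referenced by several compositions yields a single demographic instance in the extract.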
4. Discussion
The study of the CEN EN13606 and openEHR demographic models revealed some differences. First of
all, the two demographic models have different aims. CEN EN13606 provides the minimal
demographic information that should be attached to an EHR extract; this information is enough to
retrieve the full demographic information from the source systems. The openEHR demographic model
includes several attributes that can store tables, lists or trees, which allows more complex
structures to be created in the demographic section. Compared to CEN EN13606 demographics,
openEHR can define a wider set of structures for demographic information, because the openEHR
model covers the full demographic system whereas the CEN EN13606 standard covers only the
demographic information of the extract.
Another difference is that the CEN EN13606 demographic model separates the telecom address from
the postal address. This makes the instances easier to understand, as a different set of codes is
used in each one.
Both models define role classes and define, or can define, the same demographic root classes
(persons, patients, devices, organizations, etc.).
5. Conclusion
Currently, demographic data is scattered across all the systems of the organization, and there is
no easy way to extract all the demographic data available in them. The presented solution
shows that it is possible to standardize this demographic information using a dual model
approach. LinkEHR can be used to standardize both the EHR and the demographic data in order to
generate standardized EHR extracts. The creation of this repository provides a way of generating a
complete EHR extract from the non-standardized data available in the system.
Each of the demographic models reviewed in this paper is aimed at one specific demographic
use. On the one hand, the CEN EN13606 demographic model is aimed at transmitting, within an EHR
extract, the minimal part of the demographic information. That information should allow the
retrieval of the full demographic information available in the system. On the other hand, the
openEHR demographic model aims to model all the demographic information stored in the
system. Thus, the openEHR and CEN EN13606 demographic models can coexist in the same systems.
As additional results of this paper, both CEN EN13606 and openEHR demographic XSD schemas
have been developed. Furthermore, with the import of these XSD schemas, LinkEHR is the first
archetype editor to support the creation of openEHR and CEN EN13606 demographic archetypes.
As future work, more specific archetypes should be created in order to define the demographic
concepts available in real systems. Also, this work does not deal with the problem of generating
the unique identifiers of the demographic data in a unified demographic server. This will be
addressed in future work.
This work has been funded by project TSI2007-66575-C02-01 from Ministerio de Educación y
Ciencia, the Consellería d’Empresa, Universitat i Ciencia, reference APOSTD/2007/055 and the
Programa de Apoyo a la Investigación y Desarrollo (PAID-06-07) from the Universidad Politécnica
de Valencia.
[1] Primary care centers from the Spanish national health system catalog and hospitals national
[2] CEN/TC251, EN13606-1: Health Informatics - Electronic Health Record communication, part 1.
[3] T. Beale, S. Heard, D. Kalra, D. Lloyd. The openEHR Reference Model, Demographic Information
Model. 2008.
[4] T. Beale. Archetypes, Constraint-based Domain Models for Future-proof Information Systems.
[5] RDM. Dias, SM. Freire, Arquétipos para Representar as Informações Demográficas em Saúde.
XI Congresso Brasileiro de Informática em Saúde, 2008. v. 1. p. 1-6
[6] JA. Maldonado, D. Moner, D. Boscá, C. Angulo, M. Robles, JT. Fernández. Framework for clinical
data standardization based on archetypes. Stud. Health Technol. Inform., 454-8, (2007)

Archetypes and ontologies to facilitate the breast cancer identification and treatment
Ainhoa Serna(1), Jon Kepa Gerrikagoitia(1), Iker Huerga, Jose Antonio Zumalakarregi(2), Jose Ignacio
Computer Science Depart, Mondragon Unibertsitatea,
Loramendi nº4, Arrasate Gipuzkoa, Spain
Chief of Health Management Service,
Hospital of Cruces, Bizkaia, Spain
Chief of Research Unit,
Hospital of Cruces, Bizkaia, Spain

The breast cancer medical process is carried out almost entirely manually, with an evident risk of
human error due to an occasional lack of experience among the staff (substitutions, sick leave and
other reasons). Moreover, the correct completion of the process may depend on the personal attitude
of the administrative staff. To guarantee patient safety, the whole process will be automatically
orchestrated and monitored. The solution will be supported by a service-oriented architecture
combined with semantic web techniques (archetypes and ontologies) that infer knowledge from
predefined rules to make the process secure.

Keywords: Electronic healthcare records, clinical archetypes, ontologies, OWL, Web Services, breast
cancer, prognostic factor.

1. Introduction
Breast cancer is an increasingly common disease that affects many women in our country. Every
year, 40 cancer cases are diagnosed per 100,000 consultations, which makes breast cancer the most
frequent malignant tumor in the female population.
The diagnosis of breast cancer involves a large number of professionals from different care
areas (family doctors, gynecologists, radiologists, pathologists, oncologists, administrative
staff, etc.) and diverse diagnostic and treatment resources, which increases the complexity of
managing it. It is an organizational challenge, because many departments of the health care
service are involved and many different people interact to carry out the diagnostic and
therapeutic process.
The weak link in the chain is that the process can currently only be carried out manually, and the
whole process, including administrative work, appointments and so on, is managed by the doctors.

The different doctors who see the patient during the care process can individually make decisions
that change the treatment, even though there is a written protocol describing how to proceed in
general.
For these reasons, there is a basic need for a decision-making tool that communicates with the
different IT systems and guarantees the reliability, control and monitoring of the medical
process. This tool will manage resources efficiently, regardless of who the operator is and of his
or her attitude to work, in order to minimize human errors.
Decision making backed by artificial intelligence mechanisms plays an important role in this
project, because human errors due to lack of experience can be reduced and the automatically
inferred knowledge will add relevant value to the research.
The use of ontologies is a key factor in this project. A doctor's knowledge about a diagnosis is
difficult to transfer, because it is based on personal experience and is not represented
homogeneously when explained to others. This lack of a homogeneous representation makes it hard
for professionals to share and compare experiences and knowledge.
2. Development
Based on the research by Matthew Hardy Williams [9] (Integrating Ontologies and
Argumentation for decision-making in breast cancer), we produced our ontology for the data
obtained from Cruces Hospital.

2.1. Identifying and classifying the process

The first step in the early detection of breast cancer is self-examination, which should be part
of every woman's monthly health check. Women who notice any change in the breast should see
their oncologist as soon as possible. Women who are older than 40 or have a high risk of
suffering breast cancer should have a yearly mammography and a physical examination by the
If any of those tests indicates even a minimal chance of cancer, the doctor should order the
pertinent test to determine the stage of the cancer. There are different types of tests (footnote 12).
In medical terminology, the stage of breast cancer is defined by three parameters: T (size of
the tumor), N (lymphatic nodes) and M (metastasis).
1. T (primary tumor):
- T0: there is no evidence of tumor.
- T1: tumor size is less than or equal to 2 cm.
- T2: tumor size is between 2 and 5 cm.
- T3: tumor size is greater than 5 cm.
2. N (lymphatic nodes):
- N0: no lymphatic node is affected.
- N1: the cancer has spread to 1 to 3 lymphatic nodes.
- N2: the cancer has spread to 4 to 9 lymphatic nodes.
- N3: the cancer has spread to more than 9 lymphatic nodes.
3. M (presence of metastasis):
- M0: there is no metastasis.
- M1: there is metastasis, so the cancer has spread to other organs.

The different stages of breast cancer (footnote 13) are defined on the basis of these characteristics.
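The T, N and M categories listed above can be written down as a small classification function. The Python sketch below encodes only the thresholds given in this section; the function names are illustrative, and treating a tumor of exactly 5 cm as T2 is an assumption made to resolve the boundary between "between 2 and 5 cm" and "greater than 5 cm".

```python
def classify_t(tumor_cm):
    """Primary tumor category from the tumor size in centimetres."""
    if tumor_cm is None or tumor_cm <= 0:
        return "T0"  # no evidence of tumor
    if tumor_cm <= 2:
        return "T1"
    if tumor_cm <= 5:
        return "T2"  # assumption: exactly 5 cm is still T2
    return "T3"

def classify_n(affected_nodes):
    """Lymphatic node category from the number of affected nodes."""
    if affected_nodes == 0:
        return "N0"
    if affected_nodes <= 3:
        return "N1"
    if affected_nodes <= 9:
        return "N2"
    return "N3"

def classify_m(has_metastasis):
    """Metastasis category."""
    return "M1" if has_metastasis else "M0"

def tnm(tumor_cm, affected_nodes, has_metastasis):
    """Return the (T, N, M) triple for one patient."""
    return (
        classify_t(tumor_cm),
        classify_n(affected_nodes),
        classify_m(has_metastasis),
    )
```

For instance, `tnm(6.0, 10, False)` yields the triple T3, N3, M0.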

2.2. Clinical Archetypes

An archetype is a reusable, formal model of a domain concept. The formal concept was originally
described in detail in a paper by Thomas Beale (footnote 14).
For example, an archetype for "Breast cancer identification" is a model of the information that
should be captured for this kind of identification: usually the primary tumor, the lymphatic nodes
and the presence of metastasis, plus instrument or other protocol information. In general,
archetypes are defined for wide reuse; however, they can be specialized to include local
particularities. For this approach, only those related to the breast cancer context should be used.
The key benefits of archetypes include:
- Knowledge-enabled systems: the separation of information and knowledge concerns in software
systems, allowing cheap, future-proof software to be built.
- Knowledge-level interoperability: the ability of systems to reliably communicate with each other
at the level of knowledge concepts.
- Domain empowerment: the empowerment of domain specialists to define the informational
concepts they work with, and to have direct control over their information systems.
- Intelligent querying: archetypes can be used at runtime to enable the efficient querying of data
based on the structure of the archetypes from which the data was created.
To achieve the aim of this paper, openEHR (footnote 15) archetypes are used as the main basis to
build OWL classes, subclasses and properties. Once these clinical archetypes are translated into
OWL objects, we combine them to set up our Breast Cancer Ontology, which becomes the starting
point of the inference process. For example, Figure 1 shows the Breast Cancer archetype:

[Figure content: Breast cancer archetype with two elements, Primary tumor (size, in cm) and
Lymphatic nodes (number of nodes).]

Fig. 1: Breast cancer archetype.

2.3. Ontology
The following example shows part of the ontology in practice. Suppose the following data are
significant for identifying breast cancer: Ms Jones is a 53-year-old postmenopausal woman (in the
aged-50-plus group) whose breast cancer test revealed a 5 cm tumor in her breast. More than 9 of
her lymphatic nodes are affected and there is no metastasis. The definition of MsJones is shown in
Figure 2:
<Women rdf:ID="MsJones">
<Metastasis rdf:ID="Met_Negative">
<hasResult rdf:datatype=""
<hasAge rdf:datatype=""
<rdf:type rdf:resource="#Aged50Plus"/>
<rdf:type rdf:resource="#Postmenopausal"/>
<Bigger5cm rdf:ID="Bigger5cm_1"/>
<hasLymphNodes rdf:resource="#More9Node_1"/>

Fig.2: Ms. Jones´ Definition.

2.4. Inference rules

Based on this representation, inference rules are created to identify the cancer and the possible
Expressed in natural language, the rules would read:
If the tumor is bigger than 5 cm, it is classified as T3 (rule 1); if more than 9 nodes are
affected, as N3 (rule 2); and the results of the metastasis tests are negative (rule 3). With
these results, the diagnosis is a surgery cancer in stage IIIC (rule 4).
Rules 5, 6 and 7 indicate the treatment details (drugs, duration) for the breast and lymphatic
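Rules 1 to 4 can be illustrated with a very small forward-chaining loop. This Python sketch is not the rule engine or rule syntax used in the prototype, which the text does not specify; the fact labels loosely follow the class names of Figure 2, `StageIIIC` is a hypothetical label, and rules 5 to 7 are omitted because their content is not given in full.

```python
# Each rule: (name, set of premises, conclusion). Labels are illustrative.
RULES = [
    ("rule1", {"Bigger5cm"}, "T3"),
    ("rule2", {"More9Node"}, "N3"),
    ("rule3", {"Met_Negative"}, "M0"),
    ("rule4", {"T3", "N3", "M0"}, "StageIIIC"),
]

def infer(facts):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in RULES:
            # A rule fires when all its premises are known and it adds something new.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                fired.append(name)
                changed = True
    return known, fired

# Facts asserted for Ms Jones in the example of the previous section.
known, fired = infer({"Bigger5cm", "More9Node", "Met_Negative"})
```

With the three facts asserted for Ms Jones, all four rules fire and `StageIIIC` ends up among the known facts; with only `Bigger5cm`, only rule 1 fires.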

2.5. Inferred knowledge

Based on the data shown in the previous sections, and after applying the defined inference rules,
the inferred knowledge is displayed in Figure 3:

<rdf:Description rdf:about="http://acl/BMV#MsJones">
<j.0:hasAge rdf:datatype="">53</j.0:hasAge>
<j.0:hasLymphNodes rdf:resource="http://acl/BMV#More9Node_1"/>
<j.0:hasMetastasis rdf:resource="http://acl/BMV#Met_Negative"/>
<j.0:hasTumor rdf:resource="http://acl/BMV#Bigger5cm_1"/>
<rdf:type rdf:resource="http://acl/BMV#Postmenopausal"/>
<rdf:type rdf:resource="http://acl/BMV#Aged50Plus"/>
<rdf:type rdf:resource="http://acl/BMV#Women"/>
<rdf:type rdf:resource=""/>
<rdf:type rdf:resource="http://acl/BMV#Adults"/>
<rdf:type rdf:resource=""/>
<j.0:recommendedDrugTreatment>Tamoxifen 40mg during 2 years</j.0:recommendedDrugTreatment>
<j.0:lymphNodesRecommendedTreatment>radiation to supravicular and/or internal mamary lymph nodes
and removed auxiliary lymph nodes</j.0:lymphNodesRecommendedTreatment>
<j.0:breastRecommendedTreatment>modified radical mastectomy followed by radiation and lumpectomy
plus radiation following chemotherapy to shrink a large single

Fig.3: Inferred knowledge.

As can be seen, the current stage of the patient's cancer has been inferred, as well as the
recommended treatment for the lymphatic nodes (radiation to supraclavicular and/or internal
mammary lymph nodes and removed axillary lymph nodes), the drug dose and the duration of
the treatment (Tamoxifen 40mg during 2 years).
We have implemented a prototype of this part of the functionality in order to demonstrate it to
people who are not semantic web experts, following the W3C accessibility and usability
recommendations.
The inferred knowledge is presented in a more user-friendly format using web development and
transformation techniques.
The prototype shows the inferred knowledge, including the stage of the cancer, the possible
treatment, etc.
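One way to turn inferred RDF/XML into something easier to display is to extract the subject, its types and its literal properties into a plain structure. The Python sketch below does this with the standard library only; it is not the transformation used in the prototype. The sample document is a shortened, well-formed variant of Figure 3, the `bmv` prefix replaces the `j.0` prefix of the figure, and binding it to the namespace `http://acl/BMV#` is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BMV = "http://acl/BMV#"  # assumed namespace of the ontology properties

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bmv="{BMV}">
  <rdf:Description rdf:about="http://acl/BMV#MsJones">
    <bmv:hasAge>53</bmv:hasAge>
    <rdf:type rdf:resource="http://acl/BMV#Postmenopausal"/>
    <rdf:type rdf:resource="http://acl/BMV#Aged50Plus"/>
    <bmv:recommendedDrugTreatment>Tamoxifen 40mg during 2 years</bmv:recommendedDrugTreatment>
  </rdf:Description>
</rdf:RDF>"""

def summarize(rdf_xml):
    """Extract subject, types and literal properties of the first description."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF}}}Description")
    subject = desc.get(f"{{{RDF}}}about").split("#")[-1]
    types = [
        t.get(f"{{{RDF}}}resource").split("#")[-1]
        for t in desc.findall(f"{{{RDF}}}type")
    ]
    properties = {}
    for child in desc:
        # Keep only ontology properties that carry a literal value.
        if child.tag.startswith(f"{{{BMV}}}") and child.text:
            properties[child.tag.split("}")[1]] = child.text.strip()
    return {"subject": subject, "types": types, "properties": properties}

summary = summarize(SAMPLE)
```

The resulting dictionary can be passed to any template or page generator to produce the user-facing view.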
It is important to note that the presented solution is intended to help the doctor decide on the
diagnosis or treatment. The final decision will always be a human decision.

3. Conclusions
- We developed a case study based on a breast cancer guideline and, to make this feasible,
provided a simple prototype.
- We aimed to achieve the following:
- Model the results of clinical trials, and the background knowledge that provides the terms
used to describe the results of the trials.
- Model arguments for both belief and decisions.
- Take a piece of medical knowledge and represent knowledge at different levels of
- Represent the terms related to breast cancer in order to unify concepts.
- Provide rapid access to up-to-date patient information, which will improve diagnosis
and treatment.
- The prototype will help doctors in decision making for diagnosis and treatment.

1. Douglas K. Barry. The Object Database Handbook: How to Select, Implement, and Use
Object-Oriented Databases. John Wiley and Sons, 1st edition, 1996.
2. Anita Burgun, Olivier Bodenreider, Christian Jacquelinet. Issues in the Classification of
Disease Instances with Ontologies. MIE 2006.
3. M. Bibbo. Comprehensive Cytopathology. W.B. Saunders Co., Philadelphia, 1997.
4. J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing
and querying RDF and RDF Schema, 2002.
5. Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American,
284(5):34-43, May 2001.
6. D. P. Clark. Thyroid Cytopathology. Essentials in Cytopathology. Foreword by Edmund
S. Cibas, M.D. Series Editor Dorothy L. Rosenthal. Springer, 2005.
7. Amarnath Gupta et al. Towards a formalization of disease-specific ontologies for
neuroinformatics, 2003.
8. T. R. Gruber. A translation approach to portable ontology specifications. Knowledge
Acquisition, 6(2):199-221, 1993.
9. M. Hardy Williams. Integrating Ontologies and argumentation for decision-making in breast
cancer. Doctoral Thesis, University College London, 2008.

A method to reconcile biomedical terminologies

M. Taboada(1), R. Lalín(1), D. Martínez(2)
Departamento de Electrónica e Computación, Universidad de Santiago de Compostela,
15782 Santiago de Compostela, Spain.
Departamento de Física Aplicada. Universidad de Santiago de Compostela,
15782 Santiago de Compostela, Spain.

Lexical alignment of terminologies is a crucial task to enable interoperability between health care
applications indexed by different but related controlled vocabularies. In this study, we propose a
method to automatically reconcile two biomedical terminologies. First, the method identifies similar
terms across terminologies using a lexical technique provided by the National Library of Medicine (NLM)
to perform a search in the UMLS Metathesaurus. Second, based on this term alignment, the method
recognizes similar concepts on the basis of concept-to-term relationships. Third, the method validates
the lexical alignment by checking whether concepts belong to similar top-level categories across
terminologies.
Keywords: controlled vocabularies, the Unified Medical Language System (UMLS), terminology mapping.

1. Introduction
Biomedical terminologies offer shared vocabularies, so they are key to integrating health care
systems. The need to use them in many health care activities, as well as in information retrieval,
has caused their number to grow. With this proliferation, different health care systems
use different biomedical terminologies. Therefore, biomedical terminology alignment helps
establish agreement between different health care systems. In this context, the need for new tools
and techniques to reconcile different but related biomedical terminologies becomes crucial [3, 10,
The use of lexical methods in terminology alignment produces high-quality mappings [3, 9, 10, 13].
However, the huge volume of data in biomedical terminologies hinders the manual revision of
lexical mappings; considerable human effort is needed to interpret them suitably and guarantee the
validity of the final lexical alignment [12]. Therefore, new methods to automatically evaluate
lexical alignments are needed.
The work presented here describes a method that can establish correspondences between different
but related biomedical terminologies. In the present study, we examine a lexical technique (named
NormalizeString) provided by the UMLS Knowledge Source Server (UMLSKS) (footnote 16), one of the
most important publicly available resources in the biomedical domain. We propose a method to
automatically detect invalid lexical mappings and thus enhance the lexical alignment generated by
this lexical technique. The method was applied to map a large-scale biomedical thesaurus (EMTREE,
footnote 17) to the complete UMLS Metathesaurus [1, 6].


2. Background
Clinical terminology systems group words and phrases symbolizing the knowledge within a
particular domain [2]. There are multiple ways of expressing knowledge in a terminology [7].
Put simply, terminologies can represent knowledge by means of terms only (term-based view) or
by means of both concepts and terms (concept-based view). In the second case, terms are only
labels (words) for concepts, which are the main entities encapsulating meaning.
Terminology mapping, in turn, is the task of identifying correspondences between
entities (terms or concepts) across two terminologies. Discovering these matches is intrinsically
difficult to automate. Currently, there are two approaches to achieving interoperability among
terminologies: merging and aligning the source thesauri. In the first approach, a single coherent
version is created by merging the original sources. This approach was followed in the UMLS [6] to
develop a large meta-thesaurus that reconciles differences in terminology from over 130
biomedical information sources. However, this solution is too expensive with large sources, so
often the most feasible solution is to keep the original sources separate and to add
correspondences among them. This second option is, for example, the one recommended for
developing multilingual thesauri [5] and the one followed in many studies in the biomedical
domain [4, 8, 10, 13].
3. Methods

3.1. Terminologies used

EMTREE is a terminology developed by Elsevier to index EMBASE, an on-line database for life
science researchers. The version of EMTREE used in our experiments contains 46,427 concepts
distributed into 15 main categories or facets, where each facet is represented by a taxonomy. This
version of EMTREE contains more than 200,000 terms.
The UMLS Metathesaurus [1, 6] contains information about medical concepts, terms, string-names
and the relationships between them. All this information is drawn from over 130 controlled
vocabularies, such as SNOMED or MeSH. The version of the UMLS used contains around 1.3
million concepts.

3.2. Overview
The following sections present in detail the three steps of our method: lexical alignment
of terms, lexical alignment of concepts and validation of the lexical alignment. Our method was
programmed in Java and run on a personal computer under Linux on some occasions and
Microsoft Windows XP on others; it used an XML representation of both EMTREE
and the Metathesaurus.

3.3. Lexical alignment of terms

This phase identifies pairs of terms that have the same normalized form across systems. In Figure 1,
thorax → Thorax and chest → Chest are pairs of EMTREE-UMLS terms exhibiting lexical similarity.
We call these pairs term anchors. The method works as follows. For each EMTREE
concept, a request for all the terms defining the concept is sent to the UMLS database. For each
term, the UMLSKS may return zero, one or several UMLS concepts. In the first case, we say that
there is no mapping; in the second case, the mapping is simple (one-to-one); in the third case,
the mapping is ambiguous, as a single term maps to several possible UMLS concepts. In Figure 1,
there are two simple mappings (thorax → Thorax and rib cage → Rib cage) and one ambiguous
mapping: chest maps to both Chest and CHEST.
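The classification of term mappings into none, simple and ambiguous can be sketched in Python as below. The `normalize` function is a deliberately simplified stand-in (lowercasing, dropping punctuation, sorting words) and not the NormalizeString technique of the UMLSKS; the UMLS fragment and its concept identifiers `U1` to `U4` are invented for the example, and the lookup is done against a local index rather than the remote server.

```python
import re
from collections import defaultdict

def normalize(term):
    """Simplified normalization: lowercase, drop punctuation, sort the words."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(words))

# Hypothetical UMLS fragment: (term string, concept identifier). IDs are invented.
UMLS_TERMS = [
    ("Thorax", "U1"),
    ("Rib cage", "U2"),
    ("Chest", "U3"),
    ("CHEST", "U4"),
]

def build_index(umls_terms):
    """Index UMLS concepts by the normalized form of their term strings."""
    index = defaultdict(set)
    for string, concept in umls_terms:
        index[normalize(string)].add(concept)
    return index

def align_term(term, index):
    """Return the mapping kind and the UMLS concepts sharing the normalized form."""
    concepts = index.get(normalize(term), set())
    if not concepts:
        kind = "none"
    elif len(concepts) == 1:
        kind = "simple"
    else:
        kind = "ambiguous"
    return kind, sorted(concepts)

index = build_index(UMLS_TERMS)
```

With this toy index, `align_term("thorax", index)` is a simple mapping, whereas `align_term("chest", index)` is ambiguous because two UMLS strings share its normalized form.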

3.4. Lexical alignment of concepts

As knowledge is organized by concepts (i.e. meanings) in both EMTREE and the Metathesaurus,
we are concerned with concept mappings in addition to term mappings. On the basis of the
concept-to-term relationships within each terminology, the method groups all the UMLS concepts
retrieved for an EMTREE concept. The resulting pairs of EMTREE-UMLS concepts are called concept
anchors. The examples shown in Figure 1 include six concept anchors:
thorax-Anterior thoracic region
thorax-Entire thorax
thorax-Thoracic cage structure
thorax-Malignant neoplasm of thorax
thorax-Chest problem
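Grouping the UMLS concepts retrieved for all the terms of one EMTREE concept can be sketched as follows. Only the concept names are taken from the list above; which term leads to which UMLS concept is not stated in the text, so the assignment in `UMLS_INDEX` is invented for illustration.

```python
# EMTREE concept -> its terms (preferred term and synonyms).
EMTREE = {"thorax": ["thorax", "chest", "rib cage"]}

# Hypothetical lookup result: normalized term -> UMLS concepts.
UMLS_INDEX = {
    "thorax": {"Entire thorax"},
    "chest": {"Anterior thoracic region", "Chest problem"},
    "rib cage": {"Thoracic cage structure"},
}

def concept_anchors(emtree, umls_index):
    """Pair each EMTREE concept with every UMLS concept reached through its terms."""
    anchors = set()
    for concept, terms in emtree.items():
        for term in terms:
            for umls_concept in umls_index.get(term.lower(), ()):
                anchors.add((concept, umls_concept))
    return sorted(anchors)

anchors = concept_anchors(EMTREE, UMLS_INDEX)
```

Using a set means that a UMLS concept reached through several synonyms still yields a single concept anchor.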

3.5. Validation of lexical mappings

This stage validates the lexical alignment in two phases:
1. Identification of similar top-level categories across terminologies. From the set of concept
anchors, the method calculates the similarity between the top-level categories in EMTREE
and the UMLS.
2. Global checking of whether the concepts in an anchor belong to similar top-level
categories. For example, in Figure 1, six concept anchors are generated during the lexical
alignment for the EMTREE concept thorax, but only four of them are identified as valid,
namely those that are in similar top-level categories:
thorax-Anterior thoracic region
thorax-Entire thorax
thorax-Thoracic cage structure

The remaining two are identified as invalid, as they are in dissimilar categories (Chest is
an Anatomical concept in EMTREE whereas Malignant neoplasm of thorax and Chest
problem are Disorders in the UMLS):
thorax-Malignant neoplasm of thorax
thorax-Chest problem

Fig. 1: Example data of term anchors between EMTREE and the UMLS.
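The two validation phases can be sketched as below. The text does not state how the similarity between top-level categories is calculated, so the measure used here (the share of an EMTREE category's anchors that fall into a given UMLS category, with a threshold `min_share`) is purely an assumption, as is the UMLS category name "Anatomy".

```python
from collections import Counter

ANCHORS = [
    ("thorax", "Anterior thoracic region"),
    ("thorax", "Entire thorax"),
    ("thorax", "Thoracic cage structure"),
    ("thorax", "Malignant neoplasm of thorax"),
    ("thorax", "Chest problem"),
]
EMTREE_CAT = {"thorax": "Anatomical concepts"}
UMLS_CAT = {
    "Anterior thoracic region": "Anatomy",
    "Entire thorax": "Anatomy",
    "Thoracic cage structure": "Anatomy",
    "Malignant neoplasm of thorax": "Disorders",
    "Chest problem": "Disorders",
}

def similar_categories(anchors, emtree_cat, umls_cat, min_share=0.5):
    """Phase 1 (assumed measure): keep category pairs holding at least
    min_share of the anchors of their EMTREE category."""
    pair_counts = Counter((emtree_cat[e], umls_cat[u]) for e, u in anchors)
    totals = Counter()
    for (e_cat, _), n in pair_counts.items():
        totals[e_cat] += n
    return {pair for pair, n in pair_counts.items() if n / totals[pair[0]] >= min_share}

def validate(anchors, emtree_cat, umls_cat, similar):
    """Phase 2: an anchor is valid if its two concepts lie in similar categories."""
    valid, invalid = [], []
    for e, u in anchors:
        target = valid if (emtree_cat[e], umls_cat[u]) in similar else invalid
        target.append((e, u))
    return valid, invalid

similar = similar_categories(ANCHORS, EMTREE_CAT, UMLS_CAT)
valid, invalid = validate(ANCHORS, EMTREE_CAT, UMLS_CAT, similar)
```

On this toy data the three anatomical anchors are kept and the two disorder anchors are rejected. In the method described above, the similarity is computed over the whole set of concept anchors rather than over a single concept.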
4. Results
In total, 41,534 EMTREE concepts matched one or more concepts in the UMLS Metathesaurus by
lexical alignment. In addition to the UMLS Metathesaurus, our experiment involved a source
terminology, EMTREE, with a good collection of synonyms: 4.18 per concept. Our method exploited
the fact that the knowledge in EMTREE is organized by concepts, through concept-to-term
relationships. As a consequence of taking all EMTREE synonyms into account, the coverage
increased from 66.5% (for EMTREE terms) up to 80% (for EMTREE concepts). This confirms that if
the terminologies to be mapped include a large set of synonyms, then the coverage of lexical
alignment increases. However, the higher the number of synonyms, the higher the number of
ambiguous mappings. In our experiment, on average 2.1 mappings per EMTREE concept were found
when we mapped only EMTREE preferred terms, but this number increased to 3.2 mappings per
EMTREE concept when we mapped all EMTREE synonyms.
In addition, 6 similar top-level categories across the two terminologies were identified.
Although the number of categories having similarity is small (6 of the 15 EMTREE top-level
categories), this corresponds to a substantial number of EMTREE concepts: 65.3% of the total
EMTREE concepts and so, 75.8% of the total lexical mappings found by NormalizeString.
Finally, by global checking, our method detected 6,927 (7.9%) invalid term anchors. This led to 410
invalid concepts (1.2% of the total concept anchors).
5. Conclusions
The contribution of our work, rather than producing new alignment methods or tools, is to apply
existing lexical techniques to map large-scale terminologies and to provide a method that
improves lexical techniques by detecting invalid lexical mappings.
Compared with past terminology mapping research, our method has some key features. First, it is
flexible enough to validate lexical mappings in terminology alignment through the UMLS. The only
prerequisite is that one of the two terminologies has to be integrated into the UMLS
Metathesaurus. Although our experiment validated mappings to the complete UMLS Metathesaurus,
our approach can be used to create inter-terminology mappings using the UMLS as an external
resource. In order to do this, the lexical technique must be restricted to the target
terminology. In addition, although our method has been tested using NormalizeString, it
can also be applied with other techniques provided by the UMLS, such as ExactMatch,
NormalizeWord, ApproxMatch, etc. Second, our system is fully automated, requiring no manual
input or rule definitions. Third, our method has been tested and evaluated with a large-scale
real-world vocabulary against the UMLS.
This work has been funded by the Secretaría General de Política Científica y Tecnológica del
Ministerio de Educación y Ciencia, through the research project TIN2006-15453-C04-02.
1. Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical
terminology. Nucleic Acids Research, 32 (2004), Database issue D267-D270.
2. Cimino, J.J. Desiderata for Controlled Medical Vocabularies in the Twenty-First Century.
Methods Inf. Med., 37 (4-5) (1998), 394-403.
3. Doerr, M. Semantic problems of thesaurus mapping. Journal of Digital Information, 1 (8)
4. Fung, K.W., Bodenreider, O., Aronson, A.R., Hole, W.T. and Srinivasan, S. Combining lexical and
semantic methods of inter-terminology mapping using the UMLS. Stud Health Technol
Inform 129 (2007), 605-609.
5. ISO 5964, 1985. Guidelines for the establishment and development of multilingual
thesauri. International Organization for Standardization.
6. Lindberg, D., Humphreys, B. and McCray, A. The Unified Medical Language System.
Methods of Information in Medicine, 32 (1993), 281-291.
7. Rosenbloom, S.T., Miller, R.A., Johnson, K.B., Elkin, P.L., Brown, S.H. A Model for Evaluating
Interface Terminologies. JAMIA, 15 (1) (2008), 65-76.
8. Sarkar, I.N., Cantor, M.N., Gelman, R., Hartel, F. and Lussier, Y.A. Linking biomedical
language Information and knowledge resources in the 21st Century: GO and UMLS. Pacific
Symposium on Biocomputing, 8 (2003), 439-450.
9. Sun, J. Y., Sun, Y. A system for automated lexical mapping. JAMIA, 13 (3) (2006), 334-343.
10. Vizine-Goetz, D., Hickey, C., Houghton, A. and Thompson, R. Vocabulary Mapping for
Terminology Services. Journal of Digital Information, 4(4) (2004).
11. Yu, A.C. Methods in biomedical ontology. Journal of Biomedical ontology, 30(3)(2006), 252-
12. Zeng, M.L. and Chang, L.M. Trends and issues in establishing interoperability among
knowledge organization systems. Journal of the American Society for Information Science
and Technology, 55 (5) (2004), 377-395.
13. Zhang, S., Mork, P., Bodenreider, O. and Bernstein, P.A. Comparing two approaches for
aligning representations of anatomy. Artificial Intelligence in Medicine 39 (2007), 227-236.

Annex I. Committees

Technical committee

Pablo Serrano Balazote, Hospital de Fuenlabrada (co-chair)

Miguel-Angel Sicilia, Universidad de Alcalá (co-chair)
Leonardo Lezcano, Universidad a Distancia de Madrid
Jesualdo Fernández, Universidad de Murcia
Montse Robles, Universidad Politécnica de Valencia
José Manuel Gómez, Intelligent Software Components
Raimundo Lozano, Hospital Clínico y Provincial de Barcelona
Adolfo Muñoz, Instituto de Salud Carlos III
Tomas Pariente, ATOS Origin
Teddy Miranda, IMET
Sascha Ossowski, Universidad Rey Juan Carlos
Paloma Martinez Fernandez, Universidad Carlos III de Madrid

Organization committee

Leonardo Lezcano, Universidad a Distancia de Madrid (chair)

Rosmary Calzadilla, Universidad de Alcalá
Juan-José Sicilia, Hospital del Henares
Elena García Barriocanal, Universidad de Alcalá