
www.ijcait.com International Journal of Computer Applications & Information Technology
Vol. II, Issue I, January 2013 (ISSN: 2278-7720)

Data mining in pharmacovigilance to reduce Adverse Drug Effects (ADRs)

Ms. Miral Kothari
Gujarat Technological
University
AITS, Yogidham Gurukul,
Kalawad Road, Rajkot.

Ms. Priti Sadaria
Saurashtra University
Virani Science College,
Yogidham Gurukul,
Kalawad Road, Rajkot.
Ms. Nehal Dave
Saurashtra University
Virani Science College,
Yogidham Gurukul,
Kalawad Road, Rajkot.
ABSTRACT
The pharmaceutical industry provides medicines in different forms: tablets, capsules, liquids or injectables. A drug in any form may cause adverse effects that vary from person to person. Before any drug is put on the market, it is tested for adverse effects. Pharmacovigilance is the science concerned with the discovery, understanding and anticipation of Adverse Drug Effects (ADEs). Pharmaceutical experts and industries rely heavily on data mining algorithms and techniques to understand the huge volume of data collected from healthcare professionals and patients, and to use that data for further research and for the development of new drugs. In this paper, the authors apply the Bayesian classification method of data mining to assist researchers in decision making.

Keywords
Data mining, pharmacovigilance, Bayesian classification

1. INTRODUCTION
Medicines must be evaluated in terms of the harm they may cause to the human body. Harm can be short term or long term. Before being introduced to the market, every drug or medicine is tested, but on a comparatively small number of people. In the wider population, a drug may cause reactions that were not detected during testing. Adverse Drug Effects (ADEs), also referred to as Adverse Drug Reactions (ADRs), are unwanted responses to a medicine that is being used. Every patient is a unique medicine user with a different lifestyle and circumstances, whose body will react in a different way. Pharmacovigilance is a tool or science that can be used to evaluate and improve the safety of medicines [1]. It is a collection of activities conducted to detect, assess, understand, monitor or prevent Adverse Drug Reactions (ADRs) [2].
The main question that arises in pharmacovigilance is why the adverse reactions of drugs need to be monitored at all. The question is important because every drug has gone through adequate study before being introduced to the market for commercial purposes. The answer is simple: the highest priority is given to human health, and to keep people safe and make a drug safer even after adequate testing, monitoring its adverse effects is necessary [3]. The process of pharmacovigilance involves risk analysis and risk management.

The process is illustrated in Figure 1. Risk analysis is the phase that involves identification, quantification and assessment of drug reactions. During identification, a survey is held to collect data and to identify the actual cause of the reaction. If the cause really is the drug taken, the quantification stage follows, in which more samples are considered. The last phase of risk analysis is assessment, in which the collected and identified samples are assessed to evaluate how risky the drug is for the patient. Once the risk is understood through risk analysis, the second main phase of the pharmacovigilance process is carried out: the identified risks are managed, and remedies are taken to avoid the risks or to reduce their intensity. The risks are measured and evaluated at the administrative level, and their significance is communicated to all levels. Based on the risks, the administration forms strategies to prevent them. As the risks concern human health, the strategies aim to reduce or eliminate the risk as far as possible.

Fig 1: Initial phases of pharmacovigilance (risk analysis: identification, quantification, assessment; risk management: administrative, communicating, prevention)

The aims of pharmacovigilance are:
• Revealing the adverse effects of existing drugs
• Discovering unexpected effects of newer drugs
• Recognizing the risk factors associated with the development of adverse drug reactions
• Quantitative estimation of the risk factors, such as:
  o How many hospital admissions are due to ADRs?
  o What is the mortality ratio of ADRs?
Data mining is a tool that can assist pharmacovigilance in understanding, reducing or eliminating the risk factors of ADRs [4-6]. Data mining is discussed in the following section of the paper.

2. PRACTICAL ASPECTS OF DATA MINING
With the development of industries and technologies, industries produce very large amounts of data. This data must be managed in order to be used, and on the basis of it decision makers can make business decisions. However, data in its raw form is hard to manage and analyze, so it must be summarized. Data mining can be used as a tool to discover patterns in the data and the facts hidden behind it [7]. Data mining combines data, database management and data visualization, and its purpose is to extract knowledge. It is particularly useful when the data set is very large. Two points should be considered. First, once the data mining task is stated precisely, i.e. once it is decided on which kind of data the mining is to be done, even a large data set becomes small, because only part of it is of interest for the analysis. Second, in a very large database, a sample is sufficient to build an accurate model [8].
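The second point, that a sample of a very large database is often enough, can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 100,000 records with an ADR rate of exactly 7% is synthetic, and the sample size of 5,000 is arbitrary.

```python
import random


def sample_rate(population: list, sample_size: int, seed: int = 42) -> float:
    """Estimate the share of flagged records from a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(sample) / sample_size


# Synthetic population: 1 = record reports an ADR, 0 = no ADR (7% flagged).
population = [1 if i % 100 < 7 else 0 for i in range(100_000)]

full_rate = sum(population) / len(population)
estimate = sample_rate(population, 5_000)
print(f"full data: {full_rate:.3f}, 5% sample: {estimate:.3f}")
```

The standard error of such an estimate is roughly sqrt(p(1 - p)/n), about 0.004 here, so the sample-based rate typically lands within a fraction of a percentage point of the full-data value.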

To mine data means to extract useful information from it. At first sight this task does not seem to require any kind of expertise, but in practice it is not easy to make data mining effective and to extract knowledge from a large data set. Expertise is mainly required in three areas: the subject, the data and the analysis. A subject expert can decide which kinds of questions can be answered by the analysis. A data expert can decide where the data is to be collected from. An analysis expert needs strong judgment based on statistics, including awareness of selection bias [9]. Data mining is an iterative process, and its result is knowledge. Figure 2 illustrates the process of data mining, which consists of six main stages.
2.1 Problem definition
In this stage, the problem is identified. Identifying the problem means judging what kind of knowledge or information will be produced as output when the process is complete. It is advisable to decide in advance how the outcome will be used. The outputs can be placed in three categories [10]:
1. The result can be used for descriptive purposes, i.e. to describe a segment or group of the whole data set.
2. The facts discovered behind the data can be used to predict situations outside the database.
3. The result can be built directly into the system being developed.


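Fig 2: Main stages of the data mining process (problem definition, achieve required information, selection of data, pre-processing of data, interpretation, use)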

There is no doubt that data may contain biases. Despite this possibility, one must consider to what extent the data is related to the question that the data mining is meant to answer. Biased data produces wrong conclusions. When a correlation is found between the produced data and the requirement, it cannot be justified by data analysis alone; it also requires knowledge of the domain.
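A small numeric sketch shows how a biased collection process can distort a conclusion. The assumptions are hypothetical: out of 1,000 patients, 50 have a severe reaction, and severe reactions are reported with probability 0.9 while mild ones are reported with probability 0.1. Counting only the reported cases then greatly overstates the share of severe reactions.

```python
def reported_severe_share(n_severe: int, n_mild: int,
                          p_report_severe: float, p_report_mild: float) -> float:
    """Expected share of severe cases among the *reported* cases."""
    reported_severe = n_severe * p_report_severe
    reported_mild = n_mild * p_report_mild
    return reported_severe / (reported_severe + reported_mild)


# Hypothetical population: 50 severe and 950 mild reactions.
true_share = 50 / (50 + 950)
biased_share = reported_severe_share(50, 950, p_report_severe=0.9, p_report_mild=0.1)
print(f"true share: {true_share:.3f}, share among reports: {biased_share:.3f}")
```

This prints `true share: 0.050, share among reports: 0.321`: the reported data suggests severe reactions are more than six times as common as they really are, although no single record is wrong.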

2.2 Achieve required information
Although analysis and conclusions are based on databases, the purpose of creating a database is to support business decision making and processes. The analysis can suggest a statistical design. To identify the actual patterns in a dataset, it is necessary to understand its possible biases: if a bias occurs repeatedly in the same dataset and for the same goal, it may distort the knowledge obtained and ultimately lead to wrong decisions. After the problem is defined, the necessary information must be acquired. While acquiring it, one must keep in mind that the information collected must be related to, or targeted at, the final result. If the information acquired is not relevant to the final output, the effort is wasted and collection has to start again. This is illustrated in Figure 3.

Fig 3: Collection of information (information is separated into biased and not useful versus unbiased and useful; the unbiased and useful part is the desired information)

Therefore, in the second stage of the data mining process, it is very important to acquire unbiased, useful and timely information.
2.3 Selection of data
Selection of data is a very important stage in the data mining process. Once the required information has been acquired, the related data must be selected from it. Selecting appropriate and relevant data is a responsible task. The selected data should reflect the data mining principle of letting the data speak for itself, so the selection should be free from pre-defined criteria set before looking at the data. To select appropriate data, data from different sources such as data warehouses and data marts should also be considered. Analysts generally prefer to use a data warehouse or data mart to select relevant data accurately [11]; with the help of such sources, analysts can filter out the relevant data. Before integrating more than one source, it is advisable to take preventive steps to avoid conflicts and inconsistencies in the data; otherwise integration may become a time-consuming process. Thus, considering both the risk and the time involved, data selection must be done very precisely.
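The advice to check for conflicts before integrating sources can be sketched as a pre-merge check. The two sources below (hospital and pharmacist reports keyed by a patient identifier) and their field names are hypothetical; a real integration would also have to reconcile identifiers, codings and units.

```python
def find_conflicts(source_a: dict, source_b: dict) -> list:
    """List (patient_id, field, value_a, value_b) wherever both sources disagree."""
    conflicts = []
    for pid in sorted(source_a.keys() & source_b.keys()):
        rec_a, rec_b = source_a[pid], source_b[pid]
        for field in sorted(rec_a.keys() & rec_b.keys()):
            if rec_a[field] != rec_b[field]:
                conflicts.append((pid, field, rec_a[field], rec_b[field]))
    return conflicts


def merge_sources(source_a: dict, source_b: dict) -> dict:
    """Merge two sources only after confirming they do not contradict each other."""
    conflicts = find_conflicts(source_a, source_b)
    if conflicts:
        raise ValueError(f"resolve conflicts before merging: {conflicts}")
    merged = {}
    for source in (source_a, source_b):
        for pid, record in source.items():
            # setdefault creates a fresh dict, so the input records are not mutated.
            merged.setdefault(pid, {}).update(record)
    return merged


# Hypothetical reports keyed by patient id.
hospital = {1: {"vomiting": "NO", "chest_pain": "NO"}, 2: {"itching": "YES"}}
pharmacist = {1: {"vomiting": "YES"}, 3: {"blurred_vision": "YES"}}

print(find_conflicts(hospital, pharmacist))
```

Here the check reports that the two sources disagree about vomiting for patient 1, so `merge_sources` would refuse to merge until that record is resolved.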
2.4 Pre-processing of data
At the fourth stage of data mining, we already have on hand the data selected from the required information. Before analysis can take place, this data has to be processed, which is why the stage is called pre-processing of the data. Experts note that more than one data mining function can be applied to the same type of data; if more than one model is used, experts should assess each model against the data. Deriving new attributes from the existing ones is also an important task carried out in this stage.
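As a sketch of pre-processing, the YES/NO entries of Table 1 can be converted to Boolean values and two new attributes derived from them: the number of common and the number of severe adverse effects per patient. The snake_case column names are hypothetical identifiers, and the split into common and severe effects follows the grouping in the header of Table 1.

```python
COMMON = ("constipation", "itching", "vomiting")
SEVERE = ("diabetes_fluctuation", "chest_pain", "blurred_vision")
COLUMNS = COMMON + SEVERE + ("drug_used",)

# Rows of Table 1 (patients 1-10), in the column order above.
TABLE_1 = [
    "YES NO NO NO NO NO YES",
    "YES YES NO YES NO NO NO",
    "YES YES YES NO NO NO YES",
    "NO NO NO NO YES YES NO",
    "NO YES NO NO NO YES NO",
    "YES NO YES YES YES NO NO",
    "NO YES NO NO YES NO NO",
    "NO NO YES YES NO NO NO",
    "NO NO YES NO NO NO YES",
    "YES YES YES YES YES YES NO",
]


def preprocess(raw_rows: list) -> list:
    """Convert YES/NO strings to booleans and derive count attributes."""
    records = []
    for patient, line in enumerate(raw_rows, start=1):
        values = line.split()
        if len(values) != len(COLUMNS) or not set(values) <= {"YES", "NO"}:
            raise ValueError(f"malformed row for patient {patient}: {line!r}")
        record = {col: value == "YES" for col, value in zip(COLUMNS, values)}
        record["patient"] = patient
        # Derived attributes: how many effects of each kind the patient reported.
        record["n_common"] = sum(record[col] for col in COMMON)
        record["n_severe"] = sum(record[col] for col in SEVERE)
        records.append(record)
    return records


records = preprocess(TABLE_1)
for r in records:
    print(r["patient"], r["n_common"], r["n_severe"], r["drug_used"])
```

In this data the derived `n_severe` attribute separates the classes exactly: the drug is marked as usable for precisely the three patients (1, 3 and 9) who report no severe effect.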
2.5 Interpretation
Analysis and evaluation of the data take place in this phase. It is a really significant phase of data mining, because interpreting the data demands analytical expertise: if the data is interpreted wrongly, it may lead to a wrong business decision or conclusion. Analyzing data in the correct direction needs all three kinds of expertise discussed earlier. Knowledge of the domain being mined is required to interpret the result correctly; data expertise is required to understand the patterns discovered during the process; and data mining expertise is required for the technical interpretation of the results. Further data mining questions can be raised for sub-regions of the data (ranges of attribute values) in which the average of the target variable differs markedly from its average over the whole data set [12]. In short, we must verify whether the model or the processed data achieves the business objective and whether all business issues have been considered.
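A much simplified version of this idea can be sketched on Table 1. Unlike the box-shaped regions searched by the bump-hunting method of [12], it scans one attribute at a time: the target is 1 when the drug should be used, and the code reports every attribute whose "YES" sub-region has a target average far from the overall average. The column names and the threshold of 0.25 are assumptions for illustration.

```python
ATTRIBUTES = ("constipation", "itching", "vomiting",
              "diabetes_fluctuation", "chest_pain", "blurred_vision")

# Table 1: six YES/NO attributes followed by "Should the drug be used?".
TABLE_1 = [
    "YES NO NO NO NO NO YES",
    "YES YES NO YES NO NO NO",
    "YES YES YES NO NO NO YES",
    "NO NO NO NO YES YES NO",
    "NO YES NO NO NO YES NO",
    "YES NO YES YES YES NO NO",
    "NO YES NO NO YES NO NO",
    "NO NO YES YES NO NO NO",
    "NO NO YES NO NO NO YES",
    "YES YES YES YES YES YES NO",
]


def load(rows):
    """Return (attribute dict, target) pairs; target = 1 if the drug is used."""
    data = []
    for row in rows:
        flags = [value == "YES" for value in row.split()]
        data.append((dict(zip(ATTRIBUTES, flags[:-1])), int(flags[-1])))
    return data


def scan_subregions(data, threshold=0.25):
    """Compare the target mean inside each 'attribute = YES' region with the overall mean."""
    overall = sum(target for _, target in data) / len(data)
    flagged = {}
    for attr in ATTRIBUTES:
        inside = [target for attrs, target in data if attrs[attr]]
        if not inside:
            continue  # empty region: nothing to compare
        mean = sum(inside) / len(inside)
        if abs(mean - overall) >= threshold:
            flagged[attr] = mean
    return overall, flagged


overall, flagged = scan_subregions(load(TABLE_1))
print(overall, flagged)
```

Each of the three severe effects defines a sub-region in which the target average is 0, against 0.3 over the whole table, which is exactly the kind of region the interpretation stage should examine further.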
2.6 Use
Experts use the results produced during the data mining process. Data that has been selected, is related to the domain and has been interpreted correctly is used in the database domain. The results are stored and can be used at any stage: they can serve as input to a further process or be integrated directly into an application, so the result (output) of one process can be the raw material (input) of another.
Thus, looking at the data mining process, we can say that by passing the data through the six main stages, at the end of the cycle we obtain meaningful and useful data that can help the analyst change the strategy for a product. A technique that can be applied in pharmacovigilance is discussed in the next section.
3. BAYESIAN CLASSIFICATION - IMPLEMENTATION
Any one of the many available data mining techniques can be used for pharmacovigilance. Classification and prediction are two techniques of data analysis that can be used to describe significant and useful data or to predict future requirements. Researchers in machine learning, pattern recognition and statistics have proposed many classification and prediction methods, and the Bayesian classifier is one of the most efficient techniques used for classification. Consider an officer in the pharmaceutical research industry who wants to analyze ADE data for one particular drug, say gatifloxacin, to decide whether to change its dosage or to withdraw it from the market. The decision can be taken by classifying the ADEs of gatifloxacin into two categories: common side effects and severe side effects. The data available from patients, clinical experts and pharmacists is categorized, and a classifier is constructed to predict categorical labels such as common or severe. Classification and prediction methods can be compared on accuracy, speed, robustness, scalability and interpretability. Bayesian classification is a well-known approach to data classification.
Bayesian methods provide practical learning algorithms that are very useful in real-life applications. They combine previously obtained (prior) knowledge with newly observed data. The Bayesian approach is model based and provides a useful conceptual framework; because it follows a probabilistic model specification, any sequence or object can be classified. Bayesian classifiers are comparable in performance with other techniques such as decision trees, and they show high accuracy and speed when applied to large databases [13]. The naïve Bayesian classifier assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is known as class conditional independence. Bayesian belief networks, by contrast, allow the representation of dependencies among subsets of attributes. Bayes theorem, on which these classifiers are based, is described below.
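To make class conditional independence concrete, here is a minimal naïve Bayesian classifier trained on the ten rows of Table 1 to predict the last column ("Should the drug be used?") from the six effects. It is a sketch, not the authors' implementation: the attribute names, the Laplace smoothing with alpha = 1 (added so that an effect never seen in one class does not force a zero probability) and the two example patients are assumptions.

```python
import math

ATTRIBUTES = ("constipation", "itching", "vomiting",
              "diabetes_fluctuation", "chest_pain", "blurred_vision")

# Table 1: six YES/NO attributes, then "Should the drug be used?".
TABLE_1 = [
    "YES NO NO NO NO NO YES",
    "YES YES NO YES NO NO NO",
    "YES YES YES NO NO NO YES",
    "NO NO NO NO YES YES NO",
    "NO YES NO NO NO YES NO",
    "YES NO YES YES YES NO NO",
    "NO YES NO NO YES NO NO",
    "NO NO YES YES NO NO NO",
    "NO NO YES NO NO NO YES",
    "YES YES YES YES YES YES NO",
]


def parse(rows):
    """Return (features, label) pairs of booleans; label True = drug used."""
    data = []
    for row in rows:
        *features, label = (value == "YES" for value in row.split())
        data.append((tuple(features), label))
    return data


def train(data, alpha=1.0):
    """Estimate class priors and Laplace-smoothed P(attribute = YES | class)."""
    model = {}
    for cls in (True, False):
        rows = [features for features, label in data if label == cls]
        prior = len(rows) / len(data)
        likelihoods = [(sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                       for i in range(len(ATTRIBUTES))]
        model[cls] = (prior, likelihoods)
    return model


def posterior(model, features):
    """Return {class: P(class | features)} assuming class conditional independence."""
    log_scores = {}
    for cls, (prior, likelihoods) in model.items():
        score = math.log(prior)
        for value, p_yes in zip(features, likelihoods):
            # Independence: the joint likelihood is a product of per-attribute terms.
            score += math.log(p_yes if value else 1.0 - p_yes)
        log_scores[cls] = score
    top = max(log_scores.values())  # subtract the max before exp to avoid underflow
    unnormalised = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {cls: v / total for cls, v in unnormalised.items()}


model = train(parse(TABLE_1))
vomiting_only = (False, False, True, False, False, False)
chest_pain_only = (False, False, False, False, True, False)
print(posterior(model, vomiting_only))
print(posterior(model, chest_pain_only))
```

For a hypothetical patient reporting only vomiting, the classifier favours "use" (posterior about 0.72); for one reporting only chest pain, it favours "do not use" (posterior about 0.78). Working in log space is a common design choice that keeps the product of many small probabilities from underflowing.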

3.1 Bayes Rule
Bayes theorem is named after Thomas Bayes. The theorem has two different interpretations:
1. Bayesian interpretation
2. Frequentist interpretation
The first shows how a subjective degree of belief should rationally change to account for evidence. The second relates inverse representations of the probabilities concerning two events. In the Bayesian interpretation, Bayes theorem underpins statistical inference and can be applied in various fields such as science, engineering, microeconomics, game theory, medicine and law.


Table 1: To keep the drug in market or to withdraw








The formula given by Bayes, known as Bayes rule, is:

$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d)}$$

In the Bayes rule, d denotes the data and h the hypothesis (model). Rearranging the rule gives

$$P(h \mid d)\,P(d) = P(d \mid h)\,P(h) = P(h, d)$$

so both sides equal the joint probability of h and d.
The terms in Bayes rule are:
P(h): probability of hypothesis h before seeing any data (the prior)
P(d|h): probability of the data if hypothesis h is true (the likelihood)
P(d): marginal probability of the data
P(h|d): probability of hypothesis h after seeing the data (the posterior)
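As a worked instance of these terms, the rule can be applied to the data of Table 1. Here, as an illustrative choice not made in the original text, h is the hypothesis w that the drug should not be used for a patient (a NO in the last column of Table 1) and d is the observation v that the patient reports vomiting. Counting rows gives P(w) = 7/10, P(v) = 5/10 and P(v | w) = 3/7, since three of the seven NO rows report vomiting.

```latex
P(w \mid v) = \frac{P(v \mid w)\,P(w)}{P(v)}
            = \frac{\frac{3}{7} \cdot \frac{7}{10}}{\frac{5}{10}}
            = \frac{3/10}{5/10}
            = \frac{3}{5} = 0.6
```

This agrees with counting directly: of the five patients who report vomiting (3, 6, 8, 9 and 10), three (6, 8 and 10) are marked NO.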
The Bayesian method can be illustrated by an example. Let us assume that there is a drug, say A, which after being introduced to the market produces some unexpected and harmful results. In this situation, data on the side effects is collected from health experts, patients and pharmacists. The collected data is shown in Table 1. The goal is to look at the probabilities of the side effects in the table and compute them according to the Bayes rule. Based on the computed result, the medical authorities decide whether to keep the drug in use or to declare it harmful and stop its usage. Looking at the table, we can estimate that the probability of the drug being withdrawn from the market is greater than the probability of it being kept on the market:
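P(w) = 7/10 = 0.7
P(u) = 3/10 = 0.3
where w = withdraw the drug and u = use the drug (7 of the 10 patients in Table 1 are marked NO under "Should the drug be used?" and 3 are marked YES).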
Looking at the result, we can conclude that it is preferable to withdraw the drug from the market, since for 7 out of 10 consumers or patients the drug poses a danger to health.
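The prior probabilities above can be recomputed directly from Table 1, and the Bayes rule can then be applied to each side effect to obtain P(w | effect), the probability of the withdraw label given that the effect is reported. This extension beyond P(w) and P(u) is an illustration, not part of the original analysis; the attribute names are hypothetical, and exact fractions are used so that the results can be checked against direct counts.

```python
from fractions import Fraction

EFFECTS = ("constipation", "itching", "vomiting",
           "diabetes_fluctuation", "chest_pain", "blurred_vision")

# Table 1: six YES/NO effects followed by "Should the drug be used?".
TABLE_1 = [
    "YES NO NO NO NO NO YES",
    "YES YES NO YES NO NO NO",
    "YES YES YES NO NO NO YES",
    "NO NO NO NO YES YES NO",
    "NO YES NO NO NO YES NO",
    "YES NO YES YES YES NO NO",
    "NO YES NO NO YES NO NO",
    "NO NO YES YES NO NO NO",
    "NO NO YES NO NO NO YES",
    "YES YES YES YES YES YES NO",
]

# Each row becomes ({effect: bool}, used: bool).
ROWS = []
for line in TABLE_1:
    flags = [value == "YES" for value in line.split()]
    ROWS.append((dict(zip(EFFECTS, flags[:-1])), flags[-1]))

N = len(ROWS)
p_u = Fraction(sum(1 for _, used in ROWS if used), N)  # P(u): drug is used
p_w = 1 - p_u                                          # P(w): drug is withdrawn


def posterior_withdraw(effect: str) -> Fraction:
    """P(w | effect) = P(effect | w) P(w) / P(effect), by Bayes rule."""
    withdrawn = [effects for effects, used in ROWS if not used]
    p_effect = Fraction(sum(1 for effects, _ in ROWS if effects[effect]), N)
    if p_effect == 0:
        raise ValueError(f"{effect} never occurs in the table")
    p_effect_given_w = Fraction(sum(1 for e in withdrawn if e[effect]), len(withdrawn))
    return p_effect_given_w * p_w / p_effect


print("P(w) =", p_w, " P(u) =", p_u)
for effect in EFFECTS:
    print(f"P(w | {effect}) = {posterior_withdraw(effect)}")
```

The three severe effects all give P(w | effect) = 1 in this sample, matching the observation that every patient with a severe effect is marked NO; with only ten patients, such probabilities are rough estimates rather than firm evidence.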

4. CONCLUSION
The pharmaceutical industry is directly related to human life. Its products bring risks as well as benefits, and any drug may react adversely in a way that varies from person to person. Pharmacovigilance experts devote extensive effort to the post-marketing observation of adverse drug reactions. Based on these observations and their data, they use data mining techniques to find hidden facts and take decisions accordingly. The Bayesian rule is an efficient, fast and useful technique for pharmacovigilance.

5. REFERENCES
[1] haiweb.org/19072009/19July2009FactsheetTheEuropeanCommission'sProposalforaPharmacovigilanceDirective.pdf
[2] who.int/medicines/areas/quality_safety/safety_efficacy/S.AfricaDraftGuidelines.pdf
[3] Dhikav, V., Singh, S., 2004, Adverse drug reactions monitoring in India, Journal, Indian Academy of Clinical Medicine 5 (1), pp. 28-33.
[4] Bates, D. W., Spell, N., Cullen, D. J., et al., 1997, The costs of adverse drug reactions in hospitalised patients, JAMA 277, pp. 301-307.
[5] Leape, L. L., 1994, Errors in medicine, JAMA 272, pp. 1851-1857.
[6] Bates, D. W., Cullen, D. J., Laird, N., et al., 1997, Incidence of adverse drug events and potential adverse drug events in hospitalized patients, JAMA 277, pp. 307-311.
[7] Pérez, M. S., Sánchez, A., Robles, V., Herrero, P., Peña, J. M., 2007, Design and implementation of a data mining grid-aware architecture, Future Generation Computer Systems 23, pp. 42-47.
[8] Feelders, A., Daniels, H., Holsheimer, M., 2000, Methodological and practical aspects of data mining, Information & Management 37, pp. 271-281.
[9] Hand, D. J., 1998, Data mining: statistics and more?, The American Statistician 52 (2), pp. 112-118.
[10] Glymour, C., Madigan, D., Pregibon, D., Smyth, P., 1997, Statistical themes and lessons for data mining, Data Mining and Knowledge Discovery 1, pp. 11-28.
[11] Subramanian, A., Smith, L. D., Nelson, A. C., Campbell, J. F., Bird, D. A., 1997, Strategic planning for data warehousing, Information and Management 33, pp. 99-113.
[12] Friedman, J. H., Fisher, N. I., 1999, Bump hunting in high-dimensional data, Statistics and Computing 9 (2), pp. 123-143.
[13] Han, J., Kamber, M., 2011, Data mining: concepts and techniques.



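Table 1: To keep the drug in market or to withdraw

Common adverse effects: Constipation, Itching, Vomiting. Severe adverse effects: Diabetes fluctuation, Chest pain, Blurred vision.

Patient | Constipation | Itching | Vomiting | Diabetes fluctuation | Chest pain | Blurred vision | Should the drug be used?
1       | YES          | NO      | NO       | NO                   | NO         | NO             | YES
2       | YES          | YES     | NO       | YES                  | NO         | NO             | NO
3       | YES          | YES     | YES      | NO                   | NO         | NO             | YES
4       | NO           | NO      | NO       | NO                   | YES        | YES            | NO
5       | NO           | YES     | NO       | NO                   | NO         | YES            | NO
6       | YES          | NO      | YES      | YES                  | YES        | NO             | NO
7       | NO           | YES     | NO       | NO                   | YES        | NO             | NO
8       | NO           | NO      | YES      | YES                  | NO         | NO             | NO
9       | NO           | NO      | YES      | NO                   | NO         | NO             | YES
10      | YES          | YES     | YES      | YES                  | YES        | YES            | NO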
