This document summarizes a research paper that examines using data mining techniques like Bayesian classification to help reduce adverse drug effects. It discusses how pharmacovigilance collects data on adverse drug reactions to evaluate drug safety. Data mining can analyze large amounts of pharmacovigilance data to better understand risk factors and reactions. The document outlines the data mining process, including problem definition, data collection, data preparation, and result evaluation to extract useful knowledge from pharmacovigilance data and help improve drug safety.
Ms. Priti Sadaria, Saurashtra University, Virani Science College, Yogidham Gurukul, Kalawad Road, Rajkot. Ms. Nehal Dave, Saurashtra University, Virani Science College, Yogidham Gurukul, Kalawad Road, Rajkot.
ABSTRACT
The pharmaceutical industry supplies medicines in different forms: tablets, capsules, liquids or injectables. A drug in any form may cause adverse effects, and these vary from person to person. Before a drug is put on the market, it is tested for adverse effects on a large scale. Pharmacovigilance is the science concerned with the discovery, understanding and anticipation of Adverse Drug Effects (ADEs). Pharmaceutical experts and industries rely heavily on data mining algorithms and techniques to understand the large volume of data collected from healthcare professionals and patients, and to use that data for further research and the development of new drugs. In this paper, the authors apply the Bayesian classification method of data mining to assist the researcher in decision making.
Keywords Data mining, pharmacovigilance, Bayesian classification
1. INTRODUCTION
Medicines must be evaluated in terms of the harm they can do to the human body. Harm can be short term or long term. Before being introduced to the market, every drug is tested, but on a comparatively small number of people. In the wider population, a drug may produce reactions that were not detected during testing. Adverse Drug Effects (ADEs), also referred to as Adverse Drug Reactions (ADRs), are adverse responses to the medicine used. Every patient is a unique user of a medicine, with a different lifestyle and circumstances, whose body will react in a different way. Pharmacovigilance is a science that can be used to evaluate and improve the safety of medicines [1]. It is a collection of activities conducted to detect, assess, understand, monitor and prevent Adverse Drug Reactions (ADRs) [2].
The main question that arises in pharmacovigilance is why adverse reactions need to be monitored at all, given that every drug has gone through adequate study before being introduced to the market for commercial purposes. The answer is simple: human health has the highest priority, so monitoring adverse effects remains necessary to keep people safe and to make the drug safer even after adequate testing [3]. The process of pharmacovigilance involves risk analysis and risk management.
The process is illustrated in Figure 1. Risk analysis is the phase that involves identification, quantification and assessment of drug reactions. In identification, a survey is held to collect data and identify the actual cause of the reaction. If the cause really is the drug taken, quantification follows, in which more samples are considered. The last stage of risk analysis is assessment, in which the collected and identified samples are assessed to evaluate how risky the drug is to the patient. Once the risk is understood through risk analysis, the second main phase of the pharmacovigilance process, risk management, is carried out. In this phase the identified risks are managed, and remedies are taken to avoid the risk or reduce its intensity. The risks are measured and evaluated at the administrative level, and their significance is communicated to all levels. Based on the risks, the administration forms strategies to prevent them. Because the risks concern human health, the strategies aim to reduce or eliminate the risk as far as possible.
Fig 1: Initial phases of pharmacovigilance
[Figure 1 content: Risk Analysis (Identification, Quantification, Assessment); Risk Management (Administrative, Communicating, Prevention)]
The aims of pharmacovigilance are:
Revealing the adverse effects of existing drugs
Discovering unexpected effects of newer drugs
Recognizing the risk factors associated with the development of adverse drug reactions
Quantitative estimation of the risk factors, such as:
   o How many hospital admissions are due to ADRs?
   o What is the mortality ratio of ADRs?
Data mining is the tool that will assist pharmacovigilance in understanding, reducing or eliminating the risk factors of ADRs [4-6]. Data mining is discussed in the next section of the paper.
www.ijcait.com International Journal of Computer Applications & Information Technology Vol. II, Issue I, January 2013 (ISSN: 2278-7720)
2. PRACTICAL ASPECT OF DATA MINING
With the development of industries and technologies, industries produce very large amounts of data. This data must be managed before it can be utilized, and decision makers can then base business decisions on it. Data in its available form, however, is hard to manage and analyze, so it is necessary to abstract it. Data mining can be used as a tool to discover the patterns present in the data and the facts hidden behind it [7]. Data mining combines data, database management and data visualization, and its purpose is to extract knowledge. It is particularly useful when the data set is very large. Two points should be considered. First, once the data mining question is stated precisely, i.e. once it is decided which kind of data is to be mined, even a large data set becomes small, because only part of it is of interest from the viewpoint of that question. Second, in a very large database a sample is sufficient to build an accurate model [8].
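The second point, that a sample can stand in for a very large database, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the population is synthetic rather than real pharmacovigilance data, and the helper names `sample_records` and `proportion` are hypothetical.

```python
import random

def sample_records(records, k, seed=0):
    """Draw k records uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(records, k)

def proportion(records, label):
    """Fraction of records carrying the given label."""
    return sum(1 for r in records if r == label) / len(records)

# Synthetic stand-in for a large report database (not real data):
# 30,000 reports labelled "severe" and 70,000 labelled "common".
population = ["severe"] * 30_000 + ["common"] * 70_000

sample = sample_records(population, 2_000)
full_rate = proportion(population, "severe")
sample_rate = proportion(sample, "severe")
print(f"full data: {full_rate:.3f}, sample of {len(sample)}: {sample_rate:.3f}")
```

The rate estimated from the 2,000 sampled records should lie close to the rate in the full 100,000 records. How large a sample a given model needs depends on the model and the data, which this sketch does not address.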
To mine data means to extract useful information from it. At first sight this task does not seem to require any expertise. In practice, making data mining effective and extracting knowledge from a large data set is not easy. Expertise is required in three areas: the subject, the data and the analysis. A subject expert can decide which kinds of questions the analysis can answer. A data expert can decide where the data is to be collected from. An analysis expert needs sound statistical judgment, including awareness of selection bias [9]. Data mining is an iterative process whose result is knowledge. Figure 2 illustrates the process. It consists of six main stages, described in Sections 2.1 to 2.6.
2.1 Problem definition
In this stage, the problem is identified. Identifying the problem means judging which kind of knowledge or information the process will produce as output. It is advisable to decide in advance how the outcome will be used. The outputs fall into three categories [10]:
1. The result can be used for descriptive purposes, i.e. to describe a segment or group of the whole data set.
2. The discovered facts that were hidden behind the data can be used to predict situations outside the database.
3. The result can be used directly in the system being developed.
Fig 2: Main stages of process of data mining
There is no doubt that the data may contain biases. Despite this possibility, what must be considered is the extent to which the data relates to the question that the data mining is meant to answer. Biased data produces wrong conclusions. When a correlation is found between the produced data and the requirement, it cannot be justified by data analysis alone; it also requires knowledge of the domain.
2.2 Achieve required information
Analysis and conclusions are based on databases, but the purpose of creating those databases is to support business processes and decision making. The analysis can suggest the statistical design. To identify the actual pattern in the data set, it is necessary to understand the possible biases. If a bias occurs repeatedly in the same data set and for the same goal, it may distort the knowledge obtained and, ultimately, affect the correctness of the decision. After the problem is defined, the necessary information is to be acquired. While acquiring it, one must keep in mind that whatever information is acquired must be related to the final result. If information is acquired that does not concern the final output, the effort is wasted and collection has to start again. This is illustrated in Figure 3.
[Figure 2 stages: Problem definition, Achieve required info, Selection of data, Pre-processing of data, Interpretation, Use]
[Figure 3 labels: Information; Biased and not useful; Unbiased and useful; Desired information]
Fig 3. Collection of information
Therefore, in the second stage of the data mining process, it is very important to obtain unbiased, useful and timely information.
2.3 Selection of data
A very important stage in the data mining process is the selection of data. Once the required information has been obtained, selecting the relevant data from it is a significant job that carries responsibility. The selection should reflect the data mining belief of letting the data speak for itself: it should be free of criteria defined before looking at the data. To select appropriate data, sources such as data warehouses and data marts should also be considered. Analysts generally prefer to use a data warehouse or data mart to select data accurately [11], because such sources help them filter the relevant data. Before integrating more than one source, it is advisable to take steps to prevent conflicts and inconsistencies in the data; otherwise integration may become a time-consuming process. Given these risk and time factors, the selection of data has to be done very precisely.
2.4 Pre-processing of data
At the fourth stage of data mining, some data is already on hand, selected from the identified data. Before analysis takes place, this data has to be processed, which is why the stage is called pre-processing. Experts note that more than one data mining function can be used for the same type of data. If there is more than one model, each model should be assessed for the data by experts. Deriving new attributes in addition to the existing ones is also one of the important tasks carried out during the data mining process.
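As an illustration of deriving new attributes during pre-processing, the sketch below adds summary counts to a patient record shaped like a row of Table 1. The attribute names, the helper `derive_attributes`, and the reading of the first three effects as common and the last three as severe are assumptions made for this example, not part of the original method.

```python
# Assumed grouping of the six effects recorded in Table 1.
COMMON = ("constipation", "itching", "vomiting")
SEVERE = ("diabetes_fluctuation", "chest_pain", "blurred_vision")

def derive_attributes(record):
    """Return a copy of the record with derived attributes added."""
    derived = dict(record)
    derived["n_common"] = sum(record[name] for name in COMMON)
    derived["n_severe"] = sum(record[name] for name in SEVERE)
    derived["any_severe"] = derived["n_severe"] > 0
    return derived

# Patient 4 from Table 1: no common effects, chest pain and blurred vision.
patient_4 = {
    "constipation": False, "itching": False, "vomiting": False,
    "diabetes_fluctuation": False, "chest_pain": True, "blurred_vision": True,
}
enriched = derive_attributes(patient_4)
print(enriched["n_common"], enriched["n_severe"], enriched["any_severe"])
```

Derived counts such as these could then be offered to a classifier alongside, or instead of, the six raw flags; whether they help is something the experts assessing each model would have to judge.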
2.5 Interpretation
Analysis and evaluation of the data take place in this phase. It is a significant phase of data mining because interpreting data calls for analytical expertise. If the data is interpreted wrongly, it may lead to a wrong business decision or conclusion. Analyzing data correctly needs all three kinds of expertise discussed earlier. Knowledge of the domain being mined is required to interpret the results correctly. Data expertise is required to understand the patterns discovered during the process. Data mining expertise is applied to the technical interpretation of results. Further data mining questions can be raised for sub-regions of the data and attributes where the average of the target variable is smaller than its average over the whole data set [12]. In other words, we have to verify whether the model or the processed data achieves the business objective and whether all business issues have been considered.
2.6 Use
Experts use the results produced by the data mining process. Data that has been selected, is related to the domain and has been interpreted correctly is put to use in that domain. The results are stored and can be used at any stage: as input to a further process, or integrated directly into an application. The output of one process can be the input of another. Thus, by passing the data through the six main stages, at the end of the cycle we obtain meaningful and useful data that can assist the analyst in changing the strategy for the product. The techniques that can be applied in pharmacovigilance are discussed in the next section.
3. BAYESIAN CLASSIFICATION - IMPLEMENTATION
For pharmacovigilance, any one of the many available data mining techniques can be used.
Classification and prediction are two forms of data analysis that can be used to describe significant and useful data or to predict future requirements. Many classification and prediction methods have been proposed by researchers in machine learning, pattern recognition and statistics. The Bayesian classifier is one of the most efficient techniques used for classification. Suppose an officer in the pharmaceutical research industry wants to analyze ADE data for one particular drug, say gatifloxacin, to decide whether to change the dosage or to withdraw the drug from the market. The decision can be taken by classifying the ADEs of gatifloxacin into two categories: common side effects and severe side effects. The data available from patients, clinical experts and pharmacists is categorized, and a classifier is constructed to predict categorical labels such as common or severe. Classification and prediction methods can be compared on accuracy, speed, robustness, scalability and interpretability. Bayesian classification is a well-known approach to data classification. It provides practical learning algorithms that are useful in real-life settings. It combines prior knowledge with newly observed data. It is a model-based approach that offers a useful conceptual framework. Because it follows a probabilistic model specification, any sequences or objects can be classified. The Bayesian classifier is comparable with other techniques such as decision trees, and Bayesian classifiers have shown high accuracy and speed when applied to large databases [13]. The naive Bayesian classifier assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is known as class conditional independence. Bayesian belief networks, by contrast, allow the representation of dependencies among subsets of attributes. Bayes theorem, on which these classifiers are based, is described next.
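The class conditional independence assumption can be written out explicitly. The notation here is an assumption introduced for illustration and does not appear in the paper: X = (x_1, ..., x_n) is the vector of attribute values for one record and C_i is a candidate class. Under the assumption, the likelihood factorizes over the attributes, and the classifier picks the class with the largest product of prior and likelihood:

```latex
P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)
\qquad\Longrightarrow\qquad
\hat{C} = \arg\max_{C_i} \; P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)
```

Each factor P(x_k | C_i) can be estimated from simple counts within class C_i, which is one reason the method scales to large databases.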
3.1 Bayes Rule
Bayes theorem is named after Thomas Bayes. The theorem has two different interpretations:
1. Bayesian interpretation
2. Frequentist interpretation
The first makes clear how a subjective degree of belief should rationally change to account for evidence. The second relates the inverse representations of the probabilities. Under the Bayesian interpretation, the theorem underlies Bayesian statistics and can be applied to various fields such as science, engineering, microeconomics, game theory, medicine and law.
Table 1: To keep the drug in market or to withdraw
Bayes rule states that the posterior probability of a hypothesis equals the probability of the data under that hypothesis multiplied by the prior probability of the hypothesis, divided by the marginal probability of the data.
In the Bayes rule:
d = data
h = hypothesis (model)
Rearranging gives P(h|d) P(d) = P(d|h) P(h), that is, P(d, h) = P(d, h): the same joint probability on both sides.
The terms of Bayes rule are:
P(h): probability of hypothesis h before seeing any data
P(d|h): probability of the data if hypothesis h is true
P(d): marginal probability of the data
P(h|d): probability of hypothesis h after seeing the data
The Bayesian method can be illustrated by an example. Assume there is a drug, say A. After it is introduced to the market, some unexpected and harmful results appear. In this situation, data on the side effects is collected from health experts, patients and pharmacists. The collected data is shown in Table 1. The goal is to read the probability of side effects from the table and compute it according to Bayes rule. Based on the computed result, the medical authorities decide whether to continue use of the drug in society or to declare it harmful and stop its usage. From the last column of the table, the probability that the drug should be withdrawn from the market is higher than the probability that it should be kept:
P(w) = 7/10 = 0.7
P(u) = 3/10 = 0.3
where w = withdrawing the drug and u = using the drug. From this result we conclude that it is preferable to withdraw the drug from the market, as 7 out of 10 patients face a danger to their health.
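The figures P(w) and P(u) are the class priors read from the last column of Table 1. The sketch below reproduces them and then, as an extension that the paper does not carry out, applies the naive Bayesian classifier to one patient's observed effects. The function names, the use of add-one (Laplace) smoothing and the choice of example patient are all assumptions made for illustration.

```python
from fractions import Fraction

Y, N = True, False
# Rows of Table 1: constipation, itching, vomiting, diabetes fluctuation,
# chest pain, blurred vision, then "should the drug be used?".
TABLE_1 = [
    (Y, N, N, N, N, N, Y),
    (Y, Y, N, Y, N, N, N),
    (Y, Y, Y, N, N, N, Y),
    (N, N, N, N, Y, Y, N),
    (N, Y, N, N, N, Y, N),
    (Y, N, Y, Y, Y, N, N),
    (N, Y, N, N, Y, N, N),
    (N, N, Y, Y, N, N, N),
    (N, N, Y, N, N, N, Y),
    (Y, Y, Y, Y, Y, Y, N),
]

def priors(rows):
    """Class frequencies P(u) and P(w) from the last column."""
    used = sum(1 for r in rows if r[-1])
    return {"use": Fraction(used, len(rows)),
            "withdraw": Fraction(len(rows) - used, len(rows))}

def posterior(rows, observed, alpha=1):
    """Naive Bayes posterior P(h|d) for one patient's six effect flags.

    alpha is an add-one smoothing constant (an assumption of this sketch)
    that keeps unseen attribute values from forcing a probability of zero.
    """
    scores = {}
    for label, used in (("use", True), ("withdraw", False)):
        members = [r for r in rows if r[-1] == used]
        score = Fraction(len(members), len(rows))          # P(h)
        for i, value in enumerate(observed):
            matches = sum(1 for r in members if r[i] == value)
            score *= Fraction(matches + alpha, len(members) + 2 * alpha)  # P(d_i|h)
        scores[label] = score
    evidence = sum(scores.values())                        # P(d)
    return {label: s / evidence for label, s in scores.items()}

table_priors = priors(TABLE_1)
# Hypothetical new patient: chest pain and blurred vision only.
new_patient = (N, N, N, N, Y, Y)
result = posterior(TABLE_1, new_patient)
print(table_priors)
print({label: float(p) for label, p in result.items()})
```

With only ten rows the estimates are very coarse, so the posterior should be read as an illustration of the mechanics rather than as evidence about any real drug.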
4. CONCLUSION
The pharmaceutical industry is directly related to human life. It produces disadvantages as well as benefits: any drug may react adversely, and the reaction varies from person to person. Pharmacovigilance experts devote extensive effort to post-marketing observation of adverse drug reactions. Based on these observations and their data, they use data mining techniques to find the hidden facts and take decisions accordingly. The Bayesian rule is a comparatively efficient, fast and useful technique for pharmacovigilance.
5. REFERENCES
[1] haiweb.org/19072009/19July2009FactsheetTheEuropeanCommission'sProposalforaPharmacovigilanceDirective.pdf
[2] who.int/medicines/areas/quality_safety/safety_efficacy/S.AfricaDraftGuidelines.pdf
[3] Dhikav, V., Singh, S., 2004. Adverse drug reactions monitoring in India. Journal, Indian Academy of Clinical Medicine 5 (1), pp. 28-33.
[4] Bates, D. W., Spell, N., Cullen, D. J., et al., 1997. The costs of adverse drug reactions in hospitalised patients. JAMA 277, pp. 301-07.
[5] Leape, L. L., 1994. Errors in medicine. JAMA 272, pp. 1851-7.
[6] Bates, D. W., Cullen, D. J., Laird, N., et al., 1997. Incidence of adverse drug events and potential adverse drug events in hospitalized patients. JAMA 277, pp. 307-11.
[7] María, S. P., Alberto, S., Víctor, R., Pilar, H., José M., P., 2007. Design and implementation of a data mining grid-aware architecture. Future Generation Computer Systems 23, pp. 42-47.
[8] Feelders, A., Daniels, H., Holsheimer, M., 2000. Briefings: Methodological and practical aspects of data mining. Information & Management 37, pp. 271-281.
[9] Hand, D. J., 1998. Data mining: statistics and more? The American Statistician 52 (2), pp. 112-118.
[10] Glymour, C., Madigan, D., Pregibon, D., Smyth, P., 1997. Statistical themes and lessons for data mining. Data Mining and Knowledge Discovery 1, pp. 11-28.
[11] Subramanian, A., Smith, L. D., Nelson, A. C., Campbell, J. F., Bird, D. A., 1997. Strategic planning for data warehousing. Information and Management 33, pp. 99-113.
[12] Friedman, J. H., Fisher, N. I., 1999. Bump hunting in high-dimensional data. Statistics and Computing 9 (2), pp. 123-143.
[13] Han, J., Kamber, M., 2011. Data mining: concepts and techniques.
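Bayes rule (Section 3.1):
$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d)}$$
Table 1 data. Common adverse effects: Constipation, Itching, Vomiting. Severe adverse effects: Diabetes fluctuation, Chest pain, Blurred vision.
Patient | Constipation | Itching | Vomiting | Diabetes fluctuation | Chest pain | Blurred vision | Should the drug be used?
1 | YES | NO | NO | NO | NO | NO | YES
2 | YES | YES | NO | YES | NO | NO | NO
3 | YES | YES | YES | NO | NO | NO | YES
4 | NO | NO | NO | NO | YES | YES | NO
5 | NO | YES | NO | NO | NO | YES | NO
6 | YES | NO | YES | YES | YES | NO | NO
7 | NO | YES | NO | NO | YES | NO | NO
8 | NO | NO | YES | YES | NO | NO | NO
9 | NO | NO | YES | NO | NO | NO | YES
10 | YES | YES | YES | YES | YES | YES | NO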