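International Journal of Computational Intelligence and Information Security, June 2013 Vol. 4, No. 6 ISSN: 1837-7823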
1. Introduction
In modern society, people care increasingly about their privacy as information technology develops. In hospitals, patients' medical records are stored on computers so that biomedical scientists can use the information for research. These records include the medical history of patients, such as laboratory test results and medications prescribed. To prevent leaks of personal electronic health records, the federal Health Insurance Portability and Accountability Act (HIPAA) sets a national standard for protecting the privacy of this kind of information. With the explosive growth of medical research in recent years, biomedical scientists have proposed using these electronic medical data for collaborative research. However, privacy and security remain the main concerns impeding such collaborative research. For this reason, with the development of information and cryptographic technology, there is a trend toward using computational methods and programs to help medical scientists address the privacy issue without revealing patients' information to others.
Survival analysis, also called time-to-event analysis, is very useful for studying different kinds of events such as disease onset, earthquakes, and stock market crashes [1]. It can be used for prediction after observing a set of individuals from a specific time point and continuously monitoring them over fixed intervals of time. Therefore, how the survival analysis model is built is the most critical factor in obtaining a good prediction. In the biomedical field, survival analysis mainly means observing the time to death of experimental subjects. Clearly, the more experimental data are available for training, the more precise the model can be.
Therefore, biomedical researchers want to combine data from different institutes to build better survival analysis models, especially survival function comparison models [2]. To address the privacy and security issues, computer scientists can use privacy-preserving methods to keep the data from being revealed to anyone. To compare survival curves without revealing the data, [2] proposed a privacy-preserving model that protects data privacy. However, comparing survival curves step by step is a tedious process.
In the medical area, Microsoft Excel is widely used because of its friendly user interface and easy operation. Other statistical computing packages such as SAS and SPSS have strong data management abilities, but they are complicated to use for medical staff who have not been professionally trained. Microsoft Excel is widely applied in medical institutes, whether for storing experimental data or for creating survival curves, and it helps medical scientists analyse data and make better decisions. In addition, Excel can be controlled by programs written in VBA (Visual Basic for Applications) or recorded as macros. Therefore, most biomedical scientists prefer to use Microsoft Excel to store the data obtained from experiments. Consequently, many scientists have developed programs that can be applied in Microsoft Excel directly and automatically. In [3], Hitoshi Sato presented a package of macro programs (named PK MOMENT) that automatically calculates non-compartmental pharmacokinetic parameters on Microsoft Excel spreadsheets. In [4], Zhang presented PKSolver, a freely available menu-driven add-in program for Microsoft Excel written in VBA, for solving basic problems in pharmacokinetic (PK) and pharmacodynamic (PD) data analysis.
In [5], Brown presented a simple, easily understood methodology for solving biologically based models using a Microsoft Excel spreadsheet. In [6], a user-friendly, inexpensive Excel-based program for finding potential phosphorylation sites in proteins is presented.
In this paper, we develop a user-friendly, cloud-based, privacy-preserving Microsoft Excel program, named Scorpio, for the joint analysis of electronic health care data using the privacy-preserving logrank test model. The program does not require any programming skills or any use of VBA or the macro language: once the data from all institutes are ready, the program runs automatically. In the rest of this paper, we describe the method for creating a privacy-preserving comparison test of survival curves, in particular the data storage and collection method, as well as the design and implementation of our program.
* This paper was supported by NSF CNS-0845149 and CCF-0915374. Part of the results were presented at [UNESST 2012].
2. Methods
The logrank test is a standard test for comparing survival curves. When a research institute wants to initiate a logrank test computation, it needs to collect data from different medical institutes. However, some medical data are very sensitive, so how to compute the logrank test without revealing these data to anyone other than their owner is a major issue. In [2], the authors proposed a privacy-preserving secure sum method that generates an initial random number and adds it to the first medical institute's data. Here, we introduce their method briefly. They suppose there are n groups of individuals.
$n_{kj}$: the number of individuals that are alive in group k at the beginning of time interval j.
$d_{kj}$: the number of events occurring in group k in interval j.
$O_k$: the number of observed deaths in group k.
$E_k$: the expected number of deaths in group k.
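For reference, one common textbook form of the logrank computation that matches these definitions is sketched here. The symbols $n_{kj}$ (individuals alive in group k at the start of interval j), $d_{kj}$ (events in group k in interval j), $O_k$ and $E_k$ are assumed notation, and the exact expression used in [2] may differ from this approximate chi-square form.

```latex
\begin{align*}
n_j &= \sum_{k} n_{kj}, & d_j &= \sum_{k} d_{kj}, \\
O_k &= \sum_{j} d_{kj}, & E_k &= \sum_{j} n_{kj}\,\frac{d_j}{n_j}, \\
Z &= \sum_{k} \frac{(O_k - E_k)^2}{E_k}.
\end{align*}
```

In this form, only the pooled totals $n_j$ and $d_j$ and the sum of the per-group terms require data from more than one group, which is why a secure sum is sufficient.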
The final Z is the logrank test result. A smaller Z means that the data are more consistent with the hypothesis that the survival curves do not differ. In [2], the authors assume there are s parties (s > 3) involved in the logrank test computation. They provide a privacy-preserving method in which the first institute participating in the survival analysis computation adds a random number to its data. The range of the random number should be the same as that of the values being protected, namely the number of individuals alive, $n_{kj}$, and the number of events, $d_{kj}$. The first institute then passes the masked sums to the next participant. Similarly, every other participant adds its local value to the sums it receives and sends the new sums to the next party. Finally, the first institute obtains the sums and calculates the logrank test using the random number it already knows. In this process, the actual values of $n_{kj}$ and $d_{kj}$ are hidden behind the random numbers [2].
Based on this privacy-preserving model, we design a program that automatically collects data from each participating medical institute and adds these data to the initial file immediately. After collecting all the data, the program calculates, for each interval, the quotient of the number of events divided by the number of individuals alive. Each medical institute then obtains this value automatically and uses it to calculate its expected number of deaths and its logrank test statistic. The program then repeats the method: another random number is added to the first medical institute's logrank test result, and the results of all institutes are added up. Finally, the first institute, which initiated the comparison, obtains the final logrank test statistic and informs all other participants.
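The two rounds described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes each party holds exactly one group, uses the approximate statistic (sum over groups of (O - E)^2 / E), and simulates the ring in a single process. The names `secure_sum` and `private_logrank` and the mask range of 1 to 20 are hypothetical choices made for this example.

```python
import random


def secure_sum(local_values, mask_range=20):
    """Ring-based secure sum: party 1 masks its value, the others add theirs."""
    mask = random.randint(1, mask_range)   # random number known only to party 1
    running = local_values[0] + mask       # party 1 starts the ring with a masked value
    for value in local_values[1:]:         # parties 2..s add their local values in turn
        running += value
    return running - mask                  # party 1 removes the mask from the total


def private_logrank(n, d):
    """n[k][j], d[k][j]: alive and event counts of party k in interval j."""
    parties, intervals = len(n), len(n[0])
    # Round 1: pooled counts per interval, obtained through masked sums.
    n_tot = [secure_sum([n[k][j] for k in range(parties)]) for j in range(intervals)]
    d_tot = [secure_sum([d[k][j] for k in range(parties)]) for j in range(intervals)]
    # Quotient of events divided by individuals alive, shared with every party.
    ratio = [d_tot[j] / n_tot[j] for j in range(intervals)]
    # Each party computes its observed and expected deaths and its local term.
    local_terms = []
    for k in range(parties):
        observed = sum(d[k])
        expected = sum(n[k][j] * ratio[j] for j in range(intervals))
        local_terms.append((observed - expected) ** 2 / expected)
    # Round 2: the local terms are added with a second masked sum.
    return secure_sum(local_terms)


if __name__ == "__main__":
    # Made-up counts for three parties and three intervals.
    n = [[10, 8, 5], [12, 9, 7], [11, 10, 6]]
    d = [[2, 3, 1], [3, 2, 2], [1, 4, 1]]
    print(private_logrank(n, d))
```

In a real deployment each addition would happen at a different institute, so no party other than the first ever sees an unmasked running total.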
Specifically, we use cloud-based storage to collect the data from each institute. Cloud-based storage lets everybody who has permission access the file from anywhere. As shown in Figure 1, party 1 first adds a random number to its data and uploads the file to the server; party 2 then downloads the file, adds its own data to the existing data, and uploads the file to the server again. This continues until the last party is done. The first party can then obtain the sum of the actual data by subtracting the random number. Next, the program automatically calls the Microsoft Excel macro we developed to calculate the values we need. Finally, party 1 obtains the final logrank test statistic and lets the other participating institutes know.
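The upload and download relay can be illustrated with a short Python sketch. It is only an assumption-laden stand-in: a local temporary folder plays the role of the cloud server, a JSON file plays the role of the shared Excel file, and the function names `start_round`, `add_counts` and `finish_round` are hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path


def start_round(shared, own_counts):
    """Party 1: mask its counts with one random number and upload the file."""
    mask = random.randint(1, 20)
    shared.write_text(json.dumps([count + mask for count in own_counts]))
    return mask  # kept private by party 1


def add_counts(shared, own_counts):
    """Parties 2..s: download the file, add their counts, upload it again."""
    current = json.loads(shared.read_text())
    updated = [a + b for a, b in zip(current, own_counts)]
    shared.write_text(json.dumps(updated))


def finish_round(shared, mask):
    """Party 1: download the final file and subtract its random number."""
    return [total - mask for total in json.loads(shared.read_text())]


with tempfile.TemporaryDirectory() as folder:
    shared = Path(folder) / "counts.json"   # stand-in for the file on the server
    mask = start_round(shared, [10, 8, 5])  # party 1
    add_counts(shared, [12, 9, 7])          # party 2
    add_counts(shared, [11, 10, 6])         # party 3
    totals = finish_round(shared, mask)

print(totals)  # [33, 27, 18]
```

The file on the server only ever contains masked running totals, so a party that downloads it cannot read off the counts of the parties before it.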
Figure 2: The program user interface for privacy preserving logrank test
    ...
    ActiveCell.FormulaR1C1 = "=RANDBETWEEN(1,20)"    ' draw a random number between 1 and 20
    Range("I1").Select
    ActiveCell.FormulaR1C1 = "Random Number for n"   ' header of column I
    Range("H1").Select
    Selection.Copy
    Range("I2:I12").Select
    ' paste the copied cell into I2:I12 as a fixed value
    Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks:=False, Transpose:=False
    Range("H2").Select
    Application.CutCopyMode = False
    ActiveCell.FormulaR1C1 = ""                      ' clear H2
    Range("H1").Select
    ActiveCell.FormulaR1C1 = "nj+RNN"                ' header of column H
    Range("H2").Select
    ActiveCell.FormulaR1C1 = "=RC[-5]+RC[1]"         ' masked value: column C (nj) plus column I
    Selection.AutoFill Destination:=Range("H2:H12"), Type:=xlFillDefault
End Sub

Compute Ek

Sub ComputeE()
    Range("M1").Select
    Application.CutCopyMode = False
    ActiveCell.FormulaR1C1 = "E"                     ' header of column M
    Range("M2").Select
    ActiveCell.FormulaR1C1 = "=RC[-10]*RC[-2]"       ' row value of E: column C (nj) times column K
    Selection.AutoFill Destination:=Range("M2:M12"), Type:=xlFillDefault
End Sub
Figure 3: Original data owned by each institute, which should be kept confidential from the other parties
6. Conclusion
In this paper, we have designed a Microsoft Excel macro-based privacy-preserving program for comparing survival curves using the logrank test. To make it easy to use, the program can be applied directly in Microsoft Excel, which is widely used by clinics and biomedical scientists. The program protects the privacy of the data by adding random numbers to the original data. Experiments on real medical data have shown the effectiveness of the proposed program.
References
[1] Allison, P.D. (2010) Survival analysis using SAS: A practical guide. SAS Publishing.
[2] Chen, T. and Zhong, S. (2011) Privacy-preserving models for comparing survival curves using the logrank test. Computer Methods and Programs in Biomedicine.
[3] Sato, H., Sato, S., Wang, Y.M. and Horikoshi, I. (1996) Add-in macros for rapid and versatile calculation of non-compartmental pharmacokinetic parameters on Microsoft Excel spreadsheets. Computer Methods and Programs in Biomedicine, 50(1), 43-52.
[4] Zhang, Y., Huo, M., Zhou, J. and Xie, S. (2010) PKSolver: An add-in program for pharmacokinetic and pharmacodynamic data analysis in Microsoft Excel. Computer Methods and Programs in Biomedicine, 99(3), 306-314.
[5] Brown, M. (1999) A methodology for simulating biological systems using Microsoft Excel. Computer Methods and Programs in Biomedicine, 58(2), 181-190.
[6] Wera, S. (1998) An EXCEL-based method to search for potential Ser/Thr-phosphorylation sites in proteins. Computer Methods and Programs in Biomedicine, 58(1), 65-68.
[7] Li, Y. and Zhong, S. (2012) Scorpio: A simple, convenient, Microsoft Excel Macro based program for privacy-preserving logrank test. Computer Applications for Database, Education, and Ubiquitous Computing, 86-91.