
Case Study: How to Apply Data Mining Techniques in a Healthcare Data Warehouse

Michael Silver, MD, FACP, FCCP, FCCM; Taiki Sakata; Hua-Ching Su, MS; Charles Herman; Steven B. Dolins, PhD; Michael J. O'Shea
ABSTRACT Healthcare provider organizations are faced with a rising number of financial pressures. Both administrators and physicians need help analyzing large amounts of clinical and financial data when making decisions. To assist them, Rush-Presbyterian-St. Luke's Medical Center and Hitachi America, Ltd. (HAL), Inc., have partnered to build an enterprise data warehouse and perform a series of case study analyses. This article focuses on one analysis, which was performed by a team of physicians and computer science researchers, using a commercially available on-line analytical processing (OLAP) tool in conjunction with proprietary data mining techniques developed by HAL researchers. The initial objective of the analysis was to discover how to use data mining techniques to make business decisions that can influence cost, revenue, and operational efficiency while maintaining a high level of care. Another objective was to understand how to apply these techniques appropriately and to find a repeatable method for analyzing data and finding business insights. The process used to identify opportunities and effect changes is described.

KEYWORDS Data mining; On-line analytical processing tool (OLAP); Data warehouse; Business process improvement
Note: The authors would like to thank Pat Skarulis, chief information officer at Rush-Presbyterian-St. Luke's Medical Center, Yoichi Shintani, vice president at Hitachi America, Ltd., and Bob Kero, chief of business development at Hitachi America, Ltd., for providing guidance for this research. Thanks to Shinji Fujiwara and Arti Denterlein, our colleagues at Hitachi America, Ltd., for setting up the case study environment.
JOURNAL OF HEALTHCARE INFORMATION MANAGEMENT, vol. 15, no. 2, Summer 2001 Healthcare Information Management Systems Society and Jossey-Bass, A Publishing Unit of John Wiley & Sons, Inc.


Silver, Sakata, Su, Herman, Dolins, O'Shea

Healthcare provider organizations are faced with a rising number of financial pressures: payer reimbursements that are not covering costs, uninsured patients who are provided care at low or no reimbursement, increased labor costs, decreased admissions, and so on. Both administrators and physicians need help analyzing clinical and financial data when making decisions [1, 2]. To assist administrators and physicians, Rush-Presbyterian-St. Luke's Medical Center and Hitachi America, Ltd. (HAL), Inc., have partnered to build an enterprise data warehouse and to perform a series of case study analyses; this article focuses on one analysis.

Background of the Case Study


OLAP is a technique used to analyze databases. A number of commercially available products have been built to support this functionality; examples are Cognos Enterprises OLAP and PowerPlay, Business Objects Inc.'s Business Objects, Informix's MetaCube, Platinum's InfoBeacon, MicroStrategy's DSS Agent, and Oracle's Express. All these products offer similar functionality. OLAP typically includes the following kinds of analyses: simple (view one or more measures that can be sorted and totaled), comparison (view one measure and sort or total based on two dimensions), trend (view a measure over time), variance (compare one measure at different times, such as sales and sales a year ago), and ranking (top 10 or bottom 10 products sold) [3]. OLAP enables users to drill down within a dimension to see more detailed data at various levels of aggregation. Users can also filter data, that is, focus their analysis on a subset of records in the database. For example, a user interacting with a retail chain store database may be interested only in West Coast stores. Users need to know which attribute or attributes they want to filter on, and they need to know how to define the filtering conditions; OLAP enables users to filter records based only on arithmetic conditions on one or more database attributes or a WHERE clause in a SQL statement. For the case study we used Microsoft OLAP Services for the multidimensional database server and Knosys ProClarity to do the reporting, that is, to display grids and graphs.

Typical problems that data mining addresses are how to classify data, cluster data, find associations between data items, and perform time series analysis. Numerous data mining techniques have been invented for each type of problem [4, 5]. Each problem requires data mining techniques to analyze large quantities of data.
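The OLAP analysis types described above can be illustrated with a minimal Python sketch. It is not the Microsoft OLAP Services or ProClarity tooling used in the case study; the records, the field names (`coast`, `store`, `year`, `sales`), and the helper functions are all hypothetical and exist only to make the operations concrete.

```python
from collections import defaultdict

# Hypothetical retail records; every field name and value is illustrative only.
SALES = [
    {"coast": "West", "store": "Seattle", "year": 2000, "sales": 120.0},
    {"coast": "West", "store": "Portland", "year": 2000, "sales": 80.0},
    {"coast": "West", "store": "Seattle", "year": 1999, "sales": 100.0},
    {"coast": "East", "store": "Boston", "year": 2000, "sales": 150.0},
    {"coast": "East", "store": "Boston", "year": 1999, "sales": 130.0},
]


def rollup(records, dims, measure):
    """Total a measure for each combination of the given dimensions."""
    totals = defaultdict(float)
    for rec in records:
        totals[tuple(rec[d] for d in dims)] += rec[measure]
    return dict(totals)


def filter_records(records, **conditions):
    """Keep records whose attributes equal the given values (like a WHERE clause)."""
    return [r for r in records if all(r[k] == v for k, v in conditions.items())]


def top_n(totals, n):
    """Rank aggregated keys by descending total."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


by_coast = rollup(SALES, ["coast"], "sales")                # simple analysis
west_only = filter_records(SALES, coast="West")             # filter
west_by_store = rollup(west_only, ["store"], "sales")       # drill down
by_coast_year = rollup(SALES, ["coast", "year"], "sales")   # comparison on two dimensions
west_variance = by_coast_year[("West", 2000)] - by_coast_year[("West", 1999)]  # variance
ranking = top_n(by_coast, 1)                                # ranking
```

Each operation is a filter followed by a group-and-total step. This is why OLAP can display a region once it is known, but cannot by itself propose which filter conditions are worth examining.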
Two techniques for data mining were used: patient rule induction method (PRIM) [6, 7] and weighted item sets (WIS), a type of association rule technique. PRIM and WIS are described next.

PRIM. PRIM is a technique that does not fall exactly into one of the business problem categories listed earlier. PRIM finds the optimal region, that is, a subset of data points with the highest average value, given a set of input attributes and a minimum size of the region specified by the user. Data records contain input variables and an output variable (variables are record attributes or derived attributes, and the output variable must be a measure). A record's location in a dimensional space is based on the values of its attributes, for example, attending physician, payer, and length of stay (LOS) in a hospital database. PRIM finds regions where the output variable has a high average value compared to the average value for the entire set of records. PRIM could also be used to find regions with minimum average value by maximizing the negated output variable.

WIS. WIS is an association rule tool that finds relationships between various attributes in a database; some of the attributes can be derived measures. The relationships are defined in terms of if-then rules that show the frequency of records in the database that satisfy the rule. For example, ninety out of one hundred patients in the database with diagnosis-related group (DRG) 999 have a length of stay greater than or equal to ten days.

Data mining and data warehousing are becoming more prevalent in the healthcare industry because of the large quantities of data stored in various systems at medical institutions and the number of business decisions made based on the data [8, 9, 10].
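The if-then rule in the WIS description (ninety of one hundred DRG 999 patients staying ten days or more) can be expressed as a count over records. The sketch below computes the standard association-rule statistics, support and confidence, for such a rule. It is not HAL's proprietary WIS algorithm, which the article does not specify; the function `rule_stats`, the field names, and the synthetic visits are assumptions made for illustration.

```python
def rule_stats(records, antecedent, consequent):
    """Count records for the rule IF antecedent THEN consequent.

    Returns (records matching the IF part,
             records matching both parts,
             confidence = both / IF part).
    """
    matching = [r for r in records if antecedent(r)]
    satisfying = [r for r in matching if consequent(r)]
    confidence = len(satisfying) / len(matching) if matching else 0.0
    return len(matching), len(satisfying), confidence


# Synthetic visits built to mirror the example in the text:
# 100 visits with DRG 999, 90 of them with LOS >= 10, plus 50 unrelated visits.
visits = (
    [{"drg": 999, "los": 12} for _ in range(90)]
    + [{"drg": 999, "los": 4} for _ in range(10)]
    + [{"drg": 100, "los": 3} for _ in range(50)]
)

n_if, n_then, confidence = rule_stats(
    visits,
    antecedent=lambda r: r["drg"] == 999,
    consequent=lambda r: r["los"] >= 10,
)
support = n_then / len(visits)  # share of all visits that satisfy the whole rule
```

A rule-mining tool evaluates very many candidate rules of this form and reports those whose counts stand out; the sketch only scores one rule that is already given.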

Identify Cost and Revenue Opportunities Using Data Mining


The objective of the analysis was to discover interesting and unexpected business insights through the application of data mining techniques; subsequently, these insights can be used to make business decisions that can influence cost, revenue, and operational efficiency while maintaining a high level of care. We investigated this business problem from different levels of abstraction: the entire enterprise, department or line of business, and DRG level. For the analysis we are describing, we looked at one specific DRG, restricted to Medicare and Medicaid inpatients, a population the institution wanted to study.

The case study analysis was performed by a cross-functional team consisting of one physician, several computer science researchers, and one IT project manager. The physician provided the clinical expertise required to analyze results; the computer science researchers applied tools and techniques they had developed; and the IT project manager provided expertise in the hospital's patient accounting system. All team members helped formulate the business problem(s). At the DRG level we looked at DRGs that were the most and least profitable in the institution, solely for Medicare and Medicaid patients.

We asked computer science researchers to apply the tools they had developed rather than asking business analysts to apply them. We did this because data analysis tools can perform sophisticated analyses, and their potential is enormous, but they can be difficult to apply to business problems with numerous, complicated, and interdependent factors. For example, when should you apply OLAP, and when an association rule tool? This was an important decision that led to our better understanding of how to apply these techniques appropriately and to find repeatable methods for analyzing data and finding business insights.

The data mining tools and the OLAP tool were evaluated, and we attempted to use the strength of each tool. Both PRIM and WIS are capable of analyzing large numbers of records in a database. PRIM is an algorithm for solving global optimization problems. PRIM can process many dimensions simultaneously when finding the best region, that is, a subset of data points with a high average value for an output variable. A SQL statement or rule can represent this region. WIS is useful for finding patterns or associations between attributes. It does not find optimized regions. For WIS the patterns are represented by rules; each rule describes a region that consists of data points satisfying the rule's conditions.

The results of neither PRIM nor WIS are easy for users to evaluate. A user cannot easily look at a SQL statement describing a PRIM region and intuitively understand the differences between the high average region and the outer region. A user can look at a WIS rule or pattern and understand the attributes and values. However, the user may miss some difference between the rule's region and the outer region that could explain the meaning of the pattern. OLAP tools cannot discover high average regions or find new patterns in data. OLAP does allow users to drill down into detail, once a data area of focus is identified, and then lets the user visualize the result on various dimensions effectively. This means the tools can be used to complement one another. For example, PRIM finds an optimized region (a subset of data points), then OLAP can graphically display aggregated values for various dimensions for the region and points outside the region, that is, the outer region.
For WIS, the algorithm finds an association rule (a subset of data points) by looking at all combinations of attributes; OLAP can then display data graphically for both the region and outer region. In WIS's case the region is not an optimized region but a region made of records satisfying the criteria in the rule. We used PRIM and WIS to find regions. In essence we ran PRIM and WIS at each level of abstraction: on the entire inpatient record set, on inpatients in a department, and on inpatients with a specific DRG. We experimented with parameter settings so that the algorithms would run efficiently and find the most accurate results. For example, for PRIM we needed to select an alpha value, which controls how fast the algorithm finds a region, and a beta value, which constrains the size of the region. Based on the data mining results, we used OLAP to compare data points in a region to data points outside a region. We could run an OLAP report on all dimensions, for example, on all input variables used in PRIM. Even new measures could be defined for the OLAP analysis, for example, day of week of admission. Based on the OLAP results, further OLAP reports can be run to drill down on interesting dimensions.
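To make the roles of alpha and beta concrete, the following is a simplified sketch of the top-down "peeling" step of PRIM as described by Friedman and Fisher [7]. It is not the HAL implementation used in the study. It handles numeric inputs only, omits the later "pasting" step, and stops as soon as no peel raises the mean, all of which are simplifying assumptions. The function name `prim_peel`, the field names, and the synthetic rows are hypothetical.

```python
def prim_peel(rows, inputs, output, alpha=0.1, beta=0.2):
    """Greedy peeling: shrink a box to raise the mean of `output` inside it.

    alpha: fraction of the rows currently in the box peeled off per step.
    beta:  minimum fraction of the original rows the box must keep.
    Returns (box, rows_inside), where box maps each input to [low, high].
    """
    def mean(rs):
        return sum(r[output] for r in rs) / len(rs)

    inside = list(rows)
    min_size = max(1, int(beta * len(rows)))
    box = {v: [min(r[v] for r in rows), max(r[v] for r in rows)] for v in inputs}

    while True:
        n_peel = max(1, int(alpha * len(inside)))
        if len(inside) <= n_peel:
            break
        best = None  # (mean of kept rows, variable, side, new bound, kept rows)
        for v in inputs:
            ordered = sorted(r[v] for r in inside)
            candidates = (
                ("low", ordered[n_peel]),        # raise the lower bound
                ("high", ordered[-n_peel - 1]),  # lower the upper bound
            )
            for side, cut in candidates:
                if side == "low":
                    kept = [r for r in inside if r[v] >= cut]
                else:
                    kept = [r for r in inside if r[v] <= cut]
                # Skip peels that remove nothing (ties) or violate beta.
                if not (min_size <= len(kept) < len(inside)):
                    continue
                score = mean(kept)
                if best is None or score > best[0]:
                    best = (score, v, side, cut, kept)
        if best is None or best[0] <= mean(inside):
            break
        _, v, side, cut, inside = best
        box[v][0 if side == "low" else 1] = cut
    return box, inside


# Synthetic visits: losses are concentrated in long stays (illustrative only).
rows = [
    {"los": d, "age": 40 + (d * 7) % 30, "loss": 1000.0 if d > 15 else 100.0}
    for d in range(1, 21)
]
box, region = prim_peel(rows, ["los", "age"], "loss", alpha=0.1, beta=0.2)
```

A small alpha peels slowly and explores more candidate boxes at the cost of run time, while beta keeps the box from collapsing onto a handful of extreme records. The returned `box` is a set of bounds per attribute, which is why a region can be written as a rule or as a SQL WHERE clause.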


Case Study: DRG-Level Analysis


We selected an unprofitable DRG that had approximately 426 inpatient visits during a one-year period; we included only Medicare and Medicaid patient visits. This total does not include a small number of inpatient visits for which payment had not yet been received at the time of the study. PRIM was executed using loss as the output variable. More than fifteen inpatient attributes were used as input variables. The region that PRIM found consisted of sixty-four inpatient visits, or 15 percent of the inpatient visits. However, these visits made up more than half of the total losses associated with these inpatient visits. The average loss per visit in the region was about six times that of the visits outside the region, that is, the outer region ($21,278.49 versus $3,573.87). The average length of stay for the inpatient visits in the region was about twice that of the outer region. This is shown in Figure 1. After PRIM found the high average region, we wanted to compare it with the outer region to learn why these patient visits had greater losses.

Figure 1. DRG Analysis Using a Region Found by PRIM: Medicare and Medicaid Data for One Year
[Bar chart of Visit, Loss, Average Loss, Average Age, and Average LOS for Box 0 and Outer 0]

Values Displayed    Box 0            Outer 0
Visit               64               362
Loss                $1,361,823.38    $1,293,739.26
Average Loss        $21,278.49       $3,573.87
Average Age         67.0             74.2
Average LOS         26.8             12.5
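The proportions quoted for the region can be checked directly against the values in Figure 1. The short calculation below uses only the visit counts, total losses, and average lengths of stay from the figure; the variable names are ours.

```python
# Values copied from Figure 1 (Box 0 is the PRIM region, Outer 0 the remainder).
box = {"visits": 64, "loss": 1_361_823.38, "avg_los": 26.8}
outer = {"visits": 362, "loss": 1_293_739.26, "avg_los": 12.5}

total_visits = box["visits"] + outer["visits"]
visit_share = box["visits"] / total_visits                 # share of visits in the region
loss_share = box["loss"] / (box["loss"] + outer["loss"])   # share of losses in the region

avg_loss_box = box["loss"] / box["visits"]
avg_loss_outer = outer["loss"] / outer["visits"]
avg_loss_ratio = avg_loss_box / avg_loss_outer             # region vs. outer average loss
los_ratio = box["avg_los"] / outer["avg_los"]              # region vs. outer average LOS

print(f"visits in region: {visit_share:.1%} of {total_visits}")
print(f"losses in region: {loss_share:.1%}")
print(f"average loss ratio: {avg_loss_ratio:.2f}, LOS ratio: {los_ratio:.2f}")
```

Computed this way, the region holds about 15 percent of the visits and just over half of the losses, the average loss per visit in the region is a little under six times that of the outer region, and the average length of stay is slightly more than double.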


Numerous OLAP reports were run on the attributes, for example, financial class, marital status, and age. The report on financial class broke down the losses by the following categories: Medicare-Exempt Rhab/Psych/SNF, Medicare-Nonexempt, and Medicaid. Medicare-Exempt Rhab/Psych/SNF had large and comparable losses in the region and outside the region, but the average loss was significantly larger in the region, almost seven times larger. The report on age showed that inpatients between the ages of forty-six and sixty-four had significantly larger losses than the rest of the patients in the region. This is shown in Figure 2. They also had significantly larger losses than the patients outside the region. Based on these results, we decided to investigate why patients between the ages of forty-six and sixty-four had poor financial performance.

A follow-up OLAP analysis investigated admission source. The analysis revealed that for inpatient visits in the region with Medicare-Nonexempt, patients admitted via routine admission had the highest average loss; for inpatient visits in the region with Medicare-Exempt Rhab/Psych/SNF, patients admitted via routine admission had the most visits. We further investigated Medicare-Exempt Rhab/Psych/SNF inpatients with a routine admission source. For this subset of inpatient visits inside the region and outside the region, that is, patients between the ages of forty-six and sixty-four who entered via routine admission and whose payer is Medicare-Exempt Rhab/Psych/SNF, we ran OLAP reports on admission day of week, ICD-9 procedure, ICD-9 diagnosis, and surgeon department. We discovered that the average loss is significantly greater for patients in the region who were admitted on Tuesdays: they have two times the average loss of patients in the region admitted on other days of the week. The difference between patients in the region and patients outside the region is even more dramatic. Although the absolute number of patients is small, and differences may not be statistically significant, we believe that this approach will be useful for high-volume cases.

Examination of Tuesday's admitting physicians revealed that several of the physicians were in the same medical specialty. This specialty cared for patients who typically required a high level of service intensity over a long period of time. The identification of a subset of patients with disproportionately high costs has prompted the institution to reevaluate its admission criteria to this unit.
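The drill-down reports described in this section all share one shape: split the visits into region and outer region, then average a measure within each value of one dimension. A minimal sketch of that report follows. The visits, the field names, and the region test (`los >= 20`) are invented for illustration and are not the study's data or the region PRIM actually found.

```python
from collections import defaultdict


def compare_by_dimension(visits, in_region, dimension, measure="loss"):
    """Average `measure` per value of `dimension`, separately for region and outer.

    Returns {(side, dimension value): (average, visit count)}. The count is kept
    because averages over very few visits should be read with caution.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for v in visits:
        key = ("region" if in_region(v) else "outer", v[dimension])
        sums[key][0] += v[measure]
        sums[key][1] += 1
    return {key: (total / n, n) for key, (total, n) in sums.items()}


# Hypothetical visits; a real report would read them from the warehouse.
visits = [
    {"admit_day": "Tue", "los": 30, "loss": 40000.0},
    {"admit_day": "Tue", "los": 25, "loss": 20000.0},
    {"admit_day": "Mon", "los": 22, "loss": 15000.0},
    {"admit_day": "Tue", "los": 5, "loss": 3000.0},
    {"admit_day": "Mon", "los": 4, "loss": 1000.0},
    {"admit_day": "Mon", "los": 6, "loss": 2000.0},
]

report = compare_by_dimension(visits, lambda v: v["los"] >= 20, "admit_day")
for (side, day), (avg, n) in sorted(report.items()):
    print(f"{side:6} {day}: average loss {avg:>10,.2f} over {n} visit(s)")
```

Running the same function with `dimension` set to financial class, age group, or admission source gives the other reports; carrying the visit count alongside each average makes the small-sample caveat above visible in the output.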

Advantages and Limitations of a Repeatable Methodology


By using the strength of each tool, we were able to take advantage of the complex, sophisticated algorithms of the data mining techniques and then more easily visualize the results using OLAP. This is important for several reasons, which we discuss next.


Figure 2. DRG Analysis Using a Region Found by PRIM: Medicare and Medicaid Data for One Year
[Bar chart: Average Loss by age group (19-45, 46-64, 65-120). Series: Box 0/Medicare-Exempt Rhab/Psych/SNF, Box 0/Medicare-Nonexempt, Outer 0/Medicaid, Outer 0/Medicare-Exempt Rhab/Psych/SNF, Outer 0/Medicare-Nonexempt]

Values Displayed (Average Loss; columns 19-45, 46-64, 65-120; row groups Box 0 and Outer 0): Medicare-Exempt Rhab/Psych/SNF $18,543.05 $26,644.60 $17,785.52 Medicare-Nonexempt $63,989.42 $14,162.33 $4,013.35 $4,323.30 Medicaid $5,606.27 $3,481.69 Medicare-Exempt Rhab/Psych/SNF $3,877.73 Medicare-Nonexempt $929.36

In order to effect change we need to identify opportunities, that is, either large financial losses that can be prevented, large financial successes that can be identified, or moderate financial success that should be promoted. Some action(s) must be taken. Once these opportunities are identified and action(s) taken, we need to be able to measure change over time, that is, to rerun these tools in the same manner repeatedly so that we can determine whether the action(s) taken have the desired effect. This is one reason that developing a repeatable methodology is important.

Several case studies run at Rush-Presbyterian-St. Luke's have resulted in senior management reviewing identified issues and opportunities. In several other studies, the findings provided additional support for actions being considered by the institution. For the case study described in this article, actions are being considered but have not yet been implemented.

One observation that should have been anticipated but was not is the effect that introducing this system has on the flow of information. Historically, information has flowed unidirectionally from Physician to Medical Record Department to Billing or Physician to Patient to Cost Center Manager to Billing. With the introduction of this tool, we have already seen physicians providing valuable feedback to the medical records department regarding how specific patient services are coded. This has created additional opportunities and challenges for the organization.

A second reason for developing a repeatable methodology is to be able to semi-automate the analyses, that is, to run thorough, critical evaluations of the entire institution, departments, or DRGs on demand. We believe we are developing a process that will allow managers who are not skilled in data mining techniques to view their business unit's data in a format that allows them to use their domain expertise to ask additional questions or develop defensible arguments for change. This approach should be valid at the manager's level, the Department of Medicine level, or the level of the charge nurse in a patient care area.

A third reason for developing the repeatable methodology is so that we can easily apply these tools to other institutions and possibly other industries.
However, the method still depends on human expertise to understand the OLAP reports, make inferences, run more reports, and take action(s); it is not fully automated.
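One way to picture the repeatable part of the method is to store a region as a rule and re-apply it unchanged to each new period of data. The sketch below assumes a region expressed as a list of (attribute, operator, value) conditions; the rule itself, the field names, and the visits are hypothetical, and the WHERE-clause rendering is for display only rather than for building production queries.

```python
# Hypothetical saved region: long stays among patients aged 64 or younger.
REGION_RULE = [("los", ">=", 20), ("age", "<=", 64)]

OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
}


def in_region(visit, rule):
    """True when the visit satisfies every condition of the rule."""
    return all(OPS[op](visit[attr], value) for attr, op, value in rule)


def to_where_clause(rule):
    """Render the rule as a SQL-style condition string (display only)."""
    sql_ops = {"==": "="}
    return " AND ".join(
        f"{attr} {sql_ops.get(op, op)} {value!r}" for attr, op, value in rule
    )


def region_summary_by_period(visits, rule, period_key="year"):
    """Visit count and average loss inside the region for each period."""
    acc = {}
    for v in visits:
        if in_region(v, rule):
            count, total = acc.get(v[period_key], (0, 0.0))
            acc[v[period_key]] = (count + 1, total + v["loss"])
    return {p: {"visits": c, "avg_loss": t / c} for p, (c, t) in acc.items()}


visits = [
    {"year": 2000, "los": 25, "age": 50, "loss": 20000.0},
    {"year": 2000, "los": 30, "age": 60, "loss": 30000.0},
    {"year": 2000, "los": 5, "age": 70, "loss": 1000.0},
    {"year": 2001, "los": 22, "age": 55, "loss": 12000.0},
    {"year": 2001, "los": 21, "age": 80, "loss": 9000.0},
]

where = to_where_clause(REGION_RULE)
summary = region_summary_by_period(visits, REGION_RULE)
```

Holding the rule fixed between runs is what lets a change in the summary be attributed to the period rather than to a different region definition; interpreting that change still requires the human review described above.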

Conclusion
A cross-functional team was formed that included clinical, financial, and technical expertise. Business problems were formulated at different levels of abstraction in order to identify financial opportunities. We had a set of tools that we developed and bought, and we tried executing these tools in various combinations with one another. We eventually came up with a repeatable methodology. We discovered that we can apply PRIM and WIS to find optimized regions and patterns, respectively, by performing complex algorithmic steps. After running those tools, we could take a region and the points outside the region and compare them attribute by attribute using OLAP. In other words, standard OLAP reports can be run in which each dimension or attribute (payer, admission source, patient age, physician perspectives, marital status, and so on) can be described. If any of the reports describing an attribute is interesting, then a set of follow-up, detailed OLAP reports can be run. We showed how to apply the methodology to an analysis of one DRG. We explained each step and the results of running each step; we illustrated how this method helped turn data into knowledge. Results were presented in a graphically appealing way that helped users determine how to use the information to make decisions.

References
1. Schneider, P. "How Do You Measure Success?" Healthcare Informatics, 1998, 15(3), 45-56.
2. Rosenstein, A. "Inpatient Clinical Decision-Support Systems: Determining the ROI." Healthcare Financial Management, Feb. 1999, pp. 51-55.
3. Peterson, T., Pinkelman, J., and Pfeiff, B. Microsoft OLAP Unleashed. Indianapolis, Ind.: SAMS/Macmillan USA, 1999.
4. Brachman, R., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., and Simoudis, E. "Mining Business Databases." Communications of the ACM, 1996, 39(11), 42-48.
5. Brin, S., Motwani, R., Ullman, J., and Tsur, S. "Dynamic Itemset Counting and Implication Rules for Market Basket Data." Paper presented at the SIGMOD Conference, Tucson, Ariz., 1997.
6. Srivastava, A., and Singh, V. "Deriving Interpretable Rules for Financial Outliers in Rush Hospital Data." Unpublished internal technical report, Hitachi America, Ltd.
7. Friedman, J., and Fisher, N. "Bump Hunting in High-Dimensional Data." Statistics and Computing, 1999, 9(2), 123-143.
8. Scheese, R. "Data Warehousing as a Healthcare Business Solution." Healthcare Financial Management, Feb. 1998, pp. 56-59.
9. Borok, L. "Data Mining: Sophisticated Forms of Managed Care Modeling Through Artificial Intelligence." Journal of Health Care Finance, 1997, 23(3), 20-36.
10. Herr, W. "The Benefits of Data Integration: HFMA Study Findings." Healthcare Financial Management, Sept. 1996, pp. 52-56.

About the Authors


Michael Silver, MD, FACP, FCCP, FCCM, is associate professor of medicine at Rush Medical College, associate director of the Section of Pulmonary and Critical Care Medicine at Rush, and vice president of medical affairs at Oak Park Hospital. He is board certified in internal medicine, pulmonary medicine, and critical care medicine.

Taiki Sakata has worked for Hitachi for eight years. He is currently a researcher at the Information Technology Laboratory at Hitachi America, Ltd. His technical interests are computer network architecture, data warehouses, and OLAP.

Hua-Ching Su received a BS and MS in computer science and has worked on various software technology and research projects. She is currently a senior software engineer at the Information Technology Laboratory at Hitachi America, Ltd.

Charles Herman is senior researcher at the Information Technology Laboratory at Hitachi America, Ltd.


Steven B. Dolins received his BS in physics and MS in computer science from Tulane University, and his PhD in computer science from the University of Texas, Arlington. He has worked on semiconductor, military, and consumer packaged goods applications for fifteen years and is currently chief researcher at the Information Technology Laboratory at Hitachi America, Ltd.

Michael J. O'Shea is an IS project manager for Rush-Presbyterian-St. Luke's Medical Center. He has project management responsibility for the development and implementation of a data warehousing project, is responsible for the patient accounting system, and acts as liaison between the financial management group and the IS department.
