
http://www.analytics-magazine.com

Driving Better Business Decisions

MAY/JUNE 2011

BANKING ON BETTER DAYS


POST-CRISIS ANALYSIS: CREDIT RISK MANAGEMENT LESSONS LEARNED THE HARD WAY

ALSO INSIDE:
Behavior Segmentation: Five best practices
Data Mining Survey: Trends and new insights
Sports Law Analytics: High-stakes litigation
Simulation Frameworks: Keys to dashboarding
Executive Edge: Michael Kubica of Applied Quantitative Sciences on simulation in strategic forecasting

INSIDE STORY

Failure to communicate
When confronted with complex business analytics problems that beg for mathematical modeling, the reactive first response is, "Show me the data." However, based on one of the recurring themes that came out of the recent INFORMS Conference on Business Analytics & Operations Research held in Chicago, the proper first response is, "Tell me about your business." After all, how can you solve someone's business problem if you don't first thoroughly understand their business and all of the behind-the-scenes issues, constraints and personality conflicts that will ultimately and perhaps surprisingly impact the outcome of the project and implementation of the solution? "Expert interviewing," as it's sometimes referred to, is the unappreciated yet critical art of ascertaining client information up front that will often determine the success or failure of an analytics project. Who hasn't had a promising project and/or elegant solution scuttled or never applied because someone forgot to ask the right questions early on?

I was reminded of this basic tenet of analytical problem-solving on my way home from the Chicago conference. A fellow attendee and I, over lunch, determined that we both had 7 p.m. flights home that evening. We agreed to share a taxi to the airport, a more efficient mode of transportation and a means of saving us both a few bucks. We met in the hotel lobby at the appointed hour as planned, climbed into a taxi, and the driver asked where we were going. I said, "Midway," at the exact moment my fellow attendee said, "O'Hare." We had never asked each other during our initial conversation which airport we were flying from. We scrambled out of the taxi and looked around for other options, our original efficient plan doomed because of a classic failure to communicate. By the way, the conference was terrific, from UPS Vice President Chuck Holland's powerful opening plenary to the Oscar-like Edelman Award gala. See you at the 2012 event April 15-17 in Huntington Beach, Calif.

Peter Horner, Editor
horner@lionhrtpub.com


CONTENTS

DRIVING BETTER BUSINESS DECISIONS

MAY/JUNE 2011
Brought to you by

FEATURES

11 Economically Calibrated Models
By Andrew Jennings and Carolyn Wang
Lessons learned the hard way: the secret to better credit risk management.

16 Risk in Revenue Management
By Param Singh
Acknowledging risk's existence and knowing how to minimize it.

20 Behavioral Segmentation
By Talha Omer
Five best practices in making embedded segmentation highly relevant.

24 Understanding Data Miners
By Karl Rexer, Heather N. Allen and Paul Gearan
Data miner survey examines trends and reveals new insights.

29 Sports Law Analytics
By Ryan M. Rodenberg and Anastasios Kaburakis
Analytics proving to be dispositive in high-stakes sports industry litigation.

33 Simulation Frameworks
By Zubin Dowlaty, Subir Mansukhani and Keshav Athreya
The key to building successful dashboards for displaying, deploying metrics.


REGISTER FOR A FREE SUBSCRIPTION: http://analytics.informs.org

INFORMS BOARD OF DIRECTORS
President: Rina Schneur, Verizon Network & Tech.
President-Elect: Terry P. Harrison, Penn State University
Past President: Susan L. Albin, Rutgers University
Secretary: Anton J. Kleywegt, Georgia Tech
Treasurer: Nicholas G. Hall, The Ohio State University
Vice President-Meetings: William Klimack, Decision Strategies, Inc.
Vice President-Publications: Linda Argote, Carnegie Mellon University
Vice President-Sections and Societies: Barrett Thomas, University of Iowa
Vice President-Information Technology: Bjarni Kristjansson, Maximal Software
Vice President-Practice Activities: Jack Levis, UPS
Vice President-International Activities: Jionghua Judy Jin, Univ. of Michigan
Vice President-Membership and Professional Recognition: Ozlem Ergun, Georgia Tech
Vice President-Education: Joel Sokol, Georgia Tech
Vice President-Marketing and Outreach: Anne G. Robinson, Cisco Systems Inc.
Vice President-Chapters/Fora: Stefan Karisch, Jeppesen


DEPARTMENTS

1 Inside Story
By Peter Horner

4 Executive Edge
By Michael Kubica

6 Profit Center
INFORMS OFFICES
www.informs.org
Tel: 1-800-4INFORMS
Executive Director: Melissa Moore
Marketing Director: Gary Bennett
Communications Director: Barry List

Corporate, Member, Publications and Subdivision Services:
INFORMS (Maryland), 7240 Parkway Drive, Suite 300, Hanover, MD 21076 USA
Tel.: 443.757.3500, E-mail: informs@informs.org

Meetings Services:
INFORMS (Rhode Island), 12 Breakneck Hill Road, Suite 102, Lincoln, RI 02865 USA
Tel.: 401.722.2595, E-mail: meetings@informs.org

ANALYTICS EDITORIAL AND ADVERTISING
Lionheart Publishing Inc., 506 Roswell Street, Suite 220, Marietta, GA 30060 USA
Tel.: 770.431.0867, Fax: 770.432.6969
President & Advertising Sales: John Llewellyn, Llewellyn@lionhrtpub.com, Tel.: 770.431.0867, ext. 209
Editor: Peter R. Horner, horner@lionhrtpub.com, Tel.: 770.587.3172
Art Director: Lindsay Sport, lindsay@lionhrtpub.com, Tel.: 770.431.0867, ext. 223
Advertising Sales (A-K): Aileen Kronke, aileen@lionhrtpub.com, Tel.: 770.431.0867, ext. 212
Advertising Sales (L-Z): Sharon Baker, sharonb@lionhrtpub.com, Tel.: 813.852.9942

By E. Andrew Boyd

8 Analyze This!
By Vijay Mehrotra

10 Newsmakers

37 Corporate Profile
By Chris Holliday

41 The Five-Minute Analyst
By Harrison Schramm

42 Thinking Analytically
By John Toczek
Analytics (ISSN 1938-1697) is published six times a year by the Institute for Operations Research and the Management Sciences (INFORMS). For a free subscription, register at http:// analytics.informs.org. Address other correspondence to the editor, Peter Horner, horner@lionhrtpub.com. The opinions expressed in Analytics are those of the authors, and do not necessarily reflect the opinions of INFORMS, its officers, Lionheart Publishing Inc. or the editorial staff of Analytics. Analytics copyright 2011 by the Institute for Operations Research and the Management Sciences. All rights reserved.


EXECUTIVE EDGE

Executive briefing on simulation in strategic forecasting


Simulation forecasts have several important advantages over single point estimates.
Any strategic forecast is by definition a representation of the future. Because a single estimate of the future is not truly representative, Monte Carlo simulation is a powerful alternative to the "best estimate" forecast. Business leaders are becoming increasingly aware of the deficiencies inherent in traditional forecasting methods. And the past two decades have ushered in an explosion of tools to facilitate novice and expert alike in applying Monte Carlo simulation. Though the growth and accessibility of these tools have been staggering, less prolific has been the adoption of the methodologies these powerful tools enable. Why? Part of the answer lies in a lack of understanding of what a simulation forecast is, what the relative merits and limitations are, and when it is most appropriate to consider using it. In this article I address these questions.
WHAT IS A SIMULATION FORECAST?

By Michael Kubica

In a traditional forecast, input assumptions are mathematically related to each other in a

model. Based on these defined mathematical relationships, model outputs are calculated, such as market units sold, market share and revenue. The model may be simple or very elaborate. The defining characteristic is that inputs are defined as single point values, or best estimates. Such models have been the staple of business for many years. They can answer questions such as, "If all of our assumptions are perfectly accurate, we can expect ..." However, experience has shown that all of the assumptions are not perfectly accurate. We are forecasters, after all, not fortunetellers! Simulation can remedy this problem. Instead of defining input variables as single point estimates, we define them as probability distributions representing the range of uncertainty associated with the variable being defined. These ranged variables are fed into the exact same forecast model. When the simulation model is run, we sample each input variable's distribution thousands of times and relate each instance of the distribution samples within the forecast model structure. Because we have defined the inputs as uncertainties, the outputs represent all of these uncertainties in a simulation forecast. Instead of a single line on the graph, we may represent an infinite number of lines, bounded by the possibilities constrained by the input distributions. Of course, we summarize these probabilistic outputs according to the confidence intervals relevant to the decision at hand.
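As a rough illustration of this idea (a minimal sketch, not the author's own model), the fragment below builds a simulation forecast in Python. The variable names, triangular distributions and parameter values are assumptions: each uncertain input is described by a minimum, best estimate and maximum, the same deterministic revenue model is evaluated once per sampled scenario, and the outputs are summarized at decision-relevant percentiles.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000  # number of simulation trials

# Hypothetical uncertain inputs: triangular(minimum, best estimate, maximum)
market_size = rng.triangular(800_000, 1_000_000, 1_300_000, N)  # units
share = rng.triangular(0.05, 0.08, 0.12, N)                     # market share
price = rng.triangular(40.0, 50.0, 55.0, N)                     # price per unit

# The same model structure a point-estimate forecast would use,
# evaluated once for every sampled scenario.
revenue = market_size * share * price

# Summarize the probabilistic output at the confidence levels that matter.
p10, p50, p90 = np.percentile(revenue, [10, 50, 90])
print(f"P10 ${p10:,.0f}   P50 ${p50:,.0f}   P90 ${p90:,.0f}")
print(f"P(revenue < $3.5M) = {(revenue < 3_500_000).mean():.1%}")
```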

THE RELATIVE MERITS AND LIMITATIONS

Simulation forecasts have several important advantages over single point estimates. First, assuming that the input variables are representative of the full range of possible values, along with best estimates where available, we have what may be referred to as a representative forecast. A representative forecast incorporates all currently available information, including uncertainty about future values. In this sense it is a truthful forecast. A simulation forecast allows for an examination of both what is possible and how likely each of those possibilities is. We can examine the best estimate forecast in the context of the full range of possibility and discern true upside and downside risk. We gain these advantages without losing the ability to do specific scenario analyses. But now we can peer into the risk associated with achieving any defined scenario. Simulation modeling does come at some cost, though. Rather than having a single input per assumption, you will define anywhere from one to four inputs, depending on the type of distribution being represented. This is because, in order to create a probability distribution to represent your uncertainty regarding the assumption, you will need to define the bounds of possibility for that variable (minimum, maximum) in addition to a best estimate, and potentially a peakedness variable. This makes the model appear more complex and can make it seem more daunting to users (the truth is, the model itself has not changed
from the point estimate, given that it was appropriately specified to begin with). This appearance of increased complexity can contribute to a black box perception among model consumers. Avoiding this issue is often as simple as explaining that the expanded input set is nothing more than representing the diligence that (hopefully) is going into formulating the best estimate in a traditional forecast, and leveraging all of this additional information to improve understanding and decision-making. Simulation outputs cannot always be interpreted the same way as traditional forecast outputs. It is therefore prudent to hold an orientation meeting with model consumers to discuss how to interpret results and to address common misapplications of simulation outputs. While it is not necessary for users to understand the theory per se, it is important to avoid having them multiply percentiles together or misinterpreting what the probabilistic outputs mean. A small investment here can go a very long way in creating value from the forecasting process.
WHEN SHOULD SIMULATION BE CONSIDERED AS THE METHODOLOGY OF CHOICE?

I once attended a pharmaceutical portfolio management conference where I heard one of the speakers say (in the context of creating forecasts to drive portfolio analysis): "Simulation is OK, but you better be really sure you are right about your assumptions if you are going to use it!" I was astounded that such misinformation was coming from someone put forward as an expert. Nothing could be further from the truth. The less certain you are about the assumptions, and the more there is at stake in the decisions being made from the model, the more appropriate and important simulation forecasting is. This is especially true if the cost associated with being wrong significantly exceeds the cost of the incremental resources to define that uncertainty. In summary, simulation forecasting is a powerful methodology for understanding not just the possible future outcomes, but establishing a truthful representation of how likely any single scenario is within the range of possibilities. Because strategic (long-term) forecasting is inherently risky and driven by many uncertain variables, adding Monte Carlo simulation to your forecasting tool chest can create enormous value.

Michael Kubica is president of Applied Quantitative Sciences, Inc. Please send comments to mkubica@aqs-us.com.



PROFIT CENTER

Learning by example
Three traits of successful analytics projects.

Fundamental characteristics: a must-answer question is addressed, the solution is simple and a specific action is proposed.

By E. Andrew Boyd

When speaking about analytics, or any other topic for that matter, it's easy to be drawn into generalities. Forecasts improve profits. Information on past purchases can be used to increase sales. Generalities are important. They help us navigate environments crowded with details. But details provide important lessons that generalities can't, helping us learn by example. In this column we look at one particular screen in one particular software system. It's not overly complicated, but it illustrates three general traits common to many successful applications of analytics. To understand the context in which the screen is used, consider the example of a charitable organization preparing to mail requests for donations. At its disposal is a large database of past donors. The charity has a fixed budget for mailing. The question is, "Who among the many past donors should receive a mailer?" Analytics can be used to evaluate any number of factors. Are recent contributors

more likely to give again, or is it better to target individuals who haven't contributed in a while? Are people from certain geographic regions more likely to give than others? Analytics offers a multitude of mathematical tools for answering these questions and determining which customers are most likely to send a donation. Whatever mathematical tools are chosen, however, the results can be easily and clearly communicated. The screen capture shown in Figure 1 is taken from SAS Enterprise Miner. On the horizontal axis is the percent of the population the charity might send mailers to. For example, 20 percent on the horizontal axis corresponds to the question, "Suppose we send mailers to the 20 percent of donors most likely to respond?" The vertical axis then fills in the blank. By choosing the 20 percent of donors most likely to respond, we can expect a response (cumulative lift) about 1.7 times greater than if we send mailers to 20 percent of the donor population at random. (The charity's budget corresponds to a mailing that reaches 20 percent of the donor population.) The system arrives at this number by determining which customers are most likely to respond.

Figure 1: Screen shot illustrates three general traits common to many successful applications of analytics.

The application and the screen vividly illustrate three fundamental characteristics of a successful analytics endeavor.

1. A must-answer question is addressed. Contributors need reminding. Donations fall if charities don't reach out. Without an unlimited mailing budget, the charity is forced to ask, "Who should we contact?" The question must be answered one way or another. Analytics provides an answer through the logical analysis of facts. It's useful to contrast the question faced by the charity with a question such as, "Should I change the price of a gallon of milk?" Retailers need to set prices, but once prices are set, there's considerable inertia for leaving them unchanged. A retailer doesn't need to change prices tomorrow. Analytics can still bring tremendous value in this case, and pricing has received considerable attention by analytics



practitioners. Nonetheless, it's easier for analytics to be adopted in applications where there's a question that unequivocally must be answered.

2. The solution is simple. It doesn't take an advanced degree in mathematics to understand either the problem or, at a general level, the logic behind the solution. Some people are more likely to respond to a mailer than others, and it's possible to take an educated guess about who those people are based on historical data. And, recognizing that the question must be answered, it's better to take an educated guess than a shot in the dark. The SAS system, along with similar systems offered by other analytics software vendors, allows users to pick among different mathematical methods for predicting who is most likely to respond. Modelers can then choose the method they feel most comfortable with to support an educated guess.

3. A specific action is proposed. The screen shows the expected lift from mailing to the right customers, but

more importantly, in the background it identifies those customers who should receive mailers. Once the analysis is run, a very specific action results: mail to these customers. Not all analytics applications provide such explicit actions. Reports provide useful information, but what to do with that information isn't always clear. It's of value to know that David closed $80,000 in business last month while his peers averaged $100,000, but what action should David's manager take? When the action isn't obvious, neither is the value. The value only becomes apparent when good business processes are put in place. In cases where the action is immediately apparent, the value is much easier to see. It isn't necessary for a successful application of analytics to demonstrate all three traits. Not all applications are so fortunate to have all of them. But when all three are present the case for analytics is extremely compelling, making life easier for everyone involved. We'll return to look at other detailed examples in future columns.
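The cumulative lift described above is easy to reproduce outside any particular tool. The sketch below is a generic illustration (not the SAS Enterprise Miner implementation); the array names and simulated data are assumptions. It computes how much better a mailing to the top 20 percent of scored donors performs than a random mailing of the same size.

```python
import numpy as np

def cumulative_lift(scores, responded, depth=0.20):
    """Lift from mailing to the top `depth` fraction of donors ranked by score."""
    order = np.argsort(scores)[::-1]            # highest predicted responders first
    n_top = int(len(scores) * depth)
    top_rate = responded[order][:n_top].mean()  # response rate in the targeted group
    base_rate = responded.mean()                # response rate of a random mailing
    return top_rate / base_rate

# Hypothetical example: 10,000 scored donors with simulated responses.
rng = np.random.default_rng(1)
scores = rng.random(10_000)
responded = (rng.random(10_000) < 0.05 + 0.10 * scores).astype(int)
print(f"Cumulative lift at 20 percent: {cumulative_lift(scores, responded):.2f}")
```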
Andrew Boyd served as executive and chief scientist at an analytics firm for many years. He can be reached at e.a.boyd@earthlink.net. Thanks to Jodi Blomberg of SAS for her help in preparing Figure 1.



ANALYZE THIS!

Let's get this analytics party hopping


Data hopelessness is what happens when an analyst has no hope of ever getting data in time to possibly answer anything but the most short-term business questions.
In my ongoing quest to figure out what's going on in the world of analytics, I've recently been to the Predictive Analytics World (PAW) conference (www.predictiveanalyticsworld.com/sanfrancisco/2011/) and the INFORMS Business Analytics and Operations Research conference in Chicago (http://meetings2.informs.org/Analytics2011/). I have seen dozens of presentations, heard scores of panel discussions and had a million or so conversations. In no particular order, here are some of my impressions. Big numbers: Attendance at both conferences was way up this year. PAW attracted 550 people, up 78 percent from the previous year. PAW was one of several workshops and conferences (along with Marketing Optimization Summit, Conversion Conference and Google Analytics Users Great Event) bundled into Data Driven Business Week, which drew more than 1,300 attendees. Meanwhile, the INFORMS conference drew 623 registrants, or 52 percent more than 2010. The event's new title obviously resonated more deeply than the old one (INFORMS Practice Conference) with many first-time attendees. Moreover, conference organizers clearly achieved their objective of engaging folks from

By Vijay Mehrotra

outside the traditional INFORMS community; I bumped into several people who had never had any interaction with the organization before, and many non-members delivered compelling presentations. Other numbers that caught my attention: 1,000+ (the number of analytics professionals working at consulting firm Mu Sigma, www.musigma.com, probably the largest company of its kind in the world) and 2,000+ (the number of SAS licensees within Wells Fargo). Groundhog day: At lunch in Chicago one day, someone pointed out that one of the Edelman Award finalists' projects was an intelligent way to re-position empty shipping containers, something we had talked about ad nauseam in the early 1990s when consulting with SeaLand. Other old-timers reported similar moments of déjà vu. What's different now? Two decades later, the data is now just a lot better. Sigh. Whose party is this anyway? Many people from the Knowledge Discovery and Data Mining (KDD) community attended PAW. While some were heavy-duty statistics folks, one could sense a strong vibe that this conference was about something other than statistics. Indeed, at one session, a very technical speaker confessed that he had never taken a statistics class and knew only what he had needed to learn for a couple of consulting projects. Of course, the INFORMS conference was attended by lots of us operations research and industrial engineering types, though not nearly enough folks from other classical

analytics disciplines such as business intelligence and machine learning. INFORMS has just launched a new section on analytics. A big shout-out to founding officers Michael Gorman (president), Zahir Balaporia (president-elect), Warren Lieberman (treasurer) and Doug Mohr (secretary) for making this happen, and for keeping their eyes and minds open for opportunities to collaborate with the KDD community and other professional groups, because we're all on the same side, and we all have lots to learn. Data hopelessness: Despite the many inspiring success stories at both conferences and the many instructive presentations about the mainstreaming of analytics, I also had several dark coffee and cocktail conversations about some old familiar struggles. Kudos to Sam Eldersveld for his treatise on a (still) common disease. Data hopelessness, he explains, is what happens when an analyst has no hope of ever getting data in time to possibly answer anything but the most short-term business questions. In its most common form, getting useful data requires its own herculean and unsustainable effort. In its strongest form, even data obtained under great duress is hopelessly incomplete/insufficient to answer the questions that are being asked with any level of confidence, and those questions are inevitably the wrong ones. On the plus side for those dealing with data hopelessness, 25 employers participated in the informal job fair in Chicago and many potential employers trolled the hallways at both



PAW and INFORMS. Clearly, the professional opportunities are out there, but beware: the data may not be any cleaner on the other side. Whose party is this anyway? (part 2): The two major sponsors for both of these events were SAS and IBM. These companies are the two biggest players in this space, but they could hardly be less alike: SAS is a privately held firm that has been focused on this business since its founding in 1976, while IBM is a publicly traded global behemoth that has spent more than $14 billion on analytics acquisitions such as ILOG, SPSS and Netezza over the past four years. After visiting with folks from both of these companies, it is clear to me that each is grappling with serious challenges. For SAS, its growing on-demand business requires it to learn to operate data centers, manage service level agreements and track clients' data flows, a far cry from its roots as a tools vendor. In addition, bigger data sets and smarter algorithms continue to put pressure on the SAS folks to get smarter about

how their software utilizes the capability of modern hardware platforms. For IBM, the success of its acquisition strategy depends heavily on its ability to integrate these capabilities into its global services organization and communicate to customers how these pieces can be leveraged. It, too, has a long road ahead. Certainly it is terrific that these two companies are supporting these conferences and others like them. We also saw a ton of small entrepreneurial companies (consulting firms, software vendors, data aggregators, executive recruiters) at both conferences. But it sure would be nice to see some more big players bring their resources to the analytics space. Is anyone listening over there at SAP, Oracle, salesforce.com? We would all love to see you at the next conference because this party is really just getting started.
Vijay Mehrotra (vmehrotra@usfca.edu) is an associate professor, Department of Finance and Quantitative Analytics, School of Business and Professional Studies, University of San Francisco. He is also an experienced analytics consultant and entrepreneur and an angel investor in several successful analytics companies.



NEWSMAKERS

Midwest ISO earns Edelman honors


"All of us, everyone who works for the company, try to focus every day on how can we continuously improve, how can we do things better, how can we move ourselves forward, whether it's incrementally or fundamentally."
Moments after Midwest ISO, which manages one of the world's largest energy markets, won the 2011 Franz Edelman Award for Achievement in Operations Research and the Management Sciences, company CEO and President John Bear was asked how many people comprise his high-end analytics (operations research, i.e. O.R.) group. "That's a trick question," Bear said. "In theory, there are 850 of us involved with O.R. All of us, everyone who works for the company, try to focus every day on how can we continuously improve, how can we do things better, how can we move ourselves forward, whether it's incrementally or fundamentally." The Edelman competition, sponsored by the Institute for Operations Research and the Management Sciences (INFORMS) and considered the "Super Bowl of O.R.," is an annual event that recognizes outstanding examples of operations research-based projects that have transformed companies, entire industries and people's lives. The 2011 winner was announced at an awards gala April 11 in conjunction with the INFORMS Conference on Business Analytics & Operations Research in Chicago.

Bonus links
See the presentation of Midwest ISO's award-winning analytics project here: http://livewebcast.net/informs_ac_edelman_award_2011
See Midwest ISO receive the coveted INFORMS Edelman Award here: http://www.youtube.com/user/INFORMSonline#p/u/9/exl_9Mrblfa

For many years, the U.S. power industry consisted of utilities that focused locally and ignored the possibility that there might be better, regional solutions. Midwest ISO was the nation's first regional transmission organization (RTO) to emerge following the Federal Energy Regulatory Commission's push in the 1990s to restructure and boost efficiency throughout the power industry. Headquartered in Carmel, Ind., Midwest ISO has operational control over more than 1,500 power plants and 55,000 miles of transmission lines throughout a dozen Midwest states, as well as Manitoba, Canada. Driven by the goal of minimizing delivered wholesale energy costs reliably, Midwest ISO, with the support of Alstom Grid, The Glarus Group, Paragon Decision Technology and Utilicast, used operations research and analytics to design and launch its energy-only market on April 1, 2005, and introduced its energy and ancillary services markets on Jan. 6, 2009. Midwest ISO improved reliability and increased efficiencies of the region's power plants and transmission assets. Based on its annual value proposition study, the Midwest ISO region realized between $2.1 and $3.0 billion in cumulative savings from 2007 through 2010. Midwest ISO estimated an additional $6.1 to $8.1 billion of value will be achieved through 2020. The savings translate into lower energy bills for millions of customers throughout the region. Along with Midwest ISO, the other 2011 Edelman finalists included CSAV (Chilean shipping company), Fluor Corporation, Industrial and Commercial Bank of China (ICBC), InterContinental Hotels Group (IHG) and the New York State Department of Taxation and Finance.

Photo: The Edelman-winning team, including Midwest ISO CEO and President John Bear (fifth from right).


BANKING SECTOR

The secret to better credit risk management: economically calibrated models


By Andrew Jennings (left) and Carolyn Wang (right)
As the banking sector gradually rebounds from the global recession, many bank executives and boards are focused on incorporating the painful lessons learned during the past three years into their business operations. Chief among those lessons learned is the need to strengthen the management of credit risk as economic conditions fluctuate. It is clear that many banks weren't prepared for the economic and financial storms that struck in 2008. Not only were

the analytic models employed by banks ill-prepared for the depth and length of the recession, bank executives were caught off guard by their inability to do more to manage through the onslaught of consumer defaults. The good news is that these hard times spurred analytic innovation and produced useful data to strengthen risk management going forward. Our post-crisis research has revealed three important lessons regarding credit risk that can be instructive for banks everywhere:

1. Risk is dynamic. A bank's risk-management strategy must be agile enough to keep pace with a risk environment that is evolving continuously.
2. Rapid and significant changes in economic and market forces can render traditional risk-management approaches less reliable.
3. Credit providers need better economic forecasting relative to risk management for loan origination and portfolio management.

ECONOMICALLY CALIBRATED RISK MODELS

Risk models that are used to originate loans or make credit decisions on existing customers need to take an economically sensitive approach that offers the guidance and insight banks require for effective risk management. Such an approach will enable models to provide decision makers with more reliable and actionable information. While most of today's credit-risk models continue to rank-order risk properly during turbulent times, we now know that





immediate past default experience can be a weak indicator of future payment performance when economic conditions change significantly and unexpectedly. Empirical evidence shows that default rates can shift substantially even when credit scores stay the same (see Figure 1). For example, in 2005 and 2006, a 2 percent default rate was associated with a FICO Score of 650-660. By 2007, a 2 percent default rate was associated with a score of about 710, as rapidly worsening economic conditions (and the impact of prior weak underwriting standards) affected loan performance. Although most banks already incorporate some type of economic forecasting into their policies, our research and experience indicate a substantial portion of this input

is static and may not provide useful guidance for risk managers. As a result, there is a tendency to over-correct and miss key revenue opportunities, or under-correct and retain more portfolio risk than desired. Fortunately, progress in predictive analytics over the past three years now allows forecasting based on a more empirical foundation that is far more adept at managing risk in a dynamic environment. Such forecasts can augment existing credit-risk predictions in two ways: 1. They can improve predictions for payment performance. These improved predictions can be incorporated into individual lending decisions, and they can be used at the aggregate level to predict portfolio performance.

2. They can be used to predict the migration of assets between tranches of risk grades. When used in conjunction with aggregate portfolio default probabilities, this can form the basis of forecasting risk-weighted assets for the purpose of Basel capital calculations (and other types of regulatory compliance).
RISK SHIFTS AS ECONOMY SHIFTS

During economic downturns, many lower-risk consumers may refinance their debt obligations, leaving their previous lenders with portfolios full of riskier consumers. Other borrowers who were lower-risk in the past may reach their breaking points through job loss or increased payment requirements. And higher-risk consumers may get stretched further, resulting in more frequent and severe delinquencies and defaults. Economically calibrated analytics give lenders a way to understand the complex dynamics at work during unstable economic times. The resulting models provide an additional dimension to risk prediction that enables lenders to:

1. Grow portfolios in a less risky and more sustainable manner by identifying more profitable customers and extending more appropriate offers.
2. Limit losses by tightening credit policies sooner and targeting appropriate customer segments more precisely for early-stage collections.
3. Prepare for the future with improvements in long-term strategy and stress testing.
4. Achieve compliance with capital regulations more efficiently. (Improved accuracy in reserving will also reduce the cost of capital.)

At the simplest level, next-generation analytics provides lenders with an understanding of how the future risk level associated with a given credit score will change under current and projected economic conditions. These sophisticated analytic models are able to derive the relationship between historical changes in economic conditions and the default rates at different score ranges (i.e., the odds-to-score relationship) in a lender's portfolio. Using this derived relationship, lenders can input current and anticipated economic conditions into their models to project the expected odds-to-score outcome under those conditions. They can model their portfolio performance under a variety of scenarios utilizing economic indicators such as the unemployment rate, key interest rates, Gross Domestic





Product (GDP), housing price changes and many other variables. These models can be constructed regionally or locally to account for the fact that economic conditions may not be homogeneous across an entire country. The odds-to-score relationship can be studied at an overall portfolio level, or it can be scrutinized more finely for key customer segments that may behave differently under varying economic conditions. And it can be applied to a variety of score types, such as origination scores, behavior scores, broad-based bureau scores like the FICO Score and Basel II risk metrics. Economically calibrated analytics can be particularly valuable when examining the behavior scores that lenders utilize to manage accounts already on their books for actions such as credit line increases/decreases, authorizations, loan re-pricing and cross selling. An economically calibrated behavior score could be used in place of, or along with, the traditional behavior score across the full range of account-management actions.
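As a hedged sketch of what deriving and applying an odds-to-score relationship could look like in practice (an illustration only, not FICO's methodology), one might fit default probability as a function of score and macroeconomic indicators on historical performance data, then re-project default rates by score band under a stressed scenario. The file name, column names and scenario values below are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical account-level history: score, macro conditions at observation
# time, and a 0/1 default outcome over the subsequent performance window.
hist = pd.read_csv("historical_performance.csv")
features = ["score", "unemployment_rate", "gdp_growth"]

# Logistic model linking default odds to score and economic conditions.
model = LogisticRegression(max_iter=1000).fit(hist[features], hist["default"])

# Project the odds-to-score relationship under a stressed scenario
# (assumed values: 9.5% unemployment, -1.0% GDP growth).
scenario = pd.DataFrame({
    "score": np.arange(580, 801, 20),
    "unemployment_rate": 9.5,
    "gdp_growth": -1.0,
})
scenario["projected_default_rate"] = model.predict_proba(scenario[features])[:, 1]
print(scenario)
```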
SIGNIFICANT VALUE ADD FOR COMPLIANCE

In addition to operational risk management, the incorporation of economic factors into portfolio performance

modeling can be valuable for regulatory compliance. When lenders set aside provisions and capital reserves, it is important that they understand the risks in their portfolios under stressed economic conditions, because that is when the reserves are likely to be needed most. In fact, forward-looking risk prediction is explicitly mandated in Basel II regulations, and such predictive analytics should be part of any lender's best practices for risk management. FICO has been working for some time now with European lenders to add economic projections into Basel II Probability of Default (PD) models. Using the derived odds-to-score relationship between a lender's PD score and various economic conditions, lenders can simulate the expected PD at a given risk-grade level in many different scenarios. Thus, lenders can more accurately calculate forward-looking, long-term PD estimates to meet regulatory requirements and calculate capital reserves in a more efficient and reliable manner. This can help banks free up more capital for lending and credit without taking on unreasonable risk. It can also help improve the transparency of a bank's compliance program and reduce the time and resources that must be dedicated to compliance.

APPROACH ALREADY BEARING FRUIT

We recently applied our economically calibrated risk-management methodology to the portfolio of a top-10 U.S. credit card issuer. We compared the actual bad rate in the portfolio to predictions from both the traditional historical odds approach as well as the economically calibrated methodology. We found that the latter would have reduced the issuer's error rate (the difference between the actual and predicted bad rates) by 73 percent over three years, resulting in millions of dollars of loss avoidance. In a second example, European lender Raiffeisen Bank International (RBI) is using an economically calibrated risk-management technique to complement its more traditional credit scoring information.

RBI overlays macroeconomic information on the bank's traditional credit-scoring process, creating a system that leverages and extends the value of RBI's in-house economic research. This provides the bank with a forward-looking element to its credit scoring, following concerns about the creditworthiness of some of the central and eastern European countries in which the bank operates. RBI is using this new approach on its credit card, personal loan and mortgage portfolios to build future economic expectations into credit risk analysis. Regulatory compliance was the initial driver of this move, but RBI quickly realized the new approach could help it achieve stability in the overall capital requirements for its retail business segment.


Each market the bank serves faces different economic prospects, and calibrating risk strategies for each market can help the bank grow during good and bad economic periods. In another real-world case, a U.S. credit card issuer retroactively applied this economic-impact methodology to its credit-line-decrease and collections strategies. An analysis of its 2008 data (conducted with the new methodology) found that the predicted bad rate for its portfolio rose more than 250 basis points compared to predictions based on a more traditional approach. The new approach would have decreased the amount of credit extended to a larger portion of the portfolio (and not decreased credit to those less sensitive to the downturn). The lender would have realized millions of dollars in yearly loss savings. For the same U.S. card issuer, we retroactively used an economically adjusted behavior score in place of the traditional behavior score to treat early-stage (cycle 1) delinquent accounts. Prioritizing accounts by risk, the strategy would have targeted 41 percent of the population for more aggressive treatment in April 2008. We then examined the resulting bad rates six months later (October 2008) and saw that these accounts resulted in higher default rates than the accounts that weren't targeted. In other words, the economically adjusted scores improved the identification of accounts that should have received more aggressive treatment in anticipation of the economic downturn. Using this strategy, the lender would have been ahead of its competition in




collecting on the same limited dollars. The lender could have saved approximately $4 million by taking aggressive action earlier. FICO calculated this figure using the number of actual bad accounts that would have received accelerated treatment, the average account balance, and industry roll rates. The combination of this loss prevention through more aggressive collection and the millions of dollars the lender could have saved from an improved credit-line decrease strategy would have made a material impact on the lender's earnings. This underscores the aggregate benefits of economically calibrated risk management when used across a customer lifecycle. And the benefits are scalable for larger portfolios. These are just a few examples among many worldwide that illustrate the value of economically calibrated analytics for risk management. In fact, one of the largest financial institutions in South Korea recently adopted this same approach to help it derive forward-looking estimates on the probability of default in its consumer finance portfolio. The lender will be

using these predictions to continuously adjust its operational decisions depending on anticipated economic conditions.
NOW IS THE TIME TO ACT

Smart lenders are reevaluating their risk-management practices now when economic conditions are somewhat calm and there is no immediate crisis that requires their full attention. A reevaluation of risk-management practices can enable measured growth while simultaneously preparing a lender for the next recession. The use of forward-looking analytic tools will become the risk-management best practice of tomorrow. With improved risk predictions that are better aligned to current and future economic conditions, lenders can more quickly adjust to dynamic market conditions and steer their portfolios through uncertain times.
Andrew Jennings is chief analytics officer at FICO and the head of FICO Labs. Carolyn Wang is a senior manager of analytics at FICO. To read more commentary from Dr. Jennings and other FICO banking experts, visit http://bankinganalyticsblog.fico.com/.




CRUISE LINE EXPERIMENT

Risk in revenue management


Acknowledging risk's existence and knowing how to minimize it.

By Param Singh
Most of us go about our daily actions in our personal lives constantly evaluating risk vs. reward elements. Most actions are based on decisions that are made quickly and sub-consciously with barely a thought regarding risk vs. reward, but others are more deliberate and calculating, such as financial decisions to invest in the stock market. We naturally carry our disposition regarding risk into the workplace, too, making on-the-job decisions based on our

general attitude toward risk. People differ in the level of risk with which they are comfortable. So the question, then, is how does the variety in people's willingness to take risk affect decisions that have an impact on the company's bottom line? Let's narrow this question down to the world of revenue management (RM). RM systems help in the decision-making process by evaluating vast amounts of complex demand and supply information and recommending optimal actions to maximize revenues. But even so, RM analysts

have a role, rightly so, in deciding whether or not to approve or adjust these system recommendations. But can they do that without imposing their viewpoint on risk into the equation?
CRUISE LINE'S EXPERIMENT EVALUATES EFFECT OF PROPENSITY TO TAKE RISKS

As a real-life example to understand this phenomenon, a revenue management department undertook an experiment with analysts using its RM system. This company was a cruise line so the resources being

priced were cabins for future sailings of varying durations on ships with various itineraries. With several hundred sailings each year for this cruise line, the RM workload of the department was divided among its dozen analysts on the basis of the ship type, sailing duration, season and destinations. The RM system evaluated the data and performed its modeling, forecasting and optimization steps to recommend prices for its products. The analysts either approved the system recommendations or adjusted the prices up or down.





The key metrics for evaluating individual sailings were occupancy and various flavors of net revenue. Revenue came from tickets for the cabins and cruise and onboard revenues from shopping, casinos, liquor, offshore excursions, etc. High occupancy was desirable, sometimes at the cost of low ticket prices, both for the onboard revenue component and for the positive psychological effect on passengers (similar to customers feeling somewhat let down if the restaurant they went to dine in was sparsely occupied). Also, the cruise line preferred to raise ticket prices as the sail date approached, though this was not always upheld for various reasons such as poor forecasts, disbelief in forecasts or a variety of market conditions, giving rise to confusion from the customers' point of view, some of whom thought it better to wait to get good deals on cruise prices. As part of this experiment, a single small sample of future sailings of varying durations, itineraries, etc. was assigned to all RM analysts. This was workload over and above the individual collection of sailings they were each already responsible for. They were asked to evaluate the system recommendations for this handful of

sailings and decide whether to accept them or assign new lower or higher prices. All analysts had the same data available to them. This experiment lasted several months, since the sample consisted of sailings a few weeks from departure and others several months from departure. Even though only the true owners of the sample of sailings made the real implementable pricing decisions, all analysts recorded their pricing decisions and the reasons behind them. This was done once a week, at the same frequency as the RM system forecasting/optimization runs, until the sailings departed. The recorded results of decisions that would have been made had different RM analysts been in charge of these sailings were very informative. It became obvious that different people viewed the same information differently, sometimes to the point of making opposite decisions: If the system's recommendation was to raise prices from their current level, some analysts suggested raising the price even higher whereas others suggested lowering the prices, the recommendations notwithstanding! And all this based on the same RM data elements. The risk tolerance in its most extreme



form was expressed by two divergent camps:

1. The pessimists. Analysts who would rather not wait till close-to-departure for higher revenue demand and filled the ship somewhat earlier by accepting demand sooner than later, thereby reducing the risk of empty cabins at sailing but also getting lower total revenues.

2. The optimists. Analysts who waited too long for the close-to-departure higher revenue demand and thus either suffered lower occupancy or did a last minute fire sale, resulting in lower total revenues.

One can debate which risk tolerance approach was best for the cruise line. The latter certainly sent the wrong message to the marketplace in terms of waiting for deals close to sailing, especially if it occurred often. Another interesting observation was that senior management's risk tolerance also played into the analyst decisions. Since all pricing decisions had to be approved by the managers and/or directors, their risk tolerance and preferences were superimposed upon each analyst's decision-making process. In this case, a systemic shift of metrics occurred for the department as a whole

during the time preceding the sailings: Early on, far from the sail date, the metric was net revenue (i.e., hold out and wait for higher-valued demand), and closer in, as the sail date approached, the metric shifted toward occupancy.
STEPS TO MINIMIZE THE EFFECT OF RISK PREDISPOSITIONS

Although it was reassuring that analysts did not blindly accept the RM system recommendations, it's clear companies can better direct their efforts and minimize the risk-taking element through good RM models, training and metrics. RM models. It's vitally important to ensure the effectiveness of the five main pieces of your RM system:

1. Data: Good, clean and timely data in a single location provide a reliable foundation for downstream RM processes.
2. Estimation models: Accurate and frequently updated models provide the best supporting parameters used in the RM system. These include cancellation rates, segmentation, unconstraining and price elasticities.
3. Forecasting: Accurate prediction of demand as best as data will allow, and flexibility to incorporate new business conditions or information without delay.

4. Optimization: Good recommendations based on valid representations of the real world's business constraints and market conditions, built to take advantage of advances from evolving mathematical techniques.
5. Tracking and reporting: Visibility into knowing that the models are working well and that optimal revenue opportunities are being captured.

Training. Training provides both an understanding of and a belief in the RM system. Training underlines that RM models, if stochastic in nature, are generally risk neutral and on average will provide superior revenue results compared to the risk taking by analysts, which is akin to gambling. In the short term, it may pay off, but in the long term it will generate suboptimal levels of revenues. If the analysts are trained to understand how the data is used, various parameters estimated, demand forecasted and optimization recommendations produced, they are more likely to know where to focus their efforts in determining the validity of the RM system decisions. Metrics. Confidence in the recommendations produced by an RM system comes from producing and reviewing post-sailing metrics such as accuracy metrics of






forecasts and other parameters used in the RM models, and metrics of revenue opportunities captured. Showing analysts how well the forecasting models predict when the various demand streams can be expected to occur, and when they did occur, will go a long way toward keeping them from unnecessarily second-guessing the demand forecasts. And viewing revenue opportunity captured metrics (actual revenue captured on a scale of no-RM revenue vs. optimal revenue possible) also shows them the direct results of RM actions, whether positive or negative in nature.
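To make the revenue opportunity captured metric concrete, here is a minimal sketch (the function name and the post-sailing figures are hypothetical): it simply locates actual revenue on the scale between the no-RM revenue and the optimal revenue.

```python
def revenue_opportunity_captured(actual, no_rm, optimal):
    """Fraction of the available revenue opportunity captured by RM actions.

    0.0 means no better than doing no revenue management at all;
    1.0 means the optimal revenue possible was achieved.
    """
    return (actual - no_rm) / (optimal - no_rm)

# Hypothetical post-sailing numbers for one departure (in dollars).
print(revenue_opportunity_captured(actual=4_600_000,
                                   no_rm=4_000_000,
                                   optimal=5_000_000))  # 0.6
```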
RISK-SENSITIVE MODELS

Practitioners and researchers using RM in several industries have observed that risk averseness is a common and natural human behavior. That's especially true as RM analysts make their decisions under the generally difficult condition where the higher revenue customers' demand occurs toward the end of the booking cycle. That's when compensating for poor RM decisions or sub-par models is most difficult. Most of the mathematics used in the RM optimization models rely on both the


long run and the average, based on a high volume of flight departures, cruise sailings, hotel nights and car rentals, and therefore have risk-neutral, revenue-maximizing objective functions. But they don't directly consider the fact that sometimes RM industries may prefer stable financial results in the short term rather than some of the inherent volatility produced with the use of risk-neutral models and market randomness. Recent research and development of mathematical formulations incorporate a variety of mechanisms, called risk-sensitive formulations, into the RM models to mitigate these risk elements. Following are a number of different risk-sensitive methods incorporating a variety of levers to achieve an acceptable risk objective:

- various utility functions as a way to reflect the level of risk that is acceptable;
- variance of sales as a function of price, by using weighted penalty functions;
- value at risk or conditional value at risk functions;
- a relative "revenue per available seat mile at risk" metric, for airlines;
- maximizing revenues, using constraints of minimum levels of revenue with associated probabilities; and
- target percentile risk measures that prevent falling short of a revenue target.

(For more information and a comprehensive bibliography, see "Risk Minimizing Strategies for Revenue Management Problems with Target Values" by Matthias Koenig and Joern Meissner.)

Even though most current RM models are risk-neutral models, RM practitioners have to ensure that they do not make risk predisposition-based, sub-optimal decisions while trying to maximize revenues. If the RM models in use, whether forecasting or optimization, indeed are in need of risk adjustments, then those enhancements should be made. However, incremental benefits are possible from using good models to begin with, supported by frequent training and analytical review of results, before incorporating additional risk-sensitive components into the RM models.
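To illustrate the general idea of a risk-sensitive formulation (a toy sketch only, not one of the published models cited above; the demand distributions, fares and penalty weight are assumptions), the fragment below picks a protection level for late, high-fare cruise demand by maximizing expected revenue minus a penalty on revenue variability, instead of expected revenue alone.

```python
import numpy as np

rng = np.random.default_rng(2)
CAPACITY, LOW_FARE, HIGH_FARE, LAMBDA = 100, 800, 1500, 0.5
N = 5_000  # simulated sailings per candidate policy

def simulate_revenue(protect):
    """Revenue when `protect` cabins are held back for late, high-fare demand."""
    low_demand = rng.poisson(90, N)    # early, low-fare demand (assumed Poisson)
    high_demand = rng.poisson(25, N)   # late, high-fare demand (assumed Poisson)
    low_sold = np.minimum(low_demand, CAPACITY - protect)
    high_sold = np.minimum(high_demand, CAPACITY - low_sold)
    return LOW_FARE * low_sold + HIGH_FARE * high_sold

def risk_adjusted_value(protect):
    rev = simulate_revenue(protect)
    # Mean-variance style objective: penalize volatility, not just maximize the mean.
    return rev.mean() - LAMBDA * rev.std()

best = max(range(0, 51), key=risk_adjusted_value)
print("Risk-adjusted protection level:", best)
```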
Param Singh, SAS Worldwide Marketing, has gained, over the past 15 years, a variety of cross-industry revenue management experience working in airlines, cruise lines, hotels and transportation. His responsibilities in RM have spanned all facets of revenue management systems including data management, forecasting, optimization, performance evaluation and metrics, reporting, GUI design, model calibration, testing and maintenance. Singh has also provided RM consulting services to several companies. Prior to RM, he worked in the application of a variety of operations research techniques and solutions in the airline industry in the areas of airport operations, food and beverage, maintenance and engineering.





MARKETING STRATEGY

Five best practices in behavioral segmentation


How should organizations embed segmentation to become highly relevant?

By Talha Omer
To marketers, it is fairly well established that enticing all customers with the same offer or campaign is useless. The No. 1 reason why people unsubscribe or opt out is irrelevant messaging. A while back, marketers moved to grouping customers on the basis of certain metrics that gave a bit more context to the marketing strategies. Example: offering an entertainment service to customers who: 1. use a lot of MMS for sending videos and pictures, 2. download songs using GPRS, and 3. have a very high ARPU (average revenue per user). Segmentation drives conversion and avoids erosion. That's a bold statement, but the reality is that your subscribers are not all alike; they do not come to your organization to perform a single activity. Your organization does not exist for a singular reason either. The core drivers of behavior are very different for each core group of subscribers. When you look at all that in aggregate you

get nothing. You think average duration of calls means something or revenue of calls and overall duration of calls give you insights, but do they? Probably not much. The problem is that all business reporting and analysis is data in aggregate, e.g. total number of daily calls, total daily revenue, average monthly call duration, total weekly volume of GPRS, overall customer satisfaction and many more gigabytes of data reports and megabytes of analysis, all just aggregates. The tiny percent of time that the analyst does segmentation, it seems to stop at ARPU. Segmenting by ARPU gives you segments, but they are so basic that you will not find anything in there that will get you anything insightful. So how can you make sure you are highly relevant to drive conversion and avoid erosion? If you want to find actionable insights you need to segment your data. Only then will you understand the behavior of micro-segments of your customers, which in turn will lead you to actionable insights because you are not focusing on the whole but rather on a specific. The power of segmenting your subscribers is that you get a 360-degree view of your customer while exploring such questions as, Whom am I going

to sell a certain product to? To answer this and similar questions, well focus on the five best practices in behavioral segmentation.
BEST PRACTICE NO. 1: FIRST, DISCOVER THE CLIENT'S BUSINESS. START WITH QUESTIONS, NOT THE ANALYTICAL MODEL.

Business leaders feel frustrated when they don't get insights that they can act on. Similarly, from the analyst's point of view, it can't be fun to hold the title of senior analyst only to be reduced to running aimless analytical models. Hence, the most important element of any effective analytics project is to discover the client's business dynamics by asking real business questions, understanding those business questions and having the freedom to do what it takes to find answers to those questions by using analytical strategies. You need to ask business questions because, rather than simply being told what metrics or attributes to deliver, you need business context: What's driving the request for the model? What is the client really looking for? Once you have context, then you apply your smart analytical skills.

The business questions should have these three simple characteristics:

1. They should be at a very high level, leaving you room to think and add value.
2. They should have a greater focus on achievable steps, because each step enables you to focus your efforts and resources narrowly rather than implementing universal changes, which makes every step easier to accomplish.
3. They should focus on the biggest and highest-value opportunities, because the momentum of a single big idea and potentially game-changing insight will incite attention and action.

The goal of business discovery is to pull analysts up to do something they do less than 1 percent of the time in the analytics world: look at the bigger business picture. It is nearly impossible to find eye-catching, actionable insights if you just build a model straight away. Efforts will be wasted and the project will stall if you don't start by asking business questions. Along with wasting resources, failing to ask the right business questions up front risks creating widespread skepticism about the real value of segmentation analytics. The reason for asking business questions can be summed up in one word:

context. We are limited by what we know and what we don't know.


BEST PRACTICE NO. 2: RECONCILE THE DATA, BUILD BUSINESS-RELEVANT SEGMENTS. HAVE AN OPEN MIND; TOO MANY OR TOO FEW NATURAL SEGMENTS MAY BE JUST RIGHT.

Big data is getting bigger. Information is coming from instrumented, interconnected systems transmitting real-time data about everything from market demand and customer behavior to the weather. Additionally, strategic information has started arriving through unstructured digital channels: social media, smart phone applications and an ever-increasing stream of emerging Internet-based gadgets. It's no wonder that perhaps no other activity is as much a bane of our existence in analytical decision-making as reconciling data. Most of the numbers don't seem to tie to anything; each time you present the outcomes, the executives are fanning the flames of mismatched numbers. All of the attributes created for any analytical project are available to the stakeholders via standard BI reporting, so simply compare the attributes with the reported numbers. If the numbers are

off by less than 5 percent, resist the urge to dive deep into the data to find the root cause. A comprehensive agenda enables the reconciliation of the numbers. A senior analyst at one company, for example, stated that they were blindsided when it came to reconciling the data. But once they started checking every number against the ones reported to the business, they found themselves able to go forward.

Cluster techniques transform data into insights. Cluster techniques are a powerful tool to embed insights by generating segments that can be readily understood and acted upon. These methods make it possible for decision-makers to identify customers having similar purchases, payments, interactions and other behavior, and to listen to customers' unique wants and needs about channel and product preferences. As an analyst, you'll no longer have to hypothesize the conditions and criteria on which to segment customers. Clustering techniques provide a 360-degree view of all customers, not just a segment of high-revenue customers. Running the statistical process for clustering customers creates clusters that are statistically significant. The question then becomes, "Are the clusters significant from a business perspective?" To answer that, ask the following questions:

• Do you have enough customers in a segment to warrant a marketing intervention?
• What, and how many, attributes do they differ on?
• Are those attributes business-critical enough to warrant different segments?

Once the above questions have been sufficiently answered, the project team can determine if there are customer behaviors important enough from a business perspective to justify a marketing initiative.
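As a purely hypothetical sketch of this workflow (the column names, data and choice of three clusters are invented, not taken from any real subscriber base), the Python snippet below builds behavioral segments with k-means and then runs the business checks just described: segment sizes and the attributes on which the segments actually differ.

```python
# Illustrative sketch only: behavioral segmentation with k-means plus the
# "is this segment big enough / distinct enough?" checks described above.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical behavioral attributes per subscriber.
usage = pd.DataFrame({
    "monthly_revenue":    [12.0, 55.0, 48.0, 7.5, 60.0, 9.0, 52.0, 11.0],
    "mms_sent":           [3, 40, 35, 1, 50, 2, 44, 5],
    "gprs_mb":            [20, 800, 650, 5, 900, 10, 700, 30],
    "night_call_minutes": [10, 300, 250, 5, 320, 8, 280, 15],
})

X = StandardScaler().fit_transform(usage)          # put attributes on one scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
usage["segment"] = kmeans.labels_

# Business checks: is each segment large enough to warrant a campaign,
# and on which attributes do the segments actually differ?
print(usage["segment"].value_counts())
print(usage.groupby("segment").mean().round(1))
```

In practice the number of clusters and the input attributes would come out of the business-discovery step rather than being fixed up front.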
BEST PRACTICE NO. 3: REFRESH THE SEGMENTS. WHEN TO REVISE SEGMENTS TO ENSURE THEY ARE ALWAYS ACTIONABLE.


Segmentation sets the stage for how the organization is going to behave for a given time period. This is analytics at its best and one of the most resource-intensive analytics initiatives that will add huge value. As executives start using segmentation more frequently to inform day-to-day decisions and strategic actions, this increasing demand will require that the information provided is recent and reliable. Therefore, it is necessary to keep the segments up to date.
A senior executive told me his company built a perfect statistical model that created highly actionable segments, but it soon became useless because a majority of subscriber profiles had changed over time. This was due to the dynamic and competitive market the segmentation was focused on. In such environments, new campaigns, pricing and products are launched every day, causing instant behavioral changes and hence accelerating the model's decay. The executive said they had to streamline the operational processes and automate them so that the company could rebuild the segmentation every month. At one time they even considered real-time segmentation, since the benefits they reaped were unparalleled. Therefore, to keep the three gears moving together (up-to-date segmentation, actionable insights and timely actions), the overriding business purpose must always be in view. New analytic insights are embedded into the segments as the business changes, as new products are launched and as new strategic developments happen, and a virtuous cycle of feedback and improvement takes hold. It starts with a foundation of analytical capabilities built on organizational context that delivers better insights, backed by a systematic review to continuously improve the decision process.

BEST PRACTICE NO. 4: MAKE SEGMENTS COME ALIVE. ANALYZE SEGMENTS TO DRIVE ACTIONS AND DELIVER VALUE.

New methods and tools to embed information into segments (analytics solutions, inter/intra-segment highlights, psychographic and demographic analysis) are making segments more understandable and actionable. Organizations expect the value from these techniques to soar, making it possible for segments to be used at all levels of the organization, e.g. for brand positioning or allowing marketers to see how their brands are perceived. Innovative uses of this type of information layering will continue to grow as a means to help individuals across the organization consume and act upon insights derived through segmentation that would otherwise be hard to piece together. These techniques to embed insights will add value by generating results that can be readily understood and acted upon:

• Intra-segment analysis evaluates the preferences of a segment, such as "the highest proportion of revenue is realized from calls during the night." Measuring the proportion of traffic of an attribute for a segment will tell you the inclination and motivation of that segment.
• Inter-segment analysis reflects the actual rank of a segment for an attribute across all segments, a technique that gives you the best/worst segments with respect to a particular attribute, e.g. highest revenue, second-lowest GPRS users, etc.
• Psychographic and demographic analysis is a fantastic way to understand the demographic (male, female, age, education, household income) and psychographic (Why do they call during the night? What do they use the Internet for?) makeup of any segment. For example, if you are interested in the technology-savvy segment, targeted surveying of each segment and analysis will tell you what zip codes these subscribers are likely in, why they are using so much GPRS, what websites they visit, etc.

Once you establish the segments, you may then merge and/or discard segments that are business insignificant. The rule of thumb for merging segments: If you believe that you cannot devise distinctive campaigns for two segments, merge them.
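Here is a minimal, hypothetical sketch of the intra- and inter-segment views described above, using a toy segment profile table; the segment names, attributes and figures are invented.

```python
# Toy illustration of intra-segment (one segment's inclinations) and
# inter-segment (ranks across segments) analysis.
import pandas as pd

profile = pd.DataFrame({
    "segment":        ["A", "B", "C"],
    "revenue":        [120_000, 80_000, 40_000],
    "night_call_pct": [0.55, 0.20, 0.35],   # share of call minutes at night
    "gprs_mb":        [900, 150, 400],
}).set_index("segment")

# Intra-segment view: what is segment A inclined toward?
print(profile.loc["A"])

# Inter-segment view: rank every segment on every attribute (1 = highest).
ranks = profile.rank(ascending=False).astype(int)
print(ranks)
```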


These methods will make it possible for decision-makers to more fully understand their segments of subscribers and boost business value. Businesses will be able to listen to customers' unique wants and needs about channel and product preferences. In fact, making customers, as well as information, come to life within complex organizational systems may well become the biggest benefit of making data-driven insights real to those who need to use them.
BEST PRACTICE NO. 5: SPEED INSIGHTS INTO THE SEGMENTATION PROCESS. WHAT SEGMENTATION-FOCUSED COMPANIES DO.

Most often, organizations start off their segmentation analysis by gathering all available data. This results in an all-encompassing focus on data management: collecting, reconciling and transforming. This eventually leaves little time, energy or resources to focus on the rest of the segmentation process. Actions taken, if any, might not be the most valuable ones. Instead, organizations should start in what might seem like the middle of the process: implementing segmentation

by first defining the business questions needed to meet the big objective and then identifying those pieces of data needed for answers. By defining the business objective first, organizations can target specific subject areas and use readily available data in the initial analytic models. The insights delivered through these initial models will identify gaps in the data infrastructure and business processes. Time that would have been spent collecting and pre-processing all the data can be redirected toward targeted data needs and specific process improvements that the insights identify, enabling a successful segmentation. Companies that make data their overriding priority often lose momentum long before the first iteration is delivered, frequently because a data-first approach takes too long before delivering an actionable segmentation. In cases where the market is very volatile, by the time you deliver the segments, time for refresh arrives and you are back to square one. By narrowing the scope of these tasks to the specific subject areas needed to answer key questions, value can

be realized more quickly, while the insights are still relevant. Organizations that incorporate segmentation must be good at data capture and processing, and must have plenty of space available in their warehouse. In these areas, they must outperform the competition by up to tenfold in their ability to execute. Time to market is short, because market dynamics change quickly in highly competitive and saturated markets.
SET YOURSELF UP FOR SUCCESS

Remember, segmentation analysis is a tough game. The good news is that this is very far from daily business reporting and analysis. It requires more intense and focused effort, and it truly is advanced analysis. Not every company will be ready to leverage all of the above practices. You are encouraged to perform a self-critical analysis of your own abilities before you go into segmenting your subscribers, even though the upside is literally huge sums of money and a strategic advantage that will influence your fundamental business strategy in a very positive way. For your company and business, maybe revenue from off-net calls is not as important as duration of calls, the number of MMS, the volume of GPRS or bundled subscriptions. Understand what your business is and what the areas of strategic focus are, and then segment away. Likewise, the more you understand what your customers are doing, the more likely it is that you'll stop the irrelevance of your marketing campaigns. You'll also likely find the optimum balance between what you want to have happen and what your customers want. You'll make happier customers, who will in turn make you happy.

Summing up: Start on the path to segmentation, keep everyone focused on the big business issues and select the business problems that segmentation can solve today with new thinking and a framework for the future. Build on the operational and strategic capabilities you already have, and always keep pressing to embed the insights you've gained into your business strategy.

Talha Omer is an analytics professional and researcher. He currently serves as an analyst at a major telecommunication company. He holds a master's degree in operations research from Cornell University. He can be reached at tno5@cornell.edu.


60 countries represented

Understanding data miners


Data miner survey examines trends and reveals new insights.

By Karl Rexer, Heather N. Allen and Paul Gearan


For four years, Rexer Analytics has conducted annual surveys of the data mining community to assess the experiences and perspectives of data miners in a variety of areas. In 2010, Rexer sent out more than 10,000 invitations, and the survey was promoted in a variety of newsgroups and blogs. Each year the number and variety of respondents has increased, from the 314 in the inaugural year (2007) to 735 respondents in 2010. [Rexer Analytics did not specifically

define data miner or data mining; the decision to participate in the survey was an individual choice.] The data miners who responded to the 2010 survey come from more than 60 countries and represent many facets of the data mining community. The respondents include consultants, academics, data mining practitioners in companies large and small, government employees and representatives of data mining software companies. They are generally experienced and come from a variety of educational backgrounds.

Each year the survey asks data miners about the specifics of their modeling process and practices (algorithms, fields of interest, technology use, etc.), their priorities and preferences for analytic software, the challenges they face and how they address them, and their thoughts on the future of data mining. Each year the survey also includes several questions on special topics. Often these questions are selected from the dozens of suggestions we receive from members of the data mining community. For example, the 2010

survey included questions about text mining and also gathered information about the best practices in overcoming the key challenges that data miners face. For a free copy of the 37-page summary report of the 2010 survey findings, e-mail DataMinerSurvey@RexerAnalytics.com.
DATA MINING PRACTICES

The data miners responding to the survey apply data mining in a diverse set of industries and fields. In all, more

than 20 fields were mentioned in last year's survey, from telecommunications to pharmaceuticals to military security. In each of the four years, CRM/marketing has been the field mentioned by the greatest number of respondents (41 percent in 2010). Many data miners also report working in financial services and in academia. Fittingly, improving the understanding of customers, retaining customers and other CRM goals were the goals identified by the most data miners. Decision trees, regression and cluster analysis form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. Time series, neural nets, factor analysis,

text mining and association rules were all used by at least one quarter of respondents. This year, for the first time, the survey asked about ensemble models and uplift modeling. Twenty-seven percent of data mining consultants report using ensemble models; about 20 percent of corporate, academic and non-government organization data miners report using them. About 10 percent of corporate and consulting data miners report using uplift modeling, whereas this technique was only used by about 5 percent of academic and NGO/ Govt data miners. Model size varied widely. About one-third of data miners typically utilize 10 or fewer variables in their final models, while about 28 percent generally construct models with more than 45 variables. Text mining has emerged as a hot data-mining topic in the past few years, and the 2010 survey asked several questions about text mining. About a third of data miners currently incorporate text mining into their analyses, while another third plan to do so. Most


data miners using text mining employ it to extract key themes for analysis (sentiment analysis) or as inputs in a larger model. However, a notable minority use text mining as part of social network analyses. According to the survey respondents, data miners employ STATISTICA Text Miner and IBM SPSS Modeler most frequently for text mining. The survey also asked data miners working in companies whether most data mining is handled internally or externally (through consultants or vendor arrangements). Thirty-nine percent indicated that data mining is handled entirely internally, and 43 percent reported that it is handled mostly internally, while only 1 percent reported that it was entirely external. Additionally, 14 percent of data miners reported that their organization offshores some of its data analytics (an increase from 8 percent reported in the previous year).

SOFTWARE

The 5th Annual Data Miner Survey


Rexer Analytics recently launched its fifth annual data miner survey. In addition to continuing to collect data on trends in data miners' practices and views, this year Rexer Analytics has included additional questions on data visualization, best practices in analytic project success measurement and online analytic resources. To participate in the 2011 survey, follow the survey participation link and use access code inf28.

One of the centerpieces of the data miner survey over the years has been assessing priorities and preferences for data mining software packages. Data miners consistently indicate that the quality and accuracy of model performance, the ability to handle very large datasets and the variety of available algorithms are their top priorities when selecting data mining software. Data miners report using an average of 4.6 software tools. After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43 percent) than any other. SAS and IBM SPSS Statistics are also used by more than 30 percent of data miners. STATISTICA, which has also been climbing in the rankings, was selected this year as the primary data-mining tool by the most data miners (18 percent). The summary report shows the differences in software preferences among corporate, consulting, academic and NGO/ Government data miners. For example, STATISTICA, SAS, IBM SPSS Modeler and R all have strong penetration in corporate environments, whereas Matlab, the open source tools Weka and R, and the IBM SPSS tools have strong penetration among academic data miners.

The survey also asked data miners about their satisfaction with their tools. STATISTICA, IBM SPSS Modeler and R received the strongest satisfaction ratings in both 2010 and 2009. Data miners were most satisfied with their primary software on two of the items most important to them (quality and accuracy of performance, and variety of algorithms) but not as satisfied with the ability of their software to handle very large datasets. They were also highly satisfied with the dependability/stability of their software and its data manipulation capabilities. STATISTICA and R users were the most satisfied across a wide range of factors. Data miners report that the computing environment for their data mining is frequently a desktop or laptop computer, and often the data is stored locally. Only a small number of data miners report using cloud computing. Model scoring typically happens using the same software that developed the models. STATISTICA users are more likely than other tool users to deploy models using PMML.
CHALLENGES

In each of Rexer Analytics' previous Data Miner Surveys, respondents were asked to share their greatest challenges as data miners. In each year, dirty data emerged as the No. 1 challenge.

Explaining data mining to others and difficulties accessing data have also persisted as top challenges year after year. Other challenges commonly identified include limitations of tools, difficulty finding qualified data miners and coordination with IT departments. In the 2010 survey, data miners also shared best practices for overcoming the top challenges. Respondents shared a wide variety of best practices, coming up with some innovative approaches to these perennial challenges. Their ideas are summarized, along with verbatim comments (196 suggestions), on the website: www.rexeranalytics.com/Overcoming_Challenges.html.

Key challenge No. 1: Dirty data. Eighty-five data miners described their experiences in overcoming this challenge. Key themes were the use of descriptive statistics, data visualization, business rules and consultation with data content experts (business users). Some example responses:

• "In terms of dirty data, we use a combination of two methods: informed intuition and data profiling. Informed intuition required our human analysts to really get to know their data. Data profiling entails checking to see if the data falls into pre-defined norms. If it is outside the norms, we go through a data validation step to ensure that the data is in fact correct."
• "Don't forget to look at a missing data plot to easily identify systematic patterns of missing data (MD). Multiple imputation of MD is much better than not to calculate MD and suffer from amputation of your data set. Alternatively, flag MD as a new category and model it actively. MD is information!"
• "Use random forest (RF) as feature selection. I used to incorporate often too many variables, which models just noise and is complex. With RF before modeling, I end up with only 5-10 variables and brilliant models."
• "A quick K-means clustering on a data set reveals the worst as they often end up as single observation clusters."
• "We calculate descriptive statistics about the data and visualize before starting the modeling process. Discussions with the business owners of the data have helped to better understand the quality. We try to understand the complexity of the data by looking at multivariate combinations of data values."
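Purely as an illustration of two of the practices quoted above (hypothetical column names and data, not from the survey), the following Python sketch flags missing data as its own indicator before imputing and then uses a random forest as a feature-selection pass ahead of the real modeling work.

```python
# Illustrative sketch: "MD is information" flag + random-forest feature selection.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 6)),
                  columns=[f"x{i}" for i in range(6)])
df.loc[rng.random(500) < 0.15, "x2"] = np.nan        # some dirty/missing values
y = (df["x0"] + 2 * df["x3"] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Keep an explicit missing-data flag, then impute simply.
df["x2_missing"] = df["x2"].isna().astype(int)
df["x2"] = df["x2"].fillna(df["x2"].median())

# Random forest as a feature-selection pass before the real modeling work.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(df, y)
importance = pd.Series(rf.feature_importances_, index=df.columns)
print(importance.sort_values(ascending=False).round(3))
```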


Key challenge No. 2: Explaining data mining to others. Sixty-five data miners described their experiences in overcoming this challenge. Key themes were the use of graphics, very simple examples and analogies, and focusing on the business impact of the data mining initiative. Some example responses:

• "Leveraging 'competing on analytics' and case studies from other organizations helps build the power of the possible. Taking small impactful projects internally and then promoting those projects throughout the organization helps adoption. Finally, serving the data up in a meaningful application BI tool shows our stakeholders what data mining is capable of delivering."
• "The problem is in getting enough time to lay out the problem and show the solution. Most upper management wants short presentations but doesn't have the background to just get the results. They often don't buy into the solutions because they don't want to see the background. Thus we try to work with their more ambitious direct reports who are more willing to see the whole presentation and, if they buy into it, will defend the solution with their immediate superiors."
• "I've brought product managers (clients) to my desk and had them work with me on what analyses were important to them. That way I was able to manipulate the data on the fly based on their expertise to analyze different aspects that were interesting to them."

Key challenge No. 3: Difficulty accessing data. Forty-six data miners described their experiences in overcoming this challenge. Key themes were devoting resources to improving data availability and methods of overcoming organizational barriers. Some example responses:

• "I usually would confer with the appropriate content experts in order to devise a reasonable heuristic to deal with unavailable data or impute variables."
• "Difficult-to-access data typically means we don't have a good plan for what needs to be collected. I talk with the product managers and propose data needs for their business problems. If we can match the business issues with the needs, data access and availability is usually resolved."
• "A lot of traveling to the business unit site to work with the direct customer and local IT ... generally put best practices into place after cleaning what little data we can find. Going forward we generally develop a project plan around better, more robust data collection."

THE FUTURE OF DATA MINING

Data miners are optimistic about continued growth in the number of projects they will be conducting in the near future. Seventy-three percent reported they conducted more projects in 2010 than they did in 2009, a trend that is expected to continue in 2011. This optimism is shared across data miners working in a variety of settings. When asked about future trends in data mining, the largest number of respondents identified the growth in adoption of data mining as a key trend. Other key trends identified by multiple data miners are increases in text mining, social network analysis and automation.
Karl Rexer (krexer@RexerAnalytics.com) is president of Rexer Analytics, a Boston-based consulting firm that specializes in data mining and analytic CRM consulting. He founded Rexer Analytics in 2002 after many years working in consulting, retail banking and academia. He holds a Ph.D. in Experimental Psychology from the University of Connecticut. Heather Allen (hallen@RexerAnalytics.com) is a senior consultant at Rexer Analytics. She has built predictive models, customer segmentation, forecasting and survey research solutions for many Rexer Analytics clients. Prior to joining the company she designed financial aid optimization solutions for colleges and universities. She holds a Ph.D. in Clinical Psychology from the University of North Carolina at Chapel Hill. Paul Gearan (pgearan@RexerAnalytics.com) is a senior consultant at Rexer Analytics. He has built attrition analyses, text mining, predictive models and survey research solutions for many Rexer Analytics clients. His 2006 in-depth analyses of the NBA draft resulted in an appearance on ESPNews. He holds a master's degree in Clinical Psychology from the University of Connecticut. More information about Rexer Analytics is available at www.RexerAnalytics.com. Questions about this research and requests for the free survey summary reports should be e-mailed to DataMinerSurvey@RexerAnalytics.com.


LEGAL ISSUES

Sports law analytics


Analytics are proving to be dispositive in high-stakes sports industry litigation.

By Ryan M. Rodenberg and Anastasios Kaburakis


As highlighted by James C. Cochran in the January/February 2010 issue of Analytics and a forthcoming special issue of Interfaces co-edited by Michael J. Fry and Jeffrey W. Ohlman, the sports industry has firmly embraced the use of analytics in the decision-making process. Such methods have similarly been adopted in sports law, a corollary field inextricably intertwined with the dynamic sports business. As a prime example, Shaun Assael of ESPN [1] recently described the

ongoing litigation involving the National Collegiate Athletic Association's (NCAA) licensing of former student-athletes' names and likenesses in video games (a now-consolidated case [2] that started with the filing of two separate actions, Keller v. NCAA, et al and O'Bannon v. NCAA, et al) as one of five lawsuits that will change sports, giving credence to the relevancy and importance of how analytics can and will be used in furthering specific arguments arising in the lawsuit. The purpose of this article is to provide

an overview of sports law analytics and discuss the role of analytics in sports law cases moving forward, with a pointed discussion of the aforementioned consolidated Keller and O'Bannon case and the U.S. Supreme Court's recent decision in American Needle v. NFL, et al [3].
OVERVIEW OF SPORTS LAW ANALYTICS

The interdisciplinary methods employed in sports law analytics are derived from statistics, management science, operations research, economics, psychology

and sociology. However, the practice parameters of sports law analytics are set by evidentiary rules and relevant case law precedent. The consolidated case encompassing both Keller and O'Bannon implicates important intellectual property principles such as publicity rights and consent. Similarly, American Needle revolves around antitrust law and the complex competition-centered analysis that goes along with it. Daubert v. Merrell Dow Pharmaceuticals [4] was a U.S. Supreme Court

opinion that addressed the admissibility of expert testimony within the context of a drug-related birth defect case. Since being decided in 1993, Daubert has been the seminal case on the issue of whether expert testimony should be admitted or excluded. As binding precedent on every federal trial court in the United States, an understanding of the Daubert standard is a prerequisite to applying sports law analytics in pending litigation. Daubert requires courts to adopt a standard that determines whether the proffered evidence both "rests on a reliable foundation and is relevant to the task at hand" (597). In addition, the judge must consider whether "the reasoning or methodology underlying the testimony is scientifically valid" (592-93). The case has had the effect of limiting the use of the so-called "hired gun" expert. Daubert requires the trial court judge to act as a gatekeeper to protect against unreliable expert testimony being admitted into evidence (592-94). As summarized in Nelson v. Tennessee Gas Pipeline [5], the Daubert case set forth several factors to be considered: "(1) whether a theory or technique can be (and has been) tested; (2) whether the theory has been subjected to peer review and publication; (3) whether there is a high known or

potential rate of error and whether there are standards controlling the technique's operation; and (4) whether the theory or technique enjoys general acceptance within the scientific community" (251). The U.S. Supreme Court, in cases such as Castaneda v. Partida [6], has offered guidance on the admissibility standards for quantitative evidence. As outlined by Winston [7], the nation's highest court has accepted the 5 percent level of significance, or "two standard deviation" rule, as the level of evidence needed to shift the burden of proof from plaintiff to defendant or vice versa (96). Kentucky Speedway v. NASCAR, et al [8], a December 2009 case out of the U.S. Court of Appeals for the Sixth Circuit, illustrates the power and pitfalls of sports law analytics in litigation. The plaintiff alleged that NASCAR and an affiliate violated federal antitrust laws when the plaintiff's application for an elite-level sanction was not granted and the plaintiff's attempts to purchase pre-sanctioned races proved unsuccessful. The case also evidences how Daubert is applied in sports industry legal disputes. In Kentucky Speedway, NASCAR and its co-defendants prevailed after the court of appeals upheld the district court's determination that the

plaintiff's primary expert witness was unreliable. Specifically, the expert retained by the plaintiff was deemed to have applied his own (incorrect) analytical test when testifying. Pointedly, the Kentucky Speedway court found that the expert's own version of the well-accepted analytic pertaining to consumer substitution in the marketplace "has not been tested, has not been subjected to peer review and publication; there are no standards controlling it, and there is no showing that it enjoys general acceptance within the scientific community ... [f]urther, it was produced solely for this litigation" (918). Federal Rule of Evidence 702 [9] is the primary rule that guides the admissibility of evidence in the federal court system and was revised after the U.S. Supreme Court decided Daubert. In relevant part, the rule provides: "If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the

principles and methods reliably to the facts of the case." Federal Rule of Evidence 702, coupled with Daubert, forms the parameters of sports law analytics in the courtroom regardless of whether the dispute pertains to intellectual property law (Keller and O'Bannon), antitrust law (American Needle) or otherwise.
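As a purely illustrative aside (the numbers below are hypothetical and are not drawn from Castaneda or any other case), the "two standard deviation" rule mentioned above amounts to comparing an observed count with its binomial expectation and asking whether the gap exceeds roughly two standard deviations, which corresponds to the 5 percent significance level.

```python
# Hypothetical two-standard-deviation calculation for a binomial count.
import math

n, p = 870, 0.45          # assumed number of selections and expected population share
observed = 339            # assumed observed count

expected = n * p
sd = math.sqrt(n * p * (1 - p))
z = (observed - expected) / sd
print(f"expected={expected:.0f}, sd={sd:.1f}, z={z:.2f}")
print("exceeds two standard deviations" if abs(z) > 2 else "within two standard deviations")
```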
ANALYTICS IN INTELLECTUAL PROPERTY LITIGATION

Quantitative-based analysis is playing a major part in the outcomes of virtually all contemporary intellectual property litigation that reaches trial. In fact, the absence of such analytics has been held as a shortcoming in the context of some intellectual property litigation. Commercial publicity rights have been represented by a number of malleable concepts on which there is no uniformity of acceptance and no dispositive codified law, and jurisdictions across the U.S. have been split. The student-athletes in the Keller/O'Bannon class action video game case are alleging, among other things, that the NCAA impermissibly licensed their names and likenesses, a violation of their right of publicity. Section 46 of the Restatement (Third) of Unfair Competition [10] sets the burden of proof for establishing a violation of a right of publicity as: (i) use of the plaintiff's identity; (ii) identity has commercial value; (iii) appropriation of commercial value for purposes of trade; (iv) lack of consent; and (v) resulting commercial injury. The aforementioned fourth prong will likely be at issue in the video game litigation, as the NCAA's defense will probably include a claim that the student-athletes depicted in the interactive games provided de facto consent to such licensing via their scholarship agreement, letter of intent or other related document. Moreover, the second prong has traditionally been decided after consideration of marketing research surveys and several analytical tools attempting to establish whether there is indeed commercial value, e.g. whether consumers can sufficiently identify the plaintiff and, in turn, make the clear connection between the plaintiff and the digital expression in a video game. Sports law analytics will almost certainly play an integral part in the resolution of the consolidated class action containing both Keller and O'Bannon if the case goes to trial. A bevy of expert witnesses will testify. Analytics-driven evidence will be proffered by both sides.


Both Keller and O'Bannon were seeded in the use of former players' images in college sports video games, for which the NCAA, Collegiate Licensing Company (CLC) and NCAA member schools had contracted with Electronic Arts (EA), a leading video game manufacturer. Per NCAA policies on amateurism, student-athletes are not permitted to use their athletic skill to endorse commercial products or services. Similarly, the NCAA has taken the position that former student-athletes depicted in video games years after their collegiate careers have ended are not entitled to receive compensation in exchange for the licensing of their names and likenesses. Keller filed his complaint in May 2009 and, among other things, alleged that the NCAA, CLC and EA violated his rights of publicity under Indiana and California law. O'Bannon and several co-plaintiffs, all former college basketball and football players, filed a related lawsuit two months later. Analytics presented on the twin issues of the right of publicity and the presence of consent will be influential, if not dispositive, in the case's resolution.

ANALYTICS IN ANTITRUST LITIGATION

The importance of sports law analytics will also be realized in American Needle if the dispute reaches trial following remand by the U.S. Supreme Court on May 24, 2010. The American Needle case involved an antitrust challenge by a Chicago-area headwear manufacturer against the NFL following the league's decision to enter into an exclusive arrangement with Reebok for the manufacture of officially licensed headwear. The Supreme Court unanimously reversed a lower court summary judgment in favor of the NFL, concluding that the league is not immune from antitrust scrutiny in connection with its intellectual property licensing activities. Barring settlement, the now-remanded case will go to trial. There, plaintiff American Needle will have the opportunity to present evidence showing that the NFL-Reebok agreement stifled competition in the marketplace, damaged the company's book of business and adversely impacted consumers. Analytics will play a role on two levels. First, macro-level experts for both sides will testify about economics-heavy

antitrust principles, gauging whether the pro-competitive effects of the exclusive arrangement are outweighed by the anticompetitive impact of the NFL-Reebok exclusivity. Second, a narrow investigation will be undertaken to ascertain the impact on consumers. American Needle's micro-level analytics will be aimed at showing how a purported decreased level of competition has affected customers. Such analytics will likely focus on costs at the retail level before and after the NFL granted Reebok an exclusive license. In response, the NFL will likely retain experts capable of testifying about how consumers benefit from the economies of scale resulting from an all-encompassing agreement in the form of greater selection, uniformity and quality control, for example. Finally, American Needle will need to demonstrate the extent of its lost profits following the NFL-Reebok licensing pact.
CONCLUSION

High-stakes litigation in the sports industry often turns on analytics. The consolidated class action containing Keller/O'Bannon and the American Needle v. NFL case are current examples. While this article explained the federal evidentiary rules and U.S. Supreme Court opinions that set the parameters for the admissibility of statistical evidence and expert testimony in sports-related trials, such parameters can be generalized to non-sports contexts, as the legal rules are equally applicable. Experts with analytical acumen and some baseline level of sport-specific institutional knowledge frequently provide expert witness and consulting services, as the underlying legal disputes are often nuanced and technical, making them ripe for analytics.

Ryan M. Rodenberg (rrodenberg@fsu.edu) is an assistant professor at Florida State University. He earned a Ph.D. from Indiana University-Bloomington and a JD from the University of Washington-Seattle. Anastasios Kaburakis (kaburakis@slu.edu) is an assistant professor at Saint Louis University. He earned a Ph.D. from Indiana University-Bloomington and a law degree from Aristotle University in Thessaloniki, Greece.

REFERENCES

1. Shaun Assael, "Five Lawsuits That Will Change Sports," ESPN.com, Nov. 8, 2010.
2. In re Student Athlete Name and Likeness Licensing Litigation, C 09-01967 CW (N.D. Cal. 2010).
3. American Needle v. NFL, et al, 130 S.Ct. 2201 (2010).
4. Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993).
5. Nelson v. Tennessee Gas Pipeline, 243 F.3d 244 (6th Cir. 2001).
6. Castaneda v. Partida, 430 U.S. 482 (1977).
7. Wayne L. Winston, 2009, "Mathletics," Princeton, N.J.
8. Kentucky Speedway v. NASCAR, et al, 588 F.3d 908 (6th Cir. 2009).
9. Federal Rule of Evidence 702 (2011).
10. Restatement (Third) of Unfair Competition, Section 46 (1995).


DISPLAY & DEPLOY METRICS

Simulation frameworks: the key to dashboard success


By Zubin Dowlaty, Subir Mansukhani and Keshav Athreya
Regardless of the organization that you work for, chances are that you use dashboards to display and deploy metrics. The technology for building dashboards has continuously evolved, so much so that it is now possible for a non-technical person to build a dashboard. Despite their ubiquity, whether dashboards have been able to achieve their utmost potential is subject to debate. Most dashboards typically start life in a business function (e.g. a spreadsheet tracking report). With increasing use, more data integration is required and the number of users burgeons, spawning the need for a full-fledged dashboarding solution. Departments (or governance bodies, in some instances) typically determine key metrics that must be part of the dashboarding solution, and IT is brought in to gather requirements and select the technology for a successful implementation. Independent of the hierarchy of implementation, each such exercise must attempt to answer two key questions: What metrics must be chosen to maximize impact on the business? What is the relationship between metrics, and is there an overarching framework into which these KPIs slot? Often the latter of the two, the focus on the big picture, is lost during the development of dashboards.

DASHBOARDS TO COCKPITS: A SYSTEM DYNAMICS APPROACH

Imagine that you're piloting a space shuttle. Would you prefer a conventional dashboard displaying certain choice metrics and trends, or would you prefer a control panel, a cockpit, with actionable insights to negotiate the vagaries of interstellar travel? Piloting an organization is often not very different from helming a space shuttle, and the future of dashboards depends on the extent to which they can emulate cockpits, flight simulators and auto-pilot mode, a notion first explored by Rob Walker [1].

Figure 1: The stock flow concept.

The secret to developing dashboards of such astounding efficacy and power


could lie in the disciplines of simulation and system dynamics. A commonly studied concept in simulation is the stock flow, where a stock is simply an accumulation of an entity over time, and the status of the stock varies depending on the flow variable. The mathematical equivalents of stock and flow are the integral and the derivative, respectively. This metaphor is appealing given its simplicity of explanation and intuitive appeal: stocks can be thought of as a bathtub, and a flow will fill or drain the stock. Using these building blocks, one can then visually build a system complete with graphics and metrics that derive from the model. Someone non-technical could intuitively verify the model assumptions.
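A minimal sketch of the stock-flow idea, with invented numbers: the stock is simply the running integral of its net flow, here stepped forward with basic Euler integration.

```python
# Bathtub metaphor: stock(t+dt) = stock(t) + (inflow - outflow) * dt
def simulate_stock(initial_stock, inflow, outflow, steps, dt=1.0):
    """Euler-integrate a single stock driven by constant inflow and outflow."""
    stock = initial_stock
    history = [stock]
    for _ in range(steps):
        stock += (inflow - outflow) * dt
        history.append(stock)
    return history

# A bathtub filling at 5 units/step and draining at 3 units/step.
print(simulate_stock(initial_stock=10, inflow=5, outflow=3, steps=6))
# -> [10, 12, 14, 16, 18, 20, 22]
```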
CUSTOMER LOYALTY PROGRAM: THE DASHBOARD FRAMEWORK

Let's say you're responsible for creating a tool to monitor the health of a loyalty program. Following the system dynamics approach would first entail the creation of a stock-flow map (see Figure 2). Performing this exercise early in the life of a dashboard ensures that the subsequent steps are grounded in theory and are sufficiently representative of reality. For a loyalty program, the key actors in the map are the customers

(stock) and their inflow/outflow represents the flow variable. Prospects flow into the enrollee customer state, and enrollees either activate into customers or they never conduct business with the loyalty program. Customers that flow from the active state to inactive are considered the loyalty program's churn flow.

Figure 2: A stock flow map for a typical loyalty program.

Now that we have the big picture in place, designing the dashboard is a straightforward process. We have an accurate view of the interrelationships governing the key metrics. The map display as a navigation device is a useful addition to any dashboard. Metric trends may be animated on the map. When one clicks on a stock or a flow, all the key metrics describing that state are displayed. For example, if one clicks the Actives stock, one could then see the number of active customers, customer segment distributions, recency and frequency tables, revenue and OLAP-style drill-downs displayed in a dedicated dashboard view. A benefit of this approach is that it immediately segments metrics into two groups: stock or flow.
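A toy illustration of that grouping (all metric names are hypothetical): tag every dashboard metric as belonging to a stock or a flow, so that a click on a map node pulls up its dedicated view.

```python
# Hypothetical stock/flow metric registry backing a clickable map.
METRICS = {
    "Actives":    {"kind": "stock",
                   "metrics": ["active_customers", "segment_mix", "recency_table", "revenue"]},
    "Enrollment": {"kind": "flow",
                   "metrics": ["new_enrollees", "activation_rate"]},
    "Churn":      {"kind": "flow",
                   "metrics": ["monthly_churn_count", "churn_rate"]},
}

def view_for(node):
    """Return what a click on a map node would display."""
    info = METRICS[node]
    return f"{node} ({info['kind']} template): " + ", ".join(info["metrics"])

print(view_for("Actives"))
print(view_for("Churn"))
```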

With practice, one can utilize a similar template for stock variables and another for flow variables. The variable segmentation promotes re-use of the designed templates, thus enabling simpler implementations from a technology standpoint.
CUSTOMER LOYALTY PROGRAM: ONWARD TO SIMULATION AND OPTIMIZATION

With the stock flow map in hand, one can then form the basis for constructing a mathematical model of the system. The mathematical model opens the door for robust simulation and optimization as one matures beyond the dashboard reporting view. In its simplest form, the evolution of the system over time is constructed using the stocks and flows in the published map. For example, an analyst observes that the active customer base in the firm's customer loyalty program has begun to stagnate. The number of active customers is not increasing over time as expected. You need to intervene and try to boost active customers, but what do you do? Viewing Figure 2, let's increase


spending in the prospecting area of the map and boost the flow of spending dollars into the prospect stock. What would be the outcome of this action with respect to active customers? For example, increased prospect spending would likely cause an increase in the number of prospects, given an estimate of the response activation rate. You can then calculate the new stock of enrollees. Increased enrollees translate into a boost of active customers through the new-enrollee activation rate. Improving response activation performance, attempting to reduce churn or some combination of these strategies present other scenarios to focus on. All these example scenarios are estimable from the map. In order to further enhance simulation accuracy, one could introduce hierarchy. For each stock of customers, utilize customer segmentation to form subgroups. A customer segment is treated like a sub-stock of the parent stock, and one can track the inflow and outflow of each customer segment. This will reconcile in the parent stock, and one would gain considerable improvement in tactical ability. As the process of scenario analysis matures, users will most likely begin asking for the dashboarding system to recommend optimal scenarios given constraints. Optimization naturally extends

the simulation apparatus; one can link the optimization engine with the automated output of the simulator, then iterate and search for an optimal condition or control rule. Stochastic optimization as well as probabilistic metaheuristic approaches such as simulated annealing work fine in these applications.
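Here is a hypothetical sketch of the scenario analysis described above: roll the prospect, enrollee, active stock-flow map forward under a given prospect-spend level, then put a crude optimization layer (a grid search over the spend lever) on top of the simulator. All rates, costs, the saturation constant and the 24-month horizon are assumptions, not real figures, and a production system would use a proper solver or metaheuristic rather than a grid.

```python
# Toy loyalty-program simulator plus a grid-search "optimizer" over prospect spend.
import math

def simulate_loyalty(monthly_spend, months=24, reachable_market=4_000,
                     enroll_rate=0.10, activation_rate=0.40, churn_rate=0.03,
                     revenue_per_active=40.0):
    """Roll the stock-flow map forward and return the scenario's net value."""
    prospects = enrollees = actives = 0.0
    total_revenue = 0.0
    for _ in range(months):
        # Saturating acquisition: extra spend reaches fewer and fewer new prospects.
        prospects += reachable_market * (1 - math.exp(-monthly_spend / 50_000))
        new_enrollees = enroll_rate * prospects        # prospects -> enrollees
        prospects -= new_enrollees
        enrollees += new_enrollees
        new_actives = activation_rate * enrollees      # enrollees -> actives
        enrollees -= new_actives
        actives += new_actives
        actives -= churn_rate * actives                # actives -> inactive (churn)
        total_revenue += revenue_per_active * actives
    return total_revenue - months * monthly_spend

# Crude optimization layer on top of the simulator: scan the spend lever.
best = max(range(10_000, 200_001, 10_000), key=simulate_loyalty)
print("best monthly prospect spend:", best, "net value:", round(simulate_loyalty(best)))
```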
LAST WORD

In sum, incorporating the stock flow mapping technique empowers the developer and end-user by giving them an extensible framework for understanding dashboards. Furthermore, this approach paves the way for successful implementation and is a natural step in the progression toward flight simulator and auto-pilot dashboards.
Zubin Dowlaty (zubin.dowlaty@mu-sigma.com) is vice president/head of Innovation & Development with Mu Sigma Inc., a provider of decision sciences and analytics services. Subir Mansukhani is senior innovation analyst and Keshav Athreya is a senior business analyst with Mu Sigma.

REFERENCES

1. Rob Walker, 2009, "The Evolution and Future of Business Intelligence," Information Management, Sept. 24, 2009.
2. Barry Richmond, 1994, "Systems Dynamics/Systems Thinking: Let's Just Get On With It," International System Dynamics Conference, Stirling, Scotland.
3. Lawrence Evans, "An Introduction to Mathematical Optimal Control Theory," Department of Mathematics, University of California, Berkeley.


CORPORATE PROFILE

FedEx presents a playground of analytical problems


Big companies have complex systems. Wait. That sentence was not finished. Big companies have complex systems to design and operate. Hold on. There is more. Big companies have complex systems to design and operate, which makes them a playground for operations research practitioners. FedEx falls into the big company category. The recent 2010 FedEx Annual Report shows that the company had $34.7 billion in revenue. More than 280,000 team members provide service to over 220 countries. There are 664 aircraft and more than 80,000 vehicles moving eight million packages a day. All those employees with all those vehicles moving all those packages on a daily basis provide problems that need to be modeled and solved.

FedEx Express schedules tens of thousands of workers to match anticipated work.

By Chris Holliday

FedEx Express is the express airline subsidiary of FedEx Corporation and is the world's largest express company. The operations research group at FedEx Express has been solving operational challenges since the early stages of the company. The group operates as an internal consultant, working on specific issues for various departments. Customers within FedEx Express include Air Operations, U.S. Operations, Central Support Services, Air

Ground Freight Services and International Operations. FedEx founder Fred Smith introduced the "People, Service, Profit" philosophy at FedEx: If you put your people first, they will in turn provide quality service, and profit will be the end result. "People, Service, Profit" also works well when grouping operations research (O.R.) problems. Without getting into too much detail on solutions, the playground of problems includes the six listed in the following groups:
PEOPLE

1. Problem: FedEx Express must schedule tens of thousands of workers to match the anticipated work. Work fluctuates with the number of packages at any specific location. Specific workers have specific skills and must be matched to specific work.
Requirements: FedEx Express needs to match available workers to the shifts that need to be covered.
Approach: A multi-stage mixed integer program is used to solve this problem. All work tasks must be identified by time of day. Workers and skill sets are documented. Work must be grouped into shifts for full-time and part-time work. Specific employees are assigned.

2. Problem: FedEx Express has a group of analysts who design the delivery and pickup routes for the couriers. The need for a new route structure varies based on package growth and facility limitations. The work must be done with the help of local operators to make the implementation successful.
Requirements: FedEx Express needs to balance the workload for analysts planning the courier routes.
Approach: A generalized assignment approach is used and solved with an integer program. The current workload of the analysts must be reviewed, as well as their availability for future work. A list of facilities that require a restructure must be compiled. Other considerations included in forming the list of facilities are the total number of courier routes, geography and route complexity. With this information, an optimized assignment of the analysts must be provided to management (a simplified sketch of this assignment step appears below).
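To make the generalized assignment idea in problem 2 concrete, here is a minimal sketch using the open-source PuLP modeling package. The analyst names, facility workloads, capacities and costs are hypothetical placeholders, not FedEx data, and the production model certainly carries many more side constraints than this.

```python
# Minimal generalized-assignment sketch: assign facility restructures to analysts.
# All data below are hypothetical placeholders, not FedEx figures.
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, value

analysts = {"A1": 120, "A2": 90, "A3": 160}            # available hours per analyst
facilities = {"F1": 80, "F2": 60, "F3": 70, "F4": 50}  # estimated hours per restructure
# travel/complexity cost of giving facility f to analyst a (hypothetical)
cost = {("A1", "F1"): 3, ("A1", "F2"): 5, ("A1", "F3"): 4, ("A1", "F4"): 6,
        ("A2", "F1"): 4, ("A2", "F2"): 2, ("A2", "F3"): 6, ("A2", "F4"): 3,
        ("A3", "F1"): 5, ("A3", "F2"): 4, ("A3", "F3"): 2, ("A3", "F4"): 4}

prob = LpProblem("analyst_assignment", LpMinimize)
x = LpVariable.dicts("assign", cost.keys(), cat=LpBinary)  # x[a, f] = 1 if analyst a gets facility f

prob += lpSum(cost[k] * x[k] for k in cost)                # minimize total assignment cost
for f in facilities:                                       # every restructure gets exactly one analyst
    prob += lpSum(x[(a, f)] for a in analysts) == 1
for a, hours in analysts.items():                          # respect each analyst's available hours
    prob += lpSum(facilities[f] * x[(a, f)] for f in facilities) <= hours

prob.solve()
for (a, f), var in x.items():
    if value(var) == 1:
        print(f"{f} -> {a}")
```

Swapping the hour-capacity constraints for workload-balancing terms in the objective would move the sketch closer to the balancing goal described above.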
SERVICE:

3. Problem: FedEx Express volume fluctuates from day to day. The delivery routes are designed to meet a specific demand, but the couriers must expand or reduce route coverage based on volume changes.
Requirements: FedEx Express needs to balance the workload and optimize the routes for the delivery couriers.
Approach: A heuristic-based vehicle routing approach is used to solve this problem. All deliveries for a specific day are verified for a facility. The delivery routes are then optimized based on volume and drive time. The solutions are provided to the delivery couriers (a toy version of such a routing heuristic is sketched below).

4. Problem: FedEx Express has facilities throughout major metropolitan areas. The couriers work at these facilities sorting and processing packages. The routes driven by the couriers must begin and end at these facilities. Being close to the customer leads to better service.
Requirements: FedEx Express needs to determine the optimal location for facilities.
Approach: A classic location analysis model is used to solve this problem. The number of shipments to and from every customer must be determined. Those packages must be divided into courier routes. The distance to begin each route, as well as to return to the building at the end of each route, must be determined. The best location for a facility is built from this input and then passed on to management for review.
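The routing heuristics FedEx Express actually uses for problem 3 are proprietary and far richer than anything that fits here. As a stand-in, the sketch below applies a plain nearest-neighbor construction over hypothetical stop coordinates, just to show the shape of the computation: build a route, then measure it against the facility.

```python
# Toy nearest-neighbor routing sketch for problem 3 (not FedEx's production heuristic).
# Coordinates are hypothetical placeholders in arbitrary units.
import math

facility = (0.0, 0.0)
stops = {"s1": (2.0, 1.0), "s2": (4.0, 3.0), "s3": (1.0, 4.0), "s4": (5.0, 0.5)}

def nearest_neighbor_route(start, points):
    """Build one route: repeatedly drive to the closest unvisited stop."""
    route, here, remaining = [], start, dict(points)
    while remaining:
        nxt = min(remaining, key=lambda s: math.dist(here, remaining[s]))
        route.append(nxt)
        here = remaining.pop(nxt)
    return route

def route_length(start, points, route):
    """Total distance: facility -> stops in order -> back to facility."""
    path = [start] + [points[s] for s in route] + [start]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

route = nearest_neighbor_route(facility, stops)
print(route, round(route_length(facility, stops, route), 2))
```

The same route_length function, evaluated over candidate facility coordinates, hints at how the facility location question in problem 4 could be scored, although a real location model would weigh shipment volumes as well as distance.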
PROFIT:

5. Problem: FedEx Express must invest in aircraft and facilities. As the international market continues to grow, larger aircraft are desired. Fuel efficiency is a major factor. These larger, newer airplanes are costly. They must be parked at airport facilities, some of the most expensive property in the world. These facilities must have support equipment able to move and sort packages. The purchase of aircraft and airport facilities requires significant lead time.
Requirements: FedEx Express needs to determine the number and size of aircraft required by the system five to 10 years in the future. The size of the supporting facilities and equipment needed must also be estimated.
Approach: A multi-stage mixed integer program is used to solve this problem. The operations research group must put together a forecast of packages and weight for five to 10 years in the future. The information must include package flow from each airport to each airport. The number of aircraft available and connections to hub facilities must be determined. Input also includes the cost of operating aircraft as well as capital. With this information, an optimized network must be built.

6. Problem: FedEx Express has 664 aircraft that move throughout the world. Making sure that the right aircraft are in the right place at the right time is an ongoing task. Flight schedules are created months in advance and are refined as the actual implementation date moves closer. During implementation, the actual available aircraft are assigned to the flight schedule.
Requirements: FedEx Express needs to match aircraft with the flight schedule.
Approach: The operations research technique known as the tanker scheduling approach is used to solve this problem. The aircraft total must include those available for service, those in maintenance plus those being used as spares. Certain aircraft are not allowed to fly into certain airports; restrictions include noise and time of day. The assignment of specific aircraft to the flight schedule is optimized (a stripped-down version of this matching is sketched below).

The six problems described above are just a few of the many opportunities to apply O.R. and other advanced analytics techniques at FedEx Express. One of the ongoing challenges is to apply a technique to each problem that will provide a solution that is easily understood and successfully implemented.
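The tanker scheduling models behind problem 6 handle multi-leg rotations, maintenance and spares; the sketch below is only a one-leg toy version of the final matching step, again using PuLP, with entirely hypothetical tail numbers, legs and costs. Its point is simply to show how airport restrictions enter the model as constraints.

```python
# Toy aircraft-to-flight matching for problem 6 (hypothetical data, not FedEx's model).
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, value

aircraft = ["N101", "N102", "N103"]
flights = ["MEM-EWR", "MEM-LAX", "MEM-CDG"]
op_cost = {("N101", "MEM-EWR"): 40, ("N101", "MEM-LAX"): 55, ("N101", "MEM-CDG"): 90,
           ("N102", "MEM-EWR"): 42, ("N102", "MEM-LAX"): 50, ("N102", "MEM-CDG"): 85,
           ("N103", "MEM-EWR"): 38, ("N103", "MEM-LAX"): 60, ("N103", "MEM-CDG"): 80}
# Noise/time-of-day restrictions: aircraft/leg pairs that are simply not allowed.
forbidden = {("N103", "MEM-CDG")}

prob = LpProblem("aircraft_assignment", LpMinimize)
x = LpVariable.dicts("fly", op_cost.keys(), cat=LpBinary)  # x[a, f] = 1 if aircraft a flies leg f

prob += lpSum(op_cost[k] * x[k] for k in op_cost)          # minimize operating cost
for f in flights:                                          # every scheduled leg gets one aircraft
    prob += lpSum(x[(a, f)] for a in aircraft) == 1
for a in aircraft:                                         # an aircraft covers at most one leg here
    prob += lpSum(x[(a, f)] for f in flights) <= 1
for k in forbidden:                                        # restricted airport/aircraft combinations
    prob += x[k] == 0

prob.solve()
print([k for k in op_cost if value(x[k]) == 1])
```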
Chris Holliday (wcholliday@fedex.com), P.E., has been with FedEx 29 years and manages a group of operations research practitioners. Multiple FedEx team members contributed to this article. A version of this article appeared in OR/MS Today.


THE FIVE-MINUTE ANALYST

Police vs. Smartphone DUI Apps


I was struck by something I saw in the news this morning: lawmakers are concerned that so-called DUI checkpoint apps for smartphones would help drunk drivers avoid capture and abet them in breaking the law [1]. The story nagged at me all day; it was the sort of issue that I couldn't let go of. So I decided to ply my trade as an operations researcher and put a nickel's worth of analysis against the problem [2].

The first thing I did was field research. I downloaded two such apps: Checkpoint Wingman and Phantom Alert. These apps work basically as a message board; persons who have the app can report a DUI checkpoint that they come across, and then these reports become part of a database. Owners of the app may then pull the reported checkpoints from the database and (theoretically) know whether they are at risk of getting busted with a DUI.

Let's assume there's a strong correlation between a person's propensity to drive intoxicated and the odds that they would be willing to post to the database [3]. If this assumption stands, then the database relies on persons who drive intoxicated frequently but don't get caught at the checkpoint to make the updates. The updates could be so time-late as to be useless. Because this is the five-minute analyst, we'll assume that (substantial) problem away with a hand-wave.

Now, we can take cases on checkpoints. If the checkpoint is optimally situated, that is, in a chokepoint that must be crossed for the drunk to get from his starting location to his destination, there are two outcomes: either he elects to make the trip while intoxicated and is arrested, which is counted as a win for law enforcement; or he is deterred from making the trip and does something else (takes a cab, gets a driver, sleeps it off), which is also a win for law enforcement.

Easy enough. Now let's extend this to the case where there are two routes from the starting point to the destination. It would seem at first that the drunks would now have an advantage because they could gain knowledge about the risk of the paths. However:

1. As we discussed above, the information could be time-late.
2. The police get the same information. There's no reason that the police can't download the DUI apps and gain intelligence about where the drunks think the checkpoints are. Because both sides have the same information stream [4], this breaks down into a two-player game (payoffs are relative to the drunks; see Table 1).

Drunks / Police          Deploy opposite DUI app   Deploy where DUI app says
Believe DUI app                    -1                          0
Don't believe DUI app               0                         -1

Table 1: Police vs. drunks: a two-player game.

The solution to this game is a mixed strategy for both players, and any individual drunk playing against the police in this situation will have a 50 percent chance of being caught [5], the same as if there was no app at all! An identical argument will show that the odds of escaping the checkpoint are 1/n, where n is the number of possible (different) routes across the checkpoint plane.

The police could take this a step further and post false information about the checkpoints. From a practical standpoint, the drunks may see "checkpoints everywhere" and simply choose to do something else [6].

With a small amount of data and a short amount of time, we have shown that the DUI-avoidance apps are no better than useless to the user (i.e., drunk) and no worse than harmless to law enforcement.

Harrison Schramm (harrison.schramm@gmail.com) is a military instructor in the Operations Research Department at the Naval Postgraduate School in Monterey, Calif.
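To double-check the 50 percent figure, here is a minimal sketch that sweeps over the drunk's mixing probability for the Table 1 game and reports the best worst-case outcome. The grid search stands in for solving the game exactly and is not part of Schramm's original analysis.

```python
# Verify the mixed-strategy value of the Table 1 game (payoffs to the drunk).
# Rows: believe the app / don't believe it. Columns: police deploy opposite / where the app says.
payoff = {("believe", "opposite"): -1, ("believe", "where"): 0,
          ("dont_believe", "opposite"): 0, ("dont_believe", "where"): -1}

best_p, best_value = None, float("-inf")
for i in range(0, 1001):                       # p = probability the drunk believes the app
    p = i / 1000
    worst = min(                               # police pick the column that hurts the drunk most
        p * payoff[("believe", col)] + (1 - p) * payoff[("dont_believe", col)]
        for col in ("opposite", "where"))
    if worst > best_value:
        best_p, best_value = p, worst

print(f"optimal mix p = {best_p}, game value = {best_value}")   # expect p = 0.5, value = -0.5
print(f"chance of being caught = {-best_value:.0%}")            # expect 50%
```

Replacing the 2x2 payoff table with an n-route version reproduces the -1/n value mentioned in note 5.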
REFERENCES
1. http://techland.time.com/2011/03/23/senators-to-appstores-get-rid-of-pro-drunk-driving-apps/
2. I do not intend to comment on the policy implications; I'm not particularly convinced whether these apps should be legal or not. What I am interested in from the O.R. point of view is what effect these apps have on the common good.
3. Justification: If you have a disposition to drive intoxicated, you consider knowing where the checkpoints are to be a public good; conversely, if you do not drive intoxicated, you consider enforcement of checkpoints to be a public good.
4. Assuming, of course, that the police can re-deploy (which is a good assumption).
5. For the game-theorists in the audience, because the value of the game is -1/2, for any mixed strategy the drunks pick, the police can choose a corresponding mixed strategy and achieve the same result. This extends to the multi-road case as well, where the value of the game is -1/n.
6. Critics will note that I have valued deterring DUI equally with punishing drunk drivers. Those who weight punishment above deterrence will naturally come to a different conclusion.

By Harrison Schramm


THINKING ANALYTICALLY

The Traveling Spaceman Problem


Figure 1 is a three-dimensional map of the universe containing nine galaxies that you, as the traveling spaceman, wish to visit. Each galaxy's position in the universe is given in Table 1.
Figure 1: Nine-galaxy universe.

QUESTIONS:

1. Starting (and ending) at galaxy a, in what order should you visit each galaxy to minimize the traveled distance? You must visit each galaxy and you cannot visit any galaxy more than once.
2. What is the total distance traveled?
HINTS:

1. Larger galaxies indicate that they are closer to your viewpoint.
2. This problem can be solved using AMPL (a brute-force alternative is sketched below).

Send your answer to ThinkingAnalytics@gmail.com by July 15. The winner, chosen randomly from the correct answers, will receive an Analytics: Driving Better Business Decisions T-shirt.
Table 1: Coordinates for nine galaxies.
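Table 1's coordinates are not reproduced here, so the sketch below shows only the mechanics of a brute-force attack on the nine-galaxy tour: plug in the published coordinates and it enumerates every tour from galaxy a. The placeholder coordinates are made up for illustration and are not the puzzle data.

```python
# Brute-force traveling-spaceman sketch. Replace the placeholder coordinates
# with the (x, y, z) values from Table 1; the ones below are NOT the puzzle data.
import math
from itertools import permutations

galaxies = {"a": (0, 0, 0), "b": (1, 2, 3), "c": (4, 1, 0), "d": (2, 5, 2), "e": (3, 3, 3),
            "f": (5, 0, 1), "g": (1, 4, 4), "h": (0, 2, 5), "i": (4, 4, 1)}

def tour_length(order):
    """Length of the closed tour a -> ... -> a through the given ordering."""
    path = ("a",) + order + ("a",)
    return sum(math.dist(galaxies[u], galaxies[v]) for u, v in zip(path, path[1:]))

others = [g for g in galaxies if g != "a"]   # 8! = 40,320 orderings, trivial to enumerate
best = min(permutations(others), key=tour_length)
print("visit order:", " -> ".join(("a",) + best + ("a",)))
print("total distance:", round(tour_length(best), 3))
```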
John Toczek is a risk analyst for ARAMARK Corporation in the Decision Support group. He earned his Bachelor of Science degree in Chemical Engineering at Drexel University (1996) and his Master of Science in Operations Research from Virginia Commonwealth University (2005).

By John Toczek