Lean Six Sigma Meets Data Science - Integrating Two Approaches Based On Three Case Studies

Quality Engineering
ISSN: 0898-2112 (Print) 1532-4222 (Online) Journal homepage: http://www.tandfonline.com/loi/lqen20
Inez M. Zwetsloot, Alex Kuiper, Thomas S. Akkerhuis & Henk de Koning
To cite this article: Inez M. Zwetsloot, Alex Kuiper, Thomas S. Akkerhuis & Henk de Koning
(2018): Lean Six Sigma meets data science: Integrating two approaches based on three case
studies, Quality Engineering, DOI: 10.1080/08982112.2018.1434892
To link to this article: https://doi.org/10.1080/08982112.2018.1434892
Accepted author version posted online: 05 Feb 2018.
Published online: 21 Mar 2018.
Lean Six Sigma meets data science: Integrating two approaches based on three
case studies
Inez M. Zwetsloot (a), Alex Kuiper (b), Thomas S. Akkerhuis (c), and Henk de Koning (d)
(a) Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong; (b) Department of Operations Management, University of Amsterdam, Amsterdam, The Netherlands; (c) Shell Projects and Technology, Royal Dutch Shell, The Netherlands; (d) Blackbelt Team, ING Group, Amsterdam, The Netherlands
ABSTRACT
The amount of available data is rapidly increasing, which is an opportunity for the Lean Six Sigma (LSS) methodology. Starting from a well-established definition of LSS as theoretical foundation, we employ theory-generating case-study research. Three successful improvement projects from a large financial services firm in the Netherlands are analyzed. Clear differences from the definition of LSS are observed. The research leads to three recommendations for integrating data science in LSS, concerning the structure of an improvement organization, the skills of employees, and practical modifications to LSS’s celebrated DMAIC roadmap to solidify its applicability in the modern age of data.
KEYWORDS
case-study research; CRISP-DM; data science; DMAIC; Lean Six Sigma; process improvement; Six Sigma method
Introduction
The amount of data that is generated and stored increases every day, as does the capacity of computers to process these data. Competitive advantage can be obtained by using these data effectively (Manyika et al. 2011). Large datasets are used for various goals within organizations; in this article, we consider their applicability in operations improvement. A variety of methodologies for operations improvement have been proposed, such as total quality management, business process reengineering, lean management, theory of constraints, Six Sigma, and Lean Six Sigma (LSS). In this article, we focus on the latter, LSS, since it has proven to be a widely used and successful method (Hahn et al. 1999; Pande et al. 2000).
LSS consists of a well-established methodological framework for improving operational efficiency and effectiveness in organizations (George 2003). It is mostly known because of its stepwise approach to improvement, called DMAIC, which is an acronym for Define, Measure, Analyze, Improve, and Control. The approach is very much like the scientific approach to problem solving. As a complete methodology, it also lays out how a culture of effective and lasting continuous improvement can be realized. For example, it provides guidelines concerning project selection, project management, and deployment.
LSS projects are data driven: in a first stage, process data are collected and baseline performance is quantified. In a second stage, vital influence factors are selected and their effects are quantified, leading to evidence-based improvement actions. Traditionally, the data collected in a LSS project consist of some process characteristics that should be improved (Y) and some influence factors (X). For a successful project, typically 30–1000 observations are collected. However, over the last decades, larger data volumes of a wide variety of process metrics have become readily available as a result of rapid developments in IT, automation, sensors, and storage media. The traditional datasets are looking smaller and smaller compared to the typical dataset that is available nowadays. Traditional statistical methods typically used within LSS projects, like the t-test and linear regression, turn out to be less effective with larger data sets. Analysis of large data sets requires different tools, which have become known as “data science” tools (see Section ‘Data science: Definition and project structures’).
Most research in the field of LSS focuses either on specific tools or analytical techniques that are used
CONTACT Inez M. Zwetsloot i.m.zwetsloot@cityu.edu.hk Department of Systems Engineering and Engineering Management, City University of Hong
Kong College of Science and Engineering, AC- P,  Tat Chee Avenue, Kowloon, Hong Kong.
within the approach (e.g., control charts and design of experiments), or on success factors in developing an improvement organization (Lameijer et al. 2016). Some initial ideas have been shared in the commercial literature on the integration of Lean and Six Sigma on the one side and big data and data science on the other side (Auschitzky et al. 2014; Dhawan et al. 2014; Dutcher 2014). Still, little is known about how to integrate both fields, although Antony et al. (2017) recognize that there is a huge opportunity for LSS to use big data and data science. Therefore, in this article we focus on the capacity of the methodology to adapt to this development. We use the term data science LSS for LSS projects that revolve around large data sets (big data) and apply data science tools that go beyond LSS’s standard toolbox.
In this article, we investigate what a LSS project that uses data science techniques looks like. Some examples of such projects are described by Gaudard et al. (2009), Köksal et al. (2011), Jang and Jeon (2009), Stojanovic et al. (2015), and Oliff and Liu (2017). We study three cases, which are improvement projects at a financial service provider, and compare these to the traditional LSS approach. We aim to formulate an answer to the following research questions:
• What distinguishes a data science LSS project from a traditional LSS improvement project?
• What modifications are needed in order to make the LSS methodology compatible with projects based on big data sets and data science tools?
We answer the first question from a descriptive angle, in which we check whether or not the cases match the four elements that define LSS according to Schroeder et al. (2008). By analyzing each case (within-case analysis) and looking for differences among cases (cross-case analysis), we can formulate an answer to our first question. The second question has a prescriptive nature: we try to pinpoint what modifications are needed to blend LSS with the field of data science.
This article is structured as follows. In the next section, we provide some background on and definitions of LSS and data science. Afterwards, we provide summaries and highlights of the considered cases. Next, we analyze each case, perform a cross-case analysis, and answer our first research question. Subsequently, we blend data science and LSS by proposing three modifications, thereby answering our second research question. Finally, we provide guidelines regarding the use of data science LSS projects and we conclude and discuss our results.
Methodology and definitions of Lean Six Sigma and data science
Before discussing the synergy of LSS and data science, it is important to understand what they are. In the following two subsections, we provide some background. We assume that the reader has some knowledge of LSS and a basic understanding of data science. For an introduction to LSS the reader is referred to De Mast et al. (2016), and for an introduction to data science to James et al. (2013). Finally, we discuss our case-based research methodology.
Lean Six Sigma
LSS is a methodological framework for establishing continuous improvement within organizations (De Mast et al. 2016). Although it started out in industry, it has become increasingly popular in other sectors as well, like healthcare (De Koning et al. 2006), finance (De Koning et al. 2008; Delgado 2010), and others (Lameijer et al. 2016).
For the case study in this article, we use the definition of the methodology by Schroeder et al. (2008). The definition is based on four elements:
• There is a parallel-meso structure, see Figure 1.
• An improvement specialist leads the project.
• A structured methodology is used, prescribed by a roadmap called DMAIC.
• The goal is to improve performance metrics within the organization.
Figure 1. Parallel meso-structure in a LSS organization.
Figure 2. DMAIC phases and their relations for a LSS improvement project.
A parallel-meso structure helps to achieve operations improvement (Raje 2007), because the structure of the organization is used as a template for the organization of improvement projects. Figure 1, based on Schroeder et al. (2008) and Akkerhuis et al. (2015), visualizes such a parallel-meso structure.
At the core of the (parallel) LSS organizational structure are the Green and Black Belts (see Figure 1). These professionals get extensive training in a wide variety of topics and are therefore improvement specialists, who are responsible for the execution of the project. Every project has a Champion: the problem owner who is accountable for the project. Evidently not all problems have equal priority, so program management at the level of senior management determines the strategy and sets out which problems are to be tackled first. Finally, Orange and Yellow Belts support improvement projects and are often found on the work floor (De Mast et al. 2016).
LSS offers a structured method for process improvement: the DMAIC cycle, which is used by project leaders as a roadmap to success. Its phases are summarized in Figure 2 and Table 1. The roadmap prescribes a sequential project trajectory in which every phase should be discussed with the Champion before the project can continue to the next phase.
Table 1. DMAIC phases of a LSS project (De Mast et al. 2016).
Phase | Description
Define | Select project and project leader and establish objectives and conditions.
Measure | Make the problem quantifiable and measurable.
Analyze | Analyze and diagnose the current situation.
Improve | Develop and implement improvement actions.
Control | Control the improved process performance and close the project.
In these DMAIC phases, data are essential. Without data, the importance of the project cannot be established and demonstrated, and more importantly, the effectiveness of improvement actions cannot be proven. However, the amount of data is usually limited since project leaders often manually collect data to ensure reliability. As a consequence, 30 observations of the process metrics are usually considered to be sufficient to get a good indication of the performance. Consequently, the complexity of statistical tools is relatively low, and the toolbox is limited to basic tools like analysis of variance and simple linear regression, and graphical methods such as histograms and Pareto charts.
Data science: Definition and project structures
There have been various innovations in statistics and modeling, which have resulted in new terminology for working with data. One such term is data analytics, which tries to provide insights by extracting information from data. In that sense, (statistical) analysis is a part of analytics. Business analytics (Bartlett 2013) focuses on the insights that data can provide for the business to make better decisions. Data mining is focused on the application of specific algorithms to extract patterns from the data (Fayyad et al. 1996).
Big data, finally, refers to data mining techniques that are related to data that adhere to the three Vs: velocity, volume, and variety, and sometimes also a fourth V (veracity) (Megahed and Jones-Farmer 2015). It also refers to the physical practice of data storage (Sagiroglu and Sinanc 2013). An example of this type of data is click registration on websites, where the amount of information that is released cannot be processed by most storage media. We find that all these data-related terms fall under the heading of data science. The three cases considered in this research fall within the definition of “data science” as they utilize large data sets to provide insights.
This article focuses on applied data science. Azevedo and Santos (2008) provide a comparison of various industry standards for execution of data science projects. From their comparison, we conclude that CRISP-DM, which stands for Cross Industry Standard Process for Data Mining, is a widely used and most comprehensive methodology. Figure 3 and Table 2 show the phases and the relations of the methodology as described by Wirth and Hipp (2000). CRISP-DM is
a methodology consisting of various phases providing structure to data mining projects. Note that there are loops between some of the six phases, which distinguishes it from the five sequential phases prescribed by LSS, cf. Figure 2.
Figure 3. CRISP-DM phases and their relations for a data science project.
Table 2. CRISP-DM phases of a data science project.
Phase | Description
Business understanding | Understanding project objectives and requirements from a business perspective.
Data understanding | Initial data collection and familiarization with the data. Assess data quality.
Data preparation | Construct final dataset from raw data.
Modeling | Build, estimate, and assess data models.
Evaluation | Evaluate model to be certain it properly achieves the business objectives.
Deployment | Implementation of selected model and documentation.
Case-study research
We use qualitative case-study research to address our research questions, see Eisenhardt (1989). Case-study research is used either inductively or deductively to study research questions in the field of operations management. A comprehensive review is given by Barratt et al. (2011), who also provide a standard for inductive research. Our justification for doing case-study research lies in the fact that there is limited research on how LSS is, and should be, used in conjunction with data science.
Our case-study research revolves around three projects that have been well documented. We believe that data science played a crucial role in the success of these projects. We only consider three cases so that we can do an in-depth analysis of each case (Voss et al. 2002). We collected the documentation of these projects and performed three interviews. To ensure correctness we shared our within-case analyses with the employees involved. In a final stage we compare the events of these cases with the events that are intended by the traditional DMAIC approach.
Case-study descriptions
We discuss three projects at a large bank in the Netherlands. The bank is active in the international market, has about 50,000 employees and serves about 35 million customers (including small- and medium-sized enterprises).
The bank has a 10-year history with the LSS methodology, as it started in 2007 with some pilot projects. Nowadays, it has its own operational excellence (OpEx) team. The team consists of more than 30 employees with a thorough background in LSS, as most of them are educated at the level of a Black Belt. The project leaders have also been trained on the job, using a buddy system.
In 2014, the organization started to become aware of the phenomenon called big data, and started to hire data scientists. It did not take long until these data scientists were also called in to support projects within the OpEx team. Before they could actually support LSS projects, they were asked to follow a short course at Orange Belt level.
One of the authors was the OpEx team lead. Within the OpEx team, Black Belts lead the improvement projects, and the data scientists function as indispensable team members. Although Black Belts are full-time improvers, a data scientist supports LSS projects only on a part-time basis. Within the OpEx team, Black Belts manage the data scientists on a functional level, but outside of the team, this is not necessarily so.
In the next sections, we discuss the three cases. In the selected projects, data science was vital for success and DMAIC was used as a template to execute the projects. We selected these cases because they are well documented, they involved data scientists (and would have been unsuccessful without them), they use complex modeling, the legal department allowed publication of (some of) the background and results, and they are judged as successful projects.
Case I: Increasing the number of private banking customers
In August 2014, a project was initiated to increase the number of private banking (PB) customers; a PB customer is a customer with more than 1 million euro in assets. A benchmark revealed that approximately 10 percent of all millionaires in the Netherlands were PB customers of the bank in question, whereas the overall market share (in the mass segment) was substantially higher. Therefore, there was an opportunity to grow the (profitable) PB segment. It was decided that the focus of this project, in order to realize growth, would be the acquisition of new PB customers. The goal was to acquire 320 new PB customers on an annual basis. The selected quality metric (CTQ) was therefore the number of new PB customers.
The current acquisition process was a one-size-fits-all process, see the macro process description in Figure 4. The main problem was that, although bankers reached out to potential PB customers in their own network, they only saw limited opportunities and they had too few leads. Key to being successful was to identify a larger number of higher quality leads.
Figure 4. Macro process for Case I: description of the acquisition of PB clients.
During the Analyze phase, the team, consisting of Black Belts and data scientists, focused on identifying typical profiles of potential new clients. A model was built to predict whether a prospect has sufficient assets and/or is willing to switch. The model involved analyzing a vast amount of data on clients’ behavior found in internal and external databases. In these large datasets, variables and important indicators were obtained from statistical analysis, but also with the help of private bankers’ expert knowledge.
In the improvement step the model was used iteratively: first the prospects were put into segments, and next for each segment a different proposition was formulated and tested. As a result three different prospect groups were formulated and correspondingly three different propositions were developed. Finally, a new process was implemented based on the segmentation of prospects into two groups. Both groups receive a different treatment and this resulted in high conversion rates from approached prospects to customers. The final result was approximately 370 new PB customers (yearly).
Case II: Preventing capital outflow
In November 2014 a project, entitled CANDY, was initiated with the objective to prevent capital outflow of the 440,000 so-called personal banking (PerBa) clients. These clients have between 100,000 and 1,000,000 euro in assets in their bank account. In the current situation the amount of outflowing money was approximately equal to the amount of inflowing money. Reducing the outflow will therefore increase the bank’s assets, resulting in significant benefits in terms of interest and fee income. The selected CTQ was “capital outflow.” To reach the objective of reducing outflow a project team was formed consisting of two Black Belts, two data scientists, and a marketer.
An extensive amount of time was spent on preparing data, in which databases were connected by means of name matching algorithms and missing values were corrected with imputation techniques. After these efforts an initial analysis showed that a substantial part of the outflow was hard to influence (because it was related to consumption, paying off a mortgage, etc.), but a significant part of the PerBa clients (18,000) restructured accounts every year. This means that their total amount of assets does not change, but the division over banks does, leading to regular outflow of assets to competitors. This part of the outflow was determined as the project scope in the Analyze phase of the project.
As a next step, customers (exhibiting this kind of outflow) were clustered into ten groups capturing 75% of the outflow. Segmentation of the clients was necessary as the total group was too large and very diverse, rendering any homogeneous action to retain the capital fruitless. It appeared that customers could be segmented according to their target of outflow. For instance, some of them typically put their assets at a “green” bank, others at investment brokers, etc. The

destination of the outflow could easily be related to the reason of outflow. This segmentation was determined based on a full year of data which incorporated 96,000 records.
Next, small pilots of 500 e-mails and 50 phone calls for the five largest segments were used to verify the segmentation and test possible improvement actions. A/B-tests (comparing two alternatives: A vs. B) were used on the e-mail messages. A number of these outflow segments were targeted with a tailor-made proposition (resulting in small adjustments to both segmentation and proposition). After this test period the clustering algorithm was run again, now on 14 months of data, and improvement actions were implemented. Overall the project resulted in reaching out to 10,000 customers per year and ensuring that 23 percent less capital left the bank.
Case III: Reducing the number of impairments
Between 2006 and 2013, the amount of costs associated with impairments on the mortgage portfolio had increased roughly 600 percent. In January 2013, a project was initiated to drastically reduce the number of impairments. The selected characteristics were the percentage of customers in arrears, and the amount of the arrears. The project team decided to follow the LSS approach, but included data science elements to build a classification model indicating the extent to which the customer is expected to become financially self-supporting without help. An improvement team was formed, consisting of a LSS Black Belt (providing direction and acting as team lead), a data scientist (building the prediction model), a front office team (contacting the customers and doing the actual pilots), and risk (to sanity check and approve new processes).
In the current process, clients are contacted as soon as they are 60 days behind on their payments. Usually they are contacted only via mail, sometimes they receive a call. No consistent process was in place. Approaching the customer earlier than after 60 days would require too much manpower. This does not seem efficient because most payment problems resolve themselves without any action from the bank (natural recovery). For this reason, it seemed logical to build a model that enables labeling customers that need help at a very early stage. First a database needed to be built; this was done in three steps: (1) extract data from various sources, (2) build a structured database, and (3) build a file for analysis.
In the Analyze phase, the Black Belt noticed that for some clients, a simple warning might have been enough because they just forgot to execute the payment. However, some clients are in serious debt and require a tailored approach. The Improve phase was aimed at finding out which client needs which approach. This is a two-step improvement: finding the correct clients requires correct segmentation of the clients, and finding an effective approach for each of the segments.
Various techniques were used to classify customers (naïve Bayes, decision trees, logistic regression, k-nearest neighbors, and random forest). Metrics quantifying the detection probability, but also the probabilities of false alarm, like the receiver operating characteristic curve (ROC curve), were applied to assess the performance of these techniques. The best performance in handling 17 months of historic data, in terms of speed and accuracy, was attained by using random forests. A number of combinations of segmentation and approach were tried iteratively, until a satisfactory client categorization and approach was obtained. In the end a model was built in which 65 percent of all customers that require help are identified (a recall of 65 percent) and in which 25 percent of all customers labeled need help (a precision of 25 percent). Moreover, pilots showed that the number of impairments was reduced by 34 percent using the new process. Figure 5 shows the old and the new improved process timeline.
Figure 5. Timeline of the old and new process for Case III.
Three differences between data science LSS and traditional LSS
In order to answer our first research question “What distinguishes a data science LSS project from a traditional LSS improvement project?” we compare the three cases, described in the previous section, with each of the elements in the definition of LSS by Schroeder et al. (2008) as stated above. We present our findings per element.
Parallel-meso structure: Include data scientists
The original team, named the OpEx team, has had a place in the organization since 2007, and was functioning at a management level; senior management assigns tasks and projects to the OpEx team. Since
2014, the organization has employed data scientists as it became aware of the importance of data. From then on, there is a weak hierarchy in the OpEx team, where data scientists support Black Belts in projects. During the execution of a project they go to the work floor and ask for the help and involvement of line management and line personnel for analysis and implementation of improvements.
Therefore, we conclude that the team’s hierarchy parallels that of the entire organization, where a data scientist works for the Black Belt within the improvement team, and both are managed by senior management.
Improvement specialists: Data scientists managed by Black Belts
In each of the three cases, the projects are led by one or more full-time Black Belts. These are specialized employees who have received training in the DMAIC method, project management, and organizational politics. These Black Belts are actual improvement specialists.
The data scientists are not only involved in process improvement, but are used throughout the entire organization. They are thus not full-time improvement specialists. Nor is this necessary, as they are managed by Black Belts. From our interviews, it appears that it is considered the Black Belt’s job to translate business questions into research questions for the data scientists. This does not appear to be easy in all cases, as the backgrounds of the project leaders and data scientists are quite different.
The data scientists are necessary for these projects because they possess essential skills to deal with data. In particular, evaluating the three case studies on these skills, we observe that data wrangling and dealing with large volumes of data, applying various algorithms, and iteratively adjusting and improving data models are important skills that a data scientist has in contrast to a LSS professional.
Structured method: Iterative nature not facilitated
From the case descriptions it follows that data science LSS projects go back and forth between the Improve phase and the Analyze phase. For example, in Case I, three iterations were needed from segmentation to testing of improvements (propositions) before a meaningful division into groups was found. Similarly, in Case II, the project success depended on iteratively segmenting the clients based on their cash outflow. Case III also shows an iterative project trajectory, as the Improve phase was aimed at finding out which clients needed which approach. This was a two-step improvement: a number of combinations of segmentation and improvements were tried iteratively, until a satisfactory client categorization and approach was obtained.
The tremendous amount of data, compared to LSS projects that use a small sample, allows the team to zoom in on customer groups and propose tailored and effective actions for each group. This search is described, by the team lead, as an iterative project trajectory, which differs greatly from the structured sequential method followed in traditional LSS projects.
Performance metrics: No differences
No noteworthy differences were observed. The projects of the OpEx team start with finding the correct CTQs belonging to the project objectives that are aligned with the organization’s strategic focal points. Improvement actions are tested before final implementation, and project success is calculated by comparing baseline and final performance of these metrics.
Table 3. Observed differences between data science LSS and the traditional elements of Lean Six Sigma.
Elements of Lean Six Sigma | Observations based on three cases
Parallel-meso structure | Team of data scientists incorporated in parallel structure and data scientist is part of the improvement team
Improvement specialists | Data scientists are managed by improvement specialists
Structured method | DMAIC does not facilitate iterative nature of data science-driven project
Performance metrics | No differences
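The baseline-versus-final comparison described under "Performance metrics" can be illustrated with a minimal Python sketch. The function name `relative_change` is hypothetical and does not come from the article. The Case I figures (a target of 320 and a result of approximately 370 new PB customers per year) are taken from the case description, whereas the capital outflow baseline of 100 units is an assumed placeholder, chosen only so that the result matches the reported 23 percent reduction.

```python
def relative_change(baseline: float, final: float) -> float:
    """Return the change from baseline to final as a fraction of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (final - baseline) / baseline


# Case I: target versus realized number of new PB customers per year
# (both figures are reported in the article).
pb_target, pb_realized = 320, 370
pb_vs_target = relative_change(pb_target, pb_realized)

# Case II: the baseline of 100 units is an assumed placeholder,
# not a figure from the article; only the 23 percent reduction is reported.
outflow_baseline, outflow_final = 100.0, 77.0
outflow_change = relative_change(outflow_baseline, outflow_final)

print(f"Case I result versus target: {pb_vs_target:+.1%}")
print(f"Case II capital outflow change: {outflow_change:+.1%}")
```

A negative value indicates a decrease, which is the desired direction for CTQs such as capital outflow or the number of customers in arrears.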
For example, in the first case the objective set in the Define phase was to acquire 320 new PB customers, in line with the organization’s goal of expanding its business. In the Control phase it was shown that the project exceeded expectations, with approximately 370 new PB customers annually. Furthermore, in the other two cases we see that the focus lies on increasing market share and improving the customer portfolio, translated into the performance metrics capital outflow per year and the number of customers in arrears. Both projects revolved around getting these performance metrics down, which turned out to be a successful effort as these metrics were decreased by 23% and 34%, respectively.
Overview: Differences between LSS and data science LSS
Table 3 summarizes the findings and thereby answers our first research question. We observe three differences between data science LSS projects and the traditional LSS projects, organized according to the elements of LSS defined by Schroeder et al. (2008). First, a team of data scientists is added to the parallel-meso structure and these data scientists take part in each improvement team. Second, the data scientist is managed by the improvement specialist. And third, DMAIC is a strictly sequential methodology and does not facilitate the iterations embedded in the data mining process.
The observed differences in Table 3 will help us to formulate modifications to integrate data science in LSS, which we do in the next section, thereby answering our second research question. These modifications amount to suggestions for adapting the current LSS framework.
Overview: Data science tools used in data science LSS
Table 4 provides an overview of the data science tools used in the data science LSS projects we considered. Based on the experience of the OpEx team supervisor, these tools are classified by how important knowledge of them is for a Black Belt in this company. For this purpose, we distinguish cognitive levels based on the taxonomy of Bloom (Anderson et al. 2001); in increasing order of understanding we have: remember, apply, analyze, and evaluate.
Table 4. Overview of data science tools in LSS projects and their importance for a Black Belt in terms of understanding. Note that the Black Belt is assisted by data scientists.
Category | Tools | Importance of knowledge
Data wrangling as part of preparing data | Data queries | Remember
 | Data restructuring |
Descriptive techniques as part of data analysis | Descriptive statistics | Evaluate
 | Boxplot, histogram, control chart |
 | Distributions & probability plots |
Clustering as part of finding relevant factors | X-means (# groups unknown) | Apply
 | K-means (# groups known) |
 | Principal component analysis | Remember
 | Hierarchical clustering |
 | Topic modeling (text mining) |
Regression as part of determining impact | Univariate regression | Analyze
 | Multivariate regression | Apply
 | Tree-based methods |
Classification as part of determining impact | Neural nets | Apply
 | Decision trees |
 | K-nearest neighbors | Remember
 | Naïve Bayes |
Blending data science and Lean Six Sigma
In the previous section, we demonstrated that data science LSS projects differ from traditional LSS projects
in three aspects. In this section, we try to formulate an answer to our second research question: what modifications are needed in order to make the LSS methodology compatible with projects based on large data sets and data science tools?

Data specialists are a necessity in a Lean Six Sigma organization

The OpEx team modernized its approach to process improvement by including data scientists in key projects, such as the cases described above. According to the team lead, these data scientists have been important in realizing success. Their value to the business is more general, as demonstrated in scientific literature. For example, Dhawan et al. (2014) discuss how combining advanced analytics and lean management is worthwhile. They stipulate that combining these two will “typically require forming a small team of econometrics specialists, operations research experts, and statisticians familiar with the appropriate [data analytics] tools.”

Indeed, the amount and complexity of statistical knowledge required for a successful data science project surpasses what can be expected from LSS project leaders, who also need a solid basis of project management and management skills. The involvement of improvement specialists has been identified as one of nine critical success factors for contemporary LSS projects (Arumugam et al. 2014; Easton and Rosenzweig 2012). The data scientist can, in some sense, be seen as an improvement specialist as well, as he/she is responsible for a part of the improvement project: the data analysis and modeling. The project leader remains responsible for the team and the outcomes; see Figure 6. Therefore, the first modification we propose is to the parallel-meso structure.

Figure 6. Organizational structure for a data science improvement team.

Modification I: A pool of data scientists should be available within the parallel LSS organization and data scientists are to be included in data science LSS projects as team members.

Unifying two roadmaps: CRISP-DM and DMAIC

A second recommendation concerns the blending of two fields, that is, the field of operations improvement and the field of doing data science. Both fields have their own proven roadmaps to success, that is, DMAIC and CRISP-DM. Where DMAIC prescribes a strictly sequential structure, CRISP-DM allows an iterative structure between its phases (compare Figures 2 and 3).

DMAIC is a problem-driven approach in which a project is formed to tackle a business problem. Part of the approach is to make the problem quantifiable and to collect data. CRISP-DM is focused on data mining projects in which the data takes center stage from the start till the end of the project. We believe that the synthesis of both approaches will help both fields to be more valuable. A sensible starting point is to integrate the CRISP-DM roadmap into DMAIC, since DMAIC is problem driven. In addition, the cases demonstrate that using LSS for these data science projects was beneficial. However, we do need a (large) dataset to start off with. Integrating both approaches results in the following Define phase: select project and project leader and establish objectives and (data) conditions.

In the Measure phase of DMAIC, we make the problem quantifiable and measurable. Traditionally, this means that we relate our project objectives to quantifiable performance metrics (CTQs), followed by validating the measurement procedures and collecting the data. A big difference now is that we do not have to measure the data ourselves, but instead have data available. Therefore, the choice of CTQs depends critically on the available data, and a CTQ may have to be approximated by (a combination of) other metrics. This trouble spot is exactly at the intersection between business and data understanding that can be found in the CRISP-DM approach. As an example, consider Case II, in which an initial data analysis helped define the scope of the project. After the data and business understanding have converged, the data quality is assessed and the data are prepared for further analysis.
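As a minimal sketch of what a first data quality assessment in the Measure phase could look like, the snippet below profiles a data extract for missing values and duplicate records. The function name profile_quality, the field names, and the example records are hypothetical and are not taken from the cases; counting missing values and duplicates is only one of many possible quality checks.

```python
from collections import Counter


def profile_quality(rows, required):
    """Return simple quality indicators for a list of record dicts.

    Hypothetical helper for illustration; not part of the case studies.
    """
    n = len(rows)
    if n == 0:
        raise ValueError("no records to profile")
    # Share of records in which each required field is missing or empty.
    missing = {
        field: sum(1 for r in rows if r.get(field) in (None, "")) / n
        for field in required
    }
    # Surplus copies of records that are identical on the required fields.
    keys = Counter(tuple(r.get(f) for f in required) for r in rows)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    return {"n_records": n, "missing_share": missing, "n_duplicates": duplicates}


# Hypothetical extract; field names and values are illustrative only.
records = [
    {"customer_id": 1, "days_in_arrears": 12},
    {"customer_id": 2, "days_in_arrears": None},
    {"customer_id": 2, "days_in_arrears": None},
    {"customer_id": 3, "days_in_arrears": 0},
]
report = profile_quality(records, required=["customer_id", "days_in_arrears"])
print(report)
```

A profile of this kind can help the project leader and the data scientist judge whether a candidate CTQ can be computed reliably from the available data or has to be approximated by other metrics.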
In the Analyze phase of LSS, we diagnose the current situation in terms of the CTQ(s), which comprises the modeling phase in the CRISP-DM roadmap. It can occur that the model is too general or incorrectly specified, because we deviate from Six Sigma’s standards for doing analyses in that we do not control the data governing process. In such a situation, where the model has to be reconsidered, the project leader may have to go back to the Measure phase to get additional data from the databases available and/or prepare the data in another way. Finally, the project leader ends up with an adequate and applicable model in the Analyze phase.

Traditionally in LSS, the Improve phase consists of developing and implementing improvement actions. A first critical step in the combined approach, however, is to evaluate the model in terms of which factors have the biggest impact on the CTQ(s). Once the vital influence factors are found, the focus shifts to proposing interventions or improvement actions to be put in practice. This is not a trivial exercise, as one of the case interviews demonstrates: “[we] develop and test propositions for homogenous groups in at least three iterations. Determine the success of groups (identification and predictability) and propositions (impact).” The Improve phase of LSS ends with the project leader translating the intervention into an effect on the CTQ(s), guaranteeing a direct translation to business impact, which is depicted in the CRISP-DM roadmap by an arrow back to the Business understanding phase.

Finally, the deployment phase of CRISP-DM means that the model is implemented and the documentation is updated accordingly. This is part of the Control phase of LSS. In Figure 7 we show our comprehensive data science LSS model and in Figure 8 the iterative nature of the DMAIC phases. Furthermore, Figure 7 resonates with Table 4: data preparation can be categorized under Measure, data analysis and clustering under Analyze, and regression and classification under DMAIC’s Improve phase.

Modification II: Integrate CRISP-DM in the DMAIC roadmap. Specifically, relax the sequential DMAIC roadmap and allow for iterations between various phases as depicted in Figures 7 and 8.

Figure 7. Integration of CRISP-DM in the DMAIC roadmap.

Figure 8. Iterative DMAIC roadmap to facilitate data science LSS projects.

Creating an effective improvement environment

Data scientists reached employment at the bank in quite a different way than the Green or Black Belts did. They have followed a different type of education (especially regarding management and business versus statistics and mathematics). We have observed that the backgrounds of these individuals can be miles apart. Adequate training is important to bridge this gap and to ensure that these people work together effectively, which leads to another recommendation.

Modification III: Ensure that data scientists have, to some extent, knowledge about LSS, and vice versa: ensure that improvement specialists have, to some extent, knowledge about data science.

Specifically, depending on the type of training that the Belt has received, the project leader might need additional training to better understand what a data scientist does; that is, a Black Belt should have a thorough understanding of the various tools that are available and can be used to tackle problems, cf. Table 4. A
Figure . Selection matrix of solution methods.
high-level course on data cleaning, multivariate tech- available, and are to be utilized for the project, the
niques, and machine learning ensures that project lead- project should be executed according to the iterative
ers can better understand what a data scientist has DMAIC phases and a data scientist should be included
to offer and how such a specialist can contribute. in the project team. Finally we remark that if the com-
This curriculum can be added to the current body of plexity is low, the solution is unknown and there is a
knowledge for Black Belts (see, e.g., ASQ 2017). We lot of data available that a data science analysis would
hypothesize that such a “data science black belt” may likely lead to a solution to the problem.
more effective in leading a successful data science LSS
project. Discussion and conclusion
As team member the data scientist can benefit from
This article discusses three cases of projects were LSS
a Yellow/Orange Belt level course, in which the basic
is used in conjunction with data science. Case-study
principles of DMAIC are explained.
research enabled us to point out differences between
the traditional LSS approach (as defined in Schroeder
Project selection matrix: Differentiation between et al. 2008) and a combined version in which the LSS
traditional LSS and data science LSS approach has been modified to accommodate opera-
tions improvement based on large data sets and data
As we are differentiating between traditional LSS and
science. We provide recommendations to adapt the LSS
data science LSS projects, a guideline to select the
framework. We foresee such that this modified frame-
appropriate methodology is useful. In order to do this
work will dramatically increase the impact of LSS in
we need to define differentiators between these two
modern-day operations improvement and the current
types of projects. We turn to Hoerl and Snee (2013)
era of increasing importance and availability of data. Of
who distinguish between two axes in process innova-
course additional research is needed to study further
tion projects; namely whether a solution for the pro-
generalizability.
cess or problem is known or unknown and whether
the complexity of the problem at hand is high or low.
Three differences between traditional LSS and data
Figure 9 shows these axis, where we follow the repre-
science LSS projects
sentation as provided by Akkerhuis et al. (2015).
In order to differentiate between LSS and data By comparing the three case-study projects to the defi-
science LSS projects a third axis has to be added: the nition of LSS as provided by Schroeder et al. (2008) we
availability of data. In case of low data volumes at observed that LSS projects based on data science differ
hand the traditional DMAIC project structure should regarding three aspects (also see Table 3). First data sci-
be followed. However, if high volumes of data are entists are part of the parallel organizational structure
and within the project team they are managed by the improvement specialists (generally the Black Belt). The third difference concerns the DMAIC roadmap: DMAIC is a sequential framework, whereas data science LSS projects typically are iterative.

Practical implications and modifications to the LSS framework

We recommend that organizations that use LSS to improve operations, and strive to use data as a competitive advantage, (1) form a data science team and incorporate data scientists into project teams. Also, in order to facilitate a common language, (2) the data scientist should be trained in basic lean tools and the Black Belt should be educated to be a “Data Scientist Black Belt.” We furthermore recommend reconsidering the strictly sequential DMAIC phases that are traditionally used, and (3) propose to allow for iterations between the DMAIC phases as depicted in Figure 8.

Limitations and discussion

A limitation of this research is that we considered one financial organization and only three cases. We do believe that (part of) the obtained results are generic. Practitioners in LSS and data science should consider this article as a first attempt to support the evolution of the celebrated LSS methodology into the age of data.

About the authors

Inez Zwetsloot obtained her master’s degree (M.Phil.) in econometrics from the University of Amsterdam in 2013. Currently, she works as an assistant professor at the Systems Engineering and Engineering Management department of City University of Hong Kong. Previously, she worked for the Institute for Business and Industrial Statistics as a Lean Six Sigma consultant and obtained her Ph.D. at the University of Amsterdam. Her current research interests include process monitoring, quality engineering, and quality improvement.

Alex Kuiper obtained his master’s degrees in mathematics and econometrics at the University of Amsterdam in 2013. He obtained his Ph.D. at the University of Amsterdam in 2016, where he has continued to work as an assistant professor. His current research focuses on operations management in the application domain of healthcare. Furthermore, he works for the Institute for Business and Industrial Statistics as a senior consultant in Lean Six Sigma.

Thomas Akkerhuis has a Ph.D. in industrial statistics. Previously, he was a consultant in Lean Six Sigma and applied statistics at the Institute for Business and Industrial Statistics. Currently, he works as a consultant in Statistics and Data Science at Shell Global Solutions.

Henk de Koning has been employed as a Lean Six Sigma Master Black Belt since 2011 at ING. Currently he fulfills the role of team leader of the OpEx team at ING Domestic Bank. He obtained his Ph.D. at the University of Amsterdam in 2008, where his research was focused on the effectiveness and application of Lean Six Sigma.

ORCID

Inez M. Zwetsloot http://orcid.org/0000-0002-6144-4188
Alex Kuiper http://orcid.org/0000-0001-8408-4018

References

Akkerhuis, T. S., C. M. Spanjaard, S. E. Nuijten, O. G. Berendes, and R. J. M. M. Does. 2015. Quality quandaries: realizing strategic focal points at a business school. Quality Engineering 27 (2):267–273.
Anderson, L. W., D. R. Krathwohl, P. Airasian, K. Cruikshank, R. Mayer, P. Pintrich, and M. Wittrock. 2001. A taxonomy for learning, teaching and assessing: A revision of Bloom’s taxonomy. New York: Longman Publishing.
Antony, J., R. Snee, and R. Hoerl. 2017. Lean Six Sigma: yesterday, today and tomorrow. International Journal of Quality & Reliability Management 34 (7):1073–1093.
Arumugam, V., J. Antony, and K. Linderman. 2014. A multilevel framework of Six Sigma: a systematic review of the literature, possible extensions and future research. Quality Management Journal 21 (4):36–61.
ASQ. 2017. Certified Six Sigma black belt body of knowledge. Available at www.asq.org/cert/six-sigma-black-belt (accessed 18 July 2017).
Auschitzky, E., M. Hammer, and A. Rajagopaul. 2014. How big data can improve manufacturing. McKinsey & Company. Accessed August 15, 2017. http://www.mckinsey.com/business-functions/operations/our-insights/how-big-data-can-improve-manufacturing.
Azevedo, A., and M. F. Santos. 2008. KDD, SEMMA and CRISP-DM: a parallel overview. Proceedings of IADIS European Conference on Data Mining:182–185.
Barratt, M., T. Y. Choi, and M. Li. 2011. Qualitative case studies in operations management: trends, research outcomes, and future research implications. Journal of Operations Management 29 (4):329–342.
Bartlett, R. 2013. A practitioner’s guide to business analytics: Using data analysis tools to improve your organization’s decision making and strategy. McGraw Hill Professional, United States of America.
De Koning, H., J. de Mast, R. J. M. M. Does, T. Simons, and S. Vermaat. 2008. Generic Lean Six Sigma project definitions
in financial services. The Quality Management Journal 15 (4):32–45.
De Koning, H., J. P. Verver, J. Heuvel, S. Bisgaard, and R. J. M. M. Does. 2006. Lean Six Sigma in healthcare. Journal for Healthcare Quality 28 (2):4–11.
De Mast, J., R. J. M. M. Does, and H. De Koning. 2016. Lean Six Sigma for service and healthcare. Second Edition. Beaumont Quality Publications, The Netherlands.
Delgado, C., M. Ferreira, and M. Castelo Branco. 2010. The implementation of LSS in financial services organizations. Journal of Manufacturing Technology Management 21 (4):512–523.
Dhawan, R., K. Singh, and A. Tuteja. 2014. When big data goes lean. McKinsey Quarterly 24 (2):97–105.
Dutcher, R. 2014. Capgemini, Linking Big Data to Big Process Improvement. An Imperative. Capgemini post. Accessed August 15, 2017. www.capgemini.com/blog/bpo-thought-process/2014/03/linking-big-data-to-big-process-improvementan-imperative.
Easton, G. S., and E. D. Rosenzweig. 2012. The role and experience in six sigma project success: an empirical analysis of improvement projects. Journal of Operations Management 30 (7–8):481–493.
Eisenhardt, K. 1989. Building theories from case study research. The Academy of Management Review 14 (4):532–550.
Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth. 1996. From data mining to knowledge discovery in databases. AI magazine 17 (3):37–54.
Gaudard, M., P. Ramsey, and M. Stephens. 2009. Interactive data mining informs designed experiments. Quality and Reliability Engineering International 25 (3):299–315.
George, M. 2003. Lean Six Sigma for Service. New York, NY: McGraw-Hill.
Hahn, G. J., W. J. Hill, R. W. Hoerl, and S. A. Zinkgraf. 1999. The impact of Six Sigma improvement—a glimpse into the future of statistics. The American Statistician 53 (3):208–215.
Hoerl, R. W., and R. D. Snee. 2013. One size does not fit all: identifying the right improvement methodology. Quality Progress 46 (5):48–50.
James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013. An introduction to statistical learning. New York: Springer.
Jang, G. S., and J. H. Jeon. 2009. A Six Sigma methodology using data mining: a case study on Six Sigma project for heat efficiency improvement of a hot stove system in a Korean steel manufacturing company. In Cutting-Edge Research Topics on Multiple Criteria Decision Making (72-80). Berlin, Heidelberg: Springer.
Köksal, G., İ. Batmaz, and M. C. Testik. 2011. A review of data mining applications for quality improvement in manufacturing industry. Expert systems with Applications 38 (10):13448–13467.
Lameijer, B., R. J. M. M. Does, and J. De Mast. 2016. Inter-industry generic Lean Six Sigma project definitions. International Journal of Lean Six Sigma 7 (4):369–393.
Manyika, J., M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. Hung Byers. 2011. McKinsey & Company. Big data: The next frontier for innovation, competition, and productivity. Accessed August 15, 2017. http://www.mckinsey.com/businessfunctions/digital-mckinsey/our-insights/big-data-the-next910-frontier-for-innovation.
Megahed, F. M., and L. A. Jones-Farmer. 2015. Statistical perspective on ‘big data’. Frontiers in statistical quality control 11:29–47. Switzerland: Springer International Publishing.
Oliff, H., and Y. Liu. 2017. Towards Industry 4.0 utilizing data-mining techniques: a case study on quality improvement. Procedia CIRP 63:167–172.
Pande, P. S., R. P. Neuman, and R. R. Cavanagh. 2000. The six sigma way. United States of America: McGraw-Hill.
Raje, P. 2007. Six Sigma maturity model. Available at: https://www.isixsigma.com/implementation/basics/maturity-model-describes-stages-six-sigma-evolution/ (accessed on 15 August 2017).
Sagiroglu, S., and D. Sinanc. 2013. Big data: A review. In Collaboration Technologies and Systems (CTS), IEEE International Conference on CTS, 42–47.
Schroeder, R. G., K. Linderman, C. Liedtke, and A. S. Choo. 2008. Six Sigma: definition and underlying theory. Journal of Operations Management 26 (4):536–554.
Stojanovic, N., M. Dinic, and L. Stojanovic. 2015. Big data process analytics for continuous process improvement in manufacturing. Big Data, IEEE International Conference on Big Data, Santa Clara, CA.
Voss, C., N. Tsikriktsis, and M. Frohlich. 2002. Case research in operations management. International Journal of Operations & Production Management 22 (2):195–219.
Wirth, R., and J. Hipp. 2000. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining, AAAI Press, 29–39.
