CH 16

Big Data, Data Mining, and Machine Learning: Value Creation for Business
Leaders and Practitioners. Jared Dean.

© 2014 SAS Institute Inc. Published 2014 by John Wiley & Sons, Inc.
CHAPTER 16
Case Study of a High-Tech Product Manufacturer
Semiconductor wafer fabrication is very competitive. Companies
compete on cost, quality, and delivery time to market. In the age
of digital information, a large amount of data (e.g., process data,
equipment data, and lots of historical data) has been automatically or
semiautomatically collected, recorded, and accumulated for monitoring
the process, diagnosing faults, and managing the manufacturing
process. Decision makers can use data mining to identify specific
patterns in the raw data and apply the information buried there to
support their decisions.
However, in the high-tech industry of semiconductor manufacturing,
many interrelated factors affect the yield of fabricated wafers.
Engineers who rely on specific domain knowledge alone cannot find
possible root causes of defects rapidly and effectively.
(See Figure 16.1.)
Figure 16.1 Time Series Similarity
In addition, when conducting similarity analysis for the time series
data, the company also needs to identify and report the similarity
separately for each definable time window; users are allowed to adjust
and define these time windows when necessary. For example, in
Figure 16.1, time windows 1, 5, and 6 in time series A show a stronger
similarity pattern than the others. Companies can decompose these time
series similarities into different time windows for different machines.
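
The window-by-window comparison described above can be sketched as follows. This is a minimal illustration, not the company's method: the source does not name the similarity measure, so Pearson correlation is assumed here, and the function names, window boundaries, and readings are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def windowed_similarity(series_a, series_b, windows):
    """Return {window_name: similarity} for user-defined (start, end) index windows."""
    return {
        name: pearson(series_a[start:end], series_b[start:end])
        for name, (start, end) in windows.items()
    }


# Hypothetical sensor readings from two machines (illustrative values only).
machine_a = [1, 2, 3, 4, 5, 5, 4, 6, 3, 7, 2, 8]
machine_b = [2, 4, 6, 8, 10, 1, 3, 2, 4, 1, 4, 16]

# Users can redefine these windows as needed (assumed layout: three equal windows).
windows = {"window 1": (0, 4), "window 2": (4, 8), "window 3": (8, 12)}

similarity = windowed_similarity(machine_a, machine_b, windows)
for name, score in similarity.items():
    print(f"{name}: {score:.2f}")
```

Reporting one score per window, rather than one score for the whole series, is what lets a strong match in some windows stand out from a weak match elsewhere.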
HANDLING THE MISSING DATA
The collected semiconductor data often include missing and inconsistent
data. Depending on the sources of the data, some of the yields
in some processes are captured and measured only by sampling (for
particular LOT IDs). In addition, some of the yields are measured for
particular LOTs that are later combined with or split into other LOTs
during the manufacturing processes.
Data preprocessing is required to improve the quality of the data
and facilitate efficient data mining tasks. In particular, missing values
are replaced or deleted; in addition to statistical methods of missing
data imputation, business rules are also adopted for handling the
missing data. Some new variables that combine machine number with
date are also generated.
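
The preprocessing steps above can be sketched as follows. The source does not describe the actual business rules or imputation method, so this is an illustration under stated assumptions: the rule (drop lots that were never sampled), the use of mean imputation, the field names, and the records are all hypothetical.

```python
from statistics import mean

# Hypothetical lot-level records; None marks a missing yield measurement.
records = [
    {"lot_id": "L001", "machine": "M07", "date": "2014-03-01", "yield_pct": 92.0, "sampled": True},
    {"lot_id": "L002", "machine": "M07", "date": "2014-03-01", "yield_pct": None, "sampled": True},
    {"lot_id": "L003", "machine": "M12", "date": "2014-03-02", "yield_pct": 88.0, "sampled": True},
    {"lot_id": "L004", "machine": "M12", "date": "2014-03-02", "yield_pct": None, "sampled": False},
]


def preprocess(rows):
    """Apply a business rule, then mean imputation, then derive a machine-date key."""
    # Business rule (assumed): lots that were never sampled have no usable yield, so drop them.
    kept = [dict(r) for r in rows if r["sampled"]]

    # Statistical imputation (assumed): fill remaining gaps with the mean of observed yields.
    observed = [r["yield_pct"] for r in kept if r["yield_pct"] is not None]
    fill_value = mean(observed)
    for r in kept:
        if r["yield_pct"] is None:
            r["yield_pct"] = fill_value
        # New variable combining machine number with date.
        r["machine_date"] = f'{r["machine"]}_{r["date"]}'
    return kept


clean = preprocess(records)
for row in clean:
    print(row["lot_id"], row["machine_date"], row["yield_pct"])
```

The function copies each record before changing it, so the raw data stay available for auditing how each value was filled.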
The causal relationships between the machines of specific processes
and the yield rate are examined and investigated by engineers. Different
statistical methods are used to identify possible root causes (e.g., the
specific machine where the failure occurred). Semiconductor companies have
been using decision trees to identify the root causes along the equipment
path and among other vital variables. In more than 2,000 runs covering data
from more than 5 million devices, the target variable is the equipment
condition (good or not good), and the inputs are based on the equipment
path (e.g., equipment ID, step ID, ...).
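
To make the idea concrete, the sketch below finds the single split that a decision tree would choose first: the equipment-path test that best separates good from not-good devices. It assumes Gini impurity as the splitting criterion (the source does not say which criterion was used), and the step names, equipment IDs, and records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, target):
    """Find the (feature, value) test that most reduces Gini impurity.

    This is only the first split of a decision tree; a full tree repeats
    the search recursively on each side of the split.
    """
    parent = gini([r[target] for r in rows])
    best = (None, None, 0.0)
    features = [k for k in rows[0] if k != target]
    for feature in features:
        for value in sorted({r[feature] for r in rows}):
            left = [r[target] for r in rows if r[feature] == value]
            right = [r[target] for r in rows if r[feature] != value]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            gain = parent - weighted
            if gain > best[2]:
                best = (feature, value, gain)
    return best


# Hypothetical equipment paths: which equipment processed each device at each step.
devices = [
    {"step_10": "EQ_A", "step_20": "EQ_X", "condition": "good"},
    {"step_10": "EQ_A", "step_20": "EQ_Y", "condition": "good"},
    {"step_10": "EQ_B", "step_20": "EQ_X", "condition": "not good"},
    {"step_10": "EQ_B", "step_20": "EQ_Y", "condition": "not good"},
    {"step_10": "EQ_A", "step_20": "EQ_X", "condition": "good"},
    {"step_10": "EQ_C", "step_20": "EQ_Y", "condition": "good"},
    {"step_10": "EQ_B", "step_20": "EQ_X", "condition": "not good"},
    {"step_10": "EQ_C", "step_20": "EQ_X", "condition": "good"},
]

feature, value, gain = best_split(devices, target="condition")
print(f"Most suspicious: {feature} == {value} (impurity reduction {gain:.3f})")
```

In this toy data every not-good device passed through the same equipment at one step, so that test removes all impurity. Real data are rarely this clean, which is why a full tree and engineering review are still needed.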
In addition to the decision tree models for root cause analysis, the
company has been exploring and validating different techniques for
identification of root causes, including partial least squares regression
and gradient boosting. In the latest proof of concept, the company
requested consideration of the time dimension; the time series similarity
matrix was used to investigate the pattern of processes (and machines)
that leads to low yields. The best number of clusters can be chosen
automatically by using the cubic clustering criterion to estimate the
number of clusters for Ward's minimum variance method, k-means, or
other methods based on minimizing the within-cluster sum of squares.
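
The quantity these clustering methods minimize can be illustrated with a small k-means sketch. It does not compute the cubic clustering criterion itself; it only reports the within-cluster sum of squares for several candidate cluster counts. The deterministic farthest-point initialization, the function names, and the 2-D points are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm; returns (centroids, within-cluster sum of squares)."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(sq_dist(p, centroids[i]) for i, c in enumerate(clusters) for p in c)
    return centroids, wcss


# Hypothetical 2-D feature vectors forming three well-separated groups.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (8, 8), (8, 9), (9, 8), (9, 9),
    (1, 8), (1, 9), (2, 8), (2, 9),
]

wcss_by_k = {k: kmeans(points, k)[1] for k in range(1, 5)}
for k, wcss in wcss_by_k.items():
    print(f"k={k}: within-cluster sum of squares = {wcss:.2f}")
```

The within-cluster sum of squares always falls as clusters are added, so it cannot pick the number of clusters on its own. A criterion such as the cubic clustering criterion is used to judge when the drop stops being meaningful.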
Requirements focused on analyzing and processing events in motion
for the machine/equipment-processed data. Continuous queries were
made on data in motion (with incrementally updated results), and because
of the high volumes (more than 100,000 events/sec), stream processing
was used to identify events via the time series similarity or kernel
library events.
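
A continuous query with incrementally updated results can be sketched as follows. This is a simplified stand-in: it flags events with a threshold on a moving average rather than the time series similarity or kernel library matching mentioned above, and the class name, window size, threshold, and readings are hypothetical.

```python
from collections import deque


class SlidingMean:
    """Continuous query: mean of the most recent `size` events, updated in O(1) per event."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def update(self, value):
        """Add one event, evict the oldest if the window is full, return the current mean."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


def detect_events(stream, size, threshold):
    """Yield the index of each event at which the moving average exceeds the threshold."""
    query = SlidingMean(size)
    for index, value in enumerate(stream):
        if query.update(value) > threshold:
            yield index


# Hypothetical equipment readings arriving one at a time.
readings = [10, 10, 10, 30, 30, 30, 10, 10]
alerts = list(detect_events(readings, size=3, threshold=20))
print("Alert at event indexes:", alerts)
```

Each new event adjusts a running total instead of recomputing over the whole window, which is the property that keeps the per-event cost constant at high event rates.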
APPLICATION BEYOND MANUFACTURING
After successful results are realized in the manufacturing division of
the company, the methods and techniques will be rolled out to other parts
of the company to allow them to take advantage of the big data analytic
architecture. Some examples of planned projects are:
Usage pattern analysis with log data from manufactured electronic devices
Setup strategy for business-to-consumer devices with big data
Social media analysis
Brand reputation
Monitoring new trends in technology
Call center performance
Retail market quality
Claim risk monitoring
