
1. Discuss thoroughly the concept of data mining. What is data mining?

In your answer, address the following:

Its relationship to databases, statistics, machine learning, pattern recognition and other
technologies
Data mining is the practice of automatically searching large stores of data to
discover patterns and trends that go beyond simple analysis. Data mining uses
sophisticated mathematical algorithms to segment the data and evaluate the
probability of future events. Data mining is also known as Knowledge Discovery
in Data (KDD).

The key properties of data mining are:

• Automatic discovery of patterns
• Prediction of likely outcomes
• Creation of actionable information
• Focus on large data sets and databases

Data mining can answer questions that cannot be addressed through simple
query and reporting techniques.

Automatic Discovery
Data mining is accomplished by building models. A model uses an algorithm to
act on a set of data. The notion of automatic discovery refers to the execution of
data mining models.

- Data mining models can be used to mine the data on which they are built, but
most types of models are generalizable to new data. The process of applying a
model to new data is known as scoring.
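
The build-then-score idea can be illustrated with a small sketch. It assumes a nearest-centroid classifier, chosen only because it is short; the text does not prescribe an algorithm. The function names build_model and score, the attributes, and all data values are hypothetical.

```python
from statistics import mean

def build_model(rows, labels):
    """Build a nearest-centroid model: one mean vector per class label."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: tuple(mean(col) for col in zip(*members))
            for label, members in grouped.items()}

def score(model, row):
    """Score one new row: return the label of the closest centroid."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Hypothetical build data: (monthly_minutes, monthly_data_gb) per customer.
build_rows = [(100, 1.0), (120, 1.5), (900, 12.0), (1000, 15.0)]
build_labels = ["light", "light", "heavy", "heavy"]
model = build_model(build_rows, build_labels)

# Scoring: apply the model to rows it was not built on.
new_rows = [(110, 1.2), (950, 13.0)]
predictions = [score(model, row) for row in new_rows]
```

Here the model is built once from labeled rows and then reused on rows it has never seen, which is the step the text calls scoring.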

3. Define each of the following data mining functionalities:
characterization, discrimination, association and correlation analysis,
classification, regression, clustering, and outlier analysis.

ANSWER

Characterization: Summarizes the general characteristics of a target population
or sample.

• Example: summarizing the characteristics of users who like to play games on
mobile phones, and separately those of users who do not play games on mobile
phones.
Discrimination: Compares the general characteristics of the target population or
sample with those of a contrasting population or sample.

• Example: comparing users who keep their mobile phones on loud with users who
keep their phones on silent.
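
As a rough sketch of the difference between the two, characterization can be seen as summarizing one group, and discrimination as comparing the summaries of two groups. The records, attribute names, and the helper functions characterize and discriminate below are all hypothetical, and mean-based summaries are only one possible choice.

```python
from statistics import mean

# Hypothetical user records: (plays_mobile_games, age, daily_phone_hours).
users = [
    (True, 19, 5.0),
    (True, 23, 4.0),
    (True, 21, 6.0),
    (False, 45, 2.0),
    (False, 39, 1.0),
]

def characterize(records):
    """Summarize a group by its size and the mean of each numeric attribute."""
    return {
        "count": len(records),
        "mean_age": mean(r[1] for r in records),
        "mean_daily_hours": mean(r[2] for r in records),
    }

def discriminate(target, contrast):
    """Compare two summaries: target mean minus contrast mean per attribute."""
    return {key: target[key] - contrast[key]
            for key in ("mean_age", "mean_daily_hours")}

gamers = characterize([u for u in users if u[0]])
non_gamers = characterize([u for u in users if not u[0]])
difference = discriminate(gamers, non_gamers)
```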

Correlation/Association Analysis: Identifies attributes and/or values that
frequently occur together in a given data set.

• Example: a survey showing that people who own two smartphones also tend to own
a home theater system.
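
Association rules are commonly evaluated with two measures, support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for the smartphone example; the survey responses are made up for illustration.

```python
# Hypothetical survey responses: the set of items each respondent owns.
responses = [
    {"two_smartphones", "home_theater"},
    {"two_smartphones", "home_theater", "tablet"},
    {"two_smartphones"},
    {"home_theater"},
    {"tablet"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) from co-occurrence counts."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rule_support = support({"two_smartphones", "home_theater"}, responses)
rule_confidence = confidence({"two_smartphones"}, {"home_theater"}, responses)
```

In this toy data, 2 of 5 respondents own both items, and 2 of the 3 respondents with two smartphones also own a home theater system.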

Classification: Finds a set of models/functions that describe and distinguish
different classes of data or concepts, so that the model can be used to predict
the class of objects whose class label is unknown.

• Example: even when a data set is missing numerical values, classification
methods can still be used to gain insights into that data set.

Regression: A statistical process for estimating the relationships among
variables. There are many techniques for modeling and analyzing several
variables (dependent vs. independent variables).

• Example: regression can be used to estimate the miles per gallon of a new
hybrid car.
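
A minimal sketch of the simplest case, a least-squares line with one independent variable, is given here. The choice of vehicle weight as the predictor and all data values are assumptions made for illustration; fit_line is a hypothetical helper name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: curb weight (1000 lb) vs. observed miles per gallon.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [50.0, 45.0, 40.0, 35.0, 30.0]

intercept, slope = fit_line(weights, mpg)
predicted_mpg = intercept + slope * 3.2  # estimate for a new 3,200 lb car
```

Weight is the independent variable and miles per gallon the dependent variable; the fitted line is then used to estimate the value for a car that was not in the data.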

Clustering: A way to analyze data objects without relying on preconceived ideas
(such as known class labels) about them. The data is clustered/grouped so that
intraclass similarity is maximized and interclass similarity is minimized.

• Example: in marketing research, cluster analysis is widely used, as it helps to
segment consumers into groups, making the analysis process more focused and
reliable.
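
One common way to form such groups is k-means, sketched below for two-dimensional points. The algorithm choice, the starting centroids, and the consumer data are assumptions for illustration; the text does not name a specific clustering method.

```python
from statistics import mean

def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (mean(x for x, _ in members), mean(y for _, y in members))
            if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical consumers: (annual spend in $1000, store visits per month).
consumers = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]
centroids, clusters = k_means(consumers, centroids=[(0, 0), (10, 10)])
```

No labels are supplied: the two consumer segments emerge only from the distances between the points.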

Outlier analysis: The study of outliers, that is, observations that deviate
drastically from the other observations in the population/sample.

• An outlier may indicate bad data, caused either by a mistake made during data
entry or by a software problem. Outliers may also be due to random variation in
the sample, and in some cases they are very important for proving or disproving
theories.
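
As an illustration, the sketch below flags candidate outliers with the interquartile-range rule (values beyond 1.5 times the IQR from the quartiles). This rule is one common convention, not something the answer above specifies, and the data values are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts; 95 could be a data-entry error or a real spike.
orders = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = iqr_outliers(orders)
```

The rule only flags the value; deciding whether it is bad data, random variation, or a meaningful observation still requires looking at how it was produced.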

