Beruflich Dokumente
Kultur Dokumente
(a) Consider the data as two-dimensional data points. Given a new data point,
x=(1.4, 1.6) as a query, rank the database points based on similarity with the
query using (1) Euclidean distance, and (2) cosine similarity
(b) Normalize the data set to make the norm of each data point equal to 1. Use
Euclidean distance on the transformed data to rank the data points.
5. Briefly compare the following concepts. You may use an example to explain
your point(s).
(a) Snowflake schema, fact constellation, starnet query model
(b) Data cleaning, data transformation, refresh
(c) Enterprise warehouse, data mart, virtual warehouse
6. What are the differences between the three main types of data warehouse usage:
information processing, analytical processing, and data mining? Discuss the
motivation behind OLAP mining (OLAM).
7. What is Principal Component Analysis (PCA)? Please describe the relationship
between PCA and Singular Value Decomposition (SVD).
n
2
d (i, j ) = ∑ ( xik − x jk )
k =1
where xt is a transposition of vector x, ||x|| is the Euclidean norm of vector x, ||y|| is the
Euclidean norm of vector y.