UNIT-I

1) A data warehouse is usually constructed by integrating multiple heterogeneous sources.

2) An OLTP system is customer-oriented and is used for transaction and query processing by clerks,

clients, and information technology professionals.

3) An OLAP system is market-oriented and is used for data analysis by knowledge workers, including

managers, executives, and analysts.

4) Data warehouses often adopt a three-tier architecture.

5) Data warehouses and OLAP tools are based on a multidimensional data model.

6) A data cube allows data to be modeled and viewed in multiple dimensions.

7) Additive facts are facts that can be summed up through all of the dimensions in the fact table.

8) Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but

not the others.

9) Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact

table.

10) A factless fact table is a fact table that does not have any measures.

11) The slice operation performs a selection on one dimension of the given cube, resulting in a subcube.

12) The dice operation defines a subcube by performing a selection on two or more dimensions.

13) Pivot is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data (slice, dice, and pivot are illustrated in the sketch after this list).

14) Relational OLAP servers are the intermediate servers that stand in between a relational back-end server

and client front-end tools.

15) Multidimensional OLAP servers support multidimensional views of data through array-based

multidimensional storage engines.

16) The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater

scalability of ROLAP and the faster computation of MOLAP.

17) Each dimension has a table associated with it, called a dimension table.

18) Facts are numerical measures.

19) In data warehousing, the data cube is n-dimensional.


20) The cuboid that holds the lowest level of summarization is called the base cuboid.

21) An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same

support count as X in S.

22) An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S.

23) An itemset X is a maximal frequent itemset (or max-itemset) in set S if X is frequent, and there exists

no super-itemset Y such that X ⊂ Y and Y is frequent in S.

24) An OLAP system contains historical data.

25) Fact constellation schema contains multiple fact tables.

26) In a snowflake schema, the dimension tables are normalized.

27) OLTP stands for online transaction processing.

28) OLAP stands for online analytical processing.

29) The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid.
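
A minimal sketch of the cube operations above (items 6, 11-13, and the base/apex cuboids), written in Python with pandas; the fact table, column names, and figures are made up for illustration:

    import pandas as pd

    # Toy fact table: three dimensions (time, item, location) and one
    # additive measure (sales). The full detail is the base cuboid --
    # the lowest level of summarization.
    fact = pd.DataFrame({
        "time":     ["Q1", "Q1", "Q2", "Q2", "Q1", "Q2"],
        "item":     ["TV", "PC", "TV", "PC", "TV", "PC"],
        "location": ["NY", "NY", "NY", "NY", "LA", "LA"],
        "sales":    [100, 150, 120, 180, 90, 110],
    })

    # Slice: selection on ONE dimension, yielding a subcube.
    slice_q1 = fact[fact["time"] == "Q1"]

    # Dice: selection on TWO OR MORE dimensions.
    dice = fact[(fact["time"] == "Q1") & (fact["location"] == "NY")]

    # Pivot: rotate the data axes for an alternative presentation.
    pivoted = fact.pivot_table(index="item", columns="location",
                               values="sales", aggfunc="sum")

    # Apex (0-D) cuboid: the highest level of summarization.
    apex = fact["sales"].sum()
    print(pivoted, "\ntotal sales:", apex)
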
UNIT-II

1) Data cleaning can be applied to remove noise and correct inconsistencies in the data.

2) Data integration merges data from multiple sources into a coherent data store, such as a data warehouse.

3) Data reduction can reduce the data size by aggregating, eliminating redundant features, or clustering,

for instance.

4) Data discretization is a form of data reduction that is very useful for the automatic generation of concept

hierarchies from numerical data.

5) A distributive measure is a measure that can be computed for a given data set by partitioning the data into smaller subsets, computing the measure for each subset, and then merging the results (e.g., sum and count).

6) An algebraic measure is a measure that can be computed by applying an algebraic function to one or

more distributive measures.

7) A holistic measure is a measure that must be computed on the entire data set as a whole.

8) Noise is a random error or variance in a measured variable.

9) Binning methods smooth a sorted data value by consulting its “neighborhood,” that is, the values around it (see the binning sketch after this list).

10) The first step in data cleaning as a process is discrepancy detection.

11) Data scrubbing tools use simple domain knowledge (e.g., knowledge of postal addresses and spell-checking) to detect errors and make corrections in the data.

12) Data auditing tools find discrepancies by analyzing the data to discover rules and relationships, and

detecting data that violate such conditions.

13) In data transformation, the data are transformed or consolidated into forms appropriate for mining.

14) Smoothing is used to remove noise from the data. Such techniques include binning, regression, and

clustering.

15) Normalization is a process where the attribute data are scaled so as to fall within a small specified range (see the min-max sketch after this list).

16) Data cube aggregation is a process where aggregation operations are applied to the data in the

construction of a data cube.


17) Attribute subset selection is a process where irrelevant, weakly relevant, or redundant attributes or

dimensions may be detected and removed.

18) Dimensionality reduction is a process where encoding mechanisms are used to reduce the data set size.

19) Numerosity reduction is a process where the data are replaced or estimated by alternative, smaller data

representations.

20) If the discretization process uses class information, then it is supervised discretization.

21) If the discretization process does not use class information, then it is unsupervised discretization.

22) Binning is a top-down splitting technique based on a specified number of bins.

23) Data mining refers to extracting or “mining” knowledge from large amounts of data.

24) The similarity between two objects is a numerical measure of the degree to which the two objects are alike.

25) The dissimilarity between two objects is the numerical measure of the degree to which the two objects

are different.

26) The term proximity is used to refer to either similarity or dissimilarity.

27) In regression, data can be smoothed by fitting the data to a function.

28) Linear regression involves finding the “best” line to fit two attributes, so that one attribute can be used to predict the other (see the least-squares sketch after this list).

29) Clustering is a method where similar values are organized into groups, or “clusters.”

30) Outliers are values that do not belong to any cluster.
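
A minimal sketch of item 9 (smoothing by bin means over equal-depth bins); the helper name is made up, and the price list mirrors the classic textbook illustration:

    # Smoothing by bin means: sort the data, split it into equal-depth
    # bins, and replace each value by the mean of its bin (its
    # "neighborhood"). Assumes len(values) divides evenly into n_bins.
    def smooth_by_bin_means(values, n_bins):
        data = sorted(values)
        size = len(data) // n_bins
        smoothed = []
        for b in range(n_bins):
            bin_vals = data[b * size:(b + 1) * size]
            mean = sum(bin_vals) / len(bin_vals)
            smoothed.extend([mean] * len(bin_vals))
        return smoothed

    prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
    print(smooth_by_bin_means(prices, 3))   # [9.0, 9.0, 9.0, 22.0, ..., 29.0]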
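
A sketch of item 15's normalization in its common min-max form, scaling values into a specified range; the income figures are illustrative:

    # Min-max normalization:
    # v' = (v - min) / (max - min) * (new_max - new_min) + new_min
    def min_max(values, new_min=0.0, new_max=1.0):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
                for v in values]

    print(min_max([12000, 73600, 98000]))   # 73600 maps to about 0.716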
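
A least-squares sketch for item 28's linear regression; the data points are made up:

    # Fit y = w0 + w1*x by ordinary least squares, so that attribute x
    # can be used to predict attribute y (and smooth noisy y values).
    def fit_line(xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
              / sum((x - mean_x) ** 2 for x in xs))
        w0 = mean_y - w1 * mean_x
        return w0, w1

    w0, w1 = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
    print(f"y = {w0:.2f} + {w1:.2f}x")   # roughly y = 0.15 + 1.94x
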
UNIT-III

1) Frequent patterns are patterns (such as itemsets, subsequences, or substructures) that appear in a data

set frequently.

2) A set of items that appear frequently together in a transaction data set is a frequent itemset.

3) A set of items is referred to as an itemset.

4) If a set cannot pass a test, all of its supersets will fail the same test as well. This property is called antimonotonicity.

5) The join and prune steps are part of the Apriori algorithm (see the candidate-generation sketch after this list).

6) Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold (a worked computation follows this list).

7) Boolean purchase data can be analyzed for buying patterns that reflect items that are frequently associated or purchased together. These patterns can be represented in the form of association rules.

8) The occurrence frequency of an itemset is the number of transactions that contain the itemset.

9) An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same

support count as X in S.

10) An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S.

11) An itemset X is a maximal frequent itemset (or max-itemset) in set S if X is frequent, and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in S (a brute-force check of items 9-11 follows this list).

12) If the items or attributes in an association rule reference only one dimension, then it is a single-

dimensional association rule.

13) If the rules within a given set do not reference items or attributes at different levels of abstraction, then the set contains single-level association rules.

14) If a rule references two or more dimensions, such as the dimensions age, income, and buys, then it is

a multidimensional association rule.

15) If a rule involves associations between the presence or absence of items, it is a Boolean association

rule.

16) If a rule describes associations between quantitative items or attributes, then it is a quantitative

association rule.

17) Finding frequent itemsets without candidate generation is done by the FP-growth algorithm.
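
A worked computation of item 6's support and confidence on a made-up transaction database:

    # support(A => B) = fraction of transactions containing A ∪ B;
    # confidence(A => B) = support(A ∪ B) / support(A).
    transactions = [
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
        {"bread", "milk", "beer"},
    ]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(antecedent, consequent):
        return support(antecedent | consequent) / support(antecedent)

    A, B = {"bread"}, {"milk"}
    print(support(A | B))     # 0.6  -- 3 of the 5 transactions
    print(confidence(A, B))   # 0.75 -- 3 of the 4 bread transactions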
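
A sketch of the Apriori join and prune steps (items 4-5): size-k candidates are joined from frequent (k-1)-itemsets, then any candidate with an infrequent (k-1)-subset is pruned, by the antimonotone property. L2 below is a made-up set of frequent 2-itemsets:

    from itertools import combinations

    def apriori_gen(frequent_prev, k):
        candidates = set()
        for a in frequent_prev:          # join step: union pairs of
            for b in frequent_prev:      # frequent (k-1)-itemsets
                joined = a | b
                if len(joined) == k:
                    candidates.add(joined)
        return {c for c in candidates    # prune step: every (k-1)-subset
                if all(frozenset(s) in frequent_prev   # must be frequent
                       for s in combinations(c, k - 1))}

    L2 = {frozenset(p) for p in
          [("beer", "diapers"), ("bread", "milk"),
           ("bread", "diapers"), ("milk", "diapers")]}
    print(apriori_gen(L2, 3))
    # only {bread, milk, diapers} survives: it is the sole 3-candidate
    # all of whose 2-subsets are frequent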
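
A brute-force check of items 9-11 (closed and maximal frequent itemsets), reusing the transactions database and support helper from the first sketch above; support here is the relative support (a fraction):

    from itertools import combinations

    items = set().union(*transactions)
    all_itemsets = [frozenset(c) for r in range(1, len(items) + 1)
                    for c in combinations(items, r)]

    def is_closed(X):
        # no proper super-itemset Y with the same support as X
        return all(support(Y) != support(X) for Y in all_itemsets if X < Y)

    def is_maximal(X, min_sup):
        # X is frequent and no proper super-itemset of X is frequent
        return (support(X) >= min_sup and
                not any(support(Y) >= min_sup for Y in all_itemsets if X < Y))

    print(is_closed(frozenset({"bread", "milk"})))         # True
    print(is_maximal(frozenset({"bread", "milk"}), 0.6))   # True
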
UNIT-IV

1) Classification predicts categorical (discrete, unordered) labels, whereas prediction models continuous-valued functions.

2) The accuracy of a classifier on a given test set is the percentage of test set tuples that are correctly

classified by the classifier.

3) Speed refers to the computational costs involved in generating and using the given classifier or

predictor.

4) Robustness is the ability of the classifier or predictor to make correct predictions given noisy data or

data with missing values.

5) Scalability refers to the ability to construct the classifier or predictor efficiently given large amounts

of data.

6) Interpretability refers to the level of understanding and insight that is provided by the classifier or

predictor.

7) Decision tree induction is the learning of decision trees from class-labeled training tuples.

8) A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label (a hand-built example follows this list).

9) Bayesian classifiers are statistical classifiers.

10) Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence (see the sketch after this list).

11) P(H|X) is the posterior probability, of H conditioned on X.

12) P(H) is the prior probability, or a priori probability, of H.

13) P(X|H) is the posterior probability of X conditioned on H.

14) P(X) is the prior probability of X.

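A hand-built tree in the shape item 8 describes: internal nodes test an attribute, branches are test outcomes, and leaves hold class labels. The attributes and labels are made up:

    def classify(t):
        if t["age"] == "youth":              # internal node: test on age
            # one branch per outcome; leaves hold the class label
            return "buys" if t["student"] == "yes" else "does_not_buy"
        elif t["age"] == "middle_aged":
            return "buys"
        else:                                # age == "senior"
            return "buys" if t["credit"] == "fair" else "does_not_buy"

    print(classify({"age": "youth", "student": "yes", "credit": "fair"}))
    # -> buys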
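
A minimal naive Bayesian classifier for items 9-14. Bayes' theorem gives P(H|X) = P(X|H)P(H)/P(X); class conditional independence factors P(X|H) into per-attribute terms, and P(X) is dropped as a constant scale. The tiny training set is made up and no smoothing is applied:

    from collections import Counter, defaultdict

    def train(rows, labels):
        prior = Counter(labels)              # counts for P(H)
        cond = defaultdict(Counter)          # counts for P(x_i | H)
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                cond[y][(i, v)] += 1
        return prior, cond

    def score(x, h, prior, cond, n):
        p = prior[h] / n                     # P(H)
        for i, v in enumerate(x):
            p *= cond[h][(i, v)] / prior[h]  # product of P(x_i | H)
        return p

    rows = [("youth", "yes"), ("youth", "no"),
            ("senior", "yes"), ("senior", "no")]
    labels = ["buys", "no", "buys", "no"]
    prior, cond = train(rows, labels)
    x = ("youth", "yes")
    print(max(("buys", "no"),
              key=lambda h: score(x, h, prior, cond, len(rows))))  # buys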
UNIT-V

1) The process of grouping a set of physical or abstract objects into classes of similar objects is called

clustering.

2) A cluster is a collection of data objects that are similar to one another within the same cluster.

3) A cluster is a collection of data objects that are dissimilar to the objects in other clusters.

4) Clustering is also called data segmentation in some applications because clustering partitions large data

sets into groups according to their similarity.

5) Given a database of n objects or data tuples, a partitioning method constructs k partitions of the data, where each partition represents a cluster (a k-means sketch follows this list).

6) A hierarchical method creates a hierarchical decomposition of the given set of data objects.
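
A minimal k-means sketch; k-means is a standard example of the partitioning methods in item 5 (it constructs k clusters of the n objects). The 1-D points and parameter choices are made up:

    import random

    def k_means(points, k, iters=20, seed=0):
        random.seed(seed)
        centers = random.sample(points, k)       # arbitrary initial centers
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:                     # assign each object to its
                i = min(range(k), key=lambda j: abs(p - centers[j]))
                clusters[i].append(p)            # nearest cluster center
            centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        return clusters

    print(k_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], k=2))
    # two well-separated groups: one near 1.0, one near 9.5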
