UNIT-I

1) A data warehouse is usually constructed by integrating multiple heterogeneous sources.

2) An OLTP system is customer-oriented and is used for transaction and query processing by clerks,

clients, and information technology professionals.

3) An OLAP system is market-oriented and is used for data analysis by knowledge workers, including

managers, executives, and analysts.

4) Data warehouses often adopt a three-tier architecture.

5) Data warehouses and OLAP tools are based on a multidimensional data model.

6) A data cube allows data to be modeled and viewed in multiple dimensions.

7) Additive facts are facts that can be summed up through all of the dimensions in the fact table.

8) Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but

not the others.

9) Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact

table.

10) A factless fact table is a fact table that does not have any measures.

11) The slice operation performs a selection on one dimension of the given cube, resulting in a subcube.

12) The dice operation defines a subcube by performing a selection on two or more dimensions.

13) Pivot is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data (slice, dice, and pivot are illustrated in the sketch after this list).

14) Relational OLAP servers are the intermediate servers that stand in between a relational back-end server

and client front-end tools.

15) Multidimensional OLAP servers support multidimensional views of data through array-based

multidimensional storage engines.

16) The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater

scalability of ROLAP and the faster computation of MOLAP.

17) Each dimension has a table associated with it, called a dimension table.

18) Facts are numerical measures.

19) In data warehousing, the data cube is n-dimensional.


20) The cuboid that holds the lowest level of summarization is called the base cuboid.

21) An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same

support count as X in S.

22) An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S.

23) An itemset X is a maximal frequent itemset (or max-itemset) in set S if X is frequent, and there exists

no super-itemset Y such that X ⊂ Y and Y is frequent in S.

24) An OLAP system contains historical data.

25) Fact constellation schema contains multiple fact tables.

26) In a snowflake schema, the dimension tables are normalized.

27) OLTP stands for online transaction processing.

28) OLAP stands for online analytical processing.

29) The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid.
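
A minimal sketch of the cube operations above (items 6, 11-13, and the base/apex cuboids), written in Python with pandas; the fact table, column names, and figures are made up for illustration:

    import pandas as pd

    # Toy fact table: three dimensions (time, item, location) and one
    # additive measure (sales). The full detail is the base cuboid --
    # the lowest level of summarization.
    fact = pd.DataFrame({
        "time":     ["Q1", "Q1", "Q2", "Q2", "Q1", "Q2"],
        "item":     ["TV", "PC", "TV", "PC", "TV", "PC"],
        "location": ["NY", "NY", "NY", "NY", "LA", "LA"],
        "sales":    [100, 150, 120, 180, 90, 110],
    })

    # Slice: selection on ONE dimension, yielding a subcube.
    slice_q1 = fact[fact["time"] == "Q1"]

    # Dice: selection on TWO OR MORE dimensions.
    dice = fact[(fact["time"] == "Q1") & (fact["location"] == "NY")]

    # Pivot: rotate the data axes for an alternative presentation.
    pivoted = fact.pivot_table(index="item", columns="location",
                               values="sales", aggfunc="sum")

    # Apex (0-D) cuboid: the highest level of summarization.
    apex = fact["sales"].sum()
    print(pivoted, "\ntotal sales:", apex)
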
UNIT-II

1) Data cleaning can be applied to remove noise and correct inconsistencies in the data.

2) Data integration merges data from multiple sources into a coherent data store, such as a data warehouse.

3) Data reduction can reduce the data size by aggregating, eliminating redundant features, or clustering,

for instance.

4) Data discretization is a form of data reduction that is very useful for the automatic generation of concept

hierarchies from numerical data.

5) A distributive measure is a measure that can be computed for a given data set by partitioning the data into smaller subsets, computing the measure for each subset, and then merging the results (e.g., sum and count).

6) An algebraic measure is a measure that can be computed by applying an algebraic function to one or

more distributive measures.

7) A holistic measure is a measure that must be computed on the entire data set as a whole.

8) Noise is a random error or variance in a measured variable.

9) Binning methods smooth a sorted data value by consulting its “neighborhood,” that is, the values around it (see the binning sketch after this list).

10) The first step in data cleaning as a process is discrepancy detection.

11) Data scrubbing tools use simple domain knowledge (e.g., knowledge of postal addresses and spell-checking) to detect errors and make corrections in the data.

12) Data auditing tools find discrepancies by analyzing the data to discover rules and relationships, and

detecting data that violate such conditions.

13) In data transformation, the data are transformed or consolidated into forms appropriate for mining.

14) Smoothing is used to remove noise from the data. Such techniques include binning, regression, and

clustering.

15) Normalization is a process where the attribute data are scaled so as to fall within a small specified range (see the min-max sketch after this list).

16) Data cube aggregation is a process where aggregation operations are applied to the data in the

construction of a data cube.


17) Attribute subset selection is a process where irrelevant, weakly relevant, or redundant attributes or

dimensions may be detected and removed.

18) Dimensionality reduction is a process where encoding mechanisms are used to reduce the data set size.

19) Numerosity reduction is a process where the data are replaced or estimated by alternative, smaller data

representations.

20) If the discretization process uses class information, then it is supervised discretization.

21) If the discretization process does not use class information, then it is unsupervised discretization.

22) Binning is a top-down splitting technique based on a specified number of bins.

23) Data mining refers to extracting or “mining” knowledge from large amounts of data.

24) The similarity between two objects is a numerical measure of the degree to which the two objects are alike.

25) The dissimilarity between two objects is the numerical measure of the degree to which the two objects

are different.

26) The term proximity is used to refer to either similarity or dissimilarity.

27) In regression, data can be smoothed by fitting the data to a function.

28) Linear regression involves finding the “best” line to fit two attributes, so that one attribute can be used to predict the other (see the least-squares sketch after this list).

29) Clustering is a method where similar values are organized into groups, or “clusters.”

30) Outliers are values that do not belong to any cluster.
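
A minimal sketch of item 9 (smoothing by bin means over equal-depth bins); the helper name is made up, and the price list mirrors the classic textbook illustration:

    # Smoothing by bin means: sort the data, split it into equal-depth
    # bins, and replace each value by the mean of its bin (its
    # "neighborhood"). Assumes len(values) divides evenly into n_bins.
    def smooth_by_bin_means(values, n_bins):
        data = sorted(values)
        size = len(data) // n_bins
        smoothed = []
        for b in range(n_bins):
            bin_vals = data[b * size:(b + 1) * size]
            mean = sum(bin_vals) / len(bin_vals)
            smoothed.extend([mean] * len(bin_vals))
        return smoothed

    prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
    print(smooth_by_bin_means(prices, 3))   # [9.0, 9.0, 9.0, 22.0, ..., 29.0]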
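
A sketch of item 15's normalization in its common min-max form, scaling values into a specified range; the income figures are illustrative:

    # Min-max normalization:
    # v' = (v - min) / (max - min) * (new_max - new_min) + new_min
    def min_max(values, new_min=0.0, new_max=1.0):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
                for v in values]

    print(min_max([12000, 73600, 98000]))   # 73600 maps to about 0.716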
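
A least-squares sketch for item 28's linear regression; the data points are made up:

    # Fit y = w0 + w1*x by ordinary least squares, so that attribute x
    # can be used to predict attribute y (and smooth noisy y values).
    def fit_line(xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
              / sum((x - mean_x) ** 2 for x in xs))
        w0 = mean_y - w1 * mean_x
        return w0, w1

    w0, w1 = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
    print(f"y = {w0:.2f} + {w1:.2f}x")   # roughly y = 0.15 + 1.94x
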
UNIT-III

1) Frequent patterns are patterns (such as itemsets, subsequences, or substructures) that appear in a data

set frequently.

2) A set of items that appear frequently together in a transaction data set is a frequent itemset.

3) A set of items is referred to as an itemset.

4) If a set cannot pass a test, all of its supersets will fail the same test as well. This property is called antimonotonicity.

5) The join and prune steps are part of the Apriori algorithm (see the candidate-generation sketch after this list).

6) Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold (a worked computation follows this list).

7) Boolean purchase data can be analyzed for buying patterns that reflect items that are frequently associated or purchased together. These patterns can be represented in the form of association rules.

8) The occurrence frequency of an itemset is the number of transactions that contain the itemset.

9) An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same

support count as X in S.

10) An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S.

11) An itemset X is a maximal frequent itemset (or max-itemset) in set S if X is frequent, and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in S (a brute-force check of items 9-11 follows this list).

12) If the items or attributes in an association rule reference only one dimension, then it is a single-

dimensional association rule.

13) If the rules within a given set do not reference items or attributes at different levels of abstraction, then the set contains single-level association rules.

14) If a rule references two or more dimensions, such as the dimensions age, income, and buys, then it is

a multidimensional association rule.

15) If a rule involves associations between the presence or absence of items, it is a Boolean association

rule.

16) If a rule describes associations between quantitative items or attributes, then it is a quantitative

association rule.

17) Finding frequent itemsets without candidate generation is done by the FP-growth algorithm.
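
A worked computation of item 6's support and confidence on a made-up transaction database:

    # support(A => B) = fraction of transactions containing A ∪ B;
    # confidence(A => B) = support(A ∪ B) / support(A).
    transactions = [
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
        {"bread", "milk", "beer"},
    ]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(antecedent, consequent):
        return support(antecedent | consequent) / support(antecedent)

    A, B = {"bread"}, {"milk"}
    print(support(A | B))     # 0.6  -- 3 of the 5 transactions
    print(confidence(A, B))   # 0.75 -- 3 of the 4 bread transactions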
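
A sketch of the Apriori join and prune steps (items 4-5): size-k candidates are joined from frequent (k-1)-itemsets, then any candidate with an infrequent (k-1)-subset is pruned, by the antimonotone property. L2 below is a made-up set of frequent 2-itemsets:

    from itertools import combinations

    def apriori_gen(frequent_prev, k):
        candidates = set()
        for a in frequent_prev:          # join step: union pairs of
            for b in frequent_prev:      # frequent (k-1)-itemsets
                joined = a | b
                if len(joined) == k:
                    candidates.add(joined)
        return {c for c in candidates    # prune step: every (k-1)-subset
                if all(frozenset(s) in frequent_prev   # must be frequent
                       for s in combinations(c, k - 1))}

    L2 = {frozenset(p) for p in
          [("beer", "diapers"), ("bread", "milk"),
           ("bread", "diapers"), ("milk", "diapers")]}
    print(apriori_gen(L2, 3))
    # only {bread, milk, diapers} survives: it is the sole 3-candidate
    # all of whose 2-subsets are frequent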
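
A brute-force check of items 9-11 (closed and maximal frequent itemsets), reusing the transactions database and support helper from the first sketch above; support here is the relative support (a fraction):

    from itertools import combinations

    items = set().union(*transactions)
    all_itemsets = [frozenset(c) for r in range(1, len(items) + 1)
                    for c in combinations(items, r)]

    def is_closed(X):
        # no proper super-itemset Y with the same support as X
        return all(support(Y) != support(X) for Y in all_itemsets if X < Y)

    def is_maximal(X, min_sup):
        # X is frequent and no proper super-itemset of X is frequent
        return (support(X) >= min_sup and
                not any(support(Y) >= min_sup for Y in all_itemsets if X < Y))

    print(is_closed(frozenset({"bread", "milk"})))         # True
    print(is_maximal(frozenset({"bread", "milk"}), 0.6))   # True
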
UNIT-IV

1) Classification predicts categorical (discrete, unordered) labels, whereas prediction models continuous-valued functions.

2) The accuracy of a classifier on a given test set is the percentage of test set tuples that are correctly

classified by the classifier.

3) Speed refers to the computational costs involved in generating and using the given classifier or

predictor.

4) Robustness is the ability of the classifier or predictor to make correct predictions given noisy data or

data with missing values.

5) Scalability refers to the ability to construct the classifier or predictor efficiently given large amounts

of data.

6) Interpretability refers to the level of understanding and insight that is provided by the classifier or

predictor.

7) Decision tree induction is the learning of decision trees from class-labeled training tuples.

8) A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label (a hand-built example follows this list).

9) Bayesian classifiers are statistical classifiers.

10) Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence (see the sketch after this list).

11) P(H|X) is the posterior probability, of H conditioned on X.

12) P(H) is the prior probability, or a priori probability, of H.

13) P(X|H) is the posterior probability of X conditioned on H.

14) P(X) is the prior probability of X.

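A hand-built tree in the shape item 8 describes: internal nodes test an attribute, branches are test outcomes, and leaves hold class labels. The attributes and labels are made up:

    def classify(t):
        if t["age"] == "youth":              # internal node: test on age
            # one branch per outcome; leaves hold the class label
            return "buys" if t["student"] == "yes" else "does_not_buy"
        elif t["age"] == "middle_aged":
            return "buys"
        else:                                # age == "senior"
            return "buys" if t["credit"] == "fair" else "does_not_buy"

    print(classify({"age": "youth", "student": "yes", "credit": "fair"}))
    # -> buys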
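
A minimal naive Bayesian classifier for items 9-14. Bayes' theorem gives P(H|X) = P(X|H)P(H)/P(X); class conditional independence factors P(X|H) into per-attribute terms, and P(X) is dropped as a constant scale. The tiny training set is made up and no smoothing is applied:

    from collections import Counter, defaultdict

    def train(rows, labels):
        prior = Counter(labels)              # counts for P(H)
        cond = defaultdict(Counter)          # counts for P(x_i | H)
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                cond[y][(i, v)] += 1
        return prior, cond

    def score(x, h, prior, cond, n):
        p = prior[h] / n                     # P(H)
        for i, v in enumerate(x):
            p *= cond[h][(i, v)] / prior[h]  # product of P(x_i | H)
        return p

    rows = [("youth", "yes"), ("youth", "no"),
            ("senior", "yes"), ("senior", "no")]
    labels = ["buys", "no", "buys", "no"]
    prior, cond = train(rows, labels)
    x = ("youth", "yes")
    print(max(("buys", "no"),
              key=lambda h: score(x, h, prior, cond, len(rows))))  # buys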
UNIT-V

1) The process of grouping a set of physical or abstract objects into classes of similar objects is called

clustering.

2) A cluster is a collection of data objects that are similar to one another within the same cluster.

3) A cluster is a collection of data objects that are dissimilar to the objects in other clusters.

4) Clustering is also called data segmentation in some applications because clustering partitions large data

sets into groups according to their similarity.

5) Given a database of n objects or data tuples, a partitioning method constructs k partitions of the data, where each partition represents a cluster (a k-means sketch follows this list).

6) A hierarchical method creates a hierarchical decomposition of the given set of data objects.
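
A minimal k-means sketch; k-means is a standard example of the partitioning methods in item 5 (it constructs k clusters of the n objects). The 1-D points and parameter choices are made up:

    import random

    def k_means(points, k, iters=20, seed=0):
        random.seed(seed)
        centers = random.sample(points, k)       # arbitrary initial centers
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:                     # assign each object to its
                i = min(range(k), key=lambda j: abs(p - centers[j]))
                clusters[i].append(p)            # nearest cluster center
            centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        return clusters

    print(k_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], k=2))
    # two well-separated groups: one near 1.0, one near 9.5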
