
Reporting Service Document Join Behavior


Report Services Compound Join Behavior


When multiple dataset reports are used in a Report Services document, the datasets are
joined using a strategy termed compound join by MicroStrategy. The exact join
behavior depends on the relative dimensionalities of the datasets. Dimensionality in
this context refers to the attributes that are present on the template and in the Report
Objects window of each dataset.

Four possible combinations are noted in the MicroStrategy Document Creation Guide
product manual.

Case 1: Same attributes, same result elements
  Defining characteristics: Datasets are at the same level (dimensionality), and the
  unique set of attribute elements for each dataset is the same (for instance, if the
  same filter is used in each).
  Join behavior (applies to Cases 1 and 2): The join in these cases behaves exactly like
  a database full outer join. All attribute elements will be preserved, with null values
  for metrics where attribute elements in one dataset do not match another.

Case 2: Same attributes, different result elements
  Defining characteristics: Datasets are at the same level (dimensionality), but some
  attribute elements in one or more datasets cannot be found in other datasets. The
  filters may be different, or the reports may be using different fact tables with
  sparse data.
  Join behavior: Same as Case 1.

Case 3: Dataset with a superset of attributes in another dataset
  Defining characteristics: Datasets have different dimensionalities, and one
  dimensionality completely overlaps the other: e.g., Category and Region in one dataset
  versus Region in the other. {Category, Region} is a superset of {Region}. Useful for
  percent-to-total calculations.
  Join behavior: This case also behaves like a database full outer join. Multiple rows
  of the lower-level dimensionality match a single row in the higher; so one metric
  value from the single row will be replicated across the multiple rows that match it in
  the lower-level dataset.

Case 4: Different attributes
  Defining characteristics: Considering two datasets, each dataset has attributes not
  present in the other; e.g., Category and Region in one dataset versus Quarter and
  Region in the other. Neither dataset's dimensionality is a superset of the other's.
  Join behavior: Compound join is used. Elements of common attributes are matched
  between the datasets. There is not a clear, predefined relationship between elements
  of attributes that are not held in common.

Cases 1-3 are straightforward and should be familiar to users of relational databases.
Case 4 can return results that are not expected.
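
The four scenarios can be told apart by comparing attribute sets. The following is a minimal sketch of that classification, not MicroStrategy code; the function name `classify_join` and the representation of attribute elements as tuples are assumptions made for illustration.

```python
def classify_join(attrs_a, attrs_b, elements_a=(), elements_b=()):
    """Return the scenario number (1-4) for two datasets (hypothetical helper).

    attrs_a, attrs_b       : attribute names, i.e. each dataset's dimensionality
    elements_a, elements_b : attribute-element tuples present in each dataset;
                             only consulted when the dimensionalities are equal
    """
    a, b = set(attrs_a), set(attrs_b)
    if a == b:
        # Same level: Case 1 if the element sets agree, otherwise Case 2.
        return 1 if set(elements_a) == set(elements_b) else 2
    if a < b or b < a:
        # One dimensionality is a proper superset of the other.
        return 3
    # Each dataset has at least one attribute the other lacks.
    return 4


print(classify_join({"Category", "Region"}, {"Region"}))             # 3
print(classify_join({"Category", "Region"}, {"Quarter", "Region"}))  # 4
```

Only the last branch leads to the compound join behavior discussed below.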

In Case 4, it may be helpful to compare the compound join behavior to the MicroStrategy
SQL Generation Engine behavior when joining unrelated attributes. In data modeling
terms, the proper relationship between unrelated attributes, in the absence of facts
that can serve as an intersection between unrelated hierarchies, is a cross join. In
fact, the MicroStrategy SQL Generation Engine uses a cross join to handle a situation
analogous to Case 4: namely, a Downward outer join to outer join metric result sets with
different dimensionalities.

In a cross join, the relationship among the attribute elements can be considered
arbitrary: when every element of one attribute is matched to every element of the
other, no particular meaning can be inferred from the fact that element 1 of attribute A
appears alongside element 2 of attribute B.

A cross join will produce a final result with many more rows than either source table has
individually. This is an undesirable outcome for Report Services documents, because
repeated elements for a grouping attribute would result in entire sections of the
document being repeated. For this reason, an algorithm was chosen that would
eliminate redundancies in the joined attribute element set.
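
To make the row growth concrete, here is a small sketch of a cross join between two unrelated attributes. The element lists are illustrative values chosen for this example; `itertools.product` stands in for the cross join.

```python
from itertools import product

# Illustrative elements of two unrelated attributes.
categories = ["Books", "Electronics", "Movies", "Music"]
quarters = [f"{year} Q{q}" for year in (2003, 2004) for q in range(1, 5)]

# A cross join pairs every Category with every Quarter.
cross = list(product(categories, quarters))
books_rows = sum(1 for category, _ in cross if category == "Books")

print(len(cross))   # 32 rows from inputs of 4 and 8 rows
print(books_rows)   # 8: "Books" repeats once per Quarter
```

If Category were a grouping attribute in a document, each repetition would produce another copy of the corresponding section.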

A Case 4 compound join between two datasets (A and B) takes place according to the
following general methodology:

1. One row from dataset A is matched to one row from dataset B. If the two datasets have
   any common attributes, common elements will be matched.
2. Once the rows are paired up, they are no longer considered for future matches.
3. If one dataset has more rows than the other, the remaining rows will be added to the
   compound join result with null values for metrics and/or attributes coming from the
   smaller dataset. This holds true for:
   - Dataset slices on both sides that have common attribute elements.
   - The datasets considered as a whole.
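
The general methodology above can be sketched as follows. This is only an illustration of the three steps, not MicroStrategy's actual implementation; the function name `compound_join`, the row-as-dictionary representation, and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def compound_join(rows_a, rows_b, common):
    """Pair rows of two datasets slice by slice (hypothetical helper).

    rows_a, rows_b : lists of dicts, one dict per dataset row
    common         : names of the attributes held in common
    """
    def slices(rows):
        grouped = defaultdict(list)
        for row in rows:
            grouped[tuple(row[c] for c in common)].append(row)
        return grouped

    a_slices, b_slices = slices(rows_a), slices(rows_b)
    # Slice keys in first-seen order; with no common attributes there is a
    # single key (), so the datasets are paired as a whole.
    keys = list(dict.fromkeys(list(a_slices) + list(b_slices)))
    columns = list(dict.fromkeys(k for row in rows_a + rows_b for k in row))

    result = []
    for key in keys:
        # Steps 1-2: rows are paired one-to-one, each row used at most once.
        # Step 3: zip_longest pads the shorter side, so leftovers keep nulls.
        for ra, rb in zip_longest(a_slices.get(key, []), b_slices.get(key, [])):
            merged = dict.fromkeys(columns)  # every column starts as None
            merged.update(ra or {})
            merged.update(rb or {})
            result.append(merged)
    return result


a = [{"Region": "Northeast", "Category": "Books"},
     {"Region": "Mid-Atlantic", "Category": "Books"}]
b = [{"Region": "Northeast", "Quarter": "2003 Q1"},
     {"Region": "Northeast", "Quarter": "2003 Q2"}]
joined = compound_join(a, b, common=["Region"])
for row in joined:
    print(row)
```

In this toy input the Northeast slice yields two rows (the second with no Category), and the Mid-Atlantic slice yields one row with no Quarter.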


For example, consider two datasets with the following properties:

- The attributes Category and Quarter are not held in common.
- The common attribute, Region, has two elements in one dataset and only one in the
  other.

Dataset at the level of {Region, Category}:

Region        Category     Revenue  Profit
Northeast     Books        9093     2412
Northeast     Electronics  1550784  423535
Northeast     Movies       387667   95900
Northeast     Music        387320   42732
Mid-Atlantic  Books        13578    3630
Mid-Atlantic  Electronics  2281847  623124
Mid-Atlantic  Movies       557250   137923
Mid-Atlantic  Music        560665   61692

Dataset at the level of {Region, Quarter}:

Region     Quarter  Units Sold
Northeast  2003 Q1  6444
Northeast  2003 Q2  8590
Northeast  2003 Q3  5083
Northeast  2003 Q4  9251
Northeast  2004 Q1  4221
Northeast  2004 Q2  8070
Northeast  2004 Q3  5147
Northeast  2004 Q4  9942

When these datasets are used in a Report Services document, the results are as follows:

Some observations can be made about this result:

- For the common attribute, Region, the Northeast and Mid-Atlantic elements are not
  mixed in the result.
- There are more Quarters than Categories in the Northeast region (eight versus four);
  thus some of the Quarters do not have corresponding Category values.
- The Mid-Atlantic Region does not have any corresponding values in the {Region,
  Quarter} dataset; thus it has no Quarter or Units Sold values.
- This result has the smallest number of rows needed to capture the data from the two
  datasets. By contrast, if a cross join were used to combine the unrelated attributes
  Category and Quarter, the Northeast region would have 32 rows (4 × 8); each Revenue
  value would be repeated eight times and each Units Sold value would be repeated four
  times.
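
The row counts in these observations can be checked with a few lines of arithmetic. The per-Region counts come from the two example datasets; the assumption, following the methodology described earlier, is that one-to-one pairing leaves max(rows in A, rows in B) rows per Region slice.

```python
# Rows per Region in each example dataset.
rows_a = {"Northeast": 4, "Mid-Atlantic": 4}  # {Region, Category}
rows_b = {"Northeast": 8}                      # {Region, Quarter}

regions = list(dict.fromkeys(list(rows_a) + list(rows_b)))

# Assumed compound join size: the larger side of each slice sets the row count.
compound = {r: max(rows_a.get(r, 0), rows_b.get(r, 0)) for r in regions}
# A cross join inside the Northeast slice multiplies the row counts instead.
cross_northeast = rows_a["Northeast"] * rows_b["Northeast"]

print(compound)                 # {'Northeast': 8, 'Mid-Atlantic': 4}
print(sum(compound.values()))   # 12
print(cross_northeast)          # 32
```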

If a cross join were used, the user could infer nothing from the fact that Books
sits alongside 2003 Q1. The same is true of the compound join.
Note:
In Case 4, there may be relationships in the schema between attributes that are not in
common between the two datasets, but those attribute relationships are not considered
when resolving the compound join. The data relationships must be present in the
datasets as given to the Report Services document. For example, in the case of datasets
at the level of {Region, Category} and {Region, Subcategory}, where Category is a
parent of Subcategory, the only way to preserve the category-subcategory data
relationship is to run an additional query against the warehouse. By design, this is not
part of the Report Services execution flow. (If the datasets come from different data
sources, there is no relationship table to poll.)

If, however, the lower-level dataset includes the Category-Subcategory relationship,
then the compound join algorithm has the information it needs to produce the expected
result. The datasets would have to be at the levels of {Region, Category} and {Region,
Category, Subcategory}. This means that the join scenario is no longer Case 4; it
becomes Case 3 (superset), which can be processed in the same way that a database
would.
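
As a rough illustration of the Case 3 (superset) behavior, the sketch below replicates a higher-level metric across the matching lower-level rows and uses it for a percent-to-total calculation. The Revenue figures are the Northeast rows of the example dataset; the Region-level dataset `totals` and the column name "Region Revenue" are hypothetical, and only the replication step of the join is shown.

```python
# Lower-level dataset at {Region, Category}, Northeast rows from the example.
detail = [
    {"Region": "Northeast", "Category": "Books", "Revenue": 9093},
    {"Region": "Northeast", "Category": "Electronics", "Revenue": 1550784},
    {"Region": "Northeast", "Category": "Movies", "Revenue": 387667},
    {"Region": "Northeast", "Category": "Music", "Revenue": 387320},
]
# Hypothetical higher-level dataset at {Region}, computed here from the rows above.
totals = {"Northeast": sum(row["Revenue"] for row in detail)}

# The single Region-level value is replicated onto every matching detail row.
joined = [{**row, "Region Revenue": totals.get(row["Region"])} for row in detail]

for row in joined:
    share = row["Revenue"] / row["Region Revenue"]
    print(f'{row["Category"]:<12}{share:.1%}')
```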

Best Practices
The Report Services compound join functions best when the following conditions are
met:

- The primary dataset is at the lowest level dimensionality (that is, the finest data
  granularity).
- The primary dataset's dimensionality is the same as, or is a superset of, every other
  dataset's dimensionality.
- The other datasets in the document do not introduce attributes that are not present in
  the primary dataset.

Under these conditions, every dataset join will fall into Cases 1-3.
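
These conditions amount to a subset check on attribute sets. A minimal sketch follows; the helper name `non_conforming` and the dataset names are hypothetical and not part of any MicroStrategy API.

```python
def non_conforming(primary, others):
    """Map each offending dataset to the attributes missing from the primary."""
    primary = set(primary)
    return {
        name: sorted(set(attrs) - primary)
        for name, attrs in others.items()
        if not set(attrs) <= primary
    }


issues = non_conforming(
    ["Region", "Category", "Subcategory"],
    {"by_category": ["Region", "Category"], "by_quarter": ["Region", "Quarter"]},
)
print(issues)  # {'by_quarter': ['Quarter']}
```

An empty result means every join against the primary dataset falls into Cases 1-3; any entry points to a dataset that would trigger Case 4.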

It may not be necessary to meet these conditions in every document. If dashboards keep
their datasets in distinct grid and graph objects, without presenting data in the
document's Detail section, the compound join is likely to remain transparent to the end
user, even in Case 4. However, if data from multiple datasets are combined in the Detail
section, users are advised to adhere to these recommendations for maximum data
consistency.
