Sie sind auf Seite 1von 3

Why do join types matter in a SAP

HANA information view


By Jonathan Haun

5 Votes

What are we joining?

When you design an SAP HANA information view, we must also define joins between tables or
attribute views. These joins can be configured in the foundation of attribute views and analytic
views. They can also logically occur in analytic and calculation views.

Why do joins matter?

In some cases, joining tables can be an expensive process in the SAP HANA engines. This is
especially true when the cardinality between tables is very high. For example, an order header table
with 20 million rows is joined to an order line detail table with 100 million rows. Excessive joining in
the model is also something that can lead to performance issues. Excessive joining is something that
occurs often when we are attempting to denormalized OLTP data using SAP HANA’s information
views.

What is the solution to work around join cost?

Join pruning is a process where SAP HANA eliminates both tables and joins from its execution plan
based how the joins are defined in the information view and how data is queried from the
information view.

For example, let’s assume that we have a customer table or attribute view configured with a left
outer join to a sales transaction table in ouranalytic view. If we query the information view and
only select a SUM(SALES_AMOUNT) with no GROUP BY or WHERE clause, then the execution
plan might skip the join process between the customer table and the sales transaction table.
However, if we execute SELECT CUSTOMER_NAME, SUM(SALES_AMOUNT) then the join
between the customer table and sales transaction will be required regardless of the join type used.
The process is subjective and related to how the joins are configured in our information view. It is
also subjective to how the information view is queried. If every query we execute selects all attribute
and measures, then join pruning would never be invoked. Fortunately, in the real world, this is not
typically the case. Most queries only select a subset of data from the information view. For this
reasons, we should try to design our information views to take advantage of join pruning.

What join options do I have?

There are five main types of joins to look at in this context. Inner Joins, Referential Joins, Left Outer,
Right Outer and Text Joins. Below is an outline on how SAP HANA handles each join type.

INNER JOIN

Inner Joins are always evaluated in the model. No join pruning occurs because an inner join
effectively tells the SAP HANA engine to always join no matter what query is executed. Records are
only returned when a matching value is found in both tables. Overall they are one of the most
expensive join types that can be defined in an information view. They also have the potential to
remove records from the results assuming that referential integrity is poorly maintained in the
source tables.

REFERENTIAL JOIN

For all intents and purposes, the referential join acts like an inner join. However, under specific
circumstances, the referential join will be pruned from the execution of the information view.
Assuming that there are no filters defined in an attribute view’s foundation, the SAP HANA
engines will likely prune the join from the execution plan of the analytic view. This assumes that
no columns are queried within the same attribute view. However, if a column from the attribute
view is included in a query, this join will act as an inner join.

Let me reiterate the exception to the pruning rule one more time. If filters are defined in
the attribute view, the join will be included and enforced as an inner join even if no attribute
view columns are selected. This means that records are only returned when a match is found in both
the attribute viewand analytic view foundation table. It also means that referential joins are not
always pruned from the information view execution plan. Pay close attention to this exception
because referential joins can be just as expensive as inner joins.

When a referential join is pruned we have to make sure that the referential integrity between tables is
sound. If it is not, we might get varying results based on the columns selected in our queries. For
example, a transaction might not have a shipped date defined. When we query for transactions with a
shipped date, this transaction would be excluded. When we exclude any shipped date attributes from
our query, the transaction would magically appear again because of join pruning. Based on
experience, this can confuse the heck out of an end user… or a developer
LEFT OUTER

Based on several query scenarios that I tested, the left outer join is always pruned from the execution
plan assuming that no columns are selected from anattribute view defined within the model. Left
outer joins are often defined in instances where the analytic view foundation table is known to
have no matching record within an attribute view. By design, they prevent the model from
excluding transactions. This is because it will ensure that all possible transactions are returned, even
if we are missing master data. Overall I would say that the left outer join has the potential to be the
best performing join within our information view definition.

However, we have to make sure that it makes sense before we arbitrarily set up our models with left
outer joins. We sometime need the inner join to help exclude records. For example, let’s assume that
we have a date attribute viewjoined to a data column in our analytic view foundation table.
Using an inner join we can exclude any transaction that has no defined date. There are other
instances where we might find phantom transactions in our aggregates with a left outer join. For
example, a transaction that has no customer, date or product. In those cases, something was likely
only partially deleted from the system. Therefore, referential integrity issues can
create problems.If you recall a similar issue can happen with referential joins. However, with
left outer joins we will always return the transaction.

RIGHT OUTER

The right outer join is used in instances where all attribute view values need to be returned even if
they have no matching transactions. For example, I need a complete list of all customers even if they
have never been issued an invoice. This would be helpful if the query were targeting both existing
and potential customers.

Keep in mind that a mismatched transaction will be excluded from anyanalytic view’s aggregates.
Based on my testing, a right outer join is always performed in the execution of the model. I was not
able to devise a single scenario where the join was pruned from the execution of the model. With this
in mind, it will have a high cost when used in the information view definition.

TEXT JOIN

This is effectively an inner join setup specifically to work with the SPRAS field found in SAP
BusinessSuite tables. Based on my brief testing, it is always executed and never pruned from the
model.

Das könnte Ihnen auch gefallen