
The use of the Power Query / Get & Transform tools in Excel

1. Introduction
1.1 Not just data analysis
The tools that were formerly part of the Power Query Add-in and now, in Excel 2016, form the Get &
Transform group of the Data Ribbon tab, have the potential to change the way many spreadsheet
models are constructed and, in so doing, to remove some existing sources of risk and error while
introducing new ones.

Although the tools have an obvious role in data acquisition for data analysis and business
intelligence, they are also capable of being used to replace an extensive set of current grid and
formula-based spreadsheet techniques.

Whether or not we choose to use these new tools and approaches, with Get & Transform now being
an integral part of Excel, it is inevitable that many users will adopt them. Consequently, an
understanding of the way in which the tools work is likely to be important in ensuring that research
continues to reflect how spreadsheets, and Excel spreadsheets in particular, are used in practice.

The Power Query tools might also provide an additional method of checking some types of
spreadsheet by allowing calculations to be performed in a different way, with a comparison
highlighting discrepancies.

The situation is complicated by the current state of development of the tools in question. Unlike
many of the Excel functions and formula constructions that we have grown used to working with
over a long period, Power Query is just a few years old and continues to evolve quite rapidly.
Changes are frequent and many of the possible techniques are yet to be subjected to extensive
exploration and testing in practice. Consequently, at this stage, we are likely to be raising more
questions than we can definitively answer.

As an introduction to these possibilities, the presentation will cover the use of the Power Query
tools to replace some 'standard' Excel techniques.

1.2 Power Query in practice


Power Query works in a very different way to 'traditional' Excel. Rather than using the Excel grid and
formulae, it uses commands, usually entered via the user interface, to create a series of steps which
process the source data into an output table of data. Each step usually uses the output of the
previous step as its starting point. Refreshing the query processes all of the steps in turn. If you click
on a step, the query preview displays the result up to that step. Clicking on the ‘gear’ icon for a step
allows you to edit the step:

Behind the scenes, the interface is creating code, much like recording a macro. You can see the code
(known as 'M' code) by clicking the Advanced Editor command in the Query group of the Home
Ribbon tab:
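
As a rough illustration of the step-by-step model described above (written in Python rather than M, with hypothetical step names and data), a query can be thought of as an ordered list of named steps, each taking the previous step's output as its input. Previewing a step means running the list up to and including that step, and refreshing means re-running every step against the current source data.

```python
# Sketch only: mimics the 'applied steps' model in plain Python, not M code.
# The step names and sample rows are hypothetical.

def run_steps(source_rows, steps, up_to=None):
    """Apply each (name, function) step in order to the previous step's output.

    If up_to is given, stop after the step with that name, which mirrors
    what the query preview shows when a step is selected.
    """
    rows = list(source_rows)
    for name, step in steps:
        rows = step(rows)
        if name == up_to:
            break
    return rows


steps = [
    # Keep only rows with a value present.
    ("Filtered Rows", lambda rows: [r for r in rows if r["Value"] is not None]),
    # Add a calculated column, much like a custom column.
    ("Added Custom", lambda rows: [{**r, "Double": r["Value"] * 2} for r in rows]),
    # Sort by the new column, largest first.
    ("Sorted Rows", lambda rows: sorted(rows, key=lambda r: r["Double"], reverse=True)),
]

source = [{"Code": 100, "Value": 5}, {"Code": 110, "Value": None}, {"Code": 120, "Value": 8}]

preview = run_steps(source, steps, up_to="Filtered Rows")  # result up to that step
output = run_steps(source, steps)                          # full 'refresh'

# The output does not change until the steps are run again on the changed source.
source.append({"Code": 130, "Value": 20})
refreshed = run_steps(source, steps)
```

The last three lines also illustrate the refresh behaviour: `output` is a snapshot, and only a further run of the steps reflects the new source row.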

Power Query includes a very wide range of commands to process and transform the data. Custom
columns can be added with the results calculated using Power Query functions. Although the way
these functions work is similar to Excel functions, the function names are different and, in particular,
they are case sensitive. If a function name is entered using incorrect case, the formula will return an
error:

A key difference between a formula-based approach and the use of Power Query tools is the need to
refresh queries. Just as for PivotTables, recalculation is not automatic and, although queries can be
set up to refresh at defined time intervals, users will need to adapt to a situation where recalculation
is periodic or manual, rather than based on the recalculation chain.

2. Append Tables
2.1 Problem
The task here is to turn a variable number of Excel Tables, each containing a variable number of
rows, into a single table so that calculations can easily be based on the consolidated data.

2.2 Excel content as data source


As well as an extensive range of external data sources, Power Query can use Excel workbook
contents as a data source. This can be data held in an Excel Table, a named range or just the used
cells in the worksheets in Excel workbooks. In this example we will just demonstrate the technique
using some Excel Tables in a single worksheet:

Step 1 is to make each of our Tables the source of a separate query. We can do this by clicking in
each Table and using the From Table command:

We can then 'Close & Load' our data to a range of different outputs. In our case we just want to
create a connection to use in a future step:

Having repeated this for the other three Tables, we then use New Query, Combine Queries, Append:

You will notice that the help text says 'Append two queries from this workbook'. In fact, the April
2016 update has introduced an option to append multiple queries in one go:

Where our Table column headings are consistent this will create a single, consolidated Table which
can be loaded to an Excel Table as shown here:

When we change or add to data in any of our Tables we can manually refresh our queries, or we can
set our 'Appended' query to refresh every so many minutes using the Data Ribbon tab, Connections
option:
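
The logic of the append can be sketched outside Excel as follows. This is plain Python rather than M, the table contents are hypothetical, and the sketch assumes that a heading missing from one of the tables is filled with an empty value, which is why consistent column headings matter.

```python
# Sketch only: plain-Python stand-in for appending tables; not M code.
# Table contents and column names are hypothetical.

def append_tables(tables):
    """Stack a variable number of tables (lists of row dicts) into one table.

    Columns are matched by heading. A heading missing from one table is
    filled with None, so inconsistent headings yield extra, partly empty
    columns rather than one clean column.
    """
    columns = []
    for table in tables:
        for row in table:
            for heading in row:
                if heading not in columns:
                    columns.append(heading)
    return [{heading: row.get(heading) for heading in columns}
            for table in tables for row in table]


january = [{"Code": 100, "Value": 50.0}, {"Code": 110, "Value": 75.0}]
february = [{"Code": 100, "Value": 20.0}]
march = [{"Code": 120, "Amount": 10.0}]  # inconsistent heading

consolidated = append_tables([january, february])
mismatched = append_tables([january, march])
```

Here `consolidated` has three rows with the same two columns, while `mismatched` ends up with both a `Value` and an `Amount` column, each partly empty.
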
3. Lookup and reference functions
3.1 Problem
VLOOKUP().

Enough said.

3.2 Merge Queries


We have just seen the Append option of the Combine Queries command, but there is also a Merge
option that allows us to establish database-type relationships between queries. This allows us to
replace the use of many different kinds of lookup operations that would normally require the use of
formulae in multiple cells with a single join operation. Merge allows for an extensive range of Join
Kinds:

This example is deliberately contrived in order to compare speed of operation. We have a list of over 1
million IDs with corresponding values. We are using the exact-match form of VLOOKUP() to return the
values for 10,000 IDs. The relationship between the time taken to recalculate and the position of our
match value in our base table is close to linear.

Using Combine, Merge replaces the 10,000 individual VLOOKUP() formulae with a single join
operation and can significantly improve the speed of calculation:
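
The difference between the two approaches can be sketched in Python. This is an illustration of the general idea only: the first function scans from the top of the table for each ID, so the work grows with the position of the match, while the second indexes the base table once and matches every ID against the index. It is not a claim about how Excel or Power Query implement these operations internally, and the sizes and values are hypothetical.

```python
# Sketch only: contrasts the two approaches in plain Python; not Excel or M.
# Sizes and values are hypothetical and far smaller than in the example above.

def exact_lookup(lookup_id, base_table):
    """Scan the base table from the top, as an exact-match lookup does.

    Returns the value and the number of rows examined, which grows with the
    position of the match in the table.
    """
    for examined, (row_id, value) in enumerate(base_table, start=1):
        if row_id == lookup_id:
            return value, examined
    return None, len(base_table)


def merge_left(ids, base_table):
    """Index the base table once, then match every ID against the index.

    IDs are assumed unique here; a real join returns every matching row.
    """
    index = {}
    for row_id, value in base_table:
        index.setdefault(row_id, value)  # keep the first match, like a lookup
    return [(lookup_id, index.get(lookup_id)) for lookup_id in ids]


base_table = [(n, n * 10) for n in range(1, 1001)]  # IDs 1..1000
ids = [5, 500, 1000, 2000]                          # 2000 has no match

scanned = [exact_lookup(i, base_table) for i in ids]
merged = merge_left(ids, base_table)
```

Each entry of `scanned` pairs the returned value with the number of rows examined, so an ID near the bottom of the table costs far more than one near the top; `merged` returns the same values from a single pass over the base table.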

3.3 Approximate lookup


This is a bit more controversial. It is certainly possible to use the M language to create an equivalent
of the approximate lookup [1], but an alternative approach could be to create a Full Outer join between
the data table and the lookup table and then to create a custom column that uses the lookup table
code where there is no match with the data table code. In this example we want to report on our
individual balances by category. A simplified coding chart allocates codes to categories:

[1] http://www.excelguru.ca/blog/2015/01/28/creating-a-vlookup-function-in-power-query/#comment-267328

We can use our two Tables as the source for two separate 'Connection Only' queries and then Merge
them using the Code columns as the join. By using a Full Outer join, any codes that are in the Coding
Chart Table but not in the Balances Table will also be included. This is vital because, to perform the
approximate lookup we are going to sort by code and then use Fill Down. If a code in the coding
chart isn't matched and we only use the codes in the Balances Table for the Fill Down, balances with
codes between the missing code and the next matching code will be allocated to the wrong
category.

4. Case Study
4.1 Introduction
This is one of the techniques used in a case study that uses the From File, From Folder data source to
consolidate lists of balances held in a set of Excel workbooks stored in a particular folder. The idea is
to allow a workbook containing a simple list of codes and values in one or more sheets that start
with 'Data-' to be created in, or moved to a particular folder, and for the balances to then be
automatically incorporated into a consolidated set of financial statements with no further manual
intervention:
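
The idea can be sketched in Python using an in-memory stand-in for the folder. The file names, sheet names and balances below are hypothetical, and the sketch only illustrates the selection rule (sheets whose names start with 'Data-'), not the From Folder data source itself.

```python
# Sketch only: an in-memory stand-in for the folder; not M code.
# File names, sheet names and balances are hypothetical.

def consolidate_folder(workbooks, prefix="Data-"):
    """Collect (code, value) rows from every sheet whose name starts with prefix.

    workbooks maps a file name to a dict of {sheet name: rows}. The source
    file and sheet are kept on each row so that balances stay traceable.
    """
    rows = []
    for file_name, sheets in sorted(workbooks.items()):
        for sheet_name, sheet_rows in sheets.items():
            if sheet_name.startswith(prefix):
                rows.extend((file_name, sheet_name, code, value)
                            for code, value in sheet_rows)
    return rows


folder = {
    "branch_a.xlsx": {"Data-Sales": [(100, 500.0)], "Notes": [(999, 1.0)]},
    "branch_b.xlsx": {"Data-Costs": [(270, 80.0), (285, 40.0)]},
}

consolidated = consolidate_folder(folder)

# Adding a new workbook to the folder and re-running picks it up.
folder["branch_c.xlsx"] = {"Data-Costs": [(300, 60.0)]}
refreshed = consolidate_folder(folder)
```
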
In our case study, our coding chart includes the range 280-319 as Telephone and communications:

However, none of our balance values have been allocated to code 280 so, without the full outer join,
the 280 row would not be included in our table and our Fill Down operation would fill all codes from
270 to 319 with the Advertising category:

Note: I wouldn't claim to have exhaustively tested this approximate lookup approach so would be
grateful for confirmation, or otherwise, that it contains no devastating logical or practical flaw.
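
One way of probing the logic is to sketch it outside Excel. The Python below is not M code and is not a test of Power Query itself: it reproduces the join, sort and Fill Down sequence on a small example. The codes 270 and 280 and their categories follow the case study; code 320, its category name and all balance values are hypothetical.

```python
# Sketch only: reproduces the join, sort and fill-down logic in plain Python.
# Code 320, its category and all balance values are hypothetical.

def approximate_lookup(balances, chart, include_unmatched_chart_rows=True):
    """Allocate each (code, value) balance to a category.

    chart maps the first code of each range to its category. With
    include_unmatched_chart_rows=True the chart codes that have no balance
    are added as extra rows (the effect of the full outer join); with False
    they are dropped, as they would be if only the balance codes were kept.
    """
    balance_codes = {code for code, _ in balances}
    # Joined rows: (code, value or None, category or None).
    rows = [(code, value, chart.get(code)) for code, value in balances]
    if include_unmatched_chart_rows:
        rows += [(code, None, category) for code, category in chart.items()
                 if code not in balance_codes]
    rows.sort(key=lambda row: row[0])

    result = []
    current = None
    for code, value, category in rows:
        if category is not None:
            current = category          # fill down from the last known category
        if value is not None:           # keep balance rows only
            result.append((code, value, current))
    return result


chart = {270: "Advertising", 280: "Telephone and communications", 320: "Other"}
balances = [(270, 100.0), (285, 40.0), (300, 60.0), (320, 10.0)]

with_full_join = approximate_lookup(balances, chart)
without_full_join = approximate_lookup(balances, chart, include_unmatched_chart_rows=False)
```

With the unmatched chart rows included, codes 285 and 300 are allocated to Telephone and communications. Without them, the 280 row is missing and both balances are filled down from 270 as Advertising, which is the failure described above.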

5. Grouping
5.1 Problem
Multiple complex conditional sum expressions.

5.2 Group By
Power Query allows the grouping of query records by multiple fields and then the choice of
aggregate operation for multiple columns. Using our case study example again, we have grouped our
balances by our Coding Chart categories and summed the Value column:

This gives us a simple table that we can use with a straightforward SUMIF() or SUMIFS() function to
populate a summary report:
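
The two stages, grouping and then a simple conditional sum over the grouped table, can be sketched in Python. This is an illustration of the logic rather than M code or an Excel formula, and the categories and values are hypothetical.

```python
# Sketch only: plain-Python stand-in for Group By followed by SUMIF(); not M code.
# Categories and values are hypothetical.

def group_and_sum(rows, key_fields, value_field):
    """Group row dicts by one or more fields and sum one column per group."""
    totals = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] = totals.get(key, 0.0) + row[value_field]
    return [dict(zip(key_fields, key), **{value_field: total})
            for key, total in totals.items()]


def sumif(table, criteria_field, criterion, sum_field):
    """Rough equivalent of SUMIF() over the grouped table."""
    return sum(row[sum_field] for row in table if row[criteria_field] == criterion)


balances = [
    {"Category": "Advertising", "Value": 100.0},
    {"Category": "Telephone and communications", "Value": 40.0},
    {"Category": "Telephone and communications", "Value": 60.0},
]

grouped = group_and_sum(balances, ["Category"], "Value")
telephone_total = sumif(grouped, "Category", "Telephone and communications", "Value")
```

Because the grouped table has one row per category, the conditional sum in the report only has to match a single criterion.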
