
Custom Geocoding Filled Areas in Tableau
tabgeohack.exe
Version 1.0

Caveat
The tabgeohack utility is, as its name suggests, an unsupported hack. The
utility allows the creation of custom geocoding roles in Tableau with
associated filled areas. It does this by extending the database schema in
Tableau's custom geocoding database to hold geometry and populating
the additional columns from spatial data files.
It is unsupported in several senses.

- If it doesn't work as you expect, there's no guarantee it will ever get
  fixed. Best endeavours, if I'm interested and not too busy, that sort
  of thing.
- If you have any problems with a workbook that uses this approach,
  don't even think about asking Tableau for help until you remove the
  custom geocoding. (I have no idea what Tableau's attitude would
  be, but I know what mine would be if I were them.)
- It is virtually certain that a future release of Tableau will change how
  geocoding works in some way that will stop this from working
  altogether, simply because this approach relies on very specific
  (and unpublished!) details of the internal structure of the geocoding
  database. That is bound to change at some point. Hopefully any
  release that changes things in this way will also add support for
  similar extensibility capabilities, but there's absolutely no telling.
- It uses an open source GIS library, and at least one of the features
  I'm using (simplification of complex shapes) doesn't work as well as
  I'd like, but there's nothing I can do about it.

The implications of all of this are clear: don't use it for anything which you
care about. In particular, don't use it for anything which needs to keep
working beyond the next release of Tableau.
Personally, I intend to use it for point-in-time, throw-away analysis: blog
posts and the like, and also to explore how this sort of capability would be
useful if it were a supported part of the product. I strongly suggest you
limit your use similarly.
You have been warned.


Overview
The utility takes one or more spatial data files containing polygon data,
transforms them to an appropriate geographic coordinate reference
system for Tableau to use (i.e. to lat/long coordinates) and generates CSV
files in the format needed for creating custom geocoding roles. After the
custom geocoding has been imported to Tableau the utility is run again to
insert the polygon boundary data into the custom geocoding database.
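To make the output format concrete: a generated role file is a plain CSV pairing the identifying field(s) with the centroid coordinates. The column names below follow the aliases used later in this document, but the code values and coordinates are invented purely for illustration:

```csv
Meshblock Code,Latitude,Longitude
0714500,-41.1334,174.8405
0714600,-41.1290,174.8512
```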
The source spatial data can (in principle) be in any spatial data format
supported by the Geospatial Data Abstraction Library (GDAL, an open
source GIS library). I say "in principle" because I've only tested with ESRI
shape files and a handful of other formats, but I see no reason why it
shouldn't work with anything the GDAL utilities can understand (which is
an extensive list).
The utility also supports purging of unneeded geographic roles from the
resulting custom geocoding database. Reducing the size of the database in
this way can improve performance and also reduces the size of any
packaged workbooks (which minimises the use of Tableau Public quota
when publishing the workbook).

One of the key factors which determines the viability and usability of the
resulting geocoding database is the number and complexity of loaded
shapes. Too many shapes, or shapes that are too complex, can lead to
very poor performance or even an out of memory error - it can simply take
Tableau outside the envelope it is designed for.

To help ensure you don't overload it, the utility provides the option to
simplify the boundaries of the shapes using the GDAL library, and also to
display statistics about their complexity, which help in deciding the
appropriate simplification settings.

However, don't expect too much. Simplification of spatial data is a
notoriously difficult task and can often lead to anomalies and artefacts in
the simplified data (such as missing or overlapping slivers at the
boundaries of adjoining shapes).
For example, the two screenshots below show a sample of New Zealand
electoral boundaries simplified with a tolerance of 1,000 metres (left) and
100 metres (right). The original shape file with no simplification results in
almost 600,000 boundary points being loaded. Simplifying at 1,000 metre
tolerance reduces that to 3,000 boundary points (a factor of 200), which
makes the view much more responsive, but introduces a lot of error. At
100 metres tolerance the number of points is around 16,000 (a factor of
40 on the original), which still allows the view to respond quickly whilst
also retaining acceptable accuracy.
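Tolerance-based simplification of the kind described above is in the spirit of the classic Ramer-Douglas-Peucker algorithm: points that lie closer to the chord between two retained points than the tolerance are dropped. The sketch below is for intuition only; it is not the utility's or GDAL's actual implementation:

```python
import math

def _point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    return num / math.dist(a, b)

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: keep the endpoints, find the point farthest
    from the chord between them, and either drop everything in between
    (if within tolerance) or recurse on both halves."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

# small wiggles within the tolerance collapse onto the chord
print(simplify([(0, 0), (1, 0.05), (2, 0), (3, 0.05), (4, 0)], 0.1))
# -> [(0, 0), (4, 0)]
```

Collapsing near-collinear wiggles is exactly what makes adjoining shapes stop lining up, which is where the slivers mentioned above come from.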


Finding the best compromise between simplicity (and hence performance)
and accuracy can involve a lot of trial and error. Getting satisfactory
results may require manual intervention using a GIS tool. It can be
particularly difficult if there is a wide range of sizes of shapes in the one
file, since the same level of simplification has to apply to the whole file.
The utility is run from a DOS command line, driven by a configuration file
in YAML format (YAML is a structured text format). It has no user interface.
This means it's not easy to use unless you understand what is going on. A
sample configuration file and associated shape files are provided with the
utility. These illustrate most of the key features and allow the whole
process to be run, in order to get familiar with how it works.


Summary of Commands
The tabgeohack utility has the following syntax. The options are described
briefly below.
Usage: tabgeohack.pl [options] <config_file>

OPTIONS:
    --info                display shape file metadata
    --roles               generate custom geocoding CSV files
    --shapes              load custom shapes into geocoding D/B
    --assign <twb_file>   assign custom geocoding instance to workbook
    --analyse             display summary statistics for all geometry in D/B
    --activate            activate the processed custom geocoding database
                          for this configuration
    --revert              restore the custom geocoding database to the
                          unprocessed state
    --version             display version number and exit

The configuration file <config_file> is a YAML file.

--info
The --info option runs the GDAL ogrinfo command for each shape file
referenced in the configuration file. This displays summary information
about the file (number of features, geographic area extent, coordinate
reference system used, units of measurement) and also displays a list of
metadata attributes contained.
--roles
The --roles option parses the shape files specified for each role, calculates
the location of the centroid of each shape and generates a CSV file for
each role containing specified identifying fields from the shape file plus the
latitude and longitude of the centroid. These files are created in a single
directory, in the format needed to import the roles into Tableau as custom
geocoding (initially without any associated shapes).
Optionally, an additional file of feature metadata extracted from the shape
files can be generated for each geographic role.
The input shape files are transformed to an appropriate geographic
(lat/lon) coordinate reference system, if necessary.
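For intuition about the centroid step: the standard area-weighted centroid of a simple polygon can be computed with the shoelace formula. This is a sketch of the idea only, not the utility's actual code:

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    points: list of (x, y) vertices in order; the ring need not be closed.
    """
    a = cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0   # shoelace term for this edge
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                        # signed polygon area
    return cx / (6 * a), cy / (6 * a)

# unit square: centroid is its centre
print(polygon_centroid([(0, 0), (1, 0), (1, 1), (0, 1)]))
# -> (0.5, 0.5)
```

Note that an area-weighted centroid can fall outside a strongly concave shape, which is worth knowing when the centroid is used as a label point.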
--shapes
The --shapes option modifies the schema for the tables supporting the
newly created roles in the custom geocoding database to accommodate
the associated shape data. It then inserts the shapes into the database.
Optionally, the shapes can be simplified, reducing the number of points
per boundary line. This reduces the size of the custom geocoding database
and improves performance, at the cost of loss of accuracy.
It then optionally purges any unneeded geocoding details (the custom
geocoding database includes a full copy of all geocoding data supplied
with Tableau). Purging the unneeded data in this way can improve
performance (if only a subset of the data for a particular role is needed for
the particular analysis) and also allows the geocoding database to be
made much smaller, which reduces the size of resulting packaged
workbooks and saves quota on Tableau Public used by any workbooks
published there.

Finally it compresses the custom geocoding database and saves a copy
which can be referenced even after the current custom geocoding has
been replaced with a different set. This avoids the need to keep
regenerating custom geocoding when swapping between workbooks
requiring different custom geocoding.
--assign
The --assign option associates a Tableau workbook with the saved instance
of the geocoding database specified in the given configuration file. This
can be extremely useful for switching to and fro between different custom
geocoding instances, without having to keep copying the files back into
the standard location in the repository.
Unfortunately, this assignment is not retained when the workbook is
saved, so the only way to keep the assignment is to save the workbook as
a packaged workbook (which actually embeds a compressed copy of the
custom geocoding database in the packaged workbook).
--analyse
The --analyse option displays summary statistics for the numbers of
shapes and the numbers of boundary points for all geographic roles with
shape data. This can be useful in determining the level of simplification
needed.
--activate
The --activate option switches the current custom geocoding to use the
saved copy of the processed geocoding database associated with the
specified configuration file. This is an alternative way of switching to and
fro between different geocoding instances.
--revert
The --revert option switches the current custom geocoding to use the
saved copy of the unprocessed geocoding database associated with the
specified configuration file (i.e. the version after importing custom
geocoding with Tableau, but before inserting shape data with the --shapes
option).
This can be extremely useful when experimenting with different levels of
simplification. Simply run the --shapes option with one simplification
setting and examine the results; then change the simplification setting in
the configuration file, use --revert to return to the unprocessed geocoding
database, and run --shapes again to import shapes with the new
simplification level.
--version
The --version option simply displays the version number and exits.


Installation Instructions
1) Download TabGeoHack.zip from here:
http://dl.dropbox.com/u/59458890/TabGeoHack.zip
and save it on a local drive.
The default location specified in the sample configuration files
included is C:\Data\Tableau; putting it there means making fewer
changes to the configuration files.
2) Unzip the file, which will create a subdirectory named TabGeoHack,
containing the utility plus a couple of Firebird utilities (needed for
accessing the geocoding database) and some other components.
There are also two sub-directories: Sample is exactly what it says;
gdal is the suggested location to install the required GDAL utilities
(step 4).
3) Add the location where you have installed the utility to the PATH
(edit system environment variables from control panel), e.g.:
C:\Data\Tableau\TabGeoHack
4) Download version 1.9 of GDAL (Geospatial Data Abstraction
Library) and save it in an appropriate location (such as the gdal
directory under TabGeoHack). The current stable release of GDAL
and a nightly build of the latest version are available at
GISINTERNALS. I have been using release-1600-gdal-1-9-0-mapserver-6-0-1
(choose the zip file containing all components).
As the GISINTERNALS site often seems to be unavailable, I've put a
copy of the version I've been using in my Dropbox account, here:
http://dl.dropbox.com/u/59458890/release-1600-gdal-1-9-0-mapserver-6-0-1.zip
5) Unzip the GDAL package.
Running GDAL components requires various directories to be on the
path and other environment variables to be set. This is done
automatically by tabgeohack, based on a setting in the
configuration file (step 6).
If you want to run the GDAL components standalone, as well as from
tabgeohack, you'll need to follow some fairly confusing instructions
available on the GDAL site (via the "information" link next to each
download). Alternatively, the script SDKShell.bat
can be run in a command window to set the environment variables,
but this is not persistent, so it is probably better to set them up
permanently.


6) Modify the configuration file (tabgeohack.yml), which is located in
the installation directory, specifying the location of your Tableau
repository and the GDAL installation directory.
For example:
# path of Tableau repository
tableau_repository_path: C:\Users\richard\Documents\My Tableau Repository
# GDAL installation path
GDAL_installation_path: C:\Data\Software\GDAL_1.9
A few other optional and little-used options are available, including
the ability to specify a German or French installation of Tableau
(although currently this only works for German). Refer to the
reference section at the end of this document for details and to
the known issues section for issues and workarounds when using
the French or German installations.


Configuration File Format

The configuration file is in YAML format. YAML is a data serialisation
language which aims to allow complex data structures to be expressed in
a simple, human-readable format. It also makes loading and using that
data extremely easy, which is the primary reason I chose it. Judge for
yourself about the "simple, human-readable" bit.
There are a couple of key things to understand about YAML before
attempting to edit the configuration files.

- YAML works on textual indentation, so it is vital to keep the
  indentation level consistent. The best bet is to keep it exactly as
  you find it in the example files. It is best to edit the YAML files in a
  text editor, using a fixed-width font. Note that YAML only accepts
  indentation based on spaces, not tabs, so make sure your editor
  isn't helpfully converting white space to tabs for you. I have used
  an indentation level of 4 spaces, but that isn't required; it can be
  anything as long as the indentation level remains consistent.
- YAML includes simple lists, in which the list elements are introduced
  by a dash (-), and also hashes (named values), in which the name
  and value are separated by a colon (:).
- Comments are indicated by #. Hopefully I've included enough of
  them to give you a fighting chance.

Just remember that the positions of spaces, dashes and colons are all
crucial, and refer back to the example if it breaks. The utility attempts to
give meaningful messages about format errors.
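To make those rules concrete, here is a tiny stand-alone fragment; the key names are invented for illustration and do not come from the utility's configuration:

```yaml
# a comment, introduced by '#'
# a hash: name and value separated by a colon
output_dir: C:\Example\Output
# nesting is expressed purely by indentation (spaces, never tabs)
options:
    simplify: true
# a list: each element introduced by a dash
files:
    - first.shp
    - second.shp
```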
The configuration file has four main sections:

- Miscellaneous details defining the input and output file locations
  and formats and such like.
- Details of the geographic roles to be added, specifying the source
  spatial files, lists of attributes and details of any simplification
  required.
- A definition of the hierarchical structure of the geographic roles:
  both Tableau's built-in geographic roles and those being added by
  this configuration.
- A definition of geographic roles to be purged from the resulting
  custom geocoding database (to improve performance and also
  reduce the size of any packaged workbooks using the custom
  geocoding). Roles may either be purged completely, or trimmed
  down to just the members required. The purge processing walks
  down the hierarchy defined in the preceding section, deleting or
  retaining children as appropriate.
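The purge walk described in the last point can be pictured as a small recursive traversal. Everything below (the role names, keep-lists and data structures) is a hypothetical sketch of the logic, not the utility's implementation; in particular, the real utility also cascades deletion to the children of purged members:

```python
# Hypothetical data: a role hierarchy, per-role keep-lists ("exceptions"),
# and the members currently in the geocoding database.
hierarchy = {"Country": {"State": {"City": {}}}}
keep = {"Country": {"New Zealand", "Australia"}, "State": {"Wellington"}}
members = {
    "Country": ["New Zealand", "Australia", "France"],
    "State": ["Wellington", "Otago", "Queensland"],
    "City": ["Porirua", "Dunedin"],
}

def purge(role, children):
    """Trim a role to its keep-list (if one is given), then recurse
    into the child roles of the hierarchy."""
    if role in keep and role in members:
        members[role] = [m for m in members[role] if m in keep[role]]
    for child, grandchildren in children.items():
        purge(child, grandchildren)

for role, children in hierarchy.items():
    purge(role, children)

print(members["Country"])  # -> ['New Zealand', 'Australia']
print(members["State"])    # -> ['Wellington']
```

A role with no keep-list (City here) is left alone by this sketch; in the real processing its members survive only if their parent members were retained.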


Example
An example configuration file and its associated shape files are included in
the sample directory supplied with the utility. This is for the Tsunami
warning zones for my home town of Porirua in New Zealand, along with
meshblock (a grouping of land parcels) data. Sample data associating
meshblocks with street addresses is also included. A sample workbook
allowing meshblocks to be located by street address and displayed
overlaying the Tsunami evacuation zones is published on Tableau Public
here.
The sample workbook illustrates the impact of simplifying the shapes, by
including three versions of the Tsunami zone boundaries: at full detail,
simplified to 10 metre tolerance and simplified to 100 metre tolerance
(see the various tabs in the workbook). To keep it as simple as possible,
the sample YAML file only generates one version of the Tsunami zone
boundaries: simplified to 10 metres.
The sample YAML file also illustrates how purging works, by selectively
retaining Australia and New Zealand and also retaining just the Wellington
Region (which corresponds to a State in Tableau). All cities within the
Wellington region are also implicitly retained.
The sample file is shown below.
Porirua Tsunami Warnings.yml
# location of input spatial data files
shape_file_dir: C:\Data\Tableau\TabGeoHack\Sample\Shape Files

# location of various generated files
output_dir: C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings

# list of geographic roles to process
geographic_roles:
    # list of geographic roles to create
    # maximum length is due to Firebird identifier length limit
    # of 31 and the need for a 'LocalData<role_name>' table
    - role_name: Porirua_Meshblock
      # shape file(s) - note that this is a list of files to allow
      # for layers being split across shape files; normally there
      # will be a single entry
      shape_file_names:
          - porirua_mb_wgs84.shp
      # list of fields from shape file to include in geocoding
      # database
      # Firebird identifier names are limited to 31 characters
      required_geocoding_fields:
          MB11:
              # column name to be used in geocoding database
              alias: Meshblock Code
              # unique ID indicator (default false)
              unique_id: true
              # list of phrases to be used for automatic geocoding
              heuristics:
                  - meshblock
      # list of fields from shape file to include in separate file
      # of feature data which can be joined to a datasource; field
      # names from the shape file are listed, with optional aliases
      # to be used as CSV column names (otherwise the original
      # field name is used)
      required_feature_fields:
          MB11: Meshblock Code
          AU11: Area Unit Code
          TA11: Territorial Authority Code
          WARD11: Ward Code
          REGC11: Regional Council Code
          X_GCEN:
          Y_GCEN:
      # whether or not to generate CSV file of points
      generate_points: true
    - role_name: Porirua_Tsunami
      shape_file_names:
          - porirua-tsunami-evacuatio.shp
      required_geocoding_fields:
          OBJECTID:
              alias: Tsunami Object ID
              unique_id: true
          COL_CODE:
              alias: Colour Code
      required_feature_fields:
          OBJECTID:
          ZONE_CLASS:
          COL_CODE:
          EVAC_ZONE:
          LOCATION:
          INFO:
          HEIGHTS:
      simplify_tolerance: 10
      generate_points: true

# Definition of Role Hierarchy to allow purging of unwanted roles
# First the built-in geographic roles
role_hierarchy:
    - role: Country
      children:
          - role: State
            children:
                - role: City
                - role: County
                - role: ZipCode
                - role: AreaCode
                - role: CMSA

# Custom geocoding roles need to be defined at the appropriate
# position in the hierarchy if they are to be (partially) purged.
# This can be a useful way to trim down the volume of imported data
# to just the region of interest. In this case we are not purging the
# custom roles, so there is no need to define them here.

# Whether or not to purge synonyms for any kept roles
purge_synonyms: true

# Definition of geographic roles to purge. Note that children in the
# role hierarchy are automatically purged if their parents are
# purged. Additional children can be purged by specifying the role
# explicitly here.
purge_roles_exceptions:
    # All countries except New Zealand and Australia are purged.
    Country:
        - New Zealand
        - Australia
    # All states (aka regions in NZ) are purged except the Wellington
    # Region. Note that all states of other countries (including
    # Australia) will be purged with this definition.
    State:
        - Wellington
    # City is not listed, so all cities except those within the
    # Wellington Region (aka "State") are purged.
    # City:
    # County, ZipCode, AreaCode and CMSA are listed with no
    # exceptions, so even New Zealand Postcodes (aka ZipCodes) are
    # purged.
    County:
    ZipCode:
    AreaCode:
    CMSA:


Example - Loading the Sample Data

The utility comes with a sample configuration, the associated shape files
and a directory structure set up as needed to run it. The sample data
contains the boundaries of the Tsunami warning zones for my home town
of Porirua in New Zealand, along with local meshblock data (meshblocks
are a grouping of land parcels) and street address data.
The steps needed to load the data are explained below. A sample
workbook using this data (and also demonstrating the impact of various
levels of simplification) is published on Tableau Public here.
The sample also illustrates how purging works, by selectively retaining
Australia and New Zealand and also retaining just the Wellington Region
(aka State). All cities within the Wellington region are also implicitly
retained.
The sample YAML configuration file is included earlier in this document.
This section works through how to use all of the commands with the
sample data.

1. --help - display the list of options for the command
To make sure the utility is installed and available on the PATH, open a DOS
window and run the command with the --help option:
tabgeohack --help
C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --help
Usage: tabgeohack [options] <config_file>

OPTIONS:
    --info                display shape file metadata
    --roles               generate custom geocoding CSV files
    --shapes              load custom shapes into geocoding D/B
    --assign <twb_file>   assign custom geocoding instance to workbook
    --analyse             display summary statistics for all geometry in D/B
    --activate            activate the processed custom geocoding database
                          for this configuration
    --revert              restore the custom geocoding database to the
                          unprocessed state
    --version             display version number and exit

The configuration file <config_file> is a YAML file.
C:\Data\Tableau\TabGeoHack\Sample>

2. --info - display details of referenced shape files


The --info option runs the GDAL ogrinfo command for each shape file
referenced in the configuration file. This displays summary information
about the file (number of features, geographic area extent, coordinate
reference system used, units of measurement) and also displays a list of
metadata attributes contained.


Particularly useful details are highlighted in the listing below.


tabgeohack --info "Porirua Tsunami Warnings.yml"
C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --info "Porirua Tsunami Warnings.yml"
Displaying shapefile metadata using ogrinfo...
Porirua_Meshblock
=================
INFO: Open of `C:\Data\Tableau\TabGeoHack\Sample\Shape Files\porirua_mb_wgs84.shp'
using driver `ESRI Shapefile' successful.
Layer name: porirua_mb_wgs84
Geometry: Polygon
Feature Count: 602
Extent: (174.770784, -41.164136) - (174.995543, -41.003921)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
    DATUM["WGS_1984",
        SPHEROID["WGS_84",6378137,298.257223563]],
    PRIMEM["Greenwich",0],
    UNIT["Degree",0.017453292519943295]]
MB11: String (7.0)
TA11: String (3.0)
WARD11: String (5.0)
CB11: String (5.0)
TASUB11: String (5.0)
REGC11: String (2.0)
CON11: String (4.0)
MCON11: String (4.0)
AU11: String (6.0)
UA11: String (3.0)
X_GCEN: Integer (9.0)
Y_GCEN: Integer (9.0)
AREA: Real (19.5)
======================
Porirua_Tsunami
===============
INFO: Open of `C:\Data\Tableau\TabGeoHack\Sample\Shape Files\porirua-tsunami-evacuatio.shp'
using driver `ESRI Shapefile' successful.
Layer name: porirua-tsunami-evacuatio
Geometry: Polygon
Feature Count: 3
Extent: (1749174.069200, 5443680.348800) - (1762544.429300, 5459038.196500)
Layer SRS WKT:
PROJCS["NZGD2000 / New Zealand Transverse Mercator 2000",
    GEOGCS["GCS_NZGD_2000",
        DATUM["New_Zealand_Geodetic_Datum_2000",
            SPHEROID["GRS_1980",6378137,298.257222101]],
        PRIMEM["Greenwich",0],
        UNIT["Degree",0.017453292519943295]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",173],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",1600000],
    PARAMETER["false_northing",10000000],
    UNIT["Meter",1]]
OBJECTID: Integer (10.0)
ZONE_CLASS: Real (19.11)
COL_CODE: String (15.0)
EVAC_ZONE: String (50.0)
LOCATION: String (50.0)
INFO: String (254.0)
HEIGHTS: String (100.0)
======================
Done in 0 seconds
C:\Data\Tableau\TabGeoHack\Sample>

The highlighted fields are all useful in setting the details in the
configuration file.


In particular, note that the map units shown are the units which must be
used for the simplify_tolerance: setting, if that is used. So in the case of
the example, the simplify_tolerance for the tsunami zone boundaries is
specified in metres, since the tsunami shape file uses a projected
coordinate reference system with those units. To simplify the sample
meshblock data, however, the tolerance would need to be specified in
(fractional) degrees, since that shape file is using a geographic (lat/long)
coordinate reference system.
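As a rough rule of thumb when picking a tolerance for data in geographic (lat/long) units: one degree of latitude is about 111 km, so a metric tolerance can be converted approximately as below. The function name and the approximation are mine, not part of the utility:

```python
import math

METRES_PER_DEGREE_LAT = 111_320  # rough length of one degree of latitude

def tolerance_in_degrees(tolerance_m, latitude_deg):
    """Convert a simplification tolerance in metres to (fractional) degrees.

    Longitude degrees shrink with latitude, so taking the smaller of the
    two spans gives the conservative (tighter) tolerance.
    """
    deg_lat = tolerance_m / METRES_PER_DEGREE_LAT
    deg_lon = tolerance_m / (
        METRES_PER_DEGREE_LAT * math.cos(math.radians(latitude_deg))
    )
    return min(deg_lat, deg_lon)

# Porirua sits near 41 degrees south; 100 m is roughly 0.0009 degrees
print(round(tolerance_in_degrees(100, -41), 4))
# -> 0.0009
```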

3. --roles - generate CSV files for custom geocoding
The --roles option parses the shape files specified for each role, calculates
the location of the centroid of each shape and generates a CSV file for
each role containing specified identifying fields from the shape file plus the
latitude and longitude of the centroid. These files are created in the
sub-directory Custom Geocoding Files under the output files location
specified in the configuration file, in the format needed to import the
roles into Tableau as custom geocoding (initially without any associated
shapes).
If the option to create additional files of feature data was chosen for any of
the roles, these are created in the sub-directory Feature Files under the
output location.
tabgeohack --roles "Porirua Tsunami Warnings.yml"

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --roles "Porirua Tsunami Warnings.yml"


Generating custom geocoding files...
Porirua_Meshblock... (602)
Porirua_Tsunami... (3)
Done in 2 seconds
C:\Data\Tableau\TabGeoHack\Sample>

4. Import custom geocoding into Tableau


The custom geocoding CSV files generated in the previous step should be
imported into Tableau in the usual way. (Note that if you are using the
French or German interface, you'll need to switch to English while you
import the geocoding.)
For example:

- Create a new workbook by opening the file Porirua Tsunami
  Zones.csv from the sample directory with Tableau 7.0. Choose a
  live connection. There is no need to add any fields to the view at
  this stage.

- Select Map->Geocoding->Import Custom Geocoding and select
  the directory location holding the custom geocoding CSV files just
  created in the previous step; by default this is:

  C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Custom Geocoding Files

- Save the workbook as Tsunami.twb.

- Close (all copies of) Tableau.


5. --shapes - add shape boundaries to the custom geocoding
The --shapes option modifies the schema for the tables supporting the
newly created roles in the custom geocoding database to accommodate
the associated shape data. It then inserts the shapes into the database.
It then optionally purges any unneeded geocoding details (the custom
geocoding database includes a full copy of all geocoding data supplied
with Tableau). Purging the unneeded data in this way can improve
performance (if only a subset of the data for a particular role is needed for
the particular analysis) and also allows the geocoding database to be
made much smaller, which reduces the size of resulting packaged
workbooks and saves quota on Tableau Public used by any workbooks
published there.
Finally it compresses the custom geocoding database and optionally saves
a copy which can be referenced even after the current custom
geocoding has been replaced with a different set.
tabgeohack --shapes "Porirua Tsunami Warnings.yml"
C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --shapes "Porirua Tsunami Warnings.yml"
Generating shapes...
Porirua_Meshblock... added 602 rows with a total of 18706 points (min: 5, avg: 31,
max: 595)
Porirua_Tsunami... added 3 rows with a total of 3387 points (min: 729, avg: 1129,
max: 1507)
Overall totals: 605 rows, 22093 points
Purging unwanted geocoding data...
Processing role: Country
- Keeping: 'New Zealand', 'Australia'
Processing role: State
- Keeping: 'Wellington'
Processing role: City
Processing role: County
Processing role: ZipCode
Processing role: AreaCode
Processing role: CMSA
Total rows deleted: 1324
Compressing geocoding D/B...
Saving a copy of the unprocessed custom geocoding data at:
C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (no geometry)
Saving a copy of the processed custom geocoding data at:
C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (with geometry)
Done in 20 seconds
C:\Data\Tableau\TabGeoHack\Sample>

6. Check that it has worked


Reopen the workbook Tsunami.twb saved in step 4, assign the
geographic role Tsunami Object ID to the field [OBJECTID], drag
[OBJECTID] onto Level of Detail and set the mark type to Filled Map. Drag
[COL_CODE] onto the color shelf and you should have something that
looks like this. Tableau isn't quite smart enough to get the colours right
automatically.


7. --assign - associate Tableau workbooks with saved geocoding
The --assign option associates a Tableau workbook with the saved instance
of the geocoding database specified in the given configuration file. This
can be useful for switching to and fro between different custom geocoding
instances, without having to keep copying the files back into the standard
location in the repository.
Unfortunately, this assignment is not retained when the workbook is
saved, so the only way to keep the assignment is to save the workbook as
a packaged workbook (which actually embeds a compressed copy of the
custom geocoding database in the packaged workbook).
tabgeohack --assign Tsunami.twb "Porirua Tsunami Warnings.yml"
C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --assign Tsunami.twb "Porirua Tsunami
Warnings.yml"
Assigning workbook 'Tsunami.twb' to custom geocoding instance 'Porirua Tsunami
Warnings'...
Done in 0 seconds
C:\Data\Tableau\TabGeoHack\Sample>

8. --analyse - display statistics for all geometry objects
The --analyse option displays summary statistics for the numbers of
shapes and the numbers of boundary points for all geographic roles with
shape data. This can be useful in determining the level of simplification
needed.
tabgeohack --analyse "Porirua Tsunami Warnings.yml"
C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --analyse "Porirua Tsunami Warnings.yml"
Role: Country, 2 out of 2 rows with geometry.
2565 points (min: 766, avg: 1282, max: 1799 per row).

Role: State, 1 out of 1 rows with geometry.
100 points (min: 100, avg: 100, max: 100 per row).
Role: County, 0 out of 0 rows with geometry.
Role: ZipCode, 0 out of 0 rows with geometry.
Role: Porirua_Meshblock, 602 out of 602 rows with geometry.
18706 points (min: 5, avg: 31, max: 595 per row).
Role: Porirua_Tsunami, 3 out of 3 rows with geometry.
3387 points (min: 729, avg: 1129, max: 1507 per row).
Done in 1 seconds
C:\Data\Tableau\TabGeoHack\Sample>

9. --activate - switch to previously processed geocoding D/B
The --activate option switches the current custom geocoding to use the
saved copy of the processed geocoding database associated with the
specified configuration file. This is an alternative way of switching to and
fro between different geocoding instances.
tabgeohack --activate "Porirua Tsunami Warnings.yml"
C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --activate "Porirua Tsunami Warnings.yml"
Activating the saved copy of the processed custom geocoding data from:
C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (with geometry)
Done in 0 seconds
C:\Data\Tableau\TabGeoHack\Sample>

10. --revert - switch to saved unprocessed geocoding D/B
The --revert option switches the current custom geocoding to use the
saved copy of the unprocessed geocoding database associated with the
specified configuration file (i.e. the version after importing custom
geocoding with Tableau, but before inserting shape data with the --shapes
option).
tabgeohack --revert "Porirua Tsunami Warnings.yml"
C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --revert "Porirua Tsunami Warnings.yml"
Reverting to the saved copy of the unprocessed custom geocoding data from:
C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (no
geometry)
Done in 0 seconds
C:\Data\Tableau\TabGeoHack\Sample>


Troubleshooting and Known Issues


The utility does some rudimentary validation of the configuration files and
attempts to handle any errors that occur during processing as gracefully
as possible. A failed run should leave all files in the repository in their
initial state, so there should not often be a need to intervene to recover
following failures.
However, if the custom geocoding database does ever get into a state that
Tableau is not happy with, it is straightforward either to revert to the
unmodified custom geocoding database using the --revert option, or to
remove custom geocoding from the Tableau menu:
Map->Geocoding->Remove Custom Geocoding
If all else fails, simply delete the Local Data directory from the repository
and start again; Tableau will automatically recreate it when you add
custom geocoding.
The most common issues are caused either by inconsistencies in the YAML
configuration file, or by invalid data in the shape files being read. I have
encountered several examples of invalid shape files including official
government supplied public information files. There are also a few known
bugs and limitations.

Known Issues
1. The tableau_country_code option doesn't work for the French
installation of Tableau.
This is due to the need to use Unicode characters in the path for the
custom geocoding database (for the acute accent on the e in
"Données locales", which is the directory name in the French version).
I have had a brief go at getting this working and not managed to
figure it out. For now the workaround is to rename the "Données
locales" directory in the repository to "Local Data" before running the
utility, and back again afterwards.
2. Generated custom geocoding CSV files cannot be imported if the
user interface language is set to French or German. The CSV files
are required to use the correct French or German words for Latitude
and Longitude, as per the current language setting.
Again, I had a brief go at getting this going in German, but the
a-umlaut in "Längengrad" defeated me.
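As a stopgap, the English headers could be rewritten before import. A minimal Python sketch of that idea; note that the document only confirms "Längengrad" for longitude, and "Breitengrad" for latitude is my assumption, so check the names against your own Tableau installation before relying on this:

```python
import csv
import io

# hypothetical header mapping - "Längengrad" is confirmed by the text
# above, "Breitengrad" is my guess at Tableau's German latitude label
HEADERS = {"Latitude": "Breitengrad", "Longitude": "Längengrad"}

def localise_headers(csv_text):
    # rewrite only the header row, leaving data rows untouched
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [HEADERS.get(h, h) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```
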
3. The simplify option can lead to gaps and overlaps in the shapes.
Unfortunately this is just how the GDAL library works. Experiment
with different settings to find the best compromise between speed
and accuracy. Alternatively, the shape file can be simplified using a
GIS system before processing with tabgeohack.
I hope to make this better in a future version, but don't count on it.
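To see concretely why simplification can open gaps between neighbouring shapes, here is a Python sketch of the classic Douglas-Peucker algorithm (GDAL's simplifier differs in detail, but the principle is similar). Each boundary is simplified independently, so two features that share a border can end up with different point sets along it, which is exactly the gap-and-overlap effect described above:

```python
import math

def _perp_dist(pt, a, b):
    # distance from pt to the segment a-b, in map units
    (x, y), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(x - ax, y - ay)
    t = max(0.0, min(1.0, ((x - ax) * dx + (y - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(x - (ax + t * dx), y - (ay + t * dy))

def simplify(points, tolerance):
    # Douglas-Peucker: keep the endpoints, recurse on the point
    # furthest from the chord if it exceeds the tolerance
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[:idx + 1], tolerance)
    right = simplify(points[idx:], tolerance)
    return left[:-1] + right
```

A larger tolerance removes more points, so the trade-off against rendering speed in Tableau is direct.
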
4. Some shape files do not contain a field which uniquely identifies
each feature.
Tableau requires a unique identifier in order to import custom
geocoding and tabgeohack also needs it to associate the shapes
with the right features.
To help identify any potential identifier fields in the shape file,
tabgeohack checks all fields mentioned in the
required_geocoding_fields and required_feature_fields lists for
uniqueness, and suggests candidates if the specified field is not
unique.
Failing that, the option generate_unique_id can be set for the role.
This will generate a unique id field (simply the sequence in the file).
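The two behaviours described above are easy to picture. A rough Python sketch (mine, not tabgeohack's actual code) of checking candidate fields for uniqueness and, failing that, generating a sequential id:

```python
def unique_fields(features, candidates):
    # features: list of attribute dicts read from the shape file
    # returns the candidate field names whose values are all unique
    result = []
    for field in candidates:
        values = [feat.get(field) for feat in features]
        if len(values) == len(set(values)):
            result.append(field)
    return result

def add_sequence_id(features, field="UNIQUE_ID"):
    # roughly what generate_unique_id does: number features in
    # the order they appear in the file
    for i, feat in enumerate(features, start=1):
        feat[field] = i
    return features
```
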
5. Views are slow to render or Tableau fails with out of memory errors.
If too much geocoding data is loaded, Tableau performance will
degrade. This may be caused either by too many features or by too
much detail in the shape boundaries.
The workarounds for reducing the total number of rows are either to
choose shape files limited to the region of interest, or to purge
unneeded features during the --shapes step.
The workaround for too many boundary points per feature is to use
the simplify option, or to simplify the shape file before loading.
6. Errors in the YAML configuration files.
There are numerous errors that can occur due to invalid layout,
incorrect field names or invalid characters in the YAML file. Carefully
compare the file that is giving problems with the sample file and
pay close attention to the exact layout (particularly the number of
spaces used for indenting).
For example:
Errors found in configuration file:
YAML::Tiny failed to classify line ' shape_file_names:'
at C:\Data\Performance Testing\Tableau\Filled Maps\tabgeohack.pl line 97

This error is caused by too few spaces indenting the shape_file_names: entry.
Errors found in configuration file:
[/geographic_roles/0/] 'shape_file_name' is not one of the allowed keys:
generate_points, generate_unique_id, no_transform_crs, precision,
required_fea...
at C:\Data\Performance Testing\Tableau\Filled Maps\tabgeohack.pl line 97

This error is caused by a missing 's': it should be shape_file_names:, not shape_file_name:.
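Since YAML::Tiny's error messages only point at the offending line, a quick pre-flight check can save a round trip. This sketch (mine, not part of tabgeohack) flags the two most common culprits: tab characters, and indentation that is not a multiple of two spaces (the convention the sample files use; odd indentation is not illegal YAML in general):

```python
def yaml_preflight(text):
    # returns a list of (line_number, problem) tuples for the two
    # most common YAML layout mistakes in these config files
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((n, "tab character"))
            continue
        stripped = line.lstrip(" ")
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(stripped)
        if indent % 2:
            problems.append((n, "odd indentation (%d spaces)" % indent))
    return problems
```
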
7. Unknown Coordinate Reference System.
Sometimes shape files do not contain the necessary information to
allow the GDAL utilities to identify the coordinate reference system
used by the file, which means the utility doesn't know how to
transform the file to the geographic coordinate reference system
(lat/lon) that Tableau requires. There are a couple of options in
this case.
If the file is already expressed using latitude and longitude, there is
probably no need to transform its coordinates; any errors introduced
this way are likely to be insignificant at the sort of scales normally
used for Tableau visualisations. In this case, simply add the
no_transform_crs option, with a value of true, for the role in
question.
If the file is in a projected coordinate reference system, and you
know (or can guess) the CRS, the GDAL utilities allow you to specify
the source CRS when transforming a file. This can be done by
adding the source_crs option to the relevant role.
Refer to the configuration file syntax reference at the end of this
document for both of these options.
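For reference, the two options look like this in a role definition (the role name matches the earlier example, and the EPSG code is illustrative; EPSG:2193 is the New Zealand Transverse Mercator projection):

```yaml
geographic_roles:
  - role_name: Porirua_Meshblock
    # file is already lat/long but has no embedded CRS information:
    no_transform_crs: true
    # or, if the projected CRS is known or can be guessed:
    # source_crs: EPSG:2193
```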
8. Bad shape files.
Various errors can be caused by invalid shape files. These generally
result in failures during calls to the GDAL utilities used by
tabgeohack and can be very hard to pin down. The only way to fix
these is to edit the shape file with a GIS tool. In some cases the
errors are detected by the validation options provided by the GIS
tool, which at least locates the troublesome feature. In other cases
these errors are not detected by the GIS tool I'm using (QGIS), which
makes it even trickier.
Examples I have seen include:

- A divide by zero error during the --roles step. This was due to a
polygon with only 3 boundary points (which is not valid
according to the ESRI shape file specification). This caused the
area of the polygon to be zero, which broke the calculation of the
centre of the shape. Deleting that feature (or that polygon, if the
feature has multiple polygons) is the best option.
- The points in a polygon do not form a closed ring. This causes a
failure during processing of the --shapes option. In this case QGIS
did not detect the error.
- The boundary of a polygon crosses itself. Again this can happen
during --shapes processing. This case was detected by QGIS and
can be relatively easily fixed.
- An "Out of Memory" error during processing of the --shapes
option can also be caused by a self-intersecting polygon. How
the issue manifests itself depends on the options chosen (such as
whether or not the simplify option is being used). Once again,
locating and fixing the troublesome feature with a GIS tool is the
only option.
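The first two failure modes can be screened for without a GIS tool. A small Python sketch that checks each ring is closed (the ESRI specification requires at least four points, with the first equal to the last) and computes the shoelace area whose zero value is what triggered the divide-by-zero above:

```python
def ring_is_closed(ring):
    # ESRI spec: a valid ring has >= 4 points and starts where it ends
    return len(ring) >= 4 and ring[0] == ring[-1]

def ring_area(ring):
    # signed shoelace area; zero (a degenerate ring) is what broke
    # the centre-of-shape calculation during the --roles step
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0
```

Running these over every ring in a suspect file, before handing it to tabgeohack, narrows the hunt down to specific features.
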


Structure and Allowed Values of Configuration Files


The structure and allowed values for all options supported by the
tabgeohack installation configuration file and the custom geocoding
instance configuration files are defined in the YAML schema shown
below. Hopefully this is just about intelligible. Most of the interesting
options are illustrated in the example above, so only refer to this section
for reference for the more obscure options.
(Don't bother Googling "YAML schema", by the way. The YAML schema
language I am using is an invention of a colleague of mine who hasn't
quite got around to donating it to the YAML community yet, but I think the
meaning is fairly self-explanatory.)

Installation configuration: tabgeohack.yml


# path of Tableau repository
tableau_repository_path: required text
# GDAL installation path
GDAL_installation_path: required text
# working directory (defaults to TEMP environment variable)
temp_loc: text
# optionally retain temporary files (default false)
keep_temp: boolean
# country code for international Tableau edition (US/DE/FR)
# (default US)
tableau_country_code: values US, DE, FR
# default number of digits of precision for latitude and longitude
# values (default 4)
default_precision: integer
# optional geographic coordinate reference system to be used
# (default WGS84).
# Specified in any format understood by ogr2ogr (eg, WGS84 is
# specified as EPSG:4326)
target_crs: text
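A filled-in example may be easier to follow than the schema; the paths below are illustrative and will differ on your machine:

```yaml
# tabgeohack.yml - example values only
tableau_repository_path: C:\Users\me\Documents\My Tableau Repository
GDAL_installation_path: C:\Program Files\GDAL
tableau_country_code: US
default_precision: 4
target_crs: EPSG:4326
```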

Custom Geocoding Instance Configuration


# location of input spatial data files
shape_file_dir: required text
# location of various generated files
output_dir: required text
# whether or not to split polygons which span the dateline (default
# false)
wrapdateline: boolean
# list of geographic roles to process
geographic_roles:
  # list of geographic roles to create
  # maximum length is due to Firebird identifier length limit of 31
  # and the need for a 'LocalData<role_name>' table
  - role_name: required text length 1 to 22
    # names of shape files (or potentially any other type of spatial
    # file supported by GDAL)
    # this is a list of files to allow for layers being split across
    # shape files - normally just a single file
    shape_file_names:
      - required text
    # coordinate reference system used in source spatial files. Only
    # needed if not defined in source files
    # Specified in any format understood by ogr2ogr (eg, WGS84 is
    # specified as EPSG:4326)
    source_crs: text
    # suppress transformation of coordinate reference system, if
    # shape file uses an unknown but geographic (lat/long) CRS
    no_transform_crs: boolean
    # number of digits of precision for latitude and longitude values
    # (default set at tabgeohack config level)
    precision: integer
    # whether to generate a unique feature ID in the generated output
    # files (default false)
    generate_unique_id: boolean
    # list of fields from shape file to include in geocoding database
    # Firebird identifier names are limited to 31 characters
    required_geocoding_fields:
      <field_name>:
        # column name to be used in geocoding database
        alias: required text length 1 to 31
        # unique ID indicator (default false)
        unique_id: boolean
        # list of phrases to be used for automatic geocoding
        heuristics:
          - required text
    # list of fields from shape file to include in separate file of
    # feature data which can be joined to datasource - field names
    # from the shape file are listed, with optional aliases to be
    # used as CSV column names (otherwise the original field name is
    # used)
    required_feature_fields:
      <field_name>: optional text
    # whether or not to generate CSV file of points
    generate_points: boolean
    # optional tolerance (in map units of source CRS) for
    # simplification
    simplify_tolerance: number
    # optional number of iterations to allow coarse and then finer
    # simplification (for example, specifying:
    #   simplify_tolerance: 100
    #   simplify_iterations: 10
    # will simplify at 10, 20, 30 ... 100 metre tolerance)
    simplify_iterations: integer


# Definition of role hierarchy - this is required for both built in
# and custom roles in order to support automatic purging of unneeded
# roles, whilst complying with referential integrity rules.
#
# Note that specifying a hierarchy in this version of the YAML
# validator requires Kwalify notation - so this definition uses a
# mixture of Kwalify and Compact notation - which is very confusing.
role_hierarchy:
  type: seq
  required: yes
  define: hierarchy-node-rule
  sequence:
    - role: required text
      children:
        use: hierarchy-node-rule
# Whether or not to purge synonyms for any kept roles
purge_synonyms: boolean
# definition of geographic roles to purge - named roles are purged,
# except for any listed exceptions
# child roles in the hierarchy are automatically purged, keeping any
# children of the exceptions listed
# additional children may also be purged by specifying the child role
# explicitly
purge_roles_exceptions:
  <role>:
    - required text
