in Tableau
tabgeohack.exe
Version 1.0
Caveat
The tabgeohack utility is, as its name suggests, an unsupported hack. The
utility allows the creation of custom geocoding roles in Tableau with
associated filled areas. It does this by extending the database schema in
Tableau's custom geocoding database to hold geometry and populating
the additional columns from spatial data files.
It is unsupported in several senses.
The implications of all of this are clear: don't use it for anything which you
care about. In particular, don't use it for anything which needs to keep
working beyond the next release of Tableau.
Personally, I intend to use it for point-in-time, throw-away analysis: blog
posts and the like, and also to explore how this sort of capability would be
useful if it were a supported part of the product. I strongly suggest you
limit your use similarly.
You have been warned.
Overview
The utility takes one or more spatial data files containing polygon data,
transforms them to an appropriate geographic coordinate reference
system for Tableau to use (i.e. to lat/long coordinates) and generates CSV
files in the format needed for creating custom geocoding roles. After the
custom geocoding has been imported to Tableau the utility is run again to
insert the polygon boundary data into the custom geocoding database.
The source spatial data can (in principle) be in any spatial data format
supported by the Geographic Data Abstraction Library (GDAL, an open
source GIS library). I say "in principle" because I've only tested with ESRI
shape files and a handful of other formats, but I see no reason why it
shouldn't work with anything the GDAL utilities can understand (which is
an extensive list).
The utility also supports purging of unneeded geographic roles from the
resulting custom geocoding database. Reducing the size of the database in
this way can improve performance and also reduces the size of any
packaged workbooks (which minimises the use of Tableau Public quota
when publishing the workbook).
One of the key factors which determines the viability and usability of the
resulting geocoding database is the number and complexity of loaded
shapes. Too many shapes, or shapes which are too complex, can lead to very
poor performance or even an out of memory error: they can simply take
Tableau outside the envelope it is designed for.
To help ensure you don't overload it, the utility provides the option to
simplify the boundaries of the shapes using the GDAL library and also to
display statistics about the complexity which help in deciding the
appropriate simplification settings.
Summary of Commands
The tabgeohack utility has the following syntax. The options are described
briefly below.
Usage: tabgeohack.pl [options] <config_file>
OPTIONS:
  --info                display shape file metadata
  --roles               generate custom geocoding CSV files
  --shapes              load custom shapes into geocoding D/B
  --assign <twb_file>   assign custom geocoding instance to workbook
  --analyse             display summary statistics for all geometry in D/B
  --activate            activate the processed custom geocoding database for
                        this configuration
  --revert              restore the custom geocoding database to the
                        unprocessed state
  --version             display version number and exit
The configuration file <config_file> is a YAML file.
--info
The --info option runs the GDAL ogrinfo command for each shape file
referenced in the configuration file. This displays summary information
about the file (number of features, geographic area extent, coordinate
reference system used, units of measurement) and also displays a list of
metadata attributes contained.
--roles
The --roles option parses the shape files specified for each role, calculates
the location of the centroid of each shape and generates a CSV file for
each role containing specified identifying fields from the shape file plus the
latitude and longitude of the centroid. These files are created in a single
directory, in the format needed to import the roles into Tableau as custom
geocoding (initially without any associated shapes).
Optionally, an additional file of feature metadata extracted from the shape
files can be generated for each geographic role.
The input shape files are transformed to an appropriate geographic
(lat/lon) coordinate reference system, if necessary.
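The centroid and CSV steps described above can be sketched as follows. This is only an illustration, not the utility's actual code: the function names, the column order and the use of the standard area-weighted centroid formula are all assumptions. Rings are taken to be closed lists of (longitude, latitude) pairs.

```python
import csv
import io


def polygon_centroid(ring):
    """Area-weighted centroid of a closed ring [(x, y), ...] (first point == last point)."""
    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        # A degenerate polygon has no area, so the formula would divide by zero.
        raise ValueError("degenerate polygon: zero area")
    return cx / (3 * twice_area), cy / (3 * twice_area)


def role_csv(features, id_column, precision=6):
    """Build CSV text with one row per feature: identifier, latitude, longitude.

    features is a list of (feature_id, ring) pairs; the layout is hypothetical.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([id_column, "Latitude", "Longitude"])
    for feature_id, ring in features:
        lon, lat = polygon_centroid(ring)
        writer.writerow([feature_id, round(lat, precision), round(lon, precision)])
    return buf.getvalue()


square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
print(role_csv([("A", square)], "Meshblock Code"))
```

Note that the centroid of a concave shape can fall outside the shape itself, which affects where any label or point mark would be placed.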
--shapes
The --shapes option modifies the schema for the tables supporting the
newly created roles in the custom geocoding database to accommodate
the associated shape data. It then inserts the shapes into the database.
Optionally, the shapes can be simplified, reducing the number of points
per boundary line. This reduces the size of the custom geocoding database
and improves performance, at the cost of loss of accuracy.
It then optionally purges any unneeded geocoding details (the custom
geocoding database includes a full copy of all geocoding data supplied
with Tableau). Purging the unneeded data in this way can improve
performance (if only a subset of the data for a particular role is needed for
the particular analysis) and also allows the geocoding database to be
made much smaller, which reduces the size of resulting packaged
workbooks and saves quota on Tableau Public used by any workbooks
published there.
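To give a feel for what simplification does to a boundary, here is a minimal sketch of the Douglas-Peucker algorithm, one common line simplification method. Whether GDAL uses exactly this method for the utility's simplify step is an assumption; the function names are hypothetical. The tolerance is in the same units as the coordinates.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop points lying within tolerance of the line joining their neighbours."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify each half separately.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
print(simplify(line, 1))
```

A larger tolerance removes more points, which is the trade-off between database size and accuracy described above. The recursion is fine for a sketch but could hit Python's recursion limit on very long boundaries.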
Installation Instructions
1) Download TabGeoHack.zip from here:
http://dl.dropbox.com/u/59458890/TabGeoHack.zip
and save it on a local drive.
The default location specified in the sample configuration files
included is C:\Data\Tableau; putting it there means making fewer
changes to the configuration files.
2) Unzip the file, which will create a subdirectory of TabGeoHack,
containing the utility plus a couple of Firebird utilities (needed for
accessing the geocoding database) and some other components.
There are also two sub-directories: "Sample" is exactly what it says,
and "gdal" is the suggested location to install the required GDAL
utilities (step 4).
3) Add the location where you have installed the utility to the PATH
(edit system environment variables from control panel), e.g.:
C:\Data\Tableau\TabGeoHack
4) Download version 1.9 of GDAL (Geographic Data Abstraction
Library) and save it in an appropriate location (such as the gdal
directory under TabGeoHack). The current stable release of GDAL
and a nightly build of the latest version are available at
GISINTERNALS. I have been using release-1600-gdal-1-9-0-mapserver6-0-1 (choose the zip file containing all components).
As the GISINTERNALS site often seems to be unavailable, I've put a
copy of the version I've been using in my Dropbox account, here:
http://dl.dropbox.com/u/59458890/release-1600-gdal-1-9-0mapserver-6-0-1.zip
5) Unzip the GDAL package.
Running GDAL components requires various directories to be on the
path and other environment variables to be set. This is done
automatically by tabgeohack, based on a setting in the
configuration file (step 6).
If you want to run the GDAL components standalone, as well as from
tabgeohack, you'll need to set these up yourself, following the fairly
confusing instructions available on the GDAL site via the
information link next to each download. Alternatively, the script
SDKShell.bat can be run in a command window to set the environment
variables, but this is not persistent, so it is probably better to set
them up permanently.
tableau_repository_path: C:\Users\richard\Documents\My Tableau Repository
# GDAL installation path
GDAL_installation_path: C:\Data\Software\GDAL_1.9
A few other optional and little-used options are available, including
the ability to specify a German or French installation of Tableau
(although currently this only works for German). Refer to the
reference section at the end of this document for details and to
the known issues section for issues and workarounds when using
the French or German installations.
Just remember that the positions of spaces, dashes and colons are all
crucial, and refer back to the example if it breaks. The utility attempts to
give meaningful messages about format errors.
The configuration file has four main sections:
Example
An example configuration file and its associated shape files are included in
the sample directory supplied with the utility. This is for the Tsunami
warning zones for my home town of Porirua in New Zealand, along with
meshblock (a grouping of land parcels) data. Sample data associating
meshblocks with street addresses is also included. A sample workbook
allowing meshblocks to be located by street address and displayed
overlaying the Tsunami evacuation zones is published on Tableau Public
here.
The sample workbook illustrates the impact of simplifying the shapes, by
including three versions of the Tsunami zone boundaries: at full detail,
simplified to 10 metre tolerance and simplified to 100 metre tolerance
(see the various tabs in the workbook). To keep it as simple as possible,
the sample YAML file only generates one version of the Tsunami zone
boundaries: simplified to 10 metres.
The sample YAML file also illustrates how purging works, by selectively
retaining Australia and New Zealand and also retaining just the Wellington
Region (which corresponds to a State in Tableau). All cities within the
Wellington region are also implicitly retained.
The sample file is shown below.
Porirua Tsunami Warnings.yml
# location of input spatial data files
shape_file_dir: C:\Data\Tableau\TabGeoHack\Sample\Shape Files
# location of various generated files
output_dir: C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings
# list of geographic roles to process
geographic_roles:
  # list of geographic roles to create
  # maximum length is due to Firebird identifier length limit
  # of 31 and the need for a 'LocalData<role_name>' table
  - role_name: Porirua_Meshblock
    # shape file(s) - note that this is a list of files to allow
    # for layers being split across shape files - normally there
    # will be a single entry
    shape_file_names:
      - porirua_mb_wgs84.shp
    # list of fields from shape file to include in geocoding
    # database
    # Firebird identifier names are limited to 31 characters
    required_geocoding_fields:
      MB11:
        # column name to be used in geocoding database
        alias: Meshblock Code
        # unique ID indicator (default false)
        unique_id: true
        # list of phrases to be used for automatic geocoding
        heuristics:
          - meshblock
Country:
  - New Zealand
  - Australia
# All states (aka regions in NZ) are purged except the Wellington
# Region. Note that all states of other countries (including
# Australia) will be purged with this definition.
State:
  - Wellington
# City is not listed, so all cities except those within the
# Wellington Region (aka "State") are purged.
# City:
# County, ZipCode, AreaCode and CMSA are listed with no
# exceptions, so even New Zealand Postcodes (aka ZipCodes) are
# purged.
County:
ZipCode:
AreaCode:
CMSA:
The highlighted fields are all useful in setting the details in the
configuration file.
In particular, note that the map units shown are the units which must be
used for the simplify_tolerance: setting, if that is used. So in the case of
the example, the simplify_tolerance for the tsunami zone boundaries is
specified in metres, since the tsunami shape file uses a projected
coordinate reference system with those units. To simplify the sample
meshblock data, however, the tolerance would need to be specified in
(fractional) degrees, since that shape file is using a geographic (lat/long)
coordinate reference system.
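As a rough guide to choosing a tolerance in degrees, a distance in metres can be converted approximately as sketched below. The constant of about 111,320 metres per degree of latitude is a rounded average, the helper name is hypothetical, and the latitude used for Porirua (about 41.1 degrees south) is approximate.

```python
import math

# Rounded average length of one degree of latitude, in metres (an approximation).
METRES_PER_DEGREE_LAT = 111_320.0


def metres_to_degrees(tolerance_m, latitude_deg):
    """Return (degrees of latitude, degrees of longitude) spanning tolerance_m."""
    lat_deg = tolerance_m / METRES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    lon_deg = tolerance_m / (METRES_PER_DEGREE_LAT * math.cos(math.radians(latitude_deg)))
    return lat_deg, lon_deg


lat_tol, lon_tol = metres_to_degrees(10, -41.1)
print(f"10 m is roughly {lat_tol:.6f} degrees of latitude "
      f"or {lon_tol:.6f} degrees of longitude")
```

Since simplify_tolerance takes a single number, the smaller of the two values is the more conservative choice. A 10 metre tolerance works out to about 0.00009 degrees of latitude.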
C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Custom Geocoding Files
Role: State, 1 out of 1 rows with geometry.
100 points (min: 100, avg: 100, max: 100 per row).
Role: County, 0 out of 0 rows with geometry.
Role: ZipCode, 0 out of 0 rows with geometry.
Role: Porirua_Meshblock, 602 out of 602 rows with geometry.
18706 points (min: 5, avg: 31, max: 595 per row).
Role: Porirua_Tsunami, 3 out of 3 rows with geometry.
3387 points (min: 729, avg: 1129, max: 1507 per row).
Done in 1 seconds
C:\Data\Tableau\TabGeoHack\Sample>
10. --revert - switch to saved unprocessed geocoding D/B
The --revert option switches the current custom geocoding to use the
saved copy of the unprocessed geocoding database associated with the
specified configuration file (i.e. the version after importing custom
geocoding with Tableau, but before inserting shape data with the
--shapes option).
tabgeohack --revert "Porirua Tsunami Warnings.yml"
C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --revert "Porirua Tsunami Warnings.yml"
Reverting to the saved copy of the unprocessed custom geocoding data from:
C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (no
geometry)
Done in 0 seconds
C:\Data\Tableau\TabGeoHack\Sample>
Known Issues
1. The tableau_country_code option doesn't work for the French
installation of Tableau.
This is due to the need to use Unicode characters in the path for the
custom geocoding database (for the acute accent on the e in
"Données locale", which is the directory name in the French version).
I have had a brief go at getting this working but have not managed to
figure it out. For now the workaround is to rename the "Données
locale" directory in the repository to "Local Data" before running the
utility and back again afterwards.
2. Generated custom geocoding CSV files cannot be imported if the
user interface language is set to French or German. The CSV files
are required to use the correct French or German words for Latitude
and Longitude, as per the current language setting.
Again, I had a brief go at getting this going in German, but the
a-umlaut in "Längengrad" defeated me.
3. The simplify option can lead to gaps and overlaps in the shapes.
Unfortunately this is just how the GDAL library works. Experiment
with different settings to find the best compromise between speed
A divide by zero error during the --roles step. This was due to a
polygon with only 3 boundary points (which is not valid
according to the ESRI shape file specification). This caused the
area of the polygon to be zero, which broke the calculation of the
centre of the shape. Deleting that feature (or that polygon if the
feature has multiple polygons) is the best option.
The points in a polygon do not form a closed ring. This causes a
failure during processing of the --shapes option. In this case QGIS
did not detect the error.
The boundary of a polygon crosses itself. Again this can happen
during --shapes processing. This case was detected by QGIS and
can be relatively easily fixed.
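Some of the geometry problems described above can be caught with a simple check before running the utility. The sketch below is a hypothetical helper, not part of tabgeohack: it flags rings with too few points, rings that are not closed, and rings with zero area. It does not detect self-intersecting boundaries.

```python
def ring_problems(ring):
    """Return a list of problems found in a polygon ring [(x, y), ...]."""
    problems = []
    # A valid closed ring needs at least 4 points (a triangle plus the repeated start).
    if len(ring) < 4:
        problems.append("fewer than 4 points")
    if ring and ring[0] != ring[-1]:
        problems.append("ring is not closed")
    # Shoelace sum: twice the signed area of the ring.
    twice_area = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:]))
    if twice_area == 0:
        problems.append("zero area")
    return problems


print(ring_problems([(0, 0), (1, 0), (0, 0)]))
print(ring_problems([(0, 0), (1, 0), (1, 1), (0, 1)]))
print(ring_problems([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]))
```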
An Out of Memory error during processing of the --shapes
option can also be caused by a self-intersecting polygon. How
the issue manifests itself depends on options chosen (such as
# and the need for a 'LocalData<role_name>' table
- role_name: required text length 1 to 22
# names of shape files (or potentially any other type of spatial
# file supported by GDAL)
# this is a list of files to allow for layers being split across
# shape files - normally just a single file
shape_file_names:
- required text
# coordinate reference system used in source spatial files. Only
# needed if not defined in source files
# Specified in any format understood by ogr2ogr (eg, WGS84 is
# specified as EPSG:4326)
source_crs: text
# suppress transformation of coordinate reference system, if
# shape file uses an unknown but geographic (lat/long) CRS
no_transform_crs: boolean
# number of digits of precision for latitude and longitude values
# (default set at tabgeohack config level)
precision: integer
# whether to generate a unique feature ID in the generated output
# files (default false)
generate_unique_id: boolean
# list of fields from shape file to include in geocoding database
# Firebird identifier names are limited to 31 characters
required_geocoding_fields:
<field_name>:
# column name to be used in geocoding database
alias: required text length 1 to 31
# unique ID indicator (default false)
unique_id: boolean
# list of phrases to be used for automatic geocoding
heuristics:
- required text
# list of fields from shape file to include in separate file of
# feature data which can be joined to datasource - field names
# from the shape file are listed, with optional aliases to be
# used as CSV column names (otherwise the original field name is
# used)
required_feature_fields:
<field_name>: optional text
# whether or not to generate CSV file of points
generate_points: boolean
# optional tolerance (in map units of source CRS) for
# simplification
simplify_tolerance: number
# optional number of iterations to allow coarse and then finer
# simplification (for example, specifying:
# simplify_tolerance: 100
# simplify_iterations: 10
# will simplify at 10, 20, 30 ... 100 metre tolerance)
simplify_iterations: integer
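The sequence of tolerances implied by the example in the comment above can be sketched as follows. This assumes equal steps, generalised from the single example given (100 with 10 iterations giving 10, 20, 30 ... 100); the function name is hypothetical and the utility's actual behaviour may differ.

```python
def tolerance_schedule(simplify_tolerance, simplify_iterations=1):
    """Return the tolerance used at each pass, ending at simplify_tolerance."""
    step = simplify_tolerance / simplify_iterations
    return [step * i for i in range(1, simplify_iterations + 1)]


print(tolerance_schedule(100, 10))
```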