TERR
Tutorial
A STEP BY STEP GUIDE
By Daniel Smith,
Director of Data Science and Innovation
I’ll admit, when I first started doing analytics, I didn’t know how to
code a thing… I leaned heavily on tools like SPSS and JMP – tools
which allowed me to see all my data like a spreadsheet, then point
and click to compare columns or add columns and factors to a
model.
My idea of an “advanced” model was something put together in a
spreadsheet or that I pieced together with a calculator.
I knew there were more powerful analytical tools out there, but in the
early 2000’s the concept of a data scientist was immature at best,
unknown at worst. The only community of programming statisticians
with a low knowledge barrier to entry was SAS and I had neither the
financial means nor the inclination to become a SAS developer.
Therefore, I turned to the only option for a broke analyst to become a
not-so-broke statistician (as they were called at the time) – the R
language.
After lots of books and wading through obtuse forum posts, I
managed to obtain a working knowledge of the language. My
analytics ability, and job opportunities, skyrocketed.
Now, lucky for us all, R is much more accessible: point and click tools
like Rattle and RCommander easily extend the platform; quick start
guides are available online for free; and RStudio has become a
mature IDE (No more command line!).
If you have any questions about this content, feel free to reach out at
Daniel.Smith@Syntelli.com
Out of the box, Spotfire Analyst contains TERR. The only requirement
is to be connected to some data to analyze and have the appropriate
permissions.
Let’s start exploring TERR with the basics, a point and click linear
regression. With every Spotfire installation, there should be a set of
demo data in your library. I’ll be using the Baseball data set in this
example. If you don’t have it, don’t worry – you can still follow along;
it’s really simple!
Note:
There is a slight correlation between hits and errors, but not enough to
keep us from using this example. It is likely an artifact of the number of
games played; we could probably control for it by dividing all our values
by the number of games played. However, this example is about using
TERR, not modeling best practices.
Here you also see all the other out of the box statistical capabilities
Spotfire offers. All of these use TERR to provide insights.
In our regression model dialog box, we see a few options:
The two most important items here are the “Input parameter name”
field and the “Type” drop-down. The input parameter tells Spotfire
which value in our script should receive our data. In this example it is
column1. As you can see below, the blue text matches the blue
highlighted value in the image above.
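To make the mapping concrete, here is a minimal, purely illustrative
sketch of the kind of TERR script that could sit behind a dialog like this
one. The names column1 (input parameter) and output1 (output
parameter) are assumptions – use whatever names you register:

# column1 arrives from Spotfire as a vector holding the mapped column's values
# (this sketch assumes it is numeric with no missing values).
# We fit a simple linear trend against row order and hand the fitted values
# back to Spotfire through the output parameter.
fit <- lm(column1 ~ seq_along(column1))
output1 <- as.numeric(fitted(fit))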
Now that the script is defined and we have our input and output
parameters linked to the script, it is time to run the function!
Save the function if you like (you’ll have to be connected to a Spotfire
Server, as data functions are saved in the library) and click “Run”.
Before the function completely executes you will need to complete
the handshake between the script parameters and the Spotfire
application. First we’ll need to tell the function what Spotfire data to
pass into the input parameter and where to put the data from the
output parameter.
Click OK and you should see a new column in your data table!
The next chapters will focus on functionality that may not be obvious
in TERR, such as pulling from an API, using R inside TERR, and how
to develop your TERR scripts inside the RStudio IDE. However, all will
build upon the concepts of getting data into and out of Spotfire. If
you are not completely confident in the Spotfire TERR dataflow,
there will be a brief review in the next chapter.
Please note, I will be referring to the Census web service as the “Census
API” for much of this chapter; however, at times I may say “Census URL”
particularly when speaking of the parameters required in the URL. For
clarity, the Census URL is the “Interface” portion of the Census
Application Program Interface.
But first, a warning: This solution is only for a local device running TERR
and R. It does not address using Statistics Service or configuring
analysis for online consumers.
There Are Several Ways to Access R
1. R Console (R.exe)
R was initially entered line by line, and that option remains today.
Assuming you installed R in your default directory, you can access the
R Console here: C:\Program Files\R\R-3.0.2\bin. Simply double click
R.exe to open the Console. It looks just like a command line interface
in Windows or a Terminal in UNIX.
2. R GUI (Rgui.exe)
R also comes packaged with a basic GUI application. There are 32-
and 64-bit versions, both named Rgui.exe, found in the i386 and x64
folders inside the “bin” folder where you found R.exe. The R GUI
is also the application opened when selecting the R application from
your start menu or desktop, if you elected to have those shortcuts
created when installing R.
Now you’ve installed R and you can open it, so let’s use it!
See the image below for some help if this is your first time using R:
All you get initially is a fancy version of the R Console. The red cursor
at the bottom is where you enter commands, so let’s enter a
command.
First, to verify R is working, let’s try something simple. Type 2+2 and
hit Enter:
It should return 4.
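If the console is working, the exchange should look like this:

> 2 + 2
[1] 4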
What. Is.
Happening?
Once you select a CRAN mirror, jsonlite will begin to install. R may
also install other dependent packages required for jsonlite to function
properly if they are not already present. So don’t panic if it starts
“Installing Dependencies.”
If there are no error messages when calling the library, you have
successfully installed jsonlite.
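If you need the commands themselves, installing and loading the
package looks like this (R will ask you to pick a CRAN mirror the first
time you install something):

install.packages("jsonlite")  # installs jsonlite and any missing dependencies
library(jsonlite)             # loads it; no error message means success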
Our first step is figuring out how to use the Census API within R.
2) Request a key
I changed the key after entering the URL, but it gives an example of
what the final product will look like.
Open up the R GUI if it isn’t already, then go to File > New Script.
A new window will open. This window will allow you to type multiple
lines of R code which can be run as a batch. Copy and paste the
following, making sure to use your API key in place of [KEY]:
library(jsonlite)
data <- data.frame(
  fromJSON("http://api.census.gov/data/2010/sf1?key=[KEY]&get=P0010001,NAME&for=state:*"),
  stringsAsFactors = FALSE)
data
Highlight everything, then hit F5 or Ctrl + R to run the script. The
output should look like this:
We are pulling in data with the Census API, fantastic! Now we just
copy and paste this into the register data function window and call it a
day, right? Nope… turns out jsonlite is one of those packages that
doesn’t work in TERR.
But all is not lost: TERR comes with the “RinR” package pre-installed,
and “RinR” includes a function, “REvaluate”, that allows the use of a
local R instance from within TERR.
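As a quick sanity check, a minimal REvaluate() call might look like this
(assuming open-source R is installed locally):

library(RinR)
# Evaluate an expression in the local open-source R engine and return
# the result to TERR
REvaluate({ R.version.string })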
Note:
This will not work in the R GUI, so keep reading for the chapter
describing how to create a TERR and R development environment in
RStudio.
data_pre <- REvaluate({
  library(jsonlite)
  data_R <- data.frame(
    fromJSON("http://api.census.gov/data/2010/sf1?key=[KEY]&get=P0010001,NAME&for=state:*"),
    stringsAsFactors = FALSE)
  data_R
})
colnames(data_pre) <- as.character(data_pre[1, , drop = TRUE]): The Census
API returns the column names as the first row of the data, so we take that
row and use it as the column names of data_pre.
data <- data_pre[-1, ]: Finally, we do not need the first row anymore, so
we delete the first row of data_pre (data_pre[-1, ]) and assign the result
to our ultimate data variable, “data.”
The output parameter “data” is returned to Spotfire as a Table.
Final Script:
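Assembled from the pieces above, the full data function script should
look roughly like this (substitute your own Census API key for [KEY]):

library(RinR)

# Pull the Census data through open-source R (jsonlite) from inside TERR
data_pre <- REvaluate({
  library(jsonlite)
  data_R <- data.frame(
    fromJSON("http://api.census.gov/data/2010/sf1?key=[KEY]&get=P0010001,NAME&for=state:*"),
    stringsAsFactors = FALSE)
  data_R
})

# The first row of the API response holds the column names
colnames(data_pre) <- as.character(data_pre[1, , drop = TRUE])

# Drop the header row; "data" is the output parameter returned to Spotfire
data <- data_pre[-1, ]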
Now select “Run” at the top to try it out. If everything goes according
to plan, you will get a prompt telling you where your data needs to go
in Spotfire and, if you had inputs, which Spotfire tables you are
retrieving data from.
Now that we have the connection there is a lot more we can do with
the Census API. In subsequent chapters we will explore how to pass
different values into the API URL via input parameters – creating a
way to dynamically change the data pulled from the Census API
without ever having to store files locally.
The request was to query Twitter from Spotfire and then score the
returned Tweets. It was a challenge we couldn’t turn down – we just
had to go for it.
The basis for making the solution work in TERR was twofold. First, as
you may remember from my previous chapter, TERR does not play nice
with curl, which meant we had to use REvaluate() to access Twitter.
Second, REvaluate() presented a challenge of its own: we knew how to
get data out of REvaluate() and into Spotfire, but we did not know how
to get data into REvaluate().
4. Make a new document property with data type "String" then click
“OK.”
5. Now enter the first value you would like to search in Twitter. (e.g.
#cats)
4. Your script will need some value for Spotfire to pass your input
into. Do not forget to include the REvaluator and data arguments – by
doing so, the input value that was just set becomes available inside
your regular R instance (see the sketch after these steps).
Step 3:
use REvaluate()
2. Tell your script what you want to pass into your R script – through
REvaluate() of course – as demonstrated in Step 1.
5. Assign your output parameter to your desired output and you are
good to go!
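As a sketch of what steps 2 through 5 look like in code – the names
search.term (the input parameter bound to your document property) and
tweets (the output parameter) are purely illustrative, and the body
stands in for your actual Twitter query, but the REvaluate() call shows
the mechanics of passing a value in via the data argument:

library(RinR)

tweets <- REvaluate(
  {
    # This block runs in your local open-source R instance. search.term is
    # available here because it is named in the data argument below.
    data.frame(query = search.term, stringsAsFactors = FALSE)
  },
  data = "search.term"  # copies the TERR object search.term into the R session
  # add the REvaluator argument too (see step 4) if you need to point at a
  # specific local R installation
)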
Tip:
If you want the data to refresh every time someone types a new value
into the property control, select “Refresh function automatically.”
For those of you using Spotfire and not using TERR, it's an incredibly
powerful feature you really need to add to your toolbox. It effectively
gives you the ability to use every type of analytical function and data
manipulation process ever developed. From Latent Factor Clustering
to scripted ETL - it pretty much does it all. However, trying to
develop code inside the Spotfire TERR console can be a bit of a pain.
Why? Because you have to click through three menus to execute the
code, then if there is an error, you have to parse through Spotfire and
TERR's errors to find the cause.
"But RStudio uses R not TERR, so you can't do that!" Actually, you
can! But it requires a few not so obvious steps. Let me show you
how:
Over the last few years, we have gained significant expertise in TIBCO®
Spotfire, which is our solution of choice for deeper analytics and
integration with R. We have done over 100 TIBCO® Spotfire projects
and employ some of the most experienced consultants in the
industry. The verticals in which we have worked include
sports/entertainment, life sciences/pharmaceuticals, oil & gas, retail,
manufacturing, professional services, hospitality, telecom, financials,
and healthcare. The clients we’ve worked with have revenues from
$100M to $100BN. The horizontals in which we have expertise include
sales analytics, marketing analytics, campaign analytics, and supply
chain and distribution analytics.