Extracting data from Taleo


MAY 17TH, 2017. Karol

Over the past few years we have seen companies focusing more and more on Human Resources / Human Capital
activities. This is no surprise, considering that nowadays a large number of businesses depend more on people skills
and creativity than on machinery or capital, so hiring the right people has become a critical process. As a
consequence, more and more emphasis is put on having the right software to support HR/HC activities, and this in
turn leads to the necessity of building a BI solution on top of those systems for a correct evaluation of processes and
resources. One of the most commonly used HR systems is Taleo, an Oracle product that resides in the cloud, so
there is no direct access to its underlying data. Nevertheless, most BI systems are still on-premise, so if we want to
use Taleo data, we need to extract it from the cloud first.

1. Taleo data extraction methods


As mentioned before, Taleo data cannot be accessed directly; nevertheless, there are several ways to extract it,
and once extracted we can use it in the BI solution:

Querying the Taleo API
Using Cloud Connector
Using Taleo Connect Client

The API is very robust, but it is also the most complex of the three methods, since it requires a separate
application to be written. Usually, depending on the configuration of the BI system, either Oracle Cloud Connector,
Taleo Connect Client or a combination of both is used.

Figure 1: Cloud connector in ODI objects tree

Oracle Cloud Connector is a component of OBI Apps, and essentially it's Java code that replicates Taleo entities /
tables. It's also easy to use: just by creating any Load Plan in BIACM using Taleo as the source system, a series of
calls to Cloud Connector are generated that effectively replicate Taleo tables to a local schema. Although it works
well, it has two significant disadvantages:

It's only available as a component of BI Apps
It doesn't extract Taleo UDFs (User Defined Fields)

So even if we have BI Apps installed and we use Cloud Connector, there will be some columns (UDFs) that will not
get extracted. This is why the use of Taleo Connect Client is often a must.

2. Taleo Connect Client


Taleo Connect Client is a tool that is used to export data from or import data to Taleo. In this article we're going
to focus on extraction. It can extract any field, including UDFs, so it can be used in combination with BI Apps Cloud
Connector or, if that's not available, as the sole extraction tool. There are versions for both Windows and Linux
operating systems. Let's look at the Windows version first.

Part 1 - Installation & running: Taleo Connect Client can be downloaded from the Oracle e-delivery website; just
type Taleo Connect Client into the search box and you will see it on the list. Choose the necessary version, select
Application Installer and Application Data Model (required!), remembering that it must match the version of the Taleo
application you will be working with; then just download and install. Important: the Data Model must be installed
before the application is installed.

Figure 2: Downloading TCC for Windows

After TCC is installed, we run it, providing the necessary credentials in the initial screen:

Figure 3: Taleo Connect Client welcome screen

And then, after clicking on ping, we connect to Taleo. The window that we see is initially empty, but we can create
or execute new extracts from it. But before going on to this step, let's find out how to see the UDFs: in the right
panel, go to the Product integration pack tab and select the correct product and model. Then, in the Entities tab, we
can see a list of entities / tables, and in fields / relations, we can see columns and relations with other entities /
tables (through foreign keys). After the first run, you will probably have some UDFs that are not on the list of
available fields / relations. Why is this? Because what we initially see in the field list are only the Taleo
out-of-the-box fields, installed with the Data Model installer. But don't worry, this can easily be fixed: use the
Synchronize custom fields icon (highlighted on the screenshot). After clicking on it you will be taken to a log-on
screen where you'll have to provide your log-on credentials again, and after clicking on the Synchronize button, the
UDFs will be retrieved.

Figure 4: Synchronizing out-of-the-box model with User Defined Fields


Figure 5: Synchronized list of fields, including some UDFs (marked with 'person' icon)

Part 2 - Preparing the extract: Once we have all the required fields available, preparing the extract is pretty
straightforward. Go to File->New->New Export Wizard, then choose the desired Entity, and click on Finish. Now, in
the General window, set Export Mode to CSV-Entity, and in the Projections tab, select the columns that you want to
extract by dragging and dropping them from the Entity->Structure window on the right. You can also add filters or sort
the result set. Finally, save the export file.

The other component necessary to actually extract the data is the so-called configuration. To create it, we select
File->New->New Configuration Wizard, then we point to the export file that we've created in the previous step and, in
the subsequent step, to our endpoint (the Taleo instance that we will extract the data from). Then, on the following
screen, there are more extract parameters, such as the request format and encoding, the extract file name format and
encoding, and many more. In most cases, the default parameter values will let us extract the data successfully, so
unless it's clearly required, there is no need to change anything. The configuration file can now be saved and the
extraction process started, just by clicking on the Execute the configuration button (on the toolbar just below the
main menu). If the extraction is successful, all the indicators in the Monitoring window on the right will turn green,
as in the screenshot below.

Figure 6: TCC successful extraction

By using the bat file created during the installation, you can schedule TCC jobs to run at regular intervals using
Windows Task Scheduler, but it's much more common to have your OBI / BI Apps (or almost any other DBMS that your
organization uses as a data warehouse) installed on a Linux / Unix server. This is why we're going to have a look at
how to install and set up TCC in a Linux environment.

Part 3 - TCC in a Linux / Unix environment: TCC setup in a Linux / Unix environment is a bit more complex. To
simplify it, we will use some of the components that were already created and used when we worked with Windows
TCC, and although the frontend of the application is totally different (to be precise, there is no frontend at all in
the Linux version, as it's strictly command-line), the way the data is extracted from Taleo is exactly the same (using
extracts designed as XML files and Taleo APIs). So, after downloading the application installer and data model from
edelivery.oracle.com, we install both components. Installation is actually just extracting the files, first from zip
to tgz, and then from tgz to uncompressed content. But this time, unlike in Windows, we recommend installing
(extracting) the application first, and then extracting the data model files to an application subfolder named
featurepacks (this must be created; it doesn't exist by default). It's also necessary to create a subfolder named
system in the application directory. Once this is done, you can move some components of your Windows TCC instance to
the Linux one (of course, if you have no Windows machine available, you can create any of these components manually):

Copy the file default.configuration_brd.xml from the Windows TCC/system directory to the Linux TCC/system directory
Copy the extract XML and configuration XML files, from wherever you created them, to the main Linux TCC directory
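
The directory preparation described above can be sketched as a small shell function. This is only an illustration: the function name prepare_tcc_home and the example paths are assumptions, while the featurepacks and system subfolder names and the default.configuration_brd.xml file come from the steps above.

```shell
#!/bin/sh
# Sketch: prepare an already extracted Linux TCC directory.
# prepare_tcc_home and its arguments are hypothetical names.

# $1 = Linux TCC directory
# $2 = directory holding the files copied from the Windows TCC/system folder
prepare_tcc_home() (
    tcc_home=$1
    win_system=$2

    # Neither subfolder exists by default, so create both.
    mkdir -p "$tcc_home/featurepacks" "$tcc_home/system" || exit 1

    # Reuse the default configuration file from the Windows instance.
    cp "$win_system/default.configuration_brd.xml" "$tcc_home/system/" || exit 1
)

# Example (placeholder paths):
# prepare_tcc_home /u01/oracle/tcc-15A.2.0.20 /tmp/from_windows/system
```

The data model archive still has to be extracted into featurepacks by hand, since the archive names depend on the version you downloaded.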

There are also some changes that need to be made in the TaleoConnectClient.sh file:

Set the JAVA_HOME variable at the top of the file (just below the #!/bin/bash line), pointing it to the path of your
Java SDK installation (for some reason, in our installation, the system variable JAVA_HOME wasn't captured correctly
by the script)
In the line below # Execute the client, after the TCC_PARAMETERS variable, add:

the parameters of the proxy server, if one is to be used:

-Dhttp.proxyHost=ipNumber -Dhttp.proxyPort=portNumber

the path of the Data Model:

-Dcom.taleo.integration.client.productpacks.dir=/u01/oracle/tcc-15A.2.0.20/featurepacks

So, in the end, the TaleoConnectClient.sh file in our environment has the following content (IP addresses were
masked; [...] marks places where the listing is cut off):

#!/bin/sh
JAVA_HOME=/u01/middleware/Oracle_BI1/jdk

# Make sure that the JAVA_HOME variable is defined
if [ ! "${JAVA_HOME}" ]
then
    echo +-----------------------------------------+
    echo "+ The JAVA_HOME variable is not defined. +"
    echo +-----------------------------------------+
    exit 1
fi

# Make sure the IC_HOME variable is defined
if [ ! "${IC_HOME}" ]
then
    IC_HOME=.
fi

# Check if the IC_HOME points to a valid taleo Connect Client folder
if [ -e "${IC_HOM[...]integrationclient.jar" ]
then
    # Define the class path for the client execution
    IC_CLASSPATH="${IC_HOME}/lib/taleo-integrationclient.jar":"${IC_HOME}/log"

    # Execute the client
    ${JAVA_HOME}/bin/java ${JAVA_OPTS} -Xmx256m ${TCC_PARAMETERS} \
        -Dhttp.proxyHost=10.10.10.10 -Dhttp.proxyPort=8080 \
        -Dcom.taleo.integration.client.productpacks.dir=/u01/oracle/tcc-15A.2.0.20/featurepacks \
        -Dcom.taleo.integration.client.install.dir="${IC_HOME}" \
        -Djava.endorsed.dirs="${IC_HOME}/lib/endorsed" \
        -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
        -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl \
        -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Log4JLogger \
        -Djavax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom=net.sf.saxon.xpath.XPathFactoryImpl \
        -classpath ${IC_CLASSPATH} com.taleo.inte[...]
else
    echo +----------------------------------------------------------------------------
    echo "+ The IC_HOME variable is defined as (${IC_HOME}) but does not contain the T[...]
    echo "+ The library ${IC_HOME}/lib/taleo-integrationclient.jar cannot be found. "
    echo +----------------------------------------------------------------------------
    exit 2
fi

Once this is ready, we can also apply the necessary changes to the extract and configuration files, although there is
no need to change anything in the extract definition (file blog_article_sq.xml). Let's have a quick look at the
content of this file:

<?xml version="1.0" encoding="UTF-8"?>
<quer:query productCode="RC1501"
            model="http://www.taleo.com/ws/tee800/2009/01"
            projectedClass="JobInformation" locale="en" mode="CSV-ENTITY"
            largegraph="true" preventDuplicates="false"
            xmlns:quer="http://www.taleo.com/ws/integration/query">
  <quer:subQueries/>
  <quer:projections>
    <quer:projection><quer:field path="BillRateMedian"/></quer:projection>
    <quer:projection><quer:field path="JobGrade"/></quer:projection>
    <quer:projection><quer:field path="NumberToHire"/></quer:projection>
    <quer:projection><quer:field path="JobInformationGroup,Description"/></quer:projection>
  </quer:projections>
  <quer:projectionFilterings/>
  <quer:filterings/>
  <quer:sortings/>
  <quer:sortingFilterings/>
  <quer:groupings/>
  <quer:joinings/>
</quer:query>

Just by looking at the file we can figure out how to add more columns manually: we just need to add more
quer:projection elements inside quer:projections, like

<quer:projection><quer:field path="DesiredFieldPath"/></quer:projection>
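
If columns need to be added repeatedly, the manual edit can be scripted. The following is a minimal sketch, not part of TCC: add_projection is a hypothetical helper, and it assumes that the closing quer:projections tag appears exactly once in the file and that the field path contains no characters that are special to sed (|, & or \).

```shell
#!/bin/sh
# Sketch: append one projection to a TCC extract definition.
# add_projection is a hypothetical helper, not a TCC command.

# $1 = extract XML file, $2 = field path to add
add_projection() (
    file=$1
    field=$2
    new="<quer:projection><quer:field path=\"$field\"/></quer:projection>"

    # Insert the new element right before the closing projections tag,
    # writing to a temporary file first so a failed sed leaves the original intact.
    sed "s|</quer:projections>|$new</quer:projections>|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
)

# Example:
# add_projection blog_article_sq.xml "DesiredFieldPath"
```

The field path itself still has to be a valid path in the data model, for example one looked up in the Windows client.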

With regard to the configuration file, we need to make some small changes: in tags cli:SpecificFile and cli:Folder
absolute Windows paths are used. Once we move the files to Linux, we need to replace them with Linux filesystem
paths, absolute or relative.
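
To find the values that need changing and to work out their Linux equivalents, two small helpers may be useful. This is a sketch under assumptions: both function names are hypothetical, and the exact layout of the configuration file is not shown in this article, so the first helper only searches for the two tag names mentioned above.

```shell
#!/bin/sh
# Sketch: helpers for moving a TCC configuration file from Windows to Linux.
# Both function names are hypothetical.

# List the lines that mention the two path-carrying tags, with line numbers.
# $1 = configuration XML file
show_path_tags() {
    grep -nE 'cli:(SpecificFile|Folder)' "$1"
}

# Translate one Windows path into its Linux equivalent.
# $1 = Windows path, $2 = Windows base directory, $3 = Linux base directory
win_to_linux_path() (
    case $1 in
        "$2"*) ;;    # the path is below the Windows base directory
        *) printf 'not under %s: %s\n' "$2" "$1" >&2; exit 1 ;;
    esac
    rest=${1#"$2"}   # strip the Windows base prefix
    # Turn the remaining backslashes into forward slashes.
    printf '%s%s\n' "$3" "$(printf '%s' "$rest" | tr '\\' '/')"
)

# Example (placeholder paths):
# show_path_tags blog_article_cfg.xml
# win_to_linux_path 'C:\tcc\export\out' 'C:\tcc' /u01/oracle/tcc-15A.2.0.20
```

The translated values can then be pasted into the configuration file by hand.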

Once the files are ready, the only remaining task is to run the extract, which is done by running:

./TaleoConnectClient.sh blog_article_cfg.xml

See the execution log ([...] marks where a long line is cut off):

[KKanicki@BIApps tcc-15A.2.0.20]$ ./TaleoConnectClient.sh blog_article_cfg.xml
2017-03-16 20:18:26,876 [INFO] Client - Using the following log file: /biapps/tcc_linu[...]
2017-03-16 20:18:27,854 [INFO] Client - Taleo Connect Client invoked with configuratio[...]
2017-03-16 20:18:31,010 [INFO] WorkflowManager - Starting workflow execution
2017-03-16 20:18:31,076 [INFO] WorkflowManager - Starting workflow step: Prepare Expor[...]
2017-03-16 20:18:31,168 [INFO] WorkflowManager - Completed workflow step: Prepare Expo[...]
2017-03-16 20:18:31,238 [INFO] WorkflowManager - Starting workflow step: Wrap SOAP
2017-03-16 20:18:31,249 [INFO] WorkflowManager - Completed workflow step: Wrap SOAP
2017-03-16 20:18:31,307 [INFO] WorkflowManager - Starting workflow step: Send
2017-03-16 20:18:33,486 [INFO] WorkflowManager - Completed workflow step: Send
2017-03-16 20:18:33,546 [INFO] WorkflowManager - Starting workflow step: Poll
2017-03-16 20:18:34,861 [INFO] Poller - Poll results: Request Message ID=Export-JobInf[...]Number=123952695;State=Completed;Record Count=1;Record Index=1;
2017-03-16 20:18:34,862 [INFO] WorkflowManager - Completed workflow step: Poll
2017-03-16 20:18:34,920 [INFO] WorkflowManager - Starting workflow step: Retrieve
2017-03-16 20:18:36,153 [INFO] WorkflowManager - Completed workflow step: Retrieve
2017-03-16 20:18:36,206 [INFO] WorkflowManager - Starting workflow step: Strip SOAP
2017-03-16 20:18:36,273 [INFO] WorkflowManager - Completed workflow step: Strip SOAP
2017-03-16 20:18:36,331 [INFO] WorkflowManager - Completed workflow execution
2017-03-16 20:18:36,393 [INFO] Client - The workflow execution succeeded.

And that's it! Assuming our files were correctly prepared, the extract will be ready in the folder declared in the
cli:Folder tag of the configuration file.
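
When the extract runs unattended, it helps to check the outcome automatically. The sketch below wraps the call and looks for the success message that closes the execution log above. The function names are hypothetical, and since this article does not cover whether the client's exit status alone is reliable, the check is based on the log text.

```shell
#!/bin/sh
# Sketch: run one extract and report failure unless TCC logs success.
# run_tcc_extract and extract_succeeded are hypothetical helpers.

# Succeeds only if the captured output contains the success message.
# $1 = file with the captured console output
extract_succeeded() {
    grep -q 'The workflow execution succeeded' "$1"
}

# $1 = TCC directory, $2 = configuration file,
# $3 = absolute path of the file that receives the console output
run_tcc_extract() (
    # Run from the TCC directory, as in the manual call above.
    cd "$1" || exit 1
    ./TaleoConnectClient.sh "$2" > "$3" 2>&1
    extract_succeeded "$3"
)

# Example (placeholder paths):
# run_tcc_extract /u01/oracle/tcc-15A.2.0.20 blog_article_cfg.xml /tmp/tcc_run.log \
#     || echo "TCC extract failed, see /tmp/tcc_run.log" >&2
```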

As for scheduling, different approaches are available, the most basic being to use the Linux crontab as the
scheduler, but you can just as easily use any ETL tool that is already used in your project. See the screenshot below
for an ODI example:

Figure 7: TCC extracts placed into one ODI package
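
For the crontab approach, an entry could be generated as in the sketch below. The helper name tcc_cron_entry, the schedule and the log file path are assumptions chosen for illustration. The entry changes into the TCC directory first, because the script falls back to IC_HOME=. when IC_HOME is not set.

```shell
#!/bin/sh
# Sketch: build a crontab entry for one TCC extract.
# tcc_cron_entry is a hypothetical helper; all values are placeholders.

# $1 = cron schedule, $2 = TCC directory, $3 = configuration file, $4 = log file
tcc_cron_entry() {
    printf '%s cd %s && ./TaleoConnectClient.sh %s >> %s 2>&1\n' "$1" "$2" "$3" "$4"
}

# Print an entry that runs the extract every night at 02:00.
tcc_cron_entry '0 2 * * *' /u01/oracle/tcc-15A.2.0.20 blog_article_cfg.xml /tmp/tcc_blog_article.log
```

The printed line can be pasted into the crontab of the user that owns the TCC installation (crontab -e).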