Sie sind auf Seite 1von 2

Talend Tutorial Task Aid >

Joining Two Sources of Data

This tutorial uses Talend Open Studio Data Integration version 6

1. Create a new Job and add a source component


a. In a new Job, add the movies metadata file to the Job as a tFileInputDelimited component.
Note that the last column in the movies file is directorID. In this tutorial, you will use this
column to join the movies file to the directorID column in the directors file.

2. Create a new metadata file based on the simple file named


directors.txt and add it as the second source
a. Create a new delimited metadata file named directors.
b. To specify a sample file, click Browse next to the File field, select the file directors.txt from
the local disk, and click Open.
c. The field delimiter in this file is a comma. To change the delimiter, in the Field Separator
dropdown list, select Comma.
d. To refresh the file display to reflect the change made, click the Refresh Preview button.
Note: This file only has two columns: directorID and directorName.
e. Change the column names to directorID and directorName and the length of the
directorName field to 40.
The new metadata file, directors 0.1, is displayed under Metadata in the Project Repository.
You can now join the directorID column from the movies file with the directorID column from
the directors file.
f. Add the directors metadata file to the Job as the second source input.

3. Add and configure a tMap component to join the two source


components
a. Add a tMap component to the Job Designer.
b. To establish a flow of data between the two components to the tMap component, first
link movies to the tMap_1 component, and then link directors to the tMap_1 component.

Talend takes the complexity out of integration


Based on open source Scalable Future-proof Predictable cost
Visit www.talend.com Follow us on Twitter @Talend
Talend Tutorial Task Aid >

c. To rename the link between the source component and tMap components, click the link
twice and type movies. Similarly, rename the other link as directors.
d. To open the tMap component wizard, double-click the component.
Note that in the left section of the wizard, both movies and directors are listed as inputs to
the tMap component.
e. To create a new output data flow, in the output section of the tMap wizard, click the [+]
button. Name the new output as joinedOutput and click OK.
A blank output data flow is created.
f. To add the movieID, title, releaseYear, and url fields to the output data flow, select the
four fields from the movies input and drop them on the output data flow. Similarly, add
the directorName field to the output data flow from the directors input.
g. To configure the join, select the directorID column in the movies table and drag it to the
directorID column of the directors table.

4. Store the result in a file


a. To specify a destination for the output, add a tFileOutputDelimited component to the Job
Designer and link the joinedOutput output of the tMap component to it.
b. To configure the output component, in the Component view of the component, specify
the path and name for the output file and include a header row in the output file.
c. To run the Job, in the Run view, click Run.
d. To check the moviesComplete.csv file, navigate to the folder in which the file was created
and open the file.

Note: Some movies show no director name. These movies had no directorID.

Default behavior in the ETL tool:

All rows in the driving table will be displayed.


If the join condition was not satisfied, corresponding output fields will be blank.
This is a left outer join.

In contrast, you can configure the join in tMap as an inner join, which only displays rows
where the join condition is satisfied.

Talend takes the complexity out of integration


Based on open source Scalable Future-proof Predictable cost
Visit www.talend.com Follow us on Twitter @Talend

Das könnte Ihnen auch gefallen