Talend
Presentation on Talend MDM
Bhushan Maindarkar.
Table of Contents
1. General Information
1.1. What is ETL
1.2. What is Talend
1.3. What is Talend Open Studio
2. Installation
2.1. Hardware requirement
2.2. Software requirement
2.3. Configure the memory settings
2.4. Launch the Studio
3. Talend Integration
3.1. Create New Project
3.2. Delete Project
3.3. Getting started with a basic Job: Creating a Job
3.4. Workspace window
3.5. Add components to the Job
3.6. List of components
3.7. Connect the components together
3.8. Connect components using drag and drop method
3.9. Configuring the components
3.10. Execute Job
3.11. Custom code components
3.11.1. tJava component
3.11.2. tJavaRow component
3.11.3. tJavaFlex component
3.11.4. tLibraryLoad component
3.11.5. tSetGlobalVar component
3.12. Connection components
3.12.1. tMysqlInput component
3.12.2. tMysqlOutput component
Fidel Technologies Pvt Ltd
1. General Information
Business modeling
Graphical development
Metadata-driven design and execution
Real-time debugging
Robust execution
2. Installation
Before installing your Talend product, make sure the machines you are using meet
the following hardware requirements recommended by Talend.
Memory usage heavily depends on the size and nature of your Talend projects.
However, in summary, if your Jobs include many transformation components, you
should consider upgrading the total amount of memory allocated to your servers,
based on the following recommendations:
Memory Usage
Disk usage:
Product   Client/Server   Required disk space for installation   Required disk space for use
Studio    Client          3 GB                                   3+ GB
5. Under System Variables, select the Path variable, click Edit..., and append the
following entry to the end of the variable value:
;%JAVA_HOME%\bin
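To confirm that this step took effect, a short Java sketch can check whether the Path value contains the bin folder of JAVA_HOME. The class and method names (PathCheck, containsJavaBin) are hypothetical, and the ';' separator and case-insensitive comparison assume a Windows machine.

```java
import java.util.Arrays;

class PathCheck {
    // Returns true when one entry of the Path value equals <javaHome>\bin.
    // Assumes Windows conventions: ';' separator, case-insensitive paths.
    static boolean containsJavaBin(String javaHome, String pathValue) {
        if (javaHome == null || pathValue == null) {
            return false;
        }
        String expected = stripTrailingSeparator(javaHome) + "\\bin";
        return Arrays.stream(pathValue.split(";"))
                .map(String::trim)
                .map(PathCheck::stripTrailingSeparator)
                .anyMatch(entry -> entry.equalsIgnoreCase(expected));
    }

    // Removes one trailing backslash so "C:\jdk\" and "C:\jdk" compare equal.
    static String stripTrailingSeparator(String s) {
        return s.endsWith("\\") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        // Reads the real environment; prints false if JAVA_HOME is not set.
        System.out.println(containsJavaBin(System.getenv("JAVA_HOME"), System.getenv("PATH")));
    }
}
```

Run it from a new command prompt, since environment changes only apply to processes started after the edit.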
2.3. Download
Download the product from the Talend website.
Note that the .zip file contains binaries for ALL platforms (Linux/Unix,
Windows and MacOS).
Once the download is complete, extract the archive file on your hard drive.
3. Talend Integration:
A fast and cost-effective way to connect data
Maximize the value of data to your business with Talend Data Integration software,
a data platform based on an open and scalable architecture. According to Talend,
its graphical tools and wizards help you develop and deploy data integration Jobs
10 times faster than hand coding, at one fifth the cost of competing products.
A free trial of the commercial edition is available.
2. On the login window, select the Create a new project option and enter a
project name in the field.
3. Click Finish to create the project and open it in the Studio.
1. On the login screen, click Manage Connections, then on the dialog box that
opens click Delete Existing Project(s) to open the [Select Project] dialog box.
1. Enter the search keyword(s) in the search field of the Palette and press
Enter to validate your search.
2. Select the component you want to use and click on the design workspace
where you want to drop the component.
3. Note that you can also drop a note to your Job the same way you drop
components.
11  tMysqlInput               Reads a MySQL table and extracts fields based on a MySQL query.
12  tMysqlOutput              Inserts or updates rows in a MySQL database.
13  tMysqlConnection          Creates a connection to a MySQL database.
14  tAggregateRow             Receives a flow and aggregates it based on one or more columns.
15  tAggregateSortedRow       Receives a sorted flow and aggregates it based on one or more columns. For each output line, the aggregation key and the relevant result of set operations (min, max, sum) are provided.
16  tExternalSortRow          Uses an external sort application to sort input data based on one or several columns, by sort type and order.
17  tFilterRow                Filters input rows by setting conditions on the selected columns.
18  tMap                      Allows joins, column or row filtering, transformations, and multiple outputs.
19  tSampleRow                Filters rows according to their line numbers.
20  tSortRow                  Sorts input data based on one or several columns, by sort type and order.
21  tXMLMap                   Allows joins, column or row filtering, transformations, and multiple outputs.
22  tFileInputDelimited       Reads a given file row by row with simple separated fields.
23  tFileInputExcel           Reads an Excel file (.xls or .xlsx) and extracts data line by line.
24  tFileInputFullRow         Opens a file, reads it row by row, and sends complete rows as defined in the schema to the next Job component via a Row link.
25  tFileInputLDIF            Reads an LDIF file row by row and passes the rows to the next component as defined in the schema.
26  tFileInputMail            Reads the header and content parts of an email file.
27  tFileInputMSDelimited     Reads a complex multi-structured delimited file.
28  tFileInputMSPositional    Reads a complex multi-structured positional file.
29  tFileInputXML             Reads an XML structured file and extracts data row by row.
30  tFileInputRegex           A powerful component that can replace a number of other components of the File family. Requires some advanced knowledge of regular expression syntax.
31  tFileOutputDelimited      Writes to a file row by row with simple separated fields.
32  tFileOutputExcel          Writes an MS Excel file with separated data values according to a defined schema.
33  tFileOutputRow            Writes data into a file.
34  tFileOutputLDIF           Writes or modifies an LDIF file with data separated into respective entries based on the defined schema, or else deletes content from an LDIF file.
35  tFileOutputMSDelimited    Writes into a file based on the schema.
36  tFileOutputMSPositional   Writes into a file based on the position of fields in a string.
37  tFileOutputXML            Writes an XML file with separated data values according to a defined schema.
38  tHttpRequest              Part of the Internet family of components; makes both POST and GET requests.
39  tREST                     Serves as a REST Web service client that sends HTTP requests to a REST Web service provider and gets the responses.
40  tExtractJSONFields        Extracts the data from JSON fields stored in a file, a database table, etc., based on the XPath query.
41  tMsgBox                   Displays a message box.
42  tUnite                    Merges data from various sources, based on a common schema.
43  tReplicate                Duplicates the incoming schema into two identical output flows.
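The row-processing components in this list can be pictured with plain Java collections. The sketch below is not Talend-generated code; it only mimics what tFilterRow, tSortRow, and tAggregateRow do to a flow of rows. The Row class, its columns, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class RowFlowSketch {
    // Hypothetical row type: one city and one amount per row.
    static final class Row {
        final String city;
        final int amount;
        Row(String city, int amount) { this.city = city; this.amount = amount; }
    }

    // Like tFilterRow: keep only rows that satisfy a condition on a column.
    static List<Row> filter(List<Row> rows, int minAmount) {
        return rows.stream()
                .filter((Row r) -> r.amount >= minAmount)
                .collect(Collectors.toList());
    }

    // Like tSortRow: order rows by a column (ascending numeric order here).
    static List<Row> sortByAmount(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparingInt((Row r) -> r.amount));
        return copy;
    }

    // Like tAggregateRow: group by a key column and apply an operation (sum).
    static Map<String, Integer> sumByCity(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Row r) -> r.city, TreeMap::new, Collectors.summingInt((Row r) -> r.amount)));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("Pune", 30), new Row("Mumbai", 10), new Row("Pune", 5));
        // Chain the steps the way components are chained in a Job.
        System.out.println(sumByCity(filter(rows, 10)));
    }
}
```

In a Job the same chain would be built by linking the components with Row > Main connections rather than by calling methods.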
Now that the components have been added on the workspace, they have to be
connected together. Components connected together form a subjob. Jobs are
composed of one or several subjobs carrying out various processes. In this
example, as the tLogRow and tFileOutputDelimited components are already
connected, you only need to connect the tFileInputDelimited to the tLogRow
component. To connect the components together, use either of the following
methods:
3. In the contextual menu that opens, select the type of connection you want to
use to link the components, Row > Main in this example.
4. Click the target component to create the link, tLogRow in this example.
Batch design:
tRowGenerator_1
tJava Code:
// Declared for illustration; not used below.
String abc;
// Prints one line; tJava code runs once, not once per row.
System.out.println("Hello");
Output:
3.11.2. tJavaRow
tRowGenerator_1
tJavaRow Code:
//Code generated according to input schema and output schema
System.out.println("tJavaRow");
// Upper-case the first name; copy the other columns unchanged.
output_row.First_Name = StringHandling.UPCASE(input_row.First_Name);
output_row.Last_Name = input_row.Last_Name;
output_row.City = input_row.City;
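Outside the Studio, the row-by-row assignment shown above can be mimicked in standalone Java. This is a minimal sketch, not the code Talend generates: the Row class and the sample names are hypothetical, and String.toUpperCase is swapped in for Talend's StringHandling.UPCASE routine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class JavaRowSketch {
    // Hypothetical row type carrying the three columns used in the example.
    static final class Row {
        String First_Name;
        String Last_Name;
        String City;
    }

    // Small helper to build a sample row.
    static Row row(String first, String last, String city) {
        Row r = new Row();
        r.First_Name = first;
        r.Last_Name = last;
        r.City = city;
        return r;
    }

    // Stand-in for the tJavaRow body: called once for every incoming row.
    static Row transform(Row input_row) {
        Row output_row = new Row();
        // String.toUpperCase replaces StringHandling.UPCASE in this sketch.
        output_row.First_Name = input_row.First_Name == null
                ? null
                : input_row.First_Name.toUpperCase(Locale.ROOT);
        output_row.Last_Name = input_row.Last_Name;
        output_row.City = input_row.City;
        return output_row;
    }

    public static void main(String[] args) {
        List<Row> flow = Arrays.asList(row("John", "Cohen", "Pune"), row("Troy", "Knipp", "Mumbai"));
        for (Row in : flow) {
            Row out = transform(in);
            System.out.println(out.First_Name + "|" + out.Last_Name + "|" + out.City);
        }
    }
}
```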
Output:
3.11.3. tJavaFlex
Batch Design:
tJavaFlex Code
Schema of tJavaFlex :
Output :
[statistics] disconnected
Job tjava ended at 16:12 18/05/2017. [exit code=0]
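tJavaFlex splits its code into three parts: start code that runs once before the first row, main code that runs for every row, and end code that runs once after the last row. The standalone sketch below imitates that layout with an ordinary loop. The class name, the list of names used as the incoming flow, and the log format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JavaFlexSketch {
    static List<String> run(List<String> incomingRows) {
        List<String> log = new ArrayList<>();

        // Start code: executed once, before any row is processed.
        int count = 0;
        log.add("start");

        for (String name : incomingRows) {
            // Main code: executed once for each row of the flow.
            count++;
            log.add(count + ": " + name);
        }

        // End code: executed once, after the last row.
        log.add("rows processed: " + count);
        return log;
    }

    public static void main(String[] args) {
        run(Arrays.asList("John", "Umar", "Troy")).forEach(System.out::println);
    }
}
```

Variables declared in the start part, such as the counter here, stay visible in the main and end parts, which is what makes the component useful for running totals.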
3.11.4 tLibraryLoad
3.11.5 tSetGlobalVar
Batch Design :
tJava Code :
Output :
Starting job tjava at 18:04 18/05/2017.
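tSetGlobalVar stores key/value pairs in the Job's globalMap, and later components such as tJava read them back with a cast. As a minimal sketch, a plain HashMap can stand in for globalMap. The class name, the setGlobalVar helper, and the key and value ("region", "APAC") are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVarSketch {
    // Stand-in for the globalMap that a Job shares between its components.
    static final Map<String, Object> globalMap = new HashMap<>();

    // What tSetGlobalVar does in effect: store a value under a key.
    static void setGlobalVar(String key, Object value) {
        globalMap.put(key, value);
    }

    public static void main(String[] args) {
        setGlobalVar("region", "APAC"); // hypothetical key and value

        // Typical read in a tJava component: cast the Object back to its type.
        String region = (String) globalMap.get("region");
        System.out.println(region);
    }
}
```

Because the map holds Object values, the cast must match the type that was stored; reading a key that was never set returns null.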
3.12.1. tMysqlInput :
Batch Design :
tMysqlInput Schema :
Output :
Starting job tjava at 18:52 18/05/2017.
[statistics] disconnected
Job tjava ended at 18:52 18/05/2017. [exit code=0]
3.12.2. tMysqlOutput
Batch Design :
Output :
Starting job tjava at 18:52 18/05/2017.
[statistics] disconnected
Job tjava ended at 18:52 18/05/2017. [exit code=0]
3.12.3. tMysqlConnection :
Batch Design :
Output :
3.13.1. tAddCRCRow
Batch Design :
Output:
Starting job dataquality at 11:34 19/05/2017.
[statistics] disconnected
Job dataquality ended at 11:34 19/05/2017. [exit code=0]
Batch Design :
tRowGenerator1:
tRowGenerator2:
Output :
Starting job chgfileEncode at 12:17 19/05/2017.
'----------+----------+--------------'
[statistics] disconnected
Job chgfileEncode ended at 12:17 19/05/2017. [exit code=0]
3.13.3. tReplace :
Batch Design :
tReplace Component:
Output :
Starting job chgfileEncode at 12:41 19/05/2017.
|4 |Cohen |John |
|5 |Park |Umar |
|6 |Knipp |Troy |
|7 |Lunberg |Greg |
|8 |Brown |Sami |
|9 |Barnhill |Pascal|
|10 |Rose |Aaron |
|11 | | |
|12 | | |
'---------+----------+------'
[statistics] disconnected
Job chgfileEncode ended at 12:41 19/05/2017. [exit code=0]
3.13.4. tUniqRow :
Batch Design :
Output:
Starting job tuniqRow at 18:21 22/05/2017.
|=------------------+--=|
|ABC |PQR|
|=------------------+--=|
|-6.333333333333332 |A |
|18.499999999999996 |B |
|21.055555555555557 |C |
|-1.2222222222222219|X |
|32.666666666666664 |Q |
'-------------------+---'
.-------------------+---.
| Duplicate |
|=------------------+--=|
|ABC |PQR|
|=------------------+--=|
|2.000000000000001 |A |
|-1.8333333333333337|A |
'-------------------+---'
[statistics] disconnected
Job tuniqRow ended at 18:21 22/05/2017. [exit code=0]
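The split seen in the output above, where the first row for each key goes to the unique flow and later rows with the same key go to the duplicate flow, can be sketched in standalone Java. The Row and Result classes are hypothetical, the key column is assumed to be PQR, and the sample values are made up rather than taken from the Job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqRowSketch {
    // Hypothetical row type with the two columns shown in the output.
    static final class Row {
        final double abc;
        final String pqr;
        Row(double abc, String pqr) { this.abc = abc; this.pqr = pqr; }
    }

    // Holds the two output flows.
    static final class Result {
        final List<Row> uniques = new ArrayList<>();
        final List<Row> duplicates = new ArrayList<>();
    }

    // Splits rows on the PQR key: first occurrence is unique, the rest are duplicates.
    static Result split(List<Row> rows) {
        Result result = new Result();
        Set<String> seen = new HashSet<>();
        for (Row row : rows) {
            // Set.add returns false when the key was already present.
            if (seen.add(row.pqr)) {
                result.uniques.add(row);
            } else {
                result.duplicates.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = split(Arrays.asList(
                new Row(1.5, "A"), new Row(2.5, "B"), new Row(3.5, "A")));
        System.out.println(r.uniques.size() + " unique, " + r.duplicates.size() + " duplicate");
    }
}
```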
3.14.1. tAggregateRow:
Batch Design :
tAggregateRow component
tMap component
[statistics] disconnected
Job tAggregateRow ended at 15:09 19/05/2017. [exit code=0]
Batch Design :
Output :
3.14.3. tSortRow:
Batch Design :
tSortRow Component :
Output :
3.14.4. tAggregateSortedRow:
Batch Design :
tAggregateSortedRow :
Output :
Starting job tAggregateSorted at 16:00 19/05/2017.
[statistics] disconnected
Job tAggregateSorted ended at 16:00 19/05/2017. [exit code=0]
3.14.5. tSampleRow:
Batch Design :
tSampleRow Component :
Output :
3.14.6. tXMLMap
Job Design :
tXMLMap :
tFileOutputDelimited :
Output :
3.15.1. tHttpRequest:
tHttpRequest Component :
Output:
3.15.2 tRest :
Job Design :
tRest Component :
3.15.3. tExtractJSONField:
Output:
3.15.4. tUnite :
Batch Design :
Schema
Output
3.15.5. tReplicate :
Batch Design :
Schema of tReplicate:
tFilterRow 1:
tFilterRow2:
Output :
Modeling:
Before we get started, let's first get a common understanding of the most important
MDM terms:
Term                 Description
(business) element   Also referred to as business attribute. The actual name of the data point.
(business) entity    Describes the actual data (the elements), its nature, its structure and its relationships. An entity can have one or more business elements. The Talend MDM jargon for this concept is data model entity.
data model type      An element or collection of elements that is globally defined and can be used across various entities. This makes maintenance of common elements easier.
data model           Defines the attributes (elements), user access rights and relationships of entities mastered by the MDM Hub. The data model is the central component of Talend MDM. A data model maps to one or more (business) entities that can be explicitly defined. Any concept can be defined by a data model. A data model can have multiple entities.
(business) domain    A collection of data models that define a particular concept. For instance, the customer domain may be defined by the organization,
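The containment between these terms, where a data model holds one or more entities and an entity holds one or more elements, can be sketched as plain Java classes. The class, field, and sample names below are hypothetical and do not come from the Talend MDM API.

```java
import java.util.ArrayList;
import java.util.List;

class MdmTermsSketch {
    // A (business) element: the name of one data point.
    static final class Element {
        final String name;
        Element(String name) { this.name = name; }
    }

    // A (business) entity: groups one or more elements.
    static final class Entity {
        final String name;
        final List<Element> elements = new ArrayList<>();
        Entity(String name) { this.name = name; }
        Element addElement(String elementName) {
            Element e = new Element(elementName);
            elements.add(e);
            return e;
        }
    }

    // A data model: groups one or more entities.
    static final class DataModel {
        final String name;
        final List<Entity> entities = new ArrayList<>();
        DataModel(String name) { this.name = name; }
        Entity addEntity(String entityName) {
            Entity e = new Entity(entityName);
            entities.add(e);
            return e;
        }
    }

    public static void main(String[] args) {
        DataModel model = new DataModel("Customer"); // hypothetical names
        Entity organization = model.addEntity("Organization");
        organization.addElement("Name");
        organization.addElement("Country");
        System.out.println(model.name + " has " + model.entities.size() + " entity with "
                + organization.elements.size() + " elements");
    }
}
```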
Talend MDM architecture can be broken down into functional blocks that enable
interaction between users and the MDM Hub and their corresponding IT needs. Here
are the main building blocks of Talend MDM:
The Clients block includes one or more Talend Studio instances and Web browsers
that can be on the same or on different machines.
From the Studio, you can set up and operate a centralized repository. You can
build data models that employ the necessary business and data rules to create a
single copy of the master data. This master data will be propagated back to target
and source systems.
From the Web browser, you can search, display or edit master data with tasks
defined by the Studio.
The Server block includes an MDM server, where the master data are governed
and monitored.
The Database block includes the MDM database, where the master data and the
system data are stored.
In the Studio workspace, an editor opens where you can define the details of your new
data model. The new data model and data container are listed in the MDM Repository
tree view.
Web GUI:
On successful installation, http://localhost:8080/talendmdm will show:
The open source version comes with only two user accounts and is restricted to
them:
standard user
user: user
password: user
admin
user: administrator
password: administrator