Components
4.X
Reference Guide
Adapted for Talend Open Studio v4.0.x. Supersedes previous Reference Guide releases.
Copyright
This documentation is provided under the terms of the Creative Commons Public License (CCPL).
For more information about what you can and cannot do with this documentation in accordance with the
CCPL, please check: http://creativecommons.org/licenses/by-nc-sa/2.0/
Purpose
This Reference Guide explains in detail the major components you can find in each of the different
groups in the Palette of Talend Open Studio.
Information presented in this document applies to Talend Open Studio releases beginning with
4.0.x.
Audience
Typographical conventions
• In the properties section of every component, an icon indicates whether the component is available in Java and/or in Perl.
• A note icon indicates an item that provides additional information about an important point. It is also used to add comments related to a table or a figure.
• A warning icon indicates a message that gives information about execution requirements or recommendations. It is also used to refer to situations or information the end user needs to be aware of or pay special attention to.
The table below lists the changes made in the 4.x release of the Talend Open Studio Reference Guide.
Your feedback is valuable. Do not hesitate to give your input, make suggestions or requests regarding this documentation or the product, and to seek support from the Talend team on Talend's Forum website at:
http://talendforge.org/forum
This chapter details the main components which belong to the Business Intelligence family in the
Talend Open Studio Palette.
The BI family groups connectors that cover needs such as reading or writing multidimensional or OLAP databases, outputting Jasper reports, tracking DB changes in slowly changing dimension tables, and so on.
Business Intelligence components
tDB2SCD
tDB2SCD properties
Function: tDB2SCD reflects and tracks changes in a dedicated DB2 SCD table.
Purpose: tDB2SCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings:
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined.
Related scenarios
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0
through type 3) on page 23.
tDB2SCDELT
tDB2SCDELT Properties
Function: tDB2SCDELT reflects and tracks changes in a dedicated DB2 SCD table.
Purpose: tDB2SCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated DB2 SCD table.
Basic settings:
Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally. Enter the properties manually.
Repository: Select the repository file where the properties are stored. The following fields are pre-filled with the fetched data.
Use an existing connection: Select this check box and click the relevant tDB2Connection component on the Component List to reuse the connection details you already defined.
Advanced settings:
Encoding Type: Select the encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.
Enable debug mode: Select this check box to display each step while entries are processed in the database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is used as an output component. It requires an input component and a Row main link as input.
Related Scenario
For related topics, see tDB2SCD on page 2 and Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
tGreenplumSCD
tGreenplumSCD Properties
Related scenario
For related scenarios, see, under tMysqlSCD, Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
tInformixSCD
tInformixSCD properties
Function: tInformixSCD tracks and shows changes made to dedicated Informix SCD tables.
Purpose: tInformixSCD addresses Slowly Changing Dimension transformation needs by regularly reading a data source and listing the modifications in a dedicated SCD table.
Basic settings:
Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the Repository file where the properties are stored. The following fields are pre-filled with the fetched data.
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined.
Related scenario
For a scenario in which tInformixSCD might be used, see, under tMysqlSCD, Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
tIngresSCD
tIngresSCD Properties
Function: tIngresSCD reflects and tracks changes in a dedicated Ingres SCD table.
Purpose: tIngresSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings:
Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the Repository file where the properties are stored. The following fields are pre-filled with the fetched data.
Server: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: Setting a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, and hence can be reused. Related topic: Setting a Repository schema in the Talend Open Studio User Guide.
SCD Editor: The SCD editor helps you build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 20.
Use memory saving Mode: Select this check box to maximize system performance.
Advanced settings:
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step while entries are processed in the database.
Usage: This component is used as an output component. It requires an input component and a Row main link as input.
Related scenario
For related scenarios, see, under tMysqlSCD, Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
tMondrianInput
tMondrianInput Properties
This Job extracts multidimensional datasets from relational tables stored in a MySQL database. The data is retrieved using a multidimensional expression (an MDX query). You need to know the structure of your data, or at least have a structure description (catalog), as a reference for the dataset to be retrieved across the various dimensions.
• Drop tMondrianInput and tLogRow from the Palette to the design workspace.
• Link the Mondrian connector to the output component using a Row Main connection.
• Select the tMondrianInput component and select the Component view.
• In DB type field, select the relational database you are using with Mondrian.
• Select the relevant Repository entry as Property type, if you store your DB connection
details centrally. In this example the properties are built-in.
• Fill out the details of connection to your DB: Host, Port, Database name, User Name and
Password.
• Select the relevant Schema in the Repository if you store it centrally. In this example, the
schema is to be set (built-in).
• The relational database we want to query contains five columns: media, drink, unit_sales,
store_cost and store_sales.
• The query aims at retrieving the unit_sales, store_cost and store_sales figures for various
media / drink using an MDX query such as in the example below:
• Back on the Basic settings tab of the tMondrianInput component, set the Catalog path to
the data warehouse. This catalog describes the structure of the warehouse.
• Then type in the MDX query such as:
"select
{[Measures].[Unit Sales], [Measures].[Store Cost],
[Measures].[Store Sales]} on columns,
CrossJoin(
{ [Promotion Media].[All Media].[Radio],
[Promotion Media].[All Media].[TV],
[Promotion Media].[All Media].[Sunday Paper],
[Promotion Media].[All Media].[Street Handout] },
[Product].[All Products].[Drink].children) on rows
from Sales
where ([Time].[1997])"
The console shows the unit_sales, store_cost and store_sales results for each type of Drink (Beverages, Dairy, Alcoholic Beverages) crossed with each media (TV, Sunday Paper, Street Handout), as shown previously in table form.
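The CrossJoin function in this query pairs every listed media member with every child of Drink, producing one row tuple per combination. As a rough plain-Java sketch of that pairing logic (the member names below are illustrative, not fetched from Mondrian):

```java
import java.util.ArrayList;
import java.util.List;

public class CrossJoinSketch {
    // Emulates MDX CrossJoin: every (media, drink) combination, in query order.
    static List<String> crossJoin(List<String> media, List<String> drinks) {
        List<String> tuples = new ArrayList<>();
        for (String m : media) {
            for (String d : drinks) {
                tuples.add("(" + m + ", " + d + ")");
            }
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<String> media = List.of("Radio", "TV", "Sunday Paper", "Street Handout");
        List<String> drinks = List.of("Alcoholic Beverages", "Beverages", "Dairy");
        List<String> rows = crossJoin(media, drinks);
        System.out.println(rows.size());   // 4 media x 3 drinks = 12 row tuples
        System.out.println(rows.get(0));   // (Radio, Alcoholic Beverages)
    }
}
```

Each resulting tuple corresponds to one row of the result grid, with the three measures projected on columns.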
tMSSqlSCD
tMSSqlSCD Properties
Function: tMSSqlSCD reflects and tracks changes in a dedicated MSSQL SCD table.
Purpose: tMSSqlSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings:
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined.
Related scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0
through type 3) on page 23.
tMysqlSCD
tMysqlSCD Properties
Function: tMysqlSCD reflects and tracks changes in a dedicated MySQL SCD table.
Purpose: tMysqlSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings:
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined.
Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: Setting a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, and hence can be reused. Related topic: Setting a Repository schema in the Talend Open Studio User Guide.
SCD Editor: The SCD editor helps you build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 20.
Use memory saving Mode: Select this check box to maximize system performance.
Advanced settings:
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step while entries are processed in the database.
Usage: This component is used as an output component. It requires an input component and a Row main link as input.
SCD keys
You must choose one or more source key columns from the incoming data to ensure its uniqueness.
You must set one surrogate key column in the dimension table and map it to an input column
in the source table. The value of the surrogate key links a record in the source to a record
in the dimension table. The editor uses this mapping to locate the record in the dimension
table and to determine whether a record is new or changing. The surrogate key is typically
the primary key in the source, but it can be an alternate key as long as it uniquely identifies
a record and its value does not change.
Source keys: Drag one or more columns from the Unused panel to the Source keys panel to be used as the key(s) that ensure the uniqueness of the incoming data.
Surrogate keys: Set the column where the generated surrogate key will be stored. A
surrogate key can be generated based on a method selected on the Creation list.
Creation: Select one of the following methods to be used for key generation:
• Auto increment: auto-incremental key.
• Input field: key is provided in an input field.
• Routine: you can access the basic functions through the Ctrl+Space shortcut.
• Table max +1: the maximum value from the SCD table is incremented to create a
surrogate key.
• Type 2: the whole history is stored in the database. This type tracks historical data by inserting a new record into the dimension table, with a separate key, each time a change is made. It is most appropriate for tracking updates, for example.
The SCD Type 2 principle is that a new record is added to the SCD table when changes are detected on the columns defined. Note that although several changes may be made to the same record on various columns defined as SCD Type 2, only one additional line tracks these changes in the SCD table.
The SCD schema in this type should include SCD-specific extra columns that hold
standard log information such as:
-start: adds a column to your SCD schema to hold the start date. You can select one
of the input schema columns as a start date in the SCD table.
-end: adds a column to your SCD schema to hold the end date value for a record. While the record is currently active, the end date is NULL; alternatively, you can select Fixed Year Value and fill in a fictitious year to avoid having a null value in the end date field.
-version: adds a column to your SCD schema to hold the version number of the
record.
-active: adds a column to your SCD schema to hold the true or false status value. This column helps you easily spot the active record.
• Type 3: only the information about the previous value of a dimension is written into the database. This type tracks changes using separate columns. It is most appropriate for tracking only the previous value of a changing column.
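The behavioral differences between the types can be sketched in code. The following is an illustration of the concepts only, not Talend's generated code; the record fields mirror the bookkeeping columns described above, and the company names used in tests are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class ScdSketch {
    // A dimension record with the SCD bookkeeping columns described above.
    static class Rec {
        int sk;             // surrogate key
        String company;     // tracked attribute
        String prevCompany; // Type 3: previous-value column
        int version;        // Type 2: version number
        boolean active;     // Type 2: current-record flag
        Rec(int sk, String company, int version, boolean active) {
            this.sk = sk; this.company = company;
            this.version = version; this.active = active;
        }
    }

    // Type 1: overwrite in place, no history kept.
    static void type1(Rec r, String newCompany) { r.company = newCompany; }

    // Type 2: close the current record and insert a new, versioned one
    // (surrogate key generated here by simple auto-increment).
    static void type2(List<Rec> table, Rec current, String newCompany) {
        current.active = false;
        table.add(new Rec(current.sk + 1, newCompany, current.version + 1, true));
    }

    // Type 3: keep only the previous value, in a dedicated column.
    static void type3(Rec r, String newCompany) {
        r.prevCompany = r.company;
        r.company = newCompany;
    }

    public static void main(String[] args) {
        List<Rec> dim = new ArrayList<>();
        Rec r = new Rec(1, "Acme", 1, true);
        dim.add(r);
        type2(dim, r, "Globex");
        System.out.println(dim.size());         // 2 rows after a Type 2 change
        System.out.println(dim.get(1).version); // 2
    }
}
```

Under Type 2 the table grows by one row per change, while Types 1 and 3 keep a single row per source record.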
This five-component Java scenario describes a Job that tracks changes in four of the columns in a
source delimited file and writes changes and the history of changes in an SCD table.
The source delimited file contains various personal details including firstname, lastname, address, city, company, age, and status. An id column helps ensure the uniqueness of the data.
We want any change in the marital status to overwrite the existing old status record. This type of
change is equivalent to an SCD Type 1.
We want to insert a new record in the dimensional table with a separate key each time a person
changes his/her company. This type of change is equivalent to an SCD Type 2.
We want to track only the previous city and previous address of a person. This type of change is
equivalent to an SCD Type 3.
To realize this kind of scenario, it is better to divide it into three main steps: defining the main flow
of the Job, setting up the SCD editor, and finally creating the relevant SCD table in the database.
Step 1: Defining the main flow of the Job:
• Drop the following components from the Palette onto the design workspace:
tMysqlConnection, tFileInputDelimited, tLogRow, tMysqlSCD, tMysqlCommit.
• Connect tFileInputDelimited, tLogRow, and tMysqlSCD using the Row Main link. This
is the main flow of your Job.
• Connect tMysqlConnection to tFileInputDelimited, and tMysqlSCD to tMysqlCommit, using the OnComponentOk trigger.
• In the design workspace, double-click tMysqlConnection to display its Basic settings view and set the database connection details manually. Using the tMysqlConnection component avoids setting the same DB connection several times when multiple DB components are used.
If you have already stored the connection details locally in the Repository, drop the
needed metadata item to the design workspace and the database connection detail will
automatically display in the relevant fields. For more information about Metadata, see
Defining Metadata items of Talend Open Studio User Guide.
In this scenario, we want to connect to the SCD table where all changes in the source delimited file
will be tracked down.
• Click the three-dot [...] button next to the File Name field to select the path to the source delimited file (dataset) that contains the personal details.
• Define the row and field separators used in the source file.
The File Name, Row separator, and Field separator fields are mandatory.
In this scenario, the source schema is made of eight columns: firstname, lastname, address, city, company, age, status, and id.
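What tFileInputDelimited does with these settings amounts to splitting each row on the field separator into the schema columns. A minimal plain-Java sketch (the separator and the sample values are assumptions, not taken from the actual dataset file):

```java
public class DelimitedSketch {
    static final String FIELD_SEP = ";";  // assumed field separator

    // Split one row of the source file into the eight schema columns.
    static String[] parseRow(String line) {
        // The -1 limit keeps trailing empty fields instead of dropping them.
        return line.split(FIELD_SEP, -1);
    }

    public static void main(String[] args) {
        // Hypothetical sample row matching the eight-column schema.
        String row = "Ashley;Nichols;9 Main St;Chicago;Greenwood;42;single;1";
        String[] f = parseRow(row);
        System.out.println(f.length + " columns; firstname=" + f[0] + ", id=" + f[7]);
    }
}
```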
• Set the basic settings for tLogRow so that the content of the source file displays in table cells on the console before being processed by the SCD component.
• In the design workspace, double-click tMysqlSCD to define its basic settings.
• In the Basic settings view, select the Use an existing connection check box to reuse the
connection details defined on the tMysqlConnection properties.
• In the Table field, enter the table name to be used to track changes.
• Click Sync columns to auto propagate all the columns of the delimited file.
• If needed, click Edit schema to view the output data structure of tMysqlSCD.
All the columns of the source delimited file, coming from the preceding component, appear in the Unused panel of the SCD editor. All other panels in the SCD editor are empty.
• From the Unused list, drop the id column to the Source keys panel to use it as the key that ensures the uniqueness of the incoming data.
• In the Surrogate keys panel, enter a name for the surrogate key in the Name field, SK1 in
this scenario.
• From the Creation list, select the method to be used for the surrogate key generation,
Auto-increment in this scenario.
• From the Unused list, drop the firstname and lastname columns to the Type 0 panel, as changes in these two columns are of no interest.
• Drop the status column to the Type 1 panel. The new value will overwrite the old value.
• Drop the company column to the Type 2 panel. Each time a person changes his/her company,
a new record will be inserted in the dimensional table with a separate key.
In the Versioning area:
-Define the start and end columns of your SCD table that will hold the start and end date
values. The end date is null for current records until a change is detected. Then the end date
gets filled in and a new record is added with no end date.
In this scenario, we select Fixed Year Value for the end column and fill in a fictitious year to avoid having a null value in the end date field.
-Select the version check box to hold the version number of the record.
-Select the active check box to spot the column that will hold the True or False status: True for the current active record and False for the modified record.
• Drop the address and city columns to the Type 3 panel to track only the information about
the previous value of the address and city.
For more information about SCD types, see SCD management methodologies on page 20.
The console shows the changes in the personal information and the SCD table shows the history of
changes made to the input file along with the status and version number.
tMysqlSCDELT
tMysqlSCDELT Properties
Related Scenario
For related topics, see tMysqlSCD on page 19 and Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
tOracleSCD
tOracleSCD Properties
Function: tOracleSCD reflects and tracks changes in a dedicated Oracle SCD table.
Purpose: tOracleSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings:
Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the Repository file where the properties are stored. The following fields are pre-filled with the fetched data.
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined.
Related scenario
For related scenarios, see, under tMysqlSCD, Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
tOracleSCDELT
tOracleSCDELT Properties
Function: tOracleSCDELT reflects and tracks changes in a dedicated Oracle SCD table.
Purpose: tOracleSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated Oracle SCD table.
Basic settings:
Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally. Enter the properties manually.
Repository: Select the repository file where the properties are stored. The following fields are pre-filled with the fetched data.
Use an existing connection: Select this check box and click the relevant tOracleConnection component on the Component List to reuse the connection details you already defined.
Advanced settings:
Encoding Type: Select the encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.
Enable debug mode: Select this check box to display each step while entries are processed in the database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is used as an output component. It requires an input component and a Row main link as input.
Related Scenario
For related topics, see tOracleSCD on page 34 and Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
tParAccelSCD
tParAccelSCD Properties
Function: tParAccelSCD reflects and tracks changes in a dedicated ParAccel SCD table.
Purpose: tParAccelSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings:
Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the Repository file where the properties are stored. The following fields are pre-filled with the fetched data.
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined.
Related scenario
For related scenarios, see, under tMysqlSCD, Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
tPostgresPlusSCD
tPostgresPlusSCD Properties
Related scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0
through type 3) on page 23.
tPostgresPlusSCDELT
tPostgresPlusSCDELT Properties
Advanced settings:
Encoding Type: Select the encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.
Enable debug mode: Select this check box to display each step while entries are processed in the database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is used as an output component. It requires an input component and a Row main link as input.
Related Scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0
through type 3) on page 23.
tPostgresqlSCD
tPostgresqlSCD Properties
Related scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0
through type 3) on page 23.
tPostgresqlSCDELT
tPostgresqlSCDELT Properties
Advanced settings:
Encoding Type: Select the encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.
Enable debug mode: Select this check box to display each step while entries are processed in the database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is used as an output component. It requires an input component and a Row main link as input.
Related Scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0
through type 3) on page 23.
tSPSSInput
Before you can benefit from all the functional capabilities of the SPSS components, make sure to do the following:
-If you have already installed SPSS, add the path to the SPSS directory to your PATH, for example: SET PATH=%PATH%;<DR>:\program\SPSS, or
-If you have not installed SPSS, copy the SPSS I/O library spssio32.dll from the SPSS installation CD and paste it into the Talend root directory.
tSPSSInput properties
The following scenario creates a two-component Job, which aims at reading each row of a .sav file
and displaying the output on the log console.
• Drop a tSPSSInput component and a tLogRow component from the Palette onto the design
workspace.
• Click tSPSSInput to display its Basic settings view and define the component properties.
• Click the three-dot button next to the Filename field and browse to the SPSS .sav file you
want to read.
• Click the three-dot button next to Sync schema. A message opens up prompting you to
accept retrieving the schema from the defined SPSS file.
• Click Yes to close the message and proceed to the next step.
• If required, click the three-dot button next to Edit schema to view the pre-defined data
structure of the source SPSS file.
tSPSSOutput
Before using the SPSS components, complete the setup steps described under tSPSSInput.
tSPSSOutput properties
This Java scenario describes a very simple Job that writes data entries to a .sav file.
• Drop a tRowGenerator component and a tSPSSOutput component from the Palette onto
the design workspace.
• Right-click tRowGenerator and connect it to tSPSSOutput using a Row Main link.
• In the design workspace, double-click tRowGenerator to display its Basic settings view and open its editor. Here you can define your schema.
• Click the plus button to add the columns you want to write in the .sav file.
• Define the schema and set the parameters to the columns.
Make sure to define the length of your columns. Otherwise, an error message will display when you build your Job.
• Click the three-dot button next to the Filename field and browse to the SPSS .sav file in
which you want to write data.
• Click the three-dot button next to Sync columns to synchronize columns with the previous
component. In this example, the schema to be inserted in the .sav file consists of the two
columns: id and country.
• If required, click Edit schema to view/edit the defined schema.
• From the Write Type list, select Write or Append to either write the input data to the .sav file or append it to the end of the file.
• Save the Job and press F6 to execute it.
The data generated by the tRowGenerator component is written in the defined .sav file.
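The Write/Append distinction follows the usual file-open semantics. The sketch below illustrates that behavior on a plain text file using standard Java I/O; it does not produce the SPSS binary format:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteModeSketch {
    // "Write" truncates and replaces the file; "Append" adds to the end.
    static void write(Path p, String data, boolean append) throws IOException {
        if (append) {
            Files.writeString(p, data, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            Files.writeString(p, data);  // defaults to CREATE + TRUNCATE_EXISTING
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("demo", ".txt");
        write(p, "first", false);
        write(p, "second", false);               // Write mode: replaces content
        System.out.println(Files.readString(p)); // second
        write(p, "+more", true);                 // Append mode: adds to the end
        System.out.println(Files.readString(p)); // second+more
        Files.delete(p);
    }
}
```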
tSPSSProperties
Before using the SPSS components, complete the setup steps described under tSPSSInput.
tSPSSProperties properties
Related scenarios
tSPSSStructure
Before using the SPSS components, complete the setup steps described under tSPSSInput.
tSPSSStructure properties
Function: tSPSSStructure retrieves information about the variables inside .sav files.
Purpose: tSPSSStructure addresses variables inside .sav files. You can use this component in combination with tFileList to gather information about existing *.sav files, to further analyze or check the findings.
Basic settings:
Schema and Edit Schema: The schema metadata in this component is predefined and thus read-only. It is based on the internal SPSS convention. You can click Edit schema to view the predefined metadata.
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: Setting a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, and hence can be reused. Related topic: Setting a Repository schema in the Talend Open Studio User Guide.
Filename: Name or path of the .sav file to be processed.
Usage: Use this component as a start component. It needs an output flow.
Related scenarios
tSybaseSCD
tSybaseSCD properties
Function: tSybaseSCD reflects and tracks changes in a dedicated Sybase SCD table.
Purpose: tSybaseSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings:
Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the Repository file where the properties are stored. The following fields are pre-filled with the fetched data.
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined.
Related scenarios
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0
through type 3) on page 23.
tSybaseSCDELT
tSybaseSCDELT Properties
Related Scenario
For related topics, see tMysqlSCD on page 19 and Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 23.
This chapter details the major components that you can find in the Business group of the Palette of Talend Open Studio.
The Business component family groups connectors that cover specific business needs, such as reading and writing CRM or ERP types of databases, and reading from or writing to an SAP system.
Business components
tAlfrescoOutput
tAlfrescoOutput Properties
Installation procedure
To be able to use tAlfrescoOutput in Talend Open Studio, you first need to install the Alfresco server along with a few relevant resources.
The subsections below detail the prerequisites and the installation procedure.
Prerequisites
Start with the following operations:
• Download the file alfresco-community-tomcat-2.1.0.zip.
• Unzip the file into an installation folder, for example: C:\alfresco.
• Install JDK 1.5.0+.
• Update the JAVA_HOME environment variable so that it points to your JDK installation, for example: JAVA_HOME=C:\Program Files\Java\jdk1.5.0_16.
• From the installation folder (C:\alfresco), launch the Alfresco server using the script alf_start.bat.
Make sure that the Alfresco server is launched correctly before starting to use the tAlfrescoOutput component.
This Java scenario describes a two-component Job which aims at creating two document files, with the related metadata, in an Alfresco server, the Java-based Enterprise Content Management (ECM) system.
• Drop the tFileInputDelimited and tAlfrescoOutput components from the Palette onto the
design workspace.
• Connect the two components together using a Row Main link.
In this scenario, the delimited file provides the metadata and path of two documents we want to create in the Alfresco server. The input schema for the documents consists of four columns: file_name, destination_folder_name, source_path, and author.
The input schema of the delimited file is therefore as follows:
• In the Alfresco Server area, enter the Alfresco server URL and user authentication
information in the corresponding fields.
• In the TargetLocation area, either type in the base name of the location where to put the
documents on the server, or select the Map... check box and then, in the Column list, select
the target location column, destination_folder_name in this scenario.
When you type in the base name, make sure to use the double backslash (\\) escape
character.
• In the Document Mode list, select the mode you want to use for the created documents.
• In the Container Mode list, select the mode you want to use for the destination folder in
Alfresco.
• Click the Define Document Type three-dot button to open the tAlfrescoOutput editor.
• Click the Add button to browse and select the xml file that holds the metadata according to
which you want to save the documents in Alfresco.
All available aspects in the selected model file display in the Available Aspects list.
You can browse for this model folder locally or on the network. After defining the
aspects to use for the document to be created in Alfresco, this model folder is not
needed any more.
• If needed, select in the Available Aspects list the aspect(s) to be included in the metadata to
write in the Alfresco server. In this scenario we want the author name to be part of the
metadata registered in Alfresco.
• Click the drop-down arrow at the top of the editor to select from the list the type to give to
the created document in Alfresco, Content in this scenario.
All the defined aspects used to select the metadata to write in the Alfresco server display in
the Property Mapping list in the Basic Settings view of tAlfrescoOutput, three aspects in
this scenario, two basic for the Content type (content and name) and an additional one
(author).
• Click Sync columns to auto propagate all the columns of the delimited file.
• If needed, click Edit schema to view the output data structure of tAlfrescoOutput.
• Click the three-dot button next to the Result Log File Name field and browse to the file
where you want to save any logs after job execution.
• Save your Job, and press F6 to execute it.
The two documents are created in Alfresco using the metadata provided in the input schemas.
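The double backslash noted above follows Java string-literal rules: each backslash in a path must be written as \\ for the compiled string to contain a single backslash. A minimal sketch (the folder names are hypothetical, not taken from this scenario):

```java
public class TargetLocation {
    // Hypothetical base name for the target location; each backslash is
    // doubled in the literal so the compiled string holds a single one.
    public static String basePath() {
        return "company_home\\myFolder";
    }

    public static void main(String[] args) {
        System.out.println(basePath()); // prints company_home\myFolder
    }
}
```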
tCentricCRMInput
tCentricCRMInput Properties
Function Connects to a module of a Centric CRM database via the relevant webservice.
Purpose Allows you to extract data from a Centric CRM DB based on a query.
Basic settings CentricCRM URL Type in the webservice URL to connect to the
CentricCRM DB.
Module Select the relevant module in the list
Server Type in the IP address of the DB server.
UserID and Password Type in the Webservice user authentication data.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Click Edit Schema to make changes to the schema.
Note that if you make changes, the schema
automatically becomes built-in.
In this component the schema is related to the
Module selected.
Query condition Type in the query to select the data to be extracted.
Usage Usually used as a Start component. An output component is required.
Limitation n/a
Related Scenario
tCentricCRMOutput
tCentricCRMOutput Properties
Related Scenario
tHL7Input
tHL7Input Properties
Function tHL7Input reads an HL7 structured file and extracts data row by row.
Purpose Opens an HL7 structured file and reads it row by row to split rows up into
fields, then sends the fields, as defined in the schema, to the next component,
via a Row link.
Basic settings Property type Either Built-in or Repository:
Built-in: No property data stored centrally.
Repository: Select the Repository file where
properties are stored. The fields that come after are
pre-filled in using the fetched data.
Click this icon to open a connection wizard and store
the file connection parameters you set in the
component Basic settings view.
For more information about setting up and storing
file connection parameters, see Setting up a File XML
schema of Talend Open Studio User Guide.
Multi Schemas Editor The [Multi Schema Editor] helps to build and
configure the data flow in a multi-structure delimited
file to associate one schema per output.
Segment Lists Connection: The columns are automatically
retrieved from the input file. The column name is the
segment name.
Column Mapping: The mapping in this array is
retrieved from the mapping you have done in the
editor.
Not Validate HL7 Select this check box if you do not want to validate
Message HL7 messages.
Advanced settings Advanced separator Select this check box to modify the separators to be
(for numbers) used for the numbers. Either:
Thousands separator
or
Decimal separator
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage Usually used as a Start component. An output component is required.
Limitation n/a
This scenario describes a four-component Job which aims at retrieving information about patients
and events from an HL7 file.
• From the Palette, drag and drop a tHL7Input, and three tLogRow components onto the
design workspace.
• Double-click the tHL7Input in order to open its editor.
• In the File path field, click [Browse...], browse your directory and select your HL7 file.
• In the File Setting area, type in your segment Start character and your segment End
character.
• In the Schema View area, in the Segment(As Schema) list, select MSH.
• Drag and drop the MSH-3(1)[HD] and MSH-7(1)[TS] segments from the Message View to
the Schema View.
• In the Schema View area, in the Segment(As Schema) list, select EVN.
• Drag and drop the EVN-1(1)-1-1[ID] and EVN-2(1)-1-1[ST] segments from the Message
View to the Schema View.
• In the Schema View area, in the Segment(As Schema) list, select PID.
• Drag and drop the following segments from the Message View to the Schema View:
PID-1(1)-1-1[SI], PID-5(1)-1-1[ST], PID-5(1)-2-1[ST], PID-5(1)-3-1[ST],
PID-5(1)-4-1[ST], PID-5(1)-5-1[ST], PID-5(1)-7-1[ID].
• Click OK to close the editor.
• Link the tHL7Input to the three tLogRow, using respectively MSH, EVN and PID links.
• Save your Job and press F6 to execute it.
The console displays the three tLogRow tables, which return different types of information. The
first one gives the message header label and its date. The second table shows the information about
the patient. The third one displays the event ID and its date.
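The MSH addressing used in the steps above (MSH-3 for the sending application, MSH-7 for the message date/time) can be sketched by splitting a segment on the | field separator. This is an illustrative sketch, not the code tHL7Input generates; note that in the MSH segment the separator character itself counts as MSH-1, which shifts the split index by one:

```java
public class Hl7Sketch {
    // Return field MSH-n of an MSH segment. Because the field separator
    // character is itself MSH-1, field MSH-n sits at split index n - 1.
    public static String mshField(String msh, int n) {
        return msh.split("\\|")[n - 1];
    }

    public static void main(String[] args) {
        String msh = "MSH|^~\\&|HIS|RIH|EKG|EKG|200202150930||ADT^A01|MSG001|P|2.3";
        System.out.println(mshField(msh, 3)); // sending application
        System.out.println(mshField(msh, 7)); // message date/time
    }
}
```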
tMicrosoftCRMInput
tMicrosoftCRMInput Properties
Function Connects to an entity of a Microsoft CRM database via the relevant webservice.
Purpose Allows you to extract data from a Microsoft CRM DB based on conditions set on
specific columns.
Basic settings Property type Either Built-in or Repository:
Built-in: No property data stored centrally.
Repository: Select the Repository file where
properties are stored. The fields that come after are
pre-filled in using the fetched data.
Microsoft Webservice Type in the webservice URL to connect to the
URL MicrosoftCRM DB.
Organizename Enter the name of the user or organization, set by an
administrator, that needs to access the
MicrosoftCRM database.
Username and Type in the Webservice user authentication data.
Password
Domain Type in the domain name of the server on which
MicrosoftCRM is hosted.
Host Type in the IP address of Microsoft CRM database
server.
Port Listening port number of Microsoft CRM database
server.
Time out (seconds) Number of seconds for the port to listen before
closing.
Entity Select the relevant entity in the list.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Click Edit Schema to make changes to the schema.
Note that if you make changes, the schema
automatically becomes built-in.
In this component the schema is related to the
selected entity.
Logical operator used to combine conditions: In case you want to combine
the conditions you set on columns, select the combine mode
you want to use.
This scenario describes a four-component Job which aims at writing the data included in a delimited
input file in a custom entity in a MicrosoftCRM database. It then extracts specified rows to an output
file using the conditions set on certain input columns.
If you want to write in a CustomEntity in a Microsoft CRM database, make sure to name the columns in
accordance with the naming rule set by Microsoft, that is “new_columnname”, all in lower case.
• Drop the following components from the Palette to the design workspace:
tFileInputDelimited, tFileOutputDelimited, tMicrosoftCRMInput, and
tMicrosoftCRMOutput.
• Double-click tFileInputDelimited to display its Basic settings view and define its
properties.
• Set the Property Type to Repository if you have stored the input file properties centrally
in the Metadata node in the Repository tree view. Otherwise, select Built-In and fill the
fields that follow manually. In this example, property is set to Built-In.
• Click the three-dot button next to the File Name/Input Stream field and browse to the
delimited file that holds the input data. The input file in this example contains the following
columns: new_id, new_status, new_firstname, new_email, new_city, new_initial and
new_zipcode.
• In the Basic settings view, define the Row Separator that identifies the end of a row.
Then define the Field Separator used to delimit fields in a row.
• If needed, define the header, footer and limit number of processed rows in the corresponding
fields. In this example, the header, footer and limits are not set.
• Click Edit schema to open a dialog box where you can define the input schema you want
to write in Microsoft CRM database.
• Enter the Microsoft Web Service URL as well as the user name and password in the
corresponding fields.
• In the OrganizeName field, enter the name that is given the right to access the Microsoft
CRM database.
• In the Domain field, enter the domain name of the server on which Microsoft CRM is
hosted, and then enter the host IP address and the listening port number in the corresponding
fields.
• In the Action list, select the operation you want to carry out. In this example, we want to insert
data in a custom entity in Microsoft CRM.
• In the Time out field, set the amount of time (in seconds) after which the Job will time out.
• In the Entity list, select one among those offered. In this example, CustomEntity is selected.
If CustomEntity is selected, a Custom Entity Name field displays where you
need to enter a name for the custom entity.
• The Schema is then automatically set according to the entity selected. If needed, click Edit
schema to display a dialog box where you can modify this schema and remove the columns
that you do not need in the output.
• Click Sync columns to retrieve the schema from the preceding component.
• Set the Property Type to Repository if you have stored the input file properties centrally
in the Metadata node in the Repository tree view. Otherwise, select Built-In and fill the
fields that follow manually. In this example, property is set to Built-In.
• Enter the Microsoft Web Service URL as well as the user name and password in the
corresponding fields.
• In the OrganizeName field, enter the name that is given the right to access the Microsoft
CRM database.
• In the Domain field, enter the domain name of the server on which Microsoft CRM is
hosted, and then enter the host IP address and the listening port number in the corresponding
fields.
• In the Time out field, set the amount of time (in seconds) after which the Job will time out.
• In the Entity list, select the one among those offered you want to connect to. In this example,
CustomEntity is selected.
• The Schema is then automatically set according to the entity selected. But you can modify
it according to your needs. In this example, you should set the schema manually since you
want to access a custom entity. Copy the seven-column schema from
tMicrosoftCRMOutput and paste it in the schema dialog box in tMicrosoftCRMInput.
• Click OK to close the dialog box. You will be prompted to propagate changes. Click Yes in
the popup message.
• In the Basic settings view, select And or Or as the logical operator you want to use to
combine the conditions you set on the input columns. In this example, we want to set two
conditions on two different input columns and we use And as the logical operator.
• In the Condition area, click the plus button to add as many lines as needed, then click in
each line in the Input column list and select the column you want to set a condition on. In this
example, we want to set conditions on two columns, new_city and new_id. We want to extract
all customer rows whose city is equal to “New York” and whose id is greater than 2.
• Click in each line in the Operator list and select the operator to bind the input column with
its value, in this example Equal is selected for new_city and Greater Than for new_id.
• Click in each line in the Value list and set the column value, New York for new_city and
2 for new_id in this example. You can use a fixed or a context value in this field.
• Double-click tFileOutputDelimited to display the component Basic settings view and
define its properties.
• Set Property Type to Built-In and then click the three-dot button next to the File Name field
and browse to the output file.
• Set row and field separators in the corresponding fields.
• Select the Append check box if you want to add the new rows at the end of the records.
• Select the Include Header check box if the output file includes a header.
• Click Sync columns to retrieve the schema from the preceding component.
• Save the Job and press F6 to execute it.
Only customers who live in New York City and whose “id” is greater than 2 are listed
in the output file you stored locally.
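The filter applied by this Job (new_city Equal “New York” And new_id Greater Than 2) can be sketched in plain Java as follows; the Row class is a hypothetical stand-in for the CRM rows, not a Talend type:

```java
import java.util.ArrayList;
import java.util.List;

public class CrmConditionSketch {
    public static class Row {
        public final int id;       // stand-in for the new_id column
        public final String city;  // stand-in for the new_city column
        public Row(int id, String city) { this.id = id; this.city = city; }
    }

    // Keep only rows matching both conditions, combined with And:
    // city Equal "New York" AND id Greater Than 2.
    public static List<Row> filter(List<Row> rows) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.city.equals("New York") && r.id > 2) {
                out.add(r);
            }
        }
        return out;
    }
}
```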
tMicrosoftCRMOutput
tMicrosoftCRMOutput Properties
Function Writes in an entity of a Microsoft CRM database via the relevant webservice.
Purpose Allows you to write data into a Microsoft CRM DB.
Basic settings Microsoft Webservice Type in the webservice URL to connect to the
URL Microsoft CRM DB.
Organizename Enter the name of the organization that needs to
access the MicrosoftCRM database
Username and Type in the Webservice user authentication data.
Password
Domain Type in the domain name of the server on which
Microsoft CRM is installed.
Host Type in the IP address of Microsoft CRM database
server.
Port Listening port number of Microsoft CRM database
server.
Action Select in the list the action you want to do on the
CRM data. Available actions are: insert, update, and
delete.
Time out (seconds) Number of seconds for the port to listen before
closing.
Entity Select the relevant entity in the list.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Click Edit Schema to make changes to the schema.
Note that if you make changes, the schema
automatically becomes built-in.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage Used as an output component. An Input component is required.
Limitation n/a
Related Scenario
For a related use case, see Scenario: Writing data in a Microsoft CRM database and putting
conditions on columns to extract specified rows on page 84.
tMSAXInput
tMSAXInput properties
Related scenarios
tMSAXOutput
tMSAXOutput properties
Before being able to use this component, make sure that you install and launch the MicrosoftAX server
correctly.
This Java scenario describes a two-component Job that uses tMSAXOutput to insert four columns
in a defined table in a MicrosoftAX server after it alters values in one of the inserted columns.
• Drop tFixedFlowInput and tMSAXOutput from the Palette to the design workspace.
• Connect the two components together using a Row Main link.
• Double click tFixedFlowInput to display its Basic settings view and define the component
properties.
• Set Schema type to Built-in and click the three-dot button next to Edit schema to display
a dialog box where you can define the input schema.
• Click the plus button and add the input schema columns, three in this example: name, city
and street.
• Click OK to close the dialog box and accept propagating the changes when prompted by the
system.
The three schema columns display automatically in the Values list.
• Click in the Value column and enter a value for each of the input columns.
• Double-click tMSAXOutput to open its Basic settings view and define the component
properties.
tMSAXOutput inserts into the ADDRESS table in the MicrosoftAX server a row that holds the three
input columns (name, city and street) in addition to a new address column that combines the city
name and the street name.
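The combination of the city and street values into the address column can be sketched as a simple concatenation; the hyphen separator is an assumption based on the example value “New York-Midtown Manhattan” shown in the next scenario:

```java
public class AddressConcat {
    // Combine the city and street column values into the new address
    // column; the "-" separator is assumed from the example value.
    public static String address(String city, String street) {
        return city + "-" + street;
    }
}
```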
Before being able to use this component, make sure that you install and launch the MicrosoftAX server
correctly.
This Java scenario describes a two-component Job that uses tMSAXOutput to delete from a
defined table in a MicrosoftAX server all rows that do not match the data included in a key column.
In this example, the input schema we use is an address column that holds the following data: New
York-Midtown Manhattan. We want to delete from the MicrosoftAX server all addresses that are not
identical with this one.
• Drop tFixedFlowInput and tMSAXOutput from the Palette to the design workspace.
• Connect the two components together using a Row Main link.
• Double click tFixedFlowInput to display its Basic settings view and define the component
properties.
• Set Schema type to Built-in and click the three-dot button next to Edit schema to display
a dialog box where you can define the input schema.
• Click the plus button and add the input schema columns, address in this example.
• Click OK to close the dialog box. The schema column displays automatically in the Values
list.
• Click in the Value column and enter a value for the input column.
• Double-click tMSAXOutput to open its Basic settings view and define the component
properties.
• In the output schema, select the Key check box next to the column name you want to define
as a key column.
When you select Delete as an action on data, you must always define the Reference
column as a key column in order for tMSAXOutput to delete rows based on this key
column.
tOpenbravoERPInput
tOpenbravoERPInput properties
Related Scenario
For a scenario in which tOpenbravoERPInput might be used, see Scenario: Writing data in a
Microsoft CRM database and putting conditions on columns to extract specified rows on page 84.
tOpenbravoERPOutput
tOpenbravoERPOutput properties
Related scenario
For a scenario in which tOpenbravoERPOutput may be used, see Scenario: Writing data in a
Microsoft CRM database and putting conditions on columns to extract specified rows on page 84.
tSalesforceBulkExec
tSalesforceBulkExec Properties
Related Scenario:
For a related scenario, see Scenario: Inserting transformed bulk data into your Salesforce.com on
page 121.
tSalesforceConnection
tSalesforceConnection properties
Related scenario
tSalesforceGetDeleted
tSalesforceGetDeleted properties
Start Date Type in between double quotes the date at which you
want to start the search. Use the following date
format: “yyyy-MM-dd HH:mm:ss”.
You can do the search only on the past 30
days.
End Date Type in between double quotes the date at which you
want to end the search. Use the following date
format: “yyyy-MM-dd HH:mm:ss”.
Advanced settings Use Soap Select this check box to activate the SOAP
Compression compression.
The compression of SOAP messages optimizes
system performance.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage You can use this component as a start component. tSalesforceGetDeleted
requires an output component.
Limitation n/a
This scenario describes a two-component Job that collects the deleted data over the past 5 days from
the Salesforce server.
• Drop tSalesforceGetDeleted and tLogRow from the Palette onto the design workspace.
• Connect the two components together using a Row > Main link.
• Double-click tSalesforceGetDeleted to display its Basic settings view and define the
component properties.
• In the Salesforce WebService URL field, use the default URL of the Salesforce Web
service or enter the URL you want to access.
• In the Username and Password fields, enter your login and password for the Web service.
• From the Module list, select the object you want to access, Account in this example.
• From the Schema list, select Repository and then click the three-dot button to open a dialog
box where you can select the repository schema you want to use for this component. If you
have not defined your schema locally in the metadata, select Built-in from the Schema list
and then click the three-dot button next to the Edit schema field to open the dialog box
where you can set the schema manually.
• In the Start Date and End Date fields, enter respectively the start and end dates for
collecting the deleted data using the following date format: “yyyy-MM-dd HH:mm:ss”. You
can collect deleted data over the past 30 days. In this example, we want to recover deleted
data over the past 5 days.
• Double-click tLogRow to display its Basic settings view and define the component
properties.
• Click Sync columns to retrieve the schema from the preceding component.
• In the Mode area, select Vertical to display the results in a tabular form on the console.
• Save your Job and press F6 to execute it.
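The Start Date value for “the past 5 days” can be computed with the required “yyyy-MM-dd HH:mm:ss” pattern as sketched below; this is an illustrative helper, not part of the component:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class DeletedDateRange {
    // The exact pattern expected by the Start Date / End Date fields.
    static final SimpleDateFormat FMT =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    // Format the date that lies `days` days before the given calendar
    // date, e.g. 5 days ago for this scenario's Start Date.
    public static String daysAgo(Calendar now, int days) {
        Calendar c = (Calendar) now.clone();
        c.add(Calendar.DAY_OF_MONTH, -days);
        return FMT.format(c.getTime());
    }
}
```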
tSalesforceGetServerTimestamp
tSalesforceGetServerTimestamp properties
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage You can use this component as a start component.
tSalesforceGetServerTimestamp requires an output component.
Limitation n/a
Related scenarios
tSalesforceGetUpdated
tSalesforceGetUpdated properties
Start Date Type in between double quotes the date at which you
want to start the search. Use the following date
format: “yyyy-MM-dd HH:mm:ss”.
You can do the search only on the past 30
days.
End Date Type in between double quotes the date at which you
want to end the search. Use the following date
format: “yyyy-MM-dd HH:mm:ss”.
Advanced settings Use Soap Select this check box to activate the SOAP
Compression compression.
The compression of SOAP messages optimizes
system performance.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage You can use this component as a start component. tSalesforceGetUpdated
requires an output component.
Limitation n/a
Related scenarios
tSalesforceInput
tSalesforceInput Properties
Related scenario
For a related scenario, see Scenario: Deleting data from the Account object on page 118.
The operation is similar to the connection to SugarCRM; therefore, see the scenario of tSugarCRMInput
on page 149 for more information.
tSalesforceOutput
tSalesforceOutput Properties
This scenario describes a two-component Job that removes an entry from the Account object.
• Drop tSalesforceInput and tSalesforceOutput from the Palette onto the design workspace.
• Connect the two components together using a Row > Main link.
• Double-click tSalesforceInput to display its Basic settings view and define the component
properties.
• From the Property Type list, select Repository if you have already stored the connection
to the salesforce server in the Metadata node of the Repository tree view. The property
fields that follow are automatically filled in. If you have not defined the server connection
locally in the Repository, fill in the details manually after selecting Built-in from the
Property Type list.
For more information about metadata, see Managing Metadata.
• In the Salesforce WebService URL field, use the default URL of the Salesforce Web
service, enter the URL you want to access, or select the Use an existing connection check
box to use an established connection.
• In the Username and Password fields, enter your login and password for the Web service.
• Type in your intended query timeout in the Timeout (milliseconds) field. In this example,
use the default number.
• From the Module list, select the object you want to access, Account in this example.
• From the Schema list, select Repository and then click the three-dot button to open a dialog
box where you can select the repository schema you want to use for this component. If you
have not defined your schema locally in the metadata, select Built-in from the Schema list
and then click the three-dot button next to the Edit schema field to open the dialog box
where you can set the schema manually.
• In the Query Condition field, enter the query you want to apply. In this example, we want to
retrieve the clients whose names are sForce. To do this, we use the query:
“name=’sForce’”.
• For a more advanced query, select the Manual input of SOQL query check box and enter the
query manually.
• Double-click tSalesforceOutput to display its Basic settings view and define the component
properties.
• In the Salesforce WebService URL field, use the default URL of the Salesforce Web
service or enter the URL you want to access.
• In the Username and Password fields, enter your login and password for the Web service.
• Type in your intended query timeout in the Timeout (milliseconds) field. In this example,
use the default number.
• From the Action list, select the operation you want to carry out. In this example we select
Delete to delete the sForce account selected in the previous component.
• From the Module list, select the object you want to access, Account in this example.
• Click Sync columns to retrieve the schema of the preceding component.
• Save your Job and press F6 to execute it.
Check the content of the Account object and verify that the sForce account(s) is/are deleted from
the server.
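The Query Condition entered above (“name='sForce'”) acts as the WHERE clause of the SOQL statement the component issues. A sketch of how such a query could be assembled follows; the column list is hypothetical, not taken from this scenario:

```java
public class SoqlSketch {
    // Assemble a SOQL query from a column list, a module (object) name
    // and a condition, mirroring the Query Condition "name='sForce'"
    // set on the Account module in this scenario.
    public static String query(String columns, String module, String condition) {
        return "SELECT " + columns + " FROM " + module + " WHERE " + condition;
    }
}
```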
tSalesforceOutputBulk
tSalesforceOutputBulk Properties
This scenario describes a six-component Job that transforms .csv data to make it suitable for bulk
processing, loads it into Salesforce.com and then displays the Job execution results in the console.
This Job is composed of two steps: preparing data by transformation and processing the transformed
data.
Before starting this scenario, you need to prepare the input file which offers the data to be processed
by the Job. In this use case, this file is sforcebulk.txt, containing some customer information.
Then to create and execute this Job, operate as follows:
• Drop tFileInputDelimited, tMap, tSalesforceOutputBulk, tSalesforceBulkExec and
tLogRow from the Palette onto the workspace of your studio.
• Use a Row > Main link to connect tFileInputDelimited to tMap, and a Row > out1 link from
tMap to tSalesforceOutputBulk.
• Use Row > Main and Row > Reject links to connect tSalesforceBulkExec respectively to the
two tLogRow components.
• Use a Trigger > OnSubjobOk link to connect tFileInputDelimited and
tSalesforceBulkExec.
• Double-click tFileInputDelimited to display its Basic settings view and define the
component properties.
• From the Property Type list, select Repository if you have already stored the connection
to the salesforce server in the Metadata node of the Repository tree view. The property
fields that follow are automatically filled in. If you have not defined the server connection
locally in the Repository, fill in the details manually after selecting Built-in from the
Property Type list.
For more information about metadata, see Managing Metadata.
• Next to the File name/Stream field, click the button to browse to the input file you
prepared for the scenario, for example, sforcebulk.txt.
• From the Schema list, select Repository and then click the three-dot button to open a dialog
box where you can select the repository schema you want to use for this component. If you
have not defined your schema locally in the metadata, select Built-in from the Schema list
and then click the three-dot button next to the Edit schema field to open the dialog box
where you can set the schema manually. In this scenario, the schema is made of four
columns: Name, ParentId, Phone and Fax.
• According to the input file to be used by the Job, set the other fields, such as Row Separator
and Field Separator.
• Double-click the tMap component to open its editor and set the transformation.
• Drop all columns from the input table to the output table.
• In the File Name field, type in or browse to the directory where you want to store the
generated .csv data for bulk processing.
• Click Sync columns to import the schema from its preceding component.
• Double-click tSalesforceBulkExec to display its Basic settings view and define the
component properties.
• Use the default URL of the Salesforce Web service or enter the URL you want to access.
• In the Username and Password fields, enter your login and password for the Web service.
• In the Bulk file path field, browse to the directory where the .csv file generated by
tSalesforceOutputBulk is stored.
• From the Action list, select the action you want to carry out on the prepared bulk data. In this
use case, insert.
• From the Module list, select the object you want to access, Account in this example.
• From the Schema list, select Repository and then click the three-dot button to open a dialog
box where you can select the repository schema you want to use for this component. If you
have not defined your schema locally in the metadata, select Built-in from the Schema list
and then click the three-dot button next to the Edit schema field to open the dialog box
where you can set the schema manually. In this example, edit it conforming to the schema
defined previously.
• Double-click tLogRow_1 to display its Basic settings view and define the component
properties.
• Click Sync columns to retrieve the schema from the preceding component.
• Select Table mode to display the execution result.
• Do the same with tLogRow_2.
• Save your Job and press F6 to execute it.
On the console of the Run view, you can check the execution result.
In the tLogRow_1 table, you can read the data inserted into your Salesforce.com.
In the tLogRow_2 table, you can read the rejected data due to the incompatibility with the Account
objects you have accessed.
All the customer names are written in upper case.
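The upper-case result above comes from the transformation set in the tMap output expression, typically of the form row1.Name.toUpperCase(). A minimal sketch of that expression's effect (the row and column names are assumptions, not confirmed by this scenario):

```java
public class UpperCaseMap {
    // Sketch of the tMap output expression that upper-cases the Name
    // column, e.g. row1.Name.toUpperCase() in the expression editor.
    public static String transform(String name) {
        return name.toUpperCase();
    }
}
```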
tSalesforceOutputBulkExec
tSalesforceOutputBulkExec Properties
This scenario describes a four-component Job that submits bulk data to Salesforce.com, executes
your intended actions on the data, and then displays the Job execution results for your
reference.
Before starting this scenario, you need to prepare the input file which offers the data to be processed
by the Job. In this use case, this file is sforcebulk.txt, containing some customer information.
Then to create and execute this Job, operate as follows:
• From the Property Type list, select Repository if you have already stored the connection
to the salesforce server in the Metadata node of the Repository tree view. The property
fields that follow are automatically filled in. If you have not defined the server connection
locally in the Repository, fill in the details manually after selecting Built-in from the
Property Type list.
For more information about metadata, see Managing Metadata.
• Next to the File name/Stream field, click the button to browse to the input file you
prepared for the scenario, for example, sforcebulk.txt.
• From the Schema list, select Repository and then click the three-dot button to open a dialog
box where you can select the repository schema you want to use for this component. If you
have not defined your schema locally in the metadata, select Built-in from the Schema list
and then click the three-dot button next to the Edit schema field to open the dialog box
where you can set the schema manually. In this scenario, the schema is made of four
columns: Name, ParentId, Phone and Fax.
• According to your input file to be used by the Job, set the other fields like Row Separator,
Field Separator...
• Double-click tSalesforceOutputBulkExec to display its Basic settings view and define the
component properties.
• In the Salesforce WebService URL field, use the default URL of the Salesforce Web service
or enter the URL you want to access.
• In the Username and Password fields, enter your login and password for the Web service.
• In the Bulk file path field, browse to the directory where you store the bulk .csv data to be
processed.
• From the Action list, select the action you want to carry out on the prepared bulk data. In this
use case, insert.
• From the Module list, select the object you want to access, Account in this example.
• From the Schema list, select Repository and then click the three-dot button to open a dialog
box where you can select the repository schema you want to use for this component. If you
have not defined your schema locally in the metadata, select Built-in from the Schema list
and then click the three-dot button next to the Edit schema field to open the dialog box
where you can set the schema manually. In this example, edit it conforming to the schema
defined previously.
• Double-click tLogRow_1 to display its Basic settings view and define the component
properties.
• Click Sync columns to retrieve the schema from the preceding component.
• Select Table mode to display the execution result.
In the tLogRow_1 table, you can read the data inserted into your Salesforce.com.
In the tLogRow_2 table, you can read the rejected data due to the incompatibility with the Account
objects you have accessed.
If you want to transform the input data before submitting it, you need to use
tSalesforceOutputBulk and tSalesforceBulkExec together to accomplish the whole Job. For
further information on the use of the two components, see Scenario: Inserting transformed bulk data
into your Salesforce.com on page 121.
tSAPCommit
tSAPCommit Properties
This component is closely related to tSAPConnection and tSAPRollback. It usually doesn’t make
much sense to use these components separately in a transaction.
Function Validates the data processed through the Job into the connected server.
Purpose Using a unique connection, this component commits a global transaction in
one go instead of doing that on every row or every batch, and thus provides a
performance gain.
Basic settings Component list Select the tSAPConnection component from the
list if more than one connection is planned for the current
Job.
Release Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue using the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with SAP components, especially with
tSAPConnection and tSAPRollback components.
Limitation n/a
Related scenario
This component is closely related to tSAPConnection and tSAPRollback. It usually doesn’t make
much sense to use one of these without using a tSAPConnection component to open a connection
for the current transaction.
For a tSAPCommit related scenario, see Scenario: Inserting data in mother/daughter tables on
page 460.
tSAPConnection
tSAPConnection properties
Function tSAPConnection opens a connection to the SAP system for the current
transaction.
Purpose tSAPConnection allows you to commit a whole Job's data in one go to the SAP
system as one transaction.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the Repository file where
Properties are stored. The fields that follow are
pre-filled in using fetched data.
Connection configuration Client: enter your usual SAP client.
Userid: enter the user login.
Password: enter the password.
Language: specify the language.
Host name: enter the IP address of the SAP system.
System number: enter the system number.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with other SAP components.
Limitation n/a
Related scenarios
For related scenarios, see Scenario 1: Retrieving metadata from the SAP system on page 134 and
Scenario 2: Reading data in the different schemas of the RFC_READ_TABLE function on page 140.
tSAPInput
tSAPInput Properties
Function tSAPInput connects to the SAP system using the system IP address.
Purpose tSAPInput allows you to extract data from an SAP system at any level by
calling RFC or BAPI functions.
Basic settings Property type Either Built-in or Repository:
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Click this icon to open a connection wizard and store
the connection parameters you set in the
component Basic settings view.
For more information about setting up and storing
file connection parameters, see Setting up a File XML
schema of Talend Open Studio User Guide.
Use an existing connection Select this check box and click the relevant
connection component in the Component list to
reuse the connection details you already defined.
Talend SAP components (tSAPInput and tSAPOutput) as well as the SAP wizard are based on a
library validated and provided by SAP (JCO) that allows the user to call functions and retrieve data
from the SAP system at Table, RFC, or BAPI level.
This scenario uses the SAP wizard that leads a user through dialog steps to create an SAP
connection and call RFC and BAPI functions. This SAP wizard is available only for
Talend Integration Suite users. If you are a user of Talend Open Studio or Talend
On Demand, you need to set the basic settings for the tSAPInput component manually.
This Java scenario uses the SAP wizard to first create a connection to the SAP system, and then call
a BAPI function to retrieve the details of a company from the SAP system. It finally displays in
Talend Open Studio the company details stored in the SAP system.
The figure below shows the company detail parameters stored in the SAP system that we want
to read in Talend Open Studio using the tSAPInput component.
• Create a connection to the SAP system using the SAP connection wizard, in this scenario the
SAP connection is called sap and is saved in the Metadata node.
• Call the BAPI function BAPI_COMPANY_GETDETAIL using the SAP wizard to access the
BAPI HTML document stored in the SAP system and see the company details.
• In the Name filter field, type in BAPI* and click the Search button to display all available
BAPI functions.
• Select BAPI_COMPANY_GETDETAIL to display the schema that describes the company
details.
The three-tab view to the right of the wizard displays the metadata of the
BAPI_COMPANY_GETDETAIL function and allows you to set the necessary parameters.
The Document view displays the SAP HTML document about the BAPI_COMPANY_GETDETAIL
function.
The Parameter view provides information about the input and output parameters required by the
BAPI_COMPANY_GETDETAIL function to return values.
• In the Parameter view, click the Input tab to list the input parameter(s). In this scenario,
there is only one input parameter required by BAPI_COMPANY_GETDETAIL and it is called
COMPANYID.
• In the Parameter view, click the Output tab to list the output parameters returned by
BAPI_COMPANY_GETDETAIL. In this scenario, there are two output parameters:
COMPANY_DETAIL and RETURN.
The Test it view allows you to add or delete input parameters according to the called function. In
this scenario, we want to retrieve the metadata of the COMPANY_DETAIL “structure” parameter
that consists of 14 “single” parameters.
• In the Value column of the COMPANYID line in the first table, enter “000001” to send back
company data corresponding to the value 000001.
• In the Output type list at the bottom of the wizard, select output.table.
• Click Launch at the bottom of the view to display the value of each “single” parameter
returned by the BAPI_COMPANY_GETDETAIL function.
• Click Finish to close the wizard and create the connection.
The sap connection and the new schema BAPI_COMPANY_GETDETAIL display under the SAP
Connections node in the Repository tree view.
To retrieve the different schemas of the BAPI_COMPANY_GETDETAIL function, do the
following:
• Right-click BAPI_COMPANY_GETDETAIL in the Repository tree view and select
Retrieve schema in the contextual menu.
• In the open dialog box, select the schemas you want to retrieve, COMPANY_DETAIL and
RETURN in this scenario.
• Click Next to display the two selected schemas and then Finish to close the dialog box.
The two schemas display under the BAPI_COMPANY_GETDETAIL function in the Repository
tree view.
To retrieve the company metadata that corresponds to the 000001 value and display it in Talend
Open Studio, do the following:
• In the Repository tree view, drop the SAP connection you already created to the design
workspace to open a dialog box where you can select tSAPConnection from the component
list and finally click OK to close the dialog box. The tSAPConnection component holding
the SAP connection, sap in this example, displays on the design workspace.
• Double-click tSAPConnection to display the Basic settings view and define the component
properties.
If you store connection details in the Metadata node in the Repository tree view, the
Repository mode is selected in the Property Type list and the fields that follow are
pre-filled. If not, you need to select Built-in as “property type” and fill in the
connection details manually.
• In the Repository tree-view, expand Metadata and sap in succession and drop
RFC_READ_TABLE to the design workspace to open a component list.
• Select tSAPInput from the component list and click OK.
• Drop tFilterColumns and tLogRow from the Palette to the design workspace.
• Connect tSAPConnection and tSAPInput using a Trigger > OnSubJobOk link.
• To connect tSAPInput and tLogRow, right-click tSAPInput and select Row >
row_COMPANY_DETAIL_1 and then click tLogRow.
• In the design workspace, double-click tSAPInput to display its Basic settings view and
define the component properties.
The basic setting parameters for the tSAPInput component display automatically since the schema
is stored in the Metadata node and the component is initialized by the SAP wizard.
• Select the Use an existing connection check box and then in the Component List, select
the relevant tSAPConnection component, sap in this scenario.
In the Initialize input area, we can see the input parameter needed by the
BAPI_COMPANY_GETDETAIL function.
In the Outputs area, we can see all different schemas of the BAPI_COMPANY_GETDETAIL
function, in particular, COMPANY_DETAIL that we want to output.
• In the design workspace, double-click tLogRow to display the Basic settings view and
define the component properties. For more information about this component, see tLogRow
on page 1023.
• Save your Job and press F6 to execute it.
tSAPInput retrieved from the SAP system the metadata of the COMPANY_DETAIL “structure”
parameter and tLogRow displayed the information on the console.
Talend SAP components (tSAPInput and tSAPOutput) as well as the SAP wizard are based on a
library validated and provided by SAP (JCO) that allows the user to call functions and retrieve data
from the SAP system at Table, RFC, or BAPI level.
This scenario uses the SAP wizard that leads a user through dialog steps to create an
SAP connection and call RFC and BAPI functions. This SAP wizard is available only
for Talend Integration Suite users. If you are a user of Talend Open Studio or
Talend On Demand, you need to set the basic settings for the tSAPInput component
manually.
This Java scenario uses the SAP wizard to first create a connection to the SAP system, and then call
an RFC function to read a table called SFLIGHT directly from the SAP system. It finally displays
in Talend Open Studio the structure of the SFLIGHT table stored in the SAP system.
• Create a connection to the SAP system using the SAP connection wizard, in this scenario the
SAP connection is called sap.
• Call the RFC_READ_TABLE RFC function using the SAP wizard to access the table in the
SAP system and see its structure.
• In the Name filter field, type in RFC* and click the Search button to display all available
RFC functions.
• Select RFC_READ_TABLE to display the schema that describes the table structure.
The three-tab view to the right of the wizard displays the metadata of the RFC_READ_TABLE
function and allows you to set the necessary parameters.
The Document view displays the SAP HTML document about the RFC_READ_TABLE function.
The Parameter view provides information about the parameters required by the
RFC_READ_TABLE function to return parameter values.
• In the Parameter view, click the Table tab to show a description of the structure of the
different tables of the RFC_READ_TABLE function.
The Test it view allows you to add or delete input parameters according to the called function. In
this example, we want to retrieve the structure of the SFLIGHT table and not any data.
• In the Value column of the DELIMITER line, enter “;” as field separator.
• In the Value column of the QUERY_TABLE line, enter SFLIGHT as the table to query.
• In the Output type list at the bottom of the view, select output.table.
• In the Constructure|Table list, select DATA.
• Click Launch at the bottom of the view to display the parameter values returned by the
RFC_READ_TABLE function. In this example, the delimiter is “;” and the table to read is
SFLIGHT.
In this example, we want to retrieve the data and column names of the SFLIGHT table and display
them in Talend Open Studio. To do that, proceed as follows:
• In the Repository tree view, drop the RFC_READ_TABLE function of the sap connection
to the design workspace to open a dialog box where you can select tSAPInput from the
component list and then click OK to close the dialog box. The tSAPInput component
displays on the design workspace.
• Drop two tLogRow components from the Palette to the design workspace.
To connect components together:
• Right-click tSAPInput, select Row > row_DATA_1, and click the first tLogRow
component.
• Right-click tSAPInput, select Row > row_FIELDS_1, and click the second tLogRow
component.
In this example, we want to retrieve the FIELDS and DATA schemas and put them in two different
output flows.
• In the design workspace, double-click tSAPInput to open the Basic settings view and
display the component properties.
The basic setting parameters for the tSAPInput component display automatically since the schema
is stored in the Metadata node and the component is initialized by the SAP wizard.
In the Initialize input area, we can see the input parameters necessary for the RFC_READ_TABLE
function, the field delimiter “;” and the table name “SFLIGHT”.
In the Outputs area, we can see the different schemas of the SFLIGHT table.
• In the design workspace, double-click each of the two tLogRow components to display the
Basic settings view and define the component properties. For more information on the
properties of tLogRow, see tLogRow on page 1023.
• Save your Job and press F6 to execute it.
The tSAPInput component retrieves from the SAP system the column names of the SFLIGHT table
as well as the corresponding data. The tLogRow components display the information in a tabular
form in the Console.
tSAPOutput
tSAPOutput Properties
Related scenario
For related scenarios, see Scenario 1: Retrieving metadata from the SAP system on page 134 and
Scenario 2: Reading data in the different schemas of the RFC_READ_TABLE function on page 140.
tSAPRollback
tSAPRollback properties
This component is closely related to tSAPCommit and tSAPConnection. It usually does not make
much sense to use these components separately in a transaction.
Related scenarios
For a tSAPRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter
tables on page 498 of tMysqlRollback.
tSugarCRMInput
tSugarCRMInput Properties
Function Connects to a Sugar CRM database module via the relevant webservice.
Purpose Allows you to extract data from a SugarCRM DB based on a query.
Basic settings SugarCRM Webservice URL Type in the webservice URL to
connect to the SugarCRM DB.
Module Select the relevant module from the list.
To use customized tables, select Use custom module from the list. The
Custom module package name and Custom module name fields which
appear are automatically filled in with the relevant names.
Username and Password Type in the Webservice user authentication data.
Schema and Edit Schema A schema is a row description, i.e., it defines the
number of fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the Repository.
Click Edit Schema to make changes to the schema. Note that if you make
changes, the schema automatically becomes built-in.
In this component the schema is related to the Module selected.
Query condition Type in the query to select the data to be extracted.
Example: account_name = ‘Talend’
Usage Usually used as a Start component. An output component is required.
Limitation n/a
This scenario describes a two-component Job which extracts account information from a
SugarCRM database and writes it to an Excel output file.
• The schema is then automatically set according to the module selected. But you can change
it and remove the columns that you do not require in the output.
• In the Query Condition field, type in the condition used to select the data to extract from
the CRM. In this example: “billing_address_city=’Sunnyvale’”
• Then select the tFileOutputExcel component.
• Set the destination file name as well as the Sheet name and select the Include header check
box.
• Save the Job and press F6 to run it.
The filtered data is output in the defined spreadsheet of the specified Excel type file.
tSugarCRMOutput
tSugarCRMOutput Properties
Function Writes in a Sugar CRM database module via the relevant webservice.
Purpose Allows you to write data into a SugarCRM DB.
Basic settings SugarCRM Webservice URL Type in the webservice URL to
connect to the SugarCRM DB.
Module Select the relevant module from the list.
To use customized tables, select Use custom module from the list. The
Custom module package name and Custom module name fields which
appear are automatically filled in with the relevant names.
Username and Password Type in the Webservice user authentication data.
Action Insert or Update the data in the SugarCRM module.
Schema and Edit Schema A schema is a row description, i.e., it defines the
number of fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the Repository.
Click Edit Schema to make changes to the schema. Note that if you make
changes, the schema automatically becomes built-in.
Click Sync columns to retrieve the schema from the previous component
connected in the Job.
Usage Used as an output component. An Input component is required.
Limitation n/a
Related Scenario
tVtigerCRMInput
tVtigerCRMInput Properties
Related Scenario
tVtigerCRMOutput
tVtigerCRMOutput Properties
Related Scenario
This chapter details the major components which belong to the Custom Code family in the Talend
Open Studio Palette.
The Custom Code components enable you to create codes for specific needs, quickly and
efficiently.
Custom Code components
tGroovy
tGroovy
tGroovy Properties
Function tGroovy allows you to enter customized code which you can
integrate into the Talend program. The code is run only once.
Purpose tGroovy broadens the functionality of the Talend Job, using the
Groovy language, which is a simplified Java syntax.
Basic settings Groovy Script Enter the Groovy code you want to run.
Variables This table has two columns.
Name: Name of the variable called in the code.
Value: Value associated with the variable.
Advanced settings tStatCatcher Select this check box to collect the log data at
Statistics component level.
Usage This component can be used alone or as a subjob along with one other
component.
Limitation Knowledge of the Groovy language is required.
Related Scenarios
• For a scenario using the Groovy code, see Scenario: Calling a file which contains Groovy
code on page 157.
• For a functional example, see Scenario: Printing out a variable content on page 159
tGroovyFile
tGroovyFile Properties
This scenario uses tGroovyFile on its own. The Job calls a file containing Groovy code in order to
display the file information in the Console. Below is an example of the information displayed:
• Open the Custom_Code folder in the Palette and drop a tGroovyFile component onto the
workspace.
• Double-click the component to display the Component view.
• In the Groovy File field, enter the path to the file containing the Groovy code, or browse to
the file in your directory.
• In the Variables table, add a line by clicking the [+] button.
• In the Name column, enter “age”, then in the Value column, enter 50, as in the screenshot.
tJava
tJava Properties
The following scenario is a simple demo of the extended application of the tJava component. The
Job aims at printing out the number of lines being processed using a Java command and the global
variable provided in Talend Open Studio.
• Select and drop the following components from the Palette onto the design workspace:
tFileInputDelimited, tFileOutputExcel, tJava.
• Connect the tFileInputDelimited to the tFileOutputExcel using a Row Main connection.
The content from a delimited txt file will be passed on through the connection to an xls-type
of file without further transformation.
• Then connect the tFileInputDelimited component to the tJava component using a Then
Run link. This link sets a sequence ordering tJava to be executed at the end of the main
process.
• Set the Basic settings of the tFileInputDelimited component. The input file used in this
example is a simple text file made of two columns: Names and their respective Emails.
• The schema has not been stored in the repository for this use case, therefore you need to
set the two-column schema manually.
• Click the Edit Schema button.
In this example, the Sheet name is Email and the Include Header box is selected.
• Then select the tJava component to set the Java command to execute.
The content gets passed on to the defined Excel file, and the number of lines processed is displayed
on the Run console.
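The command typed into the tJava Code field typically reads one of Talend's global variables, which follow the "componentName_NB_LINE" naming convention for row counters. The following standalone sketch mimics that contract outside the Studio; the map and the counter value here are stand-ins for what the generated Job code populates:

```java
import java.util.HashMap;
import java.util.Map;

public class NbLineDemo {
    // Builds the message the tJava code would print; the key follows Talend's
    // "<componentName>_NB_LINE" convention for row counters.
    static String report(Map<String, Object> globalMap) {
        return "Number of lines processed: "
                + (Integer) globalMap.get("tFileInputDelimited_1_NB_LINE");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // In a real Job this value is set by the generated subjob code.
        globalMap.put("tFileInputDelimited_1_NB_LINE", 2);
        System.out.println("==== " + report(globalMap) + " ====");
    }
}
```

In the Studio itself, you would simply press Ctrl+Space in the Code field to pick the variable from the list, as described above.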
tJavaFlex
tJavaFlex properties
This scenario describes a two-component Job that generates a three-line data flow describing
different personal titles (Miss, Mrs, and Mr) and displays them on the console.
• Drop tJavaFlex and tLogRow from the Palette onto the design workspace.
• Connect the components together using a Row Main link.
• Double-click tJavaFlex to display its Basic settings view and define its properties.
• From the Schema Type list, select Built-in and then click the three-dot button next to Edit
schema to open the corresponding dialog box where you can define the data structure to pass
to the component that follows.
• Click the plus button to add two columns: key and value and then set their types to Integer
and String respectively.
• Click OK to validate your changes and close the dialog box.
• In the Basic settings view of tJavaFlex, select the Data Auto Propagate check box to
automatically propagate data to the component that follows.
In this example, we do not want to do any transformation on the retrieved data.
• In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of tJavaFlex by displaying the START
message and sets up the loop and the variables to be used afterwards in the Java code:
System.out.println("## START\n#");
String [] valueArray = {"Miss", "Mrs", "Mr"};
• In the Main code field, enter the code you want to apply on each of the data rows.
In this example, we want to display each key with its value:
row1.key = i;
row1.value = valueArray[i];
In the Main code, “row1” corresponds to the name of the link that comes out of tJavaFlex. If you rename
this link, you have to modify the code of this field accordingly.
• In the End code field, enter the code that will be executed in the closing phase.
In this example, the brace (curly bracket) closes the loop and the code indicates the end of
the execution of tJavaFlex by displaying the END message:
}
System.out.println("#\n## END");
• If needed, double-click tLogRow and in its Basic settings view, click the button next to Edit
schema to make sure that the schema has been correctly propagated.
• Save your Job and press F6 to execute it.
The three personal titles are displayed on the console along with their corresponding keys.
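At run time, Talend concatenates the Start, Main, and End code fields into one generated method. The standalone sketch below shows the assembled result; note that the for line opening the loop is our assumption, since the guide only shows the closing brace in the End code:

```java
import java.util.ArrayList;
import java.util.List;

public class TJavaFlexSketch {
    // Assembles the Start, Main, and End code fragments the way the generated
    // Job code would; rows stand in for the row1.key / row1.value pairs.
    static List<String> run() {
        List<String> rows = new ArrayList<>();
        System.out.println("## START\n#");             // Start code
        String[] valueArray = {"Miss", "Mrs", "Mr"};
        for (int i = 0; i < valueArray.length; i++) {   // assumed loop opening
            rows.add(i + "|" + valueArray[i]);          // Main code per row
        }
        System.out.println("#\n## END");                // End code
        return rows;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```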
This scenario describes a two-component Job that generates random data and then collects that data
and does some transformation on it line by line using Java code through the tJavaFlex component.
• Drop tRowGenerator and tJavaFlex from the Palette onto the design workspace.
• Connect the components together using a Row Main link.
• Double-click tRowGenerator to display its Basic settings view and the [RowGenerator
Editor] dialog box where you can define the component properties.
• Click the plus button to add four columns: number, txt, date and flag.
• Define the schema and set the parameters of the four columns according to the screenshot above.
• In the Functions column, select the three-dot function (...) for each of the defined columns.
• In the Parameters column, enter 10 different parameters for each of the defined columns.
These 10 parameters correspond to the data that will be randomly generated when executing
tRowGenerator.
• Click OK to validate your changes and close the editor.
• Double-click tJavaFlex to display its Basic settings view and define the component
properties.
• Click Sync columns to retrieve the schema from the preceding component.
• In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of the tJavaFlex component by
displaying the START message and defining the variable to be used afterwards in the Java
code:
System.out.println("## START\n#");
int i = 0;
• In the Main code field, enter the code to be applied on each line of data.
In this example, we want to display, for each row: the row number starting from 0, the random
number, the random text transformed to upper case, and the random date set in the
tRowGenerator editor. Then we test the flag to display true or false,
and we increment the row number:
System.out.print(" row" + i + ":");
System.out.print("# number:" + row1.number);
System.out.print (" | txt:" + row1.txt.toUpperCase());
System.out.print(" | date:" + row1.date);
if(row1.flag) System.out.println(" | flag: true");
else System.out.println(" | flag: false");
i++;
In the Main code field, “row1” corresponds to the name of the link that comes out of tJavaFlex. If you
rename this link, you have to modify the code.
• In the End code field, enter the code that will be executed in the closing phase.
In this example, the code indicates the end of the execution of tJavaFlex by displaying the
END message:
System.out.println("#\n## END");
The console displays the randomly generated data that was modified by the java command set
through tJavaFlex.
tJavaRow
tJavaRow Properties
Function tJavaRow allows you to enter customized code which you can integrate
into a Talend program. With tJavaRow, you can enter the Java code to
be applied to each row of the flow.
Purpose tJavaRow allows you to broaden the functionality of Talend Jobs, using
the Java language.
Basic settings Schema and Edit Schema A schema is a row description, i.e., it
defines the number of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored in the
Repository.
Click Edit Schema to make changes to the schema. Note that if you make
changes, the schema automatically becomes built-in.
Built-in: The schema is created and stored locally for this component
only. Related topic: Setting a built-in schema in the Talend Open Studio
User Guide.
Repository: The schema already exists and is stored in the Repository,
hence can be reused. Related topic: Setting a Repository schema of
Talend Open Studio User Guide.
Code Enter the Java code to be applied to each line of the data flow.
Advanced settings Import Enter the Java code required to import, if
required, the external library used in the Main code field of the
Basic settings tab.
tStatCatcher Statistics Select this check box to collect the log data at
component level.
Usage This component is used as an intermediary between two other
components. It must be linked to both an input and an output
component.
Limitation Knowledge of the Java language is necessary.
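In the Code field, Talend generates input_row and output_row variables that match the schema. A minimal sketch of that per-row contract, with a hand-written class standing in for the generated row struct (the Name/Email schema here is an assumption for illustration):

```java
public class TJavaRowSketch {
    // Stand-in for the row struct Talend generates from the schema.
    static class Row {
        String name;
        String email;
    }

    // Equivalent of a Code field such as:
    //   output_row.name = input_row.name.toUpperCase();
    //   output_row.email = input_row.email;
    static Row process(Row inputRow) {
        Row outputRow = new Row();
        outputRow.name = inputRow.name.toUpperCase();
        outputRow.email = inputRow.email;
        return outputRow;
    }

    public static void main(String[] args) {
        Row in = new Row();
        in.name = "talend";
        in.email = "info@talend.com";
        Row out = process(in);
        System.out.println(out.name + ";" + out.email);
    }
}
```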
Related scenario
tLibraryLoad
tLibraryLoad Properties
This scenario uses two components, a tLibraryLoad and a tJava. The goal of this scenario is to
check the format of an e-mail address and verify whether the format is valid or not.
• In the Palette, open the Custom_Code folder, and drop a tLibraryLoad and a tJava
component onto the workspace.
• Connect tLibraryLoad to tJava using a Trigger > OnSubjobOk link.
• Double-click tLibraryLoad to display its Basic settings. From the Library list, select
jakarta-oro-2.0.8.jar.
• In the Import field of the Advanced settings tab, type import org.apache.oro.text.regex.*;
The Console displays the boolean false. Hence, the e-mail address is not valid, as its format is
incorrect.
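The scenario relies on the jakarta-oro library loaded by tLibraryLoad; the same kind of check can be sketched with the JDK's own java.util.regex instead. The pattern below is deliberately simple, for illustration only, and is not the expression used in the scenario:

```java
import java.util.regex.Pattern;

public class EmailCheck {
    // Simplistic e-mail shape: local part, one "@", domain, dot, TLD.
    private static final Pattern EMAIL =
            Pattern.compile("^[\\w.+-]+@[\\w-]+\\.[A-Za-z]{2,}$");

    static boolean isValid(String address) {
        return EMAIL.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("john.smith@example.com"));  // true
        System.out.println(isValid("john.smith@@example@com")); // false
    }
}
```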
tPerl
tPerl properties
This scenario is a three-component Job showing in the Log the number of rows being processed and
output to an Excel file.
• Drop three components from the Palette onto the design workspace: tFileInputDelimited,
tFileOutputExcel, and tPerl.
• Right-click tFileInputDelimited and connect it to tFileOutputExcel using a Row Main link.
• Right-click tFileInputDelimited again and link it to tPerl using a Trigger > OnSubjobOk
link. This link means that, following the arrow direction, the first component
(tFileInputDelimited) will run before the second component (tPerl).
• Click once on tFileInputDelimited and select the Basic settings tab to define the
component properties.
• The Properties are not reused from or for another job stored in the repository, but instead
are used for this job only. Therefore select Built-In in the drop-down list.
• Enter a path or browse to the file containing the data to be processed. In this example, the
text file contains a list of names with their corresponding email addresses.
• Define the Row and Field separators. In this scenario, there is one name and the matching
email per row, and the fields are separated by a semi-colon.
• The first row of the file contains the column labels, so it should be ignored in the Job:
set the Header field value to 1.
• There is neither a footer nor a limit value to be defined for this scenario.
• The Schema type is also built-in in this case. Click Edit Schema and describe the content
of the input file. In this scenario, there are two columns labelled Name and Email, of type
String and with no length defined. The Email column is the key field.
• Select the tFileOutputExcel component and define it accordingly.
• Select the output file path, Sheet and synchronize the schema.
• Then define the tPerl sub-job in order to get the number of rows transferred to the XML
Output file.
• Enter the Perl command print to get the variable containing the number of rows read in
tFileInputDelimited. To access the list of available variables, press Ctrl+Space then select
the relevant variable in the list.
• For a better readability in the Run Job log, add equal signs before and after the commands.
Note also that commands, strings and variables are colored differently.
• Then switch to the Run Job tab and execute the job.
The Job runs smoothly and creates an output file following the two-field schema defined: Name
and Email.
This chapter details the main components that you can find in the Data Quality family of the
Talend Open Studio Palette.
The Data Quality family comprises dedicated components that help you improve the quality of your
data. These components cover various needs, such as filtering rows down to unique entries,
calculating CRC values, finding data based on fuzzy matching, and so on.
Data Quality components
tAddCRCRow
tAddCRCRow properties
This scenario describes a Job adding a surrogate key to a delimited file schema.
• In the tFileInputDelimited Component view, set the File Name path and all related
properties in case these are not stored in the Repository.
• Create the schema through the Edit Schema button, in case the schema is not already stored
in the Repository. In Java, pay attention to the data type column; if a Date pattern is to be
filled in, check out http://java.sun.com/j2se/1.5.0/docs/api/index.html.
• In the tAddCRCRow Component view, select the check boxes of the input flow columns
to be used to calculate the CRC.
• Notice that a CRC column (read-only) has been added at the end of the schema.
• Select CRC32 as CRC Type to get a longer surrogate key.
• In the Basic settings view of tLogRow, select the Print values in cells of a table check box
to display the output data in a table on the Console.
• Then save your Job and press F6 to execute it.
An additional CRC column has been added to the schema, calculated on all previously selected
columns (in this case, all columns of the schema).
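As a rough sketch of the value the component appends, a CRC32 surrogate key can be reproduced with the standard java.util.zip.CRC32 class (the helper and the sample column values below are hypothetical, not Talend's actual implementation):

```java
import java.util.zip.CRC32;

public class CrcRowDemo {
    // Compute a CRC32 checksum over the selected column values of one
    // row, similar to the CRC column tAddCRCRow appends to the schema.
    public static long crcOfRow(String... columns) {
        CRC32 crc = new CRC32();
        for (String column : columns) {
            crc.update(column.getBytes());
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical row: the same input always yields the same key.
        System.out.println(crcOfRow("John", "Smith", "john@company.com"));
    }
}
```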
tExtractRegexFields
tExtractRegexFields belongs to two component families: Data Quality and Processing. For more
information on tExtractRegexFields, see tExtractRegexFields on page 1128.
tFuzzyMatch
tFuzzyMatch properties
Function Compares a column from the main flow with a reference column from
the lookup flow and outputs the main flow data displaying the distance.
Purpose Helps ensure the data quality of any source data against a reference
data source.
Basic settings Schema and Edit Schema A schema is a row description, i.e., it defines the
number of fields that will be processed and passed on to the next component. The schema is
either built-in or remote in the Repository.
Two read-only columns, Value and Match, are added to the output schema automatically.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Matching type Select the relevant matching algorithm among:
Levenshtein: Based on the edit distance theory. It
calculates the number of insertion, deletion or
substitution required for an entry to match the
reference entry.
Metaphone: Based on a phonetic algorithm for
indexing entries by their pronunciation. It first
loads the phonetics of all entries of the lookup
reference and checks all entries of the main flow
against the entries of the reference flow.
Double Metaphone: a new version of the
Metaphone phonetic algorithm, that produces
more accurate results than the original algorithm.
It can return both a primary and a secondary code
for a string. This accounts for some ambiguous
cases as well as for multiple variants of surnames
with common ancestry.
Min Distance (Levenshtein only) Set the minimum number of
changes allowed to match the reference. If set to
0, only perfect matches are returned.
Max Distance (Levenshtein only) Set the maximum number of
changes allowed to match the reference.
Matching Column Select the column of the main flow that needs to
be checked against the reference (lookup) key
column
Unique Matching Select this check box if you want to get the best
match possible, in case several matches are
available.
Matching item separator In case several matches are available, all of them
are displayed unless the unique match box is selected. Define the delimiter between all
matches.
Usage This component is not startable (green background) and it requires two
input components and an output component.
Limitation/prerequisite Perl users: Make sure the relevant packages are installed. Check the
Module view for the modules to be installed.
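The Levenshtein distance described above can be sketched with the classic dynamic-programming algorithm; the matches method mimics the Min/Max Distance filter. This is an illustration of the principle, not Talend's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class FuzzySketch {
    // Edit distance: number of insertions, deletions or substitutions
    // required to turn a into b.
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Keep the reference entries whose distance to the input lies
    // within [min, max], as the Min/Max Distance settings do.
    public static List<String> matches(String input, List<String> reference,
                                       int min, int max) {
        List<String> out = new ArrayList<>();
        for (String ref : reference) {
            int d = distance(input, ref);
            if (d >= min && d <= max) out.add(ref);
        }
        return out;
    }
}
```

With min and max both set to 0, only exact matches survive, which is exactly the behavior described in the first scenario below.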
This scenario describes a four-component Job aiming at checking the edit distance between the First
Name column of an input file and the data of a reference input file. The output of this Levenshtein-type
check is displayed along with the content of the main flow in a table.
• Drag and drop the following components from the Palette to the design workspace:
tFileInputDelimited (x2), tFuzzyMatch, tFileOutputDelimited.
• Define the first tFileInputDelimited Basic settings. Browse the system to the input file to
be analyzed and most importantly set the schema to be used for the flow to be checked.
• In the schema, set the Type of data in the Java version, especially if you are in Built-in mode.
• Link the defined input to the tFuzzyMatch using a Main row link.
• Define the second tFileInputDelimited component the same way.
Make sure the reference column is set as key column in the schema of the lookup flow.
• Then connect the second input component to the tFuzzyMatch using a main row (which
displays as a Lookup row on the design workspace).
• Select the tFuzzyMatch Basic settings.
• The Schema should match the Main input flow schema in order for the main flow to be
checked against the reference.
• Note that two columns, Value and Matching, are added to the output schema. These are
standard matching information and are read-only.
• Select the method to be used to check the incoming data. In this scenario, Levenshtein is the
Matching type to be used.
• Then set the distance. In this method, the distance is the number of char changes (insertion,
deletion or substitution) that needs to be carried out in order for the entry to fully match the
reference.
• In this use case, we want the distance to be 0 for both the minimum and the maximum. This
means only the exact matches will be output.
• Also, clear the Case sensitive check box.
• And select the column of the main flow schema that is to be checked. In this example, the first
name.
• There is no need to select the Unique matching check box nor, hence, to define the separator.
• Link tFuzzyMatch to the standard output tLogRow. No parameter other than the
display delimiter needs to be set for this scenario.
As the edit distance has been set to 0 (min and max), the output shows the result of a regular join
between the main flow and the lookup (reference) flow, hence only full matches with Value of 0 are
displayed.
A more obvious example is with a minimum distance of 1 and a max. distance of 2, see Scenario 2:
Levenshtein distance of 1 or 2 in first names on page 183.
This scenario is based on the scenario 1 described above. Only the min and max distance settings
in tFuzzyMatch component get modified, which will change the output displayed.
• In the Component view of the tFuzzyMatch, change the min distance from 0 to 1. This
excludes straight away the exact matches (which would show a distance of 0).
• Change also the max distance to 2 as the max distance cannot be lower than the min distance.
The output will provide all matching entries showing a discrepancy of 2 characters at most.
As the edit distance has been set to 2, some entries of the main flow match several reference entries.
You can also use another method, Metaphone, to assess the distance between the main flow and
the reference.
• Change the Matching type to Metaphone. There is no min nor max distance to set as the
matching method is based on the discrepancies with the phonetics of the reference.
• Save the Job and press F6. The phonetic value is displayed along with the possible matches.
tIntervalMatch
tIntervalMatch properties
As the Perl properties are quite different from the Java properties, they are described in a separate
table below.
Basic settings Schema and Edit Schema A schema is a row description, i.e., it defines the
number of fields that will be processed and passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job designs. Related
topic: Setting a Repository schema of Talend
Open Studio User Guide.
Perl only File Name Enter the file that contains the range of values. It
functions as a lookup flow.
Field separator Character, string or regular expression to separate
fields in the lookup file.
Row separator String (ex: “\n” on Unix) to distinguish rows in the
lookup file.
Lookup column index Position of the min column in the lookup file: 0
for the first column, 1 for the second column, etc.
Make sure the interval min and max columns are
adjacent.
Search column Select the main flow column containing the
values to be matched to a range of values.
Usage This component handles a flow of data; it therefore requires input and
output components, and is defined as an intermediary step.
Limitation For the time being, the Perl version of tIntervalMatch does not
accept a real lookup flow (but only a reference file in the actual
component’s settings).
The following scenario describes a Job designed in parallel in both languages, Perl and Java. In this
Job, an incoming main flow provides two columns: Documents and IP dummy values. A second file,
used as a lookup flow in Java and as a reference range file in Perl, contains a list of sorted IP ranges and
their corresponding countries. This Job aims at retrieving each document’s country from its IP
value, in other words, creating a join between the main flow and the lookup flow.
In Java, the Job requires one extra tFileInputDelimited, a tIntervalMatch and a tLogRow.
• Drop the components onto the design workspace.
• Set the basic settings of the tFileInputDelimited component.
• (Both Java and Perl) Note that the output schema from the tIntervalMatch component is
read-only and is made of the input schema plus an extra Lookup column which will output
the requested lookup data.
• Set the other properties of the tIntervalMatch component.
• (Perl only) The lookup file is defined directly in the settings of tIntervalMatch.
• (Perl only) In the File Name field, set the path to the lookup file. Set the Row and Field
separators of the lookup file.
• (Perl only) In the Lookup column index field, set the inferior bound of the data range. This
corresponds to the position of the column containing the min value of the range (0 for the
first column).
• (Perl only) There is no need to set the lookup values to be returned, as all values from the
lookup will be output.
• (Java only) Set the other tIntervalMatch properties, such as the min and max columns
corresponding to the range bounds.
• (Java only) In the Column Lookup field, select the column containing the values to be
returned.
• (Both Java and Perl) In the Search column field, select the main flow column containing the
values to be matched to the range values.
• (Both Java and Perl) The tLogRow component does not require any specific setting for this
Job.
Both Perl and Java Jobs output the same result with slight differences in the way they display.
The Perl results include the range values whereas the Java output only includes the requested return
value (country).
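The range lookup the Job performs can be sketched with a TreeMap keyed on each range's lower bound (the class, the ranges and the country values below are made up for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

public class IntervalMatchSketch {
    // min bound -> (max bound, country); ranges are assumed sorted and
    // non-overlapping, as in the scenario's IP range file.
    private final TreeMap<Long, Map.Entry<Long, String>> ranges = new TreeMap<>();

    public void add(long min, long max, String country) {
        ranges.put(min, Map.entry(max, country));
    }

    // Return the lookup value for the range containing the given value,
    // or null when no range matches.
    public String lookup(long value) {
        Map.Entry<Long, Map.Entry<Long, String>> e = ranges.floorEntry(value);
        if (e == null || value > e.getValue().getKey()) return null;
        return e.getValue().getValue();
    }

    public static void main(String[] args) {
        IntervalMatchSketch m = new IntervalMatchSketch();
        m.add(0L, 99L, "France");       // hypothetical IP ranges
        m.add(100L, 199L, "Germany");
        System.out.println(m.lookup(150L)); // falls in the second range
    }
}
```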
tParseAddress
tParseAddress properties
Related scenario
tParseName
tParseName Properties
Related scenario
tReplaceList
tReplaceList Properties
Function Carries out a Search & Replace operation in the input columns defined
based on an external lookup.
Purpose Helps to cleanse all files before further processing.
Basic settings Schema and Edit Schema A schema is a row description, i.e., it defines the
number of fields that will be processed and passed on to the next component. The schema is
either built-in or remote in the Repository.
Two read-only columns, Value and Match, are added to the output schema automatically.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Lookup search column Type in the position number of the column to be
searched in the lookup schema.
0: first column read
1: second column read
n: position number of the column in the schema
read.
In order to ensure the uniqueness of values being searched,
make sure this column is marked as Key in your lookup
schema.
Lookup replacement column Type in the position number of the column where
the replacement values are stored.
0: first column read
1: second column read
n: position number of the column in the schema
read
Column options Select the columns of the main flow where the
replacement is to be carried out.
Usage This component is not startable as it requires an input flow, and it
also requires an output component.
The following Job searches and replaces a list of countries with their corresponding codes. The
relevant codes are taken from a reference file placed as lookup flow in the Job. The main flow is
replicated and both outputs are displayed on the console, in order to show the main flow before and
after replacement.
• Drop the following components from the Palette to the design workspace: tMysqlInput,
tFileInputDelimited, tReplicate, tReplaceList and tLogRow (x2). Note that if your input
schemas are stored in the Repository, you can simply drag and drop the relevant node from
the Repository’s Metadata Manager onto the design workspace to automatically retrieve the
input components’ settings. For more information, see Dropping components from the
Metadata node in Talend Open Studio User Guide.
• Connect the components using Main Row connections via a right-click on each component.
Notice that the main row coming from the reference flow (tFileInputDelimited) is called a
lookup row.
• Select the tMysqlInput component and set the input flow parameters.
• The input schema is made of two columns: Names and States. The States column gathers
the names of the states of the USA, which are to be replaced by their respective codes.
• In the Query field, make sure the State column is included in the Select statement. In this
use case, all columns are selected.
• Check the tReplicate component setting. The schema is simply duplicated into two identical
flows, but no change to the schema can be made.
• Then double-click on the tFileInputDelimited component, to set the reference file.
• The file includes two columns, Postal and State, where Postal provides the code
corresponding to the name given in the respective row of the State column.
• The fields are delimited by semicolons and rows are separated by carriage returns.
• Edit the lookup flow schema.
• Make sure the lookup search column (in this use case: State) is a key, in order to ensure the
uniqueness of the values being searched.
• Select the tReplaceList and set the operation to carry out.
• The schema is retrieved from the previous component of the main flow.
• In Lookup search index field, type in the position index of the column being searched. In
this use case, State is the second column of the lookup input file, therefore type in 1 in this
field.
• In Lookup replacement index field, fill in the position number of the column containing the
replacement values, in this example: Postal for the State codes.
• In the Column options table, select the States column as, in this use case, the state names
are to be replaced with their corresponding codes.
• In both tLogRow components, select the Print values in table cells check box for a better
readability of the outputs.
• Save the Job and press F6 to execute it.
The first output flow shows the States column with full state names, as it comes from the main input
flow.
The second output flow shows the States column after the state names have been replaced
with their respective codes.
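The replacement logic amounts to a lookup-map substitution; a minimal sketch (with made-up sample data, not the scenario's actual file contents) could look like this:

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaceListSketch {
    // Replace a column value with its counterpart from the lookup map
    // (search column -> replacement column); values without a match
    // are left unchanged.
    public static String replace(Map<String, String> lookup, String value) {
        return lookup.getOrDefault(value, value);
    }

    public static void main(String[] args) {
        Map<String, String> stateToCode = new HashMap<>();
        stateToCode.put("California", "CA"); // sample lookup rows
        stateToCode.put("Texas", "TX");
        System.out.println(replace(stateToCode, "Texas"));   // replaced
        System.out.println(replace(stateToCode, "Unknown")); // unchanged
    }
}
```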
tSchemaComplianceCheck
tSchemaComplianceCheck Properties
Function Validates all input rows against a reference schema or checks type, nullability, length
of rows against reference values. The validation can be carried out in full or partly.
Purpose Helps ensure the data quality of any source data against a reference data source.
Basic settings Base Schema and Edit Schema A schema is a row description, i.e., it defines the number
of fields that will be processed and passed on to the next component. The schema is either
built-in or remote in the Repository.
Describe the structure and nature of your data to be
processed as it is.
Built-in: The schema will be created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the
Repository, hence can be reused in various projects and job
designs. Related topic: Setting a Repository schema of
Talend Open Studio User Guide.
Java only Use another schema for compliance check Define a reference schema as you expect the
data to be, in order to reject the non-compliant data.
It can be restrictive on data type, null values, and/or length.
Perl only Date language/Date format For the validation of date formats containing strings such as
25 Dec 2007, use the Date Language field; to distinguish the way months and days are
ordered, use the Date format field.
Check all columns from schema Select this check box if all checks are to be carried out on
all columns of the base schema.
Type In Perl, select the check box of the column of which you
want the data type to be verified against the base schema
definition.
In Java, select the type of which the data should be. This
validation is mandatory for all columns.
Null (empty or zero) Select this check box to verify the nullability of the column
against the base schema definition.
Max length Select this check box to verify the data length against the
base schema length definition
Usage This component is an intermediary step in the flow, allowing you to exclude
non-compliant data from the main flow. This component cannot be a start component as it
requires an input flow. It also requires at least one output component to gather the
validated flow, and possibly a second output component for rejected data using
Rejects link. For more information, see Rejects in Talend Open Studio User Guide.
This very basic scenario shows how to check the type, null value and length of an incoming flow
against a defined reference schema. The incoming flow comes from a simple csv file which contains
heterogeneous data including wrong data type, data exceeding max length, wrong id and null values
in non-nullable columns.
The output is twofold: the valid data are gathered into a dedicated delimited file, whereas the
rejected data are displayed on the console.
• Leave the Type field as permissive as possible (especially in Java). You will define the actual
type of the data on the tSchemaComplianceCheck Component view.
• The Str_nullable column is the only nullable data. So select the relevant check box.
Col2Label and Col3Label are length-limited to respectively 30 and 40 characters.
• Then double-click on tSchemaComplianceCheck component to set the validation
parameters.
• The Base schema should be automatically propagated from the input component. If not,
click on the Sync columns button.
• In this example, we use the Checked columns table to set the validation parameters. But you
could also select the Use another schema for compliance check check box and define the
schema description of the expected data.
• Select the Check all columns from schema check box if you want to check all the columns
against all parameters (type, null values and length).
• On the Checked Columns table, set the checks to be performed. Below is the setting for a
job designed in Talend Open Studio with Java generation language:
- The Id column should be of type Int.
- The Col2Label and Col3Label columns should be checked for their Length.
- The Date column should be of type Date.
- The Val column should be checked for null values (as it should not be null).
- The Str_nullable column should also be checked for null values (but it can be null).
• In Perl, select the relevant Type check boxes for the columns whose type needs to be
checked (Date). Also, define the Date format and Date language.
• Then set the output components parameters.
• The schema is propagated automatically from the tSchemaComplianceCheck to the output
components.
• Define the output file path on the tFileOutputDelimited properties.
• Then on the Run tab, select the Statistics check box to display the flow dispatching rates
before launching the execution.
• You can notice that three rows from the input flow were validated and were therefore
directed to tFileOutputDelimited.
• Seven rows were not compliant with the reference schema and were therefore directed
to the console.
You can notice that the extra output columns in the Rejects link provide the error code as well as
an error message such as: 2|Date:Date format not valid or 2|id:wrong type in order to ease the
identification of the error.
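The nullability and length checks described above can be illustrated as follows (a simplified sketch; type checking and Talend's exact error codes are omitted, and the column names reuse this scenario's schema):

```java
public class ComplianceSketch {
    // Return an error message for a non-compliant value, or null when
    // the value passes the nullability and max-length checks.
    public static String check(String column, String value,
                               boolean nullable, int maxLength) {
        if (value == null || value.isEmpty()) {
            return nullable ? null : column + ": empty value in non-nullable column";
        }
        if (maxLength > 0 && value.length() > maxLength) {
            return column + ": exceeds max length " + maxLength;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check("Val", null, false, 0));         // rejected
        System.out.println(check("Str_nullable", null, true, 0)); // compliant (null)
    }
}
```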
tUniqRow
tUniqRow Properties
Function Compares entries and removes the first encountered duplicate from the
input flow.
Purpose Ensures data quality of input or output flow in a Job.
Basic settings Schema and Edit Schema A schema is a row description, i.e., it defines the
number of fields that will be processed and passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Click Sync columns to retrieve the schema from
the previous component connected in the Job.
Based on the tSortRow job, the tUniqRow component is added to the Job in order to have unique
entries in the output flow. In fact, as the input data is randomly created, duplication cannot be
avoided.
• On the Basic settings tab panel of the tUniqRow component, click Edit Schema... to set the
Key on Names field to have unique entries in the output flow on this criteria.
• Select the Case Sensitive check box to differentiate lower case and upper case.
• Press F6 to run the Job again. The console displays the sorted and unique results.
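Keeping only the first occurrence per key, case sensitively, can be sketched like this (the sample names are illustrative, not the scenario's random data):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqRowSketch {
    // Keep the first row per Name key, preserving input order; the
    // comparison is case-sensitive, matching the option above.
    public static List<String> unique(List<String> names) {
        Set<String> seen = new LinkedHashSet<>();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (seen.add(name)) out.add(name);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(unique(List.of("Ann", "Bob", "Ann", "ann")));
        // → [Ann, Bob, ann]: the duplicate "Ann" is dropped, while
        // "ann" survives because the check is case-sensitive.
    }
}
```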
This chapter details the main components which you can find in the Databases family of the Talend
Open Studio Palette.
The Databases family comprises the most popular database connectors. These connectors cover
various needs, including: opening connection, reading and writing tables, committing transactions
as a whole, as well as performing rollback for error handling. More than 40 RDBMS are supported.
Databases components
tAccessInput
tAccessInput properties
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility of the DB query and covers all
possible SQL queries.
Related scenarios
tAccessOutput
tAccessOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Related scenarios
tAccessRow
tAccessRow properties
Function tAccessRow is the specific component for this database query. It executes the
stated SQL query on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tAccessRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
DB Version Select the Access database version that you are using.
Database Name of the database
Username and Password DB user authentication data.
Schema and Edit Schema A schema is a row description, i.e., it defines the
number of fields to be processed and passed on to the next component. The schema is
either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Query type Either Built-in or Repository.
Built-in: Fill in manually the query statement or build
it graphically using SQLBuilder
Repository: Select the relevant query stored in the
Repository. The Query field gets accordingly filled in.
Query Enter your DB query paying particularly attention to
properly sequence the fields in order to match the
schema definition.
Commit every Number of rows to be completed before committing
batches of rows together into the DB. This option
ensures transaction quality (but not rollback) and
above all better performance on executions.
Encoding Type Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Propagate QUERY’s recordset Select this check box to insert the result of the query
into a COLUMN of the current flow. Select this
column from the use column list.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set
PreparedStatement Parameter table, define the
parameters represented by “?” in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the
SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Related scenarios
tAS400Close
tAS400Close properties
Related scenario
tAS400Commit
tAS400Commit Properties
Function Validates the data processed through the Job into the connected DB.
Purpose Using a unique connection, this component commits a global transaction in
one go, instead of committing every row or batch separately, and thus
improves performance.
Basic settings Component list Select the tAS400Connection component in the list if
more than one connection is planned for the current
Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with AS400 components, especially with
tAS400Connection and tAS400Rollback components.
Limitation n/a
Related scenario
tAS400Connection
tAS400Connection Properties
This component is closely related to tAS400Commit and tAS400Rollback. It usually doesn’t make
much sense to use one of the components without using a tAS400Connection component to open
a connection for the current transaction.
Related scenario
For tAS400Connection related scenario, see tMysqlConnection on page 460.
tAS400Input
tAS400Input properties
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
Trim column Remove leading and trailing whitespace from defined
columns.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility of the DB query and covers all
possible SQL queries.
Related scenarios
tAS400LastInsertId
tAS400LastInsertId properties
Related scenario
For a related scenario, see Scenario: Get the ID for the last inserted record on page 467 of the
tMysqlLastInsertId component.
tAS400Output
tAS400Output properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Make changes to existing entries
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Use commit control Select this check box to have access to the Commit
every field where you can define the commit
operation.
Related scenarios
tAS400Rollback
tAS400Rollback properties
This component is closely related to tAS400Commit and tAS400Connection. It usually does not
make much sense to use these components independently in a transaction.
Related scenarios
For tAS400Rollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498 of the tMysqlRollback.
tAS400Row
tAS400Row properties
Function tAS400Row is the specific component for this database query. It executes the
stated SQL query on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tAS400Row acts on the
actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
DB Version Select the AS400 version in use
Use an existing connection Select this check box and click the relevant
tAS400Connection component in the Component list to reuse the connection
details you already defined.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility benefit of the DB query and covers all
possibilities of SQL queries.
Related scenarios
tCreateTable
You can find this component at the root of the Databases group of the Palette of Talend Open Studio.
tCreateTable covers needs related indirectly to the use of any database.
tCreateTable Properties
Function tCreateTable creates, drops and creates, or clears the specified table.
Purpose This Java-specific component helps create or drop any database table.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
properties are stored. The fields that follow are
pre-filled in using fetched data.
Database Type Select the DBMS type from the list.
The component properties may differ slightly
according to the database type selected from
the list.
Table Action Select the action to be carried out on the database
among:
Create table: when you already know that the table
doesn’t exist.
Create table when not exists: when you don’t know
whether the table has already been created or not.
Drop and create table: when you know that the table
exists already and needs to be replaced.
MySQL Temporary table Select this check box if you want to save the created
table temporarily.
MSSQLServer, Use an existing Select this check box in case you use a database
MySQL, Oracle, connection connection component, for example:
PostgresPlus, tMysqlConnection or tOracleConnection, etc.
Postgresql
When a Job contains the parent Job and the
child Job, Component list presents only the
connection components in the same Job level,
so if you need to use an existing connection from the
other level, make sure that the available connection
components are sharing the intended connection.
For more information on how to share a DB connec-
tion across Job levels, see Use or register a shared
DB connection in any database connection compo-
nent corresponding to the database you are using.
Otherwise, you can also deactivate the connection
components and use the Dynamic settings of the
component to specify the intended connection manually.
In this case, make sure the connection name is unique
and distinctive across the two Job levels.
For more information about Dynamic settings, see
your studio user guide.
Oracle Connection Type Drop-down list of available drivers.
Access Access File Name and path of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
Firebird Firebird File Name and path of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
Interbase Interbase File Name and path of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
SQLite SQLite File Name and path of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
JavaDb Framework Type Select from the list a framework for your database.
HSQLDb Running Mode Select from the list the Server Mode that corresponds
to your DB setup.
HSQLDb Use TLS/SSL Sockets Select this check box to enable the secured mode, if
required.
AS400/Oracle DB Version Select the database version in use.
All database types Host Database server IP address
except Access, JavaDb,
SQLite and ODBC
All database types Database name Name of the database.
except Access, Firebird,
HSQLDb, SQLite and
ODBC
JavaDb DB Root Path Browse to your database root.
The Job described below aims at creating a table in a database, made of a dummy schema taken from
a delimited file schema stored in the Repository. This Job is composed of a single component.
• Drop a tCreateTable component from the Databases family in the Palette to the design
workspace.
• In the Basic settings view, and from the Database Type list, select Mysql for this scenario.
• Click the Reset DB Types button in case the DB type column is empty or shows
discrepancies (marked in orange). This allows you to map any data type to the relevant DB
data type.
• Click OK to validate your changes and close the dialog box.
• Save your Job and press F6 to execute it.
The table is created empty but with all columns defined in the Schema.
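What happens under the hood can be pictured as generating a DDL statement from the schema columns. The sketch below is a hypothetical illustration of a Create table if not exists action; the table and column names are invented, and the exact statement tCreateTable generates depends on the selected database type:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the DDL generated by a "Create table if not
// exists" action; column names and types are invented for the example.
public class CreateTableSketch {

    // Build a CREATE TABLE IF NOT EXISTS statement from an ordered
    // map of column name -> DB type.
    static String createTableSql(String table, Map<String, String> columns) {
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            cols.add(c.getKey() + " " + c.getValue());
        }
        return "CREATE TABLE IF NOT EXISTS " + table + " (" + cols + ")";
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "INT");
        schema.put("name", "VARCHAR(255)");
        System.out.println(createTableSql("example_table", schema));
        // prints: CREATE TABLE IF NOT EXISTS example_table (id INT, name VARCHAR(255))
    }
}
```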
tDB2BulkExec
tDB2BulkExec properties
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create table: The table is removed and
created again.
Create table: The table does not exist and gets
created.
Create table if not exists: The table is created if it
does not exist.
Drop table if exists and create: The table is removed
if it already exists and created again.
Clear table: The table content is deleted.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: You create the schema and store it locally
for this component only. Related topic: Setting a
built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and
stored it in the Repository, hence can reuse it. Related
topic: Setting a Repository schema of Talend Open
Studio User Guide.
Data file Name of the file to be processed.
Related topic:Defining variables from the Component
view of Talend Open Studio User Guide.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Advanced settings Field terminated by Character, string or regular expression to separate
fields.
Date Format Use this field to define the way months and days are
ordered.
Time Format Use this field to define the way hours, minutes and
seconds are ordered.
Timestamp Format Use this field to define the way date and time are
ordered.
Remove load pending When the box is ticked, tables blocked in "pending"
status following a bulk load are unblocked.
Load options Click + to add data loading options:
Parameter: select a loading parameter from the list.
Value: enter a value for the parameter selected.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This dedicated component offers performance and flexibility of DB2 query
handling.
Related scenarios
tDB2Close
tDB2Close properties
Related scenario
tDB2Commit
tDB2Commit Properties
This component is closely related to tDB2Connection and tDB2Rollback. It usually doesn’t make
much sense to use these components independently in a transaction.
Function Validates the data processed through the Job into the connected DB.
Purpose Using a unique connection, this component commits a global transaction in
one go, instead of committing on every row or every batch, and thus provides a
performance gain.
Basic settings Component list Select the tDB2Connection component in the list if
more than one connection is planned for the current
Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with DB2 components, especially with
tDB2Connection and tDB2Rollback components.
Limitation n/a
Related scenario
This component is closely related to tDB2Connection and tDB2Rollback. It usually doesn’t make
much sense to use one of these without using a tDB2Connection component to open a connection
for the current transaction.
For tDB2Commit related scenario, see tMysqlConnection on page 460.
tDB2Connection
tDB2Connection properties
This component is closely related to tDB2Commit and tDB2Rollback. It usually does not make
much sense to use one of these without using a tDB2Connection to open a connection for the
current transaction.
Related scenarios
This component is closely related to tDB2Commit and tDB2Rollback. It usually does not make
much sense to use one of these without using a tDB2Connection component to open a connection
for the current transaction.
For tDB2Connection related scenario, see tMysqlConnection on page 460.
tDB2Input
tDB2Input properties
Related scenarios
tDB2Output
tDB2Output properties
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Commit every Enter the number of rows to be completed before
committing batches of rows together into the DB.
This option ensures transaction quality (but not
rollback) and, above all, better performance at
execution.
Additional Columns This option is not offered if you create (with or
without drop) the DB table. This option allows you to
call SQL functions to perform actions on columns
other than insert, update or delete actions, or actions
that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as a new column.
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After following
the action to be performed on the reference column.
Reference column: Type in a column of reference
that the tDBOutput can use to place or replace the
new or altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
Enable debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
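To picture the Position and Reference column options of Additional Columns described above, here is a minimal sketch of the placement logic over a list of column names (the column names are invented for the example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the Additional Columns placement logic: insert a new column
// Before or After a reference column, or in place of it (Replace).
public class ColumnPlacementSketch {

    static List<String> place(List<String> cols, String newCol,
                              String position, String reference) {
        List<String> out = new ArrayList<>(cols);
        int i = out.indexOf(reference);
        switch (position) {
            case "Before":  out.add(i, newCol); break;
            case "After":   out.add(i + 1, newCol); break;
            case "Replace": out.set(i, newCol); break;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "name", "city");
        System.out.println(place(cols, "created_at", "After", "name"));
        // prints: [id, name, created_at, city]
    }
}
```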
Usage This component offers the flexibility benefit of the DB query and covers all
possible SQL queries.
This component must be used as an output component. It allows you to carry
out actions on a table or on the data of a table in a DB2 database. It also allows
you to create a reject flow using a Row > Rejects link to filter data in error. For
an example of tMysqlOutput in use, see Scenario 3: Retrieve data in error
with a Reject link on page 483.
Related scenarios
tDB2Rollback
tDB2Rollback properties
This component is closely related to tDB2Commit and tDB2Connection. It usually does not make
much sense to use these components independently in a transaction.
Related scenarios
For tDB2Rollback related scenario, see Scenario: Rollback from inserting data in mother/daughter
tables on page 498 of the tMysqlRollback.
tDB2Row
tDB2Row properties
Function tDB2Row is the specific component for this database query. It executes the
SQL query stated on the specified database. The Row suffix means the
component implements a flow in the Job design although it does not provide
output.
Purpose Depending on the nature of the query and the database, tDB2Row acts on the
actual DB structure or on the data (although without handling the data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tDB2Connection component on the Component list
to reuse the connection details you already defined.
Related scenarios
tDB2SCD
tDB2SCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tDB2SCD on page 2.
tDB2SCDELT
tDB2SCDELT belongs to two component families: Business Intelligence and Databases. For more
information on it, see tDB2SCD on page 2.
tDB2SP
tDB2SP properties
Parameters Click the Plus button and select the various Schema
Columns that will be required by the procedures.
Note that the SP schema can hold more columns than
there are parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter
OUT: Output parameter/return value
IN OUT: Input parameter to be returned as a value,
likely after modification through the procedure
(function).
RECORDSET: Input parameter to be returned as a
set of values, rather than a single value.
Check the tParseRecordSet component if you
want to analyze a set of records from a database
table or DB query and return single records.
Encoding Type Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Usage This component is used as an intermediary component. It can be used as a start
component, but in that case only input parameters are allowed.
Limitation The Stored Procedures syntax should match the Database syntax.
Related scenarios
For a related topic, see the tMysqlSP Scenario: Finding a State Label using a stored procedure on
page 512.
Check as well the tParseRecordSet component if you want to analyze a set of records from a
database table or DB query and return single records.
tDBInput
tDBInput properties
Trim all the Select this check box to remove leading and trailing
String/Char columns whitespace from all the String/Char columns.
Trim column Remove leading and trailing whitespace from defined
columns.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility benefit of the DB query and covers all
possibilities of SQL queries using a generic ODBC connection.
The following scenario creates a two-component Job, reading data from a database using a DB query
and outputting delimited data into the standard output (console).
• Drop a tDBInput and tLogRow component from the Palette to the design workspace.
• Right-click on the tDBInput component and select Row > Main. Drag this main row link
onto the tLogRow component and release when the plug symbol displays.
• Double-click the tDBInput so the Component view shows up, and define the properties:
• Type in the query making sure it includes all columns in the same order as defined in the
Schema. In this case, as we’ll select all columns of the schema, the asterisk symbol makes
sense.
• Click on the second component to define it.
• Enter the field separator. In this case, a pipe separator.
• Now go to the Run tab, and click on Run to execute the Job.
The DB is parsed and queried data is extracted from the specified table and passed on to the job log
console. You can view the output file straight on the console.
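For reference, the console output produced with a pipe separator is simply each row's fields joined by the separator character. A trivial sketch (the field values are invented):

```java
// Sketch of the console output format used by tLogRow with a pipe
// separator: row fields joined by "|". Field values are invented.
public class PipeRowSketch {

    static String format(String[] fields) {
        return String.join("|", fields);
    }

    public static void main(String[] args) {
        System.out.println(format(new String[] {"1", "Ford", "blue"}));
        // prints: 1|Ford|blue
    }
}
```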
StoreSQLQuery is a variable that can be used to debug a tDBInput scenario which does not operate
correctly. It allows you to dynamically feed the SQL query set in your tDBInput component.
• Use the same scenario as scenario 1 above and add a third component, tJava.
• Connect tDBInput to tJava using a trigger connection of the OnComponentOk type. In
this case, we want the tDBInput to run before the tJava component.
• Click on the tJava component to display the Component view. Enter the
System.out.println("") command to display the query content, press Ctrl+Space
to access the variable list and select the global variable QUERY.
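In the Java code generated for a Job, such global variables live in a map named globalMap, keyed by component name. The sketch below mimics the lookup with a plain HashMap; the component name tDBInput_1 and the query text are assumptions for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Mimics reading a component's QUERY variable from the Talend globalMap.
// "tDBInput_1" and the query text are assumed values for illustration.
public class GlobalMapSketch {

    static String query(Map<String, Object> globalMap, String component) {
        // Component variables are stored under "<componentName>_<VARIABLE>"
        return (String) globalMap.get(component + "_QUERY");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tDBInput_1_QUERY", "SELECT * FROM customers");
        // Equivalent of the tJava code from the scenario:
        System.out.println(query(globalMap, "tDBInput_1"));
        // prints: SELECT * FROM customers
    }
}
```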
tDBOutput
tDBOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
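The Update or insert action above can be pictured as a key-existence test against the target table. The sketch below models it over a plain map standing in for the table; it illustrates the semantics only, not the component's actual SQL:

```java
import java.util.HashMap;
import java.util.Map;

// Models the "Update or insert" action over a map standing in for the
// target table: update the entry when the key exists, insert otherwise.
public class UpsertSketch {

    static String updateOrInsert(Map<String, String> table,
                                 String key, String value) {
        boolean existed = table.containsKey(key);
        table.put(key, value);
        return existed ? "updated" : "inserted";
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        System.out.println(updateOrInsert(table, "42", "Ford"));  // inserted
        System.out.println(updateOrInsert(table, "42", "Audi"));  // updated
    }
}
```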
The following scenario is a three-component Job aiming at creating a new table in the database
defined and filling it with data. The tFileInputDelimited component passes on the input flow to the
tDBOutput component. As the content of a DB is not viewable as such, a tLogRow component is
used to display the main flow on the Run console.
• Drop the three components required for this Job from the Palette to the design workspace.
• On the Basic settings tab of tFileInputDelimited, define the input flow parameters. In this
use case, the file contains car owner ids, makes, colors and registration references organized
as follows: semi-colon as field separator, carriage return as row separator. The input file
contains a header row to be considered in the schema. If this file is already described in your
metadata, you can retrieve the properties by selecting the relevant repository entry list.
• And also, if your schema is already loaded in the Repository, select Repository as Schema
and choose the relevant metadata entry in the list. If you haven’t defined the schema already,
define the data structure in the built-in schema you edit.
• Restrict the extraction to 10 lines, for this example.
• Then define the tDBOutput component to configure the output flow. Select the database to
connect to. Note that you can store all the database connection details in different context
variables. For more information about how to create and use context variables, see Defining
Contexts and variables in Talend Open Studio User Guide.
• Fill in the table name in the Table field. Then select the operations to be performed:
• As Action on table, select Drop and create table in the list. This allows you to overwrite
the possible existing table with the new selected data. Alternatively you can insert only extra
rows into an existing table, but note that duplicate management is not supported natively. See
tUniqRow Properties on page 203 for further information.
• As Action on data, select Insert. The data flow incoming as input will be thus added to the
selected table.
• To view the output flow easily, connect the tDBOutput component to a tLogRow
component. Define the field separator as a pipe symbol. Press F6 to execute the Job.
• As the processing can take some time to reach the tLogRow component, we recommend
enabling the Statistics functionality on the Run console.
tDBSQLRow
tDBSQLRow properties
Function tDBSQLRow is the generic component for database query. It executes the
SQL query stated on the specified database. The Row suffix means the
component implements a flow in the Job design although it does not provide
output.
For performance reasons, a specific DB component should always be
preferred to the generic component.
Purpose Depending on the nature of the query and the database, tDBSQLRow acts on
the actual DB structure or on the data (although without handling the data).
The SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Database Name of the database
Username and DB user authentication data.
Password
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Query type Either Built-in or Repository.
Built-in: Fill in manually the query statement or build
it graphically using SQLBuilder
Repository: Select the relevant query stored in the
Repository. The Query field gets accordingly filled in.
Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the
schema definition.
Advanced settings Propagate QUERY’s Select this check box to insert the result of the query
recordset into a COLUMN of the current flow. Select this
column from the use column list.
This scenario describes a single-component Job which aims at re-initializing the DB auto-increment
to 1. This Job has no output and is generally to be used before running a script.
• Drag and drop a tDBSQLRow component from the Palette to the design workspace.
• On the Basic settings panel, fill in the DB connection properties.
• The general connection information to the database is stored in the Repository. The
Database Driver is a generic ODBC driver.
• The Schema is built-in for this Job and describes the Talend database structure. The schema
doesn’t really matter for this particular Job as the action is made on the table
auto-increment and not on data.
• The Query type is also built-in. Click the three-dot button to launch the SQLBuilder
editor, or else type directly in the Query area:
Alter table <TableName> auto_increment = 1
• Then click OK to validate the Basic settings. Then press F6 to run the Job.
The database auto-increment is reset to 1.
Related topics: tMysqlRow properties on page 500.
tEXAInput
tEXAInput properties
Advanced settings Additional JDBC Specify additional connection properties for the DB
parameters connection you are creating. This option is not
available if you have selected the Use an existing
connection check box in the Basic settings.
Encoding Type Select the encoding type from the list or else select
Custom and define it manually. This is an obligatory
field.
Trim all the Select this check box to delete the spaces at the start
String/Char columns and end of fields in all of the columns containing
strings.
Trim column Deletes the spaces from the start and end of fields in
the selected columns.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Usage This component covers all possible SQL queries for EXA databases.
Related scenarios
For scenarios in which tEXAInput might be used, see the following tDBInput scenarios:
• Scenario 1: Displaying selected data from DB table on page 253
• Scenario 2: Using StoreSQLQuery variable on page 254
tEXAOutput
tEXAOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Related scenario
tEXARow
tEXARow properties
Function The tEXARow component is specific to this type of database. It executes SQL
queries on specified databases. The Row suffix indicates that it is used to
channel a flow in a Job although it doesn’t produce any output data.
Purpose Depending on the nature of the query and the database, tEXARow acts on the
actual structure of the database, or indeed the data, although without modifying
them.
Basic settings Property type Built-in or Repository.
Built-in: No properties stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Host Database server IP address.
Port Listening port number of the DB server.
Schema name Enter the schema name.
Username and User authentication information.
Password
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either Built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Table Name Name of the table to be processed.
Query type Built-in or Repository.
Built-in: Enter the query manually or with the help of
the SQLBuilder.
Repository: Select the appropriate query from the
Repository. The Query field is then completed
automatically.
Guess Query Click the Guess Query button to generate the query
that corresponds to the table schema in the Query
field.
Query Enter your query, taking care to ensure that the field
order matches that defined in the schema.
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Additional JDBC Specify additional connection properties for the DB
parameters connection you are creating. This option is not
available if you have selected the Use an existing
connection check box in the Basic settings.
Propagate QUERY’s Select this check box to insert the query results in one
recordset of the flow columns. Select the particular column
from the use column list.
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This is an obligatory
field.
Commit every Number of rows to be included in the batch before the
data is written. This option guarantees the quality of
the transaction (although there is no rollback option)
and improves performance.
tStatCatcher Statistics Select this check box to collect the log data at a
component level.
Usage This component offers query flexibility as it covers all possible SQL query
requirements.
Related scenarios
tEXistConnection
tEXistConnection properties
This component is closely related to tEXistGet and tEXistPut. Once you have set the connection
properties in this component, you have the option of reusing the connection without having to set
the properties again for each tEXist component used in the Job.
Related scenarios
This component is closely related to tEXistGet and tEXistPut. It usually does not make much sense
to use one of these without using a tEXistConnection component to open a connection for the
current transaction.
For tEXistConnection related scenario, see tMysqlConnection on page 460.
tEXistDelete
tEXistDelete properties
Usage This component is typically used as a single component sub-job but can
also be used as an output or end object.
Limitation n/a
Related scenario
tEXistGet
tEXistGet properties
Advanced settings tStatCatcher Select this check box to gather the job processing
Statistics metadata at a job level as well as at each
component level.
Usage This component is typically used as a single component sub-job but can
also be used as an output or end object.
Limitation n/a
Related scenario
tEXistList
tEXistList properties
Advanced settings tStatCatcher Select this check box to gather the job processing
Statistics metadata at a job level as well as at each
component level.
Usage This component is typically used along with a tEXistGet component to
retrieve the files listed, for example.
Limitation n/a
Related scenarios
For a tEXistList related scenario, see Scenario: Iterating on a remote directory on page 929.
tEXistPut
tEXistPut properties
Advanced settings tStatCatcher Select this check box to gather the job processing
Statistics metadata at a job level as well as at each
component level.
Usage This component is typically used as a single component sub-job but can
also be used as an output or end object.
Limitation n/a
Related scenario
tEXistXQuery
tEXistXQuery properties
Function This component uses local files containing XPath queries to query XML
files stored on remote databases.
Purpose tEXistXQuery queries XML files located on remote databases and
outputs the results to an XML file stored locally.
Basic settings Use an existing Select this check box and click the relevant
connection/ tEXistConnection component on the
Component List Component list to reuse the connection details
you already defined.
Advanced settings tStatCatcher Select this check box to gather the job processing
Statistics metadata at a job level as well as at each
component level.
Usage This component is typically used as a single component Job but can also
be used as part of a more complex Job.
Limitation n/a
Related scenario
tEXistXUpdate
tEXistXUpdate properties
Function This component processes XML file records and updates the records on
the DB server.
Purpose tEXistXUpdate processes XML file records and updates the existing
records on the DB server.
Basic settings Use an existing Select this check box and click the relevant
connection/ tEXistConnection component on the
Component List Component list to reuse the connection details
you already defined.
Usage This component is typically used as a single component sub-job but can
also be used as part of a more complex Job.
Limitation n/a
Related scenario
tFirebirdClose
tFirebirdClose properties
Related scenario
tFirebirdCommit
tFirebirdCommit Properties
Function Validates the data processed through the Job into the connected DB.
Purpose Using a unique connection, this component commits a global transaction in
one go, instead of committing on every row or every batch, and thus provides a
performance gain.
Basic settings Component list Select the tFirebirdConnection component in the
list if more than one connection is planned for the
current Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Firebird components, especially with
tFirebirdConnection and tFirebirdRollback components.
Limitation n/a
Related scenario
tFirebirdConnection
tFirebirdConnection properties
This component is closely related to tFirebirdCommit and tFirebirdRollback. It usually does not
make much sense to use one of these without using a tFirebirdConnection to open a connection
for the current transaction.
Related scenarios
This component is closely related to tFirebirdCommit and tFirebirdRollback. It usually does not
make much sense to use one of these without using a tFirebirdConnection component to open a
connection for the current transaction.
tFirebirdInput
tFirebirdInput properties
Usage This component covers all possibilities of SQL queries on a Firebird
database.
Related scenarios
tFirebirdOutput
tFirebirdOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Related scenarios
tFirebirdRow
tFirebirdRow properties
Function tFirebirdRow is the specific component for this database query. It executes
the SQL query stated on the specified database. The Row suffix means the
component implements a flow in the Job design although it does not provide
output.
Purpose Depending on the nature of the query and the database, tFirebirdRow acts on
the actual DB structure or on the data (although without handling the data).
The SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tFirebirdConnection component on the
Component list to reuse the connection details you
already defined.
Related scenarios
tGreenplumBulkExec
tGreenplumBulkExec Properties
Related scenarios
tGreenplumClose
tGreenplumClose properties
Related scenario
tGreenplumCommit
tGreenplumCommit Properties
Function Validates the data processed through the Job into the connected DB.
Purpose Using a unique connection, this component commits a global transaction in
one go, instead of committing every row or every batch separately, and thus
provides a gain in performance.
Basic settings Component list Select the tGreenplumConnection component in the
list if more than one connection is planned for the
current Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Greenplum components, especially
with tGreenplumConnection and tGreenplumRollback components.
Limitation n/a
Related scenario
tGreenplumConnection
tGreenplumConnection properties
Related scenarios
tGreenplumInput
tGreenplumInput properties
Guess schema Click the Guess schema button to retrieve the table
schema.
Advanced settings Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Use cursor When selected, lets you define the row set to be
processed at a time, and thus helps optimize performance.
Trim all the Select this check box to remove leading and trailing
String/Char columns whitespace from all the String/Char columns.
Trim column Remove leading and trailing whitespace from defined
columns.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component covers all possibilities of SQL queries onto a Greenplum
database.
Related scenarios
tGreenplumOutput
tGreenplumOutput Properties
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Commit every Enter the number of rows to be completed before
committing batches of rows together into the DB.
This option ensures transaction quality (but not
rollback) and, above all, better performance at
execution.
Additional Columns This option is not offered if you create (with or
without drop) the DB table. It allows you to call SQL
functions to perform actions on columns, provided
these are not insert, update or delete actions, or
actions that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as a new column.
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After, depending
on the action to be performed on the reference column.
Reference column: Type in a reference column that
the component can use to place or replace the new or
altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component covers all possibilities of SQL queries onto a database. It
allows you to perform actions on a table or on the data of a table in a
Greenplum database. It enables you to create a reject flow, with a Row >
Rejects link filtering the data in error. For a usage example, see Scenario 3:
Retrieve data in error with a Reject link on page 483 from the tMysqlOutput
component.
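The Position setting of Additional Columns determines where the computed column lands relative to the reference column. The following is a minimal sketch of the Before/Replace/After placement rule, applied to a list of column names (the class and method names are illustrative, not Talend's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnPlacement {
    // Insert newCol before or after the reference column, or replace it,
    // mirroring the Position setting of Additional Columns.
    static List<String> place(List<String> columns, String newCol,
                              String position, String reference) {
        List<String> out = new ArrayList<>(columns);
        int i = out.indexOf(reference);
        if (i < 0) throw new IllegalArgumentException("unknown reference column");
        switch (position) {
            case "Before":  out.add(i, newCol);     break;
            case "Replace": out.set(i, newCol);     break;
            case "After":   out.add(i + 1, newCol); break;
        }
        return out;
    }
}
```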
Related scenarios
tGreenplumOutputBulk
tGreenplumOutputBulk properties
Function Writes a file with columns based on the defined delimiter and the Greenplum
standards.
Purpose Prepares the file to be used as parameter in the INSERT query to feed the
Greenplum database.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. When selected, the fields to
follow are pre-filled in using fetched data.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Append Select this check box to add the new rows at the end
of the records.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job designs. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Advanced settings Row separator String (e.g. “\n” on Unix) to distinguish rows.
Field separator Character, string or regular expression to separate
fields.
Include header Select this check box to include the column header.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with tGreenplumBulkExec component.
Used together they offer gains in performance while feeding a Greenplum
database.
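Conceptually, the bulk file is just the incoming rows serialized with the configured field and row separators, optionally preceded by a header line. A simplified sketch of that serialization (the separators shown in the usage are examples, not Greenplum defaults):

```java
import java.util.List;

public class BulkFileSketch {
    // Serialize rows as delimited text using the configured separators;
    // when includeHeader is set, the column names form the first line.
    static String render(List<String> header, List<List<String>> rows,
                         String fieldSep, String rowSep, boolean includeHeader) {
        StringBuilder sb = new StringBuilder();
        if (includeHeader) {
            sb.append(String.join(fieldSep, header)).append(rowSep);
        }
        for (List<String> row : rows) {
            sb.append(String.join(fieldSep, row)).append(rowSep);
        }
        return sb.toString();
    }
}
```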
Related scenarios
For use cases in relation with tGreenplumOutputBulk, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
tGreenplumOutputBulkExec
tGreenplumOutputBulkExec properties
Related scenarios
For use cases in relation with tGreenplumOutputBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
tGreenplumRollback
tGreenplumRollback properties
Related scenarios
For tGreenplumRollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498.
tGreenplumRow
tGreenplumRow Properties
Function tGreenplumRow is the specific component for this database query. It executes
the SQL query stated on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tGreenplumRow acts
on the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you write your SQL statements easily.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tGreenplumConnection component on the
Component list to reuse the connection details you
already defined.
Related scenarios
tGreenplumSCD
tGreenplumSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tGreenplumSCD on page 7.
tHiveClose
tHiveClose properties
Related scenario
This component is generally used along with tHiveConnection, as the latter allows you to open
a connection for the transaction which is underway.
For a scenario in which tHiveClose might be used, see tMysqlConnection on page 460.
tHiveConnection
tHiveConnection properties
Related scenario
For a scenario in which tHiveConnection might be used, see Scenario: Inserting data in
mother/daughter tables on page 460.
tHiveRow
tHiveRow properties
Function tHiveRow is the dedicated component for this database. It executes the SQL
query stated in the specified database. The row suffix means the component
implements a flow in the job design although it doesn’t provide output.
Purpose Depending on the nature of the query and the database, tHiveRow acts on the
actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you write your SQL statements easily.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Use an existing Select this check box and click the relevant
connection tHiveConnection component from the Component
List to reuse the connection details you already
defined.
Related scenarios
tHSQLDbInput
tHSQLDbInput properties
Advanced settings Trim all the Select this check box to remove leading and trailing
String/Char columns whitespace from all the String/Char columns.
Trim column Remove leading and trailing whitespace from defined
columns.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component covers all possibilities of SQL queries onto an HSQLDb
database.
Related scenarios
tHSQLDbOutput
tHSQLDbOutput properties
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and
created again.
Create a table: The table does not exist and gets
created.
Create a table if not exists: The table is created if it
does not exist.
Drop a table if exists and create: The table is
removed if it already exists and created again.
Clear a table: The table content is deleted.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add new entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Commit every Enter the number of rows to be completed before
committing batches of rows together into the DB.
This option ensures transaction quality (but not
rollback) and, above all, better performance at
execution.
Additional Columns This option is not offered if you create (with or
without drop) the DB table. It allows you to call SQL
functions to perform actions on columns, provided
these are not insert, update or delete actions, or
actions that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as a new column.
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After, depending
on the action to be performed on the reference column.
Reference column: Type in a reference column that
the component can use to place or replace the new or
altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
Enable debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility benefit of the DB query and covers all of
the SQL queries possible.
This component must be used as an output component. It allows you to carry
out actions on a table or on the data of a table in an HSQLDb database. It also
allows you to create a reject flow using a Row > Rejects link to filter data in
error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data
in error with a Reject link on page 483.
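The Commit every setting trades rollback granularity for speed: rows are accumulated and committed in groups instead of one at a time. The batching loop can be sketched as follows, with a counter standing in for the real JDBC commit call (the class and method names are illustrative, not part of Talend):

```java
import java.util.ArrayList;
import java.util.List;

public class CommitBatcher {
    final List<String> pending = new ArrayList<>();
    final int commitEvery;
    int commits = 0;                 // stand-in for connection.commit()

    CommitBatcher(int commitEvery) { this.commitEvery = commitEvery; }

    // Buffer each incoming row; commit whenever the batch is full.
    void write(String row) {
        pending.add(row);
        if (pending.size() == commitEvery) {
            commit();
        }
    }

    // Flush whatever remains once the input flow ends.
    void close() {
        if (!pending.isEmpty()) commit();
    }

    private void commit() {
        commits++;
        pending.clear();
    }
}
```

With Commit every set to 3, seven rows produce three commits instead of seven, which is where the performance gain comes from.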
Related scenarios
tHSQLDbRow
tHSQLDbRow properties
Function tHSQLDbRow is the specific component for this database query. It executes
the SQL query stated on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tHSQLDbRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you write your SQL statements easily.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Running Mode Select from the list the server mode corresponding
to your DB setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb
In Process Persistent, HSQLDb In Memory.
Use TLS/SSL sockets Select this check box to enable the secured mode if
required.
Host Database server IP address
Port Listening port number of DB server.
Database Alias Name of the database
Username and DB user authentication data.
Password
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Query type Either Built-in or Repository.
Built-in: Fill in the query statement manually or build
it graphically using SQLBuilder.
Repository: Select the relevant query stored in the
Repository. The Query field gets accordingly filled in.
Related scenarios
tInformixBulkExec
tInformixBulkExec Properties
Advanced settings Additional JDBC Specify additional connection properties for the DB
parameters connection you are creating. This option is not
available if you have selected the Use an existing
connection check box in the Basic settings.
Field terminated by Character, string or regular expression which
separates the fields.
Set DBMONEY Select this check box to define the decimal separator
in the Decimal separator field.
Set DBDATE Select the date format that you want to apply.
Rows Before Commit Enter the number of rows to be processed before the
commit.
Bad Rows Before Enter the number of rows in error at which point the
Abort Job should stop.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Output Where the output should go.
Usage This component offers database query flexibility and covers all possible
Informix queries which may be required.
Related scenario
tInformixClose
tInformixClose properties
Related scenario
This component is for use with tInformixConnection and tInformixRollback. It is generally
used along with tInformixConnection, as the latter allows you to open a connection for the
transaction which is underway.
To see a scenario in which tInformixClose might be used, see tMysqlConnection on page 460.
tInformixCommit
tInformixCommit properties
Related Scenario
This component is for use with tInformixConnection and tInformixRollback. It is generally
used along with tInformixConnection, as the latter allows you to open a connection for the
transaction which is underway.
To see a scenario in which tInformixCommit might be used, see tMysqlConnection on page 460.
tInformixConnection
tInformixConnection properties
Related scenario
For a scenario in which tInformixConnection might be used, see Scenario: Inserting data in
mother/daughter tables on page 460.
tInformixInput
tInformixInput properties
Related scenarios
tInformixOutput
tInformixOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add new entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Commit every Number of rows to be completed before committing
batches of rows together into the DB. This option
ensures transaction quality (but not rollback) and,
above all, better performance at executions.
Additional Columns This option is not offered if you create (with or
without drop) the DB table. It allows you to call SQL
functions to perform actions on columns, provided
these are not insert, update or delete actions, or
actions that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as a new column.
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After, depending
on the action to be performed on the reference column.
Reference column: Type in a reference column that
the component can use to place or replace the new or
altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
Enable debug mode Select this check box to display each step during
processing entries in a database.
Use Batch Size When selected, enables you to define the number of
lines in each processed batch.
Optimize the batch Select this check box to optimize the insertion of
insertion batches of data.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility benefit of the DB query and covers all of
the SQL queries possible.
This component must be used as an output component. It allows you to carry
out actions on a table or on the data of a table in an Informix database. It also
allows you to create a reject flow using a Row > Rejects link to filter data in
error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data
in error with a Reject link on page 483.
Related scenarios
tInformixOutputBulk
tInformixOutputBulk properties
Create directory if not This check box is selected automatically. The option
exists allows you to create a folder for the output file if it
doesn’t already exist.
Custom the flush Select this box in order to customize the memory size
buffer size used to store the data temporarily. In the Row
number field enter the number of rows at which
point the memory should be freed.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is generally used along with tInformixBulkExec. Together,
they improve performance levels when adding data to an Informix database.
Related scenario
tInformixOutputBulkExec
tInformixOutputBulkExec properties
Rows Before Commit Enter the number of rows to be processed before the
commit.
Bad Rows Before Enter the number of rows in error at which point the
Abort Job should stop.
Create directory if not This check box is selected by default. It creates a
exists directory to hold the output table if required.
Custom the flush Select this box in order to customize the memory size
buffer size used to store the data temporarily. In the Row
number field enter the number of rows at which point
the memory should be freed.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Output Where the output should go.
Usage This component is generally used when no particular transformation is
required on the data to be inserted in the database.
Limitation n/a
Related scenario
tInformixRollback
tInformixRollback properties
Related Scenario
For a scenario in which tInformixRollback might be used, see the tMysqlRollback Scenario:
Rollback from inserting data in mother/daughter tables on page 498.
tInformixRow
tInformixRow properties
Function tInformixRow is the specific component for this database query. It executes
the SQL query stated on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tInformixRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you write your SQL statements easily.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tInformixConnection component on the
Component list to reuse the connection details you
already defined.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility benefit of the DB query and covers all
possibilities of SQL queries.
Related scenarios
tInformixSCD
The tInformixSCD component belongs to two different families: Business Intelligence and Databases.
For further information, see tInformixSCD on page 9.
tInformixSP
tInformixSP properties
Related scenario
• the tMysqlSP Scenario: Finding a State Label using a stored procedure on page 512.
• the tOracleSP Scenario: Checking number format using a stored procedure on page 564
Also, see the tParseRecordSet on page 591 if you want to analyse a set of records in a table or SQL
query.
tIngresClose
tIngresClose properties
Related scenario
tIngresCommit
tIngresCommit Properties
This component is closely related to tIngresConnection and tIngresRollback. It usually does not
make much sense to use these components independently in a transaction.
Function Validates the data processed through the Job into the connected DB
Purpose Using a unique connection, this component commits a global transaction in
one go, instead of committing every row or every batch separately, and thus
provides a gain in performance.
Basic settings Component list Select the tIngresConnection component in the list if
more than one connection is planned for the current
Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Ingres components, especially with
tIngresConnection and tIngresRollback.
Limitation n/a
Related scenario
For tIngresCommit related scenario, see Scenario: Inserting data in mother/daughter tables on
page 460.
tIngresConnection
tIngresConnection Properties
This component is closely related to tIngresCommit and tIngresRollback. It usually does not
make much sense to use one of these without using a tIngresConnection component to open a
connection for the current transaction.
Related scenarios
For tIngresConnection related scenario, see Scenario: Inserting data in mother/daughter tables on
page 460.
tIngresInput
tIngresInput properties
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component covers all possibilities of SQL queries onto an Ingres
database.
Related scenarios
tIngresOutput
tIngresOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add new entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Related scenarios
tIngresRollback
tIngresRollback properties
This component is closely related to tIngresCommit and tIngresConnection. It usually does not
make much sense to use these components independently in a transaction.
Related scenarios
For tIngresRollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498.
tIngresRow
tIngresRow properties
Function tIngresRow is the specific component for this database query. It executes the
SQL query stated on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tIngresRow acts on the
actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you write your SQL statements easily.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Host Database server IP address
Port Listening port number of DB server.
Database Name of the database
Username and DB user authentication data.
Password
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Query type Either Built-in or Repository.
Built-in: Fill in the query statement manually or build
it graphically using SQLBuilder.
Repository: Select the relevant query stored in the
Repository. The Query field gets accordingly filled in.
Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the
schema definition.
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced Settings Propagate QUERY’s Select this check box to insert the result of the query
recordset into a COLUMN of the current flow. Select this
column from the use column list.
Use Select this check box if you want to query the database
PreparedStatement using a PreparedStatement. In the Set
PreparedStatement Parameter table, define the
parameters represented by “?” in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the
SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
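Each entry in the Set PreparedStatement Parameter table binds a value to a “?” placeholder by position, as in JDBC, where Parameter Index 1 maps to the first “?”. The sketch below illustrates the positional matching with a plain string substitution; note this is only an illustration, since a real PreparedStatement sends the values separately instead of splicing them into the SQL text:

```java
public class PlaceholderSketch {
    // Substitute the n-th "?" with the n-th supplied value, in order.
    // Illustrates positional binding only; real drivers never build
    // SQL text this way.
    static String bind(String sql, String... values) {
        StringBuilder sb = new StringBuilder();
        int param = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?' && param < values.length) {
                sb.append(values[param++]);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```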
Related scenarios
tIngresSCD
tIngresSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tIngresSCD on page 11.
tInterbaseClose
tInterbaseClose properties
Related scenario
tInterbaseCommit
tInterbaseCommit Properties
Function Validates the data processed through the Job into the connected DB.
Purpose Using a unique connection, this component commits a global transaction in
one go, instead of committing every row or every batch separately, and thus
provides a gain in performance.
Basic settings Component list Select the tInterbaseConnection component in the
list if more than one connection is planned for the
current Job.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its
task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Interbase components, especially with
the tInterbaseConnection and tInterbaseRollback components.
Limitation n/a
Related scenario
tInterbaseConnection
tInterbaseConnection properties
Related scenarios
tInterbaseInput
tInterbaseInput properties
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component covers all possibilities of SQL queries onto an Interbase
database.
Related scenarios
tInterbaseOutput
tInterbaseOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add new entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Related scenarios
tInterbaseRollback
tInterbaseRollback properties
Related scenarios
For tInterbaseRollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498 of the tMysqlRollback.
tInterbaseRow
tInterbaseRow properties
Function tInterbaseRow is the specific component for this database query. It executes
the SQL query stated on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tInterbaseRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you write your SQL statements easily.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tInterbaseConnection component on the
Component list to reuse the connection details you
already defined.
Related scenarios
tJavaDBInput
tJavaDBInput properties
Related scenarios
tJavaDBOutput
tJavaDBOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add new entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Related scenarios
tJavaDBRow
tJavaDBRow properties
Function tJavaDBRow is the specific component for this database query. It executes the
SQL query stated on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tJavaDBRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you write your SQL statements easily.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Framework Select your Java database framework on the list
Database Name of the database
DB root path Browse to your database root.
Username and DB user authentication data.
Password
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Query type Either Built-in or Repository.
Built-in: Fill in manually the query statement or build
it graphically using SQLBuilder
Repository: Select the relevant query stored in the
Repository. The Query field gets accordingly filled in.
Query Enter your DB query, paying particular attention to
sequence the fields properly so that they match the
schema definition.
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Propagate QUERY’s Select this check box to insert the result of the query
recordset into a COLUMN of the current flow. Select this
column from the use column list.
Use Select this check box if you want to query the database
PreparedStatement using a PreparedStatement. In the Set
PreparedStatement Parameter table, define the
parameters represented by “?” in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the
SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
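The Set PreparedStatement Parameter table must supply one entry per “?” placeholder, bound in positional order. A minimal sketch of that correspondence (the query and values are hypothetical; in a real Job the binding goes through the driver’s PreparedStatement):

```java
// Sketch of how the "?" placeholders in the Query field relate to the
// Parameter Index column. The SQL statement here is a made-up example.
public class PreparedParams {
    // Count the "?" placeholders the parameter table must cover.
    static int placeholderCount(String sql) {
        int n = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String sql = "UPDATE stock SET qty = ? WHERE sku = ?";
        // Parameter Index 1 binds the qty value, Parameter Index 2 the sku.
        System.out.println(placeholderCount(sql)); // 2
    }
}
```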
Related scenarios
tJDBCColumnList
tJDBCColumnList Properties
Function Iterates on all columns of a given table through a defined JDBC connection.
Purpose Lists all column names of a given JDBC table.
Basic settings Component list Select the tJDBCConnection component in the list if
more than one connection is planned for the current
Job.
Table name Enter the name of the table.
Usage This component is to be used along with JDBC components, especially with
tJDBCConnection.
Limitation n/a
Related scenario
For tJDBCColumnList related scenario, see Scenario: Iterating on a DB table and listing its
column names on page 455.
tJDBCClose
tJDBCClose properties
Related scenario
tJDBCCommit
tJDBCCommit Properties
Function Validates the data processed through the Job into the connected DB.
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tJDBCConnection component in the list if
more than one connection is planned for the current
Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with JDBC components, especially with
the tJDBCConnection and tJDBCRollback components.
Limitation n/a
Related scenario
tJDBCConnection
tJDBCConnection Properties
This component is closely related to tJDBCCommit and tJDBCRollback. It usually doesn’t make
much sense to use one of JDBC components without using the tJDBCConnection component to
open a connection for the current transaction.
Related scenario
For tJDBCConnection related scenario, see tMysqlConnection on page 460.
tJDBCInput
tJDBCInput properties
Function tJDBCInput reads any database using a JDBC API connection and extracts fields
based on a query.
Purpose tJDBCInput executes a DB query with a strictly defined order which must
correspond to the schema definition. Then it passes on the field list to the next
component via a Main row link.
Basic settings Property type Either Built-in or Repository
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tJDBCConnection component on the Component
list to reuse the connection details you already
defined.
Related scenarios
tJDBCOutput
tJDBCOutput properties
Related scenarios
tJDBCRollback
tJDBCRollback properties
Related scenario
tJDBCRow
tJDBCRow properties
Function tJDBCRow is the component for any type of database that uses a JDBC API. It
executes the stated SQL query on the specified database. The row suffix
means the component implements a flow in the Job design although it doesn’t
provide output.
Purpose Depending on the nature of the query and the database, tJDBCRow acts on the
actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Use an existing Select this check box and click the relevant
connection tJDBCConnection component on the Component
list to reuse the connection details you already
defined.
Related scenarios
• tMysqlRow Scenario 1: Removing and regenerating a MySQL table index on page 502.
tJDBCSP
tJDBCSP Properties
Related scenario
tJDBCTableList
tJDBCTableList Properties
Related scenario
For tJDBCTableList related scenario, see Scenario: Iterating on a DB table and listing its column
names on page 455.
tLDAPAttributesInput
tLDAPAttributesInput Properties
Function tLDAPAttributesInput analyses each object found via the LDAP query and lists
a collection of attributes associated with the object.
Purpose tLDAPAttributesInput executes an LDAP query based on the given filter and
corresponding to the schema definition. Then it passes on the field list to the next
component via a Main row link.
Basic settings Property type Either Built-in or Repository
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties
are stored. When selected, the fields to follow are
pre-filled in using fetched data.
Host LDAP Directory server IP address
Port Listening port number of server.
Base DN Path to user’s authorised tree leaf.
Protocol Select the protocol type on the list.
LDAP: no encryption is used.
LDAPS: secured LDAP. When this option is chosen, the
Advanced CA check box appears. Once selected, the
advanced mode allows you to specify the directory and
the keystore password of the certificate file for storing a
specific CA. However, you can still deactivate this
certificate validation by selecting the Trust all certs
check box.
TLS: certificate is used. When this option is chosen, the
Advanced CA check box appears and is used the same
way as that of the LDAPS type.
Authentication Select the Authentication check box if LDAP login is
User and Password required. Note that the login must match the LDAP
syntax requirement to be valid. e.g.: “cn=Directory
Manager”.
Filter Type in the filter as expected by the LDAP directory db.
Multi valued field Type in the value separator in multi-value fields.
separator
Alias dereferencing Select the option on the list. Never improves search
performance if you are sure that no aliases need to be
dereferenced. By default, Always is used:
Always: always dereferences aliases.
Never: never dereferences aliases.
Searching: dereferences aliases only after name
resolution.
Finding: dereferences aliases only during name
resolution.
Related scenario
The tLDAPAttributesInput component follows a usage similar to that of tLDAPInput. Hence,
for a tLDAPInput related scenario, see Scenario: Displaying LDAP directory’s filtered content on
page 406.
tLDAPInput
tLDAPInput Properties
Function tLDAPInput reads a directory and extracts data based on the defined filter.
Purpose tLDAPInput executes an LDAP query based on the given filter and corresponding
to the schema definition. Then it passes on the field list to the next component via
a Main row link.
Basic settings Property type Either Built-in or Repository
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties
are stored. When selected, the fields to follow are
pre-filled in using fetched data.
Click this icon to open a database connection wizard and
store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing
database connection parameters, see Setting up a DB
connection of Talend Open Studio User Guide.
Host LDAP Directory server IP address
Port Listening port number of server.
Base DN Path to user’s authorised tree leaf.
Protocol Select the protocol type on the list.
LDAP: no encryption is used.
LDAPS: secured LDAP. When this option is chosen, the
Advanced CA check box appears. Once selected, the
advanced mode allows you to specify the directory and
the keystore password of the certificate file for storing a
specific CA. However, you can still deactivate this
certificate validation by selecting the Trust all certs
check box.
TLS: a certificate is used. When this option is chosen, the
Advanced CA check box appears and is used the same
way as that of the LDAPS type.
Authentication Select the Authentication check box if LDAP login is
User and Password required. Note that the login must match the LDAP
syntax requirement to be valid. e.g.: “cn=Directory
Manager”.
Filter Type in the filter as expected by the LDAP directory db.
Multi valued field Type in the value separator in multi-value fields.
separator
Alias dereferencing Select the option on the list. Never improves search
performance if you are sure that no aliases need to be
dereferenced. By default, Always is used:
Always: always dereferences aliases.
Never: never dereferences aliases.
Searching: dereferences aliases only after name
resolution.
Finding: dereferences aliases only during name
resolution.
Referral handling Select the option on the list:
Ignore: does not handle request redirections.
Follow: handles request redirections.
Limit If needed, fill in a limit on the number of records to be read.
Time Limit Fill in a timeout period for directory access.
Paging Specify the number of entries returned at a time by the
LDAP server.
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process for
error-free rows. If needed, you can retrieve the rows on
error via a Row > Rejects link.
Schema and Edit A schema is a row description, i.e., it defines the number
Schema of fields to be processed and passed on to the next
component. The schema is either built-in or remotely
stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in
the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Usage This component covers all possibilities of LDAP queries.
Note: Press Ctrl + Space bar to access the global variable list, including the
GetResultName variable to retrieve the relevant Base DN automatically.
The Job described below simply filters the LDAP directory and displays the result on the console.
• Drop the tLDAPInput component along with a tLogRow from the Palette to the design
workspace.
• Set the tLDAPInput properties.
• Set the Property type to Repository if you stored the LDAP connection details in the
Metadata Manager in the Repository. Then select the relevant entry on the list.
• In Built-In mode, fill in the Host and Port information manually. Host can be the IP address
of the LDAP directory server or its DNS name.
• No particular Base DN is to be set.
• Then select the relevant Protocol on the list. In this example: a simple LDAP protocol is
used.
• Select the Authentication check box and fill in the login information if required to read the
directory. In this use case, no authentication is needed.
• In the Filter area, type in the command that the data selection is based on. In this example, the
filter is: (&(objectClass=inetorgperson)(uid=PIERRE DUPONT)).
• Fill in Multi-valued field separator with a comma as some fields may hold more than one
value, separated by a comma.
• As we don’t know if some aliases are used in the LDAP directory, select Always on the list.
• Set Ignore as Referral handling.
• Set the limit to 100 for this use case.
• Set the Schema as required by your LDAP directory. In this example, the schema is made
of 6 columns including the objectClass and uid columns which get filtered on.
• In the tLogRow component, no particular setting is required.
Only one entry of the directory corresponds to the filter criteria given in the tLDAPInput
component.
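For reference, the options used in this scenario map onto standard JNDI environment properties. A sketch of that mapping, assuming a hypothetical host address (no connection is actually opened here):

```java
import java.util.Hashtable;
import javax.naming.Context;

// Sketch: the component's connection settings expressed as the standard
// JNDI environment properties. The host and port are hypothetical; the
// Referral handling and Alias dereferencing values match this scenario.
public class LdapSettings {
    static Hashtable<String, String> env(String host, int port) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://" + host + ":" + port);
        env.put(Context.REFERRAL, "ignore");                // Referral handling
        env.put("java.naming.ldap.derefAliases", "always"); // Alias dereferencing
        return env;
    }

    public static void main(String[] args) {
        System.out.println(env("192.168.0.1", 389).get(Context.PROVIDER_URL));
    }
}
```

The derefAliases property accepts always, never, finding and searching, matching the four options in the Alias dereferencing list.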
tLDAPOutput
tLDAPOutput Properties
Alias dereferencing Select the option on the list. Never improves search
performance if you are sure that no aliases need to be
dereferenced. By default, Always is used:
Always: always dereferences aliases.
Never: never dereferences aliases.
Searching: dereferences aliases only after name
resolution.
Finding: dereferences aliases only during name
resolution.
Referral handling Select the option on the list:
Ignore: does not handle request redirections.
Follow: handles request redirections.
Insert mode Select the editing mode on the list:
Add: add a value in a multi-value attribute,
Insert: insert new data,
Update: updates the existing data,
Delete: remove the selected data from the directory,
Insert or Update: insert new data or update existing ones.
DN Column Name Select in the list the type of the LDAP input entity used.
Schema and Edit A schema is a row description, i.e., it defines the number
Schema of fields to be processed and passed on to the next
component. The schema is either built-in or remotely
stored in the Repository.
Built-in: The schema is created and stored locally for this
component only. Related topic: Setting a built-in schema
of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in
the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for
error-free rows. If needed, you can retrieve the rows on
error via a Row > Reject link.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component level.
Usage This component covers all possibilities of LDAP queries.
Note: Press Ctrl + Space bar to access the global variable list, including the
GetResultName variable to retrieve the relevant Base DN automatically. This
component allows you to carry out actions on a table or on the data of a table in a
database. It also allows you to create a reject flow using a Row > Rejects link to filter
data in error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve
data in error with a Reject link on page 483.
The following scenario describes a Job that reads an LDAP directory, updates the email of a selected
entry and displays the output before writing the LDAP directory. To keep it simple, no alias
dereferencing nor referral handling is performed. This scenario is based on tLDAPInput’s Scenario:
Displaying LDAP directory’s filtered content on page 406. The result returned was a single entry,
related to an organisational person, whose email is to be updated.
• Drop the tLDAPInput, tLDAPOutput, tMap and tLogRow components from the Palette
to the design workspace.
• Connect the input component to the tMap then to the tLogRow and to the output
component.
• In the tLDAPInput Component view, set the connection details to the LDAP directory
server as well as the filter as described in Scenario: Displaying LDAP directory’s filtered
content on page 406.
• Change the schema to make it simpler, by removing the unused fields: dc, ou, objectclass.
• In the Expression field of the dn column (output), fill in with the exact expression expected
by the LDAP server to reach the target tree leaf and allow directory writing on the condition
that you haven’t set it already in the Base DN field of the tLDAPOutput component.
• In this use case, the GetResultName global variable is used to retrieve this path
automatically. Press Ctrl+Space bar to access the variable list and select
tLDAPInput_1_RESULT_NAME.
• In the mail column’s expression field, type in the new email that will overwrite the current
data in the LDAP directory. In this example, we change to Pierre.Dupont@talend.com.
• Click OK to validate the changes.
• The tLogRow component doesn’t need any particular setting.
• Then select the tLDAPOutput component to set the directory writing properties.
• Set the Port and Host details manually if they aren’t stored in the Repository.
• In the Base DN field, set the highest tree leaf you have the rights to access. If you haven’t
previously set the exact and full path of the target DN you want to access, then fill it in here. In
this use case, the full DN is provided by the dn output from the tMap component, therefore
only the highest accessible leaf is given: o=directoryRoot.
• Select the relevant protocol to be used: LDAP for this example.
• Then fill in the User and Password as expected by the LDAP directory.
• Use the default setting of Alias Dereferencing and Referral Handling fields, respectively
Always and Ignore.
• The Insert mode for this use case is Update (the email address).
• The schema was provided by the previous component through the propagation operation.
• Save the Job and execute.
The output shows the following fields: dn, uid and mail as defined in the Job.
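At the JNDI level, the Update insert mode used here corresponds to a REPLACE modification of the mail attribute. A sketch of that modification as plain data (the actual write would go through the directory context; no server is contacted here):

```java
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.ModificationItem;

// Sketch: the "Update" insert mode expressed as a standard JNDI
// modification item. The email value comes from this scenario; a real
// Job would apply it via DirContext.modifyAttributes() on the target DN.
public class LdapUpdate {
    static ModificationItem replaceMail(String newMail) {
        return new ModificationItem(DirContext.REPLACE_ATTRIBUTE,
                new BasicAttribute("mail", newMail));
    }

    public static void main(String[] args) {
        ModificationItem mod = replaceMail("Pierre.Dupont@talend.com");
        System.out.println(mod.getAttribute().getID()); // mail
    }
}
```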
tLDAPRenameEntry
tLDAPRenameEntry properties
Related scenarios
For use cases in relation with tLDAPRenameEntry, see the following scenarios:
• tLDAPInput Scenario: Displaying LDAP directory’s filtered content on page 406.
• tLDAPOutput Scenario: Editing data in a LDAP directory on page 410.
tMaxDBInput
tMaxDBInput properties
Guess schema Click the Guess schema button to retrieve the table
schema.
Advanced settings Trim all the Select this check box to remove leading and trailing
String/Char columns whitespace from all the String/Char columns.
Trim column Remove leading and trailing whitespace from defined
columns.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility benefit of the DB query and covers all
possibilities of SQL queries.
Related scenario
tMaxDBOutput
tMaxDBOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Related scenario
tMaxDBRow
tMaxDBRow properties
Function tMaxDBRow is the specific component for this database query. It executes the
stated SQL query on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tMaxDBRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Host Database server IP address
Port Listening port number of DB server.
Database Name of the database.
Username and DB user authentication data.
Password
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Table name Type in the table name.
Query type and Query Enter your DB query, paying particular attention to
sequence the fields properly so that they match the
schema definition.
Guess Query Click the Guess Query button to generate the query
which corresponds to your table schema in the Query
field.
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Propagate QUERY’s Select this check box to insert the result of the query
recordset into a COLUMN of the current flow. Select this
column from the use column list.
Use Select this check box if you want to query the database
PreparedStatement using a PreparedStatement. In the Set
PreparedStatement Parameter table, define the
parameters represented by “?” in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the
SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Related scenario
tMSSqlBulkExec
tMSSqlBulkExec properties
tMSSqlOutputBulk and tMSSqlBulkExec components are used together to first output the file
that will be then used as parameter to execute the SQL query stated. These two steps compose the
tMSSqlOutputBulkExec component, detailed in a separate section. The advantage of having
two separate steps is that transformations can be carried out on the data before it is
loaded into the database.
Related scenarios
For use cases in relation with tMSSqlBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
tMSSqlColumnList
tMSSqlColumnList Properties
Function Iterates on all columns of a given table through a defined MS SQL connection.
Purpose Lists all column names of a given MSSql table.
Basic settings Component list Select the tMSSqlConnection component in the list
if more than one connection is planned for the
current Job.
Table name Enter the name of the table.
Usage This component is to be used along with MSSql components, especially with
tMSSqlConnection.
Limitation n/a
Related scenario
For tMSSqlColumnList related scenario, see Scenario: Iterating on a DB table and listing its
column names on page 455.
tMSSqlClose
tMSSqlClose properties
Related scenario
tMSSqlCommit
tMSSqlCommit properties
This component is closely related to tMSSqlConnection and tMSSqlRollback. It usually does not
make much sense to use these components independently in a transaction.
Function tMSSqlCommit validates the data processed through the job into the
connected DB.
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tMSSqlConnection component in the list
if more than one connection is planned for the
current Job.
Close connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage This component is to be used along with Mssql components, especially with
tMSSqlConnection and tMSSqlRollback components.
Limitation n/a
Related scenarios
This component is closely related to tMSSqlConnection and tMSSqlRollback. It usually does not
make much sense to use one of these without using a tMSSqlConnection component to open a
connection for the current transaction.
For a tMSSqlCommit related scenario, see Scenario: Inserting data in mother/daughter tables on
page 460.
tMSSqlConnection
tMSSqlConnection properties
This component is closely related to tMSSqlCommit and tMSSqlRollback. Both components are
usually used with a tMSSqlConnection component to open a connection for the current transaction.
Related scenarios
This component is closely related to tMSSqlCommit and tMSSqlRollback. It usually does not
make much sense to use one of these without using a tMSSqlConnection component to open a
connection for the current transaction.
For tMSSqlConnection related scenario, see Scenario: Inserting data in mother/daughter tables on
page 460.
tMSSqlInput
tMSSqlInput properties
Related scenarios
tMSSqlLastInsertId
tMSSqlLastInsertId properties
Function tMSSqlLastInsertId displays the last IDs added to a table from a MSSql
specified connection.
Purpose tMSSqlLastInsertId enables you to retrieve the last primary keys added by a
user to a MSSql table.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Component list Select the tMSSqlConnection component on the
Component list to reuse the connection details you
already defined, if there is more than one component
in the list.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility benefit of the DB query and covers all
possibilities of SQL queries.
Related scenario
For a related scenario, see Scenario: Get the ID for the last inserted record on page 467.
tMSSqlOutput
tMSSqlOutput properties
Use Batch Size When selected, enables you to define the number of
lines in each processed batch.
Usage This component offers the flexibility benefit of the DB query and covers all of
the SQL queries possible.
This component must be used as an output component. It allows you to carry
out actions on a table or on the data of a table in a MSSql database. It also
allows you to create a reject flow using a Row > Rejects link to filter data in
error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data
in error with a Reject link on page 483.
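With Use Batch Size selected, rows are flushed to the database in groups rather than one by one. A minimal sketch of the grouping implied by a batch size (the row values are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of what a batch size of N means: the incoming rows are split
// into groups of at most N, and one statement is sent per group instead
// of per row. The row values here are hypothetical.
public class BatchSize {
    static List<List<String>> batches(List<String> rows, int size) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += size) {
            out.add(rows.subList(i, Math.min(i + size, rows.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println(batches(rows, 2).size()); // 3
    }
}
```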
Related scenarios
tMSSqlOutputBulk
tMSSqlOutputBulk properties
tMSSqlOutputBulk and tMSSqlBulkExec components are used together to first output the file
that will be then used as parameter to execute the SQL query stated. These two steps compose the
tMSSqlOutputBulkExec component, detailed in a separate section. The advantage of having
two separate steps is that transformations can be carried out on the data before it is
loaded.
Function Writes a file with columns based on the defined delimiter and the MSSql
standards.
Purpose Prepares the file to be used as parameter in the INSERT query to feed the
MSSql database.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. When selected, the fields to
follow are pre-filled in using fetched data.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Append Select this check box to add the new rows at the end
of the records.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job designs. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Advanced settings Row separator String (e.g. “\n” on Unix) used to distinguish rows.
Field separator Character, string or regular expression to separate
fields.
Include header Select this check box to include the column header.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with tMSSqlBulkExec component. Used
together they offer gains in performance while feeding a MSSql database.
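The bulk file itself is plain delimited text: fields joined by the field separator, rows joined by the row separator. A sketch of the format (the field values and the “;” separator are hypothetical):

```java
import java.util.List;

// Sketch of the delimited bulk file format produced for the loader:
// fields joined by the field separator, rows by the row separator.
// The row values and the ";" separator are made-up examples.
public class BulkFile {
    static String render(List<List<String>> rows, String fieldSep, String rowSep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) sb.append(rowSep);
            sb.append(String.join(fieldSep, rows.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("1", "Dupont"), List.of("2", "Martin"));
        System.out.println(render(rows, ";", "\n"));
    }
}
```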
Related scenarios
For use cases in relation with tMSSqlOutputBulk, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
tMSSqlOutputBulkExec
tMSSqlOutputBulkExec properties
Encoding Type Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is mainly used when no particular transformation is required
on the data to be loaded onto the database.
Limitation n/a
Related scenarios
For use cases in relation with tMSSqlOutputBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
tMSSqlRollback
tMSSqlRollback properties
Related scenario
For tMSSqlRollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498 of the tMysqlRollback.
tMSSqlRow
tMSSqlRow properties
Function tMSSqlRow is the specific component for this database query. It executes the
stated SQL query on the specified database. The row suffix means the
component implements a flow in the Job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tMSSqlRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tMSSqlConnection component on the Component
list to reuse the connection details you already
defined.
Related scenarios
tMSSqlSCD
tMSSqlSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tMSSqlSCD on page 17.
tMSSqlSP
tMSSqlSP Properties
Related scenario
tMSSqlTableList
tMSSqlTableList Properties
Related scenario
For tMSSqlTableList related scenario, see Scenario: Iterating on a DB table and listing its column
names on page 455.
tMysqlBulkExec
tMysqlBulkExec properties
tMysqlOutputBulk and tMysqlBulkExec are used together: the first outputs the file that
is then used as a parameter to execute the stated SQL query. These two steps make up the
tMysqlOutputBulkExec component, detailed in a separate section. The advantage of having two
separate components is that the data can be transformed before it is loaded into the
database.
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create table: The table is removed and
created again.
Create table: The table does not exist and gets
created.
Create table if not exists: The table is created if it
does not exist.
Truncate table: The table content is deleted. You do
not have the possibility to rollback the operation.
Clear table: The table content is deleted. You have
the possibility to rollback the operation.
Table Name of the table to be written. Note that only one
table can be written at a time and that the table must
exist for the insert operation to succeed.
Local file Name Name of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Advanced settings Additional JDBC Specify additional connection properties for the DB
parameters connection you are creating. This option is not
available if you have selected the Use an existing
connection check box in the Basic settings.
Lines terminated by Character or sequence of characters used to separate
lines.
Fields terminated by Character, string or regular expression to separate
fields.
Enclosed by Character used to enclose text.
Action on data On the data of the table defined, you can perform:
Insert records in table: Add new records to the table.
Update records in table: Make changes to existing
records.
Replace records in table: Replace existing records
with new ones.
Ignore records in table: Skip records that duplicate
existing ones and insert only the new ones.
Records contain Check this box if you want to keep the NULL values
NULL value from the input data flow. If you do not check this box,
the NULL values from the input data flow will be
treated as empty fields in the output data flow.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with the tMysqlOutputBulk component.
Used together, they can offer gains in performance while feeding a Mysql
database.
Limitation n/a
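The Action on table options described above correspond to ordinary SQL DDL statements. As a rough sketch (the statement shapes below are illustrative assumptions, not the exact SQL Talend generates), the mapping can be expressed as:

```java
public class TableAction {
    // Illustrative mapping from the "Action on table" choices to SQL;
    // not the exact statements Talend generates.
    static String sqlFor(String action, String table, String columnsDdl) {
        switch (action) {
            case "Drop and create table":
                return "DROP TABLE IF EXISTS " + table
                        + "; CREATE TABLE " + table + " (" + columnsDdl + ")";
            case "Create table":
                return "CREATE TABLE " + table + " (" + columnsDdl + ")";
            case "Create table if not exists":
                return "CREATE TABLE IF NOT EXISTS " + table + " (" + columnsDdl + ")";
            case "Truncate table":
                return "TRUNCATE TABLE " + table;   // no rollback possible
            case "Clear table":
                return "DELETE FROM " + table;      // rollback possible
            default:
                return null;                        // "None": no statement issued
        }
    }

    public static void main(String[] args) {
        System.out.println(sqlFor("Clear table", "clients", ""));
    }
}
```

Note in particular the Truncate table/Clear table distinction: TRUNCATE is a DDL operation that cannot be rolled back, whereas DELETE participates in the transaction.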
Related scenarios
For use cases related to tMysqlBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tMysqlClose
tMysqlClose properties
Related scenario
tMysqlColumnList
tMysqlColumnList Properties
Function Iterates on all columns of a given table through a defined Mysql connection.
Purpose Lists all column names of a given Mysql table.
Basic settings Component list Select the tMysqlConnection component in the list if
more than one connection is planned for the current
job.
Table name Enter the name of the table.
Usage This component is to be used along with Mysql components, especially with
tMysqlConnection.
Limitation n/a
The following Java scenario creates a five-component job that iterates on a given table name from
a Mysql database using a Where clause and lists all column names present in the table.
• Drop the following components from the Palette onto the design workspace:
tMysqlConnection, tMysqlTableList, tMysqlColumnList, tFixedFlowInput, and
tLogRow.
• Connect tMysqlConnection to tMysqlTableList using an OnSubjobOk link.
• Connect tMysqlTableList, tMysqlColumnList, and tFixedFlowInput using Iterate links.
• Connect tFixedFlowInput to tLogRow using a Row Main link.
• In the design workspace, select tMysqlConnection and click the Component tab to define
its basic settings.
• In the Basic settings view, set the database connection details manually or select them from
the context variable list, through a Ctrl+Space click in the corresponding field if you have
stored them locally as Metadata DB connection entries.
For more information about Metadata, see Defining Metadata items of Talend Open Studio User
Guide.
• On the Component list, select the relevant Mysql connection component if more than one
connection is used.
• Enter a Where clause using the right syntax in the corresponding field to iterate on the table
name(s) you want to list on the console.
In this scenario, the table we want to iterate on is called customer.
• In the design workspace, select tMysqlColumnList and click the Component tab to define
its basic settings.
• On the Component list, select the relevant Mysql connection component if more than one
connection is used.
• In the Table name field, enter the name of the DB table whose column names you want to list.
In this scenario, we want to list the columns present in the DB table called customer.
• In the design workspace, select tFixedFlowInput and click the Component tab to define its
basic settings.
• Set the Schema to Built-In and click the three-dot [...] button next to Edit Schema to define
the data you want to use as input. In this scenario, the schema is made of two columns, the
first for the table name and the second for the column name.
• Click OK to close the dialog box, and accept propagating the changes when prompted by the
system. The defined columns display in the Values panel of the Basic settings view.
• Click in the Value cell for each of the two defined columns and press Ctrl+Space to access
the global variable list.
• From the global variable list, select
((String)globalMap.get("tMysqlTableList_1_CURRENT_TABLE")) and
((String)globalMap.get("tMysqlColumnList_1_COLUMN_NAME")) for the TableName
and ColumnName respectively.
The name of the DB table is displayed on the console along with all its column names.
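In the generated Java code, globalMap is a plain Map<String, Object> keyed by component variable names, which is why the Value cells above simply cast its entries to String. A minimal sketch of the lookup (the map contents here are illustrative; at run time the iterating components set them):

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    // Mimics the expressions placed in the tFixedFlowInput Value cells.
    static String tableColumn(Map<String, Object> globalMap) {
        String tableName = ((String) globalMap.get("tMysqlTableList_1_CURRENT_TABLE"));
        String columnName = ((String) globalMap.get("tMysqlColumnList_1_COLUMN_NAME"));
        return tableName + "." + columnName;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Illustrative values, as if set by tMysqlTableList/tMysqlColumnList
        globalMap.put("tMysqlTableList_1_CURRENT_TABLE", "customer");
        globalMap.put("tMysqlColumnList_1_COLUMN_NAME", "CustomerName");
        System.out.println(tableColumn(globalMap)); // customer.CustomerName
    }
}
```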
tMysqlCommit
tMysqlCommit Properties
Function Commits the data processed through the job into the connected DB.
Purpose Using a unique connection, this component commits a global transaction in
one go, instead of committing on every row or every batch, and thus improves
performance.
Basic settings Component list Select the tMysqlConnection component in the list if
more than one connection is planned for the current
job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Mysql components, especially with
tMysqlConnection and tMysqlRollback components.
Limitation n/a
Related scenario
tMysqlConnection
tMysqlConnection Properties
This component is closely related to tMysqlCommit and tMysqlRollback. It usually doesn’t make
much sense to use one of these without using a tMysqlConnection component to open a connection
for the current transaction.
The following job is dedicated to advanced database users, who want to carry out multiple table
insertions using a parent table id to feed a child table. As a prerequisite to this job, follow the steps
described below to create the relevant tables using an engine such as InnoDB.
• Drag and drop the following components from the Palette: tFileList, tFileInputDelimited,
tMap, tMysqlOutput (x2).
• Connect the tFileList component to the input file component using an Iterate link as the
name of the file to be processed will be dynamically filled in from the tFileList directory
using a global variable.
• Connect the tFileInputDelimited component to the tMap and dispatch the flow between the
two output Mysql DB components. Use a Row link for each of these connections
representing the main data flow.
• Set the tFileList component properties, such as the directory name where files will be
fetched from.
• Add a tMysqlConnection component and connect it to the starter component of this job, in
this example, the tFileList component using an OnComponentOk link to define the
execution order.
• In the tMysqlConnection Component view, set the connection details manually or fetch
them from the Repository if you centrally stored them as a Metadata DB connection entry.
For more information about Metadata, see Defining Metadata items in Talend Open Studio
User Guide.
• On the tFileInputDelimited component’s Basic settings panel, press Ctrl+Space bar to
access the variable list. Set the File Name field to the global variable:
tFileList_1.CURRENT_FILEPATH
• Set the rest of the fields as usual, defining the row and field separators according to your file
structure.
• Then set the schema manually through the Edit schema feature or select the schema from
the Repository. In Java version, make sure the data type is correctly set, in accordance with
the nature of the data processed.
• Change the encoding if different from the default one.
• In the tMap Output area, add two output tables, one called mum for the parent table, the
second called baby, for the child table.
• Drag the Name column from the Input area, and drop it to the mum table.
• Drag the Years column from the Input area and drop it to the baby table.
• Make sure the mum table is on top of the baby table, as the order determines the flow
sequence and hence whether the DB inserts perform correctly.
• Then connect the output row links to distribute the flow correctly to the relevant DB
output components.
• In each of the tMysqlOutput components’ Basic settings panel, select the Use an existing
connection check box to retrieve the tMysqlConnection details.
• Notice (in the Perl version) that the Commit every field no longer shows, as you are
expected to use tMysqlCommit instead to manage the global transaction commit. In the
Java version, ignore the field, as this setting will be overridden by tMysqlCommit.
• Set the Table name making sure it corresponds to the correct table, in this example either
f1090_mum or f1090_baby.
• There is no action on the table as they are already created.
• Select Insert as Action on data for both output components.
• Click on Sync columns to retrieve the schema set in the tMap.
• Change the encoding type if need be.
• In the Additional columns area of the DB output component corresponding to the child
table (f1090_baby), set the id_baby column so that it reuses the id from the parent table.
• In the SQL expression field type in: '(Select Last_Insert_id())'
• The position is Before and the Reference column is years.
• Add the tMysqlCommit component to the design workspace and connect it from the
tFileList component using an OnComponentOk connection in order for the job to terminate
with the transaction commit.
• On the tMysqlCommit Component view, select in the list the connection to be used.
• Save your job and press F6 to execute it.
The parent table id has been reused to feed the id_baby column.
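The effect of the Additional columns setting in this scenario is that each child insert reuses the auto-increment id just produced by the parent insert on the same connection, which is why the shared connection opened by tMysqlConnection matters. Conceptually, the SQL looks like this (f1090_mum, f1090_baby, id_baby and years come from the scenario; the other column names are assumptions for illustration):

```sql
-- Parent row: generates a new auto-increment id (id_mum is assumed)
INSERT INTO f1090_mum (id_mum, name) VALUES (NULL, 'Smith');

-- Child row: LAST_INSERT_ID() returns the id produced by the previous
-- INSERT on the same connection
INSERT INTO f1090_baby (id_baby, years)
VALUES ((SELECT LAST_INSERT_ID()), 1990);
```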
tMysqlInput
tMysqlInput properties
Related scenarios
tMysqlLastInsertId
tMysqlLastInsertId properties
The following Java scenario creates a job that opens a connection to Mysql database, writes the
defined data into the database, and finally fetches the last inserted ID on the existing connection.
• Drop the following components from the Palette onto the design workspace:
tMySqlConnection, tMySqlCommit, tFileInputDelimited, tMySqlOutput,
tMysqlLastInsertId, and tLogRow.
• Connect tMySqlConnection to tFileInputDelimited using an OnSubjobOk link.
• Connect tFileInputDelimited to tMySqlCommit using an OnSubjobOk link.
• Connect tFileInputDelimited to the three other components using Row Main links.
• In the design workspace, select tMysqlCommit and click the Component tab to define its
basic settings.
• On the Component List, select the relevant tMysqlConnection if more than one connection
is used.
• In the design workspace, select tFileInputDelimited.
• Click the Component tab to define the basic settings of tFileInputDelimited.
In this scenario, the schema consists of two columns, name and age. The first holds three employees’
names and the second holds the corresponding age for each.
• In the design workspace, select tMySqlOutput.
• Click the Component tab to define the basic settings of tMySqlOutput.
• On the Component List, select the relevant tMysqlConnection, if more than one
connection is used.
• Set Schema to Built-In and click Sync columns to synchronize columns with the previous
component. In the output schema of tMySqlLastInsertId, you can see the read-only column
last_insert_id that will fetch the last inserted ID on the existing connection.
• In the design workspace, select tLogRow and click the Component tab to define its basic
settings. For more information, see tLogRow on page 1023.
• Save your job and press F6 to execute it.
tMysqlLastInsertId fetched the last inserted ID for each line on the existing connection.
tMysqlOutput
tMysqlOutput properties
Additional Columns This option is not available if you have just created
the DB table (even if you delete it beforehand). This
option allows you to call SQL functions to perform
actions on columns, provided that these are not insert,
update or delete actions, or actions that require
pre-processing.
Name: Type in the name of the schema column to be
altered or inserted
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the data in the
corresponding column.
Position: Select Before, Replace or After,
depending on the action to be performed on the
reference column.
Reference column: Type in a reference column that
tMySqlOutput can use to locate or replace the new
column, or the column to be modified.
Use field options Select this check box to customize a request,
particularly if multiple actions are being carried out
on the data.
Use Hint Options Select this check box to activate the hint configuration
(Java only) area, which helps you optimize a query’s execution. In
this area, the parameters are:
- HINT: specify the hint you need, using the syntax
/*+ */.
- POSITION: specify where to put the hint in the
SQL statement.
- SQL STMT: select the SQL statement you need to
use.
Enable debug mode Select this check box to display each step involved in
(Java only) the process of writing data in the database.
Use duplicate key Select this check box to execute insert or update
update mode insert actions in the Extend Insert mode.
(Java only)
This Java scenario is a three-component job that aims at creating random data using a
tRowGenerator, duplicating a column to be altered using the tMap component, and eventually
altering the data to be inserted based on an SQL expression using the tMysqlOutput component.
• Drop the following components from the Palette onto the design workspace:
tRowGenerator, tMap and tMySQLOutput.
• Connect tRowGenerator, tMap, and tMysqlOutput using the Row Main link.
• In the design workspace, select tRowGenerator to display its Basic settings view.
• Click in the corresponding Functions fields and select a function for each of the two
columns, getFirstName for the first column and getRandomDate for the second
column.
• In the Number of Rows for Rowgenerator field, enter 10 to generate ten first name rows
and click OK to close the editor.
• Double-click the tMap component to open the Map editor. The Map editor opens displaying
the input metadata of the tRowGenerator component.
• In the Schema editor panel of the Map editor, click the plus button of the output table to add
two rows and define the first as random_date and the second as random_date1.
In this scenario, we want to duplicate the random_date column and adapt the schema in order to alter
the data in the output component.
• In the Map editor, drag the random_date row from the input table to the random_date and
random_date1 rows in the output table.
• Set Property Type to Repository and then click the three-dot button to open the
[Repository content] dialog box and select the correct DB connection. The connection
details display automatically in the corresponding fields.
If you have not stored the DB connection details in the Metadata entry in the
Repository, select Built-in on the property type list and set the connection detail
manually.
• Click the three-dot button next to the Table field and select the table to be altered, Dates in
this scenario.
• On the Action on table list, select Drop table if exists and create, select Insert on the
Action on data list.
• If needed, click Sync columns to synchronize with the columns coming from the tMap
component.
• Click the Advanced settings tab to display the corresponding view and set the advanced
parameters.
This Java scenario describes a two-component Job that updates data in a MySQL table according
to that in a delimited file.
• Drop tFileInputDelimited and tMySQLOutput from the Palette onto the design
workspace.
• Connect the two components together using a Row Main link.
• Double-click tFileInputDelimited to display its Basic settings view and define the
component properties.
• From the Property Type list, select Repository if you have already stored the metadata of
the delimited file in the Metadata node in the Repository tree view. Otherwise, select
Built-In to define manually the metadata of the delimited file.
For more information about storing metadata, see Setting up a File Delimited schema on
page 166 of Talend Open Studio User Guide.
• In the File Name field, click the three-dot button and browse to the source delimited file that
contains the modifications to propagate in the MySQL table.
In this example, we use the customer_update file that holds four columns: id,
CustomerName, CustomerAddress and idState. Some of the data in these four columns is
different from that in the MySQL table.
• Define the row and field separators used in the source file in the corresponding fields.
• If needed, set Header, Footer and Limit.
In this example, Header is set to 1 since the first row holds the names of columns, therefore
it should be ignored. Also, the number of processed lines is limited to 2000.
• Select Built in from the Schema list then click the three-dot button next to Edit Schema to
open a dialog box where you can describe the data structure of the source delimited file that
you want to pass to the component that follows.
• Select the Key check box(es) next to the column name(s) you want to define as key
column(s).
It is necessary to define at least one column as a key column for the Job to be executed
correctly. Otherwise, the Job is automatically interrupted and an error message
displays on the console.
• In the design workspace, double-click tMysqlOutput to open its Basic settings view where
you can define its properties.
• Click Sync columns to retrieve the schema of the preceding component. If needed, click the
three-dot button next to Edit schema to open a dialog box where you can check the retrieved
schema.
• From the Property Type list, select Repository if you have already stored the connection
metadata in the Metadata node in the Repository tree view. Otherwise, select Built-In to
define manually the connection information.
For more information about storing metadata, see Setting up a DB connection on page 146
of Talend Open Studio User Guide.
• Fill in the database connection information in the corresponding fields.
• In the Table field, enter the name of the table to update.
• From the Action on table list, select the operation you want to perform, None in this
example since the table already exists.
• From the Action on data list, select the operation you want to perform on the data, Update
in this example.
• Save your Job and press F6 to execute it.
Using your DB browser, you can verify that the MySQL table, customers, has been modified according
to the delimited file.
In the above example, the database table still has the four columns id, CustomerName,
CustomerAddress and idState, but certain fields have been modified according to the data in the
delimited file used.
This scenario describes a four-component Job that carries out migration from a customer file to a
MySQL database table and redirects data in error towards a CSV file using a Reject link.
• In the Repository, select the customer file metadata that you want to migrate and drop it onto
the workspace. In the [Components] dialog box, select tFileInputDelimited and click OK.
The component properties will be filled in automatically.
• If you have not stored the information about your customer file under the Metadata node in
the Repository, drop a tFileInputDelimited component from the File > Input family in
the Palette, and fill in its properties manually in the Component tab.
• From the Palette, drop a tMap from the Processing family onto the workspace.
• In the Repository, expand the Metadata node, followed by the Db Connections node and
select the connection required to migrate your data to the appropriate database. Drop it onto
the workspace. In the [Components] dialog box, select tMysqlOutput and click OK. The
database connection properties will be automatically filled in.
• If you have not stored the database connection details under the Db Connections node in the
Repository, drop a tMysqlOutput from the Databases family in the Palette and fill in its
properties manually in the Component tab.
For more information, see Setting a built-in schema and Setting a Repository schema in the Talend
Open Studio User Guide.
• From the Palette, select a tFileOutputDelimited from the File > Output family, and drop
it onto the workspace.
• Link the customers component to the tMap component, and tMap to the Localhost output
component, using Row Main links. Name this second link out.
• Link the Localhost to the tFileOutputDelimited using a Row > Reject link.
• Double-click the customers component to display the Component view.
• In the Property Type list, select Repository and click the [...] button in order to select the
metadata containing the connection to your file. You can also select the Built-in mode and
fill in the fields manually.
• Click the [...] button next to the File Name field, and fill in the path and the name of the file
you want to use.
• In the Row and Field Separator fields, type in between inverted commas the row and field
separator used in the file.
• In the Header, Footer and Limit fields, type in the number of headers and footers to ignore,
and the number of rows to which processing should be limited.
• In the Schema list, select Repository and click the [...] button in order to select the schema
of your file, if it is stored under the Metadata node in the Repository. You can also click
the [...] button next to the Edit schema field, and set the schema manually.
The schema is as follows:
• Select the id, CustomerName, CustomerAddress, idState, id2, RegTime and RegisterTime
columns in the table on the left and drop them onto the out table, on the right.
• In the Schema editor area, at the bottom of the tMap editor, in the right table, change the
length of the CustomerName column to 28 to create an error. Thus, any data for which the
length is greater than 28 will create errors, retrieved with the Reject link.
• Click OK.
• In the workspace, double-click the output Localhost component to display its Component
view.
• In the Property Type list, select Repository and click the [...] button to select the connection
to the database metadata. The connection details will be automatically filled in. You can also
select the Built-in mode and set the fields manually.
• In the Table field, type in the name of the table to be created. In this scenario, we call it
customers_data.
• In the Action on data list, select the Create table option.
• Click the Sync columns button to retrieve the schema from the previous component.
• Make sure the Die on error check box isn’t selected, so that the Job can be executed despite
the error you just created.
• Click the Advanced settings tab of the Component view to set the advanced parameters of
the component.
• Deselect the Extend Insert check box which enables you to insert rows in batch, because
this option is not compatible with the Reject link.
• Double-click the tFileOutputDelimited component to set its properties in the Component
view.
• Click the [...] button next to the File Name field to fill in the path and name of the output file.
• Click the Sync columns button to retrieve the schema of the previous component.
• Save your Job and press F6 to execute it.
The rows in error are sent to the delimited file, along with the type of error encountered. Here, we
have: Data truncation.
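The rejection mechanism in this scenario rests on the shortened CustomerName column: any value longer than 28 characters fails the insert and is routed to the Reject link together with the error type. A simplified sketch of that filtering (this models the effect, not Talend's generated code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RejectDemo {
    // Rows whose CustomerName exceeds the schema length (28 here) fail the
    // insert and travel down the Reject link with the error message.
    static List<String> rejectTooLong(List<String> names, int maxLen, List<String> accepted) {
        List<String> rejected = new ArrayList<>();
        for (String name : names) {
            if (name.length() > maxLen) {
                rejected.add(name + " | Data truncation");
            } else {
                accepted.add(name);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        List<String> accepted = new ArrayList<>();
        List<String> rejected = rejectTooLong(
                Arrays.asList("John Smith", "An unreasonably long customer name"),
                28, accepted);
        System.out.println(accepted); // [John Smith]
        System.out.println(rejected);
    }
}
```

The key design point is that Die on error must stay cleared and Extend Insert must be deselected, so that each failing row is reported individually instead of aborting the whole batch.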
tMysqlOutputBulk
tMysqlOutputBulk properties
tMysqlOutputBulk and tMysqlBulkExec are used together: the first outputs the file that
is then used as a parameter to execute the stated SQL query. These two steps make up the
tMysqlOutputBulkExec component, detailed in a separate section. The advantage of having two
separate components is that the data can be transformed before it is loaded.
Function Writes a file with columns based on the defined delimiter and the MySql
standards
Purpose Prepares the file to be used as parameter in the INSERT query to feed the
MySQL database.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The following fields are
pre-filled in using fetched data.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide
Field separator Character, string or regular expression to separate
fields.
Row separator String (ex: “\n” on Unix) to distinguish rows.
Append Select this check box to add the new rows at the end
of the file.
Include header Select this check box to include the column header to
the file.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job designs. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
Usage This component is to be used along with the tMysqlBulkExec component. Used
together they offer gains in performance while feeding a MySQL database.
This scenario describes a four-component job which aims at feeding a database with data contained
in a file, including transformed data. This job requires two steps: the first step creates the file,
which is then used in the second step. The first step includes a transformation phase of the data
included in the file.
• Some schema information does not necessarily need to be displayed. To hide it, click
the Columns list button next to the toolbar, and clear the relevant entries, such as
Precision or Parameters.
• Use the plus button to add as many columns as needed to your schema definition.
• Click the Refresh button to preview the first generated row of your output.
• Then select the tMap component to set the transformation.
• Drag and drop all columns from the input table to the output table.
• Define the name of the file to be produced in File Name field. If the delimited file
information is stored in the Repository, select it in Property type field, to retrieve relevant
data. In this use case the file name is clients.txt.
• The schema is propagated from the tMap component, if you accepted it when prompted.
• In this example, don’t include the header information as the table should already contain it.
• The encoding is the default one for this use case.
• Click OK to validate the output.
• Then double-click on the tMysqlBulkExec to set the INSERT query to be executed.
• Define the database connection details. We recommend storing this type of information
in the Repository, so that you can retrieve it at any time for any job.
• Set the table to be filled in with the collected data, in the Table field.
• Fill in the column delimiters in the Field terminated by area.
• Make sure the encoding corresponds to the data encoding.
• Then press F6 to run the job.
The clients database table is filled with data from the file including upper-case last name as
transformed in the job.
For simple Insert operations that don’t include any transformations, the use of
tMysqlOutputBulkExec allows you to skip a step in the process and thus improves performance.
Related topic: tMysqlOutputBulkExec properties on page 494
tMysqlOutputBulkExec
tMysqlOutputBulkExec properties
Advanced settings Additional JDBC Specify additional connection properties for the DB
parameters connection you are creating. This option is not
available if you have selected the Use an existing
connection check box in the Basic settings.
Row separator String (ex: “\n” on Unix) to distinguish rows.
Field separator Character, string or regular expression to separate
fields.
Escape char Character used to escape special characters in the rows.
Text enclosure Character used to enclose the text.
Create directory if This check box is selected by default. It creates a
does not exist directory to hold the output table if required.
Custom the flush Customize the amount of memory used to temporarily
buffer size store output data. In the Row number field, enter the
number of rows after which the memory is to be freed
again.
Action on data On the data of the table defined, you can carry out the
following operations:
Insert records in table: Add new records to the table.
Update records in table: Make changes to existing
records.
Replace records in table: Replace existing records
with new ones.
Ignore records in table: Ignore existing records or
insert the new ones.
Records contain This check box is selected by default. It allows you to
NULL value take account of NULL value fields. If you clear the
check box, the NULL values will automatically be
replaced with empty values.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Usage This component is mainly used when no particular transformation is required
on the data to be loaded onto the database.
Limitation n/a
This scenario describes a two-component job which carries out the same operation as the one
described for tMysqlOutputBulk properties on page 489 and tMysqlBulkExec properties on page
451, although no data is transformed.
• Define the database connection details, if necessary. Consult the recommendations detailed
in the Scenario: Inserting transformed data in MySQL database on page 490, concerning the
conservation of connection details in the Repository, under the Metadata node. In the
component view, select Repository in the Property Type field and then select the
appropriate connection in the adjacent field. The following fields will be filled in
automatically.
For further information, see Setting a built-in schema and Setting a Repository schema in the Talend
Open Studio User Guide.
• In the Action on table field, select the None option as you want to insert the data into a table
which already exists.
• In the Table field, enter the name of the table you want to populate, the name being clients
in this example.
• In the Local filename field, indicate the access path and the name of the file which contains
the data to be added to the table. In this example, the file is clients.txt.
• Click on the Advanced settings tab to define the component’s advanced parameters.
• In the Action on data list, select the Insert records in table to insert the new data in the
table.
• Press F6 to run the job.
The result should be much the same as in Scenario: Inserting transformed data in MySQL
database on page 490, but the data might differ as it is regenerated randomly every time the job
is run.
tMysqlRollback
tMysqlRollback properties
Based on the tMysqlConnection Scenario: Inserting data in mother/daughter tables on page 460,
insert a rollback function in order to prevent unwanted commit.
• Drag and drop a tMysqlRollback to the design workspace and connect it to the Start
component.
tMysqlRow
tMysqlRow properties
Function tMysqlRow is the specific component for this database query. It executes the
SQL query stated in the specified database. The row suffix means the
component implements a flow in the job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tMysqlRow acts on the
actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
DB Version Select the MySQL version that you are using.
Use an existing Select this check box and click the relevant
connection tMysqlConnection component on the Component
list to reuse the connection details you already
defined.
This scenario describes a four-component job that removes a table index, applies a select insert
action onto a table then regenerates the index.
• Select and drop the following components onto the design workspace: tMysqlRow (x2),
tRowGenerator, and tMysqlOutput.
• Connect tRowGenerator to tMysqlOutput.
• Using OnComponentOk connections, link the first tMysqlRow to tRowGenerator and
tRowGenerator to the second tMysqlRow.
• Select the first tMysqlRow component to fill in the DB Basic settings.
• In Property type as well as in Schema, select the relevant DB entry in the list.
• The DB connection details and the table schema are accordingly filled in.
• Propagate the properties and schema details onto the other components of the job.
• If the query is stored in the Metadata area of the Repository, you can also select
Repository in the Query type field and then the relevant query entry.
• If you didn’t store your query in the Repository, type in the following SQL statement to alter
the database entries: drop index <index_name> on <table_name>
• Select the second tMysqlRow component, check the DB properties and schema.
• Type in the SQL statement to recreate an index on the table using the following statement:
create index <index_name> on <table_name> (<column_name>)
The tRowGenerator component is used to automatically generate the columns to be added
to the DB output table defined.
• Select the tMysqlOutput component and fill in the DB connection properties, either from the
Repository or manually; in the latter case, the connection details apply to this use only. The table to
be fed is named: comprehensive.
• The schema should be automatically inherited from the data flow coming from
tRowGenerator. Edit the schema to check its structure and check that it corresponds to the schema
expected on the DB table specified.
• The Action on table is None and the Action on data is Insert.
• No Additional Columns are required for this job.
• Press F6 to run the job.
If you watch the action on the DB data, you will notice that the index is dropped at the start
of the job and recreated at the end of the insert action.
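The drop/insert/recreate sequence this scenario implements can be sketched outside Talend in a few lines. The snippet below is an illustrative sketch only, using Python's standard sqlite3 module with an in-memory database in place of MySQL; the index name is made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE comprehensive (id INTEGER, name TEXT)")
cur.execute("CREATE INDEX idx_comprehensive_id ON comprehensive (id)")

# First tMysqlRow's role: drop the index so the bulk insert
# does not pay the index-maintenance cost on every row.
cur.execute("DROP INDEX idx_comprehensive_id")

# tRowGenerator + tMysqlOutput's role: insert the generated rows.
rows = [(i, "name_%d" % i) for i in range(1000)]
cur.executemany("INSERT INTO comprehensive VALUES (?, ?)", rows)

# Second tMysqlRow's role: recreate the index once all rows are in place.
cur.execute("CREATE INDEX idx_comprehensive_id ON comprehensive (id)")
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM comprehensive").fetchone()[0]
print(count)  # 1000
```

Dropping the index before the load and rebuilding it once afterwards is generally cheaper than updating the index for every inserted row, which is the whole point of the job design above.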
Related topics: tDBSQLRow properties on page 261.
This scenario describes a four component job which allows you to link a table column with a client
file. The MySQL table contains a list of all the American States along with the State ID, while the
file contains the customer information including the ID of the State in which they live. We want to
retrieve the name of the State for each client, using an SQL query. In order to process a large volume
of data quickly, we use a PreparedStatement object, which means that the query is compiled only once
rather than for each row in turn; each row is then sent as a parameter.
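As a rough sketch of the prepared-statement pattern described above — not Talend's generated code — the following Python snippet uses the standard sqlite3 module and a small made-up dataset; the "?" placeholder plays the same role as in the scenario's query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE us_state (idState INTEGER, LabelState TEXT)")
cur.executemany("INSERT INTO us_state VALUES (?, ?)",
                [(1, "Alabama"), (2, "Alaska"), (3, "Arizona")])

# The query is prepared once; each client row then supplies its
# idState as the parameter bound to the "?" placeholder.
query = "SELECT LabelState FROM us_state WHERE idState = ?"
clients = [("Smith", 2), ("Jones", 1)]
labels = [cur.execute(query, (id_state,)).fetchone()[0]
          for _, id_state in clients]
print(labels)  # ['Alaska', 'Alabama']
```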
For this scenario, we use a file and a database for which we have already stored the connection and
properties in the Repository metadata. For further information concerning the creation of metadata
in delimited files, consult Setting up a File Delimited schema. For further information concerning
the creation of database connection metadata, see Setting up a DB connection and for further
information as to the usage of metadata, see Setting a Repository schema. These sections are all in
the Talend Open Studio Components User Guide.
• In the Schema list, select Built-in so that you can modify the component’s schema. Then
click on [...] next to the Edit schema field to add a column into which the name of the State
will be inserted.
• Click on the [+] button to add a column to the schema. Rename this column
LabelStateRecordSet and select Object from the Type list. Click OK to save your
modifications.
• From the Palette, select the tMysqlRow, tParseRecordSet and tFileOutputDelimited
components and drop them onto the workspace.
• Connect the four components using Row > Main type links.
• Double click tMysqlRow to set its properties in the Basic settings tab of the Component
view.
• In the Property Type list, select Repository and click on the [...] button to select a database
connection from the metadata in the Repository. The DB Version, Host, Port, Database,
Username and Password fields are completed automatically. If you are using the Built-in
mode, complete these fields manually.
• From the Schema list, select Built-in to set the schema properties manually and add the
LabelStateRecordSet column, or click directly on the Sync columns button to retrieve the
schema from the preceding component.
• In the Query field, enter the SQL query you want to use. Here, we want to retrieve the names
of the American States from the LabelState column of the MySQL table, us_state:
"SELECT LabelState FROM us_state WHERE idState=?". The question
mark, “?”, represents the parameter to be set in the Advanced settings tab.
• Click Advanced settings to set the component’s advanced properties.
• Select the Propagate QUERY’s recordset check box and select the LabelStateRecordSet
column from the use column list to insert the query results in that column.
• Select the Use PreparedStatement check box and define the parameter used in the query in
the Set PreparedStatement Parameters table.
• Click on the [+] button to add a parameter.
• In the Parameter Index cell, enter the parameter position in the SQL instruction. Enter “1”
as we are only using one parameter in this example.
• In the Parameter Type cell, enter the type of parameter. Here, the parameter is a whole
number, hence, select Int from the list.
• In the Parameter Value cell, enter the parameter value. Here, we want to retrieve the name
of the State based on the State ID for every client in the input file. Hence, enter
“row1.idState”.
• Double click tParseRecordSet to set its properties in the Basic settings tab of the
Component view.
• From the Prev. Comp. Column list, select the preceding component’s column for analysis.
In this example, select LabelStateRecordSet.
• Click on the Sync columns button to retrieve the schema from the preceding component.
The Attribute table is automatically completed with the schema columns.
• In the Attribute table, in the Value field which corresponds to the LabelStateRecordSet,
enter the name of the column containing the State names to be retrieved and matched with
each client, within double quotation marks. In this example, enter “LabelState”.
• Double click tFileOutputDelimited to set its properties in the Basic settings tab of the
Component view.
• In the File Name field, enter the access path and name of the output file.
• Click Sync columns to retrieve the schema from the preceding component.
• Save your Job and press F6 to run it.
A column containing the name of the American State corresponding to each client is added to the
file.
tMysqlSCD
tMysqlSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tMysqlSCD on page 19.
tMysqlSCDELT
tMysqlSCDELT belongs to two component families: Business Intelligence and Databases. For more
information on it, see tMysqlSCDELT on page 31.
tMysqlSP
tMysqlSP Properties
Parameters Click the Plus button and select the various Schema
Columns that will be required by the procedures.
Note that the SP schema can hold more columns than
there are parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter
OUT: Output parameter/return value
IN OUT: Input parameter to be returned as a value,
possibly after modification through the procedure
(function).
RECORDSET: Input parameter to be returned as a
set of values, rather than a single value.
Check the tParseRecordSet component if you
want to analyze a set of records from a database
table or DB query and return single records.
Usage This component is used as an intermediary component. It can be used as a start
component, but in that case only input parameters are allowed.
Limitation The Stored Procedures syntax should match the Database syntax.
The following job aims at finding the State labels matching the odd State IDs in a MySQL two-column
table. A stored procedure is used to carry out this operation.
• Drag and drop the following components used in this example: tRowGenerator, tMysqlSP,
tLogRow.
• Connect the components using the Row Main link.
• The tRowGenerator is used to generate the odd id number. Double-click on the component
to launch the editor.
• Use the preset function called sequence but customize the Parameters in the lower part of
the window.
• Change the Value of step from 1 to 2 for this example, still starting from 1.
• Set the Number of generated rows to 25 in order for all the odd State ids (of 50 states) to be
generated.
• Click OK to validate the configuration.
• Then select the tMysqlSP component and define its properties.
• Set the Property type field to Repository and select the relevant entry on the list. The
connection details get filled in automatically.
• Otherwise, set the connection information manually.
• Click Sync Column to retrieve the generated schema from the preceding component.
• Then click Edit Schema and add an extra column to hold the State Label to be output, in
addition to the ID.
• Select the encoding type on the list.
• Type in the name of the procedure in the SP Name field as it is called in the Database. In this
example, getstate. The procedure to be executed reads as follows:
DROP PROCEDURE IF EXISTS `talend`.`getstate` $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `getstate`(IN pid INT, OUT pstate VARCHAR(50))
BEGIN
  SELECT LabelState INTO pstate FROM us_states WHERE idState = pid;
END $$
• In the Parameters area, click the plus button to add a line to the table.
• Set the Column field to ID, and the Type field to IN as it will be given as input parameter
to the procedure.
• Add a second line and set the Column field to State and the Type to OUT as this is the output
parameter to be returned.
• Finally, set the tLogRow component properties.
The output shows the state labels corresponding to the odd state ids as defined in the procedure.
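SQLite has no stored procedures, so the following sketch emulates the getstate procedure with an ordinary function wrapping a parameterized query: the IN parameter is the id, and the OUT parameter comes back as the return value. It illustrates the lookup logic only, with a made-up three-row dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE us_states (idState INTEGER, LabelState TEXT)")
cur.executemany("INSERT INTO us_states VALUES (?, ?)",
                [(1, "Alabama"), (2, "Alaska"), (3, "Arizona")])

def getstate(pid):
    """Emulates the getstate procedure: IN pid -> OUT pstate."""
    row = cur.execute(
        "SELECT LabelState FROM us_states WHERE idState = ?", (pid,)).fetchone()
    return row[0] if row else None

# Like the job's tRowGenerator, iterate over the odd ids only.
odd_labels = [getstate(pid) for pid in range(1, 4, 2)]
print(odd_labels)  # ['Alabama', 'Arizona']
```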
Check the tParseRecordSet component if you want to analyze a set of records
from a database table or DB query and return single records.
tMysqlTableList
tMysqlTableList Properties
The following Java scenario creates a three-component job that iterates on given table names from
a MySQL database using a WHERE clause. It then deletes the content of the tables directly on the
DBMS using a user-defined SQL template.
For advanced use, start with creating a connection to the database that contains the tables you want
to empty of their content.
• In the Repository tree view, expand Metadata and right click DB Connections to create a
connection to the relevant database and to store the connection information locally.
For more information about Metadata, see Defining Metadata items of Talend Open Studio
User Guide.
Otherwise, drop a tMysqlConnection component in the design workspace and fill in the
connection details manually.
• Drop the database connection you created from the Repository onto the design workspace.
The [Components] dialog box displays.
• Select tMysqlConnection and click OK.
The tMysqlConnection component displays on the design workspace with all connection
details automatically filled in its Basic settings view.
• Drop the following two components from the Palette onto the design workspace:
tMysqlTableList and tELT.
• Connect tMysqlConnection to tMysqlTableList using an OnSubjobOk link.
• On the Component list, select the relevant MySQL connection component if more than one
connection is used.
• Enter a WHERE clause using the right syntax in the corresponding field to iterate on the table
name(s) whose content you want to delete.
In this scenario, we want the job to iterate on all the tables whose names start with “ex”.
• In the design workspace, double-click tELT to display its Basic settings view and define its
settings.
• In Database Name, enter the name of the database containing the tables you want to process.
• On the Component list, select the relevant MySQL connection component if more than one
connection is used.
• Click in the Table name field and press Ctrl+Space to access the global variable list.
• From the global variable list, select
((String)globalMap.get("tMysqlTableList_1_CURRENT_TABLE")).
To create the user-defined SQL template:
• In the Repository tree view, expand SQL Templates and MySQL in succession.
• Right-click UserDefined and select Create SQLTemplate from the drop-down list.
The New SQLTemplate wizard opens.
• Enter a name for the new SQL template, fill in the other fields if needed, and then click
Finish to close the wizard.
An SQL pattern editor opens on the design workspace.
• Delete the existing code and enter the code necessary to carry out the desired action, in this
example deleting the content of all tables whose names start with “ex”.
In the SQL template code, you must use the correct variable name attached to the table
name parameter (“__TABLE-NAME__” in this example).
To display the variable name used, put your pointer in the Table Name field in the
basic settings of the tELT component.
• Click the Add button and add two SQL template lines.
• Click in the first line to display a drop-down arrow and then click the arrow to display the
SQL template list.
• Select in the list the user-defined SQL template you already created.
• Make sure that the SQL template in the second line is Commit.
• Save your job and press F6 to execute it.
All tables in the MySQL examples database whose names begin with “ex” are emptied of their
content.
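The list-then-empty logic of this scenario can be sketched as follows. This is an illustrative Python/SQLite stand-in with hypothetical table names, not the generated Talend code: the catalog query plays tMysqlTableList's part, and the DELETE in the loop plays the SQL template's part:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for name in ("ex_orders", "ex_clients", "keep_me"):
    cur.execute("CREATE TABLE %s (id INTEGER)" % name)
    cur.execute("INSERT INTO %s VALUES (1)" % name)

# tMysqlTableList's role: iterate on the table names matching the WHERE clause.
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' AND name LIKE 'ex%'")]

# The user-defined SQL template's role: empty each table, then commit
# (the table name is what __TABLE-NAME__ resolves to in the template).
for table in tables:
    cur.execute("DELETE FROM %s" % table)
conn.commit()

counts = {t: cur.execute("SELECT COUNT(*) FROM %s" % t).fetchone()[0]
          for t in ("ex_orders", "ex_clients", "keep_me")}
print(counts)  # {'ex_orders': 0, 'ex_clients': 0, 'keep_me': 1}
```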
Related scenario
For tMysqlTableList related scenario, see Scenario: Iterating on a DB table and listing its column
names on page 455.
tNetezzaBulkExec
tNetezzaBulkExec properties
Related scenarios
For use cases in relation to tNetezzaBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490.
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495.
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538.
tNetezzaClose
tNetezzaClose properties
Related scenario
tNetezzaCommit
tNetezzaCommit Properties
Function tNetezzaCommit validates the data processed through the Job into the
connected DB
Purpose Using a unique connection, this component commits a global transaction in one
go, instead of committing on every row or every batch, and thus improves
performance.
Basic settings Component list Select the tNetezzaConnection component in the list
if more than one connection is planned for the
current Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Netezza components, especially with
tNetezzaConnection and tNetezzaRollback.
Limitation n/a
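The performance point in the Purpose row — one global commit instead of a commit per row or per batch — holds for any transactional database. As a minimal illustration, with Python's standard sqlite3 module standing in for Netezza:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # manage transactions explicitly
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER)")

# One transaction around the whole load, committed once at the end --
# the pattern tNetezzaConnection + tNetezzaCommit implements.
cur.execute("BEGIN")
for i in range(1000):
    cur.execute("INSERT INTO t VALUES (?)", (i,))
cur.execute("COMMIT")                # single global commit

total = cur.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(total)  # 1000
```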
Related scenario
tNetezzaConnection
tNetezzaConnection Properties
This component is closely related to tNetezzaCommit and tNetezzaRollback. It usually does not
make much sense to use one of these without using a tNetezzaConnection component to open a
connection for the current transaction.
Related scenarios
For tNetezzaConnection related scenario, see Scenario: Inserting data in mother/daughter tables
on page 460.
tNetezzaInput
tNetezzaInput properties
Related scenarios
tNetezzaOutput
tNetezzaOutput properties
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Additional JDBC Specify additional connection properties for the DB
parameters connection you are creating. This option is not
available if you have selected the Use an existing
connection check box in the Basic settings.
Extend Insert Select this check box to carry out a bulk insert of a
definable set of lines instead of inserting lines one by
one. The gain in system performance is huge.
Related scenarios
tNetezzaRollback
tNetezzaRollback properties
Related scenarios
For tNetezzaRollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498.
tNetezzaRow
tNetezzaRow properties
Function tNetezzaRow is the specific component for this database query. It executes the
SQL query stated in the specified database. The row suffix means that the
component implements a flow in the job design although it does not provide
output.
Purpose Depending on the nature of the query and the database, tNetezzaRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tNetezzaConnection component on the Component
list to reuse the connection details you already
defined.
Related scenarios
For a tNetezzaRow related scenario, see Scenario 1: Removing and regenerating a MySQL table
index on page 502.
tOracleBulkExec
tOracleBulkExec properties
Use fields enclosure Select this check box if you want to use enclosing
characters for the text:
Fields enclosure (left part): character delimiting the
left of the field.
Field enclosure (right part): character delimiting the
right of the field.
Use schema’s Date Select this check box to use the date pattern of the
Pattern to load Date schema in the date field.
field
Specify field condition Select this check box to define data loading condition.
Preserve blanks Select this check box to preserve the blanks.
Trailing null columns Select this check box to load null columns.
Load options Click + to add data loading options:
Parameter: select a loading parameter from the list.
Value: enter a value for the parameter selected.
NLS Language In the list, select the language used for data that is
not in Unicode.
Set Parameter Select this check box to modify the territory
NLS_TERRITORY conventions used for day and weeks numbering. Your
OS value is the default value used.
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for database data handling.
Output Select the type of output for the standard output of the
Oracle database:
to console,
to global variable.
Convert columns and Select this check box to uppercase the names of the
table names to columns and the name of the table.
uppercase
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Fields terminated by Character, string or regular expression to separate
Perl only fields.
Fields optionally Data enclosure characters.
Perl only enclosed by
Encoding Select the encoding from the list or select Custom and
Perl only define it manually. This field is compulsory for DB
data handling.
Usage This dedicated component offers performance and flexibility for Oracle DB
query handling.
This scenario describes how to truncate the content of an Oracle DB table and load it with the content of an input file.
The related job is composed of three components that respectively create the content, output this
content into a file, and load the file into the Oracle database after the DB table has been truncated.
• Define the schema, if it is not already stored in the Repository. In this example, the schema is
as follows: ID_Contract, ID_Client, Contract_type, Contract_Value.
• Change the default encoding to AL32UTF8 encoding type.
• Define the tFileOutputDelimited component parameters, including output File Name,
Row separator and Fields delimiter.
• Also set the Encoding to the Oracle encoding type as above.
• Then double-click on the tOracleBulkExec to define the DB feeding properties.
• In the Property Type, select Repository mode if you stored the database connection details
under the Metadata node of the Repository or select Built-in mode to define them
manually. In this scenario, we use the Built-in mode.
• Thus, set the connection parameters in the following fields: Host, Port, Database, Schema,
Username, and Password.
• Fill in the name of the Table to be fed and the Action on data to be carried out, in this use
case: insert.
• In the Schema field, select Built-in mode, and click [...] button next to the Edit schema field
to describe the structure of the data to be passed on to the next component.
• Click the Advanced settings view to configure the advanced settings of the component.
• Select the Use an existing control file check box if you want to use a control file (.ctl) storing
the status of the physical structure of the database. Or, fill in the following fields manually:
Record format, Specify .ctl file’s INTO TABLE clause manually, Field terminated by,
Use field enclosure, Use schema’s Date Pattern to load Date field, Specify field
condition, Preserve blanks, Trailing null columns, Load options, NLS Language and Set
Parameter NLS_TERRITORY according to your database.
• Define the Encoding as in preceding steps.
• For this scenario, in the Output field, select to console to output the standard output of the
database to the console.
Press F6 to run the job. The log output displays in the Run tab and the table is fed with the parameter
file data.
Related topic: Scenario: Inserting data in MySQL database on page 495.
tOracleClose
tOracleClose properties
Related scenario
tOracleCommit
tOracleCommit Properties
Function Validates the data processed through the job into the connected DB
Purpose Using a unique connection, this component commits a global transaction in one
go, instead of committing on every row or every batch, and thus improves
performance.
Basic settings Component list Select the tOracleConnection component in the list
if more than one connection is planned for the
current job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Oracle components, especially with
tOracleConnection and tOracleRollback components.
Limitation n/a
Related scenario
tOracleConnection
tOracleConnection Properties
Related scenario
tOracleInput
tOracleInput properties
Related scenarios
tOracleOutput
tOracleOutput properties
Use Hint Options Select this check box to activate the hint configuration
Java only. area which helps you optimize a query’s execution. In
this area, parameters are:
- HINT: specify the hint you need, using the syntax
/*+ */.
- POSITION: specify where you put the hint in a
SQL statement.
- SQL STMT: select the SQL statement you need to
use.
Convert columns and Select this check box to set the names of columns and
Java only. table to uppercase table in upper case.
Enable debug mode Select this check box to display each step during
Java only. processing entries in a database.
Use Batch Size When selected, enables you to define the number of
Java only. lines in each processed batch.
Related scenarios
tOracleOutputBulk
tOracleOutputBulk properties
tOracleOutputBulk and tOracleBulkExec components are used together: the first outputs the file
that is then used as a parameter to execute the stated SQL query. These two steps make up the
tOracleOutputBulkExec component, detailed in a separate section. The advantage of having two
separate components is that transformations can be carried out on the data before it is
loaded.
Function Writes a file with columns based on the defined delimiter and the Oracle
standards
Purpose Prepares the file to be used as parameter in the INSERT query to feed the
Oracle database.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The following fields are
pre-filled in using fetched data.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Append Select this check box to add the new rows at the end
of the file
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job designs. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Advanced settings Advanced separator Select this check box to change data separators for
(for number) numbers:
Thousands separator: define separators you want to
use for thousands.
Decimal separator: define separators you want to
use for decimals.
Related scenarios
For use cases in relation to tOracleOutputBulk, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tOracleOutputBulkExec
tOracleOutputBulkExec properties
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create the table: The table is removed and
created again.
Create a table: The table does not exist and gets
created.
Create table if doesn’t exist: The table is created if
it does not exist.
Clear a table: The table content is deleted.
Truncate table: The table content is deleted. You do
not have the possibility to rollback the operation.
File Name Name of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
Create directory if not This check box is selected by default. It creates a
exists directory to hold the output table if required.
Append Select this check box to add the new rows at the end
of the file.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the job stops.
Update: Make changes to existing entries
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Truncate: Remove all entries from table.
Field separator Character, string or regular expression to separate
fields.
Encoding Type Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Advanced settings Advanced separator Select this check box to change data separators for
(for number) numbers:
Thousands separator: define separators you want to
use for thousands.
Decimal separator: define separators you want to
use for decimals.
Use existing control Select this check box and browse to the .ctl control
file file you want to use.
Field separator Character, string or regular expression to separate
fields.
Row separator String (ex: “\n” on Unix) to separate rows.
Specify .ctl file’s INTO Select this check box to enter manually the INTO
TABLE clause TABLE clause of the control file directly into the
manually code.
Use schema’s Date Select this check box to use the date model indicated
Pattern to load Date in the schema for dates.
field
Specify field condition Select this check box to define a condition for loading
data.
Preserve blanks Select this check box to preserve blank spaces.
Trailing null columns Select this check box to load data with all empty
columns.
Load options Click + to add data loading options:
Parameter: select a loading parameter from the list.
Value: enter a value for the parameter selected.
NLS Language From the drop-down list, select the language for your
data if the data is not in Unicode.
Set Parameter Select this check box to modify the conventions used
NLS_TERRITORY for date and time formats. The default value is that of
the operating system.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
Output Select the type of output for the standard output of the
Oracle database:
to console,
to global variable.
Convert columns and Select this check box to put columns and table names
table names to in upper case.
uppercase
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage This component is mainly used when no particular transformation is required
on the data to be loaded onto the database.
Limitation n/a
Related scenarios
For use cases in relation to tOracleOutputBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490.
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495.
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538.
tOracleRollback
tOracleRollback properties
Related scenario
tOracleRow
tOracleRow properties
Function tOracleRow is the specific component for this database query. It executes the
SQL query stated onto the specified database. The row suffix means the
component implements a flow in the job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tOracleRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tOracleConnection component on the Component
list to reuse the connection details you already
defined.
Related scenarios
tOracleSCD
tOracleSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tOracleSCD on page 34.
tOracleSCDELT
tOracleSCDELT belongs to two component families: Business Intelligence and Databases. For more
information on it, see tOracleSCDELT on page 36.
tOracleSP
tOracleSP Properties
The following job aims at connecting to an Oracle Database containing Social Security Numbers
and their holders’ names, calling a stored procedure that checks the SSN format against a standard
###-##-#### format. Then the verification results, 1 for a valid format and 0 for a wrong format,
get displayed on the execution console.
• Drag and drop the following components from the Palette: tOracleConnection,
tOracleInput, tOracleSP and tLogRow.
• Link the tOracleConnection to the tOracleInput using a Then Run connection as no data
is handled here.
• And connect the other components using a Row Main link as rows are to be passed on as
parameters to the SP component and to the console.
• In the tOracleConnection, define the details of connection to the relevant Database. You
will then be able to reuse this information in all other DB-related components.
• Then select the tOracleInput and define its properties.
• Select the Use an existing connection check box and select the tOracleConnection
component in the list in order to reuse the connection details that you already set.
• Select Repository as Property type as the Oracle schema is defined in the DB Oracle
connection entry of the Repository. If you haven’t recorded the Oracle DB details in the
Repository, then fill in the Schema name manually.
• Then select Repository as Schema, and retrieve the relevant schema corresponding to your
Oracle DB table.
• In this example, the SSN table has a four-column schema that includes ID, NAME, CITY and
SSNUMBER.
• In the Query field, type in the following Select query or select it in the list, if you stored it
in the Repository.
select ID, NAME, CITY, SSNUMBER from SSN
• Then select the tOracleSP and define its Basic settings.
• Like for the tOracleInput component, select Repository in the Property type field and
select the Use an existing connection check box, then select the relevant entries in the
respective list.
• The schema used for the tOracleSP slightly differs from the input schema. Indeed, an extra
column (SSN_Valid) is added to the Input schema. This column will hold the format validity
status (1 or 0) produced by the procedure.
• In the SP Name field, type in the exact name of the stored procedure (or function) as called
in the Database. In this use case, the stored procedure name is is_ssn.
• Select the Print values in cells of a table check box to facilitate the output reading.
• Then save your job and press F6 to run it.
On the console, you can read the output results. All input schema columns are displayed even though
they are not used as parameters in the stored procedure.
The final column shows the expected return value, i.e. whether the SS Number checked is valid or
not.
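Outside Talend, the check that the scenario describes the is_ssn procedure as performing can be sketched in Python. This is a hypothetical equivalent only: the function name and the 1/0 return convention mirror the scenario, while the actual stored procedure lives in PL/SQL on the database side.

```python
import re

# Pattern for the standard ###-##-#### Social Security Number format.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def is_ssn(ssn: str) -> int:
    """Return 1 for a valid ###-##-#### format, 0 for a wrong format,
    matching the SSN_Valid column populated by the tOracleSP job."""
    return 1 if SSN_PATTERN.match(ssn) else 0
```

For instance, is_ssn("123-45-6789") yields 1 while is_ssn("123456789") yields 0.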
Check the tParseRecordSet component if you want to analyze a set of records
from a database table or DB query and return single records.
tOracleTableList
tOracleTableList properties
Related scenarios
For a tOracleTableList-related scenario, see Scenario: Iterating on DB tables and deleting their
content using a user-defined SQL template on page 515.
tParAccelBulkExec
tParAccelBulkExec Properties
Usage This component covers all possibilities of SQL queries onto a database. It
allows you to do actions on a table or the data in a table in a ParAccel
database. It enables you to create a reject flow, with a Row > Reject link
filtering the data in error. For a usage example, see Scenario 3: Retrieve data
in error with a Reject link on page 483 from component tMysqlOutput.
Related scenarios
tParAccelClose
tParAccelClose properties
Related scenario
tParAccelCommit
tParAccelCommit Properties
Function Validates the data processed through the job into the connected DB.
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tParAccelConnection component in the
list if more than one connection are planned for the
current job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with ParAccel components, especially with
tParAccelConnection and tParAccelRollback components.
Limitation n/a
Related scenario
tParAccelConnection
tParAccelConnection Properties
Related scenario
tParAccelInput
tParAccelInput properties
Related scenarios
tParAccelOutput
tParAccelOutput Properties
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Commit every Enter the number of rows to be completed before
committing batches of rows together into the DB.
This option ensures transaction quality (but not
rollback) and, above all, better performance at
execution.
Additional Columns This option is not available if you create (with or
without drop) the DB table. This option allows you to
call SQL functions to perform actions on columns
that are not insert, update, or delete actions, or
actions that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as a new column.
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After following
the action to be performed on the reference column.
Reference column: Type in a column of reference
that the tDBOutput can use to place or replace the
new or altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component covers all possibilities of SQL queries onto a database. It
allows you to do actions on a table or the data in a table in a ParAccel database.
It enables you to create a reject flow, with a Row > Rejects link filtering the
data in error. For a usage example, see Scenario 3: Retrieve data in error with
a Reject link on page 483 from component tMysqlOutput.
Related scenarios
tParAccelOutputBulk
tParAccelOutputBulk properties
tParAccelOutputBulk and tParAccelBulkExec components are used together: the first outputs the
file that will then be used as a parameter to execute the stated SQL query. These two steps make up
the tParAccelOutputBulkExec component, detailed in a separate section. The advantage of having
two separate components is that data can be transformed before being loaded.
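As a sketch of what the first of these two steps produces, the following Python snippet writes a delimited bulk file the way tParAccelOutputBulk would. The separators ("," for fields, "\n" for rows), the file name and the sample rows are all illustrative assumptions, not values taken from the component.

```python
# Illustrative data rows; in a real job these come from the input flow.
rows = [(1, "Smith", "Boston"), (2, "Dupont", "Paris")]

# Write one delimited line per row, using "," as the field separator
# and "\n" as the row separator, as a bulk-output component would.
with open("bulk_load.csv", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(",".join(str(value) for value in row) + "\n")
```

The resulting file can then be handed to the bulk-exec step for loading.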
Function Writes a file with columns based on the defined delimiter and the ParAccel
standards
Purpose Prepares the file to be used as parameter in the INSERT query to feed the
ParAccel database.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The following fields are
pre-filled in using fetched data.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Append Select this check box to add the new rows at the end
of the file
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job designs. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Advanced settings Row separator String (ex: “\n” on Unix) to distinguish rows.
Field separator Character, string or regular expression to separate
fields.
Include header Select this check box to include the column header.
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with tParAccelBulkExec component.
Used together they offer gains in performance while feeding a ParAccel
database.
Related scenarios
For use cases in relation with tParAccelOutputBulk, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tParAccelOutputBulkExec
tParAccelOutputBulkExec Properties
Copy mode
Filename Name and path of the file to be processed.
Advanced settings File Type Select in the list the file type.
Row separator String (ex: “\n” on Unix) to distinguish rows.
Fields terminated by Character, string or regular expression to separate
fields.
Append Select this check box to add the new rows at the end
of the file.
Explicit IDs
Remove Quotes
Max. Errors Type in the maximum number of errors before your
Job stops.
Date Format Type in the date format to be used.
Time/Timestamp
Format
Additional COPY
Options
Log file
Logging level Select the information type you want to record in your
log file.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component covers all possibilities of SQL queries onto a database. It
allows you to do actions on a table or the data in a table in a ParAccel database.
It enables you to create a reject flow, with a Row > Reject link filtering the data
in error. For a usage example, see Scenario 3: Retrieve data in error with a
Reject link on page 483 from component tMysqlOutput.
Related scenarios
tParAccelRollback
tParAccelRollback properties
Related scenario
tParAccelRow
tParAccelRow Properties
Function tParAccelRow is the specific component for this database query. It executes
the SQL query stated onto the specified database. The row suffix means the
component implements a flow in the job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tParAccelRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tParAccelConnection component on the
Component list to reuse the connection details you
already defined.
Related scenarios
tParAccelSCD
tParAccelSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tParAccelSCD on page 39.
tParseRecordSet
You can find this component at the root of Databases group of the Palette of Talend Open Studio.
tParseRecordSet covers needs related indirectly to the use of any database.
tParseRecordSet properties
Related Scenario
tPostgresPlusBulkExec
tPostgresPlusBulkExec properties
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create table: The table is removed and
created again.
Create table: The table does not exist and gets
created.
Create table if not exists: The table is created if it
does not exist.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do
not have the possibility to rollback the operation.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: You create the schema and store it locally
for this component only. Related topic: Setting a
built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and
stored it in the Repository, hence can reuse it. Related
topic: Setting a Repository schema of Talend Open
Studio User Guide.
Advanced settings Action Select the action to be carried out
Bulk insert
Bulk update
Depending on the action selected, the required
information varies.
Field terminated by Character, string or regular expression to separate
fields.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This dedicated component offers performance and flexibility of PostgresPlus
query handling.
Related scenarios
tPostgresPlusClose
tPostgresPlusClose properties
Related scenario
tPostgresPlusCommit
tPostgresPlusCommit Properties
Function Validates the data processed through the Job into the connected DB.
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tPostgresPlusConnection component in
the list if more than one connection are planned for
the current Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with PostgresPlus components, especially
with the tPostgresPlusConnection and tPostgresPlusRollback components.
Limitation n/a
Related scenario
tPostgresPlusConnection
tPostgresPlusConnection Properties
Related scenario
tPostgresPlusInput
tPostgresPlusInput properties
Related scenarios
tPostgresPlusOutput
tPostgresPlusOutput properties
Related scenarios
tPostgresPlusOutputBulk
tPostgresPlusOutputBulk properties
Function Writes a file with columns based on the defined delimiter and the PostgresPlus
standards
Purpose Prepares the file to be used as parameter in the INSERT query to feed the
PostgresPlus database.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The following fields are
pre-filled in using fetched data.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Field separator Character, string or regular expression to separate
fields.
Row separator String (ex: “\n” on Unix) to distinguish rows.
Append Select this check box to add the new rows at the end
of the file
Include header Select this check box to include the column header to
the file.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job designs. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
Usage This component is to be used along with tPostgresPlusBulkExec component.
Used together they offer gains in performance while feeding a PostgresPlus
database.
Related scenarios
For use cases in relation with tPostgresPlusOutputBulk, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tPostgresPlusOutputBulkExec
tPostgresPlusOutputBulkExec properties
Related scenarios
For use cases in relation with tPostgresPlusOutputBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tPostgresPlusRollback
tPostgresPlusRollback properties
Related scenarios
For a tPostgresPlusRollback-related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498.
tPostgresPlusRow
tPostgresPlusRow properties
Usage This component offers the flexibility benefit of the DB query and covers all
possibilities of SQL queries.
Related scenarios
tPostgresPlusSCD
tPostgresPlusSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tPostgresPlusSCD on page 41.
tPostgresPlusSCDELT
tPostgresPlusSCDELT belongs to two component families: Business Intelligence and Databases. For
more information on it, see tPostgresPlusSCDELT on page 43.
tPostgresqlBulkExec
tPostgresqlBulkExec properties
Related scenarios
For use cases in relation with tPostgresqlBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tPostgresqlCommit
tPostgresqlCommit Properties
Function Validates the data processed through the job into the connected DB
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tPostgresqlConnection component in the
list if more than one connection are planned for the
current job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Postgresql components, especially
with tPostgresqlConnection and tPostgresqlRollback components.
Limitation n/a
Related scenario
tPostgresqlClose
tPostgresqlClose properties
Related scenario
tPostgresqlConnection
tPostgresqlConnection Properties
Related scenario
tPostgresqlInput
tPostgresqlInput properties
Related scenarios
tPostgresqlOutput
tPostgresqlOutput properties
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Commit every Enter the number of rows to be completed before
committing batches of rows together into the DB.
This option ensures transaction quality (but not
rollback) and, above all, better performance at
execution.
Additional Columns This option is not available if you create (with or
without drop) the DB table. This option allows you to
call SQL functions to perform actions on columns
that are not insert, update, or delete actions, or
actions that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as a new column.
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After following
the action to be performed on the reference column.
Reference column: Type in a column of reference
that the tDBOutput can use to place or replace the
new or altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
Enable debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility benefit of the DB query and covers all
possible SQL queries.
This component must be used as an output component. It allows you to carry
out actions on a table or on the data of a table in a Postgresql database. It also
allows you to create a reject flow using a Row > Rejects link to filter data in
error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data
in error with a Reject link on page 483.
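The Commit every option described in this component's Advanced settings commits batches of rows instead of committing on every insert. The batching idea can be sketched outside Talend as follows; here the built-in sqlite3 module stands in for the target database, and the batch size of 2 and table contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

COMMIT_EVERY = 2  # illustrative batch size
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

for i, row in enumerate(rows, start=1):
    conn.execute("INSERT INTO t VALUES (?, ?)", row)
    if i % COMMIT_EVERY == 0:
        conn.commit()  # commit a whole batch instead of every row
conn.commit()          # flush the final, possibly partial, batch
```

Committing per batch reduces transaction overhead, which is why the option improves performance at execution, at the cost of rollback granularity.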
Related scenarios
tPostgresqlOutputBulk
tPostgresqlOutputBulk properties
Function Writes a file with columns based on the defined delimiter and the Postgresql
standards
Purpose Prepares the file to be used as parameter in the INSERT query to feed the
Postgresql database.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The following fields are
pre-filled in using fetched data.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Field separator Character, string or regular expression to separate
fields.
Row separator String (ex: “\n” on Unix) to distinguish rows.
Append Select this check box to add the new rows at the end
of the file
Include header Select this check box to include the column header to
the file.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job designs. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
Usage This component is to be used along with tPostgresqlBulkExec component.
Used together they offer gains in performance while feeding a Postgresql
database.
Related scenarios
For use cases in relation with tPostgresqlOutputBulk, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tPostgresqlOutputBulkExec
tPostgresqlOutputBulkExec properties
Related scenarios
For use cases in relation with tPostgresqlOutputBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tPostgresqlRollback
tPostgresqlRollback properties
Related scenario
tPostgresqlRow
tPostgresqlRow properties
Function tPostgresqlRow is the specific component for the database query. It executes
the SQL query stated onto the specified database. The row suffix means the
component implements a flow in the job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tPostgresqlRow acts
on the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tPostgresqlConnection component on the
Component list to reuse the connection details you
already defined.
Related scenarios
tPostgresqlSCD
tPostgresqlSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tPostgresqlSCD on page 46.
tPostgresqlSCDELT
tPostgresqlSCDELT belongs to two component families: Business Intelligence and Databases. For
more information on it, see tPostgresqlSCDELT on page 43.
tSASInput
To benefit from the full functionality of the SAS components, make sure to install the following
three modules: sas.core.jar, sas.intrnet.javatools.jar and sas.svc.connection.jar in the lib
> java path of your Talend Open Studio directory. If needed, you can later verify that the modules
are successfully installed through the Modules view of the Studio.
tSASInput properties
Query type The query can be built-in for a particular job or, for
a commonly used query, it can be stored in the
repository to ease its reuse.
Query If your query is not stored in the Repository, type in
your DB query, paying particular attention to
properly sequence the fields in order to match the
schema definition.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage This component covers all possibilities of SQL queries onto a database using
a SAS connection.
Related scenarios
tSASOutput
To benefit from the full functionality of the SAS components, make sure to install the following
three modules: sas.core.jar, sas.intrnet.javatools.jar and sas.svc.connection.jar in the lib
> java path of your Talend Open Studio directory. If needed, you can later verify that the modules
are successfully installed through the Modules view of the Studio.
tSASOutput properties
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, the job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Related scenarios
tSQLiteClose
tSQLiteClose properties
Related scenario
tSQLiteCommit
tSQLiteCommit Properties
This component is closely related to tSQLiteConnection and tSQLiteRollback. It usually does not
make much sense to use these components independently in a transaction.
Function tSQLiteCommit validates the data processed through the Job into the
connected DB
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tSQLiteConnection component in the list
if more than one connection are planned for the
current Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with SQLite components, especially with
tSQLiteConnection and tSQLiteRollback.
Limitation n/a
Related scenario
For a tSQLiteCommit-related scenario, see Scenario: Inserting data in mother/daughter tables on
page 460.
tSQLiteConnection
SQLiteConnection properties
This component is closely related to tSQLiteCommit and tSQLiteRollback. It usually does not
make much sense to use one of these without using a tSQLiteConnection to open a connection for
the current transaction.
Related scenarios
For a tSQLiteConnection-related scenario, see tMysqlConnection on page 460.
tSQLiteInput
tSQLiteInput Properties
Function tSQLiteInput reads a database file and extracts fields based on an SQL query.
As it embeds the SQLite engine, there is no need to connect to any database server.
Purpose tSQLiteInput executes a DB query with a defined command which must
correspond to the schema definition. Then it passes on rows to the next
component via a Main row link.
Basic settings Property type Either Built-in or Repository
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tSQLiteConnection component on the Component
list to reuse the connection details you already
defined.
This scenario describes a rather simple job which uses a select statement based on a filter to extract
rows from a source SQLite Database and feed an output SQLite table.
• Drop a tSQLiteInput and a tSQLiteOutput component from the Palette onto the design
workspace.
• Connect the input to the output using a Row Main link.
• On the tSQLiteInput Basic settings, type in or browse to the SQLite Database input file.
• The file contains hundreds of lines and includes an ip column on which the select statement
will be based.
• On the tSQLite Basic settings, edit the schema for it to match the table structure.
• In the Query field, type in your select statement based on the ip column.
• Select the right encoding parameter.
• On the tSQLiteOutput component Basic settings panel, select the Database filepath.
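The steps above can be sketched outside the Studio. The following snippet uses Python's built-in sqlite3 module only as a compact stand-in for the component's generated code (a real Job generates Java); the table, its ip column and the filter value are illustrative assumptions, not part of the scenario:

```python
import sqlite3

# In-memory database stands in for the SQLite input file; the table,
# its ip column and the filter are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (ip TEXT, url TEXT)")
conn.executemany("INSERT INTO access_log VALUES (?, ?)",
                 [("10.0.0.1", "/a"), ("192.168.1.5", "/b"),
                  ("10.0.0.2", "/c")])

# tSQLiteInput's role: a select statement based on the ip column.
rows = conn.execute(
    "SELECT ip, url FROM access_log WHERE ip LIKE '10.%'").fetchall()

# tSQLiteOutput's role: feed the extracted rows into the output table.
conn.execute("CREATE TABLE filtered_log (ip TEXT, url TEXT)")
conn.executemany("INSERT INTO filtered_log VALUES (?, ?)", rows)
conn.commit()
```

Each extracted row is passed on unchanged, mirroring the Row > Main link between the two components.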
tSQLiteOutput
tSQLiteOutput Properties
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and
created again.
Create a table: The table does not exist and gets
created.
Create a table if not exists: The table is created if it
does not exist.
Drop a table if exists and create: The table is
removed if it already exists and created again.
Clear a table: The table content is deleted.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, job stops.
Update: Make changes to existing entries
Insert or update: Add entries or update existing
ones.
Update or insert: Update existing entries or create
them if they do not exist.
Delete: Remove entries corresponding to the input
flow.
Advanced settings Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Commit every Enter the number of rows to be completed before
committing batches of rows together into the DB.
This option ensures transaction quality (but not
rollback) and, above all, better performance at
execution.
Additional Columns This option is not offered if you create (with or
without drop) the DB table. This option allows you to
call SQL functions to perform actions on columns
that are not insert, update or delete actions, or
actions that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as new column
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After following
the action to be performed on the reference column.
Reference column: Type in a column of reference
that the tDBOutput can use to place or replace the
new or altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
Enable debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component must be connected to an Input component. It allows you to
carry out actions on a table or on the data of a table in an SQLite database. It
also allows you to create reject flows using a Row > Reject link to filter
erroneous data. For an example of tSQLiteOutput in use, see Scenario 3:
Retrieve data in error with a Reject link on page 483 of the tMySQLOutput
component.
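The Additional Columns option described above effectively substitutes an SQL expression for a plain placeholder in the generated INSERT statement. A minimal sketch of the idea, assuming an invented customers table and upper() as the SQL expression (Python's sqlite3 stands in for the generated Java):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, name_upper TEXT)")

# "name" comes straight from the input flow; "name_upper" plays the role
# of an additional column filled by the SQL expression upper(?).
sql = "INSERT INTO customers (name, name_upper) VALUES (?, upper(?))"
conn.executemany(sql, [("alice", "alice"), ("bob", "bob")])
conn.commit()
uppers = [r[0] for r in
          conn.execute("SELECT name_upper FROM customers ORDER BY name")]
```

The Position and Reference column settings decide where the expression-backed column lands relative to the schema columns; here it simply follows name.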
Related Scenario
tSQLiteRollback
tSQLiteRollback properties
This component is closely related to tSQLiteCommit and tSQLiteConnection. It usually does not
make much sense to use these components independently in a transaction.
Related scenarios
For tSQLiteRollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498 of the tMysqlRollback.
tSQLiteRow
tSQLiteRow Properties
Function tSQLiteRow executes the defined query onto the specified database and uses
the parameters bound with the column.
Purpose A prepared statement uses the input flow to replace the placeholders with the
values for each parameter defined. This component can be very useful for
updates.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tSQLiteConnection component on the Component
list to reuse the connection details you already
defined.
This scenario describes a job which updates an SQLite database file based on a prepared statement
and using a delimited file.
• Drop a tFileInputDelimited and a tSQLiteRow component from the Palette to the design
workspace.
• On the tFileInputDelimited Basic settings panel, browse to the input file that will be used
to update rows in the database.
• The file has neither header nor footer. The Row separator is a carriage return and the field
separator is a semi-colon.
• Edit the schema in case it is not stored in the Repository.
• Make sure the length and type are respectively correct and large enough to define the
columns.
• Then in the tSQLiteRow Basic settings panel, set the Database filepath to the file to be
updated.
• Type in the query or retrieve it from the Repository. In this use case, we updated the type_os
for the id defined in the Input flow. The statement is as follows: 'Update download set
type_os=? where id=?'
• Then select the Prepared statement check box to display the placeholders’ parameter table.
• In the Input parameters table, add as many lines as necessary to cover all placeholders. In this
scenario, type_os and id are to be defined.
• Set the Commit every field and select the Encoding type in the list.
• Save the job and press F6 to run it.
The download table of the SQLite database is thus updated with new type_os codes according to
the delimited input file.
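The update steps above boil down to the prepared statement being executed once per input row. A hedged sketch with Python's sqlite3 module (the Job itself generates Java; the download table and its columns follow the scenario, while the sample values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE download (id INTEGER, type_os TEXT)")
conn.executemany("INSERT INTO download VALUES (?, ?)",
                 [(1, "unknown"), (2, "unknown")])
conn.commit()

# The prepared statement from the scenario: each "?" placeholder is bound
# to an input-flow column (type_os first, then id).
stmt = "UPDATE download SET type_os = ? WHERE id = ?"
input_flow = [("win", 1), ("linux", 2)]  # rows read from the delimited file
conn.executemany(stmt, input_flow)
conn.commit()
```

The Input parameters table in the component plays the role of the tuple ordering here: one line per placeholder, in the order the "?" marks appear in the statement.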
tSybaseBulkExec
tSybaseBulkExec Properties
Related scenarios
tSybaseClose
tSybaseClose properties
Related scenario
tSybaseCommit
tSybaseCommit Properties
This component is closely related to tSybaseConnection and tSybaseRollback. It usually does not
make much sense to use these components independently in a transaction.
Function tSybaseCommit validates the data processed through the Job into the
connected DB
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tSybaseConnection component in the list
if more than one connection is planned for the
current Job.
Close Connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Sybase components, especially with
tSybaseConnection and tSybaseRollback.
Limitation n/a
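The single global commit described in the Purpose above follows a common DB-API pattern. The sketch below uses Python's sqlite3 module purely so it is self-contained; a real Job would hold a Sybase connection opened by tSybaseConnection, and the table name is illustrative:

```python
import sqlite3

# Illustrative stand-in: the commit pattern is identical for any DB-API
# connection (a real Job would reuse the shared Sybase connection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.commit()

# Every insert reuses the single shared connection; no per-row commit.
for i in range(1000):
    conn.execute("INSERT INTO orders VALUES (?)", (i,))

conn.commit()  # one global commit at the end of the Job
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Committing once at the end, instead of on every row or batch, is where the performance gain comes from; clearing the Close Connection check box corresponds to not calling close() afterwards.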
Related scenario
This component is closely related to tSybaseConnection and tSybaseRollback. It usually does not
make much sense to use one of these without using a tSybaseConnection component to open a
connection for the current transaction.
For tSybaseCommit related scenario, see Scenario: Inserting data in mother/daughter tables on
page 460.
tSybaseConnection
tSybaseConnection Properties
This component is closely related to tSybaseCommit and tSybaseRollback. It usually does not
make much sense to use one of these without using a tSybaseConnection component to open a
connection for the current transaction.
Related scenarios
For tSybaseConnection related scenario, see Scenario: Inserting data in mother/daughter tables on
page 460.
tSybaseInput
tSybaseInput Properties
Related scenarios
tSybaseIQBulkExec
tSybaseIQBulkExec Properties
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create table: The table is removed and
created again.
Create table: The table does not exist and gets
created.
Create table if not exists: The table is created if it
does not exist.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. This
operation cannot be rolled back.
Local filename Name of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: You create and store the schema locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: You have already created and stored the
schema in the Repository, hence can be reused.
Related topic: Setting a Repository schema of Talend
Open Studio User Guide.
Advanced settings Lines terminated by Character or sequence of characters used to separate
lines.
Field Terminated by Character, string or regular expression to separate
fields.
Use enclosed quotes Select this check box to use data enclosure characters.
Use fixed length Select this check box to set a fixed width for data
lines.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage This dedicated component offers performance and flexibility of Sybase IQ DB
query handling.
Limitation As opposed to the Oracle dedicated bulk component, no action on data is
possible using this Sybase dedicated component.
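The Clear table vs Truncate table distinction in the table above is a transactional one: a clear is a plain DELETE and stays inside the transaction, while a truncate cannot be rolled back. A small demonstration of the rollback-able half, using Python's sqlite3 module as a stand-in (SQLite has no TRUNCATE; the table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
conn.commit()

# "Clear table" maps to a plain DELETE, which runs inside a transaction
# and can therefore still be rolled back; TRUNCATE (where supported) cannot.
conn.execute("DELETE FROM t")
conn.rollback()
count_after_rollback = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

After the rollback, the two rows deleted by the clear are back, which is exactly what a truncate would not allow.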
Related scenarios
tSybaseIQOutputBulkExec
tSybaseIQOutputBulkExec properties
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and
created again.
Create a table: The table does not exist and gets
created.
Create a table if not exists: The table is created if it
does not exist.
Clear a table: The table content is deleted.
File Name Name of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
Append the file Select this check box to add the new rows at the end of
the records.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: You create and store the schema locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: You have already created and stored the
schema in the Repository, hence can be reused.
Related topic: Setting a Repository schema of Talend
Open Studio User Guide.
Advanced settings Fields terminated by Character, string or regular expression to separate
fields.
Lines terminated by Character or sequence of characters used to separate
lines.
Use enclosed quotes Select this check box to use data enclosure characters.
Include Head Select this check box to include the column header.
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is mainly used when no particular transformation is required
on the data to be loaded onto the database.
Limitation n/a
Related scenarios
For use cases in relation with tSybaseIQOutputBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490.
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495.
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538.
tSybaseOutput
tSybaseOutput Properties
Related scenarios
• tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table on page
476.
tSybaseOutputBulk
tSybaseOutputBulk properties
tSybaseOutputBulk and tSybaseBulkExec components are used together to first output the file
that will be then used as parameter to execute the SQL query stated. These two steps compose the
tSybaseOutputBulkExec component, detailed in a separate section. The interest in having two
separate elements lies in the fact that it allows transformations to be carried out before the data
loading.
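The two-step split described above can be sketched as "write a delimited file, then load it". The snippet below uses Python's csv and sqlite3 modules only as compact stand-ins for the generated Java and the Sybase bulk loader; the buffer, table and row values are illustrative assumptions:

```python
import csv
import io
import sqlite3

# Step 1 (the tSybaseOutputBulk role): write the flow to a delimited file.
# An in-memory buffer stands in for the file on disk.
buf = io.StringIO()
csv.writer(buf, delimiter=";").writerows([("1", "alice"), ("2", "bob")])

# Transformations could be applied to the file here, before loading --
# which is the point of keeping the two steps separate.
buf.seek(0)

# Step 2 (the tSybaseBulkExec role): load the delimited file into the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 csv.reader(buf, delimiter=";"))
conn.commit()
```

tSybaseOutputBulkExec fuses both steps into one component, trading away the chance to transform the intermediate file.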
Function Writes a file with columns based on the defined delimiter and the Sybase
standards
Purpose Prepares the file to be used as parameter in the INSERT query to feed the
Sybase database.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where you
have stored Properties. The fields that come after are
pre-filled in using fetched data.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Append Select this check box to add the new rows at the end
of the file.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: You create and store the schema locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: You have already created and stored the
schema in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Advanced settings Row separator String (ex: “\n” on Unix) to distinguish rows.
Field separator Character, string or regular expression to separate
fields.
Include header Select this check box to include the column header in
the file.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level
Usage This component is to be used along with tSybaseBulkExec component. Used
together they offer gains in performance while feeding a Sybase database.
Related scenarios
For use cases in relation with tSybaseOutputBulk, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tSybaseOutputBulkExec
tSybaseOutputBulkExec properties
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and
created again.
Create a table: The table does not exist and gets
created.
Create a table if not exists: The table is created if it
does not exist.
Clear a table: The table content is deleted.
File Name Name of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
Append Select this check box to add the new rows at the end
of the records.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: You create and store the schema locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: You have already created and stored the
schema in the Repository, hence can be reused.
Related topic: Setting a Repository schema of Talend
Open Studio User Guide.
Advanced settings Action on data On the data of the table defined, you can perform:
Bulk Insert: Add multiple entries to the table. If
duplicates are found, job stops.
Bulk Update: Make simultaneous changes to
multiple entries.
Field terminator Character, string or regular expression to separate
fields.
DB Row terminator String (ex: “\n” on Unix) to distinguish rows in the
DB.
First row Type in the number of the file row where the action
should start.
FILE Row terminator Character, string or regular expression to separate
rows in the file.
Include Head Select this check box to include the column header.
Encoding Type Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Output Select the type of output for the standard output of the
Sybase database:
to console,
to global variable.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is mainly used when no particular transformation is required
on the data to be loaded onto the database.
Limitation n/a
Related scenarios
For use cases in relation with tSybaseOutputBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tSybaseRollback
tSybaseRollback properties
This component is closely related to tSybaseCommit and tSybaseConnection. It usually does not
make much sense to use these components independently in a transaction.
Related scenarios
For tSybaseRollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498.
tSybaseRow
tSybaseRow Properties
Function tSybaseRow is the specific component for this database query. It executes the
SQL query stated onto the specified database. The row suffix means the
component implements a flow in the job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tSybaseRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tSybaseConnection component on the Component
list to reuse the connection details you already
defined.
Turn on identity insert Select this check box to use your own sequence for
the identity value of the inserted records (instead of
having the SQL Server pick the next sequential
value).
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Query type Either Built-in or Repository.
Built-in: Fill in manually the query statement or build
it graphically using SQLBuilder
Repository: Select the relevant query stored in the
Repository. The Query field gets accordingly filled in.
Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the
schema definition.
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Propagate QUERY’s Select this check box to insert the result of the query
recordset into a COLUMN of the current flow. Select this
column from the use column list.
Use Select this check box if you want to query the database
PreparedStatement using a PreparedStatement. In the Set
PreparedStatement Parameter table, define the
parameters represented by “?” in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the
SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Usage This component offers the flexibility benefit of the DB query and covers all
possibilities of SQL queries.
Related scenarios
tSybaseSCD
tSybaseSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see tSybaseSCD on page 59.
tSybaseSCDELT
tSybaseSCDELT belongs to two component families: Business Intelligence and Databases. For more
information on it, see tSybaseSCDELT on page 61.
tSybaseSP
tSybaseSP properties
Related scenarios
For related topic, see tMysqlSP Scenario: Finding a State Label using a stored procedure on page
512.
Check as well the tParseRecordSet component if you want to analyze a set of records from a
database table or DB query and return single records.
tTeradataClose
tTeradataClose properties
Related scenario
tTeradataCommit
tTeradataCommit Properties
Function tTeradataCommit validates the data processed through the job into the
connected DB.
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tTeradataConnection component in the
list if more than one connection is planned for the
current job.
Close connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Teradata components, especially with
tTeradataConnection and tTeradataRollback components.
Limitation n/a
Related scenario
tTeradataConnection
tTeradataConnection Properties
Related scenario
tTeradataFastExport
tTeradataFastExport Properties
Related scenario
tTeradataFastLoad
tTeradataFastLoad Properties
Related scenario
tTeradataFastLoadUtility
tTeradataFastLoadUtility Properties
Related scenario
tTeradataInput
tTeradataInput Properties
Related scenarios
tTeradataMultiLoad
tTeradataMultiLoad Properties
Action to data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, job stops.
Update: Make changes to existing entries
Insert or update: Add entries or update existing
ones.
Delete: Remove entries corresponding to the input
flow.
Related scenario
tTeradataOutput
tTeradataOutput Properties
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Additional JDBC Specify additional connection properties for the DB
parameters connection you are creating. This option is not
available if you have selected the Use an existing
connection check box in the Basic settings.
This is intended to allow specific character set
support, e.g., CHARSET=KANJISJIS_OS to get
support of Japanese characters.
Commit every Enter the number of rows to be completed before
committing batches of rows together into the DB.
This option ensures transaction quality (but not
rollback) and, above all, better performance at
execution.
Additional Columns This option is not offered if you create (with or
without drop) the DB table. This option allows you to
call SQL functions to perform actions on columns
that are not insert, update or delete actions, or
actions that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as new column
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After following
the action to be performed on the reference column.
Reference column: Type in a column of reference
that the tDBOutput can use to place or replace the
new or altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
Enable debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Use Batch Size When selected, enables you to define the number of
lines in each processed batch.
Usage This component offers the flexibility benefit of the DB query and covers all of
the SQL queries possible.
This component must be used as an output component. It allows you to carry
out actions on a table or on the data of a table in a Teradata database. It also
allows you to create a reject flow using a Row > Rejects link to filter data in
error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data
in error with a Reject link on page 483.
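The Commit every setting described in the Advanced settings above amounts to committing after each batch of N rows rather than per row or only once at the end. A hedged sketch with Python's sqlite3 module (any DB-API connection behaves the same way; the table and batch size are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

COMMIT_EVERY = 100  # the "Commit every" setting; value is illustrative
for i in range(250):
    conn.execute("INSERT INT" "O t VALUES (?)".replace('"', '"'), (i,)) if False else conn.execute("INSERT INTO t VALUES (?)", (i,))
    if (i + 1) % COMMIT_EVERY == 0:
        conn.commit()  # commit each full batch of rows

conn.commit()  # commit the final, partial batch
total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Batching the commits keeps transactions short without paying the per-row commit cost, which is why the option improves performance even though a mid-batch failure cannot be rolled back past the last commit.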
Related scenarios
tTeradataRollback
tTeradataRollback Properties
Related scenario
For tTeradataRollback related scenario, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498.
tTeradataRow
tTeradataRow Properties
Function tTeradataRow is the specific component for this database query. It executes
the SQL query stated onto the specified database. The row suffix means the
component implements a flow in the job design although it doesn’t provide
output.
Purpose Depending on the nature of the query and the database, tTeradataRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Use an existing Select this check box and click the relevant
connection tTeradataConnection component on the
Component list to reuse the connection details you
already defined.
Related scenarios
tTeradataTPump
tTeradataTPump Properties
Action to data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are
found, job stops.
Update: Make changes to existing entries
Insert or update: Add entries or update existing
ones.
Delete: Remove entries corresponding to the input
flow.
Related scenario
tVerticaBulkExec
tVerticaBulkExec Properties
tVerticaOutputBulk and tVerticaBulkExec components are used together to first output the file
that will be then used as parameter to execute the SQL query stated. These two steps compose the
tVerticaOutputBulkExec component, detailed in a separate section. The interest in having two
separate elements lies in the fact that it allows transformations to be carried out before the data
loading in the database.
Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create table: The table is removed and
created again.
Create table: The table does not exist and gets
created.
Create table if not exists: The table is created if it
does not exist.
Clear table: The table content is deleted. You have
the possibility to rollback the operation.
Table Name of the table to be written. Note that only one
table can be written at a time and that the table must
exist for the insert operation to succeed.
Schema and Edit A schema is a row description, i.e., it defines the
schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Remote Filename Name of the file to be processed.
Related topic: Defining variables from the Component
view of Talend Open Studio User Guide.
Advanced settings Write to ROS (Read Select this check box to store the data in a physical
Optimized Store) storage area, in order to optimize the reading, as the
data is compressed and pre-sorted.
Exit job if no row was The Job automatically stops if no row has been
loaded loaded.
Fields terminated by Character, string or regular expression to separate
fields.
Null string String displayed to indicate that the value is null.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with tVerticaOutputBulk component.
Used together, they can offer gains in performance while feeding a Vertica
database.
Related scenarios
• tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB on page 538
tVerticaClose
tVerticaClose properties
Related scenario
tVerticaCommit
tVerticaCommit Properties
Function tVerticaCommit validates the data processed through the job into the
connected DB.
Purpose Using a unique connection, this component commits in one go a global
transaction instead of doing that on every row or every batch and thus provides
gain in performance.
Basic settings Component list Select the tVerticaConnection component in the list
if more than one connection is planned for the
current job.
Close connection This check box is selected by default. It allows you to
close the database connection once the commit is
done. Clear this check box to continue to use the
selected connection once the component has
performed its task.
Advanced settings tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with Vertica components, especially with
tVerticaConnection and tVerticaRollback components.
Limitation n/a
Related scenario
tVerticaConnection
tVerticaConnection Properties
Related scenario
tVerticaInput
tVerticaInput Properties
Related scenarios
tVerticaOutput
tVerticaOutput Properties
Die on error This check box is selected by default. Clear the check
box to skip the row on error and complete the process
for error-free rows. If needed, you can retrieve the
rows on error via a Row > Rejects link.
Advanced settings Encoding Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Commit every Number of rows to be completed before committing
batches of rows together into the DB. This option
ensures transaction quality (but not rollback) and,
above all, better performance at executions.
Use Batch Mode Select this check box in order to set the number of
lines in each processed batch, in the Batch Size field.
Additional Columns This option is not offered if you create (with or
without drop) the DB table. This option allows you to
call SQL functions to perform actions on columns
that are not insert, update or delete actions, or
actions that require particular preprocessing.
Name: Type in the name of the schema column to be
altered or inserted as new column
SQL expression: Type in the SQL statement to be
executed in order to alter or insert the relevant column
data.
Position: Select Before, Replace or After following
the action to be performed on the reference column.
Reference column: Type in a column of reference
that the tDBOutput can use to place or replace the
new or altered column.
Use field options Select this check box to customize a request,
especially when there is double action on data.
Enable debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component offers the flexibility of the DB query and covers all
possible SQL queries.
This component must be used as an output component. It allows you to carry
out actions on a table or on the data of a table in a Vertica database. It also
allows you to create a reject flow, using a Row > Rejects link, to filter data in
error. For an analogous example of the Reject link in use, see the tMysqlOutput
Scenario 3: Retrieve data in error with a Reject link on page 483.
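As a rough illustration of the Additional Columns option described above, the sketch below rebuilds an INSERT statement so that a SQL expression replaces the plain placeholder of one column (Position: Replace). This is not Talend's generated code; the table and column names (customers, id, name) and the UPPER(?) expression are invented for illustration.

```java
public class AdditionalColumnSketch {
    // Builds an INSERT where the placeholder of replacedCol is swapped for
    // the given SQL expression (illustrating Position: Replace).
    public static String insertSql(String table, String[] cols,
                                   String replacedCol, String expr) {
        StringBuilder names = new StringBuilder();
        StringBuilder values = new StringBuilder();
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) { names.append(", "); values.append(", "); }
            names.append(cols[i]);
            // Ordinary columns keep a plain ? placeholder; the altered
            // column receives the SQL expression instead.
            values.append(cols[i].equals(replacedCol) ? expr : "?");
        }
        return "INSERT INTO " + table
             + " (" + names + ") VALUES (" + values + ")";
    }

    public static void main(String[] args) {
        System.out.println(insertSql("customers",
                new String[] {"id", "name"}, "name", "UPPER(?)"));
        // INSERT INTO customers (id, name) VALUES (?, UPPER(?))
    }
}
```
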
Related scenarios
• tMysqlOutput Scenario 1: Adding a new column and altering data in a DB table on page
476.
tVerticaOutputBulk
tVerticaOutputBulk Properties
The tVerticaOutputBulk and tVerticaBulkExec components are used together: the first outputs
the file that is then used as a parameter to execute the SQL query stated. These two steps make up
the tVerticaOutputBulkExec component, detailed in a separate section. The advantage of having
two separate elements is that it allows transformations to be carried out on the data before it is
loaded.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
tStatCatcher Statistics Select this check box to collect log data at the
component level.
Usage This component is to be used along with the tVerticaBulkExec component.
Used together, they offer performance gains while feeding a Vertica database.
Related scenarios
For use cases related to tVerticaOutputBulk, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
tVerticaOutputBulkExec
tVerticaOutputBulkExec Properties
Related scenarios
For use cases related to tVerticaOutputBulkExec, see the following scenarios:
• tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database on page 490
• tMysqlOutputBulkExec Scenario: Inserting data in MySQL database on page 495
tVerticaRollback
tVerticaRollback Properties
Related scenario
For a scenario related to tVerticaRollback, see Scenario: Rollback from inserting data in
mother/daughter tables on page 498.
tVerticaRow
tVerticaRow Properties
Function tVerticaRow is the specific component for this database query. It executes the
SQL query stated on the specified database. The Row suffix means the
component implements a flow in the Job design although it doesn't provide
output.
Purpose Depending on the nature of the query and the database, tVerticaRow acts on
the actual DB structure or on the data (although without handling data). The
SQLBuilder tool helps you easily write your SQL statements.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the
properties are stored. The subsequent fields are
pre-filled using the fetched data.
Use an existing Select this check box and click the relevant
connection tVerticaConnection component on the Component
list to reuse the connection details you already
defined.
Related scenario
This chapter details the main components that you can find in the ELT family of the Talend Open
Studio Palette.
The ELT family groups together the most popular database connectors and processing components,
all dedicated to the ELT mode where the target DBMS becomes the transformation engine.
This mode supports all of the most popular databases including Teradata, Oracle, Vertica, Netezza,
Sybase, etc.
ELT components
tELTAggregate
tELTAggregate
tELTAggregate properties
Function tELTAggregate collects data values from one or more columns with the
intent to manage the collection as a single unit. This component has
real-time capabilities since it runs the data transform on the DBMS itself.
Purpose Helps to provide a set of metrics based on values or calculations.
Basic settings Component list Select the relevant DB connection component in
the list if you use more than one connection in the
current Job.
Database name Name of the database.
Source table name Name of the table holding the data you want to
collect values from.
Target table name Name of the table you want to write the collected
and transformed data in.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Built-in: You create and store the schema locally
for this component only. Related topic: Setting a
built-in schema of Talend Open Studio User
Guide.
Repository: You have already created the
schema and stored it in the Repository. You can
reuse it in various projects and job flowcharts.
Related topic: Setting a Repository schema of
Talend Open Studio User Guide
Operations Select the type of operation along with the value
to use for the calculation and the output field.
Output Column: Select the destination field in
the list.
Function: Select any of the following operations
to perform on data: count, min, max, avg, sum,
and count (distinct).
Input column position: Select the input column
from which you want to collect the values to be
aggregated.
Group by Define the aggregation sets, the values of which
will be used for calculations.
The following Java scenario creates a Job that opens a connection to a Mysql database and:
• instantiates the schemas from a database table whose rows match the column names
specified in the filter,
• filters a column in the same database table to have only the data that matches a WHERE
clause,
• collects data grouped by specific value(s) from the filtered column and writes aggregated
data in a target database table.
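Taken together, the bullet points above boil down to a single filter-then-aggregate statement run on the DBMS itself. The sketch below is a hypothetical illustration of that statement's shape, not the code Talend generates; all table and column names in it are invented.

```java
public class EltAggregateSketch {
    // Assembles the kind of INSERT ... SELECT an ELT chain pushes to the
    // DBMS: a column filter (SELECT list), a row filter (WHERE clause),
    // and an aggregation (GROUP BY).
    public static String query(String target, String source,
                               String groupCol, String countCol, String where) {
        return "INSERT INTO " + target + " (" + groupCol + ", nb_rows) "
             + "SELECT " + groupCol + ", COUNT(" + countCol + ") "
             + "FROM " + source + " "
             + "WHERE " + where + " "
             + "GROUP BY " + groupCol;
    }

    public static void main(String[] args) {
        System.out.println(query("aggregate_customers", "customers",
                "marital_status", "id", "id_State IS NOT NULL"));
    }
}
```

The point of the ELT mode is that this whole statement executes inside the target DBMS; no row ever transits through the Job.
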
To filter and aggregate database table columns:
• Drop the following components from the Palette onto the design workspace:
tELTMysqlConnection, tELTFilterColumns, tELTFilterRows, tELTAggregate,
tELTCommit, and tELTRollback.
• Connect the first five components using OnComponentOk links.
• Connect tELTAggregate to tELTRollback using an OnComponentError link.
• In the design workspace, select tELTMysqlConnection and click the Component tab to
define its basic settings.
• In the Basic settings view, set the database connection details manually or select them from
the context variable list, through a Ctrl+Space click in the corresponding field if you have
stored them locally as Metadata DB connection entries.
For more information about Metadata, see Defining Metadata items of Talend Open Studio User
Guide.
• In the design workspace, select tELTFilterColumns and click the Component tab to define
its basic settings.
In this scenario, the source table has five columns: id, First_Name, Last_Name, Address, and
id_State.
• In the Column filters panel, set the column filter by selecting the check boxes of the
columns you want to take from the source table.
In this scenario, the tELTFilterColumns component instantiates only three columns: id,
First_Name, and id_State from the source table.
In the Component view, you can click the SQL Template tab and add system SQL
templates or create your own and use them within your Job to carry out the coded
operation. For more information, see tELTFilterColumns Properties on page 739.
• In the design workspace, select tELTFilterRows and click the Component tab to define its
basic settings.
• In the Operations panel, click the plus button to add one or more lines and then click in the
Output column line to select the output column that will hold the counted data.
• Click in the Function line and select the operation to be carried out.
• In the Group by panel, click the plus button to add one or more lines and then click in the
Output column line to select the output column that will hold the aggregated data.
• In the design workspace, select tELTCommit and click the Component tab to define its
basic settings.
• On the Database type list, select the relevant database.
• On the Component list, select the relevant database connection component if more than one
connection is used.
• Do the same for tELTRollback.
• Save your Job and press F6 to execute it.
A two-column table, aggregate_customers, is created in the database. It groups customers
according to their marital status and counts the number of customers in each marital group.
tELTCommit
tELTCommit properties
This component is closely related to tELTRollback and to the ELT connection component for the
database you work with. tELTCommit, tELTRollback and the ELT database connection
component are usually used together in a transaction.
Limitation n/a
Related scenario
This component is closely related to tELTRollback and to the ELT connection component
depending on the database you are working with. It usually does not make much sense to use ELT
components without using the relevant ELT database connection component as its purpose is to open
a connection for a transaction.
For more information on tELTCommit, see Scenario: Filtering and aggregating table columns
directly on the DBMS on page 731.
tELTFilterColumns
tELTFilterColumns Properties
SQL Template SQL Template List To add a default system SQL Template:
Click the Add button to add the default system
SQL template(s) in the SQL Template List.
Click in the SQL template field and then click the
arrow to display the system SQL template list.
Select the desired system SQL template provided
by Talend.
Note: You can create your own SQL templates
and add them to the SQL Template List.
Related Scenario
For a related scenario, see Scenario: Filtering and aggregating table columns directly on the DBMS
on page 731.
tELTFilterRows
tELTFilterRows Properties
Function tELTFilterRows allows you to define a row filter on one table. This
component has real-time capabilities since it runs the data filtering on the
DBMS itself.
Purpose Helps to set row filters for any given data source, based on a WHERE
clause.
Basic settings Database Type Select the type of database you want to work on
from the drop down list.
Component list Select the relevant DB connection component in
the list if you are using more than one connection
in the current Job.
Database name Name of the database.
Source table name Name of the table holding the data you want to
filter.
Target table name Name of the table you want to write the filtered
data in.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Built-in: You create and store the schema locally
for this component only. Related topic: Setting a
built-in schema of Talend Open Studio User
Guide.
Repository: You have already created the
schema and stored it in the Repository. You can
reuse it in various projects and job flowcharts.
Related topic: Setting a Repository schema of
Talend Open Studio User Guide.
Where condition Use a WHERE clause to set the criteria that you
want the rows to meet.
You can use the WHERE clause to select specific
rows from the table that match specified criteria
or conditions.
Advanced settings tStatCatcher Select this check box to gather the job processing
Statistics metadata at a job level as well as at each
component level.
SQL Template SQL Template List To add a default system SQL template:
Click the Add button to add the default system
SQL template(s) in the SQL Template List.
Click in the SQL template field and then click the
arrow to display the system SQL template list.
Select the desired system SQL template provided
by Talend.
Note: You can create your own SQL template and
add them to the SQL Template List.
Related Scenario
For a related scenario, see Scenario: Filtering and aggregating table columns directly on the DBMS
on page 731.
tELTJDBCInput
tELTJDBCInput properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These
components should be used to handle JDBC DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Provides the table schema to be used for the SQL statement to execute.
Purpose Allows you to add as many Input tables as required for the most complicated
Insert statement.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema nature and number of fields to be processed. The
schema is either built-in or remotely stored in the
Repository. The Schema defined is then passed on to
the ELT Mapper to be included to the Insert SQL
statement.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Default Table Name Type in the default table name.
Default Schema Name Type in the default schema name.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTJDBCInput is to be used along with the tELTJDBCMap. Note that the
Output link to be used with these components must correspond strictly to the
syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
Related scenarios
tELTJDBCMap
tELTJDBCMap properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These
components should be used to handle JDBC DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Helps to graphically build the SQL statement using the table provided as input.
Purpose Uses the tables provided as input, to feed the parameter in the built statement.
The statement can include inner or outer joins to be implemented between
tables or between one table and its aliases.
Basic settings Use an existing Select this check box and select the appropriate
connection Connection component from the Component list if
you want to re-use connection parameters that you
have already defined.
Related scenario:
tELTJDBCOutput
tELTJDBCOutput properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These
components should be used to handle JDBC DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Carries out the action on the table specified and inserts the data according to
the output schema defined in the ELT Mapper.
Purpose Executes the SQL Insert statement on the JDBC database.
Basic settings Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are
found, Job stops.
Note that in JDBC ELT, only the Insert operation is
available.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Default Table Name Type in the default table name.
Default Schema Name Type in the default schema name.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTJDBCOutput is to be used along with the tELTJDBCMap. Note that
the Output link to be used with these components must correspond strictly to
the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
Related scenarios
tELTMSSqlInput
tELTMSSqlInput properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Provides the table schema to be used for the SQL statement to execute.
Purpose Allows you to add as many Input tables as required for the most complicated
Insert statement.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema nature and number of fields to be processed. The
schema is either built-in or remotely stored in the
Repository. The Schema defined is then passed on to
the ELT Mapper to be included to the Insert SQL
statement.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Default Table Name Type in the default table name.
Default Schema Name Type in the default schema name.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTMSSqlInput is to be used along with the tELTMSSqlMap. Note that
the Output link to be used with these components must correspond strictly to
the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
Related scenarios
tELTMSSqlMap
tELTMSSqlMap properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Helps you to build the SQL statement graphically, using the table provided as
input.
Purpose Uses the tables provided as input, to feed the parameter in the built statement.
The statement can include inner or outer joins to be implemented between
tables or between one table and its aliases.
Basic settings Use an existing Select this check box and select the appropriate
connection Connection component from the Component list if
you want to re-use connection parameters that you
have already defined.
Related scenario:
tELTMSSqlOutput
tELTMSSqlOutput properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Carries out the action on the table specified and inserts the data according to
the output schema defined in the ELT Mapper.
Purpose Executes the SQL Insert statement on the MSSql database.
Basic settings Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are
found, Job stops.
Note that in MSSql ELT, only the Insert operation is
available.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Default Table Name Type in the default table name.
Default Schema Name Type in the default schema name.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTMSSqlOutput is to be used along with the tELTMSSqlMap. Note that
the Output link to be used with these components must correspond strictly to
the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
Related scenarios
tELTMysqlInput
tELTMysqlInput properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Provides the table schema to be used for the SQL statement to execute.
Purpose Allows you to add as many Input tables as required for the most complicated
Insert statement.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema nature and number of fields to be processed. The
schema is either built-in or remotely stored in the
Repository. The Schema defined is then passed on to
the ELT Mapper to be included to the Insert SQL
statement.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Usage tELTMysqlInput is to be used along with the tELTMysqlMap. Note that the
Output link to be used with these components must correspond strictly to the
syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
Related scenarios
tELTMysqlMap
tELTMysqlMap properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Helps to graphically build the SQL statement using the table provided as input.
Purpose Uses the tables provided as input, to feed the parameter in the built statement.
The statement can include inner or outer joins to be implemented between
tables or between one table and its aliases.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The following fields are
pre-filled in using fetched data.
Host Database server IP address
Port Listening port number of DB server.
Database Name of the database
Username and DB user authentication data.
Password
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Preview The preview is an instant shot of the Mapper data. It
becomes available when Mapper properties have
been filled in with data. The preview synchronization
takes effect only after saving changes.
Map editor The ELT Map editor allows you to define the output
schema as well as build graphically the SQL
statement to be executed.
Usage tELTMysqlMap is used along with a tELTMysqlInput and
tELTMysqlOutput. Note that the Output link to be used with these
components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
The output name you give to this link when creating it should always be the exact
name of the table to be accessed, as this parameter will be used in the generated
SQL statement.
This scenario describes a Job that gathers together several input DB table schemas and implements
a clause to filter the resulting output using an SQL statement.
• Click the ELT mapper component to define the database connection details.
• Here again, the database connection details are stored in the Repository.
• Restrict the Select with a Where clause such as: resellers.City = 'West Coast City'
• Check the reflected Where clause on the Generated SQL Select query tab.
All the selected data is inserted into the results table as specified in the SQL statement, respecting
the defined clause.
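The Generated SQL Select query tab then shows a statement of roughly the following shape. This is a hypothetical sketch: the output columns (ID_Reseller, Name) are invented, and only the resellers.City clause comes from this scenario.

```java
public class FilteredSelectSketch {
    // Rough shape of the generated Select query; the Where clause is the
    // one restricting the Select in the scenario, the columns are assumed.
    public static String query() {
        return "SELECT resellers.ID_Reseller, resellers.Name "
             + "FROM resellers "
             + "WHERE resellers.City = 'West Coast City'";
    }

    public static void main(String[] args) {
        System.out.println(query());
    }
}
```
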
This scenario describes a Job that uses an Alias table. The employees table contains all employee
details as well as the ID of their respective managers, who are themselves employees and hence
also included in the employees table. The dept table contains location and department information
about the employees.
• Drag and drop tELTMysqlInput components to retrieve respectively the employees and
dept table schemas.
• In this use case, both schemas are stored in the Repository and can therefore be easily
retrieved.
• Then select the tELTMysqlMap and define the Mysql database connection details.
• Here again the connection information is stored in the Repository’s Metadata.
• Drag and drop the DeptNo column from the employees table to the dept table to set up the
join between both input tables.
• Select the Explicit Join check box and define the join as an Inner Join.
• Then create the Alias table based on the employees table.
• Name it Managers and click OK to display it as a new Input table in the ELT mapper.
• Drag & drop the ID column from the employees table to the ID_Manager column of the
newly added Managers table.
• Select the Explicit Join check box and define it as a Left Outer Join, so that results are
output even when they contain a Null value.
• Drag and drop the content of both Input tables, employees and dept, as well as the Name
column from the Manager table to the Output table.
• Click on the Generated SQL Select query tab to display the query to be executed.
• Then click the Output component and set the Action on data to Insert.
• Make sure the schema is synchronized with the Output table from the ELT mapper before
running the Job through F6 or via the toolbar.
The department information and the employee entries are coupled in the output, and the
manager names are retrieved via the explicit join.
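Put together, the Generated SQL Select query for this alias Job takes roughly the following shape. This is a hedged sketch, not the exact generated statement: the join and output columns are assumptions based on the steps above, and the exact direction of the Managers join condition is an educated guess.

```java
public class AliasJoinSketch {
    public static String query() {
        return "SELECT employees.Name, dept.Location, Managers.Name "
             + "FROM employees "
             // Explicit inner join between the two input tables on DeptNo.
             + "INNER JOIN dept ON employees.DeptNo = dept.DeptNo "
             // Left outer join on the alias so that employees without a
             // manager (NULL ID_Manager) still appear in the output.
             + "LEFT OUTER JOIN employees Managers "
             + "ON employees.ID_Manager = Managers.ID";
    }

    public static void main(String[] args) {
        System.out.println(query());
    }
}
```
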
tELTMysqlOutput
tELTMysqlOutput properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Carries out the action on the table specified and inserts the data according to
the output schema defined in the ELT Mapper.
Purpose Executes the SQL Insert statement on the Mysql database.
Basic settings Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create the table: The table is removed and
created again.
Create a table: The table doesn't exist and gets
created. If the table exists, an error is generated and
the Job is stopped.
Create table if doesn't exist: The table is created if
needed, and the action on data is carried out anyway.
Clear a table: The table content is deleted.
In Java, use tCreateTable as a substitute for this
function.
Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are
found, Job stops.
Note that in Mysql ELT, only the Insert operation is
available.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Related scenarios
tELTOracleInput
tELTOracleInput properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Provides the table schema to be used for the SQL statement to execute.
Purpose Allows you to add as many Input tables as required for the most complicated
Insert statement.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema nature and number of fields to be processed. The
schema is either built-in or remotely stored in the
Repository. The Schema defined is then passed on to
the ELT Mapper to be included to the Insert SQL
statement.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Default Table Name Specify the name of the default table you need to
retrieve.
Default Schema Name (Java only) Specify the name of the schema you
retrieve the default table from.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTOracleInput is to be used along with the tELTOracleMap. Note that the
Output link to be used with these components must correspond strictly to
the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
Related scenarios
For use cases related to tELTOracleInput, see tELTOracleMap Scenario: Updating
Oracle DB entries on page 768.
tELTOracleMap
tELTOracleMap properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Helps to graphically build the SQL statement using the table provided as input.
Purpose Uses the tables provided as input, to feed the parameter in the built statement.
The statement can include inner or outer joins to be implemented between
tables or between one table and its aliases.
Basic settings Use an existing connection (Java only) Select this check box and select the
appropriate tOracleConnection component from the Component list if you
want to re-use connection parameters that you have already defined.
When a Job contains a parent Job and a child Job, the
Component list presents only the connection
components at the same Job level, so if you need to
use an existing connection from the other level, make
sure that the available connection components share
the intended connection.
For more information on how to share a DB connec-
tion across Job levels, see Use or register a shared
DB connection in any database connection compo-
nent corresponding to the database you are using, in
Databases components on page 205.
Otherwise, you can also deactivate the connection
components and use the Dynamic settings of the com-
ponent to specify the intended connection manually.
In this case, make sure the connection name is unique
and distinct across the two Job levels.
For more information about Dynamic settings, see
your studio user guide.
ELT Oracle Map The ELT Map editor allows you to define the output
Editor schema and make a graphical build of the SQL
statement to be executed.
Style link Auto: By default, the links between the input and
output schemas are in the form of curves.
Curves: Links between the schemas are in the form of
curves.
Lines: Links between the schemas are in the form of
straight lines.
This option slightly optimizes performance.
Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
This scenario relies on the Job described in ELT MySQL components, Scenario1: Aggregating table
columns and filtering on page 756. As the update action on the data is available in Oracle DB, this
scenario describes a Job updating particular entries of the results table, adding a model to the make
column of the cars table.
• Define all three Input components as described in Scenario1: Aggregating table columns
and filtering on page 756.
• When connecting the ELT Input components to the ELT mapper, make sure you use the
relevant table names as these table names will be used as parameters in the SQL statement
generated in the ELT mapper.
• Remove the additional clause used to filter the output columns.
• Add a new filter row to the output table in the ELT mapper to set up a relationship between
input and output tables: owners.ID_OWNER=results.ID_OWNER
• Also remove all the columns that are not used for the Update action on the output table.
• Then apply the update to the Make column, adding the text C-Class preceded by a
double pipe (||), the Oracle string concatenation operator.
• Also add the text Sold by in front of the reseller name column from the resellers
table.
• Check the Generated SQL select query to be executed.
• Click OK to validate the changes in the ELT mapper, and make sure the Oracle DB
connection details are filled in the Basic settings of the tELTOracleMap component.
• Select the tELTOracleOutput component to define the Action on data to be carried out.
• There is no action on the table, and the Action on data is set to Update.
• Check that the Schema type corresponds to the output table from the ELT Mapper.
• In the Where clause area, add an additional clause: results.MAKE = 'Mercedes'.
• Then press F6 to run the Job and check the results table in a DB viewer.
The Job executes the query generated and updates the relevant rows.
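As an illustration, and assuming the results table uses the column names shown in the filter and where clauses above (an assumption, not the exact mapper output), the statement generated by the ELT mapper could resemble this sketch:

```sql
-- Hypothetical sketch of the generated update; table and column names
-- are assumptions taken from the scenario, not the exact generated query.
UPDATE results
SET    MAKE = MAKE || ' C-Class'   -- || is the Oracle concatenation operator
WHERE  results.MAKE = 'Mercedes';
```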
tELTOracleOutput
tELTOracleOutput properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle DB schemas to generate Insert, Update or Delete
statements, including clauses, which are to be executed in the DB output table defined.
Function Carries out the action on the table specified and inserts the data according to
the output schema defined in the ELT Mapper.
Purpose Executes the SQL Insert statement in the Oracle database.
Basic settings Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create the table: The table is removed and
a new one is created.
Create a table: The table doesn’t exist and gets
created. If the table exists, an error is generated and
the Job is stopped.
Create table if doesn’t exist: Creates the table if
needed and carries out the action on data anyway.
Clear a table: The table content is deleted.
In Java, use tCreateTable as a substitute for this
function.
Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the
entry flow.
MERGE: Updates or adds data to the table. The
MERGE option is available in Java only.
The options available for the MERGE operation are
different from those available for the Insert, Update
or Delete operations.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Where clauses (for Enter the clause which allows you to filter the data to
UPDATE and be updated or deleted during the update or delete
DELETE only) operations.
Use Merge Update Select this check box to update the data in the output
Java only (for MERGE) table.
Column: Lists the columns in the entry flow.
Update: Select the check box which corresponds to
the name of the column you want to update.
Use Merge Update Where Clause: Select this check
box and enter the WHERE clause required to filter the
data to be updated, if necessary.
Use Merge Update Delete Clause: Select this check
box and enter the WHERE clause required to filter the
data to be deleted and updated, if necessary.
Use Merge Insert (for Select this check box to insert the data in the table.
Java only MERGE) Column: Lists the entry flow columns.
Check All: Select the check box corresponding to the
name of the column you want to insert.
Use Merge Insert Where Clause: Select this check
box and enter the WHERE clause required to filter the
data to be inserted.
Default Table Name Enter a default name for the table.
Default Schema Name Enter the name of the Oracle default schema.
Java only
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Advanced settings Use Hint Options Select this check box to activate the hint configuration
area when you want to use a hint to optimize a query’s
execution. In this area, parameters are:
Java only
- HINT: specify the hint you need, using the syntax
/*+ */.
- POSITION: specify where you put the hint in a
SQL statement.
- SQL STMT: select the SQL statement you need to
use.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTOracleOutput is to be used along with the tELTOracleInput and
tELTOracleMap components. Note that the Output link to be used with these
components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
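To make the Use Hint Options settings described above concrete, here is a sketch of a hint embedded in a SELECT statement using the /*+ */ syntax; the hint chosen and the table and column names are illustrative assumptions, not output of the component:

```sql
-- Hypothetical example of placing an Oracle optimizer hint with /*+ */;
-- FIRST_ROWS(10) asks the optimizer to favor returning the first 10 rows quickly.
SELECT /*+ FIRST_ROWS(10) */ *
FROM   results
WHERE  MAKE = 'Mercedes';
```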
Scenario: Using the Oracle MERGE function to update and add data
simultaneously
The following scenario describes a Job, using three ELT components, which allows you to update
data in a client table thanks to the Oracle MERGE command. New client information is added and
existing client information is updated.
• In the schema list, select Repository if you have stored your table schema under Metadata
in the Repository. Otherwise, select Built-in and configure the schema manually by
clicking on the [...] button next to Edit schema.
The schema is configured as follows:
• In the Default Table Name field, enter the name of the source table within double quotation
marks. In this example, the table used is called NEW_CUSTOMERS_MLE.
• In the Default Schema Name field, enter the name of the source schema within double
quotation marks. In this example, the schema used is called ROOT.
• Click on the tELTOracleMap and on the Component view to display the component’s
basic properties.
• Select the Use an existing connection check box and select the tOracleConnection
component from the Component list if you want to use a connection opened by a connection
component.
• In the Property Type list, select Repository and click on the [...] button to select the
connection to the database containing the tables used in this scenario. The following fields
will be filled in with the connection information for the database. If you haven’t created any
metadata with this information, select the Built-in mode in the Property Type list and enter
the metadata manually.
For further information concerning metadata, see Defining Metadata items in the Talend Open
Studio User Guide.
• Click on the [...] button next to ELT Oracle Map Editor to open the editor which allows you
to map the input flow to the output flow.
• Select the name of the input table flow from the Select the table to use list. Here we have
selected ROOT.NEW_CUSTOMERS_MLE.
• Enter the name you want to give to the corresponding alias in the Type in a valid alias field.
In this example, we have named the alias new_customers.
The alias isn’t necessary if the input table is only used once in tMap.
• Click on OK.
• In the workspace, double-click on the tELTOracleOutput component to configure the
properties in the Component view.
The columns defined as the primary key CANNOT and MUST NOT be made subject to updates.
• Select the check boxes for Use Merge Update Where Clause and Use Merge Update
Delete Clause if you want to use a WHERE clause to filter the data to be updated, as well
as a DELETE WHERE clause to delete, then re-insert the data updated during the MERGE
operation.
• Select the Use Merge Insert check box to insert the new data, while the existing data is
updated by Oracle’s MERGE function.
• In the table which appears, select the check boxes which correspond to the columns you want
to insert. We want to insert all the new client data. Therefore, select all of the check boxes
which correspond to the columns to be inserted.
• Select the Use Merge Insert Where Clause check box if you want to use a WHERE clause
to filter the data inserted during the MERGE operation.
• In the Default Table Name field, enter the name of the source table within inverted commas.
In this example, the table used is called NEW_CUSTOMERS_MLE.
• In the Default Schema Name field, enter the name of the source schema within inverted
commas. In this example, the schema used is called ROOT.
• Save the Job and press F6 to run it.
The data is updated and inserted in the database. The query used is displayed in the console.
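The query shown in the console is typically an Oracle MERGE of roughly the following shape; the target table and column names below are assumptions for illustration, not the exact generated statement:

```sql
-- Hypothetical sketch of the MERGE statement behind this scenario.
MERGE INTO ROOT.CUSTOMERS t                      -- assumed target table
USING ROOT.NEW_CUSTOMERS_MLE s                   -- source table from the scenario
   ON (t.ID_CUSTOMER = s.ID_CUSTOMER)            -- primary key: never updated
WHEN MATCHED THEN
  UPDATE SET t.NAME = s.NAME, t.ADDRESS = s.ADDRESS
WHEN NOT MATCHED THEN
  INSERT (t.ID_CUSTOMER, t.NAME, t.ADDRESS)
  VALUES (s.ID_CUSTOMER, s.NAME, s.ADDRESS);
```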
For a scenario using the tELTOracleOutput component, see Scenario: Updating Oracle DB
entries on page 768, in the tELTOracleMap section.
tELTPostgresqlInput
tELTPostgresqlInput properties
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Function Provides the table schema to be used for the SQL statement to execute.
Purpose Allows you to add as many Input tables as required for the most complicated
Insert statement.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema nature and number of fields to be processed. The
schema is either built-in or remotely stored in the
Repository. The Schema defined is then passed on to
the ELT Mapper to be included to the Insert SQL
statement.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Default Table Name Type in the default table name.
Default Schema Name Type in the default schema name.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTPostgresqlInput is to be used along with tELTPostgresqlMap. Note
that the Output link to be used with these components must correspond strictly
to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
Related scenarios
tELTPostgresqlMap
tELTPostgresqlMap properties
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Function Helps to build the SQL statement graphically, using the table provided as input.
Purpose Uses the tables provided as input, to feed the parameter in the built statement.
The statement can include inner or outer joins to be implemented between
tables or between one table and its aliases.
Basic settings Use an existing Select this check box and select the appropriate
connection Connection component from the Component list if
you want to re-use connection parameters that you
have already defined.
Related scenario:
tELTPostgresqlOutput
tELTPostgresqlOutput properties
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Function Carries out the action on the table specified and inserts the data according to
the output schema defined in the ELT Mapper.
Purpose Executes the SQL Insert statement in the Postgresql database.
Basic settings Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Note that in Postgresql ELT, only the Insert operation
is available.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Default Table Name Type in the default table name.
Default Schema Name Type in the default schema name.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTPostgresqlOutput is to be used along with the tELTPostgresqlMap.
Note that the Output link to be used with these components must correspond
strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
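As a sketch of what the three Postgresql ELT components produce together, the generated statement is an INSERT ... SELECT executed entirely in the database; the table and column names here are hypothetical:

```sql
-- Hypothetical INSERT ... SELECT of the kind the ELT components generate;
-- table and column names are illustrative only.
INSERT INTO results (ID_OWNER, MAKE)
SELECT o.ID_OWNER, c.MAKE
FROM   owners o
INNER JOIN cars c ON c.ID_OWNER = o.ID_OWNER;
```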
Related scenarios
tELTRollback
tELTRollback properties
This component is closely related to tELTCommit and to the ELT connection component relative
to the database you work with. tELTRollback, tELTCommit and the ELT database connection
component are usually used together in a transaction.
Function tELTRollback cancels the transaction in progress in the database you connect
to.
Purpose To avoid committing transactions accidentally.
Basic settings Database Type Select the database type you want to connect to from
the list.
Component list Select the ELT database connection component in the
list if more than one connection is planned for the
current Job.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its
task.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
SQL Template SQL Template List To add a default system SQL template:
Click the Add button to add the default system SQL
template(s) in the SQL Template List.
Click in the SQL template field and then click the
arrow to display the system SQL template list. Select
the desired system SQL template provided by Talend.
Note: You can create your own SQL templates and
add them to the SQL Template List.
Related scenarios
For a tELTRollback related scenario, see Scenario: Filtering and aggregating table columns
directly on the DBMS on page 731.
tELTSybaseInput
tELTSybaseInput properties
The three ELT Sybase components are closely related, in terms of their operating conditions. These
components should be used to handle Sybase DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Provides the table schema for the SQL statement to execute.
Purpose Allows you to add as many Input tables as required, for Insert statements which
can be complex.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema number and nature of the fields to be processed. The
schema is either built-in (local) or stored remotely in
the Repository. The Schema defined is then passed on
to the ELT Mapper for inclusion in the Insert SQL
statement.
Click on Edit Schema to modify the schema. Note
that if you modify the schema, it automatically
becomes built-in.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema, in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository. Hence, it can be re-used for other
projects and Jobs. Related topic: Setting a Repository
schema, in the Talend Open Studio User Guide.
Default Table Name Enter a default name for the table.
Default Schema Name Enter a default name for the Sybase schema.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tELTSybaseInput is intended for use with tELTSybaseMap. Note that the
Output link to be used with these components must correspond strictly to the
syntax of the table name.
ELT components only handle schema information. They do not handle
actual data flow.
Related scenarios
tELTSybaseMap
tELTSybaseMap properties
The three ELT Sybase components are closely related in terms of their operating conditions. These
components should be used to handle Sybase DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Allows you to build the SQL statement graphically, using the table
provided as input.
Purpose Uses the tables provided as input to feed the parameters required to execute the
SQL statement. The statement can include inner or outer joins to be
implemented between tables or between a table and its aliases.
Basic settings Use an existing Select this check box and select the appropriate
connection tSybaseConnection component from the
Component list if you want to re-use connection
parameters that you have already defined.
Related scenarios
For scenarios in which tELTSybaseMap may be used, see the following tELTMysqlMap
scenarios:
• Scenario1: Aggregating table columns and filtering on page 756.
• Scenario 2: ELT using Alias table on page 759.
tELTSybaseOutput
tELTSybaseOutput properties
The three ELT Sybase components are closely related in terms of their operating conditions. These
components should be used to handle Sybase DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Carries out the action on the table specified and inserts the data according to
the output schema defined in the ELT Mapper.
Purpose Executes the SQL Insert statement in the Sybase database.
Basic settings Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create the table: The table is removed and
created again.
Create a table: The table doesn’t exist and gets
created. If the table exists, an error is generated and
the Job is stopped.
Create table if doesn’t exist: Creates the table if
needed and carries out the action on data anyway.
Clear a table: The table content is deleted.
In Java, use tCreateTable as a substitute for this
function.
Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Update: Updates entries in the table.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number and nature of the fields to be processed and
passed on to the next component. The schema is either
built-in (local) or stored remotely in the Repository.
The Schema defined is then passed on to the ELT
Mapper for inclusion in the Insert SQL statement.
Click on Edit Schema to modify the schema. Note
that if you modify the schema, it automatically
becomes built-in.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema, in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository. Hence, it can be re-used for other
projects and Jobs. Related topic: Setting a Repository
schema, in the Talend Open Studio User Guide.
Default Table Name Enter a default name for the table.
Default Schema Name Enter a default name for the Sybase schema.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at component level.
Related scenarios
For scenarios in which tELTSybaseOutput may be used, see the following tELTMysqlMap
scenarios:
• Scenario1: Aggregating table columns and filtering on page 756.
• Scenario 2: ELT using Alias table on page 759.
tELTTeradataInput
tELTTeradataInput properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Provides the table schema to be used for the SQL statement to execute.
Purpose Allows you to add as many Input tables as required for the most complicated
Insert statement.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema nature and number of fields to be processed. The
schema is either built-in or remotely stored in the
Repository. The Schema defined is then passed on to
the ELT Mapper to be included to the Insert SQL
statement.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Usage tELTTeradataInput is to be used along with tELTTeradataMap. Note
that the Output link to be used with these components must correspond strictly
to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only
schema information.
Related scenarios
tELTTeradataMap
tELTTeradataMap properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Helps to graphically build the SQL statement using the table provided as input.
Purpose Uses the tables provided as input, to feed the parameter in the built statement.
The statement can include inner or outer joins to be implemented between
tables or between one table and its aliases.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The following fields are
pre-filled in using fetched data.
Host Database server IP address
Port Listening port number of DB server.
Database Name of the database
Username and DB user authentication data.
Password
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Preview The preview is an instant shot of the Mapper data. It
becomes available when Mapper properties have
been filled in with data. The preview synchronization
takes effect only after saving changes.
Map editor The ELT Map editor allows you to define the output
schema as well as build graphically the SQL
statement to be executed.
Usage tELTTeradataMap is used along with a tELTTeradataInput and
tELTTeradataOutput. Note that the Output link to be used with these
components must faithfully reflect the name of the tables.
The ELT components do not handle actual data flow but only schema
information.
Related scenarios
tELTTeradataOutput
tELTTeradataOutput properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Function Carries out the action on the table specified and inserts the data according to
the output schema defined in the ELT Mapper.
Purpose Executes the SQL Insert statement in the Teradata database.
Basic settings Action on table On the table defined, you can perform one of the
following operations:
None: No operation is carried out.
Drop and create the table: The table is removed and
created again.
Create a table: The table doesn’t exist and gets
created. If the table exists, an error is generated and
the Job is stopped.
Create table if doesn’t exist: Creates the table if
needed and carries out the action on data anyway.
Clear a table: The table content is deleted.
In Java, use tCreateTable as a substitute for this
function.
Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are
found, the Job stops.
Note that in Teradata ELT, only the Insert operation
is available.
Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields to be processed and passed on to the
next component. The schema is either built-in or
remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that
if you make any modification, the schema switches
automatically to the Built-in mode.
Built-in: The schema is created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB
data handling.
Related scenarios
This chapter details the main components that you can find in File family of the Talend Open
Studio Palette.
The File family groups together components that read and write data in all types of files, from the
most popular to the most specific format (in the Input and Output subfamilies). In addition, the
Management subfamily groups together File-dedicated components that perform various tasks on
files, including unarchiving, deleting, copying, comparing files and so on.
File components
tAdvancedFileOutputXML
tAdvancedFileOutputXML
tAdvancedFileOutputXML belongs to two component families: File and XML. For more information
on tAdvancedFileOutputXML, see tAdvancedFileOutputXML on page 1220.
tApacheLogInput
tApacheLogInput properties
Function tApacheLogInput reads the access-log file for an Apache HTTP server.
Purpose tApacheLogInput helps you manage the Apache HTTP Server effectively by
providing feedback about the activity and performance of the server, as well as
any problems that may be occurring.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are
stored. The following fields are pre-filled in using fetched
data.
Schema Type and Edit A schema is a row description, i.e., it defines the number of
Schema fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
In the context of tApacheLogInput usage, the schema is
read-only.
Built-in: You can create the schema and store it locally for
this component. Related topic: Setting a built-in schema of
Talend Open Studio User Guide.
Repository: You have already created and stored the
schema in the Repository. You can reuse it in various
projects and job flowcharts. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
File Name Name of the file and/or the variable to be processed.
Limitation n/a
The following scenario creates a two-component Job, which aims at reading the access-log file for
an Apache HTTP server and displaying the output in the Run log console.
• Drop a tApacheLogInput component and a tLogRow component from the Palette onto the
design workspace.
• Right-click on the tApacheLogInput component and connect it to the tLogRow component
using a Main Row link.
The log lines of the defined file are displayed on the console.
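For reference, an access-log line in Apache's Common Log Format, which this kind of Job reads field by field, looks like the following; the exact layout of your file depends on the server's LogFormat configuration:

```
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
```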
tCreateTemporaryFile
tCreateTemporaryFile properties
The following scenario describes a simple Job that creates an empty temporary file in a defined
directory, writes data in it and deletes it after job execution.
• Drop the following components from the Palette onto the design workspace:
tCreateTemporaryFile, tRowGenerator, tFileOutputDelimited, tFileInputDelimited and
tLogRow.
• Connect tCreateTemporaryFile to tRowGenerator using a SubjobOk link.
• Connect tRowGenerator to tFileOutputDelimited using a Row Main link.
• Connect tRowGenerator to tFileInputDelimited using a SubjobOk link.
• Connect tFileInputDelimited to tLogRow using a Row Main link.
• Select the Remove file when execution is over check box to delete the created temporary
file when job execution is over.
• Click the three-dot button next to the Directory field to browse to the directory where
temporary files will be stored, or enter the path manually.
• In the Template field, enter a name for the temporary file respecting the template format.
• In the Suffix field, enter a filename extension to indicate the file format you want to give to
the temporary file.
• In the design workspace, select tRowGenerator and click the Component tab to define its
basic settings.
• In the Number of Rows for Rowgenerator field, enter 5 to generate five rows and click OK
to close the dialog box.
• In the design workspace, select tFileOutputDelimited and click the Component tab to
define its basic settings.
The temporary file is created in the defined directory during job execution and the five generated
rows are written in it. The temporary file is deleted when job execution is over.
tFileArchive
tFileArchive properties
Function tFileArchive zips one or several file(s) according to the parameters
defined and places the archive created in the directory selected.
Purpose This component zips one or several file(s) for processing.
Basic settings Directory Path where the zipped file will be created.
Subdirectories (in Perl, this field is labeled Include subdirectories): Select this check box if the selected directory contains subfolders.
Archive file Destination path and name of the archive file.
Compress level Select the compression level you want to apply.
Best: the compression quality will be optimum,
but the compression time will be long.
Normal: compression quality and time will be
average.
Fast: compression will be fast, but quality will be
lower.
All files Select this check box if you want all files in the
directory to be zipped. Deselect it to specify the
file(s) you want to zip in the Files table.
Filemask: type in a file name or a file mask using
a special character or a regular expression.
Create directory if not exists (Java only) This check box is selected by default. It creates the destination folder for the archive if it does not already exist.
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
Advanced settings tStatCatcher Select this check box to gather the processing
Statistics metadata at the Job level as well as at each
component level.
Usage This component must be used as a standalone component.
Limitation n/a
This scenario creates a Job with a unique component. It aims at zipping files and recording them in
the selected directory.
• Drop the tFileArchive component from the Palette onto the workspace.
• In the Directory field, click the [...] button, browse your directory and select the directory
or the file you want to compress.
• Select the Subdirectories check box if you want to include the subfolders and their files in
the archive.
• Then, set the Archive file field, by filling the destination path and the name of your archive
file.
• Select the Create directory if not exists check box if you do not have a destination directory
yet and you want to create it.
• In the Compress level list, select the compression level you want to apply to your archive.
In this example, we use the normal level.
• Clear the All Files check box if you only want to zip specific files.
• Add a row to the table by clicking the [+] button, then click the field that appears. Between
two star symbols (e.g. *RG*), type in part of the name of the file that you want to compress.
• Press F6 to execute your Job.
The tFileArchive component has compressed the selected file(s) and created the archive in the selected directory.
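What tFileArchive does can be approximated in plain Java with the java.util.zip package. The sketch below zips the files of a directory whose names match a *RG*-style filemask, reduced here to a simple substring test; the file and directory names are invented for the example:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipDemo {

    // Zips every file in dir whose name contains maskFragment, a reduced
    // form of the *RG* filemask used in the scenario.
    public static void zipFiles(File dir, String maskFragment, File archive) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(archive))) {
            for (File f : dir.listFiles()) {
                if (f.isFile() && f.getName().contains(maskFragment)) {
                    zos.putNextEntry(new ZipEntry(f.getName()));
                    Files.copy(f.toPath(), zos); // stream the file into the entry
                    zos.closeEntry();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("in").toFile();
        Files.write(new File(dir, "data_RG_1.txt").toPath(), "a;b\n".getBytes());
        Files.write(new File(dir, "other.txt").toPath(), "c;d\n".getBytes());
        File archive = new File(dir, "files.zip"); // the Archive file setting
        zipFiles(dir, "RG", archive);
        System.out.println("archive created: " + archive.exists());
    }
}
```

The compression level setting (Best/Normal/Fast) would correspond to ZipOutputStream's setLevel method, omitted here for brevity.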
tFileCompare
tFileCompare properties
Function Compares two files and provides comparison data (based on a read-only
schema)
Purpose Helps at controlling the data quality of files being processed.
Basic settings Schema Type and A schema is a row description, i.e., it defines the
Edit Schema number of fields to be processed and passed on to
the next component. The schema is either built-in
or remotely stored in the Repository but in this
case, the schema is read-only.
File to compare Filepath to the file to be checked.
Reference file Filepath to the file the comparison is based on.
If differences are detected, display
If no difference detected, display
Type in a message to be displayed in the Run console based on the result of the comparison.
Print to console Select this check box to display the message.
Usage This component can be used as a standalone component but it is usually
linked to an output component to gather the log data.
Limitation n/a
This scenario describes a Job unarchiving a file and comparing it to a reference file to make sure it
didn’t change. The output of the comparison is stored into a delimited file and a message displays
in the console.
• In the Extraction Directory field, fill in the destination folder for the unarchived file.
• In the tFileCompare Basic settings, set the File to compare. Press Ctrl+Space bar to
display the list of global variables. Select
$_globals{tFileUnarchive_1}{CURRENT_FILEPATH} or
"((String)globalMap.get("tFileUnarchive_1_CURRENT_FILEPATH"))" according to the
language you work with, to fetch the file path from the tFileUnarchive component.
• Then set the output component as usual with semi-colon as data separators.
• Save your Job and press F6 to run it.
The message set is displayed to the console and the output shows the schema information data.
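The core of the comparison can be pictured as a byte-for-byte test. This is a simplifying assumption made for the sketch; the component also reports details through its read-only schema:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CompareDemo {

    // Minimal form of the check tFileCompare performs: two files match
    // when their byte contents are equal.
    public static boolean sameContent(Path toCompare, Path reference) throws IOException {
        return Arrays.equals(Files.readAllBytes(toCompare), Files.readAllBytes(reference));
    }

    public static void main(String[] args) throws IOException {
        Path ref = Files.createTempFile("ref", ".txt");
        Path cmp = Files.createTempFile("cmp", ".txt");
        Files.write(ref, "1;2;3\n".getBytes());
        Files.write(cmp, "1;2;3\n".getBytes());
        // the two messages configured in the Basic settings
        System.out.println(sameContent(cmp, ref) ? "No difference detected" : "Differences detected");
    }
}
```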
tFileCopy
tFileCopy Properties
Function Copies a source file into a target directory and can remove the source file
if so defined.
Purpose Helps to streamline processes by automating recurrent and tedious tasks
such as copy.
Basic settings File Name Path to the file to be copied or moved
Destination Path to the directory where the file is
copied/moved to.
Remove source file Select this check box to move the file to the
destination.
Replace existing file Select this check box to overwrite any existing
file with the newly copied file.
Usage This component can be used as standalone component.
Limitation n/a
This scenario describes a Job that iterates on a list of files and copies each file from the defined source
directory to a target directory. It then removes the copied files from the source directory.
• Drop a tFileList and a tFileCopy from the Palette to the design workspace.
• Link both components using an Iterate link.
• In the tFileList Basic settings, set the directory for the iteration loop.
• Set the Filemask to “*.txt” to catch all files with this extension. For this use case, the
filemask is not case sensitive.
• In the File Name field, press Ctrl+Space bar to access the list of variables.
• Select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) if
you work in Java, or $_globals{tFileList_1}{CURRENT_FILEPATH} if you work in Perl.
This way, all files from the source directory can be processed.
• Select the Remove source file check box to get rid of the files that have been copied.
• Select the Replace existing file check box to overwrite any file possibly present in the
destination directory.
• Save your Job and press F6.
The files are copied onto the destination folder and are removed from the source folder.
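The Remove source file and Replace existing file options map naturally onto java.nio.file operations. The following sketch is illustrative only, with invented file names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyDemo {

    // removeSource plays the role of the "Remove source file" check box;
    // REPLACE_EXISTING plays the role of "Replace existing file".
    public static void copyFile(Path src, Path dest, boolean removeSource) throws IOException {
        if (removeSource) {
            Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING); // copy + remove = move
        } else {
            Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path srcDir = Files.createTempDirectory("src");
        Path destDir = Files.createTempDirectory("dest");
        Path file = Files.write(srcDir.resolve("report.txt"), "data\n".getBytes());
        copyFile(file, destDir.resolve("report.txt"), true);
        System.out.println("moved: " + !Files.exists(file));
    }
}
```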
tFileDelete
tFileDelete Properties
This very simple scenario describes a Job deleting files from a given directory.
• Drop the following components: tFileList, tFileDelete, tJava from the Palette to the design
workspace.
• In the tFileList Basic settings, set the directory to loop on in the Directory field.
• Press Ctrl+Space bar to access the list of global variables. In Java, the relevant variable to
collect the current file is: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
• Then in the tJava component, define the message to be displayed in the standard output (Run
console). In this Java use case, type in the Code field, the following script:
System.out.println( ((String)globalMap.get("tFileList_1_CURRENT_FILE"))
+ " has been deleted!" );
• Then save your Job and press F6 to run it.
The message set in the tJava component displays in the log, for each file that has been deleted
through the tFileDelete component.
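The tFileList loop combined with tFileDelete and the tJava message can be condensed into one plain Java method. This is an illustration with an invented directory, not the generated Job code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class DeleteDemo {

    // Loops over the files in dir the way tFileList iterates, deletes those
    // matching the "*.txt" filemask, and prints the tJava message.
    public static int deleteByMask(File dir, String extension) {
        int deleted = 0;
        for (File f : dir.listFiles()) {
            if (f.isFile() && f.getName().endsWith(extension) && f.delete()) {
                System.out.println(f.getName() + " has been deleted!");
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("del").toFile();
        new File(dir, "a.txt").createNewFile();
        new File(dir, "b.txt").createNewFile();
        new File(dir, "keep.csv").createNewFile();
        System.out.println(deleteByMask(dir, ".txt") + " file(s) deleted");
    }
}
```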
tFileExist
tFileExist Properties
This scenario describes a simple Job that: checks if a given file exists, displays a graphical message
to confirm that the file does not exist, reads the input data in another given file and writes it in an
output delimited file.
• Drop the following components from the Palette onto the design workspace: tFileExist,
tFileInputDelimited, tFileOutputDelimited, and tMsgBox.
• Connect tFileExist to tFileInputDelimited using an OnSubjobOk link and to tMsgBox using
a Run If link.
• In the File name field, enter the path to the file whose existence you want to check, or
browse to it.
• In the design workspace, select tFileInputDelimited and click the Component tab to define
its basic settings.
• Browse to the input file you want to read to fill out the File Name field.
• Set the row and field separators in their corresponding fields.
• Set the header, footer and number of processed rows as needed. In this scenario, there is one
header in our table.
• Set Schema to Built-In and click the Edit schema button to define the data to pass on to the
tFileOutputDelimited component. Define the data present in the file to read, file2 in this
scenario.
For more information about schema types, see Setting a built-in schema and Setting a Repository
schema of Talend Open Studio User Guide.
The schema in file2 consists of five columns: Num, Ref, Price, Quant, and tax.
• In the design workspace, select the tFileOutputDelimited component.
• Click the Component tab to define the basic settings of tFileOutputDelimited.
• Click the If link to display its properties in the Basic settings view.
• In the Condition panel, press Ctrl+Space to access the variable list and select the global
variable EXISTS. Type an exclamation mark before the variable to negate the meaning of the
variable.
A dialog box appears to confirm that the file does not exist.
Click OK to close the dialog box and continue the Job execution. The missing file, file1 in
this scenario, is written in a delimited file in the defined place.
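The negated condition on the Run If link can be written out in plain Java. This is illustrative only; EXISTS is the global variable set by tFileExist, represented here by a local boolean:

```java
import java.io.File;

public class ExistDemo {

    // The Run If condition from the scenario, written out: the leading !
    // negates EXISTS so the tMsgBox branch fires only when the file is missing.
    public static String check(File file) {
        boolean exists = file.exists(); // what tFileExist stores in EXISTS
        return !exists ? "File not found" : "File found";
    }

    public static void main(String[] args) {
        System.out.println(check(new File("no/such/file1.txt")));
    }
}
```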
tFileInputARFF
tFileInputARFF properties
Function tFileInputARFF reads an ARFF file row by row, with simple separated fields.
Purpose This component opens a file and reads it row by row, in order to divide it into fields and
to send these fields to the next component, as defined in the schema, through a Row
connection.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are
stored. The following fields are filled in automatically using
fetched data.
Click this icon to open a connection wizard and store the
file connection parameters you set in the component’s
Basic settings view.
For more information about setting up and storing file
connection parameters, see Setting up a File XML schema of
Talend Open Studio User Guide.
File Name Name and path of the ARFF file and/or variable to be
processed.
Related topic: Defining variables from the Component view
of Talend Open Studio User Guide.
Schema and Edit A schema is a row description, i.e., it defines the number of
Schema fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit Schema to make changes to the schema. Note
that if you make changes, the schema automatically
becomes built-in.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
Built-in: The schema will be created and stored locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the
Repository, hence can be reused in various projects and job
flowcharts. Related topic: Setting a Repository schema of
Talend Open Studio User Guide.
Advanced settings Encoding type Select the encoding type from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Usage Use this component to read a file and separate the fields with the specified separator.
This scenario describes a two-component Job in which the rows of an ARFF file are read, the
delimited data is selected and the output is displayed in the Run view.
An ARFF file looks like the following:
It is generally made of two parts: the first part describes the data structure, that is to say the rows
which begin with @attribute, and the second part comprises the raw data, which follows the
expression @data.
• Drop the tFileInputARFF component from the Palette onto the workspace.
• In the same way, drop the tLogRow component.
• Right-click the tFileInputARFF and select Row > Main in the menu. Then, drag the link
to the tLogRow, and click it. The link is created.
• Double-click the tFileInputARFF.
• In the Component view, in the File Name field, browse your directory in order to select your
.arff file.
• In the Schema field, select Built-In.
• Click the [...] button next to Edit schema to add column descriptions corresponding to the
file to be read.
• Click the [+] button as many times as required to create the number of columns
required, according to the source file. Name the columns as follows.
• For every column, the String check box is selected by default. Leave the check boxes
selected, for all of the columns.
• Click OK.
• In the workspace, double-click the tLogRow to display its Component view. Click the [...]
button of the Edit schema field to check that the schema has been propagated. If not, click
the Sync columns button.
The console displays the data contained in the ARFF file, delimited using a vertical line (the default
separator).
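A minimal ARFF reader following the two-part layout described above can be sketched in plain Java. This is a simplification: quoted values and the sparse ARFF format are ignored:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArffDemo {

    // @attribute lines name the columns; everything after @data is
    // comma-separated raw data; % lines are comments.
    public static List<String[]> parse(List<String> lines, List<String> columns) {
        List<String[]> rows = new ArrayList<>();
        boolean inData = false;
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("%")) continue;   // skip blanks and comments
            if (t.toLowerCase().startsWith("@attribute")) {
                columns.add(t.split("\\s+")[1]);              // second token: column name
            } else if (t.toLowerCase().startsWith("@data")) {
                inData = true;                                // raw data follows
            } else if (inData) {
                rows.add(t.split(","));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> cols = new ArrayList<>();
        List<String[]> rows = parse(Arrays.asList(
                "@relation clients",
                "@attribute name string",
                "@attribute city string",
                "@data",
                "alice,paris"), cols);
        System.out.println(cols + " -> " + rows.size() + " data row(s)");
    }
}
```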
tFileInputDelimited
tFileInputDelimited properties
Function tFileInputDelimited reads a given file row by row with simple separated fields.
Purpose Opens a file and reads it row by row, splitting each row into fields, then sends the fields
as defined in the schema to the next Job component, via a Row link.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are
stored. The following fields are pre-filled in using fetched
data.
File Name/Stream File name: Name and path of the file to be processed.
Stream: The data flow to be processed. The data must be
added to the flow in order to be fetched by the
tFileInputDelimited via the INPUT_STREAM variable in
the auto-completion list (Ctrl+Space)
Skip empty rows Select this check box to skip empty rows.
Uncompress as zip file Select this check box to uncompress the input file.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
Advanced settings Advanced separator Select this check box to modify the separators used for
(for numbers) numbers.
Extract lines at random Select this check box to set the number of lines to be
extracted randomly.
Encoding Type Select the encoding type from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
Trim all columns Select this check box to remove leading and trailing whitespace from all columns.
Check each row structure against schema Select this check box to synchronize every row against the input schema.
Check columns to trim Select the check box next to each column name you want to remove leading and trailing whitespace from.
Split row before field Select this check box to split rows before splitting fields.
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Usage Use this component to read a file and separate fields contained in this file using a
defined separator. It allows you to create a data flow using a Row > Main link, or via
a Row > Reject link, in which case the data that does not correspond to the defined
type is filtered out. For further information, see Scenario 2: Extracting correct and
erroneous data from an XML field in a delimited file on page 1237.
The following scenario creates a two-component Job, which aims at reading each row of a file,
selecting delimited data and displaying the output in the Run log console.
• Fill in a path to the file in the File Name field. This field is mandatory.
• Define the Row separator allowing to identify the end of a row. Then define the Field
separator used to delimit fields in a row.
• In this scenario, the header and footer limits are not set, and the Limit number of processed
rows is set to 50.
• Set the Schema as either a local (Built-in) or a remotely managed (Repository) to define the
data to pass on to the tLogRow component.
• You can load and/or edit the schema via the Edit Schema function.
Related topics: Setting a built-in schema and Setting a Repository schema of Talend Open Studio
User Guide.
• As selected, the empty rows will be ignored.
• Enter the encoding standard the input file is encoded in. This setting is meant to ensure
encoding consistency throughout all input and output files.
• Select the tLogRow and define the Field separator to use for the output display. Related
topic: tLogRow on page 1023.
• Select the Print schema column name in front of each value check box to retrieve the
column labels in the output displayed.
• Go to Run tab, and click on Run to execute the Job.
The file is read row by row and the extracted fields are displayed on the Run log as defined in both
components Basic settings.
The Log sums up all parameters in a header followed by the result of the Job.
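The row-splitting logic of this scenario can be sketched in plain Java, with the Header, Limit and field separator settings passed as parameters. This is an illustration, not the generated Job code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DelimitedDemo {

    // header and limit mirror the Header and Limit number of processed rows
    // settings; sep is the field separator (";" in this example).
    public static List<String[]> read(List<String> lines, String sep, int header, int limit) {
        List<String[]> out = new ArrayList<>();
        for (int i = header; i < lines.size() && out.size() < limit; i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) continue;   // "Skip empty rows"
            out.add(line.split(sep, -1));          // -1 keeps trailing empty fields
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = read(Arrays.asList("id;name", "1;alice", "", "2;bob"),
                ";", 1, 50);
        System.out.println(rows.size() + " row(s), first name: " + rows.get(0)[1]);
    }
}
```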
This scenario describes a four-component Job used to fetch data from a voluminous file almost as
soon as it has been read. The data is displayed in the Run view. The advantage of this technique
is that you do not have to wait for the entire file to be downloaded before viewing the data.
• From the Protocol list, select the appropriate protocol to access the server on which your
data is stored.
• In the URI field, enter the URI required to access the server on which your file is stored.
• Select the Use cache to save the resource check box to add your file data to the cache
memory. This option allows you to use the streaming mode to transfer the data.
• In the workspace, click on tSleep to display the Basic settings tab in the Component view
and set the properties.
• By default, tSleep’s Pause field is set to 1 second. Do not change this setting. It pauses the
second Job in order to give the first Job, containing tFileFetch, the time to read the file data.
• In the workspace, double-click tFileInputDelimited to display its Basic settings tab in the
Component view and set the properties.
• Click [+] to add the six columns and set them as indicated in the above screenshot. Click OK.
• In the workspace, double-click tLogRow to display its Basic settings in the Component
view and set its properties.
• Click Edit schema and ensure that the schema has been fetched from the preceding
component. If it hasn’t, click Sync Columns to fetch it.
• Click on the Job tab and then on the Extra view.
• Select the Multi thread execution check box in order to run the two Jobs at the same time.
Bear in mind that the second Job has a one-second delay according to the properties set in
tSleep. This option allows you to fetch the data almost as soon as it is read by tFileFetch,
thanks to the tFileInputDelimited component.
• Save the Job and press F6 to run it.
tFileInputEBCDIC
tFileInputEBCDIC properties
Function tFileInputEBCDIC reads an EBCDIC file and extracts data depending on the
selected schema.
Purpose tFileInputEBCDIC opens a file and reads it in order to separate the data, based on the
file structure description (schemas), and to send the file data and metadata to the next
job component(s), via a Row link.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where Properties are
stored. The fields that come after are pre-filled in using the
fetched data.
Schema(s) Add the various schemas to output to the next job
component(s).
Data file Select the EBCDIC file containing the data to be processed.
Xc2j file Select the xc2j file, transforming the EBCDIC schema(s)
into an intermediary XML file.
Usage Use this component to read an EBCDIC file and to output the data separately
depending on the schemas identified in the file.
This scenario uses the [Copybook Connection] wizard that guides users through the
different steps necessary to create a Copybook connection and to retrieve the EBCDIC
schemas. This wizard is available only for Talend Integration Suite users. If you are
using Talend Open Studio or Talend On Demand, you need to set the basic settings for the
tFileInputEBCDIC component manually.
The following scenario is a four-component Job that aims at: reading an EBCDIC file which
contains information concerning clients and their financial transactions, extracting and transforming
this data, and finally creating two tables in a database, based on the two schemas, clients and
transactions, extracted from the original EBCDIC file.
This Java scenario uses the EBCDIC Connection wizard to set up a connection to the Copybook file
and to generate an xc2j file, which allows the retrieval and transformation of the different file
schemas.
• Create a connection to the Copybook file, which describes the structure of your EBCDIC
file. In this scenario, the Copybook connection is called EBCDIC.
• Retrieve the file schemas.
Once the Copybook connection has been created and the schemas retrieved, using the EBCDIC and
Schema wizards, the new schemas appear under the node Metadata > Copybook. They are called
Schema01, Schema04 and Schema05.
In order to retrieve the different file structures and to use them in Talend Open Studio:
• Drop schema 01 from the Repository tree view to the design workspace. This automatically
creates the tFileInputEBCDIC input component.
• Drop the tMysqlOutput component from the Palette to the design workspace.
• Double-click tFileInputEBCDIC to display the Basic settings view, then define the
component properties:
The metadata is automatically defined in the Property Type, Schema(s), Data file and Xc2j file
fields. The Property Type field shows which metadata has been used for the component. The
Schema field shows which schema will be transmitted to the following component. The Data file
field shows the path to the file that holds the EBCDIC data. The Xc2j file field shows the path to
the file used to extract the schema describing the EBCDIC file structure. If you are in
Built-In mode, you have to fill in these fields manually.
• In the design workspace, right-click tFileInputEBCDIC, select Row > row_Schema01_1
from the menu, then click tMysqlOutput to connect the components together.
• Double-click tMysqlOutput to display the Basic settings view, then define the component
properties.
• In the Property Type list, select Repository and click the button [...]. Select the database
connection you want to use, which is centralized in the metadata of the Repository. The Host,
Port, Database, Username and Password fields are automatically filled. If you are in
Built-In mode, you have to fill these fields manually.
• In the Table field, enter the name of the table to be created, which will contain the data
extracted from the EBCDIC file.
• In the Action on table field, select the option Create table.
At this stage, the Job retrieves the schema Schema01 from the EBCDIC file and transfers it, as well
as the corresponding data, to the database. We now need to retrieve, from the EBCDIC file, the
schema 04 and its data, then transform and transmit the data to the same database. To do this:
• Drop the tMap and tMysqlOutputBulkExec components to the design workspace.
• Double-click the tFileInputEBCDIC to display the Basic settings view, then define the
component properties.
• Select all the columns from the row_Schema04_1 table and drag them towards the
ebcdic_04 table.
• In the table ebcdic_04, located in the Schema editor area at the bottom of the editor, click
the plus button to add a column to the schema. Name this column SUM_AG_NUMBER.
• In the table row_Schema04_1, to the left of the editor, press Ctrl and select the
CC01404_L_11_MENAG_1_1 and CC01404_AG_CAM_1_1 columns. Drag them to the
new column SUM_AG_NUMBER in table ebcdic_04. Add the sign + between the two
concatenated columns so that you have:
row_04_1.CC01404_L_11_MENAG_1_1 + row_04_1.CC01404_AG_CAM_1_1.
• Click OK to validate your changes and close the editor.
• In the design workspace, double-click tMysqlOutputBulkExec to display the Basic
settings view, then define the component properties:
• In the Property Type list, select Repository and click the three-dot button to display a
dialog box where you can select the database connection you want to use, which is
centralized in the Metadata folder of the Repository tree view. The Host, Port, Database,
Username and Password fields are automatically filled. If you are in Built-In mode, you
have to fill these fields manually.
• In the Table field, enter the name of the table to be created, which will contain the data
extracted from the EBCDIC file.
• In the Action on table field, select the option Create table.
• Press Ctrl+S to save your Job and click the Run view. Select the Statistics and Exec time
check boxes, then click Run to execute the Job.
The two tables are created in the database. They contain the structure, as well as the clients and
transaction data, from the original EBCDIC file.
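The character-decoding part of EBCDIC handling can be illustrated with a standard JDK charset. The sketch below assumes the Cp037 (US/Canada) code page; the real component relies on the xc2j file and also handles binary COMP and packed-decimal fields, which plain charset decoding does not cover:

```java
import java.nio.charset.Charset;

public class EbcdicDemo {

    // The JDK ships EBCDIC charsets such as "Cp037" (US/Canada code page).
    // Plain character zones of an EBCDIC record decode like this.
    public static String decode(byte[] ebcdicBytes) {
        return new String(ebcdicBytes, Charset.forName("Cp037"));
    }

    public static void main(String[] args) {
        // round-trip: encode a client field to EBCDIC bytes, then decode it
        byte[] raw = "CLIENT".getBytes(Charset.forName("Cp037"));
        System.out.println(decode(raw));
    }
}
```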
tFileInputExcel
tFileInputExcel properties
Function tFileInputExcel reads an Excel file (.xls or .xlsx) and extracts data line by line.
Purpose tFileInputExcel opens a file and reads it row by row to split data up into fields
using regular expressions, then sends the fields as defined in the schema to the
next component in the Job via a Row link.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where
Properties are stored. The fields that come after are
pre-filled in using the fetched data.
Click this icon to open a connection wizard and store
the Excel file connection parameters you set in the
component Basic settings view.
For more information about setting up and storing file
connection parameters, see Setting up a File XML
schema of Talend Open Studio User Guide.
File Name/Stream File name: Name of the file and/or the variable to be
processed.
Stream: Data flow to be processed. The data must be
added to the flow in order to be collected by
tFileInputExcel via the INPUT_STREAM variable in
the auto-completion list (Ctrl+Space).
Die on error Select this check box to stop the execution of the Job
when an error occurs. Clear the check box to skip the
row on error and complete the process for error-free
rows. If needed, you can collect the rows on error
using a Row > Reject link.
First column and Last Define the range of the columns to be processed
column through setting the first and last columns in the First
column and Last column fields respectively.
Schema Type and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Click Edit Schema to make changes to the schema.
Note that if you make changes, the schema
automatically becomes built-in.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
Built-in: The schema will be created and stored
locally for this component only. Related topic: Setting
a built-in schema of Talend Open Studio User
Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job flowcharts. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Advanced settings Advanced separator Select this check box to change the data separators used.
Encoding Type Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Read real values for numbers Select this check box to read numbers as real values.
Stop to read on empty rows Select this check box to stop reading the file when an empty row is reached.
Don’t validate the cells Select this check box if you do not want to validate the cells.
Ignore the warning Select this check box to ignore all warnings generated to indicate errors in the Excel file.
tStatCatcher Statistics Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage Use this component to read an Excel file and to output the data separately
depending on the schemas identified in the file. You can use a Row > Reject
link to filter the data which doesn’t correspond to the type defined. For an
example of how to use these two links, see Scenario 2: Extracting correct and
erroneous data from an XML field in a delimited file on page 1237.
Related scenarios
tFileInputFullRow
The following scenario creates a two-component Job that aims at reading complete rows in a file and
displaying the output in the Run log console.
• Drop a tFileInputFullRow and a tLogRow from the Palette onto the design workspace.
• Fill in a path to the file to process in the File Name field, or click the three-dot [...] button.
This field is mandatory. In this scenario, the file to read is test5. It holds three rows where
each row consists of two fields separated by a semicolon.
• Define the Row separator used to identify the end of a row.
• Set the Header to 1, in this scenario the footer and the number of processed rows are not set.
• In the design workspace, select tLogRow and click the Component tab to define its basic
settings. For more information, see tLogRow on page 1023.
tFileInputFullRow reads the three rows one by one ignoring field separators, and the complete
rows are displayed on the Run console.
To extract only fields from rows, you must use tExtractDelimitedFields,
tExtractPositionalFields, and tExtractRegexFields. For more information, see
tExtractDelimitedFields on page 1121, tExtractPositionalFields on page 1126 and
tExtractRegexFields on page 1128.
tFileInputJSON
tFileInputJSON properties
Function The tFileInputJSON component reads a JSON file and extracts data according to the
selected schema.
Purpose This component opens a file and reads it in order to isolate data according to
the schemas which describe this file structure, and to send the data and
schemas to the next component(s), via a Row connection.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The fields that follow are
pre-filled in using fetched data.
Schema Type and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Click Edit Schema to make changes to the schema.
Note that if you make changes, the schema
automatically becomes built-in.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job flowcharts. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Use URL Select this check box to retrieve data directly from
the Web.
URL: type in the URL path from which you will
retrieve data.
Filename Name of the file from which you will retrieve data.
Mapping Column: shows the schema as defined in the Schema
editor.
JSONPath Query: Type in the fields to extract from
the JSON input structure.
Advanced settings Advanced separator Select this check box to change the expected data
separator.
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage Use this component to read a JSON file and separate data according to the
identified schemas in this file.
Limitation n/a
This scenario is made of two components and aims at reading a JSON file and extracting its
data.
• Drag and drop a tFileInputJSON component from the File family and a tLogRow from the
Logs & Errors family of the Palette to the Job designer.
• Link the components together using a Row > Main link.
• Double-click the tFileInputJSON component to set its properties in the Basic settings, in
the Component view:
• If your schema is already stored under the Db Connections node in the Repository, select
Repository in the Schema Type field, and choose the metadata in the list.
• If you have not defined a schema yet, select the Built-in mode, type in manually the
connection details, and the data structure of a schema.
• Click the [...] button of the Edit schema field to open a dialog box in which you will define
the output schema to be displayed.
• Click OK to close the dialog box. In the Mapping table, the items in the Column field are
automatically filled in according to the schema you just defined. In this example, the schema
is made of four columns: FirstName, LastName, Address and City.
• In the Filename field, fill in the path to the JSON file from which you want to retrieve data.
If your data are stored on the internet, select the Use URL check box, and then, in the same
way, fill in the access URL to the file to be processed. In this example, the processed file is
presented as follows:
• In the Mapping table, the rows in the Column field are already filled in. For each of them,
type in the JSONPath query field the node of the JSON tree from which to retrieve the data.
• In the Job designer, double-click the tLogRow to set its properties in the Basic settings tab,
in the Component view.
• Click the Sync Columns button to retrieve the schema of the previous component.
• Save your Job and press F6 to execute it.
The Job returns the customer information according to the parameters selected in the schema.
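The sample file shown in the scenario is not reproduced here; a minimal JSON file matching the four-column schema (FirstName, LastName, Address, City) could look like the following. Note that the file content and field names are illustrative, not the original sample:

```json
{
  "customers": [
    { "firstname": "John", "lastname": "Smith",
      "address": "25 Timber Lane", "city": "Portland" }
  ]
}
```

With this structure, the Mapping table would pair the FirstName column with the JSONPath query "$.customers[*].firstname", and so on for the other three columns.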
tFileInputLDIF
tFileInputLDIF Properties
Related scenario
For a related scenario, see Scenario: Writing DB data into an LDIF-type file on page 896.
tFileInputMail
tFileInputMail properties
Function tFileInputMail reads the header and content parts of a defined email file.
Purpose This component helps to extract standard key data from emails.
Basic settings File name Browse to the source email file.
Schema Type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Click Sync columns to retrieve the schema from
the previous component connected in the Job.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job flowcharts. Related
topic: Setting a Repository schema of Talend
Open Studio User Guide.
Attachment export Enter the path to the directory where you want to
directory export email attachments.
Mail parts Column: This field is automatically populated
with the columns defined in the schema that you
propagated.
Mail part: Type in the label of the header part or
body to be displayed on the output.
Multi value: Select the check box next to the
name of the column that is made up of fields of
multiple values.
Field separator: Enter a value separator for the
field of multiple values.
Die on error Select this check box to stop the execution of the
Job when an error occurs. Clear the check box to
skip the row on error and complete the process for
error-free rows.
Advanced settings tStatCatcher Select this check box to gather the job processing
Statistics metadata at the Job level as well as at each
component level.
This Java scenario describes a two-component Job that extracts some standard key fields from an
email file and displays the values on the Run console.
• Drop a tFileInputMail and a tLogRow component from the Palette to the design
workspace.
• Connect the two components together using a Main Row link.
• Double-click tFileInputMail to display its Basic settings view and define the component
properties.
• Click the three-dot button next to the File Name field and browse to the mail file to be
processed.
• Set schema type to Built-in and click the three-dot button next to Edit schema to open a
dialog box where you can define the schema including all columns you want to retrieve on
your output.
• Click the plus button in the dialog box to add as many columns as you want to include in the
output flow. In this example, the schema has four columns: Date, Author, Object and Status.
• Once the schema is defined, click OK to close the dialog box and propagate the schema into
the Mail parts table.
• Click the three-dot button next to Attachment export directory and browse to the directory
in which you want to export email attachments, if any.
• In the Mail part column of the Mail parts table, type in the actual header or body standard
keys that will be used to retrieve the values to be displayed.
• Select the Multi Value check box next to any of the standard keys if more than one value
for the relative standard key is present in the input file.
• If needed, define a separator for the different values of the relative standard key in the
Separator field.
• Double-click tLogRow to display its Basic settings view and define the component
properties in order for the values to be separated by a carriage return. On Windows OS, type
in \n between double quotes.
• Save your Job and press F6 to execute it and display the output flow on the console.
The header key values are extracted as defined in the Mail parts table. Mail reception date, author,
subject and status are displayed on the console.
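Under the hood, header extraction amounts to matching "Key: value" lines in the message header block. The following standalone Java sketch illustrates the principle; it is not the generated component code, and it ignores folded (multi-line) headers:

```java
import java.util.HashMap;
import java.util.Map;

public class MailHeaderDemo {
    // Extracts "Key: value" header lines from a raw mail header block.
    static Map<String, String> parseHeaders(String raw) {
        Map<String, String> headers = new HashMap<>();
        for (String line : raw.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "Date: Mon, 5 Apr 2010 10:15:00 +0200\n"
                   + "From: jsmith@example.com\n"
                   + "Subject: Quarterly report";
        Map<String, String> h = parseHeaders(raw);
        System.out.println(h.get("Subject")); // prints "Quarterly report"
    }
}
```

The keys you type in the Mail part column ("Date", "From", "Subject", and so on) play the role of the lookup keys in this map.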
tFileInputMSDelimited
tFileInputMSDelimited properties
The [Multi Schema Editor] also helps to declare the schema that should act as the source schema
(primary key) from the incoming data to ensure its uniqueness. The editor uses this mapping to
associate all schemas processed in the delimited file to the source schema in the same file.
The editor opens with the first column, that usually holds the record type
indicator, selected by default. However, once the editor is open, you can select the
check box of any of the schema columns to define it as a primary key.
For detailed information about the usage of the Multi Schema Editor, see Scenario: Reading
a multi structure delimited file on page 852.
The following scenario creates a Java Job which aims at reading three schemas in a delimited file
and displaying their data structure on the Run Job console.
The delimited file processed in this example looks like the following:
• Drop a tFileInputMSDelimited component and three tLogRow components from the Palette
onto the design workspace.
• Double-click tFileInputMSDelimited to open the Multi Schema Editor.
• Click Browse... next to the File name field to locate the multi schema delimited file you need
to process.
• In the File Settings area:
-Select from the list the encoding type the source file is encoded in. This setting is meant to
ensure encoding consistency throughout all input and output files.
-Select the field and row separators used in the source file.
Select the Use Multiple Separator check box and define the fields that follow
accordingly if different field separators are used to separate schemas in the source file.
A preview of the source file data displays automatically in the Preview panel.
Column 0 that usually holds the record type indicator is selected by default. However,
you can select the check box of any of the other columns to define it as a primary key.
• Click Fetch Codes to the right of the Preview panel to list the schema types and records
you have in the source file. In this scenario, the source file has three schema types (A, B, C).
• Click each schema type in the Fetch Codes panel to display its data structure below the
Preview panel.
• Click in the name cells and set column names for each of the selected schemas.
In this scenario, column names read as the following:
-Schema A: Type, DiscName, Author, Date,
-Schema B: Type, SongName,
-Schema C: Type, LibraryName.
You now need to set the primary key from the incoming data to ensure its uniqueness (DiscName in this
scenario). To do that:
• In the Fetch Codes panel, select the schema holding the column you want to set as the
primary key (schema A in this scenario) to display its data structure.
• Click in the Key cell that corresponds to the DiscName column and select the check box that
displays.
• Click anywhere in the editor; the false in the Key cell then becomes true.
You now need to declare the parent schema by which you want to group the other “children”
schemas (DiscName in this scenario). To do that:
• In the Fetch Codes panel, select schema B and click the right arrow button to move it to the
right.
• Do the same with schema C.
The Cardinality field is not compulsory. It helps you to define the number (or range)
of fields in “children” schemas attached to the parent schema. However, if you set the
wrong number or range and try to execute the Job, an error message will display.
• In the [Multi Schema Editor], click OK to validate all the changes you made and close the
editor.
The three defined schemas along with the corresponding record types and field separators display
automatically in the Basic settings view of tFileInputMSDelimited.
The three schemas you defined in the [Multi Schema Editor] are automatically passed to the three
tLogRow components.
• If needed, click the Edit schema button in the Basic settings view of each of the tLogRow
components to view the input and output data structures you defined in the Multi Schema
Editor or to modify them.
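The record-type dispatch the component performs can be sketched in plain Java as follows. This is a simplified illustration, not the generated code: each row is routed according to its first field, the record type indicator, exactly as column 0 is used in the Multi Schema Editor. The sample rows are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class MultiSchemaDemo {
    // Keeps only the rows whose first field matches the given record type.
    static List<String> filterByType(String[] rows, String type) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split(";");
            if (fields.length > 0 && fields[0].equals(type)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] rows = {
            "A;Document;Author1;2010", // schema A: Type, DiscName, Author, Date
            "B;Song1",                 // schema B: Type, SongName
            "B;Song2",
            "C;Library1"               // schema C: Type, LibraryName
        };
        System.out.println(filterByType(rows, "B").size()); // prints 2
    }
}
```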
tFileInputMSPositional
tFileInputMSPositional properties
Advanced separator (for Select this check box to modify the separators used for
numbers) numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Encoding Type Select the encoding type from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the job processing metadata at
a job level as well as at each component level.
Usage Use this component to read a multi-schema positional file and separate fields using a
position separator value. You can also create a rejection flow using a Row > Reject link to
filter the data which doesn’t correspond to the type defined. For an example of how to use
these two links, see Scenario 2: Extracting correct and erroneous data from an XML field
in a delimited file on page 1237.
Related scenario
For a related use case, see the tFileInputMSDelimited Scenario: Reading a multi structure delimited file
on page 852.
tFileInputMSXML
tFileInputMSXML Properties
Encoding Type Select the encoding type from the list or select
CUSTOM and define it manually. This field is
compulsory for DB data handling.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Limitation n/a
The following scenario creates a Java Job which aims at reading a multi schema XML file and
displaying data structures on the Run Job console.
The XML file processed in this example looks like the following:
• Drop a tFileInputMSXML and two tLogRow components from the Palette onto the design
workspace.
• Double-click tFileInputMSXML to open the component Basic settings view.
• Enter a name for the output schema and click OK to close the dialog box.
The tFileInputMSXML schema editor displays.
• Define the schema you just created in the Outputs table.
• Do the same for all the output schemas you want to define.
• In the design workspace, right-click tFileInputMSXML and connect it to tLogRow1, and
tLogRow2 using the record and book links respectively.
• In the Basic settings view and in the Schema XPath loop cell, enter the node of the XML
tree, which the loop is based on.
• In the XPath Queries cell, enter the fields to be extracted from the structured XML input.
• Select the check boxes next to schemas’ names where you want to create empty rows.
• Save your Job and press F6 to execute it. The defined schemas are extracted from the multi
schema XML structured file and displayed on the console.
The multi schema XML file is read row by row and the extracted fields are displayed on the Run
Job console as defined.
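The Schema XPath loop / XPath Queries pair works like a standard XPath evaluation: the loop expression selects the repeating nodes, and each query is resolved relative to them. A minimal sketch with the JDK's built-in XPath API (illustrative, not the generated code; the XML content is hypothetical):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathLoopDemo {
    // Evaluates loop + "/" + field, the way a Schema XPath loop combines
    // with a relative XPath query, and returns the first matching value.
    static String firstField(String xml, String loop, String field) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(loop + "/" + field, doc, XPathConstants.NODESET);
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String xml = "<root><book><title>Guide</title></book></root>";
        System.out.println(firstField(xml, "/root/book", "title")); // prints "Guide"
    }
}
```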
tFileInputPositional
tFileInputPositional properties
Function tFileInputPositional reads a given file row by row and extracts fields based on a
pattern.
Purpose This component opens a file and reads it row by row to split each row up into fields, then
sends the fields, as defined in the schema, to the next Job component via a Row link.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties
are stored. The fields that follow are pre-filled in using
fetched data.
File Name/Stream File name: Path and name of the output file and/or
variable to be used
Stream: Data flow to be processed. The data must be
added to the flow so that it can be collected by
tFileInputPositional via the INPUT_STREAM variable
in the autocompletion list (Ctrl+Space).
The following scenario creates a two-component Job, which aims at reading data from an input file and
outputting selected data (according to the data position) into an XML file.
• Drop a tFileInputPositional component from the Palette to the design workspace. The file
contains raw data, in this case, contract nr, customer references and insurance numbers.
• Drop a tFileOutputXML component as well. This file is meant to receive the references in
a structured way.
• Right-click the tFileInputPositional component and select Row > Main. Then drag it onto
the tFileOutputXML component and release when the plug symbol shows up.
• Select the tFileInputPositional component again, and define its properties.
• The job properties are built-in for this scenario. As opposed to the Repository, this means that
the Property type is set for this station only.
• Fill in a path to the file in the File Name field. This field is mandatory.
• Define the Row separator identifying the end of a row, by default, a carriage return.
• Then define the Pattern to delimit fields in a row. The pattern is a series of length values
corresponding to the values of your input files. The values should be entered between quotes,
and separated by a comma. Make sure the values you enter match the schema defined.
• In this scenario, the header, footer and limit fields are not set. But depending on the input file
structure, you may need to define them.
• Next to Schema, select either Repository or Built-In to define the data to pass on to the
tFileOutputXML component.
• You can load and/or edit the schema via the Edit Schema function. For this schema, define
three columns, respectively Contracts, CustomerRef and InsuranceNr matching the three
value lengths defined.
• Enter a root tag (or more), to wrap the XML structure output, in this case ‘ContractsList’.
• Define the row tag that will wrap each line data, in this case ‘ContractRef’.
• Select the Column name as tag name check box to reuse the column label from the input
schema as tag label. By default, ‘field’ is used for each column value data.
• Enter the Encoding standard, the input file is encoded in. Note that, for the time being, the
encoding consistency verification is not supported.
• Select the Schema type. If the row connection is already implemented, the schema is
automatically synchronized with the input file schema. Otherwise, click Sync columns.
• Go to the Run tab, and click on Run to execute the Job.
The file is read row by row and split up into fields based on the length values defined in the Pattern
field. You can open it using any standard XML editor.
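Splitting a row on a series of length values, as the Pattern field does, can be sketched as follows. This is a simplified illustration; the lengths 8, 7 and 9 are hypothetical stand-ins for the contract, customer reference and insurance number field widths:

```java
public class PositionalDemo {
    // Splits a fixed-width row using a pattern of field lengths,
    // e.g. "8,7,9", and trims the padding of each field.
    static String[] splitRow(String row, int[] lengths) {
        String[] fields = new String[lengths.length];
        int pos = 0;
        for (int i = 0; i < lengths.length; i++) {
            int end = Math.min(pos + lengths[i], row.length());
            fields[i] = row.substring(pos, end).trim();
            pos = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] f = splitRow("00001   RF1234 IN7890   ", new int[]{8, 7, 9});
        System.out.println(f[1]); // prints "RF1234"
    }
}
```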
tFileInputProperties
tFileInputProperties properties
Function tFileInputProperties reads a text file row by row and extracts the fields.
Purpose tFileInputProperties opens a text file and reads it row by row then separates the fields
according to the model key = value.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the number of
Schema fields that will be processed and passed on to the next
component.
The schema is either built-in or remotely stored in the
Repository but for this component, the schema is read-only. It
is made of two columns, Key and Value, corresponding to the
parameter name and the parameter value to be copied.
File format Select from the list your file format, either: .properties or .ini.
.properties: data in the configuration file is written in
lines structured in the following way: key = value.
.ini: data in the configuration file is written in lines
structured in the following way: key = value, and
grouped in sections.
Section Name: enter the section name on which the iteration
is based.
File Name Name or path to the file to be processed. Related topic:
Defining variables from the Component view of Talend
Open Studio User Guide.
Usage Use this component to read a text file and separate data according to the structure key
= value.
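In Java Jobs, this key = value model is the same one implemented by java.util.Properties. A minimal sketch of the parsing principle (illustrative, not the component code; the sample keys are hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesDemo {
    // Loads a .properties-style string and returns the value for a key.
    static String lookup(String content, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            return null;
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String content = "COLUMN.TYPE = Type\n"
                       + "COLUMN.DBCOLUMN = Db Column\n";
        System.out.println(lookup(content, "COLUMN.TYPE")); // prints "Type"
    }
}
```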
Scenario: Reading and matching the keys and the values of different
.properties files and outputting the results in a glossary
This four-component Java Job reads two .properties files, one in French and the other in English.
The data in the two input files is mapped to output a glossary matching the English and French terms.
The two input files used in this scenario hold localization strings for the tMysqlInput component
in Talend Open Studio.
• Drop the following components from the Palette onto the design workspace:
tFileInputProperties (x2), tMap, and tLogRow.
• Connect the components together using Row > Main links. The second properties file, FR,
is used as a lookup flow.
• Double-click the first tFileInputProperties component to open its Basic settings view and
define its properties.
• Select all columns from the English_terms table and drop them to the output table.
• Select the key column from the English_terms table and drop it to the key column in the
French_terms table.
• In the glossary table in the lower right corner of the tMap editor, rename the value field as
EN because it will hold the values of the English file.
• Click the plus button to add a line to the glossary table and rename it as FR.
• In the Length field, set the maximum length to 255.
• In the upper left corner of the tMap editor, select the value column in the French_terms
table and drop it to the FR column in the glossary table.
• Click OK to validate your changes and close the editor.
• In the design workspace, double-click tLogRow to display its Basic settings and define the
component properties.
• Click Sync Columns to retrieve the schema from the preceding component.
• Save your Job and press F6 to execute it.
The glossary displays on the console listing three columns holding: the key name in the first column,
the English term in the second, and the corresponding French term in the third.
tFileInputRegex
tFileInputRegex properties
Function A powerful component which can replace a number of other components of the File
family. It requires some advanced knowledge of regular expression syntax.
Purpose Opens a file and reads it row by row to split rows up into fields using regular
expressions, then sends the fields, as defined in the schema, to the next Job
component via a Row link.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties
are stored. The following fields are pre-filled in using
fetched data.
File Name/Stream File name: Name of the file and/or the variable to be
processed
Stream: Data flow to be processed. The data must be
added to the flow so that it can be collected by the
tFileInputRegex via the INPUT_STREAM variable in
the autocompletion list (Ctrl+Space)
The following scenario creates a two-component Job, reading data from an Input file using regular
expression and outputting delimited data into an XML file.
• The Job is built-in for this scenario. Hence, the Properties are set for this station only.
• Fill in a path to the file in File Name field. This field is mandatory.
• Define the Row separator identifying the end of a row.
• Then define the Regular expression in order to delimit fields of a row, which are to be
passed on to the next component. You can type in a regular expression using Java or Perl
code, on multiple lines if needed.
Make sure to use the correct Regex syntax according to the generation language in use as the syntax is
different in Java/Perl, and include the regex in double/single quotes respectively.
• In this expression, make sure you include all subpatterns matching the fields to be extracted.
• In this scenario, ignore the header, footer and limit fields.
• Select a local (Built-in) Schema to define the data to pass on to the tFileOutputPositional
component.
• You can load or create the schema through the Edit Schema function.
• Then define the second component properties:
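The field-splitting principle is that each parenthesized subpattern (capture group) in the regular expression yields one output field. A minimal Java sketch; the row format and the regex are hypothetical examples, not the scenario's actual expression:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldDemo {
    // Each capture group in the regex becomes one output field.
    static String[] extract(String row, String regex) {
        Matcher m = Pattern.compile(regex).matcher(row);
        if (!m.find()) return new String[0];
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical row: a contract number followed by a customer reference.
        String[] f = extract("CT-00125 REF:A47", "^(\\S+)\\s+REF:(\\w+)$");
        System.out.println(f[0] + " / " + f[1]); // prints "CT-00125 / A47"
    }
}
```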
tFileInputXML
tFileInputXML belongs to two component families: File and XML. For more information on
tFileInputXML, see tFileInputXML on page 1241.
tFileList
tFileList properties
The following scenario creates a three-component Job, which aims at listing files from a defined
directory, reading each file by iteration, selecting delimited data and displaying the output in the
Run log console.
• Drop the following components from the Palette to the design workspace: tFileList,
tFileInputDelimited, and tLogRow.
• Right-click on the tFileList component, and pull an Iterate connection to the
tFileInputDelimited component. Then pull a Main row from the tFileInputDelimited to
the tLogRow component.
• Double-click tFileList to display its Basic settings view and define its properties.
• Browse to the Directory that holds the files you want to process. To display the path on the
Job itself, use the label (__DIRECTORY__) that shows up when you put the pointer
anywhere in the Directory field. Type in this label in the Label Format field, which you can
find by clicking the View tab in the Basic settings view.
• In the Basic settings view and from the FileList Type list, select the source type you want
to process, Files in this example.
• In the Case sensitive list, select a case mode, Yes in this example to create case sensitive
filter on file names.
• Keep the Use Glob Expressions as Filemask check box selected if you want to use global
expressions to filter files.
• In the Filemask field, define a file mask, use special characters if need be.
• Double-click tFileInputDelimited to display its Basic settings view and set its properties.
• Fill in the File Name field using a variable containing the current filename path, as defined
in the Basic settings of tFileList. Press Ctrl+Space to access the autocomplete list of
variables.
• Select the global variable
((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) if you
work in Java, or $_globals{tFileList_1}{CURRENT_FILEPATH} if you
work in Perl. This way, all files in the input directory can be processed.
• Fill in all other fields as detailed in the tFileInputDelimited section. Related topic:
tFileInputDelimited properties on page 824.
• Select the last component, tLogRow, to display its Basic settings view and fill in the
separator to be used to distinguish field content displayed on the console. Related topic:
tLogRow on page 1023.
The Job iterates on the defined directory, and reads all included files. Then delimited data is passed
on to the last component which displays it on the console.
For other scenarios using tFileList, see tFileCopy on page 812.
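In generated Java code, variables such as tFileList_1_CURRENT_FILEPATH live in globalMap, a Map<String, Object> shared by all components of the Job, which is why the retrieved value must be cast back to String. A minimal sketch of the mechanism (the sample path is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    // Stand-in for the globalMap shared across components in a generated Job.
    static Map<String, Object> globalMap = new HashMap<>();

    public static void main(String[] args) {
        // tFileList stores the file path of the current iteration:
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "/data/in/customers_01.csv");
        // Downstream components read it back with a cast:
        String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
        System.out.println(path); // prints "/data/in/customers_01.csv"
    }
}
```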
tFileOutputARFF
tFileOutputARFF properties
Related scenarios
For a tFileOutputARFF related scenario, see Scenario: Display the content of an ARFF file on page
821.
tFileOutputDelimited
tFileOutputDelimited properties
Sync columns Click to synchronize the output file schema with the
input file schema. The Sync function only displays
once the Row connection is linked with the output
component.
Advanced settings Advanced separator Select this check box to modify the separators used
(for numbers) for numbers:
Thousands separator: define separators for
thousands.
Decimal separator: define separators for decimals.
CSV options Select this check box to take into account all
parameters specific to CSV files, in particular
Escape char and Text enclosure parameters.
Create directory if not This check box is selected by default. It creates the
exists directory that holds the output delimited file, if it
does not already exist.
Split output in several In case of very big output files, select this check box
files to divide the output delimited file into several files.
Rows in each output file: set the number of lines in
each of the output files.
Custom the flush Select this check box to define the number of lines to
buffer size write before emptying the buffer.
Row Number: set the number of lines to write.
Output in row mode Writes in row mode.
Encoding Type Select the encoding from the list or select Custom
and define it manually. This field is compulsory for
DB data handling.
Don’t generate empty Select this check box if you do not want to generate
file empty files.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage Use this component to write a delimited file and separate fields using a field
separator value.
Limitation n/a
This Java scenario describes a three-component Job that extracts certain data from a file holding
information about customers, and then writes the extracted data in a delimited file.
In the following example, we have already stored the input schema under the Metadata node in the
Repository tree view. For more information about storing schema metadata in the Repository, see
Setting up a File Delimited schema.
• In the Repository tree view, expand Metadata and File delimited in succession and then
browse to your input schema, customers, and drop it on the design workspace. A dialog box
displays where you can select the component type you want to use.
• Click tFileInputDelimited and then OK to close the dialog box. A tFileInputDelimited
component holding the name of your input schema displays on the design workspace.
• Drop a tMap and a tFileOutputDelimited component from the Palette to the design
workspace.
• Link the components together using Main Row links.
• Double-click tFileInputDelimited to open its Basic settings view. All its property fields are
automatically filled in because you defined your input file locally.
• If you do not define your input file locally in the Repository tree view, fill in the details
manually after selecting Built-in in the Property type list.
• Click the three-dot button next to the File Name field and browse to the input file,
customer.csv in this example.
• In the Row Separators and Field Separators fields, enter respectively "\n" and ";" as line
and field separators.
• If needed, set the number of lines used as header and the number of lines used as footer in
the corresponding fields and then set a limit for the number of processed rows.
In this example, Header is set to 6 while Footer and Limit are not set.
• In the Schema field, schema is automatically set to Repository and your schema is already
defined since you have stored your input file locally for this example. Otherwise, select
Built-in and click Edit Schema to open a dialog box where you can define the input schema.
• In the tMap editor, click on top of the panel to the right to open the [Add a new output
table] dialog box.
• Enter a name for the table you want to create, row2 in this example.
• Click OK to validate your changes and close the dialog box.
• In the table to the left, row1, select the first three lines (Id, CustomerName and
CustomerAddress) and drop them to the table to the right.
• In the Schema editor view situated in the lower left corner of the tMap editor, change the
type of RegisterTime to String in the table to the right.
• In the Property Type field, set the type to Built-in and fill in the fields that follow manually.
• Click the three-dot button next to the File Name field and browse to the output file you want
to write data in, customerselection.txt in this example.
• In the Row and Field Separators fields, set “\n” and “;” respectively as row and field
separators.
• Select the Include Header check box if you want to output columns headers as well.
• Click Edit schema to open the schema dialog box and verify if the recuperated schema
corresponds to the input schema. If not, click Sync Columns to recuperate the schema from
the preceding component.
• Save your Job and press F6 to execute it.
The three specified columns Id, CustomerName and CustomerAddress are output in the defined
output file.
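Writing a delimited row comes down to joining the field values with the field separator and appending the row separator. A minimal Java sketch (illustrative, not the generated code; the sample values are hypothetical):

```java
public class DelimitedRowDemo {
    // Joins field values with the field separator and appends the row
    // separator, the way each output line is written.
    static String buildRow(String[] fields, String fieldSep, String rowSep) {
        return String.join(fieldSep, fields) + rowSep;
    }

    public static void main(String[] args) {
        String row = buildRow(
                new String[]{"1", "Griffith Paving", "25 Timber Lane"}, ";", "\n");
        // prints "1;Griffith Paving;25 Timber Lane" followed by a newline
        System.out.print(row);
    }
}
```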
tFileOutputEBCDIC
tFileOutputEBCDIC properties
Function tFileOutputEBCDIC writes an EBCDIC file based on various source data files,
each of them with a different schema.
Purpose This component writes an EBCDIC file with data extracted from files based on their
schemas.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where Properties are
stored. The fields that come after are pre-filled in using the
fetched data.
File Name Access path and name of the EBCDIC output file and/or
variable to be used.
This scenario uses the [Copybook Connection] wizard that guides users through the
different steps to create a Copybook connection and to retrieve the EBCDIC schemas.
This wizard is available only for Talend Integration Suite users. If you are using
Talend Open Studio or Talend On Demand, you need to set the basic settings for the
tFileInputEBCDIC component manually.
The following scenario is a three-component Job that aims at writing an EBCDIC-format file using
two delimited files with different schemas.
This Java scenario uses the EBCDIC Connection wizard to set up a connection to the Copybook file
and to generate an xc2j file, which allows the retrieval and transformation of the different file
schemas.
• Create a connection to the Copybook file, which describes the structure of your EBCDIC
file. In this scenario, the Copybook connection is called EBCDIC.
• Retrieve the file schemas.
Once the Copybook connection has been created and the schemas retrieved, using the EBCDIC and
Schema wizards, the new schemas appear under the node Metadata > Copybook. They are called
01, 04 and 05.
To create an EBCDIC file based on two delimited files in Talend Open Studio:
• Drop the following components from the Palette to the design workspace:
tFileInputDelimited (x2) and tFileOutputEBCDIC.
• To connect them together, right-click on each tFileInputDelimited component, select Row
> Main in the contextual menu and click on the tFileOutputEBCDIC component.
• Double-click on the first tFileInputDelimited component to display the Basic settings view
and set the component properties.
• In the File Name field, browse to the delimited file via the three-dot button [...].
• In the Schema field, select Repository, then click the three-dot button and, when prompted,
select the schema corresponding to your file, under the Copybook node.
• In the Header field, set the number of rows that are used as header, 1 in this example.
• Set the properties for the second tFileInputDelimited component the same way as for the
first component.
• Double-click the tFileOutputEBCDIC component to display the Basic settings view and
set the component properties:
• In the Data file field, enter or browse to the directory path and the EBCDIC file name that
is to be created based on both delimited files.
• In the Xc2j file field, enter or browse to the path to the file used to extract the schema
that describes the EBCDIC file structure.
• Save your Job via Ctrl+S and click on the Run view, select the Statistics and Exec time
check boxes then click Run to execute the Job.
tFileOutputExcel
tFileOutputExcel Properties
Advanced settings Create directory if not exists This check box is selected by default. This
option creates the directory that will hold the output files if it does not already exist.
Advanced separator (for numbers) Select this check box to modify the separators you want to
use for numbers:
Thousands separator: define the separator for thousands.
Decimal separator: define the separator for decimals.
Encoding Type Select the encoding type from the list or select
Custom and define it manually. This field is
compulsory for DB data handling.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage Use this component to write an Excel file with data passed on from other
components using a Row link.
Limitation n/a
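The Advanced separator option described above corresponds to the separator symbols of Java's number formatting API. A minimal sketch of how custom thousands and decimal separators behave, using the plain java.text classes (illustrative only, not Talend-generated code):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;

public class SeparatorDemo {
    public static void main(String[] args) {
        // Use a dot as thousands separator and a comma as decimal
        // separator, as is common in many European locales.
        DecimalFormatSymbols symbols = new DecimalFormatSymbols();
        symbols.setGroupingSeparator('.');
        symbols.setDecimalSeparator(',');
        DecimalFormat format = new DecimalFormat("#,##0.00", symbols);
        System.out.println(format.format(1234567.891)); // prints 1.234.567,89
    }
}
```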
Related scenario
tFileOutputLDIF
tFileOutputLDIF Properties
Function tFileOutputLDIF outputs data to an LDIF type of file which can then be
loaded into a LDAP directory.
Purpose tFileOutputLDIF writes or modifies an LDIF file with data separated into
respective entries based on the defined schema, or else deletes content from
an LDIF file.
Basic settings File name Name or path to the output file and/or the variable to
be used.
Limitation n/a
This scenario describes a two-component Job which extracts data from a database table
and writes this data into a new output LDIF file.
• Drop a tMysqlInput and a tFileOutputLDIF component from the Palette to the design
area. Connect them together using a Row > Main link.
• Select the tMysqlInput component, and go to the Component panel then select the Basic
settings tab.
• If you stored the DB connection details in a Metadata entry in the Repository, set the
Property type as well as the Schema type on Repository and select the relevant metadata
entry. All other fields are filled in automatically, and retrieve the metadata-stored
parameters.
• Alternatively select Built-in as the Property type and Schema type and fill in the DB
connection and schema fields manually.
• Then double-click on tFileOutputLDIF and define the Basic settings.
• Browse to the folder where you store the Output file. In this use case, a new LDIF file is to
be created. Thus type in the name of this new file.
• In the Wrap field, enter the number of characters to be held on one line. Any text beyond
that length gets wrapped onto the next line.
• Select Add as the Change Type, as the newly created file is by definition empty. For a
modification type of change, you'll need to define the nature of the modification you want
to make to the file.
• As the Schema type, select Built-in and use the Sync Columns button to retrieve the input
schema definition.
• Press F6 to run the Job.
The LDIF file created contains the data from the DB table and the type of change made to the file,
in this use case, addition.
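For reference, an LDIF file written with the Add change type holds one record per entry; a hypothetical entry (the DN and attribute values below are made up for illustration) looks like this:

```
dn: cn=John Doe,ou=people,dc=example,dc=com
changetype: add
cn: John Doe
mail: jdoe@example.com
```

With a modification type of change, the changetype line would read modify and be followed by the attribute operations to apply.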
tFileOutputMSDelimited
tFileOutputMSDelimited properties
Related scenarios
tFileOutputMSPositional
tFileOutputMSPositional properties
Related scenario
tFileOutputMSXML
tFileOutputMSXML Properties
To the left of the mapping interface, under Linker source, the drop-down list includes all the
input schemas that should be added to the multi-schema output XML file (on the condition that
more than one input flow is connected to the tFileOutputMSXML component).
Under Schema List are listed all the columns retrieved from the selected input data flow.
The right-hand side of the interface is where you build the XML structures you want to create
in the output XML file.
You can create the XML structures manually or import them. Then map the input schema
columns onto the elements of the XML tree, for each of the input schemas selected under
Linker source.
The XML Tree column is hence automatically filled out with the correct elements. You can
remove and insert elements or sub-elements from and to the tree:
• Select the relevant element of the tree.
• Right-click to display the contextual menu.
• Select Delete to remove the selection from the tree or select the relevant option
among: Add sub-element, Add attribute, Add namespace to enrich the tree.
A light blue link appears, illustrating this mapping. If available, use the Auto-Map button,
located at the bottom left of the interface, to carry out this operation automatically.
You can disconnect any mapping on any element of the XML tree:
• Select the element of the XML tree that should be disconnected from its respective
schema column.
• Right-click to the left of the element name to display the contextual menu.
Loop element
The loop element allows you to define the iterating object. Generally the Loop element is
also the row generator.
To define an element as loop element:
• Select the relevant element on the XML tree.
• Right-click to the left of the element name to display the contextual menu.
• Select Set as Loop Element.
Group element
The group element is optional; it represents a constant element on which the Group by
operation can be performed. A group element can be defined only if a loop
element has been defined before.
When using a group element, the rows should be sorted so that they can be grouped by the
selected node.
To define an element as group element:
The Node Status column shows the newly added status; any required group statuses are
defined automatically.
Click OK once the mapping is complete to validate the definition for this source and
perform the same operation for the other input flow sources.
Related scenario
tFileOutputPositional
tFileOutputPositional Properties
Function tFileOutputPositional writes a file row by row according to the length and the format of
the fields or columns in a row.
Purpose It writes a file row by row, according to the data structure (schema) coming from the input
flow.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are
stored. The fields to follow are pre-defined using fetched data.
File Name Name or path to the file to be processed and/or the variable to
be used.
Create directory if not exists This check box is selected by default. It creates a directory to
hold the output table if it does not exist.
Custom the flush buffer size Select this check box to define the number of lines to write
before emptying the buffer.
Row Number: set the number of lines to write.
Encoding Type Select the encoding type from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
Don’t generate empty file Select this check box if you do not want to generate empty
files.
tStatCatcher Statistics Select this check box to gather the job processing metadata at
a job level as well as at each component level.
Usage Use this component to write a file, separating the fields according to their defined positions.
Related scenario
For a related scenario, see: Scenario: From Positional to XML file on page 865.
tFileOutputProperties
tFileOutputProperties properties
Related scenarios
For a related scenario, see Scenario: Reading and matching the keys and the values of different
.properties files and outputting the results in a glossary on page 869 of the tFileInputProperties
component.
tFileOutputXML
tFileOutputXML belongs to two component families: File and XML. For more information on
tFileOutputXML, see tFileOutputXML on page 1248.
tFileProperties
tFileProperties Properties
Function tFileProperties creates a single row flow that displays the properties of the
processed file.
Purpose tFileProperties obtains information about the main properties of a defined
file.
Basic settings Schema Type A schema is a row description, i.e., it defines the
number of fields that will be processed and passed on
to the next component. The schema is either built-in
or stored remotely in the Repository.
Built-in: You create and store the schema locally for
this component only. Related topic: Setting a built-in
schema of Talend Open Studio User Guide.
Repository: You have already created the schema
and stored it in the Repository. You can reuse it in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Edit schema The number of the read-only lines is different
between Java and Perl.
File Name or path to the file to be processed. Related
topic: Defining variables from the Component view
of Talend Open Studio User Guide.
Calculate MD5 Hash Select this check box to check the MD5 of the
downloaded file.
Usage This component can be used as standalone component.
Limitation n/a
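The Calculate MD5 Hash option computes a standard MD5 digest of the file content. A minimal sketch of the equivalent computation with the plain Java API (illustrative only, not the Talend-generated code):

```java
import java.math.BigInteger;
import java.security.MessageDigest;

public class Md5Demo {
    public static String md5Hex(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        // Left-pad to 32 hex characters so leading zeros are kept.
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Hex("abc".getBytes("UTF-8")));
        // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```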
This Java scenario describes a very simple Job that displays the properties of the specified file.
• Drop a tFileProperties component and a tLogRow component from the Palette onto the
design workspace.
• Right-click on tFileProperties and connect it to tLogRow using a Main Row link.
tFileRowCount
tFileRowCount properties
Related scenario
tFileTouch
tFileTouch properties
Related scenario
tFileUnarchive
tFileUnarchive Properties
Function Decompresses the archive file provided as a parameter and puts it in the
extraction directory.
Purpose Unarchives a file of any format (zip, rar...) that is most likely to be
processed.
Basic settings Archive file File path to the archive
Extract Directory Folder where the unarchived file is put
Java only Use archive name as root directory / Extract file paths Select this check box to
reproduce the whole path to the file or, if none exists, to create a new folder.
Perl only Use Command line tools Select this check box to use an unarchiving tool other
than the one provided by default in the Perl package.
Usage This component can be used as a standalone component but it can also
be used within a Job as a Start component using an Iterate link.
Limitation n/a
Related scenario
tPivotToColumnsDelimited
tPivotToColumnsDelimited Properties
The following scenario describes a Job that aggregates data from a delimited input file, using a
defined pivot column.
• Drop the following components from the Palette to the design workspace:
tFileInputDelimited, tPivotToColumnsDelimited.
• The file to use as the input file is made up of three columns: ID, Question and the
corresponding Answer.
• Browse to the input file to fill out the File Name field.
• Define the Row and Field separators, in this example respectively: carriage return and
semi-colon.
• As the file contains a header line, define it as well.
• Set the schema describing the three columns: ID, Questions, Answers.
• Then select the tPivotToColumnsDelimited and set its properties on the Basic Settings tab
of the Component view.
• In the Pivot column field, select the pivot column from the input schema. This is often the
column with the most duplicates (pivot aggregation values).
• In the Aggregation column field, select the column from the input schema that should get
aggregated.
• In the Aggregation function field, select the function to be used in case duplicates are
found.
• In the Group by table, add an input column that will be used to group by the aggregation
column.
• In the File Name field, browse to the output file path. In the Row and Field separator
fields, set the separators for the aggregated output rows and data.
Then, press F6 to execute the Job. The output file shows the newly aggregated data.
This chapter details the main components which belong to the Internet family in the Talend Open
Studio Palette.
The Internet family comprises all of the components which help you to access information via the
Internet, through various means including Web services, RSS flows, SCP, MOM, Emails, FTP etc.
Internet components
tFileFetch
tFileFetch
tFileFetch properties
http, https protocol Timeout Enter the number of seconds after which the connection should
close.
http, https Print response to Select this check box to print the server response
console in the console.
http, https Upload file Select this check box to upload one or more files to
the server. In the Name field, enter the name of
the file you want to upload and in the File field,
indicate the path.
http, https, ftp Enable proxy server Select this check box if you are connecting via a
proxy and complete the fields which follow with
the relevant information.
http, https Enable NTLM Select this check box if you are using an NTLM
Credentials authentication protocol.
Domain: The client domain name.
Host: The client’s IP address.
http, https Need authentication Select this check box and enter the username and
password in the relevant fields, if they are
required to access the protocol.
Usage This component is generally used as a start component to feed the input
flow of a Job and is often connected to the Job using an OnSubJobOk
or OnComponentOk link, depending on the context.
Limitation n/a
This scenario describes a three-component Job which retrieves data from an HTTP website and
selects data that will be stored in a delimited file.
• In the Filename field, type in a new name for the file if you want it to be changed. In this
example, filefetch.txt.
• Select the tFileInputRegex, set the File name so that it corresponds to the file fetched
earlier.
• Using a regular expression, in the Regex field, select the relevant data from the fetched file.
In this example: <td(?: class="leftalign")?> \s* (t\w+) \s* </td>
Ensure that you use the correct Regex syntax, depending on the generation language in use as the syntax
is different in Java and Perl. Enter the Regex between double or single quotes accordingly.
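As a sketch of how the Java version of such a Regex behaves, the pattern from this scenario can be tested against a sample table cell (the HTML fragment below is made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        // The same pattern as in the scenario; note the literal spaces
        // around the \s* tokens, which must also match in the input.
        Pattern p = Pattern.compile(
            "<td(?: class=\"leftalign\")?> \\s* (t\\w+) \\s* </td>");
        Matcher m = p.matcher("<td class=\"leftalign\"> tFileFetch </td>");
        if (m.find()) {
            System.out.println(m.group(1)); // prints tFileFetch
        }
    }
}
```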
• Define the header, footer and limit if need be. In this case, we’ll ignore these fields.
• Define also the schema describing the flow to be passed on to the final output.
• The schema should be automatically propagated to the final output, but to be sure, check the
schema in the Basic settings panel of the tFileOutputDelimited component.
• Then press F6 to run the Job.
For an example of transferring data in streaming mode, see Related scenario 2: Reading the data
from a remote file in streaming mode on page 922
tFileInputJSON
tFileInputJSON belongs to two different component families: Internet and File. For further
information, see tFileInputJSON on page 923.
tFTPConnection
tFTPConnection properties
Related scenarios
For a related scenario, see Scenario: Putting files on a remote FTP server on page 937.
For a related scenario, see Scenario: Iterating on a remote directory on page 929.
For a related scenario using a different protocol, see Scenario: Getting files from a remote SCP
server on page 970.
tFTPDelete
tFTPDelete properties
Related scenario
For tFTPDelete related scenario, see Scenario: Putting files on a remote FTP server on page 937.
For tFTPDelete related scenario using a different protocol, see Scenario: Getting files from a
remote SCP server on page 970.
tFTPFileExist
tFTPFileExist properties
Related scenario
For tFTPFileExist related scenario, see Scenario: Putting files on a remote FTP server on page 937.
For tFTPFileExist related scenario using a different protocol, see Scenario: Getting files from a
remote SCP server on page 970.
tFTPFileList
tFTPFileList properties
Files Click the plus button to add the lines you want to
use as filters:
Filemask: enter the filename or filemask using
wildcard characters (*) or regular expressions.
Connect Mode Select the SFTP connection mode you want to
use:
Active: You determine the connection port to be
used to allow data transfer.
Passive: the FTP server determines the
connection port to use to allow data transfer.
Usage This component is typically used as a single-component sub-job but can
also be used with other components.
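Filemasks of this kind behave like glob patterns. A minimal sketch of how a mask such as *.csv filters file names, using a simple wildcard-to-regex conversion (illustrative only, not the component's internal code):

```java
import java.util.regex.Pattern;

public class FilemaskDemo {
    // Convert a simple wildcard mask (* and ?) into a regular expression
    // by quoting the mask and re-opening the quote around each wildcard.
    static boolean matches(String mask, String fileName) {
        String regex = Pattern.quote(mask)
                .replace("*", "\\E.*\\Q")
                .replace("?", "\\E.\\Q");
        return fileName.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("*.csv", "orders.csv"));  // true
        System.out.println(matches("*.csv", "orders.txt"));  // false
    }
}
```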
The following Java scenario describes a three-component Job that connects to an FTP server, lists
files held in a remote directory based on a filemask, and finally retrieves and saves the files in a
defined local directory.
• Drop the following components from the Palette to the design workspace: tFTPConnection,
tFTPFileList and tFTPGet.
• In the Connect Mode list, select the FTP connection mode you want to use, Passive in this
example.
• Double-click tFTPFileList to open its Basic settings view and define the component
properties.
• Select the Use an existing connection check box and in the Component list, click the
relevant FTP connection component, tFTPConnection_1 in this scenario. The connection
information is filled in automatically.
• In the Remote directory field, enter the relative path of the directory that holds the files to
be listed.
• In the Filemask field, click the plus button to add one line and then define a file mask to filter
the data to be retrieved. You can use special characters if need be. In this example, we only
want to retrieve delimited files (*.csv).
• In the Connect Mode list, select the FTP server connection mode you want to use, Active
in this example.
• Double-click tFTPGet to display its Basic settings view and define the component
properties.
• Select the Use an existing connection check box and in the Component list, click the
relevant FTP connection component, tFTPConnection_1 in this scenario. The connection
information is filled in automatically.
• In the Local directory field, enter the relative path for the local output directory where you
want to write the retrieved files.
• In the Remote directory field, enter the relative path of the remote directory that holds the
files to be retrieved.
• In the Transfer Mode list, select the FTP transfer mode you want to use, ascii in this
example.
• In the Overwrite file field, select the option you want to use for the transferred files.
• In the Files area, click the plus button to add a line in the Filemask list, then click in the
added line and press Ctrl+Space to access the variable list. In the list, select the global
variable
((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) to
process all files in the remote directory.
• In the Connect Mode list, select the connection mode to the FTP server you want to use.
• Save your Job and press F6 to execute it.
All .csv files held in the remote directory on the FTP server and matching the filemask are listed.
The files are then retrieved and saved in the defined local output directory.
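The globalMap expression used in this scenario reads a value that tFTPFileList publishes at each iteration. A minimal sketch of that lookup mechanism, with a plain Map standing in for Talend's runtime globalMap (the key name comes from the scenario; the file path is made up):

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    public static void main(String[] args) {
        // Talend Jobs share one Map<String, Object> named globalMap;
        // each component publishes its runtime values into it.
        Map<String, Object> globalMap = new HashMap<String, Object>();
        globalMap.put("tFTPFileList_1_CURRENT_FILEPATH", "/data/in/orders.csv");

        // The filemask expression from the scenario is just a cast plus a get:
        String path = (String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH");
        System.out.println(path); // prints /data/in/orders.csv
    }
}
```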
tFTPFileProperties
tFTPFileProperties Properties
SFTP Support and Authentication method Select this check box and then, in the
Authentication method list, select the SFTP authentication method:
Password: Type in the password required in the
relevant field.
Public key: Type in the private key or click the three
dot button next to the Private key field to browse to
it.
If you select Public Key as the SFTP authentication method, make sure that the key is
added to the agent or that no passphrase (secret phrase) is required.
Related scenario
For a related scenario, see Scenario: Displaying the properties of a processed file on page 911.
tFTPGet
tFTPGet properties
Related scenario
For a tFTPGet related scenario, see Scenario: Putting files on a remote FTP server on page 937.
For a tFTPGet related scenario, see Scenario: Iterating on a remote directory on page 929.
For a tFTPGet related scenario using a different protocol, see Scenario: Getting files from a
remote SCP server on page 970.
tFTPPut
tFTPPut properties
This two-component Job allows you to open a connection to a remote FTP server in order to put
specific files on the remote server in one transaction.
• Drop tFTPConnection and tFTPPut from the Palette onto the design workspace.
tFTPConnection allows you to perform all operations in one transaction.
• Connect the two components together using an OnSubJobOK link.
• Double-click tFTPConnection to display its Basic settings view and define its properties.
• In the design workspace, double-click tFTPPut to display its Basic settings view and define
its properties.
• Select the Use an existing connection check box and then select tFTPConnection_1 from
the Component List. The connection information is automatically filled in.
• In the Local directory field, enter the path to the local directory containing the files, if all
your files are in the same directory. If the files are in different directories, enter the path for
each file in the Filemask column of the Files table.
• In the Remote directory field, enter the path to the destination directory on the remote
server.
• From the Transfer mode list, select the transfer mode to be used.
• From the Overwrite file list, select an option for the transferred file.
• In the Files table, click twice the plus button to add two lines to the Filemask column and
then fill in the filemasks of all files to be copied onto the remote directory.
• Save your Job and press F6 to execute it.
The files specified in the Filemask column are copied to the remote server.
tFTPRename
tFTPRename Properties
Related scenario
For a related scenario, see Scenario: Putting files on a remote FTP server on page 937.
tFTPTruncate
tFTPTruncate properties
Related scenario
For a related scenario, see Scenario: Putting files on a remote FTP server on page 937.
tJMSInput
tJMSInput properties
Related scenario
For a related scenario, see Scenario: asynchronous communication via a MOM server on page 947.
tJMSOutput
tJMSOutput properties
Related scenario
For a related scenario, see Scenario: asynchronous communication via a MOM server on page 947.
tMomInput
tMomInput Properties
This scenario is made of two Jobs. The first Job posts messages on a JBoss server queue and the
second Job fetches the message from the server.
In the first Job, a string message is created using a tRowGenerator and put on a JBoss server using
a tMomOutput. An intermediary tLogRow component displays the flow being passed.
• Drop the three components required for the first Job from the Palette onto the design
workspace and right-click to connect them using a Main row link.
• Double-click on tRowGenerator to set the schema to be randomly generated.
• Set just one column called message. This is the message to be put on the MOM queue.
• This column is of String type and is nullable. To produce the data, use a preset function
which concatenates randomly chosen ascii characters to form a 6-char string. This function
is getAsciiRandomString. (Java version). Click the Preview button to view a random
sample of data generated.
• Set the Number of rows to be generated to 10.
• Click OK to validate.
• The tLogRow is only used to display an intermediary state of the data to be handled. In this
example, it doesn’t require any specific configuration.
• Then select the tMomOutput component.
• Select the Message type from the list. The message can be of Queue or Topic type. In this
example, select the Queue type from the list.
• In the To field, type in the message source information strictly respecting the syntax
expected by the server. This should match the Message Type you selected, such as:
queue/A.
The message name is case-sensitive, therefore queue/A and Queue/A are different.
• Then click Sync Columns to pass on the schema from the preceding component. The
schema being read-only, it cannot be changed. The data posted onto the MQ comes from the
first schema column encountered.
• Press F6 to execute the Job and view the data flow being passed on in the console, thanks to
the tLogRow component.
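The preset getAsciiRandomString function mentioned above can be pictured as follows. This is an illustrative re-implementation, not Talend's actual routine; the alphabet and method name are assumptions:

```java
import java.util.Random;

public class RandomStringDemo {
    // Build a random string of printable ASCII letters and digits.
    static String getAsciiRandomString(int length) {
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        Random random = new Random();
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(random.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The scenario generates 6-character messages.
        System.out.println(getAsciiRandomString(6));
    }
}
```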
Then set the second Job in order to fetch the queuing messages from the MOM server.
• Drop the tMomInput component and a tLogRow from the Palette to the design workspace.
• Select the tMomInput to set the parameters.
• Select the MQ server from the list. In this example, a JBoss messaging server is used.
• Set the server Host and Port information.
• Set the Message From and the Message Type to match the source and type expected by the
messaging server.
• The Schema is read-only and is made of two columns: From and Message.
• Select the Keep listening check box and set the verification frequency to 5 seconds.
When using the Keep Listening option, you’ll need to kill the Job to end it.
tMomMessageIdList
tMomMessageIdList Properties
Related scenario
tMomOutput
tMomOutput Properties
Related scenario
tPOP
tPOP properties
Function The tPOP component fetches one or more email messages from a server
using the POP3 or IMAP protocol.
Purpose The tPOP component uses the POP or IMAP protocol to connect to a specific
email server. It then fetches one or more email messages and writes the
recovered information to specified files. Parameters in the Advanced settings
view allow you to apply filters to your selection.
Basic settings Host IP address of the email server you want to connect to.
Java-only field Port Port number of the email server.
Java-only field Use SSL Select this check box if your email server uses this
protocol for authentication and communication
confidentiality.
This option is mandatory for Gmail users.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Java-only field Filter Click the plus button to add as many lines as needed
to filter email messages and retrieve only a specific
selection:
Filter item: select one of the following filter types
from the list:
From: email messages are filtered according to the
sender email address.
To: email messages are filtered according to the
recipient email address.
Subject: email messages are filtered according to the
message subject matter.
Before date: email messages are filtered by the
sending or receiving date. All messages before the set
date are retrieved.
After date: email messages are filtered by the
sending or receiving date. All messages after the set
date are retrieved.
Pattern: press Ctrl+Space to display the list of
available values. Select the value to use for each
filter.
Java-only field Filter condition relation Select the type of logical relation you want to use to
combine the specified filters:
and: the conditions set by the filters are combined;
the search is more restrictive.
or: the conditions set by the filters are independent;
the search is broader.
Usage This component does not handle data flow; it can be used standalone.
Limitation n/a
This Java scenario is a one-component Job that retrieves a predefined number of email messages
from an email server.
• Drop the tPOP component from the Palette to the design workspace.
• Double click tPOP to display the Basic settings view and define the component properties.
• Enter the email server IP address and port number in the corresponding fields.
• Enter the username and password for your email account in the corresponding fields. In this
example, the email server is called Free.
• In the Output directory field, enter the path to the output directory manually, or click the
three-dot button next to the field and browse to the output directory where the email
messages retrieved from the email server are to be stored.
• In the Filename pattern field, define the syntax you want to use to name the output files that
will hold the messages retrieved from the email server, or press Ctrl+Space to display a list
of predefined patterns. The syntax used in this example is the following:
TalendDate.getDate("yyyyMMdd-hhmmss") + "_" + (counter_tPOP_1
+ 1) + ".txt"
The output files will be stored as .txt files and are defined by date, time and arrival
chronological order.
• Clear the Retrieve all emails? check box and, in the Number of emails to retrieve field,
enter the number of email messages you want to retrieve, 10 in this example.
• Select the Delete emails from server check box to delete the email messages from the email
server once they are retrieved and stored locally.
• In the Choose the protocol field, select the protocol type you want to use. This depends on
the protocol used by the email server. Certain email suppliers, like Gmail, use both protocols.
In this example, the protocol used is pop3.
• Save your Job and press F6 to execute it.
The tPOP component retrieves the 10 most recent messages from the specified email server.
In the tPOP directory stored locally, a .txt file is created for each retrieved message. Each file holds
the metadata of the email message headings (sender’s address, recipient’s address, subject matter)
in addition to the message content.
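The filename pattern in this scenario combines a timestamp with a per-message counter. A sketch of the equivalent expression in plain Java, with SimpleDateFormat standing in for Talend's TalendDate.getDate routine (counter_tPOP_1 is the component's loop counter, assumed here to start at 0):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FilenamePatternDemo {
    public static void main(String[] args) {
        int counter_tPOP_1 = 0; // first retrieved message
        // "yyyyMMdd-hhmmss" yields date then 12-hour time, e.g. 20100315-093212
        String name = new SimpleDateFormat("yyyyMMdd-hhmmss").format(new Date())
                + "_" + (counter_tPOP_1 + 1) + ".txt";
        System.out.println(name);
    }
}
```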
tRSSInput
tRSSInput Properties
This two-component Java scenario aims at retrieving frequently updated blog entries from a Talend
local news RSS feed using the tRSSInput component.
• Drop the following components from the Palette onto the design workspace: tRSSInput
and tLogRow.
• Right-click to connect them using a Row Main link.
• Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to
change the type and length of the schema parameters if necessary.
• Click OK to close the dialog box.
• In the Basic settings view of tRSSInput, enter the URL for the RSS_Feed to access. In this
scenario, tRSSInput links to the Talend RSS_Feed: http://feeds.feedburner.com/Talend.
• Select/clear the other check boxes as required. In this scenario, we want to display the
information about two articles dated from July 20, 2008.
• In the design workspace, select tLogRow and click the Component tab to define its basic
settings. For more information about tLogRow properties, see tLogRow properties on page
1023.
• Save the Job and press F6 to execute it.
The tRSSInput component accessed the RSS feed of the Talend website on your behalf and
organized the information for you.
Two blog entries are displayed on the console. Each entry has its own title, description, publication
date, and the corresponding RSS feed URL address. Blogs show the last entry first, and you can
scroll down to read earlier entries.
tRSSOutput
tRSSOutput Properties
• Set the Property type to Repository and click the three-dot button [...] to select the relevant
DB entry from the list. The connection details along with the schema are filled in
automatically.
• In the Table Name field, either type your table name or click the three dots button [...] and
select your table name from the list. In this scenario, the Mysql input table is called
“rss_talend” and the schema is made up of four columns, TITLE, Description, PUBDATE,
and LINK.
• In the Query field, enter your DB query paying particular attention to properly sequence the
fields in order to match the schema definition, or click Guess Query.
• In the design workspace, select tRSSOutput.
• Click the Component tab to define the basic settings for tRSSOutput.
• In the File name field, use the default file name and path, or browse to set your own for
the output XML file.
• Select the encoding type on the Encoding Type list.
• In the Channel panel, enter a title, a description, a publication date, and a link to define your
input data as a whole.
• Select your schema type on the Schema Type list and click Edit Schema to modify the
schema if necessary.
You can click Sync Column to retrieve the generated schema from the preceding
component.
The tRSSOutput component created an output RSS flow in an XML format for the defined files.
To store the complete files on an FTP server:
• In the design workspace, select tFTPPut.
• Click the Component tab to define the basic settings for tFTPPut.
• Enter the host name and the port number in their corresponding fields.
• Enter your connection details in the corresponding Username and Password fields.
• Browse to the local directory, or enter it manually in the Local directory field.
• Enter the details of the remote server directory.
• Select the transfer mode from the Transfer mode list.
• On the Files panel, click on the plus button to add new lines and fill in the filemasks of all
files to be copied onto the remote directory. In this scenario, the files to be saved on the FTP
server are all text files.
• Save your Job and press F6 to execute it.
Files defined in the Filemask are copied to the remote server.
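A filemask such as the text-file mask used above selects files by pattern. How such a mask maps onto file names can be sketched in plain Java by converting the glob to a regular expression (an illustrative sketch only; the globToRegex and select helpers are hypothetical and are not Talend's own matching code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilemaskDemo {
    // Convert a simple glob filemask ("*" and "?") to a regular expression.
    static String globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*': sb.append(".*"); break;
                case '?': sb.append('.'); break;
                default:  sb.append(java.util.regex.Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    // Return only the file names that match the filemask.
    static List<String> select(List<String> names, String mask) {
        String regex = globToRegex(mask);
        return names.stream().filter(n -> n.matches(regex)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("report.txt", "data.csv", "notes.txt");
        System.out.println(select(files, "*.txt")); // only the two .txt files match
    }
}
```

With the mask "*.txt", as in this scenario, only the text files are kept for transfer.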
This Java scenario describes a two-component Job that creates an RSS flow that holds metadata and
then writes the obtained information to an XML file, the output RSS flow.
• Drop tRSSInput and tRSSOutput from the Palette to the design workspace.
• Connect the two components together using a Row Main link.
• Click tRSSInput to open its Basic settings view and define the component properties.
• If needed, click the three-dot button next to Edit Schema to open a dialog box where you
can check the schema.
The read-only schema of tRSSInput is composed of four columns: TITLE, DESCRIPTION, PUBDATE,
and LINK.
• In the File name field, use the default file name and path, or browse to set your own for
the output XML file.
• Select the encoding type on the Encoding Type list.
• In the Channel panel, enter a title, a description, a publication date and a link to define your
input data as a whole.
• In the Optional Channel Element, define the RSS flow metadata. In this example, the flow
has two metadata elements: copyright, whose value is tos, and language, whose value is en_us.
• Select your schema type on the Schema Type list and click Edit Schema to modify the
schema if necessary.
You can click Sync Column to retrieve the generated schema from the preceding
component.
The defined files are copied to the output XML file and the metadata display under the <channel>
node, above the information about the RSS flow.
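The shape of the <channel> node written here, with the optional metadata elements placed under it, can be approximated with the standard Java DOM API (a minimal sketch; the channel title is a made-up placeholder and the component's real output contains more elements):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RssChannelDemo {
    static String buildChannel() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element rss = doc.createElement("rss");
            rss.setAttribute("version", "2.0");
            doc.appendChild(rss);
            Element channel = doc.createElement("channel");
            rss.appendChild(channel);
            // Optional channel elements: the flow metadata from the scenario.
            append(doc, channel, "copyright", "tos");
            append(doc, channel, "language", "en_us");
            append(doc, channel, "title", "Talend RSS flow"); // placeholder title
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void append(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        parent.appendChild(e);
    }

    public static void main(String[] args) {
        System.out.println(buildChannel());
    }
}
```

Serializing the document shows the copyright and language metadata nested directly under <channel>, as described above.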
tSCPConnection
tSCPConnection properties
Related scenarios
For a related scenario, see Scenario: Putting files on a remote FTP server on page 937.
For a related scenario using a different protocol, see Scenario: Getting files from a remote SCP
server on page 970.
tSCPDelete
tSCPDelete properties
Function This component deletes files from remote hosts over a fully encrypted
channel.
Purpose tSCPDelete allows you to remove a file from the defined SCP server.
Basic settings Host SCP IP address.
Port Listening port number of the SCP server.
Username SCP user name.
Authentication method SCP authentication method.
Password SCP password.
Filelist File name or path to the files to be deleted.
Usage This component is typically used as a single-component sub-job but can
also be used with other components.
Limitation n/a
Related scenario
For tSCPDelete related scenario, see Scenario: Getting files from a remote SCP server on page 970.
For tSCPDelete related scenario using a different protocol, see Scenario: Putting files on a remote
FTP server on page 937.
tSCPFileExists
tSCPFileExists properties
Function This component checks, over a fully encrypted channel, if a file exists on
a remote host.
Purpose tSCPFileExists allows you to verify the existence of a file on the defined
SCP server.
Basic settings Host SCP IP address.
Port Listening port number of the SCP server.
Username SCP user name.
Authentication method SCP authentication method.
Password SCP password.
Remote directory File path on the remote directory.
Filename Name of the file to check.
Usage This component is typically used as a single-component sub-job but can
also be used with other components.
Limitation n/a
Related scenario
For tSCPFileExists related scenario, see Scenario: Getting files from a remote SCP server on page
970.
For tSCPFileExists related scenario using a different protocol, see Scenario: Putting files on a
remote FTP server on page 937.
tSCPFileList
tSCPFileList properties
Related scenario
For tSCPFileList related scenario, see Scenario: Getting files from a remote SCP server on page
970.
For tSCPFileList related scenario using a different protocol, see Scenario: Putting files on a remote
FTP server on page 937.
tSCPGet
tSCPGet properties
Function This component transfers defined files via an SCP connection over a
fully encrypted channel.
Purpose tSCPGet allows you to copy files from the defined SCP server.
Basic settings Host SCP IP address.
Port Listening port number of the SCP server.
Username SCP user name.
Authentication method SCP authentication method.
Password SCP password.
Local directory Path to the destination folder.
Overwrite or Append List of available options for the transferred files.
Filelist File name or path to the file(s) to copy.
Usage This component is typically used as a single-component sub-job but can
also be used with other components.
Limitation n/a
This Java scenario creates a single-component Job which gets the defined file from a remote SCP
server.
• Drop a tSCPGet component from the Palette onto the design workspace.
• In the design workspace, select tSCPGet and click the Component tab to define its basic
settings.
• Fill in the Host IP address, the listening Port number, and the user name in the corresponding
fields.
• On the Authentication method list, select the appropriate authentication method.
Note that the fields that follow change according to the selected authentication method. The
authentication method used in this scenario is password.
• Fill in the local directory details where you want to copy the fetched file.
• On the Overwrite or Append list, select the action to be carried out.
• In the Filelist area, click the plus button to add a line in the Source list and fill in the path
to the given file on the remote SCP server.
In this scenario, the file to copy from the remote SCP server to the local disk is backport.
• Save the Job and press F6 to execute it.
The given file on the remote server is copied to the local disk.
tSCPPut
tSCPPut properties
Function This component copies defined files to a remote SCP server over a fully
encrypted channel.
Purpose tSCPPut allows you to copy files to the defined SCP server.
Basic settings Host SCP IP address.
Port Listening port number of the SCP server.
Username SCP user name.
Authentication method SCP authentication method.
Password SCP password.
Remote directory Path to the destination folder.
Filelist File name or path to the file(s) to copy.
Usage This component is typically used as a single-component sub-job but can
also be used with other components.
Limitation n/a
Related scenario
For tSCPPut related scenario, see Scenario: Getting files from a remote SCP server on page 970.
For tSCPPut related scenario using a different protocol, see Scenario: Putting files on a remote FTP
server on page 937.
tSCPRename
tSCPRename properties
Related scenario
For tSCPRename related scenario, see Scenario: Getting files from a remote SCP server on page
970.
tSCPTruncate
tSCPTruncate properties
Function This component removes all the data from a file via an SCP connection.
Purpose tSCPTruncate allows you to remove data from file(s) on the defined
SCP server.
Basic settings Host SCP IP address.
Port Listening port number of the SCP server.
Username SCP user name.
Authentication method SCP authentication method.
Password SCP password.
Remote directory Path to the destination file.
Filelist File name or path to the file(s) to truncate.
Usage This component is typically used as a single-component sub-job but can
also be used with other components.
Limitation n/a
Related scenario
For tSCPTruncate related scenario, see Scenario: Getting files from a remote SCP server on page
970.
tSendMail
tSendMail Properties
Advanced settings MIME subtype from the ‘text’ MIME type Select in the list the structural form for the text of the message.
Encoding type Select the encoding from the list or select Custom and define it manually.
tStatCatcher Statistics Select this check box to gather the job processing metadata at a job level as well as at each component level.
Usage This component is typically used as one sub-job but can also be used as
output or end object. It can be connected to other components with either
Row or Iterate links.
Limitation Note that two different Perl modules are required, depending on whether
the e-mails have attachments.
This scenario creates a three-component Job which sends an email to defined recipients when an
error occurs.
• Drop the following components from your Palette to the design workspace:
tFileInputDelimited, tFileOutputXML, tSendMail.
• Define tFileInputDelimited properties. Related topic: tFileInputDelimited on page 824.
• Right-click on the tFileInputDelimited component and select Row > Main. Then drag it
onto the tFileOutputXML component and release when the plug symbol shows up.
• Define tFileOutputXML properties.
• Drag a Run on Error link from tFileInputDelimited to the tSendMail component.
• Define the tSendMail component properties:
• Enter the recipient and sender email addresses, as well as the email subject.
• Enter a message containing the error code produced using the corresponding global variable.
Access the list of variables by pressing Ctrl+Space.
• Add attachments and extra header information if any. Type in the SMTP information.
In this scenario, the file containing data to be transferred to the XML output cannot be found. tSendMail
runs on this error and sends a notification email to the defined recipient.
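The message body that references the error code can be sketched as plain string composition; in generated Talend code the value would come from the Job's globalMap of component variables (the map key used below mirrors Talend's naming convention but is illustrative for this hypothetical component instance):

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorMailBodyDemo {
    // globalMap stands in for Talend's per-Job variable map.
    static String buildBody(Map<String, Object> globalMap) {
        // Key follows the <component>_ERROR_MESSAGE convention; the exact name
        // depends on the component instance and is only illustrative here.
        Object err = globalMap.get("tFileInputDelimited_1_ERROR_MESSAGE");
        return "The Job failed with the following error:\n" + err;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileInputDelimited_1_ERROR_MESSAGE",
                "in.csv (No such file or directory)");
        System.out.println(buildBody(globalMap));
    }
}
```

In the Studio itself you would not write this by hand: pressing Ctrl+Space in the message field lists the available global variables.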
tSocketInput
tSocketInput properties
Function tSocketInput component opens the socket port and listens for the
incoming data.
Purpose tSocketInput is a listening component that allows data to be passed
via a defined port.
JAVA Basic settings Host name Name or IP address of the Host server
Port Listening port to open
Timeout Number of seconds for the port to listen before
closing.
Uncompress Select this check box to unzip the data if relevant
Die on error This check box is selected by default. Clear the
check box to skip the row on error and complete
the process for error-free rows. If needed, you can
retrieve the rows on error via a Row > Rejects
link.
Field separator Character, string or regular expression to separate
fields.
Row separator String (ex: “\n” on Unix) to distinguish rows.
Escape Char Character of the row to be escaped
Text enclosure Character used to enclose text.
Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job flowcharts. Related
topic: Setting a Repository schema of Talend
Open Studio User Guide.
Encoding type Select the encoding from the list or select Custom
and define it manually. This field is compulsory
for DB data handling.
JAVA Advanced settings tStatCatcher Statistics Select this check box to gather the job processing metadata at a job level as well as at each component level.
Usage This component opens a point of access to a workstation or server. This
component starts a Job and stops only when the timeout is reached.
Limitation n/a
Because the Perl properties differ slightly from the Java properties, they are described in a separate
table below.
PERL basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job flowcharts. Related
topic: Setting a Repository schema of Talend
Open Studio User Guide.
Host name Name or IP address of the Host server
Port Listening port to open
Field separator Character, string or regular expression to separate
fields.
End of Line separator String (ex: “\n” on Unix) to distinguish rows.
End of data Character, string or regular expression that points
out the end of the data section
Opening message / Message / Acknowledge message / Closing message Description of the message if relevant.
PERL Advanced settings tStatCatcher Statistics Select this check box to gather the job processing metadata at a job level as well as at each component level.
Usage This component opens a point of access to a workstation or server. This
component starts a Job and only stops after it receives a closing message.
The following scenario describes a double Job aiming at passing data via a listening port. Another
application for the Socket components would be to allow controlled communication between
servers which cannot communicate directly.
• Create two Jobs: a first Job (SocketInput) opens the listening port and waits for the data to
be sent over. The second Job (SocketOutput) passes delimited data from a file to a defined
port number corresponding to the listening port.
• On the first Job, drop tSocketInput and tLogRow from the Palette to the design workspace.
• On the second Job, drop tFileInputDelimited and tSocketOutput from the Palette to the
design workspace.
• Let’s set the parameters of the second Job first...
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set
the access parameters to the input file.
• Define the Host IP address and the Port number where the data will be passed on to.
• Set the number of retries in the Retry field and the amount of time (in seconds) after which
the Job will time out.
• Define the rest of elements if need be.
• The schema should be propagated from the preceding component.
• Now on the other Job (SocketInput) design, define the parameters of the tSocketInput
component.
• Define the Host IP address and the listening Port number where the data are passed on to.
• Set the amount of time (in seconds) after which the Job will time out.
• Define the rest of elements if need be.
• Edit the schema and set it to reflect the whole or part of the other Job’s schema.
• The tLogRow does not require any particular setting for this Job.
• Press F6 to execute this Job (SocketInput) first, in order to open the listening port and
prepare it to receive the passed data.
• Before the time-out, launch the other Job (SocketOutput) to pass on the data.
The result displays on the Run view, along with the opening socket information.
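The listen-then-send mechanics of the two Jobs can be sketched with plain Java sockets on localhost (a simplified stand-in for tSocketInput and tSocketOutput that omits timeouts, compression, and error handling; the runDemo helper is illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketPassDemo {
    // Open a listening port, send one delimited row to it from a second
    // thread, and return the row read on the listening side.
    static String runDemo() {
        try {
            ServerSocket server = new ServerSocket(0); // ephemeral port, like the listening Job
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket("localhost", server.getLocalPort());
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println("id;name;city"); // one row, ";" as the field separator
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            sender.start();
            String row;
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
                row = in.readLine(); // the row separator marks the end of the row
            }
            sender.join();
            server.close();
            return row;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

As in the scenario, the listening side must be up first; the sender then connects to the same host and port and the delimited row crosses over.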
tSocketOutput
tSocketOutput properties
Related Scenario
For use cases in relation with tSocketOutput, see Scenario: Passing on data to the listening port
(Java) on page 980.
tWebServiceInput
tWebServiceInput Properties
Function Calls the defined method from the invoked Web service, and returns the class
as defined, based on the given parameters.
Purpose Invokes a Method through a Web service.
To handle complex hierarchical data, use the advanced features of
tWebServiceInput and provide Java code directly in the Code field of
the Advanced Settings view.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
properties are stored. The fields that come after are
pre-filled in using fetched data.
Click this icon to open a WSDL schema wizard and
store your WSDL connection in the Repository tree
view.
For more information about setting up and storing
database connection parameters, see Setting up a
WSDL schema of Talend Open Studio User Guide.
Perl only field Encoding type Select the encoding type from the list or select
Custom and define the type manually. This field is
obligatory to manipulate data in a database.
Schema type and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Click Edit Schema to make changes to the schema.
Note that if you make changes, the schema
automatically becomes built-in.
Click Sync columns to retrieve the schema from the
previous component in the Job.
Built-in: You create the schema and store it locally
for the relevant component. Related topic: Setting a
built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema
and stored it in the Repository. You can reuse it in
various projects and job flowcharts. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Perl only field End Point URI Resource identifier of the Web service.
Java only field Need authentication Select this check box and:
-enter a username and a password in the
corresponding fields if this is necessary to access the
service. Or,
-select the Windows authentication check box and
enter the windows domain in the corresponding field
if this is necessary to access the service.
Java only field Use http proxy Select this check box if you are using a proxy server
and fill in the necessary information.
Java only field Trust server with SSL Select this check box to validate the server certificate
to the client via an SSL protocol and fill in the
corresponding fields:
TrustStore file: enter the path (including filename)
to the certificate TrustStore file that contains the list
of certificates that the client trusts.
TrustStore password: enter the password used to
check the integrity of the TrustStore data.
Java only field Time out (second) Set a value in seconds for Web service connection
time out.
Method Name Enter the exact name of the Method to be invoked.
The Method name MUST match the corresponding
method described in the Web Service. The Method
name is also case-sensitive.
Parameters Enter the parameters expected and the sought values
to be returned. Make sure that the parameters entered
fully match the names and the case of the parameters
described in the method.
Advanced settings Advanced Use Select this check box to display the fields dedicated
for the advanced use of tWebServiceInput:
Java only field
WSDL2java: click the three-dot button to generate
Talend routines that hold the Java code necessary to
connect and query the Web service.
Code: replace the generated model Java code with
the code necessary to connect and query the specified
Web service using the code in the generated Talend
routines.
Match Brackets: select the number of brackets to be
used to close the for loop based on the number of
open brackets.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage This component is generally used as a Start component. It must be
linked to an output component.
Limitation n/a
This scenario describes a two-component Job aiming at using a Web service method and displaying
the output on the Run console view.
The method takes a full URL as an input string and returns a string array of images from a given Web
page.
• Drop a tWebServiceInput component and a tLogRow component from the Palette to the
design workspace.
• On the Component view of the tWebServiceInput component, define the WSDL
specifications, such as End Point URI, WSDL and SOAPAction URI where required.
• If the Web service you invoke requires authentication details, select the Need
authentication check box and provide the relevant authentication information.
• If you are using a proxy server, select the Use http proxy check box and fill in the necessary
connection information.
• In the Method Name field, type in the method name as defined in the Web Service
description. The name and the case of the method entered must match the corresponding
Web service method exactly.
• In the Parameters area, click the plus (+) button to add a line to the table.
• In the Value column, type in the URL of the Web site from which the images are to be extracted.
• Link the tWebServiceInput component to the standard output component, tLogRow.
• Then save your Job and press F6 to execute it.
All images extracted from the given Web site are returned as a list of URLs on the Run view.
This Java scenario describes a two-component Job that aims at fetching a list of funds published by
a financial Web service (distributed by www.xignite.com) and displaying the output on the standard
console (the Run view).
This scenario is designed for advanced users with a basic knowledge of Java. Since the aim of this Job is to fetch
complex hierarchical data, you need to code the necessary functions in Java.
• Drop the following components from the Palette onto the design workspace:
tWebServiceInput and tLogRow.
• Connect the two components together using the Row Main link.
• Double-click tWebServiceInput to show the Component view and set the component
properties:
• In the WSDL field, type in the URL from which to get the WSDL.
• In the Time out field, type in the duration in seconds of the connection to the Web Service.
• Click the Advanced settings tab to display the corresponding view where you can set the
tWebServiceInput advanced features:
• Select the check box next to Advanced Use to display the advanced configuration fields.
• Click the three-dot [...] button next to the field WSDL2Java in order to generate
automatically a number of routines from the WSDL information provided.
The generated routines display automatically under Code > Routines in the Repository tree view.
These routines can thus easily be called in the code to build the function required to fetch complex
hierarchical data from the Web Service.
• Type in the relevant function in the Code field. By default, two code examples are provided
in the Code field. The first example returns a single piece of data, and the second returns
multiple pieces of data.
• In this scenario, multiple pieces of data are to be returned. Therefore, remove the first code
example and use the second to build the function.
• Replace the pieces of code provided as examples with the relevant routines that have been
automatically generated from the WSDL.
• Change TalendJob_PortType to the routine name ending with _Port_Type, such as:
XigniteFundHoldingsSoap_PortType.
• Replace the various instances of TalendJob with a more relevant name, such as the name
of the method in use. In this use case: fundHoldings.
• Replace TalendJobServiceLocator with the name of the routine ending with
Locator, such as: XigniteFundHoldingsLocator.
• Replace both instances of TalendJobSoapBindingStub with the routine name ending
with BindingStub, such as: XigniteFundHoldingsSoap_BindingStub.
• Within the brackets corresponding to the pieces of code: stub.setUsername and
stub.setPassword, type in respectively your username and password between quotes.
For the sake of confidentiality or maintenance, you can store your username and password
in context variables.
• The list of funds provided by the Xignite Web service is identified using so-called “symbols”,
which are of string type. In this example, we intend to fetch the list of funds whose symbols
fall between “I” and “J”. To do so, define the following statements: String
startSymbol="I" and String endSymbol="J".
• Then enter the piece of code to create the result table showing the list of funds (listFunds)
of funds holdings using the statements defined earlier on: routines.Fund[] result
= fundHoldings.listFunds(startSymbol, endSymbol);
• Loop on the fund list to fetch the funds ranging from “I” to “J”: for(int i = 0; i <
result.length; i++) {.
• Define the results to return, for example: fetch the CIK data from the Security schema using
the code getSecurity().getCIK(), then pass them on to the CIK output schema.
The function that operates the Web service should read as follows:
routines.XigniteFundHoldingsSoap_PortType fundHoldings = new
routines.XigniteFundHoldingsLocator().getXigniteFundHoldingsSoap();
routines.XigniteFundHoldingsSoap_BindingStub stub =
(routines.XigniteFundHoldingsSoap_BindingStub)fundHoldings;
stub.setUsername("username");
stub.setPassword("password");
String startSymbol = "I";
String endSymbol = "J";
routines.Fund[] result = fundHoldings.listFunds(startSymbol, endSymbol);
for(int i = 0; i < result.length; i++) {
output_row.CIK = (result[i]).getSecurity().getCIK();
output_row.cusip = (result[i]).getSecurity().getCusip();
output_row.symbol = (result[i]).getSecurity().getSymbol();
output_row.ISIN = (result[i]).getSecurity().getISIN();
output_row.valoren = (result[i]).getSecurity().getValoren();
output_row.name = (result[i]).getSecurity().getName();
output_row.market = (result[i]).getSecurity().getMarket();
output_row.category = (result[i]).getSecurity().getCategoryOrIndustry();
output_row.asOfDate = (result[i]).getAsOfDate();
The outputs defined in the Java function (output_row.columnName) should match exactly the columns
defined in the component schema. Indeed, the case needs to match in order for the data to be fetched.
• In the Match Brackets field, select the number of brackets to use to end the For loop, based
on the number of open brackets. For this scenario, select one bracket only as only one bracket
has been opened in the function.
• Double-click the component tLogRow to display the Component view and set its
parameters.
• Click the three-dot [...] button next to the field Edit Schema, in order to check that the
preceding component schema was properly propagated to the output component. If needed,
click the Sync Columns button to retrieve the schema.
• Save your Job and press F6 to run it.
The funds comprised between “I” and “J” are returned and displayed in the console of Talend Open
Studio.
tXMLRPCInput
tXMLRPCInput Properties
Function Calls the defined method from the invoked RPC service, and returns the class
as defined, based on the given parameters.
Purpose Invokes a Method through an RPC Web service.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Click Edit Schema to make changes to the schema.
Note that if you make changes, the schema
automatically becomes built-in.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
In the RPC context, the schema corresponds to the
output parameters. If two parameters are meant to be
returned, then the schema should contain two
columns.
Server URL URL of the RPC service to be accessed
Need authentication / Select this check box and fill in a username and
Username and password if required to access the service.
Password
Method Name Enter the exact name of the Method to be invoked.
The Method name MUST match the corresponding
method described in the RPC Service. The Method
name is also case-sensitive.
Return class Select the type of data to be returned by the method.
Make sure it fully matches the one defined in the
method.
Parameters Enter the parameters expected by the method as input
parameters.
Usage This component is generally used as a Start component. It must be
linked to an output component.
Limitation n/a
This scenario describes a two-component Job aiming at using a RPC method and displaying the
output on the console view.
• Drop the tXMLRPCInput and tLogRow components from the Palette to the design
workspace.
• Set the tXMLRPCInput basic settings.
• Then set the Server url. For this demo, use: http://phpxmlrpc.sourceforge.net/server.php
• No authentication details are required in this use case.
• The Method to be called is: examples.getStateName
• The return class is not compulsory for this method but might be strictly required for another.
Leave the default setting for this use case.
• Then set the input Parameters required by the method called. The Name field is not used
in the code but the value should follow the syntax expected by the method. In this example,
the Name used is State Nr and the value randomly chosen is 42.
• The class has little impact with this demo method but could with another method,
so leave the default setting.
• On the tLogRow component Component view, check the box: Print schema column name
in front of each value.
• Then save the Job and press F6 to execute it.
South Dakota is the state name found using the GetStateName RPC method; it corresponds to the
42nd State of the United States, as defined by the input parameter.
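On the wire, the Method Name and Parameters settings translate into a small XML document sent to the server URL. Building that payload with the standard Java DOM API gives an idea of what the component transmits (an illustrative sketch of the XML-RPC request format, not the component's actual implementation):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlRpcRequestDemo {
    static String buildRequest(String method, int param) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element call = doc.createElement("methodCall");
            doc.appendChild(call);
            Element name = doc.createElement("methodName");
            name.setTextContent(method); // the exact, case-sensitive method name
            call.appendChild(name);
            Element params = doc.createElement("params");
            call.appendChild(params);
            // One <param> carrying the integer input value.
            Element intVal = doc.createElement("int");
            intVal.setTextContent(String.valueOf(param));
            Element value = doc.createElement("value");
            value.appendChild(intVal);
            Element p = doc.createElement("param");
            p.appendChild(value);
            params.appendChild(p);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // State Nr 42 passed to examples.getStateName, as in the scenario.
        System.out.println(buildRequest("examples.getStateName", 42));
    }
}
```

This makes it clearer why the Name column of the Parameters table is not used in the request itself: only the value, with the right type, travels to the server.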
This chapter details the main components that you can find in the Logs & Errors family of the
Talend Open Studio Palette.
The Logs & Errors family groups together the components which are dedicated to log information
catching and job error handling.
Logs & Errors components
tAssert
tAssert
tAssert Properties
The tAssert component works alongside tAssertCatcher to evaluate the status of a Job execution.
It concludes with a boolean result based on an assertive statement related to the execution and feeds
the result to tAssertCatcher for proper Job status presentation.
This scenario describes how to set up an assertive condition in tAssert in order to evaluate whether a
Job execution succeeds or not. Moreover, you can also find out how the two different evaluation
results display and how to read them. Apart from tAssert, the scenario uses the following
components as well:
• tFileInputDelimited and tFileOutputDelimited. The two components compose the main
Job of which the execution status is evaluated. For the detailed information on the two
components, see component tFileInputDelimited on page 824 and component
tFileOutputDelimited on page 884.
• tFileCompare. It realizes the comparison between the output file of the main Job and a
standard reference file. The comparative result is evaluated by tAssert against the assertive
condition set up in its settings. For more detailed information on tFileCompare, see
component tFileCompare on page 809
• tAssertCatcher. It captures the evaluation generated by tAssert. For more information on
tAssertCatcher, see component tAssertCatcher on page 1004.
• tLogRow. It allows you to read the captured evaluation. For more information on tLogRow,
see component tLogRow on page 1023.
• Still in the Component view, set Property Type to Built-In and click the three-dot button next to Edit
schema to define the data to pass on to tFileOutputDelimited. In the scenario, define the
data presented in the source.csv file you created.
For more information about schema types, see Setting a built-in schema and Setting a Repository
schema of Talend Open Studio User Guide.
• Define the other parameters in the corresponding fields according to source.csv you created.
• Double-click tFileOutputDelimited to open its Component view.
• In the File Name field of the Component view, fill in or browse to specify the path to the
output file, leaving the other fields as they are by default.
• Press F6 to execute the main Job. It reads source.csv, passes the data to tFileOutputDelimited
and outputs a delimited file, out.csv.
Then continue to edit the Job to see how tAssert evaluates the execution status of the main Job.
• Rename out.csv as reference.csv. This file is used as the expected result the main Job should
output.
• Place tFileCompare, tAssert and tLogRow on the workspace.
• Connect them with Row Main link.
• Connect tFileInputDelimited to tFileCompare with OnSubjobOk link.
For more information on the tFileCompare component, see component tFileCompare on page 809.
• Then click tAssert and click the Component tab on the lower side of the workspace.
• In the Component view, edit the assertion row2.differ==0 in the expression field and
the descriptive message of the assertion in description field.
In the expression field, row2 is the data flow transmitted from tFileCompare to tAssert,
differ is one of the columns of the tFileCompare schema and indicates whether the compared
files are identical, and 0 means that no difference is detected between out.csv and reference.csv by
tFileCompare. Hence, when the compared files are identical, the assertive condition is fulfilled and
tAssert concludes that the main Job succeeds; otherwise, it concludes failure.
The differ column is in the read-only tFileCompare schema. For more information on its schema, see component tFileCompare on page 809.
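The assertion logic described above can be sketched in plain Java. The class and method names below are illustrative only and do not correspond to Talend's generated code; the sketch merely shows how the condition row2.differ==0 maps to the Ok/Failed status:

```java
// Hypothetical sketch of the evaluation tAssert performs on each
// incoming row. "differ" mirrors the tFileCompare schema column:
// 0 means the compared files are identical.
public class AssertSketch {
    static String evaluate(int differ) {
        boolean conditionFulfilled = (differ == 0); // row2.differ==0
        return conditionFulfilled ? "Ok" : "Failed";
    }

    public static void main(String[] args) {
        System.out.println(evaluate(0)); // files identical -> Ok
        System.out.println(evaluate(1)); // files differ    -> Failed
    }
}
```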
The console shows the comparison result of tFileCompare: Files are identical. But the evaluation result of tAssert appears nowhere.
You therefore need tAssertCatcher to capture the evaluation.
• Place tAssertCatcher and tLogRow on the workspace.
• Connect them with Row Main link.
The console shows that the execution status of the main Job is Failed. The detailed explanation follows it, reading Test logically failed.
You can thus get a basic idea about your present Job status: it fails to generate the expected file
because of a logical failure. This logical failure could come from a logical mistake during the Job
design.
The status and its explanatory information are presented respectively in the status and the substatus
columns of the tAssertCatcher schema. For more information on the columns, see component
tAssertCatcher on page 1004.
tAssertCatcher
tAssertCatcher Properties
Function Based on its pre-defined schema, fetches the execution status information
from repository, job execution and tAssert.
Purpose Generates a data flow consolidating the status information of a job execution
and transfers the data into defined output files.
Basic settings Schema type A schema is a row description, i.e., it defines the
fields to be processed and passed on to the next
component. In this particular case, the schema is
read-only, as this component gathers standard log
information including:
Moment: Processing time and date.
Pid: Process ID.
Project: Project which the job belongs to.
Job: Job name.
Language: Language used by the job. It may be Java
or Perl.
Origin: Status evaluation origin. The origin may be
different tAssert components.
Status: Evaluation fetched from tAssert. It may
be:
- Ok: the assertive statement of tAssert is
evaluated as true at runtime.
- Failed: the assertive statement of tAssert is
evaluated as false, or an execution error occurs at
runtime. The tested Job's result does not match the
expectation or the execution itself failed.
Substatus: Detailed explanation of a failed
execution. The explanation can be:
- Test logically failed: the investigated Job does not
produce the expected result.
- Execution error: an execution error occurred at
runtime.
Description: Descriptive message you entered in the
Basic settings of tAssert.
Catch Java Exception Select this check box to capture Java exception
errors.
Catch tAssert Select this check box to capture the evaluations of
tAssert.
Usage This component is the start component of a secondary Job which fetches the
execution status information from several sources. It generates a data flow to
transfer the information to the component that follows.
Related scenarios
tChronometerStart
tChronometerStart Properties
Related scenario
For related scenario, see Scenario: Measuring the processing time of a subjob and part of a subjob
on page 1007.
tChronometerStop
tChronometerStop Properties
• displays the gathered information about the processing time on the Run log console.
To measure the processing time of the subjob:
• Drop the following components from the Palette onto the design workspace:
tRowGenerator, tMap, tFileOutputDelimited, and tChronometerStop.
• Connect the first three components using Main Row links.
When connecting tMap to tFileOutputDelimited, you will be prompted to name the
output table. The name used in this example is “new_order”.
• Click Edit schema to define the schema of the tRowGenerator. For this Job, the schema is composed of two columns, First_Name and Last_Name, so click the [+] button twice to add two columns and rename them.
• Click the RowGenerator Editor three-dot button to open the editor and define the data to
be generated.
• In the RowGenerator Editor, specify the number of rows to be generated in the Number
of Rows for RowGenerator field and click OK. The RowGenerator Editor closes.
• You will be prompted to propagate changes. Click Yes in the popup message.
• Double-click on the tMap component to open the Map editor. The Map editor opens
displaying the input metadata of the tRowGenerator component.
• In the Schema editor panel of the Map editor, click the plus button of the output table to add
two rows and define them.
• In the Map editor, drag the First_Name row from the input table to the Last_Name row in
the output table and drag the Last_Name row from the input table to the First_Name row in
the output table.
• Click Apply to save changes.
• You will be prompted to propagate changes. Click Yes in the popup message.
• Click OK to close the editor.
• Select tFileOutputDelimited and click the Component tab to display the component view.
• In the Basic settings view, set tFileOutputDelimited properties as needed.
• Select tChronometerStop and click the Component tab to display the component view.
• In the Since options panel of the Basic settings view, select Since the beginning check box
to measure the duration of the subjob as a whole.
• Select/clear the other check boxes as needed. In this scenario, we want to display the subjob
duration on the console preceded by the component name.
• If needed, enter a text in the Caption field.
• Save your Job and press F6 to execute it.
You can measure the duration of the subjob the same way by placing
tChronometerStop below tRowGenerator, and connecting the latter to
tChronometerStop using an OnSubjobOk link.
tDie
tDie properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated and passed on to the defined output.
Function Kills the current Job. Generally used with a tLogCatcher for logging purposes.
Purpose Triggers the tLogCatcher component for exhaustive log before killing the
Job.
Basic settings Die message Enter the message to be displayed before the Job is
killed.
Error code Enter the error code if need be, as an integer.
Priority Set the level of priority, as an integer.
Usage Cannot be used as a start component.
Limitation n/a
Related scenarios
tFlowMeter
tFlowMeter Properties
If you need logs, statistics and other measurements of your data flows, see Automating statistics & logs use in Talend Open Studio User Guide.
Related scenario
For related scenario, see Scenario: Catching flow metrics from a Job on page 1015.
tFlowMeterCatcher
tFlowMeterCatcher Properties
The following basic Job aims at catching the number of rows being passed in the flow processed.
The measures are taken twice, once after the input component, that is, before the filtering step and
once right after the filtering step, that is, before the output component.
• Drop the following components from the Palette to the design workspace: tMysqlInput,
tFlowMeter (x2), tMap, tLogRow, tFlowMeterCatcher and tFileOutputCSV.
• Link the components using Row Main connections and click on the labels to give consistent names throughout the Job, such as US_States for the flow from the input component and filtered_states for the output from the tMap component.
• Link the tFlowMeterCatcher to the tFileOutputCSV component using a Row Main link as well, since data is passed.
• On the tMysqlInput Component view, set the connection properties to Repository if the table metadata are stored in the Repository. Otherwise, set the Type to Built-in and manually configure the connection and schema details for this Job.
• Select the check box Use input connection name as label, in order to reuse the label you
chose in the log output file (tFileOutputCSV).
• The mode is Absolute as there is no reference flow to meter against, and no Threshold is to be set for this example.
• Then launch the tMap editor to set the filtering properties.
• For this use case, drag and drop the ID and States columns from the Input area of the tMap
towards the Output area. No variable is used in this example.
• On the Output flow area (labelled filtered_states in this example), click the arrow & plus
button to activate the expression filter field.
• Drag the LabelState column from the Input area (row2) towards the expression filter field
and type in the rest of the expression in order to filter the state labels starting with the letter
M. The final expression looks like: row2.LabelState.startsWith("M")
• Click OK to validate the setting.
• Then select the second tFlowMeter component and set its properties.
• Select the Append check box in order to log all tFlowMeter measures.
• Then save your Job and press F6 to execute it.
The Run view shows the filtered state labels as defined in the Job.
In the delimited CSV file, the number of rows shown in the count column varies between tFlowMeter1 and tFlowMeter2, as the filtering has been carried out in between. The reference column also shows this difference.
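The expression filter used in this scenario amounts to a plain startsWith test applied to every row. The following standalone sketch, with sample state names, illustrates the same logic outside Talend:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java equivalent of the tMap expression filter
// row2.LabelState.startsWith("M"); the state names are sample data.
public class StateFilter {
    static List<String> filterStates(List<String> labels) {
        return labels.stream()
                .filter(label -> label.startsWith("M")) // same test tMap applies per row
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("Maine", "Texas", "Michigan", "Ohio");
        System.out.println(filterStates(states)); // [Maine, Michigan]
    }
}
```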
tLogCatcher
tLogCatcher properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated and passed on to the defined output.
Function Fetches set fields and messages from Java Exception/PerlDie, tDie and/or
tWarn and passes them on to the next component.
Purpose Operates as a log function triggered by one of the three: Java
exception/PerlDie, tDie or tWarn, to collect and transfer log data.
Basic settings Schema type and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job flowcharts. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
Catch PerlDie Select this check box to trigger the tCatch function
Catch Java Exception when a PerlDie/Java Exception occurs in the Job
Catch tDie Select this check box to trigger the tCatch function
when a tDie is called in a Job
Catch tWarn Select this check box to trigger the tCatch function
when a tWarn is called in a Job
Usage This component is the start component of a secondary Job which
automatically triggers at the end of the main Job
Limitation n/a
In this basic scenario made of three components, a tRowGenerator creates random entries (id to be
incremented). The input hits a tWarn component which triggers the tLogCatcher subjob. This
subjob fetches the warning message as well as standard predefined information and passes them on
to the tLogRow for a quick display of the log data.
• Drop a tRowGenerator, a tWarn, a tLogCatcher and a tLogRow from the Palette onto your design workspace.
• Connect the tRowGenerator to the tWarn component.
• Connect separately the tLogCatcher to the tLogRow.
• On the tRowGenerator editor, set the random entries creation using a basic function:
• On the tWarn Component view, set your warning message, the code and the priority level. In this case, the message is “this is a warning”.
• For this scenario, we will concatenate a function to the message above, in order to collect the
first value from the input table.
• On the Basic settings view of tLogCatcher, select the tWarn check box in order for the
message from the latter to be collected by the subjob.
• Click Edit Schema to view the schema used as log output. Notice that the log is
comprehensive.
Press F6 to execute the Job. Notice that the Log produced is exhaustive.
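The concatenation mentioned above is an ordinary Java string expression; in the message field of tWarn it might read something like "this is a warning: " + input_row.id, where the flow name (input_row) and column name (id) are hypothetical and depend on your Job. A minimal standalone sketch of the same expression:

```java
// Illustrative form of a tWarn message concatenating the first input
// value; the column name "id" is an assumption, not Talend's code.
public class WarnMessageSketch {
    static String buildMessage(int id) {
        return "this is a warning: " + id; // string + int concatenation
    }

    public static void main(String[] args) {
        System.out.println(buildMessage(1)); // this is a warning: 1
    }
}
```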
• Drop all required components from various folders of the Palette to the design workspace:
tRowGenerator, tFileOutputDelimited, tDie, tLogCatcher, tLogRow.
• On the tRowGenerator Component view, define the setting of the input entries to be
handled.
• Edit the schema and define the following columns as random input examples: id, name,
quantity, flag and creation.
• Set the Number of rows to 0. This will constitute the error on which the Die operation is based.
• On the Values table, define the functions to feed the input flow.
• Define the tFileOutputDelimited to hold the possible output data. The row connection from the tRowGenerator automatically feeds the output schema. The separator is a simple semi-colon.
• Connect this output component to the tDie using a Trigger > If connection. Double-click on the newly created connection to define the If condition:
((Integer)globalMap.get("tRowGenerator_1_NB_LINE")) <=0
• Then double-click to select and define the Basic settings of the tDie component.
• Enter your Die message to be transmitted to the tLogCatcher before the actual kill-job
operation happens.
• Next to the Job but not physically connected to it, drop a tLogCatcher from the Palette to
the design workspace and connect it to a tLogRow component.
• Define the tLogCatcher Basic settings. Make sure the tDie box is selected in order to add
the Die message to the Log information transmitted to the final component.
• Press F6 to run the Job and notice that the log contains a black message and a red one.
• The black log data come from the tDie and are transmitted by the tLogCatcher. In addition, the normal Java exception message displays in red as the Job died abnormally.
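The If condition above relies on the row count that Talend stores in globalMap under "tRowGenerator_1_NB_LINE" after the component runs. The following sketch uses a plain map as a stand-in for Talend's generated globalMap, to show how the test evaluates:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for Talend's globalMap: after a component runs,
// its row count is stored as an Integer under "<component>_NB_LINE".
public class IfConditionSketch {
    static boolean shouldDie(Map<String, Object> globalMap) {
        // Same expression as the If connection, minus Talend's generated context
        return ((Integer) globalMap.get("tRowGenerator_1_NB_LINE")) <= 0;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tRowGenerator_1_NB_LINE", 0); // Number of rows set to 0
        System.out.println(shouldDie(globalMap));    // true: tDie is triggered
    }
}
```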
tLogRow
tLogRow properties
tStatCatcher
tStatCatcher Properties
Function Based on a defined schema, gathers the job processing metadata at the job level
as well as at each component level.
Purpose Operates as a log function triggered by the StatsCatcher Statistics check box
of individual components, and collects and transfers this log data to the output
defined.
Basic settings Schema type A schema is a row description, i.e., it defines the
fields to be processed and passed on to the next
component. In this particular case, the schema is
read-only, as this component gathers standard log
information including:
Moment: Processing time and date
Pid: Process ID
Father_pid: Process ID of the father Job if
applicable. If not applicable, Pid is duplicated.
Root_pid: Process ID of the root Job if applicable. If
not applicable, the pid of the current Job is duplicated.
Project: Project name, the Job belongs to.
Job: Name of the current Job
Context: Name of the current context
Origin: Name of the component if any
Message: Begin or End.
Usage This component is the start component of a secondary Job which triggers
automatically at the end of the main Job. The processing time is also displayed
at the end of the log.
Limitation n/a
This scenario describes a four-component Job, aiming at displaying on the Run console the statistics
log fetched from the file generation through the tStatCatcher component.
• And select the tStatCatcher Statistics check box to enable the statistics fetching operation.
• Then, define the output component’s properties. In the tFileOutputDelimited Component
view, browse to the output file or enter a name for the output file to be created. Define the
delimiters, such as semi-colon, and the encoding.
• Click on Edit schema and make sure the schema is recollected from the input schema. If
need be, click on Sync Columns.
• Then click on the Basic settings tab of the Component view, and select here as well the
tStatCatcher Statistics check box to enable the processing data gathering.
• In the secondary Job, double-click on the tStatCatcher component. Note that the properties are provided for information only, as the schema representing the processing data to be gathered and aggregated in statistics is predefined and read-only.
• Then define the tLogRow component to set the delimiter to be displayed on the console.
• Finally, press F6 to run the Job and display the result.
The log shows the Begin and End information for the Job itself and for each of the components used in the Job.
tWarn
tWarn Properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated and passed on to the defined output.
Related scenarios
This chapter details the main components that you can find in Misc family of the Talend Open
Studio Palette.
The Misc family gathers miscellaneous components covering needs such as the creation of sets of
dummy data rows, buffering data or loading context variables.
Misc group components
tAddLocationFromIP
tAddLocationFromIP
tAddLocationFromIP Properties
The following Java scenario creates a three-component Job that associates an IP address with a geographical location. It obtains a site visitor's geographical location based on the visitor's IP address.
• Drop the following components from the Palette onto the design workspace:
tFixedFlowInput, tAddLocationFromIP, and tLogRow.
• Connect the three components using Row Main links.
• Click OK to close the dialog box, and accept propagating the changes when prompted by the
system. The defined column displays in the Values panel of the Basic settings view.
• Click in the Value cell and set the value for the IP address.
• Click the Sync columns button to synchronize the schema with the input schema set with
tFixedFlowInput.
• Browse to the GeoIP.dat file to set its path in the Database filepath field.
Make sure to download the latest version of the IP address lookup database file from
the relevant site as indicated in the Basic settings view of tAddLocationFromIP.
• In the Input parameters panel, set your input parameters as needed. In this scenario, the
input column is the ip column defined earlier that holds an IP address.
• In the Location type panel, set location type as needed. In this scenario, we want to display
the country name.
• In the design workspace, select tLogRow.
• Click the Component tab and define the basic settings for tLogRow as needed. In this
scenario, we want to display values in cells of a table.
• Save your Job and press F6 to execute it.
A single row is generated, displaying the country name associated with the IP address you set.
tBufferInput
tBufferInput properties
This scenario describes a Job that retrieves bufferized data from a subjob and displays it on the
console.
• Drop the following components from the Palette onto the design workspace:
tFileInputDelimited and tBufferOutput.
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set
the access parameters to the input file.
• In the File Name field, browse to the delimited file holding the data to be bufferized.
• Define the Row and Field separators, as well as the Header.
• Click [...] next to the Schema type field to describe the structure of the file.
Generally speaking, the schema is propagated from the input component and
automatically fed into the tBufferOutput schema. But you can also set part of
the schema to be bufferized if you want to.
• Drop the tBufferInput and tLogRow components from the Palette onto the design
workspace below the subjob you just created.
• Connect tFileInputDelimited and tBufferInput via a Trigger > OnSubjobOk link and
connect tBufferInput and tLogRow via a Row > Main link.
• Double-click tBufferInput to set its Basic settings in the Component view.
• In the Basic settings view, click [...] next to the Edit Schema field to describe the structure
of the file.
• Use the schema defined for the tFileInputDelimited component and click OK.
• The schema of the tBufferInput component is automatically propagated to the tLogRow.
Otherwise, double-click tLogRow to display the Component view and click Sync column.
• Save your Job and press F6 to execute it.
The standard console returns the data retrieved from the buffer memory.
tBufferOutput
tBufferOutput properties
Function This component collects data in a buffer in order to access it later, via a
Webservice for example.
Purpose This component allows a Webservice to access data. Indeed, it has been
designed to be exported as a Webservice in order to access data directly on the
web application server. For more information, see Exporting Jobs as
Webservice in Talend Open Studio User Guide.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
In the case of tBufferOutput, the column
position is more important than the column label,
as the position is what is taken into account.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open
Studio User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Usage This component is not startable (green background) and it requires an
output component.
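The buffering principle can be sketched as a shared in-memory list that the child Job fills and the caller later reads back. The class and method names below are illustrative, not Talend's generated code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the idea behind tBufferOutput/tBufferInput:
// rows accumulate in a shared list and are replayed later by the
// parent Job or a Webservice caller.
public class BufferSketch {
    // tBufferOutput role: accumulate rows in memory
    static final List<String[]> BUFFER = new ArrayList<>();

    static void bufferOutput(String[] row) {
        BUFFER.add(row);
    }

    // tBufferInput role: replay the buffered rows
    static List<String[]> bufferInput() {
        return BUFFER;
    }

    public static void main(String[] args) {
        bufferOutput(new String[] {"John", "Smith"});
        bufferOutput(new String[] {"Jane", "Doe"});
        System.out.println(bufferInput().size() + " rows retrieved from the buffer");
    }
}
```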
This scenario describes an intentionally basic Job that bufferizes data in a child job while a parent
Job simply displays the bufferized data onto the standard output console. For an example of how to
use tBufferOutput to access output data directly on the Web application server, see Scenario 2:
Buffering output data on the webapp server on page 1039.
• Create two Jobs: a first Job (BufferFatherJob) runs the second Job and displays its content
onto the Run console. The second Job (BufferChildJob) stores the defined data into a buffer
memory.
• On the first Job, drop the following components: tRunJob and tLogRow from the Palette
to the design workspace.
• On the second Job, drop the following components: tFileInputDelimited and
tBufferOutput the same way.
Let’s set the parameters of the second Job first:
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set
the access parameters to the input file.
• In File Name, browse to the delimited file whose data are to be bufferized.
• Define the Row and Field separators, as well as the Header.
• Generally, the schema is propagated from the input component and automatically fed into the tBufferOutput schema. But you can also set only part of the schema to be bufferized if you want to.
• Now on the other Job (BufferFatherJob) Design, define the parameters of the tRunJob
component.
• Edit the Schema if relevant and select the column to be displayed. The schema can be
identical to the bufferized schema or different.
• You could also define context parameters to be used for this particular execution. To keep it
simple, the default context with no particular setting is used for this use case.
Press F6 to execute the parent Job. The tRunJob takes care of executing the child Job and returns the data onto the standard console:
This scenario describes a Job that is called as a Webservice and stores the output data in a buffer
directly on the server of the Web application. This scenario creates first a Webservice oriented Job
with context variables, and next exports the Job as a Webservice.
Creating a Webservice-oriented Job with context variables:
• Drop the following components from the Palette onto the design workspace:
tFixedFlowInput and tBufferOutput.
• Connect tFixedFlowInput to tBufferOutput using a Row Main link.
• Click the plus button to add the three parameter lines and define your variables.
• Click OK to close the dialog box and accept propagating the changes when prompted by the
system. The three defined columns display in the Values panel of the Basic settings view of
tFixedFlowInput.
• Click in the Value cell of each of the first two defined columns and press Ctrl+Space to
access the global variable list.
• From the global variable list, select TalendDate.getCurrentDate() and TalendDataGenerator.getFirstName(), for the now and firstname columns respectively.
For this scenario, we want to define two context variables: nb_lines and lastname. In the first we set
the number of lines to be generated, and in the second we set the last name to display in the output
list. The tFixedFlowInput component will generate the number of lines set in the context variable
with the three columns: now, firstname and lastname. For more information about how to create and
use context variables, see Defining variables on the Contexts view of the Talend Open Studio
User Guide.
To define the two context variables:
• Select tFixedFlowInput and click the Contexts tab.
• In the Variables view, click the plus button to add two parameter lines and define them.
• Click the Values as table tab and define the first parameter to set the number of lines to be
generated and the second to set the last name to be displayed.
• Click the Component tab to go back to the Basic settings view of tFixedFlowInput.
• Click in the Value cell of lastname column and press Ctrl+Space to access the global
variable list.
• From the global variable list, select context.lastname, the context variable you created for the
last name column.
• Click the Browse... button to select a directory to archive your Job in.
• In the Export type panel, select the export type you want to use in the Tomcat webapp
directory (WAR in this example) and click Finish. The [Export Job Scripts] dialog box
disappears.
• Copy the War folder and paste it in a Tomcat webapp directory.
This scenario describes how to call the Job you created in scenario 2 from your browser with/without
modifying the values of the context variables.
Type the following URL into your browser:
http://localhost:8080/export_job/services/export_job3?method=runJob where “export_job” is the name of the webapp directory deployed in Tomcat and “export_job3” is the name of the Job.
The Job uses the default values of the context variables: nb_lines and lastname, that is it generates
three lines with the current date, first name and Ford as a last name.
You can modify the values of the context variables directly from your browser. To call the Job from your browser and modify the values of the two context variables, type the following URL:
http://localhost:8080/export_job/services/export_job3?method=runJob&arg1=--context_param%20lastname=MASSY&arg2=--context_param%20nb_lines=2
%20 stands for a blank space in URL encoding. In the first argument, “arg1”, you set the value of the lastname context variable to display “MASSY” as the last name. In the second argument, “arg2”, you set the value of the nb_lines context variable to “2” to generate only two lines.
Press Enter to execute your Job from your browser.
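Assembling such a URL programmatically only requires escaping the blank inside each argument as %20. The following sketch reuses the scenario's host, webapp and Job names:

```java
// Sketch of how the runJob URL with context parameters is assembled;
// host, webapp and Job names come from the scenario above.
public class RunJobUrl {
    static String buildUrl(String lastname, int nbLines) {
        String base = "http://localhost:8080/export_job/services/export_job3?method=runJob";
        String arg1 = "--context_param lastname=" + lastname;
        String arg2 = "--context_param nb_lines=" + nbLines;
        // Only the blank inside each argument needs escaping: %20 is the
        // URL encoding of a space.
        return base
                + "&arg1=" + arg1.replace(" ", "%20")
                + "&arg2=" + arg2.replace(" ", "%20");
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("MASSY", 2));
    }
}
```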
This scenario describes a Job that calls another Job exported as a Webservice using the
tWebServiceInput. This scenario will call the Job created in scenario 2.
• Drop the following components from the Palette onto the design workspace:
tWebServiceInput and tLogRow.
• Connect tWebserviceInput to tLogRow using a Row Main link.
• Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema
to describe the data structure you want to call from the exported Job. In this scenario, the
schema is made of three columns, now, firstname, and lastname.
• Click the plus button to add the three parameter lines and define your variables.Click OK to
close the dialog box.
• In the WSDL field of the Basic settings view of tWebServiceInput, enter the URL http://localhost:8080/export_job/services/export_job3?WSDL where “export_job” is the name of the webapp directory where the Job to call is stored and “export_job3” is the name of the Job itself.
• Set the Basic settings for the tLogRow component to display the output data in a tabular
mode. For more information, see tLogRow on page 1023.
• Save your Job and press F6 to execute it.
The system generates three columns with the current date, first name, and last name and displays
them onto the log console in a tabular mode.
tContextDump
tContextDump properties
Function tContextDump makes a dump copy of the values of the active job context.
Purpose tContextDump can be used to transform the current context parameters
into a flow that can then be used in a tContextLoad. This feature is very
convenient in order to define once only the context and be able to reuse
it in numerous Jobs via the tContextLoad.
Basic settings Schema type and In the tContextDump use, the schema is read
Edit Schema only and made of two columns, Key and Value,
corresponding to the parameter name and the
parameter value to be copied.
A schema is a row description, i.e., it defines the
fields that will be processed and passed on to the
next component. The schema is either built-in or
remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job flowcharts. Related
topic: Setting a Repository schema of Talend
Open Studio User Guide.
Print operations Select this check box to display the context
parameters set in the Run view.
Usage This component creates a data flow from the current context values and
therefore needs to be connected to an output component.
Limitation tContextDump does not create any non-defined context variable.
Related Scenario
tContextLoad
tContextLoad properties
This scenario is made of two subjobs. The first subjob aims at dynamically loading the context parameters, and the second subjob uses the loaded context to display the content of a DB table.
• For the first subjob, drop a tFileList, a tFileInputDelimited and a tContextLoad from the Palette to the design workspace.
• Drop tMysqlInput and a tLogRow the same way for the second subjob.
• Connect all the components together.
• Create as many delimited files as there are different contexts and store them in a specific
directory, named Contexts. In this scenario, test.txt contains the local database connection
details for testing purpose. And prod.txt holds the actual production db details.
• Each file is made of two fields containing the parameter name and the corresponding value, according to the context.
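For instance, assuming a semicolon field separator, the test.txt context file could look like the following; the host line matches the scenario (its actual value is talend-dbms), while the other parameter names and values are purely illustrative:

```
host;talend-dbms
port;3306
database;cust_data
```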
• In the tFileList component Basic settings panel, select the directory where both context
files, test and prod, are held.
• In the tFileInputDelimited component Basic settings panel, press Ctrl+Space bar to
access the global variable list. Select tFileList_1.CURRENT_FILEPATH to loop on
the context files’ directory.
• Define the schema manually (Built-in). It contains two columns defined as: Key and Value.
• Accept the defined schema to be propagated to the next component (tContextLoad).
• For this scenario, select the Print operations check box in order for the context parameters
in use to be displayed on the Run panel.
• Then double-click to open the tMysqlInput component Basic settings.
• For each of the field values stored in a context file, press F5 and define a user-defined context parameter. For example, the Host field takes the parameter context.host as its value, since the parameter name is host in the context file; its actual value is talend-dbms.
• Then fill in the Schema information. If you stored the schema in the Repository Metadata,
then you can retrieve it by selecting Repository and the relevant entry in the list.
• In the Query field, type in the SQL query to be executed on the DB table specified. In this
case, a simple SELECT of the columns of the table, which will be displayed on the Run tab,
through the tLogRow component.
• Finally, press F6 to run the Job.
The context parameters as well as the select values from the DB table are all displayed on the Run
view.
tFixedFlowInput
tFixedFlowInput properties
Function tFixedFlowInput generates as many lines and columns as you want using the
context variables.
Purpose tFixedFlowInput allows you to generate fixed flow from internal variables.
Basic settings Schema type and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic: Setting
a built-in schema of Talend Open Studio User
Guide.
Repository: You have already created the schema and
stored it in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Mode From the three options, select the mode that you want
to use.
Use Single Table: Enter the data that you want to
generate in the relevant value field.
Use Inline Table: Add the row(s) that you want to
generate.
Use Inline Content: Enter the data that you want to
generate, separated by the separators that you have
already defined in the Row and Field Separator fields.
Number of rows Enter the number of lines to be generated.
Values Between inverted commas, enter the values
corresponding to the columns you defined in the
schema dialog box via the Edit schema button.
Advanced settings tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage This component can be used as a start or intermediate component and thus
requires an output component.
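The Use Inline Content mode can be pictured as a simple split on the two separators. The sketch below is an illustration under that assumption, not the component's generated code.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedFlowSketch {
    // Split inline content into rows and fields using the Row Separator and
    // Field Separator, the way "Use Inline Content" generates its fixed flow.
    public static List<String[]> parse(String content, String rowSep, String fieldSep) {
        List<String[]> rows = new ArrayList<>();
        for (String row : content.split(rowSep)) {
            rows.add(row.split(fieldSep));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two rows of two fields, with "\n" as row and ";" as field separator.
        for (String[] row : parse("France;10\nSpain;8", "\n", ";")) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```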
Related scenarios
• Scenario: Iterating on a DB table and listing its column names on page 455.
tMsgBox
tMsgBox properties
Function Opens a dialog box with an OK button requiring action from the user.
Purpose tMsgBox is a graphical break in the job execution progress.
Basic settings Title Text entered shows on the title bar of the dialog
box created.
Buttons Listbox of buttons you want to include in the
dialog box. The button combinations are
restricted and cannot be changed.
Icon Icon shows on the title bar of the dialog box.
Message Free text to display as message on the dialog box.
Text can be dynamic (for example: retrieve and
show a file name).
Usage This component can be used as intermediate step in a data flow or as a
start or end object in the job flowchart.
It can be connected to the next/previous component using either a Row
or Iterate link.
Limitation For Perl users: Make sure the relevant package is installed.
The following scenario creates a single-component Job, where tMsgBox is used to display the
current date in place of the traditional “Hello World!” message.
• Drop a tMsgBox component from the Palette to the design workspace.
• Define the dialog box display properties:
• In the Message field, enter "Current date is: " between double quotation marks.
Then press Ctrl+Space to display the autocompletion list and select the following system
routine: TalendDate.getCurrentDate. Add parentheses after this routine.
• Switch to the Run tab to execute the Job defined.
The Message box displays the message and requires the user to click OK to go to the next
component or end the Job.
After the user clicks OK, the Run log is updated accordingly.
Related topic: Running a Job of Talend Open Studio User Guide.
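In a Java Job, this blocking dialog behaves much like a standard Swing message dialog. The sketch below approximates the scenario with plain Java classes; java.util.Date stands in for the TalendDate.getCurrentDate routine and is an assumption of this sketch.

```java
import javax.swing.JOptionPane;
import java.util.Date;

public class MsgBoxDemo {
    // Build the dynamic message of the scenario; new Date() approximates
    // the TalendDate.getCurrentDate system routine.
    public static String buildMessage() {
        return "Current date is: " + new Date();
    }

    public static void main(String[] args) {
        // Blocks until the user clicks OK, like tMsgBox pausing the Job flow.
        JOptionPane.showMessageDialog(null, buildMessage(), "tMsgBox",
                JOptionPane.INFORMATION_MESSAGE);
        System.out.println("Job execution resumes after OK.");
    }
}
```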
tRowGenerator
tRowGenerator properties
• Some extra information, although not required, might be useful such as Length,
Precision or Comment. You can also hide these columns, by clicking on the Columns
drop-down button next to the toolbar, and unchecking the relevant entries on the list.
• In the Function area, you can select the predefined routine/function if one of them
corresponds to your needs. You can also add to this list any routine you stored in the
Routine area of the Repository. Or you can type in the function you want to use in the
Function definition panel. Related topic: Defining the function on page 1057 of Talend
Open Studio User Guide.
• Click Refresh to have a preview of the data generated.
• Type in a number of rows to be generated. The more rows to be generated, the longer it’ll
take to carry out the generation operation.
• In the Value area, type in the Perl or Java function to be used to generate the data
specified.
• Click on the Preview tab and click Preview to check out a sample of the data generated.
The following scenario creates a two-component Job made in Java, generating 50 rows structured
as follows: a randomly picked-up ID in a 1-to-3 range, a random ascii First Name and Last Name
generation and a random date taken in a defined range.
• Drop a tRowGenerator and a tLogRow component from the Palette to the design
workspace.
• Right-click on the tRowGenerator component and select Row > Main. Drag this main row
link onto the tLogRow component and release when the plug symbol displays.
• Double-click on the tRowGenerator component to open the Editor.
• Define the fields to be generated.
• The random ID column is of integer type, the First and Last names are of string type and the
Date is of date type.
• In the Function list, select the relevant function or set on the three dots for custom function.
• On the Function parameters tab, define the Values to be randomly picked up.
The 50 rows are generated following the setting defined in the tRowGenerator editor and the output
is displayed in the Run console.
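The generation logic of the scenario can be approximated in plain Java. The helpers below are illustrative stand-ins for the predefined routines (a random integer in a range, a random ascii string); they are not the editor's actual generated code.

```java
import java.util.Random;

public class RowGeneratorSketch {
    private static final Random RND = new Random();

    // Random ID in the 1-to-3 range, as set on the Function parameters tab.
    public static int randomId() {
        return 1 + RND.nextInt(3);
    }

    // Random ascii string of the given length, standing in for the
    // first/last name generation routines.
    public static String randomAscii(int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + RND.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 50; i++) { // 50 rows, as in the scenario
            System.out.printf("%d|%s|%s%n", randomId(), randomAscii(6), randomAscii(8));
        }
    }
}
```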
This chapter details the main components that you can find in Orchestration family of the Talend
Open Studio Palette.
The Orchestration family groups together components that help you to sequence or orchestrate tasks
or processing in your Jobs or subjobs and so on.
Orchestration components
tFileList
tFileList
tFileList belongs to two component families: File and Orchestration. For more information on tFileList,
see tFileList on page 878.
tFlowToIterate
tFlowToIterate Properties
The following scenario describes a Job that reads a list of files from a defined input file, iterates on
each of the files, selects input data and displays the output on the Run log console.
• Drop the following components from the Palette onto the design workspace:
tFileInputDelimited (x2), tFlowToIterate, and tLogRow.
• Via a right-click on each of the components, connect the first tFileInputdelimited to
tFlowToIterate using a Row Main link, tFlowToIterate to the second tFileInputdelimited
using an Iterate link, and the second tFileInputdelimited to tLogRow using a Row Main
link.
The input file used in this scenario is called Customers. It is a text file that holds three other simple
text files: Name, E-mail and Address. The first text file, Name, is made of one column holding
customers’ names. The second text file, E-mail, is made of one column holding customers’ e-mail
addresses. The third text file, Address, is made of one column holding customers’ postal addresses.
• Fill in all other fields as needed. For more information, see tFileInputDelimited properties
on page 824. In this scenario, the header and the footer are not set and there is no limit for
the number of processed rows
• Click Edit schema to describe the data structure of this input file. In this scenario, the
schema is made of one column, FileName.
• If needed, select the Use the default (key, value) in global variables check box to use the
default value of the global variable.
• Click the plus button to add new parameter lines and define your variables.
• Click in the key cell to modify the variable name as desired.
You can press Ctrl+Space in the key cell to access the list of global and user-specific
variables.
• In the File Name field, enter the file name using the variable containing the name of the file.
You must use the correct syntax according to the language used, Perl or Java. In Perl, the
relevant syntax is $_globals{tFlowToIterate}{'Name_of_File'}; in Java, the relevant syntax
is globalMap.get("file").
• Fill in all other fields as needed. For more information, see tFileInputDelimited properties
on page 824.
• In the design workspace, select the last component, tLogRow.
• Click the Component tab to define the basic settings for tLogRow.
• Define your settings as needed. For more information, see tLogRow properties on page
1023.
• Save your Job and press F6 to execute it.
Customers’ names, customers’ e-mails, and customers’ postal addresses display on the console
preceded by the schema column name.
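The mechanics of tFlowToIterate can be sketched as follows: each incoming row becomes one iteration, and its fields are published in a global map for the downstream component to read. The key name used below is illustrative, not the component's actual variable name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlowToIterateSketch {
    // Turn a flow of rows into iterations: for each row, publish its fields
    // in a global map, then let the downstream step consume them.
    public static List<String> iterate(List<Map<String, String>> rows) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> filesOpened = new ArrayList<>();
        for (Map<String, String> row : rows) {
            globalMap.put("row1.FileName", row.get("FileName")); // illustrative key
            // The downstream tFileInputDelimited would read the variable here:
            filesOpened.add((String) globalMap.get("row1.FileName"));
        }
        return filesOpened;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(Collections.singletonMap("FileName", "Name"));
        rows.add(Collections.singletonMap("FileName", "E-mail"));
        rows.add(Collections.singletonMap("FileName", "Address"));
        System.out.println(iterate(rows));
    }
}
```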
tInfiniteLoop
tInfiniteLoop Properties
Related scenario
For an example of the kind of scenario in which tInfiniteLoop might be used, see Scenario: Job
execution in a loop on page 1071, regarding the tLoop component.
tIterateToFlow
tIterateToFlow Properties
Function tIterateToFlow transforms a list into a data flow that can be processed.
Purpose Allows you to transform non-processable data into a processable flow.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
In the case of tIterateToFlow, the schema is to be
defined
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Mapping Column: Enter a name for the column to be
created
Value: Press Ctrl+Space to access all of the
available variables, be they global or
user-defined.
Advanced Settings tStatCatcher Select this check box to collect the log data at a
Statistics component level.
Usage This component is not startable (green background) and it requires an
output component.
The following scenario describes a Job that iterates on a list of files, picks up the filename and
current date and transforms this into a flow, that gets displayed on the console.
• Drop the following components: tFileList, tIterateToFlow and tLogRow from the Palette
to the design workspace.
• Connect the tFileList to the tIterateToFlow using an Iterate link and connect the
tIterateToFlow to the tLogRow using a Row main connection.
• In the tFileList Component view, set the directory where the list of files is stored.
• In this example, the files are three simple .txt files held in one directory: Countries.
• The letter case is irrelevant here, hence clear the Case sensitive check box.
• Leave the Include Subdirectories check box unchecked.
• Then select the tIterateToFlow component and click Edit Schema to set the new schema.
• Add two new columns: Filename of String type and Date of date type. Make sure you define
the correct pattern in Java.
• Click OK to validate.
• Notice that the newly created schema shows on the Mapping table.
• In each cell of the Value field, press Ctrl+Space bar to access the list of global and
user-specific variables.
• For the Filename column, use the global variable: tFileList_1.CURRENT_FILEPATH.
It retrieves the current filepath in order to catch the name of each file the Job iterates on.
• For the Date column, use the Talend routine: Date.GetDate (in Perl) or
TalendDate.getCurrentDate() (in Java).
• Then on the tLogRow component view, select the Print values in cells of a table check box.
• Save your Job and press F6 to execute it.
The filepath displays on the Filename column and the current date displays on the Date column.
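Conceptually, the Job above does the reverse of tFlowToIterate: each iterated value is mapped into one row of a new flow. A minimal sketch, with new Date() standing in for TalendDate.getCurrentDate():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public class IterateToFlowSketch {
    // For each iterated file path, emit one row with the Filename and Date
    // columns defined in the tIterateToFlow mapping table.
    public static List<Object[]> toFlow(List<String> filePaths) {
        List<Object[]> flow = new ArrayList<>();
        Date now = new Date(); // stands in for TalendDate.getCurrentDate()
        for (String path : filePaths) {
            flow.add(new Object[] { path, now }); // {Filename, Date}
        }
        return flow;
    }

    public static void main(String[] args) {
        // Illustrative file names; the scenario iterates on three .txt files.
        for (Object[] row : toFlow(Arrays.asList("FR.txt", "DE.txt", "US.txt"))) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```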
tLoop
tLoop Properties
This scenario describes a Job composed of a parent Job and a child Job. The parent Job implements
a loop which executes n times a child Job, with a pause between each execution.
• In the parent Job, drop a tLoop, a tRunJob and a tSleep component from the Palette to the
design workspace.
• Connect the tLoop to the tRunJob using an Iterate connection.
• Then connect the tRunJob to a tSleep component using a Row connection.
• On the child Job, drop the following components: tPOP, tFileInputMail and tLogRow the
same way.
• On the Basic settings panel of the tLoop component, type in the instance number to start
from (1), the instance number to finish with (5) and the step (1).
• On the Basic settings panel of the tRunJob component, select the child Job in the list of
stored Jobs offered. In this example: popinputmail
• Select the context if relevant. In this use case, the context is default with no variables stored.
• In the tSleep Basic settings panel, type in the pause value in seconds. In this example, type
in 3 seconds in the Pause field.
• Then in the child Job, define the connection parameters to the pop server, on the Basic
settings panel.
• In the tFileInputMail Basic settings panel, select a global variable as File Name, to collect
the current file in the directory defined in the tPOP component. Press Ctrl+Space bar to
access the variable list. In this example, the variable to be used is:
$_globals{tPOP_1}{CURRENT_FILEPATH} (for Perl)
((String)globalMap.get("tPOP_1_CURRENT_FILEPATH")) (for Java)
• Define the Schema, for it to include the mail element to be processed, such as author, topic,
delivery date and number of lines.
• In the Mail Parts table, type in the corresponding Mail part for each column defined in the
schema. ex: author comes from the From part of the email file.
• Then connect the tFileInputMail to a tLogRow to check out the execution result on the Run
view.
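The parent Job's control flow (tLoop driving tRunJob, then tSleep) boils down to an indexed loop with a pause. A hedged sketch follows; runChildJob is a placeholder for the child Job execution, not a real Talend API.

```java
public class LoopSketch {
    // Iterate from `from` to `to` by `step`, running the child Job and then
    // pausing, like tLoop -> tRunJob -> tSleep in the parent Job.
    public static int run(int from, int to, int step, long pauseMillis)
            throws InterruptedException {
        int executions = 0;
        for (int i = from; i <= to; i += step) {
            runChildJob(i);
            executions++;
            Thread.sleep(pauseMillis); // tSleep between executions
        }
        return executions;
    }

    // Placeholder for tRunJob launching the popinputmail child Job.
    private static void runChildJob(int iteration) {
        System.out.println("child Job popinputmail, iteration " + iteration);
    }

    public static void main(String[] args) throws InterruptedException {
        // Scenario values: from 1 to 5, step 1; the 3-second pause is
        // shortened to 10 ms so the demo finishes quickly.
        run(1, 5, 1, 10);
    }
}
```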
tPostjob
tPostjob Properties
For more information about the tPostjob component, see Prejob and postjob parts of Talend Open
Studio User Guide.
Related scenario
tPrejob
tPrejob Properties
For more information about the tPrejob component, see Prejob and postjob parts of Talend Open
Studio User Guide.
Related scenario
tReplicate
tReplicate Properties
Function tReplicate duplicates the incoming flow into two identical output flows.
Purpose Allows you to perform different operations on the same schema.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open
Studio User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Usage This component is not startable (green background), it requires an Input
component and an output component.
Related scenario
For a use case showing this component in use, see tReplaceList on page 194.
tRunJob
tRunJob belongs to two component families: System and Orchestration. For more information on
tRunJob, see tRunJob on page 1182.
tSleep
tSleep Properties
Related scenarios
For use cases related to tSleep, see the tLoop Scenario: Job execution in a loop on page 1071.
tUnite
tUnite Properties
The following Job iterates on a list of files, then merges their content and displays the final 2-column
content on the console.
• Drop the following components onto the design workspace: tFileList, tFileInputDelimited,
tUnite and tLogRow.
• Connect the tFileList to the tFileInputDelimited using an Iterate connection and connect
the other component using a row main link.
• In the tFileList Basic settings view, browse to the directory, where the files to merge are
stored.
• In the Case Sensitive field, select Yes to consider the letter case.
• The files are pretty basic and contain a list of countries and their respective score.
• Select the tFileInputDelimited component, and display this component’s Basic settings
view.
• In this use case, the input files’ connection properties are not centrally stored in the
Repository, therefore select Built-In as Property type and set every single field manually.
• To fill in the File Name field, use the Ctrl+Space bar combination to access the variable
completion list. To process all files from the directory defined in the tFileList, select
tFileList.CURRENT_FILEPATH on the global variable list.
• Keep the default setting for the Row and Field separators as well as the other fields.
• Click the Edit Schema button and set manually the 2-column schema to reflect the input
files’ content.
• For this example, the 2 columns are Country and Points. They are both nullable.
• The Country column is of String type and the Points column is of Integer type.
• Click OK to validate the setting and accept to propagate the schema throughout the Job.
• Then select the tUnite component and display the Component view. Notice that the output
schema strictly reflects the input schema and is read-only.
• In the tLogRow Component view, select the Print values in cells of the table check box
to display properly the output values.
• Save the Job and press F6 to execute it.
The console shows the data from the various files, merged into one single table. This unified
output can then be aggregated or otherwise processed as a single flow.
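The merge performed by tUnite is a plain concatenation of flows that share the same schema. A minimal sketch of that behavior, with illustrative data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UniteSketch {
    // Append every input flow to a single output flow, in arrival order,
    // as tUnite does with inputs sharing one (Country, Points) schema.
    public static List<String[]> unite(List<List<String[]>> inputs) {
        List<String[]> merged = new ArrayList<>();
        for (List<String[]> input : inputs) {
            merged.addAll(input);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String[]> file1 = Collections.singletonList(new String[] { "France", "3" });
        List<String[]> file2 = Collections.singletonList(new String[] { "Spain", "5" });
        for (String[] row : unite(Arrays.asList(file1, file2))) {
            System.out.println(row[0] + " | " + row[1]); // Country | Points
        }
    }
}
```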
tWaitForFile
tWaitForFile properties
Advanced Settings Wait for file to be Select this check box to delay the Job execution
released until the file is closed.
Usage This component plays the role of the start (or trigger) component of the
subjob which gets executed under the described condition. Therefore
this component requires a subjob to be connected to it via an Iterate link.
Limitation n/a
This scenario describes a Job scanning a directory and waiting for a file to be removed from this
directory, in order for a subjob to be executed. When the condition of file removal is met, then the
subjob simply displays a message box showing the file being removed.
• This use case only requires two components from the Palette: tWaitForFile and tMsgBox.
• Click and place these components on the design workspace and connect them using an
Iterate link to implement the loop.
• Then select the tWaitForFile component, and on the Basic Settings view of the
Component tab, set the condition and loop properties:
• In the Time (in seconds) between iteration field, set the time in seconds you want to wait
before the next iteration starts. In this example, the directory will be scanned every
5 seconds.
• In the Max. number of iterations (infinite loop if empty) field, enter the maximum
number of iterations to run before the whole Job is forced to end. In this example, the
directory will be scanned a maximum of 5 times.
• In the Directory to scan field, type in the path to the folder to scan.
• In the Trigger action when field, select the condition to be met, for the subjob to be
triggered. In this use case, the condition is a file is deleted (or moved) from the directory.
• In the Then field, select the action to be carried out when the condition is met before the
defined number of iterations is reached. In this use case, as soon as the condition is met, the
loop should be ended.
• Then set the subjob to be executed when the condition set is met. In this use case, the subjob
simply displays a message box.
• Select the tMsgBox component, and on the Basic Setting view of the Component tab, set
the message to be displayed.
• Fill out the Title and Message fields.
• Select the type of Buttons and the Icon.
• In the Message field, you can write any type of message you want to display and use global
variables available in the auto-completion list via Ctrl+Space combination.
• For example, in Perl, the message used for this use case is: "Deleted File:
$_globals{tWaitForFile_1}{DELETED_FILE}, on Iteration :
$_globals{tWaitForFile_1}{CURRENT_ITERATION}\n"
• The equivalent Java message is: "Deleted file:
"+((String)globalMap.get("tWaitForFile_1_DELETED_FILE"))+"
on iteration
Nr:"+((Integer)globalMap.get("tWaitForFile_1_CURRENT_ITERATI
ON"))
Then execute the Job via the F6 key. While the loop is executing, remove a file from the location
defined. The message pops up and shows the defined message.
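The loop set up above can be pictured as a polling routine: take a directory snapshot each iteration and stop as soon as an entry disappears. In the sketch below the directory listing is injected as a supplier so the logic stays testable; it is an illustration, not the component's generated code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class WaitForFileSketch {
    // Poll the directory listing up to maxIterations times (5 in the
    // scenario), pausing between polls, and return the first entry that
    // disappears -- the "file deleted (or moved)" trigger condition.
    public static String waitForDeletion(Supplier<Set<String>> listing,
                                         int maxIterations, Runnable pause) {
        Set<String> seen = listing.get();
        for (int i = 0; i < maxIterations; i++) {
            pause.run(); // stands in for Time (in seconds) between iterations
            Set<String> current = listing.get();
            for (String name : seen) {
                if (!current.contains(name)) {
                    return name; // condition met: end the loop
                }
            }
            seen = current;
        }
        return null; // condition never met within the iteration budget
    }

    public static void main(String[] args) {
        // Two canned snapshots: b.txt disappears between the polls.
        List<Set<String>> snapshots = Arrays.asList(
                new HashSet<>(Arrays.asList("a.txt", "b.txt")),
                new HashSet<>(Arrays.asList("a.txt")));
        Iterator<Set<String>> it = snapshots.iterator();
        System.out.println("Deleted file: " + waitForDeletion(it::next, 5, () -> { }));
    }
}
```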
tWaitForSocket
tWaitForSocket properties
Related scenario
tWaitForSqlData
tWaitForSqlData properties
This scenario describes a Job reading a DB table and waiting for data to be put in this table in order
for a subjob to be executed. When the condition of the data insertion in the table is met, the
subjob performs a SELECT * on the table and simply displays the content of the inserted data on the
standard console. This use case is presented in Perl, but there is no difference in settings if you
implement it in Java.
• Drop the following components from the Palette onto the design workspace:
tMySqlConnection, tWaitForSqlData, tMysqlInput, tLogRow.
• Connect the tMysqlConnection component to the tWaitforSqlData using an
OnSubjobOK link, available on the right-click menu.
• Then connect the tWaitForSqlData component to the subjob using an Iterate link, as no
actual data is transferred in this part. Indeed, tWaitForSqlData simply implements a loop
until the condition is met.
• On the subjob to be executed if the condition is met, a tMysqlInput is connected to the
standard console component, tLogRow. As the connection passes on data, use a Row main
link.
• Now, set the connection to the table to check at regular intervals. On the Basic Settings view
of the tMySqlConnection Component tab, set the DB connection properties.
• Fill out the Host, Port, Database, Username, Password fields to open the connection to the
Database table.
• Select the relevant Encoding if needed.
• Then select the tWaitForSqlData component, and on the Basic Setting view of the
Component tab, set its properties.
• In the Wait at each iteration field, set the time in seconds you want to wait before the next
iteration starts.
• In the Max iterations field, enter the maximum number of iterations to run before
the whole Job is forced to end.
• The tWaitForSqlData component requires a connection to be open in order to loop on the
defined number of iterations. Select the relevant connection (if several) in the Component
List combo box.
• In the Table to scan field, type in the name of the DB table to scan. In this example:
test_datatypes.
• In the Trigger action when rowcount is and Value fields, select the condition to be met, for
the subjob to be triggered. In this use case, the number of rows in the scanned table should
be greater or equal to 1.
• In the Then field, select the action to be carried out when the condition is met before the
defined number of iterations is reached. In this use case, as soon as the condition is met, the
loop should be ended.
• Then set the subjob to be executed when the condition set is met. In this use case, the subjob
simply selects the data from the scanned table and displays it on the console.
• Select the tMySqlInput component, and on the Basic Setting view of the Component tab,
set the connection to the table.
• If the connection is set in the Repository, select the relevant entry on the list. Or alternatively,
select the Use an existing connection check box and select the relevant connection
component on the list.
• In this use case, the schema corresponding to the table structure is stored in the Repository.
• Fill out the Table Name field with the table the data is extracted from, Test_datatypes.
• Then in the Query field, type in the Select statement to extract the content from the table.
• No particular setting is required in the tLogRow component for this use case.
Then before executing the Job, make sure the table to scan (test_datatypes) is empty, in order for the
condition (greater or equal to 1) to be met. Then execute the Job by pressing the F6 key on your
keyboard. Before the end of the iterating loop, feed the test_datatypes table with one or more rows
in order to meet the condition.
The Job ends when this table insert is detected during the loop, and the table content is thus displayed
on the console.
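The condition loop configured above reduces to: check the row count, trigger when the threshold is reached, give up once the iteration budget is exhausted. In the sketch below the count is injected as a supplier; in a real Job it would come from a count query on test_datatypes.

```java
import java.util.function.IntSupplier;

public class WaitForSqlDataSketch {
    // Loop at most maxIterations times, checking the scanned table's row
    // count; return true (trigger the subjob) once rowCount >= threshold.
    public static boolean waitForData(IntSupplier rowCount, int threshold,
                                      int maxIterations, Runnable pause) {
        for (int i = 0; i < maxIterations; i++) {
            if (rowCount.getAsInt() >= threshold) {
                return true; // condition met: end the loop, run the subjob
            }
            pause.run(); // Wait at each iteration
        }
        return false; // budget exhausted, the Job ends without the subjob
    }

    public static void main(String[] args) {
        // The table starts empty and is fed one row during the third check.
        int[] counts = { 0, 0, 2 };
        int[] call = { 0 };
        boolean met = waitForData(() -> counts[call[0]++], 1, 5, () -> { });
        System.out.println("condition met: " + met);
    }
}
```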
This chapter details the main components that you can find in Processing family of the Talend
Open Studio Palette.
The Processing family gathers together components that help you to perform all types of processing
tasks on data flows, including aggregation, mapping, transformation, denormalizing, filtering and
so on.
Processing components
tAggregateRow
tAggregateRow
tAggregateRow properties
The following scenario describes a four-component Job. As input component, a CSV file contains
countries and notation values to be sorted by best average value. This component is connected to a
tAggregateRow operator, in charge of the average calculation, then to a tSortRow component for
the ascending sort. The output flow goes to a new CSV file.
• From the File folder in the Palette, drop a tFileInputDelimited component to the design
workspace.
• Click on the label and rename it as Countries, or rename it from the View tab panel.
• In the Basic settings tab panel of this component, define the filepath and the delimitation
criteria. Or select the metadata file in the repository if it exists.
• Click on Edit schema... and set the columns: Countries and Points to match the file
structure. If your file description is stored in the Metadata area of the Repository, the schema
is automatically uploaded when you click on Repository in Schema type field.
• Then from the Processing folder in the Palette, drop a tAggregateRow component to the
design workspace. Rename it as Calculation.
• Connect Countries to Calculation via a right-click and select Row > Main.
• Double-click on Calculation (tAggregateRow component) to set the properties. Click on
Edit schema and define the output schema. You can add as many columns as you need to
hold the set operations results in the output flow.
• In this example, we’ll calculate the average notation value per country and we will display
the max and the min notation for each country, given that each country holds several
notations. Click OK when the schema is complete.
• To carry out the various set operations, back in the Basic settings panel, define the sets
holding the operations in the Group By area. In this example, select Country as group by
column. Note that the output column needs to be defined as a key field in the schema. The first
column mentioned as output column in the Group By table is the main set of calculation. All
other output sets will be secondary by order of display.
• Select the input column which the values will be taken from.
• Then fill in the various operations to be carried out. The functions are average, min, max for
this use case. Select the input columns, where the values are taken from and select the check
boxes in the Ignore null values list as needed.
• Drop a tSortRow component from the Palette onto the design workspace. For more
information regarding this component, see tSortRow properties on page 1178.
• Connect the tAggregateRow to this new component using a row main link.
• On the Component view of the tSortRow component, define the column the sorting is based
on, the sorting type and order.
• In this case, the column to be sorted by is Country, the sort type is alphabetical and the order
is ascending.
• Drop a tFileOutputDelimited from the Palette to the design workspace and define it to set
the output flow.
• Connect the tSortRow component to this output component.
• In the Component view, enter the output filepath and edit the schema if need be. In this case,
the delimited file is of CSV type. Select the Include Header check box to reuse the
schema column labels in your output flow.
• Press F6 to execute the Job. The CSV file thus created contains the aggregation result.
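The group-by logic of the scenario (average, min and max of the notation values per country) can be sketched in plain Java as follows; this illustrates the aggregation, not the component's generated code, and the sample data is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregateSketch {
    // Group Points by Country and compute {average, min, max} per group,
    // reproducing the Group By and Operations settings of the scenario.
    public static Map<String, double[]> aggregate(List<Object[]> rows) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (Object[] row : rows) { // row = {Country, Points}
            groups.computeIfAbsent((String) row[0], k -> new ArrayList<>())
                  .add(((Number) row[1]).doubleValue());
        }
        Map<String, double[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> e : groups.entrySet()) {
            double sum = 0, min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
            for (double v : e.getValue()) {
                sum += v;
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            result.put(e.getKey(),
                    new double[] { sum / e.getValue().size(), min, max });
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] { "France", 10 }, new Object[] { "France", 14 },
                new Object[] { "Spain", 8 });
        double[] fr = aggregate(rows).get("France");
        System.out.printf("France avg=%.1f min=%.1f max=%.1f%n", fr[0], fr[1], fr[2]);
    }
}
```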
tAggregateSortedRow
tAggregateSortedRow properties
Related scenario
For a related use case, see the tAggregateRow Scenario: Aggregating values and sorting data on page
1095.
tConvertType
tConvertType properties
Function tConvertType allows specific conversions at run time from one Talend
Java type to another.
Purpose Helps to automatically convert one Talend Java type to another and
thus avoid compilation errors.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema.
Note that if you make changes, the schema
automatically becomes built-in.
Built-in: You create and store the schema locally
for only the current component. Related topic:
Setting a built-in schema of Talend Open
Studio User Guide.
Repository: You have already created the
schema and stored it in the Repository, and thus
you can reuse it in various projects and job
flowcharts. Related topic: Setting a Repository
schema of Talend Open Studio User Guide.
Auto Cast This check box is selected by default. It performs
an automatic java type conversion.
Usage This component cannot be used as a start component as it requires an
input flow to operate.
Limitation n/a
This Java scenario describes a four-component Job where the tConvertType component is used to
convert Java types in three columns, and a tMap is used to adapt the schema and have as an output
the first of the three columns and the sum of the two others after conversion.
In this scenario, the input schemas for the input delimited file are stored in the
repository, so you can simply drag and drop the relevant file node from Repository -
Metadata - File delimited onto the design workspace to automatically retrieve the
tFileInputDelimited component’s setting. For more information, see Dropping components
from the Metadata node in Talend Open Studio User Guide.
• Drop the following components from the Palette onto the design workspace:
tConvertType, tMap, and tLogRow.
• In the Repository tree view, expand Metadata and from File delimited drag the relevant
node, JavaTypes in this scenario, to the design workspace.
The [Components] dialog box displays.
• From the component list, select tFileInputDelimited and click OK.
A tFileInputDelimited component called JavaTypes displays in the design workspace.
• Connect the components using Row Main links.
• In the design workspace, select tFileInputDelimited and click the Component tab to define
its basic settings.
• In the Basic settings view, set Property Type to Repository since the file details are stored
in the repository. The fields to follow are pre-defined using the fetched data.
The input file used in this scenario is called input. It is a text file that holds string, integer, and float
java types.
• In the Basic settings view, fill in all other fields as needed. For more information, see
tFileInputDelimited properties on page 824. In this scenario, the header and the footer are
not set and there is no limit for the number of processed rows.
• Click Edit schema to describe the data structure of this input file. In this scenario, the
schema is made of three columns, StringtoInteger, IntegerField, and FloatToInteger.
• Set Schema Type to Built-in, and click Sync columns to automatically retrieve the columns
from the tFileInputDelimited component.
• If needed, click Edit schema to describe manually the data structure of this processing
component.
In this scenario, we want to convert a string type data into an integer type and a float type data into
an integer type.
• Click OK to close the [Schema of tConvertType] dialog box.
• In the Schema editor panel of the Map editor, click the plus button of the output table to add
two rows and define them as StringToInteger and Sum.
• In the Map editor, drag the StringToInteger row from the input table to the StringToInteger
row in the output table.
• In the Map editor, drag each of the IntegerField and the FloatToInteger rows from the input
table to the Sum row in the output table and click OK to close the Map editor.
• In the design workspace, select tLogRow and click the Component tab to define its basic
settings. For more information, see tLogRow on page 1023.
• Save your Job and press F6 to execute it.
The string type data is converted into an integer type and displayed in the StringToInteger column
on the console. The float type data is converted into an integer and added to the IntegerField value
to give the addition result in the Sum column on the console.
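The conversion and addition performed by this Job can be sketched in plain Java. This is an illustration only; the method names are not part of the Talend API, and the float-to-integer conversion is shown here as a truncating cast, which is an assumption about how the conversion behaves:

```java
public class ConvertTypeSketch {
    // Parse a string field into an integer, as tConvertType does for the
    // StringToInteger column.
    static int stringToInteger(String s) {
        return Integer.parseInt(s.trim());
    }

    // Convert a float field to an integer (truncating cast assumed here),
    // as for the FloatToInteger column.
    static int floatToInteger(float f) {
        return (int) f;
    }

    // The tMap expression then adds the converted values into the Sum column.
    static int sum(int integerField, float floatValue) {
        return integerField + floatToInteger(floatValue);
    }

    public static void main(String[] args) {
        System.out.println(stringToInteger("42")); // StringToInteger column: 42
        System.out.println(sum(10, 3.7f));         // Sum column: 10 + 3 = 13
    }
}
```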
tDenormalize
tDenormalize Properties
• In the Basic settings of tDenormalize, define the column that contains multiple values to be
grouped.
• In this use case, the column to denormalize is Children.
• Set the Delimiter to separate the grouped values. Note that only one column can be
denormalized.
• Select the Merge same value check box, if you know that some values to be grouped are
strictly identical.
• Save your Job and press F6 to execute it.
All values from the column Children (set as column to denormalize) are grouped by their Fathers
column. Values are separated by a comma.
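The grouping that tDenormalize performs can be sketched in plain Java. The sample father/child values below are hypothetical; only the mechanism (group by key, join values with the configured delimiter) reflects the scenario:

```java
import java.util.*;

public class DenormalizeSketch {
    // Group the values of the "Children" column by the "Fathers" key,
    // joining grouped values with the configured delimiter (a comma here).
    static Map<String, String> denormalize(List<String[]> rows, String delimiter) {
        Map<String, String> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {            // row[0] = Fathers, row[1] = Children
            grouped.merge(row[0], row[1], (a, b) -> a + delimiter + b);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"Smith", "Anna"},
            new String[]{"Smith", "Bob"},
            new String[]{"Jones", "Carl"});
        // Prints {Smith=Anna,Bob, Jones=Carl}
        System.out.println(denormalize(rows, ","));
    }
}
```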
This scenario illustrates a Job denormalizing two columns from a delimited file.
• Define the Row and Field separators, the Header and other information if required.
• The file schema is made of four columns: Name, FirstName, HomeCity, and
WorkCity.
• In the tDenormalize component Basic settings, select the columns that contain the
repetition. These are the columns which are meant to occur multiple times in the document.
In this use case, FirstName, HomeCity and WorkCity are the columns against which the
denormalization is performed.
• Add as many lines to the table as you need using the plus button. Then select the relevant
columns in the drop-down list.
• In the Delimiter column, define the separator between double quotes, to split concatenated
values. For the FirstName column, type in “#”, for HomeCity, type in “§”, and for WorkCity,
type in “¤”.
• Save your Job and press F6 to execute it.
This time, the console shows the results with no duplicate instances.
tDenormalizeSortedRow
tDenormalizeSortedRow properties
This Java scenario describes a four-component Job. It aims at reading a given delimited file row by
row, sorting input data by sort type and order, denormalizing all input sorted rows and displaying
the output on the Run log console.
• Drop the following components from the Palette onto the design workspace:
tFileInputDelimited, tSortRow, tDenormalizeSortedRow, and tLogRow.
• Connect the four components using Row Main links.
• If needed, define row and field separators, header and footer, and the number of processed
rows.
• Set Schema to Built in and click the three-dot button next to Edit Schema to define the data
to pass on to the next component. The schema in this example consists of two columns, id
and name.
• Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the
tFileInputDelimited component.
• In the Criteria panel, use the plus button to add a line and set the sorting parameters for the
schema column to be processed. In this example we want to sort the id columns in ascending
order.
• In the design workspace, select tDenormalizeSortedRow.
• Click the Component tab to define the basic settings for tDenormalizeSortedRow.
• Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the
tSortRow component.
• In the Input rows count field, enter the number of input rows to be processed or press
Ctrl+Space to access the context variable list and select the variable:
tDenormalizeSortedRow_1.NB_LINE.
• In the To denormalize panel, use the plus button to add a line and set the parameters for the
column to be denormalized. In this example we want to denormalize the name column.
• In the design workspace, select tLogRow and click the Component tab to define its basic
settings. For more information about tLogRow, see tLogRow on page 1023.
• Save your Job and press F6 to execute it.
The result displayed on the console shows how the name column was denormalized.
tEmptyToNull
tEmptyToNull properties
This Perl scenario describes a three-component Job that replaces empty input fields with
undefined fields in order to generate Null values (unknown values) in the output fields.
• Drop the following components from the Palette onto the design workspace:
tFileInputDelimited, tEmptyToNull and tMysqlOutput.
• Connect the three components using Row Main links.
• In the Basic settings view of tFileInputDelimited, define in the corresponding fields the
row and field separators used in the source file.
• If needed, set Header, Footer and Limit.
In this example, set Header to 1 since the first row, which holds the column names, is to be
ignored. Footer and Limit for the number of processed rows are not set.
• In the Schema field, set schema to Built in then click the three-dot button next to the Edit
Schema field to define the data to be passed to the following component. In this example,
the source schema consists of four columns: id, first_name, last_name and login.
• In the design workspace, double-click tMysqlOutput to open its Basic Settings view where
you can define the component properties.
You can click the three-dot button next to Edit schema to check the retrieved schema.
• From the Action on data list, select the operation you want to carry out on the data. In this
example, we select Insert.
• Save your Job and press F6 to execute it.
Through your data explorer, you can check that the output database table namelist is created with
the defined columns and that the empty fields in the source file have been replaced by NULL
values (unknown values).
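The replacement rule applied by tEmptyToNull can be sketched in a few lines of Java (illustrative only, not the generated Talend code):

```java
public class EmptyToNullSketch {
    // Replace an empty field with null so that the database receives
    // an unknown value (NULL) instead of an empty string.
    static String emptyToNull(String field) {
        return (field == null || field.isEmpty()) ? null : field;
    }

    public static void main(String[] args) {
        System.out.println(emptyToNull(""));       // null
        System.out.println(emptyToNull("smith"));  // smith
    }
}
```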
tExternalSortRow
tExternalSortRow properties
Function Uses an external sort application to sort input data based on one or
several columns, by sort type and order
Purpose Helps create metrics and classification tables.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Click Sync columns to retrieve the schema from
the previous component connected in the Job.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open
Studio User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job flowcharts. Related
topic: Setting a Repository schema of Talend
Open Studio User Guide.
File Name Name of the file to be processed.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Field separator Character, string or regular expression to separate
fields.
External command “sort” path Enter the path to the external file containing
the sorting algorithm to use.
Criteria Click the plus button to add as many lines as
required for the sort to be complete. By default
the first column defined in your schema is
selected.
Schema column: Select the column label from
your schema, which the sort will be based on.
Note that the order is essential as it determines the
sorting priority.
Sort type: Numerical and Alphabetical order are
proposed. More sorting types to come.
Order: Ascending or descending order.
Advanced settings Maximum memory Type in the size of physical memory you want to
allocate to sort processing.
Temporary directory Specify the temporary directory to process the
sorting command.
Set temporary input file directory Select the check box to activate the field in
which you can specify the directory to handle
your temporary input file.
Add a dummy EOF line Select this check box when using the
tAggregateSortedRow component.
tStatCatcher Statistics Select this check box to gather the Job
processing metadata at the Job level as well as at
each component level.
Usage This component handles flows of data and therefore requires input
and output components; it is defined as an intermediary step.
Limitation n/a
Related scenario
For a related use case, see tSortRow Scenario: Sorting entries on page 1179.
tExtractDelimitedFields
tExtractDelimitedFields properties
Check each row structure against schema Select this check box to
synchronize every row against the input schema.
tStatCatcher Statistics Select this check box to gather the processing
metadata at the Job level as well as at each
component level.
Usage This component handles flows of data and therefore requires input
and output components. It allows you to extract data from a delimited
field, using a Row > Main link, and enables you to create a reject flow
filtering out data whose type does not match the defined type.
Limitation n/a
This Java scenario describes a three-component Job where the tExtractDelimitedFields component
is used to extract two columns from a comma-delimited file.
• Drop the following components from the Palette onto the design workspace:
tFileInputDelimited, tExtractDelimitedFields, and tLogRow.
• Via a right-click on each of the three components, connect them using Row Main links.
The input file used in this scenario is called test5. It is a text file that holds comma-delimited data.
• In the Basic settings view, fill in all other fields as needed. For more information, see
tFileInputDelimited properties on page 824. In this scenario, the header and the footer are
not set and there is no limit for the number of processed rows.
• Click Edit schema to describe the data structure of this input file. In this scenario, the
schema is made of one column, name.
• From the Field to split list, select the column to split, name in this scenario.
• In the Field separator field, enter the corresponding separator.
• Click Edit schema to describe the data structure of this processing component.
• In the output panel of the [Schema of tExtractDelimitedFields] dialog box, click the plus
button to add two columns for the output schema, firstname and lastname.
In this scenario, we want to split the name column into two columns in the output flow, firstname
and lastname.
• Click OK to close the [Schema of tExtractDelimitedFields] dialog box.
• In the design workspace, select tLogRow and click the Component tab to define its basic
settings. For more information, see tLogRow on page 1023.
• Save your Job and press F6 to execute it.
First names and last names are extracted and displayed in the corresponding defined columns on the
console.
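The split performed by tExtractDelimitedFields can be sketched in plain Java. The sample value and the “;” separator are assumptions for illustration; the scenario does not show the actual separator:

```java
public class ExtractDelimitedSketch {
    // Split the "name" field on the configured separator into
    // firstname and lastname, as tExtractDelimitedFields does.
    static String[] extract(String name, String separator) {
        // Pattern.quote makes the separator literal, even if it is a
        // regex metacharacter.
        return name.split(java.util.regex.Pattern.quote(separator), 2);
    }

    public static void main(String[] args) {
        String[] parts = extract("John;Doe", ";");
        System.out.println(parts[0] + " / " + parts[1]);  // John / Doe
    }
}
```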
tExtractPositionalFields
tExtractPositionalFields properties
Related scenario
tExtractRegexFields
tExtractRegexFields properties
Limitation n/a
This Java scenario describes a three-component Job where tExtractRegexFields is used to specify
a regular expression that corresponds to one column in the input data, email. The
tExtractRegexFields component is used to perform the actual regular expression matching. This
regular expression includes field identifiers for user name, domain name and Top-Level Domain
name portions in each e-mail address. If the given e-mail address is valid, the name, domain and
TLD are extracted and displayed on the console in three separate columns. Data in the other two
input columns, id and age, is extracted and routed to the destination as well.
• Drop the following components from the Palette onto the design workspace:
tFileInputDelimited, tExtractRegexFields, and tLogRow.
• Connect the three components using Row Main links.
The input file used in this scenario is called test4. It is a text file that holds three columns: id, email,
and age.
• Fill in all other fields as needed. For more information, see tFileInputDelimited properties
on page 824. In this scenario, the header and the footer are not set and there is no limit for
the number of processed rows.
• Click Edit schema to describe the data structure of this input file. In this scenario, the
schema is made of the three columns, id, email and age.
• In the design workspace, select tExtractRegexFields.
• Click the Component tab to define the basic settings for tExtractRegexFields.
• From the Field to split list, select the column to split, email in this scenario.
• In the Regex panel, enter the regular expression you want to use to perform data matching,
a Java regular expression in this scenario.
• Click Edit schema to describe the data structure of this processing component.
• In the output panel of the [Schema of tExtractRegexFields] dialog box, click the plus
button to add five columns for the output schema.
In this scenario, we want to split the input email column into three columns in the output flow, name,
domain, and tld. The two other input columns will be extracted as they are.
• Click OK to close the [Schema of tExtractRegexFields] dialog box.
• In the design workspace, select tLogRow and click the Component tab to define its basic
settings. For more information, see tLogRow on page 1023.
• Save your Job and press F6 to execute it.
The tExtractRegexFields component matches all given e-mail addresses with the defined regular
expression and extracts the name, domain, and TLD names and displays them on the console in three
separate columns. The two other columns, id and age, are extracted as they are.
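The matching can be sketched in plain Java. The exact regular expression used in the scenario is not shown in the text, so the simplified pattern below (three capturing groups for name, domain and TLD) is an assumption for illustration:

```java
import java.util.regex.*;

public class ExtractRegexSketch {
    // Simplified e-mail pattern with three capturing groups for the
    // user name, domain and top-level domain (illustrative only).
    static final Pattern EMAIL = Pattern.compile("(\\w+)@(\\w+)\\.(\\w+)");

    static String[] extract(String email) {
        Matcher m = EMAIL.matcher(email);
        if (!m.matches()) return null;  // invalid addresses are not extracted
        return new String[]{m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] parts = extract("jsmith@talend.com");
        // Prints: jsmith talend com
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
    }
}
```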
tExtractXMLFields
tExtractXMLFields belongs to two component families: Processing and XML. For more information
on tExtractXMLFields, see tExtractXMLField on page 1234.
tFilterColumns
tFilterColumns Properties
Function Makes specified changes to the schema defined, based on column name
mapping.
Purpose Helps homogenize schemas by reordering columns, removing
unwanted columns, or adding new columns.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open
Studio User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Usage This component is not startable (green background) and it requires an
output component.
Related Scenario
For more information regarding the tFilterColumns component in use, see tReplace Scenario:
multiple replacements and column filtering on page 1171.
tFilterRow
tFilterRow Properties
The following scenario filters a list of first names based on the name gender. Then using a regular
expression, the first names starting with ‘rom’ are listed.
• Drop tFileInputDelimited, tFilterRow and tLogRow from the Palette onto the design
workspace.
• Connect the tFileInputDelimited to the tFilterRow, using a Row > Main link. Then,
connect the tFilterRow to the tLogRow, using a Row > Filter link.
• Double-click tFileInputDelimited to display its Basic settings view and define its
properties.
• Click the [...] button next to the File Name field to define the path to your input file.
• Set the row and field separators in the corresponding fields. The row separator is a carriage
return and the field separator is a tabulation.
• From the Property Type list, select Built-in. The properties and schema are Built-in for this
Job. This means, the schema is not stored in the Repository.
• Click the three-dot button next to Edit schema to define the schema for the input file. In this
example, the schema is made of the following four columns: firstname, gender, language
and frequency. In the Type column, select String for the four rows.
• Click OK to validate and close the editor. A dialog box opens and asks you if you want to
propagate the schema. Click Yes.
• Click the Advanced settings tab to display the corresponding view then from the Encoding
type list, select the type according to your file.
• Double-click tFilterRow to display its Basic settings view and define its properties.
• In the Conditions table, fill in the filtering parameters based on the gender column.
• In InputColumn, select gender, in Function, select Match, in Operator, select ==.
• In the Value column, type in m between double quotes to filter only the male names.
• Then to implement the search on first names starting with the rom syllable, select the Use
advanced mode check box and type in the following regular expression that includes the
name of the column to be searched: input_row.firstname.matches("^rom.*")
• To combine both conditions (simple and advanced), select And as logical operator for this
example.
• The tLogRow component does not require any particular setting for this example.
• Save your Job and press F6 to execute it.
Only the male names starting with the rom syllable are listed on the console.
tMap
tMap properties
Usage Possible uses range from a simple reorganization of fields to the most
complex Jobs of data multiplexing or demultiplexing:
transformation, concatenation, inversion, filtering and more...
Limitation The use of tMap supposes minimum Perl or Java knowledge in
order to fully exploit its functionalities.
This component is a junction step, and for this reason cannot be a
start or end component in the Job.
For further information, see Mapping data flows in Talend Open Studio User Guide.
The Java Job described below aims at reading data from a csv file stored in the Repository, looking
up at a reference file also stored remotely, then extracting data from these two files based on defined
filters to an output file or a reject file.
• Click on File in the Palette of components, select tFileInputCSV and drop it on the design
area. Rename the Label to Cars, either by double-clicking on the label in the design
workspace or via the View tab of the Component view.
• Repeat this operation, and rename this second input component: Owners.
• Click on Processing in the Palette of components, select tMap and drop it on the design
area.
• Connect the two Input components to the mapping component and customize the row
connection labels.
• The Cars and Owners delimited files metadata are defined in the Metadata area of the
repository. You can hence use their Repository reference in the Component view.
• Double-click on Cars, to set the Component view.
• Select Repository to retrieve Property type as well as Schema type. The rest of the fields
gets automatically filled in appropriately when you select the relevant Metadata entry on the
list.
• Double-click on the Owners component and repeat the setting operation. Select the
corresponding Metadata entry.
For further information regarding Metadata creation in the Repository, see Defining Metadata items
in Talend Open Studio User Guide.
• Then double-click on the tMap component to open the Mapper. Notice that the Input area
is already filled with the Input component metadata tables and that the top table is the Main
flow table.
• Notice also that the respective row connection labels display on the top bar of the tables.
• Create a Join between the two tables on the ID_Owner field by simply dragging the top table
ID_Owner field to the lookup input table.
• Define this link as an inner join by selecting the corresponding check box.
• Click on the [+] button on the Output area of the Mapper to add three Output tables
• Drag and drop Input content to fill in the first output schema. For more information regarding
data mapping, see Mapping data flows in Talend Open Studio User Guide.
• Click on the plus arrow button to add a filter row. The Insured table will gather the cars
and owners data which include an insurance ID.
• Therefore drag the ID_Insurance field to the Filter condition area and enter the formula
meaning ‘not undefined’: owners_data.ID_Insurance != null.
• The Reject_NoInsur table is a standard reject output flow containing all data that do not
satisfy the required filter condition. Click the orange arrow to set the table as Reject Output.
• The third and last table gathers the schema entries whose Inner Join could not be established.
One Owners_ID from the Car database does not match any Owner_ID from the Owner file.
• Click the violet arrow button to set the last table as the Inner Join Reject output flow.
• Click OK to validate and come back to the design area.
• Add three tFileOutputDelimited components to the design workspace and right-click on
the tMap to connect the Mapper with all three output components, using the relevant Row
connection.
• Relabel the three output components accordingly.
• Then double-click on each of them, one after the other, in order to define their respective
output filepath. If you want a new file to be created, browse to the destination output folder,
and type in a file name including the extension.
• Select the Include header check box to reuse the column labels from the schema as header
row in the output file.
• Run the Job using the F6 keystroke or via the Run panel.
• The output files have been created if they did not already exist.
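For illustration only, the routing of rows among the three output tables of this Job can be sketched in plain Java. The Route names and the boolean/Integer parameters below are hypothetical stand-ins, not part of the generated Talend code:

```java
public class MapRejectSketch {
    enum Route { INSURED, REJECT_NO_INSUR, INNER_JOIN_REJECT }

    // ownerFound is false when the inner join on ID_Owner fails;
    // idInsurance is null when the owner record has no insurance ID.
    static Route route(boolean ownerFound, Integer idInsurance) {
        if (!ownerFound) return Route.INNER_JOIN_REJECT;       // join not established
        if (idInsurance == null) return Route.REJECT_NO_INSUR; // filter not satisfied
        return Route.INSURED;                                  // ID_Insurance != null
    }

    public static void main(String[] args) {
        System.out.println(route(true, 123));   // INSURED
        System.out.println(route(true, null));  // REJECT_NO_INSUR
        System.out.println(route(false, null)); // INNER_JOIN_REJECT
    }
}
```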
This scenario, based on scenario 1, adds one input file containing Resellers details and extra fields
in the main Output table. Two filters on Inner Joins are added to gather specific rejections.
• Click on File in the Palette of Components, and drop a tFileInputCSV component to the
design workspace.
• Connect it to the Mapper and put a label on the component and the connection.
• You could also create a metadata entry for this file description and select Repository as
properties and schema type. For further information, see Setting up a File Delimited schema
in Talend Open Studio User Guide.
• Double-click on the tMap component and notice that the schema is added on the Input area.
• Create a join between the main input flow and the resellers input. Select the Inner Join
check box to define that an Inner Join Reject output is to be created.
• Drag the fields from the Resellers table to the main Output table.
When two inner joins are defined, you either need to define two different inner join
reject tables to differentiate both rejections or if there is only one Inner Join reject
output, both Inner Join rejections will be stored in the same output.
• On the output area, click on the [+] button to add a new output table.
• Give a name to this new Output connection: Reject_ResellerID
• Click the Inner Join reject button to define this new output table as Inner Join Reject output.
• Drag & drop two fields from the main input flow (Cars) to the new reject output table. In this
case, if the Inner Join cannot be established, these data (ID_Cars & ID_resellers) will be
gathered in the output file and will help identify the bottleneck.
• Now apply filters on the two Inner Join reject outputs, in order to distinguish both types
of rejection.
• In the first Inner Join output table (Reject_OwnerID), click the plus arrow button to add a
filter line and fill it in with the following formula to gather only OwnerID-related rejection:
Owners_data.ID_Owner==null
• In the second Inner Join output table (Reject_ResellerID), repeat the same operation using
the following formula: Resellers_data.ID_Reseller==null
• For this scenario, remove from the Resellers.csv the rows corresponding to Reseller ID 5 and
8.
• Then run the Job by pressing F6 or from the Run panel.
• The four output files are all created in the defined folder (Outputs).
• Notice in the Inner Join reject output file, NoResellerID.csv, that the ID_Owners field values
matching the Reseller ID 5 and 8 were rejected from the cars file to this file.
As a third advanced use scenario, based on scenario 2, add a new Input table containing Insurance
details, for example.
Set up an Inner Join between two lookup input tables (Owners and Insurance) in the Mapper to
create a cascade lookup and hence retrieve Insurance details via the Owners table data.
This scenario introduces a Java Job using tMap to find the reseller’s customer leads who own a
defined make and have between 2 and 6 children (inclusive), for upsale purposes for
example.
• Drag and drop the following components from the Palette: tFileInputDelimited (x3),
tMap, tFileOutputDelimited (x2)
• Connect the input flow components to the tMap using a Main row connection. Pay attention
to the file you connect first as it will automatically be set as Main flow. And all other
connections will thus become Lookup flows.
• Define the Basic settings of each of the Input components. For example, define the Resellers
file path used as Main flow in the Job.
• Define the delimited file to be used, the Row and the Field Separator, the Header and Footer
rows if any.
• Edit the Schema if it hasn’t been stored in the Repository. You will retrieve this schema in
the Main table at the top of the Input area of the mapper.
• Carry out these previous steps for the other Input components: Cars and Owners. These two
Lookup flows will fill in secondary (lookup) tables in the Input area of the Mapper.
• Then double-click on the tMap component to launch the Mapper and define the mapping
and filters.
• First set the explicit joins between the Main flow and the Lookup flows.
• Simply drag & drop the ID_Resellers column towards the corresponding column and this
way fill in the Expression key field of the Lookup table.
• Implement a cascading join between the two lookup tables Cars and Owners, in order to
retrieve owners information regarding the number of children they have.
• Simply drag and drop the ID_Owners column from the Cars table towards the Expr. Key
field of the id_owner column from the Owners table.
• Click the Filter button next to the Inner Join button to display the Filter expression area.
• Type in the Filter statement to narrow down the number of rows loaded in the Lookup flows.
In this use case, the statement reads: Owners.Children_Nr > 1 &&
Owners.Children_Nr < 6
• Then, as you want to reject the null values into a separate table and exclude them from the
standard output, select the Inner Join check box for each of the filtered Lookup tables.
• In the Inner join, you can then choose to include only a Unique match, the First match or
All Matches. In this use case, the All matches option is selected. Thus if several matches
are found in the Inner Join, i.e. rows matching the explicit join as well as the filter, all of them
will be added to the output flow (either in Rejection or the regular output).
• Then on the Output area of the Mapper, add two tables, one for the full matches and one
for the rejections.
• Click the plus button to add the tables and give a name to the outputs.
The Unique match option functions as a Last match. The First match and All
matches options function as named.
• Drag and drop data from the Main and Lookup tables in the Input area, towards the
respective output tables, following the type of information you want to fetch.
• In the rejection table used to direct the non-matches from the external join or the filter, click
the Inner Join Reject button (violet arrow) to activate it.
• On the design space, right-click on the tMap and pull the respective output link to the
relevant components.
• Define the Basic settings of the Output components.
• Define the filepath, the expected Row and Field separator. And for this use case, select the
Include Header check box.
• The Schema should be automatically propagated from the Mapper.
• Save your Job, then go to the Run tab and select the Statistics check box to follow the
processing thread.
The statistics show that several matches were found and therefore the sum of the output rows (Main
+ rejected) exceeds the Main flow input rows.
This scenario is a modified version of the preceding scenario. It describes a Job that applies filters
and then checks each row of loaded lookup rows.
• Take the same Job as in Scenario 4: Advanced mapping with filters, explicit joins and Inner
join rejection on page 1147.
• No changes are required in the Input delimited files.
• Remove all explicit joins between the Main table and the Cars Lookup table.
• Notice that the All Matches setting changes automatically to All Rows. In fact, as no explicit
join is declared (no hash keys), all lookup rows need to be loaded and checked against all
main flow rows.
• Remove the Expr. key filter (“BMW”) from the Cars table.
• And click the Filters button to display the Filter area. Then type in the new filter to narrow
down the search to BMW or Mercedes car makes. The statement reads as follows:
Cars.Make.equals("BMW") || Cars.Make.equals("Mercedes")
• The filter on the Owners Lookup table doesn’t change from the previous scenario.
• Define new file paths for the respective outputs.
• Save the Job and enable the Statistics on the Run tab before executing the Job.
The Statistics show that a cartesian product has been carried out between the Main flow rows and
the filtered Lookup rows.
The content of the main output flow shows that the filtered rows have correctly been passed on,
whereas the Reject result clearly shows the rows that didn’t match one of the filters.
The following scenario describes a Job that retrieves people details from a lookup database, based
on a join on the age. The main flow source data is read from a MySQL database table called
people_age that contains people details such as numeric id, alphanumeric first name and last name
and numeric age. The people age is either 40 or 60. The number of records in this table is
intentionally restricted.
The reference or lookup information is also stored in a MySQL database table called
large_data_volume. This lookup table contains a number of records including the city where people
from the main flow have been to. For the sake of clarity, the number of records is restricted but, in
normal use, the usefulness of the feature described in the example below is more obvious for very
large reference data volumes.
To optimize performance, a database connection component is used at the beginning of the Job to
open the connection to the lookup database table, so that a new connection does not have to be
opened every time a row is loaded from the lookup table.
An Expression Filter is applied to this lookup source flow, in order to select only data from people
whose age is equal to 60 or 40. This way only the relevant rows from the lookup database table are
loaded for each row from the main flow.
Therefore this Job shows how, from a limited number of main flow rows, the lookup join can be
optimized to load only results matching the expression key.
Generally speaking, as the lookup loading is performed for each main flow row, this
option is mainly interesting when a limited number of rows is processed in the main
flow while a large number of reference rows are to be looked up.
The join is solved on the age field. Then, using the relevant loading option in the tMap component
editor, the lookup database information is loaded for each main flow incoming row.
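The per-row loading described above can be sketched as follows. The SQL text is illustrative only (the actual generated query may differ); the point is that the lookup query is re-executed for each incoming main-flow row, restricted to the current join key, instead of loading the whole large_data_volume table once:

```java
public class ReloadLookupSketch {
    // With "Reload at each row", only the lookup rows matching the current
    // main-flow row's join key (age, here) are fetched from the database.
    static String lookupQuery(int age) {
        return "SELECT city FROM large_data_volume WHERE age = " + age;
    }

    public static void main(String[] args) {
        // For each main-flow row (people aged 40 or 60), a restricted
        // query is issued rather than a full table load.
        for (int age : new int[]{40, 60}) {
            System.out.println(lookupQuery(age));
        }
    }
}
```

In real generated code the join key would be passed as a prepared-statement parameter rather than concatenated into the SQL string.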
For this Job, the metadata has been prepared for the source and connection components. For more
information on how to set up the DB connection schema metadata, see the relevant section in the
Talend Open Studio User Guide.
This Java Job is formed with five components, four database components and a mapping
component.
• Drop the DB Connection under the Metadata node of the Repository to the design
workspace. In this example, the source table is called people_age.
• Select tMySQLInput in the list that pops up when dropping the component.
• Drop the lookup DB connection table from the Metadata node to the design workspace
selecting tMysqlInput from the list that pops up. In this Job, the lookup is called
large_data_volume.
• In the same way, drop the DB connection from the Metadata node to the design workspace,
selecting tMysqlConnection from the list that pops up. This component creates a permanent
connection to the lookup database table so that a new connection does not have to be
opened every time a row is loaded from the lookup table.
• Then pick the tMap connector from the Processing family and the tMySQLOutput
connector from the Database family in the Palette to the right hand side of the editor.
• Now connect all the components together. To do so, right-click on the tMySQLInput
component corresponding to the people table and drag the link towards tMap.
• Release the link over the tMap component; the main row flow is automatically set up.
• Rename the Main row link to people, to identify more easily the main flow data.
• Perform the same operation to connect the lookup table (large_data_volume) to the tMap
component and the tMap to the tMySQLOutput component.
• A dialog box prompts you for a name for the output link. In this example, the output flow is
named: people_mixandmatch
• Rename also the lookup row connection link to large_volume, to help identify the reference
data flow.
• Connect tMysqlConnection to tMysqlInput using the trigger link OnSubjobOk.
• Then double-click the tMap component to open the graphical mapping editor.
• The Output table (that was created automatically when you linked the tMap to the
tMySQLOutput) will be formed by the matching rows from the lookup flow
(large_data_volume) and the main flow (people_age).
• Select the main flow rows that are to be passed on to the output and drag them over to paste
them in the Output table (to the right hand side of the mapping editor).
• In this example, the selection from the main flow includes the following fields: id, first_name,
last_Name and age.
• From the lookup table, the following column is selected: city.
• Drop the selected columns from the input tables (people and large_volume) to the output
table.
• Now set up the join between the main and lookup flows.
• Select the age column of the main flow table (on top) and drag it towards the age column of
the lookup flow table (large_volume in this example).
• A key icon appears next to the linked expression on the lookup table. The join is now
established.
• Now select the Reload at each row option in order for the lookup to be reloaded for each
row being processed.
• Click the green arrow to display the pop-up menu and select the relevant option.
• In this use case you also need to select the All matches option in the Lookup table, in order
to gather all instances of age matches in the output flow.
• Now implement the filtering, based on the age column, in the Lookup table. The
GlobalMapKey field is automatically created when you selected the Reload at each row
option. Indeed you can use this expression to dynamically filter the reference data in order
to load only the relevant information when joining with the main flow.
As mentioned in the introduction of the scenario, the main flow data contains only people whose age
is either 40 or 60. To avoid loading all lookup rows, including those with ages other than 40 and 60,
you can use the main flow age as a global variable to feed the lookup filtering.
• Drop the Age column from the main flow table to the Expr. field of the lookup table.
• Then, in the globalMap Key field, type in the variable name. In this
example, it reads: “people.Age”
• Click OK to save the mapping setting and go back to the design workspace.
• To finalize the implementation of the dynamic filtering of the lookup flow, you need now to
add a WHERE clause in the query of the database input.
• At the end of the Query field, following the Select statement, type in the following where
clause: WHERE AGE = " + ((Integer)globalMap.get("people.Age"))
• Make sure that the type corresponds to the column used as variable. In this use case, Age is
of Integer type. And use the variable the way you set in the globalMap key field of the
map editor.
• Then double-click the tMySQLOutput component and check that the schema
corresponds to the mapping settings. You are now ready to execute the Job.
• Click the Run view of the tab system located at the bottom of the design workspace, to
display the Job execution tab.
• Before running the Job, select the Traces check box to view the data processing progress.
• Make sure that the database types are respected (string, integer...) to avoid execution errors.
• Then press F6 or click the Run button to execute the Job.
For more comfort, you can maximize the Job design view while executing by simply
double-clicking on the Job name tab.
The lookup data is reloaded for each of the main flow’s rows, corresponding to the age constraint.
All age matches are retrieved in the lookup rows and grouped together in the output flow.
Therefore, if you check the data contained in the newly created people_mixandmatch table, you
will find all the age duplicates corresponding to different individuals whose age equals 60 or 40,
along with their city.
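The Reload at each row mechanism described above can be pictured outside Talend. The following minimal Java sketch (the class, field and method names are illustrative, not generated Talend code) reloads the lookup for each main flow row and keeps every age match, mirroring the All matches option:

```java
import java.util.ArrayList;
import java.util.List;

public class ReloadLookupSketch {
    // Illustrative stand-in for the large_data_volume lookup table.
    static class LookupRow {
        final int age; final String city;
        LookupRow(int age, String city) { this.age = age; this.city = city; }
    }

    static final List<LookupRow> LOOKUP = new ArrayList<LookupRow>();
    static {
        LOOKUP.add(new LookupRow(40, "Paris"));
        LOOKUP.add(new LookupRow(60, "Lyon"));
        LOOKUP.add(new LookupRow(25, "Nice"));
        LOOKUP.add(new LookupRow(60, "Lille"));
    }

    // Mimics the effect of the "... WHERE AGE = <globalMap value>" query:
    // only the rows matching the current main flow age are loaded.
    static List<LookupRow> loadForAge(int age) {
        List<LookupRow> matches = new ArrayList<LookupRow>();
        for (LookupRow r : LOOKUP) {
            if (r.age == age) matches.add(r);
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] mainFlowAges = {60, 40};  // the main flow only holds ages 60 and 40
        for (int age : mainFlowAges) {
            // the lookup is reloaded for each main flow row ("Reload at each row")
            for (LookupRow match : loadForAge(age)) {  // "All matches"
                System.out.println(age + " -> " + match.city);
            }
        }
    }
}
```

The point of the sketch is the cost model: the filter runs once per main flow row, which is why this option pays off when the main flow is small and the lookup table is large.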
The following scenario describes a Job processing reject flows without separating them from the
main flow.
• On the Repository tree view, click Metadata > File delimited. Drag and drop the customers
metadata onto the workspace. For more information about centralizing metadata, see
Defining Metadata items in the Talend Open Studio User Guide. The customers metadata
contains information about customers, such as their ID, their name or their address...
• A dialog box opens and asks you to choose which component type you want to use. Select
tFileInputDelimited and click OK.
• Drop the states metadata. Select the same component in the dialog box and click OK. The
states metadata contains the ID of the state, and its name.
• Drop from the Palette a tMap and two tLogRow components.
• Link the customers component to the tMap, using a Row Main link.
• Link the states component to the tMap, using a Row Main link. It will automatically be
defined as Lookup.
• Double-click the tMap component. Drag and drop the idState column to the first row in the
second table, in order to create a join.
• Clear the Die on error check box if you want to handle the execution errors. The
ErrorReject table is now created. Click the [+] button at the bottom right of the
editor four times to add four columns. Select the input columns id, idState and RegTime,
and drag and drop them to the new blank columns.
• Click the [+] button at the top right of the editor to add an output table. In the dialog box
which opens, select New output. In the field next to it, type in the name of the table, out1.
Click OK.
• Drop the four following columns from the input table to the out1 table: id, CustomerName,
CustomerAddress, and idState.
• Add two columns, RegTime and RegisterTime, with a Date format by clicking the
[+] button twice.
• Click in the first one, and press Ctrl+Space to display the autocompletion list. Select
TalendDate.parseDate. Change the pattern to ("dd/MM/yyyy
HH:mm",row1.RegTime).
• Do the same thing in the second column, but change the pattern to ("yyyy-MM-dd
HH:mm:ss.SSS",row1.RegisterTime).
• Then, click the [+] button at the top right of the editor to add an output table. In the dialog
box which opens, select Create join table from, choose out1, and name it rejectInner. Click
OK.
• Click the purple arrow to handle rejects.
• Click in the LabelState column, and name it “UNKNOWN”.
• In the RegTime column, press Ctrl+Space, and select TalendDate.parseDate.
Change the pattern to ("dd/MM/yyyy HH:mm",row1.RegTime).
• In the RegisterTime column, press Ctrl+Space, and select TalendDate.parseDate,
but change the pattern to ("yyyy-MM-dd
HH:mm:ss.SSS",row1.RegisterTime).
• If the data from row1 has a wrong pattern, it will be returned by the ErrorReject flow.
• Click OK to validate the changes and close the editor.
The Console displays the main flow and the rejects, which are both processed in the tMap
component, and united in the same output flow.
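The two date patterns used in this scenario follow the syntax of Java's SimpleDateFormat, on which date parsing in generated Jobs is based. The helper below is a stand-in for illustration only, not the actual TalendDate class; it shows how a value that does not match the pattern raises the parse error that tMap routes to the ErrorReject flow:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ParseDateSketch {
    // Illustrative stand-in for TalendDate.parseDate, same (pattern, value)
    // argument order as used in the scenario's expressions.
    static Date parseDate(String pattern, String value) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setLenient(false);   // a value with a wrong pattern raises ParseException,
        return sdf.parse(value); // which ends up in the ErrorReject flow
    }

    public static void main(String[] args) throws ParseException {
        // The two patterns set in the out1 table of this scenario:
        System.out.println(parseDate("dd/MM/yyyy HH:mm", "25/12/2009 18:30"));
        System.out.println(parseDate("yyyy-MM-dd HH:mm:ss.SSS", "2009-12-25 18:30:00.000"));
    }
}
```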
tNormalize
tNormalize Properties
This simple scenario illustrates a Job that normalizes a list of tags for Web forum topics and outputs
them into a table in the standard output console (Run tab).
• Drop the following components from the Palette to the design workspace:
tFileInputDelimited, tNormalize, tLogRow.
• The file schema is stored in the repository for ease of use. It is made of one column, called
Tags, containing rows with one or more keywords.
• Set the Row Separator and the Field Separator.
• On the tNormalize Basic settings panel, define the column the normalization operation is
based on.
• In this use case, the column to normalize is Tags.
• The Item separator is the comma, surrounded here by single quotes as the Job is done in
Perl.
• In the tLogRow component, select the Print values in the cells of table check box.
• Save the Job and press F6 to execute it.
The values are normalized and displayed in a table cell on the console.
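Outside the Studio, the normalization performed on the Tags column amounts to splitting each input row on the item separator so that every keyword becomes its own row. This is an illustrative Java sketch, not the code generated by tNormalize:

```java
import java.util.ArrayList;
import java.util.List;

public class NormalizeSketch {
    // One input row holding several comma-separated keywords becomes
    // several one-keyword rows, as tNormalize does on the Tags column.
    static List<String> normalize(String tags, String separator) {
        List<String> rows = new ArrayList<String>();
        for (String t : tags.split(separator)) {
            rows.add(t.trim());  // drop surrounding spaces around each keyword
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(normalize("java,perl,data integration", ","));
    }
}
```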
tPivotToRows
tPivotToRows properties
This Perl scenario describes a four-component Job that concatenates into a single line the
information distributed over a list of columns, using the other columns of the table as pivot.
• Drop the following components from the Palette onto the design workspace:
tFileInputDelimited, tPivotToRows and tLogRow (x2).
• Connect the four components using Row Main links.
• In the Schema field, set the schema to Built-in, then click the three-dot button next to the Edit
Schema field to define the data to be passed to the following component. In this example,
the source schema consists of the eight columns of the source file use_case_tunpivotrow.
• In the design workspace, double-click tPivotToRows to display its Basic settings view
where you can define the component properties.
• Click Sync columns to retrieve the schema from the preceding component.
You can click the three-dot button next to Edit schema to check the retrieved schema.
• Click the plus button to add in the Row keys area as many lines as the columns to
concatenate. In this example, we add five lines.
• Click in each of the lines of the Input column list and select the name of the column to be
concatenated. The input schema columns that are not selected will be used as pivot.
• In the Row key concatenate delimiter field, define a character to separate the data of the
various columns once the concatenation is completed.
• Double-click the first tLogRow component to display its Basic settings view and define its
properties.
• In the mode area, select Table to display the source file and the tPivotToRows results
together to be able to compare them.
• Do the same for the second tLogRow component.
• Save the Job and press F6 to execute it.
The console shows the results of the two tLogRow components. Table tLogRow_1 gives an outline
of the source file and table tLogRow_2 shows the concatenation of the columns id, CustomerName,
CustomerAddress, id2 and RegisterTime as well as the transformation of the columns Sum1, Sum2
and Sum3 as pivot.
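The concatenation performed by tPivotToRows can be sketched as joining the selected row-key columns of each record with the chosen delimiter; the column names below match this scenario, but the helper itself is illustrative, not generated code:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotConcatSketch {
    // Concatenates the selected "row key" columns of one record with the
    // delimiter set in the Row key concatenate delimiter field.
    static String concatKeys(Map<String, String> row, List<String> keyColumns, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (String col : keyColumns) {
            if (sb.length() > 0) sb.append(delimiter);
            sb.append(row.get(col));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<String, String>();
        row.put("id", "1");
        row.put("CustomerName", "Smith");
        row.put("Sum1", "10");  // not selected: would remain a pivot column
        System.out.println(concatKeys(row, Arrays.asList("id", "CustomerName"), ";"));
    }
}
```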
tReplace
tReplace Properties
Function Carries out a Search & Replace operation in the input columns defined.
Purpose Helps to cleanse all files before further processing.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Two read-only columns, Value and Match, are
added to the output schema automatically.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job designs. Related topic:
Setting a Repository schema of Talend Open
Studio User Guide.
Simple Mode Click Plus to add as many conditions as needed.
Search / Replace The conditions are performed one after the other
for each row.
Input column: Select the column of the schema
the search & replace is to be operated on
Search: Type in the value to search in the input
column
Replace with: Type in the substitution value.
Whole word: Select this check box if the
searched value is to be considered as whole.
Case sensitive: Select this check box to care
about the case.
Note that you cannot use regular expression in
these columns.
Use advanced mode Select this check box when the operation you
want to perform cannot be carried out through the
simple mode. In the text field, type in the regular
expression as required.
Usage This component is not startable as it requires an input flow. And it
requires an output component.
The following Job (made in Perl) searches for and replaces various typos and defects in a csv file, then
filters columns before producing a new csv file with the final output.
• Drop the following components from the Palette: tFileInputDelimited, tReplace,
tFilterColumn and tFileOutputDelimited.
• Connect the components using Main Row connections via a right-click on each component.
• Select the tFileInputDelimited component and set the input flow parameters.
• The Property type for this scenario is Built-in. Therefore the following fields are to be set
manually unlike the Properties stored centrally in the repository, that are retrieved
automatically.
• The File is a simple csv file stored locally. The Row Separator is a carriage return and the
Field Separator is a semi-colon. In this example no Header, no Footer and no Limit are
to be set.
• The file contains characters such as: \t, |||, [d] or *d which should not be interpreted
as special characters or wild card.
• The schema for this file is built in also and made of four columns of various types (string or
int).
• Now select the tReplace component to set the search & replace parameters.
• On the fifth parameter line, select amount as input column. In the Search field, type in the
dollar sign between single quotes, and in the Replace field, type in the Euro sign.
• On the last parameter line, select firstname as input column. Search for the string \t. To
differentiate it from the tabulation character, add one backslash for each level of
parsing: two backslashes prevent the backslash from being misinterpreted, and two
extra backslashes represent the backslash that is part of the searched character. In total,
four backslashes, including the one in the character itself, are typed in. Replace the matches
with nothing between single quotes (‘’), and select the Whole word check box.
• The advanced mode isn’t used in this scenario.
• Select the next component in the Job, tFilterColumn.
• The tFilterColumn component holds a schema editor that allows you to build the output schema
based on the column names of the input schema. In this use case, change the order of the
input schema columns and add three new columns to obtain the following schema: empty_field,
firstname, name, str, amount, filler1, filler2.
• Click OK to validate.
The first column is empty and the rest of the columns have been cleaned up from the parasitical
characters. The street column was moved. And the decimal delimiter has been changed from a dot
to a comma, along with the currency sign.
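Although this scenario is written in Perl, the backslash-escaping logic is the same in Java's regular expressions, where it is a common source of confusion. A minimal sketch with hypothetical data:

```java
public class ReplaceSketch {
    // Searches for the two-character literal "\t" (a backslash followed by t),
    // not a tab character. In the Java source, the regex "\\\\t" works as
    // follows: "\\\\" is the string for the two regex characters \\ , which
    // the regex engine reads as one literal backslash; it is then followed
    // by a literal t.
    static String stripLiteralBackslashT(String in) {
        return in.replaceAll("\\\\t", "");
    }

    public static void main(String[] args) {
        String firstname = "Jo\\thn";  // contains backslash + t, not a tab
        System.out.println(stripLiteralBackslashT(firstname));  // prints "John"
    }
}
```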
tSampleRow
tSampleRow properties
This Java scenario describes a three-component Job. A tRowGenerator is used to create random
entries which are directly sent to a tSampleRow where they will be filtered according to a defined
range. In this scenario, we suppose the input flow contains names of salespersons along with their
respective number of sold products and their years of presence in the enterprise. The result of the
filtering operation is displayed on the Run console.
• Drop the following components from the Palette onto the design workspace:
tRowGenerator, tSampleRow, and tLogRow.
• Connect the three components using Row Main links.
• In the Basic settings view, click RowGenerator Editor to define the data to be generated.
• In the RowGenerator Editor, specify the number of rows to be generated in the Number
of Rows for RowGenerator field and click OK. The RowGenerator Editor closes.
• Click the Component tab to define the basic settings for tSampleRow.
• In the Basic settings view, set the Schema Type to Built-In and click Sync columns to
retrieve the schema from the tRowGenerator component.
• In the Range panel, set the filter to select your rows using the correct syntax as explained.
In this scenario, we want to select the first and fifth lines along with the group of lines
between 9 and 12.
• In the design workspace, select tLogRow and click the Component tab to define its basic
settings. For more information about tLogRow, see tLogRow on page 1023.
• Save your Job and press F6 to execute it.
The filtering result displayed on the console shows the first and fifth rows and the group of rows
between 9 and 12.
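The range filter used here (rows 1 and 5 plus rows 9 through 12) can be sketched in plain Java. The positions are 1-based, as in tSampleRow, but the code below is illustrative rather than generated:

```java
import java.util.ArrayList;
import java.util.List;

public class SampleRowSketch {
    // Keeps the rows whose 1-based position matches the filter of this
    // scenario: row 1, row 5, and rows 9 to 12.
    static List<String> sample(List<String> rows) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < rows.size(); i++) {
            int pos = i + 1;  // tSampleRow row numbering starts at 1
            if (pos == 1 || pos == 5 || (pos >= 9 && pos <= 12)) {
                out.add(rows.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        for (int i = 1; i <= 15; i++) rows.add("row" + i);
        System.out.println(sample(rows));  // rows 1, 5, 9, 10, 11 and 12
    }
}
```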
tSortRow
tSortRow properties
Function Sorts input data based on one or several columns, by sort type and order
Purpose Helps creating metrics and classification table.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Click Sync columns to retrieve the schema from
the previous component connected in the Job.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open
Studio User Guide.
Repository: The schema already exists and is
stored in the Repository, hence can be reused in
various projects and job flowcharts. Related
topic: Setting a Repository schema of Talend
Open Studio User Guide.
Criteria Click + to add as many lines as required for the
sort to be complete. By default the first column
defined in your schema is selected.
Schema column: Select the column label from
your schema, which the sort will be based on.
Note that the order is essential as it determines the
sorting priority.
Sort type: Numerical and Alphabetical order are
proposed. More sorting types to come.
Order: Ascending or descending order.
Advanced settings Sort on disk Customize the memory used to temporarily store
output data.
Temp data directory path: Set the location
where the temporary files should be stored.
Create temp data directory if not exists: Select
this checkbox to create the directory if it does not
exist.
Buffer size of external sort: Type in the size of
physical memory you want to allocate to sort
processing.
This scenario describes a three-component Job. A tRowGenerator is used to create random entries
which are directly sent to a tSortRow to be ordered following a defined value entry. In this scenario,
we suppose the input flow contains names of salespersons along with their respective sales and their
years of presence in the company. The result of the sorting operation is displayed on the Run
console.
• Drop the three components required for this use case: tRowGenerator, tSortRow and
tLogRow from the Palette to the design workspace.
• Connect them together using Row main links.
• On the tRowGenerator editor, define the values to be randomly used in the Sort component.
For more information regarding the use of this particular component, see tRowGenerator on
page 1056.
• In this scenario, we want to rank each salesperson according to their Sales value and their
number of years in the company.
• Double-click on tSortRow to display the Basic settings tab panel. Set the sort priority on
the Sales value and as secondary criteria, set the number of years in the company.
• Use the plus button to add the number of criteria lines required. Set the type of sorting; in this case,
both criteria being integers, the sort is numerical. Finally, given that the output wanted is a rank
classification, set the order to descending.
• Display the Advanced Settings tab and select the Sort on disk check box to modify the
temporary memory parameters. In the Temp data directory path field, type the path to the
directory where you want to store the temporary data. In the Buffer size of external sort
field, set the maximum buffer value you want to allocate to the processing.
The default buffer value is 1000000 but the more rows and/or columns you process, the higher the value
needs to be to prevent the Job from automatically stopping. In that event, an “out of memory” error
message displays.
• Make sure you connected this flow to the output component, tLogRow, to display the result
in the job console.
• Press F6 to run the Job, or go to the Run panel and click Run. The ranking is based first on
the Sales value and second on the number of years of experience.
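The two-criteria descending numerical sort configured above corresponds to a chained comparator: compare on Sales first, and only fall back to years when the Sales values are equal. A minimal Java sketch with made-up sample data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortRowSketch {
    static class Salesperson {
        final String name; final int sales; final int years;
        Salesperson(String name, int sales, int years) {
            this.name = name; this.sales = sales; this.years = years;
        }
    }

    // Numerical sort, descending, on Sales first and years second,
    // mirroring the two criteria lines set in tSortRow.
    static void rank(List<Salesperson> people) {
        Collections.sort(people, new Comparator<Salesperson>() {
            public int compare(Salesperson a, Salesperson b) {
                if (a.sales != b.sales) return b.sales - a.sales;  // primary criterion
                return b.years - a.years;                          // secondary criterion
            }
        });
    }

    // Joins the names in ranked order, for display and checking.
    static String names(List<Salesperson> people) {
        StringBuilder sb = new StringBuilder();
        for (Salesperson s : people) {
            if (sb.length() > 0) sb.append(",");
            sb.append(s.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Salesperson> p = new ArrayList<Salesperson>(Arrays.asList(
            new Salesperson("Ann", 120, 3),
            new Salesperson("Bob", 120, 7),
            new Salesperson("Cid", 200, 1)));
        rank(p);
        System.out.println(names(p));  // prints "Cid,Bob,Ann"
    }
}
```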
This chapter details the main components that you can find in the System family of the Talend
Open Studio Palette.
The System family groups together components that help you to interact with the operating system.
System components
tRunJob
tRunJob
tRunJob Properties
Function tRunJob executes the Job called in the component’s properties, in the
frame of the context defined.
Purpose tRunJob helps master complex Job systems that need to execute
one Job after another.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and
passed on to the next component. The schema is
either built-in or remote in the Repository.
Click Edit Schema to make changes to the
schema. Note that if you make changes, the
schema automatically becomes built-in.
Built-in: You create and store the schema locally
for this component only. Related topic: Setting a
built-in schema of Talend Open Studio User
Guide.
Repository: You have already created the
schema and stored it in the Repository. You can
reuse it in various projects and job flowcharts.
Related topic: Setting a Repository schema of
Talend Open Studio User Guide
Copy Child Job Click to fetch the child Job schema.
Schema
Job Select the Job to be called in and processed. Make
sure you have already executed the called Job once
beforehand, in order to ensure a smooth run
through tRunJob.
Version Select the child Job version that you want to use.
Context If you defined contexts and variables for the Job
to be run by the tRunJob, select the applicable
context entry on the list.
Use an independent Select this check box to use an independent
process to run process to run the subjob. This helps in solving
subjob issues related to memory limits.
Die on child error Clear this check box to execute the parent Job
even though there is an error when executing the
child Job.
Transmit whole Select this check box to get all the context
context variables from the parent Job. Deselect it to get all
the context variables from the child Job.
This scenario describes a single-component Job that calls and executes another Job. The Job to be
executed reads a basic delimited file and simply displays its content on the Run log console. The
particularity of this Job is that it is executed from a separate Job and uses
a context variable to prompt for the input file to be processed.
• In the Name field, enter a name for this new context variable, File in this example.
In this example, there is no need to either select the Prompt for value check box or to set a prompt
message, as the default parameter value can be used.
• Click Finish to validate the modification and press Enter on your keyboard to make sure the
new context variable is stored in the File Name field.
• In the Basic settings view, type in the field and row separators used in the input file.
• If needed, set Header, Footer, and Limit. In this example, no header or footer are used and
no limit for the number of processed rows is set.
• Set Schema type to Built-in for this example. Click the three-dot button next to the field
name to open the schema dialog box where you can configure the schema manually.
• In the dialog box, click the plus button to add two columns and name them following the first
and second column names of your input file, username and age in this example.
If you store your schema in the Repository tree view, you only need to select the
relevant metadata entry corresponding to your input file structure.
• Double-click tLogRow to display its Basic settings view and define its properties.
• Click Sync columns to retrieve the schema of the input component and then set other options
according to your needs.
• Save your Job and press F6 to make sure that it executes without error.
Create the second Job that will be the parent Job.
• Drop a tFileList and a tRunJob from the Palette to the design workspace.
• Connect the two components together using an Iterate link.
• Double-click tFileList to open its Basic settings view and define its properties.
• In the Directory field, set the path to the directory that holds the files to be processed, or click
the three-dot button next to the field to browse to the directory. In this example, the directory
is called tRunJob and it holds three delimited files.
• In the FileList Type list, select Files.
• Select the Use Glob Expressions as Filemask check box to be able to use regular
expressions in your file masks.
• In the Files area, click the plus button to add a line where you can set the filter to apply. In
this example, we want only to retrieve delimited files “*.csv”.
• Double-click tRunJob to display its Basic settings view and define its properties.
• Click the three-dot button next to the Job field to open the [Find a Job] dialog box.
• Select the child Job you want to execute and click OK to close the dialog box. The name of
the selected Job displays in the Job field in the Basic settings view of tRunJob.
• Click Copy Child Job Schema to retrieve the schema from the child Job.
• In the Context Param area, click the plus button to add a line and define the context
parameter.
• Click in the Values cell and then press Ctrl+Space on your keyboard to access the list of
context variables.
The called-in Job reads the data contained in the input file, as defined by the input schema, and the
result of this Job is displayed directly in the Run console.
Related topic: tLoop on page 1071, and Scenario1: Buffering data (Java) on page 1036 of the
tBufferOutput component.
tSetEnv
tSetEnv Properties
The following scenario is a Perl Job that reads a column in an Oracle DB, retrieves the current date
from the column using a DB query, creates a new variable using tSetEnv to modify the date format,
and outputs date, in the modified format, on the console.
To modify the date format using a new variable created by tSetEnv:
• Drop the following components from the Palette onto the design workspace: tSetEnv,
tOracleInput, and tLogRow.
• Connect tSetEnv to tOracleInput using an OnSubjobOk link.
• Connect tOracleInput to tLogRow using a Row Main link.
• Select tSetEnv and click the Component tab to display the component view.
• In the Basic settings panel, click the plus button to add a new parameter line and define your
new variable.
• Click in the name cell and enter the syntax for your date variable. In this example, we use
NLS_DATE_FORMAT.
• Click in the value cell and enter the desired value for your new date variable.
In this example, we want to modify the date format DD-MMM-YY predefined in the system to display
as YYYY-MM-DD.
• Select tOracleInput and click the Component tab to display the component view.
• Set the Basic settings for the tOracleInput component. For more information, see
tOracleInput on page 545.
In this example, we query an Oracle database to extract the data held in the “now” column from the
“setenv” table.
• Select tLogRow and click the Component tab to display the component view.
• Set the Basic settings for the tLogRow component as needed. For more information, see
tLogRow on page 1023.
• Save your Job and press F6 to execute it.
The date displays on the console in the format YYYY-MM-DD set with the tSetEnv component.
To display the date in the above Job in the format pre-defined in the system:
• In the design workspace, right-click the tSetEnv component and select Deactivate
tSetEnv_1 from the drop-down list.
• Press F6 to execute your Job.
The date displays on the console in the format DD-MMM-YY pre-defined in the system.
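An environment variable such as NLS_DATE_FORMAT only takes effect in processes that inherit it, which is why deactivating tSetEnv restores the system default. In plain Java, the equivalent of setting a variable for a downstream process is ProcessBuilder.environment(); the sketch below assumes a Unix-like shell is available and is illustrative, not generated Talend code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class SetEnvSketch {
    // Sets a variable in a child process environment before it starts,
    // then reads back what the child sees.
    static String runWithEnv(String var, String value)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("sh", "-c", "echo $" + var);
        pb.environment().put(var, value);  // the tSetEnv step, in effect
        Process p = pb.start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line = r.readLine();
        p.waitFor();
        return line;
    }

    public static void main(String[] args) throws Exception {
        // An Oracle session inheriting this variable would use this date format.
        System.out.println(runWithEnv("NLS_DATE_FORMAT", "YYYY-MM-DD"));
    }
}
```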
tSSH
tSSH Properties
Function Returns data from a remote computer, based on the secure shell
command defined.
Purpose Allows you to establish a connection with a remote server and
return sensitive information securely.
Basic settings Host IP address
Port Listening port number
User User authentication information
Public Select the relevant option.
Key/Password In case of Public Key, make sure the key is added
to the agent or that no passphrase is required.
Password/Private Password: Type in the password required.
Key Private key: browse to the relevant key location.
Commands Type in the command for the relevant information
to be returned from the remote computer.
Use timeout/timeout Define the timeout time period. A timeout
in seconds message will be generated if the actual response
time exceeds this expected processing time.
Usage This component can be used as a standalone component.
Limitation The component use is optimized for Unix-like systems.
The following use case describes a basic Job that uses SSH command to display the hostname of the
distant server being connected to, and the current date on this remote system.
The tSSH component is sufficient for this Job. Drop it from the Palette to the design workspace.
Double-click the tSSH component and select the Basic settings view tab.
• Type in the name of the Host to be accessed through SSH as well as the Port number.
• Fill in the User identification name on the remote machine.
• Select the Authentication method on the list. For this use case, the authentication method
used is the public key.
• Thus fill in the corresponding Private key.
• In the Command field, type in the command to run. For this use case, type in
hostname; date between single quotes (as the Job is generated in Perl).
• Select the Use timeout check box and set the time before falling in error to 5 seconds.
The remote machine returns the host name and the current date and time as defined on its system.
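The tSSH setup above can be pictured as an ordinary ssh command line, where hostname; date runs as two commands in the remote shell. The sketch below only builds the command list (user, host and port are placeholders); it does not open a connection:

```java
import java.util.Arrays;
import java.util.List;

public class SshCommandSketch {
    // Builds the ssh invocation equivalent to this tSSH configuration.
    // A ProcessBuilder could then run it, assuming an ssh client and
    // a key loaded in the agent (no passphrase prompt), as the component
    // requires for public-key authentication.
    static List<String> buildCommand(String user, String host, int port, String remoteCommand) {
        return Arrays.asList("ssh", "-p", String.valueOf(port),
                             user + "@" + host, remoteCommand);
    }

    public static void main(String[] args) {
        // Placeholder user and host; "hostname; date" is the scenario's command.
        System.out.println(buildCommand("talend", "remote.example.com", 22, "hostname; date"));
    }
}
```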
tSystem
tSystem Properties
This Java scenario is a two-component Job that executes a system command and shows the results
in the Run view “console”.
• Drop a tJava and a tSystem components from the Palette to the design workspace.
• Connect the two components together using a Trigger > OnSubjobOk link between the two
components.
When executing the Job, the first component triggers the second one.
• Double-click tSystem to open the Basic settings view and display the component properties.
• In the Command field, enter the echo command followed by the string to display, “Hello
World!” in this example.
• In the Standard Output field, select to global variable to send the output to a global
variable.
• Keep the default parameters in the other fields.
• Double-click tJava to open the Basic settings view and define the component properties.
• Enter the following Java command to display the output in the console:
System.out.println("Hello World!");
• Save your Job and press F6 to execute it.
The Job executes an echo command and shows the output in the Console of the Run view using a
Println command in the tJava component.
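Capturing a command's standard output into a variable, as tSystem does with its to global variable option, can be sketched with ProcessBuilder; the variable name below is illustrative, not the actual global-variable key used by the component:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class SystemCommandSketch {
    // Runs a system command and captures the first line of its standard
    // output, the way tSystem can store its output for a later tJava.
    static String run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String output = r.readLine();
        p.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        // "captured" stands in for the global variable filled by tSystem.
        String captured = run("echo", "Hello World!");
        System.out.println(captured);  // the tJava step of the scenario
    }
}
```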
This chapter details the main components that you can find in the Talend MDM family of the
Talend Open Studio Palette.
The Talend MDM family groups together connectors that read and write master data in the MDM
hub (XML repository).
Talend MDM components
tMDMDelete
tMDMDelete
This component is available in the Palette, but you will only be able to use it if you have Talend MDM.
tMDMDelete properties
Function tMDMDelete deletes data records from specific entities in the MDM hub (XML
repository).
Purpose This component deletes master data in an MDM hub.
Basic settings Schema and Edit A schema is a row description, i.e., it defines the number of
Schema fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit Schema to modify the schema.
Logical delete Select this check box to send the master data to the Recycle
bin and fill in the Recycle bin path. Once in the Recycle bin,
the master data can be permanently deleted or restored. If you
leave this check box clear, the master data will be
permanently deleted.
Die on error Select this check box to stop the execution of the Job when
an error occurs.
Advanced settings tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Usage: Use this component to delete master data records from the entities of an MDM hub.
This scenario describes a two-component Job that deletes the specified data record from the MDM
XML repository.
• Drop tMDMInput and tMDMDelete of the Talend MDM family from the Palette onto the
design workspace.
• Connect the two components together using a Row > Main link.
• Double-click tMDMInput to display its Basic settings view and define the component
properties.
• In the Property Type list, select Built-in to complete the fields manually. If you have stored
the MDM connection information in the repository metadata, select Repository from the list
and the fields will be completed automatically.
• In the Schema Type list, select Built-in and click [...] next to Edit schema to open a dialog
box. Here you can define the structure of the master data you want to read in the MDM hub.
• The master data is collected in a four-column schema of type String: Id, Name, City and
State. Click OK to close the dialog box and proceed to the next step.
• In the URL field, enter the MDM server URL, between quotes: for example,
"http://localhost:8080/talend/TalendPort".
• In the Username and Password fields, enter your login and password to connect to the
MDM server.
• In the Version field, enter between quotes the name of the master data version you want to
access. Leave this field empty to access the default master data version.
• In the Entity field, enter between quotes the name of the business entity that holds the data
record(s) you want to read. Here, we want to access the Agency entity.
• In the Data Container field, enter between quotes the name of the data container that holds
the master data you want to read, the DStar container in this example.
• In the Operations table, define the conditions that filter the master data you want to delete,
as follows:
-Click the plus button to add a new line.
-In the Xpath column, enter between quotes the XPath of the XML node on which you want
to apply the filter. In this example, we work with the Agency entity, so enter "Agency/Id".
-In the Function column, select the function you want to use. In this scenario, we use the
Starts With function.
-In the Value column, enter the value of your filter. Here, we want to filter the master data
whose Id starts with TA.
• In the Component view, click Advanced settings to set the advanced parameters.
• In the Loop XPath query field, enter between quotes the structure and the name of the XML
node on which the loop is to be carried out.
• In the Mapping table and in the XPath query column, enter between quotes the name of the
XML tag in which you want to collect the master data, next to the corresponding output
column name.
• In the design workspace, double-click the tMDMDelete component to display the Basic
settings view and set the component properties.
• In the Schema list, select Built-in and click the three-dot button next to the Edit Schema
field to describe the structure of the master data in the MDM hub.
• Click the plus button to the right to add one column of type String. Set the name of this
column to xmlOutput. Click OK to close the dialog box and proceed to the next step.
• In the URL field, enter the URL required to connect to the MDM server, for example:
"http://localhost:8080/talend/TalendPort".
• In the Username and Password fields, enter the authentication information required to
connect to the server.
• In the Version field, enter between inverted commas the name of the master data version you
want to access. Leave the field empty if you want to access the default version.
• In the Entity field, enter the name of the business entity that holds the master data you want
to delete, the Agency entity in this example.
• In the Data Container, enter the name of the data container that holds the data to be deleted,
DStar in this example.
• In the Keys table, click the plus button to add a new line. In the Keys column, select the
column that holds the key of the Agency entity. Here, the key of the Agency entity is set on
the Id field.
If the entity has multiple keys, add as many lines as required for the keys and select
them in sequential order.
• Select the Logical delete check box if you do not want to delete the master data permanently.
This will send the deleted data to the Recycle bin. Once in the Recycle bin, the master data
can be restored or permanently deleted. If you leave this check box clear, the master data will
be permanently deleted.
• Fill in the Recycle bin path field. Here, we keep the default path; if your recycle bin is in
a different location, specify its path.
• Press Ctrl+S to save your Job and F6 to execute it.
The master data whose Id starts with "TA" has been deleted and sent to the MDM Recycle bin.
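The Starts With condition configured above maps naturally onto the XPath starts-with() function. The sketch below shows, with invented sample records and the JDK's built-in XPath API, how such a filter selects the Agency records whose Id begins with TA; in practice the selection is performed by the MDM server, not by code like this.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StartsWithFilterDemo {

    // Invented sample master data; only the Agency/Id structure
    // mirrors the scenario.
    static final String SAMPLE = "<root>"
            + "<Agency><Id>TA01</Id><Name>North</Name></Agency>"
            + "<Agency><Id>TA02</Id><Name>South</Name></Agency>"
            + "<Agency><Id>FR01</Id><Name>West</Name></Agency>"
            + "</root>";

    // Counts the records matched by a Starts With filter on Agency/Id.
    public static int matchCount(String prefix) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/root/Agency[starts-with(Id, '" + prefix + "')]",
                            doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchCount("TA") + " record(s) match");
    }
}
```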
tMDMInput
This component is available in the Palette, but you will only be able to use it if you have Talend MDM.
tMDMInput properties
Function tMDMInput reads master data in the MDM hub (XML repository).
Purpose This component reads master data in an MDM hub and thus makes it possible to
process this data.
Basic settings
Property Type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where properties are stored. The fields that follow are
completed automatically using the fetched data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields
that will be processed and passed on to the next component. The schema is either built-in or
stored remotely in the Repository.
Click Edit Schema to modify the schema.
If you modify the schema, it automatically becomes built-in.
Built-in: The schema will be created and stored for this component only. Related topic:
Setting a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the repository. You can reuse it in
various projects and Jobs. Related topic: Setting a Repository schema in the Talend Open
Studio User Guide.
URL: Type in the URL used to access the MDM server.
Username and Password: Type in the user authentication data for the MDM server.
Version: Type in the name of the master data version you want to connect to and to which you
have access rights.
Leave this field empty if you want to display the default version.
Entity: Type in the name of the business entity that holds the master data you want to read.
Data Container: Type in the name of the data container that holds the master data you want
to read.
Use multiple conditions: Select this check box to filter the master data using certain
conditions.
Xpath: Enter between quotes the path and the XML node to which you want to apply the
condition.
Function: Select the condition to be used from the list.
Value: Enter between inverted commas the value you want to use.
Predicate: Select a predicate if you use more than one condition.
If you clear this check box, you have the option of selecting particular IDs to be displayed
in the ID value column of the IDS table.
This scenario describes a two-component Job that reads master data on an MDM server. The master
data is fetched and displayed in the log console.
• From the Palette, drop tMDMInput and tLogRow onto the design workspace.
• Connect the two components together using a Row Main link.
• Double-click tMDMInput to open the Basic settings view and define the component
properties.
• In the Property Type list, select Built-In to complete the fields manually. If you have stored
the MDM connection information in the repository metadata, select Repository from the list
and the fields will be completed automatically.
• In the Schema list, select Built-In and click the three-dot button next to Edit schema to open
a dialog box. Here you can define the structure of the master data you want to read on the
MDM server.
• The master data is collected in a three-column schema of type String: ISO2Code, Name
and Currency. Click OK to close the dialog box and proceed to the next step.
• In the URL field, enter between inverted commas the URL of the MDM server.
• In the Username and Password fields, enter your login and password to connect to the
MDM server.
• In the Version field, enter between inverted commas the name of the master data version you
want to access. Leave this field empty to display the default version.
• In the Entity field, enter between inverted commas the name of the business entity that holds
the master data you want to read.
• In the Data Container field, enter between inverted commas the name of the data container
that holds the master data you want to read.
• In the Component view, click Advanced settings to set the advanced parameters.
• In the Loop XPath query field, enter between inverted commas the structure and the name
of the XML node on which the loop is to be carried out.
• In the Mapping table and in the XPath query column, enter between inverted commas the
name of the XML tag in which you want to collect the master data, next to the corresponding
output column name.
• In the design workspace, click on the tLogRow component to display the Basic settings in
the Component view and set the properties.
• Click on Edit Schema and ensure that the schema has been collected from the previous
component. If not, click Sync Columns to fetch the schema from the previous component.
• Save the Job and press F6 to run it.
The list of different countries along with their codes and currencies is displayed on the console of
the Run view.
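The Loop XPath query and Mapping settings follow a general pattern: loop over the nodes matched by the loop query, then evaluate one relative XPath expression per output column. A minimal sketch with the JDK XPath API and invented country data (reusing the scenario's ISO2Code, Name and Currency columns):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlLoopDemo {

    // Invented sample data; the element names are assumptions.
    static final String SAMPLE = "<root>"
            + "<Country><ISO2Code>FR</ISO2Code><Name>France</Name>"
            + "<Currency>EUR</Currency></Country>"
            + "<Country><ISO2Code>US</ISO2Code><Name>United States</Name>"
            + "<Currency>USD</Currency></Country></root>";

    // One String[] per output row, one cell per schema column.
    public static List<String[]> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // The Loop XPath query: one output row per matching node.
            NodeList loop = (NodeList) xp.evaluate("/root/Country", doc,
                    XPathConstants.NODESET);
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < loop.getLength(); i++) {
                Node n = loop.item(i);
                // The Mapping table: a relative XPath query per column.
                rows.add(new String[] {
                        xp.evaluate("ISO2Code", n),
                        xp.evaluate("Name", n),
                        xp.evaluate("Currency", n) });
            }
            return rows;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] row : extract(SAMPLE)) {
            System.out.println(String.join("|", row));
        }
    }
}
```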
tMDMOutput
This component is available in the Palette, but you will only be able to use it if you have Talend MDM.
tMDMOutput properties
This scenario describes a two-component Job that generates a data record, transforms it into XML
and loads it into the defined business entity in the MDM server. This Job also generates the primary
key for the data record that has been loaded.
In this example, we want to load a new agency in the Agency business entity. This new agency
should have a name, a country and a region.
• From the Palette, drop tFixedFlowInput and tMDMOutput onto the design workspace.
• Connect the components using a Row Main link.
• Double-click tFixedFlowInput to view its Basic settings in the Component tab and set the
component properties.
• In the Schema list, select Built-In and click the three-dot button next to Edit schema to open
a dialog box in which you can define the structure of the master data you want to write on
the MDM server.
• Click the plus button and add three columns of the type String. Name the columns: Name,
Country, Region.
• Click OK to validate your changes and proceed to the next step.
• In the Number of rows field, enter the number of rows you want to generate.
• In the Mode zone, select the Use Single Table option to generate just one table.
• In the Value fields, enter between inverted commas the values which correspond to each of
the schema columns.
• In the design workspace, click tMDMOutput to open its Basic settings view and set the
properties.
• In the Property Type list, select Built-In and complete the fields manually. If you have
saved the MDM connection information under Metadata in the repository, select
Repository from the list and the fields which follow will be completed automatically.
• In the Schema list, select Built-In and click on the three dot button next to the Edit Schema
field to describe the structure of the master data you want to load on the MDM server.
• Click the plus button to the right to add two output columns of the type String. Name the first
xmlOutput and the second returnkey. The output schema in this component will insert the full
XML content and extract the primary key of the inserted data.
• Select the Extended Output check box if you want to commit master data in batches. You
can specify the number of lines per batch in the Rows to commit field.
• Click the three-dot button next to Configure Xml Tree to open the tMDMOutput editor.
• In the Link target area to the right, click in the Xml Tree field and then replace rootTag with
the name of the business entity in which you want to insert the data record, Agency in this
example.
• In the Linker source area, select your three schema columns and drop them on the Agency
node.
The [Selection] dialog box displays.
• Select the Create as sub-element of target node option so that the three columns are linked
to the three XML sub-elements of the Agency node and then click OK to close the dialog
box.
• Right-click the element in the Link Target area you want to set as a loop element and select
Set as Loop Element from the contextual menu. In this example, we want Country to be
the iterating object.
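What the tMDMOutput editor builds — one XML record per input row, with each schema column mapped to a sub-element of the entity node — can be sketched with the JDK's DOM API. This is illustrative only; the element names follow the Agency example above and the sample values are invented.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlTreeDemo {

    // Turns one flat row (Name, Country, Region) into the XML record
    // inserted under the Agency entity.
    public static String toAgencyXml(String name, String country, String region) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element agency = doc.createElement("Agency");
            doc.appendChild(agency);
            // Each schema column becomes a sub-element of the entity node,
            // mirroring the "Create as sub-element of target node" option.
            String[][] fields = {{"Name", name}, {"Country", country}, {"Region", region}};
            for (String[] f : fields) {
                Element e = doc.createElement(f[0]);
                e.setTextContent(f[1]);
                agency.appendChild(e);
            }
            // Serialize the DOM tree to a string.
            StringWriter w = new StringWriter();
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(new DOMSource(doc), new StreamResult(w));
            return w.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAgencyXml("MyAgency", "France", "Alsace"));
    }
}
```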
tMDMReceive
This component is available in the Palette, but you will only be able to use it if you have Talend MDM.
tMDMReceive properties
Function tMDMReceive retrieves MDM XML data held in a context parameter and extracts data
from it.
Purpose This component retrieves a context parameter holding MDM XML data and
transforms it into a flat schema.
Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields
that will be processed and passed on to the next component. The schema is either built-in or
stored remotely in the Repository.
Click Edit Schema to modify the schema.
If you modify the schema, it automatically becomes built-in.
Built-in: The schema will be created and stored for this component only. Related topic:
Setting a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the repository. You can reuse it in
various projects and Jobs. Related topic: Setting a Repository schema in the Talend Open
Studio User Guide.
XML Record: Enter the context parameter that retrieves the last changes made on the MDM
server. For more information about creating and using a context parameter, see Defining
Contexts and variables in the Talend Open Studio User Guide.
Loop XPath query: Set the XML structure node on which the loop is based.
Mapping:
Column: Reflects the schema as defined in the Edit schema editor.
XPath query: Type in the name of the fields to extract from the input XML structure.
Get Nodes: Select this check box to retrieve the XML node together with the data.
Limit: Maximum number of rows to be processed. If Limit = 0, no row is read or processed.
Die on error: This check box is selected by default. Clear the check box to skip the rows on
error and complete the process for error-free rows. If needed, you can retrieve the rows on
error via a Row > Reject link.
Advanced settings
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Usage: Use this component as a start component. It needs an output flow.
Related scenario
This chapter details the main components that you can find in the XML family of the Talend Open
Studio Palette.
The XML family groups together the components dedicated to XML related tasks such as parsing,
validation, XML structure creation and so on.
XML components
tAdvancedFileOutputXML
tAdvancedFileOutputXML properties
Advanced settings
Split output in several files: If the XML file output is big, you can split the file every
certain number of rows.
Create directory only if not exists: This check box is selected by default. It creates a
directory to hold the output XML files if required.
Create empty element if needed: This check box is selected by default. If the Associated
Column content is null, or if no column is associated to an XML node, this option will create
an open/close tag in place of the expected tag.
Create associated XSD file: If one of the XML elements is defined as a Namespace element,
this option will create the corresponding XSD file.
To use this option, you must select Dom4J as the generation mode.
Advanced separator (for number): Select this check box to change the expected data
separator.
Thousands separator: define the thousands separator between inverted commas.
Decimal separator: define the decimal separator between inverted commas.
Generation mode: Select the appropriate generation mode according to your memory
availability:
-Fast and memory-consuming (Dom4j)
-Slow with no memory consumed
Encoding type: Select the encoding from the list or select Custom and define it manually.
This field is compulsory for DB data handling.
Don't generate empty file: Select this check box to avoid the generation of an empty file.
tStatCatcher Statistics: Select this check box to collect the log data at the Job level as well
as at each component level.
Usage: Use this component to write an XML file with data passed on from other components
using a Row link.
Limitation: n/a
To the left of the mapping interface, under Schema List, all of the columns retrieved from the
incoming data flow are listed (on the condition that an input flow is connected to the
tAdvancedFileOutputXML component).
To the right of the interface, define the XML structure you want to obtain as output.
You can easily import the XML structure or create it manually, then map the input schema
columns onto each corresponding element of the XML tree.
The XML Tree column is hence automatically filled out with the correct elements. You can
remove and insert elements or sub-elements from and to the tree:
• Select the relevant element of the tree.
• Right-click to display the contextual menu.
• Select Delete to remove the selection from the tree, or select the relevant option
among Add sub-element, Add attribute, and Add namespace to enrich the tree.
When you drop a schema column onto an XML tree element, a light blue link appears to
illustrate the mapping. If available, use the Auto-Map button, located at the bottom left of
the interface, to carry out this operation automatically.
You can disconnect any mapping on any element of the XML tree:
• Select the element of the XML tree that should be disconnected from its respective
schema column.
• Right-click to the left of the element name to display the contextual menu.
• Select Disconnect linker.
The light blue link disappears.
Loop element
The loop element allows you to define the iterating object. Generally the Loop element is
also the row generator.
To define an element as loop element:
• Select the relevant element on the XML tree.
• Right-click to the left of the element name to display the contextual menu.
• Select Set as Loop Element.
Group element
The group element is optional; it represents a constant element on which the group-by
operation can be performed. A group element can be defined only if a loop element has been
defined first.
When using a group element, the rows should be sorted so that they can be grouped by the
selected node.
To define an element as group element:
• Select the relevant element on the XML tree.
• Right-click to the left of the element name to display the contextual menu.
• Select Set as Group Element.
The Node Status column shows the newly added status, and any required group statuses are
defined automatically.
Click OK once the mapping is complete to validate the definition and continue the Job
configuration where needed.
The following scenario describes the creation of an XML file from a sorted flat file gathering a video
collection.
• Select the Property type, according to whether you stored the file description in the
Repository or not. If you dragged & dropped the component directly from the Metadata, no
changes to the setting should be needed.
• If you did not set up the file description in the Repository, select Built-in and manually
fill out the fields displayed in the Basic settings vertical tab.
• The input file contains the following type of columns separated by semi-colons: id, name,
category, year, language, director and cast.
• In this simple use case, the Cast field gathers different values and the id increments when
the movie changes.
• If needed, define the tFileDelimitedInput schema according to the file structure.
• Once you have checked that the schema of the input file meets your expectations, click OK
to validate.
• Then select the tAdvancedFileOutputXML component and click the Component
settings tab to configure the basic settings as well as the mapping. Note that double-clicking
the component will directly open the mapping interface.
• In the File Name field, browse to the file to be written if it exists or type in the path and file
name that needs to be created for the output.
• By default, the schema (file description) is automatically propagated from the input flow,
but you can edit it if needed.
• Then click on the three-dot button or double-click on the tAdvancedFileOutputXML
component on the design workspace to open the dedicated mapping editor.
• To the left of the interface, are listed the columns from the input file description.
• To the right of the interface, set the XML tree panel to reflect the expected XML structure
output.
• You can create the structure node by node. For more information about the manual creation
of an XML tree, see Defining the XML tree on page 1221.
• In this example, an XML template is used to populate the XML tree automatically.
• Right-click the root tag displayed by default and select Import XML tree at the end of
the contextual menu options.
• Browse to the XML file to be imported and click OK to validate the import operation.
You can import the XML structure from XML, XSD and STS files.
• Then drag & drop each column name from the Schema List to the matching (or relevant)
XML tree elements as described in Mapping XML data on page 1223.
• The mapping is shown as blue links between the left and right panels.
• Finally, define the node where the loop should take place. In this use case, Cast is the
changing element on which the iteration operates, so this element will be the loop element.
• Right-click on the Cast element on the XML tree, and select Set as loop element.
• To group by movie, this use case also needs a group element to be defined.
• Right-click on the Movie parent node of the XML tree, and select Set as group element.
• The newly defined node statuses show on the corresponding element lines.
• Click OK to validate the configuration.
• Press F6 to execute the Job.
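The loop/group mechanics of this scenario — input rows sorted by movie, one movie group opened whenever the id changes, one cast loop element written per row — can be sketched independently of the Studio. The element names and the string-based serialization below are invented for brevity (no XML escaping is performed):

```java
public class GroupedXmlDemo {

    // Rows are pre-sorted by movie id, as the group element requires.
    // Each row: {id, name, one cast member}. A new <movie> group opens
    // whenever the id changes; <cast> is the repeating loop element.
    public static String toXml(String[][] rows) {
        StringBuilder sb = new StringBuilder("<collection>");
        String currentId = null;
        for (String[] row : rows) {
            if (!row[0].equals(currentId)) {
                if (currentId != null) sb.append("</movie>");
                currentId = row[0];
                sb.append("<movie id=\"").append(row[0]).append("\">")
                  .append("<name>").append(row[1]).append("</name>");
            }
            sb.append("<cast>").append(row[2]).append("</cast>");
        }
        if (currentId != null) sb.append("</movie>");
        return sb.append("</collection>").toString();
    }

    public static void main(String[] args) {
        String[][] rows = {
                {"1", "Alien", "Weaver"}, {"1", "Alien", "Hurt"},
                {"2", "Brazil", "Pryce"}};
        System.out.println(toXml(rows));
    }
}
```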
tDTDValidator
tDTDValidator Properties
Function Validates the XML input file against a DTD file and sends the validation log to the
defined output.
Purpose Helps control the data and structure quality of the file to be processed.
Basic settings
Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of
fields to be processed and passed on to the next component. The schema is either built-in or
remotely stored in the Repository, but in this case the schema is read-only. It contains
standard information regarding the file validation.
DTD file: Filepath to the reference DTD file.
XML file: Filepath to the XML file to be validated.
If XML is valid, display / If XML is not valid, display: Type in the messages to be shown in
the Run console based on the result of the comparison.
Print to console: Select this check box to display the validation message.
Usage: This component can be used as a standalone component, but it is usually linked to an
output component to gather the log data.
Limitation: n/a
This scenario describes a Job that validates several files from a folder and outputs the log
information for the invalid files into a delimited file.
• Drop the following components from the Palette to the design workspace: tFileList,
tDTDValidator, tMap, tFileOutputDelimited.
• Connect tFileList to tDTDValidator with an Iterate link and the remaining
components using Main rows.
• Set the tFileList component properties to fetch an XML file from a folder.
• Change the Filemask to *.xml. Mind the quotes depending on the Perl or Java version you
are using.
• Clear the Case Sensitive check box.
• In the tDTDValidator Component view, the schema is read-only as it contains standard log
information related to the validation process.
• Set the DTD file to be used as reference.
• Press Ctrl+Space bar to access the variable list. In the XML file field, select the current
filepath global variable: $_globals{tFileList_1}{CURRENT_FILEPATH} (in Perl)
• In the various messages to display in the Run tab console, use the jobName to recall the job
name tag. Recall the filename using the relevant global variable:
$_globals{tFileList_1}{CURRENT_FILE}. Mind the Perl or Java operators such as the dot
or the plus sign to build your message.
• Select the Print to Console check box.
• In the tMap component, drag and drop the information data from the standard schema that
you want to pass on to the output file.
• Once the Output schema is defined as required, add a filter condition to only select the log
information data when the XML file is invalid.
• Follow the best practice by typing first the wanted value for the variable, then the operator
based on the type of data filtered, and then the variable that should meet the requirement. In
this case (in Java and Perl): 0 == $row1[validate]
• Then connect (if not already done) the tMap to the tFileOutputDelimited component using
a main row. Name it as relevant, in this example: errorsOnly.
• In the tFileOutputDelimited Basic settings, define the destination filepath, the field
delimiters and the encoding.
• Save your Job and press F6 to run it.
On the Run console, the defined messages are displayed for each invalid file. At the same time,
the output file is filled with the log data.
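Under the hood, this kind of DTD validation can be reproduced with the JDK's validating parser: enable validation on the DocumentBuilderFactory and collect validity errors through an ErrorHandler. A minimal sketch (the XML documents and their internal DTD subset are invented for illustration; the real component works on files instead of strings):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DtdValidateDemo {

    // Returns the list of validation error messages; an empty list
    // means the document is valid against its DTD.
    public static List<String> validate(String xmlWithDoctype) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(true); // enable DTD validation
            DocumentBuilder b = f.newDocumentBuilder();
            b.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) { errors.add(e.getMessage()); }
            });
            b.parse(new InputSource(new StringReader(xmlWithDoctype)));
        } catch (Exception e) {
            errors.add(e.getMessage()); // well-formedness failures also count
        }
        return errors;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE movie [<!ELEMENT movie (name)><!ELEMENT name (#PCDATA)>]>";
        System.out.println("valid doc errors: "
                + validate(dtd + "<movie><name>Alien</name></movie>").size());
        System.out.println("invalid doc errors: "
                + validate(dtd + "<movie><name>A</name><name>B</name></movie>").size());
    }
}
```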
tExtractXMLField
tExtractXMLField properties
Function tExtractXMLField reads an input XML field of a file or a database table and
extracts desired data.
Purpose tExtractXMLField opens an input XML field, reads the XML structured
data directly without having first to write it out to a temporary file, and finally
sends data as defined in the schema to the following component via a Row
link.
Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Properties are stored in a repository file. When this file is selected, the fields
that follow are pre-filled in using the fetched data.
Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of
fields that will be processed and passed on to the next component. The schema is either
built-in or stored remotely in the Repository.
Built-in: You create the schema and store it locally for this component only. Related topic:
Setting a built-in schema of the Talend Open Studio User Guide.
Repository: You already created the schema and stored it in the Repository, hence it can be
reused in various projects and Job flowcharts. Related topic: Setting a Repository schema of
the Talend Open Studio User Guide.
XML field: Name of the XML field to be processed.
Related topic: Defining variables from the Component view of the Talend Open Studio User
Guide.
Loop XPath query: Node of the XML tree on which the loop is based.
Mapping:
Column: Reflects the schema as defined by the Schema type field.
XPath Query: Enter the fields to be extracted from the structured input.
Get nodes: Select this check box to retrieve the XML content of all current nodes specified
in the Xpath query list, or select the check box next to specific XML nodes to retrieve only
the content of the selected nodes.
Limit: Maximum number of rows to be processed. If Limit is 0, no rows are read or processed.
Die on error: This check box is selected by default. Clear the check box to skip the rows on
error and complete the process for error-free rows. If needed, you can retrieve the rows on
error via a Row > Reject link.
Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage: This component is an intermediate component. It needs an input and an output
component.
Limitation: n/a
This three-component Java scenario describes a Job that reads the XML structure stored in the
fields of a database table and then extracts the data.
• Drop the following components from the Palette onto the design workspace: tMysqlInput,
tExtractXMLField, and tFileOutputDelimited.
• If you have already stored the input schema in the Repository tree view, select Repository
first from the Property Type list and then from the Schema list to display the [Repository
Content] dialog box where you can select the relevant metadata.
For more information about storing schema metadata in the Repository tree view, see
Setting up a DB connection and Dropping components from the Metadata node in Talend
Open Studio User Guide.
• If you have not stored the input schema locally, select Built-in in the Property Type and
Schema fields and enter the database connection and the data structure information
manually. For more information about tMysqlInput properties, see tMysqlInput on page
465.
• In the Table Name field, enter the name of the table holding the XML data, customerdetails
in this example.
• Click Guess Query to display the query corresponding to your schema.
• Double-click tExtractXMLField to display its Basic settings view and define its
properties.
• In the Property type list, select Repository if you have already stored the description of
your file in the Repository tree view. The fields that follow are filled in automatically with
the stored data.
• If not, select Built-in and fill in the fields that follow manually.
• Click Sync columns to retrieve the schema from the preceding component. You can click the
three-dot button next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
• In the Xml field list, select the column from which you want to extract the XML data. In this
example, the field holding the XML data is called CustomerDetails.
• In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve
data.
• In the Xpath query column, enter between inverted commas the node of the XML field
holding the data you want to extract, CustomerName in this example.
• Double-click tFileOutputDelimited to display its Basic settings view and define its
properties.
• In the File Name field, define or browse to the path of the output file you want to write the
extracted data in.
• Click Sync columns to retrieve the schema from the preceding component. If needed, click
the three-dot button next to Edit schema to view the schema.
• Save your Job and press F6 to execute it.
tExtractXMLField reads and extracts the client names held under the CustomerName node of the
CustomerDetails field in the defined database table.
This Java scenario describes a four-component Job that reads an XML structure from a delimited
file, outputs the main data and rejects the erroneous data.
• Drop the following components from the Palette to the design workspace:
tFileInputDelimited, tExtractXMLField, tFileOutputDelimited and tLogRow.
• Connect the first three components using Row Main links.
• Connect tExtractXMLField to tLogRow using a Row Reject link.
• Double-click tFileInputDelimited to open its Basic settings view and define the component
properties.
• Select Built-in in the Schema list and fill in the file metadata manually in the corresponding
fields.
• Click the three-dot button next to Edit schema to display a dialog box where you can define
the structure of your data.
• Click the plus button to add as many columns as needed to your data structure. In this
example, we have one column in the schema: xmlStr.
• Click OK to validate your changes and close the dialog box.
If you have already stored the schema in the Metadata folder under File delimited,
select Repository from the Schema list and click the three-dot button next to the field
to display the [Repository Content] dialog box where you can select the relevant
schema from the list. Click OK to close the dialog box and have the fields automatically filled
in with the schema metadata.
For more information about storing schema metadata in the Repository tree view, see Setting
up a File Delimited schema and Dropping components from the Metadata node in Talend
Open Studio User Guide.
• In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve
data.
• In the design workspace, double-click tFileOutputDelimited to open its Basic settings
view and display the component properties.
tExtractXMLField reads and extracts into the output delimited file, CustomerNames_right, the client
information for which the XML structure is correct, and also displays the erroneous data on the
console of the Run view.
tFileInputXML
tFileInputXML Properties
Function tFileInputXML reads an XML structured file and extracts data row by row.
Purpose Opens an XML structured file and reads it row by row, splitting each row up
into fields, then sends the fields, as defined in the Schema, to the next
component via a Row link.
Basic settings Property type Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where
Properties are stored. The following fields are
pre-filled in using fetched data.
Schema type and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: The schema will be created and stored
locally for this component only. Related topic:
Setting a built-in schema of Talend Open Studio
User Guide
Repository: The schema already exists and is stored
in the Repository, hence can be reused in various
projects and job flowcharts. Related topic: Setting a
Repository schema of Talend Open Studio User
Guide.
File Name/Input File name: Name of the file to be processed and the
Stream access path.
Stream: The data flow to be processed. If the data
has been used in a flow in the past, then it can be
fetched via the variable INPUT_STREAM. Press
Ctrl + Space to select the data flow from the
auto-completion list.
Related topic: Defining variables from the
Component view of Talend Open Studio User
Guide.
Loop XPath query Node of the tree on which the loop is based.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage tFileInputXML is for use as an entry component. It allows you to create a
flow of XML data using a Row > Main link. You can also create a rejection
flow using a Row > Reject link to filter the data which doesn’t correspond to
the type defined. For an example of how to use these two links, see Scenario
2: Extracting correct and erroneous data from an XML field in a delimited file
on page 1237.
Limitation n/a
This Java scenario describes a basic Job that reads a defined XML directory, extracts specific
information, and outputs it on the Run console via a tLogRow component.
• Drop tFileInputXML and tLogRow from the Palette to the design workspace.
• Connect both components together using a Main Row link.
• Double-click tFileInputXML to open its Basic settings view and define the component
properties.
• As the street dir file used as the input file has been previously defined in the Metadata area,
select Repository as the Property type. This way, the properties are automatically leveraged
and the rest of the property fields are filled in (apart from Schema). For more information
regarding the metadata creation wizards, see Defining Metadata items in Talend Open Studio
User Guide.
• In the same way, select the relevant schema in the Repository metadata list. Edit the schema
if you want to make any changes to the loaded schema.
• The Filename field shows the structured file to be used as input.
• In Loop XPath query, change the node of the structure on which the loop is based, if needed.
• In the Mapping table, fill in the fields to be extracted and displayed in the output.
• If the file is large, fill in a limit for the number of rows to be read.
• Enter the encoding if needed, then double-click tLogRow to define the separator
character.
• Save your Job and press F6 to execute it.
The fields defined in the input properties are extracted from the XML structure and displayed on the
console.
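The combination of the Loop XPath query, the Mapping table, and the row limit can be sketched in plain Java as follows. This is a simplified, hypothetical stand-in for the street directory file of the scenario, assuming a made-up structure with name and city elements; it is not the component's actual generated code.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class FileInputXmlSketch {

    // Mimics tFileInputXML: iterate over the loop nodes (up to the row limit)
    // and evaluate one relative XPath query per mapped schema column.
    static List<String[]> read(String xml, String loopQuery,
            String[] mapping, int limit) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList loop = (NodeList) xpath.evaluate(loopQuery, doc,
                XPathConstants.NODESET);
        List<String[]> rows = new ArrayList<String[]>();
        for (int i = 0; i < loop.getLength() && i < limit; i++) {
            String[] row = new String[mapping.length];
            for (int j = 0; j < mapping.length; j++) {
                row[j] = xpath.evaluate(mapping[j], loop.item(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical street-directory content standing in for the input file.
        String xml = "<dir>"
                + "<street><name>Main St</name><city>Paris</city></street>"
                + "<street><name>High St</name><city>London</city></street>"
                + "</dir>";
        for (String[] row : read(xml, "/dir/street",
                new String[] { "name", "city" }, 10)) {
            System.out.println(row[0] + "|" + row[1]);
        }
    }
}
```

Each entry of the mapping array plays the role of one XPath query cell in the Mapping table, and the limit argument corresponds to the Limit of rows property.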
This Java scenario describes a three-component Job that reads an XML file and:
• first, returns the correct XML data in an output XML file,
• and second, displays on the console the erroneous XML data whose type does not correspond
to the type defined in the schema.
• Drop the following components from the Palette to the design workspace: tFileInputXML,
tFileOutputXML and tLogRow.
• Right-click tFileInputXML and select Row > Main in the contextual menu and then click
tFileOutputXML to connect the components together.
• Right-click tFileInputXML and select Row > Reject in the contextual menu and then click
tLogRow to connect the components together using a reject link.
• Double-click tFileInputXML to display the Basic settings view and define the component
properties.
• In the Property Type list, select Repository and click the three-dot button next to the field
to display the [Repository Content] dialog box where you can select the metadata relative
to the input file if you have already stored it in the File xml node under the Metadata folder
of the Repository tree view. The fields that follow are automatically filled with the fetched
data. If not, select Built-in and fill in the fields that follow manually.
For more information about storing schema metadata in the Repository tree view, see Setting
up a File XML schema and Dropping components from the Metadata node in Talend Open
Studio User Guide.
• In the Schema Type list, select Repository and click the three-dot button to open the dialog
box where you can select the schema that describes the structure of the input file if you have
already stored it in the Repository tree view. If not, select Built-in and click the three-dot
button next to Edit schema to open a dialog box where you can define the schema manually.
The schema in this example consists of five columns: id, CustomerName, CustomerAddress, idState
and id2.
• Click the three-dot button next to the Filename field and browse to the XML file you want
to process.
• In the Loop XPath query, enter between inverted commas the path of the XML node on
which to loop in order to retrieve data.
In the Mapping table, Column is automatically populated with the defined schema.
• In the XPath query column, enter between inverted commas the node of the XML file that
holds the data you want to extract from the corresponding column.
• In the Limit field, enter the number of lines to be processed, the first 10 lines in this example.
• Double-click tFileOutputXML to display its Basic settings view and define the component
properties.
• Click the three-dot button next to the File Name field and browse to the output XML file you
want to collect data in, customer_data.xml in this example.
• In the Row tag field, enter between inverted commas the name you want to give to the tag
that will hold the retrieved data.
• Click Edit schema to display the schema dialog box and make sure that the schema matches
that of the preceding component. If not, click Sync columns to retrieve the schema from the
preceding component.
• Double-click tLogRow to display its Basic settings view and define the component
properties.
• Click Edit schema to open the schema dialog box and make sure that the schema matches
that of the preceding component. If not, click Sync columns to retrieve the schema of the
preceding component.
• In the Mode area, select the Vertical option.
• Save your Job and press F6 to execute it.
The output file customer_data.xml holding the correct XML data is created in the defined path and
erroneous XML data is displayed on the console of the Run view.
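The Row > Reject routing can be pictured as a per-row type check: rows whose values fit the schema types continue through the main flow, and the others are diverted to the reject flow. The sketch below assumes a hypothetical schema where the third column (idState) must parse as an integer; the row values are made up for illustration.

```java
public class RejectFlowSketch {

    // Hypothetical type check: the idState column (index 2) must be an integer,
    // as declared in the schema.
    static boolean matchesSchema(String[] row) {
        try {
            Integer.parseInt(row[2]);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[][] rows = {
            { "1", "Company A", "not-a-number" }, // wrong type: rejected
            { "2", "Company B", "23" },           // valid: main flow
        };
        for (String[] row : rows) {
            if (matchesSchema(row)) {
                System.out.println("main   : " + row[1]); // Row > Main link
            } else {
                System.out.println("reject : " + row[1]); // Row > Reject link
            }
        }
    }
}
```

In the Job itself this split is configured graphically, with the main flow going to tFileOutputXML and the reject flow to tLogRow.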
tFileOutputXML
tFileOutputXML properties
tWriteXMLField
tWriteXMLField properties
Function tWriteXMLField outputs data to defined fields of the output XML file.
Purpose tWriteXMLField reads an input XML file and extracts the structure to insert
it in defined fields of the output file.
Basic settings Output Column Select the destination field in the output component
where you want to write the XML structure.
Configure Xml Tree Opens the interface that supports the creation of the
XML structure you want to write in a field. For more
information about the interface, see Defining the
XML tree on page 1221.
Schema type and Edit A schema is a row description, i.e., it defines the
Schema number of fields that will be processed and passed on
to the next component. The schema is either built-in
or remote in the Repository.
Built-in: You create the schema and store it locally
for this component only. Related topic: Setting a
built-in schema of Talend Open Studio User Guide.
Repository: You already created the schema and
stored it in the Repository, hence can be reused in
various projects and job flowcharts. Related topic:
Setting a Repository schema of Talend Open Studio
User Guide.
Sync columns Click to synchronize the output file schema with the
input file schema. The Sync function only displays
once the Row connection is linked with the output
component.
Group by Define the aggregation set, the columns you want to
use to regroup the data.
Advanced settings Remove the xml Select this check box if you do not want to include the
declaration XML header.
Create empty element This check box is selected by default. If the Related
if needed Column in the interface that supports the creation of
the XML structure has null values, or if no column is
associated with the XML node, this option creates an
open/close tag in the expected place.
Create associated If one of the XML elements is defined as a
XSD file Namespace element, this option will create the
corresponding XSD file.
To use this option, you must select the Dom4J
generation mode.
Advanced separator Select this check box if you want to modify the
(for number) separators used by default for numbers.
Thousands separator: enter between brackets the
separators to use for thousands.
Decimal separator: enter between brackets the
separators to use for decimals.
Generation mode Select in the list the faster or slower generation mode
according to the available memory in your system:
Fast but memory-consuming - Dom4J,
Slow with no memory consumed.
Encoding Type Select the encoding type in the list or select Custom
and define it manually. This field is compulsory
when working with databases.
tStatCatcher Statistics Select this check box to gather the job processing
metadata at a job level as well as at each component
level.
Usage This component can be used as an intermediate step in a data flow.
Limitation n/a
This three-component Java scenario reads an XML file, extracts the XML structure, and
finally outputs the structure to the fields of a database table.
• Drop the following components from the Palette onto the design workspace:
tFileInputXml, tWriteXMLField, and tMysqlOutput.
• If you have already stored the input schema in the Repository tree view, select Repository
first from the Property Type list and then from the Schema list to display the [Repository
Content] dialog box where you can select the relevant metadata.
For more information about storing schema metadata in the Repository tree view, see
Setting up a File XML schema and Dropping components from the Metadata node in Talend
Open Studio User Guide.
• If you have not stored the input schema locally, select Built-in in the Property Type and
Schema fields and fill in the fields that follow manually. For more information about
tFileInputXML properties, see tFileInputXML on page 1241.
• If you have selected Built-in, click the three-dot button next to the Edit schema field to open
a dialog box where you can manually define the structure of your file.
• In the Loop XPath query field, enter the node of the structure on which the loop is based. In
this example, the loop is based on the customer node. Column in the Mapping table will be
automatically populated with the defined file content.
• In the Xpath query column, enter between inverted commas the node of the XML file that
holds the data corresponding to each of the Column fields.
• In the design workspace, click tWriteXMLField and then in the Component view, click
Basic settings to open the relevant view where you can define the component properties.
• Click the three-dot button next to the Edit schema field to open a dialog box where you can
add a line by clicking the plus button.
• Click in the line and enter the name of the output column where you want to write the XML
content, CustomerDetails in this example.
• Define the type and length in the corresponding fields, String and 255 in this example.
• Click OK to validate your output schema and close the dialog box.
• In the Basic settings view and from the Output Column list, select the column you already
defined where you want to write the XML content.
• Click the three-dot button next to Configure Xml Tree to open the interface that helps to
create the XML structure.
• If you have already stored the schema in the DB Connection node in the Repository tree
view, select Repository from the Schema list to display the [Repository Content] dialog
box where you can select the relevant metadata.
For more information about storing schema metadata in the Repository tree view, see
Setting up a DB connection and Dropping components from the Metadata node in Talend
Open Studio User Guide.
• If you have not stored the schema locally, select Built-in in the Property Type and Schema
fields and enter the database connection and data structure information manually. For more
information about tMysqlOutput properties, see tMysqlOutput on page 472.
• In the Table field, enter the name of the database table to be created, where you want to write
the extracted XML data.
• From the Action on table list, select Create table to create the defined table.
• From the Action on data list, select Insert to write the data.
• Click Sync columns to retrieve the schema from the preceding component. You can click the
three-dot button next to Edit schema to view the schema.
• Save your Job and click F6 to execute it.
tWriteXMLField fills every field of the CustomerDetails column with the XML structure of the
input file: the XML processing instruction <?xml version="1.0" encoding="ISO-8859-15"?> and
the first node, which separates each client record.
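What tWriteXMLField writes into each CustomerDetails field amounts to serializing the configured XML tree for the current row. The following hand-rolled sketch is a simplified illustration (the element names mirror the hypothetical customer tree of this scenario); it also shows the effect of the Create empty element if needed option on null values.

```java
public class WriteXmlFieldSketch {

    // Builds the XML text stored in the CustomerDetails output column
    // for one input row. Element names are hypothetical examples.
    static String toXmlField(String name, String address) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?>");
        sb.append("<customer>");
        sb.append("<CustomerName>").append(escape(name)).append("</CustomerName>");
        // "Create empty element if needed": a null value still produces
        // an open/close tag in the expected place.
        sb.append(address == null
                ? "<CustomerAddress/>"
                : "<CustomerAddress>" + escape(address) + "</CustomerAddress>");
        sb.append("</customer>");
        return sb.toString();
    }

    // Minimal XML escaping; the & entity must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(toXmlField("Company A", null));
    }
}
```

The real component builds this structure from the Configure Xml Tree interface and, depending on the Generation mode, via Dom4J or a streaming writer rather than string concatenation.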
tXSDValidator
tXSDValidator Properties
Function Validates the XML input file against an XSD file and sends the validation
log to the defined output.
Purpose Helps control the data and structure quality of the file to be processed.
Basic settings Schema type and A schema is a row description, i.e., it defines the
Edit Schema number of fields to be processed and passed on to
the next component. The schema is either built-in
or remotely stored in the Repository but in this
case, the schema is read-only. It contains standard
information regarding the file validation.
XSD file Filepath to the reference XSD file.
XML file Filepath to the XML file to be validated.
If XML is valid, Type in a message to be displayed in the Run
display console based on the result of the comparison.
If XML is not valid,
display
Print to console Select this check box to display the validation
message.
Usage This component can be used as a standalone component but it is usually
linked to an output component to gather the log data.
Limitation n/a
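The validation that tXSDValidator performs corresponds to standard JAXP schema validation. A minimal sketch, using an in-memory XSD whose content is a hypothetical example:

```java
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class XsdValidatorSketch {

    // Returns true when the XML document is valid against the XSD,
    // mirroring the valid/not valid messages of tXSDValidator.
    static boolean validate(String xml, String xsd) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator validator = factory
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Validation or parse error: the XML file is not valid.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical schema declaring a single string element.
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"customer\" type=\"xs:string\"/>"
                + "</xs:schema>";
        System.out.println(validate("<customer>ok</customer>", xsd)
                ? "The XML file is valid"
                : "The XML file is not valid");
    }
}
```

In the component, the file paths in the XSD file and XML file fields play the role of the two in-memory strings here, and the result drives which of the two configured messages is printed to the console.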
Related scenario
For related tXSDValidator use cases, see Scenario: Validating xml files on page 1231.
tXSLT
tXSLT Properties
This Java scenario describes a two-component Job that converts xml data into an html document
using an xsl stylesheet. It also defines a transformation parameter of the xsl stylesheet to change
the background color of the header of the created html document.
• Drop the tXSLT and tMsgBox components from the Palette to the design workspace.
• Double-click tXSLT to open its Basic settings view where you can define the component
properties.
• In the XML file field, set the path or browse to the xml file to be transformed. In this
example, the xml file holds a list of MP3 song titles and related information including artist
names, company etc.
• In the XSL file field in the Basic settings view, set the path or browse to the relevant xsl file.
• In the Output file field, set the path or browse to the output html file.
In this example, we want to convert the xml data into an html file holding a table heading followed
by a table listing artists’ names next to song titles.
• In the Parameters area of the Basic settings view, click the plus button to add a line where
you can define the name and value of the transformation parameter of the xsl file. In this
example, the name of the transformation parameter we want to use is bgcolor and the value
is green.
• Double-click the tMsgBox to display its Basic settings view and define its display
properties as needed.
• Save the Job and press F6 to execute it. The message box displays confirming that the output
html file is created and stored in the defined path.
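The transformation, including the bgcolor parameter, can be reproduced with the standard JAXP XSLT API. The sketch below works in memory with a simplified, hypothetical stylesheet and XML fragment standing in for the MP3 files of the scenario; tXSLT itself reads and writes the files configured in its Basic settings.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {

    // Applies the stylesheet to the XML, passing one stylesheet parameter,
    // as tXSLT does with each line of its Parameters table.
    static String transform(String xml, String xsl,
            String paramName, String paramValue) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setParameter(paramName, paramValue);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stylesheet: a bgcolor parameter with a default value,
        // overridden from the Parameters table of the component.
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:param name=\"bgcolor\">white</xsl:param>"
                + "<xsl:template match=\"/\">"
                + "<h1 bgcolor=\"{$bgcolor}\">"
                + "<xsl:value-of select=\"/catalog/title\"/></h1>"
                + "</xsl:template></xsl:stylesheet>";
        String html = transform("<catalog><title>MP3 list</title></catalog>",
                xsl, "bgcolor", "green");
        System.out.println(html);
    }
}
```

Setting the parameter from outside the stylesheet, exactly as in the component's Parameters area, overrides the stylesheet's default value of white with green.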