
Global Open Versity Vancouver Canada

Install Guide Pentaho Data Integration with MySQL v1.2

Global Open Versity Pentaho Business Intelligence BI Suite Training Manual Part II Install Guide Pentaho Data Integration (Kettle) with MySQL
Kefa Rabah Global Open Versity, Vancouver Canada krabah@globalopenversity.org www.globalopenversity.org

Table of Contents


INSTALL GUIDE PENTAHO DATA INTEGRATION (KETTLE) WITH MYSQL
Introduction
Background Information
Part 1: Starting MySQL Server
Part 2: Download & Install Pentaho Data Integration (Kettle)
Part 4: Hands-On Lab Assignment 1
References
Part 5: Need More Training on Windows Data Warehousing and BI Principles using Pentaho BI
Other Related Training
Part 6: Hands-on Labs Assignments


A GOV Open Access Technical Academic Publications Enhancing education & empowering people worldwide through eLearning in the 21st Century
April 2007, Kefa Rabah, Global Open Versity, Vancouver Canada

www.globalopenversity.org

BM301 - Data Warehousing & Business Intelligence Principles

By Kefa Rabah, krabah@globalopenversity.org Sept. 13, 2010

GTS Institute

Introduction
The Pentaho BI Project is open source application software that provides enterprise reporting, OLAP analysis, dashboards, data mining, workflow, and ETL capabilities for a Business Intelligence (BI) platform. These capabilities have made it the world's leading and most widely deployed open source BI suite. It also offers self-service dashboard design for business users and cloud computing support for IT. In Part I of this guide we showed you how to install Pentaho Business Intelligence BI Suite CE server with MySQL, Report Designer CE, and Design Studio CE on a Linux machine. It also covered how to set up Pentaho Data Integration (Kettle). In this second part of the series, we'll continue working with Pentaho Data Integration and show you how to build a simple input-output transformation using your own data source from a MySQL database. This guide assumes you have some basic knowledge of Linux and MySQL.

Background Information
Data integration focuses mainly on databases. A database is an organized collection of data. It's similar to a file system, which is an organizational structure for files so they're easy to find, access, and manipulate. Pentaho Data Integration (PDI) is a powerful, metadata-driven ETL tool designed to bridge the gap between business and IT. Kettle is a recursive acronym for "Kettle E.T.T.L. Environment." It is designed to help you with your ETTL needs, which include the Extraction, Transformation, Transportation and Loading of data. Kettle is part of the Pentaho BI applications suite. It began as an independent project initiated by Matt Casters and was acquired by Pentaho in 2006. Since then, Kettle has also been known as Pentaho Data Integration (PDI). Matt himself still leads the PDI project development at Pentaho.

Kettle comprises four applications:
Spoon - graphical designer for designing job and transformation schemes. It is based on SWT.
Pan - script used to execute a transformation scheme from a .ktr XML file or from a repository.
Kitchen - script used to execute a job scheme from a .kjb XML file or from a repository.
Carte - a simple web server used to execute jobs and transformations remotely, in cluster / parallel setups.

Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle tools Pan and Kitchen. Pan is a data transformation engine that performs a multitude of functions such as reading, manipulating, and writing data to and from various data
sources. Kitchen is a program that executes jobs designed in Spoon, stored either as XML or in a database repository. Jobs are usually scheduled in batch mode to run automatically at regular intervals. Transformations and jobs can be described in an XML file or stored in a Kettle database repository. Pan or Kitchen can then read that definition to execute the steps described in the transformation or to run the job. In summary, Pentaho Data Integration makes data warehouses easier to build, update, and maintain.

E.T.L. and Data Warehousing - being an ETL tool, Kettle is an environment that's designed to:
- collect data from a variety of sources (extraction)
- move and modify data (transport and transform) while cleansing, denormalizing, aggregating and enriching it in the process
- frequently (typically on a daily basis) store data (loading) in the final target destination, which is usually a large, dimensionally modeled database called a data warehouse
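
As a rough illustration of the Pan and Kitchen roles described above, the sketch below builds the command lines for running a transformation (.ktr) and a job (.kjb) from files. It only prints the commands rather than executing them. The install location /usr/pentaho/data-integration is the one used later in this guide; the file names under /tmp are hypothetical placeholders, and the -level=Basic logging option is just one reasonable choice.

```shell
#!/bin/sh
# Sketch only: prints Pan/Kitchen command lines instead of running them.
# PDI_HOME defaults to the install directory used in this guide.
PDI_HOME=${PDI_HOME:-/usr/pentaho/data-integration}

# Build the Pan command that runs a transformation stored in a .ktr file.
pan_cmd() {
    printf 'sh %s/pan.sh -file=%s -level=Basic\n' "$PDI_HOME" "$1"
}

# Build the Kitchen command that runs a job stored in a .kjb file.
kitchen_cmd() {
    printf 'sh %s/kitchen.sh -file=%s -level=Basic\n' "$PDI_HOME" "$1"
}

# Hypothetical file names, for illustration only.
pan_cmd /tmp/example_transformation.ktr
kitchen_cmd /tmp/example_job.kjb
```

Once PDI is installed and the files exist, a printed line can be handed to the shell to run it, for example: pan_cmd /tmp/example_transformation.ktr | sh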

Part 1: Starting MySQL Server


In this HowTo we'll assume that MySQL is already installed and configured appropriately. If not, check out an excellent MySQL install HowTo by the same author on Scribd.com. To check that the MySQL database server is installed and running, perform the following steps on the MySQL server:

1. As root user, verify that the MySQL server is running properly by entering the following command:
# ps -ef | grep mysqld
If the MySQL server is running, a process named mysqld displays in the output; otherwise enter the following command to start it:
# /etc/init.d/mysqld start
2. To have MySQL start up on boot, as root enter:
# chkconfig --level 345 mysqld on
3. IMPORTANT! Set up the MySQL database root password. Without a password, ANY user on the box can log in to MySQL as database root. The MySQL root account is separate from the machine root account and has its own password. The quotes around the password are required:
# mysqladmin -u root password 'NewRootPassword'

4. Create a Pentaho BI database user "pbiuser" and a "bankdb" database. We're going to use it later in Part 9.
5. Now let's test the login capability of the Pentaho BI user, "pbiuser":

[root@fc10ds ~]# mysql -u pbiuser -ppassword
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| bankdb             |
| jportaldb          |
| mytestdb           |
| osmsdb             |
| phpwebdb           |
| storedb            |
+--------------------+
14 rows in set (0.37 sec)

mysql>
So now we're good to go, as our "pbiuser" can log in to the MySQL server and perform all privileged operations. In the next section you will learn how to connect to a MySQL database repository and deploy your own repository.
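
Step 4 above does not list the statements it uses. One possible way to do it is sketched below: a small shell function that prints the SQL, so you can review it before sending it to the mysql client as the MySQL root user. The 'localhost' host part and the decision to grant privileges on bankdb only are assumptions, not taken from the original setup; the password "password" mirrors the login test in step 5 and should be replaced with your own.

```shell
#!/bin/sh
# Sketch only: prints the SQL for step 4; nothing is executed against MySQL.
# Arguments: database name, user name, password.
pbi_setup_sql() {
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $1;
CREATE USER '$2'@'localhost' IDENTIFIED BY '$3';
GRANT ALL PRIVILEGES ON $1.* TO '$2'@'localhost';
EOF
}

# Names taken from this guide; 'localhost' and the bankdb-only grant are assumptions.
pbi_setup_sql bankdb pbiuser password
```

To apply the statements, pipe them to the client and enter the MySQL root password when prompted: pbi_setup_sql bankdb pbiuser password | mysql -u root -p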

Part 2: Download & Install Pentaho Data Integration (Kettle)


1. To download Pentaho Data Integration Community Edition, go directly to the Pentaho SourceForge page and make sure you download the latest stable version. At the time of writing, we downloaded the pdi-ce-4.0.1-stable.tar.gz file. Save it in the /usr/pentaho directory as well.
2. To install the "pdi-ce" server, change to the /usr/pentaho directory and type the following commands:

# cd /usr/pentaho
# tar xzvf pdi-ce-4.0.1-stable.tar.gz

3. After the installation is complete, change to the data-integration directory and issue the following command to start pdi-ce:

[root@fc10ds ~]# cd /usr/pentaho/data-integration
[root@fc10ds data-integration]# sh spoon.sh

Note: you should see the Pentaho start-up page, as shown in Fig. 1.

Fig. 1

4. You'll be presented with the Repository Connection dialog box. You can enter a repository name if you have one, or you can click on the icon to connect to a data repository; see Fig. 2.

Fig. 2: Pentaho Data Integration login screen.

5. Now click on the icon to access the "Select the repository type" dialog box, as shown in Fig. 3. Select the first option as shown and then click OK.

Fig. 3: Select the repository dialog box.

6. You'll be presented with the Repository information dialog box, as shown in Fig. 4. Click on the New button at the top right-hand corner.
7. Follow the link below to access the full document.

The full document has moved to Docstoc.com. You can access and download it from here: Install Guide Pentaho BI Data Integration (Spoon) with MySQL

OR http://www.docstoc.com/docs/31451411/?key=NzFiYmMyZDgt&pass=MTY5ZS00OTdj

Pentaho BI Suite Training:


You can now register and take our superb Pentaho BI Suite self-paced training course: BM301 Data Warehousing and Business Intelligence Principles with Pentaho BI Suite

Call us today: Tel: +1-604-495-6361 Email: info@globalopenversity.org.

URL: www.globalopenversity.org

Other Related Articles & Hands-on Lab Manuals:
1. Install Guide for Pentaho Business Intelligence BI Suite CE
2. Install & Configure Apache PHP PostgreSQL & MySQL on Linux v1.1
3. Connecting Tomcat AS to MySQL and Oracle 10g XE DBs on Linux Using JDBC
4. Installing & Configuring Oracle Database10g XE on Linux CentOS5 v1.0
5. Using Webmin and Bind9 to Setup DNS Server on Linux
6. Build your own ISP Hosting using EHCP on Ubuntu 10.04 LTS Server
7. Build your Own Private Data Center Backup Solutions using Ubuntu Powered RESTORE Backup Server v1.0
8. Install & Setup Astaro Security Gateway to Protect Corporate Network v1.1

-----------------------------------------------
Kefa Rabah is the Founder of Global Technology Solutions Institute. Kefa is knowledgeable in several fields of Science & Technology, Information Security Compliance and Project Management, and Renewable Energy Systems. He is also the founder of Global Open Versity, a place to advance your education and career goals using the latest innovations and technologies.