Data Integration - Kettle

Document created by Pedro Goncalves on Aug 10, 2017. Last modified by Diogo Belem on Nov 9, 2017.


Data Integration (or Kettle) delivers powerful Extraction, Transformation, and Loading (ETL) capabilities, using a groundbreaking, metadata-driven approach.


 

 

Description

 

Reliable service architecture

With an intuitive, graphical, drag and drop design environment and a proven, scalable, standards-based architecture, Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools.

 


 


 

Frequently asked questions

When I start Spoon.bat in a Windows environment nothing happens. How can I solve it?

Edit the Spoon.bat file and:

  • In the last line, replace "start javaw" with just "java".
  • Add a "pause" on the next line.
  • Save and try again.

How to use JNDI?

If you look inside the PDI main directory you'll see a sub-directory called "simple-jndi", which contains a file named "jdbc.properties". Change this file so that the JNDI information matches what you use in your application server.

Can I sequence transformations?

This is not possible. One of the basic principles of PDI transformations is that all of the steps run in parallel, so you can't sequence them. Doing so would require architectural changes to PDI, and sequential processing would also be very slow.
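As an illustration of the JNDI answer above, here is a minimal sketch of what one entry in "jdbc.properties" can look like. It assumes the usual Simple-JNDI key layout (name/type, name/driver, name/url, name/user, name/password); the helper function jndi_entry and every value passed to it are hypothetical, so use the data source name, driver and credentials of your own application server.

```shell
#!/bin/sh
# Print one data source definition in the Simple-JNDI key layout
# (name/type, name/driver, name/url, name/user, name/password).
# jndi_entry is a hypothetical helper; all values are supplied by the caller.
jndi_entry() {
    name=$1 driver=$2 url=$3 user=$4 password=$5
    printf '%s/type=javax.sql.DataSource\n' "$name"
    printf '%s/driver=%s\n' "$name" "$driver"
    printf '%s/url=%s\n' "$name" "$url"
    printf '%s/user=%s\n' "$name" "$user"
    printf '%s/password=%s\n' "$name" "$password"
}

# Example (hypothetical name, database and credentials):
#   jndi_entry SalesDB org.postgresql.Driver \
#       jdbc:postgresql://localhost:5432/sales etl_user secret \
#       >> data-integration/simple-jndi/jdbc.properties
```

The data source name (SalesDB in the example) should match the JNDI name defined on the application server, so that the same connection works in both places.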

What's the difference between transformations and jobs?

Transformations are about moving and transforming rows from source to target. Jobs are more about high level flow control: executing transformations, sending mails on failure, ftp'ing files, ...

How to use database connections from repository?

Create a new transformation (or job) or close and re-open the ones you have loaded in Spoon.

How to do a database join with PDI?

Create a new transformation (or job) or close and re-open the ones you have loaded in Spoon.
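The answers above say that steps inside a single transformation always run in parallel, while jobs handle flow control such as executing transformations. A rough sketch of ordering whole transformations from the command line follows. It assumes Pan (pan.sh, PDI's command-line transformation runner) sits in the current directory; the function name run_in_sequence and the .ktr file names are hypothetical.

```shell
#!/bin/sh
# Run several transformations strictly one after another.
# RUNNER defaults to ./pan.sh; override it if Pan lives elsewhere.
RUNNER=${RUNNER:-./pan.sh}

# run_in_sequence FILE.ktr [FILE.ktr ...]
run_in_sequence() {
    for ktr in "$@"; do
        echo "running $ktr"
        # Pan exits with a non-zero status when a transformation fails,
        # so stop here instead of starting the next one.
        "$RUNNER" "-file=$ktr" || {
            echo "failed: $ktr" >&2
            return 1
        }
    done
}

# Example (hypothetical files):
#   run_in_sequence load_staging.ktr build_facts.ktr
```

Inside PDI itself the same ordering is normally modelled as a job, which Kitchen can run with, for example, ./kitchen.sh -file=nightly.kjb (file name again hypothetical).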

 


 

Main concepts

 

Improved system performance monitoring

Minor bug fixes to the PDI-specific portions of the Pentaho.

Data profiling enhancements

Data Profiling Perspective includes DataCleaner: Analyze Tables and Columns in preparation for ETL.

Easily add new plugins

The PDI Marketplace makes it possible to share and download new plugins.

Deliver data from multiple data sources

The Data Services and Kettle JDBC driver enable you to deliver data from multiple data sources, while enriching, cleansing, and transforming the data.

Data movement load balancing

PDI provides load balancing of data within transformations and over multiple cluster nodes when using transformation clustering.

Revert changes in job database transactions

Database connections can be used with all jobs. This enables commits and rollbacks on a job level. Prior to this release, this was only possible with transformations.

 


 

Downloads

 

Data Integration 7.1

Pentaho’s Data Integration, also known as Kettle, delivers powerful extraction, transformation, and loading (ETL) capabilities.

7.1 Stable

 


 


 

How to get PDI up and running

 

Linux

 

Ubuntu 12.04 and later:

  • The libwebkitgtk package needs to be installed. This can be done by running
    apt-get install libwebkitgtk-1.0-0
  • Unzip the downloaded file and run spoon.sh, which should be under /data-integration.
  • On some installations of Ubuntu 14.04, Unity doesn't display the menu bar. To work around that, spoon.sh disables the menu integration with the setting
    export UBUNTU_MENUPROXY=0
    You can remove that setting to see whether the menu bar works properly on your machine.

 

CentOS 6 Desktop:

  • The libwebkitgtk package needs to be installed. This can be done by running
    yum install libwebkitgtk
  • Unzip the downloaded file and run spoon.sh, which should be under /data-integration.
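The launch step shared by both Linux variants can be wrapped in a small helper. This is only a sketch: the function name launch_spoon is hypothetical, and the directory in the usage comment is a placeholder for wherever you unzipped the download.

```shell
#!/bin/sh
# Start Spoon from an unpacked PDI directory, with a clear message when
# spoon.sh is missing or not executable.
# launch_spoon PDI_DIR
launch_spoon() {
    pdi_dir=$1
    if [ ! -x "$pdi_dir/spoon.sh" ]; then
        echo "no executable spoon.sh in $pdi_dir" >&2
        return 1
    fi
    # Use a subshell so the caller's working directory is unchanged;
    # the script is started from its own directory.
    ( cd "$pdi_dir" && ./spoon.sh )
}

# Example (placeholder path):
#   launch_spoon "$HOME/data-integration"
```

If the helper reports that spoon.sh is not executable, running chmod +x on it after unzipping is the usual remedy.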

 

 

Windows

After unzipping the downloaded file, you can launch Spoon by navigating to the folder /data-integration and double-clicking Spoon.bat.

If you are using Infobright, copy the file that matches your system to a directory on your Windows system path (for example %WINDIR%/System32/):

  • libswt/win32/infobright_jni_64bit.dll (Windows 64-bit)
  • libswt/win32/infobright_jni.dll (Windows 32-bit)

Rename the copied file to infobright_jni.dll if it has a different name, then run Spoon.bat to launch Spoon.

 

 

Mac OS

After unzipping the downloaded file, you can launch Spoon by navigating to the folder /data-integration and double clicking on the "Data Integration" application icon.

 


 
