One of the best features of Pentaho Data Integration is the ability to create your own transformation step, also known as a “Pentaho Data Integration plug-in”.

For example, I want to collect Twitter messages containing specific keywords so I can analyze them afterwards. Pentaho itself does not have a standard plugin for this data input, so I created a plugin called “Twitter Search”, which retrieves all messages containing the specified search terms. With it I am building a database of all matching messages since the start of this project.

Pentaho Data Integration Plugin for Twitter

With this data I now have access to an extra data source that I can use in my analytics environment, giving me extra insights and information on the chosen search terms.

The only real requirement for creating such a custom plugin is the availability of a Java API that can connect to the data source, retrieve the data, and transform it into a readable format that Pentaho can use in its ETL flow.
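To make that requirement a bit more concrete, here is a minimal sketch in plain Java of the "retrieve and transform" part. It does not use any Pentaho or Twitter classes: `MessageSource`, `Message`, `FIELD_NAMES` and `toRows` are hypothetical names I made up for illustration, and the only assumption about Pentaho is that a step hands data on as rows of field values. The actual plugin would do this work inside a PDI step and call a real client library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TweetRowSketch {

    /** Hypothetical message type; a real client library would supply its own. */
    static final class Message {
        final long id;
        final String author;
        final String text;

        Message(long id, String author, String text) {
            this.id = id;
            this.author = author;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for the Java API that talks to the data source. */
    interface MessageSource {
        List<Message> search(String searchTerm);
    }

    /** Field names, in the same order as the values in each row. */
    static final String[] FIELD_NAMES = {"id", "author", "text", "search_term"};

    /** Retrieves the messages for one search term and flattens each into a row. */
    static List<Object[]> toRows(MessageSource source, String searchTerm) {
        List<Object[]> rows = new ArrayList<>();
        for (Message m : source.search(searchTerm)) {
            rows.add(new Object[] {m.id, m.author, m.text, searchTerm});
        }
        return rows;
    }

    public static void main(String[] args) {
        // A fake source so the sketch runs without network access.
        MessageSource fake = term ->
                Collections.singletonList(new Message(1L, "bram", "Trying out " + term));

        System.out.println(Arrays.toString(FIELD_NAMES));
        for (Object[] row : toRows(fake, "pentaho")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Keeping the data source behind a small interface like this makes it easy to test the row conversion with a fake source before wiring it into a real step.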

Pentaho Data Integration is part of the Pentaho BI Suite. This suite contains everything you need for a business intelligence project: data integration, reporting, analysis and data mining. Check their site for more information.

Feel free to contact me if you want more information regarding this subject.


  1. This is a nice start Bram!

  2. Tellervo Warelius

    Good post! This is the kind of information that should be distributed on the online community. I would like to read more of this.

  3. A cool blog post there mate . Thanks for that .