Help & Support



Review of Pentaho Data Integration and Analytics for use in direct marketing organizations

Try it now

About Pentaho

Pentaho Data Integration and Analytics is an open source software solution designed to handle data management tasks often required by many database marketing organizations. Pentaho, as we are examining it here, can be divided into two tools. They can be described as follows:

Data Integration

The Data Integration tool is a standalone ETL application which enables the creation of data pipelines to access, prepare, blend data, and then to eventually load the various sources into a database. It has the capability to load data from multiple sources, ranging from basic CSV files, to various databases and metadata formats.


Data can be transformed with a wide range of built in functions. Beyond basic input/output, validation and mapping functions, Pentaho provides a suite of tools that one would expect from a full-fledged ETL suite, including a wide range of data-mining and statistical analysis features. It has full capability to handle Hadoop map reduce functionality with a minimal level of coding skill required.

Business Analytics

Pentaho also comes packaged with an easy to use high-level business analytics application server, which enables the creation of various analytics tools for quick analysis. By dragging and dropping data from spreadsheets or through database queries, users can quickly create a wide range of charts and graphs, which can be easily configured into interactive dashboards for tracking of direct marketing data, customers, campaigns, and more.

The Business Analytics server is easy to configure from an administrative perspective to allow creation of multiple user accounts with varying levels of access. It also includes a scheduling tool which can refresh data according to a number of specified parameters.


To demonstrate how this might be used for a direct marketing organization, we'll walk through a workflow that might make sense from a business perspective, from data input to reporting.

Data Integration

Pentaho's Data Integration module allows users to extract data from a number of data sources, transform it so that it is in a standardized format, and load this information into a central database to make it possible for future analysis.

Loading customer data from a CSV file is accomplished by dragging a data input node onto the canvas and loading the file. The data can be previewed, and then imported. Storing this into a database simply requires dragging and dropping a “table output” node, specifying a target database, and connecting the two nodes. By defining a table in the database, this will automatically take any of the information from the input file and generate a SQL query:




The transformation created will consist of only two nodes and a connection.




Typically with customer contact data, it's quite common that a significant amount of necessary data may be missing from the original file. It is relatively simple to create a basic workflow which will examine the data from the original file and identify those fields that are missing, such as postal zip codes, and combine them with a different file for lookup purposes.

The workflow below shows connecting two files: the first one is the original csv file, and a simple conditional check for existence of a postal code. If the condition succeeds, it writes those rows to the database. For those that are missing this data, it checks a second table which provides a basic mapping table showing association of address data with postal codes. We can check to see if those postal codes can be determined; then these two pieces of data can be merged and separately loaded into the database.



While this is a relatively simple demonstration, as it uses two similar data source types, Pentaho integrates well with numerous sources, ranging from the aforementioned CSV files as well as multiple relational and non-relational databases, and APIs. It boasts a simple interface for performing some more complicated functions, such as enabling users to be able to filter large amounts of parallel data, with some built-in Hadoop mapreduce functionality.

Business Analytics

In order to gain solid business intelligence out of marketing or sales data, Pentaho provides an easy to use Business Analytics platform.



This tool can be used either directly with data transformed and loaded by the Data Integration module, or with independent datasets. The interface for this business intelligence module is designed specifically with business users in mind; very little technical knowledge is required beyond standard business analysis.

Loading data from a CSV file or from a SQL query is driven by wizards, which allow previewing data, and also allows managing data types and field formatting on the fly:



Data Analysis

Pentaho's Business Analytics tool provides some helpful graphical analysis tools. For instance, if working with multi-dimensional datasets, creating interactive tables is no more complicated than creating pivot tables in spreadsheet software. It provides functionality which allows drilling down to individual data segments, such as sales per year and region:



The resulting summarized data can easily be converted into a graph with only a few clicks.





Each of these individual reports, can be saved as widgets and can be used to quickly create interactive dashboards:






If you data is being regularly updated into a database through a datastream, it's easy to trigger queries off of a schedule, keeping dashboards up to date based on user-specified criteria.

Summary: Key takeaway

Pentaho can be a powerful tool for handling marketing data activities. It is user-friendly and has an appealing look and feel, and can be quite powerful for data access and transformation. It boasts many tools which make data manipulation relatively painless, and it will work well for complex analytics and use cases.

Its drawbacks are mainly tied to the smallness of its community. Some features require a little more technical knowledge than all companies may have on-hand. Other problems have to do with the slowness of its graphical rendering.

Overall, from the point of view of a marketing organization, Pentaho makes a strong showing among its competitors.


Input Types

  • JDBC
  • JNDI
  • Kettle (Pentaho Data Integration)
  • Simple SQL (JDBC Custom)
  • Pentaho Metadata
  • Mondrian MDX
  • OLAP4J
  • XML
  • Simple table
  • Scripting data sources (JavaScript, Python, TCL, Groovy, BeanShell)
  • Java method invocation
  • Jackrabbit
  • MongoDB
  • Community Data Access (CDA)

Output formats

  • PDF
  • HTML
  • Excel
  • CSV
  • RTF
  • XML
  • Plain text

Xperra Star Ratings

Overall functionality useful to a direct marketer
4.5 /5

The uses for Pentaho from a marketer's perspective is patently clear from the outset. By being able to load and transform data with little required expertise, other than a basic understanding of the data itself, it makes it easy to get up and running. Particularly useful are the reporting features, which enable entire teams to be able to create personalized dashboards in order to keep track of customer responses within campaigns, as well as maintaining a customer base . The ease of use provides little barrier to entry.

Intuitive User Experience
3.5 /5

While the interface itself is easy to understand, particularly for the Business Analytics functions, getting it up and running was a bit of an ordeal. In order to run the trial locally on a Windows machine requires the use of the Java application service, which immediately uses up a tremendous amount of active memory. If one does not have a powerful computer, it will run rather slow, especially compared to other tools which make use of an Apache server requiring a much smaller footprint. That said, from an enterprise perspective, it could work well.

Once up and running, Pentaho is remarkably easy to use and get started, and most features were fairly self-explanatory. One drawback is the slowness of the graphical rendering, particularly for some dashboards. While this could be a function of the power of the computer running it, it appeared to have slightly slower performance than comparative tools of this nature.

Active Support Community
2 /5

Information about the Pentaho community is a bit elusive. Access to the support community requires a purchase of the commercial software. The Github repository does not track issues. Pentaho has a tag on Stackoverflow. It has approximately 3200 mentions, of which approximately half remain unresolved.

However there do appear to be many third-party online tutorials, suggesting that the tool is popular and is used.

Pentaho official forum

(Data Integration)


Contributors: 170
Releases: 256









Minimal Technical Skill Required
4 /5

Data Integration: 3.0
Business Analytics: 5.0

The two tools listed here need to be treated separately.

While using the data integration at a basic level is relatively easy to understand, to make use of the more powerful features requires some understanding of data science. Pentaho earns a strong score for creating simple interfaces, however understanding what, for instance, map reduce is, or understanding how to handle Mondrian coding would help one make use of its rich array of features. For small marketing firms with limited in-house technical resources, using the data integration tool can be somewhat overwhelming.

The Business Analytics dashboard tool is refreshingly easy to use. Everything from loading files to generating reports or creating dashboards can be handled by someone with basic spreadsheet skills.

Related Articles

Who uses Pentaho?

Who uses Pentaho?

What is Pentaho BI?

What is Pentaho BI?

Pentaho vs Tableau: Two popular visualization and dashboarding tools

Pentaho vs Tableau:...

Related Experts

Data Architects

Data Architects

Project Manager

Project Manager

Database Administrator

Database Administrator

Data Quality Analyst

Data Quality Analyst

Data Engineer

Data Engineer

Related Solutions

Capture Actionable Data From Anywhere

Capture Actionable Data From Anywhere

Tune Your Data For Peak Performance

Tune Your Data For Peak Performance

Profile Your Best Customers

Profile Your Best Customers

Other Tools

WSO2 Stream Processor
Data Ingestion and Pipeline Management Commercial

WSO2 Stream Processor

WSO2 Stream Processor is a cloud-based ingestion and processing system. It is designed to capture, process and analyze big data in real time.

Data Ingestion and Pipeline Management FREE Open Source


Dremio is particularly useful for how it captures your data lineage. Data lineage is the record of the life cycle of data which includes data origin, data...

Talend Data Preparation
Data ETL & Data Wrangling FREE Open Source

Talend Data Preparation

Unlike Talend Open Studio, Data Preparation Free is not a complete ETL tool, it provides some useful tools which can assist with the data preparation process.