
Solution: Data Ingestion & Transformation

Capture Actionable Data From Anywhere

The following story — which really did happen — relates an analog experience that helps us understand what we mean by customer signals and the value of being able to act on them. 

In the days before shopping for a car online became common, a local community college president preferred to car shop on Sundays. Car dealerships in that county were closed on Sundays, so he could look at the cars (and their sticker prices) without having to deal with a salesperson. 

He preferred one dealership in particular, because that dealer was on the school’s board and the two had become friends. He wanted to do his shopping incognito, so he parked behind the bank next door instead of on the dealership lot. There were two things the college president didn’t know at the time:

  1. The dealer liked to work on Sunday afternoons because there were no interruptions.

  2. The dealer’s office window faced the parking lot behind the bank next door.

So, every time the dealer saw the college president’s car parked behind the bank on Sunday afternoon, he knew it was just a matter of time before he’d have a used car sale. The information wasn’t actionable because the dealer didn’t want to embarrass his friend and customer by blowing his cover. Besides, the dealer felt confident that he’d get the sale at some point.

When the college president retired, a friendly roast was held in his honor, and the dealer was one of the roasters. Now, the secret data was actionable. 

The dealer roasted his friend by revealing how he had known all those years whenever the college president was looking for a used car upgrade. The attendees were highly entertained, and the subject of the roast was appropriately embarrassed and amused. Then came the clincher. The dealer described the kind of new car the retiring president should be shopping for, and gifted him — not the car, but a photograph of that car. 

One month later, the then-retired college president purchased the new car shown in the photograph.

Notice that every time the dealer observed his friend’s car parked behind the bank building, it was a piece of data and a signal of desire and intent. But only when that data became actionable could the dealer maximize its value.

Today, most businesses with more than 5,000 customer records have more data stored than they know what to do with. Hidden in that data are signals that reveal customers’ desires and intentions. Bringing those signals to the surface and making the data actionable is the purpose of data integration.

What Is Data Integration?

Data integration is a business-driven technological solution to the problem of having lots of data that is not actionable. Why isn’t it actionable? Because there are numerous sources of data, all owned by different parts of the organization. Each source file can use a different format, which makes data from different sources incompatible with each other. 

A partial list of potential data sources includes:

  • Point of sale system

  • Web traffic

  • Web sales

  • Outbound call center

  • Customer service

  • Product information

  • Direct mail

  • Sales partners

  • Data sensors

  • CRM

This list could be much longer, but you get the point. Just because you have collected all of this data doesn’t mean it can work together.

Data integration makes data actionable. The most common method of integration is called ETL, and it works in this order:

  1. Extract the data from the original source file

  2. Transform it into one common format

  3. Load the transformed data into a new target file from which all the data is accessible, regardless of its source

The resulting target file builds an accessible 360° view of the data that is actionable for business intelligence and analysis activities such as real-time reporting, data mining and predictive analytics.
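
To make the Extract, Transform, Load order concrete, here is a minimal sketch in Python. The file names and column layouts are hypothetical, and a real ETL tool would handle far more formats and rules; the point is only the shape of the three steps.

```python
import csv
import json

# Extract: read two hypothetical source files in different formats.
with open("pos_sales.csv", newline="") as f:
    pos_rows = list(csv.DictReader(f))   # rows like {"cust": "...", "amt": "..."}
with open("web_sales.json") as f:
    web_rows = json.load(f)              # rows like {"customer": "...", "amount": ...}

# Transform: conform both sources to one common record format.
def conform(row, cust_key, amt_key):
    return {"customer": str(row[cust_key]).strip().lower(),
            "amount": float(row[amt_key])}

records = ([conform(r, "cust", "amt") for r in pos_rows]
           + [conform(r, "customer", "amount") for r in web_rows])

# Load: write the conformed records to a single target file.
with open("sales_target.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["customer", "amount"])
    writer.writeheader()
    writer.writerows(records)
```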

With data integration you also gain the ability to craft real-time data pipelines, where new data becomes actionable as soon as it is entered. This is how websites are able to suggest products to you based on items you looked at just a short time earlier.

As CRM records are updated, orders are received, payments are entered, and so on, the data is automatically processed and becomes actionable immediately. What’s more, by employing Artificial Intelligence (AI) and Machine Learning (a subset of AI), the underlying data models are updated with the newly acquired data, improving the decisions, segmentations and predictions those models produce.

So, the new data is immediately and automatically leveraged to:

  • Present an up-to-the-minute view of the organization

  • Provide a better customer experience for the online shopper

  • Improve models used to predict which marketing scenarios will produce the best outcomes

  • Inform better decision making

This is all very exciting, but before you jump into the technological part of data integration, you have to develop a business case for it. In spite of all the technology, and even though IT will be involved, data integration is a business project, not an IT project.

The Business Case for Data Integration

The sheer amount of data that companies have at their disposal makes it impractical to simply integrate all the data. Even if software existed that allowed you to click a button and wait for the results, you’d be waiting a very long time and the results would be too unwieldy to be of value. 

For businesses making a complete transition from analog to digital, Gartner notes that, “Through 2020, integration work will account for 50% of the time and cost of building a digital platform.” This quote isn’t meant to scare you, but it should make clear the degree to which data integration is necessary for modern businesses and how big an impact it has on an organization.

This is why data integration needs to be part of a holistic, long-term business strategy and implemented as a series of integrations, with each integration designed to address a specific business case.

Data integration should be led by a champion of the organization’s various business interests who also understands the different data assets in play. 

In building a business case for integration, each of these questions should be answered:

  • Why is the data integration being done? 

  • What business processes need to be analyzed?

  • What are the objectives and deliverables?

  • What are the business rules?

  • What is the quality of the data?

  • Who will be the owner of the system?

  • How will maintenance and upgrades be funded?

Engaging the relevant business interests in a discussion about these questions helps reduce the amount of data you need to extract for the integration. Answers to these questions also tell you what the target data file should look like. It is like building a highly selective pipeline to channel only the data you need out of an indiscriminate sea of data.

Once the rules for your first pipeline are ready, it’s time to start the extraction process.

ETL — The Technical Foundation of Data Integration

Theoretically, data integration can be done manually, which involves hand coding all of the scripts necessary to extract the data from the source files, transform it into a conformed (or normalized) format and load it to a target file. There are three serious impediments to a manual approach:

  1. Efficiency | Performing a data integration by hand consumes a tremendous amount of time. Modern ETL tools can do the job much faster, thereby reducing the time it takes for the extraction to produce value.

  2. Human error | People make mistakes. When a developer makes a mistake in hand coding a data integration, it takes time to go through all the code until the error is found. When a mistake is made using an ETL tool, it is usually a mistake in one of the rules. Fix the rule and you fix the problem.

  3. Continuity | In data integration, continuity means doing all the coding the same way in each integration project. That is almost impossible if the person who codes the first integration isn’t available for those that follow. Someone has to learn how to write the code exactly the same way as the original coder. 

While there may be times when manual coding needs to be used to a limited degree in conjunction with an ETL tool, modern ETL tools that Extract, Transform and Load data from many sources to one centralized source are much more efficient and effective than manual data integration. Let’s take a look at each phase individually.

Extract

As noted above in What Is Data Integration, organizations collect and store data in a variety of locations and formats. JSON, XML and CSV are all common data formats. Beyond file formats, there are also differences in how the source data is structured. One file might put all street address information in one column, while another might break it up into two or three columns. A single name column could contain first and last names in one file, while another might have separate columns for each part of the name.

The extraction process copies the data from every source file, determines each source’s refresh rate (velocity) and its priority relative to the other sources (extract order), and places the copies in the ETL environment.
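
As a sketch of what that can look like in practice, the snippet below registers each source with a velocity and a priority, then copies the sources into a staging area in priority order. The file names, formats, rates and priorities are all invented for illustration.

```python
import csv
import json

# Each source is registered with its refresh rate (velocity) and its
# extract order (priority). All values here are hypothetical.
SOURCES = [
    {"file": "crm_export.json", "format": "json", "velocity": "hourly", "priority": 1},
    {"file": "pos_sales.csv",   "format": "csv",  "velocity": "daily",  "priority": 2},
]

staging = {}  # the ETL environment: raw copies keyed by source file name
for src in sorted(SOURCES, key=lambda s: s["priority"]):
    with open(src["file"], newline="") as f:
        if src["format"] == "json":
            staging[src["file"]] = json.load(f)
        else:
            staging[src["file"]] = list(csv.DictReader(f))
```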

Once the data has been extracted, data profiling gives you a picture of the data that includes:

  • Size of the data set

  • Column heads

  • Data type of each column

  • Relationships between columns

  • Range of values in each column

  • Frequency of missing data

  • Number of rows

The data profile shows you how much work needs to be done before and during transformation to make the data usable and actionable. Much of this will be cleansing the data to remove errors and duplicates. 

There may also be columns where the frequency of missing values renders the whole column useless in its current state. In those cases, discuss with the owner of that source data how best to handle the issue: ask whether they can provide more data, or how to fill in the missing values. It may be necessary to remove some of those columns completely. The more precisely you define the rules for the transformation, the better the outcome.
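
For a sense of how profiling looks in code, here is a small sketch using pandas. The staged file name and the 50 percent missing-value threshold are hypothetical choices for illustration.

```python
import pandas as pd

df = pd.read_csv("staging/pos_sales.csv")   # hypothetical staged extract

print("rows x columns:", df.shape)                    # size of the data set
print("column heads:", list(df.columns))
print(df.dtypes)                                      # data type of each column
print(df.describe(include="all"))                     # range of values per column
print(df.isna().mean().sort_values(ascending=False))  # frequency of missing data

# Columns that are mostly empty may be useless in their current state
# and are worth raising with the owner of the source data.
mostly_missing = df.columns[df.isna().mean() > 0.5]
print("columns to discuss with the data owner:", list(mostly_missing))
```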

Transform

In the transformation phase, columns in the source files are mapped to columns in the target file, but with the source data conformed to match the formatting of the target columns.

  • Dates are consolidated into buckets

  • Strings are parsed to extract business meaning

  • Transactions are modeled as events

  • Useless data and errors are set aside so they don’t reach end users

  • Personally identifiable information is masked to meet privacy requirements

In plain English, that means the ETL tool can read data in all the different source file formats, and it can rewrite that data to the specified formats of the target file. So, mapping the columns from the source file to the target file also tells the tool how the data needs to be transformed. If you have eight different source files and each one handles customer addresses differently, all of those differences will be eliminated and the address data will be presented one way.

If there are columns that are already set up identically across all the source files, in the desired format of the target file, those columns can be moved directly to the target file without having to go through the transformation process.
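
Here is a sketch of that mapping step. The column names, source layouts and masking rule are all hypothetical: two source files that store customer names differently are conformed to one target schema, and an email column is masked to meet privacy requirements.

```python
import hashlib

def from_source_a(row):
    # Source A keeps the full name in a single column.
    first, _, last = row["name"].partition(" ")
    return {"first_name": first, "last_name": last, "email": row["email"]}

def from_source_b(row):
    # Source B already splits the name into two columns.
    return {"first_name": row["fname"], "last_name": row["lname"], "email": row["mail"]}

def mask_pii(record):
    # Replace the raw email with a one-way hash, so analysts can still
    # count distinct customers without seeing the address itself.
    record["email"] = hashlib.sha256(record["email"].encode()).hexdigest()[:12]
    return record

# Both rows end up in the same target schema, regardless of source layout.
conformed = [
    mask_pii(from_source_a({"name": "Ann Lee", "email": "ann@example.com"})),
    mask_pii(from_source_b({"fname": "Bo", "lname": "Ruiz", "mail": "bo@example.com"})),
]
```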

Conforming data accomplishes two major goals:

  1. It breaks down the silos that separate different operating units and prevent operational transparency.

  2. It reduces the time needed to poll all the data by creating one source of truth for every record.

Load

Once the data has been transformed it can be loaded to the target file and put to work. One choice to make at this point is whether to load all the data at once (full load) or at scheduled intervals (incremental load). 

Full loading creates new records for all incoming data, so it carries the risk that the process becomes difficult to maintain as data sets surge. By contrast, incremental loading compares incoming data with the data already loaded and only generates new records when unique data is added. In addition to being easier to manage, the smaller footprint of the incremental load costs less to warehouse.
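
The difference is easy to see in a sketch. The functions below model the target as a simple keyed store; the record layout and the "id" key are hypothetical.

```python
def full_load(target, incoming):
    # Full load: rebuild the target and create records for all incoming data.
    target.clear()
    for row in incoming:
        target[row["id"]] = row

def incremental_load(target, incoming):
    # Incremental load: only unique data generates a new record.
    for row in incoming:
        if row["id"] not in target:
            target[row["id"]] = row

target = {}
full_load(target, [{"id": 1, "amount": 9.99}])
incremental_load(target, [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 4.50}])
# target now holds rows 1 and 2; row 1 was not duplicated.
```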

Once the data has been loaded, the target file becomes the new source file for all business intelligence activities, making the insights that have been lurking in the data immediately accessible to all business units.

ELT

It should be noted that when data integration is done in the cloud, loading often takes place before transformation, so the process is called ELT instead of ETL.

Benefits of Data Integration

By not integrating their data, businesses leave a lot of useful information on the table. According to Forrester, the amount of data that is never analyzed for business intelligence ranges from 60 percent to 73 percent. Here’s what marketers in those businesses are missing.

  • Providing better online customer experiences

  • Bringing customer signals to the surface

  • Leveraging real-time access to data for better decision-making

  • Optimizing campaigns with Artificial Intelligence and Machine Learning

  • Turning insights into actions

Data integration can be similarly effective in all other business units (finance, supply chain, inventory, etc.) as the silos between lines of business are torn down and replaced with a 360° view of the organization:

  • Transformed data is faster and easier to access

  • Data integration allows workers to perform tasks more efficiently and spend more time on tasks that help grow the business instead of writing reports
