How Crisp normalizes data (and why it matters for CPG brands)

Data normalization must be completed in real time for CPG companies to use it to their full advantage. Engineer Mihir Pande breaks down how Crisp normalizes retail data from multiple sources.

Data normalization is an integral, but complex aspect of measuring CPG business health. Brands need to know how they’re doing at all their various retailers and distributors in order to make data-driven decisions – but as any CPG brand knows, collecting, organizing, and understanding this information is no easy task.

Sifting through uniquely-structured retailer datasets and reorganizing the data into a usable format on an ongoing basis takes tremendous effort. For many companies, normalizing data either takes a large team, an expensive data scientist, or…it doesn’t get completed at all. In this article, we’ll explain how Crisp streamlines data normalization, and how CPG brands are using this automation to find important insights, from high-level trends down to store-level details.

Here’s what we’re going to talk about, with real-life examples:

A quick refresher on how data normalization works
Why so many businesses struggle with normalizing data from multiple sources
The exact process Crisp uses to standardize data and make it more usable

By the end of this article, you’ll gain a deeper understanding of the exact data normalization method CPG brands are using to make timely, data-driven decisions while saving time and resources.

What is data normalization?

Database normalization is a technique for creating databases from multiple source datasets with inconsistent structures. Normalization creates a predictable outcome by removing anomalies, duplications and errors, and organizes data into dimension and fact tables with standardized column values. Put simply, it brings all that disparate data together and makes it more useful.

How data normalization helps CPG brands

Once datasets arrive from various retailers in their unique formatting and systems, they need to be re-organized into one coherent system. This end result allows anyone in the company to analyze the business as a whole. For example, they can:

Understand how stores and products are performing
Track distribution and velocity across retailers
Identify new accounts and repeat orders
Build compelling sales stories for retail buyers
Forecast demand and allocate resources accordingly
Manage sales and operations planning
Reduce product waste
Build custom reports

Problems data normalization eliminates

Before companies can take these actions, problems must be eliminated that cloud the data from showing a complete, accurate picture. Here are the issues proper data normalization resolves:

Duplication or multiple versions of information
Formatting issues
Bugs and errors
Unstructured, incomplete data
Improper implementation of dependencies
Issues with data governance

Why is normalizing data so challenging?

If your company is normalizing data on its own, you’re probably familiar with the challenges of organizing multiple weekly reports in a timely manner to apply to decision-making. And if you’re not taking this step, you have probably doubted the reliability of your data when making decisions.

Here are the top issues that companies face with data normalization:

Each retailer has their own methods for organizing data
Multiple retailers will sell the same product, but have different naming conventions and granularity within their reports, creating duplicates and differences that, if not caught, lead to incorrect analysis (e.g. comparing apples to oranges)
As reports arrive, brands are responsible for combing through and reorganizing the data into their own spreadsheets and systems
Any time a retailer’s reports are restructured, naming conventions change, or retailers change their report extraction API, the whole normalization process breaks and requires significant – and expensive – developer time to fix

Here’s how this often looks in the real world. Take a granola bar company that sells at Whole Foods, Target, and Dollar Tree locations across the country. Each retailer makes the brand’s data available in a portal including sales, pricing, promotions, and inventory. The problem: while Whole Foods uses a 5-digit format for store location zip codes, Target uses a 9-digit format, while Dollar Tree uses a 2-line format. There’s no consistency across the reports, disabling any quick analysis across stores or regions unless someone goes in and manually inputs everything into their own, consolidated system.

Another tricky issue pops up when a brand doesn’t know which store a report line is referring to, or if a store closes. As time goes on and stores close or reopen, the weekly reports change. Many brands find this process of realigning locations in the data overwhelming, particularly smaller companies with less personnel to comb through it all.

How Crisp normalizes retail data

Crisp ingests all the original data from retailer and distribution portals, and transforms it into organized tables with a logical system prepped for database storage, reporting and analysis, and modeling.

At a high-level, normalization follows these core steps:

Crisp gathers a brand’s data from retailers and distributors. With a team of engineers dedicated to maintaining retailer connections and APIs, these data streams are uninterrupted and always up-to-date.
Next comes standardization of the data into our system, which Crisp does efficiently through a complex set of rules that applies to most scenarios (and can be tweaked for unique situations).
We make the data easy for brands to follow and analyze by making it recognizable and consistent across all data sources, with as much granularity as the brand needs.
Crisp pipes normalized data into existing systems and reporting tools, or makes data accessible through interactive and shareable dashboards.

Crisp’s detailed data normalization process

Here’s a deeper look at the data steps our normalization engineers take to turn disparate data sources into a single source of truth.

Step 1: Cleaning

The first step is cleaning up inconsistencies in the data. This prepares our system to later organize the information with specific rules and formatting as it places each piece of data into tables or visualizations.

Here are some of the ways Crisp cleans data:

Correcting for multiple spellings and naming conventions
Formatting cleanup, including extra spaces, special characters, and capitalization
Assigning each store a unique key to keep individual stores clearly separated
Standardizing terminology for all reports and tables (For example: “SPROUTS #1 FZN” and “SPROUTS FARMERS MARKET 1” both get identified as “SPROUTS FARMERS MARKET”)
Correcting zip code and format inconsistencies. For example: reformatting all addresses so that they are on a single line for uniformity
Correcting UPC code inconsistencies (we make them all 13 numbers)
Addressing inconsistencies with distribution centers and suppliers
Standardizing pack units of measure and pricing

Cleaning as the first step means that brands can reliably access both per-retailer and cross-retailer data.

An example of how Crisp standardizes store names

Step 2: Staging models – deduplication and aggregation

The staging step is where the data transformation happens. Deduplication is a major part of this step, and staging modeling ensures that columns and factors look the same across all brands and retailers.

Staging modeling

Using our granola bar brand example, let’s say Whole Foods provides data that five granola bars were sold October 1, while Target’s data says that one bar was sold at 10 am on October 1, another two were sold at 4 pm, and so on:

Date	October 1 granola bar sales
Retailer: Whole Foods	5 single, 1 pack of 10
Retailer: Target	2 bars sold at 10:25 am 20 bars sold at 4:12 pm 4 bars sold at 7:15 pm

Because the time and date data from each source is not standardized, it must be aggregated and deduplicated to be put into standardized tables. So with this example, we would aggregate to show the total sales for the granola bar in that store on a single day:

Date	October 1 granola bar sales
Retailer: Whole Foods	15
Retailer: Target	26

Deduplication

Deduplication is necessary because retailer reports can change drastically from one week to the next. Take the example of Whole Foods opening a group of new stores in one region, and breaking up the region into two separate ones.

Brands shouldn’t have to worry every time a shift like this occurs – they just need a list of stores where they can sell. So, here’s what Crisp does:

Make sure each store only shows up once
Remove old versions of stores when they change
Create master reports for sales, items, and inventory with each product only showing up once
Make sure all item lists are unique, including store names, regions, addresses, product names and categories, departments, and product pack sizes
Show as much data as possible for a given product. For example, one sales report might only give the SKU or UPC for a product, while another report shows dimensions like the color, category, subcategory, size, weight, etc. We make sure those dimensions show up in the master report.

Step 3: Augmentation

Augmentation involves geocoding, which translates store locations into coordinates to enable mapping, visualizations, and more exact, accurate data.

This is a piece of information that’s really important for CPG brands, so they can better understand where they’re selling and shipping products.

An example sales heat map enabled through geo-coding

How does Crisp do this, when some store locations have incomplete data? We use our own store locator system and integrate census data to give the complete picture.

Step 4: Documentation

The documentation step is where we help brands understand what each data table is, how they can be used, and how all the tables interconnect with one another. It might not sound like much, but this is game-changing for brands digging into their CPG analysis.

Finally, the normalized data is sent through rigorous testing to make sure everything is correct. Then, the data is piped in real-time into Crisp dashboards or other tools of choice, like Snowflake, Excel and Power BI. The result? Seamless reporting and data that’s ready for analysis and presentation.

Simplify your normalization process

Crisp specializes in data normalization for CPG companies, and we know first hand how much of a difference it makes for brands when they start taking advantage of this process. To learn more, contact us today for a demo. To keep up with other industry insights and best practices, subscribe to the blog.

How Crisp normalizes data (and why it matters for CPG brands)