October 12, 2022
Ben Donovan, Casey Luther

DIY Data Pipelines: Why CPGs Shouldn’t Try This At Home

Automated retail data ingestion can be an impossible task for even the biggest brands to manage and maintain on their own. In this article, we explore what CPGs can do to outsmart this challenge.

Ask any enterprise CPG company, and they’ll tell you that data automation is an important part of their digital transformation strategy. CPGs who continue to white-knuckle the process of pulling and maintaining their retailer reports are almost certainly operating with less efficiency than our fast-paced, highly competitive industry requires. 

But even the most advanced and sophisticated IT teams struggle to replace high-touch processes with automated data pipelines. In fact, automating the collection and organization of retailer reports remains one of the great challenges of the digitally transformed CPG industry. Help is on the way, though.

In this article, we’ll explore:

  • Why data ingestion is such a monumental task that even the most well-resourced giants of the CPG industry struggle with it
  • What the attempt to master DIY data maintenance is costing your business
  • How Crisp is working to solve the data ingestion problem for CPGs, once and for all

By the end of this article, you’ll understand the technical and labor hurdles of retailer data ingestion, and why the CPG industry needs to rethink what tackling this problem looks like.

Why is data ingestion so hard?

Data ingestion is one of those necessary tasks that CPG companies have been taking on themselves for years. Here’s what generally happens:

  • IT teams create a data ingestion solution for one major retailer
  • The system is duplicated and tweaked for 2-3 more retailers
  • The IT team realizes they’re spending half of their resources on maintaining the first few automations, which break frequently due to retailer data changes

Building all these pipelines, especially for multiple retailers, is a major investment. Research shows that it takes two engineers and one project manager 18 weeks on average to build automated ingestion for a new data source, at an estimated cost of $100k each time. Once a team has built a few pipelines, they do get more efficient at the process, but it still takes around 12 weeks to build an additional retailer pipeline. From there, it costs companies an average of $500k per year to maintain all of their data pipelines.
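To make those figures concrete, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above. Two things are our own assumptions for illustration, not findings from the research: that build cost scales in proportion to build time (so a 12-week build costs about two thirds of an 18-week one), and that a team reaches the faster pace after its third pipeline. The $500k maintenance figure is treated as a flat annual amount.

```python
# Figures quoted in the article.
FIRST_BUILD_WEEKS = 18        # average build time for a new data source
LATER_BUILD_WEEKS = 12        # build time once the team has experience
FIRST_BUILD_COST = 100_000    # estimated cost of an 18-week build, in dollars
ANNUAL_MAINTENANCE = 500_000  # average yearly cost to maintain all pipelines

# Assumption (ours): the team reaches the 12-week pace after this many builds.
EXPERIENCED_AFTER = 3


def build_weeks(num_retailers: int) -> int:
    """Total build time in weeks if pipelines are built one after another."""
    slow = min(num_retailers, EXPERIENCED_AFTER)
    fast = max(num_retailers - EXPERIENCED_AFTER, 0)
    return slow * FIRST_BUILD_WEEKS + fast * LATER_BUILD_WEEKS


def build_cost(num_retailers: int) -> float:
    """Total build cost, assuming cost is proportional to build weeks."""
    slow = min(num_retailers, EXPERIENCED_AFTER)
    fast = max(num_retailers - EXPERIENCED_AFTER, 0)
    later_total = fast * FIRST_BUILD_COST * LATER_BUILD_WEEKS / FIRST_BUILD_WEEKS
    return slow * FIRST_BUILD_COST + later_total


def total_cost(num_retailers: int, years: int) -> float:
    """Build cost plus flat annual maintenance over the given number of years."""
    return build_cost(num_retailers) + ANNUAL_MAINTENANCE * years


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(n, build_weeks(n), round(build_cost(n)), round(total_cost(n, 2)))
```

Under these assumptions, six retailer pipelines built in sequence come to about 90 weeks of build time and roughly $500k in build cost before any maintenance is counted.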

Due to the size of the initial and ongoing investment, not a single CPG company we’ve spoken to is hitting its goal of 100% of retailer pipelines built and maintained on a regular basis.

Because each retailer has its own unique system for distributing reports, each data pipeline must be individually customized to fit into a brand’s automated data ingestion workflow. Additionally, the information must be cleaned and deduplicated so that the data doesn’t become skewed by things like duplicates, missing information, or formatting issues.
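As a minimal sketch of what that cleaning step can involve, the Python below normalizes column names and whitespace, drops rows with missing required values, and removes duplicates. The column names (`store_id`, `upc`, `week_ending`, `units_sold`) and the sample report are hypothetical; real retailer reports vary widely, which is exactly the problem.

```python
import csv
import io

# Hypothetical columns that together identify one row of a sales report.
KEY_FIELDS = ("store_id", "upc", "week_ending")
REQUIRED_FIELDS = KEY_FIELDS + ("units_sold",)


def clean_rows(rows):
    """Normalize formatting, drop incomplete rows, keep the first of each duplicate."""
    seen = set()
    cleaned = []
    for row in rows:
        # Lowercase headers and trim stray whitespace from every value.
        normalized = {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # csv.DictReader files overflow cells under None
        }
        # Skip rows that are missing any required value.
        if any(not normalized.get(field) for field in REQUIRED_FIELDS):
            continue
        # Skip rows already seen for the same store, product, and week.
        identity = tuple(normalized[field] for field in KEY_FIELDS)
        if identity in seen:
            continue
        seen.add(identity)
        cleaned.append(normalized)
    return cleaned


# Made-up sample: one duplicate row, one row missing units, inconsistent spacing.
SAMPLE = """Store_ID,UPC,Week_Ending,Units_Sold
 101,0001234,2022-10-01,12
101,0001234,2022-10-01,12
102,0001234,2022-10-01,
103,0005678,2022-10-01, 7
"""

if __name__ == "__main__":
    for row in clean_rows(csv.DictReader(io.StringIO(SAMPLE))):
        print(row)
```

Each retailer would need its own mapping from its report layout to shared fields like these, which is part of why every pipeline ends up individually customized.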

And even when the pipelines are built, new challenges arise in maintaining them:

  • Employee turnover can create delays in pipeline maintenance as new team members learn how the automations work
  • Retail portal outages occur often because many retailer portals are old and have multiple versions
  • Retailers update portals and reports every 3-6 months, meaning the pipelines must be rebuilt

And that’s just what happens when business is humming along as usual. If you worked in CPG in 2020, you experienced this firsthand as retailers invested in e-commerce, which caused huge reporting changes with pickup, delivery, SNAP payments, and digital coupons. This created a host of new data that had to be incorporated into reports and pipelines.

On top of all this, retailers often don’t communicate these changes to their suppliers, which means suppliers must react to the broken data as it comes in. Until the pipeline is re-established, business teams across the CPG are operating on outdated or incomplete information – and they may not even know it.
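One common way to at least notice such unannounced changes is to compare each incoming report’s header against the layout the pipeline was built for, and to stop loudly when they differ. The sketch below illustrates that idea only; the expected column names are hypothetical, and a real pipeline would also need to watch for changes in units, date formats, and file delivery.

```python
# Hypothetical layout the pipeline was originally built against.
EXPECTED_COLUMNS = frozenset({"store_id", "upc", "week_ending", "units_sold"})


def detect_schema_drift(header, expected=EXPECTED_COLUMNS):
    """Return which expected columns are missing and which new ones appeared."""
    actual = {name.strip().lower() for name in header}
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


def check_report(header):
    """Raise an error instead of silently loading a report whose layout changed."""
    drift = detect_schema_drift(header)
    if drift["missing"] or drift["unexpected"]:
        raise ValueError(f"Report layout changed: {drift}")


if __name__ == "__main__":
    check_report(["Store_ID", "UPC", "Week_Ending", "Units_Sold"])  # passes
    try:
        # A retailer renames a column without notice.
        check_report(["Store_ID", "UPC", "Week_Ending", "Units"])
    except ValueError as error:
        print(error)
```

A check like this does not fix the pipeline, but it turns silently wrong numbers into a visible failure, so business teams know the data is stale.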

These issues cause a lot of backtracking and lost data, making data ingestion a never-ending maelstrom that’s losing you money and reducing opportunities for ROI (besides creating frustration for valuable employees).

The opportunity costs of DIY data maintenance

Building out automated data pipelines for retailer reports is simply too much of a burden for one company. We’ve spoken to hundreds of CPG companies, and even the most advanced struggle to maintain automation for as little as 50% of their retailer data.

Here are the opportunity costs associated with white-knuckling data ingestion and organization:

  • Manually pulling data reports from retailers takes 10-20 hours per week (including downloading, cleaning, and organizing it)
  • Besides the time-consuming nature of this process, it’s also vulnerable to human error and data silos
  • Maintaining just one or two data pipelines often takes up ~25% of an engineering team’s time, keeping them from making progress on higher-return projects
  • Preparing data involves busywork that keeps talented business teams from using that data to drive value for the brand
  • Sales are lost and money is wasted due to the lack of real-time data, such as failing to prevent voids or spending ad dollars where inventory is low. By the time the insights can be accessed, it’s too late.

It’s clear that the inefficiencies of manual report-pulling are too great to ignore, but it’s also clear that home-grown data pipelines are such a burden to maintain that they don’t accomplish what they need to. To outsmart this challenge, we need to rethink what it takes to develop pipelines that are easy to enable and maintain.

How Crisp is solving the data ingestion problem forever

The data ingestion problem is just too difficult, costly, and time-consuming for each company to solve individually. But now, brands can leave it to a team that’s fully dedicated to building and maintaining retail data pipelines. That’s the role we play at Crisp.

Crisp ingests data in real time from retailer and distributor portals, normalizes the data, and pipes it into a usable format that your sales, marketing, analytics, finance, and supply chain teams can use to drive the business forward. Crisp makes your retail data available in easy-to-use dashboards or piped into core enterprise applications, from Excel to Power BI to cloud-based applications like Snowflake and Google Cloud Storage.

Instead of spending a quarter of their engineering hours on a near-impossible task, CPG brands can use Crisp to access harmonized, clean data from dozens of retailers in real time. When reports change or portal APIs go down, we’re on the scene immediately. It’s all we’re here to do.

So, here’s the choice CPG brands have to make: Do we want to invest internally to maintain pipelines, taking several years to reach our goal? Or would we rather partner with a company that’s fully committed to data pipelines and actively maintains dozens of them, allowing us to offload labor and costs so that our technical teams can focus on higher-priority projects?

(We think we know the answer.)

Want to try automated data ingestion and harmonization for yourself? Contact us to get started with a free trial of Crisp. For more industry insights and best practices, subscribe to the blog.


