How to Fix the 3 Trillion Dollar Dirty Data Problem

First of all, is 3 trillion dollars really an accurate number? The figure comes from an article that in turn references a statistic published in another article back in 2011, and the reliability of that statistic is itself debated in the article "What's Worse, Fake News or Dirty Data? Debate."

Closer to home for businesses, according to an Experian survey, U.S. organizations believe that, on average, 32 percent of their data is inaccurate.

And that’s just the perception and impact of basic data quality (DQ). Even more critical, business decisions are made every day on uncorrelated data that may not be “dirty” but is missing key information that could have led to a better decision and outcome.

What can companies do to not just minimize the impact of dirty data, but thrive by using data as a strategic asset? Here's a quick checklist for achieving the best outcomes; short, illustrative code sketches for several of the steps follow the list.

1. Use basic cleansing and data quality (DQ) not in isolation, but as a precursor to correlating data across different siloed sources

2. Leverage core master data management (MDM) capabilities to match and merge records into correlated master profiles of any entity (people, products, organizations, places) in a single multi-domain platform

3. Once a continuous process for reliable data is in place, use graph technology to capture relationships of any type between these entities (e.g., people-to-people, people-to-products, people-to-orgs, products-to-suppliers, locations-to-people-to-orgs-to-suppliers)

4. With this reliable data foundation in place, bring in transaction, interaction and social data related to each entity to get a true 360-degree understanding of the behaviors and patterns that yield the insights needed to improve operating efficiency and execution

5. With reliable data guaranteed by an ongoing, repeatable process, seamlessly allow data scientists and analysts to model the data, run algorithms and apply machine learning to it

6. Tie the resulting insights to the roles and goals of frontline business users, giving them recommended actions they can take right within their operational, data-driven applications

7. Use a closed loop to reconcile those actions back to the insights and recommendations, both to measure ROI and outcomes and to provide historical context for improved learning

8. Repeat the process in an efficient, seamless manner while adding new data sources across the enterprise to tie in more silos and solve more business problems
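
To make these steps more concrete, here are a few minimal Python sketches. They are illustrative only: the libraries, column names, thresholds and sample records are assumptions made for the examples, not the tooling of any particular platform. First, step 1's basic cleansing, which standardizes values so that the correlation in later steps has something consistent to match on:

```python
# Minimal sketch of step 1: basic cleansing before any cross-source correlation.
# The columns (name, email, phone) are hypothetical, not from the article.
import pandas as pd

def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Normalize whitespace and case so the same value written two ways compares equal
    out["name"] = out["name"].str.strip().str.title()
    out["email"] = out["email"].str.strip().str.lower()
    # Keep digits only so "(555) 123-4567" and "555.123.4567" become the same value
    out["phone"] = out["phone"].str.replace(r"\D", "", regex=True)
    # Flag records that fail simple completeness checks instead of silently dropping them
    out["dq_issue"] = out["email"].isna() | (out["phone"].str.len() < 10)
    return out.drop_duplicates()

raw = pd.DataFrame({
    "name": [" jane doe ", "Jane Doe"],
    "email": ["JANE@EXAMPLE.COM", "jane@example.com"],
    "phone": ["(555) 123-4567", "555.123.4567"],
})
print(clean_customers(raw))  # the two source rows collapse into one clean record
```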
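
Step 2's match-and-merge can be sketched with a strong identifier (here, email) plus a fuzzy name comparison, and a simple survivorship rule to assemble the master profile. Real MDM engines are far more sophisticated, but the shape of the logic is similar:

```python
# Minimal sketch of step 2's match-and-merge idea, not a production MDM engine.
# Record fields, sample values and the similarity threshold are illustrative assumptions.
from difflib import SequenceMatcher

# Two records for (probably) the same person, coming from two different silos
crm = {"id": "crm-17", "name": "Jon Smith", "email": "jon.smith@acme.com", "phone": None}
erp = {"id": "erp-88", "name": "Jonathan Smith", "email": "jon.smith@acme.com", "phone": "5551234567"}

def is_match(a: dict, b: dict, threshold: float = 0.8) -> bool:
    # Exact match on a strong identifier, or a fuzzy match on the name
    if a["email"] and a["email"] == b["email"]:
        return True
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio() >= threshold

def merge(a: dict, b: dict) -> dict:
    # Simple survivorship rule: keep the first non-empty value for each attribute,
    # and remember which source records contributed to the master profile
    merged = {key: a.get(key) or b.get(key) for key in set(a) | set(b)}
    merged["sources"] = [a["id"], b["id"]]
    return merged

if is_match(crm, erp):
    print(merge(crm, erp))  # one master profile assembled from both silos
```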
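
For step 3, a graph (sketched here with networkx; the entities and edge types are made up) lets the same master profiles carry typed relationships such as people-to-orgs or products-to-suppliers, and makes multi-hop questions easy to ask:

```python
# Minimal sketch of step 3: store relationships between master profiles as a graph.
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("person:jane", type="person")
g.add_node("org:acme", type="organization")
g.add_node("product:widget-a", type="product")
g.add_node("supplier:parts-co", type="supplier")

g.add_edge("person:jane", "org:acme", rel="works_for")
g.add_edge("person:jane", "product:widget-a", rel="purchased")
g.add_edge("product:widget-a", "supplier:parts-co", rel="supplied_by")

# Example traversal: which suppliers sit behind the products Jane has purchased?
for _, product, data in g.out_edges("person:jane", data=True):
    if data["rel"] == "purchased":
        for _, supplier, edge in g.out_edges(product, data=True):
            if edge["rel"] == "supplied_by":
                print(f"{supplier} supplies {product}, purchased by person:jane")
```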
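
Step 4 is essentially a roll-up: attach transaction and interaction data to each master profile to form the 360-degree view. The tables and metrics below are hypothetical:

```python
# Minimal sketch of step 4: join behavioral data onto the reliable master profiles.
import pandas as pd

profiles = pd.DataFrame({"master_id": ["m1", "m2"], "name": ["Jane Doe", "Jon Smith"]})
transactions = pd.DataFrame({"master_id": ["m1", "m1", "m2"], "amount": [120.0, 80.0, 45.0]})
interactions = pd.DataFrame({"master_id": ["m1", "m2", "m2"],
                             "channel": ["email", "support_call", "support_call"]})

# Roll up behavior per profile, then join it back onto the master record
spend = (transactions.groupby("master_id", as_index=False)["amount"]
         .sum().rename(columns={"amount": "total_spend"}))
touches = (interactions.groupby("master_id", as_index=False)["channel"]
           .count().rename(columns={"channel": "touch_count"}))
view_360 = profiles.merge(spend, on="master_id").merge(touches, on="master_id")
print(view_360)  # one row per master profile with its behavioral summary
```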
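
With that foundation, step 5 becomes ordinary modeling work. This sketch uses scikit-learn on a small synthetic churn label purely to illustrate the hand-off to data scientists and analysts:

```python
# Minimal sketch of step 5: model the correlated data. Features and labels are synthetic.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features per master profile (e.g. from the 360-degree view) and a synthetic label
data = pd.DataFrame({
    "total_spend": [120, 45, 300, 10, 220, 15, 90, 5],
    "touch_count": [1, 4, 2, 6, 1, 5, 2, 7],
    "churned":     [0, 1, 0, 1, 0, 1, 0, 1],
})

X_train, X_test, y_train, y_test = train_test_split(
    data[["total_spend", "touch_count"]], data["churned"],
    test_size=0.25, random_state=0,
)
model = LogisticRegression().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```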
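
Finally, step 7's closed loop can be as simple as logging which recommendations were acted on and what each action was worth, then reconciling those events back to the insights. The event fields and values below are illustrative assumptions:

```python
# Minimal sketch of step 7: close the loop from recommendations to measured outcomes.
from dataclasses import dataclass

@dataclass
class RecommendationEvent:
    master_id: str
    recommendation: str
    action_taken: bool
    outcome_value: float  # e.g. incremental revenue attributed to the action

events = [
    RecommendationEvent("m1", "offer_renewal_discount", True, 1200.0),
    RecommendationEvent("m2", "offer_renewal_discount", False, 0.0),
    RecommendationEvent("m3", "suggest_upsell", True, 0.0),
]

acted = [e for e in events if e.action_taken]
adoption_rate = len(acted) / len(events)
realized_value = sum(e.outcome_value for e in acted)
print(f"adoption rate: {adoption_rate:.0%}, realized value: ${realized_value:,.0f}")
# Storing these events alongside the original insights provides the historical
# context for improved learning on the next pass through the loop.
```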

Modern Data Management Platforms as a Service today do all of the above and more. They help both IT and business work together to prevent dirty data from flushing "3 Trillion dollars" worth of wasted effort and lost opportunities down the drain.