Exploring how data can be holding companies and society back

A few weeks ago, I started researching how to measure the ROI of data. That research taught me three things: the challenges are common across companies, low ROI is the norm, and data analytics could be reinforcing inequality. My goal with this article is to outline ways to increase the value we obtain from data, in terms of both ROI and societal impact.

I want to start by highlighting great resources covering the intersection of data and inequality. This short video from Google exemplifies how bias can be built into machine learning. A complement to the video is this Salon article interviewing the author of the book Weapons of Math Destruction, which covers “how big data increases inequality and threatens democracy.” It is on all of us to be proactive in making society better. When reviewing your data, check that the data points in your analysis will not bias the model. Here are a couple of sample questions to help guide the process:

  1. Does any data point going into the model have the potential to affect certain groups of society negatively?
  2. Did you take the time to understand unfavorable correlations related to your dataset?
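One way to start answering these questions is to compare how often a candidate input takes a given value in each group. The sketch below is only an illustration: the field names (neighborhood, late_payment) and the rows are made up, and a large gap is a prompt for a closer look, not proof of bias.

```python
from collections import defaultdict


def rate_by_group(rows, group_key, flag_key):
    """Return, for each group, the share of rows where flag_key is truthy."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[flag_key]:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


# Made-up sample rows; both field names are hypothetical.
rows = [
    {"neighborhood": "north", "late_payment": True},
    {"neighborhood": "north", "late_payment": True},
    {"neighborhood": "north", "late_payment": False},
    {"neighborhood": "south", "late_payment": False},
    {"neighborhood": "south", "late_payment": False},
    {"neighborhood": "south", "late_payment": True},
    {"neighborhood": "south", "late_payment": False},
]

rates = rate_by_group(rows, "neighborhood", "late_payment")
# Difference between the highest and lowest group rate.
gap = max(rates.values()) - min(rates.values())

for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%}")
print(f"gap between groups: {gap:.0%}")
```

If a field that looks neutral tracks group membership this closely, it can act as a stand-in for that group inside the model, which is worth discussing before the field is used.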

In addition to excluding bias-inducing data, you should also monitor the overall data quality. No matter how sophisticated your analytical model is, its value will always be limited by the quality of the data going in. Unfortunately, data quality issues are known to be pervasive across industries.

A Harvard Business Review article showed that only 3% of companies’ data meets basic quality standards. Bad data flows through the value chain. In some cases it only adds the cost of cleaning it up; in others it ends up in a model and gives leaders the wrong insights. An easy way to start analyzing your data quality is to track a Data Quality (DQ) score. Here are the steps:

  1. Select a sample of records at a specific cadence (maybe once a week)
  2. Work through each record and mark obvious errors
  3. Count up the total number of error-free records
  4. Divide that number by the sample size (that percentage is your DQ score)
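The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function names, the record fields (email, age), and the error checks are all assumptions, and in practice "obvious errors" would be defined by the people who know the data.

```python
import random


def dq_score(records, has_error, sample_size, seed=None):
    """Return the share of error-free records in a random sample."""
    rng = random.Random(seed)
    # Step 1: select a sample of records.
    sample = rng.sample(records, min(sample_size, len(records)))
    if not sample:
        raise ValueError("cannot score an empty sample")
    # Steps 2 and 3: mark obvious errors and count the error-free records.
    error_free = sum(1 for record in sample if not has_error(record))
    # Step 4: divide by the sample size.
    return error_free / len(sample)


def has_obvious_error(record):
    """Hypothetical checks: a missing or malformed email, or a negative age."""
    email = record.get("email", "")
    return "@" not in email or record.get("age", 0) < 0


# Made-up records for illustration only.
records = [
    {"email": "ana@example.com", "age": 34},
    {"email": "", "age": 29},
    {"email": "li@example.com", "age": -1},
    {"email": "sam@example.com", "age": 41},
]

score = dq_score(records, has_obvious_error, sample_size=4, seed=0)
print(f"DQ score: {score:.0%}")  # prints "DQ score: 50%"
```

Running the same check at each cadence and recording the score gives you a simple trend line to bring into the conversation with your team.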

To improve your DQ score, perform Root Cause Analysis with your team and ensure every teammate understands that data quality is a part of everyone’s job.

The last data-related impediment to valuable ROI is holding on to more data than you need. This article from Dell explains how a lack of communication about which data is valuable and which isn’t leads teams to hold on to information “just in case.” Hoarding data slows the turnaround of useful insights and requires spending resources to manage and protect the additional datasets. As part of the DQ score exercise outlined above, review with leadership which data points are valuable and delete what isn’t needed.

Researching the ROI of data expanded my view of data quality and made me more aware of my duty to leverage information ethically. The steps outlined in this article help us move in the right direction, though I emphasize that data quality is a journey and not a destination.
