Bad data is costly: three ways to improve data quality

The cost of bad data is high. Whether it is simply inaccurate or produced by faulty algorithms, bad data can have catastrophic consequences for businesses everywhere, from one person's lost reputation on social media to entire industries losing out in the marketplace because they could not find success with their products and services where others had succeeded before them.

As the saying goes, "garbage in, garbage out." Inaccurate information leads us down false paths and ultimately leaves a company no wiser than if it had collected no data at all.

Bad data prevented Gan Ying, a Chinese envoy of the late first century CE, from reaching Rome.

Gan Ying was given wrong information

Gan Ying was a Chinese diplomat, explorer, and military official sent on a mission to Rome by the Chinese general Ban Chao.

Gan Ying never made it to Rome. He got only as far as the "western sea," in the region of present-day Iran. To protect their trading monopolies, the Anxi (Parthian) merchants gave him false information, claiming the onward journey could take up to two years. Gan Ying turned back because the journey seemed too long, and his mission never connected China with the Roman Empire.

NASA lost millions in 1999 due to inconsistent data.

NASA Mars Climate Orbiter

NASA lost the Mars Climate Orbiter in 1999, costing the agency $125 million. The contractor engineering team that developed part of the Orbiter's software used English units of measurement, whereas NASA used the metric system. The data passed between the two was therefore inconsistent, which resulted in a costly and disastrous error.
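One common guard against this kind of inconsistency is to store the unit next to every value and convert explicitly at the boundary between systems. The sketch below is only an illustration of that idea, not a description of NASA's software; the `Impulse` class and `total_impulse_ns` function are hypothetical names, and the example assumes the quantity being exchanged is an impulse measured either in newton-seconds or pound-force seconds.

```python
from dataclasses import dataclass

# Conversion factor: one pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A measured value that always travels with its unit (hypothetical type)."""
    value: float
    unit: str  # expected to be "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        # Refuse to guess: an unknown unit is a data quality issue, not a number.
        raise ValueError(f"unknown unit: {self.unit!r}")


def total_impulse_ns(readings):
    """Sum readings from mixed sources after converting each to newton-seconds."""
    return sum(r.to_newton_seconds() for r in readings)


readings = [Impulse(2.0, "N*s"), Impulse(1.0, "lbf*s")]
print(total_impulse_ns(readings))
```

Because a bare number cannot be passed in without a unit, a mismatch shows up as an explicit conversion or an error rather than as a silently wrong result.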

But what exactly is good quality data?

There is no universally accepted yardstick for determining what constitutes good data. However, data is generally considered to be of high quality if it is fit for the purpose for which it was collected. Good data can be assessed against five primary criteria:

  1. Accuracy — all data correctly reflects the object or event in the real world
  2. Completeness — all data that should be present is present
  3. Relevance — all data meets the requirements for intended use
  4. Timeliness — all data reflects the correct point in time
  5. Consistency — values and records are represented in the same way within/across datasets
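Several of the criteria above can be turned into simple automated checks. The following is a minimal sketch under stated assumptions: the records, field names, country codes, and the one-year freshness limit are all invented for illustration. Accuracy is approximated with a plausibility rule, because true accuracy needs a trusted reference to compare against, and relevance is left out because it depends on the intended use rather than on the values themselves.

```python
from datetime import date

# Hypothetical sample records; every field name and value is made up.
records = [
    {"id": 1, "country": "UG", "amount": 120.0, "updated": date(2022, 3, 1)},
    {"id": 2, "country": "Uganda", "amount": None, "updated": date(2019, 1, 15)},
    {"id": 3, "country": "KE", "amount": -5.0, "updated": date(2022, 2, 20)},
]

REQUIRED_FIELDS = ("id", "country", "amount", "updated")
VALID_COUNTRIES = {"UG", "KE", "TZ"}  # assumed agreed representation


def check_record(rec, as_of, max_age_days=365):
    """Return a list of issue labels for one record, one per failed criterion."""
    issues = []
    # Completeness: every required field must be present and not None.
    for field in REQUIRED_FIELDS:
        if rec.get(field) is None:
            issues.append(f"completeness: missing {field}")
    # Accuracy (proxy only): an amount below zero is implausible here.
    amount = rec.get("amount")
    if amount is not None and amount < 0:
        issues.append("accuracy: negative amount")
    # Consistency: countries must use the same code representation.
    country = rec.get("country")
    if country is not None and country not in VALID_COUNTRIES:
        issues.append("consistency: country not a known code")
    # Timeliness: the record must have been updated recently enough.
    updated = rec.get("updated")
    if updated is not None and (as_of - updated).days > max_age_days:
        issues.append("timeliness: stale record")
    return issues


for rec in records:
    print(rec["id"], check_record(rec, as_of=date(2022, 4, 1)))
```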

How do we ensure that we have high-quality data?

The challenge with data quality improvement is that it is a marathon, not a sprint. There is a widespread misconception that all we need is to run a single magic query, and voila, the data is clean. That may work for one dataset, but as soon as you obtain a new one you must start over. That approach does not scale in the long run. Three ongoing practices help instead.

Define

Establish data quality standards early on. They will guide your efforts, let you set objectives, and help you visualize how improving the quality of your data will benefit your business's growth.
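One way to make a standard concrete is to write it down as a measurable target. The sketch below assumes a very simple kind of standard, a minimum fill rate per field; `CompletenessStandard`, `fill_rate`, and `evaluate` are hypothetical names, and the thresholds are placeholders rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletenessStandard:
    """A written-down target: this share of rows must have a value for the field."""
    field: str
    min_fill_rate: float


def fill_rate(rows, field):
    """Share of rows where the field is present and not empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def evaluate(rows, standards):
    """Map each field to True if it meets its standard, False otherwise."""
    return {s.field: fill_rate(rows, s.field) >= s.min_fill_rate for s in standards}


# Hypothetical contact list with some missing phone numbers.
rows = [
    {"email": "a@example.org", "phone": None},
    {"email": "b@example.org", "phone": "0700 000001"},
    {"email": "c@example.org", "phone": ""},
    {"email": "d@example.org", "phone": "0700 000002"},
]
standards = [
    CompletenessStandard("email", 1.0),
    CompletenessStandard("phone", 0.75),
]
print(evaluate(rows, standards))
```

Expressing the standard as data rather than as an ad hoc query means the same objective can be checked against every new dataset.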

Collect

Gather and categorize all data quality issues. Once the problems are identified and documented, it is much easier to create a framework, along with data literacy and governance programs, to address them.
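A lightweight issue log is often enough to start with. The sketch below is one possible shape for such a log, assuming each issue is tagged with one of the quality criteria; the `Issue` type, the dataset names, and the descriptions are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """One documented data quality problem (hypothetical structure)."""
    dataset: str
    category: str  # e.g. "accuracy", "completeness", "consistency"
    description: str


def summarize(issues):
    """Count issues per category, most frequent first."""
    return Counter(issue.category for issue in issues).most_common()


issue_log = [
    Issue("customers", "consistency", "country stored as both name and code"),
    Issue("orders", "consistency", "dates in two different formats"),
    Issue("customers", "completeness", "phone number missing"),
]
print(summarize(issue_log))
```

Counting by category shows where a framework or training effort would address the most problems first.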

Monitor and maintain

Quality data is the result of ongoing effort; a one-time cleansing will not suffice. Your data types may become obsolete as your needs change. Regular data cleansing should be performed by one dedicated group of people who follow a shared set of rules, so that the results stay consistent.
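Monitoring can be as simple as recording a quality score on a schedule and flagging any drop. The following sketch assumes a score between 0 and 1 is already computed for each period; `detect_regressions`, the period labels, and the 0.02 tolerance are hypothetical choices for illustration.

```python
def detect_regressions(history, tolerance=0.02):
    """Flag periods where the score fell by more than `tolerance`.

    `history` is a chronological list of (label, score) pairs.
    Returns a list of (label, size_of_drop) for each flagged period.
    """
    alerts = []
    # Compare each period with the one immediately before it.
    for (_, previous), (label, score) in zip(history, history[1:]):
        drop = previous - score
        if drop > tolerance:
            alerts.append((label, round(drop, 4)))
    return alerts


# Hypothetical monthly quality scores.
history = [("2022-01", 0.97), ("2022-02", 0.96), ("2022-03", 0.90)]
print(detect_regressions(history))
```

A small drop within the tolerance is ignored, while a larger one is surfaced so the team responsible for cleansing can investigate before the problem spreads.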

In conclusion

High-quality data serves as a foundation in the long run. As data quality improves, your foundation becomes more stable, allowing more to be built on top of it and multiplying the potential uses of your data.

I'd like to think of myself as someone who analyzes data, deduces meaning, and then threads it all together to create a coherent visual narrative.

cengkuru michael
