
Ignore Bad Data... Sort Of

In George Orwell's classic Animal Farm, one well-known line sums up the issues tackled in the book: "All animals are equal, but some animals are more equal than others." Much the same is true of data: all data is equal, but some data is more equal than other data.

Before expanding on this, it is important to establish that bad data costs money. All of the processes in a business rely on good data, from finance to marketing to quality control. Bad data results in inefficient practices, the wrong approach with customers, wasted effort by employees, poor decision making, and more.

But there are two important points to consider when it comes to data. The first is that most experts believe it is impossible to get 100 percent clean and accurate data. The second point is that some data is not worth correcting.

On Oracle's blog, John Siegman and Murad Fatehali make this very point in an article titled “Data Quality, Is it worth it? How do you know?” Siegman is an applications sales manager for master data management and data quality, while Fatehali is a senior director with Oracle's insight team.

They say that businesses need to ask what bad data is costing them. Once you know the cost, you can focus your efforts on the areas where improvement will have the greatest impact on your business. They say the Pareto 80/20 rule applies: 20 percent of your data is more valuable than the rest, so it deserves 80 percent of your effort. But first you need to find and define that 20 percent, and then analyze the cost of any bad data within it.

This is not an IT problem, however. Instead, it should involve all aspects of a business, from the point where the data is created through to where it is used, and on to where it is analyzed. The impact on sales, the staffing or software costs for cleanup, the regulatory or legal costs that could result from bad data, and the impact on various parts of your business should all be considerations.
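As a rough illustration of that prioritization, the sketch below totals a few estimated cost components per category of data and then picks the costliest categories that together account for a chosen share of the overall cost. Everything in it is an assumption made for the example: the `DataDomain` class, the `domains_worth_fixing` function, the three cost fields, the 80 percent threshold, and all of the figures are hypothetical and do not come from Siegman and Fatehali's article.

```python
from dataclasses import dataclass


@dataclass
class DataDomain:
    """A hypothetical category of business data with estimated bad-data costs."""
    name: str
    lost_sales: float        # estimated annual revenue lost to bad records
    cleanup_cost: float      # staffing or software spent on cleanup
    regulatory_cost: float   # expected regulatory or legal exposure

    @property
    def total_cost(self) -> float:
        return self.lost_sales + self.cleanup_cost + self.regulatory_cost


def domains_worth_fixing(domains, cost_share=0.8):
    """Return the costliest domains that together reach `cost_share` of total cost."""
    ranked = sorted(domains, key=lambda d: d.total_cost, reverse=True)
    grand_total = sum(d.total_cost for d in ranked)
    selected, running = [], 0.0
    for domain in ranked:
        # Stop once the selected domains already cover the target share.
        if grand_total == 0 or running >= cost_share * grand_total:
            break
        selected.append(domain)
        running += domain.total_cost
    return selected


# Invented figures, for illustration only.
domains = [
    DataDomain("customer contacts", 400_000, 50_000, 50_000),
    DataDomain("billing records", 200_000, 50_000, 100_000),
    DataDomain("product catalog", 60_000, 30_000, 10_000),
    DataDomain("event logs", 20_000, 30_000, 0),
    DataDomain("archived surveys", 10_000, 40_000, 0),
]

for domain in domains_worth_fixing(domains):
    print(f"{domain.name}: {domain.total_cost:,.0f}")
```

In practice the estimates themselves are the hard part, and they would need to come from the people across the business who create and use the data, not from a script.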

Once you know this, you will have a clearer idea of the action that should be taken to fix your bad data, or at least the bad data that is worth fixing. You will then be able to formulate and implement a plan to correct the problem. Again, Siegman and Fatehali stress that the solution does not lie solely with IT, but needs to involve all of the individuals, systems and processes that contribute to the data in question during its lifecycle.

Following this approach will not give you perfect data, but it will give you better data where it matters most.