
Data Quality Management Series: Article 2 – Identifying the best approach to address Data Quality Issues

In the first article of the Data Quality Management Series, we showed that Data Quality Issues (DQIs) are too costly to be ignored. We now look at the strategies and costs involved in identifying the best approach to solving DQIs.

We begin by reviewing the different approaches, old and new, to addressing DQIs. We then establish two key criteria to group these solutions and identify the best approach. Our conclusion is that, setting aside process and organisational improvements (which, even when performed continuously, cannot solve every anomaly), the best strategy is one that does not disrupt your operational and IT structure.

The different approaches to solving DQIs

Alongside the mandatory documentation of data, systems and processes, companies have adopted distinct and often complementary approaches to identifying and correcting DQIs:

| # | Approach | Cost over 10 years (£K) |
|---|---|---|
| 1 | Creation of a dedicated in-house Data Quality Team | 1,000 |
| 2 | Manual cleansing to offer short-term remediation (offshore) | 2,000 |
| 3 | Review of the IT architecture | 2,000 |
| 4 | Creation of an automated controls framework/system | 900 |
| 5 | Use of data-cleaning software and validation rules¹ | 350 |
| 6 | Development of data science and artificial intelligence solutions | 200 |

These approaches fall into two distinct strategies for solving DQIs:

  1. ‘Light’: identifying and correcting erroneous data after processing, without necessarily locating or addressing the root causes (DQI approaches #2, #4, #5 and #6).
  2. ‘Heavy’: locating the root cause of each DQI in order to make long-term changes to the systems, IT architecture or reporting flow (DQI approaches #1 and #3).
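To make the cost column easier to compare, the following minimal Python sketch restates the table's 10-year figures alongside the ‘light’/‘heavy’ label from the list above and ranks the approaches from cheapest to most expensive. It is an illustration, not a cost model: the average annual figure simply divides the 10-year total by ten, which is an assumption that ignores timing (the article notes, for instance, that an AI solution is mostly an upfront investment). The shortened approach names are ours.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Approach:
    number: int
    name: str
    strategy: str     # 'light' or 'heavy', as classified in the text
    cost_10y_k: int   # cost over 10 years, in £K, taken from the table

APPROACHES = [
    Approach(1, "In-house Data Quality Team", "heavy", 1000),
    Approach(2, "Manual cleansing (offshore)", "light", 2000),
    Approach(3, "Review of the IT architecture", "heavy", 2000),
    Approach(4, "Automated controls framework", "light", 900),
    Approach(5, "Data-cleaning software and validation rules", "light", 350),
    Approach(6, "Data science and AI solutions", "light", 200),
]

def average_annual_cost_k(approach: Approach, years: int = 10) -> float:
    """Average yearly cost in £K, ASSUMING the total is spread evenly."""
    return approach.cost_10y_k / years

def rank_by_cost(approaches):
    """Approaches sorted from cheapest to most expensive over 10 years
    (ties keep their table order, since Python's sort is stable)."""
    return sorted(approaches, key=lambda a: a.cost_10y_k)

if __name__ == "__main__":
    for a in rank_by_cost(APPROACHES):
        print(f"#{a.number} {a.name} ({a.strategy}): "
              f"£{a.cost_10y_k}K over 10 years, ~£{average_annual_cost_k(a):.0f}K/year")
```

Running it lists approach #6 first (£200K over 10 years, about £20K a year) and the two £2,000K approaches last. Cost alone does not capture the efficiency ratio, which the article assesses separately.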

Classifying the different DQI approaches

We classify these strategies using two criteria:

  1. Efficiency ratio: whether the strategy can solve most DQIs, handle their complexity and drive a culture shift.
  2. Cost: how expensive the solution is to implement and maintain.

Ranking the six approaches on these two criteria produces four groups, from the least efficient (bottom-left corner) to the optimal solution (top-right corner):

[Figure: the six DQI approaches positioned by efficiency ratio and cost]

  • Group 1: Creation of a dedicated in-house Data Quality Team (DQI approach #1, ‘heavy’) and manual cleansing (DQI approach #2, ‘light’): humans are better at understanding complex issues and working across systems, but they do not scale. A Data Quality team also needs lengthy training before it is operational. This activity is resource-intensive and sometimes yields unsatisfactory results, making it the least preferred solution, even when offshored.
  • Group 2: Creation of a controls and reporting framework (DQI approach #4, ‘light’) and use of data-cleaning software and validation rules (DQI approach #5, ‘light’): a rule-based control framework is less intrusive and helps identify DQIs systematically. Controls can limit incorrect user input and keep data consistent across systems. However, the framework still requires monitoring and manual resolution.
  • Group 3: Review of the IT architecture (DQI approach #3, ‘heavy’): this approach identifies where and how issues arise in your IT architecture. However, such a project takes time and significant investment to deliver, with no guarantee of achieving the best results or of being future-proof.
  • Group 4: Development of data science and artificial intelligence solutions (DQI approach #6, ‘light’): AI models can detect DQIs and resolve them automatically. They require training data, but they can flag outliers automatically and capture more complex patterns than rule-based controls. A model can also be configured to predict the correct value and cross-reference it with internal or external systems to resolve DQIs.
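The outlier-flagging and correction steps attributed to AI models in Group 4 can be illustrated with a much simpler stand-in. The following Python sketch replaces the trained model with a robust statistical test (the modified z-score, based on the median absolute deviation), "predicts" the correct value naively as the median of the unflagged records, and cross-references that prediction against a reference system represented by a plain dictionary. The field names, sample trades, 3.5 threshold and 5% tolerance are all hypothetical choices for illustration.

```python
from statistics import median

# Hypothetical sample data: T6 looks like a keying error.
TRADES = [
    {"id": "T1", "amount": 100},
    {"id": "T2", "amount": 102},
    {"id": "T3", "amount": 98},
    {"id": "T4", "amount": 101},
    {"id": "T5", "amount": 99},
    {"id": "T6", "amount": 10000},
]

def modified_z_scores(values):
    """Modified z-score of each value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Most values equal the median, so the score is undefined;
        # this sketch then treats every value as normal.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]

def flag_outliers(records, field, threshold=3.5):
    """Ids of records whose |modified z-score| on `field` exceeds the threshold."""
    scores = modified_z_scores([r[field] for r in records])
    return [r["id"] for r, z in zip(records, scores) if abs(z) > threshold]

def propose_corrections(records, field, flagged_ids, reference, tolerance=0.05):
    """Predict a value for each flagged record and cross-reference it.

    The prediction is naively the median of the unflagged records. If the
    reference system (a hypothetical dict id -> value) agrees with it within
    the relative `tolerance`, the reference value is proposed as an automatic
    correction; otherwise the record is routed to manual review.
    """
    predicted = median([r[field] for r in records if r["id"] not in flagged_ids])
    decisions = {}
    for rid in flagged_ids:
        ref = reference.get(rid)
        if ref is not None and abs(ref - predicted) <= tolerance * abs(predicted):
            decisions[rid] = ("auto-correct", ref)
        else:
            decisions[rid] = ("manual review", predicted)
    return decisions

if __name__ == "__main__":
    flagged = flag_outliers(TRADES, "amount")
    print(flagged)
    print(propose_corrections(TRADES, "amount", flagged, {"T6": 100}))
```

With the sample data only T6 is flagged. Because the reference value agrees with the prediction, it is marked for automatic correction; a missing or disagreeing reference value would route it to manual review instead.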

Identifying the optimal solution

Based on this diagram, the optimal strategy (top-right corner) is to develop a data science and artificial intelligence solution (DQI approach #6), as it is efficient and relatively low-cost. After the upfront investment there are no running fees, since correction is largely automated and has only a ‘light’ impact on your information system.

In addition, data science can tackle complex problems across several systems, making it more efficient than a data quality team. And since the company does not need to change its operations, the solution can be implemented quickly.

Returning to the diagram, two other approaches can also deliver good results:

  1. Creation of a controls and reporting framework (DQI approach #4), an automated, ‘light’ strategy:
    • Benefits: easy to implement and maintain, future-proof and easily replaceable, raises awareness around data
    • Costs: requires monitoring, does not address root causes or complex issues, can be bypassed
  2. Review of the IT architecture (DQI approach #3), a manual, ‘heavy’ strategy:
    • Benefits: documents issues and processes, addresses complex cases, can potentially resolve most DQIs
    • Costs: takes several years to implement, often requires additional investment, needs change management and training, not future-proof
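The rule-based controls of DQI approaches #4 and #5 can be sketched as a set of named rules run over each record, producing a control report that someone still has to monitor and act on (one of the costs listed above). In this minimal Python sketch, the three rules, the field names and the second system used for the consistency check are hypothetical examples, not any specific product's API.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]  # returns True when the record passes

ALLOWED_CURRENCIES = {"GBP", "EUR", "USD"}  # hypothetical reference list

def build_rules(other_system: Dict[str, Record]) -> Dict[str, Rule]:
    """Hypothetical controls covering completeness, validity and consistency."""
    return {
        # Completeness: the counterparty must be filled in.
        "counterparty_present": lambda r: bool(r.get("counterparty")),
        # Validity: the currency must be one of the allowed codes.
        "currency_allowed": lambda r: r.get("currency") in ALLOWED_CURRENCIES,
        # Consistency: the record must exist in the other system with the same currency.
        "matches_other_system": lambda r: (
            r["id"] in other_system
            and other_system[r["id"]].get("currency") == r.get("currency")
        ),
    }

def run_controls(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, List[str]]:
    """Control report: for each rule, the ids of the records that fail it."""
    return {
        name: [r["id"] for r in records if not rule(r)]
        for name, rule in rules.items()
    }

if __name__ == "__main__":
    records = [
        {"id": "A1", "counterparty": "Bank X", "currency": "GBP"},
        {"id": "A2", "counterparty": "", "currency": "EUR"},
        {"id": "A3", "counterparty": "Bank Y", "currency": "GPB"},  # typo in code
    ]
    other_system = {
        "A1": {"currency": "GBP"},
        "A2": {"currency": "USD"},
        "A3": {"currency": "GBP"},
    }
    report = run_controls(records, build_rules(other_system))
    for rule_name, failures in report.items():
        print(f"{rule_name}: {len(failures)} failure(s) {failures}")
```

For these sample records the report flags A2 for a missing counterparty, A3 for an invalid currency code, and both A2 and A3 as inconsistent with the other system. The framework identifies these failures but does not fix them.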

Conclusion

We find that the optimal approach to addressing DQIs is a data science and AI solution. However, various strategies exist, and choosing the best one (or, more likely, the best combination of several) depends on the complexity of your data strategy, your systems and your budget. The return on any of these investments is often greatly diminished when the culture around data quality is weak. Since we often see a profound difference in data culture between leading banks and others, starting with a ‘light’ method is often preferable: it can help shift the company’s culture around data quality and prove more profitable in the long term.

Data Science projects can be launched as independent initiatives, delivering satisfactory results and raising awareness of the importance of Data Quality. In our next article, we will present examples of Data Science and AI solutions that you can implement today to start a Data Quality culture shift.