As specialists in data analytics and model building in the health arena, we commonly encounter 'bad data' in our work; it detracts from these activities and reduces efficiency.

In her article "Good Pipeline, Bad Data: How to start trusting data in your company", Barr Moses writes, "Data downtime refers to periods of time when your data is partial, erroneous, missing or otherwise inaccurate — and almost every data organization I know struggles with it."

The first step we take at Wimmy on any project is to assess whether the data is fit for purpose, because we want to avoid 'garbage in, garbage out', which is what happens when you start with 'bad data'.

The World Health Organisation guidelines summarise the foundation of a good health system: “Sound and reliable information is the foundation of decision-making across all health system building blocks. It is essential for health system policy development and implementation, governance and regulation, health research, human resources development, health education and training, service delivery and financing.”

The guidelines go on to say that four key functions underpin decision-making in the health information system: data generation; compilation; analysis and synthesis; and communication and use. "The health information system collects data from health and other relevant sectors, analyses the data and ensures their overall quality, relevance and timeliness, and converts the data into information for health-related decision-making."

Wimmy frequently encounters several issues, outlined below, that affect data quality across the four key functions of the health data process.

Data generation
The data frequently has gaps, and values may be wrong due to incorrect data capture or incorrect measurement (such as heights that vary for the same patient). Data from different sources may have different formats and compatibility issues.
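A measurement error of the kind mentioned above, heights that vary for the same patient, can be caught with a simple automated check. The sketch below is a hypothetical illustration (patient IDs, values and the 3 cm tolerance are all invented, not part of Wimmy's actual tooling):

```python
# Hypothetical check: flag patients whose recorded heights differ by
# more than a tolerance across visits (adult height should be stable).

def flag_height_outliers(records, tolerance_cm=3.0):
    """Return patient IDs whose recorded heights vary by more than
    `tolerance_cm` across visits. `records` is (patient_id, height_cm) pairs."""
    heights_by_patient = {}
    for patient_id, height_cm in records:
        heights_by_patient.setdefault(patient_id, []).append(height_cm)
    return sorted(
        pid for pid, hs in heights_by_patient.items()
        if max(hs) - min(hs) > tolerance_cm
    )

visits = [
    ("P001", 172.0), ("P001", 171.5), ("P001", 158.0),  # likely capture error
    ("P002", 165.0), ("P002", 165.5),
]
print(flag_height_outliers(visits))  # ['P001']
```

Checks like this are cheap to run at capture time, which is exactly where such errors are easiest to correct.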

Compilation
There is frequently a lack of visibility of the various data sources, with not all sources disclosed and identified. Accessing all data sources is frequently difficult and slow. Limited metadata can compromise the traceability of compiled data and the preservation of its meaning.

Analysis and synthesis
Communication and documentation between leadership and the analyst on the specific requirements are frequently sub-optimal, resulting in misalignment. The issues with data generation and compilation compromise the speed and quality of the analysis.

Communication and use
In terms of reporting, communication and use, what is needed are timely, narrated, contextual and actionable reports, so that insights and learnings can be shared. Leadership often lacks visibility of data-source and quality issues; these issues affect the speed, quality and validity of the analysis and reports, which are likely to be sub-optimal where data quality is poor.

Addressing data quality at source
In the same article, Barr Moses comments: "Great data teams make investments in data observability — the ability to determine whether the data flowing in the system is healthy. With observability comes the opportunity to detect issues before they impact data consumers, and to then pinpoint and fix problems in minutes instead of days and weeks."
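The observability Moses describes, continuously measuring whether the data flowing through a system is healthy, can be illustrated with a minimal sketch. Completeness and validity are standard data-quality dimensions; the field names, records and validity rule below are hypothetical, not a description of any particular system:

```python
# Minimal data-quality metrics in the spirit of "observability":
# measure, per field, how complete and how valid incoming records are.

def completeness(records, field):
    """Share of records where `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def validity(records, field, is_valid):
    """Share of records whose `field` passes the rule `is_valid`."""
    ok = sum(1 for r in records if is_valid(r.get(field)))
    return ok / len(records)

records = [
    {"patient_id": "P001", "age": 34},
    {"patient_id": "P002", "age": None},  # missing value
    {"patient_id": "P003", "age": 212},   # implausible value
    {"patient_id": "P004", "age": 58},
]

plausible_age = lambda a: isinstance(a, int) and 0 <= a <= 120
print(f"age completeness: {completeness(records, 'age'):.0%}")   # 75%
print(f"age validity:     {validity(records, 'age', plausible_age):.0%}")  # 50%
```

Scores like these, computed as data arrives and surfaced on a dashboard, are what make it possible to detect problems before they reach data consumers.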

With many of our clients, we are becoming more involved in addressing data quality at source. The infographic below describes the broad-based approach we adopt to address data integrity challenges:

Wimmy's Approach

At Wimmy, we believe that much of this "observability" can be achieved by building in metrics for measuring data quality and integrity, and displaying this information to stakeholders in live dashboards and reports. This allows management to monitor data quality and to set targets for quality standards and continuous improvement. We focus on 12 data quality metrics, including completeness, consistency, accuracy, validity and integrity. If data quality in health systems is not prioritised, the daily production of data, and its usefulness in improving lives, is eroded. To find out more about our approach, visit