Most medical researchers will agree: when it comes to using real-world data, the quality of the data is just as important as the data itself. But there is far less agreement on what exactly defines that quality. A record that is valuable for one study may be incomplete for another.

Here’s how we see it: real-world data is a complex product of multiple authors, systems, and processes. While we can do a lot to bring consistency to records (a topic we covered in our earlier data quality blog post), we know that real-world data quality will not be perfect across all records, for all studies.

Researchers know this. One thing we often hear from customers is that, while data of varying quality can still be valuable, they need to know exactly how that quality is defined and what limitations the data carries.

For this reason, our approach to data quality is driven by another critical factor: transparency. It begins with four straightforward, industry-recognized categories:

  • Completeness measures whether all expected data fields and values are present. 
  • Validity measures the degree to which all values are plausible relative to clinical expectations. 
  • Timeliness measures how quickly data is delivered to Truveta and made available for research.
  • Representativeness measures how the patient diversity within a study compares to that of the overall U.S. population or of the geography being studied.

We use more than 500 metrics to measure data quality across these categories. Measurements are taken throughout our entire data pipeline, from the moment we receive a record from a health system member to the point when it is compiled with other records for a study. (We also measure a fifth category, which we’ll cover in an upcoming blog post.)

These metrics are then used to generate detailed data quality reports, which we provide to researchers both for study-specific cohorts and for our data platform as a whole.

Given the unparalleled depth and diversity of our platform, combined with daily updates, a Truveta dataset will typically score well in each category. But not always: depending on a study’s specific requirements, a dataset may fall short in one or more of them. By consulting the data quality reports, researchers can decide for themselves whether the quality is sufficient or whether a dataset or query needs to be refined.

Our data quality goal is that all Truveta data made available to researchers meet the requirements of peer-reviewed published research and regulatory filings. These data quality reports help make that possible. They also provide our customers with yet another unique advantage.  

While others claim quality data, often by pointing to large, loosely defined, high-volume datasets, we are making it a priority to consistently deliver on that promise and to support it with detailed metrics. The result is the most comprehensive data quality system in the industry today. We think it is well worth the investment, because it brings a new level of trust to the often imperfect world of real-world data. And that can only lead to more accurate and insightful research down the road.

To learn more about this system and its impact on the utility of real-world data in medical research, check out our new whitepaper, “Our Approach to Data Quality.”