The excitement around real-world data (RWD) continues to grow as the tools to capture, curate, and clean it for research become more sophisticated. RWD is sourced from many settings, including clinical systems, electronic health records (EHR), and claims.

The ability we now have to turn all of that information into fit-for-purpose real-world evidence (RWE) is powerful. It means that our potential to accurately analyze product safety, understand emerging diseases quickly, and evaluate the reasons for patients’ misdiagnoses or delayed diagnoses is far greater.

Mary Kuchenbrod, senior director of data operations at Arcadia, and Jim Robbins, SVP life sciences at Arcadia, make managing data for life sciences research their life. They recently spent time explaining how RWD is processed for RWE use and identified some of the data quality challenges that researchers must address in the Arcadia webinar, EHR Data for RWE: Understanding Data Quality Challenges Impacting Clinical Research.

The business drivers of real-world data

Kuchenbrod acknowledges that RWD can be complex and says researchers need to know the business drivers behind why the original data was recorded to tackle that complexity head on.

“A business driver could be charging for services or an incentive for data capture,” she explained. “For example, primary care physicians are often incentivized for their patients getting flu shots, so they capture that data in a clear and structured way.” 

She continued: “When there’s not a business driver for the data recording, we [Arcadia] use AI tools and machine learning to curate data, extract it, and find nuanced pieces of records within EHR data and make them available for research.”

Four data quality challenges with RWD

Researchers commonly encounter four challenges with the quality of real-world data used for RWE:

  • missingness
  • inconsistencies and lack of standardization
  • recency
  • provenance/traceability

“All of these challenges have an impact on usability and fit for purpose of EHR data,” said Robbins.

1. Missingness

To address missingness in real-world data, researchers need to understand the business case for collecting the data, because that case influences how complete the data is likely to be. Expectations may also differ depending on whether the data came from a claims clearinghouse or a population health vendor, particularly regarding whether those claims are closed or open.

“Data that was billed for [using a procedure code] is captured in a clear, structured way. Data that are not traditionally billed for or happens outside of a provider’s EHR, you have to look at and dig into more to understand what you have to answer your research question,” said Kuchenbrod.

From there, Kuchenbrod says researchers will define missingness differently depending on their needs. She cites images as a common example of how expectations differ about how much data is collected. For some researchers the file name and results field are enough, whereas others might need those plus the image and/or metadata.
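One way to make a study-specific definition of missingness concrete is to measure, for each field a researcher cares about, the share of records where it is absent. The sketch below is illustrative only: the field names (`file_name`, `result`, `image_uri`) and the sample records are hypothetical, and treating `None` and empty strings as missing is an assumption that a given study may need to change.

```python
from typing import Any


def missingness_by_field(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Return, per field, the fraction of records where it is absent, None, or ""."""
    if not records:
        return {field: 0.0 for field in fields}
    rates = {}
    for field in fields:
        missing = sum(1 for record in records if record.get(field) in (None, ""))
        rates[field] = missing / len(records)
    return rates


# Hypothetical imaging records; real EHR extracts will look different.
imaging_records = [
    {"file_name": "chest_001.dcm", "result": "normal", "image_uri": None},
    {"file_name": "chest_002.dcm", "result": "", "image_uri": "store/chest_002"},
    {"file_name": "chest_003.dcm", "result": "nodule", "image_uri": None},
    {"file_name": "chest_004.dcm", "result": "normal"},
]

rates = missingness_by_field(imaging_records, ["file_name", "result", "image_uri"])
```

A researcher who only needs the file name and result would look at the first two rates, while one who needs the image itself would also check `image_uri`.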

2. Inconsistencies and lack of standardization

EHR data are messy and always changing as providers add features, update systems, and refine workflows. In addition, there is no single universal way that everyone creates an original data record for RWD.

“One example of something simple that can get very complex is blood pressure,” said Kuchenbrod. “A blood pressure reading of 140/90 is consistently captured and recorded and defined. But was the patient reclining? Where was it taken? What was the patient position and activity level?”

When you add in the variances, Kuchenbrod says there are a triple-digit number of ways to record a blood pressure. Clinical context matters, and how much depends on the research question. The data may or may not need more curation, and researchers have different views of what consistency means.

“Data curators must actively look at the clinical variances and ways data might have been recorded,” Kuchenbrod explained. “Then make decisions to normalize that at the outset and keep doing so consistently over time.”
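As a rough illustration of normalizing at the outset, the sketch below maps a few differently recorded blood pressure strings onto one structure. It is not Arcadia's method: the `BloodPressure` type, the `POSITION_SYNONYMS` vocabulary, and the free-text input format are all assumptions made for the example, and real curation would have to handle far more variants.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BloodPressure:
    systolic: int
    diastolic: int
    position: Optional[str]  # None when the position was not recorded


# Hypothetical vocabulary mapping free-text labels to one canonical term.
POSITION_SYNONYMS = {
    "sitting": "sitting",
    "seated": "sitting",
    "supine": "reclining",
    "lying": "reclining",
    "reclining": "reclining",
    "standing": "standing",
}

_BP_PATTERN = re.compile(r"(\d{2,3})\s*/\s*(\d{2,3})")


def normalize_bp(raw: str) -> Optional[BloodPressure]:
    """Parse a free-text reading; return None if no systolic/diastolic pair is found."""
    match = _BP_PATTERN.search(raw)
    if match is None:
        return None
    lowered = raw.lower()
    position = next(
        (canonical for word, canonical in POSITION_SYNONYMS.items() if word in lowered),
        None,
    )
    return BloodPressure(int(match.group(1)), int(match.group(2)), position)
```

Keeping `position` as `None` rather than guessing a default leaves the decision about unrecorded context to the researcher, since whether it matters depends on the research question.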

3. Recency

Another factor that comes into play with real-world data is the age of the data. Getting data quickly can aid some types of research.

EHR data supports real-time and near-real-time applications. Depending on the need, EHR data can be extracted as often as every night, whereas claims data has a lag of about 90 days. That speed supports monitoring that has to respond to events as they happen.
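The difference in lag can be expressed as a simple freshness check. The sketch below is a minimal illustration: the function names, the example dates, and the 7-day threshold are assumptions, and the 1-day and 90-day lags are taken from the nightly-extract and claims figures mentioned above.

```python
from datetime import date, timedelta


def lag_days(event_date: date, available_date: date) -> int:
    """Days between a clinical event and the data becoming available for research."""
    return (available_date - event_date).days


def is_fresh_enough(event_date: date, available_date: date, max_lag_days: int) -> bool:
    """True when the data arrived within the lag a study can tolerate."""
    return lag_days(event_date, available_date) <= max_lag_days


event = date(2020, 3, 1)                         # hypothetical encounter date
ehr_available = event + timedelta(days=1)        # nightly EHR extract
claims_available = event + timedelta(days=90)    # approximate claims lag

ehr_ok = is_fresh_enough(event, ehr_available, max_lag_days=7)
claims_ok = is_fresh_enough(event, claims_available, max_lag_days=7)
```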

“In the early days of COVID, looking for EHR data on patients who might have COVID was critical,” said Robbins. “That early data gave us insights we waited months for in structured forms.”

4. Provenance and traceability

Knowing the route the data took to reach the researcher also plays a role in RWD quality. For example, did the data come out of the EHR at some point, or was it a direct database-to-database transfer?

Robbins says it’s important to think about how you can preserve the original provenance (even an error in original recording of data), but still make it research ready.

“Researchers want to know where it was originally recorded and what happened to it between its original record and the various data systems and collection mechanisms it flowed through to get to them,” said Kuchenbrod. “At Arcadia, every record is linked back to allow you to trace it back to the raw original record.”
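One common pattern for preserving provenance is to keep the raw record untouched and attach it to the curated record along with a log of every change. The sketch below illustrates that idea only; it does not describe Arcadia's system, and the `RawRecord` and `CuratedRecord` types, the `curate` steps, and the sample diagnosis code are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RawRecord:
    record_id: str
    source_system: str
    value: str  # kept exactly as recorded, errors included


@dataclass
class CuratedRecord:
    raw: RawRecord                                   # link back to the original record
    value: str                                       # research-ready value
    steps: list[str] = field(default_factory=list)   # what was changed, in order


def curate(raw: RawRecord) -> CuratedRecord:
    """Clean a value while logging each transformation applied to it."""
    curated = CuratedRecord(raw=raw, value=raw.value)
    stripped = curated.value.strip()
    if stripped != curated.value:
        curated.value = stripped
        curated.steps.append("trimmed whitespace")
    upper = curated.value.upper()
    if upper != curated.value:
        curated.value = upper
        curated.steps.append("uppercased code")
    return curated


raw = RawRecord(record_id="rec-1", source_system="clinic_ehr", value=" e11.9 ")
curated = curate(raw)
```

Because `RawRecord` is frozen, the original value cannot be overwritten during curation, so the cleaned value and the as-recorded value remain available side by side.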

Three actions to ensure high data quality for real-world evidence

Life science researchers can take three actions to ensure their real-world data is high quality and fit for purpose.

1. Use active data sets from clinical settings

An active and growing dataset gives researchers an advantage. They can trust that the data is fresh and use it for longitudinal research over a period of time. When data is regularly pulled directly from EHRs and other systems used in clinical settings, it reflects reality in a way that data sitting on the shelf cannot.

2. Follow the new FDA guidelines

In December 2021, the FDA released draft guidance covering the use of RWD to support regulatory decisions on safety and effectiveness. One section focuses on provenance and data quality throughout the study lifecycle. In his explanation of the guidance’s impact on the life sciences industry, Robbins notes that it will promote standardization and interoperability.

3. Work with a vendor who cares about data quality and standardizes data sets

In his article on FDA guidance, Robbins says it is crucial to adopt a standards-based vetting process to ensure your data suppliers meet data quality standards and offers questions to ask potential partners.

Accelerate research with real-world data

It’s exciting for researchers to have access to clinically rich RWD because it allows them to get new products to market faster and accelerate outcomes for patients.

“To be at this pivotal moment in our industry where we know we can achieve positive results for our patients faster than we’ve ever been able to before is awe-inspiring,” said Robbins.

Researchers in life science can achieve success by understanding the data quality challenges for RWE and ensuring data quality through the framework above.

Ready to learn more? Watch the webinar recording for more on RWD for RWE and the new FDA guidelines, and get access to Arcadia’s clinically rich RWD that’s built by researchers for researchers.