The draft FDA guidance on the use of real-world data (RWD) by life science researchers, released last September, marked a major turning point for the field. It signaled regulatory support for real-world data’s place in life science research, and it was a helpful first step toward establishing best practices for RWD quality in regulatory submissions. It’s about fast-tracking innovation so people get healthier, faster. What’s not to love? Below, we dig into the timeline and chart RWD’s flow from the hospital room to the lab.

Real-world data meets FDA guidelines

September’s conversation centered on three main tenets:

  1. The selection of data sources and their relevance to the study’s question
  2. The development and validation of definitions for study design elements
  3. Data provenance and quality throughout the study’s lifecycle

In layman’s terms, you can translate these into simple questions. Are you using the right test results for, say, a study on blood pressure? Are you setting consistent parameters for what real-world evidence you will (and won’t) assess? And is your data accurately measured and notated (the “quality” in “data quality management”)?

At Arcadia, we’re on board with the FDA’s commitment to deploying real-world data in service of real-world problems. Powerful metrics are the backbone of everything we do. Two decades of experience with EHR data and systems, in partnership with health systems (including IDNs), providers, and payers, have given us a distinct perspective on data’s power and pitfalls. Our strategy prioritizes complete datasets that tell a full, holistic healthcare story by connecting anchor facilities like hospitals and medical centers with the outpatient services that refer into them. The logical next step? Research and development questions rendered in contextualized detail.

A short leap from real-world evidence to analytics

For us, the leap from healthcare settings to research and development isn’t a pole vault. The two fields share similar challenges.

Perhaps most significant is the need to keep both granular detail and population-level insight current; both use cases rely on continuous refinement of real-world evidence. For a clear view of one individual’s disease profile, clinicians and researchers need an EHR that shows a person’s visits, past to present. So too with larger swaths of a community, where changing trends in air quality (or even the cost of housing) might become a key indicator of future issues. The timing of those updates is one of the litmus tests in the FDA’s proposed checklist, and for good reason: lagging statistics can stall time-sensitive research or introduce errors into it.

Another major consideration is the tension between scientific rigor and the imperfections of collecting real-world data in the field. RWD vendors understand the implications of the FDA guidance for disclosure of their data quality management process, from the frequency (how often is this updated?) to the measures themselves (who collects them, and how?). The buck stops with you, the researcher, when it comes to data quality, so vetting suppliers is essential if you want to build your research on reliable information.

Who’s doing data quality control? 

There are a lot of data suppliers and aggregators out there, and the degree of data expertise and data quality management runs the gamut. At Arcadia, owning the integrity of the numbers we provide is a core principle. But buyer beware: that’s not the case everywhere. Some suppliers may delegate this to their customers, meaning they aggregate the data but leave clients to ensure its accuracy. We believe in taking responsibility for what we aggregate and curate, and we deliver that same standard to all our partners. Let’s dig into the methodology.

Data quality management drives our real-world data strategy 

The real-world data we collect covers a wide swath of EHRs and healthcare networks, and merging these disparate sources necessitates careful deduplication. Beyond that, we standardize and curate that data for superior accuracy, completeness, and usability. Throughout our process, we’re hyper-focused on equipping our partners with clinically meaningful, actionable data to drive patient care decisions.

To get there, we’ve crafted a rigorous data quality assessment process that yields a single patient identity with a complete care journey, regardless of data source. Think of it like stitching different pieces of fabric — sewn together, the clinical and payer data become one meaningful unit of care.

Data quality management through three key lenses:

1. Normalization and Standardization 

Once data comes into our pipeline, we transpose it into a common Arcadia schema. Our terminology engine, with 10+ years of mappings, ensures data streams are mapped to clinical concepts. For completeness, we leverage NLP to extract clinical terms from free text associated with structured fields. With more than 2,000 active data sources, our NLP engine is constantly learning. We track how the data was mapped and whether terms are specific to a customer, specific to an EHR system, or global in nature. The goal is to ensure consistency across all real-world data sources, for all customers.
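
To make this concrete, here is a minimal, hypothetical sketch of a terminology-mapping step (not Arcadia’s actual engine): local codes from each source system are looked up in a curated dictionary and normalized to a standard vocabulary such as LOINC, and anything unmapped is routed to curators. All field names, codes, and mappings below are illustrative assumptions.

```python
# Hypothetical sketch of terminology normalization: map source-specific codes
# to a standard clinical concept before loading into a common schema.
from dataclasses import dataclass

@dataclass
class Observation:
    source_system: str   # which EHR or feed the record came from
    source_code: str     # local code used by that system
    value: float
    unit: str

# Illustrative mapping table; a real terminology engine holds years of curated mappings.
TERMINOLOGY_MAP = {
    ("ehr_a", "BP_SYS"): ("8480-6", "Systolic blood pressure"),    # LOINC
    ("ehr_b", "SYSTOLIC"): ("8480-6", "Systolic blood pressure"),
}

def normalize(obs: Observation) -> dict:
    """Return the observation keyed to a standard concept, or flag it for review."""
    concept = TERMINOLOGY_MAP.get((obs.source_system, obs.source_code))
    if concept is None:
        # Unmapped codes are surfaced to curators rather than silently dropped.
        return {"status": "needs_review", "source": obs}
    code, display = concept
    return {"status": "mapped", "concept_code": code, "concept_display": display,
            "value": obs.value, "unit": obs.unit}

print(normalize(Observation("ehr_a", "BP_SYS", 128.0, "mmHg")))
```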

2. Matching up patients 

To deliver the most usable and actionable data, we must create a single patient identity. Our effort includes matching and de-duplicating EHR and claims data across sources, then globally across all customers. The result? A comprehensive, 360° view of the patient that powers our research database.
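
To give a feel for what identity resolution involves, below is a deliberately simplified, hypothetical sketch that matches records deterministically on normalized demographic fields. Production matching typically adds probabilistic scoring and many more identifiers; every field and rule here is an assumption made for illustration.

```python
# Hypothetical sketch of patient identity resolution across EHR and claims feeds.
# Shows only a deterministic core; real pipelines add probabilistic matching.
from collections import defaultdict

def match_key(record: dict) -> tuple:
    """Build a normalized key from demographic fields (illustrative choice of fields)."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower()[:1],   # first initial tolerates nicknames
        record["dob"],
        record.get("zip", "")[:5],
    )

def resolve_identities(records: list[dict]) -> dict[str, list[dict]]:
    """Group source records into single patient identities with one enterprise ID each."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[match_key(rec)].append(rec)
    return {f"patient_{i:06d}": recs for i, recs in enumerate(clusters.values())}

feeds = [
    {"source": "ehr", "last_name": "Rivera", "first_name": "Ana",
     "dob": "1980-04-02", "zip": "02139"},
    {"source": "claims", "last_name": "RIVERA", "first_name": "Anna",
     "dob": "1980-04-02", "zip": "02139-1100"},
]
print(resolve_identities(feeds))   # both records collapse into one patient identity
```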

3. Health check-up 

Throughout this process, human data experts and automated processes work in tandem to validate that both data formats and data values are expected and reasonable. We ensure that configuration changes, such as new values in drop-down menus or new fields, are captured and processed. This feedback loop ensures completeness in how we capture aggregated data across sources over time.
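
As a rough illustration of what those automated checks might do (the specific rules, ranges, and field names here are invented, not Arcadia’s), a validation pass could confirm formats, sanity-check value ranges, and surface newly observed categorical values for human review:

```python
# Hypothetical validation pass: check formats and value plausibility, and flag
# newly observed categorical values (e.g., a new drop-down option) for review.
import re

KNOWN_SMOKING_STATUSES = {"current", "former", "never"}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def check_record(rec: dict) -> list[str]:
    """Return a list of data-quality issues found in a single record."""
    issues = []
    if not DATE_RE.match(rec.get("encounter_date", "")):
        issues.append("encounter_date not in YYYY-MM-DD format")
    sbp = rec.get("systolic_bp")
    if sbp is not None and not 50 <= sbp <= 300:   # illustrative plausibility bounds
        issues.append(f"systolic_bp {sbp} outside plausible range")
    status = rec.get("smoking_status")
    if status and status not in KNOWN_SMOKING_STATUSES:
        # New drop-down values are captured for curators instead of being rejected.
        issues.append(f"new smoking_status value observed: {status!r}")
    return issues

print(check_record({"encounter_date": "02/30/2023", "systolic_bp": 42,
                    "smoking_status": "vaping"}))
```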

Let’s talk about data governance for real-world evidence

For the sake of transparency, we take a two-pronged approach to data governance. Both prongs benefit us as data aggregators, but they’re also a promise to our partners.

1. Traceability 

We can always take you back to the source. Arcadia provides documentation for all our standard connectors, indicating the specific source data elements used, including details unique to individual EHR workflows.

2. Reliability  

Our data connectors automatically run nearly 1,000 fit-for-use rules to score each integration on its data quality and readiness. This positions the data for use in quality measures, risk adjustment models, and other key use cases where integrity is paramount.
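
For illustration only, a fit-for-use scoring pass might look something like the sketch below, where a registry of rules runs over an integration and the pass rate rolls up into a readiness score. The rules, fields, and scoring are hypothetical, not Arcadia’s actual rule set.

```python
# Hypothetical fit-for-use scoring: run a registry of rules over an integration
# and roll the pass rate up into a single readiness score.
from typing import Callable

Rule = Callable[[list[dict]], bool]

def encounter_dates_present(records: list[dict]) -> bool:
    return all(r.get("encounter_date") for r in records)

def provider_ids_present(records: list[dict]) -> bool:
    return all(r.get("provider_id") for r in records)

# Illustrative registry; a production system would hold hundreds of such rules.
RULES: dict[str, Rule] = {
    "encounter_date_present": encounter_dates_present,
    "provider_id_present": provider_ids_present,
}

def readiness_score(records: list[dict]) -> float:
    """Fraction of fit-for-use rules the integration passes."""
    passed = sum(1 for rule in RULES.values() if rule(records))
    return passed / len(RULES)

sample = [
    {"encounter_date": "2024-01-05", "provider_id": "12345"},
    {"encounter_date": "2024-01-06", "provider_id": None},   # missing ID fails one rule
]
print(readiness_score(sample))   # 0.5
```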

Once the data is processed and de-duplicated globally, we transpose it into our life science data model. We then run further quality checks to ensure the data we received matches the data after transposition, including a distribution comparison of mapped concepts and procedure types.
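
As a sketch of what that comparison could look like (the tolerance and concept codes are illustrative assumptions), the snippet below compares the share of each mapped concept before and after transposition and flags any concept whose share drifts beyond a threshold:

```python
# Hypothetical post-transposition check: compare concept distributions before and
# after loading into a downstream data model, flagging unexpected shifts.
from collections import Counter

def concept_shares(codes: list[str]) -> dict[str, float]:
    """Share of records carrying each concept code."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}

def distribution_drift(before: list[str], after: list[str],
                       tolerance: float = 0.02) -> dict[str, float]:
    """Return concepts whose share changed by more than the tolerance (illustrative threshold)."""
    b, a = concept_shares(before), concept_shares(after)
    return {code: round(a.get(code, 0.0) - b.get(code, 0.0), 4)
            for code in set(b) | set(a)
            if abs(a.get(code, 0.0) - b.get(code, 0.0)) > tolerance}

source_codes = ["8480-6"] * 500 + ["2345-7"] * 500   # tallies before transposition
model_codes = ["8480-6"] * 440 + ["2345-7"] * 500    # 60 records lost in one concept
print(distribution_drift(source_codes, model_codes))  # flags the shifted concepts
```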

Your partner in data quality management

We believe that a thorough process like ours, when working with EHR and claims data at scale, aligns with the FDA’s data reliability standards of accuracy, completeness, provenance, and traceability. So, this draft FDA guidance made our data-driven hearts grow at least three sizes. It’s exciting to see the research community unite over values we share: interrogating data quality management with real-world data sources from a study’s beginning to its end. Our decades of experience with EHR data position us to step confidently into the world of research, and we’d like to walk alongside your team. Let’s work together to accelerate your insights.