5 questions to ask before you use predictive algorithms in healthcare

By Michael A. Simon, PhD, Director of Data Science at Arcadia

Posted: April 25, 2022


Predictive analytics offers huge potential value to healthcare organizations, but only if you pick the right problems to solve.

Predictive and suggestive tools offer huge potential benefits to health systems striving to provide better care to their patient populations. They can surface patients with unaddressed needs and match them with the interventions most likely to improve their health outcomes — or provide physicians with guidance on the potential treatment options for one patient based on the outcomes of similar patients. Artificial intelligence (AI) and machine learning (ML) can support a vast universe of appropriate and beneficial applications in healthcare — but not every use case is a good fit. Here are five questions to ask as you think through potential applications of predictive algorithms.

What is predictive analytics?

Predictive analytics is the use of statistics, quantitative models, and other inferential tools to identify trends and patterns in data for the purposes of predicting future outcomes. For example:

Using statistics to estimate the likelihood that an individual will have similar health outcomes to other patients
Extracting meaning from a text string in the context of a patient exam
Identifying a cohort of providers who behave similarly in some meaningful way
Suggesting an optimal intervention for a patient given multiple options and limited resources

In all of these cases, we use existing data about people, processes, and events as source knowledge to come to a conclusion not otherwise readily apparent.

Predictive analytics can help drive valuable outcomes for patients, but not every opportunity to use those tools is appropriate. Developers and users of predictive analytics can maximize the effectiveness of these tools while avoiding common pitfalls by asking the following questions:

1. Is the outcome a question of fact (or at least, a known fact)?

If there is a way to find out the answer to your question based on what is known, do that instead.

For example, if you want to know how many patients were admitted to your emergency department due to avoidable causes in the past year, you should consider whether data specifically providing that information are already available to answer the question.

If the answer truly cannot be derived directly from the data, predictive analytics may be appropriate. On the other hand, if the issue is simply that the available data suffer from poor formatting, integration, or quality, the first step should be addressing those issues to ensure your data are fit for use.
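As a minimal sketch of answering a question of fact directly, the count below comes straight from encounter records rather than from a model. The record layout, the field names, and the `avoidable` flag are hypothetical and assumed only for illustration; real data would need its own agreed definition of an avoidable visit.

```python
from datetime import date

# Hypothetical encounter records; every field name here is an illustrative assumption.
encounters = [
    {"patient_id": "A", "setting": "ED", "admit_date": date(2021, 6, 1), "avoidable": True},
    {"patient_id": "B", "setting": "ED", "admit_date": date(2021, 9, 15), "avoidable": False},
    {"patient_id": "C", "setting": "inpatient", "admit_date": date(2021, 10, 2), "avoidable": True},
    {"patient_id": "D", "setting": "ED", "admit_date": date(2020, 1, 20), "avoidable": True},
]


def count_avoidable_ed_visits(records, start, end):
    """Count ED encounters flagged avoidable with admit_date in [start, end]."""
    return sum(
        1
        for r in records
        if r["setting"] == "ED" and r["avoidable"] and start <= r["admit_date"] <= end
    )


n = count_avoidable_ed_visits(encounters, date(2021, 1, 1), date(2021, 12, 31))
print(n)  # 1: only patient A is an avoidable ED visit within 2021
```

If a query like this can be written against clean data, no prediction is needed. If it can't, the gap is more likely a data quality problem than a modeling problem.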

2. Is the outcome quantifiable or otherwise clearly defined?

If the question to be answered can’t be measured, how can a predictive tool be expected to provide insight?

If you want to know whether patients are benefiting from a new therapeutic but are hard-pressed to identify what “benefiting” means in this context, the development and use of any models to answer the question will be similarly vague and unhelpful.

Although this may seem obvious, the reality of this challenge often doesn’t set in until development is well underway. Can we quantify what we’re interested in? Can cost stand in for quality of life? Do controlled blood sugar levels constitute success? Should any mortality count, or only deaths due to specific causes? Outcome selection, sometimes bordering on philosophy, can be the hardest part of the whole process.
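To make the mortality question concrete, here is a small sketch showing how two reasonable outcome definitions label the same patients differently. The records and the choice of "cardiovascular" as the cause of interest are invented for illustration and are not drawn from any real dataset.

```python
# Hypothetical patient records; cause_of_death is None for surviving patients.
patients = [
    {"id": 1, "cause_of_death": None},
    {"id": 2, "cause_of_death": "cardiovascular"},
    {"id": 3, "cause_of_death": "accident"},
    {"id": 4, "cause_of_death": "cardiovascular"},
    {"id": 5, "cause_of_death": None},
]


def all_cause_mortality(patient):
    """Outcome definition 1: any death counts."""
    return patient["cause_of_death"] is not None


def cause_specific_mortality(patient, causes=frozenset({"cardiovascular"})):
    """Outcome definition 2: only deaths from the listed causes count."""
    return patient["cause_of_death"] in causes


def outcome_rate(records, definition):
    """Share of records labeled positive under the given outcome definition."""
    return sum(definition(p) for p in records) / len(records)


all_cause = outcome_rate(patients, all_cause_mortality)
specific = outcome_rate(patients, cause_specific_mortality)
print(f"all-cause: {all_cause:.0%}, cause-specific: {specific:.0%}")
```

With these made-up records the two definitions disagree on one patient in five, and a model trained on one label would be answering a different question than a model trained on the other.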

3. Would the outcome, if known, influence clinical decisions?

If the answer you seek would not help provide better and more equitable care, why is a predictive tool being used in the first place?

For example, if you want to build a tool that predicts the likelihood that a patient will go out of network for a particular service but have no plan for how to use that information, the result is just a mechanism for picking out patients who will exhibit certain highlighted behaviors. (If the identified patients were selected for outreach, or if the predictions drove aggregate decisions to create new services, the value would become clearer.)

You might think about this the same way a physician decides to run a lab test. Beyond any physical side effects, a provider must consider risks like false positives, interpretation errors, and unnecessary anxiety for patients. A similar risk/benefit analysis should exist for predictive analytics: Does the potential benefit of knowing the outcome outweigh the potential for errors, implicit bias, and unhelpful classification?

4. Can the outcome be estimated for large groups of people?

Applying models to extremely rare scenarios may not be worth the error rates impacting those most affected.

Let’s say you’ve built a population health tool based on a disease affecting hundreds of thousands or even millions of patients. If you are considering extending that model to a disease that impacts only 100 or so patients per year, the requirements for underlying data and effective controls become significantly more complex.

This doesn’t necessarily apply to all cases; some extremely rare diseases (in the United States, a rare disease is defined as one affecting fewer than 200,000 people) can benefit from quantitative techniques applied to very well characterized datasets. It is critical, however, that the underlying data be adequate and the statistical methods appropriate for characterizing a smaller cohort.
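One way to see why rarity matters is to apply Bayes' rule to a model's flags. The sketch below computes positive predictive value (the share of flagged patients who truly have the condition) for the same hypothetical model at two prevalence levels. The sensitivity, specificity, and prevalence figures are assumptions chosen for illustration, not measurements from any real tool.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a flagged patient truly has the condition (Bayes' rule)."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_share / (true_positive_share + false_positive_share)


# Same hypothetical model (90% sensitivity, 95% specificity), two populations.
common = positive_predictive_value(0.90, 0.95, prevalence=0.10)    # 1 in 10
rare = positive_predictive_value(0.90, 0.95, prevalence=0.0001)    # 1 in 10,000

print(f"common condition: {common:.1%} of flags are correct")  # about 66.7%
print(f"rare condition:   {rare:.2%} of flags are correct")    # about 0.18%
```

Under these assumptions, a model that looks strong on a common condition would be wrong for nearly every patient it flags when the condition is rare, which is why smaller cohorts demand better data and more careful methods.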

5. Are the consequences of the wrong choice known and acceptable?

No model is perfect. Can the result of an erroneous action (or inaction) be justified for one person? For a thousand?

For example, the consequences of identifying the wrong patient for a care management intervention may include wasted time and resources and missed opportunities to improve health. In contrast, the consequences of an imprecise prediction of appropriate dosage for a new pharmaceutical could be injury, worsening condition, or death.
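A rough way to compare these two scenarios is to estimate how many errors a model would make at scale and weight each kind by its cost. The sketch below does that arithmetic; the error rates and the cost units are placeholders, and assigning real costs to missed patients or harmful doses is an organizational judgment that no formula supplies.

```python
def expected_errors(n_patients, prevalence, sensitivity, specificity):
    """Expected counts of each error type when the model is applied to n_patients."""
    positives = n_patients * prevalence
    negatives = n_patients - positives
    return {
        "false_negatives": positives * (1.0 - sensitivity),
        "false_positives": negatives * (1.0 - specificity),
    }


def expected_cost(errors, cost_false_negative, cost_false_positive):
    """Total expected cost, in whatever unit the two per-error costs share."""
    return (
        errors["false_negatives"] * cost_false_negative
        + errors["false_positives"] * cost_false_positive
    )


# Hypothetical model applied to 1,000 patients; all figures are assumptions.
errors = expected_errors(1000, prevalence=0.2, sensitivity=0.8, specificity=0.9)

# Low-stakes use (care management outreach): errors cost time and missed chances.
low_stakes = expected_cost(errors, cost_false_negative=5, cost_false_positive=1)

# High-stakes use (dosing): the same error counts carry far heavier consequences.
high_stakes = expected_cost(errors, cost_false_negative=500, cost_false_positive=1000)

print({k: round(v) for k, v in errors.items()})
print(round(low_stakes), round(high_stakes))
```

The error counts are identical in both runs; only the cost per error changes. That makes the question of whether the consequences are acceptable something to settle before the model is built.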

Unlike the previous four questions, this question is less about the analysis or the tools used to generate it, and more about the nature of the relationship between organizational objectives and those using the analytic endpoints to make decisions. Clear communication — between those driving development objectives, those developing or acquiring the tools, and those applying them — is absolutely critical to ensure there is no misalignment of expectations and consequences.

Don’t forget to address implicit bias

It’s not enough to select the right problem — you also need to make sure that you consider and address potential drivers of implicit bias as you develop a predictive model. Make sure you design for effective, actionable, and equitable predictive tools and programs from the start.

Are you building algorithms using the right data for your patient population?
Have you selected model outcomes that are universally applicable and accessible?
Are you evaluating your algorithmic outputs for potential bias?
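As one possible starting point for the third question, the sketch below compares false negative rates across patient groups, that is, how often the model misses patients who truly needed attention. The groups, records, and the choice of false negative rate as the metric are all assumptions for illustration; other fairness measures may suit a given program better.

```python
from collections import defaultdict


def false_negative_rate_by_group(records):
    """Return {group: missed positives / actual positives} for each group
    that has at least one actual positive."""
    positives = defaultdict(int)
    misses = defaultdict(int)
    for r in records:
        if r["actual"]:
            positives[r["group"]] += 1
            if not r["predicted"]:
                misses[r["group"]] += 1
    return {group: misses[group] / positives[group] for group in positives}


# Hypothetical model outputs; "actual" means the patient truly had the outcome.
records = [
    {"group": "A", "actual": True, "predicted": True},
    {"group": "A", "actual": True, "predicted": True},
    {"group": "A", "actual": True, "predicted": False},
    {"group": "A", "actual": False, "predicted": False},
    {"group": "B", "actual": True, "predicted": False},
    {"group": "B", "actual": True, "predicted": False},
    {"group": "B", "actual": True, "predicted": True},
    {"group": "B", "actual": False, "predicted": True},
]

rates = false_negative_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: misses {rate:.0%} of patients with the outcome")
```

In this invented sample the model misses group B patients twice as often as group A patients, the kind of gap worth investigating before a tool reaches production.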

Our new white paper offers three strategies for diminishing bias in predictive analytics — download a complimentary copy to learn more.
