AI is not Healthcare’s Magic Bullet
A critically ill ICU patient deteriorates while the hospital’s AI model stays silent. Later, researchers find it missed two-thirds of patient declines.
A critically ill patient in the intensive care unit (ICU) takes a turn for the worse. The first signs are subtle shifts in vital signs and lab values. Then, the patient crashes.
The hospital had deployed an artificial intelligence (AI) model that was supposed to predict the kind of clinical deterioration that happened here. But this time, the AI stayed silent.
Later, independent researchers ran simulations in which the model missed two-thirds of patients who deteriorated in similar ways.
In this article, I discuss why this type of predictive model failed so spectacularly and, more importantly, the questions you can ask before deciding to deploy an AI model in a healthcare setting.
Why Some Healthcare AI Projects Are Doomed from the Start
In our example with the ICU patient above, we might be quick to blame the AI model for being faulty. However, the real issue was that AI had been deployed in a setting where its strengths don’t shine.
Pattern recognition is one of artificial intelligence’s greatest strengths. AI succeeds at handling narrow tasks where the parameters are well-defined and variation in data is relatively low.
That strength is also the mismatch at the core of many healthcare AI projects: medicine does not deal in averages. The highest stakes appear in the edge cases, such as a rare complication or the one-in-a-thousand clinical presentation.
If an AI tool works for the “typical” patient presentation but fails with an outlier, it becomes worse than useless. Every patient can be an outlier in their own unique way.
The key takeaway is that not every problem is suited for AI. Human judgment always needs to be in the loop for the extreme cases.
What Healthcare Problems Can Be Solved by AI?
Ambient AI, also known as the AI medical scribe, is one use case where AI can be successfully employed in healthcare. The technology listens to the conversation between doctor and patient, then helps generate a note for the patient’s medical record.
As previously discussed, the technology has the potential for risk if used carelessly. However, with the proper guardrails in place, ambient AI can save doctors hours of time charting.
What makes ambient AI different from an AI model predicting the deterioration of ICU patients?
The problem is bounded, repetitive, and error-tolerant.
Bounded means that the task is clearly identified: listen to a conversation at an appointment and summarize it.
The task is repetitive because more or less the same thing happens each time. Contrast that with a critically ill patient, whose health can worsen in a thousand different ways (some of which the model may never have been trained on).
And finally, ambient AI is error-tolerant because the physician can review the AI-generated note and correct any mistakes before submitting it. If the ICU model makes a mistake, a patient can die.
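To make the error-tolerance point concrete, here is a minimal sketch of a human-in-the-loop scribe workflow. The function names (transcribe_audio, draft_note, physician_review) are hypothetical placeholders rather than any vendor’s API; the only point is that nothing reaches the chart without a clinician’s sign-off.

```python
# A hypothetical sketch of an ambient-scribe workflow with a physician review gate.
# transcribe_audio() and draft_note() are stand-ins for whatever speech-to-text and
# summarization services a real deployment would use.

from dataclasses import dataclass

@dataclass
class DraftNote:
    text: str
    approved: bool = False  # nothing reaches the chart until this is True

def transcribe_audio(audio_path: str) -> str:
    # Placeholder: a real system would call a speech-to-text service here.
    return "Patient reports two days of cough; afebrile; lungs clear on exam."

def draft_note(transcript: str) -> DraftNote:
    # Placeholder: a real system would call a summarization model here.
    return DraftNote(text=f"Subjective/Objective: {transcript}")

def physician_review(note: DraftNote, corrections: str | None = None) -> DraftNote:
    # The error-tolerance step: the physician edits and signs off on the draft.
    if corrections is not None:
        note.text = corrections
    note.approved = True
    return note

def commit_to_record(note: DraftNote) -> None:
    # Only reviewed, approved notes are filed to the medical record.
    if not note.approved:
        raise ValueError("Draft notes must be physician-approved before filing.")
    print("Filed note:", note.text)

if __name__ == "__main__":
    draft = draft_note(transcribe_audio("visit_audio.wav"))
    commit_to_record(physician_review(draft))
```

The structural safeguard is the gate in commit_to_record: the AI drafts, but a physician approves.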
The Litmus Test: 4 Questions to Ask Before Using AI in Healthcare
1. Is the problem bounded and structured? Transcribing a conversation is bounded. Predicting every possible form of clinical deterioration is not.
2. Can success be measured clearly? “Reduced documentation time by 50%” is measurable. “Better outcomes” without a clear endpoint is not.
3. Is the data reliable and representative of the patient population? AI trained only on common patterns will miss outliers, and in medicine, outliers are often the cases that matter most.
4. Are error tolerances aligned with the stakes? An AI model mishearing one sentence in a conversation is more acceptable than missing one in ten deteriorating patients. (A rough sketch of these four questions as a checklist follows below.)
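For teams that want to make this screen explicit, below is a minimal sketch of the four questions as a pre-deployment checklist. The class name, field names, and the all-four-must-pass rule are my own illustration, not a validated instrument.

```python
# A hypothetical pre-deployment checklist encoding the four litmus-test questions.
# The "all four must pass" rule is an illustration, not a standard.

from dataclasses import dataclass

@dataclass
class AIUseCaseScreen:
    bounded_and_structured: bool          # Q1: is the task narrowly defined?
    success_measurable: bool              # Q2: is there a concrete endpoint?
    data_representative: bool             # Q3: does the data cover the population, including outliers?
    error_tolerance_matches_stakes: bool  # Q4: can a miss be caught and corrected?

    def worth_piloting(self) -> bool:
        # Only proceed when every question can be answered "yes".
        return all(
            (
                self.bounded_and_structured,
                self.success_measurable,
                self.data_representative,
                self.error_tolerance_matches_stakes,
            )
        )

# Example: ambient scribe vs. ICU deterioration prediction
ambient_scribe = AIUseCaseScreen(True, True, True, True)
icu_deterioration = AIUseCaseScreen(False, True, False, False)

print(ambient_scribe.worth_piloting())     # True
print(icu_deterioration.worth_piloting())  # False
```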
Conclusion
AI can be a powerful healthcare tool, but only if it is used for the right kinds of problems.
It thrives when the tasks are narrow, repetitive, and forgiving of small errors. It often fails when asked to master the messy, high-stakes edge cases that define so much of practicing medicine.
Healthcare leaders shouldn’t chase every lofty AI promise. They need to ask insightful questions from the start: Is the problem structured? Can we clearly measure success? Is the data representative? Are the error tolerances aligned with the stakes?
Because in healthcare, the averages were already easy; it’s the margins that matter the most. And if AI can’t handle the margins, it’s the wrong tool for the job.