Dive Brief:
- A large language model could help assess patient acuity in emergency departments, according to a study published in JAMA Network Open.
- When given pairs of patient histories extracted from emergency department documentation, the LLM correctly picked the higher-acuity patient in 89% of scenarios, comparable to a physician reviewer’s performance on a subset of the sample.
- The research demonstrates LLMs’ potential to streamline triage in emergency departments, especially since the general-purpose models used in the study weren’t fine-tuned for medicine, the authors wrote.
Dive Insight:
LLMs, a type of generative artificial intelligence focused on creating new text, received plenty of media attention in the industry last year as technology giants rolled out their own tools, many of which focused on lessening administrative work for providers.
A number of products aim to assist clinicians with clinical note-taking and documentation, while others could be used to draft responses to patient questions. One study published early this year used large language models to pull social determinants of health data, such as housing or employment status, from clinician notes.
Other research has demonstrated LLMs could achieve a passing score on medical licensing exams or solve some diagnostic challenges, the JAMA study noted. But that research largely relies on simulated scenarios rather than data taken from electronic health records, which limits its applicability to clinical practice.
In the latest study, researchers pulled adult emergency department visits recorded over more than a decade at the University of California, San Francisco, each with an Emergency Severity Index acuity score, which classifies cases as immediate, emergent, urgent, less urgent or nonurgent.
They sampled 10,000 pairs of visits with different acuity levels and asked the LLM to pick which patient’s condition was more serious based on clinical history alone.
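For readers curious about the mechanics, the pairwise setup reduces to a simple evaluation loop. The Python sketch below is an illustration only, not the study’s actual code: `query_llm` is a hypothetical placeholder for whatever LLM client is available, and the prompt wording is an assumption.

```python
def query_llm(prompt: str) -> str:
    """Placeholder: swap in a real LLM API call here."""
    raise NotImplementedError

# Hypothetical prompt wording, not the study's actual materials.
PROMPT = (
    "Patient A clinical history:\n{a}\n\n"
    "Patient B clinical history:\n{b}\n\n"
    "Based only on these histories, which patient has the "
    "higher-acuity presentation? Answer with a single letter: A or B."
)

def compare_pair(history_a: str, history_b: str) -> str:
    """Ask the model to choose the more serious of two presentations."""
    reply = query_llm(PROMPT.format(a=history_a, b=history_b))
    return reply.strip().upper()[:1]  # normalize the answer to "A" or "B"

def accuracy(pairs: list[tuple[str, str, str]]) -> float:
    """Score the model against labels ('A' or 'B') derived from the
    recorded ESI scores. With 10,000 pairs and 8,940 correct picks,
    this returns 0.894, the 89% figure reported in the study."""
    correct = sum(compare_pair(a, b) == label for a, b, label in pairs)
    return correct / len(pairs)
```

Framing triage as a forced choice between two patients, rather than asking the model for an absolute ESI score, makes its picks easy to grade against the acuity scores already recorded in the health system’s data.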
The model picked the correct patient in 8,940 of the 10,000 pairs, for 89% accuracy. A comparator LLM, a predecessor to the first model, performed slightly worse at 84% accuracy.
The model’s performance also improved when shown pairs with more extreme differences in acuity. Its accuracy reached 100% when choosing between cases with immediate and nonurgent acuity levels.
“Overall, the LLM’s only significant performance weakness was in distinguishing patients assigned a less urgent vs nonurgent acuity, which is unlikely to have significant clinical consequences,” the study’s authors wrote. “In addition, this performance was achieved despite providing only patients’ clinical history to the LLM, omitting the vital signs and other physical examination findings that may be available to triage clinicians on initial evaluation.”