Amazon Web Services generated significant buzz earlier this week when it announced HealthScribe, its own clinical documentation tool that uses generative artificial intelligence to summarize doctor-patient visits.
With the launch, Amazon joined a number of other tech companies, including Microsoft, in the space, jockeying for a share of the market as providers look for ways to cut the documentation burden on physicians.
In an interview on the sidelines of the AWS Summit on Wednesday, AWS Head of Health AI Tehsin Syed told Healthcare Dive what differentiates HealthScribe from other medical notetakers, how the technology is expected to drive client demand and how Amazon is approaching pernicious accuracy, ethics and liability issues around generative AI in healthcare.
Editor’s note: This interview has been edited for clarity and brevity.
HEALTHCARE DIVE: Explain HealthScribe — how does the service work?
TEHSIN SYED: HealthScribe transcribes the conversation between a doctor and a patient. It identifies the speakers and segments the transcript. If you start with some chitchat, it’ll recognize, okay, this is not relevant to the clinical note. It extracts specific medical information — things like medications, tests and references to anatomy — and captures those as structured elements. This is where the generative AI capability comes in. That information, together with the large language model we’ve trained, is used to generate summary documentation in the shape of chief complaint, history of present illness, assessment and plan.
So we take a Titan model — that’s an Amazon first-party model — and we do what’s called grounding. We adapt the model to this specific purpose of learning how to generate this kind of documentation.
Finally, we show the end user the transcript and the summary, and cite conclusions in the summary back to the dialogue. So it’s a really natural and easy review process for clinicians.
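For a sense of what that flow looks like to a software builder, here is a minimal sketch of submitting a visit recording to HealthScribe with boto3. It assumes the start_medical_scribe_job and get_medical_scribe_job operations from the generally available Amazon Transcribe API; the preview API discussed in this interview may differ, and the bucket, role and job names below are placeholders.

```python
import time

import boto3

# Minimal sketch of the flow Syed describes: submit recorded audio, let the
# service diarize and summarize it, then collect the output. Assumes the GA
# boto3 Transcribe operations; all names below are placeholders.
transcribe = boto3.client("transcribe", region_name="us-east-1")

transcribe.start_medical_scribe_job(
    MedicalScribeJobName="visit-demo-001",
    Media={"MediaFileUri": "s3://example-bucket/audio/visit.wav"},
    OutputBucketName="example-bucket",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ExampleHealthScribeRole",
    Settings={
        "ShowSpeakerLabels": True,  # label clinician vs. patient turns
        "MaxSpeakerLabels": 2,
    },
)

# Poll until the job finishes. On completion, the output bucket holds a
# diarized transcript and a clinical document (chief complaint, history of
# present illness, assessment and plan) that references the dialogue.
while True:
    job = transcribe.get_medical_scribe_job(
        MedicalScribeJobName="visit-demo-001"
    )["MedicalScribeJob"]
    if job["MedicalScribeJobStatus"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(30)

print(job["MedicalScribeJobStatus"])
```

The review experience Syed describes, with statements in the summary linked back to lines in the transcript, is something partners build on top of those output files.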
What differentiates HealthScribe from other clinical documentation tools, like Suki or Microsoft-owned Nuance, that also use generative AI?
SYED: HealthScribe is an API that Amazon is selling to its partners for them to provide an end-user experience. That’s a very different business model. We’re saying, you don’t have to have all the machine learning expertise and all of the infrastructure that’s needed to manage a large language model.
AWS is going to do that part of the work, and maintain the security and privacy of the data. We provide this capability to anyone who wants to use it, and we have public pricing based on usage, so it’s clear what’s going on.
What we’re trying to do is say, whoever you are, whatever software builder you are, you get to market much faster with this capability. You can concentrate on the end-user experience.
There’s also innovation in the technology itself. Some of the grounding techniques underlying HealthScribe are patent-pending. At a high level, everyone’s using an LLM, but part of this is the expertise that we bring.
3M, Babylon and ScribeEMR have already said they plan to integrate HealthScribe. How is AWS forecasting client demand moving forward?
SYED: There’s a number of others who are interested. I absolutely anticipate new customers coming to the table who previously didn’t have the ability to do this [work] themselves. We’ve seen a lot of interest from software vendors, because we’re providing an API and you do have to have some expertise to build. When we talk to provider systems, they’re interested, and they usually ask who’s a good partner, and how do we work with them to incorporate this.
Can you share any accuracy rates for HealthScribe?
SYED: The best way to show accuracy and build trust in the system is to show end users what in the summary came from the dialogue. I don’t think showing them an accuracy stat is going to help as much.
Internally, obviously, we use a number of approaches. Some are about factual completeness — is what was in the dialogue reflected in the summary? Did the model come up with something that wasn’t in the dialogue? There are a couple of other metrics we track internally, but they’re not shared externally.
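AWS doesn’t publish those internal metrics, but the checks Syed names are easy to illustrate in principle. Below is a generic sketch, not AWS’s method: it flags summary sentences whose content words have little support in the transcript, a crude proxy for the question “did the model come up with something that wasn’t in the dialogue?”

```python
import re


def flag_unsupported(summary: str, transcript: str, min_overlap: float = 0.6):
    """Flag summary sentences whose content words are poorly supported by
    the transcript -- a crude hallucination proxy, not AWS's metric."""
    transcript_words = set(re.findall(r"[a-z']+", transcript.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = re.findall(r"[a-z']+", sentence.lower())
        content = [w for w in words if len(w) > 3]  # skip short function words
        if not content:
            continue
        support = sum(w in transcript_words for w in content) / len(content)
        if support < min_overlap:
            flagged.append((support, sentence))
    return flagged


# Example: a fabricated prescription should score low against the dialogue.
transcript = "Doctor: Any chest pain? Patient: No, just a cough for two weeks."
summary = "Patient reports a cough for two weeks. Prescribed amoxicillin 500 mg."
for support, sentence in flag_unsupported(summary, transcript):
    print(f"{support:.2f}  {sentence}")
```

Production systems would lean on stronger signals, such as entailment models or clinician review, but the shape of the check is the same: every statement in the summary needs evidence in the dialogue.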
Will you ever share accuracy metrics externally?
SYED: I think we’re directly addressing the concern about quality by showing the user every time why a decision was made.
To play devil’s advocate, from the doctor’s perspective they have a visit with a patient and then a summary pops out of their computer. What if there’s an issue with the original voice-to-text software that’s reflected in the transcript after being run through AI? What if doctors review summaries at the end of the day, and don’t remember specifics of a patient visit so don’t catch errors? How is AWS thinking about staying responsible here?
SYED: This is part of why we’re announcing in preview, and we’re working with our partners to understand what’s necessary to do here. I don’t have a definitive answer for you. There’s a lot of complexity to it, right? You have an ambient listening device, but maybe it’s too far away, or the air conditioner’s too loud. It’s not as simple as HealthScribe sharing accuracy numbers, because there could be an issue with the recorded conversation that we’re getting.
I agree that there’s a lot to think about in doing this responsibly.
What we’re saying is, here’s what we can do, and as HealthScribe is deployed in the workflow there are other things that our customers have to do to ensure the right level of accuracy. We are very clear that it generates the summary but does not automate the documentation. The idea is, this is a tool, and then the tool is applied in a workflow. And the workflow has a lot of those checks and balances in it.
What if there’s an issue with that workflow that later results in a patient getting improper care, resulting in a malpractice suit? What party is liable — AWS? AWS’ client? The physician? How is AWS approaching the liability issue here, given the use of generative AI in clinical settings is still very new?
SYED: The way we look at that is, we’re providing a capability that has to be incorporated in an end-to-end workflow, and that workflow has to deal with those aspects you’re talking about. It’s the vendor who’s incorporating the capability that has to make sure that workflow is appropriate, and the clinician also has to make sure.
I worked at Cerner a long time before this. Dictation has been around for a very long time. It’s the same thing, right? Is it accurate or not? So folks have mechanisms in place to handle the workflow around this. But across the board with AI, there are lots of new issues.
Is AWS concerned that generative AI might be being implemented too quickly in clinical settings?
SYED: I don’t have an opinion on moving too quickly or not quickly enough. We work with our customers on very specific use cases. With HealthScribe, we’re iterating during this preview period to make sure it is actually fit for its purpose, and it meets criteria that would result in real use in a responsible way.