Faces of Digital Health

NLP in Healthcare 3/3: ChatGPT, Med-PaLM and the Potential of Natural Language Processing in Healthcare

In January, we invited Alexandre Lebrun (CEO of Nabla) and Israel Krush (CEO of Hyro), leaders of two companies specializing in AI in healthcare, to join a debate and clarify the state of natural language processing in healthcare.

Not all of it is necessarily good.

If a doctor uses AI to generate a reimbursement request, why wouldn't the insurance agent use AI to write a compelling rejection? "It's going to be a whole new world. Governmental entities are starting to think about what is allowed and what is not. As a society, we will need to create the right balance, or at least figure out how we deal with fake news, fake entities and fake information, which become much easier to generate," commented Israel Krush, CEO of Hyro.

Nabla is a French company that has built an AI-based medical assistant that makes healthcare professionals more efficient, for instance by automating clinical documentation and patient engagement.

Hyro, mostly present in the US market, positions itself as the world's first headache-free conversational AI, with a particular focus on healthcare. It is used for automation across call centers, mobile apps, websites and SMS, covering physician search, scheduling, prescription refills, FAQs and more.

Consequence 1: The Field of LLMs Will Evolve Very Rapidly, Thanks to ChatGPT

Alexandre Lebrun anticipates a new wave of entrants in the natural language processing space because ChatGPT is making the creation of a minimum viable product much easier. “Before large language models, if you wanted to test an idea, you could use machine learning models, but you still needed a minimal amount of data to train the first version. Now, if you have an idea of automating something, you can test it without any data with ChatGPT. It'll be wrong sometimes, but it'll be enough for, say, physicians to imagine themselves using your future products. You can find out if it's valuable and if it's worth investigating further.”
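
To make this concrete: a zero-data prototype of the kind Lebrun describes can be a few dozen lines. The sketch below, using the OpenAI Python client (v1+), drafts a clinical note from a consultation transcript. The prompt, model name and transcript are illustrative assumptions, not anything shown in the debate.

```python
# A minimal zero-data prototype: draft a clinical note from a transcript.
# Assumptions: openai>=1.0 is installed and OPENAI_API_KEY is set in the
# environment; the prompt, model name and transcript are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = (
    "Doctor: What brings you in today?\n"
    "Patient: I've had a dry cough and a mild fever for three days."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a medical scribe. Summarize the consultation "
                    "as a short, structured clinical note."},
        {"role": "user", "content": transcript},
    ],
)

print(response.choices[0].message.content)
```

No training data is involved: the output will sometimes be wrong, as Lebrun notes, but it is enough to put a working demo in front of a physician.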

Israel Krush compares this potential to the invention of AWS. “Before AWS, you had to have a bunch of people that knew hardware and were able to build servers in specific rooms and so on and so forth, and all of a sudden, everything is in the cloud, at the click of a button. ChatGPT is not the exact equivalent, but somehow similar in terms of how easy it is today to start an NLP company.”


Startups Can Move Fast by Combining LLMs With Their Smaller Case-Specific Language Models

It is nearly impossible for startups to build their own large language models, but they can build smaller models and fine-tune them for their own solutions, explains Alexandre Lebrun. “For us as a startup, it's out of the question to train a large language model. We don't have enough computing power. We are talking about tens of millions of dollars for each cycle of training. No normal startup can do that. We can, however, use a large language model and work on how we prompt it. And on the side, we can train our smaller language models, which are very specific to our task. There are different weapons we can use, but clearly, the game is different now than it was a year ago. I think the startups who learn the fastest how to use these things will eventually win the new game.”
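
As a hedged illustration of this hybrid pattern, the sketch below routes each utterance through a small, cheap task-specific classifier and only prompts the large model for the open-ended generation step. The checkpoint name, routing rule and prompts are illustrative assumptions, not Nabla's actual stack.

```python
# Hedged sketch of the hybrid pattern: a small fine-tuned model makes the
# frequent, narrow decision; the prompted LLM handles open-ended generation.
# "my-org/utterance-relevance" is a hypothetical fine-tuned checkpoint.
from transformers import pipeline
from openai import OpenAI

relevance = pipeline("text-classification", model="my-org/utterance-relevance")
client = OpenAI()


def document_utterance(utterance: str) -> str | None:
    # Cheap, task-specific filter: skip small talk entirely.
    if relevance(utterance)[0]["label"] != "RELEVANT":
        return None
    # Expensive, general model only for what ends up in the note.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Rewrite the utterance as one concise clinical-note line."},
            {"role": "user", "content": utterance},
        ],
    )
    return completion.choices[0].message.content
```

The design point is cost and control: the small model makes the frequent, narrow decision deterministically, and the expensive, less predictable LLM is consulted only when its output actually matters.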

Here are a few additional questions the two experts answered in the debate. You can watch it on YouTube or listen on iTunes or Spotify.

What is the Most Difficult Thing for Companies Developing LLMs in Healthcare?

Alexandre Lebrun: I think in healthcare, the most difficult thing with machine learning models in general is that it is very slow, complicated and expensive to get data, and also feedback from users. For instance, if I change my model that generates the clinical documentation after a consultation, it's very difficult to know whether the change works better or not because of all the data protection and privacy barriers. The feedback loop is extremely long and expensive because we need to talk to doctors one by one.

Israel Krush: On top of that, one of the big problems conversational AI companies face is preserving and understanding context, which means being able to respond to context switches. Another thing, less problematic in text and more in voice, is latency and everything related to actually conducting the conversation with potential interruptions: if the user interrupts while the assistant is replying, the model needs to respond appropriately. In voice, when you don't have a visual in front of you, the response has to be in real time.
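
For illustration, here is a minimal sketch of the barge-in logic Krush describes: new user speech cancels the reply currently being played. The speech-to-text and text-to-speech hooks are hypothetical stand-ins for a real voice stack, not any particular product's API.

```python
# Sketch of barge-in handling: if the user starts speaking while the
# assistant is talking, stop playback and answer the new input instead.
# tts_speak and asr_stream are hypothetical hooks, not a real library API.
import threading


class VoiceTurnManager:
    def __init__(self, tts_speak, asr_stream):
        self._speak = tts_speak    # plays one short chunk of synthesized speech
        self._asr = asr_stream     # yields recognized user utterances
        self._cancel = None        # cancellation flag for the reply in progress

    def _respond(self, chunks, cancel):
        # Play the reply chunk by chunk so it can stop mid-sentence.
        for chunk in chunks:
            if cancel.is_set():    # the user barged in: stop talking
                return
            self._speak(chunk)

    def run(self, generate_reply):
        for utterance in self._asr:
            if self._cancel is not None:
                self._cancel.set()             # interrupt the current reply
            self._cancel = threading.Event()
            threading.Thread(
                target=self._respond,
                args=(generate_reply(utterance), self._cancel),
                daemon=True,
            ).start()
```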

How Well Does ChatGPT Fare as a Translator of Medical Jargon Into Plain Language Patients Can Understand?

Alexandre Lebrun: LLMs are really good at translation. For example, Google PaLM showed that with just five examples, the model is on par with highly specialized systems for translating English to German. It is so strong without specific training that we can assume it should be really good at translating medical language into lay patient language, because we could see them as two languages.
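
As a hedged sketch of that framing, the few-shot prompt below treats medical jargon and plain English as a translation pair, in the spirit of the PaLM result Lebrun mentions. The example pairs and model name are illustrative assumptions.

```python
# Hedged sketch: few-shot "translation" from medical jargon to plain English.
# The example pairs and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

EXAMPLES = [
    ("Patient presents with acute rhinitis.",
     "The patient has a sudden runny, stuffy nose."),
    ("Hypertension noted; recommend lifestyle modification.",
     "Blood pressure is high; changes to diet and exercise are advised."),
    ("NPO after midnight prior to procedure.",
     "Do not eat or drink anything after midnight before the procedure."),
]


def to_plain_language(jargon: str) -> str:
    # Build the few-shot prompt: each example pair is one user/assistant turn.
    messages = [{"role": "system",
                 "content": "Translate clinical language into plain English "
                            "a patient can understand."}]
    for source, target in EXAMPLES:
        messages.append({"role": "user", "content": source})
        messages.append({"role": "assistant", "content": target})
    messages.append({"role": "user", "content": jargon})
    reply = client.chat.completions.create(model="gpt-3.5-turbo",
                                           messages=messages)
    return reply.choices[0].message.content


print(to_plain_language("Idiopathic etiology; prognosis guarded."))
```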

How Reliable is ChatGPT?

Alexandre Lebrun: ChatGPT is not reliable at all. It makes mistakes, it hallucinates, and the danger is that the form is so confident and the language so polished that it takes concentration and careful reading to detect inaccuracies, false statements or flawed reasoning.

Could the FDA Approve ChatGPT as a Medical Device?

Alexandre Lebrun: ChatGPT is everything the FDA is having nightmares about. It's not deterministic. We don't understand how or why it gives the output it gives. It's impossible today to prove that ChatGPT will or will not behave in a certain way in certain situations. And it is very hard to prove that it won't suddenly advise a patient to kill himself, for instance. So I think ChatGPT alone will never pass FDA approval, or any such approval.

Israel Krush: I don't see ChatGPT in its current form anywhere close to approval. Perhaps smaller iterations focused on niche use cases, possibly coupled with other technologies, would have a better chance of approval. But currently, as the name says, it's a general model. I would also claim that it's like asking whether a specific physician would be FDA-approved for symptom checking: ask three physicians about a complex situation, not an easy one, and you'll probably get at least two different answers.