Munjal Shah Is Training Health Care AI To ‘Do No Harm’ With LLM Startup Hippocratic AI

Munjal Shah and Hippocratic AI want to build an evidence-based, nondiagnostic health care LLM with a bedside manner.


It was only a year ago in November that OpenAI launched ChatGPT, an artificial intelligence chatbot trained on the internet’s body of knowledge to answer wide-ranging questions and draft original creative work. The capabilities of this new generation of AI, grounded in large language models that learn using neural networks inspired by biological brains, have surprised even experts in the AI community.

With the demonstrable improvements of LLMs, researchers, creatives, and entrepreneurs have been dreaming up new use cases that may have previously seemed like science fiction. Munjal Shah, a serial entrepreneur with a history of working in both health care and AI, is building his company, Hippocratic AI, to leverage the communication and learning abilities of LLMs to address a very real problem: health care worker staffing shortages.

The company’s name is drawn from the physicians’ Hippocratic Oath, the essence of which is to “do no harm.” Its goal is to address the widening gap between the rising demand for care and the declining number of providers available to meet it, while focusing only on applications that don’t put AI in the position of making high-risk medical judgments.

What does this look like in practice? Shah is clear on this point: LLMs can be used to provide crucial — but nondiagnostic — health care services. Hippocratic AI won’t be determining whether a patient has a serious heart condition or what type of cancer treatment is best suited to an individual case. Rather, it will be used to reduce the burnout and time strain that come with providing services like chronic care treatment, patient navigation, and dietitian services at a massive scale.

Can AI Have a Bedside Manner?

For Munjal Shah, the key is that LLMs are not only ideally suited to absorbing massive amounts of research; they’re also ideally suited to communicating what they’ve learned in a conversational manner that connects with patients. The result is what he called, in a recent appearance at the World Medical Innovation Forum, “bedside manner with a capital B.”

“The No. 1 correlation to bedside manner is: Will you let the patient finish their story? However, the average emergency room physician cuts off the average patient within 20 seconds. Well, language models have time, they have infinite time actually, and [they] speak every language,” said Shah.

“We can’t get patient engagement because we don’t engage them. Can your chronic care nurse spend 35 minutes chatting with a patient?” he continued. “We now have the time and energy and power to build relationships with [patients]. We don’t even have to give them a sheet of instructions on discharge and say, ‘Remember on day four to change your bandage or you might get an infection.’ Just call them on day four and say change your bandage today. We need to rethink our patient interactions if this technology comes to bear, and this is what I mean by bedside manner with a capital B. We can do things truly differently when we have tremendously more capacity than we have today and tremendously more patience.”

It may seem strange to think of AI as improving bedside manner, a quality we associate with empathetic, compassionate human beings. However, it turns out that LLMs trained to project a bedside manner aren’t just decent mimics of human clinicians. They could potentially outpace humans in empathetic communication in health care settings.

In a recent study published in JAMA Internal Medicine, researchers found that a panel of physicians rated ChatGPT-generated responses to patient questions as preferable in both quality and empathy. The panel preferred the AI-generated responses 79% of the time across nearly 200 exchanges, rating 45% of AI responses as empathetic or very empathetic, compared to only 5% of the physicians’ responses.

Both researchers and entrepreneurs like Munjal Shah are cautious about how far to extrapolate from this sort of result, but at the very least it suggests a potential use case in patient interactions, where how information is communicated is crucial to ensuring proper understanding of and adherence to treatment.

The counterintuitive result that emotionless AI systems communicate empathetic understanding more effectively becomes easier to accept given Shah’s point about time and burnout. The emotional life of a human health care worker can be extremely stressful, and humans are limited in the energy and time they can realistically expend. Lacking this emotional life but able to mimic its expression, an appropriately trained AI could provide the needed communication without the understandable burnout facing overextended health care workers.

How an LLM Learns To ‘Do No Harm’

Of course, even for nondiagnostic applications, accuracy is paramount. A mistake in an AI’s reminder to change a bandage at a particular time or to book an appointment with a certain kind of specialist, if not necessarily life-threatening, can still be harmful. A fundamental tenet of Hippocratic AI’s mission is that these sorts of services have real positive impacts when done well. By extension, errors in providing these services could lead to negative patient outcomes.

It’s for this reason that the training process of health care LLMs is so crucial, explained Munjal Shah. He emphasized that training on evidence-based research focused on the specific health services the LLM will provide can push its ability beyond that of a more generally trained LLM like GPT-4. A health care LLM doesn’t need to cover the range of topics ChatGPT does, but it should be better at understanding and communicating the medical information patients need. To this end, Hippocratic AI avoids the broad crawl of internet content used to train more general LLMs, instead favoring peer-reviewed and evidence-based medical content.
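To make the idea concrete, here is a minimal, hypothetical sketch of that kind of source filtering in Python. The `Document` structure and source labels are illustrative assumptions for this article, not a description of Hippocratic AI’s actual data pipeline.

```python
# Hypothetical sketch: keep only documents from vetted, evidence-based
# medical sources and drop general web-crawl content before it reaches
# the training corpus. Source labels here are illustrative only.
from dataclasses import dataclass

TRUSTED_SOURCES = {
    "peer_reviewed_journal",
    "clinical_guideline",
    "medical_textbook",
}

@dataclass
class Document:
    text: str
    source_type: str  # e.g. "peer_reviewed_journal" or "web_crawl"

def build_training_corpus(docs: list[Document]) -> list[str]:
    """Return only text drawn from evidence-based source types."""
    return [d.text for d in docs if d.source_type in TRUSTED_SOURCES]

corpus = build_training_corpus([
    Document("Post-discharge wound care guidance ...", "clinical_guideline"),
    Document("Forum post about home remedies ...", "web_crawl"),
])
print(len(corpus))  # -> 1: the web-crawl document is excluded
```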

“We have to find a lot more evidence-based content to feed these machines. Most of them have very few tokens, which are basically the words that they were pre-trained on, that are truly from health care,” said Shah. “We really do want things like our standards of care to go on there. We really do want all of the really careful, not always evidence-based but certainly experience-based, documents that we follow when we do health care to be on there. We are doing that at Hippocratic AI because we just saw it missing.

“There's a lot of great health care content that we use to teach our physicians and use to teach all of our other medical professionals that we need to get into these models.”

The other crucial component is feedback from human providers. Hippocratic AI is training its LLM on content ranging from health care textbooks to up-to-date research, but it’s also subjecting the model’s outputs to evaluation by large groups of professionals in the health care space, a process known as reinforcement learning from human feedback (RLHF).
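The sketch below illustrates, under broad assumptions, the reward-modeling step at the heart of RLHF: clinicians compare two candidate replies to the same patient message, and a small model learns to score the preferred reply higher. The embeddings, architecture, and training details are illustrative stand-ins, not Hippocratic AI’s actual system.

```python
# Minimal RLHF reward-model sketch (hypothetical). Random vectors stand in
# for embeddings of the clinician-preferred vs. rejected replies.
import torch
import torch.nn as nn

EMBED_DIM = 64  # stand-in for features from a frozen LLM encoder

class RewardModel(nn.Module):
    """Maps a response embedding to a scalar 'clinician preference' score."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

reward_model = RewardModel(EMBED_DIM)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

preferred = torch.randn(16, EMBED_DIM)  # replies clinicians chose
rejected = torch.randn(16, EMBED_DIM)   # replies clinicians passed over

for _ in range(100):
    # Pairwise (Bradley-Terry) loss: push the preferred reply's score
    # above the rejected reply's score.
    loss = -torch.nn.functional.logsigmoid(
        reward_model(preferred) - reward_model(rejected)
    ).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# A reward model trained this way can then be used to steer the LLM
# toward responses that clinicians rate as safer and more empathetic.
```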

Thus far, the company has tested its LLM on 114 certifications: 106 medical role-based examinations, three standard published benchmarks, and five novel bedside manner benchmarks.

According to data published on its website, Hippocratic AI has outperformed GPT-4 and other LLMs evaluated on the vast majority of these certifications, including all major clinical exams.