“The chatbot will see you now.” When the nurse tells you that as you sit in the doctor’s waiting room, perhaps it’s time for you to run.
Artificial intelligence is getting lots of attention lately in many walks of life. It may be useful in writing term papers and preparing speeches. It may be helpful in designing new cars and other forms of mechanical equipment. But I’m not sure it’s ready to replace your doctor.
A recent study, presented at the American Academy of Orthopaedic Surgeons meeting in San Francisco by Branden Rafael Sosa and colleagues at Weill Cornell Medicine, analyzed the validity and accuracy of the information about orthopedic procedures that large language model chatbots provide to patients. The researchers also assessed how the chatbots explained basic orthopedic concepts, integrated clinical information into decision-making and addressed patient queries.
They concluded that large language model chatbots may provide misinformation and inaccurate musculoskeletal health information to patients.
In the study, Sosa and colleagues prompted the OpenAI ChatGPT 4.0, Google Bard and BingAI chatbots to each answer 45 orthopedic-related questions in three categories: “bone physiology,” “referring physician” and “patient query.” Two independent, masked reviewers scored each response on a scale of zero to four for accuracy, completeness and usability.
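The basic shape of that evaluation is easy to picture. As a rough illustration only, and not the authors’ actual protocol, here is a minimal Python sketch of such a harness, assuming the OpenAI Python SDK (v1+) and an API key in the environment; the questions, category labels and model name are placeholders, and scoring is left to human reviewers, as in the study.

```python
# Minimal sketch of a chatbot-evaluation harness, assuming the OpenAI
# Python SDK (v1+) with OPENAI_API_KEY set in the environment. The
# questions, categories and model name below are illustrative
# placeholders, not the study's actual materials.
import csv

from openai import OpenAI

client = OpenAI()

# Hypothetical examples standing in for the study's 45 questions.
QUESTIONS = [
    ("bone physiology", "How does a fractured bone heal?"),
    ("referring physician", "What is the standard workup for a suspected "
                            "periprosthetic joint infection?"),
    ("patient query", "When can I bear weight after ankle fracture surgery?"),
]

with open("chatbot_responses.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["category", "question", "response"])
    for category, question in QUESTIONS:
        resp = client.chat.completions.create(
            model="gpt-4",  # placeholder model name
            messages=[{"role": "user", "content": question}],
        )
        # Responses are saved for later zero-to-four scoring by masked
        # human reviewers; no automated grading is attempted here.
        writer.writerow([category, question, resp.choices[0].message.content])
```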
The researchers analyzed the responses for strengths and limitations within categories and among the chatbots. They found that when prompted with orthopedic questions, OpenAI ChatGPT, Google Bard and BingAI provided correct answers that covered the most critical points in 77%, 33% and 17% of queries, respectively. When offering clinical management suggestions, all three chatbots displayed significant limitations, deviating from the standard of care and omitting critical steps, such as recommending antibiotics before cultures were obtained or leaving key studies out of the diagnostic workup.
“I think clinical context is one of the things that they struggled with most, particularly when coming up with an assessment or a plan for a patient who presents with infection. Oftentimes, they forgot to get cultures before initiating antibiotics, forgot to order radiographs in the workup of a patient with hip osteoarthritis, or failed to point to seminal papers that highlight changes in the way that treatment is delivered,” Sosa told Healio/Orthopedics Today.
“I would say that in certain applications, AI chatbots, in particular ChatGPT, performed pretty well. It was able to give clinically useful information in the majority of cases, broadly speaking. But that generally good performance carries with it some significant risks as well,” said Matthew B. Greenblatt, MD, PhD, an associate professor of pathology and laboratory medicine at Weill Cornell Medicine and a co-author of the study.
Greenblatt said results of this study highlight the importance of oversight by subject matter experts in using large language model chatbots in clinical contexts. “It could potentially be a timesaver or helpful in summarizing information. When all of that is overseen and checked by someone who is truly an expert, one can be well aware of where the chatbot led astray,” Greenblatt said.
Personally, I don’t want to go to a doctor who at best is only correct 77% of the time. I believe there is still a place for good old human physicians and human intelligence in this world of increasing technology and artificial intelligence.