Investigators found that artificial intelligence chatbots did not consistently provide recommendations for cancer treatment that correspond with NCCN guidelines.
Approximately one-third of cancer treatment recommendations generated by ChatGPT 3.5 did not align with the 2021 National Comprehensive Cancer Network (NCCN) guidelines, according to a study published in JAMA Oncology.1,2
Investigators from Brigham and Women’s Hospital, part of the Mass General Brigham health care system, sought to determine how consistently artificial intelligence (AI) chatbots provide cancer treatment recommendations that correspond with NCCN guidelines.1
"Patients should feel empowered to educate themselves about their medical conditions, but they should always discuss with a clinician, and resources on the Internet should not be consulted in isolation," Danielle Bitterman, MD, from the Department of Radiation Oncology and the Artificial Intelligence in Medicine Program of Mass General Brigham, said in a statement. "ChatGPT responses can sound a lot like a human and can be quite convincing. But, when it comes to clinical decision-making, there are so many subtleties for every patient's unique situation. A right answer can be very nuanced, and not necessarily something ChatGPT or another large language model can provide."1
The investigators created 4 prompt variations for each of 26 diagnosis descriptions, covering cancer types with and without relevant extent-of-disease modifiers, for a total of 104 prompts input into ChatGPT. Outputs were evaluated against the 2021 NCCN guidelines because the chatbot’s training data cutoff was September 2021.2 Investigators focused on the 3 most common cancer types (breast, prostate, and lung cancer) and prompted the chatbot to provide treatment plans based on the severity of disease.1
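For illustration, the combinatorics of the prompt set can be sketched in a few lines of Python. The template wording and diagnosis strings below are hypothetical stand-ins, not the study's actual text.

```python
# Hypothetical sketch of the study's prompt-construction step: 4 prompt
# templates crossed with 26 diagnosis descriptions yields 104 prompts.
# Template and diagnosis wording here is illustrative only.

PROMPT_TEMPLATES = [
    "What is the treatment for {dx}?",
    "How is {dx} treated?",
    "What is the recommended treatment for {dx}?",
    "Provide a treatment plan for {dx}.",
]

# Example diagnosis descriptions (cancer type plus an optional
# extent-of-disease modifier); the study used 26 such descriptions.
DIAGNOSES = [
    "localized prostate cancer",
    "metastatic non-small cell lung cancer",
    "triple-negative breast cancer",
    # ... 23 more descriptions in the actual study
]

prompts = [t.format(dx=dx) for dx in DIAGNOSES for t in PROMPT_TEMPLATES]
# With all 26 diagnoses, len(prompts) == 4 * 26 == 104
```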
The study authors used 5 scoring criteria to assess concordance between the ChatGPT recommendations and the NCCN recommendations. An output did not have to recommend every possible regimen to be considered aligned with the NCCN guidelines, but it was considered nonaligned if any recommended treatment was only partially correct.1,2 Each output was scored by 3 of 4 board-certified oncologists, with majority rule determining the final score; when all 3 disagreed, the fourth oncologist made the final judgment.2 The data were analyzed between March 2 and March 14, 2023.2
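The adjudication rule the authors describe amounts to a majority vote with a designated tiebreaker. A minimal sketch follows, with all function and variable names invented for illustration rather than taken from the study.

```python
# Hedged sketch of the majority-rule adjudication described above: three
# oncologists score each output; a score held by at least two of the three
# stands, and a fourth reviewer breaks full three-way disagreements.
from collections import Counter

def adjudicate(scores: list[int], tiebreaker_score: int | None = None) -> int:
    """Return the final score from 3 annotator scores via majority rule."""
    assert len(scores) == 3
    top_score, count = Counter(scores).most_common(1)[0]
    if count >= 2:           # at least 2 of 3 annotators agree
        return top_score
    if tiebreaker_score is None:
        raise ValueError("Three-way disagreement: fourth reviewer needed")
    return tiebreaker_score  # fourth oncologist's final judgment

# Example: two of three annotators agree, so the majority score stands.
print(adjudicate([1, 1, 0]))     # -> 1
print(adjudicate([0, 1, 2], 2))  # -> 2 (tiebreaker)
```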
The outputs of the 104 prompts were scored on 5 criteria each, for a total of 520 scores. All 3 oncologists agreed on 61.9% of scores; disagreements were more frequent when the output was unclear, for example, when it did not specify which treatments to combine.2
Overall, 34.6% of the 26 diagnosis descriptions yielded the same scores for all 5 scoring criteria across all 4 prompts. Additionally, the chatbot provided at least 1 NCCN-concordant recommendation for 98% of prompts; however, 34.3% of outputs also recommended 1 or more treatments that did not align with NCCN recommendations.2
Furthermore, 12.5% of responses were not part of any recommended treatment; these most often involved localized treatment of advanced disease, targeted therapy, or immunotherapy.2
The study authors said that the misinformation provided by chatbots could set incorrect expectations about treatment for patients and potentially impact the relationship between physicians and patients.1
The investigators plan to explore how well patients and physicians can distinguish medical advice written by a physician from advice written by AI. They also plan to prompt ChatGPT with more detailed clinical cases to further evaluate its clinical knowledge.1
References