Regulatory Hurdles and Ethical Concerns in FDA Oversight of AI/ML Medical Devices

Key Takeaways

  • The FDA struggles to regulate AI/ML-enabled devices, often underestimating how widely AI/ML is embedded in medical devices.
  • Continuous glucose monitors, despite AI/ML integration, follow traditional approval pathways, bypassing AI-specific scrutiny.

The FDA faces a multitude of challenges in regulating artificial intelligence (AI) and machine learning (ML) medical devices.

There are a lot of parallels with how the FDA manages the review of devices and drugs, explained Omar Badawi, PharmD, MPH, FCCM, during a presentation at the 2024 American College of Clinical Pharmacy (ACCP) Annual Meeting. However, for devices, the FDA has not yet settled on the best approach, as rapid advancement and change in this sphere have made regulation a moving target, according to Badawi.1

“The FDA has some challenges [with devices],” said Badawi, chief for the Division of Data Sciences, US Telemedicine and Advanced Technology Research Center, during the ACCP session. “There are Class I devices, and those are typically exempt from what you call the pre-market approval process, but they still need to comply with general controls, labeling, etc.”1

Class II devices are moderate risk and generally still go through the FDA, but often for clearance rather than approval, Badawi explained. Further, most Class I and some Class II devices are exempt from 510(k) requirements, meaning the FDA has determined that pre-market notification is not needed for them; devices that do go through the 510(k) pathway are cleared by demonstrating substantial equivalence to a device already on the market.1,2

“Then there's the Class III high-risk devices, which generally would have to go through a full pre-market approval. Again, that's sort of the difference between clearance and approval,” Badawi said. “Most devices are FDA cleared, which usually means they went through some very minimal pathway where the FDA said that this device is like something else already on the market.”1

Currently, the FDA maintains a list of all AI/ML-enabled medical devices the agency has authorized.1,2 As of an update on August 7, 2024, the FDA states that it has authorized 950 such devices.1,2

“But something I found that was really interesting…if you look up continuous glucose monitors on this site, there are no continuous glucose monitors listed under the AI/ML devices [the FDA has authorized],” Badawi said. “So, there's a lot of devices that have AI and ML in them that the FDA is not considering as an AI/ML device because it's not their primary mode.”1

A continuous glucose monitor provides the results of a blood sugar test. Image Credit: © Andrey Popov - stock.adobe.com

Badawi explained further that continuous glucose monitors are approved through the same pathway as a laboratory assay for glucose, which means they do not go through the same review process as AI/ML devices, even if they contain AI/ML models. This, in turn, affects whether the FDA requires manufacturers of continuous glucose monitors to adhere to the same standards and guidelines that have been put in place for AI/ML devices.1

“For AI/ML devices, the FDA wants to see a lot of post-market safety surveillance and [assessments of] bias, and I worry that the FDA overlooks that in a lot of these devices where AI and ML is not the primary focus of the device, and instead [AI and ML] are kind of just slipped in,” Badawi said. “So, these manufacturers are using this technology, but it is not being held to the same standard as a device that's more obviously designed for AI.”1

Additionally, there is a lot of wearable technology, such as smart clothing, smartwatches, and fitness activity trackers, that has AI in it, but the FDA has not been interested in reviewing and authorizing these devices, according to Badawi. He explained that this is likely because these wearables are not involved in medical diagnosis or treatment. In contrast, drug delivery devices do not have AI in them, but they are Class III, high-risk devices and go through a more thorough pre-market approval process.1

“But smart glasses, they can vary widely in terms of what they might be doing. Smart glasses can be used to support surgery and have real-time augmented reality, or they may just be used for other simple things,” Badawi said.1

However, smart glasses can also be used for more problematic endeavors outside of the FDA’s purview that can still have a significant impact on data privacy and security.1

“A few weeks ago, there was an article about Harvard students who did a test with doxing people using the Meta Ray-Ban glasses,” Badawi said. “The students live streamed with Instagram using their glasses, and then they wrote a computer program that was doing computer vision and face recognition from people they were encountering, and then tying that back with databases to figure out who those strangers were. They were then walking up to those individuals and pretending like they knew them, and saying, ‘Hey, didn't we meet at this place,’ ‘Don't you work on this research,’ or ‘Didn't we go to high school together,’ and they knew the strangers’ addresses and phone numbers.”1

Badawi explained that, in this same vein, there are a significant number of ethical considerations pertaining to AI tools and devices, some more evident than others. According to Badawi, one current challenge is that little information is provided to the public about how AI tools are developed, validated, and monitored.1

“There was a recent article in Nature Medicine that looked at over 500 devices from that list of FDA-cleared AI/ML devices and found only 28% were prospectively validated, 4% were validated using randomized controlled trials, and nearly half did not have publicly available published clinical validation data,” Badawi said. “So, there’s very little transparency, very low quality of evidence, and I think that's very concerning.”1

An example of this low quality of evidence can be seen in a study published in JAMA in 2022 that looked at pulse oximeters in patients with COVID-19, explained Badawi. The investigators found that the devices overestimated oxygen saturation in patients with darker skin tones.1

“So, 55% of patients with COVID-19 who were not recognized to be eligible for oxygen treatment based on their O2 saturation were Black, and they were 39% of the study population,” Badawi said. “So, you had patients who had lower O2 saturation than what it was reading who should have been getting oxygen and were not, and when they ultimately did [receive oxygen], although only some of them ultimately did, it was delayed by a median time of about an hour compared with White patients. This is just showing that the way devices are developed, the way drugs are developed, the way they're validated, whether it's AI or not, the type of population you use really can impact how generalizable the data are and what they’re going to tell you.”1

In a paper published in The Lancet Digital Health in January 2024, Zack et al looked at the potential of GPT-4 to perpetuate racial and gender biases in health care. To do this, they took case vignettes published in the New England Journal of Medicine, entered them into GPT-4, and asked for differential diagnoses. The investigators then used the same case vignette information but changed the race and gender of the patient.1,3

“They did this for things like shortness of breath or sore throats, and then reported how ChatGPT came up with a differential diagnosis separately for them,” Badawi said. “If you go through this paper, it’s fascinating, because for college students with sore throats, if they changed their race to Black, then the differential diagnosis had HIV and syphilis much higher than if they were White. Basically, there were differences in all kinds of diagnoses.”1

Furthermore, the investigators found that when White women were noted as having shortness of breath, ChatGPT was more likely to think it was anxiety or a panic attack and to not refer them for a cardiac workup or imaging. According to Badawi, biases that are present in the data end up being perpetuated and even exaggerated with these AI/ML models.1

“The FDA has at least 950 medical devices listed as AI and ML, but it's probably dramatically under-representing those that integrate AI and ML at various sub-levels of their devices. Many wearables are heavily dependent on AI and not considered devices at all, and so they don't have any regulatory oversight,” Badawi said. “There's very little information on those devices other than marketing that comes out, but these all provide a really massive source of data for training and validating AI and supporting regulatory approvals, which I think is a strength, but also presents a lot of risk in that area, and there's a lot of risk of bias in these data, and significant challenges with providing transparency.”1

REFERENCES
  1. Badawi O. (AI)ming Higher: Artificial Intelligence in Frontline Clinical Practice and Technology in Guideline Development. Presented at: 2024 American College of Clinical Pharmacy Annual Meeting; October 12-15, 2024; Phoenix, AZ.
  2. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. FDA. Updated August 7, 2024. Accessed October 15, 2024. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices
  3. Zack T, Lehman E, Suzgun M, et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study. Lancet Digit Health. 2024 Jan;6(1):e12-e22. doi:10.1016/S2589-7500(23)00225-X