
 

November 21, 2024
by Jeff Craven

Generative AI: FDA adcomm discusses performance evaluation, risk management for medical devices

Editor's note: This article was updated on 22 November 2024 to correct a word in a quote by Michelle Tarver.

The inaugural meeting of its digital health advisory committee gave the US Food and Drug Administration (FDA) a lot to think about when considering premarket performance evaluation and risk management of generative artificial intelligence (AI) in medical devices.
 
Over the course of the first day of the meeting, where committee members and stakeholders discussed the benefits and drawbacks of generative AI, they made it clear that AI is a wide-ranging topic with many moving pieces, but at times disagreed over where and whether FDA should weigh in on the technology (RELATED: FDA officials outline need for oversight of AI in healthcare, biomedicine, Regulatory Focus 17 October 2024; Experts advise against prescriptive FDA policies for regulating AI, Regulatory Focus 10 June 2024).
 
FDA perspective
 
In a welcome message, FDA Commissioner Robert Califf said the committee was established because of the “great potential for digital health technologies to help address critical health care issues that we face today,” which should be developed, deployed and used responsibly. He noted that the decline in life expectancy in the US “is largely driven by disparities that are a function of race, ethnicity, education, and wealth, as well as where someone lives.”
 
“Artificial intelligence is changing how we think about health and health care, and it’s one of the most exciting and promising areas of science because it’s built to transcend boundaries,” Califf said. “While we often describe the promise of AI in terms of helping us facilitate speedier delivery of new treatments, I believe the larger role for AI is to improve the efficient and coordinated delivery of care to patients in all facets of health care settings, including the operating room, the clinic, and even the home.”
 
FDA must embrace new technologies like AI “not only to keep pace with the industries we regulate, but also to use regulatory channels and oversight to improve the chance that they will be applied effectively, consistently and fairly,” he said.
 
Michelle Tarver, director of the Center for Devices and Radiological Health (CDRH), said AI-enabled medical devices “offer a promise” for communities that have seen a contraction of providers and health facilities, giving them options for care.
 
“When we’re thinking about the burden of disease and who is most bearing that disease, that’s small-town America. Those are rural communities. They are people who are of older age, racial and ethnic minorities, those with fewer resources and who live the furthest from health care facilities, so technologies have to come meet them where they are,” she said.
 
Troy Tazbaz, office director of the Digital Health Center of Excellence within CDRH, said FDA wants to take a total product lifecycle approach to generative AI in medical devices. “[T]he lifecycle perspective allows us to consider safety and performance from initial design and development all the way through the post-market monitoring,” he said.
 
Aldo Badano, director of the Division of Imaging, Diagnostics, and Software Reliability in the Office of Science and Engineering Laboratories within CDRH, presented some of the regulatory science challenges for generative AI-enabled medical devices. These include defining a product’s intended use given open-ended inputs and outputs, foundation models that do not originate from device manufacturers, oversight of generative AI’s adaptive nature, hallucinations, transparency, the adequacy and diversity of data sets used for testing, and real-world performance of generative AI.
 
“Many of these challenges might exist in other technologies, but are exacerbated in AI applications, particularly in gen AI,” he said.
 
Premarket performance evaluation
 
Much of the discussion on the first day was centered around premarket performance evaluation, and the advisory committee members made a number of recommendations to FDA on what kinds of information the agency should ask for when considering the evaluation of safety and effectiveness of generative AI devices.
 
When answering a question posed by FDA about what information should be included in a premarket submission device description or characterization of a generative AI-enabled device, the committee members said concepts that were important to them included the intended use case and intended population of the device, the setting where it is used, the potential tracking of data using standardized pro forma data sheets, uncertainty estimations for the data and the model, descriptions of the model itself and how it was developed, whether the model creates bounded or unbounded results, and cybersecurity and privacy standards.
 
Members of the advisory committee said FDA should consider benchmarking a generative AI model against existing models that do not use generative AI, and asking during premarket evaluation whether the manufacturer has a postmarketing surveillance plan for the device. Committee members recommended FDA examine how different populations and settings affect AI performance, how different temperature parameters can affect the response, and how narrowly defined the indication is for which the AI was created. They also noted that hallucination rates, sensitivity and specificity for device-specific tasks, and the level of human interaction needed are important considerations.
 
Concerning usability, the advisory committee said users of generative AI should be made aware that output and reasoning may not be reproducible, and that it is important to explain and be transparent with users throughout the process. Generative AI may require training on the part of clinicians and patients who use the technology, and studies may be needed to see how nonclinical, untrained users interact with generative AI products. FDA also asked the panel what prospective performance metrics would work best with generative AI products, and the committee members noted that a metric is needed that assesses models and determines whether they remain accurate without drift.
 
Risk management
 
The advisory committee members recommended a risk-based approach to categorizing generative AI-enabled devices. They said a centralized database similar to FDA’s adverse event database, where users could report errors, is essential for generative AI and should specifically allow for the reporting of bias and hallucinations. They also stated that postmarketing surveillance is needed when considering risk mitigation for off-label use, and that human-in-the-loop feedback is necessary for transparency.
 
Committee members also raised a number of concerns about generative AI and said that patient harm is a central concern with AI products. Another concern is that clinicians may be busy and not engage in training for generative AI products. Peter Elkin, professor and chair of the department of biomedical informatics at the University at Buffalo, said this is a new way of presenting information that “seems more humanistic, and because it seems more humanistic, it gives the impression to the user of real intelligence, whether there is real intelligence or not.”
 
While stakeholders are awaiting national guidance on AI, “local governance is probably the most important thing rather than national governance,” Elkin said, “because value sets are often local.”
 
At times, panel members went back and forth on whether accountability for generative AI medical devices was under FDA’s purview, arguing that the responsibility for these products may be with the manufacturers or with the health systems that deploy them. Some panel members questioned whether FDA should get involved at all, as the agency does not regulate the practice of medicine.
 
However, other speakers noted the cat may already be out of the bag. Apurv Soni, assistant professor of medicine in the division of health systems science, clinical informatics section at UMass Chan Medical School, and other speakers noted that some clinicians are already using off-the-shelf AI products.
 
“I’m all for having nutrition labels for these models to know what’s inside it and what we should expect from it. But if those ingredients are coming from repurposing of foundation models, we also need to be practical about what is the onerous task that’s unachievable versus what’s for the safety, recognizing that a lot of these tools are already starting to be used,” Soni said.
 
Thomas Radman, of the National Center for Advancing Translational Sciences at the National Institutes of Health, said that generative AI products have different risks than AI products that came before them, such as the ability to be used off label.
 
“If we agree that the risks of gen[erative] AI is higher than pre-generative AI products, now we’re talking about FDA regulation where there’s this substantial equivalence pathway,” he said. “I think we maybe as a panel should determine, [does] the benefit of increased accuracy have to be better than a pre-generative AI product?”
 
Elkin said that AI models “can do a lot more than they're being tested to do.”
 
“I think that this body, at some point, should discuss whether they would ban off-label use of the AIs, or, as an alternative, some kind of postmarketing surveillance [of] off-label uses that would continue to monitor closely what they’re doing, because there is the chance of doing harm in situations like that,” Elkin said.
 
Thomas Maddox, professor of medicine at the Washington University School of Medicine, pushed back on that idea, noting FDA typically stops at the point of regulating drugs and devices.
 
“If they approve it for a use case, they ensure the device or drug is safe and effective, and then they give it to our clinician community to then use it as they see fit,” Maddox said. “Recognizing that some things are off label, some things are on label. I’m trying to think of a similar framework here, and I think what’s important maybe is to stop at dictating use. I think that’s sort of a fool’s errand, anyway, just given how heterogeneous this will be. But I actually think it’s actually not appropriate for the FDA to be in that space, just given how the statutes are set up.”
 