
November 22, 2024
by Jeff Craven

Generative AI: FDA adcomm makes recommendations on postmarket performance of medical devices

Members of the Digital Health Advisory Committee convened for a second day to offer the US Food and Drug Administration (FDA) recommendations on how to think about postmarket performance evaluation of generative artificial intelligence (AI)-enabled medical devices. 

Committee Chairperson Amy Bhatt said the second day of the meeting was an opportunity to think about the infrastructure and guardrails for postmarketing performance monitoring of generative AI. 

“We’ve done it before through the FDA for drugs, for devices, but generative AI is different,” Bhatt said. “The goal is really for us to create the infrastructure here amongst this group to start saying, here is how we can think about postmarket surveillance for something as challenging as generative AI.” 

 

FDA perspective 

Troy Tazbaz, Director of the Digital Health Center of Excellence in the FDA Center for Devices and Radiological Health (CDRH), said the previous day’s discussion identified many of the problems associated with generative AI (RELATED: Generative AI: FDA adcomm discusses performance evaluation, risk management for medical devices, Regulatory Focus 21 November 2024). 

The advisory committee should start thinking about solutions, with the goal of having a set of priorities to tackle, he said. From FDA’s perspective, this involves having a list of problems to solve, such as the need for different evaluation frameworks, different guidances, and thinking about regulation for generative AI.  

“Throughout history in the United States, public-private partnerships have tackled very challenging and complex issues to actually push innovation forward, and this is one of those opportunities for us to collaborate as an entire ecosystem,” Tazbaz said. “That’s government, that’s academia, that’s the industry to push this innovation forward in a way that is safe, effective, has the guardrails to be applied to the criticality of the nature of health care.”  

Jessica Paulsen, Associate Director for Digital Health in the CDRH Office of Product Evaluation and Quality, said FDA has long promoted a total product lifecycle approach for regulating medical devices, and “this has become increasingly relevant and important for medical devices that are incorporating technologies that are intended to iterate or change at a much faster pace than ever before.” 

She noted that FDA has been working with international harmonization and regulatory communities to establish good machine learning (ML) practice guiding principles. “These aim really to facilitate the development and assessment of high-quality AI/ML-enabled technologies,” she said (RELATED: FDA officials outline need for oversight of AI in healthcare, biomedicine, Regulatory Focus 17 October 2024).  

Although FDA has approved more than 950 AI-enabled devices to date, the agency has not yet approved a generative AI-enabled device, she said. However, the April 2024 approval of AI/ML-enabled software to predict patients at risk for sepsis may offer some clues on where to start with generative AI-enabled products. The software uses a patient’s electronic health record data together with lab tests and clinical assessments to predict risk of developing sepsis, and because of its intended use, there is a need for the software to continue working as intended throughout the lifecycle in the real world.  

Paulsen said FDA evaluated the risk of this AI/ML-enabled product, as it does with other products, through the de novo pathway and established special controls to mitigate the product’s risks, which included bias, poor quality data, and missing data. “The new regulation actually included a special control that requires manufacturers to develop and implement a postmarket performance management plan,” she said.  

Paulsen explained that FDA considers intended use and technological characteristics in its risk-based regulatory framework. “There are technological characteristics of generative AI that may sometimes introduce new or different risks for a particular generative AI-enabled product, which raises in turn new questions of safety and effectiveness,” she said. “What that means is that a de novo may be necessary to regulate that product, and through that de novo process, the output is that FDA would then establish a new device type, a new regulation, and that would be accompanied by special controls as appropriate for that new device type.” 

The task of the advisory committee, she said, is to help the agency figure out what a “good or appropriate monitoring plan” would be for a generative AI-enabled device. 

 

Postmarket performance evaluation 

Members of the advisory committee said specific monitoring capabilities for generative AI-enabled medical devices could include internal periodic review of chat transcripts, reporting of errors to the manufacturer, a central database for error reporting tiered by level of error, and the ability to flag during monitoring when significant human oversight is warranted.

The panel generally agreed that monitoring should start using existing data standards but acknowledged that generative AI-enabled devices would likely need other data elements and that common data terms should be standardized across subspecialties where generative AI is being used. Watermarking outputs of generative AI-enabled devices was also discussed to track which data are created by generative AI and which are created by humans. 

“We have to examine whether those data standards can support capturing those additional data elements, and there may be a challenge there, but I think we have to start from there and really evaluate what additional data elements are really needed,” said Taxiarchis Botsis, associate professor of oncology and medicine at Johns Hopkins Medicine. 

Other considerations included that large-scale data monitoring would necessitate a low-cost, accessible mechanism for users to input and receive data, and that the types of data captured for postmarket evaluation should include the accuracy, safety, and bounded use case of the AI model.

Jessica Jackson, a licensed psychologist specializing in digital mental health, said postmarket evaluation monitoring should also be able to detect data drift and to distinguish product failure from implementation failure.

“There’s a lot of work in psychiatry and psychology looking at biomarkers, so the data that we put in now for potentially software as a medical device that could be AI enabled will change as we get more and more into the biomarkers compared to what the data we’re putting in now,” she said. “I think if we are not monitoring what data drift could look like, that will impact how those devices are used.” 

Diana Miller, senior director of data science at Medtronic Diabetes, said it will be important not to add a lot of conflicting regulatory requirements, which may cause confusion. “I think we should start by looking at what exists out there for postmarket monitoring,” she said.

As the metrics change, FDA will need to ensure during postmarket monitoring that device validity and reliability remain consistent, said Jagdish Khubchandani, professor of public health at New Mexico State University.

The advisory committee members also offered a number of strategies for postmarket performance evaluation and monitoring, especially for regional biases and local variations in generative AI. The strategies included auditing of data at local institutions through synthetic data or quality review, looking at the percentage of misinterpretations by institution, comparing local data sets to training data sets, long-term measurement of high-level shifts in answers from generative AI models, stochastic sampling, and de-identifying images to be verified by the manufacturer.

A major point of discussion was the use of guardrails to maintain proper use of a generative AI-enabled medical device in the real world. Thomas Radman, of the National Center for Advancing Translational Sciences at the National Institutes of Health, said that often, you do not realize what guardrails you will need until something has already occurred.  

Chevon Rariy, a physician and digital health executive, said FDA should perhaps broaden, rather than narrowly specify, the errors and adverse events that need to be defined for a generative AI-enabled medical device, with a defined trigger that allows manufacturers to report.

Several panel members also highlighted the potential for generative AI to widen disparities due to the lack of data from diverse populations. 

“At the post-monitoring stage, that is one of the opportunities with generative AI, is that the amount of input is infinite,” she said. “We can then take in data from diverse populations, just as long as there is an expectation that the model, the algorithm, will be changed and updated and evaluated for these changes or drifts specific to this diverse data. I think there’s a way that we could incentivize an opportunity to collect real-world data from real diverse patients, again, depending on the use case, and ensure that the postmarketing or the post-monitoring is sufficient enough to take that into consideration and improve upon the initial model.”

